# Multiple Neurons

### Introduction

$z(x) = w_1x_1 + w_2x_2 + b$ as: 

$z(x) =  \begin{bmatrix}
w_1 & w_2 \\
\end{bmatrix}\cdot \begin{bmatrix}
x_1 \\
x_2 
\end{bmatrix} + b = (w \cdot x + b)$

In [1]:
import torch
# cell_area 2, cell_concavity 1
x = torch.tensor([2, 1])
             
w = torch.tensor([3, 4])

b = -8

w.dot(x) + b
# z = 3*2 + 4*1 - 8 = 2 -> neuron fires

tensor(2)

* But we want more neurons

<img src="./artificial-network.png" width="40%">

### Building a layer

Still with our **predicting cancer** example...


* one neuron that determines if a cell is overly large,
* and another neuron that determines if a cell has an abnormal shape.  

* Sort of a separation of concerns between each neuron, them operating side by side in a layer.  

* features of an observation

In [2]:
import torch
# area, perimeter, number concave points, symmetry error  
x = torch.tensor([4, 2, 3, 2]).float()

* Neuron for cells's size

In [3]:
# area, perimeter, number concave points, symmetry error  
w_size = torch.tensor([2, 3, -.5, 0])

b_size = -11.0

* Notice that we still have weights for all of the features, even though focus is on size

In [4]:
x.dot(w_size) + b_size

tensor(1.5000)

So this is our first neuron, and it is sensitive to changes in a neuron's size.

### A second neuron

* Neuron for shape 
* Our observation cell will still be the same, and so has the same four features with the same values:

> Copy same observation below.

In [5]:
# area, perimeter, number concave points, symmetry error  
x = torch.tensor([4., 2., 3., 2.])

* This time larger weights assocciated with shape

In [6]:
# area, perimeter, number concave points, symmetry error  
w_shape = torch.tensor([0, -.5, 3, 1.5])

b_shape = -14

In [7]:
x.dot(w_shape) + b_shape

tensor(-3.)

### Takeaways when using multiple neurons

#### 1. Different weights same input

* use the **same feature vector** $x$ as an input to each neuron.  The two neurons simply weigh these inputs differently.

$z_{size}(x) = \begin{bmatrix}
x_1 & x_2 & x_3 & x_4 \\
\end{bmatrix}\cdot \begin{bmatrix}
2 \\
3 \\
-.5 \\
0 \\
\end{bmatrix} - 12 = .5$

$z_{shape}(x) = \begin{bmatrix}
x_1 & x_2 & x_3 & x_4 \\
\end{bmatrix}\cdot \begin{bmatrix}
0 \\
-.5 \\
3 \\
1.5 \\
\end{bmatrix} - 14 = 1.5$

> Or in Diagram

* Notice: the attributes of an observation $x_1$ through $x_4$, are fed to each of the (this time four, not two) neurons in the layer (each neuron is a white circle).  

<img src="./first_layer_2.svg" width="30%">

## Making it brief

* Previously two separate weight vectors

$g_{size}(x) = \begin{bmatrix}
2 & 4 & 3 & 1 \\
\end{bmatrix}\cdot \begin{bmatrix}
1 \\
3 \\
-.5 \\
0 \\
\end{bmatrix} = 12.5$

$g_{shape}(x) = \begin{bmatrix}
2 & 4 & 3 & 1 \\
\end{bmatrix}\cdot \begin{bmatrix}
0 \\
-.5 \\
3 \\
1.5 \\
\end{bmatrix}= 8.5$

But can combine these into a *matrix*:

$\begin{bmatrix}
2 & 4 & 3 & 1 \end{bmatrix} \cdot
\begin{bmatrix}
1 & 0\\
3 & -.5 \\
-.5 & 3 \\
0 & 1.5\end{bmatrix} = \begin{bmatrix}
12.5 & 8.5 \end{bmatrix}$

Or more generally: 

$\begin{bmatrix}
- & x &  -  
\end{bmatrix} \cdot \begin{bmatrix}
|  & |  \\
w_{1}  & w_{2} \\
|   & |
\end{bmatrix}  = \begin{bmatrix}
x \cdot w_{1} & x \cdot w_{2} \end{bmatrix}  = \begin{bmatrix} g_{1}(x) & g_{2}(x) \end{bmatrix}$

Let's show this in code.  We'll start by placing our feature vectors in a weight matrix $W$. 

In [19]:
w_size = torch.tensor([0., -.5, 3., 1.5])
w_shape = torch.tensor([1., 3., -.5, 0.])

# weight matrix
W = torch.stack((w_size, w_shape), dim = 0).T
W

tensor([[ 0.0000,  1.0000],
        [-0.5000,  3.0000],
        [ 3.0000, -0.5000],
        [ 1.5000,  0.0000]])

In [20]:
x @ w_size, x @ w_shape

(tensor(8.5000), tensor(12.5000))

In [21]:
result = x @ W
result

tensor([ 8.5000, 12.5000])

> **matrix:** A rectangular array of numbers arranged in rows and columns. 


In [16]:
W.shape

torch.Size([4, 2])

A vector, by contrast, consists of only one row or column, and it's size can be expressed with a single number.

In [17]:
x.shape

torch.Size([4])

### Adding the biases

Now above we've seen how we can calculate the weighted sum.

$\begin{bmatrix}
- & x &  -  
\end{bmatrix} \cdot \begin{bmatrix}
|  & |  \\
w_{1}  & w_{2} \\
|   & |
\end{bmatrix}  = \begin{bmatrix}
x \cdot w_{1} & x \cdot w_{2} \end{bmatrix}  = \begin{bmatrix} g_{1}(x) & g_{2}(x) \end{bmatrix}$


But we have still have not included the biases of our linear layer.  To complete our linear layer, we'll add a bias of $-12$ for $z_{size}$ and our bias of $-14$ for $z_{shape}$.  We do so by placing the two biases into a vector.

In [19]:
b = torch.tensor([-12, -7])

> So we have a bias term for each neuron, and collect them into a bias vector.

And then can calculate the output of passing an observation through a linear layer with the following:

In [103]:
x @ W + b

tensor([-1.0000,  1.5000])

Or summarizing the above, we can calculate the outputs of both linear functions with:

$\begin{bmatrix}
- & x &  -  
\end{bmatrix} \cdot \begin{bmatrix}
|  & |  \\
w_1  & w_2 \\
|   & |
\end{bmatrix} + \begin{bmatrix} b_1 & b_2 \end{bmatrix} = \begin{bmatrix}
x \cdot w_1 & x \cdot w_2 \end{bmatrix} + \begin{bmatrix} b_1 & b_2 \end{bmatrix} = \begin{bmatrix} z_1(x) & z_2(x) \end{bmatrix}$

Which we can summarize as:

$z = x\cdot W  + b$

Where $z$ is a vector consisting of the output of each neuron's linear function.

In [74]:
x.dot(W) + b

array([0.5, 1.5])

> Your turn

Now consider that we have two different neurons that detect size and shape.  They do so with the following weights.

In [104]:
import torch

x = torch.tensor([2, 4, 3, 1])

w_size = torch.tensor([2, 0, 1, 2])
w_shape = torch.tensor([1, 1, 2, 0])

W_new = torch.stack((w_size, w_shape), dim = 0).T
W_new

tensor([[ 0.0000,  1.0000],
        [-0.5000,  3.0000],
        [ 3.0000, -0.5000],
        [ 1.5000,  0.0000]])

Try to calculate the output of $x \cdot W$, where $x$ is the following.

In [None]:
# write your answer here



Check your answer with the dot product below.

## Including the sigmoid activation layer

So above, we saw the output from the linear layer by performing matrix vector multiplication like so:

In [13]:
import torch.nn.functional as F

b = torch.tensor([-12, -7])

w_size = torch.tensor([0, -.5, 3, 1.5])
w_shape = torch.tensor([1, 3, -.5, 0])

# weight matrix
W = torch.stack((w_size, w_shape), dim = 0).T
W

z = x @ W + b
z

tensor([-1.0000,  1.5000])

And then, next we would pass these outputs to the sigmoid activation function like so:

In [15]:
torch.sigmoid(z)

tensor([0.2689, 0.8176])

Or mathematically, we write this entire hypothesis function as the following:

$ \sigma(x \cdot W + b) $

Where sigma is applied to each entry of the vector resulting from $W\cdot x + b$

$\sigma (x \cdot W + b) = \begin{bmatrix} \sigma(z_1) & \sigma(z_2) \end{bmatrix}$

Or expressing the above formula as two layers of a neural network, our linear layer and an activation layer.  We can express this as the following:

$z = (x \cdot W + b)$

$a = \sigma(z)$

Where $z$ is our linear layer and $a$ is our activation layer.

### Summary

In this lesson we saw the components to build a layer of a neural network.  A single layer is a combination of a weighted input and a sigmoid activation function.  

The weighted input can be represented by $x \cdot W + b$



$\begin{bmatrix}
- & x &  -  
\end{bmatrix} \cdot \begin{bmatrix}
|  & |  \\
w_1  & w_2 \\
|   & |
\end{bmatrix} + \begin{bmatrix} b_1 & b_2 \end{bmatrix} = \begin{bmatrix}
x \cdot w_1 & x \cdot w_2 \end{bmatrix} + \begin{bmatrix} b_1 & b_2 \end{bmatrix} = \begin{bmatrix} z_1(x) & z_2(x) \end{bmatrix}$

* The row vector $x$ represents the features of a single observation.
* Each column of the matrix W, contains the weights of a separate neuron, with the entries of $b$ as the corresponding biases.

The output of the weighted input is fed into the activation function, which applies an entrywise operation.  Here, we use the sigmoid function.  So we can summarize the operations of our entire layer as:

$\sigma (W\cdot x + b) = \begin{bmatrix} \sigma(z_1) \\ \sigma(z_2) \end{bmatrix}$

Or we can break up the above as a linear layer $z$ and an activation layer $a$ where:

$z = (x \cdot W + b)$

$a = \sigma(z)$

<center>
<a href="https://www.jigsawlabs.io/free" style="position: center"><img src="./jigsaw-icon.png" width="15%" style="text-align: center"></a>
</center>

### Answers

In [6]:
x @ W_new

array([ 9, 12])