# Multiple Neurons

### Introduction

In the last lesson, we saw that we can rewrite the hypothesis function for the sigmoid neuron by using the dot product.  Specifically, we rewrote our linear function: 

$z(x) = w_1x_1 + w_2x_2 + b$ as: 

$z(x) =  \begin{bmatrix}
w_1 & w_2 \\
\end{bmatrix}\cdot \begin{bmatrix}
x_1 \\
x_2 
\end{bmatrix} + b = (w\cdot x + b)$

Then, we wrapped our linear function $z(x)$ in our activation function, the sigmoid function, to get $h(x) = \sigma(x \cdot w + b)$.  We diagram this hypothesis function as the following:

<img src="./sigmoid-neuron.png" width="40%">

In this lesson, we'll learn what it means to make predictions with multiple neurons simultaneously.  Let's get started.

### More features, more neurons

Ok, now let's return to our neuron that determines whether food contains sugar or not.  Let's say that the first food item contains the following attributes.

* sweet taste 2, sweet smell 4
* salty taste 3, salty smell 1

And we capture this data in the vector $x$.

In [2]:
import numpy as np
x = np.array([2, 4, 3, 1])
x

array([2, 4, 3, 1])

Now even though our neuron is only in charge of determining if something contains sugar, it still observes *all* of the features of the food.

In [3]:
import numpy as np
w_sugar = np.array([1, 3, -.5, 0])

As we can see from the corresponding weights above, in our sugar neuron there are positive weights associated with the sweetness features, and 0 or negative weights associated with saltiness.  In other words, looking at the last two weights, the more the neuron detects a salty taste, the less it seems to think there is sugar.  

Ok, now let's set the neuron's bias to $-12$.

In [4]:
b_sugar = -12

And we can calculate the output from the linear component of the neuron as:

In [37]:
x.dot(w_sugar) + b_sugar

0.5

And the hypothesis of the sugar neuron as:

In [6]:
def sigmoid(value): return 1/(1 + np.exp(-value))

In [10]:
sigmoid(x.dot(w_sugar) + b_sugar)

0.6224593312018546

So we can see that the neuron assigns a probability of $.62$ that the first observation contains sugar.
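The two steps above (the linear function, then the sigmoid) can be bundled into a single helper. This is only a minimal sketch: the function name `neuron_hypothesis` is an assumption introduced here for illustration and is not part of the lesson's code.

```python
import numpy as np

def sigmoid(value):
    """Squash a real number (or array) into the interval (0, 1)."""
    return 1 / (1 + np.exp(-value))

def neuron_hypothesis(x, w, b):
    """Hypothetical helper: linear function followed by the sigmoid."""
    z = x.dot(w) + b  # weighted sum of the features plus the bias
    return sigmoid(z)

x = np.array([2, 4, 3, 1])
w_sugar = np.array([1, 3, -.5, 0])
b_sugar = -12

p_sugar = neuron_hypothesis(x, w_sugar, b_sugar)
print(round(float(p_sugar), 4))  # 0.6225
```

Keeping the weights and bias as arguments means the same helper could be reused for any other neuron that reads the same features.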

### A second neuron

Now let's feed this data into a second neuron, in charge of a different prediction - whether the food contains salt.  Our food observation is the same, and so has the same four features:

In [11]:
import numpy as np
# Sweet          Salty
# taste smell  taste smell
#  2,     4,      3,  1
x = np.array([2, 4, 3, 1])

In [12]:
w_salt = np.array([0, -.5, 3, 1.5])

Notice that this time around, the weights associated with sweetness are either 0 or negative, as this neuron determines if something is salty.  The neuron also has its own bias for saltiness, which is $-7$.

In [13]:
b_salt = -7

In [14]:
x.dot(w_salt)

8.5

In [15]:
sigmoid(x.dot(w_salt) + b_salt)

0.8175744761936437

So this second neuron predicts whether or not the food contains salt, and here assigns a probability of about $.82$ that the food contains salt.

### Thinking with multiple neurons

There are a couple points to take from the above discussion. 

#### 1. Different weights same input

The first point is that we use the **same vector** $x$ as an input to each neuron.  The two neurons simply weigh these inputs differently.

$z_{sugar}(x) = \begin{bmatrix}
x_1 & x_2 & x_3 & x_4 \end{bmatrix} \cdot \begin{bmatrix}
1 \\ 3 \\ -.5 \\ 0
\end{bmatrix}  - 12 = .5$

$z_{salt}(x) = \begin{bmatrix}
x_1 &
x_2 &
x_3 &
x_4 
\end{bmatrix} \cdot \begin{bmatrix}
0 \\ -.5 \\ 3 \\ 1.5
\end{bmatrix} - 7 = 1.5$

In [38]:
sigmoid(x.dot(w_sugar) + b_sugar)
# 0.6224593312018546

sigmoid(x.dot(w_salt) + b_salt)
# 0.8175744761936437

0.8175744761936437

If we look at the diagram below, representing the layer of a neural network, it's illustrating this point: the attributes of an observation $x_1$ through $x_4$, are fed to each of the four neurons in the layer.  That's why each feature has a line drawn to each neuron.

<img src="./first-layer.png" width="20%">

So just to recap, every neuron in a layer receives the same inputs.  Each neuron just has separate weights and biases.
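One way to make this recap concrete is to loop over the neurons and hand each one the same `x`. The sketch below assumes a hypothetical dictionary, `neurons`, that maps each neuron's name to its weights and bias. That container is an illustration only, not something the lesson defines.

```python
import numpy as np

def sigmoid(value):
    return 1 / (1 + np.exp(-value))

x = np.array([2, 4, 3, 1])  # one observation, shared by every neuron

# Hypothetical container: neuron name -> (weights, bias)
neurons = {
    "sugar": (np.array([1, 3, -.5, 0]), -12),
    "salt": (np.array([0, -.5, 3, 1.5]), -7),
}

predictions = {}
for name, (w, b) in neurons.items():
    # Same input x every time; only w and b change from neuron to neuron
    predictions[name] = float(sigmoid(x.dot(w) + b))

print(predictions)
```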

## Making it brief

Now that we understand that each neuron receives the same inputs, let's see if we can condense our representation of our neurons.  To do this, let's start by removing the biases of our two neurons.  This gives us:

$g_{sugar}(x) =    \begin{bmatrix}
2 & 4 & 3 & 1
\end{bmatrix} \cdot \begin{bmatrix}
1 \\
3 \\
-.5  \\
0
\end{bmatrix} = 1*2 + 3*4 + -.5*3+ 0*1 = 12.5$

$g_{salt}(x) = \begin{bmatrix}
2 & 4 & 3 & 1 
\end{bmatrix} \cdot  \begin{bmatrix}
0 \\
-.5 \\
3 \\
1.5
\end{bmatrix} = 2*0 + 4*-.5 + 3*3 + 1*1.5 = 8.5$

Next, observe that if we combine these two weight vectors into a matrix, then we can get the following: 

$\begin{bmatrix}
2 & 4 & 3 & 1 
\end{bmatrix} \cdot \begin{bmatrix}
1 & 0 \\
3 & -.5 \\ 
-.5 & 3 \\ 
0 & 1.5
\end{bmatrix}  = \begin{bmatrix}
12.5 & 8.5 \end{bmatrix}$
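This row-vector-times-matrix product can be checked numerically. In the sketch below, `W_columns` is a name assumed only for this example. Each of its columns holds one neuron's weight vector.

```python
import numpy as np

x = np.array([2, 4, 3, 1])
w_sugar = np.array([1, 3, -.5, 0])
w_salt = np.array([0, -.5, 3, 1.5])

# Place each weight vector in its own column: shape (4, 2)
W_columns = np.column_stack([w_sugar, w_salt])

# A length-4 vector times a (4, 2) matrix gives one weighted sum per neuron
weighted_sums = x.dot(W_columns)
print(weighted_sums)  # first entry is the sugar sum, second is the salt sum
```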

Or more generally: 

$\begin{bmatrix}
-  &  w_{1} & - \\
-  &  w_{2} & - \\
\end{bmatrix} \cdot \begin{bmatrix}
| \\ x \\  |  
\end{bmatrix}  = \begin{bmatrix}
 w_{1} \cdot x  \\ w_{2} \cdot x \end{bmatrix}  = \begin{bmatrix} g_{1}(x) \\ g_{2}(x) \end{bmatrix}$

Let's verify this in code.  We'll start by placing our weight vectors in a matrix $W$, with one row per neuron.

In [30]:
w_sugar = np.array([1, 3, -.5, 0])
w_salt = np.array([0, -.5, 3, 1.5])


W = np.stack([w_sugar, w_salt])
W

array([[ 1. ,  3. , -0.5,  0. ],
       [ 0. , -0.5,  3. ,  1.5]])

And then we'll multiply the attributes of our observation $x$ by the weights of these neurons.

In [31]:
result = W.dot(x)
result

array([12.5,  8.5])

So we have just seen that we can use matrix multiplication to calculate the weighted sum of multiple neurons.  

However, we still have not included the biases.  To complete our linear function, we need to add the bias of $-12$ for $z_{sugar}$ and our bias of $-7$ for $z_{salt}$.  We do so by placing the two biases into a vector.

In [22]:
b = np.array([-12, -7])

In [32]:
W.dot(x) + b

array([0.5, 1.5])
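As a sanity check, the vectorized result can be compared with the two neurons computed one at a time. This is a minimal sketch: the names `z_layer` and `z_separate` are assumptions introduced here.

```python
import numpy as np

x = np.array([2, 4, 3, 1])
w_sugar = np.array([1, 3, -.5, 0])
w_salt = np.array([0, -.5, 3, 1.5])
b_sugar, b_salt = -12, -7

W = np.stack([w_sugar, w_salt])  # shape (2, 4): one row per neuron
b = np.array([b_sugar, b_salt])  # shape (2,): one bias per neuron

z_layer = W.dot(x) + b           # both neurons at once
z_separate = np.array([x.dot(w_sugar) + b_sugar,
                       x.dot(w_salt) + b_salt])

print(np.allclose(z_layer, z_separate))  # True
```

Because `W.dot(x)` and `b` both have shape `(2,)`, the addition is entrywise, so each neuron's bias is added to its own weighted sum.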

Or summarizing the above, we can calculate the outputs of both linear functions with:

$\begin{bmatrix}
-  &  w_{1} & - \\
-  &  w_{2} & - \\
\end{bmatrix} \cdot \begin{bmatrix}
| \\ x \\  |  
\end{bmatrix} + \begin{bmatrix} b_1 \\ b_2 \end{bmatrix} = \begin{bmatrix}
w_1  \cdot x  \\ w_2 \cdot x \end{bmatrix} + \begin{bmatrix} b_1 \\ b_2 \end{bmatrix} = \begin{bmatrix} z_1(x) \\ z_2(x) \end{bmatrix}$

Which we can summarize as:

$z =  W \cdot x  + b$

Where $z$ is a vector consisting of the output of each neuron's linear function.

In [33]:
W.dot(x) + b

array([0.5, 1.5])

> Your turn

Now consider that we have two different neurons that detect savory and bitter tastes.  They do so with the following weights.

In [34]:
import numpy as np

x = np.array([2, 4, 3, 1])

w_savory = np.array([2, 0, 1, 2])
w_bitter = np.array([1, 1, 2, 0])

W_new = np.vstack([w_savory, w_bitter])
W_new

array([[2, 0, 1, 2],
       [1, 1, 2, 0]])

Try to calculate the output of $W_{new} \cdot x$ by hand, using the vector $x$ defined above.

In [None]:
# write your answer here



Check your answer against the dot product in the Answers section at the end of this lesson.

## Including the activation layer

And we can calculate the hypothesis made by each neuron in a layer with:

In [35]:
sigmoid(W.dot(x) + b)

array([0.62245933, 0.81757448])

Or mathematically, we can write our layer as the following:

$\sigma(W\cdot x + b)$

Where sigma is applied to each entry of the vector resulting from $W\cdot x + b$

$\sigma (W\cdot x + b) = \begin{bmatrix} \sigma(z_1) \\ \sigma(z_2) \end{bmatrix}$

We can also express the above formula as two layers of a neural network: a linear layer followed by an activation layer.

$z = (W\cdot x + b)$

$a = \sigma(z)$

Where $z$ is our linear layer and $a$ is our activation layer.
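The split into a linear layer and an activation layer maps naturally onto a small function that returns both `z` and `a`. The helper name `layer_forward` is hypothetical and used only for this sketch.

```python
import numpy as np

def sigmoid(value):
    return 1 / (1 + np.exp(-value))

def layer_forward(W, x, b):
    """Hypothetical helper: return the linear output z and the activation a."""
    z = W.dot(x) + b  # linear layer: one weighted sum plus bias per neuron
    a = sigmoid(z)    # activation layer: sigmoid applied to each entry of z
    return z, a

W = np.array([[1, 3, -.5, 0],     # sugar neuron weights
              [0, -.5, 3, 1.5]])  # salt neuron weights
b = np.array([-12, -7])
x = np.array([2, 4, 3, 1])

z, a = layer_forward(W, x, b)
print(z)
print(a)
```

Returning `z` alongside `a` is a design choice for inspection: it lets us look at the linear output before the sigmoid squashes it.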

### Summary

In this lesson we saw the components to build a layer of a neural network.  A single layer is a combination of a weighted input and a sigmoid activation function.  

The weighted input can be represented by $W \cdot x + b$



$\begin{bmatrix}
-  & w_1 & -  \\
- & w_2 & - \\
\end{bmatrix} \cdot \begin{bmatrix}
| \\ x \\  | 
\end{bmatrix}  + \begin{bmatrix} b_1 \\ b_2 \end{bmatrix} = \begin{bmatrix}
 w_1 \cdot x \\ w_2 \cdot x \end{bmatrix} + \begin{bmatrix} b_1 \\ b_2 \end{bmatrix} = \begin{bmatrix} z_1(x) \\ z_2(x) \end{bmatrix}$

* The column vector $x$ represents the features of a single observation.
* Each row of the matrix $W$ contains the weights of a separate neuron, with the entries of $b$ as the corresponding biases.

The output of the weighted input is fed into the activation function, which applies an entrywise operation.  Here, we use the sigmoid function.  So we can summarize the operations of our entire layer as:

$\sigma (W\cdot x + b) = \begin{bmatrix} \sigma(z_1) \\ \sigma(z_2) \end{bmatrix}$

Or we can break up the above as a linear layer $z$ and an activation layer $a$ where:

$z = (W\cdot x + b)$

$a = \sigma(z)$


### Answers

In [36]:
W_new.dot(x)

array([ 9, 12])