# Multiple Neurons

### Introduction

In the last lesson, we saw new expressions for what occurs with a sigmoid neuron.  In general we can think of a neuron as having two components:

* Multiple dendrites which each receive a signal from an input 
* And a cell body that gets turned on based on a combination of the inputs from the dendrites

<img src="./neuron-math.png" width="60%">

In the diagram above, $x_o$ is the input from the outside, which, combined with a corresponding weight, feeds into the cell body.  The cell body activates or does not activate based on the combination of these inputs.  Importantly, it mimics the pattern of applying a linear function to the inputs and then a non-linear function (a binary turn on or off) to produce the output.

Remember that if we have inputs:

$\begin{bmatrix}
x_1 & x_2 \\
\end{bmatrix}$

And corresponding weights:

$\begin{bmatrix}
w_1 \\
w_2 
\end{bmatrix}$

Then our linear function is:

$f(x) = \begin{bmatrix}
x_1 & x_2 \\
\end{bmatrix}\cdot \begin{bmatrix}
w_1 \\
w_2 
\end{bmatrix} + b$

And we simply wrap this output in a sigmoid function, so we can write our entire sigmoid neuron as:

$S(x) = \sigma(x \cdot w + b)$
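
To make the formula concrete, here is a minimal sketch of a sigmoid neuron in NumPy. The helper name `sigmoid_neuron` and the two-input demo values are assumptions chosen for illustration; they do not come from the lesson.

```python
import numpy as np

def sigmoid(z):
    """Squash any real number (or array) into the range (0, 1)."""
    return 1 / (1 + np.exp(-z))

def sigmoid_neuron(x, w, b):
    """Linear step x . w + b, followed by the sigmoid activation."""
    return sigmoid(np.dot(x, w) + b)

# Hypothetical two-input example (not values from the lesson).
x_demo = np.array([1.0, 2.0])
w_demo = np.array([0.5, -0.25])
b_demo = 0.0

# The linear part is 1*0.5 + 2*(-0.25) + 0 = 0, and sigmoid(0) = 0.5.
output_demo = sigmoid_neuron(x_demo, w_demo, b_demo)
print(output_demo)  # 0.5
```

A linear output of zero sits exactly at the midpoint of the sigmoid, which is why the neuron reports 0.5 here.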

### A more complicated neuron

Ok, let's go back to our example of a neuron determining whether food it tastes is sweet or not.  This time, let's say that the observation of the food involves the following:

* sweet taste 2
* sweet smell 4
* salty taste 3
* salty smell 1

And we capture this data in the vector $x$.

In [43]:
import numpy as np

x = np.array([2, 4, 3, 1])
x

array([2, 4, 3, 1])

Now, even though this neuron is in charge of determining whether something is sweet, it still observes all of the data.  It simply weighs each feature differently:

In [56]:
w_sweet = np.array([1, 3, 0, -.5])

Here, the more the neuron detects a salty smell, the less likely it is to find something sweet.  After all, if something smells salty, that could overpower a determination of sweetness.  And note that in making its determination, it does not consider salty taste at all (as indicated by the 0).

Let's set the bias at $-12$.

In [85]:
b_sweet = -12

In [86]:
x.dot(w_sweet) + b_sweet

1.5

In [59]:
# sigmoid(z) = 1 / (1 + np.exp(-z)), assumed to be defined already
sigmoid(x.dot(w_sweet) + b_sweet)

0.8175744761936437
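
To see where the $1.5$ comes from, we can break the dot product into one contribution per feature. This is only a sketch; the variable names `contributions` and `linear_output` are introduced here for illustration.

```python
import numpy as np

# sweet taste, sweet smell, salty taste, salty smell
x = np.array([2, 4, 3, 1])
w_sweet = np.array([1, 3, 0, -.5])
b_sweet = -12

# Elementwise products: how much each feature pushes the neuron up or down.
contributions = x * w_sweet

# Summing the contributions gives the dot product; then we add the bias.
linear_output = contributions.sum() + b_sweet

print(contributions.tolist())  # [2.0, 12.0, 0.0, -0.5]
print(linear_output)           # 1.5
```

The sweet smell contributes the most ($4 \cdot 3 = 12$), the salty taste contributes nothing, and the salty smell pulls the total down slightly.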

### A second neuron

Now let's feed this data into a second neuron, which makes a different decision: whether or not something is salty.  Remember that our observation looks like the following:

In [61]:
x = np.array([2, 4, 3, 1])
x
# sweet taste 2, sweet smell 4, salty taste 3, salty smell 1

array([2, 4, 3, 1])

In [74]:
w_salty = np.array([0, -.5, 3, 1.5])

So notice that this time around, the sweetness weights are either 0 or negative, as this neuron is about determining if something is salty.  The neuron has its own bias for saltiness, which is $-8$.

In [88]:
b_salty = -8

In [89]:
sigmoid(x.dot(w_salty) + b_salty)

0.6224593312018546

### Thinking with multiple neurons

There are a couple points to take from the above discussion. 

#### 1. Different weights same input

The first point is that the same vector $x$ is the input to each neuron, and that these neurons simply weigh these inputs differently.

$f_{sweet}(x) = \begin{bmatrix}
x_1 & x_2 & x_3 & x_4 \\
\end{bmatrix}\cdot \begin{bmatrix}
1 \\
3 \\
0 \\
-.5 \\
\end{bmatrix} - 12$

$f_{salty}(x) = \begin{bmatrix}
x_1 & x_2 & x_3 & x_4 \\
\end{bmatrix}\cdot \begin{bmatrix}
0 \\
-.5 \\
3 \\
1.5 \\
\end{bmatrix} - 8$

In [92]:
sigmoid(x.dot(w_salty) + b_salty)

sigmoid(x.dot(w_sweet) + b_sweet)

0.8175744761936437
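
One way to picture "same input, different weights" is to keep the neurons in a dictionary and hand the same $x$ to each of them. The `neurons` dictionary below is a hypothetical structure for illustration; the weights and biases are the ones from this lesson.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

x = np.array([2, 4, 3, 1])

# name -> (weights, bias), using the values defined above
neurons = {
    "sweet": (np.array([1, 3, 0, -.5]), -12),
    "salty": (np.array([0, -.5, 3, 1.5]), -8),
}

# Every neuron receives the identical vector x; only w and b change.
activations = {name: sigmoid(x.dot(w) + b) for name, (w, b) in neurons.items()}

for name, activation in activations.items():
    print(name, round(float(activation), 4))
# sweet 0.8176
# salty 0.6225
```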

And so if you look at a neural network diagram, you will see this point illustrated: each input goes to each neuron in the first layer.

<img src="./first-layer.png" width="20%">

> The diagram above illustrates a neural network where each observation has four features, and each feature is an input to each of the four neurons in the first layer.

#### 2. We can make it brief

The second point is that the only thing different from neuron to neuron is the weight vector and the bias.  Let's leave aside the biases for a moment, leaving us with the following:

$f_{sweet}(x) = \begin{bmatrix}
2 & 4 & 3 & 1 \\
\end{bmatrix}\cdot \begin{bmatrix}
1 \\
3 \\
0 \\
-.5 \\
\end{bmatrix} = 2 \cdot 1 + 4 \cdot 3 + 3 \cdot 0 + 1 \cdot (-.5) = 13.5$

$f_{salty}(x) = \begin{bmatrix}
2 & 4 & 3 & 1 \\
\end{bmatrix}\cdot \begin{bmatrix}
0 \\
-.5 \\
3 \\
1.5 \\
\end{bmatrix} = 2 \cdot 0 + 4 \cdot (-.5) + 3 \cdot 3 + 1 \cdot 1.5 = 8.5$

Now let's observe the following: 

$\begin{bmatrix}
- & w_1 & - & -  \\
- & w_2 & - & - \end{bmatrix}
\cdot \begin{bmatrix}
| \\ x\\ | \\ | 
\end{bmatrix}  = \begin{bmatrix}
x \cdot w_1 \\ x \cdot w_2 \end{bmatrix}$

Or applied to our example: 

$ \begin{bmatrix}
1 & 3 & 0 & -.5 \\
0 & -.5 & 3 & 1.5  \\
\end{bmatrix} \cdot \begin{bmatrix}
2 \\ 4 \\ 3 \\ 1 \end{bmatrix} = \begin{bmatrix}
13.5 \\ 8.5 \end{bmatrix}$

Let's verify this in code:

In [107]:
W = np.stack([w_sweet, w_salty])
W

array([[ 1. ,  3. ,  0. , -0.5],
       [ 0. , -0.5,  3. ,  1.5]])

In [111]:
result = W.dot(x)
result

array([13.5,  8.5])
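
As a quick sanity check, we can compare the matrix product against the two dot products computed one neuron at a time. This is a sketch; the names `one_at_a_time` and `all_at_once` are introduced here, and `W @ x` is simply another way of writing `W.dot(x)`.

```python
import numpy as np

x = np.array([2, 4, 3, 1])
w_sweet = np.array([1, 3, 0, -.5])
w_salty = np.array([0, -.5, 3, 1.5])

# One row per neuron, one column per feature.
W = np.stack([w_sweet, w_salty])

one_at_a_time = np.array([x.dot(w_sweet), x.dot(w_salty)])
all_at_once = W @ x

print(W.shape)                                  # (2, 4)
print(np.allclose(one_at_a_time, all_at_once))  # True
```

The shape `(2, 4)` reads as "two neurons, four features", so multiplying by a length-4 input produces one number per neuron.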

So at this point we have summarized the weights of multiple neurons, but we still have not included the biases.  To complete our linear component, we need to add the bias of $-12$ to $13.5$ and the bias of $-8$ to $8.5$.  We can do so by collecting the biases in a vector $b$ and adding it to the result:

In [115]:
b = np.array([b_sweet, b_salty])  # one bias per neuron, in the same order as the rows of W
W.dot(x) + b

array([1.5, 0.5])

$\begin{bmatrix}
- & w_1 & - & -  \\
- & w_2 & - & - \end{bmatrix}
\cdot \begin{bmatrix}
| \\ x\\ | \\ | 
\end{bmatrix} + \begin{bmatrix} b_1 \\ b_2 \end{bmatrix}  = \begin{bmatrix}
x \cdot w_1 \\ x \cdot w_2 \end{bmatrix} + \begin{bmatrix} b_1 \\ b_2 \end{bmatrix}$

So we can summarize the linear component of a layer of neurons as:

$W\cdot x + b$

So given a matrix $W$, where each row of $W$ represents the weights of a different neuron, and a vector $b$, where each entry of $b$ represents the corresponding bias of a neuron, we can calculate the outputs of each of our sigmoid neurons in a layer as:

In [119]:
sigmoid(W.dot(x) + b)

array([0.81757448, 0.62245933])

In [117]:
x

array([2, 4, 3, 1])

$\sigma(W\cdot x + b)$
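
We could wrap this formula in a small function. The sketch below assumes a helper named `layer_output` (not part of the lesson) and adds a shape check, since a mismatch between the rows of $W$ and the entries of $b$ is an easy mistake to make.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def layer_output(W, x, b):
    """Sigmoid activations of a layer: one row of W and one entry of b per neuron."""
    if W.shape != (b.shape[0], x.shape[0]):
        raise ValueError("W must have shape (number of neurons, number of features)")
    return sigmoid(W @ x + b)

W = np.array([[1, 3, 0, -.5],      # sweet neuron
              [0, -.5, 3, 1.5]])   # salty neuron
b = np.array([-12, -8])
x = np.array([2, 4, 3, 1])

activations = layer_output(W, x, b)
print(activations.round(4).tolist())  # [0.8176, 0.6225]
```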

So each of our observed inputs goes into each logistic regression function.  What's special about this is that we do not have to specify what each logistic regression function is trying to predict.

Instead, we have the logistic regression functions each output "something useful", which can be used to predict our target -- like whether a review is positive or negative. 

<img src="./log-to-prediction.png" width="60%">

So notice that the inputs to our hypothesis function aren't features, but rather the outputs of the intermediate layer.  So the backpropagation algorithm asks the middle layers to find useful signals from the underlying data, such that they help the final classifier make a prediction.  These middle layers are called the **hidden layers** of a neural network.
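
Here is a rough sketch of that idea: the hidden layer's outputs become the inputs of a final neuron. The hidden layer uses the weights from this lesson, but the output neuron's weights `w_out` and bias `b_out` are made up for illustration. In practice they would be learned during training.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

x = np.array([2, 4, 3, 1])

# Hidden layer: the sweet and salty neurons from above.
W = np.array([[1, 3, 0, -.5],
              [0, -.5, 3, 1.5]])
b = np.array([-12, -8])
hidden = sigmoid(W @ x + b)

# Hypothetical output neuron: its inputs are the two hidden activations,
# not the four original features.
w_out = np.array([2.0, -1.0])
b_out = -0.5
prediction = sigmoid(hidden.dot(w_out) + b_out)

print(hidden.shape)  # (2,)
print(prediction)
```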

From there, data scientists can simply introduce more layers.
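
In code, adding layers can be as simple as looping over a list of $(W, b)$ pairs. The sketch below uses a hypothetical `forward` helper and randomly generated weights, purely to show how the shapes chain together; it is not a trained network.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def forward(x, layers):
    """Pass x through each (W, b) pair in turn; each output feeds the next layer."""
    a = x
    for W, b in layers:
        a = sigmoid(W @ a + b)
    return a

rng = np.random.default_rng(0)  # arbitrary weights, only to demonstrate shapes
layers = [
    (rng.normal(size=(3, 4)), np.zeros(3)),  # 4 features -> 3 neurons
    (rng.normal(size=(2, 3)), np.zeros(2)),  # 3 inputs   -> 2 neurons
    (rng.normal(size=(1, 2)), np.zeros(1)),  # 2 inputs   -> 1 output
]

out = forward(np.array([2, 4, 3, 1]), layers)
print(out.shape)  # (1,)
```

Notice that the number of columns in each $W$ has to match the number of neurons in the layer before it.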

<img src="./multilayer.png" width="60%">

### Summary

So we can see that when we think about how we would model an individual neuron, it lines up with our logistic regression function.  A logistic regression function has inputs which feed into a linear model, whose output is then passed to a non-linear function (our activation function).  With a neuron, we have various inputs which each enter the cell body through a dendrite.  We can think of each dendrite as amplifying or softening the input it receives.  Then the output of the cell body is determined by a combination of these inputs.  And with multiple neurons, we can stack the weight vectors into a matrix $W$ and the biases into a vector $b$, so that a whole layer is computed as $\sigma(W \cdot x + b)$.