# Multiple Neurons

### Introduction

In the last lesson, we saw how to express what occurs in a sigmoid neuron.  In general, we can think of a neuron as having two components:

* Multiple dendrites, each of which receives a signal from an input
* A cell body that gets turned on based on a combination of the inputs from the dendrites

<img src="./neuron-math.png" width="60%">

$x = \begin{bmatrix}
x_1 & x_2
\end{bmatrix}$

$w = \begin{bmatrix}
w_1 \\
w_2
\end{bmatrix}$

$f(x) = \begin{bmatrix}
x_1 & x_2 \\
\end{bmatrix}\cdot \begin{bmatrix}
w_1 \\
w_2 
\end{bmatrix} + b$

$S(x) = \sigma(x \cdot w + b)$
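
As a minimal sketch of the formula $S(x) = \sigma(x \cdot w + b)$ for a two-input neuron, the snippet below may help. The values of `x_example`, `w_example`, and `b_example` are made up for illustration (they do not come from this lesson), and the helper name `neuron_output` is hypothetical.

```python
import numpy as np

def sigmoid(value):
    # Squash any real number into the range (0, 1)
    return 1 / (1 + np.exp(-value))

def neuron_output(x, w, b):
    # Linear component: dot product of inputs and weights, plus the bias
    linear = x.dot(w) + b
    # Activation: pass the linear component through the sigmoid
    return sigmoid(linear)

# Illustrative values only (assumed for this sketch)
x_example = np.array([1.0, 2.0])
w_example = np.array([0.5, -0.25])
b_example = 0.0

output = neuron_output(x_example, w_example, b_example)
print(output)
```

With these assumed values the linear component is $1 \cdot 0.5 + 2 \cdot (-0.25) + 0 = 0$, so the neuron outputs $0.5$, the midpoint of the sigmoid.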

### A more complicated neuron

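Suppose we have a single observation with four features: a sweet taste of 2, a sweet smell of 4, a salty taste of 3, and a salty smell of 1.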

In [2]:
import numpy as np
x = np.array([2, 4, 3, 1])
x

array([2, 4, 3, 1])

In [3]:
w_sweet = np.array([1, 3, 0, -.5])

In [4]:
b_sweet = -12

In [5]:
x.dot(w_sweet) + b_sweet

1.5

In [6]:
def sigmoid(value): return 1/(1 + np.exp(-value))

In [7]:
sigmoid(x.dot(w_sweet) + b_sweet)

0.8175744761936437

### A second neuron

In [14]:
# sweet taste 2, sweet smell 4, salty taste 3, salty smell 1
x = np.array([2, 4, 3, 1])
x

array([2, 4, 3, 1])

In [15]:
w_salty = np.array([0, -.5, 3, 1.5])

In [18]:
x.dot(w_salty)

8.5

In [16]:
b_salty = -8

In [12]:
sigmoid(x.dot(w_salty) + b_salty)

0.6224593312018546

### Thinking with multiple neurons

$f_{\text{sweet}}(x) = \begin{bmatrix}
x_1 & x_2 & x_3 & x_4
\end{bmatrix}\cdot \begin{bmatrix}
1 \\
3 \\
0 \\
-0.5
\end{bmatrix} - 12$

$f_{\text{salty}}(x) = \begin{bmatrix}
x_1 & x_2 & x_3 & x_4
\end{bmatrix}\cdot \begin{bmatrix}
0 \\
-0.5 \\
3 \\
1.5
\end{bmatrix} - 8$

In [32]:
sigmoid(x.dot(w_salty) + b_salty)

sigmoid(x.dot(w_sweet) + b_sweet)

0.8175744761936437

The first point is that both neurons receive the same inputs.  This is why a neural network diagram shows each input connected to each neuron in the first layer.

<img src="./first-layer.png" width="20%">

> The diagram above illustrates a neural network where each observation has four features, and each feature is an input to each of the four neurons in the first layer.
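
The idea that every neuron sees the same observation, and differs only in its weights and bias, can be sketched as a loop over neurons. The dictionary `neurons` and its layout are assumptions made for this illustration; the numeric values are the ones used in this lesson.

```python
import numpy as np

def sigmoid(value):
    return 1 / (1 + np.exp(-value))

# One observation with four features, as in the lesson
x = np.array([2, 4, 3, 1])

# Each neuron has its own (weights, bias) pair, but all receive the same x
neurons = {
    "sweet": (np.array([1, 3, 0, -.5]), -12),
    "salty": (np.array([0, -.5, 3, 1.5]), -8),
}

outputs = {}
for name, (w, b) in neurons.items():
    # Same input x, different weights and bias for each neuron
    outputs[name] = sigmoid(x.dot(w) + b)

print(outputs)
```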

#### 2. We can make it brief

The second point is that the only things that differ from neuron to neuron are the weight vector and the bias.  Let's set the biases aside for a moment, which leaves us with the following dot products:

$x \cdot w_{\text{sweet}} = \begin{bmatrix}
2 & 4 & 3 & 1
\end{bmatrix}\cdot \begin{bmatrix}
1 \\
3 \\
0 \\
-0.5
\end{bmatrix} = 2 \cdot 1 + 4 \cdot 3 + 3 \cdot 0 + 1 \cdot (-0.5) = 13.5$

$x \cdot w_{\text{salty}} = \begin{bmatrix}
2 & 4 & 3 & 1
\end{bmatrix}\cdot \begin{bmatrix}
0 \\
-0.5 \\
3 \\
1.5
\end{bmatrix} = 2 \cdot 0 + 4 \cdot (-0.5) + 3 \cdot 3 + 1 \cdot 1.5 = 8.5$

Now let's observe the following: 

$\begin{bmatrix}
- & x &  -  
\end{bmatrix} \cdot \begin{bmatrix}
|  & |  \\
w_1  & w_2 \\
|   & |
\end{bmatrix}  = \begin{bmatrix}
x \cdot w_1 & x \cdot w_2 \end{bmatrix}$

Or applied to our example: 

$\begin{bmatrix}
2 & 4 & 3 & 1 \end{bmatrix} \cdot
\begin{bmatrix}
1 & 0\\
3 & -.5 \\
0 & 3 \\
-.5 & 1.5\end{bmatrix} = \begin{bmatrix}
13.5 & 8.5 \end{bmatrix}$

Let's verify this in code:

In [13]:
W = np.stack([w_sweet, w_salty]).T
W

array([[ 1. ,  0. ],
       [ 3. , -0.5],
       [ 0. ,  3. ],
       [-0.5,  1.5]])

In [37]:
result = x.dot(W)
result

array([13.5,  8.5])
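
As a quick check that the matrix product really is just the per-neuron dot products placed side by side, here is a small sketch. It builds the matrix with `np.column_stack`, which is an alternative to the `np.stack(...).T` call above rather than the lesson's own approach; the variable names `per_neuron` and `matches` are hypothetical.

```python
import numpy as np

x = np.array([2, 4, 3, 1])
w_sweet = np.array([1, 3, 0, -.5])
w_salty = np.array([0, -.5, 3, 1.5])

# column_stack places each 1-D weight vector in its own column: shape (4, 2)
W = np.column_stack([w_sweet, w_salty])

# Entry j of x.dot(W) is the dot product of x with column j of W
per_neuron = np.array([x.dot(w_sweet), x.dot(w_salty)])
matches = np.allclose(x.dot(W), per_neuron)

print(W.shape, matches)
```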

So at this point we have summarized the weights of multiple neurons in a single matrix, but we have not yet included the biases.  To complete our linear component, we need to add the bias of $-12$ to $13.5$ and the bias of $-8$ to $8.5$.  We can do so by collecting the biases in a vector `b` and adding it to the result:

In [115]:
b = np.array([b_sweet, b_salty])
x.dot(W) + b

array([1.5, 0.5])

$\begin{bmatrix}
- & x &  -  
\end{bmatrix} \cdot \begin{bmatrix}
|  & |  \\
w_1  & w_2 \\
|   & |
\end{bmatrix} + \begin{bmatrix} b_1 & b_2 \end{bmatrix} = \begin{bmatrix}
x \cdot w_1 & x \cdot w_2 \end{bmatrix} + \begin{bmatrix} b_1 & b_2 \end{bmatrix} = \begin{bmatrix} l_1(x) & l_2(x) \end{bmatrix}$

So we can summarize the weighted input of a layer of neurons as:

$x \cdot W + b$

Given a matrix $W$ where each column holds the weights of a different neuron, and a vector $b$ where each entry is the bias of the corresponding neuron, we can calculate the outputs of all the sigmoid neurons in a layer as:

In [119]:
sigmoid(x.dot(W) + b)

array([0.81757448, 0.62245933])

Or mathematically, we can write our layer as the following:

$\sigma(x \cdot W + b)$

where $\sigma$ is applied to each entry of the vector resulting from $x \cdot W + b$:

$\sigma (x \cdot W + b) = \begin{bmatrix} \sigma(l_1(x)) & \sigma(l_2(x)) \end{bmatrix}$
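
Two ways of writing this product are common: the observation on the left with one neuron per column of the weight matrix, or the transposed weight matrix on the left with one neuron per row. The sketch below, using this lesson's values, suggests that both give the same weighted input. The names `row_convention`, `col_convention`, and `same` are hypothetical.

```python
import numpy as np

x = np.array([2, 4, 3, 1])
W = np.array([[ 1. ,  0. ],
              [ 3. , -0.5],
              [ 0. ,  3. ],
              [-0.5,  1.5]])   # one column per neuron
b = np.array([-12, -8])

# Observation on the left, neurons stored as columns of W
row_convention = x.dot(W) + b
# Transposed matrix on the left, neurons stored as rows of W.T
col_convention = W.T.dot(x) + b

same = np.allclose(row_convention, col_convention)
print(row_convention, same)
```

What matters is that the shapes line up: with four features and two neurons, a matrix of shape `(4, 2)` must have the observation on its left, while its transpose of shape `(2, 4)` must have the observation on its right.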

### Summary

In this lesson we saw the components needed to build a layer of a neural network.  A single layer combines a weighted input with a sigmoid activation function.

The weighted input can be represented by $x \cdot W + b$:

$\begin{bmatrix}
- & x &  -  
\end{bmatrix} \cdot \begin{bmatrix}
|  & |  \\
w_1  & w_2 \\
|   & |
\end{bmatrix} + \begin{bmatrix} b_1 & b_2 \end{bmatrix} = \begin{bmatrix}
x \cdot w_1 & x \cdot w_2 \end{bmatrix} + \begin{bmatrix} b_1 & b_2 \end{bmatrix} = \begin{bmatrix} l_1(x) & l_2(x) \end{bmatrix}$

* The row vector $x$ represents the features of a single observation.
* Each column of the matrix $W$ contains the weights of a separate neuron, with the entries of $b$ as the corresponding biases.

The weighted input is then fed into the activation function, which is applied entrywise.  Here, we use the sigmoid function.  So we can summarize the operations of our entire layer as:

$\sigma (x \cdot W + b) = \begin{bmatrix} \sigma(l_1(x)) & \sigma(l_2(x)) \end{bmatrix}$
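
Putting the pieces together, a whole layer can be sketched as a single function. The function name `sigmoid_layer` and its shape check are assumptions added for illustration, not part of the lesson's code; the inputs are the sweet and salty neurons from above.

```python
import numpy as np

def sigmoid(value):
    return 1 / (1 + np.exp(-value))

def sigmoid_layer(x, W, b):
    # Expected shapes: x (n_features,), W (n_features, n_neurons), b (n_neurons,)
    if W.shape != (x.shape[0], b.shape[0]):
        raise ValueError("W must have shape (n_features, n_neurons)")
    # Weighted input, followed by the entrywise sigmoid
    return sigmoid(x.dot(W) + b)

x = np.array([2, 4, 3, 1])
W = np.column_stack([
    np.array([1, 3, 0, -.5]),    # weights of the sweet neuron
    np.array([0, -.5, 3, 1.5]),  # weights of the salty neuron
])
b = np.array([-12, -8])

layer_output = sigmoid_layer(x, W, b)
print(layer_output)
```

The shape check is one possible safeguard against passing the weight matrix in the transposed orientation, which would otherwise surface as a less descriptive NumPy error.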