# Building an Initial Layer

### Introduction

In the last lesson, we saw that we can represent the weights of our neurons with a weight matrix, $W$. In that matrix, each column vector holds the weights of a different neuron. So the matrix $W$ below contains the weights of two neurons.

$W = \begin{bmatrix}
|  & |  \\
w_1  & w_2 \\
|   & |
\end{bmatrix}$

We then used the weight matrix to calculate the output of each neuron's linear function, by multiplying a feature vector $x$ by our weight matrix and adding each neuron's bias.

> So below is the output of the linear component of the first layer.

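$z = \begin{bmatrix}
- & x & -
\end{bmatrix} \cdot \begin{bmatrix}
| & | \\
w_1 & w_2 \\
| & |
\end{bmatrix} + \begin{bmatrix} b_1 & b_2 \end{bmatrix} = \begin{bmatrix}
x \cdot w_1 & x \cdot w_2 \end{bmatrix} + \begin{bmatrix} b_1 & b_2 \end{bmatrix} = \begin{bmatrix} x \cdot w_1 + b_1 & x \cdot w_2 + b_2 \end{bmatrix} = \begin{bmatrix} z_1 & z_2 \end{bmatrix}$

Here $z_1$ and $z_2$ are the outputs of the linear components of the first and second neurons.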
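To make the equation concrete, here is a small NumPy sketch. The weights, biases, and three-feature observation are made-up values for illustration only. It shows that stacking each neuron's weights as a column of $W$ lets a single `x.dot(W) + b` compute every neuron's linear output at once, giving the same numbers as computing $x \cdot w_1 + b_1$ and $x \cdot w_2 + b_2$ one neuron at a time.

```python
import numpy as np

# Hypothetical weights for two neurons, each taking three features
w_1 = np.array([0.2, -0.5, 1.0])
w_2 = np.array([0.7, 0.1, -0.3])

# Stack each neuron's weights as a column: W has shape (3 features, 2 neurons)
W = np.column_stack([w_1, w_2])
b = np.array([0.1, -0.2])        # one bias per neuron

x = np.array([1.0, 2.0, 3.0])    # a single observation with three features

# Vectorized: one dot product computes both neurons' linear outputs
z = x.dot(W) + b

# Neuron by neuron, matching the middle of the equation above
z_manual = np.array([x.dot(w_1) + b[0], x.dot(w_2) + b[1]])

print(W.shape)  # (3, 2)
print(z)
print(z_manual)
```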

### Building a Random Neuron

In previous lessons, the weights for each of our neurons were given to us. But as we saw in the section on training, our neural network **does not know** which weights to begin with. Rather, it learns these weights over time. For us, this means that we can initialize our neural network with random weights; the important thing is to get the dimensions of each neuron, and of our layer, correct. Let's walk through this.

Let's continue with our domain of using neurons to predict whether a cell is cancerous. Say that we have a feature vector $x$ to represent our first observation:

```python
import numpy as np
# perimeter, radius, volume, # asymmetries, bumpiness
x = np.array([.5, .1, .3, .4, .8])
```


Now to initialize a neuron that accepts each of these features, we need to initialize a vector with five random weights, one for each feature, and a random bias term. One way to do this is with PyTorch's linear layer, `nn.Linear`, which creates these random parameters for us.

```python
import torch.nn as nn

# A linear layer that takes 5 input features and has 1 output (one neuron)
ll = nn.Linear(5, 1)
```

```python
# _parameters is an internal attribute; ll.weight and ll.bias are the public accessors
ll._parameters
```

```
OrderedDict([('weight',
              Parameter containing:
              tensor([[ 0.0661,  0.1877,  0.0044,  0.1915, -0.2948]], requires_grad=True)),
             ('bias',
              Parameter containing:
              tensor([0.1442], requires_grad=True))])
```
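Notice that PyTorch stores the weight with shape `(1, 5)`, that is `(out_features, in_features)`, and computes the layer's output as $x W^T + b$. The NumPy code in the rest of this lesson uses the transposed layout, shape `(5, 1)`, and computes $x \cdot W + b$. The sketch below uses only NumPy and reuses the parameter values printed above to check that the two layouts give the same output.

```python
import numpy as np

# perimeter, radius, volume, # asymmetries, bumpiness
x = np.array([.5, .1, .3, .4, .8])

# Parameter values copied from the nn.Linear printout above
W_torch = np.array([[0.0661, 0.1877, 0.0044, 0.1915, -0.2948]])  # shape (out, in) = (1, 5)
b = np.array([0.1442])

# PyTorch's convention: z = x W^T + b
z_torch_style = x.dot(W_torch.T) + b

# This lesson's convention: store W as (in, out) = (5, 1) and compute x . W + b
W_numpy = W_torch.T
z_numpy_style = x.dot(W_numpy) + b

print(W_numpy.shape)                              # (5, 1)
print(np.allclose(z_torch_style, z_numpy_style))  # True
```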

> We can also build these parameters ourselves with NumPy's `np.random.randn` function. When given two arguments, it returns an array with that many rows and columns, filled with random values. So below, we call `randn(5, 1)`, as we want a single neuron that has weights for five features.

```python
import numpy as np
np.random.seed(1)
n_1_w = np.random.randn(5, 1)
n_1_w
```

```
array([[ 1.62434536],
       [-0.61175641],
       [-0.52817175],
       [-1.07296862],
       [ 0.86540763]])
```

Notice that the values above are between -2 and 2. This is because `np.random.randn` draws values from the standard normal distribution.

> If you're not familiar with this distribution, it's not critical to your understanding going forward.  The important point is that we will draw values that on average are 0, and that over 99% of the time, the numbers drawn will be between -3 and 3.
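If you'd like to see this for yourself, the sketch below draws a large sample from `np.random.randn` and checks both claims: the average is close to 0, and more than 99% of the draws land between -3 and 3 (for the standard normal, the theoretical share is about 99.7%). The sample size and the seed are arbitrary choices made only so the check is repeatable.

```python
import numpy as np

np.random.seed(0)                   # arbitrary seed, only for repeatability
samples = np.random.randn(100_000)  # 100,000 draws from the standard normal

mean = samples.mean()
share_within_3 = np.mean(np.abs(samples) <= 3)  # fraction of draws in [-3, 3]

print(round(mean, 3))   # close to 0
print(share_within_3)   # a little above 0.99
```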

Now let's initialize the bias term.  Here we just want to produce a single random value.

```python
n_1_b = np.random.randn(1)
n_1_b
```

```
array([-2.3015387])
```

So we just initialized the weights and bias term for a single neuron that takes five features with the following lines of code:

```python
import numpy as np
np.random.seed(1)
n_1_w = np.random.randn(5, 1)

n_1_b = np.random.randn(1)
```

The code `np.random.randn(5, 1)` constructed a matrix with five rows and one column. The one column is for the single neuron, and the five rows hold the five weights that correspond to the five features of an observation. `n_1_b` has only a single random value, as the linear component of a neuron has one bias term.

> $z(x) = w_1x_1 + w_2x_2 + ... + w_nx_n + b $
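As a quick check of this formula, the sketch below reuses the seeded weights and bias from the code above and computes the neuron's linear output $z(x)$ two ways: term by term, following the formula, and with a single dot product. Because `n_1_w` has shape `(5, 1)`, `x.dot(n_1_w)` returns an array of shape `(1,)`, one value for our one neuron.

```python
import numpy as np

# perimeter, radius, volume, # asymmetries, bumpiness
x = np.array([.5, .1, .3, .4, .8])

np.random.seed(1)
n_1_w = np.random.randn(5, 1)  # five weights in one column, for one neuron
n_1_b = np.random.randn(1)     # one bias term

# Term by term: w_1*x_1 + w_2*x_2 + ... + w_5*x_5 + b
z_explicit = sum(n_1_w[i, 0] * x[i] for i in range(5)) + n_1_b[0]

# Vectorized: the dot product performs the same multiply-and-sum
z_vectorized = x.dot(n_1_w) + n_1_b

print(z_vectorized.shape)                       # (1,)
print(np.isclose(z_explicit, z_vectorized[0]))  # True
```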

### Building a Random Layer

So now let's initialize a weight matrix to represent *two neurons*, each with five weights. Here it is.

```python
np.random.seed(1)
W = np.random.randn(5, 2)
W
```

```
array([[ 1.62434536, -0.61175641],
       [-0.52817175, -1.07296862],
       [ 0.86540763, -2.3015387 ],
       [ 1.74481176, -0.7612069 ],
       [ 0.3190391 , -0.24937038]])
```

So this represents two neurons, each initialized with five different weights: each column holds one neuron's weights.

And the bias is simply a vector with two entries.


```python
b = np.random.randn(2)
b
```

```
array([ 1.46210794, -2.06014071])
```
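The two steps above can be wrapped into one function. `init_layer` below is a hypothetical helper, not part of NumPy: it takes the number of features and the number of neurons and returns a weight matrix with one column per neuron and a bias vector with one entry per neuron. Seeded with 1, it reproduces the `W` and `b` shown above, since it draws the weights first and the biases second.

```python
import numpy as np

def init_layer(n_features, n_neurons, seed=None):
    """Hypothetical helper: random weights and biases for one layer.

    Returns W with shape (n_features, n_neurons), one column per neuron,
    and b with shape (n_neurons,), one bias per neuron.
    """
    if seed is not None:
        np.random.seed(seed)
    W = np.random.randn(n_features, n_neurons)
    b = np.random.randn(n_neurons)
    return W, b

W, b = init_layer(5, 2, seed=1)
print(W.shape, b.shape)  # (5, 2) (2,)
```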

### Making Predictions

Once we have initialized the weights and biases of our two neurons, we can use them to produce an output. To do so, we start with our feature vector $x$ and pass it through our linear layer, followed by our activation layer:

* $z(x) = x \cdot W + b$
* $a(z) = \frac{1}{1 + e^{-z}}$

```python
import numpy as np
# perimeter, radius, volume, # asymmetries, bumpiness
x = np.array([.5, .1, .3, .4, .8])
z = x.dot(W) + b
z
```

```
array([ 3.43424171, -3.66775645])
```

```python
def sigmoid(z):
    return 1 / (1 + np.exp(-z))

sigmoid(z)
```

```
array([0.9687577 , 0.02489796])
```

So we can see that each of our neurons makes a separate prediction: about 0.97 from the first neuron and about 0.02 from the second.
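Putting the two steps together, the sketch below defines a hypothetical `layer_forward` function that applies the linear component and then the sigmoid to every neuron at once. It recreates the seeded layer from this lesson, so its output should match the values above, and it checks that the first prediction comes only from the first column of `W` and the first bias.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def layer_forward(x, W, b):
    """Hypothetical helper: linear component, then sigmoid, for every neuron."""
    z = x.dot(W) + b   # shape (n_neurons,)
    return sigmoid(z)  # one prediction per neuron

# Recreate the seeded layer from this lesson
np.random.seed(1)
W = np.random.randn(5, 2)
b = np.random.randn(2)

# perimeter, radius, volume, # asymmetries, bumpiness
x = np.array([.5, .1, .3, .4, .8])

a = layer_forward(x, W, b)
print(a)  # approximately [0.9688 0.0249], matching the output above

# The first neuron's prediction uses only its own column of W and its own bias
a_first = sigmoid(x.dot(W[:, 0]) + b[0])
print(np.isclose(a[0], a_first))  # True
```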

### Practicing with Layers

Now let's say that we want to initialize the weights and biases for a neural network that looks like the following:

<img src="./first-layer.png" width="30%">

What the graph symbolizes above is that each observation $x$ has four features, $x_1$ through $x_4$ (the blue circles), and that we have a layer with four neurons.

> The lines drawn from each feature to each neuron indicate that every neuron receives every feature of the observation.

Now let's initialize the parameters for the layer of the neural network that we see above.

So we need a weight matrix with four rows (each neuron has four weights, one per feature) and four columns (one for each neuron).

```python
import numpy as np
W = np.random.randn(4, 4)
W
```

```
array([[-0.01791712,  0.20491942,  0.12166726,  0.85470516],
       [-0.54014973,  0.20377669, -0.64654733,  1.51752847],
       [-1.27656353, -0.42488659,  0.02028611, -0.53855715],
       [-0.00312273,  1.6536728 , -1.13278799, -0.18014837]])
```

```python
b = np.random.randn(4)
b
```

```
array([ 0.51094515, -0.4406927 ,  0.58420857,  0.43645582])
```

So notice that above we have four different neurons, each taking in the four features of an observation $x$ and making a prediction.
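To see those four predictions, the sketch below initializes a 4-feature, 4-neuron layer the same way and passes one observation through it. The feature values in `x` are hypothetical, and the seed is an arbitrary choice for repeatability, so the exact numbers will differ from the matrices printed above; what matters is that the output has one value per neuron, each between 0 and 1.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

np.random.seed(2)               # arbitrary seed, only for repeatability
W = np.random.randn(4, 4)       # 4 features (rows) x 4 neurons (columns)
b = np.random.randn(4)          # one bias per neuron

x = np.array([.2, .7, .1, .5])  # hypothetical observation with features x_1..x_4

a = sigmoid(x.dot(W) + b)
print(a.shape)  # (4,): one prediction per neuron
print(a)
```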

### Summary

In this lesson, we saw how to use NumPy to initialize a layer of a neural network. We do this with the `np.random.randn` function, passing in the dimensions of our layer. For the weight matrix $W$, the number of rows equals the length of the feature vector, and there is one column for each neuron. The bias vector $b$ has one entry per neuron.

<center>
<a href="https://www.jigsawlabs.io/free" style="position: center"><img src="https://storage.cloud.google.com/curriculum-assets/curriculum-assets.nosync/mom-files/jigsaw-labs.png" width="15%" style="text-align: center"></a>
</center>