# Building an Initial Layer

### Introduction

In the last lesson, we saw that we can represent the weights of our neurons with a weight matrix, $W$.  In that matrix, each column holds the weights of a different neuron.  So below, $W$ contains the weights of two neurons.

$W = \begin{bmatrix}
|  & |  \\
w_1  & w_2 \\
|   & |
\end{bmatrix}$

Then we used the weight matrix to calculate the output of each neuron's linear function, by multiplying a feature vector $x$ by our weight matrix and adding the bias of each neuron.

> So below is the output of the linear component of the first layer.

$z = \begin{bmatrix}
- & x &  -  
\end{bmatrix} \cdot \begin{bmatrix}
|  & |  \\
w_1  & w_2 \\
|   & |
\end{bmatrix} + \begin{bmatrix} b_1 & b_2 \end{bmatrix} = \begin{bmatrix}
x \cdot w_1 & x \cdot w_2 \end{bmatrix} + \begin{bmatrix} b_1 & b_2 \end{bmatrix} = \begin{bmatrix} z_1 & z_2 \end{bmatrix}$
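
As a quick check on the dimensions, the product above can be sketched in numpy.  The numbers here are made up for illustration (three features, two neurons), so only the shapes and the arithmetic pattern carry over.

```python
import numpy as np

# Hypothetical values, chosen only so the arithmetic is easy to follow.
x = np.array([1.0, 2.0, 3.0])      # one observation with three features
W = np.array([[0.5, -1.0],
              [0.0,  2.0],
              [1.0,  0.5]])        # column j holds the weights of neuron j
b = np.array([0.1, -0.2])          # one bias per neuron

z = x.dot(W) + b                   # linear output of each neuron
print(z)                           # [3.6 4.3]
print(z.shape)                     # (2,)
```

The first entry is $1(0.5) + 2(0) + 3(1) + 0.1 = 3.6$, and the second is $1(-1) + 2(2) + 3(0.5) - 0.2 = 4.3$.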

### Building a Random Neuron

Now so far, the weights for each of our neurons were given to us.

But as we'll see when we move into the section on training, our neural network **does not know** which weights to begin with.  Rather, it learns these weights over time.  

So for us, this means that we can just initialize our neural network with random weights.  The important thing is to get the dimensions of each neuron, and of our layer, correct.  Let's walk through this.

Let's move back to our domain of using neurons to predict if a cell is cancerous.  Let's say that for our first observation, we have a feature vector $x$, displayed below:

In [11]:
import numpy as np
# perimeter, radius, volume, # asymmetries, bumpiness    
x = np.array([2, 1, 1, 5, 4])


Now to initialize a neuron that accepts each of these features, we simply need to initialize a vector with five random weights: one for each feature.  We also need to initialize a random bias term.  We can do both with numpy.

In [13]:
import numpy as np 
n_1_w = np.random.randn(1,5)
n_1_w

array([[-2.06014071, -0.3224172 , -0.38405435,  1.13376944, -1.09989127]])

There are a couple of points to note from above.

1. `randn` takes the dimensions of the array to return as its arguments.  Here we pass two: the number of rows and the number of columns.  So above we call `randn(1, 5)` as we want one row and five columns.
2. `randn` draws values from the standard normal distribution.  This is a reasonable starting point for initializing a weight matrix.  
> If you're not familiar with this distribution, it's not critical to your understanding going forward.  The important point is that we will draw values centered around 0, and that roughly 95% of the numbers will fall between -2 and 2.
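
To make the point about the distribution concrete, here is a small sketch that draws many values with `randn` and checks where they fall.  The seed and the sample size are arbitrary choices made for this illustration.

```python
import numpy as np

np.random.seed(0)                      # arbitrary seed so the run is repeatable
samples = np.random.randn(100_000)     # a single argument returns a 1-D array

print(samples.mean())                  # close to 0
print(samples.std())                   # close to 1
print(np.mean(np.abs(samples) < 2))    # roughly 0.95
```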

Now let's initialize the bias term.

In [14]:
n_1_b = np.random.randn(1)
n_1_b

array([-0.17242821])

So if we want our neuron to take in the feature vector and make a prediction, we can do so with the following:

In [15]:
def sigmoid(value): return 1/(1 + np.exp(-value))

In [17]:
sigmoid(x.dot(n_1_w[0]) + n_1_b[0])

0.02343297785415176

Now we don't know if this prediction is any good, or even what this neuron is responsible for predicting.  But that's ok.  We'll worry about that when we discuss training.  The important thing is that we have one neuron that takes in our observation and predicts something.

And we got that by initializing a random vector with the same number of entries as the feature vector, $x$, and a bias term consisting of a single value.
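
The steps above can be collected into one small sketch.  The helper name `init_neuron` is hypothetical (it does not come from the lesson), and it stores the weights as a plain 1-D vector rather than a `(1, 5)` array to keep the example short.

```python
import numpy as np

def sigmoid(value):
    return 1 / (1 + np.exp(-value))

def init_neuron(n_features):
    """Hypothetical helper: random weights (one per feature) and a single bias."""
    weights = np.random.randn(n_features)   # 1-D vector, one entry per feature
    bias = np.random.randn(1)[0]            # a single scalar bias
    return weights, bias

x = np.array([2, 1, 1, 5, 4])               # the feature vector from above
weights, bias = init_neuron(x.shape[0])

prediction = sigmoid(x.dot(weights) + bias)
print(weights.shape)                        # (5,)
print(prediction)                           # between 0 and 1; varies from run to run
```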

### Building a Random Layer

1. Recap

So we built a single neuron with two lines of code.  One line represents the weights.

In [24]:
np.random.randn(5, 1)

array([[ 0.05080775],
       [-0.63699565],
       [ 0.19091548],
       [ 2.10025514],
       [ 0.12015895]])

And another line of code to represent the bias.

In [30]:
np.random.randn(1)

array([1.74481176])

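With the code `np.random.randn(5, 1)`, we are constructing a matrix with five rows and one column.  The one column is for our single neuron, and the five rows are for the five weights that correspond to the features of an observation.  Note that this is the transpose of the `randn(1, 5)` row we used earlier: the numbers play the same role, but storing each neuron as a column lets us add more neurons by adding more columns.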
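
As a sketch of how the row layout and the column layout relate, the snippet below builds one from the other and confirms that both give the same linear output.  The variable names `w_row` and `w_col` are introduced here only for illustration.

```python
import numpy as np

x = np.array([2, 1, 1, 5, 4])

w_row = np.random.randn(1, 5)      # one row, five columns
w_col = w_row.T                    # the same five numbers as a (5, 1) column

print(w_row.shape, w_col.shape)    # (1, 5) (5, 1)

# Both layouts give the same linear output for the neuron.
z_from_row = x.dot(w_row[0])       # a single number
z_from_col = x.dot(w_col)          # array of shape (1,): one entry per neuron
print(np.allclose(z_from_row, z_from_col[0]))   # True
```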

2. Go further

So now let's initialize a weight matrix to represent two neurons, each with five weights.  Here it is.

In [25]:
W = np.random.randn(5, 2)
W

array([[ 0.61720311,  0.30017032],
       [-0.35224985, -1.1425182 ],
       [-0.34934272, -0.20889423],
       [ 0.58662319,  0.83898341],
       [ 0.93110208,  0.28558733]])

And the bias is simply a vector with two entries.

In [21]:
b = np.random.randn(2)
b

array([0.2344157 , 1.65980218])

And we can make predictions with these neurons using the same expression as before.  Because the weights are random, the values you see will differ from run to run:

In [22]:
sigmoid(x.dot(W) + b)

array([0.00162605, 0.00313009])
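
To see that the matrix form really is just two neurons computed at once, here is a sketch that compares the layer's output against each neuron computed separately from its own column of `W`.  The random values are regenerated here, so they will not match the output shown above.

```python
import numpy as np

def sigmoid(value):
    return 1 / (1 + np.exp(-value))

x = np.array([2, 1, 1, 5, 4])
W = np.random.randn(5, 2)          # five weights (rows) for each of two neurons (columns)
b = np.random.randn(2)             # one bias per neuron

layer_output = sigmoid(x.dot(W) + b)

# Compute each neuron on its own, using its column of W and its entry of b.
neuron_outputs = np.array(
    [sigmoid(x.dot(W[:, j]) + b[j]) for j in range(W.shape[1])]
)

print(layer_output.shape)                          # (2,)
print(np.allclose(layer_output, neuron_outputs))   # True
```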

So if we want to build the weights and biases for a neural network that looks like the following:

<img src="./first-layer.png" width="30%">

That's simply a matrix with four columns (one for each neuron) and four rows (each neuron has four weights).

In [3]:
import numpy as np
W = np.random.randn(4,4)
W

array([[-0.01791712,  0.20491942,  0.12166726,  0.85470516],
       [-0.54014973,  0.20377669, -0.64654733,  1.51752847],
       [-1.27656353, -0.42488659,  0.02028611, -0.53855715],
       [-0.00312273,  1.6536728 , -1.13278799, -0.18014837]])

In [4]:
b = np.random.randn(4)
b

array([ 0.51094515, -0.4406927 ,  0.58420857,  0.43645582])

So notice that above we have four different neurons, each taking in the four features of an observation $x$, and making a prediction.
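
Putting the pieces together, a layer of any size could be initialized with a small helper like the one sketched below.  The name `init_layer` and the four feature values in `x` are assumptions made for this example, not part of the lesson.

```python
import numpy as np

def sigmoid(value):
    return 1 / (1 + np.exp(-value))

def init_layer(n_features, n_neurons):
    """Hypothetical helper: one column of weights and one bias per neuron."""
    W = np.random.randn(n_features, n_neurons)
    b = np.random.randn(n_neurons)
    return W, b

x = np.array([3, 1, 2, 4])          # made-up observation with four features
W, b = init_layer(n_features=4, n_neurons=4)

outputs = sigmoid(x.dot(W) + b)     # one prediction per neuron
print(W.shape, b.shape)             # (4, 4) (4,)
print(outputs.shape)                # (4,)
```

Keeping the number of features and the number of neurons as separate arguments makes it harder to mix up rows and columns when the two numbers differ.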

### Summary

In this lesson, we saw how we can use numpy to initialize a layer of a neural network.  We do so with the `np.random.randn` function, specifying the dimensions of our layer.  For the weight matrix $W$, the number of rows corresponds to the number of entries in the feature vector, and there is one column for each neuron.  The bias is a vector with one entry per neuron.
