# Building Layers

### Introduction

In the last lesson, we learned about initializing a layer to our neural network.  We did so by building a weight matrix where we specified the dimensions of the attributes of the layer - a column for each neuron, and a row for each feature that the neuron accepts. 

In this lesson we'll move beyond constructing single layers and learn how to build a network with multiple layers.

### Multiple Layers

Let's start by taking another look at our single layer matrix:

<img src="./first-layer.png" width="20%">

It would be a bit large to draw a picture of, but imagine that we really have a layer with 5 neurons, and that each observation has 10 features.  Remember that we want each neuron to accept each of these ten features. 

Ok, now let's initialize our weight matrix.

In [2]:
import numpy as np
np.random.seed(2)
W_1 = np.random.randn(10, 5)

b_1 = np.random.randn(5)
b_1

array([ 1.00036589, -0.38109252, -0.37566942, -0.07447076,  0.43349633])

Now this indicates that we start with an observation with four features, and each feature in the observation is passed to each neuron in the first layer.  Then each of those neurons has an output.

In [3]:
def sigmoid(value): return 1/(1 + np.exp(-value))

x = np.array([.9, .4, .5, .6, .9, .8, .7, .2, .4, .3])
sigmoid(x.dot(W_1) + b_1)

array([0.28594759, 0.91737969, 0.00451515, 0.17750781, 0.32831314])

We can see that we get five different outputs ranging from 0 to 1, one for each of our neurons. 

Now to build second layer in our neural network, we take these five outputs from our first layer and feed these ouputs into each of the neurons in our second layer.

So think about what this means.  Our second layer can have as many neurons as we want, but the number of features of each neuron must equal the outputs - that is the number of neurons - from the previous layer.

<img src="./two-layers.png" width="40%">

Remember that our first layer has five neurons, with each taking in an observation of ten features.  Each neuron outputs a single value, and together the layer outputs five values.  Those five values are the inputs to each of the three neurons in the third layer.

Let's think about how we can represent this in code.

We already have the first layer and the corresponding biases.

In [6]:
W_1.shape

(10, 5)

In [8]:
b_1.shape

(5,)

> So we confirmed our first layer has five neurons ten parameters each.

In [10]:
W_1.shape

(10, 5)

So this means that because our layer outputs a vector of size 5, the next layer must have five rows, with a column for each neuron.

In [11]:
W_2 = np.random.randn(5,3)

In [12]:
b_2 = np.random.randn(3)

We can place these two layers into a single list.

Now let's play around with this a little bit.  If we want to execute the first layer, we do the following:

In [16]:
first_output = sigmoid(x.dot(W_1) + b_1)
first_output

array([0.28594759, 0.91737969, 0.00451515, 0.17750781, 0.32831314])

And then this is fed into the second layer.

In [39]:
sigmoid(second_layer[0].dot(first_output) + second_layer[1])

array([0.63303723, 0.54528456, 0.41825229])

Great.  Now let's use a loop to clean this up.

In [21]:
def feed_forward(input_features, layer):
    W = layer[0]
    b = layer[1]
    return sigmoid(input_features.dot(W) + b)

In [23]:
input_features = x
layers = [(W_1, b_1), (W_2, b_2)]
for layer in layers:
    input_features = feed_forward(input_features, layer)
input_features

array([0.2839813 , 0.29652172, 0.377956  ])

So that is how our neural network makes a prediction.  

1. Initialize random weights and biases for each layer where the number of weights of each neuron are the columns, and there is a separate row for each neuron.

2. Feed forward each layer with the formula $\sigma(W \cdot x + b)$, where x is the vector of inputs for the first layer, and afterwards is the vector of the previous layer's outputs

### The last layer

We want our final layer to tell us the likelihood that our observation is each possibility.  So if we want our network to classify an observation as cancerous or benign, our last layer has two outputs.  If we would like our network to classify an image of one of twenty-six letters our last layer has 26 outputs (27 if we have a none of the above category).  So in the last layer, the number of neurons determines the number of outputs we predict.  So for a neuron predicting letters, we would have 26 neurons in the last layer.