# Building Layers

### Introduction

In the last lesson, we learned how to initialize a layer of our neural network.  We did so by building a weight matrix whose dimensions describe the layer: a column for each neuron, and a row for each feature that a neuron accepts.  For example, we constructed a linear layer with two neurons, each of which accepts five features, as follows:

In [3]:
import numpy as np
W = np.random.randn(5, 2)
W

array([[-1.49368401,  0.12450703],
       [-2.32335428, -0.19064423],
       [ 0.4744999 ,  1.11803926],
       [-0.54752483,  1.93996   ],
       [ 0.26949592, -0.741873  ]])

In [5]:
b = np.random.randn(2)
b

array([1.59001735, 0.09078673])
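As a quick check of this convention, the sketch below reads the layer's dimensions back off the array shapes. The names `n_features` and `n_neurons` are introduced here only for illustration.

```python
import numpy as np

# Rows = features each neuron accepts, columns = neurons in the layer.
W = np.random.randn(5, 2)
b = np.random.randn(2)

n_features, n_neurons = W.shape
print(n_features, n_neurons)    # 5 2

# One bias entry per neuron, so b lines up with the columns of W.
print(b.shape == (n_neurons,))  # True
```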

In this lesson we'll move beyond constructing single layers and learn how to build a network with multiple layers.

### Multiple Layers

Let's start by taking another look at our single layer matrix:

<img src="./first-layer.png" width="20%">

Now let's imagine that each observation in our training data has five features.  For example, our feature vector x, representing our first observation, looks like the following:

In [23]:
x = np.array([.9, .4, .5, .6, .9])

To build a layer that takes in a feature vector of this size, we need a matrix with five rows.  And let's have a layer with three neurons, giving us three columns.

In [24]:
import numpy as np
np.random.seed(2)
W_1 = np.random.randn(5, 3)

Our bias vector has three entries, one for each neuron.

In [28]:
b_1 = np.random.randn(3)
b_1

array([-0.87810789, -0.15643417,  0.25657045])

Now let's pass our first observation through the linear layer, and then the sigmoid function for the activation layer.

In [30]:
def sigmoid(value): return 1/(1 + np.exp(-value))

l1 = x.dot(W_1) + b_1
a1 = sigmoid(l1)
a1

array([0.29866564, 0.09776106, 0.33822734])

So we can see that we get three different outputs, each between 0 and 1, one for each of our neurons.
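A minimal sketch to confirm the shapes involved in this step. It rebuilds a layer of the same size with fresh random weights (the seed is arbitrary), so the values will differ from the ones printed above, but the shapes and the range of the sigmoid outputs will not.

```python
import numpy as np

def sigmoid(value):
    return 1 / (1 + np.exp(-value))

np.random.seed(0)                    # arbitrary seed, not the one used in the lesson
x = np.array([.9, .4, .5, .6, .9])   # 5 features
W_1 = np.random.randn(5, 3)          # 5 rows (features) x 3 columns (neurons)
b_1 = np.random.randn(3)             # one bias per neuron

a1 = sigmoid(x.dot(W_1) + b_1)

print(a1.shape)                      # (3,)
print(np.all((a1 > 0) & (a1 < 1)))   # True
```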

Now if we want to add a second layer to our neural network, we take the three outputs from the neurons in our first layer and feed these outputs to each neuron in our second layer.

> This is called a **fully connected** neural network.  In a fully connected neural network, every neuron in one layer connects to every neuron in the next layer.

In this case, the number of features accepted by each neuron in a layer must equal the number of outputs produced by the previous layer, which is the number of neurons in that layer.

> (Or in case of the first layer, must equal the number of features of an observation.)  
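To see why this constraint matters, here is a small sketch of what happens when it is violated. `W_bad` is a hypothetical weight matrix with only four rows, so it cannot accept our five-feature observation, and NumPy raises a `ValueError`.

```python
import numpy as np

x = np.array([.9, .4, .5, .6, .9])   # 5 features
W_bad = np.random.randn(4, 3)        # hypothetical layer with only 4 rows

try:
    x.dot(W_bad)
    mismatch_detected = False
except ValueError as err:
    # The 5 entries of x cannot be matched with 4 rows of weights.
    mismatch_detected = True
    print("shape mismatch:", err)
```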

In contrast to the features, the number of *neurons* that composes each layer is not constrained by previous layers.  

* Or to put this in terms of our weight matrices: the number of rows in a layer's weight matrix must equal the number of neurons in the previous layer, as each neuron outputs a single number, while the number of columns (neurons) is ours to choose.
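One way to sketch this rule in code is to derive every weight matrix from a list of layer sizes. The helper `init_layers` below is hypothetical (it is not part of the lesson's code); it assumes the first entry is the number of features and every later entry is the number of neurons in that layer.

```python
import numpy as np

def init_layers(sizes):
    """Hypothetical helper: sizes[0] is the number of features,
    every later entry is the number of neurons in that layer."""
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        W = np.random.randn(n_in, n_out)  # rows = inputs, columns = neurons
        b = np.random.randn(n_out)        # one bias per neuron
        layers.append((W, b))
    return layers

layers = init_layers([5, 3, 2])
for W, b in layers:
    print(W.shape, b.shape)
# (5, 3) (3,)
# (3, 2) (2,)
```

Notice that the row count of each matrix is never chosen directly: it is always the column count of the layer before it.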

### Walking through an example

Let's see this with our example above.  We start with an observation's feature vector that has five entries.

In [13]:
x

array([0.9, 0.4, 0.5, 0.6, 0.9])

This means our weight matrix must have five rows (one for each weight of the neuron).  And we previously specified that there be three columns, to represent the weights of three neurons.

In [31]:
W_1

array([[-0.41675785, -0.05626683, -2.1361961 ],
       [ 1.64027081, -1.79343559, -0.84174737],
       [ 0.50288142, -1.24528809, -1.05795222],
       [-0.90900761,  0.55145404,  2.29220801],
       [ 0.04153939, -1.11792545,  0.53905832]])

And the outputs of the neurons' linear components, followed by their activations, are:

In [36]:
x.dot(W_1)

array([ 0.02444785, -2.0659189 , -0.92777425])

In [37]:
sigmoid(x.dot(W_1) + b_1)

array([0.29866564, 0.09776106, 0.33822734])
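Since each column of the weight matrix holds one neuron's weights, we can reproduce a single neuron's output on its own. The sketch below copies `x`, `W_1` and `b_1` from the cells above and checks the first neuron against the first entry of the full layer's output.

```python
import numpy as np

def sigmoid(value):
    return 1 / (1 + np.exp(-value))

x = np.array([.9, .4, .5, .6, .9])
W_1 = np.array([[-0.41675785, -0.05626683, -2.1361961 ],
                [ 1.64027081, -1.79343559, -0.84174737],
                [ 0.50288142, -1.24528809, -1.05795222],
                [-0.90900761,  0.55145404,  2.29220801],
                [ 0.04153939, -1.11792545,  0.53905832]])
b_1 = np.array([-0.87810789, -0.15643417,  0.25657045])

# The first column holds the five weights of the first neuron.
first_linear = x.dot(W_1[:, 0])
first_activation = sigmoid(first_linear + b_1[0])

print(np.isclose(first_linear, x.dot(W_1)[0]))                     # True
print(np.isclose(first_activation, sigmoid(x.dot(W_1) + b_1)[0]))  # True
```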

Because we get an output from each of the three neurons, our next weight matrix, $W_2$, must have three rows, one row for each output of the previous layer.

> We'll initialize it to have two neurons.  Each column in the weight matrix represents a neuron.

In [41]:
W_2 = np.random.randn(3, 2)
W_2

array([[ 0.77101174, -1.86809065],
       [ 1.73118467,  1.46767801],
       [-0.33567734,  0.61134078]])

> And there is a bias term for each neuron.

In [43]:
b_2 = np.random.randn(2)
b_2

array([ 0.04797059, -0.82913529])
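As a sketch of how these pieces fit together, we can pass the first layer's outputs through this second layer and confirm that we get one number per neuron. The arrays are copied from the cells above.

```python
import numpy as np

def sigmoid(value):
    return 1 / (1 + np.exp(-value))

a1 = np.array([0.29866564, 0.09776106, 0.33822734])  # first-layer outputs
W_2 = np.array([[ 0.77101174, -1.86809065],
                [ 1.73118467,  1.46767801],
                [-0.33567734,  0.61134078]])
b_2 = np.array([ 0.04797059, -0.82913529])

# Three inputs go in, and each of the two neurons produces one output.
a2 = sigmoid(a1.dot(W_2) + b_2)
print(a2.shape)  # (2,)
```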

### Wrapping Up

We can initialize our weights and biases so that our layers are fully connected.

In [None]:
x = np.array([0.9, 0.4, 0.5, 0.6, 0.9])  # five features
W_1 = np.random.randn(5, 3)  # five inputs, three neurons
b_1 = np.random.randn(3)

W_2 = np.random.randn(3, 2)  # three inputs, two neurons
b_2 = np.random.randn(2)

And then we feed forward our neural network.

* This is called *feedforward* because information flows through the function being evaluated from x, through the intermediate computations used to define f, and finally to the output y.

In [47]:
a1 = sigmoid(x.dot(W_1) + b_1)   # outputs of the first layer
a2 = sigmoid(a1.dot(W_2) + b_2)  # outputs of the second layer
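The same two steps can be written as a loop, which makes it easier to add more layers later. This is only a sketch: `feed_forward` is a hypothetical helper, not something defined earlier in the lesson, and it assumes the layers are given in order as `(W, b)` pairs.

```python
import numpy as np

def sigmoid(value):
    return 1 / (1 + np.exp(-value))

def feed_forward(x, layers):
    """Hypothetical helper: layers is a list of (W, b) pairs, in order."""
    activation = x
    for W, b in layers:
        # The output of one layer becomes the input of the next.
        activation = sigmoid(activation.dot(W) + b)
    return activation

np.random.seed(2)  # arbitrary seed
x = np.array([0.9, 0.4, 0.5, 0.6, 0.9])
layers = [(np.random.randn(5, 3), np.random.randn(3)),
          (np.random.randn(3, 2), np.random.randn(2))]

prediction = feed_forward(x, layers)
print(prediction.shape)  # (2,)
```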

### Summary

So that is how our neural network makes a prediction.  

1. Initialize random weights and biases for each layer, where each weight matrix has one row per weight of a neuron and one column per neuron, and each bias vector has one entry per neuron.

2. Feed forward each layer with the formula $\sigma(x \cdot W + b)$, where x is the vector of inputs for the first layer, and for later layers x is the vector of the previous layer's outputs.

3. For each weight matrix W, the number of rows is the number of weights of each neuron, and must equal the number of neurons in the previous layer (or for the first layer, the number of features).

* The terminology can get a little confusing here, because we are now calling each neuron a linear component.
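To make the bookkeeping concrete, the shapes for the two-layer network in this lesson can be written out as below. The names $a_1$ and $a_2$ for the layer outputs are introduced here for illustration, and the product is written as $x \cdot W$ to match the `x.dot(W)` order used in the code.

$$
\begin{aligned}
a_1 &= \sigma(x \cdot W_1 + b_1), & x \in \mathbb{R}^{5},\; W_1 \in \mathbb{R}^{5 \times 3},\; b_1 \in \mathbb{R}^{3} &\;\Rightarrow\; a_1 \in \mathbb{R}^{3} \\
a_2 &= \sigma(a_1 \cdot W_2 + b_2), & W_2 \in \mathbb{R}^{3 \times 2},\; b_2 \in \mathbb{R}^{2} &\;\Rightarrow\; a_2 \in \mathbb{R}^{2}
\end{aligned}
$$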