# Week 5, Back-propagation

## Further reading

* Elements of Statistical Learning: Chapter 11, Neural Networks ([get a copy](https://web.stanford.edu/~hastie/ElemStatLearn/))
* Computer Age Statistical Inference: Chapter 18, Neural Networks and Deep Learning (p. 351) ([get a copy](https://web.stanford.edu/~hastie/CASI/))
* [Peter's Notes](http://peterroelants.github.io/posts/neural_network_implementation_part01/) are a bit mathy and specific, but I've found them helpful when confused

## Code

### Feed-forward basics

#### Numpy importing

To get started, we'll need to import `numpy` to deal with all the matrices involved. Each NN library you use will have its own way of handling matrices. They tend to be similar, and some even work with `numpy` arrays seamlessly.

```python
import numpy as np
```

#### Values a-flowing

To feed-forward data through a neural network is to pass it through the network's weights and activation functions in order to produce an output. The feed-forward begins with the input layer, which is simply a copy of the input data. The network's activity begins at the first hidden layer, where the input signal passes through synapses (weights) and is transformed by the neurons (activation function). Here is what happens:

Signal passing through weights: $z_1 = B_1 + X W_1$

Signal shaped by activation function: $a_1 = \sigma(z_1)$
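A minimal sketch of those two steps for a single record, assuming a hypothetical layer with 3 inputs and 2 neurons. The weight and bias values are made up so the arithmetic can be followed by hand:

```python
import numpy as np

def sigmoid(z):
    # Logistic activation: maps any real number into (0, 1)
    return 1 / (1 + np.exp(-z))

# Hypothetical layer with 3 inputs and 2 neurons; values are for illustration only
X = np.array([[1.0, 2.0, 3.0]])   # one record, shape 1 x in
W1 = np.array([[0.5, -1.0],
               [0.0,  1.0],
               [0.5,  0.0]])      # shape in x out: one column per neuron
B1 = np.array([[0.0, -1.0]])      # shape 1 x out: one bias per neuron

z1 = B1 + X.dot(W1)  # signal passing through the weights: [[2.0, 0.0]]
a1 = sigmoid(z1)     # signal shaped by the activation: about [[0.881, 0.5]]
print(z1, a1)
```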

Let's look at that more closely. The $B_1$ (biases) and the $W_1$ (weights) are the "synapses" of the "neurons": they are the connections the neurons use to pull in data. During training these connections are adjusted to help the NN perform better; back-propagation is how the network works out which way to adjust each one. A weight can be increased to amplify an incoming variable, set to zero to ignore it, or made negative to invert the inbound signal.

* Synapses strengthen or weaken incoming variables with weights
* Neurons combine incoming data and transform it with an activation function
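To make the amplify, ignore and invert cases concrete, here is a sketch of one neuron with three synapses. The input values and weights are arbitrary choices for illustration, and the bias is left out to keep the arithmetic visible:

```python
import numpy as np

x = np.array([[4.0, 7.0, 2.0]])   # one record with three variables
w = np.array([[2.0],              # amplify the first variable
              [0.0],              # ignore the second variable
              [-1.0]])            # invert the third variable

contributions = x * w.T   # each variable times its own weight: 8, 0 and -2
z = x.dot(w)              # the neuron sums them: 8 + 0 - 2 = 6
print(contributions, z)
```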

The weights are matrices with dimensions $in \times out$: $in$ is the size of the data coming in, that is, the number of values each neuron is fed during feed-forward, and $out$ is the number of neurons in the layer. Each neuron produces a single output, so the number of columns in the weight matrix also corresponds to the size of the layer's output. The bias holds one value per neuron, so it is $1 \times out$.

![Dimensions](W5_SimpleNeurons.png "Dimensions")

Above you can see that the input data has size 3 and that each neuron has three synapses. There are 2 neurons in the layer and they produce 2 outputs, one each, so the weight matrix is $3 \times 2$.
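A quick shape check for the layer in the diagram, using random placeholder values (only the shapes matter here):

```python
import numpy as np

n_in, n_out = 3, 2                  # sizes from the diagram: 3 inputs, 2 neurons
rng = np.random.default_rng(0)      # placeholder values; only the shapes matter
W = rng.random((n_in, n_out))       # in x out: each column holds one neuron's 3 synapses
B = rng.random((1, n_out))          # 1 x out: one bias per neuron
x = rng.random((1, n_in))           # one record of size 3

z = B + x.dot(W)                    # (1 x 3)(3 x 2) -> 1 x 2: one value per neuron
print(W.shape, B.shape, z.shape)    # (3, 2) (1, 2) (1, 2)
```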

The crucial part of the neural network is that $X$ is matrix-multiplied by $W_1$: a $1 \times in$ matrix times an $in \times out$ matrix gives a $1 \times out$ matrix. Because of how matrix multiplication works, output $j$ is the dot product of the whole input row with column $j$ of $W_1$, so each neuron "works" on all of the input data independently and outputs its own value.
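One way to convince yourself of this: compute the layer with a single matrix product, then again one neuron (one column of the weights) at a time, and compare. The sizes below are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.random((1, 4))     # 1 x in, with in = 4 (illustrative sizes)
W = rng.random((4, 3))     # in x out, with out = 3 neurons

whole_layer = X.dot(W)     # all neurons at once: 1 x out

# The same result, one neuron (column of W) at a time
one_by_one = np.array([[X[0].dot(W[:, j]) for j in range(W.shape[1])]])

print(np.allclose(whole_layer, one_by_one))  # True
```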

#### Neurons a-working

The same input data is processed by multiple neurons, each doing its own computation on it.

Here is a trivial but familiar example: multiplying by the identity matrix. You can see that each neuron (column) does its own thing; here neuron $j$ simply passes through input $j$. Change one of the weight matrix's elements to see the effect on the output.

```python
x = np.array([[1, 2, 3, 4, 5]])
weight = np.array([[1, 0, 0, 0, 0],
                   [0, 1, 0, 0, 0],
                   [0, 0, 1, 0, 0],
                   [0, 0, 0, 1, 0],
                   [0, 0, 0, 0, 1]])
x.dot(weight)
```

```
array([[1, 2, 3, 4, 5]])
```
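For instance, here is one possible change (picked arbitrarily): giving the fifth neuron a weight of 2 on the first input. Only that neuron's output moves, from 5 to 2 * 1 + 5 = 7, while the other neurons are unaffected:

```python
import numpy as np

x = np.array([[1, 2, 3, 4, 5]])
weight = np.eye(5, dtype=int)   # the same identity matrix as above
weight[0, 4] = 2                # fifth neuron (column 4) now also takes in twice the first input
out = x.dot(weight)
print(out)  # [[1 2 3 4 7]]
```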

The feed-forward is then a great mixing of data among layers of neurons, finally creating a network output. In this "densely connected" NN, each neuron works separately from its neighbors but sees the whole of the previous layer's output. These plentiful connections let the NN model much more complicated functions. Each layer's weighted sum is linear, but the non-linear activation function between layers is what makes stacking them worthwhile: without it, any number of layers would collapse into a single linear transformation. The power comes from the layers *interacting* through those non-linearities.

In other words, feed-forward is like a decision reached by successive committees. Each neuron of each committee contributes to a group report sent to its superior committee, which works at a "higher" level further removed from the raw data. The big cheese at the output layer summarizes everything into a single value between 0 and 1 (the output range of the sigmoid).
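Why do the activation functions matter so much? Stacking layers without them buys nothing, because a product of weight matrices is itself just one weight matrix. This sketch checks that numerically with random weights; biases are left out for brevity (with biases the stack would still collapse, into a single affine layer):

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

rng = np.random.default_rng(2)
X = rng.random((10, 5))    # ten records of 5 variables
W1 = rng.random((5, 3))    # first layer: 5 inputs -> 3 neurons
W2 = rng.random((3, 2))    # second layer: 3 inputs -> 2 neurons

# Two layers with no activation in between...
two_linear_layers = X.dot(W1).dot(W2)
# ...give exactly the same result as one layer whose weights are W1 @ W2
one_linear_layer = X.dot(W1.dot(W2))
print(np.allclose(two_linear_layers, one_linear_layer))   # True

# With a sigmoid between them, the single-matrix shortcut no longer holds
with_activation = sigmoid(X.dot(W1)).dot(W2)
print(np.allclose(with_activation, one_linear_layer))     # False
```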

#### Hidden layers feed-forwarding

With all that in mind, this is the feed-forward:

$$z_1 = B_1 + X W_1$$

$$a_1 = \sigma(z_1)$$

$$z_2 = B_2 + a_1 W_2$$

$$a_2 = \sigma(z_2)$$

$$z_{output} = B_{output} + a_2 W_{output}$$

$$a_{output} = \sigma(z_{output})$$
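In these equations $X$ can hold many records at once, one per row, while each bias is a single $1 \times out$ row. Adding the two works because NumPy broadcasts the bias across every row, so each record gets the same bias. A small sketch with hand-picked numbers so the result can be checked by eye:

```python
import numpy as np

X = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])            # three records of 2 variables
W = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])       # 2 inputs -> 3 neurons
B = np.array([[10.0, 20.0, 30.0]])    # 1 x 3: one bias per neuron

z = B + X.dot(W)    # (1 x 3) + (3 x 3): B is broadcast to every row
# Same thing with the bias row copied explicitly once per record
z_explicit = np.tile(B, (X.shape[0], 1)) + X.dot(W)
print(z)  # rows: [11, 22, 33], [14, 25, 36], [15, 27, 39]
```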

Let's generate some random data, weights and biases.

```python
# No random seed is set, so these values (and the outputs below) change on every run
X = np.random.random((10,5)) # Ten records of 5 variables
b1 = np.random.random((1,3)) # 1 x layer_1_size
w1 = np.random.random((5,3)) # input_vars x layer_1_size
b2 = np.random.random((1,2)) # 1 x layer_2_size
w2 = np.random.random((3,2)) # layer_1_size x layer_2_size
b_out = np.random.random((1,1)) # 1 x output_size
w_out = np.random.random((2,1)) # layer_2_size x output_size
```

Here then are the feed-forward results.

```python
def sigmoid(z):
    return 1/(1+np.exp(-z))

# First hidden layer, three neurons each give an output
z1 = b1 + X.dot(w1)
a1 = sigmoid(z1)
print(a1)
```

```
[[ 0.82777957  0.72575849  0.77176295]
 [ 0.92620955  0.85706331  0.8890858 ]
 [ 0.83853692  0.77898447  0.75915107]
 [ 0.91532254  0.76986905  0.90732909]
 [ 0.77760853  0.66409317  0.75770406]
 [ 0.8264613   0.74243828  0.79285232]
 [ 0.89717011  0.82897145  0.83053974]
 [ 0.91557286  0.8278428   0.85564208]
 [ 0.89115127  0.81131168  0.83602588]
 [ 0.81392367  0.69700482  0.799218  ]]
```


```python
# Second hidden layer, two neurons each give an output
z2 = b2 + a1.dot(w2)
a2 = sigmoid(z2)
print(a2)
```

```
[[ 0.84824552  0.83219524]
 [ 0.86711719  0.84638292]
 [ 0.85392645  0.83686628]
 [ 0.85859515  0.83913049]
 [ 0.839436    0.82559796]
 [ 0.85042012  0.83387462]
 [ 0.86233629  0.84292561]
 [ 0.863309    0.84342722]
 [ 0.86050841  0.8414055 ]
 [ 0.84527275  0.82972201]]
```


```python
# Output layer: one output for each input record
z_out = b_out + a2.dot(w_out)
a_out = sigmoid(z_out)
print(a_out)
```

```
[[ 0.79886702]
 [ 0.80021694]
 [ 0.79927688]
 [ 0.79960233]
 [ 0.79823471]
 [ 0.79902323]
 [ 0.79987655]
 [ 0.79994444]
 [ 0.79974492]
 [ 0.7986521 ]]
```
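The three layer-by-layer cells above can also be folded into a loop over (bias, weight) pairs. This is a sketch, not the course's code: `feed_forward` and the `layers` list are hypothetical names, and the random values are fresh ones, so the outputs will not match those printed above. Keeping every layer's activations, rather than only the last, is a deliberate choice: back-propagation needs those intermediate values.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def feed_forward(X, layers):
    """Pass X through each (bias, weight) pair in turn, applying the sigmoid after each one.

    Returns the list of every layer's activations; the last entry is the network output.
    """
    activations = []
    a = X
    for b, w in layers:
        a = sigmoid(b + a.dot(w))   # z = B + a W, then a = sigma(z)
        activations.append(a)
    return activations

# Same architecture as above: 5 inputs -> 3 neurons -> 2 neurons -> 1 output
rng = np.random.default_rng(0)
X = rng.random((10, 5))
layers = [(rng.random((1, 3)), rng.random((5, 3))),
          (rng.random((1, 2)), rng.random((3, 2))),
          (rng.random((1, 1)), rng.random((2, 1)))]

activations = feed_forward(X, layers)
print([a.shape for a in activations])   # [(10, 3), (10, 2), (10, 1)]
```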
