# Back Propagation Step by Step

The image below shows you how back propagation starts once you've performed forward propagation. 

![Back Propagation Image](http://hmkcode.github.io/images/ai/backpropagation.png)

First, why would we write our own back propagation algorithm? If we build a neural network from scratch, we have to implement back propagation ourselves, because it is the step that adjusts the weights. Without it, the network's predictions would never improve. Let's walk through how back propagation works.



In this post, we will build a neural network with three layers: one **input layer**, one **hidden layer** and one **output layer**.

We are going to consider:
*   **Input** layer with 2 neurons or nodes
*   **Output** layer with 1 neuron or node
*   Finally, **Hidden** layer with 2 neurons or nodes.

Here's an image showing the parameters of the neural network that we are going to train.
![Neural Networks image](https://i.ibb.co/D8VnkYB/nn1.png)

# Weights, weights, weights

Neural network training is about finding weights that minimize prediction error. We usually start training with a set of randomly generated weights. Then, backpropagation is used to update the weights in an attempt to correctly map arbitrary inputs to outputs.



Our initial weights will be as follows: ```w1 = 0.21```, ```w2 = 0.56```, ```w3 = 0.19```, ```w4 = 0.02```, ```w5 = 0.23``` and ```w6 = 0.16```.

![Image with Weights](https://i.ibb.co/3SRN1YZ/bp-weights.png)
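
To make the layout concrete, the six weights can be grouped into two matrices. The grouping below (w1 and w2 feeding the first hidden neuron, w3 and w4 the second) follows the arrays used in the code at the end of this post. The names `W_hidden` and `W_out`, the seed and the random range are illustrative assumptions, not part of the original walkthrough.

```python
import numpy as np

# Fixed starting weights from the article, grouped by layer.
# Column j of W_hidden holds the weights feeding hidden neuron j.
W_hidden = np.array([[0.21, 0.19],   # from input 1: w1, w3
                     [0.56, 0.02]])  # from input 2: w2, w4
W_out = np.array([[0.23],            # w5: hidden 1 -> output
                  [0.16]])           # w6: hidden 2 -> output

# In practice the starting point is usually random. The seed and the
# [0, 1) range are arbitrary choices for illustration.
rng = np.random.default_rng(seed=0)
W_hidden_random = rng.random((2, 2))
W_out_random = rng.random((2, 1))

print(W_hidden.shape, W_out.shape)  # (2, 2) (2, 1)
```

The walkthrough keeps the fixed values so that every number can be checked by hand.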


# Dataset
Our dataset has one sample with two inputs and one output.

![Input and Actual Output Representation](https://i.ibb.co/nk0kGpW/bp-dataset.png)


Our single sample is as follows: inputs = [8, 12] and output = [4].

![Data sample](https://i.ibb.co/N9tSRVg/bp-sample.png)

# Forward Pass
We will use the given weights and inputs to predict the output. Inputs are multiplied by weights; the results are then passed forward to the next layer.

![Forward Propagation](https://i.ibb.co/QK86MGr/bp-forward.png)
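
The forward pass can be reproduced with plain arithmetic. This is a minimal sketch that assumes no activation function and no bias, and that w1, w2 feed the first hidden neuron while w3, w4 feed the second; both assumptions are read off the code at the end of this post. The names `h1` and `h2` are hypothetical labels for the hidden neurons.

```python
i1, i2 = 8, 12
w1, w2, w3, w4, w5, w6 = 0.21, 0.56, 0.19, 0.02, 0.23, 0.16

# Hidden layer: each neuron is a weighted sum of the two inputs.
h1 = i1 * w1 + i2 * w2
h2 = i1 * w3 + i2 * w4

# Output layer: weighted sum of the two hidden values.
prediction = h1 * w5 + h2 * w6

print(round(h1, 4), round(h2, 4), round(prediction, 4))  # 8.4 1.76 2.2136
```

The result, 2.2136, is the first prediction quoted later in this post.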

# Calculating Error
Now it’s time to find out how our network performed by calculating the difference between the actual output and the predicted one. It’s clear that our network output, or prediction, is not even close to the actual output. We can calculate the difference, or the error, as follows.

![Error Calculation](https://i.ibb.co/yP3zZJW/bp-error.png)
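
As a quick numeric check, here is the error for the first prediction. The sketch assumes the half squared error, which is the loss the code at the end of this post computes.

```python
prediction, actual = 2.2136, 4

# Half squared error: the 1/2 factor cancels when we differentiate later.
error = 0.5 * (prediction - actual) ** 2

print(round(error, 4))  # 1.5956
```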


# Reducing Error
The main goal of training is to reduce the error, that is, the difference between the prediction and the actual output. Since the actual output is constant (it does not change), the only way to reduce the error is to change the prediction value. The question now is how to change the prediction value.

By decomposing the prediction into its basic elements, we find that the weights are the variable elements affecting the prediction value. In other words, to change the prediction value, we need to change the weight values.

![Reducing Error](https://i.ibb.co/t3TMGLd/bp-prediction-elements.png)
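
Written out, the decomposition looks like this. The symbols $i_1, i_2$ (inputs) and $h_1, h_2$ (hidden neurons) are assumed names, and the expression assumes a network with no activation function or bias, as in the code at the end of this post.

$$
\text{prediction} = h_1 w_5 + h_2 w_6 = (i_1 w_1 + i_2 w_2)\,w_5 + (i_1 w_3 + i_2 w_4)\,w_6
$$

The inputs are fixed by the dataset, so the six weights are the only quantities we are free to change.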

> **The question now is: how do we change (update) the weight values so that the error is reduced?**

> **The answer is Backpropagation!**

# Backpropagation
Backpropagation, short for “backward propagation of errors”, is a mechanism used to update the weights using gradient descent. It calculates the gradient of the error function with respect to the neural network’s weights. The calculation proceeds backwards through the network.

>Gradient descent is an iterative optimization algorithm for finding the minimum of a function; in our case we want to minimize the error function. To find a local minimum of a function using gradient descent, one takes steps proportional to the negative of the gradient of the function at the current point.

![gradient descent](https://i.ibb.co/d61ZZyt/bp-update-formula.png)
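
To see gradient descent on its own, here is a minimal sketch on a one-variable toy function. The function, the starting point and the learning rate are arbitrary choices for illustration and have nothing to do with the network yet.

```python
def gradient_descent(grad, x0, learning_rate, steps):
    """Repeatedly step against the gradient, starting from x0."""
    x = x0
    for _ in range(steps):
        x = x - learning_rate * grad(x)
    return x

# Toy function f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
minimum = gradient_descent(lambda x: 2 * (x - 3), x0=0.0,
                           learning_rate=0.1, steps=50)

print(round(minimum, 3))  # 3.0
```

Each step moves `x` a fraction of the way toward 3, the point where the gradient is zero.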

For example, to update w6, we take the current w6 and subtract the partial derivative of the error function with respect to w6. We usually multiply the derivative by a small number, called the learning rate, which controls the size of each step so that the update does not overshoot the minimum of the error function.

![Updating weights](https://i.ibb.co/DbKtXk9/bp-w6-update.png)

The derivative of the error function is evaluated by applying the chain rule as follows:
![Derivation of Back propagation](https://i.ibb.co/jkyPbzD/bp-error-function-partial-derivative-w6.png)
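
In symbols, and assuming the half squared error with a linear output (the setup used by the code at the end of this post), the chain rule for w6 can be sketched as follows. The shorthand $\Delta$ for the difference between prediction and actual output is an assumed name.

$$
E = \tfrac{1}{2}\,(\text{prediction} - \text{actual})^2, \qquad \text{prediction} = h_1 w_5 + h_2 w_6
$$

$$
\frac{\partial E}{\partial w_6} = \frac{\partial E}{\partial\,\text{prediction}} \cdot \frac{\partial\,\text{prediction}}{\partial w_6} = (\text{prediction} - \text{actual}) \cdot h_2 = \Delta\, h_2
$$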

So to update w6 we can apply the following formula: 

![Updating w6 values](https://i.ibb.co/JtT2JHY/bp-w6-update-closed-form.png)

Similarly, we can derive the update formula for w5 and any other weights existing between the output and the hidden layer.

![Updating w5 values](https://i.ibb.co/GRrYmXj/bp-w5-update-closed-form.png)
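
Plugging in the numbers from the forward pass gives the updated output-layer weights. The learning rate of 0.05 is an assumption taken from the code at the end of this post; `h1` and `h2` are the hidden values 8.4 and 1.76.

```python
learning_rate = 0.05          # assumed: the value used in the code at the end of the post
h1, h2 = 8.4, 1.76            # hidden values from the forward pass
prediction, actual = 2.2136, 4
w5, w6 = 0.23, 0.16

delta = prediction - actual   # about -1.7864

# Gradient of the error w.r.t. w5 is delta * h1, and w.r.t. w6 is delta * h2.
w5_new = w5 - learning_rate * delta * h1
w6_new = w6 - learning_rate * delta * h2

print(round(w5_new, 4), round(w6_new, 4))  # 0.9803 0.3172
```

Because the prediction was too low, `delta` is negative and both weights increase.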

However, when moving backward to update w1, w2, w3 and w4, which sit between the input and hidden layers, the partial derivative of the error function with respect to w1, for example, is as follows.

![Updating weight values from layer 0 to layer 1](https://i.ibb.co/09x50DP/bp-error-function-partial-derivative-w1.png)
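
The same derivative written as a chain of three factors, under the assumptions of a linear network with half squared error, where $h_1 = i_1 w_1 + i_2 w_2$ and $\Delta$ denotes prediction minus actual output (assumed notation):

$$
\frac{\partial E}{\partial w_1} = \frac{\partial E}{\partial\,\text{prediction}} \cdot \frac{\partial\,\text{prediction}}{\partial h_1} \cdot \frac{\partial h_1}{\partial w_1} = \Delta \cdot w_5 \cdot i_1
$$

The extra factor $w_5$ appears because w1 only reaches the output through the hidden neuron $h_1$.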

We can find the update formula for the remaining weights w2, w3 and w4 in the same way.

In summary, the update formulas for all weights are as follows:

![Updating all weights](https://i.ibb.co/HnxhszX/bp-update-all-weights.png)

We can rewrite the update formulas in matrix form as follows:

![Updating all weights in matrix](https://i.ibb.co/25b1fwb/bp-update-all-weights-matrix.png)
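
One way to gain confidence in the matrix formulas is a gradient check: compute the gradients analytically, then compare them with finite-difference estimates. This is a sketch under the same assumptions as the rest of the post (linear network, half squared error); the names `W_hidden`, `W_out`, `error` and `numeric_grad` are hypothetical.

```python
import numpy as np

x = np.array([[8.0, 12.0]])                   # inputs as a row vector, shape (1, 2)
y = 4.0                                       # actual output
W_hidden = np.array([[0.21, 0.19], [0.56, 0.02]])
W_out = np.array([[0.23], [0.16]])

def error(W_hidden, W_out):
    prediction = (x @ W_hidden @ W_out).item()
    return 0.5 * (prediction - y) ** 2

# Analytic gradients in matrix form.
hidden = x @ W_hidden                         # shape (1, 2)
delta = (hidden @ W_out).item() - y           # prediction - actual
grad_out = delta * hidden.T                   # shape (2, 1)
grad_hidden = delta * (x.T @ W_out.T)         # shape (2, 2)

def numeric_grad(f, W, eps=1e-6):
    """Central-difference estimate of df/dW, one entry at a time."""
    grad = np.zeros_like(W)
    for idx in np.ndindex(W.shape):
        W_plus, W_minus = W.copy(), W.copy()
        W_plus[idx] += eps
        W_minus[idx] -= eps
        grad[idx] = (f(W_plus) - f(W_minus)) / (2 * eps)
    return grad

num_out = numeric_grad(lambda W: error(W_hidden, W), W_out)
num_hidden = numeric_grad(lambda W: error(W, W_out), W_hidden)

print(np.allclose(grad_out, num_out, atol=1e-5))        # True
print(np.allclose(grad_hidden, num_hidden, atol=1e-5))  # True
```

Both gradients are computed from the weights as they were before the update, which is why the update for the hidden weights must use the old output weights.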

# Backward Pass
Using the derived formulas, we can find the new weights.

The learning rate is a hyperparameter: it is not learned from the data, so we have to choose its value ourselves, usually by trial and error.

![Update all weights values](https://i.ibb.co/njz0djw/bp-new-weights.png)

Now, using the new weights, we will repeat the forward pass.

![New forward propagation](https://i.ibb.co/685qtN7/bp-forward.png)
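
The full update and the second forward pass can be checked with scalar arithmetic. This sketch assumes a learning rate of 0.05 (the value used in the code at the end of this post) and the same linear network as before; `forward` is a hypothetical helper.

```python
i1, i2, actual = 8, 12, 4
learning_rate = 0.05          # assumed: taken from the code at the end of the post
w1, w2, w3, w4, w5, w6 = 0.21, 0.56, 0.19, 0.02, 0.23, 0.16

def forward(w1, w2, w3, w4, w5, w6):
    h1 = i1 * w1 + i2 * w2
    h2 = i1 * w3 + i2 * w4
    return h1, h2, h1 * w5 + h2 * w6

h1, h2, prediction = forward(w1, w2, w3, w4, w5, w6)
delta = prediction - actual

# Every update uses the old weights, so compute them all before assigning.
new_weights = (
    w1 - learning_rate * delta * w5 * i1,
    w2 - learning_rate * delta * w5 * i2,
    w3 - learning_rate * delta * w6 * i1,
    w4 - learning_rate * delta * w6 * i2,
    w5 - learning_rate * delta * h1,
    w6 - learning_rate * delta * h2,
)
print([round(w, 4) for w in new_weights])
# [0.3743, 0.8065, 0.3043, 0.1915, 0.9803, 0.3172]

_, _, new_prediction = forward(*new_weights)
print(round(new_prediction, 4))  # 13.9244
```

Computed without intermediate rounding, the new prediction comes out near 13.92, slightly below the 13.95 quoted in the text; the gap may come from rounding in the original figures.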

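Notice that the new prediction, 13.95, is much farther from the actual output than the previous prediction, 2.2136: the error grew instead of shrinking. This means the update step overshot the minimum, which typically calls for a smaller learning rate. We then repeat the backward and forward passes until the error is close or equal to zero.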



---



# Code for Neural Networks with Back Propagation

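    import numpy as np

    # One training sample: two inputs (as a column vector) and one target output.
    input_data = np.array([[8], [12]])
    output_data = np.array([4])

    # w0: input-to-hidden weights, w1: hidden-to-output weights.
    w0 = np.array([[0.21, 0.19], [0.56, 0.02]])
    w1 = np.array([[0.23], [0.16]])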
    def forward_pass(data_in, w0, w1):
        layer0 = data_in.T           # inputs as a row vector, shape (1, 2)
        layer1 = np.dot(layer0, w0)  # hidden layer (no activation function)
        layer2 = np.dot(layer1, w1)  # prediction, shape (1, 1)

        return layer0, layer1, layer2

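    def backpropogate(i, layer0, layer1, layer2, actual_y, w0, w1, learning_rate):
        # Use the arguments rather than the globals input_data / output_data.
        delta = layer2 - actual_y  # prediction minus actual output

        w1_old = w1.copy()  # the w0 update needs w1 from before this step
        w1 = w1 - (learning_rate * delta * layer1).reshape(2, 1)
        w0 = w0 - (learning_rate * delta * w1_old.T * layer0.T)

        # Half squared error of the prediction made before this update.
        loss = np.mean(np.power(layer2 - actual_y, 2)) * 0.5
        print("\n", int(i), loss)

        return w0, w1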

    epochs = 2

    for i in range(epochs):
        layer0, layer1, layer2 = forward_pass(input_data, w0, w1)
        w0, w1 = backpropogate(i, layer0, layer1, layer2, output_data, w0, w1, 0.05)

Output (epoch number and loss):

     0 1.59561248

     1 49.24728928473661

After the loop, `layer2` holds the prediction from the second forward pass:

    layer2

    array([[13.92444349]])
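
The run above shows the loss rising from about 1.6 to about 49.2, so the step with learning rate 0.05 overshoots. As a hedged illustration, the sketch below repeats the same training with a smaller learning rate of 0.01. That value is my own assumption, not something the original walkthrough uses, and `train` is a hypothetical helper.

```python
import numpy as np

x = np.array([[8.0, 12.0]])     # inputs as a row vector
y = 4.0                         # actual output

def train(learning_rate, epochs):
    """Run plain gradient descent and return the loss before each update."""
    W_hidden = np.array([[0.21, 0.19], [0.56, 0.02]])
    W_out = np.array([[0.23], [0.16]])
    losses = []
    for _ in range(epochs):
        hidden = x @ W_hidden
        delta = (hidden @ W_out).item() - y
        losses.append(0.5 * delta ** 2)
        # Compute both gradients from the current weights, then update.
        grad_out = delta * hidden.T
        grad_hidden = delta * (x.T @ W_out.T)
        W_out = W_out - learning_rate * grad_out
        W_hidden = W_hidden - learning_rate * grad_hidden
    return losses

for lr in (0.05, 0.01):
    losses = train(lr, epochs=2)
    print(lr, [round(loss, 4) for loss in losses])
# 0.05 [1.5956, 49.2473]
# 0.01 [1.5956, 0.0005]
```

With the smaller step the error shrinks after a single update, and further epochs keep driving it toward zero. The right learning rate depends on the scale of the inputs, so treat 0.01 as a starting point for experimentation rather than a recommended value.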