Fundamentally, we want to know how the cost function changes when the weights (w) in the network are updated, so that we can adjust the weights to minimize the cost function. The goal of backpropagation is to optimize the weights so that the neural network can learn how to correctly map arbitrary inputs to outputs.

#### Backpropagation

For this tutorial, we’re going to use a neural network with two inputs, two hidden neurons, and two output neurons. Additionally, the hidden and output neurons will include a bias.

Here’s the basic structure:

![NN](images/NN/NN_1.png)

In order to have some numbers to work with, here are the **initial weights**, the **biases**, and the **training inputs/outputs**:

![](images/NN/NN_1_numbers.png)


For the rest of this tutorial we’re going to work with a single training set: given inputs 0.05 and 0.10, we want the neural network to output 0.01 and 0.99.

### Feedforward pass

To fully understand **backpropagation**, we first need to compute the feedforward pass, so it is important to understand feedforward propagation. Please go through the link below before continuing.

[FeedForward Pass](https://github.com/smsrikanthreddy/deep_learning/blob/main/NN_Feedforward.ipynb)

### Backpropagation

Our goal with backpropagation is to update each of the weights in the network so that they cause the actual output to be closer to the target output, thereby minimizing the error for each output neuron and for the network as a whole.

##### Output Layer

Consider $w_5$. We want to know how much a change in $w_5$ affects the total error, aka $\frac{\partial E_{total}}{\partial w_{5}}.$


$\frac{\partial E_{total}}{\partial w_{5}}$ is known as the partial derivative of $E_{total}$ with respect to $w_5$, or the gradient with respect to $w_5$.

By applying the [chain rule](https://en.wikipedia.org/wiki/Chain_rule) we know that:
![](images/NN/bp_1.PNG)
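
For readers who cannot see the image, the chain rule for this weight can be written out as follows. This is a sketch in the notation used in the rest of this section, assuming $net_{o1}$ denotes the total net input to output neuron $o_1$ and $out_{o1}$ its activated output:

$$ \frac{\partial E_{total}}{\partial w_{5}} = \frac{\partial E_{total}}{\partial out_{o1}} \cdot \frac{\partial out_{o1}}{\partial net_{o1}} \cdot \frac{\partial net_{o1}}{\partial w_{5}} $$

Each factor answers one question: how the error reacts to the output, how the output reacts to the net input, and how the net input reacts to the weight.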

We need to figure out each piece in this equation.

First, how much does the total error change with respect to the output?

 $$ E_{total} = \frac{1}{2}(target_{o1} - out_{o1})^2 + \frac{1}{2}(target_{o2} - out_{o2})^2 $$
 $$ \frac{\partial{E_{total}}}{\partial{out_{o1}}}  = 2 * \frac{1}{2} (target_{o1} - out_{o1})^{2-1} * -1 + 0 $$
 $$ \frac{\partial{E_{total}}}{\partial{out_{o1}}}  = -(target_{o1} - out_{o1}) = -(0.01 - 0.75136507) = 0.74136507 $$
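
The arithmetic above can be double-checked with a few lines of Python. This is only a minimal sketch: the function names `squared_error` and `d_error_d_out` are made up for illustration, and only the $o_1$ term of $E_{total}$ is modelled, because the $o_2$ term does not depend on $out_{o1}$ and contributes zero to this derivative.

```python
# Values taken from the worked example above.
target_o1 = 0.01
out_o1 = 0.75136507


def squared_error(target, out):
    """The o1 term of E_total: 1/2 * (target - out)^2."""
    return 0.5 * (target - out) ** 2


def d_error_d_out(target, out):
    """Analytic derivative of the squared error with respect to the output."""
    return -(target - out)


analytic = d_error_d_out(target_o1, out_o1)

# Central finite difference as an independent sanity check.
eps = 1e-6
numeric = (
    squared_error(target_o1, out_o1 + eps)
    - squared_error(target_o1, out_o1 - eps)
) / (2 * eps)

print(f"analytic: {analytic:.8f}")
print(f"numeric:  {numeric:.8f}")
```

Both values should agree with the hand-computed 0.74136507 to within floating-point noise.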
 
Next, how much does the output of $o_1$ change with respect to its total net input?
The derivative of the logistic function is the output multiplied by 1 minus the output:

$$ out_{o1} = \frac{1}{1+e^{-net_{o1}}} $$
$$ \frac{\partial{out_{o1}}}{\partial{net_{o1}}}  = out_{o1}(1 - out_{o1}) = 0.75136507(1-0.75136507) = 0.186815602 $$
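
The "output times one minus output" rule for the logistic function can also be checked numerically. The sketch below is illustrative only: the helper name `sigmoid` is an assumption, and since the net input of $o_1$ is not stated in this section, it is recovered here by inverting the logistic function on the known output.

```python
import math


def sigmoid(x):
    """Logistic function: 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


# Output of o1 from the worked example above.
out_o1 = 0.75136507

# Analytic derivative of the logistic function, expressed via its output.
analytic = out_o1 * (1.0 - out_o1)

# Recover the net input by inverting the logistic function (the logit).
net_o1 = math.log(out_o1 / (1.0 - out_o1))

# Central finite difference of the logistic function at net_o1.
eps = 1e-6
numeric = (sigmoid(net_o1 + eps) - sigmoid(net_o1 - eps)) / (2 * eps)

print(f"analytic: {analytic:.9f}")
print(f"numeric:  {numeric:.9f}")
```

The two numbers should match each other and the hand-computed 0.186815602 up to rounding. Note that the derivative is positive.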

Finally, how much does the total net input of $o_1$ change with respect to $w_5$?


### References
-  https://mattmazur.com/2015/03/17/a-step-by-step-backpropagation-example/
-  http://neuralnetworksanddeeplearning.com/chap2.html