# ASSIGNMENT 2a




## Backpropagation Step by Step

![backpropagation](https://hmkcode.github.io/images/ai/backpropagation.png)

If you are building your own neural network, you will need to understand how to train it. Backpropagation is a commonly used technique for training neural networks. Many resources explain the technique, but this post walks through backpropagation with a concrete example in detailed, color-coded steps.


## Overview

In this post, we will build a neural network with three layers:

   * **Input** layer with two input neurons
   * One **hidden** layer with two neurons
   * **Output** layer with a single neuron
   
   ![img](https://hmkcode.github.io/images/ai/nn1.png)


## Weights, weights, weights

Neural network training is about finding weights that minimize prediction error. We usually start training with a set of randomly generated weights. Then, backpropagation is used to update the weights in an attempt to correctly map arbitrary inputs to outputs.

Our initial weights are as follows: `w1 = 0.17`, `w2 = 0.25`, `w3 = 0.22`, `w4 = 0.09`, `w5 = 0.11` and `w6 = 0.13`.

![nn](https://github.com/navpreetnp7/img/raw/master/nn1.png)

## Dataset

Our dataset has one sample with two inputs and one output.

![img1](https://hmkcode.github.io/images/ai/bp_dataset.png)

Our single sample is as follows: `inputs=[2, 3]` and `output=[1]`.

![img2](https://hmkcode.github.io/images/ai/bp_sample.png)



## Forward Pass

We will use the given weights and inputs to predict the output. Inputs are multiplied by weights; the results are then passed forward to the next layer.

![nn2](https://github.com/navpreetnp7/img/raw/master/nn2.png)

$\begin{bmatrix}2 & 3\end{bmatrix} . \begin{bmatrix}.17 & .22\\.25 & .09\end{bmatrix} = \begin{bmatrix}1.09 & .71\end{bmatrix}$

$\begin{bmatrix}1.09 & .71\end{bmatrix} . \begin{bmatrix}.11\\.13\end{bmatrix} = \begin{bmatrix}.2122\end{bmatrix}$
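
The matrix products above can also be unpacked neuron by neuron. The following is a minimal plain-Python sketch; the names `h1`, `h2` and `prediction` are labels chosen here for illustration, and the network is assumed to have no bias terms or activation functions, as in this example.

```python
# Inputs and initial weights from the example
i1, i2 = 2, 3
w1, w2, w3, w4, w5, w6 = 0.17, 0.25, 0.22, 0.09, 0.11, 0.13

# Hidden layer: each neuron is a weighted sum of the inputs
h1 = i1 * w1 + i2 * w2   # 2*0.17 + 3*0.25
h2 = i1 * w3 + i2 * w4   # 2*0.22 + 3*0.09

# Output layer: weighted sum of the hidden neurons
prediction = h1 * w5 + h2 * w6

print(round(h1, 4), round(h2, 4), round(prediction, 4))  # 1.09 0.71 0.2122
```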


## Calculating error

Now, it’s time to find out how our network performed by calculating the difference between the actual output and the predicted one. It’s clear that our network output, or **prediction**, is not even close to the **actual output**. We can calculate the difference, or the error, as follows.

![nn3](https://github.com/navpreetnp7/img/raw/master/nn3.png)

$ Error =  \frac{1}{2}(prediction - actual)^2 $

$ Error =  \frac{1}{2}(.2122 - 1)^2 = .3103$



## Reducing Error

The main goal of training is to reduce the **error**, that is, the difference between the **prediction** and the **actual output**. Since the **actual output** is constant, “not changing”, the only way to reduce the error is to change the **prediction value**. The question now is, how do we change the **prediction value**?

By decomposing the **prediction** into its basic elements, we find that the **weights** are the variable elements affecting the **prediction** value. In other words, in order to change the **prediction value**, we need to change the **weights**.

![img](https://hmkcode.github.io/images/ai/bp_prediction_elements.png)

> The question now is **how to update the weight values so that the error is reduced**?
The answer is **Backpropagation**!


## Backpropagation

**Backpropagation**, short for “backward propagation of errors”, is a mechanism used to update the **weights** using gradient descent. It calculates the gradient of the error function with respect to the neural network’s weights. The calculation proceeds backwards through the network.

  > **Gradient descent** is an iterative optimization algorithm for finding the minimum of a function; in our case, we want to minimize the error function. To find a local minimum of a function using gradient descent, one takes steps proportional to the negative of the gradient of the function at the current point.

![img](https://hmkcode.github.io/images/ai/bp_update_formula.png)
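
To make the update rule concrete, here is a minimal sketch of gradient descent on a one-variable function. The function `f(w) = (w - 3)**2`, the starting point, and the learning rate are arbitrary choices made for illustration; they are not part of the network example.

```python
def grad_f(w):
    """Derivative of the illustrative function f(w) = (w - 3)**2."""
    return 2 * (w - 3)

w = 0.0              # arbitrary starting point (assumption)
learning_rate = 0.1  # arbitrary step size (assumption)

for step in range(100):
    # Step against the gradient: w <- w - learning_rate * f'(w)
    w = w - learning_rate * grad_f(w)

print(round(w, 4))  # approaches the minimum at w = 3
```

Each iteration moves `w` a fraction of the way toward the minimum; a learning rate that is too large would make the steps overshoot instead.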

For example, to update `w6`, we take the current `w6` and subtract the partial derivative of the **error** function with respect to `w6`. We usually multiply the derivative by a small number, called the **learning rate**, which controls the size of each step so that the updated **weight** moves toward a lower error without overshooting the minimum.

![img](https://hmkcode.github.io/images/ai/bp_w6_update.png)

The derivative of the error function is evaluated by applying the chain rule as follows:

![img1](https://hmkcode.github.io/images/ai/bp_error_function_partial_derivative_w6.png)
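
Written out in text form for this example, the chain rule for `w6` expands as shown here. The symbols $h_1$ and $h_2$ for the hidden-layer values are names assumed for this sketch; the figures may label them differently.

$ prediction = h_1 w_5 + h_2 w_6 $

$ \frac{\partial Error}{\partial w_6} = \frac{\partial Error}{\partial prediction} \cdot \frac{\partial prediction}{\partial w_6} = (prediction - actual) \cdot h_2 $

With the forward-pass values of this example, $ (0.2122 - 1) \cdot 0.71 \approx -0.5593 $.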

So to update `w6` we can apply the following formula

![img2](https://hmkcode.github.io/images/ai/bp_w6_update_closed_form.png)

Similarly, we can derive the update formula for `w5` and any other weights existing between the output and the hidden layer.

![img3](https://hmkcode.github.io/images/ai/bp_w5_update_closed_form.png)

However, when moving backward to update `w1`, `w2`, `w3` and `w4`, which lie between the input and hidden layers, the partial derivative of the error function with respect to `w1`, for example, is as follows:

![img4](https://hmkcode.github.io/images/ai/bp_error_function_partial_derivative_w1.png)
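
In text form, the chain rule for `w1` picks up one more factor because `w1` reaches the prediction only through a hidden neuron. The symbols $h_1$, $i_1$ and $i_2$ (hidden value and inputs) are names assumed for this sketch.

$ h_1 = i_1 w_1 + i_2 w_2 $

$ \frac{\partial Error}{\partial w_1} = \frac{\partial Error}{\partial prediction} \cdot \frac{\partial prediction}{\partial h_1} \cdot \frac{\partial h_1}{\partial w_1} = (prediction - actual) \cdot w_5 \cdot i_1 $

With the values of this example, $ (0.2122 - 1) \cdot 0.11 \cdot 2 \approx -0.1733 $.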

We can find the update formula for the remaining weights `w2`, `w3` and `w4` in the same way.

In summary, the update formulas for all weights are as follows:

![img5](https://hmkcode.github.io/images/ai/bp_update_all_weights.png)

We can rewrite the update formulas in matrix form as follows:

![img6](https://hmkcode.github.io/images/ai/bp_update_all_weights_matrix.png)
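
One way to sanity-check derived gradient formulas is to compare them against finite-difference estimates. The sketch below does this for the example network; the helper name `error`, the step size `eps`, and the matrix layout (`W1` holding `w1..w4`, `W2` holding `w5` and `w6`) are assumptions made for illustration.

```python
import numpy as np

x = np.array([2.0, 3.0])                     # inputs
W1 = np.array([[0.17, 0.22], [0.25, 0.09]])  # [[w1, w3], [w2, w4]]
W2 = np.array([0.11, 0.13])                  # [w5, w6]
actual = 1.0

def error(W1, W2):
    """Squared error of the linear two-layer network (no biases, no activations)."""
    prediction = x @ W1 @ W2
    return 0.5 * (prediction - actual) ** 2

# Analytic gradients from the chain rule
h = x @ W1
delta = h @ W2 - actual
grad_W2 = delta * h                 # dE/dw5, dE/dw6
grad_W1 = delta * np.outer(x, W2)   # dE/dw1 .. dE/dw4

# Central finite differences for comparison
eps = 1e-6
num_W1 = np.zeros_like(W1)
for r in range(2):
    for c in range(2):
        step = np.zeros_like(W1)
        step[r, c] = eps
        num_W1[r, c] = (error(W1 + step, W2) - error(W1 - step, W2)) / (2 * eps)

num_W2 = np.zeros_like(W2)
for k in range(2):
    step = np.zeros_like(W2)
    step[k] = eps
    num_W2[k] = (error(W1, W2 + step) - error(W1, W2 - step)) / (2 * eps)

print(np.allclose(grad_W1, num_W1, atol=1e-6))
print(np.allclose(grad_W2, num_W2, atol=1e-6))
```

If both checks print `True`, the analytic gradients agree with the numerical estimates for this sample.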





## Backward Pass

Using the derived formulas, we can find the new **weights**.

 > **Learning rate** is a hyperparameter, which means that we have to choose its value manually.
    
 **Delta** $= prediction - actual$, so $\Delta = 0.2122 - 1 = -0.7878$
 
 **Learning rate**, chosen by hand: $a = 0.05$
 
 $\begin{bmatrix}w_5\\w_6 \end{bmatrix} = \begin{bmatrix}.11 \\.13\end{bmatrix} - 0.05(-0.7878) \begin{bmatrix}1.09 \\ .71\end{bmatrix} = \begin{bmatrix}.11\\.13\end{bmatrix} - \begin{bmatrix}-0.0429\\-0.0279\end{bmatrix} = \begin{bmatrix}.1529\\.1579\end{bmatrix}$
 
 
 $\begin{bmatrix}w_1 & w_3\\w_2 & w_4 \end{bmatrix} = \begin{bmatrix}.17 & .22\\.25 & .09\end{bmatrix} - 0.05(-0.7878) \begin{bmatrix}2 \\ 3\end{bmatrix}.\begin{bmatrix}.11 & .13\end{bmatrix} = \begin{bmatrix}.17 & .22\\.25 & .09\end{bmatrix} - \begin{bmatrix}-0.0086 & -0.0102\\-0.0129 & -0.0153\end{bmatrix} = \begin{bmatrix}.1786 & .2302\\.2629 & .1053\end{bmatrix}$
 
    
Now, using the new **weights**, we repeat the forward pass.

![img](https://github.com/navpreetnp7/img/raw/master/nn5.png)

$\begin{bmatrix}2 & 3\end{bmatrix} . \begin{bmatrix}.1786 & .2302\\.2629 & .1053\end{bmatrix} = \begin{bmatrix}1.1463 & 0.7765\end{bmatrix}$

$\begin{bmatrix}1.1463 & 0.7765\end{bmatrix} . \begin{bmatrix}.1529\\.1579\end{bmatrix} = \begin{bmatrix}0.2979\end{bmatrix}$

The new **prediction**, $0.2979$, is a little closer to the **actual output** than the previous one, $0.2122$. We can repeat the same process of forward and backward passes until the **error** is close or equal to zero.
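
Repeating the forward and backward passes can be expressed as a loop. The following is a sketch using NumPy; the iteration cap of 100 and the stopping tolerance `1e-6` are arbitrary assumptions, and the variable names are chosen here for illustration.

```python
import numpy as np

x = np.array([2.0, 3.0])                     # inputs
W1 = np.array([[0.17, 0.22], [0.25, 0.09]])  # [[w1, w3], [w2, w4]]
W2 = np.array([0.11, 0.13])                  # [w5, w6]
actual = 1.0
learning_rate = 0.05

for step in range(100):
    # Forward pass
    h = x @ W1
    prediction = h @ W2
    delta = prediction - actual
    error = 0.5 * delta ** 2
    if error < 1e-6:
        break

    # Backward pass: compute both gradients before changing any weight
    grad_W2 = delta * h
    grad_W1 = delta * np.outer(x, W2)
    W2 = W2 - learning_rate * grad_W2
    W1 = W1 - learning_rate * grad_W1

print(step, prediction, error)
```

Both gradients are computed from the weights of the current iteration before either matrix is updated, which matches the hand calculation above where the `w1..w4` update uses the old `w5` and `w6`.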





# ASSIGNMENT 2b

```python
import numpy as np

# Inputs, initial weights, and target output from Assignment 2a
i1 = 2
i2 = 3
w1 = 0.17
w2 = 0.25
w3 = 0.22
w4 = 0.09
w5 = 0.11
w6 = 0.13
out = 1
```



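```python
i = np.array([i1,i2])              # inputs as a 1-D vector
I = np.array([[i1],[i2]])          # inputs as a column vector
W1 = np.array([[w1,w3],[w2,w4]])   # input-to-hidden weights
W2 = np.array([[w5],[w6]])         # hidden-to-output weights (column vector)
W21 = np.array([w5,w6])            # the same weights as a 1-D vector
print('W1 = ',end='')
print(W1)
print('W2 = ',end='')
print(W2)
```

```
W1 = [[0.17 0.22]
 [0.25 0.09]]
W2 = [[0.11]
 [0.13]]
```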


```python
# Forward pass, step 1: hidden layer values
h = np.dot(i,W1)
print('Hidden layer h = ',end='')
print(h)
```

```
Hidden layer h = [1.09 0.71]
```


```python
# Forward pass, step 2: network prediction
z1 = np.dot(h,W2)
print('Prediction = ',end='')
print(z1)
```

```
Prediction = [0.2122]
```


```python
# Squared error between prediction and actual output
error = 1/2*((z1-out)**2)
print('Error = ',end='')
print(error)
```

```
Error = [0.31031442]
```


```python
delta = z1-out          # prediction - actual
a = 0.05                # learning rate
H = h.reshape(-1, 1)    # hidden layer values as a column vector
print('Delta = ',end='')
print(delta)
print('Learning rate = ',end='')
print(a)
```

```
Delta = [-0.7878]
Learning rate = 0.05
```


```python
# Update w5 and w6: subtract learning rate * delta * hidden values
W22 = W2 - a*delta*H
print('Gradient descent for weights w5 and w6 = ',end='')
print(-1*a*delta*H)
print('New weights = ',end='')
print(W22)
```

```
Gradient descent for weights w5 and w6 = [[0.0429351]
 [0.0279669]]
New weights = [[0.1529351]
 [0.1579669]]
```


```python
# Update w1..w4: I (2x1) * W21 (2,) broadcasts to the 2x2 outer product
# of the inputs with the old w5 and w6
W11 = W1 - a*delta*I*W21
print('Gradient descent for weights w1,w2,w3 and w4 = ',end='')
print(-1*a*delta*I*W21)
print('New weights = ',end='')
print(W11)
```

```
Gradient descent for weights w1,w2,w3 and w4 = [[0.0086658 0.0102414]
 [0.0129987 0.0153621]]
New weights = [[0.1786658 0.2302414]
 [0.2629987 0.1053621]]
```


```python
# Repeat the forward pass with the updated weights
z2 = (i.dot(W11)).dot(W22)
print('New predicted output = ',end='')
print(z2)
```

```
New predicted output = [0.29798595]
```
