# Lecture 4: Back-propagation

How do we compute the gradient in a complicated network?

A better way: use a computational graph together with the chain rule.

## Chain Rule

$$ \frac{df}{dx} =  \frac{df}{dg} \frac{dg}{dx} $$
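Here is a quick numerical check of the chain rule as a minimal Python sketch. The composite function $f(g(x)) = \sin(x^2)$ is an illustrative choice and does not come from the lecture. The product of the two local derivatives, $\frac{df}{dg} = \cos(g)$ and $\frac{dg}{dx} = 2x$, should agree with a central finite-difference estimate.

```python
import math

# Illustrative composite function f(g(x)) with g(x) = x**2 and f(g) = sin(g)
def g(x):
    return x ** 2

def f(u):
    return math.sin(u)

def chain_rule_derivative(x):
    # df/dg evaluated at g(x), multiplied by dg/dx
    df_dg = math.cos(g(x))
    dg_dx = 2 * x
    return df_dg * dg_dx

def finite_difference(x, h=1e-6):
    # Central-difference approximation of d f(g(x)) / dx
    return (f(g(x + h)) - f(g(x - h))) / (2 * h)

x = 1.5
print(chain_rule_derivative(x), finite_difference(x))
```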


## Backward propagation

Given a function $z = f(x, y)$, the local gradients

$$ \frac{\partial z}{\partial x} $$
$$ \frac{\partial z}{\partial y} $$

can be calculated from the function itself.
Therefore, once the upstream gradient $\frac{\partial L}{\partial z}$ is given (passed back from later in the graph), the chain rule gives

$$ \frac{\partial L}{\partial x} = \frac{\partial L}{\partial z} * \frac{\partial z}{\partial x} $$
$$ \frac{\partial L}{\partial y} = \frac{\partial L}{\partial z} * \frac{\partial z}{\partial y} $$

Applying this step at every node, from the loss back to the inputs, is called backward propagation.
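The same step can be sketched for a single node of a computational graph. The `MultiplyGate` class below is hypothetical (it is not a PyTorch API) and uses $z = xy$ as the example function: `forward` caches its inputs, and `backward` takes the upstream gradient $\frac{\partial L}{\partial z}$ and returns $\frac{\partial L}{\partial x}$ and $\frac{\partial L}{\partial y}$ using the local gradients $\frac{\partial z}{\partial x} = y$ and $\frac{\partial z}{\partial y} = x$.

```python
class MultiplyGate:
    """Hypothetical graph node computing z = x * y."""

    def forward(self, x, y):
        # Cache the inputs: they are needed for the local gradients
        self.x = x
        self.y = y
        return x * y

    def backward(self, dL_dz):
        # Chain rule: upstream gradient times local gradient
        dL_dx = dL_dz * self.y  # dz/dx = y
        dL_dy = dL_dz * self.x  # dz/dy = x
        return dL_dx, dL_dy

gate = MultiplyGate()
z = gate.forward(3.0, -4.0)
dL_dx, dL_dy = gate.backward(2.0)  # assume the upstream gradient dL/dz is 2.0
print(z, dL_dx, dL_dy)  # → -12.0 -8.0 6.0
```

Caching the inputs during the forward pass is what makes the backward pass cheap: each node only needs its own inputs and the gradient handed back to it.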


## Example

$$ f: \hat{y} = x * w $$
$$ loss = (\hat{y} - y)^2 = (x*w -y)^2 $$

Let 
$$ \hat{y} - y = s $$
$$ loss = s^2 $$

Backward Propagation

$$ \frac{\partial loss}{\partial s} = \frac{\partial s^2}{\partial s} =2s $$
$$ \frac{\partial s}{\partial \hat{y}} = \frac{\partial (\hat{y}-y)}{\partial \hat{y}} =1 $$
$$ \frac{\partial \hat{y}}{\partial w}= \frac{\partial xw}{\partial w} = x$$
$$ \therefore \frac{\partial loss}{\partial w} = \frac{\partial loss}{\partial s} \frac{\partial s}{\partial \hat{y}} \frac{\partial \hat{y}}{\partial w} = 2sx $$
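The derivation above can be written as a minimal plain-Python sketch: run the forward pass while keeping the intermediate values, then multiply the local gradients from the loss back to $w$. The function name `forward_backward` is illustrative.

```python
def forward_backward(x, y, w):
    """Forward and backward pass for loss = (x*w - y)**2, one node at a time."""
    # Forward pass: compute and keep the intermediate values
    y_hat = x * w
    s = y_hat - y
    loss = s ** 2

    # Backward pass: local gradients, multiplied from the loss back to w
    dloss_ds = 2 * s   # d(s^2)/ds
    ds_dyhat = 1       # d(y_hat - y)/d(y_hat)
    dyhat_dw = x       # d(x*w)/dw
    dloss_dw = dloss_ds * ds_dyhat * dyhat_dw
    return loss, dloss_dw

loss, grad = forward_backward(x=3, y=6, w=1)
print(loss, grad)  # → 9 -18
```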

For example, when $x=2$, $y=4$, $w=1$: $\hat{y} = 2$ and $s = 2 - 4 = -2$, so

$$ \frac{\partial loss}{\partial w} = 2 \cdot (-2) \cdot 2 = -8 $$
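For comparison with the autograd code in the PyTorch section, here is a minimal plain-Python sketch that plugs the hand-derived gradient $2sx$ into the same training loop (same data, learning rate 0.01, initial $w = 1.0$). The helper names `gradient` and `train` are illustrative. Its first three per-sample gradients should match the first `grad:` lines printed by the PyTorch version (-2, -7.84, -16.2288).

```python
# Same data and hyperparameters as the PyTorch section (lr = 0.01, initial w = 1.0)
x_data = [1.0, 2.0, 3.0]
y_data = [2.0, 4.0, 6.0]

def gradient(x, y, w):
    # dloss/dw = 2 * s * x with s = x*w - y, as derived above
    return 2 * (x * w - y) * x

def train(w=1.0, lr=0.01, epochs=10):
    grads = []  # record every per-sample gradient for inspection
    for epoch in range(epochs):
        for x_val, y_val in zip(x_data, y_data):
            grad = gradient(x_val, y_val, w)
            grads.append(grad)
            w = w - lr * grad  # gradient-descent update
    return w, grads

w, grads = train()
print(grads[:3])
print("predict (after training)", 4, 4 * w)
```

Here the gradient formula is written by hand; autograd derives the same quantity from the computational graph automatically.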


## PyTorch: Variable

```python
import torch
# Since PyTorch 0.4, Variable is merged into Tensor; it still works but simply returns a Tensor
from torch.autograd import Variable

x_data = [1.0, 2.0, 3.0]
y_data = [2.0, 4.0, 6.0]

w = Variable(torch.Tensor([1.0]), requires_grad=True)  # any random initial value

# Our model: forward pass
def forward(x):
    return x * w

# Loss function: squared error
def loss(x, y):
    y_pred = forward(x)
    return (y_pred - y) * (y_pred - y)

# Before training
print("predict (before training)", 4, forward(4).data[0])

# Training loop
for epoch in range(10):
    for x_val, y_val in zip(x_data, y_data):
        l = loss(x_val, y_val)
        l.backward()  # backpropagate: computes dl/dw and stores it in w.grad
        print("\tgrad: ", x_val, y_val, w.grad.data[0])
        w.data = w.data - 0.01 * w.grad.data  # gradient-descent step, learning rate 0.01

        # Manually zero the gradients after updating weights
        w.grad.data.zero_()
    print("progress:", epoch, l.data[0])

# After training
print("predict (after training)", 4, forward(4).data[0])
```

predict (before training) 4 tensor(4.)
	grad:  1.0 2.0 tensor(-2.)
	grad:  2.0 4.0 tensor(-7.8400)
	grad:  3.0 6.0 tensor(-16.2288)
progress: 0 tensor(7.3159)
	grad:  1.0 2.0 tensor(-1.4786)
	grad:  2.0 4.0 tensor(-5.7962)
	grad:  3.0 6.0 tensor(-11.9981)
progress: 1 tensor(3.9988)
	grad:  1.0 2.0 tensor(-1.0932)
	grad:  2.0 4.0 tensor(-4.2852)
	grad:  3.0 6.0 tensor(-8.8704)
progress: 2 tensor(2.1857)
	grad:  1.0 2.0 tensor(-0.8082)
	grad:  2.0 4.0 tensor(-3.1681)
	grad:  3.0 6.0 tensor(-6.5580)
progress: 3 tensor(1.1946)
	grad:  1.0 2.0 tensor(-0.5975)
	grad:  2.0 4.0 tensor(-2.3422)
	grad:  3.0 6.0 tensor(-4.8484)
progress: 4 tensor(0.6530)
	grad:  1.0 2.0 tensor(-0.4417)
	grad:  2.0 4.0 tensor(-1.7316)
	grad:  3.0 6.0 tensor(-3.5845)
progress: 5 tensor(0.3569)
	grad:  1.0 2.0 tensor(-0.3266)
	grad:  2.0 4.0 tensor(-1.2802)
	grad:  3.0 6.0 tensor(-2.6500)
progress: 6 tensor(0.1951)
	grad:  1.0 2.0 tensor(-0.2414)
	grad:  2.0 4.0 tensor(-0.9465)
	grad:  3.0 6.0 tensor(-1.9592)
progres