# 4. Back-propagation and Autograd

### Computing the gradient in a simple network

x ==> (Linear) ==> y

We want the gradient of the loss with respect to $w$:

$$ {\partial loss \over \partial w} = ? $$

For this simple network the gradient can be computed manually as follows ...

### Complicated network?

### Better way => Computation graph + Chain Rule

### Chain Rule

$$ g = g(x) \implies  f = f(g) $$

$$ {\partial f \over \partial x} = {\partial f \over \partial g}{\partial g \over \partial x} $$
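
The chain rule can be checked numerically. The sketch below is only an illustration: the functions $g(x) = x^2$ and $f(g) = g^3$ and the helper names are assumptions chosen for this example, not part of the lecture. It multiplies the two local derivatives and compares the product with a central finite-difference estimate.

```python
def g(x):
    return x ** 2            # inner function (hypothetical example)

def f(g_val):
    return g_val ** 3        # outer function (hypothetical example)

def chain_rule_grad(x):
    df_dg = 3 * g(x) ** 2    # local derivative of f with respect to g
    dg_dx = 2 * x            # local derivative of g with respect to x
    return df_dg * dg_dx     # chain rule: df/dx = df/dg * dg/dx

def numerical_grad(x, eps=1e-6):
    # central finite difference of the composed function f(g(x))
    return (f(g(x + eps)) - f(g(x - eps))) / (2 * eps)

x = 2.0
analytic = chain_rule_grad(x)
numeric = numerical_grad(x)
print(analytic, numeric)
```

Since $f(g(x)) = x^6$ here, both values should agree with $6x^5$ up to the finite-difference error.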

### Computation Graph

$$ J(a,b,c) = 3 * (a + b * c) $$

$$ u = b*c  \implies   v = a+u  \implies  J=3*v $$
Let $a=5$, $b=3$, $c=2$, then
$$ u = 6 \implies v = 11 \implies J = 33 $$


$$ {\partial J \over \partial v} = 3 $$

$$ {\partial v \over \partial a} = 1 $$

$$ {\partial J \over \partial a} = {\partial J \over \partial v} {\partial v \over \partial a} = 3 * 1 = 3$$

When $a = 5 \implies 5.001$ <br>
then $ v = a+ u = 11 \implies 11.001$ <br>
then $ J=3*v=33 \implies 33.003$


$$ {\partial J \over \partial v} = 3 $$

$$ {\partial v \over \partial u} = 1 $$

$$ {\partial u \over \partial b} = 2 $$

$$ {\partial J \over \partial b} = {\partial J \over \partial v} {\partial v \over \partial u}{\partial u \over \partial b} = 3 * 1 * 2 = 6$$

When $b = 3 \implies 3.001$ <br>
then $u = b*c = 6 \implies 6.002$ <br>
then $ v = a+u = 11 \implies 11.002$ <br>
then $ J=3*v=33 \implies 33.006$
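
The forward and backward passes over this graph can be written out by hand. The following is a minimal sketch in plain Python using the values $a=5$, $b=3$, $c=2$ and $J = 3*v$ from the steps above; the variable names (`dJ_dv`, `J_fn`, and so on) are chosen for this illustration, and the gradient with respect to $c$ is included as an extra step that the text does not work out.

```python
a, b, c = 5.0, 3.0, 2.0

# Forward pass: evaluate each node of the graph in order.
u = b * c        # u = 6
v = a + u        # v = 11
J = 3 * v        # J = 33

# Backward pass: multiply local derivatives from the output back to the inputs.
dJ_dv = 3.0              # J = 3 * v
dJ_da = dJ_dv * 1.0      # dv/da = 1
dJ_du = dJ_dv * 1.0      # dv/du = 1
dJ_db = dJ_du * c        # du/db = c
dJ_dc = dJ_du * b        # du/dc = b (not derived in the text)

def J_fn(a, b, c):
    return 3 * (a + b * c)

# Nudge one input by 0.001 and watch the output move by about gradient * 0.001.
eps = 0.001
nudged_a = J_fn(a + eps, b, c)   # approximately 33.003
nudged_b = J_fn(a, b + eps, c)   # approximately 33.006

print(dJ_da, dJ_db, dJ_dc)
print(nudged_a, nudged_b)
```

Each backward step reuses the gradient already computed for the node downstream of it, which is why a single backward sweep is enough to obtain the gradient for every input.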


### Computation Graph for linear loss function


$$ \text{loss} = (\hat{y} - y) ^2 = (w*x -y)^2  $$

$$ {\partial \text{loss} \over \partial w} = 2x(w*x - y)  $$

$$ \hat{y} = w * x \implies s = \hat{y} -y \implies  loss = s^2 $$

Let $x = 1$, $y = 2$, $w = 1$, so that $\hat{y} = 1$, $s = -1$ and $loss = 1$. <br>

$$ {\partial loss \over \partial s } = 2s = -2 $$

$$ {\partial loss \over \partial \hat{y}} = {\partial loss \over \partial s }{\partial s \over \partial \hat{y} } = -2 * 1 = -2 $$

$$ {\partial loss \over \partial w} = {\partial loss \over \partial \hat{y} }{\partial \hat{y} \over \partial w} = -2 *x = -2 $$
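
The same backward pass can be traced in a few lines of plain Python. This is a sketch under the assumption that $x = 1$, $y = 2$, $w = 1$, which are the values that give $loss = 1$; the variable names are chosen for this illustration. It also compares the result with the closed-form gradient $2x(w*x - y)$ given above.

```python
x, y, w = 1.0, 2.0, 1.0   # assumed values; they yield loss = 1

# Forward pass through the graph
y_hat = w * x             # prediction
s = y_hat - y             # residual
loss = s ** 2             # squared error

# Backward pass: one local derivative per node
dloss_ds = 2 * s          # d(s^2)/ds
ds_dyhat = 1.0            # d(y_hat - y)/dy_hat
dyhat_dw = x              # d(w * x)/dw
dloss_dw = dloss_ds * ds_dyhat * dyhat_dw

# Closed-form gradient for comparison
closed_form = 2 * x * (w * x - y)

print(loss, dloss_dw, closed_form)
```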

### Exercise 4.2

For $$ \hat{y} = w * x + b $$ 
and
$$ loss = ( \hat{y} - y ) ^2 $$
Let $ x = 1, y=2, w = 1, b=2 $, and use the computation graph to compute the gradients of the loss with respect to $w$ and $b$.

## Data and Variable

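import torch

x_data = [1.0, 2.0, 3.0]  # inputs
y_data = [2.0, 4.0, 6.0]  # targets, y = 2 * x

w = torch.tensor([1.0], requires_grad=True)  # trainable weight tracked by autograd (replaces the deprecated Variable wrapper)

def forward(x):
    """Linear model without bias: y_pred = x * w."""
    return x * w

def loss(x, y):
    """Squared error for a single sample."""
    y_pred = forward(x)
    return (y_pred - y) * (y_pred - y)

for epoch in range(40):
    for x, y in zip(x_data, y_data):
        l = loss(x, y)  # forward pass builds the computation graph
        l.backward()    # backward pass stores d(loss)/dw in w.grad
        if epoch % 5 == 0:
            print("grad: ", x, y, w.grad.data[0])
        w.data = w.data - 0.01 * w.grad.data  # SGD step on .data, so the update is not tracked
        w.grad.data.zero_()                   # gradients accumulate across backward() calls, so reset them
    if epoch % 5 == 0:
        print("process: ", epoch, l.data[0])

print("predict (after training) ", 4, forward(4).data[0])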

grad:  1.0 2.0 tensor(-2.)
grad:  2.0 4.0 tensor(-7.8400)
grad:  3.0 6.0 tensor(-16.2288)
process:  0 tensor(7.3159)
grad:  1.0 2.0 tensor(-0.4417)
grad:  2.0 4.0 tensor(-1.7316)
grad:  3.0 6.0 tensor(-3.5845)
process:  5 tensor(0.3569)
grad:  1.0 2.0 tensor(-0.0976)
grad:  2.0 4.0 tensor(-0.3825)
grad:  3.0 6.0 tensor(-0.7917)
process:  10 tensor(0.0174)
grad:  1.0 2.0 tensor(-0.0215)
grad:  2.0 4.0 tensor(-0.0845)
grad:  3.0 6.0 tensor(-0.1749)
process:  15 tensor(0.0008)
grad:  1.0 2.0 tensor(-0.0048)
grad:  2.0 4.0 tensor(-0.0187)
grad:  3.0 6.0 tensor(-0.0386)
process:  20 tensor(0.0000)
grad:  1.0 2.0 tensor(-0.0011)
grad:  2.0 4.0 tensor(-0.0041)
grad:  3.0 6.0 tensor(-0.0085)
process:  25 tensor(2.0219e-06)
grad:  1.0 2.0 tensor(-0.0002)
grad:  2.0 4.0 tensor(-0.0009)
grad:  3.0 6.0 tensor(-0.0019)
process:  30 tensor(9.8744e-08)
grad:  1.0 2.0 tensor(-0.0001)
grad:  2.0 4.0 tensor(-0.0002)
grad:  3.0 6.0 tensor(-0.0004)
process:  35 tensor(4.8467e-09)
predict (after training) 
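
The gradients that autograd prints can be cross-checked without PyTorch. The sketch below repeats the same per-sample update in plain Python, using the closed-form gradient $2x(w*x - y)$ from the section above instead of `backward()`. The names `analytic_grad` and `first_epoch_grads` are assumptions made for this illustration.

```python
x_data = [1.0, 2.0, 3.0]
y_data = [2.0, 4.0, 6.0]

def analytic_grad(x, y, w):
    # d/dw (w * x - y)^2 = 2 * x * (w * x - y)
    return 2 * x * (w * x - y)

w = 1.0     # same initial weight as the autograd version
lr = 0.01   # same learning rate

first_epoch_grads = []
for epoch in range(40):
    for x, y in zip(x_data, y_data):
        grad = analytic_grad(x, y, w)
        if epoch == 0:
            first_epoch_grads.append(grad)
        w -= lr * grad   # gradient descent step

prediction = w * 4
print(first_epoch_grads)
print(w, prediction)
```

The three gradients collected in the first epoch should agree with the first three `grad:` lines of the output above up to rounding, and since the data follow $y = 2x$, `w` should end up close to 2.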

### Exercise 4.3 : Implement a computational graph and backprop using NumPy



### Exercise 4.4 : Compute gradients using computational graph (manually)

$$ \hat{y} = x^2 w_2 + x w_1 + b$$
$$ loss = (\hat{y} - y)^2 $$
$$ { \partial \text{loss} \over \partial w_1 } = ? $$
$$ { \partial \text{loss} \over \partial w_2 } = ? $$