# Back Propagation, Compute Graph, and Autograd

This note builds on a basic understanding of
[gradient descent](gd-general.ipynb) and the [chain rule](chain-rule.ipynb).

We start with a training data pair $(x, y)$: given $x$, we want to predict $\hat{y}$, and we train the weight $w$ so that $\hat{y}$ is close to $y$.

$$\hat{y} = x \cdot w $$

Training is driven by the loss function we define:

$$\text{loss} = (\hat{y} -y) ^ 2 = (x\cdot w - y)^2$$
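
As a quick sanity check of the loss formula, the sketch below evaluates it for a few candidate weights, assuming the single training pair $(x, y) = (1, 2)$ that is used later in this note. The helper name `squared_loss` is made up for illustration.

```python
def squared_loss(x, w, y):
    """Squared error between the prediction x * w and the target y."""
    y_pred = x * w
    return (y_pred - y) ** 2


x, y = 1.0, 2.0
for w in [0.0, 1.0, 2.0, 3.0]:
    print(f"w={w}, loss={squared_loss(x, w, y)}")
```

The loss is smallest at $w = 2$, the weight that maps $x = 1$ exactly onto $y = 2$.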



## Compute Graph

A computational graph can be built based on this:

![compute graph](../figs/bp-compute-graph.png)


The **operators** involved in this compute graph are:
* multiplication
* subtraction
* squaring

We need the derivative of each operator so we can compute the **local gradients**, which the chain rule combines into $\dfrac{\partial{L}}{\partial{w}}$.
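
To make this concrete, write $s = \hat{y} - y$ for the output of the subtraction node and $L = s^2$ for the output of the squaring node (the same symbols used in the rest of this note). The chain rule then multiplies the local gradients along the path from $L$ back to $w$:

$$\frac{\partial{L}}{\partial{w}} = \frac{\partial{L}}{\partial{s}} \cdot \frac{\partial{s}}{\partial{\hat{y}}} \cdot \frac{\partial{\hat{y}}}{\partial{w}}$$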

## Local gradient

Note: we don't need to compute local gradients with respect to $x$ and $y$, as they are fixed training data rather than trainable parameters.


### multiplication

$$\frac{\partial{\hat{y}}}{\partial{w}}  = \frac{\partial{(xw)}}{\partial{w}} = x $$

### subtraction

$$\frac{\partial{s}}{\partial{\hat{y}}} = \frac{\partial{(\hat{y} - y)}}{\partial{\hat{y}}} = 1 $$

### squaring

$$\frac{\partial{L}}{\partial{s}} = \frac{\partial{(s^2)}}{\partial{s}} = 2s $$
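
As a worked check, multiplying the three local gradients together (the chain rule) and substituting $s = xw - y$ gives a closed form for the gradient with respect to $w$:

$$\frac{\partial{L}}{\partial{w}} = 2s \cdot 1 \cdot x = 2x(xw - y)$$

For example, with $x = 1$, $w = 1$, $y = 2$ this evaluates to $2 \cdot 1 \cdot (1 - 2) = -2$.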







## Forward Propagation

Assume x = 1, w = 1, y = 2. We can then run forward propagation:


In [21]:

def forward(x, w, y):
    # multiplication node: prediction
    y_pred = x * w
    print("y_pred=", y_pred)
    # subtraction node: signed error
    s = y_pred - y
    print("s = ", s)
    # squaring node: loss
    loss = s ** 2
    print("loss = ", loss)
    return y_pred, s, loss

x = 1; w = 1; y = 2
y_pred, s, loss = forward(x, w, y)


y_pred= 1
s =  -1
loss =  1


## Back Propagation

In [22]:


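def backprop(x, s):
    """Return dloss/dw by chaining the local gradients from the loss back to w."""

    # step 1: local gradients of the squaring and subtraction nodes
    dloss_ds = 2 * s
    ds_dy_pred = 1

    # step 2: chain rule up to y_pred
    dloss_dy_pred = dloss_ds * ds_dy_pred

    # step 3: chain rule through the multiplication node to reach w
    dy_pred_dw = x
    dloss_dw = dloss_dy_pred * dy_pred_dw

    return dloss_dw

# x and s come from the forward pass above
dloss_dw = backprop(x, s)
print(f"dloss_dw = {dloss_dw}")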


dloss_dw = -2
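
One way to gain confidence in a hand-derived gradient is to compare it against a finite-difference estimate. The sketch below does this for the same values ($x = 1$, $w = 1$, $y = 2$). The names `loss_fn` and `numerical_grad` and the step size `eps` are illustrative choices, not part of the original notebook.

```python
def loss_fn(x, w, y):
    """Loss as a function of w: (x * w - y) ** 2."""
    return (x * w - y) ** 2


def numerical_grad(x, w, y, eps=1e-6):
    """Central-difference approximation of dloss/dw."""
    return (loss_fn(x, w + eps, y) - loss_fn(x, w - eps, y)) / (2 * eps)


x, w, y = 1.0, 1.0, 2.0
approx = numerical_grad(x, w, y)
print(f"numerical dloss_dw ~= {approx:.6f}")
```

The estimate should agree with the analytic value of -2 up to floating-point rounding.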


The steps above are captured by this diagram:

![compute graph](../figs/bp-steps.png)
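
The manual steps can be automated if every operator records its inputs and local gradients while the forward pass runs. The following is a toy sketch of that idea, not how any real library is implemented. The `Value` class and its `square` method are hypothetical names invented for this example.

```python
class Value:
    """A scalar that remembers which operator produced it (illustrative only)."""

    def __init__(self, data, parents=(), local_grads=()):
        self.data = data
        self.grad = 0.0
        self.parents = parents          # inputs of the operator that produced this value
        self.local_grads = local_grads  # d(self)/d(parent), one entry per parent

    def __mul__(self, other):
        return Value(self.data * other.data, (self, other), (other.data, self.data))

    def __sub__(self, other):
        return Value(self.data - other.data, (self, other), (1.0, -1.0))

    def square(self):
        return Value(self.data ** 2, (self,), (2.0 * self.data,))

    def backward(self):
        # Post-order traversal: every node is listed after all of its inputs.
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent in node.parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0  # d(loss)/d(loss)
        # Walk from the output back to the inputs, applying the chain rule.
        for node in reversed(order):
            for parent, local in zip(node.parents, node.local_grads):
                parent.grad += node.grad * local


x, w, y = Value(1.0), Value(1.0), Value(2.0)
loss = (x * w - y).square()
loss.backward()
print(f"loss = {loss.data}, dloss_dw = {w.grad}")
```

For simplicity this sketch also accumulates gradients for `x` and `y`. A practical engine would skip inputs that are not trainable.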

## Using PyTorch to simplify



In [18]:
import torch

x_data = [1.0, 2.0, 3.0]
y_data = [2.0, 4.0, 6.0]
# requires_grad=True tells autograd to track operations on w
w = torch.tensor([1.0], requires_grad=True)


def forward(x):
    return x * w

def loss(x, y):
    y_pred = forward(x)
    return (y_pred - y) * (y_pred - y)

for epoch in range(10):
    for x_val, y_val in zip(x_data, y_data):
        l = loss(x_val, y_val)
        l.backward()  # fills w.grad with dloss/dw
        print(f"x={x_val}, y={y_val}, grad={w.grad.item()}")
        # update the weight outside of autograd tracking
        with torch.no_grad():
            w -= 0.01 * w.grad

        # manually zero the gradient after updating the weight,
        # because backward() accumulates into w.grad
        w.grad.zero_()
    print(f"epoch={epoch}, loss={l.item()}")



x=1.0, y=2.0, grad=-2.0
x=2.0, y=4.0, grad=-7.840000152587891
x=3.0, y=6.0, grad=-16.228801727294922
epoch=0, loss=7.315943717956543
x=1.0, y=2.0, grad=-1.478623867034912
x=2.0, y=4.0, grad=-5.796205520629883
x=3.0, y=6.0, grad=-11.998146057128906
epoch=1, loss=3.9987640380859375
x=1.0, y=2.0, grad=-1.0931644439697266
x=2.0, y=4.0, grad=-4.285204887390137
x=3.0, y=6.0, grad=-8.870372772216797
epoch=2, loss=2.1856532096862793
x=1.0, y=2.0, grad=-0.8081896305084229
x=2.0, y=4.0, grad=-3.1681032180786133
x=3.0, y=6.0, grad=-6.557973861694336
epoch=3, loss=1.1946394443511963
x=1.0, y=2.0, grad=-0.5975041389465332
x=2.0, y=4.0, grad=-2.3422164916992188
x=3.0, y=6.0, grad=-4.848389625549316
epoch=4, loss=0.6529689431190491
x=1.0, y=2.0, grad=-0.4417421817779541
x=2.0, y=4.0, grad=-1.7316293716430664
x=3.0, y=6.0, grad=-3.58447265625
epoch=5, loss=0.35690122842788696
x=1.0, y=2.0, grad=-0.3265852928161621
x=2.0, y=4.0, grad=-1.2802143096923828
x=3.0, y=6.0, grad=-2.650045394897461
epoch=6, lo
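
For comparison, the same training loop can be written without PyTorch by plugging in the hand-derived gradient $2s \cdot x$ directly. This is a minimal sketch assuming the same data, starting weight, and learning rate of 0.01 as above. The `grads` list is added only to make the values easy to inspect.

```python
x_data = [1.0, 2.0, 3.0]
y_data = [2.0, 4.0, 6.0]
w = 1.0
lr = 0.01  # same step size as in the PyTorch loop

grads = []
for epoch in range(10):
    for x_val, y_val in zip(x_data, y_data):
        s = x_val * w - y_val      # forward pass up to the subtraction node
        loss = s ** 2              # squaring node
        grad = 2 * s * x_val       # dloss/dw from the chain rule
        grads.append(grad)
        w -= lr * grad             # gradient descent update
    print(f"epoch={epoch}, w={w:.4f}, loss={loss:.4f}")
```

The first few gradients match the PyTorch output above up to float32 rounding, and `w` moves toward 2, the weight that fits the data exactly.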
