In [1]:
import torch
import numpy as np


In [2]:
# To see how backpropagation works in PyTorch, let's make a toy example "neural network"

# input value
x = torch.tensor(1.0)

# actual output label/value
y = torch.tensor(2.0)

# weights
w = torch.tensor(1.0, requires_grad = True)

So our "neural network" has a single input $x$, a single weight $w$, and a single output $\hat{y}$:
$$\hat{y} = w\cdot x$$

In [3]:
# forward pass
y_hat = w*x

# compute loss using MSE
loss = (y_hat - y)**2

loss

tensor(1., grad_fn=<PowBackward0>)

In [4]:
# compute gradient of loss
loss.backward()

w.grad

tensor(-2.)

So the gradient is $-2$. Let's verify this manually:
$$L = (\hat{y}-y)^2$$
$$\frac{dL}{dw} = 2(\hat{y}-y)\cdot \frac{d\hat{y}}{dw}$$
Recall that $\hat{y}=w\cdot x$, and therefore
$$\frac{d\hat{y}}{dw}=x$$
Therefore:
$$\frac{dL}{dw} = 2(\hat{y}-y)\cdot x$$
Now, evaluating at $x=1$, $y=2$, and $w=1$, gives us:
$$\frac{dL}{dw} = 2(1-2)(1)=-2$$
which is exactly what PyTorch said!
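The same check can be done in code. The sketch below is only an illustration, not part of the original cells: it recomputes the analytic expression $2(\hat{y}-y)\cdot x$ and adds a central finite-difference estimate as an independent check. The helper `loss_at` and the step size `eps` are names introduced here for the example.

```python
import torch

# Same toy values as in the cells above
x = torch.tensor(1.0)
y = torch.tensor(2.0)
w = torch.tensor(1.0, requires_grad=True)

# Autograd gradient
loss = (w * x - y) ** 2
loss.backward()

# Analytic gradient from the derivation: dL/dw = 2 * (w*x - y) * x
with torch.no_grad():
    analytic = 2 * (w * x - y) * x

# Hypothetical helper: the loss as a function of the weight only
def loss_at(w_value):
    return (w_value * x - y) ** 2

# Central finite difference (eps is an arbitrary small step)
eps = 1e-3
with torch.no_grad():
    numeric = (loss_at(w + eps) - loss_at(w - eps)) / (2 * eps)

print(w.grad.item(), analytic.item(), numeric.item())
```

All three values should agree, with the finite-difference estimate matching only up to floating-point rounding.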

---

# An Example of Auto Differentiation

Let's start by hard-coding an example with NumPy only, without using PyTorch.

### Manual

In [32]:
X = np.array([i*0.3 for i in range(0,20)], dtype=np.float32)
y = np.array([2.2*i for i in range(0,20)], dtype=np.float32)

In [41]:
w = 0.0

# feed forward
def forward_pass(x):
    return (w*x)

# define loss function; we use MSE
def loss(y, y_preds):
    return ((y_preds-y)**2).mean()

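# gradient
# note: np.dot sums over all samples, so this returns the gradient of the
# *sum* of squared errors, i.e. N times the gradient of the MSE loss above
# (N = number of samples); the small learning rate below compensates for that
def gradient(x, y, y_preds):
    return np.dot(2*x, y_preds-y)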

# training
learning_rate = 0.001
epochs = 20

for epoch in range(epochs):
    # forward pass
    y_preds = forward_pass(X)
    # compute loss
    L = loss(y,y_preds)
    # compute gradient
    grad = gradient(X,y, y_preds)
    # descend
    w = w - learning_rate*grad
    
    print(f'epoch {epoch+1}: weight = {w:.3f}     ,     loss = {L:.4f}')
    
predictions = forward_pass(X)
    
print(f'Prediction: y = {predictions}')

epoch 1: weight = 3.260     ,     loss = 597.7401
epoch 2: weight = 5.071     ,     loss = 184.3844
epoch 3: weight = 6.077     ,     loss = 56.8769
epoch 4: weight = 6.636     ,     loss = 17.5448
epoch 5: weight = 6.946     ,     loss = 5.4120
epoch 6: weight = 7.118     ,     loss = 1.6694
epoch 7: weight = 7.214     ,     loss = 0.5150
epoch 8: weight = 7.267     ,     loss = 0.1589
epoch 9: weight = 7.296     ,     loss = 0.0490
epoch 10: weight = 7.313     ,     loss = 0.0151
epoch 11: weight = 7.322     ,     loss = 0.0047
epoch 12: weight = 7.327     ,     loss = 0.0014
epoch 13: weight = 7.330     ,     loss = 0.0004
epoch 14: weight = 7.331     ,     loss = 0.0001
epoch 15: weight = 7.332     ,     loss = 0.0000
epoch 16: weight = 7.333     ,     loss = 0.0000
epoch 17: weight = 7.333     ,     loss = 0.0000
epoch 18: weight = 7.333     ,     loss = 0.0000
epoch 19: weight = 7.333     ,     loss = 0.0000
epoch 20: weight = 7.333     ,     loss = 0.0000
Prediction: y = [ 0.   
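As a sanity check on the value the loop converges to, here is a minimal sketch (an addition, not one of the original cells). For a model $\hat{y}=w\cdot x$ with no bias, the least-squares weight has the closed form $w^* = \sum x_i y_i / \sum x_i^2$. The sketch also compares the `np.dot`-based gradient, which sums over samples, with the gradient of the mean squared error. The names `mse`, `mse_grad`, `w_star`, and `eps` are assumptions made for this example, and the arrays are cast to float64 only to keep the finite-difference check numerically clean.

```python
import numpy as np

# Same data as above, cast to float64 for a cleaner numerical check
X = np.array([i*0.3 for i in range(0, 20)], dtype=np.float32).astype(np.float64)
y = np.array([2.2*i for i in range(0, 20)], dtype=np.float32).astype(np.float64)

# Closed-form least-squares weight for y_hat = w * x (no bias term)
w_star = np.dot(X, y) / np.dot(X, X)

# Hypothetical helpers: MSE loss and its exact gradient with respect to w
def mse(w):
    return ((w*X - y)**2).mean()

def mse_grad(w):
    return (2*X*(w*X - y)).mean()

# Central finite difference of the MSE at w = 0
w0, eps = 0.0, 1e-3
numeric = (mse(w0 + eps) - mse(w0 - eps)) / (2*eps)

# The dot-product form sums over samples instead of averaging
summed = np.dot(2*X, w0*X - y)

print(f'closed-form weight = {w_star:.3f}')
print(f'MSE gradient = {mse_grad(w0):.3f}, finite difference = {numeric:.3f}')
print(f'summed gradient / MSE gradient = {summed / mse_grad(w0):.1f}')
```

The closed-form weight should match the value the training loop settles on, and the last ratio equals the number of samples.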

### PyTorch

In [44]:
X = torch.tensor([i*0.3 for i in range(0,20)], device = 'cuda', dtype=torch.float32)
y = torch.tensor([2.2*i for i in range(0,20)], device = 'cuda', dtype=torch.float32)

In [47]:
w = torch.tensor(0.0, device='cuda', dtype = torch.float32, requires_grad=True)


learning_rate = 0.01
epochs = 20

for epoch in range(epochs):
    # forward pass
    y_preds = forward_pass(X)
    # compute loss
    L = loss(y,y_preds)
    # compute gradient; notice PyTorch does it for us!
    L.backward()
    # descend;
    # we need to be careful here: PyTorch records every operation on w so that it
    # can autodifferentiate. An in-place update of a leaf tensor that requires grad
    # raises an error, and re-assigning w = w - learning_rate*w.grad would replace w
    # with a new non-leaf tensor whose .grad is not populated. So what we have to do
    # is momentarily suspend the computation tracking by PyTorch
    with torch.no_grad():
        w -= learning_rate*w.grad
    
    # reset accumulated gradient
    w.grad.zero_()
    
    print(f'epoch {epoch+1}: weight = {w:.3f}     ,     loss = {L:.4f}')
    
predictions = forward_pass(X)
    
print(f'Prediction: y = {predictions}')

epoch 1: weight = 1.630     ,     loss = 597.7400
epoch 2: weight = 2.898     ,     loss = 361.5235
epoch 3: weight = 3.884     ,     loss = 218.6557
epoch 4: weight = 4.651     ,     loss = 132.2467
epoch 5: weight = 5.247     ,     loss = 79.9851
epoch 6: weight = 5.711     ,     loss = 48.3764
epoch 7: weight = 6.072     ,     loss = 29.2589
epoch 8: weight = 6.352     ,     loss = 17.6963
epoch 9: weight = 6.570     ,     loss = 10.7030
epoch 10: weight = 6.740     ,     loss = 6.4734
epoch 11: weight = 6.872     ,     loss = 3.9152
epoch 12: weight = 6.974     ,     loss = 2.3680
epoch 13: weight = 7.054     ,     loss = 1.4322
epoch 14: weight = 7.116     ,     loss = 0.8662
epoch 15: weight = 7.164     ,     loss = 0.5239
epoch 16: weight = 7.202     ,     loss = 0.3169
epoch 17: weight = 7.231     ,     loss = 0.1916
epoch 18: weight = 7.254     ,     loss = 0.1159
epoch 19: weight = 7.272     ,     loss = 0.0701
epoch 20: weight = 7.285     ,     loss = 0.0424
Prediction: y = 
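Two details of the loop above are easy to miss: gradients accumulate across calls to `backward()`, and the weight update has to happen under `torch.no_grad()`. The following sketch (an addition, run on the CPU with the scalar toy values from the first example, and with an arbitrary step size of 0.1) illustrates both. The names `first`, `second`, and `w_new` are introduced only for this illustration.

```python
import torch

x = torch.tensor(1.0)
y = torch.tensor(2.0)
w = torch.tensor(1.0, requires_grad=True)

# 1) Gradients accumulate across backward() calls unless they are reset
((w * x - y) ** 2).backward()
first = w.grad.item()
((w * x - y) ** 2).backward()
second = w.grad.item()
print(first, second)  # the second value is twice the first

w.grad.zero_()

# 2) Re-assigning outside no_grad builds a new, non-leaf tensor
((w * x - y) ** 2).backward()
w_new = w - 0.1 * w.grad
print(w_new.is_leaf, w_new.requires_grad)

# 3) An in-place update under no_grad keeps w as the same leaf tensor
with torch.no_grad():
    w -= 0.1 * w.grad
print(w.is_leaf, w.requires_grad)
```

Because `w_new` is the result of a tracked operation, it is no longer a leaf, so a later `backward()` would not populate its `.grad`. The in-place update inside `torch.no_grad()` avoids this and leaves `w` ready for the next iteration.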

## So Where Are We At?

Without PyTorch:
1. Predictions / Feed-Forward: Manual.
2. Differentiation / Backpropagation: Manual.
3. Define Loss Function: Manual.
4. Update Weights: Manual.

With ```tensor.backward()``` in PyTorch:
1. Predictions / Feed-Forward: Manual.
2. Differentiation / Backpropagation: PyTorch Autograd.
3. Define Loss Function: Manual.
4. Update Weights: Manual.

Looking ahead:
1. Predictions / Feed-Forward: PyTorch Model.
2. Differentiation / Backpropagation: PyTorch Autograd.
3. Define Loss Function: PyTorch Loss.
4. Update Weights: PyTorch Optimizer.
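To make the "looking ahead" list concrete, here is one possible sketch of the same fit with all four steps handed to PyTorch. It is an assumption about how those pieces could fit together, not the code this notebook goes on to use. It picks `nn.Linear` without a bias, `nn.MSELoss`, and `torch.optim.SGD`, runs on the CPU, and uses 200 epochs so that the result does not depend on the random initial weight.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Same data as above, shaped (n_samples, n_features) as nn.Linear expects
X = torch.tensor([[i * 0.3] for i in range(20)], dtype=torch.float32)
y = torch.tensor([[2.2 * i] for i in range(20)], dtype=torch.float32)

# 1. Predictions: a PyTorch model (one weight, no bias, to mirror y_hat = w * x)
model = nn.Linear(in_features=1, out_features=1, bias=False)
# 3. Loss: a PyTorch loss
loss_fn = nn.MSELoss()
# 4. Weight updates: a PyTorch optimizer
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

for epoch in range(200):
    y_preds = model(X)        # forward pass
    L = loss_fn(y_preds, y)   # compute loss
    optimizer.zero_grad()     # reset accumulated gradients
    L.backward()              # 2. differentiation: autograd
    optimizer.step()          # descend

print(f'weight = {model.weight.item():.3f}, loss = {L.item():.4f}')
```

The optimizer takes over both the `torch.no_grad()` update and the gradient reset from the manual loop, so the training loop no longer touches the weight directly.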