Autograd and backpropagation
==============================

A training loop built on PyTorch's autograd.

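- Backpropagation computes the gradient of a composite function by applying the chain rule
- Requirement: every function in the composition must be differentiable
- Autograd: PyTorch tracks operations on tensors created with `requires_grad=True` and, after `backward()` is called, stores the gradient in each such tensor's `grad` attribute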
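
The chain rule can be checked by hand on a tiny example. The sketch below is illustrative only: the function `y = (w * x + b) ** 2` and the values of `x`, `w`, and `b` are assumptions chosen for easy arithmetic and are not part of the notebook's data.

```python
import torch

# Hypothetical scalar example: y = (w * x + b) ** 2
x = torch.tensor(3.0)
w = torch.tensor(2.0, requires_grad=True)
b = torch.tensor(1.0, requires_grad=True)

u = w * x + b   # inner function: u = 2 * 3 + 1 = 7
y = u ** 2      # outer function: y = 49
y.backward()    # autograd applies the chain rule backwards through y -> u -> (w, b)

# The same chain rule written out by hand:
#   dy/dw = dy/du * du/dw = 2u * x
#   dy/db = dy/du * du/db = 2u * 1
manual_dw = 2 * u.detach() * x
manual_db = 2 * u.detach()

print(w.grad, manual_dw)  # both 42
print(b.grad, manual_db)  # both 14
```

Both steps (`u ** 2` and `w * x + b`) are differentiable, which is what lets autograd multiply the local derivatives together.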

In [2]:
%matplotlib inline
import numpy as np
import torch
torch.set_printoptions(edgeitems=2)

## Input and definitions

In [3]:
# y
t_c = torch.tensor([0.5, 14.0, 15.0, 28.0, 11.0, 8.0,
                    3.0, -4.0, 6.0, 13.0, 21.0])
# x
t_u = torch.tensor([35.7, 55.9, 58.2, 81.9, 56.3, 48.9,
                    33.9, 21.8, 48.4, 60.4, 68.4])
# scaling to make training converge
t_un = 0.1 * t_u

In [4]:
def model(t_u, w, b):
    return w * t_u + b

In [5]:
def loss_fn(t_p, t_c):
    squared_diffs = (t_p - t_c)**2
    return squared_diffs.mean()

Initialize the parameters (`w = 1.0`, `b = 0.0`) as a single tensor that tracks gradients

In [7]:
params = torch.tensor([1.0, 0.0], requires_grad=True)

## The `grad` attribute

When a tensor is created with `requires_grad=True`, its `grad` attribute is populated automatically once `backward()` is called on a result computed from it. Until then, `grad` is `None`.

In [8]:
params.grad is None

True

After `loss.backward()`, `params.grad` contains the derivatives of the loss with respect to each element of `params`. See Figure 5.10 for the mechanics of how this works.

In [7]:
loss = loss_fn(model(t_u, *params), t_c)
loss.backward()

params.grad

tensor([4517.2969,   82.6000])
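
As a sanity check, the autograd result can be compared with the analytic gradient of the mean squared error. This is a sketch, not part of the original notebook: it repeats the data and definitions from the cells above so it runs on its own, and `residual` and `manual_grad` are names introduced here for illustration.

```python
import torch

t_c = torch.tensor([0.5, 14.0, 15.0, 28.0, 11.0, 8.0,
                    3.0, -4.0, 6.0, 13.0, 21.0])
t_u = torch.tensor([35.7, 55.9, 58.2, 81.9, 56.3, 48.9,
                    33.9, 21.8, 48.4, 60.4, 68.4])

def model(t_u, w, b):
    return w * t_u + b

def loss_fn(t_p, t_c):
    return ((t_p - t_c) ** 2).mean()

# Gradient from autograd
params = torch.tensor([1.0, 0.0], requires_grad=True)
loss = loss_fn(model(t_u, *params), t_c)
loss.backward()

# Gradient from the chain rule, written out by hand:
#   dL/dw = mean(2 * (t_p - t_c) * t_u)
#   dL/db = mean(2 * (t_p - t_c))
with torch.no_grad():
    w, b = params
    residual = model(t_u, w, b) - t_c
    manual_grad = torch.stack([(2 * residual * t_u).mean(),
                               (2 * residual).mean()])

print(params.grad)
print(manual_grad)
print(torch.allclose(params.grad, manual_grad, rtol=1e-4))
```

The large value for `w` compared with `b` comes from the unscaled input `t_u`, which is why the training cells further down use `t_un` instead.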

Calling `backward()` accumulates into `grad` rather than overwriting it, so the gradient must be zeroed explicitly after it has been used for a parameter update.

In [9]:
if params.grad is not None:
    params.grad.zero_()
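
The accumulation behaviour is easy to see on a toy loss. The sketch below is an illustration under assumed values: `toy_loss` and `x` are hypothetical and chosen so that the gradient with respect to `params` is simply `x`.

```python
import torch

params = torch.tensor([1.0, 0.0], requires_grad=True)
x = torch.tensor([2.0, 3.0])

def toy_loss(p):
    # Hypothetical loss whose gradient with respect to p is x
    return (p * x).sum()

toy_loss(params).backward()
first = params.grad.clone()    # tensor([2., 3.])

toy_loss(params).backward()    # no zeroing in between: gradients add up
second = params.grad.clone()   # tensor([4., 6.])

params.grad.zero_()            # reset in place
toy_loss(params).backward()
third = params.grad.clone()    # tensor([2., 3.]) again

print(first, second, third)
```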

## Training loop

In [13]:
from tqdm import tqdm

In [14]:
def training_loop(n_epochs, learning_rate, params, t_u, t_c):
    for epoch in tqdm(range(1, n_epochs + 1)):
        # <1> reset the accumulated gradient; this can be done anywhere before loss.backward()
        if params.grad is not None:
            params.grad.zero_()

        # model prediction
        t_p = model(t_u, *params) 
        # loss
        loss = loss_fn(t_p, t_c)
        loss.backward()

        # update the parameters in place using gradient descent
        with torch.no_grad():  # <2> keep the update itself out of the autograd graph
            params -= learning_rate * params.grad

        if epoch % 500 == 0:
            print('Epoch %d, Loss %f' % (epoch, float(loss)))
            
    return params
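
The `torch.no_grad()` block around the update is required, not just a tidy habit. PyTorch refuses an in-place modification of a leaf tensor that requires gradients while autograd is recording. A minimal sketch, where `step` is a hypothetical stand-in for `learning_rate * params.grad`:

```python
import torch

params = torch.tensor([1.0, 0.0], requires_grad=True)
step = torch.tensor([0.1, 0.2])  # hypothetical stand-in for learning_rate * params.grad

# Outside no_grad, the in-place update on a leaf tensor raises a RuntimeError
try:
    params -= step
    raised = False
except RuntimeError as err:
    raised = True
    print("in-place update refused:", err)

# Inside no_grad, the update is applied and is not recorded by autograd
with torch.no_grad():
    params -= step

print(params)                 # values are now [0.9, -0.2]
print(params.requires_grad)   # still True, so later backward() calls keep working
```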

In [15]:
training_loop(
    n_epochs=5000,
    learning_rate=1e-2,
    params=torch.tensor([1.0, 0.0], requires_grad=True),  # <1> requires_grad=True is crucial
    t_u=t_un,  # <2> note the scaled input t_un
    t_c=t_c)

100%|█████████████████████████████████████| 5000/5000 [00:00<00:00, 15580.10it/s]

Epoch 500, Loss 7.860115
Epoch 1000, Loss 3.828538
Epoch 1500, Loss 3.092191
Epoch 2000, Loss 2.957698
Epoch 2500, Loss 2.933134
Epoch 3000, Loss 2.928648
Epoch 3500, Loss 2.927830
Epoch 4000, Loss 2.927679
Epoch 4500, Loss 2.927652
Epoch 5000, Loss 2.927647

tensor([  5.3671, -17.3012], requires_grad=True)
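
Because the model is a straight line and the loss is a mean squared error, the fitted values can be cross-checked against the closed-form least-squares solution. This check is an addition for illustration and not part of the original notebook; `w_closed` and `b_closed` are names introduced here.

```python
import torch

t_c = torch.tensor([0.5, 14.0, 15.0, 28.0, 11.0, 8.0,
                    3.0, -4.0, 6.0, 13.0, 21.0])
t_un = 0.1 * torch.tensor([35.7, 55.9, 58.2, 81.9, 56.3, 48.9,
                           33.9, 21.8, 48.4, 60.4, 68.4])

# Simple linear regression in closed form:
#   w = sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x)) ** 2)
#   b = mean(y) - w * mean(x)
x_mean, y_mean = t_un.mean(), t_c.mean()
w_closed = ((t_un - x_mean) * (t_c - y_mean)).sum() / ((t_un - x_mean) ** 2).sum()
b_closed = y_mean - w_closed * x_mean

print(w_closed.item(), b_closed.item())
```

The two numbers should land close to the `5.3671` and `-17.3012` found by gradient descent. Any small remaining gap reflects that 5000 epochs approach the minimum without reaching it exactly.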