Let's do linear regression again, this time with PyTorch, and see how autograd helps us.

In [23]:
import torch

In [30]:
X = torch.tensor([1,2,3,4], dtype=torch.float32)
Y = torch.tensor([2,4,6,8], dtype=torch.float32)

# Set requires_grad=True so autograd tracks operations on w and l.backward() can compute dl/dw for gradient descent.
w = torch.tensor(0.0, requires_grad=True)

In [31]:
# Model prediction
def forward(x):
    return w * x

# loss = MSE
def loss(y, y_hat):
    return ((y - y_hat) ** 2).mean()
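For reference, here is the gradient that autograd will compute for this model and loss. It is the standard derivative of the mean squared error for a single weight $w$ and $N$ samples:

```latex
L(w) = \frac{1}{N}\sum_{i=1}^{N} \left(y_i - w x_i\right)^2,
\qquad
\frac{\partial L}{\partial w} = \frac{2}{N}\sum_{i=1}^{N} x_i \left(w x_i - y_i\right)

% With this data: y_i = 2 x_i, N = 4, \sum_i x_i^2 = 30
\frac{\partial L}{\partial w} = \frac{2}{4}\,(w - 2)\cdot 30 = 15\,(w - 2)
```

At $w = 0$ the gradient is $-30$, so the first update with `lr = 0.01` gives $w = 0 - 0.01 \cdot (-30) = 0.3$, which is the value printed for epoch 1 in the training log.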

In [32]:
# global variables, for now
lr = 0.01
n_iters = 20

In [33]:
f'Prediction before training: f(5) = {forward(5):.3f}'

'Prediction before training: f(5) = 0.000'
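Before training, a single backward pass can be checked by hand. The sketch below, which rebuilds its own `X`, `Y`, `w` so it runs on its own (the names `autograd_grad`, `analytic_grad`, `numeric_grad` and `mse` are illustrative), compares autograd's `w.grad` with the analytic MSE gradient $(2/N)\sum x_i(w x_i - y_i)$ and with a central finite-difference approximation at $w = 0$:

```python
import torch

X = torch.tensor([1, 2, 3, 4], dtype=torch.float32)
Y = torch.tensor([2, 4, 6, 8], dtype=torch.float32)
w = torch.tensor(0.0, requires_grad=True)

# Autograd: build the MSE graph once and run the backward pass
l = ((Y - w * X) ** 2).mean()
l.backward()
autograd_grad = w.grad.item()

# Analytic gradient of the MSE: (2/N) * sum(x * (w*x - y))
with torch.no_grad():
    analytic_grad = (2 * X * (w * X - Y)).mean().item()

# Central finite difference (a numerical approximation), computed in float64
def mse(w_val):
    Xd, Yd = X.double(), Y.double()
    return ((Yd - w_val * Xd) ** 2).mean().item()

h = 1e-4
numeric_grad = (mse(0.0 + h) - mse(0.0 - h)) / (2 * h)

print(autograd_grad, analytic_grad, numeric_grad)
```

Autograd and the analytic formula both give -30 here; the finite difference only approximates it (for this quadratic loss the approximation error is down at rounding level).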

In [34]:
for epoch in range(n_iters):
    # predict
    y_hat = forward(X)
    
    # loss
    l = loss(Y, y_hat)
    
    # gradients, aka the backward pass, calculates dl/dw
    l.backward()
    
    # update the weight inside torch.no_grad(), so the update itself is not recorded in the computation graph
    with torch.no_grad():
        w -= lr * w.grad
    
    # zero the gradient after each step; otherwise PyTorch accumulates (sums) gradients across backward() calls
    w.grad.zero_()
    
    # logging
    if epoch % 2 == 0:
        print(f'epoch {epoch+1}: w = {w:.3f}, loss = {l:.8f}')

epoch 1: w = 0.300, loss = 30.00000000
epoch 3: w = 0.772, loss = 15.66018772
epoch 5: w = 1.113, loss = 8.17471695
epoch 7: w = 1.359, loss = 4.26725292
epoch 9: w = 1.537, loss = 2.22753215
epoch 11: w = 1.665, loss = 1.16278565
epoch 13: w = 1.758, loss = 0.60698116
epoch 15: w = 1.825, loss = 0.31684780
epoch 17: w = 1.874, loss = 0.16539653
epoch 19: w = 1.909, loss = 0.08633806
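Why the update sits inside `torch.no_grad()`: `w` is a leaf tensor with `requires_grad=True`, and autograd refuses in-place changes to such a tensor while gradients are being tracked. A minimal sketch (self-contained, re-creating the data; `update_rejected` is an illustrative name) of one training step with and without the context manager:

```python
import torch

X = torch.tensor([1, 2, 3, 4], dtype=torch.float32)
Y = torch.tensor([2, 4, 6, 8], dtype=torch.float32)
w = torch.tensor(0.0, requires_grad=True)
lr = 0.01

l = ((Y - w * X) ** 2).mean()
l.backward()

# Outside no_grad, an in-place update of a leaf that requires grad raises RuntimeError
try:
    w -= lr * w.grad
    update_rejected = False
except RuntimeError as err:
    update_rejected = True
    print("RuntimeError:", err)

# Inside no_grad the same update runs and is not recorded in the graph
with torch.no_grad():
    w -= lr * w.grad

# w is still a leaf that requires grad (no grad_fn), now holding roughly 0.3
print(w.item(), w.requires_grad, w.grad_fn)
```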

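The `w.grad.zero_()` call matters because `backward()` adds into `.grad` rather than overwriting it. A small self-contained sketch (variable names are illustrative) that runs two backward passes at the same `w` without zeroing in between:

```python
import torch

X = torch.tensor([1, 2, 3, 4], dtype=torch.float32)
Y = torch.tensor([2, 4, 6, 8], dtype=torch.float32)
w = torch.tensor(0.0, requires_grad=True)

# First backward pass: w.grad holds dl/dw at w = 0
loss = ((Y - w * X) ** 2).mean()
loss.backward()
first = w.grad.item()

# Second backward pass on a fresh graph, without zeroing: the new gradient is added to w.grad
loss = ((Y - w * X) ** 2).mean()
loss.backward()
accumulated = w.grad.item()

# Zeroing resets the buffer for the next iteration
w.grad.zero_()
print(first, accumulated, w.grad.item())
```

Without the reset, every step in the loop would use the running sum of all previous gradients instead of the current one.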

In [35]:
f'Prediction after training: f(5) = {forward(5):.3f}'

'Prediction after training: f(5) = 9.612'

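Notice that, compared with the manual gradient descent in NumPy, this version needs 20 epochs and still ends up worse than the NumPy version after 10 epochs. This is not because autograd is less accurate: autograd computes the exact derivative of the loss as written (up to floating-point rounding), not a numerical approximation. With the same data, loss, gradient formula, and learning rate, both versions would make identical updates, so the gap most likely comes from a difference in the update itself, for example a manual gradient that sums over the samples instead of averaging them (which makes each step N = 4 times larger) or a different learning rate.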
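One way to test the step-size explanation, as a sketch under the assumption that the NumPy version differed only in how the per-sample gradients were reduced (its code isn't shown here). `train` and its `reduce` argument are hypothetical names. With the mean gradient, $dL/dw = 15(w - 2)$, so the distance to the optimum $w = 2$ shrinks by a factor $1 - 15 \cdot \text{lr} = 0.85$ per step; with the sum it shrinks by $1 - 60 \cdot \text{lr} = 0.4$:

```python
# Plain-Python gradient descent on the same data, using the analytic gradient
X = [1.0, 2.0, 3.0, 4.0]
Y = [2.0, 4.0, 6.0, 8.0]

def train(lr, n_iters, reduce="mean"):
    """Hypothetical helper: reduce='mean' matches the MSE gradient, 'sum' is N times larger."""
    w = 0.0
    for _ in range(n_iters):
        # per-sample derivative of (w*x - y)**2 with respect to w
        terms = [2 * x * (w * x - y) for x, y in zip(X, Y)]
        grad = sum(terms)
        if reduce == "mean":
            grad /= len(X)
        w -= lr * grad
    return w

f5_mean_20 = 5 * train(0.01, 20, "mean")  # same update rule as the autograd loop
f5_sum_10 = 5 * train(0.01, 10, "sum")    # 4x larger effective step
print(f"{f5_mean_20:.3f} {f5_sum_10:.3f}")
```

The mean version reproduces the 9.612 above, while the sum version gets to about 9.999 in only 10 steps, so the learning rate (or gradient scaling) explains the difference, not autograd.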