### Gradient Descent with Autograd and Backprop

Here is a linear regression model built completely from scratch with NumPy, including a hand-written gradient function.

In [1]:
import numpy as np

# Assume a simple linear regression model
# The equation for that can be modeled as
# y = Wx + b = 2x + 3

# Inputs
X = np.array([1, 2, 3, 4], dtype=np.float32)

# Outputs
Y = np.array([5, 7, 9, 11], dtype=np.float32)

# Weights and biases
W = 0 # Assume that we start with 0 weight initially and there is only one input dimension
b = 0 # Assume that we start with 0 bias initially

# Predict
def forward(W, X, b):
    return W*X+b

# Loss
# Assume we are going with L = MSE
def loss(Y, Yhat):
    return ((Yhat - Y)**2).mean()

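# Gradient of the MSE loss with respect to W and b
# Note: np.dot already sums over the samples and returns a scalar, so the
# trailing .mean() is a no-op. dW is therefore the summed gradient, which is
# N times the true MSE gradient mean(-2*X*(Y-Yhat)). The output recorded
# below was produced with this version.
def gradient(X, Y, Yhat):
    dW = np.dot(-2*X, Y-Yhat).mean()
    db = -2*((Y-Yhat)).mean()
    return dW, db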
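
A hand-written gradient is easy to get subtly wrong, so a quick sanity check is to compare it against a finite-difference estimate. The sketch below is not part of the original notebook: the helper names `mse`, `analytic_gradient`, and `numeric_gradient` are hypothetical, and it assumes the same `X` and `Y` as above, evaluated at `W = 0`, `b = 0`.

```python
import numpy as np

# Same data as the notebook, in float64 for a cleaner numerical check
X = np.array([1, 2, 3, 4], dtype=np.float64)
Y = np.array([5, 7, 9, 11], dtype=np.float64)

def mse(W, b):
    # Mean squared error of the linear model W*X + b
    return ((W * X + b - Y) ** 2).mean()

def analytic_gradient(W, b):
    # dL/dW = mean(-2 * x * (y - yhat)), dL/db = mean(-2 * (y - yhat))
    residual = Y - (W * X + b)
    return (-2 * X * residual).mean(), (-2 * residual).mean()

def numeric_gradient(W, b, eps=1e-6):
    # Central finite differences in each parameter
    dW = (mse(W + eps, b) - mse(W - eps, b)) / (2 * eps)
    db = (mse(W, b + eps) - mse(W, b - eps)) / (2 * eps)
    return dW, db

dW_a, db_a = analytic_gradient(0.0, 0.0)
dW_n, db_n = numeric_gradient(0.0, 0.0)

# The np.dot form sums over samples instead of averaging
dW_dot = np.dot(-2 * X, Y - (0.0 * X + 0.0))

print(dW_a, db_a)   # -45.0 -16.0
print(dW_n, db_n)   # close to the analytic values
print(dW_dot)       # -180.0, i.e. 4 times dW_a for 4 samples
```

The summed form still points in the descent direction, so training proceeds, but it behaves like a learning rate that is `N` times larger for `W` than for `b`.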

In [2]:
# Pretraining prediction
print("Actual value: " + str(Y))
print("Prediction before training: " + str(forward(W, X, b)))

Actual value: [ 5.  7.  9. 11.]
Prediction before training: [0. 0. 0. 0.]


In [3]:
# Training
epochs = 100 # Too few to converge at alpha = 0.01; about 100000 iterations does converge. Worth experimenting further
alpha = 0.01

for epoch in range(epochs):
    print("Epoch " + str(epoch+1))
    Yhat = forward(W, X, b)
    print("Prediction: " + str(Yhat))
    prediction_loss = loss(Y, Yhat)
    print("Loss: " + str(prediction_loss))
    dW, db = gradient(X, Y, Yhat)
    W -= alpha * dW
    b -= alpha * db

Epoch 1
Prediction: [0. 0. 0. 0.]
Loss: 69.0
Epoch 2
Prediction: [1.9599999 3.76      5.5599995 7.3599997]
Loss: 11.205602
Epoch 3
Prediction: [ 2.7148     5.2028     7.6907997 10.1788   ]
Loss: 2.710111
Epoch 4
Prediction: [ 3.007704  5.757544  8.507384 11.257224]
Loss: 1.4554435
Epoch 5
Prediction: [ 3.1235778  5.971941   8.820305  11.668668 ]
Loss: 1.2643182
Epoch 6
Prediction: [ 3.1715946  6.055897   8.9402    11.824502 ]
Loss: 1.2294439
Epoch 7
Prediction: [ 3.1935937  6.089856   8.986118  11.8823805]
Loss: 1.2175634
Epoch 8
Prediction: [ 3.2056103  6.104649   9.003689  11.902727 ]
Loss: 1.2091044
Epoch 9
Prediction: [ 3.2137892  6.1120906  9.010391  11.908692 ]
Loss: 1.2011905
Epoch 10
Prediction: [ 3.2204862  6.116709   9.012932  11.909155 ]
Loss: 1.1934005
Epoch 11
Prediction: [ 3.2266035  6.1202397  9.013876  11.907513 ]
Loss: 1.1856713
Epoch 12
Prediction: [ 3.232487  6.123348  9.01421  11.905071]
Loss: 1.1779941
Epoch 13
Prediction: [ 3.2382696  6.126289   9.014308  11.90232

Deriving and coding the gradient by hand is tedious and error-prone, and it only gets worse as models grow. From here on we lean on library functions, starting with PyTorch's autograd, which computes the gradients for us.
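
As a minimal sketch of what autograd does (an illustration, not the notebook's own cell), the snippet below computes the MSE loss at `W = 0`, `b = 0` and calls `backward()`. The resulting gradients should match the hand-derived mean gradients, `mean(-2*X*(Y-Yhat))` and `mean(-2*(Y-Yhat))`. The variable name `mse_loss` is an assumption chosen for this example.

```python
import torch

X = torch.tensor([1, 2, 3, 4], dtype=torch.float32)
Y = torch.tensor([5, 7, 9, 11], dtype=torch.float32)

# requires_grad=True tells autograd to track operations on these tensors
W = torch.tensor(0.0, requires_grad=True)
b = torch.tensor(0.0, requires_grad=True)

# Forward pass and MSE loss
mse_loss = ((W * X + b - Y) ** 2).mean()

# Backward pass: populates W.grad and b.grad with dL/dW and dL/db
mse_loss.backward()

print(W.grad, b.grad)  # tensor(-45.) tensor(-16.)
```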

In [4]:
import torch

# Assume the same linear regression model
# y = Wx + b = 2x + 3

# Inputs
X = torch.tensor([1, 2, 3, 4], dtype=torch.float32)

# Outputs
Y = torch.tensor([5, 7, 9, 11], dtype=torch.float32)

# Weights and biases
# We need to calculate the gradient of the loss function with respect
# to the weights and biases. So we need to add the requires_grad=True
# flag to the tensors
W = torch.tensor(0, dtype=torch.float32, requires_grad=True) # Assume that we start with 0 weight initially and there is only one input dimension
b = torch.tensor(0, dtype=torch.float32, requires_grad=True) # Assume that we start with 0 bias initially

# Predict
def forward(W, X, b):
    return W*X+b

# Loss
# Assume we are going with L = MSE
def loss(Y, Yhat):
    return ((Yhat - Y)**2).mean()

# We do not need this
# This will be handled by the autograd library in PyTorch
# # Gradient - dL
# def gradient(X, Y, Yhat):
#     dW = np.dot(-2*X, Y-Yhat).mean()
#     db = -2*((Y-Yhat)).mean()
#     return dW, db
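
The training loop that follows zeroes `W.grad` and `b.grad` after every update. The reason is that `backward()` adds to any gradient already stored in `.grad` rather than overwriting it. The following small sketch, using a hypothetical scalar `w` and the toy function `3 * w`, illustrates that accumulation.

```python
import torch

w = torch.tensor(1.0, requires_grad=True)

# First backward pass: d(3w)/dw = 3
(3 * w).backward()
first = w.grad.item()

# Second backward pass without zeroing: the new gradient is added on top
(3 * w).backward()
accumulated = w.grad.item()

# After zeroing in place, the next backward pass starts from a clean slate
w.grad.zero_()
(3 * w).backward()
after_zero = w.grad.item()

print(first, accumulated, after_zero)  # 3.0 6.0 3.0
```

Without the zeroing step, each epoch would update the parameters with the sum of all gradients seen so far rather than the current one.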

In [5]:
# Training
epochs = 100 # Converges at alpha = 0.1
alpha = 0.1

for epoch in range(epochs):
    print("Epoch " + str(epoch+1))
    # Forward pass
    Yhat = forward(W, X, b)
    print("Prediction: " + str(Yhat))
    # Compute the loss
    prediction_loss = loss(Y, Yhat)
    print("Loss: " + str(prediction_loss))
    # Backward pass: autograd fills W.grad and b.grad,
    # replacing the manual gradient(X, Y, Yhat) call
    prediction_loss.backward()
    # Update weights and biases without tracking the update in the graph
    with torch.no_grad():
        W -= alpha * W.grad
        b -= alpha * b.grad
    # Zero out the gradients, since backward() accumulates into .grad
    W.grad.zero_()
    b.grad.zero_()

Epoch 1
Prediction: tensor([0., 0., 0., 0.], grad_fn=<AddBackward0>)
Loss: tensor(69., grad_fn=<MeanBackward0>)
Epoch 2
Prediction: tensor([ 6.1000, 10.6000, 15.1000, 19.6000], grad_fn=<AddBackward0>)
Loss: tensor(31.3350, grad_fn=<MeanBackward0>)
Epoch 3
Prediction: tensor([2.0800, 3.5300, 4.9800, 6.4300], grad_fn=<AddBackward0>)
Loss: tensor(14.4032, grad_fn=<MeanBackward0>)
Epoch 4
Prediction: tensor([ 4.8390,  8.2990, 11.7590, 15.2190], grad_fn=<AddBackward0>)
Loss: tensor(6.7813, grad_fn=<MeanBackward0>)
Epoch 5
Prediction: tensor([3.0537, 5.1342, 7.2147, 9.2952], grad_fn=<AddBackward0>)
Loss: tensor(3.3407, grad_fn=<MeanBackward0>)
Epoch 6
Prediction: tensor([ 4.3115,  7.2846, 10.2578, 13.2309], grad_fn=<AddBackward0>)
Loss: tensor(1.7785, grad_fn=<MeanBackward0>)
Epoch 7
Prediction: tensor([ 3.5283,  5.8726,  8.2169, 10.5612], grad_fn=<AddBackward0>)
Loss: tensor(1.0607, grad_fn=<MeanBackward0>)
Epoch 8
Prediction: tensor([ 4.1110,  6.8468,  9.5826, 12.3184], grad_fn=<AddBackwar