We start with the ground-truth function:

`f = 2 * x`

And our model (note that `nn.Linear`, used below, also learns a bias term by default):

`f = w * x`

Generally we have these steps:

1) Design model (input size, output size, forward pass) 

2) Construct the loss function and the optimizer

3) Training loop:
    - forward pass: compute predictions
    - backward pass: calculate gradients
    - update weights

In [76]:
import torch

# import neural net module
import torch.nn as nn

In [77]:
# X and Y need to conform to a specific shape to be used with the model:
# the row count is the sample count, and the column count is the feature count
X = torch.tensor([[1], [2], [3], [4]], dtype=torch.float32)
Y = torch.tensor([[2], [4], [6], [8]], dtype=torch.float32)

n_sample, n_feature = X.shape
X.shape

# We don't need an explicit w anymore; the model takes care of it.
# w = torch.tensor(0.0, requires_grad=True)

torch.Size([4, 1])
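
The shape requirement can be checked directly. A minimal sketch, assuming the data starts out as a flat 1-D tensor (the names `x_flat` and `X_2d` are illustrative and not from the notebook):

```python
import torch

# Hypothetical flat data: 4 samples stored as a 1-D tensor
x_flat = torch.tensor([1, 2, 3, 4], dtype=torch.float32)
print(x_flat.shape)  # torch.Size([4])

# Add a feature dimension so each row is one sample with one feature
X_2d = x_flat.unsqueeze(1)  # equivalent to x_flat.view(-1, 1)
print(X_2d.shape)  # torch.Size([4, 1])

n_sample, n_feature = X_2d.shape
```

After the reshape, unpacking `X_2d.shape` gives the sample count and feature count in the same way as the cell above.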

In [78]:
input_size = n_feature
output_size = Y.shape[1]  # one output value per sample (equal to n_feature here)
model = nn.Linear(input_size, output_size)
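
To see what the model now manages for us, here is a small sketch that inspects a fresh `nn.Linear(1, 1)` layer. The variable names `layer`, `x`, and `manual` are illustrative; the point is that the layer holds both a weight and a bias, so it computes `w * x + b` rather than just `w * x`:

```python
import torch
import torch.nn as nn

layer = nn.Linear(1, 1)  # same sizes as in the notebook

# nn.Linear computes x @ weight.T + bias, so it has two learnable tensors
for name, p in layer.named_parameters():
    print(name, tuple(p.shape), p.requires_grad)

# Reproduce the forward pass by hand for one sample
x = torch.tensor([[5.0]])
manual = x @ layer.weight.T + layer.bias
print(torch.allclose(layer(x), manual))
```

Both parameters are initialized randomly, which is why the prediction before training differs from run to run.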

In [79]:
# Alternatively, wrap the built-in linear layer in a custom module
class MyLinearRegression(nn.Module):
    def __init__(self, input_dim, output_dim):
        super().__init__()
        # define layers
        self.lin = nn.Linear(input_dim, output_dim)
        
    def forward(self, x):
        return self.lin(x)
    
model = MyLinearRegression(input_size, output_size)

In [80]:
# global variables, for now
learning_rate = 0.03
n_iters = 20

In [81]:
# old, manual model prediction
# def forward(x):
#     return w * x

# loss = MSE
# def loss(y, y_hat):
#     return ((y - y_hat) ** 2).mean()

loss = nn.MSELoss()  # note: this returns a callable module, used like a function

optimizer = torch.optim.SGD(model.parameters(), lr = learning_rate)
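
As a sanity check that `nn.MSELoss` matches the manual loss it replaces, the sketch below compares the two on made-up predictions (the values in `y_hat` are invented for illustration):

```python
import torch
import torch.nn as nn

y = torch.tensor([[2.0], [4.0], [6.0], [8.0]])
y_hat = torch.tensor([[1.5], [4.5], [5.0], [9.0]])  # made-up predictions

loss_fn = nn.MSELoss()  # default reduction='mean'
builtin = loss_fn(y_hat, y)  # convention: (prediction, target)
manual = ((y_hat - y) ** 2).mean()

print(builtin.item(), manual.item())  # 0.625 0.625
```

Because the squared error is symmetric, swapping the two arguments gives the same value, but passing the prediction first follows the documented argument order.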

In [82]:
# create a test tensor to check the prediction before and after training
X_test = torch.tensor([5], dtype=torch.float32)

In [83]:
# a tensor with a single value can have its value unwrapped by calling item()
f'Prediction before training: f(5) = {model(X_test).item():.3f}'

'Prediction before training: f(5) = 1.821'

In [84]:
for epoch in range(n_iters):
    # predict
    y_hat = model(X)
    
    # loss (nn.MSELoss takes the prediction first, then the target)
    l = loss(y_hat, Y)
    
    # gradients, aka the backward pass, calculates dl/dw
    l.backward()
    
    # update weights; the optimizer replaces the manual update below,
    # which had to run under no_grad to stay out of the computation graph
    #     with torch.no_grad():
    #         w -= lr * w.grad
    optimizer.step()
    
    # zero the gradients after each step; otherwise PyTorch accumulates them
    # across backward() calls
    #     w.grad.zero_()
    optimizer.zero_grad()
    
    # logging
    if epoch % 2 == 0:
        [w, bias] = model.parameters()
        print(f'epoch {epoch+1}: w = {w[0][0].item():.3f}, loss = {l:.8f}')

epoch 1: w = 1.135, loss = 20.51086235
epoch 3: w = 1.693, loss = 1.29222107
epoch 5: w = 1.834, loss = 0.10007751
epoch 7: w = 1.870, loss = 0.02546586
epoch 9: w = 1.881, loss = 0.02015778
epoch 11: w = 1.885, loss = 0.01917017
epoch 13: w = 1.888, loss = 0.01847376
epoch 15: w = 1.890, loss = 0.01781785
epoch 17: w = 1.892, loss = 0.01718624
epoch 19: w = 1.894, loss = 0.01657704
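
The reason for zeroing the gradients in every iteration can be seen in isolation. A minimal sketch, using a single scalar weight rather than the notebook's model (the names `first`, `accumulated`, and `after_zero` are illustrative):

```python
import torch

w = torch.tensor(1.0, requires_grad=True)
x = torch.tensor(3.0)

# First backward pass: d(w*x)/dw = x = 3
(w * x).backward()
first = w.grad.item()

# Second backward pass without zeroing: the new gradient is added to the old one
(w * x).backward()
accumulated = w.grad.item()

# Zero the gradient, then backpropagate again
w.grad.zero_()
(w * x).backward()
after_zero = w.grad.item()

print(first, accumulated, after_zero)  # 3.0 6.0 3.0
```

`optimizer.zero_grad()` performs the same reset for every parameter the optimizer was given.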


In [85]:
f'Prediction after training: f(5) = {model(X_test).item():.3f}'

'Prediction after training: f(5) = 9.783'

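Note: compared with the manual gradient descent notebook using NumPy, this version took 20 epochs to reach a worse result than that version reached in 10. This is not because autograd is less accurate: autograd computes exact analytic gradients, up to floating-point rounding. A more likely explanation is that the two runs do not solve the same problem. `nn.Linear` starts from randomly initialized parameters and learns a bias term in addition to `w`, whereas the manual model was `w * x` with `w` starting at 0.0 (see the commented-out code above). With the extra parameter, `w` and the bias can partly compensate for each other, so plain SGD gets close quickly (here `w` is about 1.88 by epoch 9) and then converges slowly. A different learning rate would also change the comparison. For a closer match to the manual version, the layer can be created with `bias=False`.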
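
One way to test the accuracy of autograd is to compare its gradient with the hand-derived formula. A minimal sketch, using the bias-free model `f = w * x` and an arbitrary starting weight of 0.5 chosen only for illustration:

```python
import torch

X = torch.tensor([[1.0], [2.0], [3.0], [4.0]])
Y = 2 * X

w = torch.tensor(0.5, requires_grad=True)  # arbitrary starting weight

# Autograd gradient of the MSE loss with respect to w
loss = ((w * X - Y) ** 2).mean()
loss.backward()

# Analytic gradient: d/dw mean((w*x - y)^2) = mean(2 * x * (w*x - y))
analytic = (2 * X * (w.detach() * X - Y)).mean()

print(w.grad.item(), analytic.item())
print(torch.allclose(w.grad, analytic))
```

If the two values agree, any difference in convergence has to come from the setup (initialization, bias, learning rate) rather than from the gradient computation.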