# **Gradient Descent and Linear Regression with PyTorch**

In this tutorial, we'll discuss one of the foundational algorithms in machine learning: linear regression. We'll create a model that predicts crop yields for apples and oranges (target variables) by looking at the average temperature, rainfall, and humidity (input variables or features) in a region.

In a linear regression model, each target variable is estimated to be a weighted sum of the input variables, offset by some constant known as a bias:

>`yield_apple  = w11 * temp + w12 * rainfall + w13 * humidity + b1`

>`yield_orange = w21 * temp + w22 * rainfall + w23 * humidity + b2`
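The two equations can also be written as one matrix product, which is what the model later in this notebook computes. Here is a minimal NumPy sketch. The weights and biases are made up purely for illustration and are not learned values:

```python
import numpy as np

# One region per row: (temp, rainfall, humidity)
X = np.array([[73.0, 67.0, 43.0],
              [91.0, 88.0, 64.0]])

# Hypothetical weights: row 0 -> apples (w11, w12, w13), row 1 -> oranges (w21, w22, w23)
W = np.array([[0.3, 0.5, 0.2],
              [0.4, 0.6, 0.1]])
bias = np.array([1.0, 2.0])  # hypothetical biases (b1, b2)

# Equation by equation, exactly as written above
temp, rainfall, humidity = X[:, 0], X[:, 1], X[:, 2]
yield_apple = W[0, 0] * temp + W[0, 1] * rainfall + W[0, 2] * humidity + bias[0]
yield_orange = W[1, 0] * temp + W[1, 1] * rainfall + W[1, 2] * humidity + bias[1]

# The same computation as one matrix product; the bias is broadcast over the rows
Y = X @ W.T + bias
print(Y.shape)  # (2, 2): one row per region, one column per fruit
print(np.allclose(Y[:, 0], yield_apple), np.allclose(Y[:, 1], yield_orange))
```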


* The *learning* part of linear regression is figuring out a set of weights `w11, w12, ... w23, b1 & b2` from the training data so that the model makes accurate predictions for new data. The learned weights can then be used to predict the yields of apples and oranges in a new region from that region's average temperature, rainfall, and humidity.

* We'll train our model by adjusting the weights slightly many times to make better predictions, using an optimization technique called gradient descent. Let's begin by importing NumPy and PyTorch.
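Here is what "adjusting the weights slightly many times" looks like in practice. This minimal sketch runs gradient descent on a one-parameter function, f(w) = (w - 3)^2, whose minimum is at w = 3. The function, the starting point, and the step size are arbitrary choices for illustration:

```python
def f(w):
    return (w - 3.0) ** 2

def grad_f(w):
    # Derivative of (w - 3)^2 with respect to w
    return 2.0 * (w - 3.0)

w = 0.0    # arbitrary starting guess
lr = 0.1   # learning rate (step size)
for step in range(100):
    w -= lr * grad_f(w)  # step against the gradient, i.e. downhill

print(w, f(w))  # w ends up very close to 3.0
```

Each step shrinks the distance to the minimum by a constant factor (here 1 - 2 * 0.1 = 0.8). With a much larger step size, the updates would overshoot instead.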

**Course:** <a href='https://jovian.ai/learn/deep-learning-with-pytorch-zero-to-gans'>Deep Learning with PyTorch: Zero to GANs</a>

In [1]:
import numpy as np
import torch

# **Training Data**

In [2]:
# Input (temp, rainfall, humidity)
inputs = np.array([[73, 67, 43], 
                   [91, 88, 64], 
                   [87, 134, 58], 
                   [102, 43, 37], 
                   [69, 96, 70]], dtype='float32')

In [3]:
# Targets (apples, oranges)
targets = np.array([[56, 70], 
                    [81, 101], 
                    [119, 133], 
                    [22, 37], 
                    [103, 119]], dtype='float32')

In [4]:
# Convert inputs and targets to tensors
inputs = torch.from_numpy(inputs)
targets = torch.from_numpy(targets)
print(inputs)
print(targets)

tensor([[ 73.,  67.,  43.],
        [ 91.,  88.,  64.],
        [ 87., 134.,  58.],
        [102.,  43.,  37.],
        [ 69.,  96.,  70.]])
tensor([[ 56.,  70.],
        [ 81., 101.],
        [119., 133.],
        [ 22.,  37.],
        [103., 119.]])


# **Linear regression model from scratch**

The weights and biases (w11, w12,... w23, b1 & b2) can be represented as a 2×3 matrix `w` and a vector `b` of length 2, both initialized with random values. The first row of `w` and the first element of `b` are used to predict the first target variable, i.e., the yield of apples, and the second row and second element are used for oranges.

In [5]:
# weights and biases
w  = torch.randn(2,3, requires_grad=True)
b  = torch.randn(2, requires_grad=True)
print(w)
print(b)

tensor([[ 0.3367,  0.3359,  1.1855],
        [-1.0399,  0.3834, -1.6653]], requires_grad=True)
tensor([1.4847, 0.3783], requires_grad=True)


In [6]:
# @ represents matrix multiplication in PyTorch, and the .t method returns the transpose of a tensor.
def model(input):
  return input @ w.t() + b
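It helps to track the shapes to see why `input @ w.t() + b` matches the two equations. This sketch uses random tensors with an arbitrary seed. The `_demo` names are hypothetical and are chosen so that they don't overwrite `w` and `b`:

```python
import torch

torch.manual_seed(0)
x_demo = torch.randn(5, 3)  # 5 regions x 3 features
w_demo = torch.randn(2, 3)  # 2 targets x 3 features
b_demo = torch.randn(2)     # one bias per target

out_demo = x_demo @ w_demo.t() + b_demo  # (5, 3) @ (3, 2) -> (5, 2); bias broadcast over rows
print(out_demo.shape)                    # torch.Size([5, 2])

# Entry [0, 0] is w11*temp + w12*rainfall + w13*humidity + b1 for region 0
manual = (w_demo[0] * x_demo[0]).sum() + b_demo[0]
print(torch.allclose(out_demo[0, 0], manual, atol=1e-6))  # True
```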

In [7]:
# Generate predictions
preds = model(inputs)
print(preds)

tensor([[  99.5437, -121.4546],
        [ 137.5527, -167.0931],
        [ 144.5433, -135.3035],
        [  94.1348, -150.8222],
        [ 139.9446, -151.1398]], grad_fn=<AddBackward0>)


In [8]:
# Compare with targets
print(targets)

tensor([[ 56.,  70.],
        [ 81., 101.],
        [119., 133.],
        [ 22.,  37.],
        [103., 119.]])


In [9]:
# MSE loss
def mse(t1, t2):
    diff = t1 - t2
    return torch.sum(diff * diff) / diff.numel()
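Here is a small example of `mse` that you can check by hand. Two 2×2 tensors differ by 1 in one entry and by 2 in another, so the squared differences are 1 and 4. The mean over all 4 elements is therefore 5 / 4 = 1.25. The function is repeated here so the snippet runs on its own:

```python
import torch

def mse(t1, t2):
    diff = t1 - t2
    return torch.sum(diff * diff) / diff.numel()

a = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
c = torch.tensor([[0.0, 2.0], [3.0, 6.0]])
print(mse(a, c))  # tensor(1.2500)
```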

In [10]:
# Compute loss
loss = mse(preds, targets)
print(loss)

tensor(30108.3281, grad_fn=<DivBackward0>)


# **Compute Gradients**

With PyTorch, we can automatically compute the gradient (derivative) of the loss w.r.t. the weights and biases, because they have `requires_grad` set to `True`. We'll see how this is useful in just a moment.

The gradients are stored in the **.grad** property of the respective tensors. Note that the derivative of the loss w.r.t. the weight matrix `w` is itself a matrix with the same dimensions (2×3).
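For this loss, you can also work out the gradient by hand and use it to check what `loss.backward()` computes. Write the predictions as `P = X @ W.T + b` and the loss as `L = sum((P - T)**2) / N`, where N is the number of elements of P (10 for the 5×2 predictions here). The chain rule then gives `dL/dP = 2 * (P - T) / N`. From that, `dL/dW = (dL/dP).T @ X`, which has the same 2×3 shape as W, and `dL/db` is the column sum of `dL/dP`.

This sketch compares the hand-derived gradients with autograd on random data. It uses float64 only to keep the comparison tight, and all names are illustrative:

```python
import torch

torch.manual_seed(0)
X = torch.randn(5, 3, dtype=torch.float64)
T = torch.randn(5, 2, dtype=torch.float64)
W = torch.randn(2, 3, dtype=torch.float64, requires_grad=True)
bias = torch.randn(2, dtype=torch.float64, requires_grad=True)

P = X @ W.t() + bias
L = torch.sum((P - T) ** 2) / P.numel()
L.backward()  # autograd fills W.grad and bias.grad

# Hand-derived gradients via the chain rule
with torch.no_grad():
    dP = 2.0 * (P - T) / P.numel()  # dL/dP, shape (5, 2)
    dW = dP.t() @ X                 # dL/dW, shape (2, 3), same as W
    db = dP.sum(dim=0)              # dL/db, shape (2,)

print(torch.allclose(W.grad, dW), torch.allclose(bias.grad, db))  # True True
```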

In [11]:
# Compute gradients
loss.backward()

In [12]:
# Gradients for weights
print(w)
print(w.grad)

tensor([[ 0.3367,  0.3359,  1.1855],
        [-1.0399,  0.3834, -1.6653]], requires_grad=True)
tensor([[  4090.8342,   3593.0676,   2445.6733],
        [-19902.5156, -21276.4180, -13362.2637]])


# **Adjust weights and biases to reduce the loss**

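To reduce the loss, we subtract a small multiple of each gradient from the corresponding weight or bias. The gradient points in the direction in which the loss increases fastest, so moving against it takes us downhill.

We multiply the gradients by a very small number (`1e-5`, i.e. 10^-5, in this case) so that we don't modify the weights by a very large amount. We want to take a small step downhill, not a giant leap. This number is called the *learning rate* of the algorithm. The update itself is performed in the training step further below.

We use `torch.no_grad` to tell PyTorch not to track, calculate, or modify gradients while we update the weights and biases.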
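Here is a sketch of the whole update on the same five regions. It uses a fresh set of random weights (the seed is arbitrary), so the exact numbers differ from the ones printed in this notebook. The `_demo` names are hypothetical. The key point is that the predictions must be recomputed with the updated weights before the loss is measured again:

```python
import torch

X = torch.tensor([[73., 67., 43.], [91., 88., 64.], [87., 134., 58.],
                  [102., 43., 37.], [69., 96., 70.]])
Y = torch.tensor([[56., 70.], [81., 101.], [119., 133.], [22., 37.], [103., 119.]])

def mse(t1, t2):
    diff = t1 - t2
    return torch.sum(diff * diff) / diff.numel()

torch.manual_seed(42)
w_demo = torch.randn(2, 3, requires_grad=True)
b_demo = torch.randn(2, requires_grad=True)

loss_before = mse(X @ w_demo.t() + b_demo, Y)
loss_before.backward()

lr = 1e-5
with torch.no_grad():  # the update itself must not be recorded by autograd
    w_demo -= lr * w_demo.grad  # step against the gradient (downhill)
    b_demo -= lr * b_demo.grad
    w_demo.grad.zero_()
    b_demo.grad.zero_()

# Recompute predictions with the updated weights, then measure the loss again
loss_after = mse(X @ w_demo.t() + b_demo, Y)
print(loss_before.item(), loss_after.item())
```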

In [13]:
# The weights have not been updated yet (and `preds` still holds the old predictions), so the loss is unchanged
loss = mse(preds, targets)
print(loss)

tensor(30108.3281, grad_fn=<DivBackward0>)


Before we proceed, we reset the gradients to zero by invoking the `.zero_()` method. We need to do this because PyTorch accumulates gradients: the next time we invoke `.backward()` on the loss, the new gradient values are added to the existing ones, which can lead to unexpected results.
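Here is a tiny example of this accumulation behaviour, using a single scalar (the value 3.0 is arbitrary). The gradient of x^2 is 2x = 6, and a second backward pass adds another 6 instead of replacing the first:

```python
import torch

x = torch.tensor(3.0, requires_grad=True)

(x ** 2).backward()
first = x.grad.item()   # 6.0

(x ** 2).backward()     # gradients are added to the existing .grad
second = x.grad.item()  # 12.0

x.grad.zero_()          # reset in place, as done for w and b in this notebook
print(first, second, x.grad.item())  # 6.0 12.0 0.0
```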

In [14]:
w.grad.zero_()
b.grad.zero_()
print(w.grad)
print(b.grad)

tensor([[0., 0., 0.],
        [0., 0., 0.]])
tensor([0., 0.])


# **Train the model using gradient descent**

In [15]:
# Generate predictions
preds = model(inputs)
print(preds)

tensor([[  99.5437, -121.4546],
        [ 137.5527, -167.0931],
        [ 144.5433, -135.3035],
        [  94.1348, -150.8222],
        [ 139.9446, -151.1398]], grad_fn=<AddBackward0>)


In [16]:
# Calculate the loss
loss = mse(preds, targets)
print(loss)

tensor(30108.3281, grad_fn=<DivBackward0>)


In [17]:
# Compute gradients
loss.backward()
print(w.grad)
print(b.grad)

tensor([[  4090.8342,   3593.0676,   2445.6733],
        [-19902.5156, -21276.4180, -13362.2637]])
tensor([  46.9438, -237.1627])


In [18]:
# Adjust weights & reset gradients
with torch.no_grad():
    w -= w.grad * 1e-5
    b -= b.grad * 1e-5
    w.grad.zero_()
    b.grad.zero_()

In [19]:
print(w)
print(b)

tensor([[ 0.2958,  0.2999,  1.1610],
        [-0.8409,  0.5962, -1.5317]], requires_grad=True)
tensor([1.4843, 0.3806], requires_grad=True)


In [20]:
# Calculate loss
preds = model(inputs)
loss = mse(preds, targets)
print(loss)

tensor(20429.6406, grad_fn=<DivBackward0>)


# **Train the model for multiple epochs**

In [21]:
# Train for 100 epochs
for i in range(100):
    preds = model(inputs)
    loss = mse(preds, targets)
    loss.backward()
    with torch.no_grad():
        w -= w.grad * 1e-5
        b -= b.grad * 1e-5
        w.grad.zero_()
        b.grad.zero_()
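The loop above only shows the loss at the end, so you can't see how it evolves. This sketch runs the same loop but records the loss at every epoch. It uses fresh random weights (arbitrary seed) and hypothetical `_demo` names, so the numbers will differ from the ones shown here:

```python
import torch

X = torch.tensor([[73., 67., 43.], [91., 88., 64.], [87., 134., 58.],
                  [102., 43., 37.], [69., 96., 70.]])
Y = torch.tensor([[56., 70.], [81., 101.], [119., 133.], [22., 37.], [103., 119.]])

torch.manual_seed(0)
w_demo = torch.randn(2, 3, requires_grad=True)
b_demo = torch.randn(2, requires_grad=True)

history = []
for epoch in range(100):
    preds_demo = X @ w_demo.t() + b_demo
    loss_demo = torch.mean((preds_demo - Y) ** 2)  # same value as the mse() helper
    history.append(loss_demo.item())
    loss_demo.backward()
    with torch.no_grad():
        w_demo -= w_demo.grad * 1e-5
        b_demo -= b_demo.grad * 1e-5
        w_demo.grad.zero_()
        b_demo.grad.zero_()

print(history[0], history[-1])  # first vs. last recorded loss
```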

In [22]:
# Calculate loss
preds = model(inputs)
loss = mse(preds, targets)
print(loss)

tensor(234.1776, grad_fn=<DivBackward0>)


In [23]:
# Predictions
preds

tensor([[ 60.6193,  71.5186],
        [ 87.7972,  87.7463],
        [100.4305, 160.4081],
        [ 40.0470,  42.2798],
        [100.8700,  93.6597]], grad_fn=<AddBackward0>)

In [24]:
# Targets
targets

tensor([[ 56.,  70.],
        [ 81., 101.],
        [119., 133.],
        [ 22.,  37.],
        [103., 119.]])

# **Linear regression using PyTorch built-ins**

In [25]:
import torch.nn as nn

In [26]:
# Input (temp, rainfall, humidity)
inputs = np.array([[73, 67, 43], 
                   [91, 88, 64], 
                   [87, 134, 58], 
                   [102, 43, 37], 
                   [69, 96, 70], 
                   [74, 66, 43], 
                   [91, 87, 65], 
                   [88, 134, 59], 
                   [101, 44, 37], 
                   [68, 96, 71], 
                   [73, 66, 44], 
                   [92, 87, 64], 
                   [87, 135, 57], 
                   [103, 43, 36], 
                   [68, 97, 70]], 
                  dtype='float32')

# Targets (apples, oranges)
targets = np.array([[56, 70], 
                    [81, 101], 
                    [119, 133], 
                    [22, 37], 
                    [103, 119],
                    [57, 69], 
                    [80, 102], 
                    [118, 132], 
                    [21, 38], 
                    [104, 118], 
                    [57, 69], 
                    [82, 100], 
                    [118, 134], 
                    [20, 38], 
                    [102, 120]], 
                   dtype='float32')

inputs = torch.from_numpy(inputs)
targets = torch.from_numpy(targets)

In [27]:
inputs

tensor([[ 73.,  67.,  43.],
        [ 91.,  88.,  64.],
        [ 87., 134.,  58.],
        [102.,  43.,  37.],
        [ 69.,  96.,  70.],
        [ 74.,  66.,  43.],
        [ 91.,  87.,  65.],
        [ 88., 134.,  59.],
        [101.,  44.,  37.],
        [ 68.,  96.,  71.],
        [ 73.,  66.,  44.],
        [ 92.,  87.,  64.],
        [ 87., 135.,  57.],
        [103.,  43.,  36.],
        [ 68.,  97.,  70.]])

# **Dataset and DataLoader**

* We'll create a `TensorDataset`, which allows access to rows from `inputs` and `targets` as tuples, and provides standard APIs for working with many different types of datasets in PyTorch.

* We'll also create a `DataLoader`, which can split the data into batches of a predefined size while training. It also provides other utilities like shuffling and random sampling of the data.
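Here is how the two classes fit together. This sketch uses 15 rows of random placeholder data (not the crop data) and the batch size of 5 used below, so each pass over the data yields 3 batches:

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

torch.manual_seed(0)
xs = torch.randn(15, 3)  # placeholder features
ys = torch.randn(15, 2)  # placeholder targets

ds = TensorDataset(xs, ys)
x0, y0 = ds[0]  # indexing returns an (input, target) tuple
print(len(ds), x0.shape, y0.shape)  # 15 torch.Size([3]) torch.Size([2])

dl = DataLoader(ds, batch_size=5, shuffle=True)  # reshuffled on every pass
batch_shapes = [(xb.shape, yb.shape) for xb, yb in dl]
print(len(dl), len(batch_shapes))  # 3 3
```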


In [28]:
from torch.utils.data import TensorDataset

In [29]:
# Define dataset
train_ds = TensorDataset(inputs, targets)
train_ds[0:3]

(tensor([[ 73.,  67.,  43.],
         [ 91.,  88.,  64.],
         [ 87., 134.,  58.]]), tensor([[ 56.,  70.],
         [ 81., 101.],
         [119., 133.]]))

In [30]:
from torch.utils.data import DataLoader


In [31]:
# Define data loader
batch_size = 5
train_dl = DataLoader(train_ds, batch_size, shuffle=True)

In [32]:
for xb, yb in train_dl:
    print(xb)
    print(yb)
    break

tensor([[ 92.,  87.,  64.],
        [101.,  44.,  37.],
        [ 68.,  96.,  71.],
        [ 91.,  88.,  64.],
        [ 74.,  66.,  43.]])
tensor([[ 82., 100.],
        [ 21.,  38.],
        [104., 118.],
        [ 81., 101.],
        [ 57.,  69.]])


# **nn.Linear**
Instead of initializing the weights and biases manually, we can define the model using the `nn.Linear` class from PyTorch, which initializes them automatically.

In [33]:
# Define model
model = nn.Linear(3, 2)
print(model.weight)
print(model.bias)

Parameter containing:
tensor([[-0.2538,  0.0064, -0.3498],
        [-0.2782,  0.0660,  0.4008]], requires_grad=True)
Parameter containing:
tensor([0.2426, 0.4197], requires_grad=True)
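`nn.Linear(3, 2)` holds a weight of shape (2, 3) and a bias of shape (2,), and it computes the same `x @ weight.t() + bias` as the hand-written model earlier. This sketch checks that on random inputs. The name `layer_demo` is only for illustration:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
layer_demo = nn.Linear(3, 2)
x_demo = torch.randn(4, 3)

manual = x_demo @ layer_demo.weight.t() + layer_demo.bias
print(layer_demo.weight.shape, layer_demo.bias.shape)  # torch.Size([2, 3]) torch.Size([2])
print(torch.allclose(layer_demo(x_demo), manual, atol=1e-6))  # True
```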


In [34]:
# Parameters
list(model.parameters())

[Parameter containing:
 tensor([[-0.2538,  0.0064, -0.3498],
         [-0.2782,  0.0660,  0.4008]], requires_grad=True),
 Parameter containing:
 tensor([0.2426, 0.4197], requires_grad=True)]

In [35]:
# Generate predictions
preds = model(inputs)
preds

tensor([[-32.8939,   1.7721],
        [-44.6723,   6.5688],
        [-41.2632,   8.3133],
        [-38.3094, -10.2841],
        [-41.1360,  15.6216],
        [-33.1541,   1.4280],
        [-45.0285,   6.9036],
        [-41.8667,   8.4360],
        [-38.0492,  -9.9399],
        [-41.2320,  16.3006],
        [-33.2501,   2.1069],
        [-44.9325,   6.2246],
        [-40.9070,   7.9785],
        [-38.2135, -10.9631],
        [-40.8758,  15.9658]], grad_fn=<AddmmBackward>)

# **Loss Function**
Instead of defining a loss function manually, we can use the built-in loss function `mse_loss` from `torch.nn.functional`.

In [36]:
# Import nn.functional
import torch.nn.functional as F

In [37]:
# Define loss function
loss_fn = F.mse_loss

In [38]:
loss = loss_fn(model(inputs), targets)
print(loss)

tensor(11564.3379, grad_fn=<MseLossBackward>)


# **Optimizer**

* Instead of manually updating the model's weights and biases using their gradients, we can use the optimizer `optim.SGD`. SGD is short for "stochastic gradient descent". The term *stochastic* indicates that the gradient is computed on randomly selected batches of samples rather than on the whole dataset at once.

* **Note:** `model.parameters()` is passed as an argument to `optim.SGD` so that the optimizer knows which tensors should be modified during the update step. We can also specify a learning rate (`lr`) that controls the amount by which the parameters are modified.
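With plain SGD (no momentum, dampening, or weight decay, as in the settings printed below), `opt.step()` amounts to `p -= lr * p.grad` for every parameter passed to the optimizer. This sketch checks that on a small `nn.Linear` layer with random data. The learning rate of 0.1 and the `_demo` names are arbitrary choices for illustration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
layer_demo = nn.Linear(3, 2)
x_demo = torch.randn(8, 3)
y_demo = torch.randn(8, 2)
lr = 0.1
opt_demo = torch.optim.SGD(layer_demo.parameters(), lr=lr)

F.mse_loss(layer_demo(x_demo), y_demo).backward()

# What a plain SGD step should produce, computed before the step
expected = [(p - lr * p.grad).detach().clone() for p in layer_demo.parameters()]

opt_demo.step()       # in-place update of every parameter given to the optimizer
opt_demo.zero_grad()  # clear gradients before the next backward pass

matches = all(torch.allclose(p, e) for p, e in zip(layer_demo.parameters(), expected))
print(matches)  # True
```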

In [39]:
# Define optimizer
opt = torch.optim.SGD(model.parameters(), lr=1e-5)

In [40]:
opt

SGD (
Parameter Group 0
    dampening: 0
    lr: 1e-05
    momentum: 0
    nesterov: False
    weight_decay: 0
)

# **Train the Model**

In [41]:
# Utility function to train the model
def fit(num_epochs, model, loss_fn, opt, train_dl):
    
    # Repeat for given number of epochs
    for epoch in range(num_epochs):
        
        # Train with batches of data
        for xb,yb in train_dl:
            
            # 1. Generate predictions
            pred = model(xb)
            
            # 2. Calculate loss
            loss = loss_fn(pred, yb)
            
            # 3. Compute gradients
            loss.backward()
            
            # 4. Update parameters using gradients
            opt.step()
            
            # 5. Reset the gradients to zero
            opt.zero_grad()
        
        # Print the progress
        if (epoch+1) % 10 == 0:
            print('Epoch [{}/{}], Loss: {:.4f}'.format(epoch+1, num_epochs, loss.item()))
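`fit` prints `loss.item()` from the *last* mini-batch of each epoch. That value depends on which 5 rows happened to land in that batch, which is likely why the log below does not decrease steadily. Averaging the batch losses over each epoch gives a smoother picture.

This sketch is not part of the course code. The helper `fit_avg` and the synthetic data are assumptions for illustration. The data is generated from a known linear rule with inputs of order 1, which makes a larger learning rate of 0.1 safe:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import TensorDataset, DataLoader

def fit_avg(num_epochs, model, loss_fn, opt, train_dl):
    """Hypothetical variant of fit() that returns the mean batch loss of each epoch."""
    history = []
    for epoch in range(num_epochs):
        total, count = 0.0, 0
        for xb, yb in train_dl:
            loss = loss_fn(model(xb), yb)
            loss.backward()
            opt.step()
            opt.zero_grad()
            total += loss.item() * len(xb)  # weight each batch by its size
            count += len(xb)
        history.append(total / count)
    return history

torch.manual_seed(0)
x_demo = torch.randn(20, 3)
true_w = torch.tensor([[1.0, -2.0, 0.5], [0.3, 0.8, -1.0]])
y_demo = x_demo @ true_w.t() + torch.tensor([0.5, -0.5])

dl_demo = DataLoader(TensorDataset(x_demo, y_demo), batch_size=5, shuffle=True)
layer_demo = nn.Linear(3, 2)
opt_demo = torch.optim.SGD(layer_demo.parameters(), lr=0.1)

history = fit_avg(50, layer_demo, F.mse_loss, opt_demo, dl_demo)
print(history[0], history[-1])
```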

In [42]:
# Train the model for 100 epochs
fit(100, model, loss_fn, opt, train_dl)

Epoch [10/100], Loss: 122.3226
Epoch [20/100], Loss: 245.3594
Epoch [30/100], Loss: 175.2832
Epoch [40/100], Loss: 66.1374
Epoch [50/100], Loss: 57.4019
Epoch [60/100], Loss: 46.5265
Epoch [70/100], Loss: 39.6761
Epoch [80/100], Loss: 54.5286
Epoch [90/100], Loss: 15.6225
Epoch [100/100], Loss: 15.1176


In [43]:
# Generate predictions
preds = model(inputs)
preds

tensor([[ 58.1562,  71.1319],
        [ 79.1154, 100.0030],
        [122.4791, 132.6697],
        [ 27.5352,  41.6490],
        [ 92.9443, 115.3914],
        [ 56.9838,  70.1359],
        [ 78.3931,  99.9706],
        [122.4921, 133.2240],
        [ 28.7076,  42.6451],
        [ 93.3944, 116.3550],
        [ 57.4339,  71.0995],
        [ 77.9430,  99.0069],
        [123.2015, 132.7021],
        [ 27.0851,  40.6853],
        [ 94.1168, 116.3874]], grad_fn=<AddmmBackward>)

In [44]:
# Compare with targets
targets

tensor([[ 56.,  70.],
        [ 81., 101.],
        [119., 133.],
        [ 22.,  37.],
        [103., 119.],
        [ 57.,  69.],
        [ 80., 102.],
        [118., 132.],
        [ 21.,  38.],
        [104., 118.],
        [ 57.,  69.],
        [ 82., 100.],
        [118., 134.],
        [ 20.,  38.],
        [102., 120.]])

In [45]:
# Predict yields for a single new region (temp, rainfall, humidity)
model(torch.tensor([[75, 63, 44.]]))

tensor([[54.1351, 68.3161]], grad_fn=<AddmmBackward>)

The predicted yield of apples is about 54.1 tons per hectare, and that of oranges is about 68.3 tons per hectare.
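When you only use the model for prediction, PyTorch doesn't need to track the computation. That tracking is why the output above carries a `grad_fn`. This sketch wraps the same kind of single prediction in `torch.no_grad()`. A freshly initialised `nn.Linear` (hypothetical `model_demo`) stands in for the trained model, so the numbers themselves are meaningless:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model_demo = nn.Linear(3, 2)  # stand-in for the trained model

new_region = torch.tensor([[75.0, 63.0, 44.0]])  # (temp, rainfall, humidity)
with torch.no_grad():
    pred = model_demo(new_region)

apples, oranges = pred[0].tolist()
print(pred.requires_grad)  # False: no computation graph was built
print(f"apples: {apples:.1f}, oranges: {oranges:.1f}")
```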

# **Thank You!**