**Introduction to Linear Regression**

The learning part of linear regression is finding a set of weights `w11, w12, ... w23` and biases `b1, b2` from the training data, so that the model makes accurate predictions on new data.
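The weight names suggest a model with one linear equation per target. The sketch below assumes the three inputs (temperature, rainfall, humidity) and two targets (apple and orange yields) introduced in the dataset cells. Here `w1k` are the weights for apples and `w2k` are the weights for oranges. The notebook does not write these equations out; this is the standard reading of its notation:

$$
\begin{aligned}
\text{yield}_\text{apple}  &= w_{11}\,\text{temp} + w_{12}\,\text{rainfall} + w_{13}\,\text{humidity} + b_1 \\
\text{yield}_\text{orange} &= w_{21}\,\text{temp} + w_{22}\,\text{rainfall} + w_{23}\,\text{humidity} + b_2
\end{aligned}
$$

In matrix form this is $\hat{Y} = X W^\top + b$, the same computation as `x @ w.t() + b` in the `model` function.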

In [1]:
#@ Loading the required libraries
import numpy as np
import torch

**Preparing the dataset**

In [2]:
# ******** Input (temp, rainfall, humidity) ********
inputs = np.array([[73, 67, 43], 
                   [91, 88, 64], 
                   [87, 134, 58], 
                   [102, 43, 37], 
                   [69, 96, 70]], dtype='float32')

In [3]:
# ******** Targets (apples, oranges) ********
targets = np.array([[56, 70], 
                    [81, 101], 
                    [119, 133], 
                    [22, 37], 
                    [103, 119]], dtype='float32')

In [4]:
#@ Converting inputs and targets to tensors
inputs = torch.from_numpy(inputs)
targets = torch.from_numpy(targets)
print(inputs)
print(targets)

tensor([[ 73.,  67.,  43.],
        [ 91.,  88.,  64.],
        [ 87., 134.,  58.],
        [102.,  43.,  37.],
        [ 69.,  96.,  70.]])
tensor([[ 56.,  70.],
        [ 81., 101.],
        [119., 133.],
        [ 22.,  37.],
        [103., 119.]])


**Linear Regression model from scratch**

The weights and biases (`w11, w12, ... w23`, `b1`, `b2`) can be represented as a weight matrix `w` of shape (2, 3) and a bias vector `b` of shape (2,), both initialized with random values.

In [5]:
#@ Weights and biases
w = torch.randn(2, 3, requires_grad=True)
b = torch.randn(2, requires_grad=True)
print(w)
print(b)

tensor([[-0.1548, -0.7328,  0.2397],
        [ 0.5012, -0.3079, -0.8642]], requires_grad=True)
tensor([0.7990, 0.2215], requires_grad=True)


`torch.randn` creates a tensor with the given shape, with elements drawn randomly from a normal distribution with mean 0 and standard deviation 1.

In [6]:
#@ Defining the model
def model(x):
    return x @ w.t() + b

- `@` represents matrix multiplication
- the `.t()` method returns the transpose of a tensor
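The shapes show why the transpose is needed. `inputs` is (5, 3) and `w` is (2, 3), so `w.t()` is (3, 2) and the product is (5, 2): one row per sample, one column per target. The bias of shape (2,) is then broadcast across all rows. The sketch below uses random stand-in tensors with the notebook's shapes (the names `x`, `xw`, `manual` are illustrative only) and checks the matrix form against an explicit element-by-element sum.

```python
import torch

# Random stand-ins with the same shapes as the notebook's inputs, w and b
x = torch.randn(5, 3)   # 5 samples x 3 features
w = torch.randn(2, 3)   # 2 outputs x 3 features
b = torch.randn(2)      # one bias per output

xw = x @ w.t()          # (5, 3) @ (3, 2) -> (5, 2)
out = xw + b            # b of shape (2,) is broadcast across all 5 rows
print(xw.shape, out.shape)  # torch.Size([5, 2]) torch.Size([5, 2])

# Element [i, j] of the result is sum_k x[i, k] * w[j, k] + b[j]
manual = torch.stack([
    torch.stack([(x[i] * w[j]).sum() + b[j] for j in range(2)])
    for i in range(5)
])
print(torch.allclose(out, manual, atol=1e-5))
```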

In [7]:
#@ Generating predictions
preds = model(inputs)
print(preds)

tensor([[-49.2899, -20.9859],
        [-62.4307, -36.5802],
        [-96.9573, -47.5642],
        [-37.6307,   6.1243],
        [-63.4490, -55.2552]], grad_fn=<AddBackward0>)


In [8]:
#@ Comparing with the targets
print(targets)

tensor([[ 56.,  70.],
        [ 81., 101.],
        [119., 133.],
        [ 22.,  37.],
        [103., 119.]])


**Loss Function**

Before we improve our model, we need a way to evaluate how well our model is performing. We can compare the model's predictions with the actual targets using the following method:

- Calculate the difference between the two matrices (preds and targets).
- Square all elements of the difference matrix to remove negative values.
- Calculate the average of the elements in the resulting matrix.

The result is a single number, known as the mean squared error (MSE).
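In symbols, let $\hat{y}_{ij}$ be the prediction and $y_{ij}$ the target for sample $i$ and output $j$. Let $N$ be the total number of elements, here $5 \times 2 = 10$. The three steps above then compute:

$$
\text{MSE} = \frac{1}{N} \sum_{i} \sum_{j} \left( \hat{y}_{ij} - y_{ij} \right)^2
$$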

In [9]:
#@ Mean Squared Error Loss
def mse(t1, t2):
    diff = t1 - t2
    return torch.sum(diff*diff) / diff.numel()

- `torch.sum` returns the sum of all elements of a tensor
- the `.numel()` method of a tensor returns the number of elements in it
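As a sanity check, the hand-written `mse` should agree with PyTorch's built-in `F.mse_loss`, whose default `reduction='mean'` also averages over every element. The notebook uses `F.mse_loss` later on. The small tensors below are made-up values chosen so the result can be checked by hand. The differences are -0.5, 0, 1 and -2, so the squares sum to 5.25, and dividing by 4 elements gives 1.3125.

```python
import torch
import torch.nn.functional as F

def mse(t1, t2):
    diff = t1 - t2
    return torch.sum(diff * diff) / diff.numel()

# Made-up predictions and targets for a hand-checkable example
preds = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
targets = torch.tensor([[1.5, 2.0], [2.0, 6.0]])

print(mse(preds, targets))                # tensor(1.3125)
print(F.mse_loss(preds, targets))         # tensor(1.3125)
print(torch.mean((preds - targets) ** 2)) # tensor(1.3125)
```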

In [10]:
#@ Let's compute the loss
loss = mse(preds, targets)
print(loss)

tensor(20068.5293, grad_fn=<DivBackward0>)


Here's how we can interpret the result: *on average, each element in the prediction differs from the actual target by roughly the square root of the loss* (here √20068.53 ≈ 141.7). The result is called the *loss* because it indicates how bad the model is at predicting the target variables: the lower the loss, the better the model.

**Compute Gradients**

In PyTorch, we can automatically compute the gradient (derivative) of the loss w.r.t. the weights and biases, because they have `requires_grad` set to `True`.

In [11]:
#@ Computing gradients
loss.backward()

The gradients are stored in the `.grad` property of the respective tensors.

*Note:* The derivative of the loss w.r.t. the weight matrix is itself a matrix with the same dimensions.
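For this particular model and loss, the gradient can also be written by hand. Since $L = \frac{1}{N}\sum (\hat{Y} - Y)^2$ with $\hat{Y} = X W^\top + b$, we have $\partial L / \partial \hat{Y} = \frac{2}{N}(\hat{Y} - Y)$. The weight gradient is then $(\partial L / \partial \hat{Y})^\top X$ and the bias gradient is its column sum. The sketch below compares this derivation with what `loss.backward()` stores in `.grad`. It uses random stand-in data with the notebook's shapes, so the names `grad_w` and `grad_b` are illustrative only.

```python
import torch

torch.manual_seed(0)
x = torch.randn(5, 3)                       # stand-in inputs
y = torch.randn(5, 2)                       # stand-in targets
w = torch.randn(2, 3, requires_grad=True)
b = torch.randn(2, requires_grad=True)

preds = x @ w.t() + b
diff = preds - y
loss = torch.sum(diff * diff) / diff.numel()
loss.backward()                             # fills w.grad and b.grad

with torch.no_grad():
    d_preds = 2 * diff / diff.numel()       # dL/dpreds, shape (5, 2)
    grad_w = d_preds.t() @ x                # (2, 5) @ (5, 3) -> (2, 3), same shape as w
    grad_b = d_preds.sum(dim=0)             # shape (2,), same shape as b

print(w.grad.shape == w.shape, b.grad.shape == b.shape)
print(torch.allclose(w.grad, grad_w, atol=1e-5), torch.allclose(b.grad, grad_b, atol=1e-5))
```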

**Adjusting weights and biases to reduce the loss**

If a gradient element is positive:

- increasing the weight element's value slightly will increase the loss
- decreasing the weight element's value slightly will decrease the loss

If a gradient element is negative:

- increasing the weight element's value slightly will decrease the loss
- decreasing the weight element's value slightly will increase the loss

For a small change, the resulting increase or decrease in the loss is approximately proportional to the gradient of the loss w.r.t. that element. This observation forms the basis of the gradient descent optimization algorithm that we'll use to improve our model (by descending along the gradient).
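The rules above can be seen on a one-parameter toy problem. The update rule is $w \leftarrow w - \eta \, \partial L / \partial w$ with learning rate $\eta$. The loss $L(w) = (w - 3)^2$ and the starting point $w = 5$ are hypothetical choices for illustration. Because the gradient $2(w - 3)$ is positive while $w > 3$, stepping against it decreases $w$, and the loss drops at every step.

```python
import torch

# Hypothetical toy loss L(w) = (w - 3)^2, minimized at w = 3
w = torch.tensor(5.0, requires_grad=True)
lr = 0.1          # learning rate
losses = []

for step in range(5):
    loss = (w - 3) ** 2
    loss.backward()                 # dL/dw = 2 * (w - 3), positive while w > 3
    losses.append(loss.item())
    print(f"step {step}: w={w.item():.4f}, loss={loss.item():.4f}, grad={w.grad.item():.4f}")
    with torch.no_grad():
        w -= lr * w.grad            # step against the gradient
    w.grad.zero_()                  # clear the accumulated gradient before the next step
```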

In [12]:
print(w)
print(w.grad)

tensor([[-0.1548, -0.7328,  0.2397],
        [ 0.5012, -0.3079, -0.8642]], requires_grad=True)
tensor([[-11418.7891, -13431.5635,  -8018.0635],
        [-10008.7568, -12090.9736,  -7306.1025]])


In [13]:
with torch.no_grad():
    w -= w.grad * 1e-5
    b -= b.grad * 1e-5

We multiply the gradients by a very small number (`10^-5` in this case) to ensure that we don't modify the weights by a very large amount. We want to take a small step downhill (opposite to the gradient), not a giant leap. This number is called the *learning rate* of the algorithm.

We use `torch.no_grad` to indicate to PyTorch that we shouldn't track, calculate, or modify gradients while updating the weights and biases.
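One practical reason for `torch.no_grad` is that autograd refuses in-place updates of a leaf tensor that requires gradients. Without the context manager, `w -= ...` raises a `RuntimeError`. The sketch below shows both cases. It fills `w.grad` by hand with ones, as a stand-in for the gradients a real `loss.backward()` would produce.

```python
import torch

w = torch.randn(2, 3, requires_grad=True)
w.grad = torch.ones(2, 3)   # stand-in for gradients from an earlier loss.backward()

failed = False
try:
    w -= w.grad * 1e-5      # outside no_grad: in-place update of a leaf that requires grad
except RuntimeError as err:
    failed = True
    print("RuntimeError:", err)

with torch.no_grad():
    w -= w.grad * 1e-5      # inside no_grad: allowed, and not recorded in the graph
print(w.requires_grad, w.grad_fn)  # True None
```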

In [14]:
#@ Let's verify that the loss is actually lower
preds = model(inputs)  # recompute predictions using the updated weights
loss = mse(preds, targets)
print(loss)

tensor(13922.8984, grad_fn=<DivBackward0>)


Before we proceed, we reset the gradients to zero by invoking the `.zero_()` method. We need to do this because PyTorch accumulates gradients: the next time we invoke `.backward()` on the loss, the new gradient values are added to the existing ones, which may lead to unexpected results.

In [15]:
w.grad.zero_()
b.grad.zero_()
print(w.grad)
print(b.grad)

tensor([[0., 0., 0.],
        [0., 0., 0.]])
tensor([0., 0.])
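The accumulation behaviour is easy to see on a tiny made-up example. For $L = \sum_k w_k x_k$ the gradient w.r.t. `w` is just `x`. Calling `.backward()` twice without zeroing doubles the stored gradient. After `.zero_()`, a fresh backward pass gives the correct value again.

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)
x = torch.tensor([3.0, 4.0])

(w * x).sum().backward()
first = w.grad.clone()
print(w.grad)             # tensor([3., 4.])

(w * x).sum().backward()  # a second backward pass adds to the stored gradient
second = w.grad.clone()
print(w.grad)             # tensor([6., 8.])

w.grad.zero_()            # reset, as done with w.grad.zero_() in the notebook
(w * x).sum().backward()
third = w.grad.clone()
print(w.grad)             # tensor([3., 4.])
```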


**Train the model using gradient descent**

As seen above, we reduce the loss and improve our model using the gradient descent optimization algorithm. Thus, we can _train_ the model using the following steps:

1. Generate predictions

2. Calculate the loss

3. Compute gradients w.r.t the weights and biases

4. Adjust the weights by subtracting a small quantity proportional to the gradient

5. Reset the gradients to zero

In [16]:
#@ Generate the predictions
preds = model(inputs)
print(preds)

tensor([[-28.5059,  -2.4357],
        [-35.0869, -12.1550],
        [-64.3728, -18.4159],
        [-17.2399,  24.2368],
        [-37.0617, -31.6263]], grad_fn=<AddBackward0>)


In [17]:
#@ Calculating the loss
loss = mse(preds, targets)
print(loss)

tensor(13922.8984, grad_fn=<DivBackward0>)


In [18]:
#@ Compute gradients
loss.backward()
print(w.grad)
print(b.grad)

tensor([[ -9270.5996, -11116.5469,  -6591.0264],
        [ -8090.6318, -10021.9014,  -6030.9717]])
tensor([-112.6534, -100.0792])


In [19]:
#@ Adjusting weights & reset gradients
with torch.no_grad():
    w -= w.grad * 1e-5
    b -= b.grad * 1e-5
    w.grad.zero_()
    b.grad.zero_()
    
print(w)
print(b)

tensor([[ 0.0521, -0.4873,  0.3858],
        [ 0.6822, -0.0868, -0.7309]], requires_grad=True)
tensor([0.8015, 0.2237], requires_grad=True)


In [20]:
#@ Training for 100 epochs
for i in range(100):
    preds = model(inputs)
    loss = mse(preds, targets)
    loss.backward()
    with torch.no_grad():
        w -= w.grad * 1e-5
        b -= b.grad * 1e-5
        w.grad.zero_()
        b.grad.zero_()

In [21]:
#@ Calculating loss
preds = model(inputs)
loss = mse(preds, targets)
print(loss)

tensor(397.9607, grad_fn=<DivBackward0>)


In [22]:
#@ Printing predictions and comparing with targets
print(preds)
print(targets)

tensor([[ 62.3964,  77.4039],
        [ 86.8037,  96.6783],
        [ 99.8310, 130.6319],
        [ 50.8120,  77.1475],
        [ 92.7262,  88.8180]], grad_fn=<AddBackward0>)
tensor([[ 56.,  70.],
        [ 81., 101.],
        [119., 133.],
        [ 22.,  37.],
        [103., 119.]])


## Linear Regression using PyTorch built-ins

In [23]:
#@ Importing the neural network module from PyTorch
import torch.nn as nn

In [24]:
# ********** Input (temp, rainfall, humidity) **********
inputs = np.array([[73, 67, 43], 
                   [91, 88, 64], 
                   [87, 134, 58], 
                   [102, 43, 37], 
                   [69, 96, 70], 
                   [74, 66, 43], 
                   [91, 87, 65], 
                   [88, 134, 59], 
                   [101, 44, 37], 
                   [68, 96, 71], 
                   [73, 66, 44], 
                   [92, 87, 64], 
                   [87, 135, 57], 
                   [103, 43, 36], 
                   [68, 97, 70]], 
                  dtype='float32')

# ********** Targets (apples, oranges) **********
targets = np.array([[56, 70], 
                    [81, 101], 
                    [119, 133], 
                    [22, 37], 
                    [103, 119],
                    [57, 69], 
                    [80, 102], 
                    [118, 132], 
                    [21, 38], 
                    [104, 118], 
                    [57, 69], 
                    [82, 100], 
                    [118, 134], 
                    [20, 38], 
                    [102, 120]], 
                   dtype='float32')

inputs = torch.from_numpy(inputs)
targets = torch.from_numpy(targets)

In [25]:
from torch.utils.data import TensorDataset

In [26]:
#@ Defining dataset
train_data = TensorDataset(inputs, targets)
train_data[0:3]

(tensor([[ 73.,  67.,  43.],
         [ 91.,  88.,  64.],
         [ 87., 134.,  58.]]),
 tensor([[ 56.,  70.],
         [ 81., 101.],
         [119., 133.]]))

In [27]:
#@ Defining the data loader
from torch.utils.data import DataLoader
batch_size = 5
train_dl = DataLoader(train_data, batch_size, shuffle=True)

In [28]:
for xb, yb in train_dl:
    print(xb)
    print(yb)
    break

tensor([[ 68.,  96.,  71.],
        [ 74.,  66.,  43.],
        [ 68.,  97.,  70.],
        [101.,  44.,  37.],
        [ 73.,  66.,  44.]])
tensor([[104., 118.],
        [ 57.,  69.],
        [102., 120.],
        [ 21.,  38.],
        [ 57.,  69.]])
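`TensorDataset` pairs the i-th input row with the i-th target row. `DataLoader` then splits the dataset into batches, reshuffling it each epoch when `shuffle=True`. With 15 samples and `batch_size = 5`, one pass over the data (one epoch) yields 3 batches, so the model's parameters are updated 3 times per epoch. The sketch below checks this with random stand-in tensors of the same shapes as the notebook's data.

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

# Random stand-ins with the same shapes as the notebook's 15-sample dataset
inputs = torch.randn(15, 3)
targets = torch.randn(15, 2)

train_data = TensorDataset(inputs, targets)
train_dl = DataLoader(train_data, batch_size=5, shuffle=True)

print(len(train_data))  # 15 samples
print(len(train_dl))    # 3 batches per epoch (15 / 5)

seen = 0
for xb, yb in train_dl:
    print(xb.shape, yb.shape)  # torch.Size([5, 3]) torch.Size([5, 2])
    seen += xb.shape[0]
print(seen)  # 15 samples in total over one epoch
```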


**`nn.Linear`**

Instead of initializing the weights and biases manually, we can define the model using the `nn.Linear` class from PyTorch, which does this automatically.

In [29]:
#@ Defining the model
model = nn.Linear(3, 2)
print(model.weight)
print(model.bias)

Parameter containing:
tensor([[ 0.5669,  0.0396,  0.2929],
        [ 0.3753,  0.2228, -0.3838]], requires_grad=True)
Parameter containing:
tensor([0.3812, 0.0802], requires_grad=True)
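`nn.Linear(3, 2)` stores a weight of shape (2, 3) and a bias of shape (2,), the same layout as the hand-made `w` and `b`. Its forward pass computes the same formula as the from-scratch `model`. The sketch below checks this on a random batch. The names `x` and `manual` are illustrative only.

```python
import torch
import torch.nn as nn

model = nn.Linear(3, 2)          # weight: (2, 3), bias: (2,)
x = torch.randn(4, 3)            # random stand-in batch of 4 samples

manual = x @ model.weight.t() + model.bias   # same formula as the from-scratch model
print(model.weight.shape, model.bias.shape)  # torch.Size([2, 3]) torch.Size([2])
print(torch.allclose(model(x), manual, atol=1e-5))
```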


In [30]:
#@ Accessing parameters
list(model.parameters())

[Parameter containing:
 tensor([[ 0.5669,  0.0396,  0.2929],
         [ 0.3753,  0.2228, -0.3838]], requires_grad=True),
 Parameter containing:
 tensor([0.3812, 0.0802], requires_grad=True)]

In [31]:
#@ Generating the predictions
preds = model(inputs)
preds

tensor([[57.0069, 25.9033],
        [74.1916, 29.2782],
        [71.9875, 40.3291],
        [70.7391, 33.7430],
        [63.7942, 20.5007],
        [57.5342, 26.0558],
        [74.4449, 28.6715],
        [72.8472, 40.3206],
        [70.2118, 33.5905],
        [63.5202, 19.7416],
        [57.2602, 25.2967],
        [74.7189, 29.4307],
        [71.7342, 40.9357],
        [71.0131, 34.5021],
        [63.2669, 20.3482]], grad_fn=<AddmmBackward0>)

In [32]:
#@ Defining the loss function & computing the loss for the current predictions
import torch.nn.functional as F
loss_fn = F.mse_loss
loss = loss_fn(model(inputs), targets)
print(loss)

tensor(3159.3359, grad_fn=<MseLossBackward0>)


In [33]:
#@ Defining the optimizer
opt = torch.optim.SGD(model.parameters(), lr=1e-5)
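With its default settings (no momentum, no weight decay), `torch.optim.SGD` applies the same update used in the from-scratch section: `p -= lr * p.grad` for every parameter `p`. The sketch below compares one optimizer step with one manual step from identical starting weights. The random data and the learning rate `1e-2` are arbitrary choices for the check. The tolerance allows for tiny floating-point differences in how the two updates are computed.

```python
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(6, 3)              # random stand-in inputs
y = torch.randn(6, 2)              # random stand-in targets

model_a = nn.Linear(3, 2)
model_b = copy.deepcopy(model_a)   # identical starting weights
lr = 1e-2

# (a) built-in optimizer
opt = torch.optim.SGD(model_a.parameters(), lr=lr)
F.mse_loss(model_a(x), y).backward()
opt.step()
opt.zero_grad()

# (b) manual update, as in the from-scratch section
F.mse_loss(model_b(x), y).backward()
with torch.no_grad():
    for p in model_b.parameters():
        p -= p.grad * lr
        p.grad.zero_()

for pa, pb in zip(model_a.parameters(), model_b.parameters()):
    print(torch.allclose(pa, pb, atol=1e-6))
```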

## Train the model

We are now ready to train the model. We'll follow the same process to implement gradient descent:

1. Generate predictions

2. Calculate the loss

3. Compute gradients w.r.t the weights and biases

4. Adjust the weights by subtracting a small quantity proportional to the gradient

5. Reset the gradients to zero

The only change is that we'll work with batches of data instead of processing the entire training data in every iteration. Let's define a utility function `fit` that trains the model for a given number of epochs.

In [34]:
#@ Function to train the model
def fit(num_epochs, model, loss_fn, opt, train_dl):
    for epoch in range(num_epochs):   # repeat for the given number of epochs
        for xb, yb in train_dl:       # train with batches of data
            pred = model(xb)          # generate predictions
            loss = loss_fn(pred, yb)  # calculate the loss
            loss.backward()           # compute gradients
            opt.step()                # update parameters using gradients
            opt.zero_grad()           # reset the gradients to zero
        if (epoch+1) % 10 == 0:       # every 10 epochs, print the loss of the last batch
            print('Epoch [{}/{}], Loss: {:.4f}'.format(epoch+1, num_epochs, loss.item()))

In [35]:
#@ Training the model for 100 epochs
fit(100, model, loss_fn, opt, train_dl)

Epoch [10/100], Loss: 706.8222
Epoch [20/100], Loss: 480.0147
Epoch [30/100], Loss: 197.4759
Epoch [40/100], Loss: 366.6123
Epoch [50/100], Loss: 229.2495
Epoch [60/100], Loss: 68.3748
Epoch [70/100], Loss: 98.5968
Epoch [80/100], Loss: 75.6309
Epoch [90/100], Loss: 66.1901
Epoch [100/100], Loss: 58.9686


In [36]:
#@ Generating predictions
preds = model(inputs)
preds

tensor([[ 58.6014,  71.8464],
        [ 81.6341,  95.4797],
        [116.2578, 140.2192],
        [ 30.0436,  46.8903],
        [ 95.8716, 104.5114],
        [ 57.5807,  70.8450],
        [ 81.3277,  94.6336],
        [116.5254, 140.3525],
        [ 31.0642,  47.8918],
        [ 96.5859, 104.6667],
        [ 58.2950,  71.0003],
        [ 80.6134,  94.4783],
        [116.5642, 141.0653],
        [ 29.3293,  46.7350],
        [ 96.8923, 105.5128]], grad_fn=<AddmmBackward0>)

In [37]:
#@ Comparing with targets
targets

tensor([[ 56.,  70.],
        [ 81., 101.],
        [119., 133.],
        [ 22.,  37.],
        [103., 119.],
        [ 57.,  69.],
        [ 80., 102.],
        [118., 132.],
        [ 21.,  38.],
        [104., 118.],
        [ 57.,  69.],
        [ 82., 100.],
        [118., 134.],
        [ 20.,  38.],
        [102., 120.]])

In [38]:
model(torch.tensor([[75, 63, 44.]]))

tensor([[55.4564, 68.0069]], grad_fn=<AddmmBackward0>)

The predicted yield of apples is about 55.5 tons per hectare, and that of oranges is about 68.0 tons per hectare.

**The End**