In [32]:
#Imports

import numpy as np
import torch

We predict the yield of apples and oranges (in tons) from temperature, rainfall and humidity.

# Training Data

In [33]:
#Input (Temperature, Rainfall, Humidity)
inputs = np.array([[73, 67, 43],
                   [91, 88, 64],
                   [87, 134, 58],
                   [102, 43, 37],
                   [69, 96, 70]],
                  dtype = 'float32')

In [34]:
#Targets (Apples, Oranges)
targets = np.array([[56, 70],
                    [81, 101],
                    [119, 133],
                    [22, 37],
                    [103, 119]],
                    dtype = 'float32')

In [35]:
#Convert inputs and targets to tensors
inputs = torch.from_numpy(inputs)
targets = torch.from_numpy(targets)

#We do not set `requires_grad = True` for `inputs` and `targets` because they are fixed data, not parameters to be learned.

print(inputs)
print(targets)

tensor([[ 73.,  67.,  43.],
        [ 91.,  88.,  64.],
        [ 87., 134.,  58.],
        [102.,  43.,  37.],
        [ 69.,  96.,  70.]])
tensor([[ 56.,  70.],
        [ 81., 101.],
        [119., 133.],
        [ 22.,  37.],
        [103., 119.]])
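
One detail worth knowing about `torch.from_numpy` is that the resulting tensor shares memory with the source array instead of copying it. A minimal sketch of that behaviour follows; the names `arr`, `t` and `t_copy` are made up for illustration and are not part of the notebook.

```python
import numpy as np
import torch

arr = np.array([[1., 2.], [3., 4.]], dtype='float32')
t = torch.from_numpy(arr)    # no copy: `t` uses the same buffer as `arr`

arr[0, 0] = 10.              # modify the NumPy array in place
print(t[0, 0].item())        # the tensor sees the change

t_copy = torch.tensor(arr)   # torch.tensor copies the data instead
arr[0, 0] = -1.
print(t_copy[0, 0].item())   # the copy keeps the old value
```

This does not matter for the notebook, since the arrays are never modified after conversion, but it can be surprising when the NumPy array is reused elsewhere.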


# Linear Regression Model from Scratch

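- In a linear regression model, each target variable is estimated as a weighted sum of the input variables, offset by a constant called the bias:  
  `yield_apples  = w11 * temp + w12 * rainfall + w13 * humidity + b1`  
  `yield_oranges = w21 * temp + w22 * rainfall + w23 * humidity + b2`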

- The weights and biases (`w11, w12, ..., w23, b1, b2`) can also be represented as matrices, initialized with random values. The first row of `w` and the first element of `b` are used to predict the first target variable, i.e., the yield of apples, and the second row and element are used for oranges.
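
To make the link between the per-target equations and the matrix form concrete, here is a small sketch. The weight and bias values are hypothetical, chosen only to keep the arithmetic easy to follow; they are not the values used later in the notebook.

```python
import torch

# Hypothetical fixed weights and biases (illustration only)
w = torch.tensor([[0.5, 0.2, 0.1],    # row 0: w11, w12, w13 (apples)
                  [0.3, 0.4, 0.6]])   # row 1: w21, w22, w23 (oranges)
b = torch.tensor([1.0, 2.0])          # b1, b2

temp, rainfall, humidity = 73.0, 67.0, 43.0
x = torch.tensor([temp, rainfall, humidity])

# One equation per target, written out term by term
yield_apples = w[0, 0] * temp + w[0, 1] * rainfall + w[0, 2] * humidity + b[0]
yield_oranges = w[1, 0] * temp + w[1, 1] * rainfall + w[1, 2] * humidity + b[1]

# The same computation as a single matrix-vector product
y = w @ x + b

print(yield_apples.item(), yield_oranges.item())
print(y)
```

Both forms give the same two numbers; the matrix form simply computes every target in one operation.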



In [36]:
#Weights and Biases
w = torch.randn(2, 3, requires_grad = True)
b = torch.randn(2, requires_grad = True)

print(w)
print(b)

tensor([[ 2.0590,  0.1758,  0.5604],
        [ 0.2936, -0.3160,  0.2585]], requires_grad=True)
tensor([ 0.3125, -1.0325], requires_grad=True)


`torch.randn` creates a tensor with the given shape, with elements picked randomly from a normal distribution with mean 0 and standard deviation 1.
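
Because `torch.randn` draws new values on every call, the weights printed above will differ from run to run. A sketch of how a fixed seed makes the draws repeatable, assuming reproducibility is wanted (the notebook itself does not set a seed):

```python
import torch

torch.manual_seed(0)                 # fix the random number generator state
first = torch.randn(2, 3)

torch.manual_seed(0)                 # same seed again
second = torch.randn(2, 3)

print(torch.equal(first, second))    # the two draws are identical

big = torch.randn(100_000)           # a large sample to check the distribution
print(big.mean().item(), big.std().item())   # close to 0 and 1 respectively
```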

Our model is simply a function that performs a matrix multiplication of the `inputs` and the weights `w` (transposed) and adds the bias `b` (replicated for each observation).

In [37]:
#Model
def model(x):
  return x @ w.t() + b

`@` represents matrix multiplication in PyTorch, and the `.t` method returns the transpose of a tensor.
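
The shapes involved can be checked with a toy example. This sketch uses constant tensors (an assumption made so the result is easy to verify by hand) to show how the bias is broadcast across all observations:

```python
import torch

x = torch.ones(5, 3)                 # 5 observations, 3 features
w = torch.full((2, 3), 2.0)          # 2 outputs, 3 features
b = torch.tensor([10.0, 20.0])       # one bias per output

# (5, 3) @ (3, 2) -> (5, 2); `b` of shape (2,) is broadcast over the 5 rows
out = x @ w.t() + b
print(out.shape)
print(out[0])                        # every row equals [2*3 + 10, 2*3 + 20]
```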

In [38]:
#Generate Predictions
preds = model(inputs)
print(preds)

tensor([[186.4973,  10.3438],
        [239.0203,  14.4214],
        [235.5074,  -2.8398],
        [238.6277,  24.8903],
        [198.4906,   6.9858]], grad_fn=<AddBackward0>)


The model does not perform well as it has been initialized with random weights and biases.

# Loss Function

To evaluate how well our model is performing, we can compare the model's predictions with the actual targets using the following method:

- Calculate the difference between the two matrices (`preds` and `targets`).
- Square all the elements of the difference matrix to remove negative values.
- Calculate the average of the resulting matrix.

In [39]:
#MSE Loss
def mse(t1, t2):
  diff = t1 - t2
  return torch.sum(diff * diff) / diff.numel()

`torch.sum` returns the sum of all elements in a tensor, and the `.numel` method returns the number of elements in a tensor.
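
A worked example on tiny tensors shows the three steps. The values are made up so that the result can be checked by hand:

```python
import torch

def mse(t1, t2):
    diff = t1 - t2
    return torch.sum(diff * diff) / diff.numel()

small_preds = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
small_targets = torch.tensor([[0.0, 2.0], [3.0, 7.0]])

# Differences: [[1, 0], [0, -3]] -> squares: [[1, 0], [0, 9]] -> mean: 10 / 4
print(mse(small_preds, small_targets).item())   # 2.5
```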

In [40]:
loss = mse(preds, targets)
print(loss)

tensor(15382.0967, grad_fn=<DivBackward0>)


# Compute Gradients

With PyTorch we can automatically compute the gradient or derivative of the loss w.r.t. the weights and biases, because they have `requires_grad` set to `True`.

In [41]:
#Compute Gradients
loss.backward()

The gradients are stored in the `.grad` property of the respective tensors. Note that the derivative of the loss w.r.t. the weights matrix is itself a matrix, with the same dimensions.

In [42]:
#Gradients for Weights
print(w)
print(w.grad)

tensor([[ 2.0590,  0.1758,  0.5604],
        [ 0.2936, -0.3160,  0.2585]], requires_grad=True)
tensor([[12545.4365, 11348.6387,  7436.3369],
        [-6603.1597, -8218.5020, -4854.8027]])
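
For this particular loss the gradients also have a closed form: with `diff = preds - targets` and `n` the number of elements in `diff`, the gradient w.r.t. `w` is `(2 / n) * diff.t() @ inputs` and the gradient w.r.t. `b` is `(2 / n) * diff.sum(dim=0)`. The following sketch compares those formulas with the autograd result on a three-row subset of the data; the seed and the names `w_grad_manual` and `b_grad_manual` are assumptions made for illustration.

```python
import torch

torch.manual_seed(0)
inputs = torch.tensor([[73., 67., 43.],
                       [91., 88., 64.],
                       [87., 134., 58.]])
targets = torch.tensor([[56., 70.],
                        [81., 101.],
                        [119., 133.]])

w = torch.randn(2, 3, requires_grad=True)
b = torch.randn(2, requires_grad=True)

preds = inputs @ w.t() + b
diff = preds - targets
loss = torch.sum(diff * diff) / diff.numel()
loss.backward()                                       # autograd gradients

# Closed-form gradients of the mean squared error
with torch.no_grad():
    n = diff.numel()
    w_grad_manual = (2.0 / n) * diff.t() @ inputs     # shape (2, 3), same as w
    b_grad_manual = (2.0 / n) * diff.sum(dim=0)       # shape (2,), same as b

print(torch.allclose(w.grad, w_grad_manual, rtol=1e-4, atol=1e-2))
print(torch.allclose(b.grad, b_grad_manual, rtol=1e-4, atol=1e-2))
```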


- The loss is a quadratic function of our weights and biases, and our objective is to find the set of weights where the loss is the lowest. 
- A key insight from calculus is that the gradient indicates the rate of change of the loss or the slope of the loss function w.r.t. the weights and biases.  
  
If a gradient element is **positive**:
  - **Increasing** the element's value slightly will **increase** the loss.
  - **Decreasing** the element's value slightly will **decrease** the loss.

If a gradient element is **negative**:
  - **Increasing** the element's value slightly will **decrease** the loss.
  - **Decreasing** the element's value slightly will **increase** the loss.

  
For a small change in a weight element, the resulting increase or decrease in the loss is approximately proportional to the gradient of the loss w.r.t. that element. This forms the basis of the optimization algorithm that we will use to improve our model.  
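
The relationship between the gradient and the change in loss can be checked numerically. The sketch below nudges a single weight by a small amount `eps` and compares the actual change in loss with `gradient * eps`. The weights are hypothetical, and `float64` is used (an assumption, unlike the `float32` tensors elsewhere in the notebook) so that rounding error does not mask the effect.

```python
import torch

inputs = torch.tensor([[73., 67., 43.], [91., 88., 64.]], dtype=torch.float64)
targets = torch.tensor([[56., 70.], [81., 101.]], dtype=torch.float64)
w = torch.tensor([[0.5, -0.2, 0.1], [0.3, 0.4, -0.6]],
                 dtype=torch.float64, requires_grad=True)
b = torch.zeros(2, dtype=torch.float64, requires_grad=True)

def loss_fn(w, b):
    diff = inputs @ w.t() + b - targets
    return torch.sum(diff * diff) / diff.numel()

loss = loss_fn(w, b)
loss.backward()

eps = 1e-6
with torch.no_grad():
    w_nudged = w.clone()
    w_nudged[0, 0] += eps                              # increase one element slightly
    actual_change = (loss_fn(w_nudged, b) - loss).item()

predicted_change = w.grad[0, 0].item() * eps           # first-order estimate
print(w.grad[0, 0].item())                             # negative for these values
print(actual_change, predicted_change)                 # both negative and nearly equal
```

Here the gradient element is negative, so increasing the weight lowers the loss, in line with the rules above.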
  
Before we proceed, we reset the gradients to zero by calling the `.zero_()` method. PyTorch accumulates gradients: the next time we call `.backward` on the loss, the new gradient values are added to the existing values, which may lead to unexpected results.

In [43]:
w.grad.zero_()
b.grad.zero_()

print(w.grad)
print(b.grad)

tensor([[0., 0., 0.],
        [0., 0., 0.]])
tensor([0., 0.])
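
The accumulation behaviour is easy to observe on a toy problem. In this sketch (toy values, not the notebook's data) the gradient of `sum(w * x)` w.r.t. `w` is simply `x`, so a second backward pass without a reset doubles the stored gradient:

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)
x = torch.tensor([3.0, 4.0])

loss = torch.sum(w * x)      # d(loss)/dw = x
loss.backward()
first = w.grad.clone()       # tensor([3., 4.])

loss = torch.sum(w * x)      # rebuild the graph and backpropagate again
loss.backward()
second = w.grad.clone()      # gradients were added: tensor([6., 8.])

w.grad.zero_()               # reset in place before the next backward pass
print(first, second, w.grad)
```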


# Adjust Weights and Biases using Gradient Descent

We will reduce the loss and improve our model using the gradient descent optimization algorithm, which has the following steps:

1. Generate Predictions
2. Calculate the loss
3. Compute Gradients w.r.t. the weights and biases
4. Adjust the weights by subtracting a small quantity proportional to the gradient
5. Reset the gradients to zero

In [44]:
#1. Generate Predictions
preds = model(inputs)
print(preds)

tensor([[186.4973,  10.3438],
        [239.0203,  14.4214],
        [235.5074,  -2.8398],
        [238.6277,  24.8903],
        [198.4906,   6.9858]], grad_fn=<AddBackward0>)


In [45]:
#2. Calculate the Loss
loss = mse(preds, targets)
print(loss)

tensor(15382.0967, grad_fn=<DivBackward0>)


In [46]:
#3. Compute Gradients
loss.backward()
print(w.grad)
print(b.grad)

tensor([[12545.4365, 11348.6387,  7436.3369],
        [-6603.1597, -8218.5020, -4854.8027]])
tensor([143.4287, -81.2397])


In [47]:
#4-5. Adjust Weights and Reset Gradients
with torch.no_grad():
  w -= w.grad * 1e-5
  b -= b.grad * 1e-5
  w.grad.zero_()
  b.grad.zero_()

- We use `torch.no_grad()` to indicate to PyTorch that we shouldn't track, calculate or modify gradients while updating the weights and biases.
- We multiply the gradients with a really small number (10^-5 in this case), to ensure that we don't modify the weights by a really large amount, since we only want to take a small step in the downhill direction of the gradient. The number is called the *learning rate* of the algorithm.
- After we have updated the weights, we reset the gradients back to zero, to avoid affecting future computation.
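
To see why the `torch.no_grad()` block is needed, consider what happens without it. This sketch uses a toy tensor (not the notebook's weights): PyTorch refuses an in-place update of a leaf tensor that requires gradients, while the same update inside `torch.no_grad()` goes through.

```python
import torch

w = torch.ones(3, requires_grad=True)
loss = (w * w).sum()
loss.backward()              # w.grad is 2 * w = [2., 2., 2.]

try:
    w -= 0.1 * w.grad        # in-place update of a leaf tensor that requires grad
    raised = False
except RuntimeError as err:
    raised = True
    print('RuntimeError:', err)

with torch.no_grad():
    w -= 0.1 * w.grad        # allowed: autograd does not record this operation
    w.grad.zero_()

print(w)                     # updated values, still with requires_grad=True
```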

In [48]:
#Updated weights and biases
print(w)
print(b)

tensor([[ 1.9336,  0.0623,  0.4861],
        [ 0.3596, -0.2338,  0.3071]], requires_grad=True)
tensor([ 0.3111, -1.0317], requires_grad=True)


In [49]:
#Calculate Loss
preds = model(inputs)
loss = mse(preds, targets)
print(loss)

tensor(11043.9824, grad_fn=<DivBackward0>)


The loss is lower than in the previous iteration.

# Train for Multiple Epochs

To reduce the loss further, we can repeat the process of adjusting the weights and biases using the gradients multiple times. Each iteration is called an epoch. Let's train the model for 100 epochs.

In [50]:
#Train for 100 epochs
for i in range(100):
  preds = model(inputs)
  loss = mse(preds, targets)
  loss.backward()

  with torch.no_grad():
    w -= w.grad * 1e-5
    b -= b.grad * 1e-5
    w.grad.zero_()
    b.grad.zero_()

In [51]:
#Calculate Loss
preds = model(inputs)
loss = mse(preds, targets)
print(loss)

tensor(606.7211, grad_fn=<DivBackward0>)


The loss is significantly lower after 100 epochs.

In [52]:
#Predictions
preds

tensor([[ 67.0648,  75.4338],
        [ 87.0209, 102.6451],
        [ 91.8301, 120.2249],
        [ 78.1402,  67.6532],
        [ 77.0570, 104.4911]], grad_fn=<AddBackward0>)

In [53]:
#Targets
targets

tensor([[ 56.,  70.],
        [ 81., 101.],
        [119., 133.],
        [ 22.,  37.],
        [103., 119.]])
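
The from-scratch training loop can be wrapped in a function to study how the learning rate affects training. The sketch below is an illustration under stated assumptions: `train` is a hypothetical helper, and the parameters start at zero (instead of `torch.randn`) so that the result is deterministic.

```python
import torch

inputs = torch.tensor([[73., 67., 43.], [91., 88., 64.], [87., 134., 58.],
                       [102., 43., 37.], [69., 96., 70.]])
targets = torch.tensor([[56., 70.], [81., 101.], [119., 133.],
                        [22., 37.], [103., 119.]])

def train(lr, epochs):
    """Full-batch gradient descent from zero weights; returns the loss per epoch."""
    w = torch.zeros(2, 3, requires_grad=True)
    b = torch.zeros(2, requires_grad=True)
    history = []
    for _ in range(epochs):
        diff = inputs @ w.t() + b - targets
        loss = torch.sum(diff * diff) / diff.numel()
        history.append(loss.item())
        loss.backward()
        with torch.no_grad():
            w -= w.grad * lr
            b -= b.grad * lr
            w.grad.zero_()
            b.grad.zero_()
    return history

small = train(lr=1e-5, epochs=20)
large = train(lr=1e-3, epochs=5)
print(small[0], small[-1])   # the loss shrinks with the small step size
print(large[0], large[-1])   # the loss grows: each step overshoots the minimum
```

With inputs of this magnitude, a learning rate of `1e-3` is already too large and the loss diverges, which is why the notebook uses `1e-5`.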

# Linear Regression using PyTorch built-ins

The `torch.nn` package of PyTorch contains utility classes for building neural networks.

In [54]:
import torch.nn as nn

In [55]:
#Input (Temperature, Rainfall, Humidity)
inputs = np.array([[73, 67, 43], [91, 88, 64], [87, 134, 58],
                   [102, 43, 37], [69, 96, 70], [73, 67, 43],
                   [91, 88, 64], [87, 134, 58], [102, 43, 37],
                   [69, 96, 70], [73, 67, 43], [91, 88, 64],
                   [87, 134, 58], [102, 43, 37], [69, 96, 70]],
                  dtype = 'float32')

#Targets (Apples, Oranges)
targets = np.array([[56, 70], [81, 101], [119, 133],
                    [22, 37], [103, 119], [56, 70],
                    [81, 101], [119, 133], [22, 37],
                    [103, 119], [56, 70], [81, 101],
                    [119, 133], [22, 37], [103, 119]],
                   dtype = 'float32')

#Convert to Tensors
inputs = torch.from_numpy(inputs)
targets = torch.from_numpy(targets)

# Dataset and DataLoader

We will create a `TensorDataset`, which allows access to rows from `inputs` and `targets` as tuples and provides standard APIs for working with many different types of datasets in PyTorch.

In [56]:
from torch.utils.data import TensorDataset

In [57]:
#Define Dataset
train_ds = TensorDataset(inputs, targets)
train_ds[0:3]

(tensor([[ 73.,  67.,  43.],
         [ 91.,  88.,  64.],
         [ 87., 134.,  58.]]), tensor([[ 56.,  70.],
         [ 81., 101.],
         [119., 133.]]))

The `TensorDataset` allows us to access a small section of the training data using the array indexing notation (`[0:3]`). It returns a tuple in which the first element contains the input variables for the selected rows and the second contains the targets.
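
A short sketch of the indexing options, using made-up tensors (`xs`, `ys`) so that the shapes are easy to follow:

```python
import torch
from torch.utils.data import TensorDataset

xs = torch.arange(12, dtype=torch.float32).reshape(4, 3)   # 4 rows, 3 features
ys = torch.arange(8, dtype=torch.float32).reshape(4, 2)    # 4 rows, 2 targets
ds = TensorDataset(xs, ys)

print(len(ds))            # number of rows
x0, y0 = ds[0]            # a single row: shapes (3,) and (2,)
xb, yb = ds[1:3]          # a slice: shapes (2, 3) and (2, 2)
print(x0.shape, y0.shape, xb.shape, yb.shape)
```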

We will also create a `DataLoader`, which can split the data into batches of a predefined size while training. It also provides other utilities like shuffling and random sampling of the data. 

In [58]:
from torch.utils.data import DataLoader

In [59]:
#Define DataLoader 
batch_size = 5
train_dl = DataLoader(train_ds, batch_size, shuffle = True)

The data loader is typically used in a `for-in` loop. Example:

In [60]:
for xb, yb in train_dl:
  print('batch:')
  print(xb) #Inputs
  print(yb) #Targets

batch:
tensor([[ 69.,  96.,  70.],
        [ 87., 134.,  58.],
        [ 91.,  88.,  64.],
        [ 91.,  88.,  64.],
        [ 69.,  96.,  70.]])
tensor([[103., 119.],
        [119., 133.],
        [ 81., 101.],
        [ 81., 101.],
        [103., 119.]])
batch:
tensor([[102.,  43.,  37.],
        [ 73.,  67.,  43.],
        [102.,  43.,  37.],
        [ 73.,  67.,  43.],
        [ 87., 134.,  58.]])
tensor([[ 22.,  37.],
        [ 56.,  70.],
        [ 22.,  37.],
        [ 56.,  70.],
        [119., 133.]])
batch:
tensor([[ 91.,  88.,  64.],
        [102.,  43.,  37.],
        [ 69.,  96.,  70.],
        [ 87., 134.,  58.],
        [ 73.,  67.,  43.]])
tensor([[ 81., 101.],
        [ 22.,  37.],
        [103., 119.],
        [119., 133.],
        [ 56.,  70.]])


Here we have 3 batches, each of size 5.

In each iteration, the data loader returns one batch of data with the given batch size. If `shuffle` is set to `True`, it shuffles the training data before creating batches. Shuffling randomizes the input to the optimization algorithm, which can lead to a faster reduction in the loss.
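
The batching behaviour can be verified on a small dataset whose size is not a multiple of the batch size. In this sketch the tensors and the seeded `torch.Generator` are assumptions added for repeatability; the notebook's own loader does not use a seed.

```python
import math
import torch
from torch.utils.data import TensorDataset, DataLoader

xs = torch.arange(7, dtype=torch.float32).reshape(7, 1)    # 7 rows with values 0..6
ys = xs * 2
ds = TensorDataset(xs, ys)

g = torch.Generator().manual_seed(0)                       # seeded so the shuffle is repeatable
dl = DataLoader(ds, batch_size=3, shuffle=True, generator=g)

sizes = [len(xb) for xb, yb in dl]
print(sizes)                                               # the last batch is smaller: 7 = 3 + 3 + 1
print(len(dl) == math.ceil(len(ds) / 3))                   # number of batches per epoch

seen = torch.cat([xb for xb, yb in dl]).flatten()          # a second pass reshuffles the rows
print(sorted(seen.tolist()))                               # every row appears exactly once per pass
```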

# nn.Linear
Instead of initializing the weights & biases manually, we can define the model using the `nn.Linear` class from PyTorch, which does it automatically.

In [61]:
#Define Model
model = nn.Linear(3,2) #3 Inputs (Temperature, Rainfall, Humidity), 2 Outputs (Apples, Oranges)
print(model.weight)
print(model.bias)

Parameter containing:
tensor([[ 0.4522,  0.3313, -0.5235],
        [ 0.1670,  0.2491, -0.4215]], requires_grad=True)
Parameter containing:
tensor([0.0239, 0.3159], requires_grad=True)


PyTorch models also have a helpful `.parameters` method, which returns an iterator over all the weight and bias tensors present in the model (wrapped in `list` below to display them). For our linear regression model, we have one weight matrix and one bias vector.

In [62]:
#Parameters
list(model.parameters())

[Parameter containing:
 tensor([[ 0.4522,  0.3313, -0.5235],
         [ 0.1670,  0.2491, -0.4215]], requires_grad=True),
 Parameter containing:
 tensor([0.0239, 0.3159], requires_grad=True)]
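
To confirm that `nn.Linear` computes the same formula as the from-scratch model, the layer's output can be compared with `x @ weight.t() + bias`. This is a sketch with an arbitrary seed and a separate layer named `layer`, so it does not touch the notebook's `model`:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
layer = nn.Linear(3, 2)
x = torch.tensor([[73., 67., 43.],
                  [91., 88., 64.]])

manual = x @ layer.weight.t() + layer.bias       # same formula as the from-scratch model
print(torch.allclose(layer(x), manual, atol=1e-4))

names = [name for name, p in layer.named_parameters()]
n_params = sum(p.numel() for p in layer.parameters())
print(names, n_params)                           # 2 * 3 weights + 2 biases
```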

We can use the model to generate predictions in the same way as before.

In [63]:
#Generate Predictions
preds = model(inputs)
preds

tensor([[32.7248, 11.0714],
        [36.8293, 10.4569],
        [53.4023, 23.7786],
        [41.0285, 12.4627],
        [26.3901,  6.2480],
        [32.7248, 11.0714],
        [36.8293, 10.4569],
        [53.4023, 23.7786],
        [41.0285, 12.4627],
        [26.3901,  6.2480],
        [32.7248, 11.0714],
        [36.8293, 10.4569],
        [53.4023, 23.7786],
        [41.0285, 12.4627],
        [26.3901,  6.2480]], grad_fn=<AddmmBackward>)

# Loss Function
Instead of defining a loss function manually, we can use the built-in loss function `mse_loss`.

In [64]:
#Import nn.functional
import torch.nn.functional as F

The nn.functional package contains many useful loss functions and several other utilities.

In [65]:
#Define Loss Function
loss_fn = F.mse_loss

Let's compute the loss for the current predictions of our model.

In [66]:
loss = loss_fn(model(inputs), targets)
loss

tensor(4994.2041, grad_fn=<MseLossBackward>)
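
As a sanity check, `F.mse_loss` with its default settings should agree with the hand-written `mse` function from earlier. The sketch below uses small made-up tensors and also shows the `reduction='sum'` option, which returns the sum of squared differences instead of the mean:

```python
import torch
import torch.nn.functional as F

p = torch.tensor([[2.0, 0.0], [1.0, 5.0]])
t = torch.tensor([[0.0, 0.0], [1.0, 1.0]])

diff = p - t                                         # [[2, 0], [0, 4]]
manual = torch.sum(diff * diff) / diff.numel()       # (4 + 16) / 4

print(manual.item())                                 # 5.0
print(F.mse_loss(p, t).item())                       # 5.0 (mean over all elements)
print(F.mse_loss(p, t, reduction='sum').item())      # 20.0
```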

# Optimizer
Instead of manually manipulating the model's weights & biases using gradients, we can use the optimizer `optim.SGD`. SGD is short for "stochastic gradient descent". The term stochastic indicates that samples are selected in random batches instead of as a single group.

In [67]:
#Define Optimizer
opt = torch.optim.SGD(model.parameters(), lr = 1e-5)

Note that model.parameters() is passed as an argument to optim.SGD so that the optimizer knows which matrices should be modified during the update step. Also, we can specify a learning rate that controls the amount by which the parameters are modified.
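
With no momentum or weight decay configured, one `opt.step()` should perform the same update as the manual rule `parameter -= gradient * learning_rate` used in the from-scratch section. The following sketch checks this on a single training row; the seed and the names `sgd_model`, `sgd_opt`, `expected_w` and `expected_b` are assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
sgd_model = nn.Linear(3, 2)
sgd_opt = torch.optim.SGD(sgd_model.parameters(), lr=1e-5)

x = torch.tensor([[73., 67., 43.]])
y = torch.tensor([[56., 70.]])

loss = F.mse_loss(sgd_model(x), y)
loss.backward()

# What plain SGD (no momentum, no weight decay) is expected to produce
expected_w = sgd_model.weight.detach() - 1e-5 * sgd_model.weight.grad
expected_b = sgd_model.bias.detach() - 1e-5 * sgd_model.bias.grad

sgd_opt.step()          # updates the parameters in place
sgd_opt.zero_grad()     # clears the gradients for the next iteration

print(torch.allclose(sgd_model.weight, expected_w, atol=1e-6))
print(torch.allclose(sgd_model.bias, expected_b, atol=1e-6))
```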

# Train the model
We are now ready to train the model. We'll follow the same process to implement gradient descent:

1. Generate predictions
2. Calculate the loss
3. Compute gradients w.r.t the weights and biases
4. Adjust the weights by subtracting a small quantity proportional to the gradient
5. Reset the gradients to zero

The only change is that we'll work with batches of data instead of processing the entire training data in every iteration. Let's define a utility function `fit` that trains the model for a given number of epochs.

In [68]:
#Utility Function to Train the Model
def fit(num_epochs, model, loss_fn, opt, train_dl):

  #Repeat for given no. of epochs
  for epoch in range(num_epochs):

    #Train with batches of data
    for xb, yb in train_dl:

      #1. Generate Predictions
      pred = model(xb)

      #2. Calculate the loss
      loss = loss_fn(pred, yb)

      #3. Compute gradients
      loss.backward()

      #4. Update parameters using gradients
      opt.step()

      #5. Reset gradients to zero
      opt.zero_grad()

    #Print the Progress
    if (epoch + 1) % 10 == 0:
      print('Epoch [{}/{}, Loss: {:.4f}'.format(epoch+1, num_epochs, loss.item()))

Some things to note above:

- We use the data loader defined earlier to get batches of data for every iteration.

- Instead of updating parameters (weights and biases) manually, we use `opt.step` to perform the update and `opt.zero_grad` to reset the gradients to zero.

- We've also added a log statement that prints the loss from the last batch of data for every 10th epoch to track training progress. `loss.item` returns the actual value stored in the loss tensor.
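
Because the logged value comes from a single shuffled batch, it can fluctuate from one log line to the next even while the model improves. One possible alternative, sketched here as a hypothetical helper `fit_with_history` (not part of the notebook), is to record the mean batch loss of each epoch. The data setup mirrors the notebook's 15 rows; the seed and the `demo_` names are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import TensorDataset, DataLoader

def fit_with_history(num_epochs, model, loss_fn, opt, train_dl):
    """Hypothetical variant of `fit` that returns the mean batch loss of every epoch."""
    history = []
    for epoch in range(num_epochs):
        total, count = 0.0, 0
        for xb, yb in train_dl:
            loss = loss_fn(model(xb), yb)
            loss.backward()
            opt.step()
            opt.zero_grad()
            total += loss.item() * len(xb)   # weight each batch by its size
            count += len(xb)
        history.append(total / count)
    return history

torch.manual_seed(0)
demo_inputs = torch.tensor([[73., 67., 43.], [91., 88., 64.], [87., 134., 58.],
                            [102., 43., 37.], [69., 96., 70.]]).repeat(3, 1)
demo_targets = torch.tensor([[56., 70.], [81., 101.], [119., 133.],
                             [22., 37.], [103., 119.]]).repeat(3, 1)
demo_dl = DataLoader(TensorDataset(demo_inputs, demo_targets), batch_size=5, shuffle=True)

demo_model = nn.Linear(3, 2)
demo_opt = torch.optim.SGD(demo_model.parameters(), lr=1e-5)
history = fit_with_history(50, demo_model, F.mse_loss, demo_opt, demo_dl)
print(history[0], history[-1])               # mean loss of the first and last epoch
```

The per-epoch mean is smoother than the last-batch loss, at the cost of one extra `loss.item()` call per batch.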

Let's train the model for 1000 epochs.

In [69]:
fit(1000, model, loss_fn, opt, train_dl)

Epoch [10/1000, Loss: 711.9346
Epoch [20/1000, Loss: 366.9831
Epoch [30/1000, Loss: 282.7798
Epoch [40/1000, Loss: 124.9011
Epoch [50/1000, Loss: 172.0351
Epoch [60/1000, Loss: 80.3598
Epoch [70/1000, Loss: 159.7980
Epoch [80/1000, Loss: 79.4750
Epoch [90/1000, Loss: 69.2567
Epoch [100/1000, Loss: 90.5530
Epoch [110/1000, Loss: 67.7038
Epoch [120/1000, Loss: 34.0371
Epoch [130/1000, Loss: 42.0527
Epoch [140/1000, Loss: 53.5583
Epoch [150/1000, Loss: 33.8320
Epoch [160/1000, Loss: 74.1195
Epoch [170/1000, Loss: 45.1720
Epoch [180/1000, Loss: 66.5091
Epoch [190/1000, Loss: 26.2031
Epoch [200/1000, Loss: 39.5761
Epoch [210/1000, Loss: 35.6062
Epoch [220/1000, Loss: 60.1364
Epoch [230/1000, Loss: 34.5742
Epoch [240/1000, Loss: 30.0067
Epoch [250/1000, Loss: 36.6481
Epoch [260/1000, Loss: 27.2978
Epoch [270/1000, Loss: 17.4434
Epoch [280/1000, Loss: 23.6493
Epoch [290/1000, Loss: 22.6331
Epoch [300/1000, Loss: 22.6886
Epoch [310/1000, Loss: 28.5024
Epoch [320/1000, Loss: 29.2135
Epoch [330/

Let's generate predictions using our model and verify that they're close to our targets.



In [70]:
#Generate Predictions
preds = model(inputs)
preds

tensor([[ 57.2266,  70.3966],
        [ 81.7737, 100.1684],
        [119.6103, 133.8393],
        [ 21.3431,  37.2527],
        [101.0167, 118.1981],
        [ 57.2266,  70.3966],
        [ 81.7737, 100.1684],
        [119.6103, 133.8393],
        [ 21.3431,  37.2527],
        [101.0167, 118.1981],
        [ 57.2266,  70.3966],
        [ 81.7737, 100.1684],
        [119.6103, 133.8393],
        [ 21.3431,  37.2527],
        [101.0167, 118.1981]], grad_fn=<AddmmBackward>)

In [71]:
#Compare with Targets
targets

tensor([[ 56.,  70.],
        [ 81., 101.],
        [119., 133.],
        [ 22.,  37.],
        [103., 119.],
        [ 56.,  70.],
        [ 81., 101.],
        [119., 133.],
        [ 22.,  37.],
        [103., 119.],
        [ 56.,  70.],
        [ 81., 101.],
        [119., 133.],
        [ 22.,  37.],
        [103., 119.]])

Indeed, the predictions are quite close to our targets. We have trained a reasonably good model to predict crop yields for apples and oranges by looking at the average temperature, rainfall, and humidity in a region. We can use it to make predictions of crop yields for new regions by passing a batch containing a single row of input.

In [72]:
#General Prediction
model(torch.tensor([[75, 63, 44.]]))

tensor([[53.5990, 67.3947]], grad_fn=<AddmmBackward>)
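
When a model is used only for prediction, the gradient bookkeeping visible in the output above (`grad_fn=...`) is not needed. A sketch of one common way to skip it, using `torch.no_grad()`; the untrained `infer_model` here is a stand-in for the trained model, so its numbers are not meaningful.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
infer_model = nn.Linear(3, 2)                     # stand-in for the trained model
new_region = torch.tensor([[75., 63., 44.]])      # one row: temperature, rainfall, humidity

with torch.no_grad():                             # no computation graph is built
    pred = infer_model(new_region)

print(pred.shape)                                 # torch.Size([1, 2])
print(pred.requires_grad)                         # False
apples, oranges = pred[0].tolist()                # plain Python floats
print(apples, oranges)
```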