# **Linear regression using PyTorch built-ins**

Linear regression can be seen as the simplest form of a neural network: a single linear layer with no activation function. So, let's first understand neural networks through linear regression.

In [3]:
import numpy as np
import torch

In [1]:
# Let's begin by importing the torch.nn package from PyTorch, which contains utility classes for building neural networks.
import torch.nn as nn

In [4]:
# Input (temp, rainfall, humidity)
inputs = np.array([[73, 67, 43], 
                   [91, 88, 64], 
                   [87, 134, 58], 
                   [102, 43, 37], 
                   [69, 96, 70], 
                   [74, 66, 43], 
                   [91, 87, 65], 
                   [88, 134, 59], 
                   [101, 44, 37], 
                   [68, 96, 71], 
                   [73, 66, 44], 
                   [92, 87, 64], 
                   [87, 135, 57], 
                   [103, 43, 36], 
                   [68, 97, 70]], 
                  dtype='float32')


In [5]:
# Targets (apples, oranges)
targets = np.array([[56, 70], 
                    [81, 101], 
                    [119, 133], 
                    [22, 37], 
                    [103, 119],
                    [57, 69], 
                    [80, 102], 
                    [118, 132], 
                    [21, 38], 
                    [104, 118], 
                    [57, 69], 
                    [82, 100], 
                    [118, 134], 
                    [20, 38], 
                    [102, 120]], 
                   dtype='float32')
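With three inputs (temp, rainfall, humidity) and two targets (apples, oranges), linear regression models each target as a weighted sum of the inputs plus a bias. In matrix form this is the expression used for the predictions later on. Here $X$ is the 15×3 input matrix, $W$ is a 2×3 weight matrix, and $b$ is a bias vector of length 2 that is added to every row.

```latex
\begin{aligned}
\text{apples}  &= w_{11}\,\text{temp} + w_{12}\,\text{rainfall} + w_{13}\,\text{humidity} + b_1 \\
\text{oranges} &= w_{21}\,\text{temp} + w_{22}\,\text{rainfall} + w_{23}\,\text{humidity} + b_2 \\
\hat{Y} &= X W^{\top} + b
\end{aligned}
```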

In [6]:
# Convert inputs and targets from NumPy arrays to PyTorch tensors
inputs = torch.from_numpy(inputs)
targets = torch.from_numpy(targets)

In the real world, datasets are usually huge, often with millions of records. Training the model on the entire dataset at once is then impractical, because every iteration takes a lot of time and memory.

So, let's learn how to train a model in batches.

**TensorDataset** - It wraps the inputs and targets tensors so that rows can be accessed as (input, target) tuples. It also provides the standard Dataset interface that other PyTorch data utilities, such as DataLoader, work with.

In [7]:
# Import TensorDataset
from torch.utils.data import TensorDataset

In [8]:
# Now we'll pass our dataset to TensorDataset
train_data = TensorDataset(inputs, targets)
train_data[0:3]  # Rows can now be accessed as tuples: this gives the first three rows of inputs and the first three rows of targets.

(tensor([[ 73.,  67.,  43.],
         [ 91.,  88.,  64.],
         [ 87., 134.,  58.]]), tensor([[ 56.,  70.],
         [ 81., 101.],
         [119., 133.]]))
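Indexing a single position works the same way and returns one (input, target) pair. `len()` gives the number of rows. The minimal sketch below is self-contained, so it rebuilds a three-row slice of the data above as a stand-in for the full `inputs` and `targets`:

```python
import torch
from torch.utils.data import TensorDataset

# Stand-in: the first three rows of the inputs/targets defined above
inputs = torch.tensor([[73., 67., 43.], [91., 88., 64.], [87., 134., 58.]])
targets = torch.tensor([[56., 70.], [81., 101.], [119., 133.]])

train_data = TensorDataset(inputs, targets)

print(len(train_data))   # number of rows: 3
x0, y0 = train_data[0]   # a single row comes back as an (input, target) tuple
print(x0, y0)
```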

**DataLoader** - It can split the data into batches of a predefined size while training. It also provides other utilities like shuffling and random sampling of the data.

In [27]:
# Import DataLoader
from torch.utils.data import DataLoader

In [28]:
# Define data loader
batch_size = 5
dl = DataLoader(train_data, batch_size, shuffle=True)  # with shuffle=True, the data is reshuffled at every epoch before batches are created
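Iterating over the data loader yields one batch at a time as an `(xb, yb)` pair. With 15 rows and `batch_size = 5`, each epoch has 3 batches. The sketch below uses random stand-in tensors with the same shapes as `inputs` and `targets`, so that it runs on its own:

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

torch.manual_seed(0)  # make the shuffling reproducible

# Random stand-ins with the same shapes as the dataset above (15 rows, 3 inputs, 2 targets)
inputs = torch.randn(15, 3)
targets = torch.randn(15, 2)

dl = DataLoader(TensorDataset(inputs, targets), batch_size=5, shuffle=True)

batch_shapes = []
for xb, yb in dl:
    # Each iteration yields one batch: xb has shape (5, 3), yb has shape (5, 2)
    batch_shapes.append((tuple(xb.shape), tuple(yb.shape)))

print(len(dl))        # 15 rows / batch size 5 = 3 batches per epoch
print(batch_shapes)
```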

# **nn.Linear**

Instead of initializing the weights & biases manually, we can define the model using the nn.Linear class from PyTorch, which does it automatically.

In [10]:
# Define model
model = nn.Linear(3, 2)  # 3 input variables (temp, rainfall, humidity), 2 target variables (apples, oranges). Weights and biases are created and initialized automatically.
print(model.weight)
print(model.bias)

Parameter containing:
tensor([[ 0.5189,  0.5301, -0.3422],
        [ 0.3069,  0.2692, -0.3315]], requires_grad=True)
Parameter containing:
tensor([0.4331, 0.5253], requires_grad=True)
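`nn.Linear(3, 2)` stores a weight of shape (2, 3) and a bias of shape (2,). Calling the layer computes `x @ weight.T + bias`, which is the same expression used to generate predictions by hand in the next cells. A quick check of that equivalence, using a random stand-in batch:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(3, 2)

x = torch.randn(4, 3)  # a stand-in batch of 4 rows with 3 input features each

out_layer = model(x)                            # what nn.Linear computes
out_manual = x @ model.weight.t() + model.bias  # the same thing written out by hand

print(model.weight.shape, model.bias.shape)  # torch.Size([2, 3]) torch.Size([2])
print(torch.allclose(out_layer, out_manual))
```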


In [11]:
# Our linear regression model has one weight matrix and one bias vector. model.parameters() returns all of the model's parameters:
list(model.parameters())

# Individually
#print(model.weight)
#print(model.bias)

[Parameter containing:
 tensor([[ 0.5189,  0.5301, -0.3422],
         [ 0.3069,  0.2692, -0.3315]], requires_grad=True),
 Parameter containing:
 tensor([0.4331, 0.5253], requires_grad=True)]

In [15]:
w = model.weight
b = model.bias

In [16]:
# Next, let's generate predictions
pred = inputs @ w.t() + b
pred

tensor([[59.1215, 26.7143],
        [72.4096, 30.9310],
        [96.7735, 44.0774],
        [63.5006, 31.1427],
        [63.1808, 24.3433],
        [59.1103, 26.7520],
        [71.5372, 30.3302],
        [96.9502, 44.0528],
        [63.5118, 31.1050],
        [62.3197, 23.7048],
        [58.2492, 26.1136],
        [72.3984, 30.9687],
        [97.6458, 44.6782],
        [64.3617, 31.7812],
        [63.1920, 24.3056]], grad_fn=<AddBackward0>)

In [17]:
targets

tensor([[ 56.,  70.],
        [ 81., 101.],
        [119., 133.],
        [ 22.,  37.],
        [103., 119.],
        [ 57.,  69.],
        [ 80., 102.],
        [118., 132.],
        [ 21.,  38.],
        [104., 118.],
        [ 57.,  69.],
        [ 82., 100.],
        [118., 134.],
        [ 20.,  38.],
        [102., 120.]])

# **Loss Function**
Instead of defining a loss function manually, we can use the built-in loss functions.

In [18]:
# Import torch.nn.functional. This package contains many useful loss functions.
import torch.nn.functional as F

In [21]:
# Define loss function
lf = F.mse_loss

In [22]:
# Compute the loss for the current predictions of our model.
mse = lf(pred, targets)
print(mse)

tensor(2766.3408, grad_fn=<MseLossBackward>)
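With its default `reduction='mean'`, `F.mse_loss` is the mean of the squared differences over every entry of the prediction and target tensors. The sketch below checks this on random stand-in tensors shaped like `pred` and `targets`:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
pred = torch.randn(15, 2)     # stand-in predictions
targets = torch.randn(15, 2)  # stand-in targets

builtin = F.mse_loss(pred, targets)         # default reduction='mean'
manual = torch.mean((pred - targets) ** 2)  # mean of squared differences over all 30 entries

print(builtin.item(), manual.item())
```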


# **Optimizer**
Instead of performing Gradient Descent manually, we can use the optimizer optim.SGD. SGD is short for "stochastic gradient descent". The term stochastic indicates that samples are selected in random batches instead of as a single group.

In [23]:
# Define optimizer
opt = torch.optim.SGD(model.parameters(), lr=1e-5)

Calling opt.step() followed by opt.zero_grad() is equivalent to this manual update (with lr=1e-5):

with torch.no_grad():
    w -= w.grad * 1e-5
    b -= b.grad * 1e-5
    w.grad.zero_()
    b.grad.zero_()

Note that model.parameters() is passed as an argument to optim.SGD so that the optimizer knows which matrices should be modified during the update step. Also, we can specify a learning rate that controls the amount by which the parameters are modified.
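To check that one `opt.step()` applies the plain gradient-descent update `p - lr * p.grad` to every parameter, the sketch below compares the two. It assumes plain SGD (no momentum or weight decay) and uses random stand-in data:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
lr = 1e-5
x = torch.randn(15, 3)  # stand-in inputs
y = torch.randn(15, 2)  # stand-in targets

model = nn.Linear(3, 2)
opt = torch.optim.SGD(model.parameters(), lr=lr)

# Compute gradients once
loss = F.mse_loss(model(x), y)
loss.backward()

# Result of the manual update, computed as new tensors so the model is not touched yet
expected_w = model.weight.detach() - lr * model.weight.grad
expected_b = model.bias.detach() - lr * model.bias.grad

opt.step()       # in-place update of every parameter: p -= lr * p.grad
opt.zero_grad()  # reset gradients before the next backward()

print(torch.allclose(model.weight, expected_w), torch.allclose(model.bias, expected_b))
```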

# **Train the model**
Let's train the model

1. Generate predictions
2. Calculate the loss
3. Compute gradients w.r.t the weights and biases
4. Adjust the weights by subtracting a small quantity proportional to the gradient
5. Reset the gradients to zero

The only change is that we'll work with batches of data instead of processing the entire training data in every iteration. Let's define a utility function fit that trains the model for a given number of epochs.

In [33]:
# From now on, we'll reuse this function instead of writing the training loop every time

# Utility function to train the model
def fit(num_epochs, model, lf, opt, dl):
    
    # Repeat for given number of epochs
    for epoch in range(num_epochs):
        
        # Train with batches of data
        for xb,yb in dl:
            
            # 1. Generate predictions
            pred = model(xb)
            
            # 2. Calculate loss
            loss = lf(pred, yb)
            
            # 3. Compute gradients
            loss.backward()
            
            # 4. Update parameters using gradients. opt.step() performs w -= w.grad * lr and b -= b.grad * lr
            opt.step()
            
            # 5. Reset the gradients to zero. This is doing w.grad.zero_(), b.grad.zero_()
            opt.zero_grad()
        
        # Print the loss at the end of every 10th epoch
        if (epoch+1) % 10 == 0:
            print('Epoch [{}/{}], Loss: {:.4f}'.format(epoch+1, num_epochs, loss.item()))

Some things to note above:

- We use the data loader defined earlier to get batches of data for every iteration.
- Instead of updating the parameters (weights and biases) manually, we use opt.step() to perform the update and opt.zero_grad() to reset the gradients to zero.
- We've also added a log statement that prints the loss from the last batch of data every 10th epoch to track training progress. loss.item() returns the value stored in the loss tensor as a Python number.

Let's train the model for 100 epochs.

In [29]:
# The loss is printed after every 10th epoch
fit(100, model, lf, opt, dl)

Epoch [10/100], Loss: 848.6642
Epoch [20/100], Loss: 366.5586
Epoch [30/100], Loss: 447.7979
Epoch [40/100], Loss: 204.1231
Epoch [50/100], Loss: 172.7939
Epoch [60/100], Loss: 132.5098
Epoch [70/100], Loss: 123.9473
Epoch [80/100], Loss: 55.9234
Epoch [90/100], Loss: 34.9637
Epoch [100/100], Loss: 91.8006
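Because fit logs only the loss of the last batch, the printed values jump around (epoch 100 is higher than epoch 90, for example). One way to gauge progress on the whole dataset is to evaluate the loss over all rows under `torch.no_grad()`. The sketch below uses random stand-in data. It also shows that with equal-sized batches and the default mean reduction, the per-batch losses average to the full-dataset loss:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import TensorDataset, DataLoader

torch.manual_seed(0)
inputs = torch.randn(15, 3)   # stand-in inputs
targets = torch.randn(15, 2)  # stand-in targets
model = nn.Linear(3, 2)
dl = DataLoader(TensorDataset(inputs, targets), batch_size=5, shuffle=True)

with torch.no_grad():  # evaluation only: no gradients needed
    # Loss over the whole dataset in one pass
    full_loss = F.mse_loss(model(inputs), targets)
    # Loss of each batch, as fit() would see them
    batch_losses = [F.mse_loss(model(xb), yb) for xb, yb in dl]

print(full_loss.item())
print([round(l.item(), 4) for l in batch_losses])  # these vary from batch to batch
# With equal-sized batches and reduction='mean', the batch losses average to the full loss
print(torch.allclose(torch.stack(batch_losses).mean(), full_loss))
```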


In [30]:
# Generate predictions
pred = inputs @ w.t() + b  # w and b still refer to model.weight and model.bias, which the optimizer updated in place
pred

tensor([[ 58.6706,  72.0140],
        [ 77.5597,  95.8702],
        [125.3118, 140.4297],
        [ 30.0586,  46.3743],
        [ 88.7928, 105.4807],
        [ 57.5060,  70.9952],
        [ 76.5285,  95.0547],
        [125.1662, 140.5728],
        [ 31.2232,  47.3930],
        [ 88.9262, 105.6840],
        [ 57.6393,  71.1985],
        [ 76.3951,  94.8515],
        [126.3431, 141.2452],
        [ 29.9252,  46.1710],
        [ 89.9575, 106.4995]], grad_fn=<AddBackward0>)

In [31]:
targets

tensor([[ 56.,  70.],
        [ 81., 101.],
        [119., 133.],
        [ 22.,  37.],
        [103., 119.],
        [ 57.,  69.],
        [ 80., 102.],
        [118., 132.],
        [ 21.,  38.],
        [104., 118.],
        [ 57.,  69.],
        [ 82., 100.],
        [118., 134.],
        [ 20.,  38.],
        [102., 120.]])

The predictions are now much closer to the targets than before training, although some rows are still off by more than 10.

In [32]:
# You can also make predictions of crop yields for new regions by passing a batch containing a single row of input.
model(torch.tensor([[75, 63, 44.]]))

tensor([[54.2849, 68.1723]], grad_fn=<AddmmBackward>)

## **Using a neural network with a hidden layer**

In [34]:
model2 = nn.Sequential(
    nn.Linear(3, 5),  # takes the 3 input features and produces 5 outputs (the hidden layer)
    nn.Sigmoid(),     # non-linear activation applied to the hidden layer (nn.ReLU() is a common alternative)
    nn.Linear(5, 2)   # takes the 5 hidden outputs and produces the 2 target outputs
)
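To see what each stage of the network does to the data's shape, the sketch below passes a random stand-in batch shaped like `inputs` through the first two layers and then through the whole model. It also counts the trainable parameters, which grow from 8 in `nn.Linear(3, 2)` to 32 here:

```python
import torch
import torch.nn as nn

model2 = nn.Sequential(
    nn.Linear(3, 5),  # 3 input features -> 5 hidden units
    nn.Sigmoid(),     # non-linear activation applied to the hidden units
    nn.Linear(5, 2),  # 5 hidden units -> 2 outputs (apples, oranges)
)

x = torch.randn(15, 3)  # stand-in batch with the same shape as `inputs`
hidden = model2[:2](x)  # output after the first Linear + Sigmoid
out = model2(x)

n_params = sum(p.numel() for p in model2.parameters())
print(hidden.shape, out.shape)  # torch.Size([15, 5]) torch.Size([15, 2])
print(n_params)                 # (3*5 + 5) + (5*2 + 2) = 32
```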

In [38]:
# Define optimizer
opt2 = torch.optim.SGD(model2.parameters(), lr=1e-3)  # experiment with the learning rate if the loss stays high

In [39]:
# The loss is printed after every 10th epoch
fit(100, model2, lf, opt2, dl)

Epoch [10/100], Loss: 5289.6348
Epoch [20/100], Loss: 2164.5210
Epoch [30/100], Loss: 3438.3008
Epoch [40/100], Loss: 1262.5712
Epoch [50/100], Loss: 808.3759
Epoch [60/100], Loss: 2130.7808
Epoch [70/100], Loss: 2453.2122
Epoch [80/100], Loss: 1623.0161
Epoch [90/100], Loss: 1074.6339
Epoch [100/100], Loss: 1256.4371
