<a href="https://colab.research.google.com/github/jindaldisha/Deep-Learning-and-Neural-Networks/blob/main/00_4_gradient_descent_and_linear_regression_with_pytorch_using_buildins.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Linear regression using PyTorch built-ins

PyTorch provides several built-in functions and classes to make it easy to create and train models with just a few lines of code.

In [57]:
#Import Libraries
import torch
import numpy as np
import torch.nn as nn #contains utility classes for building neural networks.

## Training data

We'll represent the training data using the matrices `x_train` and `y_train`.

x_train will have the values of temperature, rainfall and humidity for every region as a single row each.
y_train will have the crop yields of apples and oranges for every region as a single row each.

In [58]:
# Input Features (temp, rainfall, humidity)
x_train = np.array([[73, 67, 43], 
                   [91, 88, 64], 
                   [87, 134, 58], 
                   [102, 43, 37], 
                   [69, 96, 70], 
                   [74, 66, 43], 
                   [91, 87, 65], 
                   [88, 134, 59], 
                   [101, 44, 37], 
                   [68, 96, 71], 
                   [73, 66, 44], 
                   [92, 87, 64], 
                   [87, 135, 57], 
                   [103, 43, 36], 
                   [68, 97, 70]], 
                  dtype='float32')

In [59]:
# Targets (apples, oranges)
y_train = np.array([[56, 70], 
                    [81, 101], 
                    [119, 133], 
                    [22, 37], 
                    [103, 119],
                    [57, 69], 
                    [80, 102], 
                    [118, 132], 
                    [21, 38], 
                    [104, 118], 
                    [57, 69], 
                    [82, 100], 
                    [118, 134], 
                    [20, 38], 
                    [102, 120]], 
                   dtype='float32')

In [60]:
#Convert input numpy arrays to pytorch tensors
x_train = torch.from_numpy(x_train)
y_train = torch.from_numpy(y_train)

## Dataset and DataLoader

We'll create a `TensorDataset`, which allows access to rows from `x_train` and `y_train` as tuples, and provides standard APIs for working with many different types of datasets in PyTorch.


When working with large datasets, it's often not possible to train on the entire dataset at once: it may not fit into memory, and even if it does, each training step will be very slow.
Instead, we break the dataset into batches and train the model batch by batch.

The `TensorDataset` allows us to access a small section of the training data using the array indexing notation. It returns a tuple with two elements. The first element contains the input variables for the selected rows, and the second contains the targets.
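
As a brief illustration of the indexing described above, the sketch below slices two rows out of a small dataset. The tensors are a three-row excerpt of the training data, repeated here so the snippet runs on its own; the names `xs`, `ys` and `ds` are illustrative and not part of the notebook.

```python
import torch
from torch.utils.data import TensorDataset

# First three rows of the training data, repeated so the snippet is self-contained
xs = torch.tensor([[73., 67., 43.],
                   [91., 88., 64.],
                   [87., 134., 58.]])
ys = torch.tensor([[56., 70.],
                   [81., 101.],
                   [119., 133.]])
ds = TensorDataset(xs, ys)

# Slicing returns a tuple: (inputs for the selected rows, matching targets)
inputs, targets = ds[0:2]
print(inputs.shape)   # torch.Size([2, 3])
print(targets.shape)  # torch.Size([2, 2])
```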

We'll also create a `DataLoader`, which can split the data into batches of a predefined size while training. It also provides other utilities like shuffling and random sampling of the data.

We can set `shuffle=True` in `DataLoader`. This randomizes the order in which rows reach the optimization algorithm in each epoch, which often leads to a faster reduction in loss.
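
To make the batching concrete, here is a minimal sketch using stand-in data with the same shape as the training set (15 rows, 3 inputs, 2 targets). The values are placeholders generated with `torch.arange`, not the real crop data, and the variable names are illustrative. It checks that 15 rows with a batch size of 5 give 3 batches, and that shuffling only reorders rows without dropping or duplicating any.

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

# Stand-in data: 15 rows with 3 features and 2 targets, like the training set
xs = torch.arange(45, dtype=torch.float32).reshape(15, 3)
ys = torch.arange(30, dtype=torch.float32).reshape(15, 2)
dl = DataLoader(TensorDataset(xs, ys), batch_size=5, shuffle=True)

num_batches = len(dl)  # 15 rows / 5 rows per batch
batch_shapes = [tuple(xb.shape) for xb, _ in dl]

# Collect one full epoch and sort the rows back into their original order
seen = torch.cat([xb for xb, _ in dl])
same_rows = torch.equal(seen.sort(dim=0).values, xs)

print(num_batches, batch_shapes, same_rows)
```

Each pass over `dl` is one epoch, so the two loops above see the rows in different random orders.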

In [61]:
from torch.utils.data import TensorDataset
from torch.utils.data import DataLoader

In [62]:
#Define data
train_ds = TensorDataset(x_train, y_train)
#Viewing a single example
train_ds[0]

(tensor([73., 67., 43.]), tensor([56., 70.]))

In [63]:
#Define data loader
batch_size = 5
train_dl = DataLoader(train_ds, batch_size, shuffle=True)

In [64]:
#Viewing the batches - each iteration displays a batch
for xb, yb in train_dl:
    print(xb)
    print(yb)

tensor([[ 68.,  96.,  71.],
        [ 87., 134.,  58.],
        [ 91.,  87.,  65.],
        [ 91.,  88.,  64.],
        [ 73.,  66.,  44.]])
tensor([[104., 118.],
        [119., 133.],
        [ 80., 102.],
        [ 81., 101.],
        [ 57.,  69.]])
tensor([[ 73.,  67.,  43.],
        [ 69.,  96.,  70.],
        [ 88., 134.,  59.],
        [ 87., 135.,  57.],
        [ 92.,  87.,  64.]])
tensor([[ 56.,  70.],
        [103., 119.],
        [118., 132.],
        [118., 134.],
        [ 82., 100.]])
tensor([[ 68.,  97.,  70.],
        [102.,  43.,  37.],
        [101.,  44.,  37.],
        [ 74.,  66.,  43.],
        [103.,  43.,  36.]])
tensor([[102., 120.],
        [ 22.,  37.],
        [ 21.,  38.],
        [ 57.,  69.],
        [ 20.,  38.]])


## nn.Linear

Instead of initializing the weights and biases manually, we can define the model using the `nn.Linear` class from PyTorch, which initializes them automatically. `nn.Linear` is a linear (fully connected) layer.

In [65]:
#Define model
model = nn.Linear(3,2) #(no. of inputs, no. of outputs)
print(model.weight)
print(model.bias)

Parameter containing:
tensor([[ 0.0533, -0.5322, -0.4030],
        [-0.3642,  0.2834, -0.5382]], requires_grad=True)
Parameter containing:
tensor([0.2873, 0.1964], requires_grad=True)


PyTorch models also have a `.parameters()` method, which returns an iterator over all the weight and bias matrices present in the model. Wrapping it in `list` lets us view them.

In [66]:
#Parameters
list(model.parameters())

[Parameter containing:
 tensor([[ 0.0533, -0.5322, -0.4030],
         [-0.3642,  0.2834, -0.5382]], requires_grad=True),
 Parameter containing:
 tensor([0.2873, 0.1964], requires_grad=True)]
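
As a small aside, the sketch below inspects the parameters of a freshly created layer with the same shape as the model above. It uses `named_parameters()`, a companion of `parameters()` that also yields each parameter's name; the variable names `layer`, `shapes` and `total` are illustrative.

```python
import torch.nn as nn

layer = nn.Linear(3, 2)  # same shape as the model above: 3 inputs, 2 outputs

# named_parameters() yields (name, tensor) pairs
shapes = {name: tuple(p.shape) for name, p in layer.named_parameters()}
# 2x3 weights plus 2 biases
total = sum(p.numel() for p in layer.parameters())

print(shapes)  # {'weight': (2, 3), 'bias': (2,)}
print(total)   # 8
```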

In [67]:
#Generate predictions
y_pred = model(x_train)
y_pred

tensor([[-48.8023, -30.5447],
        [-67.4799, -42.4512],
        [-89.7546, -24.7297],
        [-32.0668, -44.6778],
        [-75.3278, -35.4014],
        [-48.2168, -31.1923],
        [-67.3507, -43.2728],
        [-90.1042, -25.6321],
        [-32.6522, -44.0303],
        [-75.7840, -35.5754],
        [-48.6731, -31.3663],
        [-66.8945, -43.0987],
        [-89.8838, -23.9082],
        [-31.6105, -44.5038],
        [-75.9132, -34.7539]], grad_fn=<AddmmBackward>)
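
To see what the layer computes, the following sketch compares the output of an `nn.Linear` layer with the explicit matrix form `x @ W.T + b`. It builds its own layer and a two-row input so that it runs independently of the cells above; the seed is arbitrary and only there for repeatability.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # arbitrary seed, only for repeatability
layer = nn.Linear(3, 2)
x = torch.tensor([[73., 67., 43.],
                  [91., 88., 64.]])

# The forward pass is a matrix product with the transposed weights, plus the bias
manual = x @ layer.weight.t() + layer.bias
matches = torch.allclose(layer(x), manual, atol=1e-4)
print(matches)
```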

## Loss Function

We can use the built-in loss function `mse_loss`, which computes the mean squared error between predictions and targets.

In [68]:
# Import nn.functional
import torch.nn.functional as F

In [69]:
# Define loss function
loss_fn = F.mse_loss

In [70]:
#Calculate loss for the model
loss = loss_fn(y_pred, y_train)
print(loss)

tensor(19737.1836, grad_fn=<MseLossBackward>)
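
For intuition, here is a minimal sketch that computes the mean squared error by hand and compares it with `F.mse_loss`. The small `preds` and `targets` tensors are made-up values chosen so the arithmetic is easy to follow: the differences are 2, 1, -1 and -6, so the squared errors sum to 42 and their mean is 10.5.

```python
import torch
import torch.nn.functional as F

preds = torch.tensor([[58., 71.], [80., 95.]])
targets = torch.tensor([[56., 70.], [81., 101.]])

# Mean of the squared element-wise differences
diff = preds - targets
manual_mse = (diff * diff).sum() / diff.numel()
builtin_mse = F.mse_loss(preds, targets)

print(manual_mse.item(), builtin_mse.item())  # 10.5 10.5
```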


## Optimizer

Instead of manually manipulating the model's weights & biases using gradients, we can use the optimizer `optim.SGD`. SGD is short for "stochastic gradient descent". The term _stochastic_ indicates that samples are selected in random batches instead of as a single group.

`model.parameters()` is passed as an argument to `optim.SGD` so that the optimizer knows which matrices should be modified during the update step. Also, we can specify a learning rate that controls the amount by which the parameters are modified.

In [71]:
#Define Optimizer
opt = torch.optim.SGD(model.parameters(), lr=1e-5)
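
The sketch below illustrates what a single update does, assuming plain SGD with no momentum or weight decay (the defaults, as used above): each parameter is replaced by itself minus the learning rate times its gradient. It uses its own small layer and a single training row so that it does not disturb `model`; all names in it are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)  # arbitrary seed, only for repeatability
lr = 1e-5
layer = nn.Linear(3, 2)
sgd = torch.optim.SGD(layer.parameters(), lr=lr)

x = torch.tensor([[73., 67., 43.]])
y = torch.tensor([[56., 70.]])

loss = F.mse_loss(layer(x), y)
loss.backward()

# What plain SGD should produce: parameter minus lr times its gradient
expected = [p.detach() - lr * p.grad for p in layer.parameters()]

sgd.step()
step_matches = all(torch.allclose(p.detach(), e)
                   for p, e in zip(layer.parameters(), expected))
print(step_matches)
```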

## Train the model

We'll use the following process to implement gradient descent:

1. Generate predictions
2. Calculate the loss
3. Compute gradients w.r.t the weights and biases
4. Adjust the weights by subtracting a small quantity proportional to the gradient
5. Reset the gradients to zero


Note:
 - We'll work with batches of data instead of processing the entire training data in every iteration. We use the data loader defined earlier to get batches of data for every iteration.

 - Instead of updating parameters (weights and biases) manually, we use `opt.step()` to perform the update and `opt.zero_grad()` to reset the gradients.

 - We've also added a log statement that prints the loss from the last batch of data for every epoch to track training progress. `loss.item()` returns the actual value stored in the loss tensor as a Python number.
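
The reason for resetting gradients after each step is that PyTorch accumulates them: every call to `backward()` adds to whatever is already stored in `.grad`. The sketch below shows this on a tiny tensor `w` (an illustrative stand-in for a model parameter) and clears the gradient by hand; in the training loop, `opt.zero_grad()` plays that role for every parameter the optimizer manages.

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)

# d/dw of sum(3 * w) is 3 for every element
(w * 3).sum().backward()
first = w.grad.clone()        # tensor([3., 3.])

# A second backward pass without a reset adds to the stored gradient
(w * 3).sum().backward()
accumulated = w.grad.clone()  # tensor([6., 6.])

# Clear the gradient manually before the next pass
w.grad.zero_()
print(first, accumulated, w.grad)
```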


In [72]:
#Utility function to train (fit) the model 
def fit(num_epochs, model, loss_fn, opt, train_dl):
  #Repeat for given number of epochs
  for epoch in range(num_epochs):
    #Train with batches of data
    for xb, yb in train_dl:

      # 1. Generate predictions
      y_pred = model(xb)

      # 2. Calculate Loss
      loss = loss_fn(y_pred, yb)

      # 3. Compute gradients
      loss.backward()

      # 4. Update parameters using gradients
      opt.step()

      # 5. Reset the gradients to zero
      opt.zero_grad()
    
    #Print progress
    print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {loss.item()}')

In [73]:
#Fit the Model for 100 epochs
fit(100, model, loss_fn, opt, train_dl)

Epoch [1/100], Loss: 10200.880859375
Epoch [2/100], Loss: 6400.791015625
Epoch [3/100], Loss: 1234.308349609375
Epoch [4/100], Loss: 619.7398071289062
Epoch [5/100], Loss: 540.5086669921875
Epoch [6/100], Loss: 904.9818115234375
Epoch [7/100], Loss: 1014.5422973632812
Epoch [8/100], Loss: 226.34780883789062
Epoch [9/100], Loss: 637.846435546875
Epoch [10/100], Loss: 755.33935546875
Epoch [11/100], Loss: 337.67962646484375
Epoch [12/100], Loss: 585.8490600585938
Epoch [13/100], Loss: 680.42578125
Epoch [14/100], Loss: 430.82080078125
Epoch [15/100], Loss: 418.80462646484375
Epoch [16/100], Loss: 647.3411865234375
Epoch [17/100], Loss: 37.851680755615234
Epoch [18/100], Loss: 367.4296875
Epoch [19/100], Loss: 578.204345703125
Epoch [20/100], Loss: 156.33804321289062
Epoch [21/100], Loss: 198.8542938232422
Epoch [22/100], Loss: 554.1481323242188
Epoch [23/100], Loss: 138.4562530517578
Epoch [24/100], Loss: 465.9931640625
Epoch [25/100], Loss: 490.10040283203125
Epoch [26/100], Loss: 274.4

In [76]:
#Generate predictions
y_pred = model(x_train)
y_pred

tensor([[ 58.8174,  71.4142],
        [ 80.0736,  95.1983],
        [119.5106, 142.8298],
        [ 31.1705,  43.2479],
        [ 92.4786, 106.0793],
        [ 57.7700,  70.2926],
        [ 79.4725,  94.2972],
        [119.6198, 142.9036],
        [ 32.2179,  44.3694],
        [ 92.9249, 106.2996],
        [ 58.2163,  70.5130],
        [ 79.0262,  94.0768],
        [120.1117, 143.7310],
        [ 30.7242,  43.0275],
        [ 93.5260, 107.2008]], grad_fn=<AddmmBackward>)

In [75]:
y_train

tensor([[ 56.,  70.],
        [ 81., 101.],
        [119., 133.],
        [ 22.,  37.],
        [103., 119.],
        [ 57.,  69.],
        [ 80., 102.],
        [118., 132.],
        [ 21.,  38.],
        [104., 118.],
        [ 57.,  69.],
        [ 82., 100.],
        [118., 134.],
        [ 20.,  38.],
        [102., 120.]])

The predictions are quite close to our targets. We have trained a reasonably good model to predict crop yields for apples and oranges from the average temperature, rainfall, and humidity in a region. We can use it to predict crop yields for new regions by passing a batch containing a single row of input.
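
A minimal sketch of such a single-row prediction follows. The input values (temperature 75, rainfall 63, humidity 44) are hypothetical, and `stand_in` is an untrained layer used only so the snippet runs on its own, so its numbers are meaningless. In the notebook you would call the trained `model` on `new_region` instead.

```python
import torch
import torch.nn as nn

# Untrained stand-in with the same shape as the trained model
stand_in = nn.Linear(3, 2)

# One hypothetical region: the double brackets make a batch with a single row
new_region = torch.tensor([[75., 63., 44.]])

with torch.no_grad():  # gradients are not needed for inference
    prediction = stand_in(new_region)

print(prediction.shape)  # torch.Size([1, 2])
```

The result has one row per input region and one column per target (apples, oranges).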

The approach in machine learning is very different from classical programming. Usually we write programs that take some inputs, perform some operations and return the result.
However, here we've defined a 'model' that assumes a specific relation between the inputs and outputs, expressed using some unknown parameters, i.e. weights and biases, which start out with random values. We then show the model some known inputs and outputs and train it to come up with good values for those parameters. Once trained, the model can be used to compute the outputs for new inputs.

Deep learning is a branch of machine learning that uses matrix operations, non-linear activation functions and gradient descent to build and train models.