# <font color = 'indianred'>**Lecture Goal**
In this lecture, we will understand PyTorch's `nn.Module`. All the modules in PyTorch are implemented as subclasses of the `torch.nn.Module` class. PyTorch uses these modules to perform operations on tensors. We will first understand some important modules and then use them to implement Linear Regression.

We will first discuss the following modules:

- nn.Linear()
- nn.Sequential()
- nn.MSELoss()
- torch.optim
- torch.utils.data.DataLoader
- torch.utils.data.TensorDataset



# <font color = 'indianred'>**Install Libraries**

In [2]:
# install the torchsummary library (only needed on Google Colab)
if 'google.colab' in str(get_ipython()):
    !pip install torchsummary -qq

# <font color = 'indianred'>**Import Libraries**

In [3]:
# Importing PyTorch Library for building neural networks
import torch

# Importing PyTorch's neural network module
import torch.nn as nn

# Importing PyTorch's data loading utility
from torch.utils import data

# Importing PyTorch's functional interface for neural network operations
import torch.nn.functional as F

# Importing PyTorch's summary module for visualizing network architectures
import torchsummary

# Importing the random library to generate random dataset
import random

# Importing the math library for mathematical operations
import math

# <font color = 'indianred'>**nn.Module**

`nn.Module` is a fundamental base class for all neural network modules in PyTorch, and it serves as a base class for defining your own neural network architectures. Subclassing `nn.Module` is crucial for creating a class that can hold your model's weights, biases, and other learnable parameters.

By subclassing `nn.Module`, you gain access to a variety of helpful attributes and methods, including `.parameters()`. The `.parameters()` method returns an iterator over the model's parameters, which can be used for updating the weights during training.

Subclassing `nn.Module` also makes it easier to work with pre-defined layers, loss functions, and other components that are provided by PyTorch. By organizing your model in this way, you can create a reusable and modular architecture that can be easily adapted to different tasks and datasets.

In [4]:
class LinearRegression(nn.Module):
    """
    A linear regression model that predicts a real-valued output based on a real-valued input.
    """

    def __init__(self, input_dim, output_dim):
        """
        Initializes the LinearRegression model.

        Args:
            input_dim (int): The dimensionality of the input feature vector.
            output_dim (int): The dimensionality of the output feature vector.
        """
        super().__init__()
        self.input_dim = input_dim
        self.output_dim = output_dim
        self.weights = nn.Parameter(torch.randn(self.output_dim, self.input_dim) / math.sqrt(self.input_dim))
        # one bias per output feature (for output_dim = 1 this is a single zero)
        self.biases = nn.Parameter(torch.zeros(self.output_dim))

    def forward(self, x):
        """
        Computes the forward pass of the LinearRegression model.

        Args:
            x (torch.Tensor): The input feature vector.

        Returns:
            torch.Tensor: The predicted output value.
        """
        return x @ self.weights.T + self.biases


- The `LinearRegression` class is a subclass of `nn.Module` in PyTorch and represents a linear regression model that predicts a real-valued output based on a real-valued input.

- The `__init__` method is the constructor for the class, which takes two arguments: `input_dim` and `output_dim`. These arguments represent the dimensions of the input and output feature vectors, respectively. The method initializes the parameters for the linear regression model, `weights` and `biases`, which are stored as `nn.Parameter` objects. The weights are drawn from a Gaussian distribution with a mean of 0 and a standard deviation of `1/sqrt(input_dim)` (that is, `1/sqrt(2)` for the two-feature example used below) to improve the stability of the training process. The biases are initialized to zero.

- The `forward` method computes the forward pass of the linear regression model, given a batch of input feature vectors `x`. The method returns the predicted output, which is computed as the matrix product of `x` and the transposed weights, plus the model's biases. The weights and biases are learnable parameters of the model, which are optimized during the training process.

- `nn.Module` objects are used as if they were functions (i.e., they are callable). Behind the scenes, PyTorch calls the `forward` method automatically.


Together, these methods define a linear regression model that can be used for a variety of regression tasks. By subclassing nn.Module, we gain access to a variety of helpful methods and attributes that make it easier to work with PyTorch's autograd system and perform backpropagation to update the model's parameters during training.
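
The points above can be checked with a small sketch. The class name `TinyLinear` and the variable names are hypothetical and used only for illustration; the sketch assumes the same parameter layout as the model defined above. It shows that calling the object gives the same result as calling `forward` directly, and that `.parameters()` exposes every `nn.Parameter` we registered.

```python
import math
import torch
import torch.nn as nn


class TinyLinear(nn.Module):
    """Hypothetical stand-in with the same structure as the model above."""

    def __init__(self, input_dim, output_dim):
        super().__init__()
        self.weights = nn.Parameter(
            torch.randn(output_dim, input_dim) / math.sqrt(input_dim)
        )
        self.biases = nn.Parameter(torch.zeros(output_dim))

    def forward(self, x):
        return x @ self.weights.T + self.biases


torch.manual_seed(0)
tiny = TinyLinear(2, 1)
x_demo = torch.arange(6).view(3, 2).float()

# Calling the module runs forward() for us
out_call = tiny(x_demo)
out_forward = tiny.forward(x_demo)
same_result = torch.equal(out_call, out_forward)

# 2 weights + 1 bias = 3 learnable values
num_params = sum(p.numel() for p in tiny.parameters())

print(same_result, num_params)  # True 3
```

In practice we call the module (`tiny(x_demo)`) rather than `forward` directly, because the call also runs any hooks registered on the module.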


In [5]:
x = torch.arange(6).view(3, 2).float()

# Input Dimension
input_dim = 2

# Output Dimension
output_dim = 1

# Since we're now using an object instead of just using a function, we
# first have to instantiate our model

model = LinearRegression(input_dim, output_dim)

# Get the output of linear layer after transformation
output = model(x)

print('input_tensor shape :', x.shape)
print('output_tensor shape: ', output.shape)

input_tensor shape : torch.Size([3, 2])
output_tensor shape:  torch.Size([3, 1])


In [6]:
model.weights

Parameter containing:
tensor([[0.7679, 1.4397]], requires_grad=True)

In [7]:
model.biases

Parameter containing:
tensor([0.], requires_grad=True)

In [8]:
model.parameters()

<generator object Module.parameters at 0x7a31677209e0>

In [9]:
list(model.parameters())

[Parameter containing:
 tensor([[0.7679, 1.4397]], requires_grad=True),
 Parameter containing:
 tensor([0.], requires_grad=True)]

In [10]:
model.state_dict()

OrderedDict([('weights', tensor([[0.7679, 1.4397]])),
             ('biases', tensor([0.]))])
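
The dictionary returned by `state_dict()` can also be loaded into another model of the same structure with `load_state_dict()`. The following is a minimal sketch under the assumption that both models use the same parameter names; `TinyLinear`, `source_model`, and `target_model` are hypothetical names.

```python
import torch
import torch.nn as nn


class TinyLinear(nn.Module):
    """Hypothetical model with parameters named like the example above."""

    def __init__(self, input_dim, output_dim):
        super().__init__()
        self.weights = nn.Parameter(torch.randn(output_dim, input_dim))
        self.biases = nn.Parameter(torch.zeros(output_dim))

    def forward(self, x):
        return x @ self.weights.T + self.biases


torch.manual_seed(0)
source_model = TinyLinear(2, 1)
target_model = TinyLinear(2, 1)  # has its own random weights

differed_before = not torch.equal(source_model.weights, target_model.weights)

# Copy every tensor in the state dict into the matching parameter
target_model.load_state_dict(source_model.state_dict())

match_after = torch.equal(source_model.weights, target_model.weights)
state_keys = list(target_model.state_dict().keys())

print(differed_before, match_after, state_keys)
# True True ['weights', 'biases']
```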

# <font color = 'indianred'>**Linear Module (nn.Linear)**


Instead of manually defining and initializing the parameters (weights and biases) and calculating `x @ self.weights.T + self.biases`, we can use the PyTorch class `nn.Linear` for a linear layer, which does all that for us.

This layer takes in dimensions of input and output features and applies the following transformation to the input tensor $x$

$y = x w^T + b$ ,
$w$ and $b$ are the parameters.

The syntax for Linear Module is  :
`torch.nn.Linear(in_features, out_features, bias=True, device=None, dtype=None)`

- in_features – size of each input sample
- out_features – size of each output sample

Shapes :

Input: $(N, *, H_{in})$ <br>

Here, $H_{in} = in\_features$, $*$ means any number of additional dimensions, and $N$ is the batch size (number of observations). <br><br>

Output: $(N, *, H_{out})$,
where all but the last dimension have the same shape as the input and $H_{out} = out\_features$.


Example:
  - If the input has shape (3, 2) (batch size is 3 and there are two features)
  and the layer is `nn.Linear(in_features=2, out_features=1)`,
  - then the output will have the shape (3, 1) (3 observations and 1 feature).
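
The shape rule can be checked with a short sketch. The variable names are hypothetical; the second input adds one extra dimension to illustrate the $*$ in the shape description.

```python
import torch
import torch.nn as nn

shape_layer = nn.Linear(in_features=2, out_features=1)

batch_2d = torch.zeros(3, 2)     # (N, H_in)
batch_3d = torch.zeros(3, 5, 2)  # (N, *, H_in) with one extra dimension

out_2d = shape_layer(batch_2d)
out_3d = shape_layer(batch_3d)

print(out_2d.shape)  # torch.Size([3, 1])
print(out_3d.shape)  # torch.Size([3, 5, 1])
```

Only the last dimension changes; every leading dimension is carried through unchanged.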


In [11]:
class LinearRegression(nn.Module):
    def __init__(self, input_dim, output_dim):
        super().__init__()
        self.input_dim = input_dim
        self.output_dim = output_dim
        self.linear_layer = nn.Linear(input_dim, output_dim)


    def forward(self, x):
        return self.linear_layer(x)

In [12]:
x = torch.arange(6).view(3, 2).float()

# Input Dimension
input_dim = 2

# Output Dimension
output_dim = 1

# Instantiate the model (it wraps a single linear layer)
model = LinearRegression(input_dim, output_dim)

# Get the output of linear layer after transformation
output = model(x)

print('input_tensor shape :', x.shape)
print('output_tensor shape: ', output.shape)

input_tensor shape : torch.Size([3, 2])
output_tensor shape:  torch.Size([3, 1])


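We have not specified any initial weights or bias values. The Linear module initializes them randomly in the background. According to the PyTorch documentation for `nn.Linear`, both the weights and the bias are drawn from the uniform distribution $U(-\sqrt{k}, \sqrt{k})$ with $k = \frac{1}{in\_features}$. Like LeCun initialization, this keeps the scale of the initial weights proportional to $\frac{1}{\sqrt{n_{in}}}$.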
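
We can inspect the default initialization empirically. The PyTorch documentation for `nn.Linear` states that weights and bias are sampled from $U(-\sqrt{k}, \sqrt{k})$ with $k = 1/in\_features$. The sketch below (layer size and variable names are arbitrary choices for illustration) checks that no initial value falls outside that bound.

```python
import math
import torch
import torch.nn as nn

torch.manual_seed(0)
in_features = 100
wide_layer = nn.Linear(in_features, 50)

# Documented bound: sqrt(k) with k = 1 / in_features
bound = 1 / math.sqrt(in_features)

max_abs_weight = wide_layer.weight.abs().max().item()
max_abs_bias = wide_layer.bias.abs().max().item()

# Small tolerance for float32 rounding
weights_in_range = max_abs_weight <= bound + 1e-6
bias_in_range = max_abs_bias <= bound + 1e-6

print(bound, weights_in_range, bias_in_range)
```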


In [13]:
# We can get all the parameters associated with model(linear layer) as follows
for name, param in  model.named_parameters():
  print(name, param)

linear_layer.weight Parameter containing:
tensor([[ 0.0361, -0.4659]], requires_grad=True)
linear_layer.bias Parameter containing:
tensor([-0.1331], requires_grad=True)


In [14]:
print('We can see that PyTorch initializes  weights  in the background\n')
print('W:', model.linear_layer.weight)
print('b:', model.linear_layer.bias)
print('Shape of W :', model.linear_layer.weight.data.shape)
print('Shape of b:', model.linear_layer.bias.data.shape)

We can see that PyTorch initializes  weights  in the background

W: Parameter containing:
tensor([[ 0.0361, -0.4659]], requires_grad=True)
b: Parameter containing:
tensor([-0.1331], requires_grad=True)
Shape of W : torch.Size([1, 2])
Shape of b: torch.Size([1])


## <font color = 'indianred'>**Summary Linear Layer:**

- When we initialize the layer (`layer = nn.Linear(input_dim, output_dim)`), the Linear module takes the input and output dimensions as arguments and automatically initializes the weights and bias randomly.

  - PyTorch sets the attribute `requires_grad = True` for weights and biases.
  - Shape of weights is [out_features, in_features]
  - Shape of bias is [out_features]

- We can then apply this layer to inputs to get our output (`output = layer(input)`).
  - It uses the randomly initialized weights and biases to transform the inputs.

  - Shape of input = [batch_size, in_features]
  - output = input (W.T) + b
  - Shape of output = [batch_size, out_features]
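
The summary above can be verified with a minimal sketch: we apply the layer and compare the result with the manual computation `input @ W.T + b`. The variable names are hypothetical.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
check_layer = nn.Linear(2, 1)
check_inputs = torch.arange(6).view(3, 2).float()

# What the layer computes
layer_output = check_layer(check_inputs)

# The same transformation written out by hand
manual_output = check_inputs @ check_layer.weight.T + check_layer.bias

outputs_match = torch.allclose(layer_output, manual_output)

print(check_layer.weight.shape)  # torch.Size([1, 2])
print(check_layer.bias.shape)    # torch.Size([1])
print(outputs_match)             # True
```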

<img src ="https://drive.google.com/uc?export=view&id=1ewECT6hqC1sXd-TqXG3K1WZHAKXhY7g7" width =700 >

In the example above, the **output layer** would be `nn.Linear(2, 1)`. In the figure above, we have assumed a batch size of 1.


# <font color = 'indianred'>**Shallow Neural Network**

Many times, we want to compose Modules together. `torch.nn.Sequential` provides a good interface to combine modules sequentially where the output of a module (layer) is sequentially fed as an input to the next layer. Consider the following network:

<img src ="https://drive.google.com/uc?export=view&id=1rymZGH-Xrp_1ywGAcRJraiuuAd-verg7" width =700 >


In the example above, the **hidden layer** would be `nn.Linear(3, 4)` and the **output layer** would be `nn.Linear(4, 1)`. In the figure above, we have assumed a batch size of 1.

## <font color = 'indianred'>**Shallow NN with Custom Class**

In [15]:
class LinearRegression(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim):
        super().__init__()
        self.input_dim = input_dim
        self.output_dim = output_dim
        self.hidden_dim = hidden_dim
        self.linear_layer1 = nn.Linear(input_dim,hidden_dim)
        self.linear_layer2 = nn.Linear(hidden_dim, output_dim)


    def forward(self, x):
        out1 = self.linear_layer1(x)
        out2 = self.linear_layer2(out1)
        return out2

In [16]:
# The code below illustrates the above example with a batch size of 5
input_ =   torch.arange(15).view(5, 3).float()
model = LinearRegression(3, 4 , 1)
output = model(input_)

print('input_tensor shape :', input_.shape)
print('output_tensor shape: ', output.shape)

input_tensor shape : torch.Size([5, 3])
output_tensor shape:  torch.Size([5, 1])


In [17]:
# We can get all the parameters associated with model(linear layer) as follows
for name, param in  model.named_parameters():
  print(name, param)

linear_layer1.weight Parameter containing:
tensor([[ 0.4389, -0.4374,  0.1833],
        [ 0.3030, -0.1730,  0.1389],
        [ 0.0680,  0.1408,  0.1448],
        [ 0.2911,  0.0317, -0.0375]], requires_grad=True)
linear_layer1.bias Parameter containing:
tensor([-0.4349,  0.5764, -0.2530, -0.1742], requires_grad=True)
linear_layer2.weight Parameter containing:
tensor([[ 0.2446, -0.2885, -0.1945, -0.4396]], requires_grad=True)
linear_layer2.bias Parameter containing:
tensor([-0.1284], requires_grad=True)


In [18]:
model.state_dict()

OrderedDict([('linear_layer1.weight',
              tensor([[ 0.4389, -0.4374,  0.1833],
                      [ 0.3030, -0.1730,  0.1389],
                      [ 0.0680,  0.1408,  0.1448],
                      [ 0.2911,  0.0317, -0.0375]])),
             ('linear_layer1.bias',
              tensor([-0.4349,  0.5764, -0.2530, -0.1742])),
             ('linear_layer2.weight',
              tensor([[ 0.2446, -0.2885, -0.1945, -0.4396]])),
             ('linear_layer2.bias', tensor([-0.1284]))])

## <font color = 'indianred'>**Shallow NN with nn.Sequential module**

In [19]:
# The code below illustrates the above example with a batch size of 5
input_ =   torch.arange(15).view(5, 3).float()
hidden_layer = nn.Linear(3, 4)
output_layer = nn.Linear(4, 1)
model = nn.Sequential(hidden_layer, output_layer)
output = model(input_)

print('input_tensor shape :', input_.shape)
print('output_tensor shape: ', output.shape)


input_tensor shape : torch.Size([5, 3])
output_tensor shape:  torch.Size([5, 1])


In [20]:
# print the model
print(model)

Sequential(
  (0): Linear(in_features=3, out_features=4, bias=True)
  (1): Linear(in_features=4, out_features=1, bias=True)
)


In [21]:
# We can get all the parameters associated with model(linear layer) as follows
for name, param in  model.named_parameters():
  print(name, param)

0.weight Parameter containing:
tensor([[ 0.3042,  0.0616,  0.5314],
        [ 0.3029,  0.3482, -0.1284],
        [ 0.4099,  0.2782, -0.4788],
        [-0.3551,  0.3838,  0.1684]], requires_grad=True)
0.bias Parameter containing:
tensor([ 0.2974, -0.0717, -0.4111, -0.1068], requires_grad=True)
1.weight Parameter containing:
tensor([[-0.2957, -0.3542,  0.2458,  0.1222]], requires_grad=True)
1.bias Parameter containing:
tensor([0.2401], requires_grad=True)


In [22]:
model.state_dict()

OrderedDict([('0.weight',
              tensor([[ 0.3042,  0.0616,  0.5314],
                      [ 0.3029,  0.3482, -0.1284],
                      [ 0.4099,  0.2782, -0.4788],
                      [-0.3551,  0.3838,  0.1684]])),
             ('0.bias', tensor([ 0.2974, -0.0717, -0.4111, -0.1068])),
             ('1.weight', tensor([[-0.2957, -0.3542,  0.2458,  0.1222]])),
             ('1.bias', tensor([0.2401]))])

## <font color = 'indianred'>**Sequential Module with Layer Names**

In [23]:
# The code below illustrates the above example with a batch size of 5
input_ =   torch.arange(15).view(5, 3).float()
model = nn.Sequential()
model.add_module('hidden_layer', nn.Linear(3, 4))
model.add_module('output_layer', nn.Linear(4, 1))
output = model(input_)

print('input_tensor shape :', input_.shape)
print('output_tensor shape: ', output.shape)

input_tensor shape : torch.Size([5, 3])
output_tensor shape:  torch.Size([5, 1])


In [24]:
# We can get all the parameters associated with model(linear layer) as follows
for name, param in  model.named_parameters():
  print(name, param)

hidden_layer.weight Parameter containing:
tensor([[-0.0107,  0.2627, -0.2292],
        [-0.0197, -0.1751, -0.3574],
        [ 0.3438,  0.1353,  0.0849],
        [ 0.0635,  0.3004,  0.3145]], requires_grad=True)
hidden_layer.bias Parameter containing:
tensor([-0.4595, -0.5556,  0.4239, -0.3555], requires_grad=True)
output_layer.weight Parameter containing:
tensor([[-0.0532, -0.2842,  0.0963,  0.0406]], requires_grad=True)
output_layer.bias Parameter containing:
tensor([0.2836], requires_grad=True)


In [25]:
print(model.state_dict())

OrderedDict([('hidden_layer.weight', tensor([[-0.0107,  0.2627, -0.2292],
        [-0.0197, -0.1751, -0.3574],
        [ 0.3438,  0.1353,  0.0849],
        [ 0.0635,  0.3004,  0.3145]])), ('hidden_layer.bias', tensor([-0.4595, -0.5556,  0.4239, -0.3555])), ('output_layer.weight', tensor([[-0.0532, -0.2842,  0.0963,  0.0406]])), ('output_layer.bias', tensor([0.2836]))])
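
As an alternative to `add_module`, `nn.Sequential` also accepts an `OrderedDict` of named layers. The following sketch builds the same network that way; the variable name `named_model` is hypothetical.

```python
from collections import OrderedDict

import torch
import torch.nn as nn

named_model = nn.Sequential(OrderedDict([
    ('hidden_layer', nn.Linear(3, 4)),
    ('output_layer', nn.Linear(4, 1)),
]))

# Parameter names are prefixed with the layer names we chose
param_names = [name for name, _ in named_model.named_parameters()]
print(param_names)
# ['hidden_layer.weight', 'hidden_layer.bias', 'output_layer.weight', 'output_layer.bias']

# Layers can be reached by name or by position
same_layer = named_model[1] is named_model.output_layer
print(same_layer)  # True

named_output = named_model(torch.arange(15).view(5, 3).float())
print(named_output.shape)  # torch.Size([5, 1])
```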


# <font color = 'indianred'>**Mean Squared Error Loss (nn.MSELoss())**

PyTorch implements many common loss functions including `MSELoss` and `CrossEntropyLoss`. We will discuss `MSELoss()` in this lecture. We will explore `CrossEntropyLoss` in coming lectures.

Suppose our inputs and outputs are as follows:

`x = [0, 1, 2, 3, 4]`

`y = [1, 3, 5, 7, 9]`

But our model predicts with the equation `y = 2 * x`, so the predicted output contains an error:

`ypred = [0, 2, 4, 6, 8] `

Mean Squared Error (MSE) = $\frac{\sum_{i=1}^{n} (ypred_i  - y_i)^2} {n}$. Here, n = number of elements.

For the above example, every error is $ypred_i - y_i = -1$, so every squared error is 1 and the loss is 1.0.

Earlier we wrote a function to implement MSE. We can also use the `nn.MSELoss()` module from PyTorch to calculate the loss.



In [26]:
# Manual implementation of the Mean Squared Error loss function
def mse_loss(ypred, y):
    """
    Computes the mean squared error loss between predicted and actual labels.

    Args:
        ypred: a tensor of shape (num_examples, 1) containing the predicted labels
        y: a tensor of shape (num_examples, 1) containing the actual labels

    Returns:
        A scalar tensor containing the mean squared error loss
    """
    error = ypred - y
    mean_squared_error = torch.mean(error**2)
    return mean_squared_error

# As a class: nn.MSELoss(...) creates a callable module object
loss_nn = nn.MSELoss(reduction='mean')


# As a function: F.mse_loss is a plain function that we call directly
loss_functional = F.mse_loss

# when we specify reduction = 'mean' - this will give us mean sqaured loss
# if reduction = 'sum' - this will give us total squared loss
# reduction = 'mean' is the default

# inputs
x = torch.Tensor([0, 1, 2, 3, 4])
y = torch.Tensor([1, 3, 5, 7, 9])

# output
ypred = 2 * x

# Calculating loss
# Loss functions take 2 inputs: predicted labels first, actual labels second.
# (MSE is symmetric, so the value is the same either way, but the order matters for other losses.)
loss_manual = mse_loss(ypred, y)
loss_nn_module = loss_nn(ypred, y)
loss_functional_value = loss_functional(ypred, y)
print(loss_manual, loss_nn_module, loss_functional_value)

tensor(1.) tensor(1.) tensor(1.)
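
The effect of the `reduction` argument can be seen on the same numbers. This sketch uses hypothetical variable names and also shows `reduction='none'`, which returns the squared error of each element without aggregating.

```python
import torch
import torch.nn as nn

x_vals = torch.tensor([0., 1., 2., 3., 4.])
y_true = torch.tensor([1., 3., 5., 7., 9.])
y_pred = 2 * x_vals

# Squared error of every element, no aggregation
per_element = nn.MSELoss(reduction='none')(y_pred, y_true)

# Sum of squared errors
total_loss = nn.MSELoss(reduction='sum')(y_pred, y_true)

# Mean of squared errors (the default)
mean_loss = nn.MSELoss(reduction='mean')(y_pred, y_true)

print(per_element)  # tensor([1., 1., 1., 1., 1.])
print(total_loss)   # tensor(5.)
print(mean_loss)    # tensor(1.)
```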


# <font color = 'indianred'>**torch.optim**
We can implement a number of gradient-based optimization methods using `torch.optim`. **SGD (Stochastic Gradient Descent)** is the most basic of them and **Adam** is one of the most popular. We will use SGD in this notebook and cover other optimizers in a later lecture.

An optimizer takes the **model parameters** we want to update (learnable parameters) and the **learning rate** (and some other hyper-parameters as well).

Optimizers do not compute the gradients on their own; we need to call **backward()** on the loss first.

We can then use the optimizer's **step()** method to update the model parameters.

Further, we do not need to zero the gradients one by one. We can invoke the optimizer’s **zero_grad()** method.

This resets the gradients of all the parameters that were passed to the optimizer. Recent PyTorch versions do this by setting each `.grad` to `None` by default, while older versions called `zero_()` on each gradient.

In [27]:
# create a simple model
model = nn.Linear(3, 1)

# create a simple dataset
X = torch.tensor([[1., 3., 4.]])
y = torch.tensor([[2.]])

# create our optimizer
optim = torch.optim.SGD(model.parameters(), lr=1e-2)

# loss function
criterion = nn.MSELoss()

y_hat = model(X)

print('model params before weight update:', model.weight.data, model.bias.data)

# calculate loss
loss = criterion(y_hat, y)

# reset gradients to zero
optim.zero_grad()

# calculate gradients
loss.backward()

# update weights
optim.step()


print('model params after weight update:', model.weight.data, model.bias.data)


model params before weight update: tensor([[ 0.1606, -0.1262, -0.1093]]) tensor([-0.4791])
model params after weight update: tensor([[0.2233, 0.0618, 0.1414]]) tensor([-0.4164])
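
For plain SGD (no momentum or weight decay), `step()` applies the update `param = param - lr * param.grad`. The sketch below, with hypothetical variable names, computes this update by hand before calling `step()` and compares the two results.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
sgd_layer = nn.Linear(3, 1)
sgd_features = torch.tensor([[1., 3., 4.]])
sgd_target = torch.tensor([[2.]])

lr = 1e-2
sgd = torch.optim.SGD(sgd_layer.parameters(), lr=lr)

loss = nn.MSELoss()(sgd_layer(sgd_features), sgd_target)
sgd.zero_grad()
loss.backward()

# Update computed by hand, before the optimizer changes the parameters
expected_weight = sgd_layer.weight.detach() - lr * sgd_layer.weight.grad
expected_bias = sgd_layer.bias.detach() - lr * sgd_layer.bias.grad

sgd.step()

step_matches = (
    torch.allclose(sgd_layer.weight.detach(), expected_weight)
    and torch.allclose(sgd_layer.bias.detach(), expected_bias)
)
print(step_matches)  # True
```

The same arithmetic reproduces the output shown above: the prediction is about -1.1343, the error is about -3.1343, the gradient with respect to the bias is about -6.2686, and the bias moves from -0.4791 to -0.4164.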


# <font color = 'indianred'>**Dataset and Dataloader**

When we train our model, we typically

  - want to process the data in batches
  - reshuffle the data at every epoch to reduce model overfitting,
  - and use Python’s multiprocessing to speed up data retrieval.

Earlier we wrote a function to create an iterator that shuffles the data and yields batches of data. However, we can do this much more efficiently using **torch.utils.data.DataLoader**, which is an iterable that provides all the above features.

The most important argument of the DataLoader constructor is `dataset`, which is a PyTorch Dataset. A PyTorch **Dataset** is a regular **Python class** that inherits from the [**Dataset**](https://pytorch.org/docs/stable/data.html#torch.utils.data.Dataset) class.

If a dataset consists of tensors of labels and features, we can use PyTorch’s [**TensorDataset**](https://pytorch.org/docs/stable/data.html#torch.utils.data.TensorDataset) class to wrap the tensors in a Dataset class.

If the **dataset is big** (tens of thousands of image/text files, for instance), loading it at once would not be memory efficient. In that case we will need to create a custom dataset class that loads the files/examples on demand. We will demonstrate how to create a CustomDataset that inherits from PyTorch's Dataset class later.
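
As a preview, a map-style dataset only needs `__len__` and `__getitem__`. The following is a minimal sketch, not the CustomDataset we will build later: the class `SquaresDataset` is hypothetical, and it generates each example on demand as a stand-in for reading a file from disk.

```python
import torch
from torch.utils.data import Dataset


class SquaresDataset(Dataset):
    """Hypothetical dataset that creates (x, x**2) pairs on demand."""

    def __init__(self, num_examples):
        self.num_examples = num_examples

    def __len__(self):
        return self.num_examples

    def __getitem__(self, idx):
        if not 0 <= idx < self.num_examples:
            raise IndexError(idx)
        # In a real dataset, this is where a file would be loaded
        feature = torch.tensor([float(idx)])
        label = feature ** 2
        return feature, label


squares = SquaresDataset(4)
print(len(squares))  # 4
print(squares[3])    # (tensor([3.]), tensor([9.]))
```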

In [28]:
# Generate Dataset
x = torch.arange(10).view(5, 2)
x = x.type(dtype = torch.float)
w = torch.Tensor([2, 3]).view(-1, 1)
y = x.mm(w) + 1
print(f'x:{x}' )
print(f'\ny: {y}')

x:tensor([[0., 1.],
        [2., 3.],
        [4., 5.],
        [6., 7.],
        [8., 9.]])

y: tensor([[ 4.],
        [14.],
        [24.],
        [34.],
        [44.]])


In [29]:
# Create Dataset
dataset = data.TensorDataset(x, y)

In [30]:
dataset

<torch.utils.data.dataset.TensorDataset at 0x7a3238cb6290>

In [31]:
dataset[1]

(tensor([2., 3.]), tensor([14.]))

In [32]:
# Create DataLoader
data_iter = data.DataLoader(dataset, batch_size= 2, shuffle= True)

In [33]:
# We can loop over the DataLoader object to get batch of observations

for epoch in range(3):
  print(f'\nEpoch {epoch + 1}\n')
  for i, (x, y) in enumerate(data_iter):
    print(f'Batch Number {i+1}')
    print(f'x:{x}' )
    print(f'y: {y}\n')


Epoch 1

Batch Number 1
x:tensor([[6., 7.],
        [2., 3.]])
y: tensor([[34.],
        [14.]])

Batch Number 2
x:tensor([[8., 9.],
        [4., 5.]])
y: tensor([[44.],
        [24.]])

Batch Number 3
x:tensor([[0., 1.]])
y: tensor([[4.]])


Epoch 2

Batch Number 1
x:tensor([[4., 5.],
        [2., 3.]])
y: tensor([[24.],
        [14.]])

Batch Number 2
x:tensor([[8., 9.],
        [6., 7.]])
y: tensor([[44.],
        [34.]])

Batch Number 3
x:tensor([[0., 1.]])
y: tensor([[4.]])


Epoch 3

Batch Number 1
x:tensor([[8., 9.],
        [2., 3.]])
y: tensor([[44.],
        [14.]])

Batch Number 2
x:tensor([[0., 1.],
        [6., 7.]])
y: tensor([[ 4.],
        [34.]])

Batch Number 3
x:tensor([[4., 5.]])
y: tensor([[24.]])
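
Each epoch above ends with a batch that holds a single observation, because 5 observations cannot be split evenly into batches of 2. The sketch below (hypothetical variable names and toy labels) shows how the number of batches is determined and how the `drop_last` argument of `DataLoader` discards the incomplete batch.

```python
import math

import torch
from torch.utils import data

toy_features = torch.arange(10).view(5, 2).float()
toy_labels = torch.arange(5).view(5, 1).float()
toy_dataset = data.TensorDataset(toy_features, toy_labels)

keep_last = data.DataLoader(toy_dataset, batch_size=2, shuffle=True)
drop_last = data.DataLoader(toy_dataset, batch_size=2, shuffle=True, drop_last=True)

# Number of batches per epoch: ceil(5 / 2) = 3, or floor(5 / 2) = 2 with drop_last
print(len(keep_last), math.ceil(5 / 2))  # 3 3
print(len(drop_last))                    # 2

batch_sizes = [xb.shape[0] for xb, _ in keep_last]
print(batch_sizes)  # [2, 2, 1]
```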



We can observe that an observation can land in a different batch in each epoch. This happens because the DataLoader reshuffles the dataset at the start of every epoch before creating batches.
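
To see how the pieces from this lecture fit together, here is a minimal sketch of linear regression that combines `nn.Linear`, `nn.MSELoss`, `torch.optim.SGD`, `TensorDataset`, and `DataLoader`. The synthetic data, the learning rate, the batch size, the number of epochs, and all variable names are assumptions chosen for illustration.

```python
import torch
import torch.nn as nn
from torch.utils import data

torch.manual_seed(0)

# Synthetic, noise-free data: y = 2 * x1 + 3 * x2 + 1 (assumed for this sketch)
true_w = torch.tensor([[2.0], [3.0]])
true_b = 1.0
train_x = torch.randn(100, 2)
train_y = train_x @ true_w + true_b

train_loader = data.DataLoader(
    data.TensorDataset(train_x, train_y), batch_size=10, shuffle=True
)

regressor = nn.Linear(2, 1)
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(regressor.parameters(), lr=0.05)

for epoch in range(30):
    for xb, yb in train_loader:
        loss = criterion(regressor(xb), yb)  # forward pass and loss
        optimizer.zero_grad()                # reset old gradients
        loss.backward()                      # compute new gradients
        optimizer.step()                     # update the parameters

learned_w = regressor.weight.detach()
learned_b = regressor.bias.detach()
print('learned weights:', learned_w)
print('learned bias:', learned_b)
```

With noise-free data the learned weights and bias should end up close to the values used to generate the data (2, 3, and 1).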