# Explore `torch.nn.Module`

[`nn.Module`](https://pytorch.org/docs/stable/generated/torch.nn.Module.html) is the base class for neural network modules in PyTorch. In this notebook, we explore some attributes / methods of `nn.Module`.

Last week, we saw how to define a feed-forward neural network:

In [6]:
import torch
from torch import nn
import torch.nn.functional as F

In [7]:
class FashionCNN(nn.Module):
    def __init__(self):
        super(FashionCNN, self).__init__()
        
        self.layer1 = nn.Sequential(
            nn.Conv2d(in_channels=1, out_channels=32, kernel_size=3, padding=1),
            nn.BatchNorm2d(32),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2)
        )
        
        self.layer2 = nn.Sequential(
            nn.Conv2d(in_channels=32, out_channels=64, kernel_size=3),
            nn.BatchNorm2d(64),
            nn.ReLU(),
            nn.MaxPool2d(2)
        )
        
        self.fc1 = nn.Linear(in_features=64*6*6, out_features=600)
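        # Element-wise dropout: the input here is already flattened to (batch, features),
        # whereas nn.Dropout2d is meant for (batch, channels, H, W) feature maps.
        self.drop = nn.Dropout(0.25)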
        self.fc2 = nn.Linear(in_features=600, out_features=120)
        self.fc3 = nn.Linear(in_features=120, out_features=10)
        
    def forward(self, x):
        out = self.layer1(x)
        out = self.layer2(out)
        out = out.view(out.size(0), -1)
        out = F.relu(self.fc1(out))
        out = self.drop(out)
        out = F.relu(self.fc2(out))
        out = self.fc3(out)
        
        return out
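
The value `in_features=64*6*6` of `fc1` depends on the input resolution. The sketch below assumes 28×28 single-channel inputs (an assumption suggested by the class name, not stated in the notebook) and applies the standard output-size formula for convolution and pooling layers; the helper `conv_out` is a hypothetical name introduced for illustration.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer (floor division)."""
    return (size + 2 * padding - kernel) // stride + 1

size = 28                                   # assumed input height and width
size = conv_out(size, kernel=3, padding=1)  # layer1 convolution -> 28
size = conv_out(size, kernel=2, stride=2)   # layer1 max pooling -> 14
size = conv_out(size, kernel=3)             # layer2 convolution -> 12
size = conv_out(size, kernel=2, stride=2)   # layer2 max pooling -> 6

flat_features = 64 * size * size            # 64 output channels in layer2
print(size, flat_features)                  # 6 2304
```

With a different input resolution, `in_features` of `fc1` would have to be recomputed in the same way.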

This week we will use pre-built neural networks. [torchvision](https://pytorch.org/vision/stable/index.html) and [torchaudio](https://pytorch.org/audio/stable/index.html) provide implementations of various neural architectures, together with pretrained weights. This is similar to [tensorflow.keras.applications](https://keras.io/api/applications/).

**Exercise:** Initialize a ResNet34 classification model. You may want to have a look at [the documentation](https://pytorch.org/vision/stable/models.html#models-and-pre-trained-weights).

In [8]:
from torchvision import models


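# `weights=` replaces the deprecated `pretrained=True` argument (torchvision >= 0.13).
net = models.resnet34(weights=models.ResNet34_Weights.DEFAULT)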


In [9]:
net

ResNet(
  (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU(inplace=True)
  (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  (layer1): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (1): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
  

Several methods can be used to iterate over the parameters of the network, including `.parameters()`, `.named_parameters()` and `.state_dict()`.

In [10]:
count = 0
for param in net.parameters():
    print(param.shape)
    count += 1

print("*"*30)
print(f"number of parameter tensors is {count}")

torch.Size([64, 3, 7, 7])
torch.Size([64])
torch.Size([64])
torch.Size([64, 64, 3, 3])
torch.Size([64])
torch.Size([64])
torch.Size([64, 64, 3, 3])
torch.Size([64])
torch.Size([64])
torch.Size([64, 64, 3, 3])
torch.Size([64])
torch.Size([64])
torch.Size([64, 64, 3, 3])
torch.Size([64])
torch.Size([64])
torch.Size([64, 64, 3, 3])
torch.Size([64])
torch.Size([64])
torch.Size([64, 64, 3, 3])
torch.Size([64])
torch.Size([64])
torch.Size([128, 64, 3, 3])
torch.Size([128])
torch.Size([128])
torch.Size([128, 128, 3, 3])
torch.Size([128])
torch.Size([128])
torch.Size([128, 64, 1, 1])
torch.Size([128])
torch.Size([128])
torch.Size([128, 128, 3, 3])
torch.Size([128])
torch.Size([128])
torch.Size([128, 128, 3, 3])
torch.Size([128])
torch.Size([128])
torch.Size([128, 128, 3, 3])
torch.Size([128])
torch.Size([128])
torch.Size([128, 128, 3, 3])
torch.Size([128])
torch.Size([128])
torch.Size([128, 128, 3, 3])
torch.Size([128])
torch.Size([128])
torch.Size([128, 128, 3, 3])
torch.Size([128])
torch.Siz

In [11]:
count = 0
for key, value in net.named_parameters():
    count += 1 
    print(f"{key} has shape: {value.shape}")

print("*"*30)
print(f"number of named parameters is {count}")

conv1.weight has shape: torch.Size([64, 3, 7, 7])
bn1.weight has shape: torch.Size([64])
bn1.bias has shape: torch.Size([64])
layer1.0.conv1.weight has shape: torch.Size([64, 64, 3, 3])
layer1.0.bn1.weight has shape: torch.Size([64])
layer1.0.bn1.bias has shape: torch.Size([64])
layer1.0.conv2.weight has shape: torch.Size([64, 64, 3, 3])
layer1.0.bn2.weight has shape: torch.Size([64])
layer1.0.bn2.bias has shape: torch.Size([64])
layer1.1.conv1.weight has shape: torch.Size([64, 64, 3, 3])
layer1.1.bn1.weight has shape: torch.Size([64])
layer1.1.bn1.bias has shape: torch.Size([64])
layer1.1.conv2.weight has shape: torch.Size([64, 64, 3, 3])
layer1.1.bn2.weight has shape: torch.Size([64])
layer1.1.bn2.bias has shape: torch.Size([64])
layer1.2.conv1.weight has shape: torch.Size([64, 64, 3, 3])
layer1.2.bn1.weight has shape: torch.Size([64])
layer1.2.bn1.bias has shape: torch.Size([64])
layer1.2.conv2.weight has shape: torch.Size([64, 64, 3, 3])
layer1.2.bn2.weight has shape: torch.Size([6

In [12]:
for key, value in net.state_dict().items():
    print(f"{key} has shape: {value.shape}")

print("*"*30)
print(f"number of entries in the state dict is {len(net.state_dict())}")

conv1.weight has shape: torch.Size([64, 3, 7, 7])
bn1.weight has shape: torch.Size([64])
bn1.bias has shape: torch.Size([64])
bn1.running_mean has shape: torch.Size([64])
bn1.running_var has shape: torch.Size([64])
bn1.num_batches_tracked has shape: torch.Size([])
layer1.0.conv1.weight has shape: torch.Size([64, 64, 3, 3])
layer1.0.bn1.weight has shape: torch.Size([64])
layer1.0.bn1.bias has shape: torch.Size([64])
layer1.0.bn1.running_mean has shape: torch.Size([64])
layer1.0.bn1.running_var has shape: torch.Size([64])
layer1.0.bn1.num_batches_tracked has shape: torch.Size([])
layer1.0.conv2.weight has shape: torch.Size([64, 64, 3, 3])
layer1.0.bn2.weight has shape: torch.Size([64])
layer1.0.bn2.bias has shape: torch.Size([64])
layer1.0.bn2.running_mean has shape: torch.Size([64])
layer1.0.bn2.running_var has shape: torch.Size([64])
layer1.0.bn2.num_batches_tracked has shape: torch.Size([])
layer1.1.conv1.weight has shape: torch.Size([64, 64, 3, 3])
layer1.1.bn1.weight has shape: torc
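
The outputs above suggest that `.state_dict()` lists more entries than `.named_parameters()`: the extra keys (`running_mean`, `running_var`, `num_batches_tracked`) are buffers kept by the batch-normalization layers. As a minimal sketch, the hypothetical two-layer model `tiny` below (not part of the notebook) makes the relationship explicit and also contrasts the number of parameter tensors with the number of scalar weights.

```python
import torch
from torch import nn

# Tiny hypothetical model: one linear layer followed by batch normalization.
tiny = nn.Sequential(nn.Linear(4, 3), nn.BatchNorm1d(3))

param_names = [name for name, _ in tiny.named_parameters()]   # learnable tensors
buffer_names = [name for name, _ in tiny.named_buffers()]     # BatchNorm statistics
state_keys = list(tiny.state_dict().keys())                   # parameters and buffers

print(param_names)
print(buffer_names)
print(len(state_keys) == len(param_names) + len(buffer_names))

# Number of parameter tensors vs. total number of scalar weights.
n_tensors = len(list(tiny.parameters()))
n_scalars = sum(p.numel() for p in tiny.parameters())
print(n_tensors, n_scalars)
```

The same `sum(p.numel() for p in net.parameters())` expression can be applied to `net` to count its scalar weights.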

Let's have a look at one parameter returned by the method `.parameters()` (here `param`, the last one yielded by the loop over `net.parameters()` above):

In [13]:
type(param)

torch.nn.parameter.Parameter

In [14]:
help(param)

Help on Parameter in module torch.nn.parameter object:

class Parameter(torch.Tensor)
 |  Parameter(data=None, requires_grad=True)
 |  
 |  A kind of Tensor that is to be considered a module parameter.
 |  
 |  Parameters are :class:`~torch.Tensor` subclasses, that have a
 |  very special property when used with :class:`Module` s - when they're
 |  assigned as Module attributes they are automatically added to the list of
 |  its parameters, and will appear e.g. in :meth:`~Module.parameters` iterator.
 |  Assigning a Tensor doesn't have such effect. This is because one might
 |  want to cache some temporary state, like last hidden state of the RNN, in
 |  the model. If there was no such class as :class:`Parameter`, these
 |  temporaries would get registered too.
 |  
 |  Args:
 |      data (Tensor): parameter tensor.
 |      requires_grad (bool, optional): if the parameter requires gradient. See
 |          :ref:`locally-disable-grad-doc` for more details. Default: `True`
 |  
 |  Metho
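
The docstring above says that assigning a `Parameter` as a module attribute registers it, while assigning a plain tensor does not. A minimal sketch of that behaviour follows; the class `Demo` and its attribute names are hypothetical, and `register_buffer` is included only to show how non-trainable state (such as the batch-norm statistics seen earlier) ends up in the state dict.

```python
import torch
from torch import nn

class Demo(nn.Module):  # hypothetical module, for illustration only
    def __init__(self):
        super().__init__()
        self.weight = nn.Parameter(torch.zeros(3))        # registered as a parameter
        self.cache = torch.zeros(3)                       # plain tensor: not registered
        self.register_buffer("running", torch.zeros(3))   # saved state, not trained

demo = Demo()
registered = [name for name, _ in demo.named_parameters()]
state_keys = list(demo.state_dict().keys())

print(registered)   # only the nn.Parameter attribute
print(state_keys)   # the parameter and the buffer, but not `cache`
```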

In [15]:
print(param.data)

tensor([ 2.4252e-05, -3.5940e-03, -1.1231e-02, -1.3950e-02,  1.2392e-02,
        -2.2924e-03, -5.5472e-03,  6.4447e-03,  5.3315e-03, -1.0412e-02,
        -9.0748e-03, -5.0182e-03, -9.0289e-03, -1.1763e-02, -1.2532e-02,
        -1.3428e-02, -9.4909e-03, -1.2215e-02,  1.7320e-03, -1.0958e-02,
         5.8611e-03,  1.3109e-02, -3.9436e-04,  1.1868e-03, -1.9697e-03,
         3.4549e-03,  1.2648e-02, -9.3525e-03,  1.6830e-03, -2.6058e-03,
        -7.1322e-03, -2.1987e-03, -5.6401e-04, -4.7328e-03, -4.6197e-03,
        -1.3864e-02, -1.1094e-03, -1.3028e-02,  4.6251e-03, -2.2866e-03,
        -2.6645e-03, -1.6517e-03,  3.4688e-03, -1.1326e-02,  1.1942e-02,
        -7.5590e-04,  1.6800e-02, -1.9134e-02, -1.7240e-02, -3.0990e-03,
         1.4421e-02, -1.9972e-02,  1.4029e-02,  9.2369e-03,  1.3318e-02,
         5.3759e-03,  1.9041e-02, -1.8332e-03,  6.9949e-03,  1.6575e-02,
        -4.4368e-03,  7.7840e-03,  6.2909e-04,  9.6192e-03,  1.9924e-02,
         1.0974e-02, -8.5309e-03, -9.9378e-04,  9.3

In [16]:
print(param)

Parameter containing:
tensor([ 2.4252e-05, -3.5940e-03, -1.1231e-02, -1.3950e-02,  1.2392e-02,
        -2.2924e-03, -5.5472e-03,  6.4447e-03,  5.3315e-03, -1.0412e-02,
        -9.0748e-03, -5.0182e-03, -9.0289e-03, -1.1763e-02, -1.2532e-02,
        -1.3428e-02, -9.4909e-03, -1.2215e-02,  1.7320e-03, -1.0958e-02,
         5.8611e-03,  1.3109e-02, -3.9436e-04,  1.1868e-03, -1.9697e-03,
         3.4549e-03,  1.2648e-02, -9.3525e-03,  1.6830e-03, -2.6058e-03,
        -7.1322e-03, -2.1987e-03, -5.6401e-04, -4.7328e-03, -4.6197e-03,
        -1.3864e-02, -1.1094e-03, -1.3028e-02,  4.6251e-03, -2.2866e-03,
        -2.6645e-03, -1.6517e-03,  3.4688e-03, -1.1326e-02,  1.1942e-02,
        -7.5590e-04,  1.6800e-02, -1.9134e-02, -1.7240e-02, -3.0990e-03,
         1.4421e-02, -1.9972e-02,  1.4029e-02,  9.2369e-03,  1.3318e-02,
         5.3759e-03,  1.9041e-02, -1.8332e-03,  6.9949e-03,  1.6575e-02,
        -4.4368e-03,  7.7840e-03,  6.2909e-04,  9.6192e-03,  1.9924e-02,
         1.0974e-02, -8.5309e

In [17]:
print(param.grad)

None


Note that gradients are initially `None`. Let's look at what happens after one forward pass and one backward pass.

**Exercise:** Generate a random batch $(X, y)$, where $X$ has shape `(8, 3, 224, 224)` and $y$ has shape `(8,)` (one class index per sample). Run a forward pass and then a backward pass on the generated batch (use `nn.CrossEntropyLoss` as the loss function), and print a parameter (for example the bias of the output layer).

In [43]:
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()

data = torch.randn(size=(8, 3 , 224, 224))
# `high` is exclusive, so high=1000 covers all 1000 ImageNet class indices (0..999).
targets = torch.randint(low=0, high=1000, size=(8,))

out = net(data)
loss = criterion(out, targets)

loss.backward()
loss

tensor(7.8546, grad_fn=<NllLossBackward0>)

In [35]:
for param in net.parameters():
    print(param.grad.shape)


torch.Size([64, 3, 7, 7])
torch.Size([64])
torch.Size([64])
torch.Size([64, 64, 3, 3])
torch.Size([64])
torch.Size([64])
torch.Size([64, 64, 3, 3])
torch.Size([64])
torch.Size([64])
torch.Size([64, 64, 3, 3])
torch.Size([64])
torch.Size([64])
torch.Size([64, 64, 3, 3])
torch.Size([64])
torch.Size([64])
torch.Size([64, 64, 3, 3])
torch.Size([64])
torch.Size([64])
torch.Size([64, 64, 3, 3])
torch.Size([64])
torch.Size([64])
torch.Size([128, 64, 3, 3])
torch.Size([128])
torch.Size([128])
torch.Size([128, 128, 3, 3])
torch.Size([128])
torch.Size([128])
torch.Size([128, 64, 1, 1])
torch.Size([128])
torch.Size([128])
torch.Size([128, 128, 3, 3])
torch.Size([128])
torch.Size([128])
torch.Size([128, 128, 3, 3])
torch.Size([128])
torch.Size([128])
torch.Size([128, 128, 3, 3])
torch.Size([128])
torch.Size([128])
torch.Size([128, 128, 3, 3])
torch.Size([128])
torch.Size([128])
torch.Size([128, 128, 3, 3])
torch.Size([128])
torch.Size([128])
torch.Size([128, 128, 3, 3])
torch.Size([128])
torch.Siz
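
The shapes printed above match the parameter shapes one for one. As a small self-contained check, the sketch below uses a hypothetical single-layer model (not the ResNet) to confirm that `.grad` is `None` before the backward pass and has the same shape as its parameter afterwards.

```python
import torch
from torch import nn

torch.manual_seed(0)
layer = nn.Linear(5, 3)                        # hypothetical tiny model
criterion = nn.CrossEntropyLoss()

x = torch.randn(8, 5)                          # batch of 8 inputs
y = torch.randint(low=0, high=3, size=(8,))    # class indices in {0, 1, 2}

# Before any backward pass, no gradient has been computed yet.
grads_before = [p.grad for p in layer.parameters()]

loss = criterion(layer(x), y)
loss.backward()

# After the backward pass, each gradient has the shape of its parameter.
for name, p in layer.named_parameters():
    print(name, tuple(p.shape), tuple(p.grad.shape))
```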

In [None]:
print(param.is_leaf)

True


In [None]:
print(loss.is_leaf)

False
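
The two results above reflect how autograd classifies tensors: a tensor created directly (such as a module parameter) is a leaf, while a tensor produced by operations on tensors that require gradients (such as the loss) is not and carries a `grad_fn`. A minimal sketch with hypothetical tensors `w` and `y`:

```python
import torch

w = torch.randn(3, requires_grad=True)   # created directly by the user: a leaf
y = (w * 2).sum()                        # result of operations on w: not a leaf

print(w.is_leaf, w.grad_fn)              # leaf tensors have no grad_fn
print(y.is_leaf, y.grad_fn is not None)  # non-leaf tensors record how they were computed

y.backward()
print(w.grad)                            # d/dw sum(2 * w) = 2 for every entry
```

By default, `.grad` is populated only for leaf tensors that require gradients, which is why the gradients are read from the parameters rather than from `loss`.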
