# Modules

In PyTorch, most of what you need to define a neural network lives in `torch.nn`. The central class is `torch.nn.Module`, which provides the methods needed for automatic differentiation and training. All pre-defined modules (also called layers, blocks, models, etc.) are subclasses of `torch.nn.Module`. First, we will introduce some basic pre-defined modules that you will use most often. Then, we will go over defining a custom module, which can constitute an arbitrary model that you have designed and want to implement in code.

First, we will use a simple linear layer as an example and go over some of its key methods and attributes.

In [None]:
import torch
from torch import nn

linear_module = nn.Linear(in_features=3, out_features=4, bias=True)

print(linear_module)
print(20*'-')

print('Parameters:')
for name, param in linear_module.named_parameters():
    print('parameter name:', name)
    print('parameter shape:', param.shape)
    print('parameter value:', param)
print(20*'-')


print('Weight:')
print(linear_module.weight)
print(20*'-')

print('Bias:')
print(linear_module.bias)
print(20*'-')


# Moving the module to the GPU if one is available (otherwise it stays on the CPU):
device = 'cuda' if torch.cuda.is_available() else 'cpu'
linear_module.to(device)

print('Weight device:')
print(linear_module.weight.device)
print(20*'-')

# Freezing the module so it is not trainable anymore:
linear_module.requires_grad_(False)

print('Is the weight trainable?')
print(linear_module.weight.requires_grad)
print(20*'-')

# Making it trainable again:
linear_module.requires_grad_(True)

# Let's try to pass an input (it must be on the same device as the module):

input_tensor = torch.tensor([[1.0, 2.0, 3.0]]).to(device)
output = linear_module(input_tensor)
print('Output:')
print(output)

Apart from the specific parameter names such as `weight` and `bias`, the methods used above belong to the superclass `nn.Module`, so you can call or access them on any instance of `nn.Module`.
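To illustrate that these methods are not specific to `nn.Linear`, here is a minimal sketch that applies the same calls to a different pre-defined module. `nn.Conv2d` is chosen here only as an example and is not otherwise used in this section, and the cast to `float64` stands in for a device move so that the snippet runs without a GPU.

```python
import torch
from torch import nn

# Any nn.Module subclass exposes the same interface; nn.Conv2d is only an illustration.
conv_module = nn.Conv2d(in_channels=1, out_channels=2, kernel_size=3)

# named_parameters() works the same way as for nn.Linear
param_shapes = {name: tuple(param.shape) for name, param in conv_module.named_parameters()}
print(param_shapes)

# requires_grad_() freezes or unfreezes all parameters in place
conv_module.requires_grad_(False)
all_frozen = all(not p.requires_grad for p in conv_module.parameters())
print('All parameters frozen?', all_frozen)
conv_module.requires_grad_(True)

# to() moves or casts all parameters; here we cast to float64 instead of moving to a GPU
conv_module.to(torch.float64)
print('Weight dtype:', conv_module.weight.dtype)
```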

# Defining a fully connected network

Now, let's use some pre-defined modules to create a fully connected network. We will use linear modules and some activation modules such as ReLU, Sigmoid, Tanh, and LeakyReLU. You can find more in the PyTorch documentation, through a quick web search, or by asking your favorite AI chatbot.

In [None]:
import torch
from torch import nn

"""
You can create a neural network that is simply a sequence of layers by using the nn.Sequential class.
You can define linear layers and activation layers from nn as well.
"""

fully_connected_network = nn.Sequential(
    nn.Linear(in_features=3, out_features=32),
    nn.ReLU(),
    nn.Linear(in_features=32, out_features=64),
    nn.LeakyReLU(negative_slope=0.01),
    nn.Linear(in_features=64, out_features=128),
    nn.Tanh(),
    nn.Linear(in_features=128, out_features=1),
    nn.Sigmoid(),
)

# Let's try to pass an input tensor to this model and check the output shape:

batch_size = 10

input_tensor = torch.randn(batch_size, 3)

output_tensor : torch.Tensor = fully_connected_network(input_tensor)

print('Input tensor shape:', input_tensor.shape)
print('Output tensor shape:', output_tensor.shape)
print('Does the input have gradients?', input_tensor.requires_grad)
print('Does the output have gradients?', output_tensor.requires_grad)
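The output reports `requires_grad=True` even though the input does not, because it was computed from parameters that require gradients. The following sketch contrasts a normal forward pass with one run inside `torch.no_grad()`. The small `model` below is a stand-in assumed for illustration, not the network defined above.

```python
import torch
from torch import nn

# Small stand-in model, assumed only for this illustration
model = nn.Sequential(nn.Linear(3, 8), nn.ReLU(), nn.Linear(8, 1))
x = torch.randn(10, 3)

# Normal forward pass: the output is attached to the autograd graph
# because the parameters it was computed from require gradients.
y_tracked = model(x)
print(x.requires_grad, y_tracked.requires_grad)

# Inside torch.no_grad(), no graph is recorded (useful for inference).
with torch.no_grad():
    y_untracked = model(x)
print(y_untracked.requires_grad)
```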


In [None]:
# Let's inspect the model:

print(fully_connected_network)

In [None]:
# getting a certain module by indexing:
print(fully_connected_network[0])

In [None]:
# Going over the parameters:
for name, param in fully_connected_network.named_parameters():
    print('parameter name:', name)
    print('parameter shape:', param.shape)
    print(20*'-')
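A common use of iterating over the parameters is counting them. The sketch below, on a smaller stand-in network assumed for illustration, sums `numel()` over all parameters and then over only the trainable ones after freezing the first layer.

```python
from torch import nn

# Smaller stand-in network, assumed only for this illustration
model = nn.Sequential(
    nn.Linear(in_features=3, out_features=32),
    nn.ReLU(),
    nn.Linear(in_features=32, out_features=1),
)

# numel() returns the number of elements in a tensor
n_total = sum(p.numel() for p in model.parameters())
print('total parameters:', n_total)

# Freeze the first linear layer, then count only what is still trainable
model[0].requires_grad_(False)
n_trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print('trainable parameters:', n_trainable)
```

The first layer holds 3*32 weights plus 32 biases and the last holds 32 weights plus 1 bias, so the totals can be checked by hand.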

In [None]:
"""
Can we get them all in one package instead of using a loop?
"""

# STATE DICT: a dictionary containing the name and content of the parameters of the model.

state_dict = fully_connected_network.state_dict()

for key, value in state_dict.items():
    print('key:', key)
    print('value shape:', value.shape)
    print(20*'-')

In [None]:
"""
For a model with a defined architecture, you can save and load the state_dict to and from a file.
"""

# Saving the state_dict to a file:
torch.save(state_dict, 'state_dict.pth')

# Loading the state_dict from a file:
loaded_state_dict = torch.load('state_dict.pth')

# Loading the state_dict to the model:
fully_connected_network.load_state_dict(loaded_state_dict)

Remember these methods: they are how model parameters are saved for publication and submission purposes as well. As mentioned before, you can use these methods on all modules, especially your custom modules! We will go over custom modules in the next section.
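As a sanity check of the save and load workflow, the sketch below copies the parameters of one model into a second, freshly initialized model with the same architecture and confirms that both then produce the same output. The helper `build_model` and the temporary directory are assumptions made for this illustration.

```python
import os
import tempfile

import torch
from torch import nn


def build_model() -> nn.Module:
    # Hypothetical helper: both models must share the same architecture
    return nn.Sequential(nn.Linear(3, 16), nn.ReLU(), nn.Linear(16, 1))


source_model = build_model()
target_model = build_model()  # has a different random initialization

x = torch.randn(5, 3)

# Save to a temporary file and read it back
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'state_dict.pth')
    torch.save(source_model.state_dict(), path)
    loaded_state_dict = torch.load(path)

# load_state_dict copies the saved tensors into the matching parameters
result = target_model.load_state_dict(loaded_state_dict)
print(result)

with torch.no_grad():
    outputs_match = torch.allclose(source_model(x), target_model(x))
print('Outputs match after loading?', outputs_match)
```

Note that the file stores only the parameter values, so the code that builds the architecture still has to be available when loading.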

# Defining custom modules

To define an arbitrary model in PyTorch, you need to define it as a class that inherits from `nn.Module` (it has all the methods and attributes of `nn.Module`, plus whatever you add). You then have to implement at least two methods so that the model can be trained and used like any other module.

- `.__init__(self, ...)`
    - First, you have to call `super().__init__()` to run the initialization of the superclass, which is `nn.Module`.

    - Then, you define all the components of your model. You can use pre-defined modules like `nn.Linear`, `nn.Sequential`, `nn.Conv2d`, and many more. If you want to know if PyTorch has a certain model or layer pre-defined and ready to use, you can do a simple search and find out. 

    - To define a flexible number of modules, use `nn.ModuleList` and `nn.ModuleDict` and fill them with your desired modules. You can then use them like a normal list or dictionary (indexing them by position or key). **BE CAREFUL** not to store your modules in a plain Python list or dictionary: modules stored that way are not registered with the parent module, so their parameters will not appear in `.parameters()` or `.state_dict()`, will not be moved by `.to()`, and will not be updated by an optimizer built from `.parameters()`. The elements of `nn.ModuleList` and `nn.ModuleDict` can be any `nn.Module`, including another `nn.ModuleList` or `nn.ModuleDict`. Try to keep things soft-coded, flexible, and not too complicated. You should be able to change the size of your model (number of layers, neurons, etc.) just by changing the arguments you pass at instantiation, without changing the code of the model class.

    - You can define custom parameters using `nn.Parameter`, or a list or dictionary of parameters using `nn.ParameterList` and `nn.ParameterDict`. You will find these useful for defining models with flexible sizes. You can fill a `ParameterList` or `ParameterDict` in a loop whose length depends on how you define and initialize your model. You can use these parameters like any other tensor in elementwise calculations or matrix and tensor multiplications. Make sure not to store them in plain lists or dictionaries, because they will not be registered with the module and training will not update them! You will probably not need to define custom parameters, but keep this option in mind just in case.

- `.forward(self, ...)`
    - Here, you use the modules and parameters you defined in `__init__`, along with any other calculations you need, to define the forward pass of the model. Do not create new trainable components in this method: every component that you want to train should already have been defined in `__init__`. You then access them as attributes of `self` and pass your input through them.
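The warning about plain lists can be checked directly. In the sketch below, two otherwise identical models (both hypothetical, written only for this comparison) store their layers in a plain Python list and in an `nn.ModuleList`, and we count how many parameter tensors each one exposes.

```python
from torch import nn


class PlainListModel(nn.Module):
    def __init__(self):
        super().__init__()
        # Plain Python list: these layers are NOT registered as submodules
        self.layers = [nn.Linear(3, 4), nn.Linear(4, 1)]

    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        return x


class ModuleListModel(nn.Module):
    def __init__(self):
        super().__init__()
        # nn.ModuleList: each layer is registered with the parent module
        self.layers = nn.ModuleList([nn.Linear(3, 4), nn.Linear(4, 1)])

    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        return x


n_plain = len(list(PlainListModel().parameters()))
n_registered = len(list(ModuleListModel().parameters()))
print('plain list:', n_plain, '| ModuleList:', n_registered)
```

The plain-list model still runs a forward pass, which is what makes this mistake easy to miss: it only shows up when the parameters are collected for training, saving, or moving to another device.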


PyTorch will take care of the rest. We will teach you how to utilize a dataset and write the code to train a model next week. 
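As a small illustration of what PyTorch handles for you, the sketch below defines a module with two custom `nn.Parameter` attributes and calls `backward()` on a scalar computed from its output. The module `ScaleAndShift` is a hypothetical example, not something provided by PyTorch.

```python
import torch
from torch import nn


class ScaleAndShift(nn.Module):
    # Hypothetical example module: y = x * scale + shift, applied elementwise
    def __init__(self, n_features: int):
        super().__init__()
        # nn.Parameter attributes are registered automatically
        self.scale = nn.Parameter(torch.ones(n_features))
        self.shift = nn.Parameter(torch.zeros(n_features))

    def forward(self, x):
        return x * self.scale + self.shift


module = ScaleAndShift(n_features=3)
x = torch.randn(10, 3)

# Any scalar works here; the sum is used only to have something to differentiate
loss = module(x).sum()
loss.backward()

parameter_names = [name for name, _ in module.named_parameters()]
print(parameter_names)
print(module.scale.grad.shape, module.shift.grad.shape)
```

After `backward()`, each registered parameter has a `.grad` tensor of the same shape as the parameter, without any gradient code written by hand.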

For now, let's go over some examples of defining a layer with arbitrary components and operations.

In [None]:
import torch
from torch import nn


# A 2-layer network with a hidden layer
class Custom_Module_1(nn.Module):

    def __init__(
            self,
            in_features: int,
            out_features: int,
            hidden_dim: int
    ):
        super().__init__()
        self.layer1 = nn.Linear(in_features, hidden_dim)
        self.activation1 = nn.ReLU()
        self.layer2 = nn.Linear(hidden_dim, out_features)
        self.activation2 = nn.Sigmoid()

    def forward(self, x):
        x = self.layer1(x)
        x = self.activation1(x)
        x = self.layer2(x)
        x = self.activation2(x)
        return x


# creating an instance of the model:

custom_model_1_example_1 = Custom_Module_1(
    in_features = 3, 
    out_features = 1, 
    hidden_dim = 32
    )

batch_size, in_features = 10, 3

input_tensor = torch.randn(batch_size, in_features)

output_tensor = custom_model_1_example_1(input_tensor)

print('Input tensor shape:', input_tensor.shape)

In [None]:
"""
What if I want two hidden layers?
"""

# The noob way is to rewrite the code of your class like this:
class Custom_Module_2(nn.Module):

    def __init__(
            self,
            in_features: int,
            out_features: int,
            hidden_dim1: int,
            hidden_dim2: int,
            ):
        super().__init__()
        self.layer1 = nn.Linear(in_features, hidden_dim1)
        self.activation1 = nn.ReLU()
        self.layer2 = nn.Linear(hidden_dim1, hidden_dim2)
        self.activation2 = nn.ReLU()
        self.layer3 = nn.Linear(hidden_dim2, out_features)
        self.activation3 = nn.Sigmoid()

    def forward(self, x):
        x = self.layer1(x)
        x = self.activation1(x)
        x = self.layer2(x)
        x = self.activation2(x)
        x = self.layer3(x)
        x = self.activation3(x)
        return x

# creating an instance of the model:

custom_model_2_example_1 = Custom_Module_2(
    in_features = 3, 
    out_features = 1, 
    hidden_dim1 = 32,
    hidden_dim2 = 64
    )

batch_size, in_features = 10, 3
input_tensor = torch.randn(batch_size, in_features)
output_tensor = custom_model_2_example_1(input_tensor)
print('Input tensor shape:', input_tensor.shape)


In [None]:
# The professional way is to use nn.ModuleList

# If you want to have a variable number of hidden layers, you can use nn.ModuleList to create a list of layers.

class Custom_Module_3(nn.Module):

    def __init__(
            self,
            in_features: int,
            out_features: int,
            hidden_dims: list,
            ):
        super().__init__()
        dims = [in_features] + hidden_dims + [out_features]

        # define a for loop inside a list. This is called list comprehension.
        self.layers = nn.ModuleList([
            nn.Linear(dims[i], dims[i+1])
            for i in range(len(dims)-1)
            ])
        
        # define a for loop inside a list with conditionals. 
        self.activations = nn.ModuleList([
            nn.ReLU() if i != len(dims)-2 else nn.Sigmoid()
            for i in range(len(dims)-1)
            ])
        
    def forward(self, x):
        for layer, activation in zip(self.layers, self.activations):
            x = layer(x)
            x = activation(x)
        return x
    
# Now your model is more flexible and you won't need to modify the code if you want to add more hidden layers.

# creating an instance of the model:
custom_model_3_example_1 = Custom_Module_3(
    in_features = 3, 
    out_features = 1, 
    hidden_dims = [32, 64]
    )

# Checking the model with an input tensor:
batch_size, in_features = 10, 3
input_tensor = torch.randn(batch_size, in_features)
output_tensor = custom_model_3_example_1(input_tensor)
print('Input tensor shape:', input_tensor.shape)
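To see the flexibility in practice, the sketch below builds models of several depths from the same class by changing only `hidden_dims`. `FlexibleMLP` is a hypothetical compact variant of the `nn.ModuleList` pattern; as an assumption made for brevity, it applies the activations through `torch.relu` and `torch.sigmoid` instead of storing activation modules.

```python
import torch
from torch import nn


class FlexibleMLP(nn.Module):
    # Hypothetical compact variant of the ModuleList pattern
    def __init__(self, in_features: int, out_features: int, hidden_dims: list):
        super().__init__()
        dims = [in_features] + list(hidden_dims) + [out_features]
        self.layers = nn.ModuleList(
            [nn.Linear(dims[i], dims[i + 1]) for i in range(len(dims) - 1)]
        )

    def forward(self, x):
        for i, layer in enumerate(self.layers):
            x = layer(x)
            # ReLU between layers, Sigmoid after the last one
            x = torch.relu(x) if i < len(self.layers) - 1 else torch.sigmoid(x)
        return x


x = torch.randn(10, 3)
output_shapes = {}
for hidden_dims in ([], [32], [32, 64], [16, 16, 16, 16]):
    model = FlexibleMLP(in_features=3, out_features=1, hidden_dims=hidden_dims)
    output_shapes[len(hidden_dims)] = tuple(model(x).shape)
    print(len(model.layers), 'linear layers ->', output_shapes[len(hidden_dims)])
```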

In [None]:
"""
Let's try this with nn.Sequential:
"""

class Custom_Module_4(nn.Module):

    def __init__(
            self,
            in_features: int,
            out_features: int,
            hidden_dims: list,
            ):
        super().__init__()
        dims = [in_features] + hidden_dims + [out_features]

        # define an empty list
        layers = []
        for i in range(len(dims)-1):
            layers.append(nn.Linear(dims[i], dims[i+1]))
            if i != len(dims)-2:
                layers.append(nn.ReLU())
            else:
                layers.append(nn.Sigmoid())

        # The * notation unpacks the elements in the list and passes them as arguments to the nn.Sequential class.
        self.network = nn.Sequential(*layers)

        # Because the layers are passed to nn.Sequential, they are registered as submodules and their parameters are trainable.
        
    def forward(self, x):
        return self.network(x)
    

# creating an instance of the model:
custom_model_4_example_1 = Custom_Module_4(
    in_features = 3, 
    out_features = 1, 
    hidden_dims = [32, 64]
    )

# Checking the model with an input tensor:
batch_size, in_features = 10, 3
input_tensor = torch.randn(batch_size, in_features)
output_tensor = custom_model_4_example_1(input_tensor)
print('Input tensor shape:', input_tensor.shape)
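The registration done by `nn.Sequential` can be inspected through the parameter names. In the short sketch below, which uses a stand-in list of layers assumed for illustration, each submodule is named by its position in the sequence and the stored modules are the same objects that were in the list.

```python
from torch import nn

# Stand-in list of layers, assumed only for this illustration
layers = [nn.Linear(3, 32), nn.ReLU(), nn.Linear(32, 1), nn.Sigmoid()]
network = nn.Sequential(*layers)

# Activations have no parameters, so only positions 0 and 2 show up
parameter_names = [name for name, _ in network.named_parameters()]
print(parameter_names)

# Indexing returns the very same module object that was in the list
same_object = network[0] is layers[0]
print('Same object?', same_object)
```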

In [None]:
"""
Let's try using nn.ModuleDict this time:
"""

class Custom_Module_5(nn.Module):
    def __init__(
        self,
        in_features: int,
        out_features: int,
        hidden_dims: list,
        ):
        super().__init__()

        dims = [in_features] + hidden_dims + [out_features]

        self.n_layers = len(dims) - 1

        self.layers = nn.ModuleDict({
            'linear_modules': nn.ModuleList([
                nn.Linear(dims[i], dims[i+1])
                for i in range(self.n_layers)
            ]),

            'activation_modules': nn.ModuleList([
                nn.ReLU() if i != len(dims)-2 else nn.Sigmoid()
                for i in range(self.n_layers)
            ])
        })

    def forward(self, x):
        for i in range(self.n_layers):
            x = self.layers['linear_modules'][i](x)
            x = self.layers['activation_modules'][i](x)
        return x
                
# creating an instance of the model:

custom_model_5_example_1 = Custom_Module_5(
    in_features = 3, 
    out_features = 1, 
    hidden_dims = [32, 64]
    )

# Checking the model with an input tensor:
batch_size, in_features = 10, 3
input_tensor = torch.randn(batch_size, in_features)
output_tensor = custom_model_5_example_1(input_tensor)
print('Input tensor shape:', input_tensor.shape)
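Another way `nn.ModuleDict` can be useful is selecting a module by name at call time. The sketch below is a hypothetical example of that idea; the class `ActivationChoice` and its `activation` argument are assumptions made for illustration and are not part of the models above.

```python
import torch
from torch import nn


class ActivationChoice(nn.Module):
    # Hypothetical module: one linear layer followed by an activation picked by name
    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features)
        self.activations = nn.ModuleDict({
            'relu': nn.ReLU(),
            'tanh': nn.Tanh(),
            'sigmoid': nn.Sigmoid(),
        })

    def forward(self, x, activation: str = 'relu'):
        # Look up the activation module by its string key
        return self.activations[activation](self.linear(x))


model = ActivationChoice(in_features=3, out_features=4)
x = torch.randn(10, 3)

relu_out = model(x, activation='relu')
sigmoid_out = model(x, activation='sigmoid')
print('ReLU output shape:', relu_out.shape)
print('Available activations:', list(model.activations.keys()))
```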