A module could describe a single layer, a component consisting of multiple layers, or the entire model itself! One benefit of working with the module abstraction is that modules can be combined into larger artifacts, often recursively.

From a programming standpoint, a module is represented by a class. Any subclass of it must define a forward propagation method that transforms its input into output and must store any necessary parameters. Note that some modules do not require any parameters at all. Finally, a module must possess a backpropagation method for the purpose of calculating gradients. Fortunately, thanks to the behind-the-scenes work done by automatic differentiation, when defining our own module we only need to worry about the parameters and the forward propagation method.

In [1]:
import torch
from torch import nn
from torch.nn import functional as F

In [2]:
# Generates a network with one fully connected hidden layer with 256 units and ReLU activation
# followed by a fully connected output layer with ten units (no activation function)
net = nn.Sequential(nn.LazyLinear(256), nn.ReLU(), nn.LazyLinear(10))

X = torch.rand(2, 20)
net(X).shape



torch.Size([2, 10])

In this example, we constructed our model by instantiating an nn.Sequential, passing the layers as arguments in the order in which they should be executed. In short, nn.Sequential defines a special kind of Module, the class that represents a module in PyTorch. It maintains an ordered list of constituent Modules. Note that each of the two fully connected layers is an instance of the Linear class (LazyLinear is a subclass of Linear), which is itself a subclass of Module. The forward propagation method is also remarkably simple: it chains the modules in the list together, passing the output of each as input to the next. Note that until now, we have been invoking our models via the construction net(X) to obtain their outputs. This is actually just shorthand for net.__call__(X).
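To make the last two points concrete, the following minimal sketch compares three ways of computing the same output: net(X), net.__call__(X), and a manual loop that feeds each layer's output into the next. As an assumption for illustration, it uses nn.Linear with explicit input sizes instead of nn.LazyLinear, so that the parameters exist before the first call.

```python
import torch
from torch import nn

# Assumption: explicit input sizes (nn.Linear) instead of nn.LazyLinear
net = nn.Sequential(nn.Linear(20, 256), nn.ReLU(), nn.Linear(256, 10))
X = torch.rand(2, 20)

out_call = net(X)             # the usual invocation
out_dunder = net.__call__(X)  # what net(X) is shorthand for

# Chain the constituent modules by hand, in the order they were passed
out_manual = X
for layer in net:
    out_manual = layer(out_manual)

same_call = torch.allclose(out_call, out_dunder)
same_manual = torch.allclose(out_call, out_manual)
print(same_call, same_manual)
```

Both comparisons should come out true, since all three paths apply the same layers with the same parameters.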

Each module must provide the following basic functionality:

1. Ingest input data as arguments to its forward propagation method.
2. Generate an output by having the forward propagation method return a value. Note that the output may have a different shape from the input. For example, the first fully connected layer in our model above ingests an input of arbitrary dimension but returns an output of dimension 256.
3. Calculate the gradient of its output with respect to its input, which can be accessed via its backpropagation method. Typically this happens automatically.
4. Store and provide access to those parameters necessary for executing the forward propagation computation.
5. Initialize model parameters as needed.
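As a rough illustration of the list above, here is a tiny module with a single learnable scalar. The class name Scale and the initial value 3.0 are hypothetical choices made for this sketch only. The comments mark which item each line corresponds to; the gradient in item 3 comes from automatic differentiation rather than from a hand-written backpropagation method.

```python
import torch
from torch import nn

class Scale(nn.Module):
    """Hypothetical module that multiplies its input by a learnable scalar."""
    def __init__(self):
        super().__init__()
        # Items 4 and 5: store the parameter and give it an initial value
        self.weight = nn.Parameter(torch.tensor(3.0))

    def forward(self, X):
        # Items 1 and 2: ingest the input and return an output
        return self.weight * X

layer = Scale()
X = torch.ones(2, 4, requires_grad=True)
Y = layer(X)

# Item 3: gradients are produced by autograd when we call backward
Y.sum().backward()

param_names = [name for name, _ in layer.named_parameters()]
print(param_names)        # names of the stored parameters
print(X.grad)             # gradient of Y.sum() with respect to the input
print(layer.weight.grad)  # gradient of Y.sum() with respect to the parameter
```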

In [3]:
class MLP(nn.Module):
    def __init__(self):
        # Call the constructor of the parent class nn.Module to perform
        # the necessary initialization
        super().__init__()
        self.hidden = nn.LazyLinear(256)
        self.out = nn.LazyLinear(10)
        
    # Define the forward propagation of the model, that is, how to return the
    # required model output based on the input X
    def forward(self, X):
        return self.out(F.relu(self.hidden(X)))

Let's first focus on the forward propagation method. Note that it takes X as input, calculates the hidden representation with the activation function applied, and outputs its logits. In this MLP implementation, both layers are instance variables. To see why this is reasonable, imagine instantiating two MLPs, net1 and net2, and training them on different data. Naturally, we would expect them to represent two different learned models.
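The point about instance variables can be checked directly. The sketch below repeats the MLP definition so that it runs on its own, creates net1 and net2, and overwrites a weight in net1 as a crude stand-in for training (an assumption of this example, not an actual training step). The corresponding weight in net2 should be unaffected.

```python
import torch
from torch import nn
from torch.nn import functional as F

class MLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.hidden = nn.LazyLinear(256)
        self.out = nn.LazyLinear(10)

    def forward(self, X):
        return self.out(F.relu(self.hidden(X)))

X = torch.rand(2, 20)
net1, net2 = MLP(), MLP()
# One forward pass each so that the lazy layers create their parameters
net1(X)
net2(X)

# Each instance owns its own layers and therefore its own parameters
shares_weight = net1.hidden.weight is net2.hidden.weight

# Stand-in for training: change net1 only and check that net2 is untouched
before = net2.hidden.weight.detach().clone()
with torch.no_grad():
    net1.hidden.weight.zero_()
net2_unchanged = torch.equal(net2.hidden.weight, before)

print(shares_weight, net2_unchanged)
```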

We instantiate the MLP's layers in the constructor and subsequently invoke these layers on each call to the forward propagation method. Note a few key details. First, our customized __init__ method invokes the parent class's __init__ method via super().__init__(), sparing us the pain of restating boilerplate code applicable to most modules. We then instantiate our two fully connected layers, assigning them to self.hidden and self.out. Note that unless we implement a new layer, we need not worry about the backpropagation method or parameter initialization. The system will handle both automatically.
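As a quick check of the claim that initialization and backpropagation are handled for us, the following sketch (which repeats the MLP definition so that it is self-contained) runs one forward and backward pass and then inspects the parameters. The use of the summed output as a scalar to differentiate is an arbitrary choice for illustration, not a real loss function.

```python
import torch
from torch import nn
from torch.nn import functional as F

class MLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.hidden = nn.LazyLinear(256)
        self.out = nn.LazyLinear(10)

    def forward(self, X):
        return self.out(F.relu(self.hidden(X)))

net = MLP()
X = torch.rand(2, 20)

# The first forward pass infers the input sizes and initializes the
# parameters; backward then fills in gradients without any extra code
net(X).sum().backward()

shapes = {name: tuple(p.shape) for name, p in net.named_parameters()}
all_have_grad = all(p.grad is not None for p in net.parameters())

for name, shape in shapes.items():
    print(name, shape)
print(all_have_grad)
```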

In [4]:
net = MLP()
net(X).shape



torch.Size([2, 10])

To build our own simplified MySequential, we just need to define two key methods:

1. A method for appending modules one by one to a list.
2. A forward propagation method for passing an input through the chain of modules, in the same order as they were appended.

In [5]:
class MySequential(nn.Module):
    def __init__(self, *args):
        super().__init__()
        for idx, module in enumerate(args):
            self.add_module(str(idx), module)
            
    def forward(self, X):
        for module in self.children():
            X = module(X)
        return X

In the __init__ method, we add every module by calling the add_module method. These modules can be accessed later through the children method. In this way, the system knows about the added modules and will properly initialize each module's parameters.
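To see why registration matters, the sketch below contrasts two hypothetical containers: one that keeps its modules in a plain Python list, and one that registers them with add_module as above. The class names PlainListHolder and Registered are made up for this example, and nn.Linear with explicit sizes is used so that parameters can be counted without a forward pass.

```python
import torch
from torch import nn

def make_layers():
    # Assumption: explicit input sizes so parameters exist immediately
    return nn.Linear(20, 256), nn.ReLU(), nn.Linear(256, 10)

class PlainListHolder(nn.Module):
    """Hypothetical counterexample: modules kept in an ordinary list."""
    def __init__(self, *args):
        super().__init__()
        self.layers = list(args)  # not registered with the parent module

class Registered(nn.Module):
    """Registers each module under its index, as MySequential does."""
    def __init__(self, *args):
        super().__init__()
        for idx, module in enumerate(args):
            self.add_module(str(idx), module)

plain = PlainListHolder(*make_layers())
registered = Registered(*make_layers())

n_plain = len(list(plain.parameters()))
n_registered = len(list(registered.parameters()))
child_names = [name for name, _ in registered.named_children()]

print(n_plain, n_registered, child_names)
```

The plain list hides the layers from the parent module, so their parameters would be missed by anything that iterates over parameters(), such as an optimizer.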

In [6]:
net = MySequential(nn.LazyLinear(256), nn.ReLU(), nn.LazyLinear(10))
net(X).shape



torch.Size([2, 10])

In [7]:
class FixedHiddenMLP(nn.Module):
    def __init__(self):
        super().__init__()
        # Random weights stored as a plain tensor rather than a parameter:
        # no gradients are computed for them, so they stay constant
        # during training
        self.rand_weight = torch.rand((20, 20))
        self.linear = nn.LazyLinear(20)
        
    def forward(self, X):
        X = self.linear(X)
        X = F.relu(X @ self.rand_weight + 1)
        # Reuse the fully connected layer. This is equivalent to sharing
        # parameters with two fully connected layers
        X = self.linear(X)
        # Control flow
        while X.abs().sum() > 1:
            X /= 2
        return X.sum()
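A short usage sketch for the class above follows; the definition is repeated so that the example runs on its own. It checks two things suggested by the code comments: that rand_weight does not appear among the module's parameters, and that the forward pass returns a scalar. The batch of shape (2, 20) is the same kind of input used earlier.

```python
import torch
from torch import nn
from torch.nn import functional as F

class FixedHiddenMLP(nn.Module):
    def __init__(self):
        super().__init__()
        # Plain tensor, not an nn.Parameter, so it is never updated
        self.rand_weight = torch.rand((20, 20))
        self.linear = nn.LazyLinear(20)

    def forward(self, X):
        X = self.linear(X)
        X = F.relu(X @ self.rand_weight + 1)
        # Reuse the same fully connected layer
        X = self.linear(X)
        # Control flow: halve until the absolute sum is at most 1
        while X.abs().sum() > 1:
            X /= 2
        return X.sum()

net = FixedHiddenMLP()
X = torch.rand(2, 20)
out = net(X)

param_names = [name for name, _ in net.named_parameters()]
print(param_names)  # only the parameters of self.linear are listed
print(out.shape)    # a zero-dimensional (scalar) tensor
```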