# Layers & Modules

To begin, we revisit the code
that we used to implement MLPs.
The following code generates a network
with one fully connected hidden layer
with 256 units and ReLU activation,
followed by a fully connected output layer
with 10 units (no activation function).

In [1]:
import torch
from torch import nn
from torch.nn import functional as F

In [2]:
net = nn.Sequential(nn.Linear(20, 256), nn.ReLU(), nn.Linear(256, 10))

X = torch.rand(2, 20)
net(X)

tensor([[ 0.2687,  0.1114,  0.0348, -0.0264,  0.0153,  0.0631,  0.1488,  0.1025,
         -0.0260,  0.1954],
        [ 0.1807,  0.0679,  0.1163,  0.0109,  0.1157, -0.0018,  0.1001,  0.2016,
         -0.1585,  0.2452]], grad_fn=<AddmmBackward0>)

In this example, we constructed
our model by instantiating an `nn.Sequential`, passing the layers as arguments
in the order in which they should be executed.
In short, `nn.Sequential` defines a special kind of `Module`,
the class that represents a module in PyTorch.
It maintains an ordered list of constituent `Module`s.
Note that each of the two fully connected layers is an instance of the `Linear` class
which is itself a subclass of `Module`.
The forward propagation (`forward`) method is also remarkably simple:
it chains each module in the list together,
passing the output of each as input to the next.
Note that until now, we have been invoking our models
via the construction `net(X)` to obtain their outputs.
This is shorthand for `net.__call__(X)`, which in turn runs the `forward` method.
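
To make the chaining concrete, here is a minimal sketch that compares three ways of getting the same output: calling `net(X)`, calling `net.forward(X)` directly, and looping over the layers by hand. The variable names `out_call`, `out_forward`, and `H` are illustrative only and do not appear elsewhere in this section.

```python
import torch
from torch import nn

net = nn.Sequential(nn.Linear(20, 256), nn.ReLU(), nn.Linear(256, 10))
X = torch.rand(2, 20)

# net(X) goes through nn.Module.__call__, which runs forward()
out_call = net(X)
out_forward = net.forward(X)

# Chain the layers by hand: the output of each layer feeds the next one
H = X
for layer in net:
    H = layer(H)

print(torch.allclose(out_call, out_forward))
print(torch.allclose(out_call, H))
print(out_call.shape)
```

In practice, prefer `net(X)` over `net.forward(X)`, since `__call__` also runs any hooks registered on the module.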

### A Custom Module

In the following snippet,
we code up a module from scratch
corresponding to an MLP
with one hidden layer with 256 hidden units,
and a 10-dimensional output layer.
Note that the `MLP` class below inherits from `nn.Module`, the class that represents a module.
We will rely heavily on the parent class's methods,
supplying only our own constructor (the `__init__` method in Python) and the forward propagation method.

In [3]:
class MLP(nn.Module):
    def __init__(self):
        # Call the constructor of the parent class nn.Module to perform
        # the necessary initialization
        super().__init__()
        self.hidden = nn.Linear(20, 256)
        self.out = nn.Linear(256, 10)
    
    def forward(self, X):
        # Define the forward propagation of the model, that is, how to return the
        # required model output based on the input X
        return self.out(F.relu(self.hidden(X)))  

We instantiate the MLP's layers
in the constructor
and subsequently invoke these layers
on each call to the forward propagation method.
Note a few key details.
First, our customized `__init__` method
invokes the parent class's `__init__` method
via `super().__init__()`,
sparing us the pain of restating
boilerplate code applicable to most modules.
We then instantiate our two fully connected layers,
assigning them to `self.hidden` and `self.out`.
Note that unless we implement a new layer,
we need not worry about the backpropagation method
or parameter initialization.
Automatic differentiation derives the backward pass from `forward`,
and each `nn.Linear` layer initializes its own parameters when it is constructed.

In [4]:
net = MLP()
net(X)

tensor([[-0.0368,  0.1148,  0.1351, -0.0403,  0.0866, -0.0416,  0.0717, -0.0378,
          0.3210, -0.0767],
        [-0.0743,  0.0385,  0.1518, -0.0586,  0.1375,  0.0369,  0.0390, -0.1233,
          0.3423, -0.1094]], grad_fn=<AddmmBackward0>)
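
As a quick check of the housekeeping that `nn.Module` does for us, the sketch below lists the parameters that were registered simply by assigning layers as attributes, and confirms that gradients appear after a backward pass even though we never wrote one. The scalar `loss` here is a hypothetical stand-in (the sum of the outputs), not a real training objective.

```python
import torch
from torch import nn
from torch.nn import functional as F

class MLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.hidden = nn.Linear(20, 256)
        self.out = nn.Linear(256, 10)

    def forward(self, X):
        return self.out(F.relu(self.hidden(X)))

net = MLP()
X = torch.rand(2, 20)

# Layers assigned as attributes are registered automatically,
# so their parameters are discoverable by name
shapes = {name: tuple(p.shape) for name, p in net.named_parameters()}
print(shapes)

# No gradients exist before the first backward pass
no_grads_yet = all(p.grad is None for p in net.parameters())

# Hypothetical scalar "loss", used only to trigger backpropagation
loss = net(X).sum()
loss.backward()
grads_ready = all(p.grad is not None for p in net.parameters())
print(no_grads_yet, grads_ready)
```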

### The Sequential Module

We can now take a closer look at how the `Sequential` class works. Recall that `Sequential` was designed to daisy-chain other modules together. To build our own simplified `MySequential`, we just need to define two key methods:

1. A method to append modules one by one to a list.
2. A forward propagation method to pass an input through the chain of modules, in the same order as they were appended.

The following `MySequential` class delivers the same functionality as the default `Sequential` class.


In [5]:
class MySequential(nn.Module):
    def __init__(self, *args):  # *args: the modules to chain, in order
        super().__init__()
        for idx, block in enumerate(args):
            # _modules is an ordered dict of child modules. add_module
            # registers each block there under a string key
            self.add_module(str(idx), block)
    
    def forward(self, X):
        for block in self._modules.values():
            X = block(X)
        return X

net = MySequential(nn.Linear(20, 256), nn.ReLU(), nn.Linear(256, 10))
net(X)

tensor([[-0.2972, -0.0054, -0.0942,  0.0424,  0.1033, -0.1773,  0.1433,  0.0695,
          0.0633, -0.1021],
        [-0.3194,  0.1335, -0.1565,  0.1891,  0.0288, -0.0912,  0.1853, -0.0072,
         -0.0239, -0.0228]], grad_fn=<AddmmBackward0>)

Note that this use of `MySequential`
is identical to the code we previously wrote
for the `Sequential` class.
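
The sketch below checks that claim a little more carefully. It assumes a `MySequential` variant that registers each block under a string key (`'0'`, `'1'`, ...) via `add_module`, then builds a reference `nn.Sequential` from the very same layer objects, so both containers share weights. The names `reference` and `param_names` are illustrative only.

```python
import torch
from torch import nn

class MySequential(nn.Module):
    def __init__(self, *args):
        super().__init__()
        for idx, block in enumerate(args):
            # Register each block under a string key
            self.add_module(str(idx), block)

    def forward(self, X):
        for block in self._modules.values():
            X = block(X)
        return X

net = MySequential(nn.Linear(20, 256), nn.ReLU(), nn.Linear(256, 10))

# Reuse the same layer objects in the built-in container
reference = nn.Sequential(*net.children())

X = torch.rand(2, 20)
same_output = torch.allclose(net(X), reference(X))

# Parameter names are derived from the registration keys
param_names = [name for name, _ in net.named_parameters()]
print(same_output)
print(param_names)  # ['0.weight', '0.bias', '2.weight', '2.bias']
```

Registering blocks through `add_module` (or plain attribute assignment) is what lets `parameters()` and `state_dict()` find the layers. Storing them in an ordinary Python list would hide them from the module.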


### Executing Code in the Forward Propagation Method

You might have noticed that until now,
all of the operations in our networks
have acted upon our network's activations
and its parameters.
Sometimes, however, we might want to
incorporate terms
that are neither the result of previous layers
nor updatable parameters.
We call these *constant parameters*.
Say for example that we want a layer
that calculates the function
$f(\mathbf{x},\mathbf{w}) = c \cdot \mathbf{w}^\top \mathbf{x}$,
where $\mathbf{x}$ is the input, $\mathbf{w}$ is our parameter,
and $c$ is some specified constant
that is not updated during optimization.
So we implement a `FixedHiddenMLP` class as follows.


In [6]:
class FixedHiddenMLP(nn.Module):
    def __init__(self):
        super().__init__()
        # torch.rand() draws samples uniformly from the half-open interval [0.0, 1.0).
        # These random weights are not registered as parameters and do not track
        # gradients, so they stay constant during training
        self.rand_weight = torch.rand((20, 20), requires_grad=False)
        self.linear = nn.Linear(20, 20)
    
    def forward(self, X):
        X = self.linear(X)
        # torch.mm: matrix multiplication without broadcasting
        X = F.relu(torch.mm(X, self.rand_weight) + 1)
        # Reuse the same fully connected layer, so both applications share parameters
        X = self.linear(X)
        while X.abs().sum() > 1:
            X /= 2
        return X.sum()

net = FixedHiddenMLP()
net(X)

tensor(-0.1612, grad_fn=<SumBackward0>)

In this `FixedHiddenMLP` model,
we implement a hidden layer whose weights
(`self.rand_weight`) are initialized randomly
at instantiation and are thereafter constant.
This weight is not a model parameter
and thus it is never updated by backpropagation.
The network then passes the output of this "fixed" layer
through the same fully connected layer (`self.linear`) that it applied first,
so the two applications share parameters.
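
One way to confirm that the fixed weights really are left alone is sketched below: `rand_weight` does not show up among the named parameters, receives no gradient, and is unchanged after an optimizer step. The optimizer, the learning rate of `0.1`, and the use of the raw output as a stand-in loss are assumptions made only for this illustration.

```python
import torch
from torch import nn
from torch.nn import functional as F

class FixedHiddenMLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.rand_weight = torch.rand((20, 20), requires_grad=False)
        self.linear = nn.Linear(20, 20)

    def forward(self, X):
        X = self.linear(X)
        X = F.relu(torch.mm(X, self.rand_weight) + 1)
        X = self.linear(X)
        while X.abs().sum() > 1:
            X /= 2
        return X.sum()

net = FixedHiddenMLP()

# Only the shared linear layer contributes parameters
param_names = [name for name, _ in net.named_parameters()]
print(param_names)  # ['linear.weight', 'linear.bias']

before = net.rand_weight.clone()
optimizer = torch.optim.SGD(net.parameters(), lr=0.1)  # illustrative settings
net(torch.rand(2, 20)).backward()  # raw output used as a stand-in loss
optimizer.step()

fixed_unchanged = torch.equal(before, net.rand_weight)
fixed_has_no_grad = net.rand_weight.grad is None
linear_has_grad = net.linear.weight.grad is not None
print(fixed_unchanged, fixed_has_no_grad, linear_has_grad)
```

A design note: a plain tensor attribute like this is also absent from `state_dict()` and is not moved by `net.to(device)`. PyTorch's `register_buffer` is the mechanism for constants that should be saved and moved with the module.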

Note that before returning the output,
our model did something unusual.
We ran a while-loop that tests whether
the $\ell_1$ norm of the output is larger than $1$
and divides the output by $2$
until the norm is at most $1$.
Finally, we returned the sum of the entries in `X`.
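
The loop can be isolated in a small sketch to show that data-dependent control flow poses no problem for automatic differentiation. The helper `halve_until_small` is hypothetical (it is not part of PyTorch or of the model above), and the all-ones input is chosen so that the arithmetic is easy to follow: the $\ell_1$ norm goes $8 \to 4 \to 2 \to 1$, and the loop stops after three halvings.

```python
import torch

def halve_until_small(X):
    """Hypothetical helper: halve X until its l1 norm is at most 1."""
    steps = 0
    while X.abs().sum() > 1:
        X = X / 2  # out-of-place, so the original input stays intact
        steps += 1
    return X, steps

X = torch.ones(2, 4, requires_grad=True)  # l1 norm is 8
Y, steps = halve_until_small(X)
print(steps)            # 3
print(Y.abs().sum())    # norm is now exactly 1

# Each entry was scaled by 1/8, so the gradient of Y.sum() w.r.t. X is 1/8
Y.sum().backward()
print(X.grad)
```

The number of loop iterations depends on the input values, and the recorded computation graph simply reflects whichever iterations actually ran.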

We can **mix and match various ways of assembling modules together**. In the following example, we nest modules in some creative ways.

In [11]:
class NestMLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 32), nn.ReLU())
        self.linear = nn.Linear(32, 16)
    
    def forward(self, X):
        return self.linear(self.net(X))

chimera = nn.Sequential(NestMLP(), nn.Linear(16, 20), FixedHiddenMLP())
chimera(X)

tensor(0.0237, grad_fn=<SumBackward0>)
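
Nested modules remain fully inspectable. The sketch below uses a reduced variant of the example above (it assumes `NestMLP` followed by a single `nn.Linear`, omitting `FixedHiddenMLP` for brevity) and walks the hierarchy with `named_children` and `named_parameters`. The name `small_chimera` is illustrative only.

```python
import torch
from torch import nn

class NestMLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(20, 64), nn.ReLU(),
                                 nn.Linear(64, 32), nn.ReLU())
        self.linear = nn.Linear(32, 16)

    def forward(self, X):
        return self.linear(self.net(X))

# Reduced variant of the chimera: NestMLP followed by one linear layer
small_chimera = nn.Sequential(NestMLP(), nn.Linear(16, 20))

# Direct children are keyed by their position in the Sequential
child_names = [name for name, _ in small_chimera.named_children()]
print(child_names)  # ['0', '1']

# Parameter names encode the full nesting path
param_names = [name for name, _ in small_chimera.named_parameters()]
print(param_names[0], param_names[-1])  # 0.net.0.weight 1.bias

# Total number of scalar parameters across all nested modules
num_params = sum(p.numel() for p in small_chimera.parameters())
print(num_params)  # 4292

out_shape = tuple(small_chimera(torch.rand(2, 20)).shape)
print(out_shape)  # (2, 20)
```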

### Summary

- Layers are modules.
- Many layers can comprise a module.
- Many modules can comprise a module.
- A module can contain code.
- Modules take care of lots of housekeeping, including parameter initialization and backpropagation.
- Sequential concatenations of layers and modules are handled by the `Sequential` module.