# Chapter 6 - Builders' Guide

## 6.1. Layers and Modules

A *module* could describe a single layer, a component consisting of multiple layers, or the entire model itself. Working with the module abstraction allows us to combine modules into larger artifacts and to reuse them across multiple models.

From a programming standpoint, a module is represented by a *class*. Any subclass of it must define a forward propagation method that transforms its input into output, and must store any necessary parameters. The module must also possess a backpropagation method for the purpose of calculating gradients.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
```

```python
# build a simple MLP with nn.Sequential
net = nn.Sequential(
    nn.LazyLinear(256),
    nn.ReLU(),
    nn.LazyLinear(10)
)

X = torch.randn(2, 20)
net(X).shape
```

```
torch.Size([2, 10])
```

We just built our model by instantiating an `nn.Sequential` and passing it the layers as arguments, in the order in which they should be executed. `nn.Sequential` is a special kind of `Module`, the class that represents a module in PyTorch. It maintains an ordered list of constituent `Module`s.

Note that each layer, such as `nn.LazyLinear`, is itself a subclass of `Module`. The forward propagation (`forward`) method of `nn.Sequential` chains each module in the list together, passing the output of each as input to the next.
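To see this chaining at work, the minimal sketch below uses the same layer sizes as above. It iterates over an `nn.Sequential` by hand and checks that feeding each layer's output into the next layer reproduces `net(X)`.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

net = nn.Sequential(nn.LazyLinear(256), nn.ReLU(), nn.LazyLinear(10))
X = torch.randn(2, 20)
expected = net(X)  # the first call also materializes the lazy parameters

# nn.Sequential is iterable in insertion order, so we can replay its forward pass manually
out = X
for layer in net:
    out = layer(out)

print(torch.allclose(out, expected))  # True
print(out.shape)                      # torch.Size([2, 10])
```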

We invoked our model via the construction `net(X)` to obtain the outputs, which is shorthand for `net.__call__(X)`.
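`nn.Module.__call__` runs any registered hooks and then dispatches to `forward`. The sketch below uses a small non-lazy `nn.Linear` for brevity. It checks that `net(X)`, `net.__call__(X)` and `net.forward(X)` agree when no hooks are registered. It then attaches a forward hook that zeroes the output, to show what calling `forward` directly would skip. This is why `net(X)` is generally preferred.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

net = nn.Sequential(nn.Linear(20, 10), nn.ReLU())
X = torch.randn(2, 20)

a = net(X)            # the usual shorthand
b = net.__call__(X)   # what the shorthand expands to
c = net.forward(X)    # bypasses hooks; equivalent here because none are registered
print(torch.equal(a, b), torch.equal(a, c))  # True True

# a forward hook may replace the output; it only runs when going through __call__
handle = net.register_forward_hook(lambda module, inputs, output: output * 0)
zeroed = net(X)
print(torch.count_nonzero(zeroed).item())  # 0
handle.remove()
```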

### 6.1.1. A Custom Module

Each module must provide the following functionalities:
1. Ingest input data as arguments to its forward propagation method.
2. Generate an output by having the forward propagation method return a value. Note that the output may have a different shape from the input. 
3. Calculate the gradient of its output with respect to its input, which can be accessed via its backpropagation method. In PyTorch, this happens automatically through autograd.
4. Store and provide access to those parameters necessary to execute the forward propagation computation.
5. Initialize these parameters as needed.
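In PyTorch, most of these duties are handled for us. The sketch below uses a hypothetical `Affine` module that is not part of the text. Wrapping tensors in `nn.Parameter` in the constructor initializes them and registers them, so `named_parameters()` can find them. Autograd then computes gradients without any hand-written backward method.

```python
import torch
import torch.nn as nn

class Affine(nn.Module):
    """Hypothetical module computing X @ W + b."""
    def __init__(self, in_features, out_features):
        super().__init__()
        # items 4 and 5: initialize parameters and register them via nn.Parameter
        self.W = nn.Parameter(torch.randn(in_features, out_features) * 0.01)
        self.b = nn.Parameter(torch.zeros(out_features))

    def forward(self, X):
        # items 1 and 2: ingest the input and return an output of a different shape
        return X @ self.W + self.b

layer = Affine(20, 3)
X = torch.randn(2, 20)
Y = layer(X)
print(Y.shape)                                         # torch.Size([2, 3])
print([name for name, _ in layer.named_parameters()])  # ['W', 'b']

# item 3: autograd derives the backward pass from the forward computation
Y.sum().backward()
print(layer.b.grad)  # tensor([2., 2., 2.]), one contribution per row of X
```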

We can code up a module from scratch by subclassing the `Module` class. The `MLP` class below, which has one hidden layer with 256 units and a 10-dimensional output layer, inherits from `Module`. We rely heavily on the parent class's methods, supplying only our own constructor (the `__init__` method) and forward propagation method.

```python
class MLP(nn.Module):
    def __init__(self):
        # call the constructor of the parent class nn.Module to perform the necessary initialization
        super().__init__()

        # define the layers
        self.hidden = nn.LazyLinear(256)
        self.out = nn.LazyLinear(10)

    # define the forward pass, that is,
    # how to return the required model output based on the input X
    def forward(self, X):
        h = self.hidden(X)
        h = F.relu(h)
        out = self.out(h)
        return out
```

In this `MLP` implementation, both `self.hidden` and `self.out` are `LazyLinear` instances, each with its own weight and bias parameters. We instantiate the `MLP`'s layers in the constructor and subsequently invoke these layers on each call to the forward propagation method.

The `__init__` method in `MLP` invokes the parent class's `__init__` method via `super().__init__()`, sparing us the pain of restating boilerplate code applicable to most modules.
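Skipping that call is a common mistake. The hypothetical `BrokenMLP` below omits `super().__init__()`. As a result, `nn.Module` never creates its internal registry for submodules, and assigning the first layer raises an `AttributeError`.

```python
import torch.nn as nn

class BrokenMLP(nn.Module):
    def __init__(self):
        # super().__init__() is deliberately missing, so nn.Module's
        # bookkeeping for submodules and parameters is never set up
        self.hidden = nn.LazyLinear(256)

try:
    BrokenMLP()
    failed, message = False, ""
except AttributeError as err:
    failed, message = True, str(err)  # keep the text; `err` is unbound after the block

print(failed)   # True
print(message)  # points at the missing Module.__init__() call
```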

```python
net = MLP()
net(X).shape
```

```
torch.Size([2, 10])
```
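Because `MLP` uses `LazyLinear`, each layer's weight and bias shapes are fixed only on the first forward call, when the input dimension becomes known. The sketch below redefines a minimal copy of `MLP` so that it runs on its own. It then inspects the parameter shapes after one call on a 20-feature input.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.hidden = nn.LazyLinear(256)
        self.out = nn.LazyLinear(10)

    def forward(self, X):
        return self.out(F.relu(self.hidden(X)))

net = MLP()
net(torch.randn(2, 20))  # first call infers in_features=20 for self.hidden

shapes = {name: tuple(p.shape) for name, p in net.named_parameters()}
for name, shape in shapes.items():
    print(name, shape)
# hidden.weight (256, 20)
# hidden.bias (256,)
# out.weight (10, 256)
# out.bias (10,)
```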

### 6.1.2. The Sequential Module

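The `Sequential` class was designed to daisy-chain other modules together. To build a simplified version of our own, `MySequential`, we need two methods: a constructor that adds the modules one by one, and a forward propagation method that passes an input through them in the order in which they were added.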

```python
class MySequential(nn.Module):
    def __init__(self, *args):
        super().__init__()

        # register each layer as a child module, named by its position
        for idx, module in enumerate(args):
            self.add_module(
                str(idx),  # name of the child module
                module     # child module
            )

    def forward(self, X):
        # apply each module in the order it was added
        for module in self.children():
            X = module(X)
        return X
```

In the `__init__` method, we register every module by calling the `add_module` method. These modules can later be accessed through the `children` method, which yields them in the order in which they were added.
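Registration is the reason to use `add_module` rather than simply keeping the layers in a list. The sketch below compares three containers. `ListSequential` is a hypothetical variant that stores the layers in a plain Python list: its forward pass still works, but `parameters()` cannot see the layers, so an optimizer would never update them. PyTorch's built-in `nn.ModuleList`, used in the third (also hypothetical) class, registers every element properly.

```python
import torch.nn as nn

class MySequential(nn.Module):
    def __init__(self, *args):
        super().__init__()
        for idx, module in enumerate(args):
            self.add_module(str(idx), module)  # registered as a child module

    def forward(self, X):
        for module in self.children():
            X = module(X)
        return X

class ListSequential(nn.Module):
    """Hypothetical variant: layers kept in a plain Python list."""
    def __init__(self, *args):
        super().__init__()
        self.layers = list(args)  # NOT registered with nn.Module

    def forward(self, X):
        for module in self.layers:
            X = module(X)
        return X

class ModuleListSequential(nn.Module):
    """Hypothetical variant: layers kept in an nn.ModuleList."""
    def __init__(self, *args):
        super().__init__()
        self.layers = nn.ModuleList(args)  # registers every element

    def forward(self, X):
        for module in self.layers:
            X = module(X)
        return X

layers = [nn.Linear(20, 256), nn.ReLU(), nn.Linear(256, 10)]
good = MySequential(*layers)
bad = ListSequential(*layers)
fixed = ModuleListSequential(*layers)

print(len(list(good.parameters())))   # 4: two weights and two biases
print(len(list(bad.parameters())))    # 0: the layers are invisible to nn.Module
print(len(list(fixed.parameters())))  # 4
```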

```python
net = MySequential(
    nn.LazyLinear(256),
    nn.ReLU(),
    nn.LazyLinear(10)
)

net(X).shape
```

```
torch.Size([2, 10])
```

### 6.1.3. Executing Code in the Forward Propagation Method

Sometimes we might want to incorporate terms that are neither the result of previous layers nor updatable parameters. We call these *constant parameters*. For example, suppose we want a layer that calculates the function $f(\mathbf{x},\mathbf{w}) = c \cdot \mathbf{w}^\top \mathbf{x}$, where $\mathbf{x}$ is the input, $\mathbf{w}$ is our parameter, and $c$ is some specified constant that is not updated during optimization. The `FixedHiddenMLP` class below applies the same idea, using a randomly initialized weight matrix as the constant.
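The function $f(\mathbf{x},\mathbf{w}) = c \cdot \mathbf{w}^\top \mathbf{x}$ itself can be written as a tiny module. In the sketch below, the name `ScaledDot` and the choice $c = 3$ are hypothetical. Only $\mathbf{w}$ is an `nn.Parameter`; $c$ is a plain Python float. After backpropagation, the gradient matches the analytic result $\partial f / \partial \mathbf{w} = c\,\mathbf{x}$.

```python
import torch
import torch.nn as nn

class ScaledDot(nn.Module):
    """Hypothetical layer computing f(x, w) = c * w^T x."""
    def __init__(self, num_inputs, c):
        super().__init__()
        self.w = nn.Parameter(torch.randn(num_inputs))  # trainable
        self.c = c  # plain float: a constant that is never updated

    def forward(self, x):
        return self.c * (self.w @ x)  # dot product of two 1-D tensors

torch.manual_seed(0)
layer = ScaledDot(num_inputs=5, c=3.0)
x = torch.randn(5)
layer(x).backward()

print([name for name, _ in layer.named_parameters()])  # ['w']: c is not a parameter
print(torch.allclose(layer.w.grad, 3.0 * x))           # True, since df/dw = c * x
```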

```python
class FixedHiddenMLP(nn.Module):
    def __init__(self):
        super().__init__()

        # random weights stored as a plain tensor (not an nn.Parameter):
        # they do not require gradients and therefore stay constant during training
        self.rand_weight = torch.rand((20, 20))

        self.linear = nn.LazyLinear(20)

    def forward(self, X):
        X = self.linear(X)
        X = F.relu(X @ self.rand_weight + 1)  # @ stands for matrix multiplication

        # reuse the fully connected layer. This is equivalent to sharing parameters
        # between two fully connected layers
        X = self.linear(X)

        # control flow
        while X.abs().sum() > 1:
            X /= 2

        return X.sum()
```

In this model, we implement a hidden layer whose weights (`self.rand_weight`) are initialized randomly at instantiation and are thereafter constant. This weight is not a model parameter, so it is never updated by backpropagation. The input first passes through `self.linear`, then through this **"fixed"** layer, and finally through `self.linear` again, so the same fully connected layer (and its parameters) is applied twice.
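Storing a constant as a plain tensor attribute hides it from the optimizer. It also leaves the tensor out of `state_dict()` and prevents it from following `.to(device)`. If the constant should be saved with the model, one design alternative is `register_buffer`. This is not what `FixedHiddenMLP` does. The two hypothetical classes below compare the approaches.

```python
import torch
import torch.nn as nn

class PlainConstant(nn.Module):
    def __init__(self):
        super().__init__()
        self.rand_weight = torch.rand(20, 20)  # plain attribute, as in FixedHiddenMLP

class BufferConstant(nn.Module):
    def __init__(self):
        super().__init__()
        self.register_buffer("rand_weight", torch.rand(20, 20))  # registered buffer

plain, buffered = PlainConstant(), BufferConstant()

# neither version exposes the constant as a trainable parameter
print(len(list(plain.parameters())), len(list(buffered.parameters())))  # 0 0

# only the buffer is saved in the state dict (and would follow .to(device))
print(list(plain.state_dict().keys()))     # []
print(list(buffered.state_dict().keys()))  # ['rand_weight']
```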

Before returning the output, `FixedHiddenMLP` runs a while-loop that halves the output as long as its $\ell_1$ norm is larger than 1, and then returns the sum of its entries. This is not standard practice in deep learning, but it illustrates that arbitrary code can be inserted into the forward propagation method.
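Autograd records whichever operations actually execute, so gradients flow correctly through data-dependent control flow like this loop. The input below is illustrative and not taken from the model. Its $\ell_1$ norm starts at 4 and is halved twice (to 2, then to 1) before the condition fails. The gradient of the final sum with respect to each entry is therefore $1/2^2 = 0.25$.

```python
import torch

x = torch.tensor([3.0, 1.0], requires_grad=True)

y = x * 1  # non-leaf copy, so the loop never touches the leaf tensor x
halvings = 0
while y.abs().sum() > 1:  # same condition as in FixedHiddenMLP.forward
    y = y / 2             # out-of-place: each halving becomes a node in the graph
    halvings += 1

y.sum().backward()
print(halvings)  # 2
print(x.grad)    # tensor([0.2500, 0.2500])
```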

```python
net = FixedHiddenMLP()
net(X)
```

```
tensor(0.0612, grad_fn=<SumBackward0>)
```

We can mix and match various ways of assembling modules together:

```python
class NestMLP(nn.Module):
    def __init__(self):
        super().__init__()

        self.net = nn.Sequential(
            nn.LazyLinear(64),
            nn.ReLU(),
            nn.LazyLinear(32),
            nn.ReLU()
        )

        self.linear = nn.LazyLinear(16)

    def forward(self, X):
        return self.linear(self.net(X))
```

```python
net = nn.Sequential(
    NestMLP(),
    nn.LazyLinear(20),
    FixedHiddenMLP()
)

net(X)
```

```
tensor(0.1345, grad_fn=<SumBackward0>)
```
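Nesting produces a tree of modules. The sketch below uses a small hypothetical `Inner` module with fixed, non-lazy layer sizes. It walks the tree with `named_modules()`, which visits the outer container first and then each child depth-first, naming every submodule by its dotted path. The same dotted names show up in `named_parameters()`.

```python
import torch.nn as nn

class Inner(nn.Module):
    """Hypothetical nested block, similar in spirit to NestMLP."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(20, 64), nn.ReLU())
        self.linear = nn.Linear(64, 16)

    def forward(self, X):
        return self.linear(self.net(X))

outer = nn.Sequential(Inner(), nn.Linear(16, 20))

module_names = [name for name, _ in outer.named_modules()]
param_names = [name for name, _ in outer.named_parameters()]
print(module_names)
# ['', '0', '0.net', '0.net.0', '0.net.1', '0.linear', '1']
print(param_names)
# ['0.net.0.weight', '0.net.0.bias', '0.linear.weight', '0.linear.bias', '1.weight', '1.bias']
```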

## 6.2. Parameter Management