<a href="https://colab.research.google.com/github/quyettranvu/deep_learning_hands_on/blob/main/chapter_builders-guide/custom-layer.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

The following additional libraries are needed to run this
notebook. Note that running on Colab is experimental; please report a GitHub
issue if you have any problems.

In [2]:
# keep pip up to date
%pip install -U pip

# install d2l but skip its (too strict) dependencies
%pip install d2l==1.0.3 --no-deps

# install dependencies compatible with Python 3.12
# NumPy >= 1.26 has Py3.12 wheels
%pip install "numpy>=1.26,<2" matplotlib pandas jupyter

# Choose the right index for your runtime (CPU vs CUDA). Example for CUDA 12.4:
# %pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124
# Or CPU-only:
%pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu

import d2l, numpy as np
print("d2l OK, numpy:", np.__version__)

d2l OK, numpy: 2.0.2


# Custom Layers

One factor behind deep learning's success
is the availability of a wide range of layers
that can be composed in creative ways
to design architectures suitable
for a wide variety of tasks.
For instance, researchers have invented layers
specifically for handling images, text,
looping over sequential data,
and
performing dynamic programming.
Sooner or later, you will need
a layer that does not exist yet in the deep learning framework.
In these cases, you must build a custom layer.
In this section, we show you how.


In [10]:
import torch
from torch import nn
from torch.nn import functional as F
from d2l import torch as d2l

## Layers without Parameters

To start, we construct a custom layer
that does not have any parameters of its own.
This should look familiar if you recall our
introduction to modules in :numref:`sec_model_construction`.
The following `CenteredLayer` class
subtracts the mean from its input.
To build it, we only need to inherit
from the base layer class and implement the forward propagation function.


In [11]:
class CenteredLayer(nn.Module):
    def __init__(self):
        super().__init__()

    def forward(self, X):
        return X - X.mean()

Let's verify that our layer works as intended by feeding some data through it.


In [12]:
layer = CenteredLayer()
layer(torch.tensor([1.0, 2, 3, 4, 5]))

tensor([-2., -1.,  0.,  1.,  2.])

We can now **incorporate our layer as a component
in constructing more complex models.**


In [13]:
net = nn.Sequential(nn.LazyLinear(128), CenteredLayer())

As an extra sanity check, we can send random data
through the network and check that the mean is in fact 0.
Because we are dealing with floating-point numbers,
we may still see a very small nonzero number
due to rounding error.


In [14]:
Y = net(torch.rand(4, 8))
Y.mean()

tensor(-2.7940e-09, grad_fn=<MeanBackward0>)
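
Because the exact residual depends on the random input and on rounding, a tolerance-based comparison is a more robust check than reading the printed value. The following is a minimal sketch of such a check; the tolerance `1e-6` is an assumed value chosen for float32 and is not prescribed anywhere in this section.

```python
import torch
from torch import nn

class CenteredLayer(nn.Module):
    """Subtracts the mean of all elements from the input."""
    def forward(self, X):
        return X - X.mean()

net = nn.Sequential(nn.LazyLinear(128), CenteredLayer())
Y = net(torch.rand(4, 8))

# Compare against zero with an absolute tolerance instead of exact equality.
# The tolerance 1e-6 is an assumption suited to float32 arithmetic.
is_centered = torch.isclose(Y.mean().detach(), torch.tensor(0.0), atol=1e-6).item()
print(is_centered)
```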

## Layers with Parameters

Now that we know how to define simple layers,
let's move on to defining layers with parameters
that can be adjusted through training.
We can use built-in functions to create parameters, which
provide some basic housekeeping functionality.
In particular, they govern access, initialization,
sharing, saving, and loading model parameters.
This way, among other benefits, we will not need to write
custom serialization routines for every custom layer.
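
As a rough illustration of this housekeeping, the sketch below registers a tensor with `nn.Parameter` and then reads it back through `named_parameters()` and `state_dict()`. The class name `ScaleLayer` and its single `scale` parameter are hypothetical, introduced only for this example.

```python
import io
import torch
from torch import nn

class ScaleLayer(nn.Module):  # hypothetical layer, for illustration only
    def __init__(self, units):
        super().__init__()
        # Wrapping a tensor in nn.Parameter registers it with the module.
        self.scale = nn.Parameter(torch.ones(units))

    def forward(self, X):
        return X * self.scale

layer = ScaleLayer(3)
names = [name for name, _ in layer.named_parameters()]
print(names)

# Saving and loading go through state_dict, with no custom serialization code.
buffer = io.BytesIO()
torch.save(layer.state_dict(), buffer)
buffer.seek(0)

clone = ScaleLayer(3)
with torch.no_grad():
    clone.scale.fill_(5.0)  # make the clone differ before loading
clone.load_state_dict(torch.load(buffer))
restored = torch.equal(clone.scale, layer.scale)
print(restored)
```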

Now let's implement our own version of the fully connected layer.
Recall that this layer requires two parameters,
one to represent the weight and the other for the bias.
In this implementation, we bake in the ReLU activation as a default.
This layer requires two input arguments: `in_units` and `units`, which
denote the number of inputs and outputs, respectively.


In [15]:
class MyLinear(nn.Module):
    def __init__(self, in_units, units):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(in_units, units))
        self.bias = nn.Parameter(torch.randn(units,))

    def forward(self, X):
        # NOTE: `.data` bypasses autograd, so gradients will not reach the parameters.
        linear = torch.matmul(X, self.weight.data) + self.bias.data
        return F.relu(linear)
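
The `forward` method above reads `self.weight.data` and `self.bias.data`, which are plain tensors outside the autograd graph, so a loss computed from the output would not produce gradients for the parameters. If the layer is meant to be trained, one option is to use the parameters directly. A minimal sketch follows; the name `MyTrainableLinear` is hypothetical and is not part of the original text.

```python
import torch
from torch import nn
from torch.nn import functional as F

class MyTrainableLinear(nn.Module):  # hypothetical name, for illustration only
    def __init__(self, in_units, units):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(in_units, units))
        self.bias = nn.Parameter(torch.randn(units))

    def forward(self, X):
        # Using the parameters themselves keeps them in the autograd graph.
        return F.relu(torch.matmul(X, self.weight) + self.bias)

layer = MyTrainableLinear(5, 3)
out = layer(torch.rand(2, 5))
out.sum().backward()

# After backward, both parameters hold a gradient tensor.
has_grad = layer.weight.grad is not None and layer.bias.grad is not None
print(has_grad)
```

With this variant, printed outputs carry a `grad_fn` attribute, unlike the outputs shown in this section.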

Next, we instantiate the `MyLinear` class
and access its model parameters.


In [16]:
linear = MyLinear(5, 3)
linear.weight

Parameter containing:
tensor([[ 1.1899, -1.0921, -2.7986],
        [-1.2624,  0.1724, -0.9442],
        [-0.7176, -0.6655, -0.7894],
        [ 1.8790,  0.4902, -0.3400],
        [-1.3403, -0.5052,  1.3210]], requires_grad=True)

We can **directly carry out forward propagation calculations using custom layers.**


In [17]:
linear(torch.rand(2, 5))

tensor([[0., 0., 0.],
        [0., 0., 0.]])

We can also **construct models using custom layers.**
Once defined, a custom layer can be used just like the built-in fully connected layer.


In [18]:
net = nn.Sequential(MyLinear(64, 8), MyLinear(8, 1))
net(torch.rand(2, 64))

tensor([[1.2796],
        [1.8415]])

## Summary

We can design custom layers via the basic layer class. This allows us to define flexible new layers that behave differently from any existing layers in the library.
Once defined, custom layers can be invoked in arbitrary contexts and architectures.
Layers can have local parameters, which can be created through built-in functions.


## Exercises

1. Design a layer that takes an input and computes a tensor reduction,
   i.e., it returns $y_k = \sum_{i, j} W_{ijk} x_i x_j$.
1. Design a layer that returns the leading half of the Fourier coefficients of the data.


[Discussions](https://discuss.d2l.ai/t/59)


In [19]:
class CustomLayer(nn.Module):
    def __init__(self, d, K):
        super().__init__()
        # weight[i, j, k] scales the product x_i * x_j in output k
        self.weight = nn.Parameter(torch.rand(d, d, K))

    def forward(self, X):
        # X: 1-D tensor of length d; computes y_k = sum_{i,j} W_ijk * x_i * x_j
        return torch.einsum('i,j,ijk->k', X, X, self.weight)

linear = CustomLayer(5, 3)
linear.weight

Parameter containing:
tensor([[[0.9135, 0.7240, 0.5305],
         [0.4252, 0.7319, 0.3249],
         [0.4637, 0.2332, 0.1652],
         [0.4122, 0.8824, 0.7133],
         [0.7037, 0.3551, 0.6865]],

        [[0.5846, 0.1588, 0.4340],
         [0.9824, 0.4072, 0.6166],
         [0.4433, 0.5207, 0.2645],
         [0.7052, 0.9073, 0.0407],
         [0.5046, 0.0585, 0.8108]],

        [[0.7907, 0.0938, 0.8083],
         [0.4546, 0.2032, 0.3419],
         [0.6810, 0.3629, 0.4926],
         [0.5947, 0.0471, 0.3445],
         [0.4191, 0.9288, 0.9777]],

        [[0.7954, 0.1230, 0.8309],
         [0.5258, 0.9794, 0.4427],
         [0.4179, 0.1696, 0.3099],
         [0.9257, 0.6087, 0.8471],
         [0.7483, 0.9589, 0.8342]],

        [[0.1150, 0.7954, 0.1520],
         [0.6233, 0.1954, 0.4672],
         [0.6432, 0.1304, 0.2014],
         [0.7062, 0.5512, 0.9724],
         [0.6453, 0.7165, 0.7334]]], requires_grad=True)
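
As a sanity check for the first exercise, the sketch below compares an `einsum`-based reduction against an explicit triple loop over $y_k = \sum_{i, j} W_{ijk} x_i x_j$. The helper name `reduce_loops` is hypothetical, and the input is assumed to be a single 1-D vector of length `d`.

```python
import torch

def reduce_loops(W, x):  # hypothetical reference implementation
    """Computes y_k = sum_{i,j} W[i, j, k] * x[i] * x[j] with plain loops."""
    d, _, K = W.shape
    y = torch.zeros(K)
    for i in range(d):
        for j in range(d):
            for k in range(K):
                y[k] += W[i, j, k] * x[i] * x[j]
    return y

torch.manual_seed(0)
W = torch.rand(5, 5, 3)
x = torch.rand(5)

y_einsum = torch.einsum('i,j,ijk->k', x, x, W)
y_loops = reduce_loops(W, x)

# The two results should agree up to float32 rounding.
matches = torch.allclose(y_einsum, y_loops, atol=1e-5)
print(matches)
```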

In [20]:
class FourierHalfLayer(nn.Module):
    def __init__(self):
        super().__init__()

    def forward(self, X):
        # X: (B, N), real-valued.
        # For real input, rfft returns the first N // 2 + 1 coefficients of the
        # full FFT, i.e. torch.fft.fft(X, dim=-1)[..., :X.shape[-1] // 2 + 1].
        # dim=-1 transforms along the last dimension.
        return torch.fft.rfft(X, dim=-1)
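
To see what such a layer returns, the short sketch below assumes real-valued input of shape `(B, N)` and checks that `torch.fft.rfft` yields `N // 2 + 1` complex coefficients per row, matching the leading entries of the full `torch.fft.fft` output. The shapes used here are arbitrary example values.

```python
import torch

X = torch.rand(2, 8)  # batch of 2 real signals of length N = 8

half = torch.fft.rfft(X, dim=-1)  # one-sided transform of real input
full = torch.fft.fft(X, dim=-1)   # full complex transform

n_kept = half.shape[-1]           # N // 2 + 1 coefficients per row
same = torch.allclose(half, full[..., :n_kept], atol=1e-5)
print(n_kept, same)
```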