# Custom Layers

One factor behind deep learning's success
is the availability of a wide range of layers
that can be composed in creative ways
to design architectures suited
to many different tasks.
For instance, researchers have invented layers
specifically for handling images and text,
looping over sequential data,
and performing dynamic programming.
Sooner or later, you will need
a layer that does not exist yet in the deep learning framework.
In these cases, you must build a custom layer.
In this section, we show you how.

In [1]:
import torch
from torch import nn
from torch.nn import functional as F
from d2l import torch as d2l

## (**Layers without Parameters**)

To start, we construct a custom layer
that does not have any parameters of its own.
This should look familiar if you recall our
introduction to modules in :numref:`sec_model_construction`.
The following `CenteredLayer` class simply
subtracts the mean from its input.
To build it, we only need to inherit
from the base layer class (`nn.Module`) and implement the forward propagation function.

In [3]:
class CenteredLayer(nn.Module):
    def __init__(self):
        super().__init__()

    def forward(self, X):
        return X - X.mean()

In [2]:
class ChiCenteredLayer(nn.Module):
    def __init__(self) -> None:
        super().__init__()
        
    def forward(self, X):
        return X - X.mean()

Let's verify that our layer works as intended by feeding some data through it.

In [4]:
layer = CenteredLayer()
layer(torch.tensor([1.0, 2, 3, 4, 5]))

tensor([-2., -1.,  0.,  1.,  2.])

In [5]:
layer = ChiCenteredLayer()
layer(torch.tensor([2.0,3.0]))

tensor([-0.5000,  0.5000])
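
One detail worth making explicit: `X.mean()` without arguments averages over every element of the input, so the layer centers the tensor as a whole rather than each column. The following is a small sketch of that behavior on a 2×2 input; the input values are chosen only for illustration.

```python
import torch
from torch import nn

class CenteredLayer(nn.Module):
    """Restated from above so that this block is self-contained."""
    def forward(self, X):
        return X - X.mean()

X = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
out = CenteredLayer()(X)

# The global mean (2.5) is subtracted from every element.
print(out.tolist())              # [[-1.5, -0.5], [0.5, 1.5]]
# The overall mean is zero, but the per-column means are not.
print(out.mean().item())         # 0.0
print(out.mean(dim=0).tolist())  # [-0.5, 0.5]
```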

We can now [**incorporate our layer as a component
in constructing more complex models.**]

In [6]:
net = nn.Sequential(nn.LazyLinear(128), CenteredLayer())

In [7]:
net_chi = nn.Sequential(nn.LazyLinear(128), ChiCenteredLayer())

As an extra sanity check, we can send random data
through the network and check that the mean is in fact 0.
Because we are working with floating-point numbers,
we may still see a very small nonzero number
due to rounding error.

In [8]:
Y = net(torch.rand(4, 8))
Y.mean()

tensor(9.3132e-10, grad_fn=<MeanBackward0>)

In [10]:
chi_y = net_chi(torch.rand(5,50))
chi_y.mean()

tensor(4.4703e-09, grad_fn=<MeanBackward0>)
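
Instead of reading the printed value by eye, the same check can be made programmatically with a tolerance. This is a minimal sketch; the tolerance `1e-6` and the input shape are assumptions chosen for illustration with 32-bit floats, not values prescribed by the text.

```python
import torch
from torch import nn

class CenteredLayer(nn.Module):
    """Restated from above so that this block is self-contained."""
    def forward(self, X):
        return X - X.mean()

net = nn.Sequential(nn.LazyLinear(128), CenteredLayer())
Y = net(torch.rand(4, 8))

# Compare against zero with an absolute tolerance rather than exact equality.
# The tolerance 1e-6 is an assumed value, not taken from the text.
is_centered = abs(Y.mean().item()) < 1e-6
print(is_centered)
```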

## [**Layers with Parameters**]

Now that we know how to define simple layers,
let's move on to defining layers with parameters
that can be adjusted through training.
We can create parameters with built-in functionality
(`nn.Parameter` in PyTorch), which
provides some basic housekeeping.
In particular, it governs access, initialization,
sharing, saving, and loading of model parameters.
This way, among other benefits, we will not need to write
custom serialization routines for every custom layer.

Now let's implement our own version of the fully connected layer.
Recall that this layer requires two parameters,
one to represent the weight and the other for the bias.
In this implementation, we bake in the ReLU activation as a default.
This layer requires two input arguments: `in_units` and `units`, which
denote the number of inputs and outputs, respectively.

In [11]:
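class MyLinear(nn.Module):
    def __init__(self, in_units, units):
        super().__init__()
        # Wrapping tensors in nn.Parameter registers them with the module
        self.weight = nn.Parameter(torch.randn(in_units, units))
        self.bias = nn.Parameter(torch.randn(units,))

    def forward(self, X):
        # NOTE: `.data` reads the raw tensors and bypasses autograd, so no
        # gradient reaches `weight` or `bias` through this forward pass.
        # Use `self.weight` and `self.bias` directly to make the layer trainable.
        linear = torch.matmul(X, self.weight.data) + self.bias.data
        return F.relu(linear)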

In [14]:
class ChiLinear(nn.Module):
    def __init__(self, in_units, units) -> None:
        super().__init__()
        self.weight = nn.Parameter(torch.randn(in_units, units))
        self.bias = nn.Parameter(torch.randn(units,))

    def forward(self, X):
        # NOTE: as in MyLinear, `.data` bypasses autograd; use the parameters
        # directly if this layer needs to be trained.
        linear = torch.matmul(X, self.weight.data) + self.bias.data
        return F.relu(linear)

Next, we instantiate the `MyLinear` class
and access its model parameters.

In [16]:
linear = MyLinear(5, 3)
linear.weight

Parameter containing:
tensor([[ 0.1445, -0.7544, -0.1263],
        [-0.4094, -0.7189,  0.0768],
        [-0.7084,  1.5421, -0.3582],
        [ 1.2273,  1.2663, -0.3321],
        [ 1.2490,  1.8487, -0.6465]], requires_grad=True)

In [17]:
chi_linear = ChiLinear(5,3)
chi_linear.bias 

Parameter containing:
tensor([-1.1973,  0.4563,  1.3573], requires_grad=True)
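
The housekeeping mentioned earlier can be seen directly: tensors wrapped in `nn.Parameter` are registered on the module, so they appear in `named_parameters()` and in `state_dict()` without any extra code. The sketch below restates `MyLinear` to stay self-contained; the second instance, `clone`, is a name introduced here only for illustration.

```python
import torch
from torch import nn
from torch.nn import functional as F

class MyLinear(nn.Module):
    """Restated from above so that this block is self-contained."""
    def __init__(self, in_units, units):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(in_units, units))
        self.bias = nn.Parameter(torch.randn(units,))

    def forward(self, X):
        linear = torch.matmul(X, self.weight.data) + self.bias.data
        return F.relu(linear)

linear = MyLinear(5, 3)

# Parameters are listed in the order in which they were assigned.
shapes = {name: tuple(p.shape) for name, p in linear.named_parameters()}
print(shapes)  # {'weight': (5, 3), 'bias': (3,)}

# state_dict() gathers the same tensors for saving, and
# load_state_dict() copies them into another instance.
clone = MyLinear(5, 3)
clone.load_state_dict(linear.state_dict())
print(torch.equal(clone.weight, linear.weight))  # True
```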

We can [**directly carry out forward propagation calculations using custom layers.**]

In [18]:
linear(torch.rand(2, 5))

tensor([[1.6911, 0.0697, 1.6517],
        [1.0260, 1.4526, 1.4564]])

In [19]:
chi_linear(torch.rand(3,5))

tensor([[0.0000, 0.0000, 1.9289],
        [0.0927, 1.7660, 0.9098],
        [0.0000, 0.0000, 2.7838]])
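
Note that the outputs above carry no `grad_fn`. The forward pass reads `self.weight.data` and `self.bias.data`, which are detached from the autograd graph, so no gradient would reach the parameters during training. If the layer is meant to be trained, one option is to use the parameters directly. The sketch below shows that variant; the class name `TrainableLinear` is hypothetical and does not appear elsewhere in this section.

```python
import torch
from torch import nn
from torch.nn import functional as F

class TrainableLinear(nn.Module):  # hypothetical name, for illustration only
    def __init__(self, in_units, units):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(in_units, units))
        self.bias = nn.Parameter(torch.randn(units,))

    def forward(self, X):
        # Using the parameters themselves keeps them in the autograd graph.
        linear = torch.matmul(X, self.weight) + self.bias
        return F.relu(linear)

layer = TrainableLinear(5, 3)
out = layer(torch.rand(2, 5))
out.sum().backward()

print(out.grad_fn is not None)  # True
print(layer.weight.grad.shape)  # torch.Size([5, 3])
```

With this version, an optimizer that is given `layer.parameters()` can update both the weight and the bias.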

We can also (**construct models using custom layers.**)
Once a custom layer is defined, we can use it just like the built-in fully connected layer.

In [20]:
net = nn.Sequential(MyLinear(64, 8), MyLinear(8, 1))
net(torch.rand(2, 64))

tensor([[9.7059],
        [8.4347]])

In [25]:
chi_net = nn.Sequential(ChiLinear(128,256), ChiLinear(256,3))
chi_net(torch.rand(2,128))

tensor([[56.7026,  0.0000, 17.5639],
        [69.1863,  0.0000,  0.0000]])

## Summary

We can design custom layers via the basic layer class. This allows us to define flexible new layers that behave differently from any existing layers in the library.
Once defined, custom layers can be invoked in arbitrary contexts and architectures.
Layers can have local parameters, which can be created through built-in functions.


## Exercises

1. Design a layer that takes an input and computes a tensor reduction,
   i.e., it returns $y_k = \sum_{i, j} W_{ijk} x_i x_j$.
1. Design a layer that returns the leading half of the Fourier coefficients of the data.


[Discussions](https://discuss.d2l.ai/t/59)
