# Lazy Initialization
:label:`sec_lazy_init`

So far, it might seem that we got away
with being sloppy in setting up our networks.
Specifically, we did the following unintuitive things,
which might not seem like they should work:

* We defined the network architectures
  without specifying the input dimensionality.
* We added layers without specifying
  the output dimension of the previous layer.
* We even "initialized" these parameters
  before providing enough information to determine
  how many parameters our models should contain.

You might be surprised that our code runs at all.
After all, there is no way the deep learning framework
could tell what the input dimensionality of a network would be.
The trick here is that the framework *defers initialization*,
waiting until the first time we pass data through the model,
to infer the sizes of each layer on the fly.
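
To make the idea of deferring initialization concrete, here is a minimal sketch of a layer that allocates its parameters only when it sees its first input. The class `ToyLazyLinear` is hypothetical and written purely for illustration; it is not how PyTorch implements its own lazy modules, and the scaling factor `0.01` is an arbitrary assumption.

```python
import torch
from torch import nn

class ToyLazyLinear(nn.Module):
    """Hypothetical toy layer illustrating deferred initialization."""
    def __init__(self, out_features):
        super().__init__()
        self.out_features = out_features
        # The input dimension is unknown, so no tensors are allocated yet.
        self.register_parameter("weight", None)
        self.register_parameter("bias", None)

    def forward(self, X):
        if self.weight is None:
            in_features = X.shape[-1]  # read the input dimension off the data
            self.weight = nn.Parameter(
                torch.randn(self.out_features, in_features) * 0.01)
            self.bias = nn.Parameter(torch.zeros(self.out_features))
        return X @ self.weight.T + self.bias

layer = ToyLazyLinear(256)
print(layer.weight)            # None: nothing has been allocated yet
Y = layer(torch.rand(2, 20))   # the first forward pass triggers initialization
print(layer.weight.shape, Y.shape)
```

The only information the layer needs from the user is its output size; everything else is read from the data on the first call.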


Later on, when working with convolutional neural networks,
this technique will become even more convenient
since the input dimensionality
(e.g., the resolution of an image)
will affect the dimensionality
of each subsequent layer.
Hence the ability to set parameters
without knowing their dimensions
at the time of writing the code
can greatly simplify the task of specifying
and subsequently modifying our models.
Next, we go deeper into the mechanics of initialization.



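To recap, so far we have built networks while apparently sidestepping several requirements:

* We did not specify the input dimensionality when defining the architecture.
* We did not state the output dimension of the previous layer when adding a new layer.
* We "initialized" parameters without enough information to determine their sizes.

This works because of the framework's *deferred initialization* mechanism, which proceeds as follows:

1. **Architecture definition.**
   The framework lets us:
   - define the network structure without specifying the input dimension;
   - add layers without explicitly declaring the output dimension of the previous layer;
   - create parameter objects whose shapes are not yet determined.

2. **Data-driven inference.**
   The first time data passes through the model, the framework:
   - infers the input and output dimensions of each layer;
   - determines the concrete shape of every parameter tensor from the shape of the data;
   - performs the actual parameter initialization.

3. **Where this helps.**
   The mechanism is particularly useful for:
   - convolutional neural networks, where the input resolution affects the dimensions of later layers;
   - models whose input size is decided late, e.g., image classifiers tried at different resolutions (shapes are inferred once, on the first forward pass, and stay fixed afterwards);
   - rapid prototyping, since the architecture is easy to modify.

For example, in a convolutional network:
- the input image size determines the feature map size of each convolutional layer;
- the feature map size determines the input dimension of the fully connected layer;
- the framework carries out these calculations, so we only need to specify how the layers connect.

In practice, this design:
- improves maintainability (a change in dimensions only requires changing the input);
- makes model definitions more flexible (the same code adapts to whatever input specification it is first given);
- simplifies development (no need to compute each layer's dimensions by hand).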
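
The convolutional case described above can be sketched as follows. This is an illustrative example only: the helper `make_cnn`, the layer sizes, and the two image resolutions are assumptions chosen for the demonstration. It builds the same lazily defined network twice and shows that the fully connected layer's input dimension follows from the image size seen on the first forward pass.

```python
import torch
from torch import nn

def make_cnn():
    # Hypothetical small CNN: neither the number of input channels nor the
    # flattened feature size before the final layer is written down.
    return nn.Sequential(
        nn.LazyConv2d(8, kernel_size=3), nn.ReLU(),
        nn.MaxPool2d(2), nn.Flatten(),
        nn.LazyLinear(10))

fc_shapes = {}
for side in (28, 32):
    cnn = make_cnn()
    cnn(torch.rand(1, 1, side, side))  # first forward pass fixes all shapes
    fc_shapes[side] = tuple(cnn[4].weight.shape)
    print(side, tuple(cnn[0].weight.shape), fc_shapes[side])
```

A fresh model is created for each resolution because the shapes are inferred only once per model; an already initialized model does not re-adapt to a new input size.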

In [1]:
import torch
from torch import nn
from d2l import torch as d2l

To begin, let's instantiate an MLP.


In [2]:
net = nn.Sequential(nn.LazyLinear(256), nn.ReLU(), nn.LazyLinear(10))

At this point, the network cannot possibly know
the dimensions of the input layer's weights
because the input dimension remains unknown.


Consequently, the framework has not yet initialized any parameters.
We confirm this by attempting to access the parameters below.

In [3]:
net[0].weight

<UninitializedParameter>
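
The same check can be made programmatically rather than by reading the printed output. The sketch below rebuilds the network so that it is self-contained; it relies on `torch.nn.parameter.UninitializedParameter` and on the `has_uninitialized_params` method that PyTorch's lazy modules provide, and the variable names are chosen for illustration.

```python
import torch
from torch import nn
from torch.nn.parameter import UninitializedParameter

net = nn.Sequential(nn.LazyLinear(256), nn.ReLU(), nn.LazyLinear(10))

# Every parameter of a lazy layer is a placeholder until data arrives.
weight_is_placeholder = isinstance(net[0].weight, UninitializedParameter)
all_placeholders = all(isinstance(p, UninitializedParameter)
                       for p in net.parameters())
print(weight_is_placeholder, all_placeholders,
      net[0].has_uninitialized_params())
```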

Next, let's pass data through the network
so that the framework finally initializes the parameters.


In [3]:
X = torch.rand(2, 20)
net(X)

net[0].weight.shape

torch.Size([256, 20])

As soon as the framework sees the input dimensionality, 20,
it can determine the shape of the first layer's weight matrix,
$256 \times 20$.
Having resolved the first layer's shape, the framework proceeds
to the second layer,
and so on through the computational graph
until all shapes are known.
Note that in this case
only the first layer's input dimension is genuinely unknown in advance
(the second layer's follows from the 256 outputs of the first),
but the framework still resolves the shapes sequentially as data flows through the layers.
Once all parameter shapes are known,
the framework can finally initialize the parameters.
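
As a quick check of the sequential inference described above, the following sketch lists every parameter shape after the first forward pass. It also illustrates, under the assumption that shapes are inferred only once, that a later input with a different width is rejected instead of triggering a new round of inference.

```python
import torch
from torch import nn

net = nn.Sequential(nn.LazyLinear(256), nn.ReLU(), nn.LazyLinear(10))
net(torch.rand(2, 20))  # first forward pass: shapes are inferred and fixed

shapes = {name: tuple(p.shape) for name, p in net.named_parameters()}
print(shapes)

# A second input with 30 features no longer matches the 256 x 20 weight.
try:
    net(torch.rand(2, 30))
    mismatch_rejected = False
except RuntimeError as err:
    mismatch_rejected = True
    print("RuntimeError:", err)
```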


The following method passes dummy inputs through the network
for a dry run that infers all parameter shapes,
and then initializes the parameters.
It will be used later when the default random initialization is not desired.



In [4]:
@d2l.add_to_class(d2l.Module)  #@save
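def apply_init(self, inputs, init=None):
    """Dry-run `inputs` to infer parameter shapes, then apply `init`."""
    # The forward pass lets the framework infer every parameter shape.
    self.forward(*inputs)
    if init is not None:
        # Apply the custom initializer to every submodule of the network.
        self.net.apply(init)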
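
To see the dry-run pattern in isolation, here is a standalone sketch that does not depend on `d2l.Module`. The function names `dry_run_then_init` and `init_xavier` are hypothetical, and Xavier-uniform weights with zero biases are just one possible choice of custom initializer.

```python
import torch
from torch import nn

def init_xavier(module):
    # Hypothetical initializer: Xavier-uniform weights and zero biases.
    if isinstance(module, nn.Linear):
        nn.init.xavier_uniform_(module.weight)
        nn.init.zeros_(module.bias)

def dry_run_then_init(net, inputs, init=None):
    # Standalone analogue of apply_init: infer shapes first, then initialize.
    net(*inputs)
    if init is not None:
        net.apply(init)

net = nn.Sequential(nn.LazyLinear(256), nn.ReLU(), nn.LazyLinear(10))
dry_run_then_init(net, [torch.rand(2, 20)], init_xavier)
print(net[0].weight.shape, net[2].bias.abs().sum().item())
```

The order matters: the initializer needs concrete tensors to write into, so it can only run after the dry run has given every parameter a shape.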

## Summary

Lazy initialization can be convenient, allowing the framework to infer parameter shapes automatically, making it easy to modify architectures and eliminating one common source of errors.
We can pass data through the model to make the framework finally initialize parameters.


## Exercises

1. What happens if you specify the input dimensions to the first layer but not to subsequent layers? Do you get immediate initialization?
1. What happens if you specify mismatching dimensions?
1. What would you need to do if you have inputs of varying dimensionality? Hint: look at parameter tying.




[Discussions](https://discuss.d2l.ai/t/8092)
