# Lazy Initialization

In the previous sections, we did a few things that might seem sloppy:

- We defined network architectures without specifying the input dimensionality
- We added layers without specifying the output dimension of the previous layer
- We "initialized" the network's parameters before providing enough information to determine their shapes

We were able to do this because PyTorch can defer initialization. Lazy modules such as `nn.LazyLinear` do not create their weights until the first time a tensor is passed through the network, and at that point the input dimensionality needed for initialization is known. This is particularly convenient for convolutional neural networks, where the dimensionality of the input (i.e., the resolution of an image) determines the dimensionality of later layers, such as a fully connected layer applied to the flattened feature maps.
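The following sketch shows the CNN case. It assumes a hypothetical small network with one `nn.LazyConv2d` layer, average pooling, flattening, and an `nn.LazyLinear` classifier. The architecture is chosen only for illustration. The same definition is instantiated for two image resolutions, and the classifier's input width is inferred differently in each case.

```python
import torch
from torch import nn


def make_cnn():
    """Hypothetical small CNN: no layer is told its input size."""
    return nn.Sequential(
        nn.LazyConv2d(6, kernel_size=5, padding=2),  # only out_channels is given
        nn.ReLU(),
        nn.AvgPool2d(kernel_size=2, stride=2),       # halves height and width
        nn.Flatten(),                                # (N, C, H, W) -> (N, C*H*W)
        nn.LazyLinear(10),                           # in_features inferred later
    )


linear_shapes = {}
for size in (28, 32):
    net = make_cnn()
    X = torch.rand(1, 1, size, size)  # one single-channel size x size image
    net(X)  # the first forward pass materializes every lazy parameter
    linear_shapes[size] = net[4].weight.shape
    print(size, net[0].weight.shape, net[4].weight.shape)

# 28x28 input -> 6 x 14 x 14 = 1176 flattened features
# 32x32 input -> 6 x 16 x 16 = 1536 flattened features
```

The convolution's weight shape, `(6, 1, 5, 5)`, is the same for both resolutions. Only the fully connected layer's `in_features` changes.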

```python
import torch
from torch import nn
from d2l import torch as d2l
```

```python
# Begin by instantiating an MLP
net = nn.Sequential(nn.LazyLinear(256), nn.ReLU(), nn.LazyLinear(10))
```
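For comparison, here is a sketch of the same MLP built eagerly with `nn.Linear`. It assumes 20 input features, which matches the width of the example data used later in this section. Every layer must be told its input size in advance, and the hidden width 256 has to be written twice: once as the first layer's output and once as the second layer's input.

```python
import torch
from torch import nn

# Eager version of the MLP: each nn.Linear takes (in_features, out_features).
# The input width of 20 is an assumption matching the data used later on.
eager_net = nn.Sequential(
    nn.Linear(20, 256),   # the 20 inputs must be known up front
    nn.ReLU(),
    nn.Linear(256, 10),   # 256 must match the previous layer's output size
)

# The parameters exist as soon as the module is constructed.
print(eager_net[0].weight.shape)  # torch.Size([256, 20])
print(eager_net[2].weight.shape)  # torch.Size([10, 256])
```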
```python
# Because we have specified only the output dimension of the first layer (256),
# the network cannot know its input dimensionality, so no weights exist yet.
# Confirm this:
net[0].weight
```

```
<UninitializedParameter>
```
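A few more ways to inspect this placeholder state are sketched below. They rely on `torch.nn.parameter.UninitializedParameter` and the `has_uninitialized_params()` method that lazy modules inherit from `LazyModuleMixin`. The expected values in the comments reflect current PyTorch behavior, in which `nn.LazyLinear` stores `0` as its `in_features` until the first forward pass.

```python
import torch
from torch import nn
from torch.nn.parameter import UninitializedParameter

net = nn.Sequential(nn.LazyLinear(256), nn.ReLU(), nn.LazyLinear(10))
first = net[0]

# Both the weight and the bias are still placeholders without a shape.
print(isinstance(first.weight, UninitializedParameter))  # True
print(isinstance(first.bias, UninitializedParameter))    # True

# The output width is known, but the input width is not (0 is a placeholder).
print(first.in_features, first.out_features)  # 0 256

# Lazy modules also offer a direct check.
print(first.has_uninitialized_params())  # True
```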

```python
# We can complete the initialization by passing data through the network
X = torch.rand(2, 20)
net(X)

net[0].weight, net[0].weight.shape
```

```
(Parameter containing:
 tensor([[ 0.0730, -0.1219,  0.0104,  ..., -0.1055,  0.2209, -0.0531],
         [ 0.0086, -0.0976, -0.0732,  ...,  0.0623, -0.1105,  0.1594],
         [ 0.0951, -0.1505, -0.1371,  ..., -0.1153, -0.0783, -0.1681],
         ...,
         [ 0.0842, -0.2100,  0.0313,  ...,  0.1497, -0.0366,  0.0430],
         [ 0.1479, -0.1569, -0.0685,  ..., -0.1573, -0.0784, -0.1526],
         [-0.1824,  0.1225, -0.0379,  ..., -0.1767,  0.1990,  0.0970]],
        requires_grad=True),
 torch.Size([256, 20]))
```
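The sketch below illustrates what happens after this first pass. The second layer is initialized in the same pass, and each lazy layer then turns into an ordinary `nn.Linear` whose shapes are fixed, so inputs of a different width are rejected. It also shows a custom initializer applied after the dry run, when there are concrete tensors to fill. The `init_normal` function and its `std=0.01` value are hypothetical choices for illustration.

```python
import torch
from torch import nn

net = nn.Sequential(nn.LazyLinear(256), nn.ReLU(), nn.LazyLinear(10))
net(torch.rand(2, 20))  # dry run with 20 input features

# The second layer's input width was inferred from the first layer's 256 outputs.
print(net[2].weight.shape)  # torch.Size([10, 256])

# After materialization each lazy layer becomes a plain nn.Linear.
print(type(net[0]).__name__)  # Linear

rejected = False
try:
    net(torch.rand(2, 30))  # a different input width no longer fits
except RuntimeError as err:
    rejected = True
    print("RuntimeError:", err)


def init_normal(module):
    # Hypothetical custom initializer; it runs after the dry run, so the
    # weights and biases are real tensors rather than placeholders.
    if isinstance(module, nn.Linear):
        nn.init.normal_(module.weight, mean=0.0, std=0.01)
        nn.init.zeros_(module.bias)


net.apply(init_normal)
```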