Now that we know how to access the parameters, let's look at how to initialize them properly.

The deep learning framwork provides default random initialization to its layers. However, we often want to initalize our weights according to various other protocols. The framework provides most commonly used protocols, and also allows us to create a custom initializer.

In [1]:
import torch
from torch import nn

By default, PyTorch initializes weight and bias matrices uniformly by drawing from a range that is computed according to the input and output dimension. PyTorch's nn.init module provides a variety of preset initialization methods.

In [2]:
net = nn.Sequential(nn.LazyLinear(8), nn.ReLU(), nn.LazyLinear(1))
X = torch.rand(size=(2,4))
net(X).shape



torch.Size([2, 1])

Let's begin by calling on built-in initializers. The code below initializes all weight parameters as Gaussian random variables with standard deviation 0.01, while bias parameters are cleared to zero.

In [3]:
def init_normal(module):
    if type(module) == nn.Linear:
        nn.init.normal_(module.weight, mean=0, std=0.1)
        nn.init.zeros_(module.bias)

net.apply(init_normal)
net[0].weight.data[0], net[0].bias.data[0]

(tensor([-0.0509,  0.0014,  0.0188,  0.0171]), tensor(0.))