In [1]:
import torch
import torch.nn as nn

In [3]:
layer = nn.Linear(5, 5)

In [4]:
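# nn.Linear initializes its weights automatically when the layer is created.
# In recent PyTorch versions the default is a Kaiming-uniform scheme, which for this
# layer amounts to U(-1/sqrt(in_features), 1/sqrt(in_features)), not Xavier.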
layer.weight

Parameter containing:
tensor([[-0.2203,  0.0538, -0.1189,  0.3283, -0.2518],
        [-0.1706, -0.4330,  0.3557,  0.2809, -0.3913],
        [-0.0950, -0.0915, -0.3074,  0.0701, -0.1901],
        [ 0.2460,  0.0159, -0.1827,  0.4065, -0.2886],
        [-0.2148, -0.1444, -0.1781, -0.0085,  0.0493]], requires_grad=True)
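
As a quick sanity check on the default scheme, the sketch below compares the largest weight magnitude against $1/\sqrt{\text{in\_features}}$, which is about 0.447 for this layer. It assumes the Kaiming-uniform default used by recent PyTorch releases; the seed and the small tolerance are arbitrary choices made for illustration.

```python
import math
import torch
import torch.nn as nn

torch.manual_seed(0)  # arbitrary seed, only for reproducibility

fan_in = 5
layer = nn.Linear(fan_in, 5)

# Assumed default bound for nn.Linear weights: 1 / sqrt(fan_in)
bound = 1.0 / math.sqrt(fan_in)
max_abs = layer.weight.abs().max().item()

# Small tolerance to absorb float32 rounding at the edge of the interval
within_bound = max_abs <= bound + 1e-6

print(f"bound   = {bound:.4f}")
print(f"max |w| = {max_abs:.4f}")
print("all weights within bound:", within_bound)
```

The largest magnitude in the output above (0.4330) is consistent with this bound.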

##### **Uniform Initialization**
``torch.nn.init.uniform_(tensor, a=0.0, b=1.0)``
- `tensor`: an n-dimensional tensor
- `a`: the lower bound of the uniform distribution
- `b`: the upper bound of the uniform distribution

In [7]:
nn.init.uniform_(layer.weight, a=0.0, b=3.0)

Parameter containing:
tensor([[1.3863, 0.0039, 0.9148, 1.4397, 1.2851],
        [1.6788, 0.7291, 0.9323, 0.1349, 0.5874],
        [2.6834, 1.0229, 0.7339, 2.3156, 0.6781],
        [1.2110, 0.8513, 2.3238, 2.9968, 2.7035],
        [1.5223, 2.2078, 2.3442, 0.6581, 0.5872]], requires_grad=True)
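
A 5x5 tensor is too small to judge the distribution by eye. The following sketch applies the same call to a larger, hypothetical tensor and checks that the samples stay within `[a, b]` and that their mean is close to `(a + b) / 2`. The tensor size and seed are assumptions made for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # arbitrary seed, only for reproducibility

# A larger tensor so that the sample statistics are stable
w = torch.empty(1000, 1000)
nn.init.uniform_(w, a=0.0, b=3.0)

lo = w.min().item()
hi = w.max().item()
mean = w.mean().item()

print(f"min  = {lo:.4f}")    # should be close to a = 0.0
print(f"max  = {hi:.4f}")    # should be close to b = 3.0
print(f"mean = {mean:.4f}")  # should be close to (a + b) / 2 = 1.5
```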

##### **Normal Initialization**
``torch.nn.init.normal_(tensor, mean=0.0, std=1.0)``
- `tensor`: an n-dimensional tensor
- `mean`: the mean of the normal distribution
- `std`: the standard deviation of the normal distribution

In [8]:
nn.init.normal_(layer.weight, mean=0.0, std=1.0)

Parameter containing:
tensor([[-0.3308,  0.5445,  0.3410,  0.9435,  0.5938],
        [-0.8397, -0.7162, -0.9159, -0.0985, -0.5954],
        [ 0.4796, -0.1454, -0.8953, -0.3458,  1.8096],
        [ 0.6664,  1.4431,  0.2402,  0.5557, -0.8578],
        [ 0.3430, -0.6350, -0.0654,  0.2488,  1.0524]], requires_grad=True)
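
The same kind of check works for the normal initializer. This sketch, again on a hypothetical larger tensor, estimates the sample mean and standard deviation, which should land near the requested `mean` and `std`.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # arbitrary seed, only for reproducibility

# A larger tensor so that the sample statistics are stable
w = torch.empty(1000, 1000)
nn.init.normal_(w, mean=0.0, std=1.0)

sample_mean = w.mean().item()
sample_std = w.std().item()

print(f"sample mean = {sample_mean:.4f}")  # should be close to 0.0
print(f"sample std  = {sample_std:.4f}")   # should be close to 1.0
```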

##### **Constant Initialization**
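Initializing weights to a constant is usually a mistake: every unit in a layer then computes the same output and receives the same gradient, so the units never learn different features. Constant initialization is useful for biases, however, which are commonly set to zero (`torch.nn.init.zeros_(tensor)` is a shorthand for that case).
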
``torch.nn.init.constant_(tensor, val)``
- `tensor`: an n-dimensional tensor
- `val`: the value to fill the tensor with

In [9]:
nn.init.constant_(layer.bias, 0)

Parameter containing:
tensor([0., 0., 0., 0., 0.], requires_grad=True)

In [11]:
nn.init.zeros_(layer.bias)

Parameter containing:
tensor([0., 0., 0., 0., 0.], requires_grad=True)
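
To see why constant weights are a problem, the sketch below builds a small hypothetical network, fills every weight with the same value, and inspects the gradient of the first layer after one backward pass. The architecture, the constant 0.5, and the squared-output loss are assumptions chosen only to make the symmetry visible.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # arbitrary seed, only for reproducibility

# Hypothetical two-layer network: 3 inputs -> 4 hidden units -> 1 output
net = nn.Sequential(nn.Linear(3, 4), nn.Tanh(), nn.Linear(4, 1))

# Constant weights and zero biases in every linear layer
for module in net:
    if isinstance(module, nn.Linear):
        nn.init.constant_(module.weight, 0.5)
        nn.init.zeros_(module.bias)

x = torch.randn(8, 3)
loss = net(x).pow(2).mean()
loss.backward()

# One row of the gradient per hidden unit, shape (4, 3)
grad = net[0].weight.grad
rows_identical = all(torch.allclose(grad[0], grad[i]) for i in range(1, 4))

print(grad)
print("all hidden units get the same gradient:", rows_identical)
```

Because every row of the gradient is the same, a gradient step keeps the hidden units identical to one another, which is the symmetry that random initialization breaks.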

##### **Xavier Initialization**
``torch.nn.init.xavier_uniform_(tensor, gain=1.0)``

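The resulting tensor will have values sampled from $U(-a, a)$, where $a = \text{gain} \times \sqrt{\frac{6}{n_{in} + n_{out}}}$ and $n_{in}$, $n_{out}$ are the numbers of input and output units (fan-in and fan-out) of the tensor.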

``torch.nn.init.xavier_normal_(tensor, gain=1.0)``

The resulting tensor will have values sampled from $N(0, \text{std}^2)$, where $\text{std} = \text{gain} \times \sqrt{\frac{2}{n_{in} + n_{out}}}$.
- `tensor`: an n-dimensional tensor
- `gain`: an optional scaling factor
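
As a worked example of the two formulas, the sketch below evaluates them for the 5x5 layer used in this notebook ($n_{in} = n_{out} = 5$, gain 1) and checks that `xavier_uniform_` stays within the computed bound. The variable names and the small tolerance are illustrative assumptions.

```python
import math
import torch
import torch.nn as nn

torch.manual_seed(0)  # arbitrary seed, only for reproducibility

n_in, n_out, gain = 5, 5, 1.0

# Bound of the uniform variant and standard deviation of the normal variant
a = gain * math.sqrt(6.0 / (n_in + n_out))
std = gain * math.sqrt(2.0 / (n_in + n_out))
print(f"a   = {a:.4f}")    # 0.7746
print(f"std = {std:.4f}")  # 0.4472

# Empirical check: samples from xavier_uniform_ should lie in [-a, a]
w = torch.empty(n_out, n_in)
nn.init.xavier_uniform_(w, gain=gain)
max_abs = w.abs().max().item()
within_bound = max_abs <= a + 1e-6  # tolerance for float32 rounding
print("all weights within [-a, a]:", within_bound)
```

Note that $a = \sqrt{3} \times \text{std}$, so both variants produce weights with the same variance, $\text{gain}^2 \times \frac{2}{n_{in} + n_{out}}$; they differ only in the shape of the distribution.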

In [12]:
# Use the in-place variant with the trailing underscore; xavier_normal is deprecated.
nn.init.xavier_normal_(layer.weight, gain=1.0)

Parameter containing:
tensor([[ 0.1719,  0.4397, -0.4870, -0.2959,  0.4742],
        [ 0.6008, -0.0152, -0.1795,  0.1005,  0.0235],
        [ 0.4696, -0.0157, -0.4571, -0.2879, -0.5624],
        [ 0.1625,  0.0645, -0.5367,  0.0483, -0.0528],
        [-1.0488, -0.0843,  0.5984,  0.0740,  0.0256]], requires_grad=True)
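
In practice these functions are usually applied to every layer of a model rather than to a single one. One possible way to do that is sketched below using `nn.Module.apply`, which calls a function on each submodule. The helper name `init_weights` and the example architecture are hypothetical.

```python
import torch
import torch.nn as nn

def init_weights(module):
    """Hypothetical helper: Xavier-uniform weights and zero biases for linear layers."""
    if isinstance(module, nn.Linear):
        nn.init.xavier_uniform_(module.weight, gain=1.0)
        nn.init.zeros_(module.bias)

# Example architecture, chosen only for illustration
model = nn.Sequential(nn.Linear(5, 8), nn.ReLU(), nn.Linear(8, 2))

# apply() visits every submodule (and the model itself) and calls init_weights on it
model.apply(init_weights)

biases_zero = all(
    torch.all(m.bias == 0).item() for m in model if isinstance(m, nn.Linear)
)
print("all biases are zero:", biases_zero)
```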