# Parameter Management

Once we have trained a model, we may need access to its parameters in order to make future predictions, save the model to disk, or examine them for debugging or scientific insight.

Often, we can leave these details to the deep learning framework, but sometimes we need to deal with them directly. In this notebook we will look at:

1. Accessing parameters for debugging, diagnostics and visualisation
2. Sharing parameters across different model components

In [1]:
import torch 
from torch import nn

In [2]:
net = nn.Sequential(
    nn.LazyLinear(8),
    nn.ReLU(),
    nn.LazyLinear(1)
)

X = torch.rand(size=(2, 4))
net(X).shape



torch.Size([2, 1])

## Parameter Access

When a model is defined via the `Sequential` class, we can access any individual layer by indexing into it as though it were a list. Each layer's parameters are conveniently located in its attributes.

For the net defined above, we can see that the second fully connected layer, `net[2]`, holds a tensor of weights and a single bias.

In [3]:
net[2].state_dict()

OrderedDict([('weight',
              tensor([[-0.0765,  0.3355, -0.1366,  0.1230, -0.2265, -0.0055, -0.1866, -0.1136]])),
             ('bias', tensor([0.2870]))])

### Accessing Individual Parameters

In [7]:
# Each parameter is an instance of the Parameter class.
# It is a complex object carrying metadata as well as the value, so the value must be requested explicitly via .data.

print(net[2].bias)
print(type(net[2].bias))
print()
print(net[2].bias.data)

Parameter containing:
tensor([0.2870], requires_grad=True)
<class 'torch.nn.parameter.Parameter'>

tensor([0.2870])


In [9]:
# We can also access the gradient, though this does not exist yet for this parameter
print(net[2].bias.grad)

None
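The gradient only appears once backpropagation has been run. The following is a minimal sketch, assuming the same two-layer network as above and using the sum of the outputs as a stand-in scalar loss (the loss is an assumption made purely for illustration; it is not part of the original notebook).

```python
import torch
from torch import nn

net = nn.Sequential(nn.LazyLinear(8), nn.ReLU(), nn.LazyLinear(1))
X = torch.rand(size=(2, 4))

# The forward pass materializes the lazy parameters
out = net(X)

# No backward pass has run yet, so there is no gradient
grad_before = net[2].bias.grad
print(grad_before)

# Stand-in loss (assumption): sum of all outputs
out.sum().backward()

# The gradient is now populated and has the same shape as the bias
grad_after = net[2].bias.grad
print(grad_after)
```

Because the output bias is added once per example and the batch holds two examples, the gradient of this particular stand-in loss with respect to the bias is 2.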


### All Parameters at Once

In [10]:
[(name, param.shape) for name, param in net.named_parameters()]

[('0.weight', torch.Size([8, 4])),
 ('0.bias', torch.Size([8])),
 ('2.weight', torch.Size([1, 8])),
 ('2.bias', torch.Size([1]))]
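The names listed above can also be used as keys to look up a parameter directly. A small sketch, assuming the same network as before, that fetches the output bias by name from the whole model's `state_dict()` and checks that it matches the value reached by indexing:

```python
import torch
from torch import nn

net = nn.Sequential(nn.LazyLinear(8), nn.ReLU(), nn.LazyLinear(1))
net(torch.rand(size=(2, 4)))  # forward pass to materialize the lazy parameters

# The state dict of the whole model is keyed by "<layer index>.<parameter name>"
state = net.state_dict()
keys = list(state.keys())
print(keys)

# Look up the output layer's bias by name and compare with index-based access
bias_by_name = state['2.bias']
matches = torch.equal(bias_by_name, net[2].bias.data)
print(matches)
```

Name-based access is convenient when the layer index is not known in advance, for example when looping over a saved checkpoint.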

## Tied Parameters

Often, we want to share parameters across multiple layers.

In this example, the two uses of `shared` do not merely hold equal values: they are literally the same tensor object, so a change made through one is visible through the other.

In [13]:
shared = nn.LazyLinear(8)

net = nn.Sequential(nn.LazyLinear(8), nn.ReLU(),
                    shared, nn.ReLU(),
                    shared, nn.ReLU(),
                    nn.LazyLinear(1))
net(X)

print(net[2].weight.data[0] == net[4].weight.data[0])

net[2].weight.data[0, 0] = 100
# Make sure that they are actually the same object rather than just having the
# same value
print(net[2].weight.data[0] == net[4].weight.data[0])

tensor([True, True, True, True, True, True, True, True])
tensor([True, True, True, True, True, True, True, True])
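The elementwise comparison above shows that the values agree after a change. Object identity can also be checked directly. A minimal sketch, assuming the same tied network, that uses Python's `is` operator and the tensor's `data_ptr()`, and lists the parameter names reported by `named_parameters()`:

```python
import torch
from torch import nn

shared = nn.LazyLinear(8)
net = nn.Sequential(nn.LazyLinear(8), nn.ReLU(),
                    shared, nn.ReLU(),
                    shared, nn.ReLU(),
                    nn.LazyLinear(1))
net(torch.rand(size=(2, 4)))  # forward pass to materialize the lazy parameters

# Both positions hold the very same Parameter object
same_object = net[2].weight is net[4].weight
print(same_object)

# ... and therefore point at the same underlying memory
same_storage = net[2].weight.data_ptr() == net[4].weight.data_ptr()
print(same_storage)

# named_parameters() reports each distinct parameter only once
names = [name for name, _ in net.named_parameters()]
print(names)
```

Since the shared layer is registered at positions 2 and 4 but is a single module, its weight and bias show up only under the first position when iterating over the parameters.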
