# Parameter Management

## Section Summary
This section covers parameter management in deep learning: accessing, manipulating, and sharing parameters across model components. The running example is an MLP with one hidden layer built in PyTorch. The section shows how to access the parameters of a single layer in a sequential model and how to extract individual parameters such as a weight, a bias, or a gradient. It also shows how to access all parameters at once and how to tie parameters so that several layers share them.





```python
import torch
from torch import nn
```

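```python
# An MLP with one hidden layer of 8 units and a single output.
# LazyLinear infers each layer's input size from the first input it sees.
net = nn.Sequential(nn.LazyLinear(8),
                    nn.ReLU(),
                    nn.LazyLinear(1))

# A minibatch of 2 examples with 4 features each
X = torch.rand(size=(2, 4))
net(X).shape
```

```
torch.Size([2, 1])
```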
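Because the layers are `nn.LazyLinear`, their input dimensions are unknown when `net` is constructed, and PyTorch infers them from the first input. The sketch below uses a fresh single layer (the name `layer` is purely illustrative) to check that the weight is only an uninitialized placeholder before the first forward pass and a concrete parameter of shape `(8, 4)` afterwards.

```python
import torch
from torch import nn
from torch.nn.parameter import UninitializedParameter

# A single lazy layer whose input size is deliberately left unspecified
layer = nn.LazyLinear(8)

# Before any forward pass the weight is a placeholder without a shape
print(isinstance(layer.weight, UninitializedParameter))  # True

# The first call infers in_features=4 from the input and allocates the weight
layer(torch.rand(2, 4))
print(isinstance(layer.weight, UninitializedParameter))  # False
print(layer.weight.shape)  # torch.Size([8, 4])
```

This is why a forward pass such as `net(X)` has to run before the parameters of a lazy model can be inspected.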

## [**Parameter Access**]
:label:`subsec_param-access`



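Since `net` is an `nn.Sequential`, any layer can be accessed by indexing into the model as though it were a list. Index 0 is the hidden layer, index 1 is the ReLU, and index 2 is the output layer. Calling `state_dict()` on the output layer lists its parameters:

```python
net[2].state_dict()
```

```
OrderedDict([('weight',
              tensor([[ 0.2116, -0.3472, -0.2912, -0.1954,  0.2635, -0.2421, -0.2910,  0.1562]])),
             ('bias', tensor([0.0848]))])
```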
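By default, `state_dict()` returns detached tensors, so the values carry no autograd history. The quick check below rebuilds the same two-layer architecture so that it runs on its own; the variable name `sd` is illustrative. It suggests that these tensors still share storage with the live parameters.

```python
import torch
from torch import nn

net = nn.Sequential(nn.LazyLinear(8), nn.ReLU(), nn.LazyLinear(1))
net(torch.rand(2, 4))  # a forward pass initializes the lazy parameters

sd = net[2].state_dict()
print(list(sd.keys()))     # ['weight', 'bias']
print(sd['weight'].shape)  # torch.Size([1, 8])

# Entries are detached from autograd ...
print(sd['weight'].requires_grad)  # False
# ... but point at the same memory as the layer's parameter
print(sd['weight'].data_ptr() == net[2].weight.data_ptr())  # True
```

Because of this sharing, copying the tensors (for example with `clone()`) is advisable when a snapshot must survive later in-place updates.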



### [**Targeted Parameters**]


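Each parameter is an instance of `torch.nn.Parameter`, a subclass of tensor. Its `.data` attribute exposes the underlying values, here the bias of the output layer:

```python
type(net[2].bias), net[2].bias.data
```

```
(torch.nn.parameter.Parameter, tensor([0.0848]))
```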
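Parameters can be written as well as read. One common pattern, shown here as a minimal sketch, overwrites a parameter in place inside `torch.no_grad()` so that autograd does not record the assignment. The network is rebuilt so that the block runs on its own, and the constant `0.5` is arbitrary.

```python
import torch
from torch import nn

net = nn.Sequential(nn.LazyLinear(8), nn.ReLU(), nn.LazyLinear(1))
X = torch.rand(2, 4)
net(X)  # a forward pass initializes the lazy parameters

# Overwrite the output layer's bias in place; no_grad keeps this out of the graph
with torch.no_grad():
    net[2].bias.fill_(0.5)

print(net[2].bias)  # still a Parameter, now holding 0.5
print(isinstance(net[2].bias, nn.Parameter))  # True
```

Modifying the tensor in place, rather than assigning a new tensor to `net[2].bias`, keeps the original `Parameter` object, so any optimizer that already holds a reference to it still sees the change.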

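Besides its value, each parameter also has a gradient. Because no backward pass has been run yet, the gradient of the output layer's weight is still `None`:

```python
net[2].weight.grad is None
```

```
True
```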
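The gradient is only populated by a backward pass. The sketch below uses the sum of the two outputs as an illustrative scalar loss, which is an assumption made here for simplicity. After backpropagation, the output-layer bias has gradient 2, because each of the two examples contributes 1 to the derivative of the sum with respect to the bias.

```python
import torch
from torch import nn

net = nn.Sequential(nn.LazyLinear(8), nn.ReLU(), nn.LazyLinear(1))
X = torch.rand(2, 4)

# Illustrative scalar loss: the sum of both outputs
loss = net(X).sum()
loss.backward()

# The gradient now exists and has the same shape as the parameter
print(net[2].weight.grad is None)                       # False
print(net[2].weight.grad.shape == net[2].weight.shape)  # True
# d(sum of outputs)/d(bias) is 1 per example, so 2 for two examples
print(net[2].bias.grad)  # tensor([2.])
```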

### [**All Parameters at Once**]


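`named_parameters()` walks the model recursively and yields every parameter together with its name. Each name is prefixed with the index of the layer that owns it, so the ReLU at index 1, which has no parameters, does not appear:

```python
[(name, param.shape) for name, param in net.named_parameters()]
```

```
[('0.weight', torch.Size([8, 4])),
 ('0.bias', torch.Size([8])),
 ('2.weight', torch.Size([1, 8])),
 ('2.bias', torch.Size([1]))]
```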
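Iterating over all parameters also makes simple bookkeeping easy. As a sketch on the same architecture (rebuilt so the block is self-contained; `num_params` and `params` are illustrative names), the total count is 8·4 + 8 + 1·8 + 1 = 49. The names returned by `named_parameters()` can also serve as keys for looking up a single parameter.

```python
import torch
from torch import nn

net = nn.Sequential(nn.LazyLinear(8), nn.ReLU(), nn.LazyLinear(1))
net(torch.rand(2, 4))  # a forward pass initializes the lazy parameters

# Total number of scalar parameters: 8*4 + 8 + 1*8 + 1 = 49
num_params = sum(p.numel() for p in net.parameters())
print(num_params)  # 49

# The names double as lookup keys for individual parameters
params = dict(net.named_parameters())
print(params['2.bias'] is net[2].bias)  # True
```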

## [**Tied Parameters**]


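To share parameters across layers, create a layer once, give it a name, and insert that same module object at several positions. Here the layers at indices 2 and 4 are one and the same `shared` layer:

```python
# We need to give the shared layer a name so that we can refer to its
# parameters
shared = nn.LazyLinear(8)
net = nn.Sequential(nn.LazyLinear(8), nn.ReLU(),
                    shared, nn.ReLU(),
                    shared, nn.ReLU(),
                    nn.LazyLinear(1))

# A forward pass initializes the lazy parameters
net(X)
# Check whether the parameters are the same
print(net[2].weight.data[0] == net[4].weight.data[0])
net[2].weight.data[0, 0] = 100
# Make sure that they are actually the same object rather than just having the
# same value
print(net[2].weight.data[0] == net[4].weight.data[0])
```

```
tensor([True, True, True, True, True, True, True, True])
tensor([True, True, True, True, True, True, True, True])
```

Changing an entry of the weight at index 2 also changes the weight at index 4, because both indices refer to the same parameter tensor.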
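Because the shared layer is a single module object, PyTorch reports its parameters only once, and autograd accumulates the gradient contributions from both uses into one `.grad` tensor. The sketch below rebuilds the tied network with the same architecture and checks this. The scalar loss `net(X).sum()` is an arbitrary choice for illustration.

```python
import torch
from torch import nn

shared = nn.LazyLinear(8)
net = nn.Sequential(nn.LazyLinear(8), nn.ReLU(),
                    shared, nn.ReLU(),
                    shared, nn.ReLU(),
                    nn.LazyLinear(1))
X = torch.rand(2, 4)
net(X).sum().backward()  # illustrative loss; also initializes lazy layers

# Positions 2 and 4 hold the very same module
print(net[2] is net[4])  # True

# Duplicates are skipped, so the shared weight and bias are listed once
names = [name for name, _ in net.named_parameters()]
print(names)  # ['0.weight', '0.bias', '2.weight', '2.bias', '6.weight', '6.bias']

# Both positions see one .grad tensor; autograd adds both uses' contributions into it
print(net[2].weight.grad is net[4].weight.grad)  # True
```

Since duplicates are skipped, passing `net.parameters()` to an optimizer updates the shared weight once per step rather than twice.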
