In [1]:
# So far we have discussed how to process data and how to build, train, and test deep learning models. 
# However, at some point we will hopefully be happy enough with the learned models that we will want to 
# save the results for later use in various contexts (perhaps even to make predictions in deployment). 
# Additionally, when running a long training process, the best practice is to periodically save intermediate results (checkpointing) 
# to ensure that we do not lose several days’ worth of computation if we trip over the power cord of our server. 
# Thus it is time to learn how to load and store both individual weight vectors and entire models. This section addresses both issues.
import torch
from torch import nn
from torch.nn import functional as F

In [2]:
# For individual tensors, we can directly invoke the load and save functions to read and write them respectively. 
# Both functions require that we supply a name, and save requires as input the variable to be saved.

x = torch.arange(4)
torch.save(x, 'x-file')

In [3]:
x2 = torch.load('x-file')
x2

tensor([0, 1, 2, 3])

In [4]:
# We can store a list of tensors and read them back into memory.
y = torch.zeros(4)
torch.save([x, y], 'x-files')
x2, y2 = torch.load('x-files')
(x2, y2)

(tensor([0, 1, 2, 3]), tensor([0., 0., 0., 0.]))

In [5]:
# We can even write and read a dictionary that maps from strings to tensors. This is convenient when we want to read or write all the weights in a model.
mydict = {'x': x, 'y': y}
torch.save(mydict, 'mydict')
mydict2 = torch.load('mydict')
mydict2

{'x': tensor([0, 1, 2, 3]), 'y': tensor([0., 0., 0., 0.])}

In [6]:
# Saving individual weight vectors (or other tensors) is useful, 
# but it gets very tedious if we want to save (and later load) an entire model. 
# After all, we might have hundreds of parameter groups sprinkled throughout. 
# For this reason the deep learning framework provides built-in functionality to load and save entire networks. 
# An important detail to note is that this saves model parameters and not the entire model. 
# For example, if we have a 3-layer MLP, we need to specify the architecture separately. 
# The reason for this is that the models themselves can contain arbitrary code, 
# hence they cannot be serialized as naturally. Thus, in order to reinstate a model, 
# we need to generate the architecture in code and then load the parameters from disk. Let’s start with our familiar MLP.
class MLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.hidden = nn.LazyLinear(256)
        self.output = nn.LazyLinear(10)

    def forward(self, x):
        return self.output(F.relu(self.hidden(x)))


net = MLP()
X = torch.randn(size=(2, 20))
Y = net(X)



In [7]:
torch.save(net.state_dict(), 'mlp.params')

In [8]:
clone = MLP()
clone.load_state_dict(torch.load('mlp.params'))
clone.eval()

MLP(
  (hidden): LazyLinear(in_features=0, out_features=256, bias=True)
  (output): LazyLinear(in_features=0, out_features=10, bias=True)
)

In [9]:
Y_clone = clone(X)
Y_clone == Y

tensor([[True, True, True, True, True, True, True, True, True, True],
        [True, True, True, True, True, True, True, True, True, True]])

1. Practical benefits of storing model parameters:

Even if there is no need to deploy trained models to a different device, storing model parameters has several practical benefits:

- **Model checkpointing**: During the training process, you can save intermediate model parameters as checkpoints. This allows you to resume training from the last checkpoint if the training process is interrupted or if you want to fine-tune the model with different hyperparameters.
- **Model evaluation**: Storing model parameters allows you to evaluate the model's performance on different datasets or tasks without retraining the model each time.
- **Transfer learning**: By storing model parameters, you can use pre-trained models as a starting point for training new models on related tasks or datasets. This can save a significant amount of training time and computational resources.
- **Model ensembling**: Storing model parameters allows you to combine multiple models (e.g., trained with different hyperparameters or initializations) to create an ensemble model, which can improve performance and generalization.
- **Model versioning**: Saving model parameters allows you to keep track of different versions of the model during development, making it easier to compare their performance and revert to a previous version if needed.
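
As a rough illustration of the checkpointing point above, the following sketch saves the model and optimizer state after every epoch and then resumes from the file. The toy model, the random data, the `train_one_epoch` helper, and the file name `checkpoint.pt` are all assumptions made for this example, not part of the original text.

```python
import os
import tempfile

import torch
from torch import nn

torch.manual_seed(0)

# Toy model, optimizer, and data (all hypothetical, for illustration only).
model = nn.Linear(4, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
X, y = torch.randn(8, 4), torch.randn(8, 2)


def train_one_epoch(model, optimizer):
    """Run a single full-batch gradient step and return the loss value."""
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(X), y)
    loss.backward()
    optimizer.step()
    return loss.item()


# Hypothetical checkpoint location inside a temporary directory.
ckpt_path = os.path.join(tempfile.mkdtemp(), 'checkpoint.pt')

for epoch in range(3):
    train_one_epoch(model, optimizer)
    # Store everything needed to resume: epoch counter, weights, optimizer state.
    torch.save({'epoch': epoch,
                'model': model.state_dict(),
                'optimizer': optimizer.state_dict()}, ckpt_path)

# Resume: rebuild the same architecture and optimizer, then restore their state.
resumed_model = nn.Linear(4, 2)
resumed_optimizer = torch.optim.SGD(resumed_model.parameters(), lr=0.1, momentum=0.9)
checkpoint = torch.load(ckpt_path)
resumed_model.load_state_dict(checkpoint['model'])
resumed_optimizer.load_state_dict(checkpoint['optimizer'])
start_epoch = checkpoint['epoch'] + 1  # continue with the next epoch
```

Saving the optimizer state alongside the weights matters here because SGD with momentum keeps per-parameter buffers; restoring only the weights would restart those buffers from zero.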

2. Reusing parts of a network in a new network with a different architecture:

To reuse parts of a network in a new network with a different architecture, you can create a new network that includes the desired layers from the previous network. Here's an example that reuses the first two modules of a previous network (its first linear layer and the ReLU that follows it) in a new network:

```python
import torch.nn as nn

class OldNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(10, 20),
            nn.ReLU(),
            nn.Linear(20, 30),
            nn.ReLU(),
            nn.Linear(30, 5)
        )

    def forward(self, x):
        return self.layers(x)

class NewNetwork(nn.Module):
    def __init__(self, old_network):
        super().__init__()
        self.first_two_layers = nn.Sequential(*list(old_network.layers.children())[:2])
        self.new_layers = nn.Sequential(
            nn.Linear(20, 40),
            nn.ReLU(),
            nn.Linear(40, 5)
        )

    def forward(self, x):
        x = self.first_two_layers(x)
        return self.new_layers(x)

old_network = OldNetwork()
new_network = NewNetwork(old_network)
```

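In this example, we create a new network `NewNetwork` that takes an instance of `OldNetwork` as input. The new network uses the first linear layer and its ReLU from the old network and adds new layers with a different architecture. Because the modules are reused rather than copied, both networks share the same parameter tensors, so training one also changes the other.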
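
Whether the reused layers should stay tied to the old network depends on the use case. The sketch below, written under the assumption that an independent and frozen copy is wanted, contrasts slicing an `nn.Sequential` (which reuses the same module objects) with `copy.deepcopy` (which duplicates the parameters). The variable names are illustrative only.

```python
import copy

import torch
from torch import nn

old_layers = nn.Sequential(
    nn.Linear(10, 20), nn.ReLU(),
    nn.Linear(20, 30), nn.ReLU(),
    nn.Linear(30, 5),
)

# Slicing a Sequential returns a new Sequential holding the SAME module objects,
# so the parameters are shared with old_layers.
shared = old_layers[:2]

# deepcopy duplicates the modules, giving the new network its own parameters.
independent = copy.deepcopy(old_layers[:2])

# Optionally freeze the copied layers so only the new layers are trained.
for param in independent.parameters():
    param.requires_grad_(False)

new_net = nn.Sequential(
    independent,
    nn.Linear(20, 40), nn.ReLU(),
    nn.Linear(40, 5),
)
out = new_net(torch.randn(3, 10))  # batch of 3 examples with 10 features each
```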

3. Saving the network architecture and parameters:

To save both the network architecture and its parameters, you can pass the whole model object to `torch.save`, which serializes it with Python's `pickle` module. The `state_dict()` method, by contrast, returns only a dictionary containing the model's parameters and buffers.

However, pickling the entire model can be problematic. The pickle stores a reference to the model's class rather than its code, so loading requires the same class definition to be importable, and it can break across PyTorch versions or after the code is refactored. A safer approach is to save only the model's `state_dict` and recreate the model architecture in code when loading the model.

```python
# Save the model parameters
torch.save(new_network.state_dict(), 'new_network_parameters.pt')

# Load the model parameters
loaded_parameters = torch.load('new_network_parameters.pt')

# Recreate the model architecture and load the parameters
old_network = OldNetwork()
new_network = NewNetwork(old_network)
new_network.load_state_dict(loaded_parameters)
```
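
One possible middle ground, sketched here as an assumption rather than a prescribed method, is to store the constructor arguments next to the `state_dict`, so the architecture can be rebuilt from the file without pickling the class itself. `ConfigurableMLP`, the `config` keys, and the file name are hypothetical names introduced for this example.

```python
import os
import tempfile

import torch
from torch import nn


class ConfigurableMLP(nn.Module):
    """Hypothetical MLP whose shape is fully described by its constructor arguments."""

    def __init__(self, num_inputs, num_hidden, num_outputs):
        super().__init__()
        self.hidden = nn.Linear(num_inputs, num_hidden)
        self.output = nn.Linear(num_hidden, num_outputs)

    def forward(self, x):
        return self.output(torch.relu(self.hidden(x)))


config = {'num_inputs': 20, 'num_hidden': 256, 'num_outputs': 10}
net = ConfigurableMLP(**config)

# Save plain Python data (the config) together with the parameters.
path = os.path.join(tempfile.mkdtemp(), 'mlp_bundle.pt')
torch.save({'config': config, 'state_dict': net.state_dict()}, path)

# Load: rebuild the architecture from the stored config, then restore the weights.
bundle = torch.load(path)
restored = ConfigurableMLP(**bundle['config'])
restored.load_state_dict(bundle['state_dict'])
restored.eval()

X = torch.randn(2, 20)
same_outputs = torch.equal(net(X), restored(X))
```

The class definition still has to be available in code at load time; only its hyperparameters travel with the file.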

Regarding restrictions on the architecture, the model must be compatible with the saved parameters when they are loaded. Parameters are matched by name and shape, so the parameter names (which derive from attribute names and, inside `nn.Sequential`, from layer positions) and the tensor dimensions must agree with the saved `state_dict`. By default, `load_state_dict` uses `strict=True` and raises a `RuntimeError` if keys are missing, keys are unexpected, or shapes differ.
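
The sketch below illustrates this behavior on two small, made-up networks that differ only in the size of their last layer. It first shows the error from a strict load, then loads only the entries whose name and shape match by filtering the dictionary and passing `strict=False`. The filtering step is one possible approach, not the only one.

```python
import torch
from torch import nn

# Source network whose parameters we want to reuse.
source = nn.Sequential(nn.Linear(10, 20), nn.ReLU(), nn.Linear(20, 5))
state = source.state_dict()

# Target network with a different output size (3 instead of 5).
target = nn.Sequential(nn.Linear(10, 20), nn.ReLU(), nn.Linear(20, 3))

# A strict load fails because the shapes of '2.weight' and '2.bias' differ.
try:
    target.load_state_dict(state)
    strict_error = None
except RuntimeError as err:
    strict_error = err

# Keep only the entries whose name and shape match the target network.
target_state = target.state_dict()
compatible = {name: tensor for name, tensor in state.items()
              if name in target_state and tensor.shape == target_state[name].shape}

# strict=False tolerates the keys that were filtered out and reports them.
result = target.load_state_dict(compatible, strict=False)
print(result.missing_keys)     # parameters left at their fresh initialization
print(result.unexpected_keys)  # entries in the dict that the target does not have
```

Note that `strict=False` only relaxes the check on missing and unexpected keys; a shape mismatch for a key that is present still raises an error, which is why the dictionary is filtered first.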