# Saving and loading the model
· Three core functions are involved:\
1. <font color=green>**torch.save()**</font>: Saves a serialized object to disk, using Python's pickle utility for serialization. Objects of all kinds can be saved with it: models, tensors, and dictionaries.
2. <font color=green>**torch.load()**</font>: Uses pickle's unpickling facilities to deserialize pickled object files into memory.
3. <font color=green>**torch.nn.Module.load_state_dict()**</font>: Loads a model’s parameter dictionary using a deserialized state_dict.
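A minimal round trip through torch.save() and torch.load(), as a sketch: it saves a dictionary mixing a tensor with plain Python values. An in-memory `io.BytesIO` buffer stands in for a file path here (an assumption to keep the example self-contained); a path such as `'obj.pt'` works the same way.

```python
import io
import torch

# Any picklable object can be saved: here a dict holding a tensor and plain Python values
obj = {'weights': torch.arange(6.0).reshape(2, 3), 'step': 10, 'name': 'demo'}

# torch.save serializes the object (via pickle) into the buffer
buffer = io.BytesIO()
torch.save(obj, buffer)

# torch.load unpickles the bytes back into Python objects in memory
buffer.seek(0)
restored = torch.load(buffer)

print(restored['step'], restored['name'])
print(torch.equal(restored['weights'], obj['weights']))
```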

## 1. state_dict
1. **What is a state_dict**: a Python dictionary object that maps each layer to its parameter tensor
2. Which objects have a state_dict: module objects (i.e. models) and optimizer objects\
· An optimizer's state_dict contains the optimizer's state and the hyperparameters used
3. Which layers have an entry in a model's state_dict:\
(1) layers with learnable parameters, such as convolutional layers, linear layers, etc.\
(2) registered buffers, such as batchnorm's running_mean
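To see which layers contribute entries, the sketch below builds a small hypothetical network (not the model defined below) that mixes layers with parameters, a batchnorm layer with buffers, and parameter-free layers. Only the conv and batchnorm layers should appear in its state_dict.

```python
import torch.nn as nn

# Hypothetical network: conv (parameters), batchnorm (parameters + buffers), relu/pool (neither)
net = nn.Sequential(
    nn.Conv2d(3, 4, 3),
    nn.BatchNorm2d(4),
    nn.ReLU(),
    nn.MaxPool2d(2),
)

# state_dict holds both learnable parameters and registered buffers
for name, tensor in net.state_dict().items():
    print(name, tuple(tensor.shape))

# The two kinds of entries can also be listed separately
print([n for n, _ in net.named_parameters()])
print([n for n, _ in net.named_buffers()])
```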

In [4]:
## Example
# Define model
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

class TheModelClass(nn.Module):
    def __init__(self):
        super(TheModelClass, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

# Initialize model
model = TheModelClass()

# Initialize optimizer
optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)

In [7]:
# model's state_dict: contains weights and biases
print("Model's state_dict:")
for param_tensor in model.state_dict():
    print(param_tensor, "\t", model.state_dict()[param_tensor].size())

# separator line
print('-' * 50)

# optimizer's state_dict: contains state and hyperparameters
print("Optimizer's state_dict:")
for var_name in optimizer.state_dict():
    print(var_name, "\t", optimizer.state_dict()[var_name])

Model's state_dict:
conv1.weight 	 torch.Size([6, 3, 5, 5])
conv1.bias 	 torch.Size([6])
conv2.weight 	 torch.Size([16, 6, 5, 5])
conv2.bias 	 torch.Size([16])
fc1.weight 	 torch.Size([120, 400])
fc1.bias 	 torch.Size([120])
fc2.weight 	 torch.Size([84, 120])
fc2.bias 	 torch.Size([84])
fc3.weight 	 torch.Size([10, 84])
fc3.bias 	 torch.Size([10])
--------------------------------------------------
Optimizer's state_dict:
state 	 {}
param_groups 	 [{'lr': 0.001, 'momentum': 0.9, 'dampening': 0, 'weight_decay': 0, 'nesterov': False, 'maximize': False, 'foreach': None, 'differentiable': False, 'params': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]}]
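The `state` entry above is empty because no optimization step has been taken yet. As a sketch with a hypothetical one-layer model and the same SGD settings, one training step fills `state` with a per-parameter momentum buffer, keyed by the same indices listed under `params`.

```python
import torch
import torch.nn as nn
import torch.optim as optim

# Hypothetical tiny model, only used to drive the optimizer
layer = nn.Linear(4, 2)
opt = optim.SGD(layer.parameters(), lr=0.001, momentum=0.9)
before = opt.state_dict()['state']      # no step taken yet
print(before)

# One training step: forward, backward, update
loss = layer(torch.randn(8, 4)).pow(2).mean()
loss.backward()
opt.step()

state = opt.state_dict()['state']
print(list(state.keys()))               # parameter indices, as in param_groups['params']
print(list(state[0].keys()))            # per-parameter entries, e.g. the momentum buffer
```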


## 2. saving & loading a model for inference
### 2.1 save/load state_dict
· This is the recommended approach.

In [16]:
# save
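PATH = 'rk_models/savedClassModelState.pt'   # the file name usually ends in .pt or .pth
torch.save(model.state_dict(), PATH)

# load
model = TheModelClass()
model.load_state_dict(torch.load(PATH)) # torch.load(PATH) deserializes the saved state_dict (a dict of tensors); load_state_dict() copies it into the model
model.eval()                            # always switch to evaluation mode before inference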

TheModelClass(
  (conv1): Conv2d(3, 6, kernel_size=(5, 5), stride=(1, 1))
  (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (conv2): Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1))
  (fc1): Linear(in_features=400, out_features=120, bias=True)
  (fc2): Linear(in_features=120, out_features=84, bias=True)
  (fc3): Linear(in_features=84, out_features=10, bias=True)
)
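A self-contained sketch of the same workflow that also checks the result. It assumes a hypothetical model with dropout, so that `eval()` visibly matters, an in-memory buffer instead of `PATH`, and PyTorch 1.13 or later for the `weights_only` argument of torch.load. After reloading the state_dict into a fresh instance and switching both models to evaluation mode, the outputs should be identical.

```python
import io
import torch
import torch.nn as nn

# Hypothetical model with dropout, so train/eval mode makes a visible difference
def make_model():
    return nn.Sequential(nn.Linear(10, 10), nn.Dropout(p=0.5), nn.Linear(10, 2))

torch.manual_seed(0)
original = make_model()

# Save only the state_dict (the recommended approach)
buffer = io.BytesIO()
torch.save(original.state_dict(), buffer)
buffer.seek(0)

# Re-create the same architecture, then copy the saved tensors into it
restored = make_model()
restored.load_state_dict(torch.load(buffer, weights_only=True))

# eval() disables dropout, so both models now produce identical outputs
original.eval()
restored.eval()
x = torch.randn(3, 10)
with torch.no_grad():
    same = torch.equal(original(x), restored(x))
print(same)
```

Without the two `eval()` calls, dropout would zero out different activations on each forward pass and the comparison would generally fail.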

### 2.2 save/load entire model
This approach is best avoided. Its drawback:\
the serialized data is bound to the specific classes and the exact directory structure used when the model is saved. This is because pickle does not save the model class itself; rather, it saves a path to the file containing the class, which is used at load time. Because of this, your code can break in various ways when used in other projects or after refactors.

In [17]:
# save
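PATH = 'rk_models/savedClassModel.pt'   # the file name usually ends in .pt or .pth
torch.save(model, PATH)

# load (PyTorch 2.6+ defaults to weights_only=True, which rejects full model objects;
# weights_only=False unpickles arbitrary objects, so only use it on files you trust)
model = torch.load(PATH, weights_only=False)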
model.eval()

TheModelClass(
  (conv1): Conv2d(3, 6, kernel_size=(5, 5), stride=(1, 1))
  (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (conv2): Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1))
  (fc1): Linear(in_features=400, out_features=120, bias=True)
  (fc2): Linear(in_features=120, out_features=84, bias=True)
  (fc3): Linear(in_features=84, out_features=10, bias=True)
)

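### 2.3 export/load model in TorchScript format
This approach is recommended for inference and deployment at scale, because the model can then run both in Python and in a high-performance C++ environment: you will be able to load the exported model and run inference without defining the model class.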

In [18]:
# export:
model_scripted = torch.jit.script(model) # Export to TorchScript
model_scripted.save('model_scripted.pt') # Save

# load:
model = torch.jit.load('model_scripted.pt')
model.eval()

RecursiveScriptModule(
  original_name=TheModelClass
  (conv1): RecursiveScriptModule(original_name=Conv2d)
  (pool): RecursiveScriptModule(original_name=MaxPool2d)
  (conv2): RecursiveScriptModule(original_name=Conv2d)
  (fc1): RecursiveScriptModule(original_name=Linear)
  (fc2): RecursiveScriptModule(original_name=Linear)
  (fc3): RecursiveScriptModule(original_name=Linear)
)
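A sketch of the export/load round trip that also checks the outputs. It assumes a hypothetical model built only from built-in layers (so torch.jit.script can compile it) and an in-memory buffer instead of a file; torch.jit.save and torch.jit.load accept file-like objects as well as paths.

```python
import io
import torch
import torch.nn as nn

# Hypothetical model made of built-in layers only
net = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2)).eval()
scripted = torch.jit.script(net)

# Export: the archive stores the compiled code together with the weights
buffer = io.BytesIO()
torch.jit.save(scripted, buffer)

# Load: the module is rebuilt from the archive, not from a Python class definition
buffer.seek(0)
loaded = torch.jit.load(buffer)
loaded.eval()

x = torch.randn(5, 4)
with torch.no_grad():
    match = torch.allclose(net(x), loaded(x))
print(match, type(loaded).__name__)
```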

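## 3. saving & loading a general checkpoint for inference/resuming training
1. A checkpoint should contain:\
(1) the model's state_dict \
(2) the optimizer's state_dict, because it contains buffers and parameters that are updated as the model trains\
(3) the current epoch \
(4) the latest training loss \
(5) external torch.nn.Embedding layers, etc.
2. Because more is stored, a checkpoint is usually larger than the model alone, often 2-3 times as large.
3. When saving, organize these items in a dictionary and save it with torch.save(); the usual convention is the .tar file extension.
4. When loading, first initialize the model and optimizer, then load the dictionary with torch.load(), restore their state_dicts, and read any other items from the dictionary as needed.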

In [None]:
# save
# epoch and loss are assumed to come from your training loop
torch.save({
            'epoch': epoch,
            'model_state_dict': model.state_dict(),
            'optimizer_state_dict': optimizer.state_dict(),
            'loss': loss,
            # ... plus any other items you need
            }, PATH)
# load: initialize the model and optimizer first, then restore their state
model = TheModelClass()
optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)

checkpoint = torch.load(PATH)
model.load_state_dict(checkpoint['model_state_dict'])
optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
epoch = checkpoint['epoch']
loss = checkpoint['loss']

model.eval()     # for inference
# - or -
model.train()    # to resume training
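An end-to-end sketch of saving a checkpoint and resuming training from it. Everything here is hypothetical for illustration: a toy linear-regression model with random data, a helper `train_one_epoch`, and a temporary directory for the `.tar` file. It also checks that both the weights and the optimizer's momentum buffers are restored before training continues.

```python
import os
import tempfile
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)
# Hypothetical toy data, model, and optimizer
X, y = torch.randn(32, 3), torch.randn(32, 1)
model = nn.Linear(3, 1)
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
loss_fn = nn.MSELoss()

def train_one_epoch(model, optimizer):
    # One full-batch gradient step; returns the loss as a Python float
    optimizer.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    optimizer.step()
    return loss.item()

for epoch in range(3):
    loss = train_one_epoch(model, optimizer)

# Save a general checkpoint as a dictionary (.tar by convention)
path = os.path.join(tempfile.mkdtemp(), 'checkpoint.tar')
torch.save({
    'epoch': epoch,
    'model_state_dict': model.state_dict(),
    'optimizer_state_dict': optimizer.state_dict(),
    'loss': loss,
}, path)

# Resume: initialize fresh objects first, then restore their state
model2 = nn.Linear(3, 1)
optimizer2 = optim.SGD(model2.parameters(), lr=0.01, momentum=0.9)
checkpoint = torch.load(path)
model2.load_state_dict(checkpoint['model_state_dict'])
optimizer2.load_state_dict(checkpoint['optimizer_state_dict'])
start_epoch = checkpoint['epoch'] + 1

# Weights and momentum buffers should match the saved run exactly
restored_ok = (torch.equal(model2.weight, model.weight)
               and torch.equal(optimizer2.state_dict()['state'][0]['momentum_buffer'],
                               optimizer.state_dict()['state'][0]['momentum_buffer']))

model2.train()  # resuming training, so use training mode
for epoch in range(start_epoch, start_epoch + 2):
    loss = train_one_epoch(model2, optimizer2)
print(start_epoch, restored_ok, loss)
```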

## 4. saving multiple models in one file
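Similar to saving a checkpoint: store each model's and each optimizer's state_dict in a single dictionary, each under its own key.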

In [None]:
# save
torch.save({
            'modelA_state_dict': modelA.state_dict(),
            'modelB_state_dict': modelB.state_dict(),
            'optimizerA_state_dict': optimizerA.state_dict(),
            'optimizerB_state_dict': optimizerB.state_dict(),
            # ... any other models or optimizers
            }, PATH)

# load: TheModelAClass, TheModelBClass, TheOptimizerAClass, TheOptimizerBClass and *args/**kwargs
# are placeholders for your own classes and their constructor arguments
modelA = TheModelAClass(*args, **kwargs)
modelB = TheModelBClass(*args, **kwargs)
optimizerA = TheOptimizerAClass(*args, **kwargs)
optimizerB = TheOptimizerBClass(*args, **kwargs)

checkpoint = torch.load(PATH)
modelA.load_state_dict(checkpoint['modelA_state_dict'])
modelB.load_state_dict(checkpoint['modelB_state_dict'])
optimizerA.load_state_dict(checkpoint['optimizerA_state_dict'])
optimizerB.load_state_dict(checkpoint['optimizerB_state_dict'])

modelA.eval()   # for inference
modelB.eval()
# - or -
modelA.train()  # to resume training
modelB.train()

## 5. warmstarting model using parameters from another model

In [None]:
# save
torch.save(modelA.state_dict(), PATH)
# load
modelB = TheModelBClass(*args, **kwargs)
modelB.load_state_dict(torch.load(PATH), strict=False)

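A minimal warmstarting sketch, assuming two hypothetical models whose layer names differ: the source model's `backbone.*` keys are renamed to the target's `encoder.*` names, and `strict=False` lets load_state_dict() skip the keys that still do not match (the old head and the new classifier). The returned object lists those skipped keys.

```python
import io
import torch
import torch.nn as nn

# Hypothetical source model: a backbone layer and a 2-class head
class ModelA(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = nn.Linear(8, 4)
        self.head = nn.Linear(4, 2)

# Hypothetical target model: same backbone shape under another name, new 3-class head
class ModelB(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Linear(8, 4)
        self.classifier = nn.Linear(4, 3)

modelA = ModelA()
buffer = io.BytesIO()
torch.save(modelA.state_dict(), buffer)
buffer.seek(0)
state = torch.load(buffer)

# Rename 'backbone.*' keys to 'encoder.*' so they match modelB's key names
renamed = {}
for key, value in state.items():
    if key.startswith('backbone.'):
        key = 'encoder.' + key[len('backbone.'):]
    renamed[key] = value

# strict=False: load whatever matches and ignore the rest instead of raising an error
modelB = ModelB()
result = modelB.load_state_dict(renamed, strict=False)
print('missing:', result.missing_keys)        # modelB keys with no value in `renamed`
print('unexpected:', result.unexpected_keys)  # keys in `renamed` that modelB does not have
```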
1. Warmstarting is common in transfer learning. You may be loading a partial state_dict that is missing some keys, or a state_dict with more keys than the model you are loading into. In both cases, pass `strict=False` to load_state_dict() to ignore the non-matching keys.

2. If you want to load parameters from one layer into another but some keys do not match, rename the parameter keys in the state_dict being loaded so that they match the key names of the model you are loading into.

## 6. saving & loading a model across devices

### 6.1 save on GPU, load on CPU

In [None]:
# save
torch.save(model.state_dict(), PATH)
# load
device = torch.device('cpu')
model = TheModelClass()

model.load_state_dict(torch.load(PATH, map_location=device))

### 6.2 save on GPU, load on GPU

In [None]:
# save
torch.save(model.state_dict(), PATH)
# load
device = torch.device("cuda")
model = TheModelClass()

model.load_state_dict(torch.load(PATH))
model.to(device)
# Make sure to call input = input.to(device) on any input tensors that you feed to the model

### 6.3 save on CPU, load on GPU

In [None]:
# save
torch.save(model.state_dict(), PATH)
# load
device = torch.device("cuda")
model = TheModelClass()

model.load_state_dict(torch.load(PATH, map_location="cuda:0"))
model.to(device)
# Make sure to call input = input.to(device) on any input tensors that you feed to the model
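The three cases above can be folded into one device-agnostic pattern, sketched here with a hypothetical `nn.Linear` model and an in-memory buffer: choose the device at load time, let `map_location` place every saved tensor on it, then move the model and its inputs to that device.

```python
import io
import torch
import torch.nn as nn

# Use a GPU if one is available at load time, otherwise the CPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

model = nn.Linear(4, 2)              # hypothetical model
buffer = io.BytesIO()
torch.save(model.state_dict(), buffer)
buffer.seek(0)

# map_location places every saved tensor on `device`, wherever it was saved from
state = torch.load(buffer, map_location=device)
restored = nn.Linear(4, 2)
restored.load_state_dict(state)
restored.to(device)

# Inputs must live on the same device as the model
x = torch.randn(3, 4).to(device)
out = restored(x)
print(out.device)
```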

### 6.4 saving torch.nn.DataParallel Models

In [None]:
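# save: model is assumed to be wrapped in torch.nn.DataParallel; model.module is the underlying model
torch.save(model.module.state_dict(), PATH)
# load: the saved keys have no 'module.' prefix, so they load into a plain (unwrapped) model
# Load to whatever device you want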
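Why save `model.module.state_dict()`? A sketch with a hypothetical `nn.Linear` wrapped in torch.nn.DataParallel: the wrapper registers the model as a submodule named `module`, so its own state_dict keys carry a `module.` prefix that a plain model would not recognize. Saving `.module.state_dict()` avoids the prefix; for a checkpoint that was saved from the wrapper itself, the prefix can be stripped before loading. On a machine without GPUs, DataParallel simply wraps the module, so this runs on CPU too.

```python
import io
import torch
import torch.nn as nn

base = nn.Linear(4, 2)                 # hypothetical model
wrapped = nn.DataParallel(base)

# The wrapper's keys gain a 'module.' prefix; the inner model's keys do not
print(list(wrapped.state_dict().keys()))
print(list(wrapped.module.state_dict().keys()))

# Saving wrapped.module.state_dict() gives keys that load into a plain model
buffer = io.BytesIO()
torch.save(wrapped.module.state_dict(), buffer)
buffer.seek(0)
plain = nn.Linear(4, 2)
plain.load_state_dict(torch.load(buffer))

# For a checkpoint saved from the wrapper itself, strip the 'module.' prefix first
wrapped_state = wrapped.state_dict()
stripped = {k[len('module.'):] if k.startswith('module.') else k: v
            for k, v in wrapped_state.items()}
plain.load_state_dict(stripped)
```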

## Saving and loading model weights
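1. PyTorch models store their learned parameters in an internal state dictionary called **state_dict**. These parameters can be persisted with the **torch.save** method.
2. To load these weights, first create an instance of the same model, then load the parameters with the **load_state_dict** method.
3. Because the weights can only be loaded into the same network structure that produced them, you may also want to save the model structure together with the weights (e.g., by passing `model` itself rather than `model.state_dict()` to torch.save).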

In [1]:
import torch
import torchvision.models as models

# saving
model = models.vgg16(weights='IMAGENET1K_V1')
torch.save(model.state_dict(), 'model_weights.pth')

Downloading: "https://download.pytorch.org/models/vgg16-397923af.pth" to /home/roark/.cache/torch/hub/checkpoints/vgg16-397923af.pth

In [2]:
# loading
model = models.vgg16() # we do not specify ``weights``, i.e. create untrained model
model.load_state_dict(torch.load('model_weights.pth'))
# Always call the eval() method before inference:
# it sets dropout and batch normalization layers to evaluation mode
model.eval()

VGG(
  (features): Sequential(
    (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): ReLU(inplace=True)
    (2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (3): ReLU(inplace=True)
    (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (5): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (6): ReLU(inplace=True)
    (7): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (8): ReLU(inplace=True)
    (9): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (10): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (11): ReLU(inplace=True)
    (12): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (13): ReLU(inplace=True)
    (14): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (15): ReLU(inplace=True)
    (16): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1

In [3]:
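# save the entire model (structure + weights)
torch.save(model, 'model.pth')
# load the entire model; PyTorch 2.6+ defaults to weights_only=True, which rejects full model objects,
# so pass weights_only=False (this unpickles arbitrary objects: only do it for files you trust)
model = torch.load('model.pth', weights_only=False)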