## PyTorch Tutorial 17 - Saving and Loading Models
https://www.youtube.com/watch?v=9L9jEOwRrCg

We have three different methods to remember:

```python
torch.save(arg, PATH)

torch.load(PATH)

model.load_state_dict(arg)

```

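Remember what each one does:

1. **torch.save** can serialize any Python object, such as a dictionary, a tensor, or a whole model. It uses Python's pickle module under the hood.

2. **torch.load** deserializes an object that was saved with torch.save and returns it.

3. **model.load_state_dict** copies the parameters from a state dictionary into an existing model instance.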


```python
#### 2 DIFFERENT WAYS OF SAVING
# 1) lazy way: save whole model
torch.save(model, PATH)
# model class must be defined somewhere
model = torch.load(PATH)
model.eval()
# 2) recommended way: save only the state_dict
torch.save(model.state_dict(), PATH)
# model must be created again with parameters
model = Model(*args, **kwargs)
model.load_state_dict(torch.load(PATH))
model.eval()
```

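Note that the lazy way, ***torch.save(model, PATH)***, pickles the whole model, so the serialized data is bound to the specific classes and the exact directory structure in use when the model was saved. Loading can break if the code is later refactored or the file is used from another project.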
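To make the recommended way more concrete, here is a minimal sketch of what a `state_dict` contains. It uses a bare `nn.Linear` layer as a stand-in (an assumption for illustration, not the `Model` class from these notes) and only lists the parameter names and shapes.

```python
import torch
import torch.nn as nn

# Stand-in layer for illustration; any nn.Module has a state_dict().
layer = nn.Linear(6, 1)

# state_dict() returns an ordered mapping from parameter names to tensors.
state = layer.state_dict()
shapes = {name: tuple(tensor.shape) for name, tensor in state.items()}

print(shapes)
```

Because the saved file holds only names and tensors, it does not depend on how the model class was defined or where its source file lived.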


In [1]:
import torch
import torch.nn as nn



class Model(nn.Module):
    def __init__(self, n_input_features):
        super(Model, self).__init__()
        self.linear = nn.Linear(n_input_features, 1)
        
    def forward(self, x):
        y_pred = torch.sigmoid(self.linear(x))
        return y_pred
    
model = Model(n_input_features = 6)

# train your model..

In [2]:
FILE = "model.pth"  # it is common practice to use the .pth extension for PyTorch files

torch.save(model, FILE)


In [3]:
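# we can load our model now (the Model class must be defined or importable here)
# note: on PyTorch 2.6 and later, torch.load defaults to weights_only=True,
# so loading a whole pickled model needs weights_only=False (trusted files only)
model = torch.load(FILE)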
model.eval()  # set in evaluation mode

Model(
  (linear): Linear(in_features=6, out_features=1, bias=True)
)

In [4]:
for param in model.parameters():
    print(param)

Parameter containing:
tensor([[ 0.2125, -0.0848, -0.2367, -0.2764,  0.3437, -0.2928]],
       requires_grad=True)
Parameter containing:
tensor([0.2459], requires_grad=True)




Above was the lazy way. Let's see the **recommended way**:

In [5]:
model = Model(n_input_features = 6)
FILE = "model.pth"  # it is common practice to use the .pth extension for PyTorch files
torch.save(model.state_dict(), FILE)  # note: we save only the model's state_dict()

# The parameters printed here are the ones being saved. They should match the
# parameters of the newly constructed model below once it loads this state_dict.
for param in model.parameters():
    print(param)

Parameter containing:
tensor([[-0.3336, -0.2853,  0.2218,  0.2969, -0.0191,  0.1007]],
       requires_grad=True)
Parameter containing:
tensor([0.3542], requires_grad=True)


In [6]:
# construct a new model, then load the saved parameters into it
loaded_model = Model(n_input_features = 6)
loaded_model.load_state_dict(torch.load(FILE))
loaded_model.eval()  # set in evaluation mode

Model(
  (linear): Linear(in_features=6, out_features=1, bias=True)
)

In [7]:
for param in loaded_model.parameters():
    print(param)

Parameter containing:
tensor([[-0.3336, -0.2853,  0.2218,  0.2969, -0.0191,  0.1007]],
       requires_grad=True)
Parameter containing:
tensor([0.3542], requires_grad=True)


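### We see above that the parameters are the same.

Note that when we create a model, it is initialized with random parameters, so the match comes from load_state_dict and not from chance.


Next, let us save a checkpoint during training, which stores the optimizer state as well.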

In [8]:
model = Model(n_input_features = 6)
# train the model ..
learning_rate = 0.01
optimizer     = torch.optim.SGD(model.parameters(), lr = learning_rate)
# Sometimes during training - we may want to save a checkpoint 
print("optimizer.state_dict : {}".format(optimizer.state_dict()))

checkpoint = {
                "epoch": 90,
                "model_state" : model.state_dict(),
                "optim_state" : optimizer.state_dict()
             }

torch.save(checkpoint, "checkpoint.pth")

optimizer.state_dict : {'state': {}, 'param_groups': [{'lr': 0.01, 'momentum': 0, 'dampening': 0, 'weight_decay': 0, 'nesterov': False, 'params': [0, 1]}]}


In [9]:
loaded_checkpoint = torch.load("checkpoint.pth")

In [10]:
# Now we have to set up a new model and optimizer again
epoch = loaded_checkpoint["epoch"]

model = Model(n_input_features=6)
optimizer = torch.optim.SGD(model.parameters(), lr = 0)  # placeholder lr; the saved value is restored below

# This will load all parameters into the model
model.load_state_dict(loaded_checkpoint["model_state"])

# This will load the optimizer state, including the learning rate
optimizer.load_state_dict(loaded_checkpoint["optim_state"])

print("optimizer.state_dict : {}".format(optimizer.state_dict()))

optimizer.state_dict : {'state': {}, 'param_groups': [{'lr': 0.01, 'momentum': 0, 'dampening': 0, 'weight_decay': 0, 'nesterov': False, 'params': [0, 1]}]}


We can see that we have the **same learning rate** (= 0.01) as in the earlier optimizer, even though the new optimizer was created with lr = 0.
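The optimizer above has an empty `state` because no training step was taken. The sketch below is a hedged variation that shows why the optimizer state is worth saving: it assumes SGD with momentum, a single dummy training step on random data, a temporary file path, and a `start_epoch` convention of resuming at the epoch after the saved one. None of these come from the original notes.

```python
import os
import tempfile

import torch
import torch.nn as nn

model = nn.Linear(6, 1)  # stand-in for Model(n_input_features=6)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

# One dummy training step so the optimizer creates its momentum buffers.
x, y = torch.randn(4, 6), torch.randn(4, 1)
loss = nn.functional.mse_loss(model(x), y)
optimizer.zero_grad()
loss.backward()
optimizer.step()

# Save a checkpoint to a temporary location (hypothetical path).
path = os.path.join(tempfile.mkdtemp(), "checkpoint.pth")
torch.save(
    {
        "epoch": 90,
        "model_state": model.state_dict(),
        "optim_state": optimizer.state_dict(),
    },
    path,
)

# Rebuild model and optimizer from scratch, then restore both.
resumed_model = nn.Linear(6, 1)
resumed_optimizer = torch.optim.SGD(resumed_model.parameters(), lr=0)
loaded = torch.load(path)
resumed_model.load_state_dict(loaded["model_state"])
resumed_optimizer.load_state_dict(loaded["optim_state"])

start_epoch = loaded["epoch"] + 1  # assumed convention: continue with the next epoch
resumed_lr = resumed_optimizer.param_groups[0]["lr"]
has_momentum_buffers = len(resumed_optimizer.state_dict()["state"]) > 0

print(start_epoch, resumed_lr, has_momentum_buffers)
```

Restoring the optimizer brings back both the hyperparameters and the per-parameter buffers, so training can continue from where it stopped.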

If you save your model on a GPU and want to load it on a CPU, the tensors have to be remapped to the target device at load time. The last section covers this.

Remember that you must call model.eval() to set dropout and batch normalization layers to evaluation mode before running inference. Failing to do this will yield inconsistent inference results. If you wish to resume training, call model.train() to ensure these layers are in training mode.
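The effect of the two modes can be checked with a small sketch. The toy network below, with a `Dropout` layer added, is an assumption for illustration; the `Model` class in these notes has no dropout, so it would behave the same in both modes.

```python
import torch
import torch.nn as nn

# Hypothetical toy network; the Dropout layer is here only for illustration.
net = nn.Sequential(nn.Linear(6, 6), nn.Dropout(p=0.5))
x = torch.ones(1, 6)

net.eval()  # dropout becomes a no-op, so repeated calls agree
with torch.no_grad():
    eval_outputs_match = torch.equal(net(x), net(x))
eval_flag = net.training

net.train()  # dropout randomly zeroes activations again
train_flag = net.training

print(eval_outputs_match, eval_flag, train_flag)
```

In evaluation mode the two forward passes give identical outputs, while in training mode each pass would draw a fresh dropout mask.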


### SAVING ON GPU/CPU

```python
# 1) Save on GPU, Load on CPU
device = torch.device("cuda")
model.to(device)
torch.save(model.state_dict(), PATH)
device = torch.device('cpu')
model = Model(*args, **kwargs)
model.load_state_dict(torch.load(PATH, map_location=device))


# 2) Save on GPU, Load on GPU
device = torch.device("cuda")
model.to(device)
torch.save(model.state_dict(), PATH)
model = Model(*args, **kwargs)
model.load_state_dict(torch.load(PATH))
model.to(device)


# Note: for the GPU cases, be sure to call .to(torch.device('cuda'))
# on all model inputs, too!


# 3) Save on CPU, Load on GPU
torch.save(model.state_dict(), PATH)
device = torch.device("cuda")
model = Model(*args, **kwargs)
# map_location loads the saved tensors onto the given GPU; choose whatever GPU device number you want
model.load_state_dict(torch.load(PATH, map_location="cuda:0"))
# Next, be sure to call model.to(torch.device('cuda')) to convert the model's parameter tensors to CUDA tensors
model.to(device)
```
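The three cases above can often be folded into one device-agnostic pattern. This is a sketch under assumptions: it picks the device with `torch.cuda.is_available()`, uses a bare `nn.Linear` as a stand-in for `Model`, and writes to a temporary path, none of which appear in the original notes.

```python
import os
import tempfile

import torch
import torch.nn as nn

# Use the GPU when one is present, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = nn.Linear(6, 1).to(device)  # stand-in for Model(*args, **kwargs)
path = os.path.join(tempfile.mkdtemp(), "model.pth")  # hypothetical path
torch.save(model.state_dict(), path)

# map_location remaps the saved tensors to whichever device is in use now.
loaded = nn.Linear(6, 1)
loaded.load_state_dict(torch.load(path, map_location=device))
loaded.to(device)
loaded.eval()

# Inputs must live on the same device as the model.
x = torch.randn(2, 6).to(device)
with torch.no_grad():
    out = loaded(x)

print(out.shape, out.device.type)
```

With `map_location=device`, the same loading code runs unchanged on machines with or without a GPU.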