# Transfer Learning + Saving and Loading Models

## Saving and Loading Models

#### (0) Background
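* model.state_dict(): a Python dict, keyed by name, that holds the learnable parameters (ex. weight, bias) of a model inheriting from torch.nn.Module, plus persistent buffers such as BatchNorm running statistics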

    ex)
    ```
    print("Model's state_dict:")
    for param_tensor in model.state_dict():
        print(param_tensor, "\t", model.state_dict()[param_tensor].size())
    >>>
    Model's state_dict:
    conv1.weight     torch.Size([6, 3, 5, 5])
    conv1.bias   torch.Size([6])
    ```

#### (1) save / load 기본구조
Saving: the Python object you want to keep is written to a binary file using pickle.

* pickle: serializes a Python object into binary form

ex)
```
torch.save(model.state_dict(), PATH)
```

Loading: the binary Python object is deserialized with pickle **and then** loaded into the model.

ex)
```
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Option 1: load the checkpoint first, then copy it into the model
ckpt = torch.load('results/pths/NAME.pth', map_location=device)
model.load_state_dict(ckpt)
# Option 2: the same thing in a single line
model.load_state_dict(torch.load('results/pths/NAME.pth', map_location=device))
```
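
The serialize / deserialize round trip described above can be sketched with the standard library alone. This is only an illustration: `fake_state` is a made-up stand-in for a state_dict built from plain lists, and the real `torch.save` / `torch.load` handle tensor storage in their own way on top of pickle.

```python
import io
import pickle

# Hypothetical stand-in for a state_dict: plain Python containers, no tensors.
fake_state = {"fc.weight": [[0.1, 0.2], [0.3, 0.4]], "fc.bias": [0.0, 0.0]}

# Serialize: Python object -> bytes (the "save" direction).
buffer = io.BytesIO()
pickle.dump(fake_state, buffer)
raw_bytes = buffer.getvalue()

# Deserialize: bytes -> a new Python object with equal contents (the "load" direction).
restored = pickle.loads(raw_bytes)

print(type(raw_bytes).__name__)   # bytes
print(restored == fake_state)     # True: same contents
print(restored is fake_state)     # False: a separate object was rebuilt
```

Because unpickling rebuilds objects from the byte stream, a checkpoint should only be loaded from a source you trust.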

#### (2) Save / load methods by use case
1. Save only the state_dict (recommended) - **the PATH extension is usually .pt / .pth**

    ex saving)
    ```
    torch.save(model.state_dict(), PATH)
    ```

    ex loading) **the model must be instantiated first**
    ```
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

    model = TheModelClass(*args, **kwargs)
    model = model.to(device)
    model.load_state_dict(torch.load(PATH, map_location=device))
    ```
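
As a rough end-to-end check of the state_dict workflow, the sketch below saves a tiny model to a temporary file and loads it into a freshly built instance. `make_model` and the file name `demo.pth` are hypothetical stand-ins for `TheModelClass` and `PATH`.

```python
import os
import tempfile

import torch
import torch.nn as nn

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')


def make_model():
    # Hypothetical stand-in for TheModelClass(*args, **kwargs).
    return nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))


trained = make_model().to(device)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'demo.pth')
    torch.save(trained.state_dict(), path)

    # A fresh instance has its own random weights until the saved ones are copied in.
    restored = make_model().to(device)
    result = restored.load_state_dict(torch.load(path, map_location=device))

# Compare every saved tensor with its counterpart in the restored model.
same = all(
    torch.equal(a, b)
    for a, b in zip(trained.state_dict().values(), restored.state_dict().values())
)
print(result)
print(same)
```

`load_state_dict` returns an object listing any missing or unexpected keys, so printing `result` is a quick way to confirm that the architecture and the checkpoint agree.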
<hr>
1. Saving a Model to resume training (**the PATH extension is usually .tar**) - when saving a ckpt to resume training, save not only the model's state_dict but also the other information training needs (ex. optimizer state_dict, epoch, loss)

    ex saving)
    ```
    torch.save({
                'epoch': epoch,
                'model_state_dict': model.state_dict(),
                'optimizer_state_dict': optimizer.state_dict(),
                'loss': loss,
                ...
                }, PATH)
    ```

    ex loading) **the model and optimizer must be instantiated first**
    ```
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    
    model = TheModelClass(*args, **kwargs)
    model.to(device)
    optimizer = TheOptimizerClass(*args, **kwargs)

    checkpoint = torch.load(PATH, map_location=device)
    model.load_state_dict(checkpoint['model_state_dict'])
    optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
    epoch = checkpoint['epoch']
    loss = checkpoint['loss']

    # then take either action
    model.eval()
    # or
    model.train()
    ```
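
To see why the optimizer state belongs in the checkpoint, here is a minimal sketch that trains a toy model for two epochs, saves a checkpoint, and resumes from it. The model, the data, `train_one_epoch`, and the file name are all made up for illustration.

```python
import os
import tempfile

import torch
import torch.nn as nn

torch.manual_seed(0)
x, y = torch.randn(8, 3), torch.randn(8, 1)   # toy data
criterion = nn.MSELoss()


def train_one_epoch(model, optimizer):
    # Hypothetical helper: one full-batch update, returns the loss value.
    optimizer.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()
    optimizer.step()
    return loss.item()


model = nn.Linear(3, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'demo_ckpt.tar')
    for epoch in range(2):
        loss = train_one_epoch(model, optimizer)
    torch.save({
        'epoch': epoch,
        'model_state_dict': model.state_dict(),
        'optimizer_state_dict': optimizer.state_dict(),
        'loss': loss,
    }, path)

    # Resume: rebuild both objects, then restore their states.
    resumed_model = nn.Linear(3, 1)
    resumed_optimizer = torch.optim.SGD(resumed_model.parameters(), lr=0.1, momentum=0.9)
    checkpoint = torch.load(path, map_location='cpu')
    resumed_model.load_state_dict(checkpoint['model_state_dict'])
    resumed_optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
    start_epoch = checkpoint['epoch'] + 1

print(start_epoch)  # 2
```

With momentum, the optimizer keeps a per-parameter buffer. Restoring it means the next update of the resumed model should match what the original run would have done, which would not hold if only the weights were saved.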
<hr>
1. Saving several Models at once (**the PATH extension is usually .tar**) - for saving multiple models that each inherit from torch.nn.Module

    ex saving)
    ```
    torch.save({
                'modelA_state_dict': modelA.state_dict(),
                'modelB_state_dict': modelB.state_dict(),
                'optimizerA_state_dict': optimizerA.state_dict(),
                'optimizerB_state_dict': optimizerB.state_dict(),
                ...
                }, PATH)
    ```

    ex loading) **the models and optimizers must be instantiated first**
    ```
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    
    modelA = TheModelAClass(*args, **kwargs)
    modelA.to(device)
    modelB = TheModelBClass(*args, **kwargs)
    modelB.to(device)
    optimizerA = TheOptimizerAClass(*args, **kwargs)
    optimizerB = TheOptimizerBClass(*args, **kwargs)

    checkpoint = torch.load(PATH, map_location=device)
    modelA.load_state_dict(checkpoint['modelA_state_dict'])
    modelB.load_state_dict(checkpoint['modelB_state_dict'])
    optimizerA.load_state_dict(checkpoint['optimizerA_state_dict'])
    optimizerB.load_state_dict(checkpoint['optimizerB_state_dict'])

    # then take either action
    modelA.eval()
    modelB.eval()
    # - or -
    modelA.train()
    modelB.train()
    ```
<hr>
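1. Save the entire Model (not recommended) - the pickled file is tied to the exact class definition and directory layout used when saving, so it can break once the code is refactored

    ex saving)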
    ```
    torch.save(model, PATH)
    ```
    
    ex loading)
    ```
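    # On PyTorch 2.6+ torch.load defaults to weights_only=True; unpickling a whole
    # model there needs weights_only=False (only for files you trust).
    model = torch.load(PATH)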
    ```
    

## Transfer Learning - ex) CIFAR10 pre-trained model to FONT-50 (final project dataset)

#### (0) Find a pre-trained model suitable for our custom dataset

<img src="../../shared/TL_final.png" alt="Drawing" style="width: 1000px;" align="left"/>

#### (1) Load Pre-trained Model

```
from cnn import ConvNet
import torch
import torch.nn as nn

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
```

```
model = ConvNet().to(device)
model.load_state_dict(torch.load('./pths/cifar10_pre_model.pth', map_location=device))
>>>
<All keys matched successfully>
```

```
print("===== Loaded Pre-trained Model =====", "\n", model)
>>>
===== Loaded Pre-trained Model =====
 ConvNet(
  (layer1): Sequential(
    (0): Conv2d(3, 16, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
    (1): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU()
    (3): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (layer2): Sequential(
    (0): Conv2d(16, 32, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
    (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU()
    (3): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (fc1): Linear(in_features=2048, out_features=120, bias=True)
  (fc2): Linear(in_features=120, out_features=10, bias=True)
)
```


#### (2) Edit Model (Freeze + Edit)

* Freeze Loaded Model's Parameters

```
for params in model.parameters():
    params.requires_grad = False
```

* Edit Loaded Model

```
# A newly created layer has requires_grad=True, so fc2 is the only trainable part
model.fc2 = nn.Linear(120, 50)

model = model.to(device) # optional for running on CPU or CUDA
```
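
The freeze-then-replace step can be checked by counting which parameters still require gradients. The sketch below uses `DemoConvNet`, a hypothetical stand-in whose layers mirror the printed `ConvNet`; its `forward` is an assumption, since the real `cnn.ConvNet` source is not shown here.

```python
import torch
import torch.nn as nn


class DemoConvNet(nn.Module):
    """Hypothetical stand-in that mirrors the layers printed for ConvNet."""

    def __init__(self, num_classes=10):
        super().__init__()
        self.layer1 = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=5, stride=1, padding=2),
            nn.BatchNorm2d(16), nn.ReLU(), nn.MaxPool2d(kernel_size=2, stride=2))
        self.layer2 = nn.Sequential(
            nn.Conv2d(16, 32, kernel_size=5, stride=1, padding=2),
            nn.BatchNorm2d(32), nn.ReLU(), nn.MaxPool2d(kernel_size=2, stride=2))
        self.fc1 = nn.Linear(2048, 120)
        self.fc2 = nn.Linear(120, num_classes)

    def forward(self, x):
        # Assumed forward pass; the real cnn.ConvNet may differ.
        x = self.layer2(self.layer1(x))
        x = torch.relu(self.fc1(x.flatten(1)))
        return self.fc2(x)


model = DemoConvNet()

# 1) Freeze everything that was loaded.
for param in model.parameters():
    param.requires_grad = False

# 2) Replace the head; the new layer is trainable by default.
model.fc2 = nn.Linear(120, 50)

trainable = [name for name, p in model.named_parameters() if p.requires_grad]
n_trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)

print(trainable)      # ['fc2.weight', 'fc2.bias']
print(n_trainable)    # 6050 = 120 * 50 + 50

out = model(torch.randn(2, 3, 32, 32))   # two fake 32x32 RGB images
print(out.shape)      # torch.Size([2, 50])
```

The order matters: replacing `fc2` before the freezing loop would freeze the new head as well, leaving nothing to train.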

```
print("===== Loaded Pre-trained Model =====", "\n", model)
>>>
===== Loaded Pre-trained Model =====
 ConvNet(
  (layer1): Sequential(
    (0): Conv2d(3, 16, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
    (1): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU()
    (3): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (layer2): Sequential(
    (0): Conv2d(16, 32, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
    (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU()
    (3): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (fc1): Linear(in_features=2048, out_features=120, bias=True)
  (fc2): Linear(in_features=120, out_features=50, bias=True)
)
```


```
# cf) See what model.parameters are
for params in model.parameters():
    print(params.size())
>>>
torch.Size([16, 3, 5, 5])
torch.Size([16])
torch.Size([16])
torch.Size([16])
torch.Size([32, 16, 5, 5])
torch.Size([32])
torch.Size([32])
torch.Size([32])
torch.Size([120, 2048])
torch.Size([120])
torch.Size([50, 120])
torch.Size([50])
```
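
The three consecutive `torch.Size([16])` entries are the conv bias followed by the BatchNorm weight and bias. A small sketch with `named_parameters()` makes this visible and also shows what `parameters()` leaves out; `block` is a made-up two-layer module, not the actual ConvNet.

```python
import torch.nn as nn

# Hypothetical mini-block: one conv layer followed by BatchNorm.
block = nn.Sequential(nn.Conv2d(3, 16, kernel_size=5, padding=2), nn.BatchNorm2d(16))

# Learnable tensors: these are what requires_grad=False freezes.
param_names = [name for name, _ in block.named_parameters()]
# Non-learnable state tracked by BatchNorm.
buffer_names = [name for name, _ in block.named_buffers()]
# state_dict() contains both groups.
state_keys = list(block.state_dict().keys())

print(param_names)      # ['0.weight', '0.bias', '1.weight', '1.bias']
print(buffer_names)     # ['1.running_mean', '1.running_var', '1.num_batches_tracked']
print(len(state_keys))  # 7
```

Buffers are not affected by `requires_grad`. While the model is in `train()` mode, BatchNorm keeps updating its running statistics even when its weight and bias are frozen.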


#### (3) Train

```
from cnn import ConvNet
from font_dataset import FontDataset
import torch
import torchvision
import torch.nn as nn
import torchvision.transforms as transforms
import os


lr = 0.001
num_epochs = 1
batch_size = 100

### Config
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

### Load Data
train_dir = os.path.expanduser('~/datasets/font/npy_train')
train_data = FontDataset(train_dir)

test_dir = os.path.expanduser('~/datasets/font/npy_test')
test_data = FontDataset(test_dir)

### Define Dataloader
train_loader = torch.utils.data.DataLoader(dataset=train_data,
                                           batch_size=batch_size)

test_loader = torch.utils.data.DataLoader(dataset=test_data,
                                           batch_size=batch_size)

### Define Model and Load Params
model = ConvNet().to(device)
print("========================== Original Model =============================", "\n", model)
model.load_state_dict(torch.load('./pths/cifar10_pre_model.pth', map_location=device))

### Use pre-trained model and Only change last layer
for param in model.parameters():
    param.requires_grad = False

model.fc2 = nn.Linear(120, 50)
model = model.to(device)

print("========================== Modified Model =============================", "\n", model)

### Define Loss and Optim
criterion = nn.CrossEntropyLoss()
# Only fc2 is trainable, so only its parameters are handed to the optimizer
optimizer = torch.optim.Adam(model.fc2.parameters(), lr=lr)

### Train
if __name__ == '__main__':
    total_step = len(train_loader)
    for epoch in range(num_epochs):
        for i, (images, labels) in enumerate(train_loader):
            images = images.to(device)
            labels = labels.to(device)

            # The model already lives on `device`, so its outputs do too
            outputs = model(images)

            loss = criterion(outputs, labels)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

            # Print Loss for Tracking Training
            if (i+1) % 100 == 0:
                print('Epoch [{}/{}], Step [{}/{}], Loss: {:.4f}'.format(epoch+1, num_epochs, i+1, total_step, loss.item()))
                # Quick prediction on one test batch (the result is not used further)
                test_image, test_label = next(iter(test_loader))
                _, test_predicted = torch.max(model(test_image.to(device)).data, 1)

    # Test after Training is done
    model.eval() # Set model to Evaluation Mode (Batchnorm uses moving mean/var instead of mini-batch mean/var)
    with torch.no_grad():
        correct = 0
        total = 0
        for images, labels in test_loader:
            images = images.to(device)
            labels = labels.to(device)
            outputs = model(images)
            _, predicted = torch.max(outputs.data, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()

        # `total` counts the images actually evaluated, even if the last batch is smaller
        print('Accuracy of the network on the {} test images: {} %'.format(total, 100 * correct / total))
```

```
 ConvNet(
  (layer1): Sequential(
    (0): Conv2d(3, 16, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
    (1): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU()
    (3): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (layer2): Sequential(
    (0): Conv2d(16, 32, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
    (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU()
    (3): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (fc1): Linear(in_features=2048, out_features=120, bias=True)
  (fc2): Linear(in_features=120, out_features=10, bias=True)
)
 ConvNet(
  (layer1): Sequential(
    (0): Conv2d(3, 16, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
    (1): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU()
    (3): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, c

  x = F.softmax(self.fc2(x))


Epoch [1/1], Step [100/200], Loss: 3.8136
Epoch [1/1], Step [200/200], Loss: 3.6390
Accuracy of the network on the 10000 test images: 25.15 %
```


<hr>

## ref

* [pytorch.org - transfer learning for computer vision tutorial](https://pytorch.org/tutorials/beginner/transfer_learning_tutorial.html)