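## Saving and Loading Models  
If you train in the cloud, you have probably had a session disconnect at least once.  
If the learned weights have not been saved when that happens, hours of training are lost.  
This section covers how to save a model during training and how to load a saved model before training starts.  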

## Quiz (Easy)  
0) Create a run_cnn2 file and refactor the existing code into it.  
1) Using the argparser covered earlier, add the arguments config_path, save_path, pre_trained, and model_name.  
2) Create a weights folder in the top-level folder.  
3) The default for save_path is './weights' and the default for config_path is './configs'.  
4) pre_trained has type bool and a default value of False.  
 

In [1]:
%%writefile arg_tutorial3.py

import yaml
import os
import argparse
    
parser = argparse.ArgumentParser(description='quiz')
parser.add_argument('--config_path', type=str, default='./configs/', help='config_path')
parser.add_argument('--save_path', type=str, default='./weights/', help='save_path')
parser.add_argument('--pre_trained', type=bool, default=False, help='pre_trained')
parser.add_argument('--model_name', type=str, default='cnn.pth', help='model_name')
args = parser.parse_args()

# 1) Print args.
print(args)
# 2) Open the yaml file at args.config_path in a with open block and assign the result to config.
#    Use yaml.load().
with open(args.config_path) as f:
    config = yaml.load(f, Loader=yaml.FullLoader)
# 3) Print config.
print(config)
# Finally, save the cell and run the file.



Writing arg_tutorial3.py


In [4]:
!python arg_tutorial3.py --help

usage: arg_tutorial3.py [-h] [--config_path CONFIG_PATH]
                        [--save_path SAVE_PATH] [--pre_trained PRE_TRAINED]
                        [--model_name MODEL_NAME]

quiz

optional arguments:
  -h, --help            show this help message and exit
  --config_path CONFIG_PATH
                        config_path
  --save_path SAVE_PATH
                        save_path
  --pre_trained PRE_TRAINED
                        pre_trained
  --model_name MODEL_NAME
                        model_name
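
One caveat about `--pre_trained PRE_TRAINED` in the help output above: with `type=bool`, argparse calls `bool()` on the raw string, and any non-empty string (including `"False"`) is truthy. The sketch below shows the effect and one possible workaround. The `str2bool` helper is a hypothetical name introduced here for illustration, not part of argparse or of this tutorial's code.

```python
import argparse


def str2bool(value):
    """Hypothetical helper: map common true/false spellings to a bool."""
    if isinstance(value, bool):
        return value
    lowered = value.lower()
    if lowered in ("true", "t", "yes", "y", "1"):
        return True
    if lowered in ("false", "f", "no", "n", "0"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


# Parser as written in the quiz: type=bool.
naive = argparse.ArgumentParser()
naive.add_argument("--pre_trained", type=bool, default=False)
naive_args = naive.parse_args(["--pre_trained", "False"])

# Same argument, parsed with the helper instead.
safe = argparse.ArgumentParser()
safe.add_argument("--pre_trained", type=str2bool, default=False)
safe_args = safe.parse_args(["--pre_trained", "False"])

print(naive_args.pre_trained)  # True, because bool("False") is True
print(safe_args.pre_trained)   # False
```

Leaving the flag off still gives `False` in both cases, so the quiz solution behaves as expected as long as `--pre_trained` is only passed when a saved model should be loaded.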


In [2]:
!python arg_tutorial3.py --config_path ./configs/cnn.yaml --model_name cnn.pth

Namespace(config_path='./configs/cnn.yaml', model_name='cnn.pth', pre_trained=False, save_path='./weights/')
{'learning_rate': 0.001, 'epochs': 10, 'batch_size': 32, 'kernel_size': 3, 'stride': 2}
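
The run above reads the hyperparameters from a YAML file and keeps the weights directory and file name as separate arguments. As a minimal sketch of those two steps, the example below parses YAML text with `yaml.safe_load` and joins the path pieces with `os.path.join`. The YAML text is an assumption reconstructed from the printed dictionary (the real `cnn.yaml` is not shown here), and `safe_load` is used in place of `yaml.load(..., Loader=yaml.FullLoader)` only because a plain key/value config does not need more.

```python
import os

import yaml

# Assumed file content, reconstructed from the dictionary printed above.
config_text = """
learning_rate: 0.001
epochs: 10
batch_size: 32
kernel_size: 3
stride: 2
"""

# safe_load only builds plain Python objects (dict, list, int, float, str, ...).
config = yaml.safe_load(config_text)

# os.path.join inserts the separator only when needed, so './weights' and
# './weights/' both give a valid file path.
weights_file = os.path.join("./weights/", "cnn.pth")

print(config["learning_rate"], config["epochs"])
print(weights_file)
```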


## Quiz (Easy)  
모델을 로드하고 저장하는 부분을 구현하기 위해 train, test 코드를 수정해야 합니다.  
아래에서 어떤 부분에 추가해야할까요??  

In [None]:
def train(epoch, model, loss_func, train_loader, optimizer):
    model.train()
    for batch_index, (x, y) in enumerate(train_loader):
        x, y = x.to(device), y.to(device)
        optimizer.zero_grad()
        y_pred = model(x)
        loss = loss_func(y_pred, y)
        loss.backward()
        optimizer.step()
        if batch_index % 100 == 0:
            print(f'Train Epoch: {epoch+1} | Batch Status: {batch_index*len(x)}/{len(train_loader.dataset)} \
            ({100. * batch_index * batch_size / len(train_loader.dataset):.0f}% | Loss: {loss.item():.6f}')
            

def test(model, loss_func, test_loader):
    model.eval()
    test_loss = 0
    correct_count = 0
    for x, y in test_loader:
        x, y = x.to(device), y.to(device)
        y_pred = model(x)
        test_loss += loss_func(y_pred, y).item()
        pred = y_pred.data.max(1, keepdim=True)[1]
        # Tensor.eq compares element-wise; .sum() then counts the correct predictions
        correct_count += pred.eq(y.data.view_as(pred)).cpu().sum()
    
    test_loss /= len(test_loader.dataset)
    print(f'=======================\n Test set: Average loss: {test_loss:.4f}, Accuracy: {correct_count/len(test_loader.dataset):.3}')

## Save, Load  
Saving and loading a model uses torch.save(), torch.load(), and the model's load_state_dict() method.  
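
Before applying this to the CNN, the round trip can be tried on a tiny model. The sketch below is only an illustration: `nn.Linear(4, 2)` stands in for the tutorial's CNN, and the file name `tiny.pth` and the temporary directory are assumptions made so the example leaves no files behind.

```python
import os
import tempfile

import torch
import torch.nn as nn

# A tiny stand-in model (not the CNN used in this tutorial).
model = nn.Linear(4, 2)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "tiny.pth")

    # Save only the parameters (the state dictionary), not the whole object.
    torch.save(model.state_dict(), path)

    # Load the dictionary back; map_location="cpu" also works on machines without a GPU.
    state_dict = torch.load(path, map_location="cpu")

# A fresh model with the same architecture receives the saved parameters.
restored = nn.Linear(4, 2)
result = restored.load_state_dict(state_dict)

same_weights = all(
    torch.equal(a, b)
    for a, b in zip(model.state_dict().values(), restored.state_dict().values())
)
print(result)
print(same_weights)
```

Note that `load_state_dict()` is called on the model instance, while `torch.save()` and `torch.load()` are module-level functions.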

In [8]:
from models.CNN import CNN
cnn = CNN(C=1, W=28, H=28, K=3, S=2) 

# state_dict() : returns the model's state dictionary (parameter names mapped to tensors).
print(cnn.state_dict())

13
6
2
OrderedDict([('conv1.weight', tensor([[[[ 9.5682e-02, -8.6843e-02, -2.8802e-01],
          [-1.8895e-01,  1.1838e-01, -2.0885e-02],
          [-3.0595e-01,  2.4018e-01, -2.7028e-01]]],


        [[[ 3.1410e-01,  7.4856e-02,  1.2415e-01],
          [ 2.7332e-01,  2.6599e-01,  9.8109e-02],
          [ 3.3008e-01,  1.1670e-01,  8.4203e-03]]],


        [[[ 2.3589e-01, -1.6729e-01,  2.5012e-01],
          [ 3.6465e-02, -2.1320e-01,  2.6948e-01],
          [-9.3775e-02,  2.9670e-01, -3.0916e-01]]],


        [[[ 2.0486e-02,  3.3180e-01, -1.3820e-01],
          [ 1.4076e-02, -7.1417e-02, -1.2187e-01],
          [ 1.5870e-01,  2.7203e-01, -1.0635e-01]]],


        [[[-2.2986e-01, -1.6460e-01, -1.2521e-01],
          [-1.0145e-01, -4.3318e-02,  3.2621e-01],
          [-1.2050e-01, -1.2446e-01, -1.3269e-01]]],


        [[[ 1.6695e-01, -2.6399e-01,  9.6524e-02],
          [ 5.7806e-02,  1.9964e-01, -3.4832e-02],
          [-1.2158e-01,  2.7812e-02, -4.5214e-02]]],


        [[[ 2.1840e-0

In [9]:
# if pre_trained:
#     model_dict = torch.load(save_path+model_name)
#     model.load_state_dict(model_dict)

def train(epoch, model, loss_func, train_loader, optimizer):
    model.train()
    for batch_index, (x, y) in enumerate(train_loader):
        x, y = x.to(device), y.to(device)
        optimizer.zero_grad()
        y_pred = model(x)
        loss = loss_func(y_pred, y)
        loss.backward()
        optimizer.step()
        if batch_index % 100 == 0:
            print(f'Train Epoch: {epoch+1} | Batch Status: {batch_index*len(x)}/{len(train_loader.dataset)} \
            ({100. * batch_index * batch_size / len(train_loader.dataset):.0f}% | Loss: {loss.item():.6f}')
            
            
    # torch.save(model parameters, path + file name); runs once at the end of every epoch
    torch.save(model.state_dict(), save_path + model_name)
            

def test(model, loss_func, test_loader):
    model.eval()
    test_loss = 0
    correct_count = 0
    for x, y in test_loader:
        x, y = x.to(device), y.to(device)
        y_pred = model(x)
        test_loss += loss_func(y_pred, y).item()
        pred = y_pred.data.max(1, keepdim=True)[1]
        # Tensor.eq compares element-wise; .sum() then counts the correct predictions
        correct_count += pred.eq(y.data.view_as(pred)).cpu().sum()
    
    test_loss /= len(test_loader.dataset)
    print(f'=======================\n Test set: Average loss: {test_loss:.4f}, Accuracy: {correct_count/len(test_loader.dataset):.3}')

In [24]:
!python run_cnn3.py --config_path ./configs/cnn.yaml

^C


In [12]:
for epoch in range(epochs):
    train(epoch, cnn, ce_loss, train_loader, valid_loader, optimizer)

NameError: name 'ce_loss' is not defined

## Loading the Model

In [30]:
# Load the model
# torch.load()
import torch
import os
save_path = os.curdir + '/weights/'
model_name = 'cnn.pth'
state_dict = torch.load(save_path+model_name)
#print(state_dict)



In [31]:
# Use the torch model's load_state_dict()
cnn.load_state_dict(state_dict)

# When is loading useful?
# 1) To evaluate a trained model
# 2) To resume training after it was interrupted
# 3) Transfer learning (load trained weights and train only specific layers)

<All keys matched successfully>
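
For case 2 (resuming interrupted training), the weights alone are usually not enough: the optimizer state and the epoch counter are also needed to continue where training stopped. The sketch below shows one common way to bundle them into a single checkpoint. The dictionary keys (`epoch`, `model_state`, `optimizer_state`), the file name, and the stand-in `nn.Linear` model are assumptions for illustration, not this tutorial's code.

```python
import os
import tempfile

import torch
import torch.nn as nn
import torch.optim as optim

# Stand-in model and optimizer (the tutorial uses a CNN with Adam).
model = nn.Linear(4, 2)
optimizer = optim.Adam(model.parameters(), lr=0.001)

# One training step so that the optimizer has internal state worth saving.
loss = model(torch.randn(8, 4)).sum()
loss.backward()
optimizer.step()

# Bundle everything needed to resume; the key names are illustrative.
checkpoint = {
    "epoch": 3,
    "model_state": model.state_dict(),
    "optimizer_state": optimizer.state_dict(),
}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "checkpoint.pth")
    torch.save(checkpoint, path)
    loaded = torch.load(path, map_location="cpu")

# Rebuild the objects, then restore their state from the checkpoint.
resumed_model = nn.Linear(4, 2)
resumed_optimizer = optim.Adam(resumed_model.parameters(), lr=0.001)
resumed_model.load_state_dict(loaded["model_state"])
resumed_optimizer.load_state_dict(loaded["optimizer_state"])

# Training would continue from the epoch after the saved one.
start_epoch = loaded["epoch"] + 1
print(start_epoch)
```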

## Tensorboard  
TensorBoard visualizes the loss and other metrics recorded during training, so you can check whether training is going well  
and how the model performs on the test set.   

In [32]:
!pip install tensorboard



In [33]:
from torch.utils.tensorboard import SummaryWriter

First, create a runs folder and a cnn folder inside it.  

In [None]:
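# Define the writer.
writer = SummaryWriter('runs/cnn/')

# writer.add_scalar records a scalar such as a loss value or an accuracy.
# Signature: writer.add_scalar("group/name", value, global_step)
# e.g. group = train or valid, name = loss or acc
writer.add_scalar("train/loss", train_loss, batch_iter)

# To log several values under one tag, pass a dictionary to add_scalars (note the plural):
# writer.add_scalars("group/name", {"series_a": value_a, "series_b": value_b}, global_step)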
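
The cell above refers to variables that only exist inside a training loop. The self-contained sketch below logs made-up loss values so that the difference between `add_scalar` (one value per tag) and `add_scalars` (a dictionary of values under one main tag) can be tried in isolation. The temporary log directory, the tag names, and the fake values are all assumptions for illustration.

```python
import os
import tempfile

from torch.utils.tensorboard import SummaryWriter

# Temporary log directory so the example does not touch runs/cnn/.
log_dir = tempfile.mkdtemp()
writer = SummaryWriter(log_dir)

for step in range(5):
    # Made-up values standing in for real training and validation losses.
    fake_train_loss = 1.0 / (step + 1)
    fake_valid_loss = 1.2 / (step + 1)

    # One scalar per call: tag, value, global step.
    writer.add_scalar("train/loss", fake_train_loss, step)

    # Several scalars drawn on the same chart: main tag, dict of values, global step.
    writer.add_scalars(
        "compare/loss",
        {"train": fake_train_loss, "valid": fake_valid_loss},
        step,
    )

writer.close()

# The writer stores its data in event files inside the log directory.
event_files = [
    name for name in os.listdir(log_dir) if name.startswith("events.out.tfevents")
]
print(len(event_files) >= 1)
```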


## Quiz (Normal)  
On which lines of the train and test functions should add_scalar be inserted?  

In [None]:
if pre_trained:
    model_dict = torch.load(save_path+model_name)
    model.load_state_dict(model_dict)

# Candidate 1)
writer = SummaryWriter('runs/cnn/')

def train(epoch, model, loss_func, train_loader, valid_loader, optimizer):
    model.train()
    for batch_index, (x, y) in enumerate(train_loader):
        x, y = x.to(device), y.to(device)
        optimizer.zero_grad()
        y_pred = model(x)
        # Candidate 2)
        train_loss = loss_func(y_pred, y)
        
        train_loss.backward()
        optimizer.step()
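        # Candidate 3)
        # wandb.log({"train_loss": train_loss})
        # The global step counts batches across epochs: epoch * (batches per epoch) + batch_index.
        writer.add_scalar("train/loss", train_loss.item(), epoch*len(train_loader) + batch_index)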
        if batch_index % 100 == 0:
            print(f'Train Epoch: {epoch+1} | Batch Status: {batch_index*len(x)}/{len(train_loader.dataset)} \
            ({100. * batch_index * batch_size / len(train_loader.dataset):.0f}% | Loss: {train_loss.item():.6f}')
    torch.save(model.state_dict(), save_path + model_name)

    for batch_index, (x, y) in enumerate(valid_loader):
        x, y = x.to(device), y.to(device)
        y_pred = model(x)
        val_loss = loss_func(y_pred, y)
        
def test(model, loss_func, test_loader):
    model.eval()
    test_loss = 0
    correct_count = 0
    for x, y in test_loader:
        x, y = x.to(device), y.to(device)
        y_pred = model(x)
        test_loss += loss_func(y_pred, y).item()
        pred = y_pred.data.max(1, keepdim=True)[1]
        # Tensor.eq compares element-wise; .sum() then counts the correct predictions
        correct_count += pred.eq(y.data.view_as(pred)).cpu().sum()
    
    test_loss /= len(test_loader.dataset)
    print(f'=======================\n Test set: Average loss: {test_loss:.4f}, Accuracy: {correct_count/len(test_loader.dataset):.3}')
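
The third argument of `add_scalar` is the x-axis position of the point, so it should be unique for every logged batch. A step of the form `epoch*batch_size + batch_index` repeats values once an epoch has more batches than `batch_size`, which makes points from different epochs overlap. The sketch below compares that form with `epoch * (batches per epoch) + batch_index`; the figure of 1875 batches assumes 60000 training samples and a batch size of 32, as in the log output of this notebook, and both function names are hypothetical.

```python
batch_size = 32
batches_per_epoch = 60000 // batch_size  # 1875 batches per epoch


def naive_step(epoch, batch_index):
    """Step index that mixes in the batch size; values repeat across epochs."""
    return epoch * batch_size + batch_index


def global_step(epoch, batch_index):
    """Step index counting every batch seen so far; values never repeat."""
    return epoch * batches_per_epoch + batch_index


# Epoch 1, batch 0 lands on the same x position as epoch 0, batch 32.
collision = naive_step(1, 0) == naive_step(0, 32)

# With the batch-count form, two full epochs give 3750 distinct steps.
steps = [global_step(e, b) for e in range(2) for b in range(batches_per_epoch)]
is_unique = len(set(steps)) == len(steps)

print(collision)
print(is_unique)
```

Inside the training function, `len(train_loader)` gives the number of batches per epoch, so no extra constant is needed.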

In [4]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import numpy as np

from torch.utils.data import DataLoader, Dataset 
from torchvision import datasets, transforms

# Hyperparameters
batch_size = 32
learning_rate = 0.001
epochs = 5
kernel_size = 3
stride = 2
pre_trained = False

In [8]:
config_path = './configs/'
save_path = './weights/'
model_name = 'cnn.pth'

In [9]:
device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
cnn = CNN(C=1, W=28, H=28, K=3, S=2) 
cnn = cnn.to(device)
ce_loss = nn.CrossEntropyLoss()
optimizer = optim.Adam(cnn.parameters(), lr=0.001)
writer = SummaryWriter('runs/cnn/')

13
6
2


In [10]:
for epoch in range(epochs):
    train(epoch, cnn, ce_loss, train_loader, valid_loader, optimizer)
test(cnn, ce_loss, test_loader)
writer.close()



Train Epoch: 1 | Batch Status: 0/60000             (0% | Loss: 2.293186
Train Epoch: 1 | Batch Status: 3200/60000             (5% | Loss: 1.113714
Train Epoch: 1 | Batch Status: 6400/60000             (11% | Loss: 1.064491
Train Epoch: 1 | Batch Status: 9600/60000             (16% | Loss: 0.686955
Train Epoch: 1 | Batch Status: 12800/60000             (21% | Loss: 0.965053
Train Epoch: 1 | Batch Status: 16000/60000             (27% | Loss: 0.459951
Train Epoch: 1 | Batch Status: 19200/60000             (32% | Loss: 0.819730
Train Epoch: 1 | Batch Status: 22400/60000             (37% | Loss: 0.690969
Train Epoch: 1 | Batch Status: 25600/60000             (43% | Loss: 0.686268
Train Epoch: 1 | Batch Status: 28800/60000             (48% | Loss: 0.903146
Train Epoch: 1 | Batch Status: 32000/60000             (53% | Loss: 0.559102
Train Epoch: 1 | Batch Status: 35200/60000             (59% | Loss: 0.894578
Train Epoch: 1 | Batch Status: 38400/60000             (64% | Loss: 0.765228
Train Ep

## Running TensorBoard

In [None]:
!tensorboard --logdir "path"