## 최적화 함수
- 손실의 경사 값(편미분의 계산 결과)을 기반으로, 어떤 알고리즘을 통해 파라미터를 수정해 나갈 것인지에 관한 함수

### SGD(Stochastic Gradient Descent)
- 경사에 일정한 학습률을 곱해서 파라미터를 수정해 나가는 방식

### Momentum
- SGD는 가장 근접한 경사 값만을 파라미터 갱신에 사용하지 않는데 모멘텀은 과거에 계산했던 경사 값을 저장했다가 파라미터를 일정 비율 감소시킨다.

## Dropout

- 설정한 비율만큼 학습에 사용하지 않는다. -> 출력층에 반영이 되지 않는다.

ex> 0.3일 경우 30%의 확률로 이전 레이어의 출력값을 다음 레이어로 전달하지 않는다.

In [1]:
import torch
import torch.nn as nn

In [2]:
torch.manual_seed(42)

<torch._C.Generator at 0x7f63c867e910>

In [None]:
inputs = torch.randn(1, 10)
inputs

tensor([[-1.4285, -0.2810,  0.7489,  1.1164,  1.2931,  0.4137, -0.5710, -0.9749,
          0.1863,  1.6273]])

In [None]:
dropout = nn.Dropout(0.3)

In [None]:
dropout.train()

Dropout(p=0.3, inplace=False)

In [None]:
dropout(inputs)

tensor([[-0.0000, -0.4014,  1.0699,  0.0000,  0.0000,  0.5910, -0.8157, -0.0000,
          0.2662,  2.3247]])

드롭아웃 비율을 p로 설정하면 출력값은 1/(1-p)를 곱한 값이 반환된다.

In [None]:
0.7489 * (1 / (1 - 0.3)) # 3번째 요소

1.0698571428571428

## Batch Normalization
- 미니 배치 학습에서 이전 레이어의 출력 값을 미니 배치 단위로 정규화 처리를 한 후 다음 레이어의 입력값으로 사용하면 학습 효율 향상 및 과적합 예방 가능
- 합성곱 연산에서는 nn.BatchNorm2d, 선형 함수 연산에서는 nn.BatchNorm1d를 사용

## Data Augmentation
- 데이터 증강은 학습 전 입력 데이터를 가공하여 학습 데이터의 다양성을 증가시키는 방법이다.

In [3]:
from torchvision.transforms import *
import torchvision.datasets as datasets

import numpy as np

from tqdm import tqdm_notebook

In [4]:
device = 'cuda:0' if torch.cuda.is_available() else 'cpu'

In [5]:
device

'cuda:0'

In [6]:
transform = Compose([
    RandomHorizontalFlip(p = 0.5),
    ToTensor(),
    Normalize(0.5, 0.5),
    RandomErasing(p = 0.5, scale = (0.02, 0.33), ratio = (0.3, 3.3), value = 0, inplace = False)
])

**RandomHorizontalFlip**은 **무작위로 좌우 반전을 수행**하고 **RandomErasing**은 **무작위로 일부 영역을 삭제**한다.

In [7]:
data_root = './data'

In [8]:
train_set = datasets.CIFAR10(root = data_root, train = True, download = True, transform = transform)
test_set = datasets.CIFAR10(root = data_root, train = False, download = True, transform = transform)

Downloading https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz to ./data/cifar-10-python.tar.gz


  0%|          | 0/170498071 [00:00<?, ?it/s]

Extracting ./data/cifar-10-python.tar.gz to ./data
Files already downloaded and verified


In [9]:
from torch.utils.data import DataLoader
from torch.optim import Adam

In [10]:
bs = 1024

In [11]:
tr_loader = DataLoader(train_set, batch_size = bs, shuffle = True)
te_loader = DataLoader(test_set, batch_size = bs, shuffle = False)

In [13]:
class CNN(nn.Module) :

  def __init__(self) :

    super().__init__()

    self.c1 = nn.Conv2d(3, 32, 3, padding = (1, 1))
    self.c2 = nn.Conv2d(32, 32, 3, padding = (1, 1))
    self.c3 = nn.Conv2d(32, 64, 3, padding = (1, 1))
    self.c4 = nn.Conv2d(64, 64, 3, padding = (1, 1))
    self.c5 = nn.Conv2d(64, 128, 3, padding = (1, 1))
    self.c6 = nn.Conv2d(128, 128, 3, padding = (1, 1))
    self.relu = nn.ReLU(inplace = True)
    self.flatten = nn.Flatten()
    self.maxpool = nn.MaxPool2d((2, 2))
    self.l1 = nn.Linear(4 * 4 * 128, 128)
    self.l2 = nn.Linear(128, 10)
    self.d1 = nn.Dropout(0.2)
    self.d2 = nn.Dropout(0.3)
    self.d3 = nn.Dropout(0.4)
    self.bn1 = nn.BatchNorm2d(32)
    self.bn2 = nn.BatchNorm2d(32)
    self.bn3 = nn.BatchNorm2d(64)
    self.bn4 = nn.BatchNorm2d(64)
    self.bn5 = nn.BatchNorm2d(128)
    self.bn6 = nn.BatchNorm2d(128)

    self.features = nn.Sequential(
        self.c1,
        self.bn1,
        self.relu,
        self.c2,
        self.bn2,
        self.relu,
        self.maxpool,
        self.d1,
        self.c3,
        self.bn3,
        self.relu,
        self.c4,
        self.bn4,
        self.relu,
        self.maxpool,
        self.d2
        self.c5,
        self.bn5,
        self.relu,
        self.c6,
        self.bn6,
        self.relu,
        self.maxpool,
        self.d3
    )

    self.cls = nn.Sequential(
        self.l1,
        self.relu,
        self.d3,
        self.l2
    )

  def forward(self, x) :
    x = self.features(x)
    x = self.flatten(x)
    x = self.cls(x)

    return x

In [14]:
def fit(net, optimizer, criterion, num_epohcs, train_loader, test_loader, device) :

  history = np.zeros((0, 5))

  for epoch in tqdm_notebook(range(num_epochs)) :

    tr_acc, tr_loss, tr_cnt = 0.0, 0.0, 0.0
    te_acc, te_loss, te_cnt = 0.0, 0.0, 0.0

    net.train()

    for image, label in train_loader :
      
      tr_cnt += len(label)

      image = image.to(device)
      label = label.to(device)

      optimizer.zero_grad()

      pred = net(image)

      loss = criterion(pred, label)
      tr_loss += loss.item()
      loss.backward()

      optimizer.step()

      cls = torch.max(pred, 1)[1]
      tr_acc += (cls == label).sum().item()

    avg_tr_acc = tr_acc / tr_cnt
    avg_tr_loss = tr_loss / tr_cnt

    net.eval()
    with torch.no_grad() :
      for image, label in test_loader :

        te_cnt += len(label)

        image = image.to(device)
        label = label.to(device)

        pred = net(image)

        loss = criterion(pred, label)
        te_loss += loss.item()

        cls = torch.max(pred, 1)[1]

        te_acc += (cls == label).sum().item()

      avg_te_acc = te_acc / te_cnt
      avg_te_loss = te_loss / te_cnt

    print(f"Epoch {epoch + 1} Train Accuracy : {avg_tr_acc} / Test Accuracy : {avg_te_acc}")
    history = np.vstack((history, np.array([epoch + 1, avg_tr_acc, avg_tr_loss, avg_te_acc, avg_te_loss])))
  
  return history

In [20]:
net = CNN().to(device)
criterion = nn.CrossEntropyLoss()
lr = 0.003
optimizer = Adam(net.parameters(), lr = lr)
net   

CNN(
  (c1): Conv2d(3, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (c2): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (c3): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (c4): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (c5): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (c6): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (relu): ReLU(inplace=True)
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (maxpool): MaxPool2d(kernel_size=(2, 2), stride=(2, 2), padding=0, dilation=1, ceil_mode=False)
  (l1): Linear(in_features=2048, out_features=128, bias=True)
  (l2): Linear(in_features=128, out_features=10, bias=True)
  (features): Sequential(
    (0): Conv2d(3, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): ReLU(inplace=True)
    (2): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (3): ReLU(inplace=True)
    (4): MaxPool2d(kern

In [21]:
num_epochs = 20

In [24]:
history = fit(net, optimizer, criterion, num_epochs, tr_loader, te_loader, device)

Please use `tqdm.notebook.tqdm` instead of `tqdm.tqdm_notebook`
  for epoch in tqdm_notebook(range(num_epochs)) :


  0%|          | 0/20 [00:00<?, ?it/s]

Epoch 1 Train Accuracy : 0.78388 / Test Accuracy : 0.7517
Epoch 2 Train Accuracy : 0.79014 / Test Accuracy : 0.7609
Epoch 3 Train Accuracy : 0.79686 / Test Accuracy : 0.764
Epoch 4 Train Accuracy : 0.8088 / Test Accuracy : 0.7663
Epoch 5 Train Accuracy : 0.8136 / Test Accuracy : 0.76
Epoch 6 Train Accuracy : 0.8193 / Test Accuracy : 0.7635
Epoch 7 Train Accuracy : 0.82206 / Test Accuracy : 0.7731
Epoch 8 Train Accuracy : 0.8271 / Test Accuracy : 0.7709
Epoch 9 Train Accuracy : 0.83132 / Test Accuracy : 0.7749
Epoch 10 Train Accuracy : 0.84258 / Test Accuracy : 0.7774
Epoch 11 Train Accuracy : 0.84296 / Test Accuracy : 0.7717
Epoch 12 Train Accuracy : 0.84624 / Test Accuracy : 0.7731
Epoch 13 Train Accuracy : 0.85198 / Test Accuracy : 0.7635
Epoch 14 Train Accuracy : 0.85484 / Test Accuracy : 0.7746
Epoch 15 Train Accuracy : 0.8643 / Test Accuracy : 0.7756
Epoch 16 Train Accuracy : 0.86544 / Test Accuracy : 0.7763
Epoch 17 Train Accuracy : 0.86782 / Test Accuracy : 0.7872
Epoch 18 Train

## net.train() & net.eval()

- 학습시 파라미터 수정을 위한 **훈련 페이즈**와 수정한 파라미터 값으로 예측 값을 구하는 **예측 페이즈**를 교대로 반복한다.