## Example 3-3) Applying Dropout, ReLU, and Batch Normalization to an MLP on MNIST, a handwritten digit dataset

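Batch Normalization is a technique used to keep training from slowing down significantly as the distribution of each layer's input shifts.  
PyTorch provides a different class for each input dimensionality (1D, 2D, ...), so take care to pick the one that matches the data.  
In an MLP, each layer operates on 1-dimensional vectors, so we use **nn.BatchNorm1d()**.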
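
As a minimal sketch of what `nn.BatchNorm1d` does in training mode, the snippet below feeds it a deliberately shifted and scaled batch and checks the per-feature statistics of the output. The batch shape `(32, 512)` and the shift and scale values are illustrative assumptions, not values taken from the notebook.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# A batch of 32 samples with 512 features each, deliberately shifted and scaled
x = torch.randn(32, 512) * 3.0 + 5.0

bn = nn.BatchNorm1d(512)  # tracks one mean and one variance per feature
bn.train()                # training mode: normalize with the current batch statistics

y = bn(x)

# Statistics of the output, computed per feature over the batch dimension
mean_abs_max = y.mean(dim=0).abs().max().item()
var_mean = y.var(dim=0, unbiased=False).mean().item()
print(y.shape, mean_abs_max, var_mean)
```

With the default initialization (scale 1, shift 0), each feature of the output should have a mean close to 0 and a variance close to 1 across the batch.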

In [1]:
import numpy as np
import matplotlib.pyplot as plt
import torch
import torch.nn as nn                           
import torch.nn.functional as F
from torchvision import transforms, datasets 

In [2]:
if torch.cuda.is_available() :
    DEVICE = torch.device('cuda')
else :
    DEVICE = torch.device('cpu')
    
print('Using PyTorch version : ', torch.__version__, ' Device : ', DEVICE)

Using PyTorch version :  1.9.0  Device :  cpu



In [3]:
BATCH_SIZE = 32
EPOCHS = 10

In [4]:
train_dataset = datasets.MNIST(root = "../data/MNIST",
                              train = True,
                              download = True,
                              transform = transforms.ToTensor())
test_dataset = datasets.MNIST(root = "../data/MNIST",
                              train = False,
                              transform = transforms.ToTensor())
train_loader = torch.utils.data.DataLoader(dataset = train_dataset,
                                          batch_size = BATCH_SIZE,
                                          shuffle = True)
test_loader = torch.utils.data.DataLoader(dataset = test_dataset,
                                          batch_size = BATCH_SIZE,
                                          shuffle = False)


Whether nn.BatchNorm1d() is applied before or after the activation function varies by paper and by codebase.  
In this example, we apply it before the activation function.
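
The two orderings can be sketched side by side as follows. The names `bn_before_act` and `bn_after_act` are hypothetical and used only for illustration; the layer sizes mirror the first layer of this example.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Ordering used in this example: Linear -> BatchNorm -> ReLU
bn_before_act = nn.Sequential(nn.Linear(784, 512), nn.BatchNorm1d(512), nn.ReLU())

# Alternative ordering: Linear -> ReLU -> BatchNorm
bn_after_act = nn.Sequential(nn.Linear(784, 512), nn.ReLU(), nn.BatchNorm1d(512))

x = torch.randn(32, 784)  # random stand-in for a batch of flattened images
out_before = bn_before_act(x)
out_after = bn_after_act(x)

# BatchNorm before ReLU: the block output is non-negative.
# BatchNorm after ReLU: the output is re-centred, so it can be negative.
print(out_before.min().item(), out_after.min().item())
```

Both orderings produce outputs of the same shape; they differ in the distribution that the next layer sees.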

In [5]:
class Net(nn.Module) : 
    def __init__(self) :  
        super(Net, self).__init__()
        self.fc1 = nn.Linear(28 * 28, 512)
        self.fc2 = nn.Linear(512, 256)
        self.fc3 = nn.Linear(256, 10)
        self.dropout_prob = 0.5
        self.batch_norm1 = nn.BatchNorm1d(512)                                     # (1)
        self.batch_norm2 = nn.BatchNorm1d(256)                                     # (2)
        
    def forward(self, x) :
        x = x.view(-1, 28 * 28)
        x = self.fc1(x)
        x = self.batch_norm1(x)                                                    # (3)
        x = F.relu(x)
        x = F.dropout(x, training = self.training, p = self.dropout_prob)
        x = self.fc2(x)
        x = self.batch_norm2(x)                                                    # (4)
        x = F.relu(x)
        x = F.dropout(x, training = self.training, p = self.dropout_prob)
        x = self.fc3(x)
        x = F.log_softmax(x, dim = 1)
        return x

(1) nn.BatchNorm1d() : Defined here for use inside the class. The output of the first Fully Connected Layer is a vector of size 512, so the layer is set to 512 features  
(2) nn.BatchNorm1d() : Defined here for use inside the class. The output of the second Fully Connected Layer is a vector of size 256, so the layer is set to 256 features  
(3) The output of the first Fully Connected Layer is used as the input to 'self.batch_norm1'  
(4) The output of the second Fully Connected Layer is used as the input to 'self.batch_norm2'  
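
To see what a layer such as `nn.BatchNorm1d(512)` actually holds, the short sketch below lists its learnable parameters and its tracked buffers. It inspects a standalone layer rather than the `Net` class above.

```python
import torch.nn as nn

bn = nn.BatchNorm1d(512)

# Learnable affine parameters (scale and shift), one value per feature
params = {name: tuple(p.shape) for name, p in bn.named_parameters()}

# Non-learnable buffers: running statistics updated during training
# and used for normalization in eval mode
buffers = {name: tuple(b.shape) for name, b in bn.named_buffers()}

print(params)
print(buffers)
```

The size passed to the constructor therefore has to match the number of features produced by the preceding `nn.Linear` layer.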

In [6]:
model = Net().to(DEVICE)                                                      # (1)
optimizer = torch.optim.SGD(model.parameters(), lr = 0.01, momentum = 0.5)    # (2)
criterion = nn.CrossEntropyLoss()                                             # (3)

print(model)

Net(
  (fc1): Linear(in_features=784, out_features=512, bias=True)
  (fc2): Linear(in_features=512, out_features=256, bias=True)
  (fc3): Linear(in_features=256, out_features=10, bias=True)
  (batch_norm1): BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (batch_norm2): BatchNorm1d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
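
One detail worth noting: `forward` already ends with `F.log_softmax`, and `nn.CrossEntropyLoss` applies log-softmax internally as well. Applying log-softmax twice gives the same result as applying it once, so the loss is unchanged and the final `log_softmax` is simply redundant. The sketch below checks this on random logits, which are an illustrative assumption.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 10)            # stand-in for raw outputs of the last layer
labels = torch.tensor([3, 1, 7, 0])

ce = nn.CrossEntropyLoss()
nll = nn.NLLLoss()

loss_raw = ce(logits, labels)                            # raw logits
loss_double = ce(F.log_softmax(logits, dim=1), labels)   # as in this notebook
loss_nll = nll(F.log_softmax(logits, dim=1), labels)     # log_softmax + NLLLoss
print(loss_raw.item(), loss_double.item(), loss_nll.item())
```

All three values agree, so either returning raw logits with `nn.CrossEntropyLoss` or keeping `log_softmax` and switching to `nn.NLLLoss` would be an equivalent, less redundant pairing.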


In [7]:
def train(model, train_loader, optimizer, log_interval) :
    model.train()                                                                              # (1)
    for batch_idx, (image, label) in enumerate(train_loader) :                                 # (2)
        image = image.to(DEVICE)                                                               # (3)
        label = label.to(DEVICE)                                                               # (4)
        optimizer.zero_grad()                                                                  # (5)
        output = model(image)                                                                  # (6)
        loss = criterion(output, label)                                                        # (7)
        loss.backward()                                                                        # (8)
        optimizer.step()                                                                       # (9)
        
        if batch_idx % log_interval == 0 :
            # Note: `Epoch` is the global loop variable defined in the training loop cell
            print("Train Epoch : {} [{}/{}({:.0f}%)]\tTrain Loss : {:.6f}".format(
                Epoch, batch_idx * len(image), len(train_loader.dataset),
                100. * batch_idx / len(train_loader), loss.item()))
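
The call to `model.train()` matters for Batch Normalization as well as for Dropout. As a small sketch on a standalone `nn.BatchNorm1d(4)` layer (the sizes and input values are illustrative assumptions), training mode normalizes with the current batch statistics and updates the running statistics, while eval mode uses the stored running statistics and leaves them untouched.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
bn = nn.BatchNorm1d(4)
x = torch.randn(8, 4) * 2.0 + 3.0

bn.train()
y_train = bn(x)                                   # normalized with batch statistics
running_mean_after_train = bn.running_mean.clone()

bn.eval()
y_eval = bn(x)                                    # normalized with running statistics
running_mean_after_eval = bn.running_mean.clone()

# With the default momentum of 0.1 and an initial running mean of 0,
# one training step moves the running mean to 0.1 * batch mean.
print(running_mean_after_train, 0.1 * x.mean(dim=0))
```

This is why `evaluate` switches the model to `model.eval()` before measuring test performance.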

In [8]:
def evaluate(model, test_loader) :
    model.eval()                                                                     # (1)
    test_loss = 0                                                                    # (2)
    correct = 0                                                                      # (3)
    
    with torch.no_grad() :                                                           # (4)
        for image, label in test_loader :                                            # (5)
            image = image.to(DEVICE)                                                 # (6)
            label = label.to(DEVICE)                                                 # (7)
            output = model(image)                                                    # (8)
            test_loss += criterion(output, label).item()                             # (9)
            prediction = output.max(1, keepdim = True)[1]                            # (10)
            correct += prediction.eq(label.view_as(prediction)).sum().item()         # (11)
            
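    # criterion returns the mean loss of each batch, so dividing the sum of batch means
    # by the number of samples reports roughly (mean per-sample loss / BATCH_SIZE)
    test_loss /= len(test_loader.dataset)                                            # (12)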
    test_accuracy = 100. * correct / len(test_loader.dataset)                        # (13)
    return test_loss, test_accuracy                                                  # (14)
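
A note on the scale of the reported test loss: `nn.CrossEntropyLoss` averages over each batch by default, so summing those batch means and dividing by the number of samples yields a value roughly `BATCH_SIZE` times smaller than the mean per-sample loss. The sketch below illustrates this with random stand-in data (64 samples, batches of 32, both assumptions) and shows one possible alternative using `reduction='sum'`.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
logits = torch.randn(64, 10)             # stand-in for model outputs on 64 samples
labels = torch.randint(0, 10, (64,))
batch_size = 32

mean_criterion = nn.CrossEntropyLoss()                  # default reduction='mean'
sum_criterion = nn.CrossEntropyLoss(reduction='sum')    # sums per-sample losses

summed_batch_means = 0.0
summed_sample_losses = 0.0
for start in range(0, 64, batch_size):
    out = logits[start:start + batch_size]
    lab = labels[start:start + batch_size]
    summed_batch_means += mean_criterion(out, lab).item()
    summed_sample_losses += sum_criterion(out, lab).item()

notebook_style = summed_batch_means / 64      # same computation as in evaluate()
per_sample_mean = summed_sample_losses / 64   # mean loss per sample
print(notebook_style, per_sample_mean, per_sample_mean / notebook_style)
```

The accuracy computation is unaffected; only the scale of the printed loss differs between the two conventions.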

In [9]:
for Epoch in range(1, EPOCHS + 1) :
    train(model, train_loader, optimizer, log_interval = 200)    # (1)
    test_loss, test_accuracy = evaluate(model, test_loader)      # (2)
    print("\n[EPOCH : {}], \tTest Loss : {:.4f}, \tTest Accuracy : {:.2f} %\n".format(Epoch, test_loss, test_accuracy))


[EPOCH : 1], 	Test Loss : 0.0050, 	Test Accuracy : 94.95 %


[EPOCH : 2], 	Test Loss : 0.0036, 	Test Accuracy : 96.52 %


[EPOCH : 3], 	Test Loss : 0.0031, 	Test Accuracy : 96.88 %


[EPOCH : 4], 	Test Loss : 0.0028, 	Test Accuracy : 97.33 %


[EPOCH : 5], 	Test Loss : 0.0025, 	Test Accuracy : 97.41 %


[EPOCH : 6], 	Test Loss : 0.0023, 	Test Accuracy : 97.63 %


[EPOCH : 7], 	Test Loss : 0.0023, 	Test Accuracy : 97.72 %


[EPOCH : 8], 	Test Loss : 0.0021, 	Test Accuracy : 97.94 %


[EPOCH : 9], 	Test Loss : 0.0020, 	Test Accuracy : 97.97 %


[EPOCH : 10], 	Test Loss : 0.0020, 	Test Accuracy : 97.98 %



With Batch Normalization applied, we can confirm that performance improves: test accuracy reaches 97.98 % after 10 epochs.