## Example 3-2) Applying Dropout & ReLU when designing an MLP on MNIST, a dataset of handwritten digits

Let's replace sigmoid() with relu().

In [2]:
import numpy as np
import matplotlib.pyplot as plt
import torch
import torch.nn as nn                           
import torch.nn.functional as F
from torchvision import transforms, datasets 

In [3]:
if torch.cuda.is_available() :
    DEVICE = torch.device('cuda')
else :
    DEVICE = torch.device('cpu')
    
print('Using PyTorch version : ', torch.__version__, ' Device : ', DEVICE)

Using PyTorch version :  1.9.0  Device :  cpu


In [4]:
BATCH_SIZE = 32
EPOCHS = 10

In [5]:
train_dataset = datasets.MNIST(root = "../data/MNIST",
                              train = True,
                              download = True,
                              transform = transforms.ToTensor())
test_dataset = datasets.MNIST(root = "../data/MNIST",
                              train = False,
                              transform = transforms.ToTensor())
train_loader = torch.utils.data.DataLoader(dataset = train_dataset,
                                          batch_size = BATCH_SIZE,
                                          shuffle = True)
test_loader = torch.utils.data.DataLoader(dataset = test_dataset,
                                          batch_size = BATCH_SIZE,
                                          shuffle = False)

In [6]:
class Net(nn.Module) : 
    def __init__(self) :  
        super(Net, self).__init__()
        self.fc1 = nn.Linear(28 * 28, 512)
        self.fc2 = nn.Linear(512, 256)
        self.fc3 = nn.Linear(256, 10)
        self.dropout_prob = 0.5
        
    def forward(self, x) :
        x = x.view(-1, 28 * 28)
        x = self.fc1(x)
        x = F.relu(x)                                                              # (1)
        x = F.dropout(x, training = self.training, p = self.dropout_prob)
        x = self.fc2(x)
        x = F.relu(x)                                                              # (2)
        x = F.dropout(x, training = self.training, p = self.dropout_prob)
        x = self.fc3(x)
        x = F.log_softmax(x, dim = 1)
        return x

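ReLU() is a nonlinear function that maps values below 0 to 0 and passes positive values through unchanged. Its gradient is cheap to compute and lets Back Propagation work effectively, so it is widely used when designing deep learning models.  
In contrast, the gradient of the sigmoid() nonlinearity approaches 0 as the input moves away from 0, which makes it hard for Back Propagation to work effectively.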
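The gradient behavior described above can be checked numerically. The following is a minimal sketch in plain Python (no PyTorch needed); the helper names `sigmoid_grad` and `relu_grad` are illustrative and do not come from the original notebook.

```python
import math

def sigmoid(z):
    # Logistic sigmoid: 1 / (1 + exp(-z))
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_grad(z):
    # Derivative of sigmoid: s * (1 - s); it peaks at 0.25 when z = 0
    s = sigmoid(z)
    return s * (1.0 - s)

def relu_grad(z):
    # Derivative of ReLU: 1 for positive inputs, 0 otherwise
    return 1.0 if z > 0 else 0.0

# Compare the two gradients as the input moves away from 0
for z in (0.5, 2.0, 5.0, 10.0):
    print("z = {:5.1f}  sigmoid' = {:.6f}  relu' = {:.1f}".format(
        z, sigmoid_grad(z), relu_grad(z)))
```

Because the sigmoid gradient never exceeds 0.25, the factors multiplied together during Back Propagation shrink with every sigmoid layer, whereas ReLU contributes a factor of exactly 1 for every active unit.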

In [7]:
model = Net().to(DEVICE)                                                      # (1)
optimizer = torch.optim.SGD(model.parameters(), lr = 0.01, momentum = 0.5)    # (2)
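# nn.CrossEntropyLoss applies log_softmax internally. Net.forward already returns
# log-probabilities, and log_softmax of log-probabilities leaves them unchanged,
# so the loss here equals nn.NLLLoss applied to the model output.
criterion = nn.CrossEntropyLoss()                                             # (3)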

print(model)

Net(
  (fc1): Linear(in_features=784, out_features=512, bias=True)
  (fc2): Linear(in_features=512, out_features=256, bias=True)
  (fc3): Linear(in_features=256, out_features=10, bias=True)
)
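
The printed model lists only the three `Linear` layers because dropout is applied through the functional `F.dropout` call inside `forward` rather than as a registered submodule. Its behavior is controlled by `self.training`, which `model.train()` and `model.eval()` toggle. The sketch below illustrates this with a hypothetical one-line module (`TinyDropout` is not part of the original notebook): in training mode about half of the activations are zeroed and the survivors are scaled by `1 / (1 - p)`, while in evaluation mode the input passes through unchanged.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyDropout(nn.Module):
    # Hypothetical module for illustration: applies only dropout with p = 0.5
    def forward(self, x):
        return F.dropout(x, training=self.training, p=0.5)

torch.manual_seed(0)          # fixed seed so repeated runs behave the same
layer = TinyDropout()
x = torch.ones(1, 1000)

layer.train()                 # sets layer.training = True, dropout is active
out_train = layer(x)

layer.eval()                  # sets layer.training = False, dropout is a no-op
out_eval = layer(x)

kept_fraction = (out_train != 0).float().mean().item()
print("values seen in train mode :", out_train.unique().tolist())
print("fraction kept in train mode :", kept_fraction)
print("eval output equals input :", torch.equal(out_eval, x))
```

This is why `train()` calls `model.train()` and `evaluate()` calls `model.eval()` in the cells that follow: forgetting `model.eval()` would leave dropout active while measuring test accuracy.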


In [8]:
def train(model, train_loader, optimizer, log_interval) :
    model.train()                                                                              # (1)
    for batch_idx, (image, label) in enumerate(train_loader) :                                 # (2)
        image = image.to(DEVICE)                                                               # (3)
        label = label.to(DEVICE)                                                               # (4)
        optimizer.zero_grad()                                                                  # (5)
        output = model(image)                                                                  # (6)
        loss = criterion(output, label)                                                        # (7)
        loss.backward()                                                                        # (8)
        optimizer.step()                                                                       # (9)
        
        if batch_idx % log_interval == 0 :
            # Note: Epoch is the global loop variable set by the training loop below.
            print("Train Epoch : {} [{}/{}({:.0f}%)]\tTrain Loss : {:.6f}".format(
                Epoch, batch_idx * len(image), len(train_loader.dataset),
                100. * batch_idx / len(train_loader), loss.item()))

In [9]:
def evaluate(model, test_loader) :
    model.eval()                                                                     # (1)
    test_loss = 0                                                                    # (2)
    correct = 0                                                                      # (3)
    
    with torch.no_grad() :                                                           # (4)
        for image, label in test_loader :                                            # (5)
            image = image.to(DEVICE)                                                 # (6)
            label = label.to(DEVICE)                                                 # (7)
            output = model(image)                                                    # (8)
            test_loss += criterion(output, label).item()                             # (9)
            prediction = output.max(1, keepdim = True)[1]                            # (10)
            correct += prediction.eq(label.view_as(prediction)).sum().item()         # (11)
            
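    # criterion returns the mean loss of each batch, so test_loss is a sum of batch means.
    # Dividing by the number of samples therefore reports roughly (mean per-sample loss) / BATCH_SIZE.
    test_loss /= len(test_loader.dataset)                                            # (12)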
    test_accuracy = 100. * correct / len(test_loader.dataset)                        # (13)
    return test_loss, test_accuracy                                                  # (14)
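
To make the scale of the reported test loss concrete: `evaluate` accumulates the per-batch mean loss and then divides by the number of samples. The plain-Python sketch below uses made-up per-sample losses and a batch size of 2 (both are assumptions for illustration only) to show how that value relates to the true mean per-sample loss.

```python
# Hypothetical per-sample losses, chosen only for illustration
per_sample_losses = [0.2, 0.4, 0.6, 0.8, 1.0, 1.2]
batch_size = 2

# Split into batches and take the mean of each batch,
# which is what a mean-reducing criterion returns per batch
batches = [per_sample_losses[i:i + batch_size]
           for i in range(0, len(per_sample_losses), batch_size)]
batch_means = [sum(b) / len(b) for b in batches]

# What evaluate() reports: sum of batch means divided by the number of samples
reported = sum(batch_means) / len(per_sample_losses)

# Alternative: sum of batch means divided by the number of batches
per_batch_average = sum(batch_means) / len(batches)

# Reference: the true mean loss per sample
true_mean = sum(per_sample_losses) / len(per_sample_losses)

print("reported          :", reported)
print("per-batch average :", per_batch_average)
print("true mean         :", true_mean)
```

With equally sized batches, the reported value is the true mean divided by the batch size, so the test losses printed by this notebook are useful for comparing epochs but are not per-sample averages.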

In [10]:
for Epoch in range(1, EPOCHS + 1) :
    train(model, train_loader, optimizer, log_interval = 200)    # (1)
    test_loss, test_accuracy = evaluate(model, test_loader)      # (2)
    print("\n[EPOCH : {}], \tTest Loss : {:.4f}, \tTest Accuracy : {:.2f} %\n".format(Epoch, test_loss, test_accuracy))


[EPOCH : 1], 	Test Loss : 0.0099, 	Test Accuracy : 91.08 %


[EPOCH : 2], 	Test Loss : 0.0070, 	Test Accuracy : 93.40 %


[EPOCH : 3], 	Test Loss : 0.0055, 	Test Accuracy : 94.65 %


[EPOCH : 4], 	Test Loss : 0.0045, 	Test Accuracy : 95.53 %


[EPOCH : 5], 	Test Loss : 0.0041, 	Test Accuracy : 96.00 %


[EPOCH : 6], 	Test Loss : 0.0036, 	Test Accuracy : 96.38 %


[EPOCH : 7], 	Test Loss : 0.0033, 	Test Accuracy : 96.57 %


[EPOCH : 8], 	Test Loss : 0.0030, 	Test Accuracy : 96.98 %


[EPOCH : 9], 	Test Loss : 0.0028, 	Test Accuracy : 97.30 %


[EPOCH : 10], 	Test Loss : 0.0027, 	Test Accuracy : 97.38 %



Compared with using sigmoid(), using ReLU() as the nonlinearity gives high performance from the very start of training, and performance keeps improving as training proceeds.