<a href="https://colab.research.google.com/github/imhyunho99/2023-1--Deaplearning_Framework/blob/main/7_2_%EC%A0%95%ED%98%95%ED%99%94(Weight_Regularization).ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Overfitting and Underfitting


- Capacity versus error graph
- Generalization: the gap between training error and test error
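
The two bullets above can be illustrated with a small sketch. Everything in it is an assumption made for illustration and is not part of the original notebook: the noisy sine data, the use of polynomial degree as a stand-in for model capacity, and the helper name `fit_errors`. Training error can only go down as the degree grows, while the gap to the test error typically widens once the model starts fitting noise.

```python
import numpy as np

def fit_errors(degree, x_train, y_train, x_test, y_test):
    """Fit a polynomial of the given degree; return (train MSE, test MSE)."""
    coeffs = np.polyfit(x_train, y_train, degree)
    train_mse = float(np.mean((np.polyval(coeffs, x_train) - y_train) ** 2))
    test_mse = float(np.mean((np.polyval(coeffs, x_test) - y_test) ** 2))
    return train_mse, test_mse

# Hypothetical toy data: a sine curve plus Gaussian noise
rng = np.random.default_rng(0)
x_train = np.linspace(-1.0, 1.0, 15)
x_test = np.linspace(-0.95, 0.95, 200)
y_train = np.sin(3.0 * x_train) + rng.normal(0.0, 0.2, x_train.shape)
y_test = np.sin(3.0 * x_test) + rng.normal(0.0, 0.2, x_test.shape)

# degree -> (train error, test error, generalization gap)
results = {}
for degree in (1, 3, 9):
    train_mse, test_mse = fit_errors(degree, x_train, y_train, x_test, y_test)
    results[degree] = (train_mse, test_mse, test_mse - train_mse)
    print(f"degree={degree}: train={train_mse:.4f} "
          f"test={test_mse:.4f} gap={test_mse - train_mse:.4f}")
```

The third value stored for each degree is the generalization gap from the second bullet: test error minus training error.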



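# Weight Regularization

- The weight_decay argument of the optimizer controls the regularization strength.
- e.g. torch.optim.SGD(params, lr=1, momentum=0, dampening=0, weight_decay=0, nesterov=False)
- When a model overfits, applying regularization of an appropriate strength can mitigate it to some extent.
- Apart from the regularization part, the code is identical to the convolutional neural network code.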
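
As a rough sketch of what weight_decay does in plain SGD (momentum=0): PyTorch adds `weight_decay * w` to each parameter's gradient before taking the step, which matches the gradient of an L2 penalty `(weight_decay / 2) * w**2` added to the loss. The helper `sgd_step` below is hypothetical and works on a single scalar weight purely for illustration.

```python
def sgd_step(w, grad, lr, weight_decay):
    """One plain SGD step with the L2 penalty folded into the gradient."""
    # weight_decay * w is the gradient of (weight_decay / 2) * w**2
    return w - lr * (grad + weight_decay * w)

# With a zero data gradient, every step multiplies w by (1 - lr * weight_decay),
# so the weight shrinks geometrically toward zero.
w = 1.0
for _ in range(3):
    w = sgd_step(w, grad=0.0, lr=0.1, weight_decay=0.5)
print(w)  # approximately 0.95 ** 3
```

The shrink factor per step depends on the product `lr * weight_decay`, so the same weight_decay value acts more gently when the learning rate is small.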

In [1]:
# Switching the runtime type to GPU is recommended.
!pip install torch torchvision



## 1. Settings
### 1) Import required libraries

In [2]:
import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.init as init
import torchvision.datasets as dset
import torchvision.transforms as transforms
from torch.utils.data import DataLoader

### 2) Set hyperparameters

In [3]:
batch_size = 256
learning_rate = 0.0002
num_epoch = 10

## 2. Data

### 1) Download Data

In [4]:
mnist_train = dset.MNIST("./", train=True, transform=transforms.ToTensor(), target_transform=None, download=True)
mnist_test = dset.MNIST("./", train=False, transform=transforms.ToTensor(), target_transform=None, download=True)

### 2) Check Dataset

In [5]:
print(mnist_train[0][0].size(), len(mnist_train))
mnist_test[0][0].size(), len(mnist_test)

torch.Size([1, 28, 28]) 60000


(torch.Size([1, 28, 28]), 10000)

### 3) Set DataLoader

In [6]:
train_loader = torch.utils.data.DataLoader(mnist_train,batch_size=batch_size, shuffle=True,num_workers=2,drop_last=True)
test_loader = torch.utils.data.DataLoader(mnist_test,batch_size=batch_size, shuffle=False,num_workers=2,drop_last=True)
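
Because both loaders use drop_last=True, the final incomplete batch of each pass is discarded. The arithmetic can be sketched as follows; `loader_counts` is a hypothetical helper written for this note, not a PyTorch function.

```python
def loader_counts(num_samples, batch_size, drop_last=True):
    """Return (number of batches, number of samples actually seen) per pass."""
    full_batches, remainder = divmod(num_samples, batch_size)
    if drop_last or remainder == 0:
        return full_batches, full_batches * batch_size
    # Without drop_last, the leftover samples form one extra, smaller batch
    return full_batches + 1, num_samples

print(loader_counts(60000, 256))  # training set -> (234, 59904)
print(loader_counts(10000, 256))  # test set     -> (39, 9984)
```

So each training epoch takes 234 optimizer steps, and the test accuracy further down is computed over 9984 of the 10000 test images.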

## 3. Model & Optimizer

### 1) CNN Model

In [7]:
class CNN(nn.Module):
    def __init__(self):
        super(CNN,self).__init__()
        self.layer = nn.Sequential(
            nn.Conv2d(1,16,3,padding=1),  # 28 x 28
            nn.ReLU(),
            nn.Conv2d(16,32,3,padding=1), # 28 x 28
            nn.ReLU(),
            nn.MaxPool2d(2,2),            # 14 x 14
            nn.Conv2d(32,64,3,padding=1), # 14 x 14
            nn.ReLU(),
            nn.MaxPool2d(2,2)             #  7 x 7
        )
        self.fc_layer = nn.Sequential(
            nn.Linear(64*7*7,100),
            nn.ReLU(),
            nn.Linear(100,10)
        )       
        
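    def forward(self,x):
        out = self.layer(x)
        # Flatten using the actual batch dimension rather than the global batch_size,
        # so a batch of any size (e.g. a final partial batch) also works.
        out = out.view(out.size(0),-1)
        out = self.fc_layer(out)
        return out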
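
The size comments in the model (28 x 28, 14 x 14, 7 x 7) and the `64*7*7` input width of the first Linear layer can be checked with the standard output-size formula for convolution and pooling layers. This is a minimal sketch assuming dilation 1; `conv_out` is a hypothetical helper, not part of the notebook.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1

size = 28                                    # MNIST images are 28 x 28
size = conv_out(size, kernel=3, padding=1)   # Conv2d(1,16,3,padding=1)
size = conv_out(size, kernel=3, padding=1)   # Conv2d(16,32,3,padding=1)
size = conv_out(size, kernel=2, stride=2)    # MaxPool2d(2,2)
size = conv_out(size, kernel=3, padding=1)   # Conv2d(32,64,3,padding=1)
size = conv_out(size, kernel=2, stride=2)    # MaxPool2d(2,2)

flat_features = 64 * size * size             # 64 channels after the last conv
print(size, flat_features)                   # 7 3136
```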

### 2) Loss func & Optimizer

In [8]:
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(device)

model = CNN().to(device)
loss_func = nn.CrossEntropyLoss()

# Regularization is applied through the weight_decay argument.
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate, weight_decay=0.1)  # weight_decay sets the weight regularization (L2 penalty) strength

cuda:0


## 4. Train 

In [9]:
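for i in range(num_epoch):
    for j,[image,label] in enumerate(train_loader):
        x = image.to(device)
        y_= label.to(device)

        optimizer.zero_grad()        # clear gradients from the previous step
        output = model(x)            # forward pass (calls model.forward)
        loss = loss_func(output,y_)
        loss.backward()              # backpropagate
        optimizer.step()             # update weights, including the weight_decay term

    # Print the last batch's loss every 10 epochs (with num_epoch = 10, only at epoch 0)
    if i % 10 == 0:
        print(loss)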

tensor(2.3116, device='cuda:0', grad_fn=<NllLossBackward0>)
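
For reference, the cross-entropy loss of a classifier that assigns equal probability to all 10 digit classes is ln(10). The short check below is an added illustration, not part of the original notebook.

```python
import math

num_classes = 10
# Cross-entropy of a uniform prediction: -log(1 / num_classes)
chance_loss = -math.log(1.0 / num_classes)
print(round(chance_loss, 4))  # 2.3026
```

The printed loss of 2.3116 is very close to this value, which suggests that at the end of the first epoch the model's predictions were still roughly uniform.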


In [11]:
param_list = list(model.parameters())
print(param_list)

[Parameter containing:
tensor([[[[ 0.3172,  0.1493,  0.1448],
          [ 0.2709,  0.0267,  0.0459],
          [-0.1938,  0.2383, -0.2284]]],


        [[[ 0.2079, -0.0940, -0.0869],
          [-0.0885, -0.2786,  0.1699],
          [ 0.2008, -0.3180,  0.1161]]],


        [[[ 0.1323, -0.1896,  0.0920],
          [-0.2731, -0.2385, -0.2633],
          [-0.2280, -0.1860, -0.2539]]],


        [[[-0.2300,  0.1232, -0.1815],
          [-0.2030, -0.0852,  0.1175],
          [-0.1707, -0.1044,  0.1873]]],


        [[[-0.1605,  0.3062,  0.0791],
          [ 0.0794,  0.1429,  0.1589],
          [-0.0321,  0.1616,  0.1737]]],


        [[[ 0.0981, -0.0520, -0.0936],
          [-0.2720, -0.1416,  0.3103],
          [ 0.1587, -0.2344, -0.0570]]],


        [[[ 0.0821,  0.2333, -0.0171],
          [ 0.2988,  0.2303,  0.1808],
          [-0.2048, -0.2305, -0.2287]]],


        [[[ 0.0273, -0.0383,  0.2402],
          [-0.1951, -0.2679,  0.1215],
          [ 0.2135, -0.3055, -0.0791]]],


        [

## 5. Test

In [12]:
correct = 0
total = 0

with torch.no_grad():
    for image,label in test_loader:
        x = image.to(device)
        y_= label.to(device)

        output = model(x)
        _,output_index = torch.max(output,1)   # index of the largest logit = predicted class

        total += label.size(0)
        correct += (output_index == y_).sum().float()

    print("Accuracy of Test Data: {}".format(100*correct/total))

Accuracy of Test Data: 10.09615421295166


## Why is the accuracy so low?

- You will see why once we have covered a few other techniques!


5/10 Hyunho's thoughts
1.   Optim = SGD? Adam should give better results.
2.   Num_Epoch = 10? Is this a joke? lol



In [13]:
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate, weight_decay=0.1)
num_epoch = 100

In [14]:
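for i in range(num_epoch):
    for j,[image,label] in enumerate(train_loader):
        x = image.to(device)
        y_= label.to(device)

        optimizer.zero_grad()        # clear gradients from the previous step
        output = model(x)            # forward pass (calls model.forward)
        loss = loss_func(output,y_)
        loss.backward()              # backpropagate
        optimizer.step()             # update weights

    # Print the last batch's loss every 10 epochs
    if i % 10 == 0:
        print(loss)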

tensor(0.5654, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.3326, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.3059, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.3362, device='cuda:0', grad_fn=<NllLossBackward0>)


KeyboardInterrupt: ignored

In [15]:
correct = 0
total = 0

with torch.no_grad():
    for image,label in test_loader:
        x = image.to(device)
        y_= label.to(device)

        output = model(x)
        _,output_index = torch.max(output,1)   # index of the largest logit = predicted class

        total += label.size(0)
        correct += (output_index == y_).sum().float()

    print("Accuracy of Test Data: {}".format(100*correct/total))

# It was taking too long, so I stopped it... lol. But after running about 50 epochs, 10 -> 93?

Accuracy of Test Data: 93.359375
