# timm 과 Hugging Face를 통한 전이학습

## 실습 개요

1) **실습 목적**


이번 실습은 이론으로 배웠던 **전이 학습**에 대해 timm 과 Hugging Face로 직접 구현할 수 있게 구성하였습니다. Pretrained model community의 조작법을 알아보며 응용할 수 있을 것입니다.😊


2) **수강 목표**

- timm 라이브러리를 이용하여 pretrained model을 불러오고, 이를 이용하여 transfer learing을 할 수 있다.
- Hugging Face를 이용하여 pretrained model을 불러오고, 이를 이용하여 transfer learning을 할 수 있다.

### 실습 목차
* 1. timm을 활용한 pretrained model 사용법
  * 1-1. timm으로 pretrained model 불러오기
  * 1-2. timm을 활용한 전이 학습 실습
* 2. Hugging Face를 활용한 pretrained model 사용법
  * 2-1. Hugging Face로 pretrained model 불러오기
  * 2-2. Hugging Face을 활용한 전이 학습 실습


### 환경 설정
> Pytorch 설치 및 불러오기

> 런타임 GPU로 변경

> seed 고정

<font color = blue><b>
- 패키지 설치 및 임포트
</font><b>

In [None]:
# !pip install scikit-learn==1.3.0 -q
# !pip install torch==2.0.1 -q
# !pip install torchvision==0.15.2 -q

[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m10.8/10.8 MB[0m [31m85.4 MB/s[0m eta [36m0:00:00[0m
[?25h

In [1]:
import torch # pytorch 불러오기
import numpy as np # numpy 불러오기
import warnings # 경고 문구 제거
import matplotlib.pyplot as plt # 그래프를 그리기 위한 라이브러리
import pandas as pd # 데이터 프레임을 읽기 위한 라이브러리
from sklearn.model_selection import train_test_split # train test 를 나누기 위한 라이브러리
from sklearn.metrics import accuracy_score # 정확도 계산 라이브러리
from tqdm.notebook import tqdm # 진행상황 바 표현

warnings.filterwarnings('ignore')

In [2]:
import torch.nn as nn # 모델 구성을 위한 라이브러리
from torchvision.datasets import CIFAR10 # CIFAR10 데이터셋 불러오는 라이브러리
import torchvision.transforms as T # 이미지 변환을 위한 라이브러리
import torch.optim as optim # optimizer 설정을 위한 라이브러리

In [4]:
# seed 고정
import random
import torch.backends.cudnn as cudnn

def random_seed(seed_num):
    torch.manual_seed(seed_num)
    torch.cuda.manual_seed(seed_num)
    torch.cuda.manual_seed_all(seed_num)
    np.random.seed(seed_num)
    cudnn.benchmark = False
    cudnn.deterministic = True
    random.seed(seed_num)
random_seed(42)

In [5]:
device = 'cpu'

## 1. timm을 활용한 pretrained model 사용법

```
💡 목차 개요 : timm을 이용하여 다양한 모델 구조와 pretrained weight를 불러오고, 이를 활용할 수 있다.
```

- 1-1. timm으로 pretrained model 불러오기
- 1-2. timm을 활용한 전이 학습 실습 (CIFAR10)


### 1-1 timm으로 pretrained model 불러오기

> timm 라이브러리를 통해 다양한 pretrained model을 불러올 수 있습니다.


#### 📝 설명 : timm을 이용하여 다양한 모델 불러오기
* list_models : timm library에 있는 다양한 모델 리스트 반환
* create_model : 특정한 모델의 구조와 파라미터를 그대로 가져와 모델 구성

📚 참고할만한 자료:
* [docs] : https://timm.fast.ai/
* [github] : https://github.com/huggingface/pytorch-image-models
* [guide blog] : https://towardsdatascience.com/getting-started-with-pytorch-image-models-timm-a-practitioners-guide-4e77b4bf9055




In [6]:
!pip install timm==0.9.2 -q # timm 라이브러리 설치

[33mDEPRECATION: Loading egg at /opt/anaconda3/lib/python3.11/site-packages/PyBioMed-1.0-py3.11.egg is deprecated. pip 24.3 will enforce this behaviour change. A possible replacement is to use pip for package installation.. Discussion can be found at https://github.com/pypa/pip/issues/12330[0m[33m
[0m

In [7]:
import timm # timm 라이브러리 불러오기

In [8]:
timm.list_models() # timm이 지원하는 모든 모델 리스트

['bat_resnext26ts',
 'beit_base_patch16_224',
 'beit_base_patch16_384',
 'beit_large_patch16_224',
 'beit_large_patch16_384',
 'beit_large_patch16_512',
 'beitv2_base_patch16_224',
 'beitv2_large_patch16_224',
 'botnet26t_256',
 'botnet50ts_256',
 'caformer_b36',
 'caformer_m36',
 'caformer_s18',
 'caformer_s36',
 'cait_m36_384',
 'cait_m48_448',
 'cait_s24_224',
 'cait_s24_384',
 'cait_s36_384',
 'cait_xs24_384',
 'cait_xxs24_224',
 'cait_xxs24_384',
 'cait_xxs36_224',
 'cait_xxs36_384',
 'coat_lite_medium',
 'coat_lite_medium_384',
 'coat_lite_mini',
 'coat_lite_small',
 'coat_lite_tiny',
 'coat_mini',
 'coat_small',
 'coat_tiny',
 'coatnet_0_224',
 'coatnet_0_rw_224',
 'coatnet_1_224',
 'coatnet_1_rw_224',
 'coatnet_2_224',
 'coatnet_2_rw_224',
 'coatnet_3_224',
 'coatnet_3_rw_224',
 'coatnet_4_224',
 'coatnet_5_224',
 'coatnet_bn_0_rw_224',
 'coatnet_nano_cc_224',
 'coatnet_nano_rw_224',
 'coatnet_pico_rw_224',
 'coatnet_rmlp_0_rw_224',
 'coatnet_rmlp_1_rw2_224',
 'coatnet_rmlp_1_r

In [9]:
timm.list_models('resnet*') # 쿼리를 통해 모델을 검색할 수 있음.

['resnet10t',
 'resnet14t',
 'resnet18',
 'resnet18d',
 'resnet26',
 'resnet26d',
 'resnet26t',
 'resnet32ts',
 'resnet33ts',
 'resnet34',
 'resnet34d',
 'resnet50',
 'resnet50_gn',
 'resnet50c',
 'resnet50d',
 'resnet50s',
 'resnet50t',
 'resnet51q',
 'resnet61q',
 'resnet101',
 'resnet101c',
 'resnet101d',
 'resnet101s',
 'resnet152',
 'resnet152c',
 'resnet152d',
 'resnet152s',
 'resnet200',
 'resnet200d',
 'resnetaa34d',
 'resnetaa50',
 'resnetaa50d',
 'resnetaa101d',
 'resnetblur18',
 'resnetblur50',
 'resnetblur50d',
 'resnetblur101d',
 'resnetrs50',
 'resnetrs101',
 'resnetrs152',
 'resnetrs200',
 'resnetrs270',
 'resnetrs350',
 'resnetrs420',
 'resnetv2_50',
 'resnetv2_50d',
 'resnetv2_50d_evos',
 'resnetv2_50d_frn',
 'resnetv2_50d_gn',
 'resnetv2_50t',
 'resnetv2_50x1_bit',
 'resnetv2_50x3_bit',
 'resnetv2_101',
 'resnetv2_101d',
 'resnetv2_101x1_bit',
 'resnetv2_101x3_bit',
 'resnetv2_152',
 'resnetv2_152d',
 'resnetv2_152x2_bit',
 'resnetv2_152x4_bit']

In [10]:
timm.list_models('resnet50', pretrained=True) # resnet 모델 중 pretrained weight 가 있는 모델 리스트

['resnet50.a1_in1k',
 'resnet50.a1h_in1k',
 'resnet50.a2_in1k',
 'resnet50.a3_in1k',
 'resnet50.am_in1k',
 'resnet50.b1k_in1k',
 'resnet50.b2k_in1k',
 'resnet50.bt_in1k',
 'resnet50.c1_in1k',
 'resnet50.c2_in1k',
 'resnet50.d_in1k',
 'resnet50.fb_ssl_yfcc100m_ft_in1k',
 'resnet50.fb_swsl_ig1b_ft_in1k',
 'resnet50.gluon_in1k',
 'resnet50.ra_in1k',
 'resnet50.ram_in1k',
 'resnet50.tv2_in1k',
 'resnet50.tv_in1k']

In [11]:
model = timm.create_model('resnet50', pretrained=True) # resnet50을 imagenet으로 pretrain한 모델 불러오기, 첫번째 모델로 불러옴.

model.safetensors:   0%|          | 0.00/102M [00:00<?, ?B/s]

In [12]:
model.default_cfg # resnet50 모델의 기본 정보

{'url': 'https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-rsb-weights/resnet50_a1_0-14fe96d1.pth',
 'hf_hub_id': 'timm/resnet50.a1_in1k',
 'architecture': 'resnet50',
 'tag': 'a1_in1k',
 'custom_load': False,
 'input_size': (3, 224, 224),
 'test_input_size': (3, 288, 288),
 'fixed_input_size': False,
 'interpolation': 'bicubic',
 'crop_pct': 0.95,
 'test_crop_pct': 1.0,
 'crop_mode': 'center',
 'mean': (0.485, 0.456, 0.406),
 'std': (0.229, 0.224, 0.225),
 'num_classes': 1000,
 'pool_size': (7, 7),
 'first_conv': 'conv1',
 'classifier': 'fc',
 'origin_url': 'https://github.com/huggingface/pytorch-image-models',
 'paper_ids': 'arXiv:2110.00476'}

In [13]:
model # 모델의 아키텍쳐

ResNet(
  (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (act1): ReLU(inplace=True)
  (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  (layer1): Sequential(
    (0): Bottleneck(
      (conv1): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (act1): ReLU(inplace=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (drop_block): Identity()
      (act2): ReLU(inplace=True)
      (aa): Identity()
      (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
     

In [14]:
model2 = timm.create_model('resnet50', pretrained = True, num_classes = 10) # 마지막 output class 개수 10개로 조정
model2 # num class 를 임의로 조정하면 fc layer 의 weight가 초기화됨

ResNet(
  (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (act1): ReLU(inplace=True)
  (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  (layer1): Sequential(
    (0): Bottleneck(
      (conv1): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (act1): ReLU(inplace=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (drop_block): Identity()
      (act2): ReLU(inplace=True)
      (aa): Identity()
      (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
     

In [17]:
model3 = timm.create_model('resnet50', pretrained = True, num_classes = 10) # 비교를 위한 모델 생성
model2.fc.weight == model3.fc.weight # fc layer weight 비교 => fc layer의 weight는 초기화되는 것을 확인

tensor([[False, False, False,  ..., False, False, False],
        [False, False, False,  ..., False, False, False],
        [False, False, False,  ..., False, False, False],
        ...,
        [False, False, False,  ..., False, False, False],
        [False, False, False,  ..., False, False, False],
        [False, False, False,  ..., False, False, False]])

In [16]:
model2.conv1.weight == model3.conv1.weight # fc layer 이전의 weight는 동일함

tensor([[[[True, True, True,  ..., True, True, True],
          [True, True, True,  ..., True, True, True],
          [True, True, True,  ..., True, True, True],
          ...,
          [True, True, True,  ..., True, True, True],
          [True, True, True,  ..., True, True, True],
          [True, True, True,  ..., True, True, True]],

         [[True, True, True,  ..., True, True, True],
          [True, True, True,  ..., True, True, True],
          [True, True, True,  ..., True, True, True],
          ...,
          [True, True, True,  ..., True, True, True],
          [True, True, True,  ..., True, True, True],
          [True, True, True,  ..., True, True, True]],

         [[True, True, True,  ..., True, True, True],
          [True, True, True,  ..., True, True, True],
          [True, True, True,  ..., True, True, True],
          ...,
          [True, True, True,  ..., True, True, True],
          [True, True, True,  ..., True, True, True],
          [True, True, True,  ...

### 1-2. timm을 활용한 전이 학습 실습

> timm을 이용하여 pretrained 모델을 불러오고, 이를 이용하여 전이 학습을 직접 해봅니다.

###  데이터 셋 개요 </b>

* 데이터 셋: CIFAR10/100 데이터베이스(Modified National Institute of Standards and Technology database)
* 데이터 셋 개요: CIFAR10은 10개의 클래스 ()를 가지는 이미지로 데이터셋입니다. 총 5만 개의 학습 데이터와 1만 개의 테스트 데이터로 이루어져 있으며 이미지와 그에 대응하는 라벨로 구성됩니다.
* 라벨 구성: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck
* [CIFAR10 공식 홈페이지](https://www.cs.toronto.edu/~kriz/cifar.html)

#### 📝 설명 : CIFAR10 데이터셋 불러오기
torchvision library를 이용하여 CIFAR10 데이터셋을 불러옵니다.

📚 참고할만한 자료:
* [torchvision CIFAR10] : https://pytorch.org/vision/stable/generated/torchvision.datasets.CIFAR10.html

In [18]:
# 데이터 불러오기
cifar_transform = T.Compose([
    T.ToTensor(), # 텐서 형식으로 변환
])
download_root = './CIFAR10_DATASET'

trainval_dataset = CIFAR10(download_root, transform=cifar_transform, train=True, download=True) # train dataset 다운로드
test_dataset = CIFAR10(download_root, transform=cifar_transform, train=False, download=True) # test dataset 다운로드

Downloading https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz to ./CIFAR10_DATASET/cifar-10-python.tar.gz


100%|██████████| 170498071/170498071 [00:40<00:00, 4256362.11it/s]


Extracting ./CIFAR10_DATASET/cifar-10-python.tar.gz to ./CIFAR10_DATASET
Files already downloaded and verified


In [19]:
train_num, valid_num = int(len(trainval_dataset) * 0.8), int(len(trainval_dataset) * 0.2) # 8 : 2 = train : valid
print("Train dataset 개수 : ",train_num)
print("Validation dataset 개수 : ",valid_num)
train_dataset,val_dataset = torch.utils.data.random_split(trainval_dataset, [train_num, valid_num]) # train - valid set 나누기

Train dataset 개수 :  40000
Validation dataset 개수 :  10000


In [20]:
BATCH_SIZE = 64 # 배치사이즈 설정
# 데이터로더 설정
train_dataloader = torch.utils.data.DataLoader(dataset=train_dataset,
                                          batch_size=BATCH_SIZE,
                                          shuffle=True,
                                          drop_last=False, num_workers = 8) # train dataloader 구성
val_dataloader = torch.utils.data.DataLoader(dataset=val_dataset,
                                          batch_size=BATCH_SIZE,
                                          shuffle=False,
                                          drop_last=False, num_workers = 8) # valid dataloader 구성
test_dataloader = torch.utils.data.DataLoader(dataset=test_dataset,
                                          batch_size=BATCH_SIZE,
                                          shuffle=False,
                                          drop_last=False, num_workers = 8) # test dataloader 구성

#### 📝 설명 : pretrained 된 모델로 추론하기
* ImageNet으로 사전훈련된 모델의 파라미터를 **그대로** cifar10에 적용해봅니다.
이 때, input 값의 shape은 (Batch size, Channel, Height, Width) 형태로 구성되어야 합니다.

* ImageNet 이란?
  * 대규모 이미지 데이터셋으로, 1000개의 다양한 카테고리로 구성된 이미지들로 이루어져 있습니다.
  * 컴퓨터 비젼 분야에서 모델을 학습하고 평가하기 위해 널리 사용되며, 이미지 분류, 객체 감지, 객체 인식 등 다양한 작업에 활용됩니다.

📚 참고할만한 자료:
* [Guide blog] : https://towardsdatascience.com/getting-started-with-pytorch-image-models-timm-a-practitioners-guide-4e77b4bf9055
* [ImageNet] : https://www.image-net.org/

In [21]:
# pretrained model 불러오기 (resnet50 불러오기)
device = 'cpu' # gpu 설정
model = timm.create_model('resnet50', pretrained=True, num_classes = 10).to(device) # 10개의 클래스 예측

In [22]:
# 이미지 하나 추론하기
img, label = train_dataset[0]
img = img.unsqueeze(0) # 배치 추가

In [23]:
model.eval() # evaluation 상태로 만듦 (freeze)
preds = model(img.to(device)) # model inference, image 도 gpu에 올리기
pred_label = torch.argmax(preds).item() # 가장 큰 값의 index 반환
print(f'True Label : {label} \nPredict Label : {pred_label}')

True Label : 5 
Predict Label : 6


#### 📝 설명 : timm을 이용하여 fine tuning 하기
* 사전 훈련된 모델을 이용하여 fine tuning 후 성능 변화를 확인해봅니다.
* 모델 전체를 fine tuning 한 것과 마지막 fully connected layer만 fine tuning 한 것에 대한 차이를 알아봅니다.
  * CIFAR10은 사전 학습된 데이터셋(ImageNet)과 도메인이 매우 다르고, 데이터셋이 크기 때문에 꽤 많은 layer를 학습을 해야합니다.  
* 전이 학습시 learning rate의 크기 전략에 대해 실습해봅니다.
  * 전이 학습시 일반적으로 learning rate를 작게 설정해야합니다.

📚 참고할만한 자료:
* [Guide blog] : https://towardsdatascience.com/getting-started-with-pytorch-image-models-timm-a-practitioners-guide-4e77b4bf9055
* [Fine tuning 전략] : https://jeinalog.tistory.com/13

In [25]:
# training 코드, evaluation 코드, training_loop 코드
def training(model, dataloader, train_dataset, criterion, optimizer, device, epoch, num_epochs):
  model.train()  # 모델을 학습 모드로 설정
  train_loss = 0.0
  train_accuracy = 0

  tbar = tqdm(dataloader)
  for images, labels in tbar:
      images = images.to(device)
      labels = labels.to(device)

      # 순전파
      outputs = model(images)
      loss = criterion(outputs, labels)

      # 역전파 및 가중치 업데이트
      optimizer.zero_grad()
      loss.backward()
      optimizer.step()

      # 손실과 정확도 계산
      train_loss += loss.item()
      # torch.max에서 dim 인자에 값을 추가할 경우, 해당 dimension에서 최댓값과 최댓값에 해당하는 인덱스를 반환
      _, predicted = torch.max(outputs, 1)
      train_accuracy += (predicted == labels).sum().item()

      # tqdm의 진행바에 표시될 설명 텍스트를 설정
      tbar.set_description(f"Epoch [{epoch+1}/{num_epochs}], Train Loss: {loss.item():.4f}")

  # 에폭별 학습 결과 출력
  train_loss = train_loss / len(dataloader)
  train_accuracy = train_accuracy / len(train_dataset)

  return model, train_loss, train_accuracy

def evaluation(model, dataloader, val_dataset, criterion, device, epoch, num_epochs):
  model.eval()  # 모델을 평가 모드로 설정
  valid_loss = 0.0
  valid_accuracy = 0

  with torch.no_grad(): # model의 업데이트 막기
      tbar = tqdm(dataloader)
      for images, labels in tbar:
          images = images.to(device)
          labels = labels.to(device)

          # 순전파
          outputs = model(images)
          loss = criterion(outputs, labels)

          # 손실과 정확도 계산
          valid_loss += loss.item()
          # torch.max에서 dim 인자에 값을 추가할 경우, 해당 dimension에서 최댓값과 최댓값에 해당하는 인덱스를 반환
          _, predicted = torch.max(outputs, 1)
          valid_accuracy += (predicted == labels).sum().item()

          # tqdm의 진행바에 표시될 설명 텍스트를 설정
          tbar.set_description(f"Epoch [{epoch+1}/{num_epochs}], Valid Loss: {loss.item():.4f}")

  valid_loss = valid_loss / len(dataloader)
  valid_accuracy = valid_accuracy / len(val_dataset)

  return model, valid_loss, valid_accuracy


def training_loop(model, train_dataloader, valid_dataloader, train_dataset, val_dataset, criterion, optimizer, device, num_epochs, patience, model_name):
    best_valid_loss = float('inf')  # 가장 좋은 validation loss를 저장
    early_stop_counter = 0  # 카운터
    valid_max_accuracy = -1

    for epoch in range(num_epochs):
        model, train_loss, train_accuracy = training(model, train_dataloader, train_dataset, criterion, optimizer, device, epoch, num_epochs)
        model, valid_loss, valid_accuracy = evaluation(model, valid_dataloader, val_dataset, criterion, device, epoch, num_epochs)

        if valid_accuracy > valid_max_accuracy:
          valid_max_accuracy = valid_accuracy

        # validation loss가 감소하면 모델 저장 및 카운터 리셋
        if valid_loss < best_valid_loss:
            best_valid_loss = valid_loss
            torch.save(model.state_dict(), f"./model_{model_name}.pt")
            early_stop_counter = 0

        # validation loss가 증가하거나 같으면 카운터 증가
        else:
            early_stop_counter += 1

        print(f"Epoch [{epoch + 1}/{num_epochs}], Train Loss: {train_loss:.4f}, Train Accuracy: {train_accuracy:.4f} Valid Loss: {valid_loss:.4f}, Valid Accuracy: {valid_accuracy:.4f}")

        # 조기 종료 카운터가 설정한 patience를 초과하면 학습 종료
        if early_stop_counter >= patience:
            print("Early stopping")
            break

    return model, valid_max_accuracy

In [26]:
# 모델 전체 fine tuning
num_epochs = 100
patience = 3
scores = dict()
model_name = 'exp1'

lr = 1e-3
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr = lr)
model, valid_max_accuracy = training_loop(model, train_dataloader, val_dataloader, train_dataset, val_dataset, criterion, optimizer, device, num_epochs, patience, model_name)
scores[model_name] = valid_max_accuracy

  0%|          | 0/625 [00:00<?, ?it/s]

In [None]:
model.load_state_dict(torch.load("./model_exp1.pt")) # 모델 불러오기
model = model.to(device)
model.eval()
total_labels = []
total_preds = []
with torch.no_grad():
    for images, labels in tqdm(test_dataloader):
        images = images.to(device)
        labels = labels

        outputs = model(images)
        # torch.max에서 dim 인자에 값을 추가할 경우, 해당 dimension에서 최댓값과 최댓값에 해당하는 인덱스를 반환
        _, predicted = torch.max(outputs.data, 1)

        total_preds.extend(predicted.detach().cpu().tolist())
        total_labels.extend(labels.tolist())

total_preds = np.array(total_preds)
total_labels = np.array(total_labels)
full_model_tuning_acc = accuracy_score(total_labels, total_preds) # 정확도 계산
print("Full Fine tuning model accuracy : ",full_model_tuning_acc) # 전체 모델을 fine tuning 한 것이 점수

  0%|          | 0/157 [00:00<?, ?it/s]

Full Fine tuning model accuracy :  0.8075


In [None]:
# 마지막 layer 만 fine tuning
num_epochs = 100
patience = 3
scores = dict()
model_name = 'exp2'

model = timm.create_model('resnet50', pretrained=True, num_classes= 10).to(device)

for para in model.parameters(): # 모든 layer freeze 하기
    para.requires_grad = False
for para in model.fc.parameters(): # fc layer 만 학습하기
    para.requires_grad = True

lr = 1e-3
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr = lr)
model, valid_max_accuracy = training_loop(model, train_dataloader, val_dataloader, train_dataset, val_dataset, criterion, optimizer, device, num_epochs, patience, model_name)
scores[model_name] = valid_max_accuracy

  0%|          | 0/625 [00:00<?, ?it/s]

  0%|          | 0/157 [00:00<?, ?it/s]

Epoch [1/100], Train Loss: 1.8901, Train Accuracy: 0.3684 Valid Loss: 1.9369, Valid Accuracy: 0.4394


  0%|          | 0/625 [00:00<?, ?it/s]

  0%|          | 0/157 [00:00<?, ?it/s]

Epoch [2/100], Train Loss: 1.6947, Train Accuracy: 0.4258 Valid Loss: 2.0128, Valid Accuracy: 0.4464


  0%|          | 0/625 [00:00<?, ?it/s]

  0%|          | 0/157 [00:00<?, ?it/s]

Epoch [3/100], Train Loss: 1.6495, Train Accuracy: 0.4371 Valid Loss: 1.8081, Valid Accuracy: 0.4492


  0%|          | 0/625 [00:00<?, ?it/s]

  0%|          | 0/157 [00:00<?, ?it/s]

Epoch [4/100], Train Loss: 1.6260, Train Accuracy: 0.4419 Valid Loss: 2.3619, Valid Accuracy: 0.4448


  0%|          | 0/625 [00:00<?, ?it/s]

  0%|          | 0/157 [00:00<?, ?it/s]

Epoch [5/100], Train Loss: 1.6128, Train Accuracy: 0.4468 Valid Loss: 2.0574, Valid Accuracy: 0.4502


  0%|          | 0/625 [00:00<?, ?it/s]

  0%|          | 0/157 [00:00<?, ?it/s]

Epoch [6/100], Train Loss: 1.6041, Train Accuracy: 0.4454 Valid Loss: 2.5392, Valid Accuracy: 0.4520
Early stopping


In [None]:
model.load_state_dict(torch.load("./model_exp2.pt")) # 모델 불러오기
model = model.to(device)
model.eval()
total_labels = []
total_preds = []
with torch.no_grad():
    for images, labels in tqdm(test_dataloader):
        images = images.to(device)
        labels = labels

        outputs = model(images)
        # torch.max에서 dim 인자에 값을 추가할 경우, 해당 dimension에서 최댓값과 최댓값에 해당하는 인덱스를 반환
        _, predicted = torch.max(outputs.data, 1)

        total_preds.extend(predicted.detach().cpu().tolist())
        total_labels.extend(labels.tolist())

total_preds = np.array(total_preds)
total_labels = np.array(total_labels)
fc_tuning_acc = accuracy_score(total_labels, total_preds) # 정확도 계산
print("Only FC Layer Fine tuning model accuracy : ",fc_tuning_acc) # 전체 layer를 fine tuning 한 것보다 점수가 낮음

  0%|          | 0/157 [00:00<?, ?it/s]

Only FC Layer Fine tuning model accuracy :  0.4421


In [None]:
# learning rate 에 따른 결과 비교

model3 = timm.create_model('resnet50', pretrained=True, num_classes= 10).to(device)
num_epochs = 100
patience = 3
scores = dict()
model_name = 'exp3'

lr = 1e-1 # learning rate 높게 설정
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model3.parameters(), lr = lr)
model, valid_max_accuracy = training_loop(model3, train_dataloader, val_dataloader, train_dataset, val_dataset, criterion, optimizer, device, num_epochs, patience, model_name)
scores[model_name] = valid_max_accuracy

  0%|          | 0/625 [00:00<?, ?it/s]

  0%|          | 0/157 [00:00<?, ?it/s]

Epoch [1/100], Train Loss: 3.1196, Train Accuracy: 0.1046 Valid Loss: 5.4355, Valid Accuracy: 0.0952


  0%|          | 0/625 [00:00<?, ?it/s]

  0%|          | 0/157 [00:00<?, ?it/s]

Epoch [2/100], Train Loss: 2.3127, Train Accuracy: 0.0994 Valid Loss: 2.5206, Valid Accuracy: 0.1026


  0%|          | 0/625 [00:00<?, ?it/s]

  0%|          | 0/157 [00:00<?, ?it/s]

Epoch [3/100], Train Loss: 2.3718, Train Accuracy: 0.0994 Valid Loss: 3.0868, Valid Accuracy: 0.1044


  0%|          | 0/625 [00:00<?, ?it/s]

  0%|          | 0/157 [00:00<?, ?it/s]

Epoch [4/100], Train Loss: 3.1052, Train Accuracy: 0.1031 Valid Loss: 2.3147, Valid Accuracy: 0.1010


  0%|          | 0/625 [00:00<?, ?it/s]

  0%|          | 0/157 [00:00<?, ?it/s]

Epoch [5/100], Train Loss: 2.3122, Train Accuracy: 0.0995 Valid Loss: 2.3183, Valid Accuracy: 0.0973


  0%|          | 0/625 [00:00<?, ?it/s]

  0%|          | 0/157 [00:00<?, ?it/s]

Epoch [6/100], Train Loss: 2.3111, Train Accuracy: 0.0978 Valid Loss: 2.3175, Valid Accuracy: 0.0967


  0%|          | 0/625 [00:00<?, ?it/s]

  0%|          | 0/157 [00:00<?, ?it/s]

Epoch [7/100], Train Loss: 2.3159, Train Accuracy: 0.1016 Valid Loss: 2.3052, Valid Accuracy: 0.1024


  0%|          | 0/625 [00:00<?, ?it/s]

  0%|          | 0/157 [00:00<?, ?it/s]

Epoch [8/100], Train Loss: 2.3123, Train Accuracy: 0.0984 Valid Loss: 2.3036, Valid Accuracy: 0.1010


  0%|          | 0/625 [00:00<?, ?it/s]

  0%|          | 0/157 [00:00<?, ?it/s]

Epoch [9/100], Train Loss: 2.3119, Train Accuracy: 0.0986 Valid Loss: 2.3117, Valid Accuracy: 0.1024


  0%|          | 0/625 [00:00<?, ?it/s]

  0%|          | 0/157 [00:00<?, ?it/s]

Epoch [10/100], Train Loss: 2.6512, Train Accuracy: 0.1003 Valid Loss: 2.3074, Valid Accuracy: 0.0973


  0%|          | 0/625 [00:00<?, ?it/s]

  0%|          | 0/157 [00:00<?, ?it/s]

Epoch [11/100], Train Loss: 2.3097, Train Accuracy: 0.1001 Valid Loss: 2.3292, Valid Accuracy: 0.0999
Early stopping


In [None]:
model3.load_state_dict(torch.load("./model_exp3.pt")) # 모델 불러오기
model3 = model3.to(device)
model3.eval()
total_labels = []
total_preds = []
with torch.no_grad():
    for images, labels in tqdm(test_dataloader):
        images = images.to(device)
        labels = labels

        outputs = model3(images)
        # torch.max에서 dim 인자에 값을 추가할 경우, 해당 dimension에서 최댓값과 최댓값에 해당하는 인덱스를 반환
        _, predicted = torch.max(outputs.data, 1)

        total_preds.extend(predicted.detach().cpu().tolist())
        total_labels.extend(labels.tolist())

total_preds = np.array(total_preds)
total_labels = np.array(total_labels)
big_lr_acc = accuracy_score(total_labels, total_preds)
print("Large learning rate model accuracy : ",big_lr_acc)

  0%|          | 0/157 [00:00<?, ?it/s]

Large learning rate model accuracy :  0.1


In [None]:
# learning rate 에 따른 결과 비교

model4 = timm.create_model('resnet50', pretrained=True, num_classes= 10).to(device)
num_epochs = 100
patience = 3
scores = dict()
model_name = 'exp4'

lr = 1e-5 # 기존보다 더 작게 설정
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model4.parameters(), lr = lr)
model, valid_max_accuracy = training_loop(model4, train_dataloader, val_dataloader, train_dataset, val_dataset, criterion, optimizer, device, num_epochs, patience, model_name)
scores[model_name] = valid_max_accuracy

  0%|          | 0/625 [00:00<?, ?it/s]

  0%|          | 0/157 [00:00<?, ?it/s]

Epoch [1/100], Train Loss: 2.2927, Train Accuracy: 0.1283 Valid Loss: 2.4319, Valid Accuracy: 0.1627


  0%|          | 0/625 [00:00<?, ?it/s]

  0%|          | 0/157 [00:00<?, ?it/s]

Epoch [2/100], Train Loss: 2.2540, Train Accuracy: 0.1877 Valid Loss: 2.2916, Valid Accuracy: 0.2175


  0%|          | 0/625 [00:00<?, ?it/s]

  0%|          | 0/157 [00:00<?, ?it/s]

Epoch [3/100], Train Loss: 2.2080, Train Accuracy: 0.2434 Valid Loss: 2.2821, Valid Accuracy: 0.2794


  0%|          | 0/625 [00:00<?, ?it/s]

  0%|          | 0/157 [00:00<?, ?it/s]

Epoch [4/100], Train Loss: 2.1393, Train Accuracy: 0.2928 Valid Loss: 2.2308, Valid Accuracy: 0.3194


  0%|          | 0/625 [00:00<?, ?it/s]

  0%|          | 0/157 [00:00<?, ?it/s]

Epoch [5/100], Train Loss: 2.0507, Train Accuracy: 0.3353 Valid Loss: 2.0502, Valid Accuracy: 0.3551


  0%|          | 0/625 [00:00<?, ?it/s]

  0%|          | 0/157 [00:00<?, ?it/s]

Epoch [6/100], Train Loss: 1.9469, Train Accuracy: 0.3690 Valid Loss: 2.0127, Valid Accuracy: 0.3885


  0%|          | 0/625 [00:00<?, ?it/s]

  0%|          | 0/157 [00:00<?, ?it/s]

Epoch [7/100], Train Loss: 1.8498, Train Accuracy: 0.3996 Valid Loss: 1.8434, Valid Accuracy: 0.4182


  0%|          | 0/625 [00:00<?, ?it/s]

  0%|          | 0/157 [00:00<?, ?it/s]

Epoch [8/100], Train Loss: 1.7610, Train Accuracy: 0.4274 Valid Loss: 1.8623, Valid Accuracy: 0.4420


  0%|          | 0/625 [00:00<?, ?it/s]

  0%|          | 0/157 [00:00<?, ?it/s]

Epoch [9/100], Train Loss: 1.6829, Train Accuracy: 0.4548 Valid Loss: 1.6745, Valid Accuracy: 0.4658


  0%|          | 0/625 [00:00<?, ?it/s]

  0%|          | 0/157 [00:00<?, ?it/s]

Epoch [10/100], Train Loss: 1.6101, Train Accuracy: 0.4727 Valid Loss: 1.6907, Valid Accuracy: 0.4886


  0%|          | 0/625 [00:00<?, ?it/s]

  0%|          | 0/157 [00:00<?, ?it/s]

Epoch [11/100], Train Loss: 1.5359, Train Accuracy: 0.4981 Valid Loss: 1.5987, Valid Accuracy: 0.5091


  0%|          | 0/625 [00:00<?, ?it/s]

  0%|          | 0/157 [00:00<?, ?it/s]

Epoch [12/100], Train Loss: 1.4746, Train Accuracy: 0.5157 Valid Loss: 1.5004, Valid Accuracy: 0.5264


  0%|          | 0/625 [00:00<?, ?it/s]

  0%|          | 0/157 [00:00<?, ?it/s]

Epoch [13/100], Train Loss: 1.4119, Train Accuracy: 0.5362 Valid Loss: 1.4173, Valid Accuracy: 0.5473


  0%|          | 0/625 [00:00<?, ?it/s]

  0%|          | 0/157 [00:00<?, ?it/s]

Epoch [14/100], Train Loss: 1.3549, Train Accuracy: 0.5445 Valid Loss: 1.3495, Valid Accuracy: 0.5578


  0%|          | 0/625 [00:00<?, ?it/s]

  0%|          | 0/157 [00:00<?, ?it/s]

Epoch [15/100], Train Loss: 1.2982, Train Accuracy: 0.5633 Valid Loss: 1.3036, Valid Accuracy: 0.5732


  0%|          | 0/625 [00:00<?, ?it/s]

  0%|          | 0/157 [00:00<?, ?it/s]

Epoch [16/100], Train Loss: 1.2431, Train Accuracy: 0.5779 Valid Loss: 1.2665, Valid Accuracy: 0.5817


  0%|          | 0/625 [00:00<?, ?it/s]

  0%|          | 0/157 [00:00<?, ?it/s]

Epoch [17/100], Train Loss: 1.1987, Train Accuracy: 0.5897 Valid Loss: 1.2109, Valid Accuracy: 0.5969


  0%|          | 0/625 [00:00<?, ?it/s]

  0%|          | 0/157 [00:00<?, ?it/s]

Epoch [18/100], Train Loss: 1.1587, Train Accuracy: 0.6036 Valid Loss: 1.2003, Valid Accuracy: 0.6020


  0%|          | 0/625 [00:00<?, ?it/s]

  0%|          | 0/157 [00:00<?, ?it/s]

Epoch [19/100], Train Loss: 1.1164, Train Accuracy: 0.6160 Valid Loss: 1.1275, Valid Accuracy: 0.6190


  0%|          | 0/625 [00:00<?, ?it/s]

  0%|          | 0/157 [00:00<?, ?it/s]

Epoch [20/100], Train Loss: 1.0830, Train Accuracy: 0.6265 Valid Loss: 1.1123, Valid Accuracy: 0.6250


  0%|          | 0/625 [00:00<?, ?it/s]

  0%|          | 0/157 [00:00<?, ?it/s]

Epoch [21/100], Train Loss: 1.0534, Train Accuracy: 0.6339 Valid Loss: 1.0890, Valid Accuracy: 0.6354


  0%|          | 0/625 [00:00<?, ?it/s]

  0%|          | 0/157 [00:00<?, ?it/s]

Epoch [22/100], Train Loss: 1.0278, Train Accuracy: 0.6432 Valid Loss: 1.0666, Valid Accuracy: 0.6383


  0%|          | 0/625 [00:00<?, ?it/s]

  0%|          | 0/157 [00:00<?, ?it/s]

Epoch [23/100], Train Loss: 0.9993, Train Accuracy: 0.6529 Valid Loss: 1.0368, Valid Accuracy: 0.6500


  0%|          | 0/625 [00:00<?, ?it/s]

  0%|          | 0/157 [00:00<?, ?it/s]

Epoch [24/100], Train Loss: 0.9726, Train Accuracy: 0.6629 Valid Loss: 1.0584, Valid Accuracy: 0.6475


  0%|          | 0/625 [00:00<?, ?it/s]

  0%|          | 0/157 [00:00<?, ?it/s]

Epoch [25/100], Train Loss: 0.9476, Train Accuracy: 0.6715 Valid Loss: 0.9856, Valid Accuracy: 0.6642


  0%|          | 0/625 [00:00<?, ?it/s]

  0%|          | 0/157 [00:00<?, ?it/s]

Epoch [26/100], Train Loss: 0.9234, Train Accuracy: 0.6780 Valid Loss: 0.9734, Valid Accuracy: 0.6685


  0%|          | 0/625 [00:00<?, ?it/s]

  0%|          | 0/157 [00:00<?, ?it/s]

Epoch [27/100], Train Loss: 0.9083, Train Accuracy: 0.6828 Valid Loss: 1.0097, Valid Accuracy: 0.6675


  0%|          | 0/625 [00:00<?, ?it/s]

  0%|          | 0/157 [00:00<?, ?it/s]

Epoch [28/100], Train Loss: 0.8851, Train Accuracy: 0.6945 Valid Loss: 0.9730, Valid Accuracy: 0.6726


  0%|          | 0/625 [00:00<?, ?it/s]

  0%|          | 0/157 [00:00<?, ?it/s]

Epoch [29/100], Train Loss: 0.8622, Train Accuracy: 0.6990 Valid Loss: 0.9270, Valid Accuracy: 0.6816


  0%|          | 0/625 [00:00<?, ?it/s]

  0%|          | 0/157 [00:00<?, ?it/s]

Epoch [30/100], Train Loss: 0.8446, Train Accuracy: 0.7038 Valid Loss: 0.9307, Valid Accuracy: 0.6885


  0%|          | 0/625 [00:00<?, ?it/s]

  0%|          | 0/157 [00:00<?, ?it/s]

Epoch [31/100], Train Loss: 0.8382, Train Accuracy: 0.7077 Valid Loss: 0.9160, Valid Accuracy: 0.6914


  0%|          | 0/625 [00:00<?, ?it/s]

  0%|          | 0/157 [00:00<?, ?it/s]

Epoch [32/100], Train Loss: 0.8160, Train Accuracy: 0.7174 Valid Loss: 0.9600, Valid Accuracy: 0.6958


  0%|          | 0/625 [00:00<?, ?it/s]

  0%|          | 0/157 [00:00<?, ?it/s]

Epoch [33/100], Train Loss: 0.7981, Train Accuracy: 0.7220 Valid Loss: 0.9195, Valid Accuracy: 0.7000


  0%|          | 0/625 [00:00<?, ?it/s]

  0%|          | 0/157 [00:00<?, ?it/s]

Epoch [34/100], Train Loss: 0.7825, Train Accuracy: 0.7271 Valid Loss: 0.8766, Valid Accuracy: 0.7050


  0%|          | 0/625 [00:00<?, ?it/s]

  0%|          | 0/157 [00:00<?, ?it/s]

Epoch [35/100], Train Loss: 0.7691, Train Accuracy: 0.7324 Valid Loss: 0.8562, Valid Accuracy: 0.7078


  0%|          | 0/625 [00:00<?, ?it/s]

  0%|          | 0/157 [00:00<?, ?it/s]

Epoch [36/100], Train Loss: 0.7495, Train Accuracy: 0.7395 Valid Loss: 0.8533, Valid Accuracy: 0.7141


  0%|          | 0/625 [00:00<?, ?it/s]

  0%|          | 0/157 [00:00<?, ?it/s]

Epoch [37/100], Train Loss: 0.7344, Train Accuracy: 0.7431 Valid Loss: 0.8644, Valid Accuracy: 0.7121


  0%|          | 0/625 [00:00<?, ?it/s]

  0%|          | 0/157 [00:00<?, ?it/s]

Epoch [38/100], Train Loss: 0.7216, Train Accuracy: 0.7498 Valid Loss: 0.8477, Valid Accuracy: 0.7183


  0%|          | 0/625 [00:00<?, ?it/s]

  0%|          | 0/157 [00:00<?, ?it/s]

Epoch [39/100], Train Loss: 0.7091, Train Accuracy: 0.7548 Valid Loss: 0.8353, Valid Accuracy: 0.7179


  0%|          | 0/625 [00:00<?, ?it/s]

  0%|          | 0/157 [00:00<?, ?it/s]

Epoch [40/100], Train Loss: 0.6987, Train Accuracy: 0.7585 Valid Loss: 0.8205, Valid Accuracy: 0.7225


  0%|          | 0/625 [00:00<?, ?it/s]

  0%|          | 0/157 [00:00<?, ?it/s]

Epoch [41/100], Train Loss: 0.6870, Train Accuracy: 0.7590 Valid Loss: 0.8086, Valid Accuracy: 0.7248


  0%|          | 0/625 [00:00<?, ?it/s]

  0%|          | 0/157 [00:00<?, ?it/s]

Epoch [42/100], Train Loss: 0.6719, Train Accuracy: 0.7665 Valid Loss: 0.7935, Valid Accuracy: 0.7273


  0%|          | 0/625 [00:00<?, ?it/s]

  0%|          | 0/157 [00:00<?, ?it/s]

Epoch [43/100], Train Loss: 0.6578, Train Accuracy: 0.7723 Valid Loss: 0.7929, Valid Accuracy: 0.7304


  0%|          | 0/625 [00:00<?, ?it/s]

  0%|          | 0/157 [00:00<?, ?it/s]

Epoch [44/100], Train Loss: 0.6512, Train Accuracy: 0.7729 Valid Loss: 0.8019, Valid Accuracy: 0.7304


  0%|          | 0/625 [00:00<?, ?it/s]

  0%|          | 0/157 [00:00<?, ?it/s]

Epoch [45/100], Train Loss: 0.6418, Train Accuracy: 0.7773 Valid Loss: 0.7727, Valid Accuracy: 0.7357


  0%|          | 0/625 [00:00<?, ?it/s]

  0%|          | 0/157 [00:00<?, ?it/s]

Epoch [46/100], Train Loss: 0.6268, Train Accuracy: 0.7806 Valid Loss: 0.7936, Valid Accuracy: 0.7372


  0%|          | 0/625 [00:00<?, ?it/s]

  0%|          | 0/157 [00:00<?, ?it/s]

Epoch [47/100], Train Loss: 0.6146, Train Accuracy: 0.7835 Valid Loss: 0.7600, Valid Accuracy: 0.7401


  0%|          | 0/625 [00:00<?, ?it/s]

  0%|          | 0/157 [00:00<?, ?it/s]

Epoch [48/100], Train Loss: 0.5998, Train Accuracy: 0.7898 Valid Loss: 0.7822, Valid Accuracy: 0.7343


  0%|          | 0/625 [00:00<?, ?it/s]

  0%|          | 0/157 [00:00<?, ?it/s]

Epoch [49/100], Train Loss: 0.5925, Train Accuracy: 0.7936 Valid Loss: 0.7571, Valid Accuracy: 0.7422


  0%|          | 0/625 [00:00<?, ?it/s]

  0%|          | 0/157 [00:00<?, ?it/s]

Epoch [50/100], Train Loss: 0.5848, Train Accuracy: 0.7972 Valid Loss: 0.7466, Valid Accuracy: 0.7448


  0%|          | 0/625 [00:00<?, ?it/s]

  0%|          | 0/157 [00:00<?, ?it/s]

Epoch [51/100], Train Loss: 0.5746, Train Accuracy: 0.7981 Valid Loss: 0.7498, Valid Accuracy: 0.7460


  0%|          | 0/625 [00:00<?, ?it/s]

  0%|          | 0/157 [00:00<?, ?it/s]

Epoch [52/100], Train Loss: 0.5600, Train Accuracy: 0.8032 Valid Loss: 0.7467, Valid Accuracy: 0.7502


  0%|          | 0/625 [00:00<?, ?it/s]

  0%|          | 0/157 [00:00<?, ?it/s]

Epoch [53/100], Train Loss: 0.5561, Train Accuracy: 0.8063 Valid Loss: 0.7458, Valid Accuracy: 0.7482


  0%|          | 0/625 [00:00<?, ?it/s]

  0%|          | 0/157 [00:00<?, ?it/s]

Epoch [54/100], Train Loss: 0.5438, Train Accuracy: 0.8090 Valid Loss: 0.7419, Valid Accuracy: 0.7479


  0%|          | 0/625 [00:00<?, ?it/s]

  0%|          | 0/157 [00:00<?, ?it/s]

Epoch [55/100], Train Loss: 0.5345, Train Accuracy: 0.8164 Valid Loss: 0.7446, Valid Accuracy: 0.7514


  0%|          | 0/625 [00:00<?, ?it/s]

  0%|          | 0/157 [00:00<?, ?it/s]

Epoch [56/100], Train Loss: 0.5299, Train Accuracy: 0.8163 Valid Loss: 0.7256, Valid Accuracy: 0.7568


  0%|          | 0/625 [00:00<?, ?it/s]

  0%|          | 0/157 [00:00<?, ?it/s]

Epoch [57/100], Train Loss: 0.5201, Train Accuracy: 0.8205 Valid Loss: 0.7408, Valid Accuracy: 0.7503


  0%|          | 0/625 [00:00<?, ?it/s]

  0%|          | 0/157 [00:00<?, ?it/s]

Epoch [58/100], Train Loss: 0.5075, Train Accuracy: 0.8247 Valid Loss: 0.7209, Valid Accuracy: 0.7531


  0%|          | 0/625 [00:00<?, ?it/s]

  0%|          | 0/157 [00:00<?, ?it/s]

Epoch [59/100], Train Loss: 0.4979, Train Accuracy: 0.8267 Valid Loss: 0.7227, Valid Accuracy: 0.7556


  0%|          | 0/625 [00:00<?, ?it/s]

  0%|          | 0/157 [00:00<?, ?it/s]

Epoch [60/100], Train Loss: 0.4882, Train Accuracy: 0.8310 Valid Loss: 0.7347, Valid Accuracy: 0.7545


  0%|          | 0/625 [00:00<?, ?it/s]

  0%|          | 0/157 [00:00<?, ?it/s]

Epoch [61/100], Train Loss: 0.4866, Train Accuracy: 0.8308 Valid Loss: 0.7104, Valid Accuracy: 0.7593


  0%|          | 0/625 [00:00<?, ?it/s]

  0%|          | 0/157 [00:00<?, ?it/s]

Epoch [62/100], Train Loss: 0.4748, Train Accuracy: 0.8352 Valid Loss: 0.7107, Valid Accuracy: 0.7590


  0%|          | 0/625 [00:00<?, ?it/s]

  0%|          | 0/157 [00:00<?, ?it/s]

Epoch [63/100], Train Loss: 0.4638, Train Accuracy: 0.8393 Valid Loss: 0.7230, Valid Accuracy: 0.7561


  0%|          | 0/625 [00:00<?, ?it/s]

  0%|          | 0/157 [00:00<?, ?it/s]

Epoch [64/100], Train Loss: 0.4591, Train Accuracy: 0.8404 Valid Loss: 0.7146, Valid Accuracy: 0.7577
Early stopping


In [None]:
model4.load_state_dict(torch.load("./model_exp4.pt")) # 모델 불러오기
model4 = model4.to(device)
model4.eval()
total_labels = []
total_preds = []
with torch.no_grad():
    for images, labels in tqdm(test_dataloader):
        images = images.to(device)
        labels = labels

        outputs = model4(images)
        # torch.max에서 dim 인자에 값을 추가할 경우, 해당 dimension에서 최댓값과 최댓값에 해당하는 인덱스를 반환
        _, predicted = torch.max(outputs.data, 1)

        total_preds.extend(predicted.detach().cpu().tolist())
        total_labels.extend(labels.tolist())

total_preds = np.array(total_preds)
total_labels = np.array(total_labels)
small_lr_acc = accuracy_score(total_labels, total_preds)
print("Small learning rate model accuracy : ",small_lr_acc) # learning rate를 작게 설정했을 때가 더욱 정확도가 높음

  0%|          | 0/157 [00:00<?, ?it/s]

Small learning rate model accuracy :  0.7581


## 2. Hugging Face를 활용한 pretrained model 사용법

```
💡 목차 개요 : Hugging Face의 pretrained model을 불러오고, 이를 통해 transfer learning을 하는 방법을 IMDB dataset을 통해 배워본다.
```

- 2-1. Hugging Face로 pretrained model 불러오기
- 2-2. Hugging Face을 활용한 전이 학습 실습

###  2-1. Hugging Face로 pretrained model 불러오기

> Hugging Face에 있는 여러 모델 중 원하는 모델을 불러오는 것을 실습합니다.



#### 📝 설명 : Hugging Face에 있는 pretrained model을 불러옵니다.
* Hugging Face 홈페이지에서 BERT 라는 모델을 불러옵니다.
    * BERT는 <a href='https://huggingface.co/docs/transformers/index'>transformers</a> 라이브러리 안에 있습니다

* pip 로 설치 후, library를 불러옵니다.

* <a href='https://huggingface.co/docs/transformers/model_doc/bert#transformers.BertForSequenceClassification'>BertForSequenceClassification</a> 모델 찾는 법
  * <a href='https://huggingface.co/docs/transformers/index'>transformers</a> 라이브러리에 들어가서 왼쪽에 Models -> Text Models -> 오른쪽 탭에서 모델을 찾을 수가 있습니다.

* pretrained model 찾는 법
  * 보통 구글링을 통해 찾으며, 공식 github 홈페이지를 통해 찾는다.
  * BertForSequenceClassification 모델의 pretrained 모델을 찾기 위해선 <a href='https://github.com/google-research/bert'>Bert 공식 github</a> 를 이용하면 된다.
  * 강의에서 사용한 bert-base-cased 모델은 <a href='https://huggingface.co/bert-base-cased'>hugging face 홈페이지</a>에 있으며, 출판되지 않은 책과 위키피디아의 정보가 담긴 <a href='https://yknzhu.wixsite.com/mbweb'>데이터셋(Bookcorpus)</a> 훈련되었습니다.

📚 참고할만한 자료:
* [Hugging Face Bert] : https://huggingface.co/docs/transformers/v4.30.0/en/model_doc/bert

In [None]:
# Hugging Face의 트랜스포머 모델을 설치
!pip install transformers==4.31.0 -q

[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m7.4/7.4 MB[0m [31m23.1 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m7.8/7.8 MB[0m [31m87.1 MB/s[0m eta [36m0:00:00[0m
[?25h

In [None]:
from transformers import BertForSequenceClassification
BertForSequenceClassification.from_pretrained("bert-base-cased") # BERT로 분류기를 사용하는 모델 불러오기.

Some weights of the model checkpoint at bert-base-cased were not used when initializing BertForSequenceClassification: ['cls.seq_relationship.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.seq_relationship.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.bias', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.weight']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-cased and are newly initi

BertForSequenceClassification(
  (bert): BertModel(
    (embeddings): BertEmbeddings(
      (word_embeddings): Embedding(28996, 768, padding_idx=0)
      (position_embeddings): Embedding(512, 768)
      (token_type_embeddings): Embedding(2, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (encoder): BertEncoder(
      (layer): ModuleList(
        (0-11): 12 x BertLayer(
          (attention): BertAttention(
            (self): BertSelfAttention(
              (query): Linear(in_features=768, out_features=768, bias=True)
              (key): Linear(in_features=768, out_features=768, bias=True)
              (value): Linear(in_features=768, out_features=768, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (output): BertSelfOutput(
              (dense): Linear(in_features=768, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12,

#### 📝 설명 : BERT의 input으로 어떤 형식이 들어가야할까?
* BertForSequenceClassification 모델을 사용하여 훈련할 때,

### 3-2. Hugging Face을 활용한 전이 학습 실습

> "IMDB" 데이터를 이용하여 BERT를 fine tuning 하여 감정 분류하는 것을 실습합니다.

#### 📝 설명 : IMDB 데이터
* 데이터셋 다운로드 : <a href='https://www.kaggle.com/datasets/lakshmi25npathi/imdb-dataset-of-50k-movie-reviews'>IMDB Dataset</a>
* 50K 개의 영화 관람후기 데이터로 해당 영화에 대한 감정 라벨 (긍/부정) 주석이 포함됩니다.
* 데이터로 다운로드 한 후, zip 파일을 풀어서 드라이브에 업로드 합니다.
* 데이터셋 원 출처 : https://ai.stanford.edu/~amaas/data/sentiment/
* License :
```
@InProceedings{maas-EtAl:2011:ACL-HLT2011,
  author    = {Maas, Andrew L.  and  Daly, Raymond E.  and  Pham, Peter T.  and  Huang, Dan  and  Ng, Andrew Y.  and  Potts, Christopher},
  title     = {Learning Word Vectors for Sentiment Analysis},
  booktitle = {Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies},
  month     = {June},
  year      = {2011},
  address   = {Portland, Oregon, USA},
  publisher = {Association for Computational Linguistics},
  pages     = {142--150},
  url       = {http://www.aclweb.org/anthology/P11-1015}
}
```

In [None]:
# data 불러오기
data = pd.read_csv('IMDB Dataset.csv')
print(data.shape)
data.head()

(50763, 2)


Unnamed: 0,review,sentiment
0,One of the other reviewers has mentioned that ...,positive
1,A wonderful little production. <br /><br />The...,positive
2,I thought this was a wonderful way to spend ti...,positive
3,Basically there's a family where a little boy ...,negative
4,"Petter Mattei's ""Love in the Time of Money"" is...",positive


In [None]:
dic = {'positive':0, 'negative':1} # positive 면 0으로, negative면 1로 변환
data['sentiment'] = data['sentiment'].map(dic)

In [None]:
# data 8:1:1 로 나누기
train, test = train_test_split(data, test_size = .2, random_state = 42)
val, test = train_test_split(test, test_size = .5, random_state = 42)

print("Train 개수: ", len(train))
print("Validation 개수: ", len(val))
print("Test 개수: ", len(test))

Train 개수:  40000
Validation 개수:  5000
Test 개수:  5000


In [None]:
train.reset_index(drop=True, inplace=True) # index 재정렬
val.reset_index(drop=True, inplace=True) # index 재정렬
test.reset_index(drop=True, inplace=True) # index 재정렬

#### 📝 설명 : BERT를 훈련시키기 위한 모델 전처리
<img src='https://img1.daumcdn.net/thumb/R1280x0/?scode=mtistory2&fname=https%3A%2F%2Fblog.kakaocdn.net%2Fdn%2FWFCfe%2FbtqBWZ40Gmc%2F6FkuwsAGN9e7Uudmi03k4k%2Fimg.png'>

* BERT는 input 으로 [CLS] 토큰과 [SEP] 토큰이 각각 문장의 앞,뒤에 들어가있는 형태여야 합니다.

📚 참고할만한 자료:
* [BERT 논문] : https://arxiv.org/abs/1810.04805

In [None]:
train['review'] = train['review'].apply(lambda x: f'[CLS] {x} [SEP]') # 문장의 앞뒤에 [CLS]와 [SEP] 삽입
val['review'] = val['review'].apply(lambda x: f'[CLS] {x} [SEP]') # 문장의 앞뒤에 [CLS]와 [SEP] 삽입
test['review'] = test['review'].apply(lambda x: f'[CLS] {x} [SEP]') # 문장의 앞뒤에 [CLS]와 [SEP] 삽입

train.head()

Unnamed: 0,review,sentiment
0,[CLS] That's what I kept asking myself during ...,1
1,[CLS] I did not watch the entire movie. I coul...,1
2,[CLS] A touching love story reminiscent of In...,0
3,[CLS] This latter-day Fulci schlocker is a tot...,1
4,"[CLS] First of all, I firmly believe that Norw...",1


In [None]:
# 각 문장들만 추출
train_sentences = train['review'].values
val_sentences = val['review'].values
test_sentences = test['review'].values

# 정답값 추출
train_label = train['sentiment'].values
val_label = val['sentiment'].values
test_label = test['sentiment'].values

#### 📝 설명 : HuggingFace Tokenizer
* HuggingFace에서 Tokenizer는 <a href='https://huggingface.co/transformers/v2.11.0/main_classes/tokenizer.html#transformers.PreTrainedTokenizer.convert_tokens_to_ids'>convert_tokens_to_ids</a> 함수를 통해 나눈 토큰을 id로 변환해줍니다.

* 왜 Bert의 tokenizer를 불러올까?
  * tokenizer는 텍스트를 정수 형태의 리스트로 변환한 것입니다. 이를 불러오므로써 기존에 학습된 어휘 사전을 사용할 수 있으므로, 모델이 이해하는 단어와 토큰을 일치시킬 수 있습니다.
  * 만약, Bert tokenizer를 쓰지 않으면, Bert 모델이 이미 기존에 만들어 놓은 단어사전과 매핑되지 않습니다.
    * 예를 들어, 나는 "사과" 라고 입력을 하였는데, Bert 모델이 이해하는 것은 "고양이" 라고 이해할 수 있습니다.

* <a href='https://huggingface.co/docs/transformers/model_doc/bert#transformers.BertTokenizer'>BertTokenizer</a>도 pretrained weight도 모델과 동일하게 github와 huggingface에서 찾을 수 있습니다.

📚 참고할만한 자료:
* [Hugging Face Bert] : https://huggingface.co/docs/transformers/v4.30.0/en/model_doc/bert
* [Bert Tokenizer] : https://huggingface.co/docs/transformers/model_doc/bert#transformers.BertTokenizer

In [None]:
# BERT의 tokenizer로 문장을 토큰으로 분리
from transformers import BertTokenizer
tokenizer = BertTokenizer.from_pretrained('bert-base-cased') # 기존에 학습된 BERT tokenizer 불러오기

In [None]:
tokenizer.tokenize(train_sentences[0])[:10] # BERT tokenizer 결과

['[CLS]', 'That', "'", 's', 'what', 'I', 'kept', 'asking', 'myself', 'during']

In [None]:
train_tokenized_texts = list(map(lambda x: tokenizer.tokenize(x), train_sentences))
val_tokenized_texts = list(map(lambda x: tokenizer.tokenize(x), val_sentences))
test_tokenized_texts = list(map(lambda x: tokenizer.tokenize(x), test_sentences))

In [None]:
# 입력 토큰의 최대 시퀀스 길이
MAX_LEN = 128

# 토큰을 숫자 인덱스로 변환
train_input_ids = list(map(lambda x: tokenizer.convert_tokens_to_ids(x), train_tokenized_texts)) # convert_tokens_to_ids로 정수 형태로 변환해주기
val_input_ids = list(map(lambda x: tokenizer.convert_tokens_to_ids(x), val_tokenized_texts))
test_input_ids = list(map(lambda x: tokenizer.convert_tokens_to_ids(x), test_tokenized_texts))

In [None]:
# 문장을 MAX_LEN 길이에 맞게 자르고, 모자란 부분을 패딩 0으로 채움
def zero_padding(id_list,max_len):
    return np.array([i[:max_len] if len(i) >= max_len else i + [0] * (max_len - len(i)) for i in id_list])

train_input_ids = zero_padding(train_input_ids, MAX_LEN)
val_input_ids = zero_padding(val_input_ids, MAX_LEN)
test_input_ids = zero_padding(test_input_ids, MAX_LEN)

In [None]:
train_input_ids[0]

array([  101,  1337,   112,   188,  1184,   146,  2023,  4107,  1991,
        1219,  1103,  1242,  9718,   117,  7406,  2697,   117,  8222,
        1158,  1105,  1704,  1336, 15391,  1115,  1679,  3263,  2193,
        1103,  5731,  1904,   119,  1109, 21329,  1145,  2484,  1146,
        1165,  1128,  1341,  1104,  1103,  1141,   118,  8611,  2650,
         117,  1150,  1138,  1177,  1376,  5415,  1115,  1122,  1110,
        9024,  4763,  1106,  1920,  1184,  5940,  1106,  1172,   119,
        1220,  1132,  1198,  6118,  1637,   172,  1183, 15940,  1116,
        1111,  1103,  1900,  1106,  7311,  1117,  4321, 19418,  8810,
        1113,   117,   170,  8366,  1115,  1144,  1151,  1694,  1277,
        1618,  1107,  1168, 18282,  1241,  1113,  1794,  1105,  1103,
        7678,   119,   133,  9304,   120,   135,   133,  9304,   120,
         135,   146,  1538, 20989,   117,   146,   112,   182,  1136,
        1541,  1141,  1111,  3205,  1916,  2213,  3853,  1219,   170,
        1273,   117]

#### 📝 설명 : Mask란?
* Mask란?
  * 입력 배열의 토큰 중 어떤 부분이 실제 입력이고, 어떤 부분이 패딩(0으로 채워진 부분)인지를 나타내는 이진 마스크입니다.
  * 패딩이면 0, 아니면 1로 구성됩니다.

📚 참고할만한 자료:
* [Transformer 논문] : https://arxiv.org/abs/1706.03762
* [BERT 논문] : https://arxiv.org/abs/1810.04805

In [None]:
# 마스크 만들기
train_masks = train_input_ids > 0 # 패딩이 아닌 부분은 0보다 큰 값이 있으므로 flag를 통해서 마스크를 구성할 수 있습니다.
val_masks = val_input_ids > 0
test_masks = test_input_ids > 0

train_masks[0]

array([ True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True]

#### 📝 설명 : Dataset 만들기
* train_input 과 label 그리고 mask까지 모두 반환하는 데이터셋 클래스를 만듭니다.
* 모두 tensor 형태로 변환하여 데이터셋 클래스로 구성합니다.

📚 참고할만한 자료:
* [TensorDataset]: https://pytorch.org/docs/stable/data.html#torch.utils.data.TensorDataset

In [None]:
# 모두 tensor로 변환
train_inputs = torch.tensor(train_input_ids) # train set의 input token id들
train_labels = torch.tensor(train_label) # train set의 label들
train_masks = torch.tensor(train_masks) # train set의 mask

validation_inputs = torch.tensor(val_input_ids) # valid set의 input token id들
validation_labels = torch.tensor(val_label) # valid set의 label들
validation_masks = torch.tensor(val_masks) # valid set의 mask

test_inputs = torch.tensor(test_input_ids) # test set의 input token id들
test_labels = torch.tensor(test_label) # test set의 label들
test_masks = torch.tensor(test_masks) # test set의 mask

In [None]:
class EmotionData(torch.utils.data.Dataset): # custom 데이터셋 구성
    def __init__(self, inputs, masks, labels):
        self.inputs = inputs
        self.masks = masks
        self.labels = labels

    def __len__(self):
        return len(self.inputs)

    def __getitem__(self,idx):
        inputs_value = self.inputs[idx]
        masks_value = self.masks[idx]
        labels_value = self.labels[idx]
        return inputs_value, masks_value, labels_value

In [None]:
train_dataset = EmotionData(train_inputs, train_masks, train_labels)
valid_dataset = EmotionData(validation_inputs, validation_masks, validation_labels)
test_dataset = EmotionData(test_inputs, test_masks, test_labels)

In [None]:
BATCH_SIZE = 32
train_dataloader = torch.utils.data.DataLoader(train_dataset, batch_size = BATCH_SIZE, shuffle = True, drop_last = False, num_workers = 8)
valid_dataloader = torch.utils.data.DataLoader(valid_dataset, batch_size = BATCH_SIZE, shuffle = False, drop_last = False, num_workers = 8)
test_dataloader = torch.utils.data.DataLoader(test_dataset, batch_size = BATCH_SIZE, shuffle = False, drop_last = False, num_workers = 8)

#### 📝 설명 : BERT 모델 불러오기
* Hugging Face의 transformer 라이브러리에 BertForSequenceClassification 모델을 불러옵니다.

📚 참고할만한 자료:
* [BertForSequenceClassification](https://huggingface.co/docs/transformers/v4.30.0/en/model_doc/bert#transformers.BertForSequenceClassification)

In [None]:
model = BertForSequenceClassification.from_pretrained("bert-base-cased").to(device) # pretrained bert 모델 불러오기

Some weights of the model checkpoint at bert-base-cased were not used when initializing BertForSequenceClassification: ['cls.seq_relationship.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.seq_relationship.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.bias', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.weight']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-cased and are newly initi

#### 📝 설명 : BERT classification model로 fine tuning 하기
*  BERT 모델을 fine tuning 하여 classification을 하는 모델을 구축합니다.
* BertForSequenceClassification 모델은 output으로 loss와 logits 등을 반환하며 loss는 CrossEntropy를 사용한 정답값과 예측값의 차이이고, logits은 해당 클래스일 확률을 의미합니다.
* mask 값을 넣어서 계산할 부분 (padding 처리 안한 부분 != 0) 과 안할 부분 (padding 처리 한 부분 == 0)을 미리 지정해주어서, 모델의 계산 속도를 더욱 빠르게 합니다.

📚 참고할만한 자료:
* [BertForSequenceClassification](https://huggingface.co/docs/transformers/v4.30.0/en/model_doc/bert#transformers.BertForSequenceClassification)

In [None]:
# training 코드, evaluation 코드, training_loop 코드
def training(model, dataloader, train_dataset, optimizer, device, epoch, num_epochs):
    model.train()  # 모델을 학습 모드로 설정
    train_loss = 0.0
    train_accuracy = 0

    tbar = tqdm(dataloader)
    for batch in tbar:
        input_ = batch[0].to(device)
        mask = batch[1].to(device)
        labels = batch[2].to(device)

        # 순전파
        output = model(input_,
                        attention_mask= mask,
                        labels=labels)

        loss = output['loss'] # 얘 확인

        # 역전파 및 가중치 업데이트
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        # 손실과 정확도 계산
        train_loss += loss.item()
        # torch.max에서 dim 인자에 값을 추가할 경우, 해당 dimension에서 최댓값과 최댓값에 해당하는 인덱스를 반환
        _, predicted = torch.max(output['logits'], 1)
        train_accuracy += (predicted == labels).sum().item()

        # tqdm의 진행바에 표시될 설명 텍스트를 설정
        tbar.set_description(f"Epoch [{epoch+1}/{num_epochs}], Train Loss: {loss.item():.4f}")

    # 에폭별 학습 결과 출력
    train_loss = train_loss / len(dataloader)
    train_accuracy = train_accuracy / len(train_dataset)

    return model, train_loss, train_accuracy

def evaluation(model, dataloader, val_dataset, device, epoch, num_epochs):
    model.eval()  # 모델을 평가 모드로 설정
    valid_accuracy = 0

    with torch.no_grad(): # model의 업데이트 막기
        tbar = tqdm(dataloader)
        for batch in tbar:
            input_ = batch[0].to(device)
            mask = batch[1].to(device)
            labels = batch[2].to(device)
            # 순전파
            output = model(input_,
                            attention_mask= mask,
                            labels=labels)

            # torch.max에서 dim 인자에 값을 추가할 경우, 해당 dimension에서 최댓값과 최댓값에 해당하는 인덱스를 반환
            _, predicted = torch.max(output['logits'], 1)
            valid_accuracy += (predicted == labels).sum().item()

            # tqdm의 진행바에 표시될 설명 텍스트를 설정
            tbar.set_description(f"Epoch [{epoch+1}/{num_epochs}]")

    valid_accuracy = valid_accuracy / len(val_dataset)

    return model, valid_accuracy


def training_loop(model, train_dataloader, valid_dataloader, train_dataset, val_dataset, optimizer, device, num_epochs, model_name):
    best_valid_loss = float('inf')  # 가장 좋은 validation loss를 저장
    valid_max_accuracy = -1

    for epoch in range(num_epochs):
        model, train_loss, train_accuracy = training(model, train_dataloader, train_dataset, optimizer, device, epoch, num_epochs)
        model, valid_accuracy = evaluation(model, valid_dataloader, val_dataset, device, epoch, num_epochs)

        if valid_accuracy > valid_max_accuracy:
            valid_max_accuracy = valid_accuracy
            torch.save(model.state_dict(), f"./model_{model_name}.pt")

        print(f"Epoch [{epoch + 1}/{num_epochs}], Train Loss: {train_loss:.4f}, Train Accuracy: {train_accuracy:.4f}, Valid Accuracy: {valid_accuracy:.4f}")

    return model, valid_max_accuracy

In [None]:
# 모델 전체 fine tuning
num_epochs = 2
model_name = 'bert1'
lr = 1e-5
optimizer = optim.Adam(model.parameters(), lr=lr)
model, valid_max_accuracy = training_loop(model, train_dataloader, valid_dataloader, train_dataset, valid_dataset, optimizer, device, num_epochs, model_name)
print('Valid max accuracy : ', valid_max_accuracy)

  0%|          | 0/1250 [00:00<?, ?it/s]

  0%|          | 0/157 [00:00<?, ?it/s]

Epoch [1/2], Train Loss: 0.3587, Train Accuracy: 0.8356, Valid Accuracy: 0.8744


  0%|          | 0/1250 [00:00<?, ?it/s]

  0%|          | 0/157 [00:00<?, ?it/s]

Epoch [2/2], Train Loss: 0.2380, Train Accuracy: 0.9036, Valid Accuracy: 0.8890
Valid max accuracy :  0.889


In [None]:
model.load_state_dict(torch.load("./model_bert1.pt")) # 모델 불러오기
model = model.to(device)
model.eval()
total_labels = []
total_preds = []
total_probs = []
with torch.no_grad():
    for batch in tqdm(test_dataloader):
        input_ = batch[0].to(device)
        mask = batch[1].to(device)
        labels = batch[2].to(device)
        output = model(input_,
                attention_mask= mask,
                labels=labels)


        # torch.max에서 dim 인자에 값을 추가할 경우, 해당 dimension에서 최댓값과 최댓값에 해당하는 인덱스를 반환
        _, predicted = torch.max(output['logits'], 1)

        total_preds.extend(predicted.detach().cpu().tolist())
        total_labels.extend(labels.tolist())
        total_probs.append(output['logits'].detach().cpu().numpy())

total_preds = np.array(total_preds)
total_labels = np.array(total_labels)
total_probs = np.concatenate(total_probs, axis= 0)
acc = accuracy_score(total_labels, total_preds)
print("Full fine tuning model accuracy : ",acc)

  0%|          | 0/157 [00:00<?, ?it/s]

Full fine tuning model accuracy :  0.8814


In [None]:
model =  BertForSequenceClassification.from_pretrained("bert-base-cased").to(device)
for para in model.parameters(): # 모든 layer freeze 하기
    para.requires_grad = False
for name, param in model.named_parameters(): # fc layer 만 학습하기
    if name in 'classifier.weight':
        param.requires_grad = True

num_epochs = 2
model_name = 'bert2'

optimizer = optim.Adam(model.parameters(), lr=lr)
model, valid_max_accuracy = training_loop(model, train_dataloader, valid_dataloader, train_dataset, valid_dataset, optimizer, device, num_epochs, model_name)
print('Valid max accuracy : ', valid_max_accuracy)

Some weights of the model checkpoint at bert-base-cased were not used when initializing BertForSequenceClassification: ['cls.seq_relationship.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.seq_relationship.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.bias', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.weight']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-cased and are newly initi

  0%|          | 0/1250 [00:00<?, ?it/s]

  0%|          | 0/157 [00:00<?, ?it/s]

Epoch [1/2], Train Loss: 0.6893, Train Accuracy: 0.5392, Valid Accuracy: 0.5762


  0%|          | 0/1250 [00:00<?, ?it/s]

  0%|          | 0/157 [00:00<?, ?it/s]

Epoch [2/2], Train Loss: 0.6839, Train Accuracy: 0.5563, Valid Accuracy: 0.5822
Valid max accuracy :  0.5822


In [None]:
model.load_state_dict(torch.load("./model_bert2.pt")) # 모델 불러오기
model = model.to(device)
model.eval()
total_labels = []
total_preds = []
total_probs = []
with torch.no_grad():
    for batch in tqdm(test_dataloader):
        input_ = batch[0].to(device)
        mask = batch[1].to(device)
        labels = batch[2].to(device)
        output = model(input_,
                attention_mask= mask,
                labels=labels)


        # torch.max에서 dim 인자에 값을 추가할 경우, 해당 dimension에서 최댓값과 최댓값에 해당하는 인덱스를 반환
        _, predicted = torch.max(output['logits'], 1)

        total_preds.extend(predicted.detach().cpu().tolist())
        total_labels.extend(labels.tolist())
        total_probs.append(output['logits'].detach().cpu().numpy())

total_preds = np.array(total_preds)
total_labels = np.array(total_labels)
total_probs = np.concatenate(total_probs, axis= 0)
acc = accuracy_score(total_labels, total_preds)
print("Only FC layer tunning model accuracy : ", acc)

  0%|          | 0/157 [00:00<?, ?it/s]

Only FC layer tunning model accuracy :  0.5968


In [None]:
# learning rate 를 기존보다 높게 설정
model =  BertForSequenceClassification.from_pretrained("bert-base-cased").to(device)
num_epochs = 2
model_name = 'bert3'
lr = 1e-4
optimizer = optim.Adam(model.parameters(), lr=lr)
model, valid_max_accuracy = training_loop(model, train_dataloader, valid_dataloader, train_dataset, valid_dataset, optimizer, device, num_epochs, model_name)
print('Valid max accuracy : ', valid_max_accuracy)

Some weights of the model checkpoint at bert-base-cased were not used when initializing BertForSequenceClassification: ['cls.seq_relationship.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.seq_relationship.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.bias', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.weight']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-cased and are newly initi

  0%|          | 0/1250 [00:00<?, ?it/s]

  0%|          | 0/157 [00:00<?, ?it/s]

Epoch [1/2], Train Loss: 0.7043, Train Accuracy: 0.4998, Valid Accuracy: 0.4998


  0%|          | 0/1250 [00:00<?, ?it/s]

  0%|          | 0/157 [00:00<?, ?it/s]

Epoch [2/2], Train Loss: 0.7038, Train Accuracy: 0.4979, Valid Accuracy: 0.5002
Valid max accuracy :  0.5002


In [None]:
model.load_state_dict(torch.load("./model_bert3.pt")) # 모델 불러오기
model = model.to(device)
model.eval()
total_labels = []
total_preds = []
total_probs = []
with torch.no_grad():
    for batch in tqdm(test_dataloader):
        input_ = batch[0].to(device)
        mask = batch[1].to(device)
        labels = batch[2].to(device)
        output = model(input_,
                attention_mask= mask,
                labels=labels)


        # torch.max에서 dim 인자에 값을 추가할 경우, 해당 dimension에서 최댓값과 최댓값에 해당하는 인덱스를 반환
        _, predicted = torch.max(output['logits'], 1)

        total_preds.extend(predicted.detach().cpu().tolist())
        total_labels.extend(labels.tolist())
        total_probs.append(output['logits'].detach().cpu().numpy())

total_preds = np.array(total_preds)
total_labels = np.array(total_labels)
total_probs = np.concatenate(total_probs, axis= 0)
acc = accuracy_score(total_labels, total_preds)
print("Larger learning rate accuracy : ", acc)

  0%|          | 0/157 [00:00<?, ?it/s]

Only FC layer tunning model accuracy :  0.5076


#Reference
> <b><font color = green>(📒가이드)
- <a href='https://towardsdatascience.com/getting-started-with-pytorch-image-models-timm-a-practitioners-guide-4e77b4bf9055'>timm guide blog</a>

## Required Package

> torch == 2.0.1

> transformers==4.31.0

> timm==0.9.2


## 콘텐츠 라이선스

저작권 : <font color='blue'> <b> ©2023 by Upstage X fastcampus Co., Ltd. All rights reserved.</font></b>

<font color='red'><b>WARNING</font> : 본 교육 콘텐츠의 지식재산권은 업스테이지 및 패스트캠퍼스에 귀속됩니다. 본 콘텐츠를 어떠한 경로로든 외부로 유출 및 수정하는 행위를 엄격히 금합니다. </b>