<a href="https://colab.research.google.com/github/mystlee/2024_CSU_AI/blob/main/chapter5/transfer_learning_example.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## 1. Install the required libraries

First, install the required libraries:
- torch: the PyTorch library
- torchvision: PyTorch's companion library for image processing

In [None]:
!pip install torch torchvision



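## 2. Prepare the dataset   
We train the model on the CIFAR-10 dataset.   
Define the preprocessing `transforms`, then load the dataset.   
- Here, a transform is a data transformation, not the Transformer architecture.
- The transforms provided by torchvision can also be used for data augmentation.

For the available augmentation techniques, see the page below.   
https://pytorch.org/vision/0.9/transforms.html   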
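
To make the normalization step concrete, the sketch below reproduces in plain Python what `transforms.Normalize` computes for a single pixel: `(value - mean) / std`, channel by channel. The helper `normalize_pixel` and the sample pixels are illustrative assumptions, not part of torchvision.

```python
# ImageNet channel statistics, the same values passed to transforms.Normalize
MEAN = [0.485, 0.456, 0.406]
STD = [0.229, 0.224, 0.225]

def normalize_pixel(rgb):
    """Apply (value - mean) / std per channel to one RGB pixel in [0, 1]."""
    return [(v - m) / s for v, m, s in zip(rgb, MEAN, STD)]

# Hypothetical pixels, already scaled to [0, 1] as ToTensor would produce
mean_pixel = normalize_pixel([0.485, 0.456, 0.406])  # maps to zero in every channel
white_pixel = normalize_pixel([1.0, 1.0, 1.0])       # maps to positive values
print(mean_pixel)
print(white_pixel)
```

A pixel equal to the channel means lands at zero, so the normalized inputs are roughly centered, which is what the ImageNet-pretrained weights expect.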



In [None]:
import torch
from torchvision import datasets, transforms

# Data preprocessing
transform_train = transforms.Compose([
    transforms.Resize((224, 224)),  # Resize to the input size ResNet expects
    transforms.RandomHorizontalFlip(),  # Data augmentation: random horizontal flip
    transforms.ToTensor(),  # Convert to a PyTorch tensor scaled to [0, 1]
    transforms.Normalize(mean = [0.485, 0.456, 0.406],
                         std = [0.229, 0.224, 0.225]),  # Normalize with the ImageNet channel statistics
])

transform_val = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean = [0.485, 0.456, 0.406],
                         std = [0.229, 0.224, 0.225]),
])

# Load the training and validation datasets (the CIFAR-10 test split serves as validation data)
train_dataset = datasets.CIFAR10(root = './data', train = True, download = True, transform = transform_train)
val_dataset = datasets.CIFAR10(root = './data', train = False, download = True, transform = transform_val)


Downloading https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz to ./data/cifar-10-python.tar.gz


100%|██████████| 170M/170M [00:13<00:00, 13.1MB/s]


Extracting ./data/cifar-10-python.tar.gz to ./data
Files already downloaded and verified


## 3. Prepare the data loaders   
Create the training and validation data loaders, setting the batch size and whether to shuffle.
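
As a rough check on what the batch size implies, the sketch below computes how many batches one pass over each split yields. It assumes the standard CIFAR-10 split sizes (50,000 training and 10,000 test images) and the `DataLoader` default of keeping the final partial batch; `batches_per_epoch` is a hypothetical helper.

```python
import math

def batches_per_epoch(num_samples, batch_size, drop_last=False):
    """Number of batches yielded in one pass over a dataset."""
    if drop_last:
        return num_samples // batch_size
    return math.ceil(num_samples / batch_size)

train_steps = batches_per_epoch(50_000, 32)   # CIFAR-10 training split
val_steps = batches_per_epoch(10_000, 32)     # CIFAR-10 test split
last_batch = 50_000 - (train_steps - 1) * 32  # size of the final, partial batch
print(train_steps, val_steps, last_batch)     # 1563 313 16
```

With a batch size of 32, the last training batch holds only 16 images, which is why the loss is later accumulated per sample rather than per batch.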

In [None]:
from torch.utils.data import DataLoader

batch_size = 32

train_loader = DataLoader(train_dataset, batch_size = batch_size, shuffle = True, num_workers = 4)
val_loader = DataLoader(val_dataset, batch_size = batch_size, shuffle = False, num_workers = 4)


## 4. Load and modify the model   
Load torchvision's pretrained ResNet-50 model.   
Replace the final fully connected layer (`fc`) so that its output size matches the number of CIFAR-10 classes.   
https://pytorch.org/vision/stable/models.html#general-information-on-pre-trained-weights
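
To give a sense of how small the replaced head is, the sketch below counts the parameters of the original and the new fully connected layer. It assumes `model.fc.in_features` is 2048, as it is for torchvision's ResNet-50, and that the pretrained head has 1000 ImageNet outputs; `linear_param_count` is a hypothetical helper.

```python
def linear_param_count(in_features, out_features, bias=True):
    """Weights plus optional biases of a fully connected layer."""
    return in_features * out_features + (out_features if bias else 0)

num_ftrs = 2048  # assumed value of model.fc.in_features for ResNet-50
imagenet_head = linear_param_count(num_ftrs, 1000)  # original 1000-class head
cifar10_head = linear_param_count(num_ftrs, 10)     # replacement 10-class head
print(imagenet_head, cifar10_head)                  # 2049000 20490
```

Only the new head starts from random initialization. Every other layer keeps its pretrained weights.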

In [None]:
import torchvision.models as models
import torch.nn as nn

# model = models.resnet50(pretrained = True)
model = torch.hub.load("pytorch/vision", "resnet50", weights = "IMAGENET1K_V2")
# model = models.resnet50(pretrained = False)

# Replace the final fully connected layer
num_ftrs = model.fc.in_features
num_classes = 10  # Number of classes in CIFAR-10
model.fc = nn.Linear(num_ftrs, num_classes)

# Move the model to the GPU if one is available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)

Downloading: "https://github.com/pytorch/vision/zipball/main" to /root/.cache/torch/hub/main.zip
Downloading: "https://download.pytorch.org/models/resnet50-11ad3fa6.pth" to /root/.cache/torch/hub/checkpoints/resnet50-11ad3fa6.pth
100%|██████████| 97.8M/97.8M [00:00<00:00, 227MB/s]


## 5. Set up the loss function and optimizer   
Use cross-entropy loss (`CrossEntropyLoss`) as the loss function.   
Use Adam as the optimizer. When fine-tuning, the learning rate is usually set low.
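
For intuition about the loss values printed during training, here is a minimal plain-Python sketch of cross-entropy for a single sample: the negative log of the softmax probability assigned to the true class. The function `cross_entropy` and the sample logits are illustrative assumptions; `nn.CrossEntropyLoss` applies the same formula to raw logits and, by default, averages over the batch.

```python
import math

def cross_entropy(logits, label):
    """Negative log-softmax probability of the true class for one sample."""
    max_logit = max(logits)  # subtract the max for numerical stability
    log_sum_exp = max_logit + math.log(sum(math.exp(z - max_logit) for z in logits))
    return log_sum_exp - logits[label]

uniform = cross_entropy([0.0] * 10, label=3)         # all 10 classes equally likely
confident = cross_entropy([5.0, 0.0, 0.0], label=0)  # true class strongly favored
print(uniform, confident)
```

With 10 equally likely classes the loss is `log(10)`, about 2.30, so a training loss well below that value indicates the model is doing better than chance.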

In [None]:
import torch.optim as optim

criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr = 1e-4)

## 6. Define the training and validation loops   
Define functions for training and validation.   
Each epoch trains the model and then evaluates it on the validation data.
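
The loops accumulate `loss.item() * inputs.size(0)` and divide by the number of samples at the end. The sketch below, using made-up batch losses, shows why: when the final batch is smaller, averaging the per-batch means directly gives a different (biased) number. `epoch_average` is a hypothetical helper.

```python
def epoch_average(batch_losses, batch_sizes):
    """Per-sample average loss computed from per-batch mean losses."""
    running_loss = sum(loss * n for loss, n in zip(batch_losses, batch_sizes))
    return running_loss / sum(batch_sizes)

batch_losses = [1.0, 1.0, 4.0]  # hypothetical per-batch mean losses
batch_sizes = [32, 32, 16]      # the final batch is smaller
weighted = epoch_average(batch_losses, batch_sizes)
naive = sum(batch_losses) / len(batch_losses)
print(weighted, naive)
```

Here the sample-weighted average is 1.6, while the naive mean of batch means is 2.0, because the naive version gives the 16-sample batch the same weight as the 32-sample ones.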

In [None]:
def train_one_epoch(model, dataloader, criterion, optimizer, device, log_interval=1000):
    model.train()
    running_loss = 0.0
    correct = 0
    total = 0
    step = 0  # Step count within the current epoch

    for inputs, labels in dataloader:
        step += 1
        inputs = inputs.to(device)
        labels = labels.to(device)

        optimizer.zero_grad()

        outputs = model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        running_loss += loss.item() * inputs.size(0)

        _, preds = torch.max(outputs, 1)
        correct += (preds == labels).sum().item()
        total += labels.size(0)

        # The global step count is tracked by the caller
        if (step % log_interval == 0):
            current_loss = running_loss / total
            current_acc = correct / total
            print(f'Step {step}: Train Loss: {current_loss:.4f} | Train Acc: {current_acc:.4f}')

    epoch_loss = running_loss / total
    epoch_acc = correct / total
    return epoch_loss, epoch_acc

def evaluate(model, dataloader, criterion, device):
    model.eval()
    running_loss = 0.0
    correct = 0
    total = 0

    with torch.no_grad():
        for inputs, labels in dataloader:
            inputs = inputs.to(device)
            labels = labels.to(device)

            outputs = model(inputs)
            loss = criterion(outputs, labels)

            running_loss += loss.item() * inputs.size(0)

            _, preds = torch.max(outputs, 1)
            correct += (preds == labels).sum().item()
            total += labels.size(0)

    epoch_loss = running_loss / total
    epoch_acc = correct / total
    return epoch_loss, epoch_acc

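## 7. Train the model
Train the model for several epochs. Validation uses the `evaluate` function defined above, while the training step is written inline so that logging can use a global step counter.   
After each epoch, print the training loss and accuracy along with the validation loss and accuracy.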
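
The early-stopping rule used in the training cell can be sketched separately from the model. The version below replays the same logic (reset a counter on improvement, stop once it reaches `patience`) over a made-up list of validation accuracies; `run_early_stopping` and `history` are illustrative assumptions.

```python
def run_early_stopping(val_accs, patience):
    """Return (best_epoch, best_acc, stopped_epoch); stopped_epoch is None if never triggered."""
    best_acc, best_epoch, trigger_times = 0.0, None, 0
    for epoch, acc in enumerate(val_accs, 1):
        if acc > best_acc:
            # Improvement: remember this epoch and reset the counter
            best_acc, best_epoch, trigger_times = acc, epoch, 0
        else:
            trigger_times += 1
            if trigger_times >= patience:
                return best_epoch, best_acc, epoch
    return best_epoch, best_acc, None

history = [0.90, 0.95, 0.94, 0.95, 0.93]  # hypothetical validation accuracies
print(run_early_stopping(history, patience=3))  # (2, 0.95, 5)
print(run_early_stopping(history, patience=5))  # (2, 0.95, None)
```

Note that a tie (epoch 4 matching 0.95) does not count as an improvement, because the comparison is strict. In the training cell, `best_model_wts` holds the weights from the best epoch, and restoring them with `model.load_state_dict(best_model_wts)` would be a separate step after the loop.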

In [None]:
import copy

num_epochs = 10
log_interval = 100  # Log every 100 steps

best_val_acc = 0.0
best_model_wts = copy.deepcopy(model.state_dict())
patience = 5
trigger_times = 0

# Counter that tracks the total number of steps across epochs
global_step = 0

for epoch in range(num_epochs):
    print(f'Epoch {epoch + 1}/{num_epochs}')
    print('-' * 30)

    model.train()
    running_loss = 0.0
    correct = 0
    total = 0

    for step, (inputs, labels) in enumerate(train_loader, 1):
        global_step += 1
        inputs = inputs.to(device)
        labels = labels.to(device)

        optimizer.zero_grad()

        outputs = model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        running_loss += loss.item() * inputs.size(0)

        _, preds = torch.max(outputs, 1)
        correct += (preds == labels).sum().item()
        total += labels.size(0)

        if global_step % log_interval == 0:
            current_loss = running_loss / total
            current_acc = correct / total
            print(f'Step {global_step}: Train Loss: {current_loss:.4f} | Train Acc: {current_acc:.4f}')

    # After the epoch ends, compute the loss and accuracy over the whole epoch
    epoch_loss = running_loss / len(train_loader.dataset)
    epoch_acc = correct / len(train_loader.dataset)

    # Validation phase
    val_loss, val_acc = evaluate(model, val_loader, criterion, device)

    print(f'Epoch {epoch + 1} Summary')
    print(f'Train Loss: {epoch_loss:.4f} | Train Acc: {epoch_acc:.4f}')
    print(f'Val Loss: {val_loss:.4f} | Val Acc: {val_acc:.4f}')
    print('-' * 30)

    # Early stopping and best-model checkpointing
    if val_acc > best_val_acc:
        best_val_acc = val_acc
        best_model_wts = copy.deepcopy(model.state_dict())
        trigger_times = 0
    else:
        trigger_times += 1
        if trigger_times >= patience:
            print('Early stopping!')
            break


Epoch 1/10
------------------------------
Step 100: Train Loss: 1.3369 | Train Acc: 0.6162
Step 200: Train Loss: 0.9005 | Train Acc: 0.7356
Step 300: Train Loss: 0.7239 | Train Acc: 0.7843
Step 400: Train Loss: 0.6254 | Train Acc: 0.8113
Step 500: Train Loss: 0.5572 | Train Acc: 0.8310
Step 600: Train Loss: 0.5109 | Train Acc: 0.8438
Step 700: Train Loss: 0.4752 | Train Acc: 0.8538
Step 800: Train Loss: 0.4455 | Train Acc: 0.8621
Step 900: Train Loss: 0.4227 | Train Acc: 0.8679
Step 1000: Train Loss: 0.4032 | Train Acc: 0.8738
Step 1100: Train Loss: 0.3861 | Train Acc: 0.8784
Step 1200: Train Loss: 0.3704 | Train Acc: 0.8833
Step 1300: Train Loss: 0.3571 | Train Acc: 0.8872
Step 1400: Train Loss: 0.3457 | Train Acc: 0.8906
Step 1500: Train Loss: 0.3359 | Train Acc: 0.8936
Epoch 1 Summary
Train Loss: 0.3305 | Train Acc: 0.8953
Val Loss: 0.1360 | Val Acc: 0.9544
------------------------------
Epoch 2/10
------------------------------
Step 1600: Train Loss: 0.1269 | Train Acc: 0.9603
Step