<a href="https://colab.research.google.com/github/oilportrait/test_colab/blob/main/resnetPractice.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [2]:
!mkdir -p ~/.kaggle
!cp kaggle.json ~/.kaggle
!chmod 600 ~/.kaggle/kaggle.json

In [3]:
!kaggle datasets download -d mikoajfish99/carrots-vs-rockets-image-classification

Downloading carrots-vs-rockets-image-classification.zip to /content
100% 90.0M/90.2M [00:02<00:00, 43.1MB/s]
100% 90.2M/90.2M [00:02<00:00, 34.1MB/s]


In [None]:
!mkdir sample
!unzip carrots-vs-rockets-image-classification.zip -d ./sample/

In [None]:
! pip install transformers datasets

Define how the image data will be preprocessed.

In [6]:
import torchvision.transforms as transforms
from torchvision.datasets import ImageFolder
from torch.utils.data import DataLoader, random_split

transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    # ResNet was trained on ImageNet, so standardize with the ImageNet per-channel mean and standard deviation.
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
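
To make the normalization step concrete, here is a minimal pure-Python sketch of the per-channel arithmetic that `transforms.Normalize` applies, `(x - mean) / std`, to a single RGB pixel that `ToTensor` has already scaled to [0, 1]. The helper name `normalize_pixel` is hypothetical and exists only for illustration; the notebook itself relies on torchvision.

```python
# Per-channel ImageNet statistics, copied from the transform above.
IMAGENET_MEAN = [0.485, 0.456, 0.406]
IMAGENET_STD = [0.229, 0.224, 0.225]

def normalize_pixel(rgb):
    """Apply (x - mean) / std to one RGB pixel with values in [0, 1]."""
    return [(x - m) / s for x, m, s in zip(rgb, IMAGENET_MEAN, IMAGENET_STD)]

# A pixel equal to the channel means maps to zero in every channel.
print(normalize_pixel([0.485, 0.456, 0.406]))
# A pure white pixel maps to positive values between about 2.2 and 2.7.
print(normalize_pixel([1.0, 1.0, 1.0]))
```

After this step the inputs have roughly the same scale the pre-trained weights saw during ImageNet training.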

In [7]:
# Load the image data, applying the preprocessing defined above.
data = ImageFolder(root='./sample/Images/', transform=transform)

Split the data into train, validation, and test sets.

In [8]:
trainProportion = 0.7
valProportion = 0.2

totalSize = len(data)
trainSize = int(trainProportion * totalSize)
valSize = int(valProportion * totalSize)
testSize = totalSize - trainSize - valSize

trainData, valData, testData = random_split(data, [trainSize, valSize, testSize])
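
The size arithmetic above can be sketched on its own. Because `int()` truncates, the train and validation sizes are rounded down and the test set absorbs the remainder, so the three sizes always add up to the dataset length, which `random_split` requires when given integer lengths. The helper name `split_sizes` and the total of 306 images are hypothetical, chosen only for illustration.

```python
def split_sizes(total, train_prop=0.7, val_prop=0.2):
    """Return (train, val, test) sizes; the test set takes whatever is left over."""
    train = int(train_prop * total)  # truncated toward zero
    val = int(val_prop * total)      # truncated toward zero
    test = total - train - val       # remainder, so the sizes sum to total
    return train, val, test

# 306 is a hypothetical dataset size, not a value stated in the notebook.
train, val, test = split_sizes(306)
print(train, val, test)
```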

Define how each dataset will be loaded (batch size and shuffling).

In [9]:
trainLoader = DataLoader(trainData, batch_size=32, shuffle=True)
valLoader = DataLoader(valData, batch_size=32, shuffle=False)
testLoader = DataLoader(testData, batch_size=32, shuffle=False)
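
As a rough illustration of what `batch_size=32` implies, the sketch below computes how many batches a loader yields per epoch and how large the final batch is, assuming the `DataLoader` default of keeping the last, smaller batch (`drop_last=False`). The helper names and the sample count of 214 are hypothetical.

```python
import math

def batches_per_epoch(num_samples, batch_size=32):
    """Number of batches when the last partial batch is kept."""
    return math.ceil(num_samples / batch_size)

def last_batch_size(num_samples, batch_size=32):
    """Size of the final batch; equals batch_size when the split is even."""
    remainder = num_samples % batch_size
    return remainder if remainder else batch_size

# 214 is a hypothetical number of training samples.
print(batches_per_epoch(214), last_batch_size(214))
```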

Load the pre-trained ResNet50 model.

In [10]:
import torch.nn as nn
import torchvision.models as models
model = models.resnet50(pretrained=True)
# Replace the 1000-class ImageNet head with a new 2-class layer (randomly initialized).
model.fc = nn.Linear(model.fc.in_features, 2)

Downloading: "https://download.pytorch.org/models/resnet50-0676ba61.pth" to /root/.cache/torch/hub/checkpoints/resnet50-0676ba61.pth
100%|██████████| 97.8M/97.8M [00:00<00:00, 297MB/s]


Case 1. Test the loaded model directly, without fine-tuning.

In [12]:
import torch
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")  # Use the GPU if one is available
model.to(device)
model.eval()  # Set evaluation mode so layers such as batch normalization behave consistently
correct_predictions = 0  # Running count of correct predictions

with torch.no_grad():  # Gradients are not needed during evaluation
    for inputs, labels in testLoader:
        inputs, labels = inputs.to(device), labels.to(device)

        outputs = model(inputs)  # Forward pass: get the class scores for this batch
        _, preds = torch.max(outputs, 1)  # The index of the highest score is the predicted class
        correct_predictions += torch.sum(preds == labels.data)  # Add the number of correct predictions in this batch

test_accuracy = correct_predictions.double() / len(testData)  # Fraction of test images classified correctly
print(f"Test Accuracy: {test_accuracy}")


Test Accuracy: 0.3870967741935484


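Without fine-tuning, the model got fewer than half of the test images right (accuracy of about 0.387). This is expected: the backbone is pre-trained, but the new 2-class `fc` layer is still randomly initialized.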
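
The accuracy computation used in the evaluation loop can be sketched without PyTorch: take the index of the largest score in each row (what `torch.max(outputs, 1)` returns as its second value) and count how often it matches the label. The helper names and the toy scores below are hypothetical.

```python
def argmax(values):
    """Index of the largest value in a list."""
    return max(range(len(values)), key=lambda i: values[i])

def accuracy(logits, labels):
    """Fraction of rows whose highest-scoring class equals the label."""
    preds = [argmax(row) for row in logits]
    correct = sum(int(p == y) for p, y in zip(preds, labels))
    return correct / len(labels)

# Toy two-class scores for four images (hypothetical values).
logits = [[2.0, -1.0], [0.1, 0.3], [1.5, 1.4], [-0.2, 0.9]]
labels = [0, 1, 1, 1]
print(accuracy(logits, labels))  # 3 of 4 predictions match, so 0.75
```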

Case 2. Fine-tune the loaded model.

Define the loss function and optimizer used for fine-tuning.

In [13]:
import torch.optim as optim
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)
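
For intuition about the loss, here is a minimal pure-Python sketch of the per-sample quantity that cross-entropy computes from raw scores: the negative log of the softmax probability assigned to the true class. With its default settings, `nn.CrossEntropyLoss` averages this value over the batch. The helper name `cross_entropy` is hypothetical.

```python
import math

def cross_entropy(logits, label):
    """Negative log-softmax of the true class, computed in a numerically stable way."""
    m = max(logits)  # subtract the max before exponentiating to avoid overflow
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]

# Equal scores for two classes give ln(2), about 0.693: the loss of a coin flip.
print(cross_entropy([0.0, 0.0], 0))
# A confident, correct prediction gives a loss close to zero.
print(cross_entropy([5.0, -5.0], 0))
# A confident, wrong prediction is penalized heavily.
print(cross_entropy([5.0, -5.0], 1))
```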

Train the model and run validation after each epoch.

In [18]:
import torch
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
# Set the number of epochs and the variables used for early stopping.
num_epochs = 20
best_val_accuracy = 0
patience_counter = 0
max_patience = 5

for epoch in range(num_epochs):
    model.train()
    for inputs, labels in trainLoader:
        inputs, labels = inputs.to(device), labels.to(device)
        optimizer.zero_grad()  # Reset gradients so they do not accumulate across batches
        outputs = model(inputs)
        loss = criterion(outputs, labels)  # Compute the loss between the predictions and the labels
        loss.backward()  # Compute gradients via backpropagation
        optimizer.step()  # Update the model parameters

    model.eval()
    total_val_loss = 0
    correct_val_predictions = 0
    with torch.no_grad():
        for inputs, labels in valLoader:
            inputs, labels = inputs.to(device), labels.to(device)
            outputs = model(inputs)
            loss = criterion(outputs, labels)
            total_val_loss += loss.item()
            _, preds = torch.max(outputs, 1)
            correct_val_predictions += torch.sum(preds == labels.data)
    val_accuracy = correct_val_predictions.double() / valSize
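    # Note: `loss` here is the loss of the last validation batch (it was reassigned in the loop above),
    # and total_val_loss is the sum of the per-batch losses, not their mean.
    print(f"Epoch {epoch+1}/{num_epochs}, Loss: {loss.item()}, Validation Loss: {total_val_loss}, Validation Accuracy: {val_accuracy}")

    # Early stopping: stop once validation accuracy has not improved for max_patience consecutive epochs.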
    if val_accuracy > best_val_accuracy:
        best_val_accuracy = val_accuracy
        patience_counter = 0
    else:
        patience_counter += 1
        if patience_counter >= max_patience:
            print("Memorizing the data instead of learning? Stopping.")
            break


Epoch 1/20, Loss: 0.1419064849615097, Validation Loss: 0.4750799685716629, Validation Accuracy: 0.9180327868852459
Epoch 2/20, Loss: 0.6434555053710938, Validation Loss: 1.3048861026763916, Validation Accuracy: 0.8688524590163935
Epoch 3/20, Loss: 0.11024395376443863, Validation Loss: 0.4260857477784157, Validation Accuracy: 0.9344262295081968
Epoch 4/20, Loss: 0.15463115274906158, Validation Loss: 0.41519223153591156, Validation Accuracy: 0.9508196721311476
Epoch 5/20, Loss: 0.6555197238922119, Validation Loss: 5.148508310317993, Validation Accuracy: 0.819672131147541
Epoch 6/20, Loss: 0.34422945976257324, Validation Loss: 2.7084851264953613, Validation Accuracy: 0.8032786885245902
Epoch 7/20, Loss: 0.11238492280244827, Validation Loss: 0.9186012223362923, Validation Accuracy: 0.8524590163934427
Epoch 8/20, Loss: 0.010279589332640171, Validation Loss: 0.13916368130594492, Validation Accuracy: 0.9672131147540984
Epoch 9/20, Loss: 0.10661259293556213, Validation Loss: 0.1906706616282463
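
The early-stopping rule in the loop above can be isolated as a small pure-Python sketch: the counter resets whenever validation accuracy sets a new best and training stops once it reaches `max_patience`. The function name `early_stop_epoch` and the `stalled` sequence are hypothetical; the `logged` values are rounded from the first eight epochs of the log above.

```python
def early_stop_epoch(val_accuracies, max_patience=5):
    """Return the 1-based epoch at which training would stop, or None if it never stops."""
    best = 0.0
    patience = 0
    for epoch, acc in enumerate(val_accuracies, start=1):
        if acc > best:
            best = acc      # new best: reset the counter
            patience = 0
        else:
            patience += 1   # no improvement this epoch
            if patience >= max_patience:
                return epoch
    return None

# Validation accuracies rounded from epochs 1 to 8 of the log above.
logged = [0.918, 0.869, 0.934, 0.951, 0.820, 0.803, 0.852, 0.967]
print(early_stop_epoch(logged))   # None: a new best arrives before patience runs out

# Hypothetical run that stalls after the first epoch.
stalled = [0.90, 0.80, 0.80, 0.80, 0.80, 0.80, 0.95]
print(early_stop_epoch(stalled))  # 6: five epochs in a row without improvement
```

Note that this rule only stops training; the model keeps the weights from the final epoch rather than those from the best validation epoch.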

Evaluate the fine-tuned model.

In [19]:
model.eval()
total_test_loss = 0
correct_test_predictions = 0
with torch.no_grad():
    for inputs, labels in testLoader:
        inputs, labels = inputs.to(device), labels.to(device)
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        total_test_loss += loss.item()
        _, preds = torch.max(outputs, 1)
        correct_test_predictions += torch.sum(preds == labels.data)  # Count the correct predictions
test_accuracy = correct_test_predictions.double() / testSize
print(f"Test Loss: {total_test_loss}, Test Accuracy: {test_accuracy}")


Test Loss: 0.17484506964683533, Test Accuracy: 0.967741935483871


After fine-tuning, test accuracy rose sharply, from about 0.387 to about 0.968.