# SOI1010 Machine Learning II - Assignment #3
*   **Name**: Dongmin Kim
*   **Dep**: Automotive Engineering
*   **ID**: 2021048140
*   **Due**: Dec. 1, 2025



# Kaggle Competition: Artworks Artist Classification

### Setup Code

In [1]:
from google.colab import files
files.upload()   # kaggle.json


Saving kaggle.json to kaggle.json


{'kaggle.json': b'{"username":"kmin2426","key":"<redacted>"}'}

In [2]:
!mkdir -p ~/.kaggle
!cp kaggle.json ~/.kaggle/
!chmod 600 ~/.kaggle/kaggle.json


In [3]:
!kaggle competitions download -c artworks-artist-classification-soi-1010-2025


Downloading artworks-artist-classification-soi-1010-2025.zip to /content
 91% 959M/1.03G [00:08<00:02, 40.8MB/s]
100% 1.03G/1.03G [00:08<00:00, 133MB/s]


In [4]:
!unzip artworks-artist-classification-soi-1010-2025.zip -d data


Streaming output truncated to the last 5000 lines.
  inflating: data/train/train/bor.jpg  
  inflating: data/train/train/bov.jpg  
  inflating: data/train/train/bow.jpg  
  inflating: data/train/train/boy.jpg  
  inflating: data/train/train/bpf.jpg  
  inflating: data/train/train/bpi.jpg  
  inflating: data/train/train/bpj.jpg  
  inflating: data/train/train/bpk.jpg  
  inflating: data/train/train/bpn.jpg  
  inflating: data/train/train/bpp.jpg  
  inflating: data/train/train/bpt.jpg  
  inflating: data/train/train/bpx.jpg  
  inflating: data/train/train/bpz.jpg  
  inflating: data/train/train/bqc.jpg  
  inflating: data/train/train/bqh.jpg  
  inflating: data/train/train/bqi.jpg  
  inflating: data/train/train/bqk.jpg  
  inflating: data/train/train/bqn.jpg  
  inflating: data/train/train/bqp.jpg  
  inflating: data/train/train/bqq.jpg  
  inflating: data/train/train/bqu.jpg  
  inflating: data/train/train/brb.jpg  
  inflating: data/train/train/brd.jpg  
  inflating: data/train/train/

### Introduction

In [5]:
import os
import random
import numpy as np
import pandas as pd
from PIL import Image

import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader
from torchvision import transforms

from sklearn.model_selection import StratifiedKFold
import wandb


In [6]:
DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(DEVICE)

cuda


In [14]:
# Hyperparameters
N_SPLITS = 4
NUM_EPOCHS = 100
BATCH_SIZE = 32

BASE_LR = 6e-4
WEIGHT_DECAY = 1e-4
LABEL_SMOOTH = 0.05

RESOLUTION = 288
DROP_OUT = 0.3

# Column Names
IMAGE_COL = "id"
LABEL_COL = "artist"
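
To make the effect of `LABEL_SMOOTH` concrete, here is a minimal pure-Python sketch of cross-entropy against a smoothed target. It assumes the usual formulation in which the target keeps `1 - smoothing` on the true class and spreads `smoothing` uniformly over all classes; the function name and the example logits are hypothetical and not part of the notebook.

```python
import math

def smoothed_cross_entropy(logits, target, smoothing=0.05):
    """Cross-entropy against a label-smoothed target distribution.

    The target puts (1 - smoothing) on the true class and spreads
    `smoothing` uniformly over all classes (the true class included).
    """
    num_classes = len(logits)
    # Numerically stable log-softmax.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    log_probs = [v - log_z for v in logits]

    loss = 0.0
    for k, lp in enumerate(log_probs):
        q = smoothing / num_classes
        if k == target:
            q += 1.0 - smoothing
        loss -= q * lp
    return loss

logits = [2.0, 0.5, -1.0]  # hypothetical scores for 3 classes
plain = smoothed_cross_entropy(logits, target=0, smoothing=0.0)
smooth = smoothed_cross_entropy(logits, target=0, smoothing=0.05)
print(f"plain={plain:.4f} smoothed={smooth:.4f}")
```

When the true class already has the highest score, the smoothed loss is slightly larger than the plain one, which discourages the model from pushing its logits to extremes.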

In [24]:
# Path
CSV_PATH = "/content/data/train.csv"
IMG_DIR  = "/content/data/train/train"

TEST_CSV_PATH = "/content/data/test.csv"
TEST_IMG_DIR  = "/content/data/test/test"

In [9]:
# checkpoint folder
os.makedirs("checkpoints", exist_ok=True)

In [10]:
# Seed
def set_seed(seed: int = 42):
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False

set_seed(42)


# wandb
if not wandb.run:
    wandb.init(project="artist-classification", name="scratch_resnet50_kfold_288_colab")


wandb: Logging into wandb.ai. (Learn how to deploy a W&B server locally: https://wandb.me/wandb-server)
wandb: You can find your API key in your browser here: https://wandb.ai/authorize?ref=models
wandb: Paste an API key from your profile and hit enter:
wandb: No netrc file found, creating one.
wandb: Appending key for api.wandb.ai to your netrc file: /root/.netrc
wandb: Currently logged in as: ehdals1199 (ehdals1199-hanyang-university) to https://api.wandb.ai. Use `wandb login --relogin` to force relogin


In [11]:
# Load CSV
df = pd.read_csv(CSV_PATH)

label2idx = {label: i for i, label in enumerate(sorted(df[LABEL_COL].unique()))}
idx2label = {v: k for k, v in label2idx.items()}
df["label_idx"] = df[LABEL_COL].map(label2idx)
num_classes = len(label2idx)
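
The label encoding above can be illustrated on a toy list without pandas. This is only a sketch: the artist names are made up, and sorting the unique labels is what keeps the mapping identical across runs.

```python
artists = ["Monet", "Degas", "Monet", "Cezanne", "Degas"]  # hypothetical labels

# Sort the unique names so the mapping is the same on every run.
label2idx = {label: i for i, label in enumerate(sorted(set(artists)))}
idx2label = {i: label for label, i in label2idx.items()}

encoded = [label2idx[a] for a in artists]   # what the model is trained on
decoded = [idx2label[i] for i in encoded]   # what the submission needs

print(label2idx)
print(encoded)
```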

In [12]:
# Dataset
class ArtDataset(Dataset):
    def __init__(self, df, img_dir, image_col, label_col_idx, transform=None):
        self.df = df.reset_index(drop=True)
        self.img_dir = img_dir
        self.image_col = image_col
        self.label_col_idx = label_col_idx
        self.transform = transform
        self.candidate_exts = [".jpg", ".png", ".jpeg", ".JPG", ".PNG", ".JPEG"]

    def _get_img_path(self, img_id: str):
        for ext in self.candidate_exts:
            path = os.path.join(self.img_dir, img_id + ext)
            if os.path.exists(path):
                return path
        return os.path.join(self.img_dir, img_id)

    def __len__(self):
        return len(self.df)

    def __getitem__(self, idx):
        row = self.df.iloc[idx]

        img_id = str(row[self.image_col])
        img_path = self._get_img_path(img_id)

        image = Image.open(img_path).convert("RGB")
        label = row[self.label_col_idx]

        if self.transform:
            image = self.transform(image)

        return image, label

In [13]:
# Augmentation: Using ImageNet mean & std
train_tf = transforms.Compose([
    transforms.RandomResizedCrop(RESOLUTION, scale=(0.7, 1.0), ratio=(0.75, 1.33)),
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.ColorJitter(brightness=0.3, contrast=0.3, saturation=0.3, hue=0.05),
    transforms.RandomRotation(15),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std =[0.229, 0.224, 0.225]),
])

eval_tf = transforms.Compose([
    transforms.Resize((320, 320)),
    transforms.CenterCrop(RESOLUTION),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std =[0.229, 0.224, 0.225]),
])
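
As a rough illustration of what `transforms.Normalize` does to each pixel after `ToTensor()` has scaled it to [0, 1], the sketch below applies `(value - mean) / std` per channel in plain Python. The helper names are hypothetical; only the mean and std values come from the cell above.

```python
IMAGENET_MEAN = [0.485, 0.456, 0.406]
IMAGENET_STD = [0.229, 0.224, 0.225]

def normalize_pixel(rgb):
    """Apply (value - mean) / std per channel to an RGB pixel in [0, 1]."""
    return [(v - m) / s for v, m, s in zip(rgb, IMAGENET_MEAN, IMAGENET_STD)]

def denormalize_pixel(rgb):
    """Invert normalize_pixel, e.g. before displaying an image."""
    return [v * s + m for v, m, s in zip(rgb, IMAGENET_MEAN, IMAGENET_STD)]

black = normalize_pixel([0.0, 0.0, 0.0])
white = normalize_pixel([1.0, 1.0, 1.0])
print([round(v, 3) for v in black])
print([round(v, 3) for v in white])
```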

### ResNet-50

In [15]:
# ResNet50
class Bottleneck(nn.Module):
    expansion = 4

    def __init__(self, in_channels, out_channels, stride=1):
        super().__init__()

        # 1x1 (reduce)
        self.conv1 = nn.Conv2d(
            in_channels,
            out_channels,
            kernel_size=1,
            bias=False
        )
        self.bn1 = nn.BatchNorm2d(out_channels)

        # 3x3
        self.conv2 = nn.Conv2d(
            out_channels,
            out_channels,
            kernel_size=3,
            stride=stride,
            padding=1,
            bias=False
        )
        self.bn2 = nn.BatchNorm2d(out_channels)

        # 1x1 (expand)
        self.conv3 = nn.Conv2d(
            out_channels,
            out_channels * self.expansion,
            kernel_size=1,
            bias=False
        )
        self.bn3 = nn.BatchNorm2d(out_channels * self.expansion)

        self.relu = nn.ReLU(inplace=True)

        # Projection shortcut: reshape x to match F(x) when the stride or channel count changes
        self.downsample = None
        if stride != 1 or in_channels != out_channels * self.expansion:
            self.downsample = nn.Sequential(
                nn.Conv2d(
                    in_channels,
                    out_channels * self.expansion,
                    kernel_size=1,
                    stride=stride,
                    bias=False
                ),
                nn.BatchNorm2d(out_channels * self.expansion)
            )

    def forward(self, x):
        identity = x

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)
        out = self.relu(out)

        out = self.conv3(out)
        out = self.bn3(out)

        if self.downsample is not None:
            identity = self.downsample(x)

        out = out + identity
        out = self.relu(out)

        return out
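
A quick way to sanity-check the block definition is to count its learnable parameters by hand. The sketch below does this in plain Python, assuming bias-free convolutions and one weight plus one bias per BatchNorm channel, as in the `Bottleneck` class above; the function name is hypothetical.

```python
def bottleneck_params(in_channels, out_channels, stride=1, expansion=4):
    """Count learnable parameters of one bottleneck block (bias-free convs)."""
    expanded = out_channels * expansion
    conv1 = in_channels * out_channels * 1 * 1    # 1x1 reduce
    conv2 = out_channels * out_channels * 3 * 3   # 3x3
    conv3 = out_channels * expanded * 1 * 1       # 1x1 expand
    # Each BatchNorm2d has a weight and a bias per channel.
    bn = 2 * out_channels + 2 * out_channels + 2 * expanded
    total = conv1 + conv2 + conv3 + bn
    # The projection shortcut exists only when the shape of x changes.
    if stride != 1 or in_channels != expanded:
        total += in_channels * expanded + 2 * expanded
    return total

first = bottleneck_params(64, 64)    # first block of layer1, with projection
later = bottleneck_params(256, 64)   # remaining blocks of layer1
print(first, later)
```

The same numbers should come out of `sum(p.numel() for p in Bottleneck(64, 64).parameters())` if the class is built as written.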

In [16]:
class ResNet50Scratch(nn.Module):
    def __init__(self, block, layers, num_classes):
        super().__init__()
        self.in_channels = 64

        # stem
        self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False)
        self.bn1 = nn.BatchNorm2d(64)
        self.relu = nn.ReLU(inplace=True)
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)

        # stages
        self.layer1 = self._make_layer(block, 64, layers[0], stride=1)
        self.layer2 = self._make_layer(block, 128, layers[1], stride=2)
        self.layer3 = self._make_layer(block, 256, layers[2], stride=2)
        self.layer4 = self._make_layer(block, 512, layers[3], stride=2)

        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.dropout = nn.Dropout(p=DROP_OUT)
        self.fc = nn.Linear(512 * block.expansion, num_classes)

        self._init_weights()

    def _make_layer(self, block, out_channels, blocks, stride):
        layers = []
        layers.append(block(self.in_channels, out_channels, stride=stride))
        self.in_channels = out_channels * block.expansion

        for _ in range(1, blocks):
            layers.append(block(self.in_channels, out_channels, stride=1))

        return nn.Sequential(*layers)

    def _init_weights(self):
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode="fan_out", nonlinearity="relu")
            elif isinstance(m, nn.BatchNorm2d):
                nn.init.constant_(m.weight, 1.0)
                nn.init.constant_(m.bias, 0.0)
            elif isinstance(m, nn.Linear):
                nn.init.normal_(m.weight, 0, 0.01)
                nn.init.constant_(m.bias, 0.0)

    def forward(self, x):
        x = self.conv1(x)  # (B, 64, H/2, W/2)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.maxpool(x)  # (B, 64, H/4, W/4)

        x = self.layer1(x)  # (B, 256, H/4, W/4)
        x = self.layer2(x)  # (B, 512, H/8, W/8)
        x = self.layer3(x)  # (B, 1024, H/16, W/16)
        x = self.layer4(x)  # (B, 2048, H/32, W/32)

        x = self.avgpool(x)  # (B, 2048, 1, 1)
        x = torch.flatten(x, 1)  # (B, 2048)
        x = self.dropout(x)
        x = self.fc(x)
        return x

In [17]:
def resnet50_scratch(num_classes):
    # ResNet50: [3, 4, 6, 3]
    return ResNet50Scratch(Bottleneck, [3, 4, 6, 3], num_classes=num_classes)
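
To see what the network does to a 288-pixel input, the spatial size after each stage can be worked out with the standard convolution output formula. This is a sketch in plain Python with hypothetical helper names; kernel sizes, strides, and padding are taken from the model definition above.

```python
def conv_out(size, kernel, stride, padding):
    """Output length of a conv/pool layer along one spatial dimension."""
    return (size + 2 * padding - kernel) // stride + 1

def resnet50_feature_sizes(resolution):
    """Spatial side length after the stem and after each residual stage."""
    sizes = {}
    s = conv_out(resolution, kernel=7, stride=2, padding=3)
    sizes["stem_conv"] = s
    s = conv_out(s, kernel=3, stride=2, padding=1)
    sizes["maxpool"] = s
    # Only the first block of each stage strides, in its 3x3 conv.
    stages = [("layer1", 1), ("layer2", 2), ("layer3", 2), ("layer4", 2)]
    for name, stride in stages:
        s = conv_out(s, kernel=3, stride=stride, padding=1)
        sizes[name] = s
    return sizes

print(resnet50_feature_sizes(288))
```

With `RESOLUTION = 288`, `layer4` yields a 9x9 map with 2048 channels, which the adaptive average pool then reduces to a 2048-dimensional vector.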


### Training

In [18]:
# Train & Evaluation
def train_one_epoch(model, loader, criterion, optimizer, epoch, fold):
    model.train()
    running_loss = 0.0
    correct = 0
    total = 0

    for images, labels in loader:
        images = images.to(DEVICE)
        labels = labels.to(DEVICE)

        optimizer.zero_grad()
        outputs = model(images)
        loss = criterion(outputs, labels)

        loss.backward()
        optimizer.step()

        running_loss += loss.item() * images.size(0)
        _, preds = outputs.max(1)
        correct += preds.eq(labels).sum().item()
        total += labels.size(0)

    epoch_loss = running_loss / total
    epoch_acc = correct / total

    wandb.log({
        f"fold{fold}/train_loss": epoch_loss,
        f"fold{fold}/train_acc": epoch_acc,
        "epoch": epoch
    })

    return epoch_loss, epoch_acc

In [19]:
@torch.no_grad()
def evaluate(model, loader, criterion, epoch, fold, phase="val"):
    model.eval()
    running_loss = 0.0
    correct = 0
    total = 0

    for images, labels in loader:
        images = images.to(DEVICE)
        labels = labels.to(DEVICE)

        outputs = model(images)
        loss = criterion(outputs, labels)

        running_loss += loss.item() * images.size(0)
        _, preds = outputs.max(1)
        correct += preds.eq(labels).sum().item()
        total += labels.size(0)

    epoch_loss = running_loss / total
    epoch_acc = correct / total

    wandb.log({
        f"fold{fold}/{phase}_loss": epoch_loss,
        f"fold{fold}/{phase}_acc": epoch_acc,
        "epoch": epoch
    })

    return epoch_loss, epoch_acc
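
Both functions multiply each batch loss by the batch size before dividing by the sample count. The sketch below shows why, assuming the criterion returns the mean loss over a batch (the default reduction): a smaller final batch would otherwise be over-weighted. The numbers are made up for illustration.

```python
def epoch_mean(batch_means, batch_sizes):
    """Per-sample mean loss recovered from per-batch mean losses."""
    running = sum(m * n for m, n in zip(batch_means, batch_sizes))
    return running / sum(batch_sizes)

batch_means = [1.0, 1.0, 4.0]   # hypothetical mean loss of each batch
batch_sizes = [32, 32, 8]       # the last batch is smaller

weighted = epoch_mean(batch_means, batch_sizes)
naive = sum(batch_means) / len(batch_means)
print(f"weighted={weighted:.4f} naive={naive:.4f}")
```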

### Cross-Validation

In [20]:
# K-Fold
def run_kfold_training():
    skf = StratifiedKFold(
        n_splits=N_SPLITS,
        shuffle=True,
        random_state=42
    )

    all_fold_best_acc = []

    targets = df["label_idx"].values

    for fold, (train_idx, val_idx) in enumerate(skf.split(df, targets)):
        print(f"\n######### Fold {fold} #########")
        wandb.log({"current_fold": fold})

        train_df = df.iloc[train_idx]
        val_df   = df.iloc[val_idx]

        train_dataset = ArtDataset(
            train_df, IMG_DIR,
            image_col=IMAGE_COL,
            label_col_idx="label_idx",
            transform=train_tf
        )
        val_dataset = ArtDataset(
            val_df, IMG_DIR,
            image_col=IMAGE_COL,
            label_col_idx="label_idx",
            transform=eval_tf
        )

        train_loader = DataLoader(
            train_dataset,
            batch_size=BATCH_SIZE,
            shuffle=True,
            num_workers=2,
            pin_memory=True
        )
        val_loader = DataLoader(
            val_dataset,
            batch_size=BATCH_SIZE,
            shuffle=False,
            num_workers=2,
            pin_memory=True
        )

        model = resnet50_scratch(num_classes).to(DEVICE)

        criterion = nn.CrossEntropyLoss(label_smoothing=LABEL_SMOOTH)
        optimizer = optim.Adam(
            model.parameters(),
            lr=BASE_LR,
            weight_decay=WEIGHT_DECAY
        )
        scheduler = optim.lr_scheduler.CosineAnnealingLR(
            optimizer,
            T_max=NUM_EPOCHS
        )

        best_val_acc = 0.0
        best_path = f"checkpoints/best_fold{fold}.pth"

        for epoch in range(1, NUM_EPOCHS + 1):
            train_loss, train_acc = train_one_epoch(
                model, train_loader, criterion, optimizer, epoch, fold
            )
            val_loss, val_acc = evaluate(
                model, val_loader, criterion, epoch, fold, phase="val"
            )

            scheduler.step()

            print(
                f"[Fold {fold}][Epoch {epoch}/{NUM_EPOCHS}] "
                f"train_loss={train_loss:.4f} train_acc={train_acc:.4f} | "
                f"val_loss={val_loss:.4f} val_acc={val_acc:.4f}"
            )

            if val_acc > best_val_acc:
                best_val_acc = val_acc
                torch.save(model.state_dict(), best_path)
                print(f"Best Model: val_acc={best_val_acc:.4f}")

        all_fold_best_acc.append(best_val_acc)
        print(f"Fold {fold} best val_acc: {best_val_acc:.4f}")

    print("\n######### K-Fold Result #########")
    for i, acc in enumerate(all_fold_best_acc):
        print(f"Fold {i}: {acc:.4f}")
    print(f"Mean val_acc: {np.mean(all_fold_best_acc):.4f}")
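
The learning-rate curve produced by the cosine schedule can be previewed without PyTorch. The sketch below uses the closed-form cosine annealing formula with a minimum rate of 0 (assumed to match the scheduler's default) and assumes that, because `scheduler.step()` runs at the end of each epoch, epoch `e` trains with the rate for step `e - 1`. The function name is hypothetical.

```python
import math

def cosine_lr(epoch, base_lr=6e-4, t_max=100, eta_min=0.0):
    """Learning rate used during `epoch` (1-indexed) under cosine annealing."""
    t_cur = epoch - 1  # the scheduler has been stepped epoch - 1 times so far
    cos_term = (1 + math.cos(math.pi * t_cur / t_max)) / 2
    return eta_min + (base_lr - eta_min) * cos_term

for epoch in (1, 26, 51, 76, 100):
    print(epoch, f"{cosine_lr(epoch):.2e}")
```

The rate starts at `BASE_LR`, reaches half of it at the midpoint, and is close to zero in the last epoch.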

### Test Ensemble with Augmentation

In [21]:
# Test Ensemble + TTA
class TestDataset(Dataset):
    def __init__(self, df, img_dir, image_col, transform=None):
        self.df = df.reset_index(drop=True)
        self.img_dir = img_dir
        self.image_col = image_col
        self.transform = transform

        self.candidate_exts = [".jpg", ".png", ".jpeg", ".JPG", ".PNG", ".JPEG"]

    def _get_img_path(self, img_id: str):
        for ext in self.candidate_exts:
            path = os.path.join(self.img_dir, img_id + ext)
            if os.path.exists(path):
                return path
        return os.path.join(self.img_dir, img_id)

    def __len__(self):
        return len(self.df)

    def __getitem__(self, idx):
        row = self.df.iloc[idx]
        img_id = str(row[self.image_col])
        img_path = self._get_img_path(img_id)

        image = Image.open(img_path).convert("RGB")

        if self.transform is not None:
            image = self.transform(image)

        return image, img_id

In [22]:
@torch.no_grad()
def inference_ensemble():
    test_df = pd.read_csv(TEST_CSV_PATH)
    test_dataset = TestDataset(
        test_df, TEST_IMG_DIR,
        image_col=IMAGE_COL,
        transform=eval_tf
    )
    test_loader = DataLoader(
        test_dataset,
        batch_size=BATCH_SIZE,
        shuffle=False,
        num_workers=2,
        pin_memory=True
    )

    # Load the best checkpoint from each fold
    models = []
    for fold in range(N_SPLITS):
        model = resnet50_scratch(num_classes).to(DEVICE)
        path = f"checkpoints/best_fold{fold}.pth"
        state = torch.load(path, map_location=DEVICE)
        model.load_state_dict(state)
        model.eval()
        models.append(model)

    all_img_names = []
    all_preds = []

    for images, img_names in test_loader:
        images = images.to(DEVICE)

        # Sum logits over the fold models; each model averages the
        # original and horizontally flipped views (TTA)
        logits_sum = None
        for model in models:
            out_orig = model(images)

            # Flip along the width axis on the current device (no CPU round trip)
            images_flip = torch.flip(images, dims=[3])
            out_flip = model(images_flip)

            outputs = (out_orig + out_flip) / 2.0

            if logits_sum is None:
                logits_sum = outputs
            else:
                logits_sum += outputs

        probs = torch.softmax(logits_sum, dim=1)
        _, preds = probs.max(1)

        all_img_names.extend(list(img_names))
        all_preds.extend(preds.cpu().numpy().tolist())

    # Map predicted class indices back to artist labels
    pred_labels = [idx2label[idx] for idx in all_preds]

    # Write the submission file
    sub = pd.DataFrame({
        IMAGE_COL: all_img_names,
        LABEL_COL: pred_labels
    })
    sub.to_csv("submission.csv", index=False)
    print("Saved submission.csv")
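
The combination rule used in `inference_ensemble` can be traced on a tiny example. The sketch below is plain Python with made-up logits for two models and three classes: each model's original and flipped outputs are averaged, the averages are summed across models, and the class with the largest total is chosen. Since softmax preserves ordering, taking it before the argmax does not change the prediction.

```python
import math

def ensemble_predict(per_model_logits):
    """per_model_logits: list of (logits_original, logits_flipped), one pair per model."""
    num_classes = len(per_model_logits[0][0])
    total = [0.0] * num_classes
    for orig, flip in per_model_logits:
        for k in range(num_classes):
            total[k] += (orig[k] + flip[k]) / 2.0  # average the two views
    pred = max(range(num_classes), key=lambda k: total[k])
    return pred, total

def softmax(values):
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    z = sum(exps)
    return [e / z for e in exps]

fold_outputs = [
    ([2.0, 1.0, 0.0], [1.0, 2.0, 0.0]),  # model 0: the two views disagree
    ([0.5, 2.5, 0.0], [0.0, 2.0, 0.5]),  # model 1: both views favour class 1
]
pred, total = ensemble_predict(fold_outputs)
probs = softmax(total)
print(pred, total)
```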

### Main

In [None]:
if __name__ == "__main__":
    # K-Fold
    run_kfold_training()

    # Test Ensemble
    inference_ensemble()


######### Fold 0 #########




[Fold 0][Epoch 1/100] train_loss=3.6066 train_acc=0.1406 | val_loss=3.2334 val_acc=0.2101
Best Model: val_acc=0.2101
[Fold 0][Epoch 2/100] train_loss=3.3434 train_acc=0.1667 | val_loss=3.1416 val_acc=0.2161
Best Model: val_acc=0.2161
[Fold 0][Epoch 3/100] train_loss=3.2852 train_acc=0.1853 | val_loss=3.0838 val_acc=0.2191
Best Model: val_acc=0.2191
[Fold 0][Epoch 4/100] train_loss=3.1880 train_acc=0.1978 | val_loss=3.1893 val_acc=0.1995
[Fold 0][Epoch 5/100] train_loss=3.1124 train_acc=0.2152 | val_loss=3.0149 val_acc=0.2395
Best Model: val_acc=0.2395
[Fold 0][Epoch 6/100] train_loss=3.0758 train_acc=0.2184 | val_loss=2.9225 val_acc=0.2500
Best Model: val_acc=0.2500
[Fold 0][Epoch 7/100] train_loss=3.0288 train_acc=0.2257 | val_loss=2.9077 val_acc=0.2545
Best Model: val_acc=0.2545
[Fold 0][Epoch 8/100] train_loss=2.9561 train_acc=0.2430 | val_loss=2.8634 val_acc=0.2598
Best Model: val_acc=0.2598
[Fold 0][Epoch 9/100] train_loss=2.9389 train_acc=0.2448 | val_loss=2.9692 val_acc=0.2500
[