# CLASSIFYING DOG v. CAT

With small tweaks and a longer training run, the model reached about 99% test accuracy.

This result was expected: binary classification with plenty of labeled training images is an easy task. Classifying 37 breeds rather than just 2 species will be harder.

```
Evaluating: 100%|██████████████████████████████████| 115/115 [01:59<00:00,  1.04s/batch, acc=0.9899]
[Binary] Test Accuracy: 0.9899
```

In [10]:
# CELL ONE
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import transforms, models
from torchvision.datasets import OxfordIIITPet
from torch.utils.data import DataLoader
from torchvision.models import resnet18, ResNet18_Weights
from tqdm import tqdm

In [11]:
# CELL TWO
# Set device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Image preprocessing
input_size = 224
transform = transforms.Compose([
    # Resize shortest side to target length, maintain aspect ratio
    transforms.Resize(input_size),
    # Crop center square of size (input_size, input_size)
    transforms.CenterCrop(input_size),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406],
                         [0.229, 0.224, 0.225])
])

# Dataset root
data_root = 'data'

# Load only binary (cat vs. dog) labels
train_dataset = OxfordIIITPet(
    root=data_root,
    split='trainval',
    target_types='binary-category',
    transform=transform,
    download=True
)
test_dataset = OxfordIIITPet(
    root=data_root,
    split='test',
    target_types='binary-category',
    transform=transform,
    download=True
)

# Data loaders
batch_size = 32
train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True, num_workers=4)
test_loader = DataLoader(test_dataset, batch_size=batch_size, shuffle=False, num_workers=4)
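
The `Resize` then `CenterCrop` steps above determine which part of each photo the network sees. The sketch below works through that geometry in plain Python for a hypothetical 500x375 image. The helper names are illustrative, and truncating the long side to an integer is an assumption about how the library rounds, so treat the exact pixel values as approximate.

```python
def resize_shortest_side(width, height, target):
    """Scale so the shorter side equals `target`, keeping the aspect ratio."""
    if width <= height:
        return target, int(target * height / width)
    return int(target * width / height), target


def center_crop_box(width, height, crop):
    """Return (left, top, right, bottom) of a centered crop-by-crop square."""
    left = (width - crop) // 2
    top = (height - crop) // 2
    return left, top, left + crop, top + crop


# Hypothetical landscape photo, 500 pixels wide and 375 pixels tall.
w, h = resize_shortest_side(500, 375, 224)
box = center_crop_box(w, h, 224)
print((w, h), box)
```

For a landscape image the crop discards equal strips on the left and right, so a pet that sits near the edge of the frame can be partly cut off.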

In [12]:
# CELL THREE
# Build ResNet18 model for binary classification
weights = ResNet18_Weights.DEFAULT
model = resnet18(weights=weights)
num_feats = model.fc.in_features
model.fc = nn.Linear(num_feats, 2)

# freeze all except the new fc
for param in model.parameters():
    param.requires_grad = False
for param in model.fc.parameters():
    param.requires_grad = True

model = model.to(device)

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(
    model.fc.parameters(), lr=0.01, momentum=0.9,
    weight_decay=1e-4, nesterov=True
)
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.1)
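
To make the `StepLR` settings above concrete, here is a small plain-Python sketch of the step decay they describe: the learning rate is multiplied by `gamma` once every `step_size` epochs. The function name `step_lr` is illustrative and not part of PyTorch.

```python
def step_lr(base_lr, epoch, step_size=5, gamma=0.1):
    """Learning rate in effect during 0-indexed `epoch` under step decay."""
    return base_lr * gamma ** (epoch // step_size)


# Print the schedule for an 8-epoch run starting at lr=0.01.
for epoch in range(8):
    print(f"epoch {epoch + 1}: lr = {step_lr(0.01, epoch):.4f}")
```

Under these settings a 2-epoch run never reaches the first decay, while an 8-epoch run spends epochs 1 to 5 at 0.01 and epochs 6 to 8 at 0.001.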

In [15]:
# CELL FOUR
def train(model, loader, criterion, optimizer, scheduler=None, epochs=5):
    model.train()
    for epoch in range(epochs):
        total_loss = 0.0
        total_correct = 0
        loop = tqdm(loader, 
                    desc=f"Epoch [{epoch+1}/{epochs}]", 
                    ncols=100, 
                    unit="batch")
        
        for batch_idx, (images, labels) in enumerate(loop, start=1):
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()
            outputs = model(images)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()

            preds = outputs.argmax(dim=1)
            total_loss   += loss.item() * images.size(0)
            total_correct+= (preds == labels).sum().item()

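            # compute running averages over the samples seen so far
            # (the last batch may be smaller than loader.batch_size)
            seen = min(batch_idx * loader.batch_size, len(loader.dataset))
            running_loss = total_loss / seen
            running_acc  = total_correct / seen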

            loop.set_postfix(
                loss=f"{running_loss:.4f}",
                acc=f"{running_acc:.4f}"
            )

        if scheduler:
            scheduler.step()
            
        avg_loss = total_loss / len(loader.dataset)
        avg_acc  = total_correct / len(loader.dataset)
        tqdm.write(
            f"[Binary] Epoch {epoch+1}/{epochs}   "
            f"Loss: {avg_loss:.4f}   Acc: {avg_acc:.4f}"
        )

def evaluate(model, loader):
    model.eval()
    correct, total = 0, 0
    loop = tqdm(loader, desc="Evaluating", ncols=100, unit="batch")
    with torch.no_grad():
        for batch_idx, (images, labels) in enumerate(loop, start=1):
            images, labels = images.to(device), labels.to(device)
            outputs = model(images)
            preds = outputs.argmax(dim=1)
            correct += (preds == labels).sum().item()
            total   += labels.size(0)
            loop.set_postfix(acc=f"{correct/total:.4f}")
    print(f"[Binary] Test Accuracy: {correct/total:.4f}")

train(model, train_loader, criterion, optimizer, scheduler, epochs=2)
evaluate(model, test_loader)

Epoch [1/2]: 100%|████████████████████| 115/115 [02:03<00:00,  1.07s/batch, acc=0.9878, loss=0.0329]


[Binary] Epoch 1/2   Loss: 0.0329   Acc: 0.9878


Epoch [2/2]: 100%|████████████████████| 115/115 [02:02<00:00,  1.07s/batch, acc=0.9916, loss=0.0248]


[Binary] Epoch 2/2   Loss: 0.0248   Acc: 0.9916


Evaluating: 100%|██████████████████████████████████| 115/115 [01:59<00:00,  1.04s/batch, acc=0.9899]

[Binary] Test Accuracy: 0.9899





# CLASSIFYING BREEDS

### Strategy 1: Fine-tune $l$ layers simultaneously

This strategy was not very efficient: the sweep took hours to train. It did outperform Strategy 2 by 2-3%, which makes sense because with more classes the features need to be tuned more finely to the task. The results below show a loose trend toward better validation accuracy with deeper fine-tuning, though the trend is not monotonic (l=4 and l=8 score below l=2 and l=3). Training the entire model from scratch might perform even better given enough data and compute, but that was not tested here. One point of transfer learning is to save resources, however, and reaching around 93% validation accuracy without much effort is remarkable.

```
=== Summary of validation accuracies ===
l=1: 0.9158
l=2: 0.9198
l=3: 0.9198
l=4: 0.9144
l=5: 0.9266
l=6: 0.9171
l=7: 0.9293
l=8: 0.9117
```

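Note also that the best scores fall around l=5 to l=7. More epochs per setting would likely improve performance further, but this sweep already took hours to train. Because `set_fine_tune_layers` lists only five entries, l=5 through l=8 unfreeze the same parameters, so the differences among those four runs reflect run-to-run variation rather than depth.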
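
One way to judge whether the gaps in the summary are meaningful is to attach a binomial standard error to each validation accuracy. The sketch below does this for the values listed above. The validation size of 736 is inferred from the logs (23 batches of 32), and the calculation treats the validation images as independent draws, so it is a rough guide and not a formal test.

```python
import math


def accuracy_std_error(acc, n):
    """Binomial standard error of an accuracy measured on n samples."""
    return math.sqrt(acc * (1.0 - acc) / n)


n_val = 736  # assumption: inferred from 23 validation batches of 32
summary = {1: 0.9158, 2: 0.9198, 3: 0.9198, 4: 0.9144,
           5: 0.9266, 6: 0.9171, 7: 0.9293, 8: 0.9117}

for l, acc in summary.items():
    se = accuracy_std_error(acc, n_val)
    print(f"l={l}: {acc:.4f} +/- {se:.4f}")
```

Each standard error comes out close to one percentage point, and the whole spread from 0.9117 to 0.9293 is under two standard errors of a single estimate, so small differences between neighboring values of l should be read with caution.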

```
Eval: 100%|████████████████████████████████████████| 115/115 [01:59<00:00,  1.04s/batch, acc=0.8967]
Eval Summary — Acc: 0.8967
Final test accuracy: 0.8967
```

The final test accuracy of 0.8967 is not satisfactory, and we can do better. It sits about 3 points below the best validation accuracy (0.9293), which shows slight overfitting to the training data, as is to be expected. Maybe different strategies will help...

In [16]:
# CELL ONE: imports, device, transforms, datasets + train/val/test split
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import transforms, models
from torchvision.datasets import OxfordIIITPet
from torch.utils.data import DataLoader, random_split
from torchvision.models import resnet18, ResNet18_Weights
from tqdm import tqdm

# device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# image preprocessing
input_size = 224
transform = transforms.Compose([
    transforms.Resize(input_size),
    transforms.CenterCrop(input_size),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406],
                         [0.229, 0.224, 0.225])
])

# dataset root
data_root = 'data'

# full trainval with 37 classes
full_trainval = OxfordIIITPet(
    root=data_root,
    split='trainval',
    target_types='category',      # multiclass
    transform=transform,
    download=True
)
# hold-out test
test_dataset = OxfordIIITPet(
    root=data_root,
    split='test',
    target_types='category',
    transform=transform,
    download=True
)

# split train/val (80/20)
val_pct = 0.2
val_size = int(len(full_trainval) * val_pct)
train_size = len(full_trainval) - val_size
train_dataset, val_dataset = random_split(full_trainval, [train_size, val_size])

# data loaders
batch_size = 32
train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True, num_workers=4)
val_loader   = DataLoader(val_dataset,   batch_size=batch_size, shuffle=False, num_workers=4)
test_loader  = DataLoader(test_dataset,  batch_size=batch_size, shuffle=False, num_workers=4)
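
The 80/20 split above is drawn at random, so rerunning the cell gives a different validation set unless the split is seeded. The sketch below is a plain-Python analogue of a seeded split, not the notebook's own code. The name `seeded_split` is illustrative, and the size 3680 is inferred from the logs (92 training and 23 validation batches of 32). In the notebook itself, `random_split` accepts a `generator` argument, such as a `torch.Generator` with a manual seed, that plays the same role.

```python
import random


def seeded_split(n_items, val_pct, seed):
    """Shuffle indices with a fixed seed and cut off a validation slice."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    val_size = int(n_items * val_pct)
    # First val_size shuffled indices go to validation, the rest to training.
    return indices[val_size:], indices[:val_size]


train_idx, val_idx = seeded_split(3680, 0.2, seed=0)
print(len(train_idx), len(val_idx))
```

Fixing the seed keeps the validation set identical across notebook restarts, which makes sweeps run in separate sessions comparable.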

In [17]:
# CELL TWO: helper to build model & set fine-tunable layers
def build_model(num_classes=37):
    model = resnet18(weights=ResNet18_Weights.DEFAULT)
    num_feats = model.fc.in_features
    model.fc = nn.Linear(num_feats, num_classes)
    return model

def set_fine_tune_layers(model, l):
    # freeze all
    for p in model.parameters():
        p.requires_grad = False
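    # blocks ordered deepest to shallowest, with the fc head listed last
    blocks = ['layer4', 'layer3', 'layer2', 'layer1', 'fc']
    # unfreeze the first l entries; 'fc' is only reached when l >= 5,
    # and any l >= 5 selects all five entries
    to_unfreeze = blocks[:l]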
    for name, module in model.named_children():
        if name in to_unfreeze:
            for p in module.parameters():
                p.requires_grad = True
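
In `set_fine_tune_layers` above, `'fc'` sits at the end of the list, so `blocks[:l]` only includes the new head when `l >= 5`, and every `l` from 5 upward selects the same five entries. If the intent is "the head plus the last `l` blocks", one possible variant is sketched below. The names `names_to_unfreeze` and `set_fine_tune_layers_v2` are hypothetical, and this is not the version that produced the logs in this notebook.

```python
BACKBONE_BLOCKS = ["layer4", "layer3", "layer2", "layer1"]  # deepest first


def names_to_unfreeze(l):
    """Head plus the l deepest backbone blocks; l is clamped to 0..4."""
    l = max(0, min(l, len(BACKBONE_BLOCKS)))
    return ["fc"] + BACKBONE_BLOCKS[:l]


def set_fine_tune_layers_v2(model, l):
    """Freeze everything, then unfreeze the head and the chosen blocks."""
    for p in model.parameters():
        p.requires_grad = False
    wanted = set(names_to_unfreeze(l))
    for name, module in model.named_children():
        if name in wanted:
            for p in module.parameters():
                p.requires_grad = True


print(names_to_unfreeze(1))
print(names_to_unfreeze(8))
```

With this variant the sweep only has five distinct settings (`l` from 0 to 4), and the stem layers before `layer1` stay frozen in every one of them, as they do in the original function.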

In [20]:
# CELL THREE: train & evaluate
criterion = nn.CrossEntropyLoss()

def train_one_epoch(model, loader, optimizer, epoch, epochs):
    model.train()
    total_loss, total_correct = 0.0, 0
    processed = 0
    loop = tqdm(loader,
                desc=f"Epoch [{epoch}/{epochs}] Train",
                ncols=100,
                unit="batch",
                leave=True)
    for images, labels in loop:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        outputs = model(images)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        batch_size = images.size(0)
        processed    += batch_size
        total_loss   += loss.item() * batch_size
        total_correct+= (outputs.argmax(1) == labels).sum().item()

        loop.set_postfix(
            loss=f"{total_loss/processed:.4f}",
            acc =f"{total_correct/processed:.4f}"
        )

    avg_loss = total_loss / len(loader.dataset)
    avg_acc  = total_correct / len(loader.dataset)
    tqdm.write(f"Train Summary — Loss: {avg_loss:.4f}, Acc: {avg_acc:.4f}")
    return avg_loss, avg_acc

def evaluate(model, loader):
    model.eval()
    correct, total = 0, 0
    loop = tqdm(loader,
                desc="Eval",
                ncols=100,
                unit="batch",
                leave=True)
    with torch.no_grad():
        for images, labels in loop:
            images, labels = images.to(device), labels.to(device)
            outputs = model(images)
            correct += (outputs.argmax(1) == labels).sum().item()
            total   += labels.size(0)
            loop.set_postfix(acc=f"{correct/total:.4f}")

    acc = correct / total
    tqdm.write(f"Eval Summary — Acc: {acc:.4f}")
    return acc
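
`train_one_epoch` multiplies each batch loss by the batch size before summing. `nn.CrossEntropyLoss` returns a per-batch mean by default, so this weighting is what makes the epoch figure a true per-sample average when the last batch is smaller. The sketch below illustrates the difference with made-up batch losses; the numbers and the name `epoch_average` are purely illustrative.

```python
def epoch_average(batch_means, batch_sizes):
    """Per-sample average from per-batch means and their batch sizes."""
    total = sum(m * n for m, n in zip(batch_means, batch_sizes))
    return total / sum(batch_sizes)


# Made-up per-batch mean losses; the last batch holds only 8 samples.
batch_means = [0.50, 0.40, 1.00]
batch_sizes = [32, 32, 8]

weighted = epoch_average(batch_means, batch_sizes)
naive = sum(batch_means) / len(batch_means)
print(f"weighted: {weighted:.4f}  naive: {naive:.4f}")
```

The naive mean of batch means gives the small final batch the same weight as a full one, which inflates the result whenever that batch happens to have a high loss.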

In [21]:
# CELL FOUR.1: sweep over l = 1 to 5 blocks
results = {}
for l in range(1, 6):
    print(f"\n### Fine-tuning last {l} block(s) + fc ###")
    model = build_model().to(device)
    set_fine_tune_layers(model, l)
    params = filter(lambda p: p.requires_grad, model.parameters())
    optimizer = optim.SGD(params, lr=0.01, momentum=0.9,
                          weight_decay=1e-4, nesterov=True)
    scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.1)

    epochs = 8
    for epoch in range(1, epochs+1):
        train_loss, train_acc = train_one_epoch(model, train_loader, optimizer, epoch, epochs)
        val_acc = evaluate(model, val_loader)
        print(f"Epoch {epoch}/{epochs}  Val Acc: {val_acc:.4f}")
        scheduler.step()

    final_val = evaluate(model, val_loader)
    results[f"l={l}"] = final_val

print("\n=== Summary of validation accuracies ===")
for k, v in results.items():
    print(f"{k}: {v:.4f}")


### Fine-tuning last 1 block(s) + fc ###


Epoch [1/8] Train: 100%|████████████████| 92/92 [01:54<00:00,  1.24s/batch, acc=0.6043, loss=1.8052]


Train Summary — Loss: 1.8052, Acc: 0.6043


Eval: 100%|██████████████████████████████████████████| 23/23 [00:43<00:00,  1.88s/batch, acc=0.8764]


Eval Summary — Acc: 0.8764
Epoch 1/8  Val Acc: 0.8764


Epoch [2/8] Train: 100%|████████████████| 92/92 [01:55<00:00,  1.25s/batch, acc=0.9256, loss=0.5645]


Train Summary — Loss: 0.5645, Acc: 0.9256


Eval: 100%|██████████████████████████████████████████| 23/23 [00:43<00:00,  1.89s/batch, acc=0.9022]


Eval Summary — Acc: 0.9022
Epoch 2/8  Val Acc: 0.9022


Epoch [3/8] Train: 100%|████████████████| 92/92 [01:55<00:00,  1.25s/batch, acc=0.9708, loss=0.3036]


Train Summary — Loss: 0.3036, Acc: 0.9708


Eval: 100%|██████████████████████████████████████████| 23/23 [00:43<00:00,  1.90s/batch, acc=0.9008]


Eval Summary — Acc: 0.9008
Epoch 3/8  Val Acc: 0.9008


Epoch [4/8] Train: 100%|████████████████| 92/92 [01:55<00:00,  1.26s/batch, acc=0.9925, loss=0.1810]


Train Summary — Loss: 0.1810, Acc: 0.9925


Eval: 100%|██████████████████████████████████████████| 23/23 [00:44<00:00,  1.92s/batch, acc=0.9049]


Eval Summary — Acc: 0.9049
Epoch 4/8  Val Acc: 0.9049


Epoch [5/8] Train: 100%|████████████████| 92/92 [01:55<00:00,  1.26s/batch, acc=0.9997, loss=0.1090]


Train Summary — Loss: 0.1090, Acc: 0.9997


Eval: 100%|██████████████████████████████████████████| 23/23 [00:43<00:00,  1.89s/batch, acc=0.9144]


Eval Summary — Acc: 0.9144
Epoch 5/8  Val Acc: 0.9144


Epoch [6/8] Train: 100%|████████████████| 92/92 [01:55<00:00,  1.25s/batch, acc=1.0000, loss=0.0741]


Train Summary — Loss: 0.0741, Acc: 1.0000


Eval: 100%|██████████████████████████████████████████| 23/23 [00:43<00:00,  1.89s/batch, acc=0.9158]


Eval Summary — Acc: 0.9158
Epoch 6/8  Val Acc: 0.9158


Epoch [7/8] Train: 100%|████████████████| 92/92 [01:54<00:00,  1.24s/batch, acc=1.0000, loss=0.0704]


Train Summary — Loss: 0.0704, Acc: 1.0000


Eval: 100%|██████████████████████████████████████████| 23/23 [00:43<00:00,  1.90s/batch, acc=0.9198]


Eval Summary — Acc: 0.9198
Epoch 7/8  Val Acc: 0.9198


Epoch [8/8] Train: 100%|████████████████| 92/92 [01:54<00:00,  1.24s/batch, acc=1.0000, loss=0.0697]


Train Summary — Loss: 0.0697, Acc: 1.0000


Eval: 100%|██████████████████████████████████████████| 23/23 [00:43<00:00,  1.90s/batch, acc=0.9158]


Eval Summary — Acc: 0.9158
Epoch 8/8  Val Acc: 0.9158


Eval: 100%|██████████████████████████████████████████| 23/23 [00:43<00:00,  1.88s/batch, acc=0.9158]


Eval Summary — Acc: 0.9158

### Fine-tuning last 2 block(s) + fc ###


Epoch [1/8] Train: 100%|████████████████| 92/92 [02:08<00:00,  1.40s/batch, acc=0.6226, loss=1.7417]


Train Summary — Loss: 1.7417, Acc: 0.6226


Eval: 100%|██████████████████████████████████████████| 23/23 [00:43<00:00,  1.89s/batch, acc=0.8668]


Eval Summary — Acc: 0.8668
Epoch 1/8  Val Acc: 0.8668


Epoch [2/8] Train: 100%|████████████████| 92/92 [02:06<00:00,  1.37s/batch, acc=0.9395, loss=0.4703]


Train Summary — Loss: 0.4703, Acc: 0.9395


Eval: 100%|██████████████████████████████████████████| 23/23 [00:43<00:00,  1.89s/batch, acc=0.8940]


Eval Summary — Acc: 0.8940
Epoch 2/8  Val Acc: 0.8940


Epoch [3/8] Train: 100%|████████████████| 92/92 [02:06<00:00,  1.37s/batch, acc=0.9844, loss=0.2084]


Train Summary — Loss: 0.2084, Acc: 0.9844


Eval: 100%|██████████████████████████████████████████| 23/23 [00:43<00:00,  1.90s/batch, acc=0.8954]


Eval Summary — Acc: 0.8954
Epoch 3/8  Val Acc: 0.8954


Epoch [4/8] Train: 100%|████████████████| 92/92 [02:08<00:00,  1.40s/batch, acc=0.9980, loss=0.1048]


Train Summary — Loss: 0.1048, Acc: 0.9980


Eval: 100%|██████████████████████████████████████████| 23/23 [00:45<00:00,  1.96s/batch, acc=0.9117]


Eval Summary — Acc: 0.9117
Epoch 4/8  Val Acc: 0.9117


Epoch [5/8] Train: 100%|████████████████| 92/92 [02:11<00:00,  1.43s/batch, acc=0.9997, loss=0.0645]


Train Summary — Loss: 0.0645, Acc: 0.9997


Eval: 100%|██████████████████████████████████████████| 23/23 [00:43<00:00,  1.89s/batch, acc=0.9158]


Eval Summary — Acc: 0.9158
Epoch 5/8  Val Acc: 0.9158


Epoch [6/8] Train: 100%|████████████████| 92/92 [02:07<00:00,  1.38s/batch, acc=1.0000, loss=0.0455]


Train Summary — Loss: 0.0455, Acc: 1.0000


Eval: 100%|██████████████████████████████████████████| 23/23 [00:43<00:00,  1.89s/batch, acc=0.9226]


Eval Summary — Acc: 0.9226
Epoch 6/8  Val Acc: 0.9226


Epoch [7/8] Train: 100%|████████████████| 92/92 [02:10<00:00,  1.41s/batch, acc=1.0000, loss=0.0435]


Train Summary — Loss: 0.0435, Acc: 1.0000


Eval: 100%|██████████████████████████████████████████| 23/23 [00:43<00:00,  1.90s/batch, acc=0.9171]


Eval Summary — Acc: 0.9171
Epoch 7/8  Val Acc: 0.9171


Epoch [8/8] Train: 100%|████████████████| 92/92 [02:07<00:00,  1.38s/batch, acc=1.0000, loss=0.0401]


Train Summary — Loss: 0.0401, Acc: 1.0000


Eval: 100%|██████████████████████████████████████████| 23/23 [00:43<00:00,  1.90s/batch, acc=0.9198]


Eval Summary — Acc: 0.9198
Epoch 8/8  Val Acc: 0.9198


Eval: 100%|██████████████████████████████████████████| 23/23 [00:43<00:00,  1.89s/batch, acc=0.9198]


Eval Summary — Acc: 0.9198

### Fine-tuning last 3 block(s) + fc ###


Epoch [1/8] Train: 100%|████████████████| 92/92 [02:31<00:00,  1.65s/batch, acc=0.6372, loss=1.6653]


Train Summary — Loss: 1.6653, Acc: 0.6372


Eval: 100%|██████████████████████████████████████████| 23/23 [00:43<00:00,  1.88s/batch, acc=0.8614]


Eval Summary — Acc: 0.8614
Epoch 1/8  Val Acc: 0.8614


Epoch [2/8] Train: 100%|████████████████| 92/92 [02:31<00:00,  1.65s/batch, acc=0.9440, loss=0.4349]


Train Summary — Loss: 0.4349, Acc: 0.9440


Eval: 100%|██████████████████████████████████████████| 23/23 [00:43<00:00,  1.89s/batch, acc=0.8818]


Eval Summary — Acc: 0.8818
Epoch 2/8  Val Acc: 0.8818


Epoch [3/8] Train: 100%|████████████████| 92/92 [02:30<00:00,  1.63s/batch, acc=0.9939, loss=0.1770]


Train Summary — Loss: 0.1770, Acc: 0.9939


Eval: 100%|██████████████████████████████████████████| 23/23 [00:43<00:00,  1.90s/batch, acc=0.8940]


Eval Summary — Acc: 0.8940
Epoch 3/8  Val Acc: 0.8940


Epoch [4/8] Train: 100%|████████████████| 92/92 [02:28<00:00,  1.61s/batch, acc=1.0000, loss=0.0877]


Train Summary — Loss: 0.0877, Acc: 1.0000


Eval: 100%|██████████████████████████████████████████| 23/23 [00:43<00:00,  1.91s/batch, acc=0.9158]


Eval Summary — Acc: 0.9158
Epoch 4/8  Val Acc: 0.9158


Epoch [5/8] Train: 100%|████████████████| 92/92 [02:28<00:00,  1.61s/batch, acc=1.0000, loss=0.0530]


Train Summary — Loss: 0.0530, Acc: 1.0000


Eval: 100%|██████████████████████████████████████████| 23/23 [00:43<00:00,  1.89s/batch, acc=0.9130]


Eval Summary — Acc: 0.9130
Epoch 5/8  Val Acc: 0.9130


Epoch [6/8] Train: 100%|████████████████| 92/92 [02:28<00:00,  1.61s/batch, acc=1.0000, loss=0.0400]


Train Summary — Loss: 0.0400, Acc: 1.0000


Eval: 100%|██████████████████████████████████████████| 23/23 [00:43<00:00,  1.90s/batch, acc=0.9198]


Eval Summary — Acc: 0.9198
Epoch 6/8  Val Acc: 0.9198


Epoch [7/8] Train: 100%|████████████████| 92/92 [02:27<00:00,  1.60s/batch, acc=1.0000, loss=0.0405]


Train Summary — Loss: 0.0405, Acc: 1.0000


Eval: 100%|██████████████████████████████████████████| 23/23 [00:43<00:00,  1.90s/batch, acc=0.9171]


Eval Summary — Acc: 0.9171
Epoch 7/8  Val Acc: 0.9171


Epoch [8/8] Train: 100%|████████████████| 92/92 [02:30<00:00,  1.64s/batch, acc=1.0000, loss=0.0380]


Train Summary — Loss: 0.0380, Acc: 1.0000


Eval: 100%|██████████████████████████████████████████| 23/23 [00:43<00:00,  1.90s/batch, acc=0.9198]


Eval Summary — Acc: 0.9198
Epoch 8/8  Val Acc: 0.9198


Eval: 100%|██████████████████████████████████████████| 23/23 [00:43<00:00,  1.90s/batch, acc=0.9198]


Eval Summary — Acc: 0.9198

### Fine-tuning last 4 block(s) + fc ###


Epoch [1/8] Train: 100%|████████████████| 92/92 [03:06<00:00,  2.03s/batch, acc=0.6240, loss=1.7172]


Train Summary — Loss: 1.7172, Acc: 0.6240


Eval: 100%|██████████████████████████████████████████| 23/23 [00:43<00:00,  1.89s/batch, acc=0.8628]


Eval Summary — Acc: 0.8628
Epoch 1/8  Val Acc: 0.8628


Epoch [2/8] Train: 100%|████████████████| 92/92 [03:04<00:00,  2.01s/batch, acc=0.9375, loss=0.4557]


Train Summary — Loss: 0.4557, Acc: 0.9375


Eval: 100%|██████████████████████████████████████████| 23/23 [00:43<00:00,  1.90s/batch, acc=0.8804]


Eval Summary — Acc: 0.8804
Epoch 2/8  Val Acc: 0.8804


Epoch [3/8] Train: 100%|████████████████| 92/92 [03:04<00:00,  2.01s/batch, acc=0.9918, loss=0.1784]


Train Summary — Loss: 0.1784, Acc: 0.9918


Eval: 100%|██████████████████████████████████████████| 23/23 [00:43<00:00,  1.90s/batch, acc=0.9022]


Eval Summary — Acc: 0.9022
Epoch 3/8  Val Acc: 0.9022


Epoch [4/8] Train: 100%|████████████████| 92/92 [03:04<00:00,  2.00s/batch, acc=0.9990, loss=0.0891]


Train Summary — Loss: 0.0891, Acc: 0.9990


Eval: 100%|██████████████████████████████████████████| 23/23 [00:43<00:00,  1.90s/batch, acc=0.9144]


Eval Summary — Acc: 0.9144
Epoch 4/8  Val Acc: 0.9144


Epoch [5/8] Train: 100%|████████████████| 92/92 [03:04<00:00,  2.00s/batch, acc=1.0000, loss=0.0562]


Train Summary — Loss: 0.0562, Acc: 1.0000


Eval: 100%|██████████████████████████████████████████| 23/23 [00:43<00:00,  1.88s/batch, acc=0.9117]


Eval Summary — Acc: 0.9117
Epoch 5/8  Val Acc: 0.9117


Epoch [6/8] Train: 100%|████████████████| 92/92 [03:05<00:00,  2.02s/batch, acc=1.0000, loss=0.0403]


Train Summary — Loss: 0.0403, Acc: 1.0000


Eval: 100%|██████████████████████████████████████████| 23/23 [00:43<00:00,  1.89s/batch, acc=0.9158]


Eval Summary — Acc: 0.9158
Epoch 6/8  Val Acc: 0.9158


Epoch [7/8] Train: 100%|████████████████| 92/92 [03:03<00:00,  2.00s/batch, acc=1.0000, loss=0.0372]


Train Summary — Loss: 0.0372, Acc: 1.0000


Eval: 100%|██████████████████████████████████████████| 23/23 [00:43<00:00,  1.89s/batch, acc=0.9185]


Eval Summary — Acc: 0.9185
Epoch 7/8  Val Acc: 0.9185


Epoch [8/8] Train: 100%|████████████████| 92/92 [03:06<00:00,  2.02s/batch, acc=1.0000, loss=0.0389]


Train Summary — Loss: 0.0389, Acc: 1.0000


Eval: 100%|██████████████████████████████████████████| 23/23 [00:44<00:00,  1.92s/batch, acc=0.9144]


Eval Summary — Acc: 0.9144
Epoch 8/8  Val Acc: 0.9144


Eval: 100%|██████████████████████████████████████████| 23/23 [00:43<00:00,  1.89s/batch, acc=0.9144]


Eval Summary — Acc: 0.9144

### Fine-tuning last 5 block(s) + fc ###


Epoch [1/8] Train: 100%|████████████████| 92/92 [03:08<00:00,  2.05s/batch, acc=0.7147, loss=1.0246]


Train Summary — Loss: 1.0246, Acc: 0.7147


Eval: 100%|██████████████████████████████████████████| 23/23 [00:43<00:00,  1.90s/batch, acc=0.7500]


Eval Summary — Acc: 0.7500
Epoch 1/8  Val Acc: 0.7500


Epoch [2/8] Train: 100%|████████████████| 92/92 [03:04<00:00,  2.00s/batch, acc=0.9511, loss=0.1885]


Train Summary — Loss: 0.1885, Acc: 0.9511


Eval: 100%|██████████████████████████████████████████| 23/23 [00:43<00:00,  1.89s/batch, acc=0.8614]


Eval Summary — Acc: 0.8614
Epoch 2/8  Val Acc: 0.8614


Epoch [3/8] Train: 100%|████████████████| 92/92 [03:04<00:00,  2.01s/batch, acc=0.9908, loss=0.0502]


Train Summary — Loss: 0.0502, Acc: 0.9908


Eval: 100%|██████████████████████████████████████████| 23/23 [00:43<00:00,  1.90s/batch, acc=0.9022]


Eval Summary — Acc: 0.9022
Epoch 3/8  Val Acc: 0.9022


Epoch [4/8] Train: 100%|████████████████| 92/92 [03:03<00:00,  2.00s/batch, acc=0.9973, loss=0.0183]


Train Summary — Loss: 0.0183, Acc: 0.9973


Eval: 100%|██████████████████████████████████████████| 23/23 [00:43<00:00,  1.89s/batch, acc=0.9253]


Eval Summary — Acc: 0.9253
Epoch 4/8  Val Acc: 0.9253


Epoch [5/8] Train: 100%|████████████████| 92/92 [03:04<00:00,  2.00s/batch, acc=0.9986, loss=0.0117]


Train Summary — Loss: 0.0117, Acc: 0.9986


Eval: 100%|██████████████████████████████████████████| 23/23 [00:43<00:00,  1.90s/batch, acc=0.9253]


Eval Summary — Acc: 0.9253
Epoch 5/8  Val Acc: 0.9253


Epoch [6/8] Train: 100%|████████████████| 92/92 [03:09<00:00,  2.06s/batch, acc=1.0000, loss=0.0071]


Train Summary — Loss: 0.0071, Acc: 1.0000


Eval: 100%|██████████████████████████████████████████| 23/23 [00:43<00:00,  1.88s/batch, acc=0.9266]


Eval Summary — Acc: 0.9266
Epoch 6/8  Val Acc: 0.9266


Epoch [7/8] Train: 100%|████████████████| 92/92 [03:04<00:00,  2.00s/batch, acc=1.0000, loss=0.0055]


Train Summary — Loss: 0.0055, Acc: 1.0000


Eval: 100%|██████████████████████████████████████████| 23/23 [00:43<00:00,  1.89s/batch, acc=0.9266]


Eval Summary — Acc: 0.9266
Epoch 7/8  Val Acc: 0.9266


Epoch [8/8] Train: 100%|████████████████| 92/92 [03:04<00:00,  2.00s/batch, acc=1.0000, loss=0.0055]


Train Summary — Loss: 0.0055, Acc: 1.0000


Eval: 100%|██████████████████████████████████████████| 23/23 [00:43<00:00,  1.89s/batch, acc=0.9266]


Eval Summary — Acc: 0.9266
Epoch 8/8  Val Acc: 0.9266


Eval: 100%|██████████████████████████████████████████| 23/23 [00:43<00:00,  1.89s/batch, acc=0.9266]

Eval Summary — Acc: 0.9266

=== Summary of validation accuracies ===
l=1: 0.9158
l=2: 0.9198
l=3: 0.9198
l=4: 0.9144
l=5: 0.9266





In [22]:
# CELL FOUR.2: sweep over l = 6 to 8 blocks
results2 = {}
for l in range(6, 9):
    print(f"\n### Fine-tuning last {l} block(s) + fc ###")
    model = build_model().to(device)
    set_fine_tune_layers(model, l)
    params = filter(lambda p: p.requires_grad, model.parameters())
    optimizer = optim.SGD(params, lr=0.01, momentum=0.9,
                          weight_decay=1e-4, nesterov=True)
    scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.1)

    epochs = 8
    for epoch in range(1, epochs+1):
        train_loss, train_acc = train_one_epoch(model, train_loader, optimizer, epoch, epochs)
        val_acc = evaluate(model, val_loader)
        print(f"Epoch {epoch}/{epochs}  Val Acc: {val_acc:.4f}")
        scheduler.step()

    final_val = evaluate(model, val_loader)
    results2[f"l={l}"] = final_val

print("\n=== Summary of validation accuracies ===")
for k, v in results2.items():
    print(f"{k}: {v:.4f}")


### Fine-tuning last 6 block(s) + fc ###


Epoch [1/8] Train: 100%|████████████████| 92/92 [03:05<00:00,  2.01s/batch, acc=0.7167, loss=1.0214]


Train Summary — Loss: 1.0214, Acc: 0.7167


Eval: 100%|██████████████████████████████████████████| 23/23 [00:43<00:00,  1.89s/batch, acc=0.8030]


Eval Summary — Acc: 0.8030
Epoch 1/8  Val Acc: 0.8030


Epoch [2/8] Train: 100%|████████████████| 92/92 [03:06<00:00,  2.03s/batch, acc=0.9535, loss=0.1772]


Train Summary — Loss: 0.1772, Acc: 0.9535


Eval: 100%|██████████████████████████████████████████| 23/23 [00:44<00:00,  1.94s/batch, acc=0.8573]


Eval Summary — Acc: 0.8573
Epoch 2/8  Val Acc: 0.8573


Epoch [3/8] Train: 100%|████████████████| 92/92 [03:54<00:00,  2.55s/batch, acc=0.9885, loss=0.0584]


Train Summary — Loss: 0.0584, Acc: 0.9885


Eval: 100%|██████████████████████████████████████████| 23/23 [00:44<00:00,  1.94s/batch, acc=0.8804]


Eval Summary — Acc: 0.8804
Epoch 3/8  Val Acc: 0.8804


Epoch [4/8] Train: 100%|████████████████| 92/92 [03:21<00:00,  2.19s/batch, acc=0.9969, loss=0.0197]


Train Summary — Loss: 0.0197, Acc: 0.9969


Eval: 100%|██████████████████████████████████████████| 23/23 [00:43<00:00,  1.90s/batch, acc=0.9158]


Eval Summary — Acc: 0.9158
Epoch 4/8  Val Acc: 0.9158


Epoch [5/8] Train: 100%|████████████████| 92/92 [03:04<00:00,  2.01s/batch, acc=1.0000, loss=0.0081]


Train Summary — Loss: 0.0081, Acc: 1.0000


Eval: 100%|██████████████████████████████████████████| 23/23 [00:43<00:00,  1.91s/batch, acc=0.9117]


Eval Summary — Acc: 0.9117
Epoch 5/8  Val Acc: 0.9117


Epoch [6/8] Train: 100%|████████████████| 92/92 [03:07<00:00,  2.03s/batch, acc=1.0000, loss=0.0052]


Train Summary — Loss: 0.0052, Acc: 1.0000


Eval: 100%|██████████████████████████████████████████| 23/23 [00:43<00:00,  1.89s/batch, acc=0.9090]


Eval Summary — Acc: 0.9090
Epoch 6/8  Val Acc: 0.9090


Epoch [7/8] Train: 100%|████████████████| 92/92 [03:05<00:00,  2.01s/batch, acc=0.9997, loss=0.0054]


Train Summary — Loss: 0.0054, Acc: 0.9997


Eval: 100%|██████████████████████████████████████████| 23/23 [00:43<00:00,  1.88s/batch, acc=0.9144]


Eval Summary — Acc: 0.9144
Epoch 7/8  Val Acc: 0.9144


Epoch [8/8] Train: 100%|████████████████| 92/92 [03:04<00:00,  2.01s/batch, acc=0.9997, loss=0.0051]


Train Summary — Loss: 0.0051, Acc: 0.9997


Eval: 100%|██████████████████████████████████████████| 23/23 [00:43<00:00,  1.89s/batch, acc=0.9171]


Eval Summary — Acc: 0.9171
Epoch 8/8  Val Acc: 0.9171


Eval: 100%|██████████████████████████████████████████| 23/23 [00:43<00:00,  1.89s/batch, acc=0.9171]


Eval Summary — Acc: 0.9171

### Fine-tuning last 7 block(s) + fc ###


Epoch [1/8] Train: 100%|████████████████| 92/92 [03:04<00:00,  2.00s/batch, acc=0.7099, loss=1.0519]


Train Summary — Loss: 1.0519, Acc: 0.7099


Eval: 100%|██████████████████████████████████████████| 23/23 [00:43<00:00,  1.89s/batch, acc=0.8098]


Eval Summary — Acc: 0.8098
Epoch 1/8  Val Acc: 0.8098


Epoch [2/8] Train: 100%|████████████████| 92/92 [03:07<00:00,  2.04s/batch, acc=0.9514, loss=0.1759]


Train Summary — Loss: 0.1759, Acc: 0.9514


Eval: 100%|██████████████████████████████████████████| 23/23 [00:43<00:00,  1.91s/batch, acc=0.8804]


Eval Summary — Acc: 0.8804
Epoch 2/8  Val Acc: 0.8804


Epoch [3/8] Train: 100%|████████████████| 92/92 [03:46<00:00,  2.47s/batch, acc=0.9915, loss=0.0504]


Train Summary — Loss: 0.0504, Acc: 0.9915


Eval: 100%|██████████████████████████████████████████| 23/23 [00:44<00:00,  1.92s/batch, acc=0.9090]


Eval Summary — Acc: 0.9090
Epoch 3/8  Val Acc: 0.9090


Epoch [4/8] Train: 100%|████████████████| 92/92 [03:13<00:00,  2.10s/batch, acc=0.9973, loss=0.0184]


Train Summary — Loss: 0.0184, Acc: 0.9973


Eval: 100%|██████████████████████████████████████████| 23/23 [00:43<00:00,  1.90s/batch, acc=0.9158]


Eval Summary — Acc: 0.9158
Epoch 4/8  Val Acc: 0.9158


Epoch [5/8] Train: 100%|████████████████| 92/92 [03:19<00:00,  2.17s/batch, acc=0.9993, loss=0.0091]


Train Summary — Loss: 0.0091, Acc: 0.9993


Eval: 100%|██████████████████████████████████████████| 23/23 [00:43<00:00,  1.90s/batch, acc=0.9239]


Eval Summary — Acc: 0.9239
Epoch 5/8  Val Acc: 0.9239


Epoch [6/8] Train: 100%|████████████████| 92/92 [03:04<00:00,  2.01s/batch, acc=1.0000, loss=0.0055]


Train Summary — Loss: 0.0055, Acc: 1.0000


Eval: 100%|██████████████████████████████████████████| 23/23 [00:43<00:00,  1.89s/batch, acc=0.9198]


Eval Summary — Acc: 0.9198
Epoch 6/8  Val Acc: 0.9198


Epoch [7/8] Train: 100%|████████████████| 92/92 [03:05<00:00,  2.01s/batch, acc=1.0000, loss=0.0060]


Train Summary — Loss: 0.0060, Acc: 1.0000


Eval: 100%|██████████████████████████████████████████| 23/23 [00:43<00:00,  1.89s/batch, acc=0.9226]


Eval Summary — Acc: 0.9226
Epoch 7/8  Val Acc: 0.9226


Epoch [8/8] Train: 100%|████████████████| 92/92 [03:05<00:00,  2.01s/batch, acc=1.0000, loss=0.0050]


Train Summary — Loss: 0.0050, Acc: 1.0000


Eval: 100%|██████████████████████████████████████████| 23/23 [00:43<00:00,  1.88s/batch, acc=0.9293]


Eval Summary — Acc: 0.9293
Epoch 8/8  Val Acc: 0.9293


Eval: 100%|██████████████████████████████████████████| 23/23 [00:43<00:00,  1.89s/batch, acc=0.9293]


Eval Summary — Acc: 0.9293

### Fine-tuning last 8 block(s) + fc ###


Epoch [1/8] Train: 100%|████████████████| 92/92 [03:04<00:00,  2.01s/batch, acc=0.7157, loss=1.0182]


Train Summary — Loss: 1.0182, Acc: 0.7157


Eval: 100%|██████████████████████████████████████████| 23/23 [00:44<00:00,  1.92s/batch, acc=0.7758]


Eval Summary — Acc: 0.7758
Epoch 1/8  Val Acc: 0.7758


Epoch [2/8] Train: 100%|████████████████| 92/92 [03:04<00:00,  2.01s/batch, acc=0.9327, loss=0.2233]


Train Summary — Loss: 0.2233, Acc: 0.9327


Eval: 100%|██████████████████████████████████████████| 23/23 [00:43<00:00,  1.88s/batch, acc=0.8139]


Eval Summary — Acc: 0.8139
Epoch 2/8  Val Acc: 0.8139


Epoch [3/8] Train: 100%|████████████████| 92/92 [03:04<00:00,  2.01s/batch, acc=0.9823, loss=0.0726]


Train Summary — Loss: 0.0726, Acc: 0.9823


Eval: 100%|██████████████████████████████████████████| 23/23 [00:42<00:00,  1.87s/batch, acc=0.8967]


Eval Summary — Acc: 0.8967
Epoch 3/8  Val Acc: 0.8967


Epoch [4/8] Train: 100%|████████████████| 92/92 [03:19<00:00,  2.17s/batch, acc=0.9986, loss=0.0176]


Train Summary — Loss: 0.0176, Acc: 0.9986


Eval: 100%|██████████████████████████████████████████| 23/23 [00:43<00:00,  1.91s/batch, acc=0.9022]


Eval Summary — Acc: 0.9022
Epoch 4/8  Val Acc: 0.9022


Epoch [5/8] Train: 100%|████████████████| 92/92 [03:30<00:00,  2.29s/batch, acc=0.9997, loss=0.0086]


Train Summary — Loss: 0.0086, Acc: 0.9997


Eval: 100%|██████████████████████████████████████████| 23/23 [00:43<00:00,  1.89s/batch, acc=0.9090]


Eval Summary — Acc: 0.9090
Epoch 5/8  Val Acc: 0.9090


Epoch [6/8] Train: 100%|████████████████| 92/92 [03:31<00:00,  2.30s/batch, acc=1.0000, loss=0.0059]


Train Summary — Loss: 0.0059, Acc: 1.0000


Eval: 100%|██████████████████████████████████████████| 23/23 [00:43<00:00,  1.89s/batch, acc=0.9144]


Eval Summary — Acc: 0.9144
Epoch 6/8  Val Acc: 0.9144


Epoch [7/8] Train: 100%|████████████████| 92/92 [03:28<00:00,  2.26s/batch, acc=1.0000, loss=0.0052]


Train Summary — Loss: 0.0052, Acc: 1.0000


Eval: 100%|██████████████████████████████████████████| 23/23 [00:43<00:00,  1.90s/batch, acc=0.9103]


Eval Summary — Acc: 0.9103
Epoch 7/8  Val Acc: 0.9103


Epoch [8/8] Train: 100%|████████████████| 92/92 [03:26<00:00,  2.25s/batch, acc=1.0000, loss=0.0042]


Train Summary — Loss: 0.0042, Acc: 1.0000


Eval: 100%|██████████████████████████████████████████| 23/23 [00:43<00:00,  1.89s/batch, acc=0.9117]


Eval Summary — Acc: 0.9117
Epoch 8/8  Val Acc: 0.9117


Eval: 100%|██████████████████████████████████████████| 23/23 [00:43<00:00,  1.88s/batch, acc=0.9117]

Eval Summary — Acc: 0.9117

=== Summary of validation accuracies ===
l=6: 0.9171
l=7: 0.9293
l=8: 0.9117





In [24]:
# CELL FIVE: pick best l, retrain on full train/val, then test
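best_l1 = max(results, key=results.get)
best_l2 = max(results2, key=results2.get)
# compare the two candidates by validation accuracy, not by their key strings
best_l = best_l1 if results[best_l1] >= results2[best_l2] else best_l2
l_star = int(best_l.split('=')[1])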
print(f"Retraining with l* = {l_star}")

# rebuild & freeze/unfreeze
model = build_model().to(device)
set_fine_tune_layers(model, l_star)

# new optimizer + scheduler
params    = filter(lambda p: p.requires_grad, model.parameters())
optimizer = optim.SGD(params, lr=0.01, momentum=0.9,
                      weight_decay=1e-4, nesterov=True)
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.1)

# full trainval loader
full_loader = DataLoader(full_trainval, batch_size=batch_size,
                         shuffle=True, num_workers=4)

# train for N epochs with proper args to train_one_epoch
epochs = 8
for epoch in range(1, epochs+1):
    train_loss, train_acc = train_one_epoch(
        model, full_loader, optimizer,
        epoch, epochs
    )
    scheduler.step()

# final evaluation on test set
test_acc = evaluate(model, test_loader)
print(f"Final test accuracy: {test_acc:.4f}")

Retraining with l* = 7


Epoch [1/8] Train: 100%|██████████████| 115/115 [03:46<00:00,  1.97s/batch, acc=0.7372, loss=0.9340]


Train Summary — Loss: 0.9340, Acc: 0.7372


Epoch [2/8] Train: 100%|██████████████| 115/115 [03:46<00:00,  1.97s/batch, acc=0.9473, loss=0.1897]


Train Summary — Loss: 0.1897, Acc: 0.9473


Epoch [3/8] Train: 100%|██████████████| 115/115 [03:48<00:00,  1.99s/batch, acc=0.9837, loss=0.0681]


Train Summary — Loss: 0.0681, Acc: 0.9837


Epoch [4/8] Train: 100%|██████████████| 115/115 [03:50<00:00,  2.00s/batch, acc=0.9943, loss=0.0309]


Train Summary — Loss: 0.0309, Acc: 0.9943


Epoch [5/8] Train: 100%|██████████████| 115/115 [03:51<00:00,  2.01s/batch, acc=0.9997, loss=0.0105]


Train Summary — Loss: 0.0105, Acc: 0.9997


Epoch [6/8] Train: 100%|██████████████| 115/115 [03:46<00:00,  1.97s/batch, acc=0.9992, loss=0.0076]


Train Summary — Loss: 0.0076, Acc: 0.9992


Epoch [7/8] Train: 100%|██████████████| 115/115 [03:47<00:00,  1.98s/batch, acc=1.0000, loss=0.0047]


Train Summary — Loss: 0.0047, Acc: 1.0000


Epoch [8/8] Train: 100%|██████████████| 115/115 [03:43<00:00,  1.94s/batch, acc=1.0000, loss=0.0041]


Train Summary — Loss: 0.0041, Acc: 1.0000


Eval: 100%|████████████████████████████████████████| 115/115 [01:59<00:00,  1.04s/batch, acc=0.8967]

Eval Summary — Acc: 0.8967
Final test accuracy: 0.8967
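
The "pick best l" step can be looked at in isolation. Below is a minimal sketch, assuming the sweep results are dictionaries keyed by strings like `"l=7"`. The second dictionary copies the validation summary printed above; the first one holds hypothetical values, and `best_setting` is an illustrative helper, not a function from this notebook.

```python
# Validation accuracies keyed by "l=<blocks>".
results = {"l=1": 0.8800, "l=2": 0.9000}                   # hypothetical values
results2 = {"l=6": 0.9171, "l=7": 0.9293, "l=8": 0.9117}   # from the summary above

def best_setting(*result_dicts):
    """Return the (key, accuracy) pair with the highest accuracy overall."""
    merged = {}
    for d in result_dicts:
        merged.update(d)          # assumes the sweeps do not share keys
    best_key = max(merged, key=merged.get)
    return best_key, merged[best_key]

best_key, best_acc = best_setting(results, results2)
l_star = int(best_key.split("=")[1])
print(l_star, best_acc)

# Comparing the key strings directly is lexicographic, which can mislead:
print(max("l=10", "l=9"))   # "l=9", because "9" sorts after "1"
```

Selecting on the accuracy values keeps the choice independent of how the keys happen to sort.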





### Strategy 2: Gradual unfreezing

Gradual unfreezing is similar to Strategy 1 in that deeper layers get fine-tuned, but instead of unfreezing them all at once, the blocks are unfrozen in phases. The weights carry over from one phase to the next, so in theory the well-trained features are retained while the newly unfrozen layers adapt.

```
Eval: 100%|████████████████████████████████████████| 115/115 [01:57<00:00,  1.02s/batch, acc=0.8910]
Eval Summary — Acc: 0.8910
```

The model ends up on par with Strategy 1 (0.8910 vs. 0.8967 test accuracy). This is likely because the two strategies are so similar, and because the choice makes little difference for a small model like ResNet18, which can only capture so much detail. We will switch to a bigger model later.
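
To make the phase idea concrete, here is a rough sketch of which blocks would be trainable in each epoch of a gradual schedule. It assumes the convention used by the `set_fine_tune_layers` helper defined later in this notebook (the `fc` head is always trainable, plus the last `l` conv blocks); the cell below may count blocks differently, and the phase values here are hypothetical.

```python
# Conv blocks ordered from the output side inward, as in the later helper.
CONV_BLOCKS = ["layer4", "layer3", "layer2", "layer1"]

def trainable_blocks(l):
    """Names of the blocks that would have requires_grad=True for a given l."""
    return ["fc"] + CONV_BLOCKS[:l]

# (blocks to unfreeze, epochs) pairs; hypothetical values for illustration
phases = [(0, 2), (1, 2), (2, 1)]

schedule = []
for l, epochs in phases:
    for _ in range(epochs):
        schedule.append(trainable_blocks(l))

for epoch, blocks in enumerate(schedule, start=1):
    print(epoch, blocks)
```

The key point is that the same model object is reused across phases, so each phase starts from the weights the previous one ended with.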

In [25]:
# CELL ONE: Gradual unfreezing
model = build_model().to(device)
criterion = nn.CrossEntropyLoss()

# phases = (how many blocks to unfreeze, epochs for this phase)
# blocks are counted as [layer4, layer3, layer2, layer1, fc]
phases = [
    (1, 5),   # fc only
    (2, 4),   # + layer4
    (3, 3),   # + layer3
    (4, 3),   # + layer2
    (5, 3),   # + layer1 (all convs + fc)
]

for l, epochs in phases:
    print(f"\n--- Phase: unfreeze last {l} block(s) + fc; train {epochs} epochs ---")
    set_fine_tune_layers(model, l)
    
    optimizer = optim.SGD(
        filter(lambda p: p.requires_grad, model.parameters()),
        lr=0.005, momentum=0.9,
        weight_decay=1e-4, nesterov=True
    )
    scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=max(1, epochs//2), gamma=0.1)
    
    for epoch in range(1, epochs+1):
        train_one_epoch(model, train_loader, optimizer, epoch, epochs)
        val_acc = evaluate(model, val_loader)
        scheduler.step()
    # note: val_acc holds the last epoch's accuracy, not the maximum over the phase
    print(f"[End Phase l={l}]  Best Val Acc this phase: {val_acc:.4f}")

test_acc = evaluate(model, test_loader)
print(f"\nFinal test accuracy after gradual unfreeze: {test_acc:.4f}")


--- Phase: unfreeze last 1 block(s) + fc; train 5 epochs ---


Epoch [1/5] Train: 100%|████████████████| 92/92 [01:54<00:00,  1.25s/batch, acc=0.5272, loss=2.2453]


Train Summary — Loss: 2.2453, Acc: 0.5272


Eval: 100%|██████████████████████████████████████████| 23/23 [00:44<00:00,  1.91s/batch, acc=0.7962]


Eval Summary — Acc: 0.7962


Epoch [2/5] Train: 100%|████████████████| 92/92 [01:54<00:00,  1.24s/batch, acc=0.8923, loss=0.8768]


Train Summary — Loss: 0.8768, Acc: 0.8923


Eval: 100%|██████████████████████████████████████████| 23/23 [00:43<00:00,  1.89s/batch, acc=0.8777]


Eval Summary — Acc: 0.8777


Epoch [3/5] Train: 100%|████████████████| 92/92 [01:54<00:00,  1.24s/batch, acc=0.9382, loss=0.6008]


Train Summary — Loss: 0.6008, Acc: 0.9382


Eval: 100%|██████████████████████████████████████████| 23/23 [00:44<00:00,  1.96s/batch, acc=0.8804]


Eval Summary — Acc: 0.8804


Epoch [4/5] Train: 100%|████████████████| 92/92 [01:54<00:00,  1.24s/batch, acc=0.9531, loss=0.5561]


Train Summary — Loss: 0.5561, Acc: 0.9531


Eval: 100%|██████████████████████████████████████████| 23/23 [00:43<00:00,  1.88s/batch, acc=0.8845]


Eval Summary — Acc: 0.8845


Epoch [5/5] Train: 100%|████████████████| 92/92 [01:53<00:00,  1.24s/batch, acc=0.9511, loss=0.5458]


Train Summary — Loss: 0.5458, Acc: 0.9511


Eval: 100%|██████████████████████████████████████████| 23/23 [00:43<00:00,  1.89s/batch, acc=0.8872]


Eval Summary — Acc: 0.8872
[End Phase l=1]  Best Val Acc this phase: 0.8872

--- Phase: unfreeze last 2 block(s) + fc; train 4 epochs ---


Epoch [1/4] Train: 100%|████████████████| 92/92 [02:07<00:00,  1.39s/batch, acc=0.9480, loss=0.4994]


Train Summary — Loss: 0.4994, Acc: 0.9480


Eval: 100%|██████████████████████████████████████████| 23/23 [00:43<00:00,  1.89s/batch, acc=0.8927]


Eval Summary — Acc: 0.8927


Epoch [2/4] Train: 100%|████████████████| 92/92 [02:06<00:00,  1.37s/batch, acc=0.9813, loss=0.3075]


Train Summary — Loss: 0.3075, Acc: 0.9813


Eval: 100%|██████████████████████████████████████████| 23/23 [00:43<00:00,  1.89s/batch, acc=0.9103]


Eval Summary — Acc: 0.9103


Epoch [3/4] Train: 100%|████████████████| 92/92 [02:05<00:00,  1.37s/batch, acc=0.9946, loss=0.2057]


Train Summary — Loss: 0.2057, Acc: 0.9946


Eval: 100%|██████████████████████████████████████████| 23/23 [00:43<00:00,  1.89s/batch, acc=0.9049]


Eval Summary — Acc: 0.9049


Epoch [4/4] Train: 100%|████████████████| 92/92 [02:06<00:00,  1.37s/batch, acc=0.9963, loss=0.1970]


Train Summary — Loss: 0.1970, Acc: 0.9963


Eval: 100%|██████████████████████████████████████████| 23/23 [00:43<00:00,  1.89s/batch, acc=0.9117]


Eval Summary — Acc: 0.9117
[End Phase l=2]  Best Val Acc this phase: 0.9117

--- Phase: unfreeze last 3 block(s) + fc; train 3 epochs ---


Epoch [1/3] Train: 100%|████████████████| 92/92 [02:27<00:00,  1.60s/batch, acc=0.9932, loss=0.1888]


Train Summary — Loss: 0.1888, Acc: 0.9932


Eval: 100%|██████████████████████████████████████████| 23/23 [00:43<00:00,  1.88s/batch, acc=0.9035]


Eval Summary — Acc: 0.9035


Epoch [2/3] Train: 100%|████████████████| 92/92 [02:29<00:00,  1.62s/batch, acc=0.9993, loss=0.1302]


Train Summary — Loss: 0.1302, Acc: 0.9993


Eval: 100%|██████████████████████████████████████████| 23/23 [00:43<00:00,  1.91s/batch, acc=0.9158]


Eval Summary — Acc: 0.9158


Epoch [3/3] Train: 100%|████████████████| 92/92 [02:38<00:00,  1.72s/batch, acc=1.0000, loss=0.1238]


Train Summary — Loss: 0.1238, Acc: 1.0000


Eval: 100%|██████████████████████████████████████████| 23/23 [00:43<00:00,  1.90s/batch, acc=0.9144]


Eval Summary — Acc: 0.9144
[End Phase l=3]  Best Val Acc this phase: 0.9144

--- Phase: unfreeze last 4 block(s) + fc; train 3 epochs ---


Epoch [1/3] Train: 100%|████████████████| 92/92 [03:17<00:00,  2.15s/batch, acc=0.9983, loss=0.1241]


Train Summary — Loss: 0.1241, Acc: 0.9983


Eval: 100%|██████████████████████████████████████████| 23/23 [00:43<00:00,  1.89s/batch, acc=0.9103]


Eval Summary — Acc: 0.9103


Epoch [2/3] Train: 100%|████████████████| 92/92 [03:03<00:00,  2.00s/batch, acc=1.0000, loss=0.0904]


Train Summary — Loss: 0.0904, Acc: 1.0000


Eval: 100%|██████████████████████████████████████████| 23/23 [00:43<00:00,  1.89s/batch, acc=0.9239]


Eval Summary — Acc: 0.9239


Epoch [3/3] Train: 100%|████████████████| 92/92 [03:04<00:00,  2.01s/batch, acc=0.9997, loss=0.0902]


Train Summary — Loss: 0.0902, Acc: 0.9997


Eval: 100%|██████████████████████████████████████████| 23/23 [00:43<00:00,  1.89s/batch, acc=0.9212]


Eval Summary — Acc: 0.9212
[End Phase l=4]  Best Val Acc this phase: 0.9212

--- Phase: unfreeze last 5 block(s) + fc; train 3 epochs ---


Epoch [1/3] Train: 100%|████████████████| 92/92 [03:02<00:00,  1.99s/batch, acc=1.0000, loss=0.0604]


Train Summary — Loss: 0.0604, Acc: 1.0000


Eval: 100%|██████████████████████████████████████████| 23/23 [00:43<00:00,  1.89s/batch, acc=0.9239]


Eval Summary — Acc: 0.9239


Epoch [2/3] Train: 100%|████████████████| 92/92 [03:06<00:00,  2.02s/batch, acc=1.0000, loss=0.0294]


Train Summary — Loss: 0.0294, Acc: 1.0000


Eval: 100%|██████████████████████████████████████████| 23/23 [00:43<00:00,  1.91s/batch, acc=0.9226]


Eval Summary — Acc: 0.9226


Epoch [3/3] Train: 100%|████████████████| 92/92 [03:04<00:00,  2.00s/batch, acc=1.0000, loss=0.0279]


Train Summary — Loss: 0.0279, Acc: 1.0000


Eval: 100%|██████████████████████████████████████████| 23/23 [00:43<00:00,  1.88s/batch, acc=0.9185]


Eval Summary — Acc: 0.9185
[End Phase l=5]  Best Val Acc this phase: 0.9185


Eval: 100%|████████████████████████████████████████| 115/115 [01:57<00:00,  1.02s/batch, acc=0.8910]

Eval Summary — Acc: 0.8910

Final test accuracy after gradual unfreeze: 0.8910





### Mixing strategies + extra tweaks

Now that we've seen Strategy 1 and Strategy 2, it seems unlikely that ResNet18 will get close to 95% accuracy, but maybe we can improve on the 89% test accuracy by mixing strategies. Here we use data augmentation along with learning-rate scheduling and separate learning rates for the head and the backbone, so that training is less likely to settle in a poor local minimum. Augmentation can be toggled with the `AUGMENT` flag to compare; in our tests it gave about a 5% bump over no augmentation. Layer freezing was kept as well, since the model seemed to overfit less with it.
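
One of the learning-rate options used here is cosine annealing (the `SCHEDULER = 'cosine'` setting in the hyperparameter cell). As a rough illustration of the shape it produces, here is the closed-form cosine curve in plain Python. This is a sketch of the formula only, under the assumption that the scheduler is stepped once per epoch; `cosine_lr` is an illustrative helper, not part of PyTorch or this notebook.

```python
import math

def cosine_lr(base_lr, epoch, t_max, eta_min=0.0):
    """Closed-form cosine annealing: base_lr at epoch 0, eta_min at epoch t_max."""
    return eta_min + 0.5 * (base_lr - eta_min) * (1 + math.cos(math.pi * epoch / t_max))

head_lr, epochs = 1e-2, 5   # head learning rate and a phase length from this section
for epoch in range(epochs + 1):
    print(epoch, f"{cosine_lr(head_lr, epoch, epochs):.5f}")
```

The rate falls slowly at first, fastest in the middle, and flattens out near zero at the end of the phase.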

Training was starting to take a long time, so early stopping was added to end a run once validation accuracy stops improving. The first code cell lists the hyperparameters that seemed to work best; there are very likely better ones.
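
The early-stopping rule can be traced on a list of validation accuracies without training anything. The sketch below mirrors the patience / minimum-delta logic of the `EarlyStopping` class defined later in this section; `early_stop_epoch` is an illustrative helper and the accuracy curve is hypothetical.

```python
def early_stop_epoch(val_accs, patience=3, min_delta=1e-4):
    """Return the 1-based epoch at which training would stop, or None."""
    best, count = 0.0, 0
    for epoch, acc in enumerate(val_accs, start=1):
        if acc - best > min_delta:
            best, count = acc, 0      # meaningful improvement: reset the counter
        else:
            count += 1                # no meaningful improvement this epoch
            if count > patience:
                return epoch
    return None

# Hypothetical validation curve that plateaus after epoch 3
curve = [0.78, 0.83, 0.86, 0.86, 0.85, 0.86, 0.855, 0.86]
print(early_stop_epoch(curve))
```

With `patience=3` this curve stops at epoch 7: because the check is `count > patience`, the run ends on the fourth consecutive epoch without improvement, not the third.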

```
Eval: 100%|██████████████████████████████████████████| 115/115 [01:58<00:00,  1.03s/it, acc=0.882]
Eval → acc 0.8817
```

Hmmm... not the result we were looking for: 88.2% on the test set, slightly below both earlier strategies. It seems ResNet18 may be learning about as much as it can. We will switch to a larger model now and see how the results improve.

In [44]:
# CELL ONE: Imports & Hyperparameters
import torch, torch.nn as nn, torch.optim as optim
from torch.utils.data import DataLoader, random_split
from torchvision import transforms
from torchvision.datasets import OxfordIIITPet
from torchvision.models import resnet18, ResNet18_Weights
from tqdm import tqdm

# device & seed
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
torch.manual_seed(42)

# Hyperparams
NUM_CLASSES    = 37
BATCH_SIZE     = 32
TRAINVAL_SPLIT = 0.8       # train/val split
PATIENCE       = 3         # early-stopping
MIN_DELTA      = 1e-4
WEIGHT_DECAY   = 1e-4      # L2
AUGMENT        = True      # toggle strong aug
FT_BN          = True      # fine-tune BatchNorm
SCHEDULER      = 'cosine'  # 'step' or 'cosine'
BASE_LR        = 1e-4      # for conv layers
HEAD_LR        = 1e-2      # for fc / newly unfrozen
STEP_SIZE      = 5         # for StepLR
GAMMA          = 0.1       # for StepLR

In [45]:
# CELL TWO: Data & Augmentations
if AUGMENT:
    train_transform = transforms.Compose([
        transforms.RandomResizedCrop(224, scale=(0.8,1.0)),
        transforms.RandomHorizontalFlip(),
        transforms.RandomRotation(15),
        transforms.ColorJitter(0.2,0.2,0.2,0.1),
        transforms.ToTensor(),
        transforms.Normalize([0.485,0.456,0.406],[0.229,0.224,0.225]),
    ])
else:
    train_transform = transforms.Compose([
        transforms.Resize(224),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize([0.485,0.456,0.406],[0.229,0.224,0.225]),
    ])

val_test_transform = transforms.Compose([
    transforms.Resize(224),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize([0.485,0.456,0.406],[0.229,0.224,0.225]),
])

# load full trainval and test
full = OxfordIIITPet('data', split='trainval', target_types='category',
                     transform=train_transform, download=True)
test = OxfordIIITPet('data', split='test',    target_types='category',
                     transform=val_test_transform, download=True)

# train/val split
n_train = int(len(full) * TRAINVAL_SPLIT)
n_val   = len(full) - n_train
train_ds, val_ds = random_split(full, [n_train, n_val])

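# Both subsets wrap the same `full` dataset object, so setting
# `.dataset.transform` on each would leave train AND val with the val transform.
# Keep `full` (train_transform) for training and give the val subset its own copy.
val_ds.dataset = OxfordIIITPet('data', split='trainval', target_types='category',
                               transform=val_test_transform, download=True)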

train_loader = DataLoader(train_ds, batch_size=BATCH_SIZE, shuffle=True,  num_workers=4)
val_loader   = DataLoader(val_ds,   batch_size=BATCH_SIZE, shuffle=False, num_workers=4)
test_loader  = DataLoader(test,     batch_size=BATCH_SIZE, shuffle=False, num_workers=4)

In [46]:
# CELL THREE: Model Builder & BN Toggle 
def build_model(dropout_p=0.5):
    model = resnet18(weights=ResNet18_Weights.DEFAULT)
    nf = model.fc.in_features
    model.fc = nn.Sequential(
        nn.Dropout(dropout_p),
        nn.Linear(nf, NUM_CLASSES)
    )
    return model.to(device)

def set_bn_mode(model, fine_tune_bn: bool):
    # if False: freeze BN (use ImageNet stats)
    # if True: allow update of running stats & affine params
    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d):
            m.train() if fine_tune_bn else m.eval()
            for p in m.parameters():
                p.requires_grad = fine_tune_bn

In [47]:
# CELL FOUR: Layer Freezing Helpers 
def set_fine_tune_layers(model, l):
    # freeze everything
    for p in model.parameters():
        p.requires_grad = False

    # unfreeze the classifier head
    for p in model.fc.parameters():
        p.requires_grad = True

    # unfreeze the last l conv‐blocks
    conv_blocks = ['layer4','layer3','layer2','layer1']
    for block in conv_blocks[:l]:
        for p in getattr(model, block).parameters():
            p.requires_grad = True

    # BatchNorm
    set_bn_mode(model, FT_BN)

In [48]:
# CELL FIVE: Optimizer + Scheduler Factory 
from torch.optim.lr_scheduler import StepLR, CosineAnnealingLR

def make_optimizer_and_scheduler(model, epochs):
    param_groups = []
    for name, param in model.named_parameters():
        if not param.requires_grad:
            continue
        lr = HEAD_LR if 'fc' in name else BASE_LR
        param_groups.append({'params': [param], 'lr': lr, 'weight_decay': WEIGHT_DECAY})

    opt = optim.SGD(param_groups, momentum=0.9, nesterov=True)
    if SCHEDULER=='step':
        sched = StepLR(opt, step_size=STEP_SIZE, gamma=GAMMA)
    else:
        sched = CosineAnnealingLR(opt, T_max=epochs)
    return opt, sched

In [49]:
# CELL SIX: Train/Eval + EarlyStopping 
criterion = nn.CrossEntropyLoss()

class EarlyStopping:
    def __init__(self, patience=PATIENCE, min_delta=MIN_DELTA):
        self.patience, self.min_delta = patience, min_delta
        self.best, self.count      = 0, 0

    def step(self, val_acc):
        if val_acc - self.best > self.min_delta:
            self.best, self.count = val_acc, 0
            return False
        else:
            self.count += 1
            return self.count > self.patience

def train_one_epoch(model, loader, optimizer, epoch, epochs):
    model.train()
    total_loss, total_correct, seen = 0.0, 0, 0
    loop = tqdm(loader, desc=f"Ep [{epoch}/{epochs}] Train", ncols=100)
    for imgs, lbls in loop:
        imgs, lbls = imgs.to(device), lbls.to(device)
        optimizer.zero_grad()
        out = model(imgs)
        loss = criterion(out, lbls)
        loss.backward()
        optimizer.step()

        bs = imgs.size(0)
        seen += bs
        total_loss   += loss.item()*bs
        total_correct+= (out.argmax(1)==lbls).sum().item()
        loop.set_postfix(loss=total_loss/seen, acc=total_correct/seen)

    avg_l = total_loss/seen
    avg_a = total_correct/seen
    tqdm.write(f"Train → loss {avg_l:.4f}, acc {avg_a:.4f}")
    return avg_l, avg_a

def evaluate(model, loader):
    model.eval()
    correct, total = 0, 0
    loop = tqdm(loader, desc="  Eval", ncols=100)
    with torch.no_grad():
        for imgs, lbls in loop:
            imgs, lbls = imgs.to(device), lbls.to(device)
            out = model(imgs)
            correct += (out.argmax(1)==lbls).sum().item()
            total   += lbls.size(0)
            loop.set_postfix(acc=correct/total)

    acc = correct/total
    tqdm.write(f"Eval → acc {acc:.4f}")
    return acc

In [50]:
# CELL SEVEN: Experiments
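# Each phase below rebuilds the model, so these are independent runs per l
# rather than one gradually unfrozen model as in Strategy 2.
# NOTE: the transforms were already built in CELL TWO, so changing AUGMENT
# here has no effect unless that cell is re-run.
AUGMENT = False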

# phases = (strategy,  blocks to unfreeze, epochs)
phases = [
    ('gradual', 1, 5),  
    ('gradual', 2, 4),
    ('gradual', 3, 3),
    ('gradual', 4, 3),
    ('gradual', 5, 3),
    ('all',     5, 5),  # full all-5 
]

results = {}
for strat, l, epochs in phases:
    print(f"\n### {strat.upper():8s} → unfreeze {l:>1} block(s), epochs={epochs}")
    model = build_model()
    set_fine_tune_layers(model, l)
    opt, sched = make_optimizer_and_scheduler(model, epochs)

    stopper = EarlyStopping()
    best_val = 0.0

    for ep in range(1, epochs+1):
        train_one_epoch(model, train_loader, opt, ep, epochs)
        val_acc = evaluate(model, val_loader)
        sched.step()
        if val_acc > best_val:
            best_val = val_acc
        if stopper.step(val_acc):
            print(f"→ Early stop at epoch {ep}")
            break

    results[f"{strat}_l{l}"] = best_val

# final test on best
best_key = max(results, key=results.get)
print("\nBest config:", best_key, "⇒ val acc =", results[best_key])

# note: `model` is whichever config was trained last (here all_l5, which is also the best)
test_acc = evaluate(model, test_loader)
print("Test accuracy:", test_acc)


### GRADUAL  → unfreeze 1 block(s), epochs=5


Ep [1/5] Train: 100%|█████████████████████████| 92/92 [02:55<00:00,  1.90s/it, acc=0.467, loss=1.89]


Train → loss 1.8877, acc 0.4674


  Eval: 100%|████████████████████████████████████████████| 23/23 [00:43<00:00,  1.90s/it, acc=0.772]


Eval → acc 0.7717


Ep [2/5] Train: 100%|████████████████████████| 92/92 [02:54<00:00,  1.90s/it, acc=0.693, loss=0.981]


Train → loss 0.9810, acc 0.6929


  Eval: 100%|████████████████████████████████████████████| 23/23 [00:43<00:00,  1.91s/it, acc=0.826]


Eval → acc 0.8261


Ep [3/5] Train: 100%|████████████████████████| 92/92 [02:54<00:00,  1.90s/it, acc=0.737, loss=0.808]


Train → loss 0.8075, acc 0.7371


  Eval: 100%|████████████████████████████████████████████| 23/23 [00:43<00:00,  1.89s/it, acc=0.823]


Eval → acc 0.8234


Ep [4/5] Train: 100%|████████████████████████| 92/92 [02:54<00:00,  1.89s/it, acc=0.758, loss=0.744]


Train → loss 0.7443, acc 0.7578


  Eval: 100%|████████████████████████████████████████████| 23/23 [00:43<00:00,  1.91s/it, acc=0.855]


Eval → acc 0.8546


Ep [5/5] Train: 100%|████████████████████████| 92/92 [02:54<00:00,  1.90s/it, acc=0.787, loss=0.651]


Train → loss 0.6511, acc 0.7867


  Eval: 100%|████████████████████████████████████████████| 23/23 [00:43<00:00,  1.91s/it, acc=0.853]


Eval → acc 0.8533

### GRADUAL  → unfreeze 2 block(s), epochs=4


Ep [1/4] Train: 100%|█████████████████████████| 92/92 [03:00<00:00,  1.96s/it, acc=0.479, loss=1.86]


Train → loss 1.8569, acc 0.4786


  Eval: 100%|█████████████████████████████████████████████| 23/23 [00:43<00:00,  1.90s/it, acc=0.78]


Eval → acc 0.7799


Ep [2/4] Train: 100%|██████████████████████████| 92/92 [02:59<00:00,  1.95s/it, acc=0.716, loss=0.9]


Train → loss 0.9001, acc 0.7157


  Eval: 100%|████████████████████████████████████████████| 23/23 [00:43<00:00,  1.90s/it, acc=0.823]


Eval → acc 0.8234


Ep [3/4] Train: 100%|████████████████████████| 92/92 [02:59<00:00,  1.95s/it, acc=0.746, loss=0.781]


Train → loss 0.7807, acc 0.7463


  Eval: 100%|████████████████████████████████████████████| 23/23 [00:43<00:00,  1.89s/it, acc=0.848]


Eval → acc 0.8478


Ep [4/4] Train: 100%|████████████████████████| 92/92 [03:00<00:00,  1.96s/it, acc=0.785, loss=0.655]


Train → loss 0.6552, acc 0.7850


  Eval: 100%|████████████████████████████████████████████| 23/23 [00:43<00:00,  1.90s/it, acc=0.859]


Eval → acc 0.8587

### GRADUAL  → unfreeze 3 block(s), epochs=3


Ep [1/3] Train: 100%|█████████████████████████| 92/92 [03:13<00:00,  2.11s/it, acc=0.475, loss=1.85]


Train → loss 1.8532, acc 0.4749


  Eval: 100%|████████████████████████████████████████████| 23/23 [00:43<00:00,  1.90s/it, acc=0.792]


Eval → acc 0.7921


Ep [2/3] Train: 100%|████████████████████████| 92/92 [03:13<00:00,  2.10s/it, acc=0.721, loss=0.892]


Train → loss 0.8917, acc 0.7208


  Eval: 100%|████████████████████████████████████████████| 23/23 [00:43<00:00,  1.88s/it, acc=0.837]


Eval → acc 0.8370


Ep [3/3] Train: 100%|████████████████████████| 92/92 [03:04<00:00,  2.00s/it, acc=0.779, loss=0.714]


Train → loss 0.7137, acc 0.7789


  Eval: 100%|████████████████████████████████████████████| 23/23 [00:43<00:00,  1.90s/it, acc=0.856]


Eval → acc 0.8560

### GRADUAL  → unfreeze 4 block(s), epochs=3


Ep [1/3] Train: 100%|█████████████████████████| 92/92 [03:13<00:00,  2.11s/it, acc=0.487, loss=1.85]


Train → loss 1.8460, acc 0.4868


  Eval: 100%|████████████████████████████████████████████| 23/23 [00:43<00:00,  1.89s/it, acc=0.776]


Eval → acc 0.7758


Ep [2/3] Train: 100%|████████████████████████| 92/92 [03:14<00:00,  2.11s/it, acc=0.704, loss=0.921]


Train → loss 0.9210, acc 0.7041


  Eval: 100%|████████████████████████████████████████████| 23/23 [00:43<00:00,  1.90s/it, acc=0.826]


Eval → acc 0.8261


Ep [3/3] Train: 100%|█████████████████████████| 92/92 [03:14<00:00,  2.12s/it, acc=0.764, loss=0.75]


Train → loss 0.7502, acc 0.7643


  Eval: 100%|████████████████████████████████████████████| 23/23 [00:43<00:00,  1.90s/it, acc=0.853]


Eval → acc 0.8533

### GRADUAL  → unfreeze 5 block(s), epochs=3


Ep [1/3] Train: 100%|██████████████████████████| 92/92 [03:17<00:00,  2.15s/it, acc=0.461, loss=1.9]


Train → loss 1.9003, acc 0.4606


  Eval: 100%|████████████████████████████████████████████| 23/23 [00:43<00:00,  1.89s/it, acc=0.803]


Eval → acc 0.8030


Ep [2/3] Train: 100%|████████████████████████| 92/92 [03:13<00:00,  2.10s/it, acc=0.711, loss=0.886]


Train → loss 0.8863, acc 0.7109


  Eval: 100%|████████████████████████████████████████████| 23/23 [00:43<00:00,  1.89s/it, acc=0.829]


Eval → acc 0.8288


Ep [3/3] Train: 100%|████████████████████████| 92/92 [03:33<00:00,  2.32s/it, acc=0.756, loss=0.729]


Train → loss 0.7287, acc 0.7565


  Eval: 100%|████████████████████████████████████████████| 23/23 [02:01<00:00,  5.29s/it, acc=0.848]


Eval → acc 0.8478

### ALL      → unfreeze 5 block(s), epochs=5


Ep [1/5] Train: 100%|█████████████████████████| 92/92 [31:33<00:00, 20.58s/it, acc=0.474, loss=1.85]


Train → loss 1.8531, acc 0.4738


  Eval: 100%|████████████████████████████████████████████| 23/23 [00:43<00:00,  1.88s/it, acc=0.781]


Eval → acc 0.7812


Ep [2/5] Train: 100%|█████████████████████████| 92/92 [03:28<00:00,  2.27s/it, acc=0.71, loss=0.901]


Train → loss 0.9007, acc 0.7099


  Eval: 100%|████████████████████████████████████████████| 23/23 [00:43<00:00,  1.90s/it, acc=0.819]


Eval → acc 0.8193


Ep [3/5] Train: 100%|████████████████████████| 92/92 [03:13<00:00,  2.11s/it, acc=0.758, loss=0.754]


Train → loss 0.7542, acc 0.7585


  Eval: 100%|█████████████████████████████████████████████| 23/23 [00:43<00:00,  1.89s/it, acc=0.84]


Eval → acc 0.8397


Ep [4/5] Train: 100%|████████████████████████| 92/92 [03:13<00:00,  2.10s/it, acc=0.782, loss=0.673]


Train → loss 0.6728, acc 0.7823


  Eval: 100%|█████████████████████████████████████████████| 23/23 [00:43<00:00,  1.90s/it, acc=0.88]


Eval → acc 0.8804


Ep [5/5] Train: 100%|████████████████████████| 92/92 [03:13<00:00,  2.10s/it, acc=0.813, loss=0.574]


Train → loss 0.5737, acc 0.8132


  Eval: 100%|████████████████████████████████████████████| 23/23 [00:43<00:00,  1.89s/it, acc=0.885]


Eval → acc 0.8845

Best config: all_l5 ⇒ val acc = 0.8845108695652174


  Eval: 100%|██████████████████████████████████████████| 115/115 [01:58<00:00,  1.03s/it, acc=0.882]

Eval → acc 0.8817
Test accuracy: 0.8817116380485146





### Strategy 3: Implementing ResNet50

Here we switch to ResNet50, a larger model that should capture more information and hopefully distinguish the breeds better. We also add MixUp, a data augmentation strategy that trains on blended pairs of images and their labels (one of the items listed for a D/C grade).
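
To show what MixUp does without any tensors, here is a small plain-Python sketch. Lists of floats stand in for images, the softmax output is hypothetical, and the helper names are illustrative; the notebook's own `mixup_data` and `mixup_criterion` further down do the same thing on batches.

```python
import math
import random

def mixup(x_a, x_b, lam):
    """Convex combination of two inputs (lists of floats stand in for images)."""
    return [lam * a + (1 - lam) * b for a, b in zip(x_a, x_b)]

def cross_entropy(probs, label):
    """Negative log-probability assigned to the true class."""
    return -math.log(probs[label])

def mixup_loss(probs, y_a, y_b, lam):
    """Loss weighted by how much of each source image is in the mix."""
    return lam * cross_entropy(probs, y_a) + (1 - lam) * cross_entropy(probs, y_b)

random.seed(0)
alpha = 0.4                               # same value as MIXUP_ALPHA in this section
lam = random.betavariate(alpha, alpha)    # mixing weight drawn from Beta(alpha, alpha)

mixed = mixup([1.0, 0.0], [0.0, 1.0], lam)
probs = [0.7, 0.2, 0.1]                   # hypothetical softmax output
print(lam, mixed, mixup_loss(probs, y_a=0, y_b=1, lam=lam))
```

With a small `alpha`, most draws of `lam` land near 0 or 1, so most mixed images stay close to one of the two originals.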

```
Eval: 100%|████████████████████████████████████████████| 115/115 [06:38<00:00,  3.46s/it, acc=0.917]
Eval → acc 0.9174
```

Wow! This is roughly 2 to 3.5 percentage points better than the ResNet18 runs above (0.8817 to 0.8967 test accuracy) with similar parameters. It took significantly more time to train, however, and the returns are diminishing. Let's try one more model and see if we can push it further...

In [61]:
# CELL 1: Imports & Hyperparameters
import os, random
import numpy as np
import torch, torch.nn as nn, torch.optim as optim
from torch.optim.lr_scheduler import OneCycleLR
from torch.utils.data import DataLoader, Subset
from torchvision import transforms
from torchvision.datasets import OxfordIIITPet
from torchvision.models import resnet50, ResNet50_Weights
from tqdm import tqdm

# reproducibility & device
SEED       = 42
random.seed(SEED); np.random.seed(SEED); torch.manual_seed(SEED)
device     = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# hyperparameters
NUM_CLASSES = 37
BATCH_SIZE  = 32
TRAINVAL_SPLIT = 0.8
EPOCHS      = 20             
MIXUP_ALPHA = 0.4
BASE_LR     = 1e-5           # for pretrained layers
HEAD_LR     = 1e-3           # for fc head
WEIGHT_DECAY= 1e-4
PATIENCE    = 4              # early‐stop on val
MIN_DELTA   = 1e-4

In [62]:
# CELL 2: Transforms & DataLoaders 
train_transform = transforms.Compose([
    transforms.RandomResizedCrop(224, scale=(0.8,1.0)),
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(15),
    transforms.ColorJitter(0.2,0.2,0.2,0.1),
    transforms.ToTensor(),
    transforms.Normalize([0.485,0.456,0.406],
                         [0.229,0.224,0.225]),
])
val_test_transform = transforms.Compose([
    transforms.Resize(224),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize([0.485,0.456,0.406],
                         [0.229,0.224,0.225]),
])

# load full trainval & test
full = OxfordIIITPet('data', split='trainval', target_types='category',
                     transform=None, download=True)
test = OxfordIIITPet('data', split='test',    target_types='category',
                     transform=val_test_transform, download=True)

# split indices
n = len(full)
idx = list(range(n))
random.shuffle(idx)
split = int(n * TRAINVAL_SPLIT)
train_idx, val_idx = idx[:split], idx[split:]

# create two separate Datasets with their own transforms
train_ds = Subset(
    OxfordIIITPet('data','trainval','category',transform=train_transform),
    train_idx
)
val_ds   = Subset(
    OxfordIIITPet('data','trainval','category',transform=val_test_transform),
    val_idx
)

train_loader = DataLoader(train_ds, batch_size=BATCH_SIZE, shuffle=True,  num_workers=4)
val_loader   = DataLoader(val_ds,   batch_size=BATCH_SIZE, shuffle=False, num_workers=4)
test_loader  = DataLoader(test,     batch_size=BATCH_SIZE, shuffle=False, num_workers=4)

In [63]:
# CELL 3: MixUp Utilities 
def mixup_data(x, y, alpha=MIXUP_ALPHA):
    if alpha > 0:
        lam = np.random.beta(alpha, alpha)
    else:
        lam = 1.0
    batch_size = x.size(0)
    index = torch.randperm(batch_size).to(device)
    mixed_x = lam * x + (1 - lam) * x[index]
    y_a, y_b = y, y[index]
    return mixed_x, y_a, y_b, lam

def mixup_criterion(criterion, pred, y_a, y_b, lam):
    return lam * criterion(pred, y_a) + (1 - lam) * criterion(pred, y_b)

In [64]:
# CELL 4: Build ResNet50 & BN Tuning 
def build_model(num_classes=NUM_CLASSES, dropout_p=0.5):
    model = resnet50(weights=ResNet50_Weights.DEFAULT)
    nf = model.fc.in_features
    model.fc = nn.Sequential(
        nn.Dropout(dropout_p),
        nn.Linear(nf, num_classes)
    )
    return model.to(device)

def set_bn_mode(model, fine_tune_bn=True):
    # if fine_tune_bn: allow updating running stats & affine weights
    # else: freeze BN in eval mode
    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d):
            if fine_tune_bn:
                m.train()
                for p in m.parameters(): p.requires_grad = True
            else:
                m.eval()
                for p in m.parameters(): p.requires_grad = False

In [65]:
# CELL 5: Optimizer & OneCycle Scheduler 
def make_optim_scheduler(model, total_steps):
    # differential LR groups: head vs. backbone
    fc_params   = list(model.fc.parameters())
    other_params= [p for n,p in model.named_parameters() if not n.startswith('fc.')]
    groups = [
        {'params': fc_params,    'lr': HEAD_LR},
        {'params': other_params, 'lr': BASE_LR}
    ]
    optimizer = optim.SGD(groups, momentum=0.9, nesterov=True, weight_decay=WEIGHT_DECAY)
    scheduler = OneCycleLR(
        optimizer,
        max_lr=[HEAD_LR, BASE_LR],
        total_steps=total_steps,
        pct_start=0.1,
        anneal_strategy='cos'
    )
    return optimizer, scheduler

In [66]:
# CELL 6: Train / Eval / EarlyStopping 
criterion = nn.CrossEntropyLoss()

class EarlyStopping:
    def __init__(self, patience=PATIENCE, min_delta=MIN_DELTA):
        self.patience, self.min_delta = patience, min_delta
        self.best, self.count = 0.0, 0
    def step(self, val_acc):
        if val_acc - self.best > self.min_delta:
            self.best, self.count = val_acc, 0
            return False
        else:
            self.count += 1
            return self.count > self.patience

def train_one_epoch(model, loader, optimizer):
    model.train()
    total_loss, total_correct, seen = 0.0, 0, 0
    loop = tqdm(loader, desc="Train", ncols=100)
    for imgs, lbls in loop:
        imgs, lbls = imgs.to(device), lbls.to(device)
        imgs, y_a, y_b, lam = mixup_data(imgs, lbls)
        optimizer.zero_grad()
        out = model(imgs)
        loss = mixup_criterion(criterion, out, y_a, y_b, lam)
        loss.backward()
        optimizer.step()

        bs = imgs.size(0)
        seen += bs
        total_loss   += loss.item() * bs
        # for accuracy we compare to the lam-weighted targets
        preds = out.argmax(dim=1)
        total_correct+= (lam * (preds==y_a).float() + (1-lam)*(preds==y_b).float()).sum().item()

        loop.set_postfix(loss=total_loss/seen, acc=total_correct/seen)
    return total_loss/seen, total_correct/seen

def evaluate(model, loader):
    model.eval()
    correct, total = 0, 0
    loop = tqdm(loader, desc="Eval", ncols=100)
    with torch.no_grad():
        for imgs, lbls in loop:
            imgs, lbls = imgs.to(device), lbls.to(device)
            out = model(imgs)
            correct += (out.argmax(dim=1)==lbls).sum().item()
            total   += lbls.size(0)
            loop.set_postfix(acc=correct/total)
    return correct/total
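
The training accuracy above gives each sample `lam` credit for matching `y_a` and `1 - lam` credit for matching `y_b`, so on mixed batches it usually cannot reach 1.0 even when the model always predicts the original label. That is one plausible contributor to the logged Train Acc sitting well below Val Acc. A small pure-Python sketch of the metric follows; `soft_accuracy` is an illustrative name, not part of the notebook.

```python
def soft_accuracy(preds, y_a, y_b, lam):
    """Lam-weighted accuracy for one mixed batch (lists of class ids)."""
    credit = 0.0
    for p, a, b in zip(preds, y_a, y_b):
        # full credit only when the prediction matches both mixed labels
        credit += lam * (p == a) + (1 - lam) * (p == b)
    return credit / len(preds)

y_a = [3, 7, 7, 12]   # original labels
y_b = [7, 3, 7, 5]    # labels of the shuffled partners
preds = list(y_a)     # a model that always predicts the original label

print(soft_accuracy(preds, y_a, y_b, lam=0.7))  # mixed batch
print(soft_accuracy(preds, y_a, y_b, lam=1.0))  # no mixing
```

Only the third sample, whose partner happens to share its label, earns full credit in the mixed case.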

In [58]:
# CELL 7: Full Training Loop with Checkpointing 
# Rebuild model and set BatchNorm
model = build_model()
set_bn_mode(model, fine_tune_bn=True)

# Optimizer with differential LRs: head 1e-2, backbone 1e-4
optimizer = optim.SGD([
    {'params': model.fc.parameters(),                                    'lr': 1e-2},
    {'params': [p for n,p in model.named_parameters() if 'fc' not in n], 'lr': 1e-4}
], momentum=0.9, weight_decay=WEIGHT_DECAY, nesterov=True)

# OneCycleLR stepped once per batch; max_lr is the peak LR for each param group
total_steps = EPOCHS * len(train_loader)
scheduler = OneCycleLR(
    optimizer,
    max_lr=[1e-2, 1e-4],
    total_steps=total_steps,
    pct_start=0.1,
    anneal_strategy='cos'
)

stopper = EarlyStopping()
best_val = 0.0
ckpt_path = "best_resnet50_fixed.pth"
step = 0

for epoch in range(1, EPOCHS+1):
    print(f"\n=== Epoch {epoch}/{EPOCHS} ===")
    # Training
    model.train()
    train_loss, train_correct, seen = 0.0, 0, 0
    loop = tqdm(train_loader, desc="Train", ncols=100)
    for imgs, lbls in loop:
        imgs, lbls = imgs.to(device), lbls.to(device)
        mixed_x, y_a, y_b, lam = mixup_data(imgs, lbls)
        lam = float(lam)       # ensure lam is a Python float
        optimizer.zero_grad()
        out = model(mixed_x)
        loss = mixup_criterion(criterion, out, y_a, y_b, lam)
        loss.backward()
        optimizer.step()
        scheduler.step()       # step per batch

        bs = imgs.size(0)
        seen       += bs
        train_loss += loss.item() * bs
        preds      = out.argmax(dim=1)
        train_correct += (lam * (preds==y_a).float() + (1-lam)*(preds==y_b).float()).sum().item()
        loop.set_postfix(loss=train_loss/seen, acc=train_correct/seen)

    # Validation
    val_acc = evaluate(model, val_loader)
    avg_train_acc = train_correct/seen
    print(f"Train Acc: {avg_train_acc:.4f}   Val Acc: {val_acc:.4f}")

    # Checkpoint & early stop
    if val_acc > best_val + MIN_DELTA:
        best_val = val_acc
        torch.save(model.state_dict(), ckpt_path)
        print("→ New best, checkpoint saved.")
    if stopper.step(val_acc):
        print(f"→ Early stopping at epoch {epoch}")
        break

# Load best checkpoint and test
model.load_state_dict(torch.load(ckpt_path))
test_acc = evaluate(model, test_loader)
print(f"\n*** Final Test Accuracy: {test_acc:.4f} ***")


=== Epoch 1/20 ===


Train: 100%|██████████████████████████████████| 92/92 [09:17<00:00,  6.06s/it, acc=0.266, loss=3.24]
Eval: 100%|██████████████████████████████████████████████| 23/23 [01:40<00:00,  4.38s/it, acc=0.834]


Train Acc: 0.2655   Val Acc: 0.8342
→ New best, checkpoint saved.

=== Epoch 2/20 ===


Train: 100%|██████████████████████████████████| 92/92 [09:32<00:00,  6.23s/it, acc=0.617, loss=1.99]
Eval: 100%|██████████████████████████████████████████████| 23/23 [01:39<00:00,  4.33s/it, acc=0.899]


Train Acc: 0.6175   Val Acc: 0.8995
→ New best, checkpoint saved.

=== Epoch 3/20 ===


Train: 100%|██████████████████████████████████| 92/92 [09:37<00:00,  6.27s/it, acc=0.673, loss=1.54]
Eval: 100%|██████████████████████████████████████████████| 23/23 [01:39<00:00,  4.31s/it, acc=0.913]


Train Acc: 0.6731   Val Acc: 0.9130
→ New best, checkpoint saved.

=== Epoch 4/20 ===


Train: 100%|██████████████████████████████████| 92/92 [09:08<00:00,  5.96s/it, acc=0.692, loss=1.44]
Eval: 100%|██████████████████████████████████████████████| 23/23 [01:42<00:00,  4.46s/it, acc=0.918]


Train Acc: 0.6916   Val Acc: 0.9185
→ New best, checkpoint saved.

=== Epoch 5/20 ===


Train: 100%|██████████████████████████████████| 92/92 [10:03<00:00,  6.56s/it, acc=0.697, loss=1.43]
Eval: 100%|██████████████████████████████████████████████| 23/23 [01:39<00:00,  4.34s/it, acc=0.924]


Train Acc: 0.6973   Val Acc: 0.9239
→ New best, checkpoint saved.

=== Epoch 6/20 ===


Train: 100%|██████████████████████████████████| 92/92 [09:26<00:00,  6.15s/it, acc=0.694, loss=1.42]
Eval: 100%|██████████████████████████████████████████████| 23/23 [01:40<00:00,  4.37s/it, acc=0.924]


Train Acc: 0.6936   Val Acc: 0.9239

=== Epoch 7/20 ===


Train: 100%|██████████████████████████████████| 92/92 [09:04<00:00,  5.92s/it, acc=0.712, loss=1.31]
Eval: 100%|██████████████████████████████████████████████| 23/23 [01:39<00:00,  4.34s/it, acc=0.933]


Train Acc: 0.7122   Val Acc: 0.9334
→ New best, checkpoint saved.

=== Epoch 8/20 ===


Train: 100%|██████████████████████████████████| 92/92 [10:36<00:00,  6.92s/it, acc=0.716, loss=1.35]
Eval: 100%|██████████████████████████████████████████████| 23/23 [01:42<00:00,  4.47s/it, acc=0.925]


Train Acc: 0.7161   Val Acc: 0.9253

=== Epoch 9/20 ===


Train: 100%|██████████████████████████████████| 92/92 [09:22<00:00,  6.12s/it, acc=0.731, loss=1.29]
Eval: 100%|██████████████████████████████████████████████| 23/23 [01:40<00:00,  4.35s/it, acc=0.933]


Train Acc: 0.7312   Val Acc: 0.9334

=== Epoch 10/20 ===


Train: 100%|██████████████████████████████████| 92/92 [09:28<00:00,  6.18s/it, acc=0.742, loss=1.21]
Eval: 100%|██████████████████████████████████████████████| 23/23 [01:40<00:00,  4.35s/it, acc=0.928]


Train Acc: 0.7423   Val Acc: 0.9280

=== Epoch 11/20 ===


Train: 100%|██████████████████████████████████| 92/92 [10:03<00:00,  6.55s/it, acc=0.749, loss=1.21]
Eval: 100%|██████████████████████████████████████████████| 23/23 [01:43<00:00,  4.52s/it, acc=0.929]


Train Acc: 0.7485   Val Acc: 0.9293

=== Epoch 12/20 ===


Train: 100%|██████████████████████████████████| 92/92 [09:38<00:00,  6.29s/it, acc=0.772, loss=1.11]
Eval: 100%|██████████████████████████████████████████████| 23/23 [01:40<00:00,  4.37s/it, acc=0.938]


Train Acc: 0.7725   Val Acc: 0.9375
→ New best, checkpoint saved.

=== Epoch 13/20 ===


Train: 100%|██████████████████████████████████| 92/92 [09:09<00:00,  5.97s/it, acc=0.733, loss=1.25]
Eval: 100%|██████████████████████████████████████████████| 23/23 [01:41<00:00,  4.39s/it, acc=0.939]


Train Acc: 0.7333   Val Acc: 0.9389
→ New best, checkpoint saved.

=== Epoch 14/20 ===


Train: 100%|██████████████████████████████████| 92/92 [09:39<00:00,  6.30s/it, acc=0.731, loss=1.26]
Eval: 100%|██████████████████████████████████████████████| 23/23 [01:42<00:00,  4.47s/it, acc=0.935]


Train Acc: 0.7313   Val Acc: 0.9348

=== Epoch 15/20 ===


Train: 100%|██████████████████████████████████| 92/92 [09:26<00:00,  6.16s/it, acc=0.754, loss=1.19]
Eval: 100%|██████████████████████████████████████████████| 23/23 [01:45<00:00,  4.59s/it, acc=0.942]


Train Acc: 0.7540   Val Acc: 0.9416
→ New best, checkpoint saved.

=== Epoch 16/20 ===


Train: 100%|██████████████████████████████████| 92/92 [10:08<00:00,  6.62s/it, acc=0.748, loss=1.24]
Eval: 100%|██████████████████████████████████████████████| 23/23 [01:41<00:00,  4.40s/it, acc=0.936]


Train Acc: 0.7482   Val Acc: 0.9361

=== Epoch 17/20 ===


Train: 100%|██████████████████████████████████| 92/92 [09:13<00:00,  6.02s/it, acc=0.759, loss=1.17]
Eval: 100%|██████████████████████████████████████████████| 23/23 [01:41<00:00,  4.41s/it, acc=0.939]


Train Acc: 0.7586   Val Acc: 0.9389

=== Epoch 18/20 ===


Train: 100%|███████████████████████████████████| 92/92 [09:09<00:00,  5.97s/it, acc=0.74, loss=1.24]
Eval: 100%|███████████████████████████████████████████████| 23/23 [01:40<00:00,  4.36s/it, acc=0.94]


Train Acc: 0.7397   Val Acc: 0.9402

=== Epoch 19/20 ===


Train: 100%|██████████████████████████████████| 92/92 [09:04<00:00,  5.92s/it, acc=0.718, loss=1.32]
Eval: 100%|██████████████████████████████████████████████| 23/23 [01:39<00:00,  4.34s/it, acc=0.938]


Train Acc: 0.7176   Val Acc: 0.9375

=== Epoch 20/20 ===


Train: 100%|██████████████████████████████████| 92/92 [10:13<00:00,  6.66s/it, acc=0.732, loss=1.27]
Eval: 100%|███████████████████████████████████████████████| 23/23 [01:39<00:00,  4.35s/it, acc=0.94]


Train Acc: 0.7322   Val Acc: 0.9402
→ Early stopping at epoch 20


Eval: 100%|████████████████████████████████████████████| 115/115 [06:38<00:00,  3.46s/it, acc=0.917]


*** Final Test Accuracy: 0.9174 ***





### Strategy 4: Implementing EfficientNet-B3

EfficientNet-B3 is a well-known image classification model (and is, well, efficient). Perhaps this model's architecture is better suited to classifying the breeds.

We are implementing MixUp as well as CutMix. Both augmentations combine pairs of training images and weight the loss across their two labels. Together with a more robust model, this should help reduce overfitting and improve generalization, which matters because the earlier models were overfitting more than they should have.
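
As a rough illustration of the MixUp idea, the sketch below blends two toy inputs and weights the loss by the same factor. It is pure Python and only meant to show the arithmetic; `cross_entropy`, `mixup_pair`, the toy vectors, and the logits are all made up for the example.

```python
import math

def cross_entropy(logits, target):
    """Softmax cross-entropy for one sample (target is a class index)."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[target]

def mixup_pair(x_a, x_b, lam):
    """Blend two inputs elementwise: lam * x_a + (1 - lam) * x_b."""
    return [lam * a + (1 - lam) * b for a, b in zip(x_a, x_b)]

lam = 0.7                        # in the notebook this is drawn from Beta(alpha, alpha)
x_a, y_a = [1.0, 0.0, 0.0], 0    # toy "image" and its label
x_b, y_b = [0.0, 0.0, 1.0], 2

mixed = mixup_pair(x_a, x_b, lam)
logits = [2.0, 0.1, 1.0]         # stand-in for model(mixed)
loss = lam * cross_entropy(logits, y_a) + (1 - lam) * cross_entropy(logits, y_b)
print(mixed, round(loss, 4))
```

The loss always lies between the two single-label losses, so the model is pushed toward predictions that reflect both source images in proportion to `lam`.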

After training the model for 10 hours (yeah...), we reach a best validation accuracy of 95.52%. This is great! On the test set we get the following:

```
Eval: 100%|█████████████████████████████████████████████| 115/115 [13:42<00:00,  7.15s/it, acc=0.93]

*** Final Test Accuracy: 0.9302 ***
```

A test accuracy of 93.02% is a 4% increase over what we were getting using ResNet18 with Strategy 1 and Strategy 2.

You could also train on all of the data, which would likely give another increase in performance. However, that would leave no held-out data to measure accuracy on.

In [69]:
# CELL 1: Imports & Hyperparameters
import os, random
import numpy as np
import torch, torch.nn as nn
import torch.optim as optim
from torch.optim.lr_scheduler import CosineAnnealingWarmRestarts
from torch.utils.data import DataLoader, Subset
from torchvision import transforms, models
from torchvision.datasets import OxfordIIITPet
from tqdm import tqdm

# reproducibility & device
SEED            = 42
random.seed(SEED)
np.random.seed(SEED)
torch.manual_seed(SEED)
device          = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# hyperparameters
NUM_CLASSES     = 37
BATCH_SIZE      = 32
TRAINVAL_SPLIT  = 0.8
EPOCHS          = 30
FREEZE_EPOCHS   = 5
MIXUP_ALPHA     = 0.4
CUTMIX_ALPHA    = 1.0
HEAD_LR         = 1e-3
BASE_LR         = 1e-4
WEIGHT_DECAY    = 1e-4
LABEL_SMOOTHING = 0.1
CKPT_PATH       = "best_effnetb3.pth"
MIN_DELTA       = 1e-4
PATIENCE        = 4
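
`LABEL_SMOOTHING = 0.1` is later passed to `nn.CrossEntropyLoss`. Assuming the uniform-mixture form described in the PyTorch documentation, the one-hot target is replaced by a slightly softened distribution over all 37 breeds. The sketch below just works out those target values; `smoothed_target` is an illustrative helper, not a library call.

```python
NUM_CLASSES = 37
LABEL_SMOOTHING = 0.1

def smoothed_target(true_class, num_classes=NUM_CLASSES, eps=LABEL_SMOOTHING):
    """Target distribution: (1 - eps) on the true class plus eps spread uniformly."""
    uniform = eps / num_classes
    target = [uniform] * num_classes
    target[true_class] += 1.0 - eps
    return target

t = smoothed_target(true_class=5)
print(f"true class: {t[5]:.4f}, every other class: {t[0]:.4f}")
```

The true breed keeps about 90% of the probability mass, and the rest is spread thinly over every class, which discourages overconfident logits.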

In [70]:
# CELL 2: Transforms & DataLoaders
train_transform = transforms.Compose([
    transforms.RandomResizedCrop(300, scale=(0.8,1.0)),
    transforms.RandAugment(num_ops=2, magnitude=9),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize([0.485,0.456,0.406],
                         [0.229,0.224,0.225]),
])
val_test_transform = transforms.Compose([
    transforms.Resize(320),
    transforms.CenterCrop(300),
    transforms.ToTensor(),
    transforms.Normalize([0.485,0.456,0.406],
                         [0.229,0.224,0.225]),
])

full = OxfordIIITPet('data', split='trainval', target_types='category', download=True)
test = OxfordIIITPet('data', split='test',     target_types='category', transform=val_test_transform, download=True)

n = len(full)
idx = list(range(n))
random.shuffle(idx)
split = int(n * TRAINVAL_SPLIT)
train_idx, val_idx = idx[:split], idx[split:]

train_ds = Subset(
    OxfordIIITPet('data','trainval','category',transform=train_transform),
    train_idx
)
val_ds = Subset(
    OxfordIIITPet('data','trainval','category',transform=val_test_transform),
    val_idx
)

train_loader = DataLoader(train_ds, batch_size=BATCH_SIZE, shuffle=True,  num_workers=4)
val_loader   = DataLoader(val_ds,   batch_size=BATCH_SIZE, shuffle=False, num_workers=4)
test_loader  = DataLoader(test,     batch_size=BATCH_SIZE, shuffle=False, num_workers=4)
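
The split above can be sanity-checked without loading any images. This sketch assumes the `trainval` split holds 3680 images (the size I believe the dataset ships with) and uses a local `random.Random` instead of the global seed, so the actual indices will differ from the notebook's. `split_indices` is a hypothetical helper.

```python
import math
import random

def split_indices(n, train_fraction, seed):
    """Shuffle range(n) with a fixed seed and cut it into train/val index lists."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)   # local RNG, leaves the global state alone
    cut = int(n * train_fraction)
    return idx[:cut], idx[cut:]

n_trainval = 3680  # assumed size of the Oxford-IIIT Pet trainval split
train_idx, val_idx = split_indices(n_trainval, train_fraction=0.8, seed=42)

print(len(train_idx), len(val_idx))
print(math.ceil(len(train_idx) / 32), math.ceil(len(val_idx) / 32))
```

With a batch size of 32 this gives 92 training batches and 23 validation batches, which matches the progress bars in the training log.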

In [71]:
# CELL 3: Build EfficientNet-B3 & BN toggling
def build_model(num_classes=NUM_CLASSES, dropout_p=0.5):
    weights = models.EfficientNet_B3_Weights.IMAGENET1K_V1
    model = models.efficientnet_b3(weights=weights)
    nf = model.classifier[1].in_features
    model.classifier = nn.Sequential(
        nn.Dropout(dropout_p),
        nn.Linear(nf, num_classes)
    )
    return model.to(device)

def set_bn_mode(model, fine_tune_bn=True):
    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d):
            if fine_tune_bn:
                m.train()
                for p in m.parameters(): p.requires_grad = True
            else:
                m.eval()
                for p in m.parameters(): p.requires_grad = False

def freeze_backbone(model):
    for name, p in model.named_parameters():
        if not name.startswith('classifier'): p.requires_grad = False

def unfreeze_all(model):
    for p in model.parameters(): p.requires_grad = True

model = build_model()

Downloading: "https://download.pytorch.org/models/efficientnet_b3_rwightman-b3899882.pth" to /Users/johnludeke/.cache/torch/hub/checkpoints/efficientnet_b3_rwightman-b3899882.pth
100%|██████████████████████████████████████| 47.2M/47.2M [00:06<00:00, 7.99MB/s]


In [72]:
# CELL 4: MixUp & CutMix utilities
def mixup_data(x, y, alpha=MIXUP_ALPHA):
    lam = np.random.beta(alpha, alpha) if alpha > 0 else 1.0
    idx = torch.randperm(x.size(0)).to(device)
    mixed_x = lam * x + (1 - lam) * x[idx]
    return mixed_x, y, y[idx], lam

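def cutmix_data(x, y, alpha=CUTMIX_ALPHA):
    # Paste a random rectangle from a shuffled copy of the batch into x (in place)
    lam = np.random.beta(alpha, alpha) if alpha > 0 else 1.0
    B, _, H, W = x.size()
    idx = torch.randperm(B).to(device)
    # Patch side lengths chosen so the patch covers a (1 - lam) share of the image
    cut_rat = np.sqrt(1. - lam)
    cut_w = int(W * cut_rat)
    cut_h = int(H * cut_rat)
    # Random patch centre; the box is clipped to the image bounds
    cx, cy = np.random.randint(W), np.random.randint(H)
    x1, x2 = np.clip(cx - cut_w//2, 0, W), np.clip(cx + cut_w//2, 0, W)
    y1, y2 = np.clip(cy - cut_h//2, 0, H), np.clip(cy + cut_h//2, 0, H)
    x[:, :, y1:y2, x1:x2] = x[idx, :, y1:y2, x1:x2]
    # Recompute lam from the clipped box so the label weights match the pixel share
    lam = 1 - ((x2-x1)*(y2-y1)/(H*W))
    return x, y, y[idx], lam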

def mixup_criterion(crit, pred, y_a, y_b, lam):
    return lam*crit(pred, y_a) + (1-lam)*crit(pred, y_b)
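
The last step of `cutmix_data` recomputes `lam` from the pasted area. The reason is that the box is clipped at the image border, so the patch can end up smaller than the sampled `lam` implied. Here is a dependency-free sketch of just the box arithmetic, with the centre passed in explicitly instead of sampled; `cutmix_box` is an illustrative name.

```python
import math

def cutmix_box(H, W, lam, cx, cy):
    """Return the clipped patch (x1, y1, x2, y2) and the area-corrected lam."""
    cut_ratio = math.sqrt(1.0 - lam)
    cut_w, cut_h = int(W * cut_ratio), int(H * cut_ratio)
    x1 = min(max(cx - cut_w // 2, 0), W)
    x2 = min(max(cx + cut_w // 2, 0), W)
    y1 = min(max(cy - cut_h // 2, 0), H)
    y2 = min(max(cy + cut_h // 2, 0), H)
    lam_adj = 1.0 - (x2 - x1) * (y2 - y1) / (H * W)
    return (x1, y1, x2, y2), lam_adj

# Patch fully inside the image: lam is unchanged.
print(cutmix_box(300, 300, lam=0.75, cx=150, cy=150))
# Patch centred on a corner: three quarters of it is clipped away.
print(cutmix_box(300, 300, lam=0.75, cx=0, cy=0))
```

In the corner case the pasted patch covers only 6.25% of the image, so the original label should carry a weight of 0.9375 rather than the sampled 0.75.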

In [73]:
# CELL 5: Loss & EarlyStopping
criterion = nn.CrossEntropyLoss(label_smoothing=LABEL_SMOOTHING)

class EarlyStopping:
    def __init__(self, patience=PATIENCE, min_delta=MIN_DELTA):
        self.patience, self.min_delta = patience, min_delta
        self.best, self.count = 0.0, 0
    def step(self, val_acc):
        if val_acc - self.best > self.min_delta:
            self.best, self.count = val_acc, 0
            return False
        else:
            self.count += 1
            return self.count > self.patience
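
Because `step` returns `self.count > self.patience`, training stops on the fifth consecutive epoch without improvement when `PATIENCE = 4`, not the fourth. The sketch below replays the validation accuracies from the Strategy 4 training log through the same counting rule; `EarlyStoppingTrace` is a copy made for illustration so the example stays self-contained.

```python
class EarlyStoppingTrace:
    """Same counting rule as the notebook's EarlyStopping, kept dependency-free."""
    def __init__(self, patience=4, min_delta=1e-4):
        self.patience, self.min_delta = patience, min_delta
        self.best, self.count = 0.0, 0

    def step(self, val_acc):
        if val_acc - self.best > self.min_delta:
            self.best, self.count = val_acc, 0
            return False
        self.count += 1
        return self.count > self.patience

# Validation accuracies copied from the Strategy 4 training log.
val_accs = [0.8356, 0.8845, 0.9035, 0.9035, 0.9117, 0.9307, 0.9375, 0.9402,
            0.9470, 0.9552, 0.9511, 0.9470, 0.9484, 0.9497, 0.9511]

trace = EarlyStoppingTrace()
stop_epoch = None
for epoch, acc in enumerate(val_accs, start=1):
    if trace.step(acc):
        stop_epoch = epoch
        break
print(stop_epoch, trace.best)
```

The best value is set at epoch 10, and epochs 11 through 15 all fail to beat it, so the stop triggers at epoch 15.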

In [74]:
# CELL 6: Optimizer & Scheduler
optimizer = optim.AdamW([
    {'params': model.classifier.parameters(),         'lr': HEAD_LR},
    {'params': [p for n,p in model.named_parameters() if 'classifier' not in n], 'lr': BASE_LR}
], weight_decay=WEIGHT_DECAY)

scheduler = CosineAnnealingWarmRestarts(
    optimizer,
    T_0=10,
    T_mult=2
)
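
For intuition about `T_0=10` and `T_mult=2`, here is a small pure-Python sketch of the cosine-with-warm-restarts curve, assuming the standard formula with `eta_min=0`. `warm_restart_lr` is a hypothetical helper, not the PyTorch class, and `step` counts calls to `scheduler.step()`.

```python
import math

def warm_restart_lr(step, base_lr, T_0=10, T_mult=2, eta_min=0.0):
    """LR after `step` scheduler steps under cosine annealing with warm restarts."""
    T_i, T_cur = T_0, step
    while T_cur >= T_i:        # walk forward through completed cycles
        T_cur -= T_i
        T_i *= T_mult
    return eta_min + (base_lr - eta_min) * (1 + math.cos(math.pi * T_cur / T_i)) / 2

for step in (0, 5, 9, 10, 29, 30):
    print(step, f"{warm_restart_lr(step, base_lr=1e-3):.2e}")
```

The cycles last 10, 20, 40, ... steps, so restarts land on steps 10, 30, 70, and so on. Since the training function in this notebook calls `scheduler.step()` once per batch and an epoch has 92 batches, the first few restarts fall inside the first epoch. If epoch-length cycles were the intent, stepping once per epoch would be the thing to try.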

In [76]:
# CELL 7: Train & Eval functions with tqdm
def train_one_epoch(model, loader, optimizer):
    model.train()
    total_loss, total_correct, seen = 0.0, 0, 0
    loop = tqdm(loader, desc="Train", ncols=100)
    for imgs, lbls in loop:
        imgs, lbls = imgs.to(device), lbls.to(device)
        r = random.random()
        if r < 0.4:
            imgs, y_a, y_b, lam = mixup_data(imgs, lbls)
            loss_fn = lambda out: mixup_criterion(criterion, out, y_a, y_b, lam)
        elif r < 0.8:
            imgs, y_a, y_b, lam = cutmix_data(imgs, lbls)
            loss_fn = lambda out: mixup_criterion(criterion, out, y_a, y_b, lam)
        else:
            y_a, y_b, lam = lbls, lbls, 1.0
            loss_fn = lambda out: criterion(out, lbls)

        optimizer.zero_grad()
        out = model(imgs)
        loss = loss_fn(out)
        loss.backward()
        optimizer.step()
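        # NOTE: uses the global scheduler; stepping once per batch means T_0=10
        # is counted in batches, not epochs
        scheduler.step()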

        bs = lbls.size(0)
        preds = out.argmax(dim=1)
        correct = (lam*(preds==y_a).float() + (1-lam)*(preds==y_b).float()).sum().item()
        total_correct += correct
        seen += bs
        total_loss += loss.item() * bs

        loop.set_postfix(loss=total_loss/seen, acc=total_correct/seen)
    return total_loss/seen, total_correct/seen

def evaluate(model, loader):
    model.eval()
    correct, total = 0, 0
    loop = tqdm(loader, desc="Eval", ncols=100)
    with torch.no_grad():
        for imgs, lbls in loop:
            imgs, lbls = imgs.to(device), lbls.to(device)
            out = model(imgs)
            batch_correct = (out.argmax(dim=1)==lbls).sum().item()
            correct += batch_correct
            total += lbls.size(0)
            loop.set_postfix(acc=correct/total)
    return correct/total

In [77]:
# CELL 8: Full Training Loop
stopper = EarlyStopping()
best_val = 0.0

for epoch in range(1, EPOCHS+1):
    print(f"\n=== Epoch {epoch}/{EPOCHS} ===")
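    # First FREEZE_EPOCHS epochs: train only the classifier head, then unfreeze everything.
    # NOTE: train_one_epoch() calls model.train(), which switches BatchNorm layers back
    # to training mode, so only the requires_grad changes made here persist.
    if epoch <= FREEZE_EPOCHS:
        freeze_backbone(model)
        set_bn_mode(model, fine_tune_bn=False)
    else:
        unfreeze_all(model)
        set_bn_mode(model, fine_tune_bn=True)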

    train_loss, train_acc = train_one_epoch(model, train_loader, optimizer)
    val_acc = evaluate(model, val_loader)
    print(f"Train Acc: {train_acc:.4f}  Val Acc: {val_acc:.4f}")

    if val_acc > best_val + MIN_DELTA:
        best_val = val_acc
        torch.save(model.state_dict(), CKPT_PATH)
        print("→ New best, checkpoint saved.")
    if stopper.step(val_acc):
        print(f"→ Early stopping at epoch {epoch}")
        break


=== Epoch 1/30 ===


Train: 100%|██████████████████████████████████| 92/92 [11:41<00:00,  7.62s/it, acc=0.332, loss=3.19]
Eval: 100%|██████████████████████████████████████████████| 23/23 [03:08<00:00,  8.20s/it, acc=0.836]


Train Acc: 0.3317  Val Acc: 0.8356
→ New best, checkpoint saved.

=== Epoch 2/30 ===


Train: 100%|███████████████████████████████████| 92/92 [11:32<00:00,  7.53s/it, acc=0.615, loss=2.5]
Eval: 100%|██████████████████████████████████████████████| 23/23 [03:05<00:00,  8.04s/it, acc=0.885]


Train Acc: 0.6147  Val Acc: 0.8845
→ New best, checkpoint saved.

=== Epoch 3/30 ===


Train: 100%|███████████████████████████████████| 92/92 [11:46<00:00,  7.68s/it, acc=0.624, loss=2.2]
Eval: 100%|██████████████████████████████████████████████| 23/23 [03:02<00:00,  7.94s/it, acc=0.904]


Train Acc: 0.6237  Val Acc: 0.9035
→ New best, checkpoint saved.

=== Epoch 4/30 ===


Train: 100%|██████████████████████████████████| 92/92 [11:30<00:00,  7.50s/it, acc=0.697, loss=1.91]
Eval: 100%|██████████████████████████████████████████████| 23/23 [03:04<00:00,  8.00s/it, acc=0.904]


Train Acc: 0.6970  Val Acc: 0.9035

=== Epoch 5/30 ===


Train: 100%|██████████████████████████████████| 92/92 [11:31<00:00,  7.51s/it, acc=0.665, loss=1.92]
Eval: 100%|██████████████████████████████████████████████| 23/23 [03:06<00:00,  8.11s/it, acc=0.912]


Train Acc: 0.6651  Val Acc: 0.9117
→ New best, checkpoint saved.

=== Epoch 6/30 ===


Train: 100%|██████████████████████████████████| 92/92 [41:36<00:00, 27.13s/it, acc=0.724, loss=1.72]
Eval: 100%|██████████████████████████████████████████████| 23/23 [03:04<00:00,  8.01s/it, acc=0.931]


Train Acc: 0.7236  Val Acc: 0.9307
→ New best, checkpoint saved.

=== Epoch 7/30 ===


Train: 100%|██████████████████████████████████| 92/92 [44:03<00:00, 28.73s/it, acc=0.707, loss=1.75]
Eval: 100%|██████████████████████████████████████████████| 23/23 [03:04<00:00,  8.03s/it, acc=0.938]


Train Acc: 0.7073  Val Acc: 0.9375
→ New best, checkpoint saved.

=== Epoch 8/30 ===


Train: 100%|██████████████████████████████████| 92/92 [44:05<00:00, 28.75s/it, acc=0.714, loss=1.65]
Eval: 100%|███████████████████████████████████████████████| 23/23 [03:03<00:00,  7.99s/it, acc=0.94]


Train Acc: 0.7141  Val Acc: 0.9402
→ New best, checkpoint saved.

=== Epoch 9/30 ===


Train: 100%|██████████████████████████████████| 92/92 [41:27<00:00, 27.04s/it, acc=0.736, loss=1.63]
Eval: 100%|██████████████████████████████████████████████| 23/23 [03:02<00:00,  7.92s/it, acc=0.947]


Train Acc: 0.7358  Val Acc: 0.9470
→ New best, checkpoint saved.

=== Epoch 10/30 ===


Train: 100%|██████████████████████████████████| 92/92 [39:20<00:00, 25.66s/it, acc=0.749, loss=1.55]
Eval: 100%|██████████████████████████████████████████████| 23/23 [03:02<00:00,  7.92s/it, acc=0.955]


Train Acc: 0.7487  Val Acc: 0.9552
→ New best, checkpoint saved.

=== Epoch 11/30 ===


Train: 100%|██████████████████████████████████| 92/92 [39:54<00:00, 26.03s/it, acc=0.752, loss=1.54]
Eval: 100%|██████████████████████████████████████████████| 23/23 [03:01<00:00,  7.91s/it, acc=0.951]


Train Acc: 0.7524  Val Acc: 0.9511

=== Epoch 12/30 ===


Train: 100%|██████████████████████████████████| 92/92 [42:28<00:00, 27.70s/it, acc=0.735, loss=1.58]
Eval: 100%|██████████████████████████████████████████████| 23/23 [03:02<00:00,  7.95s/it, acc=0.947]


Train Acc: 0.7353  Val Acc: 0.9470

=== Epoch 13/30 ===


Train: 100%|██████████████████████████████████| 92/92 [41:02<00:00, 26.77s/it, acc=0.753, loss=1.52]
Eval: 100%|██████████████████████████████████████████████| 23/23 [03:04<00:00,  8.03s/it, acc=0.948]


Train Acc: 0.7531  Val Acc: 0.9484

=== Epoch 14/30 ===


Train: 100%|██████████████████████████████████| 92/92 [38:51<00:00, 25.34s/it, acc=0.787, loss=1.43]
Eval: 100%|███████████████████████████████████████████████| 23/23 [03:02<00:00,  7.93s/it, acc=0.95]


Train Acc: 0.7867  Val Acc: 0.9497

=== Epoch 15/30 ===


Train: 100%|██████████████████████████████████| 92/92 [40:31<00:00, 26.43s/it, acc=0.744, loss=1.58]
Eval: 100%|██████████████████████████████████████████████| 23/23 [03:25<00:00,  8.92s/it, acc=0.951]

Train Acc: 0.7443  Val Acc: 0.9511
→ Early stopping at epoch 15





In [78]:
# CELL 9: Load Best & Test
model.load_state_dict(torch.load(CKPT_PATH))
test_acc = evaluate(model, test_loader)
print(f"\n*** Final Test Accuracy: {test_acc:.4f} ***")

Eval: 100%|█████████████████████████████████████████████| 115/115 [13:42<00:00,  7.15s/it, acc=0.93]


*** Final Test Accuracy: 0.9302 ***



