# GNR 638: Machine Learning for Remote Sensing

**Task:** Multi-class image classification of remotely sensed images  
**Competition:** [gnr638-mls4rs-a1 on Kaggle](https://www.kaggle.com/competitions/gnr638-mls4rs-a1)  
**Data source:** [PatternNet](https://huggingface.co/datasets/blanchon/PatternNet) (HuggingFace) — publicly available benchmark that matches the competition class set  
**Architecture:** ResNet-50 with ImageNet pre-training (transfer learning)  
**Classes (7):** Basketball Court, Beach, Forest, Railway, Swimming Pool, Tennis Court, Others

---

## Background

The Kaggle competition (GNR 638, IIT Bombay) is closed and invitation-only, so the original data is inaccessible.
**PatternNet** is a widely used public remote sensing benchmark with 38 scene classes, each containing 800 images of 256×256 px (30,400 images in total).
Six of its classes map directly onto the competition classes; all remaining 32 classes are pooled into an *Others* category.

| Competition class | PatternNet class | PatternNet index |
|---|---|---|
| Basketball Court | `basketball_court` | 2 |
| Beach | `beach` | 3 |
| Forest | `forest` | 14 |
| Railway | `railway` | 26 |
| Swimming Pool (Water Pool) | `swimming_pool` | 34 |
| Tennis Court | `tennis_court` | 35 |
| Others | remaining 32 classes (subsampled) | — |

---

## 0. Environment Setup

In [None]:
import os
import random
import warnings
warnings.filterwarnings('ignore')

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from PIL import Image

import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader, Subset
import torchvision.transforms as transforms
from torchvision import models
from sklearn.metrics import confusion_matrix, classification_report
from pathlib import Path

# ── Reproducibility ──────────────────────────────────────────────────────────
SEED = 42
random.seed(SEED)
np.random.seed(SEED)
torch.manual_seed(SEED)
torch.cuda.manual_seed_all(SEED)
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False

DEVICE = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f'PyTorch {torch.__version__} | Device: {DEVICE}')
if DEVICE.type == 'cuda':
    print(f'GPU: {torch.cuda.get_device_name(0)} | '
          f'VRAM: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB')

## 1. Data Download — PatternNet via HuggingFace

PatternNet is downloaded automatically by the HuggingFace `datasets` library and cached locally.
No account or API key is required.

In [None]:
from datasets import load_dataset

print('Downloading PatternNet from HuggingFace (cached after first run) ...')
hf_dataset = load_dataset('blanchon/PatternNet', split='train')

print(f'\nDataset loaded.')
print(f'  Total samples : {len(hf_dataset)}')
print(f'  Columns       : {hf_dataset.column_names}')
print(f'  Image type    : {type(hf_dataset[0]["image"])}')
print(f'  Image size    : {hf_dataset[0]["image"].size}  (W x H)')
print(f'  Num classes   : {hf_dataset.features["label"].num_classes}')

## 2. Class Mapping & Dataset Construction

Map PatternNet's 38 classes to our 7 competition classes.  
The *Others* category is subsampled to 800 images to maintain class balance.
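
As a quick sanity check on the numbers involved, the sketch below works through the class counts and the 70/15/15 split applied later in this section. It assumes exactly 800 images per PatternNet class, as stated in the Background. `split_sizes` is a hypothetical helper that mirrors the `int(n * frac)` truncation used in the split cell; it is not part of the notebook.

```python
# Assumption: every PatternNet class holds exactly 800 images.
PN_CLASSES = 38
PER_CLASS = 800
TARGET_CLASSES = 6      # classes mapped one-to-one onto competition classes
OTHERS_SAMPLE = 800     # size of the subsampled 'others' class


def split_sizes(n, train_frac=0.70, val_frac=0.15):
    """Return (train, val, test) counts; the test split takes the remainder."""
    n_train = int(n * train_frac)
    n_val = int(n * val_frac)
    return n_train, n_val, n - n_train - n_val


total_images = PN_CLASSES * PER_CLASS                      # whole dataset
others_pool = (PN_CLASSES - TARGET_CLASSES) * PER_CLASS    # before subsampling
used = TARGET_CLASSES * PER_CLASS + OTHERS_SAMPLE          # after subsampling

per_class = split_sizes(PER_CLASS)
# All 7 competition classes have the same size, so totals are 7x the per-class counts
totals = tuple(s * (TARGET_CLASSES + 1) for s in per_class)

print(f'Dataset: {total_images} | others pool: {others_pool} | used: {used}')
print(f'Per-class split: {per_class} | overall split: {totals}')
```

Under these assumptions each class contributes 560 / 120 / 120 images to train / val / test. If the printed counts in the cells below differ, the per-class image count in the downloaded dataset is not what was assumed here.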

In [None]:
# ── PatternNet label names ────────────────────────────────────────────────────
PN_LABEL_NAMES = hf_dataset.features['label'].names
print('PatternNet classes:')
for i, name in enumerate(PN_LABEL_NAMES):
    print(f'  {i:>2}  {name}')

In [None]:
# ── Competition class definitions ─────────────────────────────────────────────
CLASS_NAMES = [
    'basketball_court',
    'beach',
    'forest',
    'railway',
    'swimming_pool',
    'tennis_court',
    'others',
]
CLASS_TO_IDX = {name: i for i, name in enumerate(CLASS_NAMES)}
NUM_CLASSES  = len(CLASS_NAMES)

# PatternNet index → our competition label index
# Any PatternNet index not listed here maps to 'others' (6)
PN_TO_COMP = {
    PN_LABEL_NAMES.index('basketball_court'): CLASS_TO_IDX['basketball_court'],
    PN_LABEL_NAMES.index('beach'):            CLASS_TO_IDX['beach'],
    PN_LABEL_NAMES.index('forest'):           CLASS_TO_IDX['forest'],
    PN_LABEL_NAMES.index('railway'):          CLASS_TO_IDX['railway'],
    PN_LABEL_NAMES.index('swimming_pool'):    CLASS_TO_IDX['swimming_pool'],
    PN_LABEL_NAMES.index('tennis_court'):     CLASS_TO_IDX['tennis_court'],
}
OTHERS_LABEL = CLASS_TO_IDX['others']   # = 6

print(f'Competition classes ({NUM_CLASSES}):', CLASS_NAMES)
print(f'\nPatternNet index → competition label:')
for pn_idx, comp_idx in PN_TO_COMP.items():
    print(f'  PN[{pn_idx:>2}] {PN_LABEL_NAMES[pn_idx]:<22} → [{comp_idx}] {CLASS_NAMES[comp_idx]}')

In [None]:
# ── Partition all 30,400 indices by competition label ─────────────────────────
OTHERS_SAMPLE = 800   # subsample 'others' to match per-class count

indices_by_class = {i: [] for i in range(NUM_CLASSES)}
pn_labels = hf_dataset['label']   # list of int, fast access

for idx, pn_lbl in enumerate(pn_labels):
    comp_lbl = PN_TO_COMP.get(pn_lbl, OTHERS_LABEL)
    indices_by_class[comp_lbl].append(idx)

# Subsample 'others'
rng = random.Random(SEED)
indices_by_class[OTHERS_LABEL] = rng.sample(
    indices_by_class[OTHERS_LABEL], OTHERS_SAMPLE
)

print('Images per competition class (after subsampling others):')
total = 0
counts = {}
for i, name in enumerate(CLASS_NAMES):
    n = len(indices_by_class[i])
    counts[name] = n
    total += n
    print(f'  [{i}] {name:<22} {n}')
print(f'  {"TOTAL":<26} {total}')

In [None]:
# ── Stratified train / val / test split (70 / 15 / 15) ───────────────────────
TRAIN_FRAC = 0.70
VAL_FRAC   = 0.15
# TEST_FRAC = 0.15 (remainder)

train_indices, val_indices, test_indices = [], [], []

for cls_idx in range(NUM_CLASSES):
    idxs = indices_by_class[cls_idx].copy()
    rng.shuffle(idxs)
    n       = len(idxs)
    n_train = int(n * TRAIN_FRAC)
    n_val   = int(n * VAL_FRAC)
    train_indices.extend(idxs[:n_train])
    val_indices.extend(idxs[n_train:n_train + n_val])
    test_indices.extend(idxs[n_train + n_val:])

print(f'Split sizes  — train: {len(train_indices)}, '
      f'val: {len(val_indices)}, test: {len(test_indices)}')

## 3. Exploratory Data Analysis

In [None]:
# ── Class distribution bar chart ─────────────────────────────────────────────
fig, ax = plt.subplots(figsize=(10, 4))
bars = ax.bar(counts.keys(), counts.values(), color='steelblue', edgecolor='white')
ax.set_title('Dataset — Class Distribution (after subsampling Others)', fontsize=13, fontweight='bold')
ax.set_xlabel('Class', fontsize=11)
ax.set_ylabel('Number of Images', fontsize=11)
ax.tick_params(axis='x', rotation=25)
for bar, val in zip(bars, counts.values()):
    ax.text(bar.get_x() + bar.get_width() / 2, bar.get_height() + 2,
            str(val), ha='center', va='bottom', fontsize=9)
plt.tight_layout()
plt.savefig('class_distribution.png', dpi=100, bbox_inches='tight')
plt.show()
print(f'Class balance ratio (min/max): {min(counts.values())/max(counts.values()):.2f}')

In [None]:
# ── Sample images per class ───────────────────────────────────────────────────
SAMPLES_PER_CLASS = 3

# Draw a few random examples per class from indices_by_class
fig, axes = plt.subplots(NUM_CLASSES, SAMPLES_PER_CLASS,
                         figsize=(SAMPLES_PER_CLASS * 3, NUM_CLASSES * 3))
fig.suptitle('Sample Images per Class (PatternNet)', fontsize=13, fontweight='bold', y=1.01)

for row, cls_name in enumerate(CLASS_NAMES):
    cls_idx  = CLASS_TO_IDX[cls_name]
    pool     = indices_by_class[cls_idx]
    samples  = rng.sample(pool, min(SAMPLES_PER_CLASS, len(pool)))
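    for col, hf_idx in enumerate(samples):
        ax  = axes[row][col]
        img = hf_dataset[hf_idx]['image']   # PIL image
        ax.imshow(img)
        # Hide ticks and frame but keep the axis on:
        # ax.axis('off') would also hide the row label set below
        ax.set_xticks([]); ax.set_yticks([])
        for spine in ax.spines.values():
            spine.set_visible(False)
        if col == 0:
            ax.set_ylabel(cls_name.replace('_', ' ').title(),
                          fontsize=8, rotation=0, labelpad=70, va='center')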

plt.tight_layout()
plt.savefig('sample_images.png', dpi=100, bbox_inches='tight')
plt.show()

In [None]:
# ── Image size statistics ─────────────────────────────────────────────────────
# PatternNet images are uniformly 256x256; confirm on a sample
STAT_SAMPLE = 200
sample_idxs = rng.sample(range(len(hf_dataset)), STAT_SAMPLE)
widths  = [hf_dataset[i]['image'].size[0] for i in sample_idxs]
heights = [hf_dataset[i]['image'].size[1] for i in sample_idxs]

print(f'Image dimensions (W x H) — sample of {STAT_SAMPLE}:')
print(f'  Width  — min: {min(widths)}, max: {max(widths)}, mean: {np.mean(widths):.0f}')
print(f'  Height — min: {min(heights)}, max: {max(heights)}, mean: {np.mean(heights):.0f}')

## 4. Data Augmentation & DataLoaders

ImageNet normalisation statistics are used since ResNet-50 was pre-trained on ImageNet.  
Training transforms include random crop, flips, rotations, colour jitter, and random erasing.
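
To make the normalisation step concrete, here is a minimal NumPy sketch of what `transforms.Normalize` computes per channel, together with its inverse (useful when plotting an already-normalised image). The helper names `normalize` and `denormalize` are illustrative only, and the random array stands in for the output of `ToTensor()`.

```python
import numpy as np

IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def normalize(img_chw):
    """Per-channel (x - mean) / std for a float image in [0, 1], shape (3, H, W)."""
    return (img_chw - IMAGENET_MEAN[:, None, None]) / IMAGENET_STD[:, None, None]


def denormalize(img_chw):
    """Invert normalize(), e.g. before displaying a transformed image."""
    return img_chw * IMAGENET_STD[:, None, None] + IMAGENET_MEAN[:, None, None]


rng = np.random.default_rng(0)
img = rng.random((3, 4, 4), dtype=np.float32)   # stand-in for a ToTensor() output
norm = normalize(img)
restored = denormalize(norm)

print('Round trip recovers the input:', np.allclose(restored, img, atol=1e-5))
```

A pixel whose value equals the channel mean maps to 0 after normalisation, and one standard deviation above the mean maps to 1.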

In [None]:
# ── Hyperparameters ───────────────────────────────────────────────────────────
IMG_SIZE    = 224
BATCH_SIZE  = 32
NUM_WORKERS = 4

IMAGENET_MEAN = [0.485, 0.456, 0.406]
IMAGENET_STD  = [0.229, 0.224, 0.225]

train_transform = transforms.Compose([
    transforms.Resize((IMG_SIZE + 32, IMG_SIZE + 32)),
    transforms.RandomCrop(IMG_SIZE),
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomVerticalFlip(p=0.3),
    transforms.RandomRotation(degrees=20),
    transforms.ColorJitter(brightness=0.3, contrast=0.3, saturation=0.2, hue=0.05),
    transforms.ToTensor(),
    transforms.Normalize(mean=IMAGENET_MEAN, std=IMAGENET_STD),
    transforms.RandomErasing(p=0.2, scale=(0.02, 0.1)),
])

eval_transform = transforms.Compose([
    transforms.Resize((IMG_SIZE, IMG_SIZE)),
    transforms.ToTensor(),
    transforms.Normalize(mean=IMAGENET_MEAN, std=IMAGENET_STD),
])

print(f'Input size : {IMG_SIZE}x{IMG_SIZE} | Batch size: {BATCH_SIZE}')

In [None]:
class PatternNetDataset(Dataset):
    """
    Wraps a HuggingFace PatternNet split with competition label mapping
    and optional torchvision transform.
    """

    def __init__(self, hf_dataset, indices, pn_to_comp, others_label, transform=None):
        self.hf_dataset   = hf_dataset
        self.indices      = indices          # HF dataset indices to expose
        self.pn_to_comp   = pn_to_comp       # {pn_int: comp_int}
        self.others_label = others_label
        self.transform    = transform

    def __len__(self):
        return len(self.indices)

    def __getitem__(self, idx):
        hf_idx  = self.indices[idx]
        item    = self.hf_dataset[hf_idx]
        image   = item['image'].convert('RGB')      # PIL image
        pn_lbl  = item['label']
        comp_lbl = self.pn_to_comp.get(pn_lbl, self.others_label)

        if self.transform:
            image = self.transform(image)
        return image, comp_lbl


train_ds = PatternNetDataset(hf_dataset, train_indices, PN_TO_COMP, OTHERS_LABEL, train_transform)
val_ds   = PatternNetDataset(hf_dataset, val_indices,   PN_TO_COMP, OTHERS_LABEL, eval_transform)
test_ds  = PatternNetDataset(hf_dataset, test_indices,  PN_TO_COMP, OTHERS_LABEL, eval_transform)

train_loader = DataLoader(train_ds, batch_size=BATCH_SIZE, shuffle=True,
                          num_workers=NUM_WORKERS, pin_memory=True)
val_loader   = DataLoader(val_ds,   batch_size=BATCH_SIZE, shuffle=False,
                          num_workers=NUM_WORKERS, pin_memory=True)
test_loader  = DataLoader(test_ds,  batch_size=BATCH_SIZE, shuffle=False,
                          num_workers=NUM_WORKERS, pin_memory=True)

print(f'Train : {len(train_ds):>5} images  |  {len(train_loader):>3} batches')
print(f'Val   : {len(val_ds):>5} images  |  {len(val_loader):>3} batches')
print(f'Test  : {len(test_ds):>5} images  |  {len(test_loader):>3} batches')

## 5. Model — ResNet-50 with Fine-Tuning

Load an ImageNet pre-trained ResNet-50 and replace its final FC layer with a dropout + linear head that outputs 7 classes.
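
For a sense of scale, the short calculation below compares the size of the new head with the 1000-class layer it replaces. It relies on ResNet-50 feeding 2,048 features into its final layer (the value `model.fc.in_features` returns in torchvision); `linear_params` is a hypothetical helper, and dropout contributes no parameters.

```python
IN_FEATURES = 2048   # features entering ResNet-50's final fully connected layer
NUM_CLASSES = 7


def linear_params(in_features, out_features):
    """Weights plus biases of a fully connected layer."""
    return in_features * out_features + out_features


new_head = linear_params(IN_FEATURES, NUM_CLASSES)   # 7-class head used here
old_head = linear_params(IN_FEATURES, 1000)          # original ImageNet classifier

print(f'New head parameters      : {new_head:,}')
print(f'Original head parameters : {old_head:,}')
```

Because every layer is left trainable in this notebook, the head accounts for only a small fraction of the trainable parameter count printed by the cell below.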

In [None]:
def build_resnet50(num_classes: int, pretrained: bool = True):
    """Build ResNet-50 with a custom classification head."""
    weights = models.ResNet50_Weights.IMAGENET1K_V2 if pretrained else None
    model   = models.resnet50(weights=weights)
    in_features = model.fc.in_features
    model.fc = nn.Sequential(
        nn.Dropout(p=0.4),
        nn.Linear(in_features, num_classes)
    )
    return model


model = build_resnet50(NUM_CLASSES).to(DEVICE)

total_params     = sum(p.numel() for p in model.parameters())
trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f'Total parameters     : {total_params:,}')
print(f'Trainable parameters : {trainable_params:,}')
print(f'Classification head  : {model.fc}')

## 6. Training

- **Loss:** Cross-entropy with label smoothing (0.1)  
- **Optimiser:** AdamW with weight decay  
- **Scheduler:** Cosine annealing with warm restarts  
- **Early stopping:** Patience of 7 epochs on validation accuracy
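
The two less obvious settings above can be worked through by hand. The sketch below is a plain-Python illustration, not the PyTorch implementation: `smoothed_targets` builds the target distribution that label smoothing trains against (a mix of the one-hot label and a uniform distribution), and `cosine_warm_restart_lr` evaluates the cosine schedule for the special case `T_mult=1` with a minimum learning rate of 0. Both function names are hypothetical.

```python
import math


def smoothed_targets(true_idx, num_classes, eps):
    """Target distribution: (1 - eps) * one_hot + eps / num_classes."""
    base = eps / num_classes
    return [base + (1.0 - eps if i == true_idx else 0.0)
            for i in range(num_classes)]


def cosine_warm_restart_lr(steps_done, base_lr, t_0, eta_min=0.0):
    """LR after `steps_done` scheduler steps, assuming T_mult = 1."""
    t_cur = steps_done % t_0          # position inside the current cycle
    return eta_min + 0.5 * (base_lr - eta_min) * (1 + math.cos(math.pi * t_cur / t_0))


# Label smoothing with eps = 0.1 over 7 classes, true class index 2
targets = smoothed_targets(true_idx=2, num_classes=7, eps=0.1)
print('Smoothed targets:', [round(t, 4) for t in targets])

# LR in effect during epochs 1..12 (epoch e runs after e - 1 scheduler steps)
for epoch in range(1, 13):
    lr = cosine_warm_restart_lr(epoch - 1, base_lr=3e-4, t_0=10)
    print(f'epoch {epoch:>2}: lr = {lr:.2e}')
```

With `T_0=10` the learning rate decays over ten epochs and jumps back to its initial value at epoch 11, so a 30-epoch run covers three full cycles if early stopping does not trigger first.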

In [None]:
NUM_EPOCHS   = 30
LR           = 3e-4
WEIGHT_DECAY = 1e-4
PATIENCE     = 7
CKPT_DIR     = Path('checkpoints')
CKPT_DIR.mkdir(exist_ok=True)

criterion = nn.CrossEntropyLoss(label_smoothing=0.1)
optimizer = optim.AdamW(model.parameters(), lr=LR, weight_decay=WEIGHT_DECAY)
scheduler = optim.lr_scheduler.CosineAnnealingWarmRestarts(optimizer, T_0=10, T_mult=1)

print(f'Epochs: {NUM_EPOCHS} | LR: {LR} | Weight decay: {WEIGHT_DECAY} | Patience: {PATIENCE}')

In [None]:
def print_progress_bar(current, total, length=40):
    """Print a simple progress bar that updates in place."""
    progress = current / total
    filled   = int(length * progress)
    bar      = '#' * filled + '.' * (length - filled)
    print(f'\rProgress: [{bar}] {current}/{total} ({progress*100:.1f}%)',
          end='', flush=True)
    if current == total:
        print()


def run_epoch(model, loader, criterion, optimizer=None):
    """Run one epoch. Pass optimizer=None for evaluation mode."""
    training = optimizer is not None
    model.train(training)   # train(False) is equivalent to eval()
    running_loss = correct = total = 0

    with torch.set_grad_enabled(training):
        for images, labels in loader:
            images = images.to(DEVICE, non_blocking=True)
            labels = labels.to(DEVICE, non_blocking=True)
            outputs = model(images)
            loss    = criterion(outputs, labels)
            if training:
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
            running_loss += loss.item() * images.size(0)
            correct      += (outputs.argmax(1) == labels).sum().item()
            total        += images.size(0)

    return running_loss / total, correct / total


print('Training utilities defined.')

In [None]:
history = {'train_loss': [], 'train_acc': [], 'val_loss': [], 'val_acc': []}
best_val_acc  = 0.0
epochs_no_imp = 0

print(f'Training ResNet-50 on {DEVICE} ...\n')
print(f'{"Epoch":>6} | {"Train Loss":>10} | {"Train Acc":>9} | '
      f'{"Val Loss":>8} | {"Val Acc":>8} | {"LR":>10}')
print('-' * 65)

for epoch in range(1, NUM_EPOCHS + 1):
    # Read the LR before scheduler.step() so the log shows the rate used in this epoch
    current_lr = optimizer.param_groups[0]['lr']
    train_loss, train_acc = run_epoch(model, train_loader, criterion, optimizer)
    val_loss,   val_acc   = run_epoch(model, val_loader,   criterion)
    scheduler.step()

    history['train_loss'].append(train_loss)
    history['train_acc'].append(train_acc)
    history['val_loss'].append(val_loss)
    history['val_acc'].append(val_acc)

    marker     = ' *' if val_acc > best_val_acc else ''
    print(f'{epoch:>6} | {train_loss:>10.4f} | {train_acc*100:>8.2f}% | '
          f'{val_loss:>8.4f} | {val_acc*100:>7.2f}%{marker} | {current_lr:>10.2e}')

    if val_acc > best_val_acc:
        best_val_acc  = val_acc
        epochs_no_imp = 0
        torch.save({'epoch': epoch,
                    'model_state_dict': model.state_dict(),
                    'val_acc': best_val_acc},
                   CKPT_DIR / 'best_model.pth')
    else:
        epochs_no_imp += 1
        if epochs_no_imp >= PATIENCE:
            print(f'\nEarly stopping at epoch {epoch} (no improvement for {PATIENCE} epochs).')
            break

print(f'\nBest validation accuracy: {best_val_acc*100:.2f}%')

## 7. Training Curves

In [None]:
x = range(1, len(history['train_loss']) + 1)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 4))

ax1.plot(x, history['train_loss'], label='Train', color='steelblue')
ax1.plot(x, history['val_loss'],   label='Validation', color='coral')
ax1.set_title('Cross-Entropy Loss', fontweight='bold')
ax1.set_xlabel('Epoch'); ax1.set_ylabel('Loss')
ax1.legend(); ax1.grid(alpha=0.3)

ax2.plot(x, [a*100 for a in history['train_acc']], label='Train', color='steelblue')
ax2.plot(x, [a*100 for a in history['val_acc']],   label='Validation', color='coral')
ax2.set_title('Classification Accuracy (%)', fontweight='bold')
ax2.set_xlabel('Epoch'); ax2.set_ylabel('Accuracy (%)')
ax2.legend(); ax2.grid(alpha=0.3)

plt.tight_layout()
plt.savefig('training_curves.png', dpi=100, bbox_inches='tight')
plt.show()
print(f'Best val acc: {best_val_acc*100:.2f}%  |  '
      f'Final train acc: {history["train_acc"][-1]*100:.2f}%')

## 8. Evaluation on Validation Set

In [None]:
# ── Load best checkpoint ──────────────────────────────────────────────────────
ckpt = torch.load(CKPT_DIR / 'best_model.pth', map_location=DEVICE)
model.load_state_dict(ckpt['model_state_dict'])
print(f'Loaded checkpoint from epoch {ckpt["epoch"]} (val acc: {ckpt["val_acc"]*100:.2f}%)')

In [None]:
model.eval()
all_preds, all_labels = [], []

with torch.no_grad():
    for images, labels in val_loader:
        outputs = model(images.to(DEVICE))
        all_preds.extend(outputs.argmax(1).cpu().numpy())
        all_labels.extend(labels.numpy())

all_preds  = np.array(all_preds)
all_labels = np.array(all_labels)

print(f'Validation Accuracy: {(all_preds == all_labels).mean()*100:.2f}%\n')
print(classification_report(
    all_labels, all_preds,
    target_names=[n.replace('_', ' ').title() for n in CLASS_NAMES],
    digits=3
))

In [None]:
# ── Normalised Confusion Matrix ───────────────────────────────────────────────
cm      = confusion_matrix(all_labels, all_preds)
cm_norm = cm.astype(float) / cm.sum(axis=1, keepdims=True)

display_names = [n.replace('_', ' ').title() for n in CLASS_NAMES]

fig, ax = plt.subplots(figsize=(9, 7))
im = ax.imshow(cm_norm, cmap='Blues', vmin=0, vmax=1)
fig.colorbar(im, ax=ax, fraction=0.046, pad=0.04)

ticks = np.arange(NUM_CLASSES)
ax.set_xticks(ticks); ax.set_xticklabels(display_names, rotation=35, ha='right', fontsize=9)
ax.set_yticks(ticks); ax.set_yticklabels(display_names, fontsize=9)

for i in range(NUM_CLASSES):
    for j in range(NUM_CLASSES):
        colour = 'white' if cm_norm[i, j] > 0.5 else 'black'
        ax.text(j, i, f'{cm_norm[i, j]:.2f}\n({cm[i, j]})',
                ha='center', va='center', fontsize=8, color=colour)

ax.set_title('Normalised Confusion Matrix — Validation Set', fontweight='bold')
ax.set_xlabel('Predicted Class'); ax.set_ylabel('True Class')
plt.tight_layout()
plt.savefig('confusion_matrix.png', dpi=100, bbox_inches='tight')
plt.show()

## 9. Test Set Evaluation & Submission Generation

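The held-out test split (15% of the data) stands in for the competition leaderboard evaluation.
A `submission.csv` with `id` and `label` columns is produced; because the original competition files are unavailable, the image IDs are synthetic placeholders.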

In [None]:
model.eval()
test_preds, test_labels = [], []
n_batches = len(test_loader)

print(f'Running inference on {len(test_ds)} test images ...')
with torch.no_grad():
    for i, (images, labels) in enumerate(test_loader, 1):
        outputs = model(images.to(DEVICE))
        test_preds.extend(outputs.argmax(1).cpu().numpy())
        test_labels.extend(labels.numpy())
        print_progress_bar(i, n_batches)

test_preds  = np.array(test_preds)
test_labels = np.array(test_labels)

test_accuracy = (test_preds == test_labels).mean()
print(f'\nTest Accuracy: {test_accuracy*100:.2f}%\n')
print(classification_report(
    test_labels, test_preds,
    target_names=[n.replace('_', ' ').title() for n in CLASS_NAMES],
    digits=3
))

In [None]:
# ── Generate submission CSV in Kaggle format ──────────────────────────────────
IDX_TO_CLASS = {i: name for name, i in CLASS_TO_IDX.items()}
pred_labels  = [IDX_TO_CLASS[p] for p in test_preds]
image_ids    = [f'test_{i:05d}.jpg' for i in range(len(test_preds))]

submission = pd.DataFrame({'id': image_ids, 'label': pred_labels})
submission.to_csv('submission.csv', index=False)

print('submission.csv saved.')
print(f'Shape: {submission.shape}')
print('\nPrediction distribution:')
print(submission['label'].value_counts().to_string())
print('\nFirst 5 rows:')
print(submission.head(5).to_string(index=False))

## 10. Test-Time Augmentation (TTA)

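Averages softmax probabilities across three views of each test image (original, horizontal flip, vertical flip).  
TTA often gives a small accuracy gain over single-pass inference, but the size of the gain depends on the model and data; this section reports the measured difference on the test split.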
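
The averaging step itself is small enough to show in isolation. The NumPy sketch below uses random logits as a stand-in for model outputs on three views of five images; `softmax` and `tta_average` are illustrative helpers, not functions from the notebook.

```python
import numpy as np


def softmax(logits, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def tta_average(view_logits):
    """(n_views, n_images, n_classes) logits -> averaged probabilities (n_images, n_classes)."""
    return softmax(view_logits, axis=-1).mean(axis=0)


rng = np.random.default_rng(42)
view_logits = rng.normal(size=(3, 5, 7))   # synthetic: 3 views, 5 images, 7 classes

avg_probs = tta_average(view_logits)
preds = avg_probs.argmax(axis=1)

# Summing instead of averaging changes the scale but not the predicted class
sum_preds = softmax(view_logits, axis=-1).sum(axis=0).argmax(axis=1)
print('Predictions:', preds)
print('Same as summed probabilities:', np.array_equal(preds, sum_preds))
```

Averaging probabilities rather than logits keeps each view's contribution bounded, so a single over-confident view cannot dominate the result as easily.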

In [None]:
TTA_TRANSFORMS = [
    eval_transform,   # original
    transforms.Compose([
        transforms.Resize((IMG_SIZE, IMG_SIZE)),
        transforms.RandomHorizontalFlip(p=1.0),
        transforms.ToTensor(),
        transforms.Normalize(IMAGENET_MEAN, IMAGENET_STD),
    ]),
    transforms.Compose([
        transforms.Resize((IMG_SIZE, IMG_SIZE)),
        transforms.RandomVerticalFlip(p=1.0),
        transforms.ToTensor(),
        transforms.Normalize(IMAGENET_MEAN, IMAGENET_STD),
    ]),
]

model.eval()
tta_probs = None

for t_idx, tta_tf in enumerate(TTA_TRANSFORMS):
    tta_ds     = PatternNetDataset(hf_dataset, test_indices, PN_TO_COMP, OTHERS_LABEL, tta_tf)
    tta_loader = DataLoader(tta_ds, batch_size=BATCH_SIZE,
                            shuffle=False, num_workers=NUM_WORKERS)
    probs_list = []

    print(f'TTA pass {t_idx + 1}/{len(TTA_TRANSFORMS)} ...')
    with torch.no_grad():
        for i, (images, _) in enumerate(tta_loader, 1):
            logits = model(images.to(DEVICE))
            probs_list.extend(torch.softmax(logits, dim=1).cpu().numpy())
            print_progress_bar(i, len(tta_loader))

    # Accumulate per-view probabilities; summing gives the same argmax as averaging
    tta_probs = np.array(probs_list) if tta_probs is None else tta_probs + np.array(probs_list)

tta_preds       = tta_probs.argmax(axis=1)
tta_pred_labels = [IDX_TO_CLASS[p] for p in tta_preds]
tta_accuracy    = (tta_preds == test_labels).mean()

tta_submission = pd.DataFrame({'id': image_ids, 'label': tta_pred_labels})
tta_submission.to_csv('submission_tta.csv', index=False)

print(f'\nTTA Test Accuracy : {tta_accuracy*100:.2f}%')
print(f'Standard Accuracy : {test_accuracy*100:.2f}%')
print(f'Improvement       : {(tta_accuracy - test_accuracy)*100:+.2f} percentage points')
print('\nsubmission_tta.csv saved.')

## 11. Summary

| Item | Value |
|------|-------|
| Data source | PatternNet via HuggingFace (`blanchon/PatternNet`) |
| Architecture | ResNet-50 (ImageNet pre-trained, IMAGENET1K_V2) |
| Classes | 7 (basketball court, beach, forest, railway, swimming pool, tennis court, others) |
| Total images used | 5,600 (6 × 800 target + 800 others) |
| Split | 70% train / 15% val / 15% test (stratified) |
| Input size | 224 × 224 |
| Optimiser | AdamW (lr=3×10⁻⁴, wd=10⁻⁴) |
| Scheduler | Cosine annealing with warm restarts (T₀=10) |
| Loss | Cross-entropy + label smoothing (0.1) |
| Augmentation | Random crop, H/V flip, rotation, colour jitter, random erasing |
| TTA | 3 passes (original, H-flip, V-flip) |

**Output files:**
- `submission.csv` — standard single-pass inference on the test split
- `submission_tta.csv` — test-time augmented predictions (recommended)

**Possible further improvements:**
- MixUp / CutMix augmentation during training
- EfficientNet-B4 or ViT-B/16 for higher capacity
- Weighted loss if class imbalance is significant
- Ensemble of multiple ResNet and EfficientNet variants
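
As an illustration of the first suggestion, here is a minimal NumPy sketch of MixUp applied to one batch. It is not part of the notebook: `mixup_batch` is a hypothetical helper, `alpha=0.2` is an assumed setting rather than a tuned value, and the random arrays stand in for a real batch of images and labels.

```python
import numpy as np


def mixup_batch(images, labels, num_classes, alpha=0.2, rng=None):
    """Blend each sample with a randomly chosen partner from the same batch.

    Returns the mixed images, the mixed soft labels, and the mixing weight.
    """
    if rng is None:
        rng = np.random.default_rng()
    lam = rng.beta(alpha, alpha)                  # mixing weight in [0, 1]
    perm = rng.permutation(len(images))           # partner index for each sample
    one_hot = np.eye(num_classes, dtype=np.float32)[labels]
    mixed_images = lam * images + (1.0 - lam) * images[perm]
    mixed_labels = lam * one_hot + (1.0 - lam) * one_hot[perm]
    return mixed_images, mixed_labels, lam


demo_rng = np.random.default_rng(0)
images = demo_rng.random((4, 3, 8, 8))            # synthetic batch: 4 RGB images of 8x8
labels = np.array([0, 3, 6, 1])                   # competition class indices

mixed_images, mixed_labels, lam = mixup_batch(images, labels, num_classes=7, rng=demo_rng)
print(f'lambda = {lam:.3f}')
print('Soft label of first sample:', np.round(mixed_labels[0], 3))
```

The same arithmetic carries over to PyTorch tensors inside the training loop. Recent PyTorch versions allow `nn.CrossEntropyLoss` to take class probabilities as targets, which would let the mixed labels be used directly; treat that integration as untested here.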