# Neuro-Fuzzy Computing - Project - Fall 2025
## Galaxy Zoo — Training

In this notebook, we train and evaluate on the **training** portion of the Galaxy Zoo dataset.

Dataset location (Drive): `MyDrive/galaxy-zoo-the-galaxy-challenge/`: includes the files `images_training_rev1.zip` and `training_solutions_rev1.zip`.

#### Inspecting the initial dataset location

### Dataset inspection and preprocessing

In the following cells we perform the essential preprocessing steps

1. **Define data paths and parameters**
   - Point to the processed training images folder: `images_training_rev1/`
   - Point to the label file: `training_solutions_rev1.csv`

In [1]:
from pathlib import Path
import pandas as pd

DATA_ROOT = Path.cwd()

img_dir = DATA_ROOT / "images_training_rev1"
csv_path = DATA_ROOT / "training_solutions_rev1.csv"

print("img_dir exists:", img_dir.is_dir())
print("csv_path exists:", csv_path.is_file())

img_dir exists: True
csv_path exists: True


2. **Check label to image consistency**
   - Confirm the number of label rows matches the number of processed images
   - If there is a mismatch, we report example GalaxyIDs whose image files are missing

In [2]:
# If this cell takes minutes to run, something went wrong with Colab finding the images, most likely due to their size. If that happens, restart session
solutions_df = pd.read_csv(csv_path)

# IDs of Galaxies are the labels of the column "GalaxyID"
ids = solutions_df["GalaxyID"].astype(str).tolist()

# These labels must match the names of the files inside the folder "images_training_424"
train_image_names = sorted([p.name for p in img_dir.glob("*.jpg")])

if len(ids) != len(train_image_names):
    missing = []
    name_set = set(train_image_names)
    for gid in ids[:50]:
        if f"{gid}.jpg" not in name_set:
            missing.append(gid)
    raise ValueError(f"Label/image count mismatch: labels={len(ids)} images={len(train_image_names)}. Example missing IDs: {missing[:10]}")

3. **Prepare inputs for a TensorFlow dataset**

In this cell, we:
   - Create `paths` (image filepaths) and `labels` (soft targets) from `training_solutions_rev1.csv`
   - Define `load_image(path, y)`, which will be used later with `tf.data.Dataset.map(...)` to load/parse images **on demand**

In [3]:
import numpy as np
from PIL import Image
import torch
from torchvision import transforms

target_cols = [c for c in solutions_df.columns if c != "GalaxyID"]

paths = (solutions_df["GalaxyID"].astype(int).astype(str) + ".jpg") \
    .apply(lambda fn: str(img_dir / fn)).to_numpy()

labels = solutions_df[target_cols].to_numpy(dtype=np.float32)

img_transform = transforms.Compose([
    transforms.Resize((424, 424), interpolation=transforms.InterpolationMode.LANCZOS),
    transforms.ToTensor(),
])

def load_image(path: str, y):
    img = Image.open(path).convert("RGB")
    img = img_transform(img)  # float32, [0,1], shape [3,424,424]
    y = torch.as_tensor(y, dtype=torch.float32)
    return img, y

### Train/Val/Test split (80/10/10)

We create our own 80/10/10 train/val/test split from the edited training set.

We start from a dataset of `(filepath, target_vector)` pairs and we shuffle **once** with a fixed seed (`seed=42`, `reshuffle_each_iteration=False`) to get a reproducible, **fixed** random ordering

Split by slicing the shuffled dataset:
  - **Train:** first 80% of samples
  - **Validation:** next 10%
  - **Test:** final 10%

Lastly, we apply `load_image` **after** the split so each subset loads/decodes images lazily and independently

In [4]:
n_total = len(paths)
n_train = int(0.8 * n_total)
n_val   = int(0.1 * n_total)
n_test  = n_total - n_train - n_val

print("Dataset size:", n_total)
print("Train:", n_train)
print("Val:", n_val)
print("Test:", n_test)

# Shuffle ONCE (fixed order for reproducibility)
g = torch.Generator().manual_seed(42)
perm = torch.randperm(n_total, generator=g).numpy()

paths_shuf  = paths[perm]
labels_shuf = labels[perm]

# Split (no images loaded yet)
train_paths, train_labels = paths_shuf[:n_train], labels_shuf[:n_train]
val_paths,   val_labels   = paths_shuf[n_train:n_train + n_val], labels_shuf[n_train:n_train + n_val]
test_paths,  test_labels  = paths_shuf[n_train + n_val:], labels_shuf[n_train + n_val:]

Dataset size: 61578
Train: 49262
Val: 6157
Test: 6159


### Image loading pipeline (lazy + batched)

To build the dataset, we:
   - Convert the split datasets from `(filepath, target_vector)` into `(image_tensor, target_tensor)` using `map(load_image)`
     - Images are read/decoded **on demand** with `tf.io.read_file` + `tf.io.decode_jpeg`
     - Converted to `float32` in **[0, 1]** (and resized to 424×424 as a safety step)
   - Optimize input throughput:
     - **Train:** shuffle (per epoch) → batch → prefetch
     - **Val/Test:** batch → prefetch

Each dataset element is a **batch** `(image_tensor, target_tensor)` where:
  - `image_tensor` has shape **(batch_size, 424, 424, 3)** (channels-last) in **[0, 1]**
  - `target_tensor` has shape **(batch_size, 37)**


In [7]:
import os
from torch.utils.data import Dataset, DataLoader

BATCH_SIZE = 16

class PathLabelDataset(Dataset):
    def __init__(self, paths, labels):
        self.paths = paths
        self.labels = labels

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, idx):
        return load_image(self.paths[idx], self.labels[idx])

train_dataset = PathLabelDataset(train_paths, train_labels)
val_dataset   = PathLabelDataset(val_paths, val_labels)
test_dataset  = PathLabelDataset(test_paths, test_labels)

num_workers = max(0, (os.cpu_count() or 0) // 4 - 1)

train_loader = DataLoader(
    train_dataset,
    batch_size=BATCH_SIZE,
    shuffle=True,
    drop_last=True,
    num_workers=num_workers,
    pin_memory=True,
    persistent_workers=(num_workers > 0),
    prefetch_factor=2 if num_workers > 0 else None,
)

val_loader = DataLoader(
    val_dataset,
    batch_size=BATCH_SIZE,
    shuffle=False,
    drop_last=True,
    num_workers=num_workers,
    pin_memory=True,
    persistent_workers=(num_workers > 0),
    prefetch_factor=2 if num_workers > 0 else None,
)

test_loader = DataLoader(
    test_dataset,
    batch_size=BATCH_SIZE,
    shuffle=False,
    drop_last=True,
    num_workers=num_workers,
    pin_memory=True,
    persistent_workers=(num_workers > 0),
    prefetch_factor=2 if num_workers > 0 else None,
)

# Estimate total steps (batches) for full training run
steps_per_epoch = len(train_loader)
total_steps = int(steps_per_epoch * 30)  # 30 = max epochs

xb, yb = next(iter(train_loader))
print("train batch x:", xb.shape, xb.dtype, "y:", yb.shape, yb.dtype)

train batch x: torch.Size([16, 3, 424, 424]) torch.float32 y: torch.Size([16, 37]) torch.float32


### Building our CNN model

In [16]:
import torch.nn as nn

model = nn.Sequential(
    # Block 1 (3 -> 32) with 2 convs
    nn.Conv2d(3, 32, kernel_size=3, padding=1, bias=True),
    nn.BatchNorm2d(32),
    nn.ReLU(inplace=True),
    nn.Conv2d(32, 32, kernel_size=3, padding=1, bias=True),
    nn.BatchNorm2d(32),
    nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=2),
    nn.Dropout2d(p=0.05),

    # Block 2 (32 -> 64) with 2 convs
    nn.Conv2d(32, 64, kernel_size=3, padding=1, bias=True),
    nn.BatchNorm2d(64),
    nn.ReLU(inplace=True),
    nn.Conv2d(64, 64, kernel_size=3, padding=1, bias=True),
    nn.BatchNorm2d(64),
    nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=2),
    nn.Dropout2d(p=0.10),

    # Block 3 (64 -> 128) with 2 convs
    nn.Conv2d(64, 128, kernel_size=3, padding=1, bias=True),
    nn.BatchNorm2d(128),
    nn.ReLU(inplace=True),
    nn.Conv2d(128, 128, kernel_size=3, padding=1, bias=True),
    nn.BatchNorm2d(128),
    nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=2),
    nn.Dropout2d(p=0.15),

    # Block 4 (128 -> 256) with 2 convs
    nn.Conv2d(128, 256, kernel_size=3, padding=1, bias=True),
    nn.BatchNorm2d(256),
    nn.ReLU(inplace=True),
    nn.Conv2d(256, 256, kernel_size=3, padding=1, bias=True),
    nn.BatchNorm2d(256),
    nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=2),
    nn.Dropout2d(p=0.20),

    # Global average pooling + 37-dim output
    nn.AdaptiveAvgPool2d((1, 1)),
    nn.Flatten(1),
    nn.Linear(256, 37, bias=True),
)

print(model)

Sequential(
  (0): Conv2d(3, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (2): ReLU(inplace=True)
  (3): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (4): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (5): ReLU(inplace=True)
  (6): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (7): Dropout2d(p=0.05, inplace=False)
  (8): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (9): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (10): ReLU(inplace=True)
  (11): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (12): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (13): ReLU(inplace=True)
  (14): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (15): Dropout2d(p=0.1, inplace=Fal

### Optimizer, loss function and model compilation

In [17]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)

criterion = nn.MSELoss()

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-4)

def cosine_decay(step: int, total_steps: int, initial_lr: float = 1e-3, alpha: float = 1e-2) -> float:
    step = min(step, total_steps)
    cosine = 0.5 * (1.0 + math.cos(math.pi * step / total_steps))
    return initial_lr * (alpha + (1.0 - alpha) * cosine)

def set_lr(optimizer: torch.optim.Optimizer, lr: float) -> None:
    for pg in optimizer.param_groups:
        pg["lr"] = lr

@torch.no_grad()
def rmse(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    return torch.sqrt(torch.mean((pred - target) ** 2))

scaler = torch.amp.GradScaler("cuda", enabled=(device.type == "cuda"))

### Defining the training loop

In [None]:
import time
import copy

def get_current_lr(optimizer: torch.optim.Optimizer) -> float:
    return float(optimizer.param_groups[0]["lr"])

def train_loop(
    model,
    train_loader,
    val_loader,
    epochs: int = 30,
    patience: int = 3,
    min_delta: float = 1e-3,
    total_steps: int | None = None,
    initial_lr: float = 1e-3,
    alpha: float = 1e-2,
    device: torch.device | None = None,
    criterion=None,
    optimizer=None,
    scaler=None,
):
    if device is None:
        device = next(model.parameters()).device
    if criterion is None:
        raise ValueError("criterion must be provided (e.g., nn.MSELoss()).")
    if optimizer is None:
        raise ValueError("optimizer must be provided (e.g., torch.optim.AdamW(...)).")
    if scaler is None:
        scaler = torch.amp.GradScaler("cuda", enabled=(device.type == "cuda"))

    if total_steps is None:
        total_steps = len(train_loader) * epochs

    def cosine_decay(step: int) -> float:
        s = min(step, total_steps)
        cosine = 0.5 * (1.0 + math.cos(math.pi * s / total_steps))
        return initial_lr * (alpha + (1.0 - alpha) * cosine)

    def set_lr(lr: float) -> None:
        for pg in optimizer.param_groups:
            pg["lr"] = lr

    def rmse_sum_sse(pred: torch.Tensor, target: torch.Tensor) -> tuple[float, int]:
        # returns (sum_squared_error, num_elements)
        diff = pred - target
        sse = float(torch.sum(diff * diff).detach().cpu().item())
        n = target.numel()
        return sse, n

    best_val = float("inf")
    patience_ctr = 0
    best_state = None
    global_step = 0
    last_epoch = 0

    for epoch in range(1, epochs + 1):
        t0 = time.time()
        last_epoch = epoch

        # Training
        model.train()
        train_sse = 0.0
        train_n = 0

        for xb, yb in train_loader:
            xb = xb.to(device, non_blocking=True)
            yb = yb.to(device, non_blocking=True)

            set_lr(cosine_decay(global_step))

            optimizer.zero_grad(set_to_none=True)

            with torch.autocast(device_type=device.type, enabled=(device.type == "cuda")):
                preds = model(xb)
                loss = criterion(preds, yb)

            scaler.scale(loss).backward()
            scaler.step(optimizer)
            scaler.update()

            sse, n = rmse_sum_sse(preds, yb)
            train_sse += sse
            train_n += n

            global_step += 1

        train_rmse_val = math.sqrt(train_sse / max(1, train_n))

        # Evaluation
        model.eval()
        val_sse = 0.0
        val_n = 0

        with torch.no_grad():
            for xb, yb in val_loader:
                xb = xb.to(device, non_blocking=True)
                yb = yb.to(device, non_blocking=True)

                with torch.autocast(device_type=device.type, enabled=(device.type == "cuda")):
                    preds = model(xb)

                sse, n = rmse_sum_sse(preds, yb)
                val_sse += sse
                val_n += n

        val_rmse_val = math.sqrt(val_sse / max(1, val_n))

        lr_val = get_current_lr(optimizer)
        dt = time.time() - t0

        # Early stopping
        improved = (best_val - val_rmse_val) > min_delta
        if improved:
            best_val = val_rmse_val
            patience_ctr = 0
            best_state = copy.deepcopy(model.state_dict())
        else:
            patience_ctr += 1

        print(
            f"Epoch {epoch:02d}/{epochs} | "
            f"lr={lr_val:.6g} | "
            f"train_RMSE={train_rmse_val:.6f} | "
            f"eval_RMSE={val_rmse_val:.6f} | "
            f"patience={patience_ctr}/{patience} | "
            f"time={dt:.2f}s"
        )

        if patience_ctr >= patience:
            break

    if best_state is not None:
        model.load_state_dict(best_state)

    return last_epoch

### Training our model

In [20]:
epochs_ran = train_loop(
    model,
    train_loader,
    val_loader,
    epochs=30,
    patience=3,
    min_delta=1e-3,
    total_steps=total_steps,
    initial_lr=1e-3,
    alpha=1e-2,
    device=device,
    criterion=criterion,
    optimizer=optimizer,
    scaler=scaler,
)

Epoch 01/30 | lr=0.00099729 | train_RMSE=0.160046 | eval_RMSE=0.154146 | patience=0/3 | time=376.53s
Epoch 02/30 | lr=0.000989187 | train_RMSE=0.152481 | eval_RMSE=0.148057 | patience=0/3 | time=377.16s
Epoch 03/30 | lr=0.000975778 | train_RMSE=0.146231 | eval_RMSE=0.139906 | patience=0/3 | time=376.41s
Epoch 04/30 | lr=0.000957212 | train_RMSE=0.137463 | eval_RMSE=0.128491 | patience=0/3 | time=376.21s
Epoch 05/30 | lr=0.000933691 | train_RMSE=0.129679 | eval_RMSE=0.124759 | patience=0/3 | time=376.21s
Epoch 06/30 | lr=0.000905473 | train_RMSE=0.124608 | eval_RMSE=0.121008 | patience=0/3 | time=376.35s
Epoch 07/30 | lr=0.000872868 | train_RMSE=0.120480 | eval_RMSE=0.115805 | patience=0/3 | time=376.42s
Epoch 08/30 | lr=0.000836232 | train_RMSE=0.116749 | eval_RMSE=0.111439 | patience=0/3 | time=376.52s
Epoch 09/30 | lr=0.000795967 | train_RMSE=0.113611 | eval_RMSE=0.109716 | patience=0/3 | time=376.53s
Epoch 10/30 | lr=0.000752515 | train_RMSE=0.110993 | eval_RMSE=0.107231 | patience=

### Evaluate on held-out test split

In [None]:
model.eval()

test_sse = 0.0
test_n = 0

with torch.no_grad():
    for xb, yb in test_loader:
        xb = xb.to(device, non_blocking=True)
        yb = yb.to(device, non_blocking=True)

        with torch.autocast(device_type=device.type, enabled=(device.type == "cuda")):
            preds = model(xb)

        diff = preds - yb
        test_sse += float(torch.sum(diff * diff).cpu().item())
        test_n += yb.numel()

test_rmse = math.sqrt(test_sse / max(1, test_n))
print("Test RMSE:", test_rmse)

Test RMSE: 0.09789868925789266


In [22]:
# Cell for cleanup
import gc
gc.collect()

torch.cuda.empty_cache()
torch.cuda.ipc_collect()

## Comparison with other models

#### DenseNet

##### Model build

In [23]:
from torchvision import models

weights = models.DenseNet121_Weights.IMAGENET1K_V1
base = models.densenet121(weights=weights)

# Use as a feature extractor (remove classifier)
base.classifier = nn.Identity()

model2 = nn.Sequential(
    weights.transforms(),      # handles resize/crop if needed + ImageNet normalization
    base,                      # outputs [B, 1024]
    nn.Linear(1024, 37, bias=True),
)

model2 = model2.to(device)

print(model2)

Downloading: "https://download.pytorch.org/models/densenet121-a639ec97.pth" to /home/user/.cache/torch/hub/checkpoints/densenet121-a639ec97.pth


100.0%


Sequential(
  (0): ImageClassification(
      crop_size=[224]
      resize_size=[256]
      mean=[0.485, 0.456, 0.406]
      std=[0.229, 0.224, 0.225]
      interpolation=InterpolationMode.BILINEAR
  )
  (1): DenseNet(
    (features): Sequential(
      (conv0): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
      (norm0): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu0): ReLU(inplace=True)
      (pool0): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
      (denseblock1): _DenseBlock(
        (denselayer1): _DenseLayer(
          (norm1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (relu1): ReLU(inplace=True)
          (conv1): Conv2d(64, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (relu2): ReLU(inplace=True)
          (conv

##### Compilation

In [24]:
model2 = model2.to(device)

criterion2 = nn.MSELoss()

optimizer2 = torch.optim.AdamW(model2.parameters(), lr=1e-3, weight_decay=1e-4)

def cosine_decay2(step: int, total_steps: int, initial_lr: float = 1e-3, alpha: float = 1e-2) -> float:
    step = min(step, total_steps)
    cosine = 0.5 * (1.0 + math.cos(math.pi * step / total_steps))
    return initial_lr * (alpha + (1.0 - alpha) * cosine)

def set_lr2(optimizer: torch.optim.Optimizer, lr: float) -> None:
    for pg in optimizer.param_groups:
        pg["lr"] = lr

@torch.no_grad()
def rmse2(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    return torch.sqrt(torch.mean((pred - target) ** 2))

scaler2 = torch.amp.GradScaler("cuda", enabled=(device.type == "cuda"))

##### Training

In [25]:
epochs_ran2 = train_loop(
    model2,
    train_loader,
    val_loader,
    epochs=30,
    patience=3,
    min_delta=1e-3,
    total_steps=total_steps,
    initial_lr=1e-3,
    alpha=1e-2,
    device=device,
    criterion=criterion2,
    optimizer=optimizer2,
    scaler=scaler2,
)

Epoch 01/30 | lr=0.00099729 | train_RMSE=0.123484 | eval_RMSE=0.104466 | patience=0/3 | time=163.39s
Epoch 02/30 | lr=0.000989187 | train_RMSE=0.104444 | eval_RMSE=0.100183 | patience=0/3 | time=162.76s
Epoch 03/30 | lr=0.000975778 | train_RMSE=0.098061 | eval_RMSE=0.098982 | patience=0/3 | time=162.53s
Epoch 04/30 | lr=0.000957212 | train_RMSE=0.092934 | eval_RMSE=0.092127 | patience=0/3 | time=163.21s
Epoch 05/30 | lr=0.000933691 | train_RMSE=0.088679 | eval_RMSE=0.091321 | patience=1/3 | time=162.22s
Epoch 06/30 | lr=0.000905473 | train_RMSE=0.085375 | eval_RMSE=0.087067 | patience=0/3 | time=162.51s
Epoch 07/30 | lr=0.000872868 | train_RMSE=0.082520 | eval_RMSE=0.086000 | patience=0/3 | time=162.94s
Epoch 08/30 | lr=0.000836232 | train_RMSE=0.080091 | eval_RMSE=0.084653 | patience=0/3 | time=162.55s
Epoch 09/30 | lr=0.000795967 | train_RMSE=0.077301 | eval_RMSE=0.090185 | patience=1/3 | time=162.86s
Epoch 10/30 | lr=0.000752515 | train_RMSE=0.074780 | eval_RMSE=0.086159 | patience=

##### Testing

In [None]:
import math

model2.eval()

test_sse = 0.0
test_n = 0

with torch.no_grad():
    for xb, yb in test_loader:
        xb = xb.to(device, non_blocking=True)
        yb = yb.to(device, non_blocking=True)

        with torch.autocast(device_type=device.type, enabled=(device.type == "cuda")):
            preds = model2(xb)

        diff = preds - yb
        test_sse += float(torch.sum(diff * diff).detach().cpu().item())
        test_n += yb.numel()

test_rmse2 = math.sqrt(test_sse / max(1, test_n))
print("DenseNet Test RMSE:", test_rmse2)
print("DenseNet epochs:", epochs_ran2)

DenseNet Test RMSE: 0.0847407106771224
DenseNet epochs: 11


In [27]:
# Cell for cleanup
import gc
gc.collect()

torch.cuda.empty_cache()
torch.cuda.ipc_collect()

#### ResNet
##### Model build

In [29]:
from torchvision.transforms import Normalize

weights = models.ResNet50_Weights.IMAGENET1K_V2
base = models.resnet50(weights=weights)

# Remove classification head so it outputs features
base.fc = nn.Identity()

imagenet_normalize = Normalize(
    mean=(0.485, 0.456, 0.406),
    std=(0.229, 0.224, 0.225),
)

model3 = nn.Sequential(
    imagenet_normalize,      # expects float tensors in [0,1], shape [B,3,H,W]
    base,                    # outputs [B, 2048]
    nn.Linear(2048, 37, bias=True),
)

model3 = model3.to(device)

print(model3)

Downloading: "https://download.pytorch.org/models/resnet50-11ad3fa6.pth" to /home/user/.cache/torch/hub/checkpoints/resnet50-11ad3fa6.pth


100.0%


Sequential(
  (0): Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225))
  (1): ResNet(
    (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
    (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (relu): ReLU(inplace=True)
    (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
    (layer1): Sequential(
      (0): Bottleneck(
        (conv1): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (r

##### Compilation

In [30]:
criterion3 = nn.MSELoss()

optimizer3 = torch.optim.AdamW(model3.parameters(), lr=1e-3, weight_decay=1e-4)

def cosine_decay3(step: int, total_steps: int, initial_lr: float = 1e-3, alpha: float = 1e-2) -> float:
    step = min(step, total_steps)
    cosine = 0.5 * (1.0 + math.cos(math.pi * step / total_steps))
    return initial_lr * (alpha + (1.0 - alpha) * cosine)

def set_lr3(optimizer: torch.optim.Optimizer, lr: float) -> None:
    for pg in optimizer.param_groups:
        pg["lr"] = lr

@torch.no_grad()
def rmse3(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    return torch.sqrt(torch.mean((pred - target) ** 2))

scaler3 = torch.amp.GradScaler("cuda", enabled=(device.type == "cuda"))

##### Training

In [31]:
epochs_ran3 = train_loop(
    model3,
    train_loader,
    val_loader,
    epochs=30,
    patience=3,
    min_delta=1e-3,
    total_steps=total_steps,
    initial_lr=1e-3,
    alpha=1e-2,
    device=device,
    criterion=criterion3,
    optimizer=optimizer3,
    scaler=scaler3,
)

Epoch 01/30 | lr=0.00099729 | train_RMSE=0.112651 | eval_RMSE=0.101932 | patience=0/3 | time=507.59s
Epoch 02/30 | lr=0.000989187 | train_RMSE=0.096564 | eval_RMSE=0.094963 | patience=0/3 | time=509.24s
Epoch 03/30 | lr=0.000975778 | train_RMSE=0.091001 | eval_RMSE=0.091882 | patience=0/3 | time=509.09s
Epoch 04/30 | lr=0.000957212 | train_RMSE=0.086605 | eval_RMSE=0.089332 | patience=0/3 | time=508.55s
Epoch 05/30 | lr=0.000933691 | train_RMSE=0.083199 | eval_RMSE=0.089521 | patience=1/3 | time=508.98s
Epoch 06/30 | lr=0.000905473 | train_RMSE=0.079440 | eval_RMSE=0.088686 | patience=2/3 | time=509.07s
Epoch 07/30 | lr=0.000872868 | train_RMSE=0.075826 | eval_RMSE=0.086161 | patience=0/3 | time=508.28s
Epoch 08/30 | lr=0.000836232 | train_RMSE=0.071648 | eval_RMSE=0.086345 | patience=1/3 | time=508.49s
Epoch 09/30 | lr=0.000795967 | train_RMSE=0.066771 | eval_RMSE=0.086189 | patience=2/3 | time=509.23s
Epoch 10/30 | lr=0.000752515 | train_RMSE=0.061460 | eval_RMSE=0.087637 | patience=

##### Testing

In [32]:
model3.eval()

test_sse = 0.0
test_n = 0

with torch.no_grad():
    for xb, yb in test_loader:
        xb = xb.to(device, non_blocking=True)
        yb = yb.to(device, non_blocking=True)

        with torch.autocast(device_type=device.type, enabled=(device.type == "cuda")):
            preds = model3(xb)

        diff = preds - yb
        test_sse += float(torch.sum(diff * diff).detach().cpu().item())
        test_n += yb.numel()

test_rmse3 = math.sqrt(test_sse / max(1, test_n))
print("ResNet50 Test RMSE:", test_rmse3)
print("ResNet50 epochs:", epochs_ran3)

ResNet50 Test RMSE: 0.08638898185622608
ResNet50 epochs: 10


In [33]:
# Cell for cleanup
import gc
gc.collect()

torch.cuda.empty_cache()
torch.cuda.ipc_collect()

#### MobileNetV2 (pretrained transfer learning)
For this model, we opted for a frozen backbone, to check how well a pretrained transfer learning model would perform. We use pretrained weights, keep them fixed and only train our new head.
##### Model build

In [44]:
weights = models.MobileNet_V2_Weights.IMAGENET1K_V1
base = models.mobilenet_v2(weights=weights)

# Freeze backbone
for p in base.parameters():
    p.requires_grad = False

# Remove classifier so backbone returns features [B, 1280, H', W']
base.classifier = nn.Identity()

# MobileNetV2 expects inputs normalized to [-1, 1] (for tensors in [0,1])
to_mobilenetv2_range = Normalize(mean=(0.5, 0.5, 0.5), std=(0.5, 0.5, 0.5))

model4 = nn.Sequential(
    to_mobilenetv2_range,
    base,
    nn.Dropout(p=0.2),
    nn.Linear(1280, 37, bias=True),
)

model4 = model4.to(device)

print(model4)

Sequential(
  (0): Normalize(mean=(0.5, 0.5, 0.5), std=(0.5, 0.5, 0.5))
  (1): MobileNetV2(
    (features): Sequential(
      (0): Conv2dNormActivation(
        (0): Conv2d(3, 32, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
        (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (2): ReLU6(inplace=True)
      )
      (1): InvertedResidual(
        (conv): Sequential(
          (0): Conv2dNormActivation(
            (0): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32, bias=False)
            (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (2): ReLU6(inplace=True)
          )
          (1): Conv2d(32, 16, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (2): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        )
      )
      (2): InvertedResidual(
        (conv): Sequential(
          (0): Conv2dNormActiv

##### Compilation

In [45]:
criterion4 = nn.MSELoss()

optimizer4 = torch.optim.AdamW(
    (p for p in model4.parameters() if p.requires_grad),
    lr=1e-3,
    weight_decay=1e-4,
)

def cosine_decay4(step: int, total_steps: int, initial_lr: float = 1e-3, alpha: float = 1e-2) -> float:
    step = min(step, total_steps)
    cosine = 0.5 * (1.0 + math.cos(math.pi * step / total_steps))
    return initial_lr * (alpha + (1.0 - alpha) * cosine)

def set_lr4(optimizer: torch.optim.Optimizer, lr: float) -> None:
    for pg in optimizer.param_groups:
        pg["lr"] = lr

@torch.no_grad()
def rmse4(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    return torch.sqrt(torch.mean((pred - target) ** 2))

scaler4 = torch.amp.GradScaler("cuda", enabled=(device.type == "cuda"))

##### Training

In [46]:
epochs_ran4 = train_loop(
    model4,
    train_loader,
    val_loader,
    epochs=30,
    patience=3,
    min_delta=1e-3,
    total_steps=total_steps,
    initial_lr=1e-3,
    alpha=1e-2,
    device=device,
    criterion=criterion4,
    optimizer=optimizer4,
    scaler=scaler4,
)

Epoch 01/30 | lr=0.00099729 | train_RMSE=0.191170 | eval_RMSE=0.177336 | patience=0/3 | time=80.44s
Epoch 02/30 | lr=0.000989187 | train_RMSE=0.179655 | eval_RMSE=0.175076 | patience=0/3 | time=80.67s
Epoch 03/30 | lr=0.000975778 | train_RMSE=0.180192 | eval_RMSE=0.164982 | patience=0/3 | time=81.03s
Epoch 04/30 | lr=0.000957212 | train_RMSE=0.179274 | eval_RMSE=0.173387 | patience=1/3 | time=81.32s
Epoch 05/30 | lr=0.000933691 | train_RMSE=0.178611 | eval_RMSE=0.174649 | patience=2/3 | time=80.38s
Epoch 06/30 | lr=0.000905473 | train_RMSE=0.177965 | eval_RMSE=0.163579 | patience=0/3 | time=80.63s
Epoch 07/30 | lr=0.000872868 | train_RMSE=0.177081 | eval_RMSE=0.164883 | patience=1/3 | time=80.64s
Epoch 08/30 | lr=0.000836232 | train_RMSE=0.176379 | eval_RMSE=0.164790 | patience=2/3 | time=80.53s
Epoch 09/30 | lr=0.000795967 | train_RMSE=0.175059 | eval_RMSE=0.162376 | patience=0/3 | time=80.57s
Epoch 10/30 | lr=0.000752515 | train_RMSE=0.174052 | eval_RMSE=0.159796 | patience=0/3 | tim

##### Testing

In [47]:
model4.eval()

test_sse = 0.0
test_n = 0

with torch.no_grad():
    for xb, yb in test_loader:
        xb = xb.to(device, non_blocking=True)
        yb = yb.to(device, non_blocking=True)

        with torch.autocast(device_type=device.type, enabled=(device.type == "cuda")):
            preds = model4(xb)

        diff = preds - yb
        test_sse += float(torch.sum(diff * diff).detach().cpu().item())
        test_n += yb.numel()

test_rmse4 = math.sqrt(test_sse / max(1, test_n))
print("MobileNetV2 TL Test RMSE:", test_rmse4)
print("MobileNetV2 TL epochs:", epochs_ran4)

MobileNetV2 TL Test RMSE: 0.15793819170595588
MobileNetV2 TL epochs: 16


In [48]:
# Cell for cleanup
import gc
gc.collect()

torch.cuda.empty_cache()
torch.cuda.ipc_collect()

#### VGG16 (pretrained)
##### Model build

In [49]:
weights = models.VGG16_Weights.IMAGENET1K_V1
base = models.vgg16(weights=weights)

# Trainable backbone
for p in base.parameters():
    p.requires_grad = True

# Use feature extractor part only (convs)
features = base.features

# ImageNet normalization for tensors in [0,1]
imagenet_normalize = Normalize(
    mean=(0.485, 0.456, 0.406),
    std=(0.229, 0.224, 0.225),
)

model5 = nn.Sequential(
    imagenet_normalize,
    features,                       # [B, 512, H', W']
    nn.AdaptiveAvgPool2d((1, 1)),   # [B, 512, 1, 1]
    nn.Flatten(1),                  # [B, 512]
    nn.Dropout(p=0.3),
    nn.Linear(512, 256, bias=True),
    nn.ReLU(inplace=True),
    nn.Dropout(p=0.3),
    nn.Linear(256, 37, bias=True),
)

model5 = model5.to(device)

print(model5)

Downloading: "https://download.pytorch.org/models/vgg16-397923af.pth" to /home/user/.cache/torch/hub/checkpoints/vgg16-397923af.pth


100.0%


Sequential(
  (0): Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225))
  (1): Sequential(
    (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): ReLU(inplace=True)
    (2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (3): ReLU(inplace=True)
    (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (5): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (6): ReLU(inplace=True)
    (7): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (8): ReLU(inplace=True)
    (9): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (10): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (11): ReLU(inplace=True)
    (12): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (13): ReLU(inplace=True)
    (14): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (15): ReLU(inplace=

##### Compilation

In [50]:
criterion5 = nn.MSELoss()

optimizer5 = torch.optim.AdamW(model5.parameters(), lr=1e-3, weight_decay=1e-4)

def cosine_decay5(step: int, total_steps: int, initial_lr: float = 1e-3, alpha: float = 1e-2) -> float:
    step = min(step, total_steps)
    cosine = 0.5 * (1.0 + math.cos(math.pi * step / total_steps))
    return initial_lr * (alpha + (1.0 - alpha) * cosine)

def set_lr5(optimizer: torch.optim.Optimizer, lr: float) -> None:
    for pg in optimizer.param_groups:
        pg["lr"] = lr

@torch.no_grad()
def rmse5(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    return torch.sqrt(torch.mean((pred - target) ** 2))

scaler5 = torch.amp.GradScaler("cuda", enabled=(device.type == "cuda"))

##### Training

In [51]:
epochs_ran5 = train_loop(
    model5,
    train_loader,
    val_loader,
    epochs=30,
    patience=3,
    min_delta=1e-3,
    total_steps=total_steps,
    initial_lr=1e-3,
    alpha=1e-2,
    device=device,
    criterion=criterion5,
    optimizer=optimizer5,
    scaler=scaler5,
)

Epoch 01/30 | lr=0.00099729 | train_RMSE=0.166857 | eval_RMSE=0.164871 | patience=0/3 | time=1003.41s
Epoch 02/30 | lr=0.000989187 | train_RMSE=0.164141 | eval_RMSE=0.164962 | patience=1/3 | time=1002.38s
Epoch 03/30 | lr=0.000975778 | train_RMSE=0.164006 | eval_RMSE=0.164722 | patience=2/3 | time=1004.14s
Epoch 04/30 | lr=0.000957212 | train_RMSE=nan | eval_RMSE=nan | patience=3/3 | time=994.55s


##### Testing

In [52]:
model5.eval()

test_sse = 0.0
test_n = 0

with torch.no_grad():
    for xb, yb in test_loader:
        xb = xb.to(device, non_blocking=True)
        yb = yb.to(device, non_blocking=True)

        with torch.autocast(device_type=device.type, enabled=(device.type == "cuda")):
            preds = model5(xb)

        diff = preds - yb
        test_sse += float(torch.sum(diff * diff).detach().cpu().item())
        test_n += yb.numel()

test_rmse5 = math.sqrt(test_sse / max(1, test_n))
print("VGG16 Test RMSE:", test_rmse5)
print("VGG16 epochs:", epochs_ran5)

VGG16 Test RMSE: 0.16424842647203233
VGG16 epochs: 4


In [53]:
scaler5 = torch.amp.GradScaler("cuda", enabled=False)

optimizer5 = torch.optim.AdamW(model5.parameters(), lr=1e-4, weight_decay=1e-4)

In [54]:
epochs_ran5 = train_loop(
    model5,
    train_loader,
    val_loader,
    epochs=30,
    patience=3,
    min_delta=1e-3,
    total_steps=total_steps,
    initial_lr=1e-4,   # lowered
    alpha=1e-2,
    device=device,
    criterion=criterion5,
    optimizer=optimizer5,
    scaler=scaler5,    # AMP off
)

Epoch 01/30 | lr=9.9729e-05 | train_RMSE=0.164035 | eval_RMSE=0.164709 | patience=0/3 | time=997.93s
Epoch 02/30 | lr=9.89187e-05 | train_RMSE=0.164010 | eval_RMSE=0.164712 | patience=1/3 | time=1001.59s
Epoch 03/30 | lr=9.75778e-05 | train_RMSE=0.163960 | eval_RMSE=0.164711 | patience=2/3 | time=1001.09s
Epoch 04/30 | lr=9.57212e-05 | train_RMSE=0.163946 | eval_RMSE=0.164703 | patience=3/3 | time=1001.57s


In [55]:
model5.eval()

test_sse = 0.0
test_n = 0

with torch.no_grad():
    for xb, yb in test_loader:
        xb = xb.to(device, non_blocking=True)
        yb = yb.to(device, non_blocking=True)

        with torch.autocast(device_type=device.type, enabled=(device.type == "cuda")):
            preds = model5(xb)

        diff = preds - yb
        test_sse += float(torch.sum(diff * diff).detach().cpu().item())
        test_n += yb.numel()

test_rmse5 = math.sqrt(test_sse / max(1, test_n))
print("VGG16 Test RMSE:", test_rmse5)
print("VGG16 epochs:", epochs_ran5)

VGG16 Test RMSE: 0.16409609247217463
VGG16 epochs: 4
