# ResNet-18 Baseline for Dyslexia-Oriented Character Classification

This notebook implements a **baseline experiment** to evaluate ResNet-18 on character-level crops derived from YOLO annotations of a synthetic dyslexia handwriting dataset. The goal is to **establish a reliable reference point** (accuracy, macro-F1, confusion patterns) before comparing alternative architectures, and then **select the best-performing model** for the project.

## Overview
- **Task:** 3-class classification at the **character** level  
  `0 = Normal`, `1 = Reversal`, `2 = Corrected`
- **Data pipeline:** Image-level YOLO labels → per-bbox crops → square pad → resize → ResNet-18 head (3 classes)
- **Safety constraints:** No flips/rotations (to preserve dyslexia-relevant geometry); light blur/erasing only
- **Training setup:** ImageNet-initialized ResNet-18, AdamW, Cosine LR, label smoothing, AMP (CUDA), macro-F1 model selection
- **Metrics:** Validation **Accuracy**, **macro-F1**, **Confusion Matrix**, per-class precision/recall
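
To make two of the training settings above concrete, the sketch below evaluates the closed-form cosine annealing schedule and a label-smoothed target in plain Python. It assumes the values used later in the notebook (`lr=3e-4`, `T_max=20`, `label_smoothing=0.05`, 3 classes) and a minimum learning rate of 0. The helper names `cosine_lr` and `smoothed_target` are illustrative and are not part of the notebook.

```python
import math

def cosine_lr(step: int, base_lr: float = 3e-4, t_max: int = 20, eta_min: float = 0.0) -> float:
    """Closed-form cosine annealing: decays base_lr to eta_min over t_max scheduler steps."""
    return eta_min + 0.5 * (base_lr - eta_min) * (1.0 + math.cos(math.pi * step / t_max))

def smoothed_target(true_class: int, num_classes: int = 3, eps: float = 0.05) -> list:
    """Mix the one-hot target with a uniform distribution over all classes."""
    uniform = eps / num_classes
    return [(1.0 - eps) + uniform if k == true_class else uniform for k in range(num_classes)]

# Learning rate after 0, 5, 10, 15 and 20 scheduler steps (one step per epoch).
for step in (0, 5, 10, 15, 20):
    print(f"step {step:2d}: lr = {cosine_lr(step):.2e}")

# Smoothed target for a sample whose true class is 1 (Reversal).
print([round(p, 4) for p in smoothed_target(1)])
```

The schedule starts at the base rate, passes through half of it at the midpoint, and reaches zero at the last step, while the smoothed target keeps most of the probability mass on the true class.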

## Why this baseline?
- Provides a **fast, strong, and interpretable** starting point with a well-known backbone (ResNet-18).
- Uses **macro-F1** to make model selection robust under possible class imbalance.
- Confusion analysis highlights where the model under-performs (e.g., **Reversal → Normal**), informing targeted improvements.
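
As a small illustration of why macro-F1 is preferred for model selection, the sketch below scores a degenerate classifier on a deliberately imbalanced toy label set. The label counts are hypothetical and chosen only for the example; `per_class_f1` is an illustrative helper, not a function from the notebook.

```python
def per_class_f1(y_true, y_pred, num_classes):
    """F1 per class, computed as 2*TP / (2*TP + FP + FN); 0.0 when the class never appears."""
    scores = []
    for k in range(num_classes):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == k and p == k)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != k and p == k)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == k and p != k)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return scores

# Hypothetical imbalanced labels: 90 Normal, 5 Reversal, 5 Corrected.
y_true = [0] * 90 + [1] * 5 + [2] * 5
# A degenerate classifier that always predicts Normal.
y_pred = [0] * 100

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
f1s = per_class_f1(y_true, y_pred, num_classes=3)
macro_f1 = sum(f1s) / len(f1s)
print(f"accuracy = {accuracy:.3f}, macro-F1 = {macro_f1:.3f}")
```

Accuracy rewards the majority-class guess, whereas macro-F1 averages the per-class scores with equal weight and therefore penalizes the two classes that are never predicted.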

## Current result (validation)
- macro-F1 ≈ **0.876**, Accuracy ≈ **0.873** (best checkpoint, reached at epoch 10)
- Dominant error mode: **Reversal** misclassified as **Normal**, suggesting the need for slightly wider context (e.g., small bbox inflation) or modest appearance jitter (no geometric changes).

In [5]:
import torch, torchvision
print("cuda? ->", torch.cuda.is_available())
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))


cuda? -> True
Tesla T4


Verifies whether PyTorch can see a CUDA-capable GPU and, if so, prints the device model (e.g., “Tesla T4”, “L4”, “A100”). This is a quick sanity check before starting any GPU-accelerated training.

In [4]:
import os, glob
from pathlib import Path
from typing import List, Tuple

import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader, WeightedRandomSampler
from torchvision import transforms, models
from PIL import Image, ImageOps

from sklearn.metrics import accuracy_score, f1_score, confusion_matrix, classification_report
import numpy as np
import random

# Reproducibility
def set_seed(seed=42):
    random.seed(seed); np.random.seed(seed); torch.manual_seed(seed); torch.cuda.manual_seed_all(seed)
    torch.backends.cudnn.deterministic = False
    torch.backends.cudnn.benchmark = True
set_seed(42)

# ---------- Config ----------
IMG_SIZE = 224
NUM_CLASSES = 3
MEAN = [0.485, 0.456, 0.406]
STD  = [0.229, 0.224, 0.225]
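# Dataset root. Not defined in the cells shown; this value is inferred from the
# checkpoint path printed after training. Adjust it to your own environment.
ROOT = "/content/drive/MyDrive/kaggle/working/synthdata"
SAVE_DIR = os.path.join(ROOT, "runs_resnet18_yolocrops")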
os.makedirs(SAVE_DIR, exist_ok=True)

def load_yolo_txt(txt_path: Path):
    """
    YOLO format: class cx cy w h  (normalized)
    """
    boxes = []
    if not txt_path.exists():
        return boxes
    with open(txt_path, "r", encoding="utf-8") as f:
        for line in f:
            parts = line.strip().split()
            if len(parts) != 5:
                continue
            c, cx, cy, w, h = parts
            boxes.append((int(c), float(cx), float(cy), float(w), float(h)))
    return boxes

def norm_to_xyxy(box, W, H):
    # normalized cx,cy,w,h -> pixel x1,y1,x2,y2
    c, cx, cy, w, h = box
    px = cx * W; py = cy * H
    pw = w * W; ph = h * H
    x1 = max(0, int(px - pw/2)); y1 = max(0, int(py - ph/2))
    x2 = min(W-1, int(px + pw/2)); y2 = min(H-1, int(py + ph/2))
    return c, x1, y1, x2, y2

def pad_to_square(img: Image.Image, fill=0):
    w, h = img.size
    if w == h:
        return img
    if w > h:
        pad = (0, (w-h)//2, 0, w-h-(w-h)//2)
    else:
        pad = ((h-w)//2, 0, h-w-(h-w)//2, 0)
    return ImageOps.expand(img, border=pad, fill=fill)


## What this cell does
- Imports core libraries (PyTorch/torchvision, PIL, scikit-learn metrics, NumPy) and seeds all RNGs for reproducibility.  
- Defines dataset-wide configuration (input size, number of classes, ImageNet mean/std, output directory).  
- Implements YOLO label parsing (`class cx cy w h` in normalized coordinates), conversion to pixel `x1,y1,x2,y2`, and a utility to pad rectangular crops to squares before resizing.
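
A worked example may help to check the coordinate arithmetic. The sketch below repeats the logic of `norm_to_xyxy` and the border computation of `pad_to_square` in plain Python, without PIL, for one hypothetical box on a 640×480 image. The function names `yolo_to_pixel_box` and `square_padding` are illustrative.

```python
def yolo_to_pixel_box(cx, cy, w, h, img_w, img_h):
    """Normalized YOLO (cx, cy, w, h) -> clamped pixel (x1, y1, x2, y2)."""
    px, py = cx * img_w, cy * img_h          # box center in pixels
    pw, ph = w * img_w, h * img_h            # box size in pixels
    x1 = max(0, int(px - pw / 2))
    y1 = max(0, int(py - ph / 2))
    x2 = min(img_w - 1, int(px + pw / 2))
    y2 = min(img_h - 1, int(py + ph / 2))
    return x1, y1, x2, y2

def square_padding(crop_w, crop_h):
    """Border sizes (left, top, right, bottom) that make a crop square."""
    if crop_w == crop_h:
        return (0, 0, 0, 0)
    if crop_w > crop_h:                      # wide crop: pad top and bottom
        d = crop_w - crop_h
        return (0, d // 2, 0, d - d // 2)
    d = crop_h - crop_w                      # tall crop: pad left and right
    return (d // 2, 0, d - d // 2, 0)

# Hypothetical box: centered, a quarter of the image width and half its height.
box = yolo_to_pixel_box(0.5, 0.5, 0.25, 0.5, img_w=640, img_h=480)
crop_w, crop_h = box[2] - box[0], box[3] - box[1]
print("pixel box:", box)
print("crop size:", (crop_w, crop_h))
print("padding (l, t, r, b):", square_padding(crop_w, crop_h))
```

The tall 160×240 crop receives 40 pixels of padding on the left and right, so the character keeps its aspect ratio when the square is later resized to `IMG_SIZE`.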

## Key Design Choices
- **CuDNN settings:** `benchmark=True` (faster convolutions for stable input shapes) and `deterministic=False` (accepts minor nondeterminism for higher throughput on Colab GPUs).  
- **ImageNet normalization:** `MEAN/STD` matches ResNet-18 pretraining statistics, improving convergence stability.  
- **Robust YOLO loader:** skips malformed lines and tolerates missing label files by returning an empty list, which prevents crashes during dataset indexing.  
- **Square padding:** uses `ImageOps.expand` to preserve aspect ratio before resizing to `IMG_SIZE`, avoiding geometric distortions that could blur dyslexia-relevant cues.

## Notes & Pitfalls
- For **strict determinism**, set `torch.backends.cudnn.deterministic=True` and `torch.backends.cudnn.benchmark=False` (training will be slower).  
- Label IDs are assumed to be `{0,1,2}` → `{Normal, Reversal, Corrected}`; ensure your dataset follows this convention.  
- Non-RGB inputs are converted to RGB later via `img.convert("RGB")`, which also discards any alpha channel.  
- If black padding biases the model (tiny crops with large borders), consider reflect/replicate padding, mean-color padding, or slightly **inflating** the bbox before cropping (listed under next steps later in the notebook).
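
The strict-determinism variant mentioned in the first note could look like the sketch below. It mirrors the notebook's `set_seed` but flips the two cuDNN flags; the name `set_seed_strict` is hypothetical, and the sketch does not cover every source of nondeterminism (for example, some CUDA kernels and DataLoader worker ordering).

```python
import random

import numpy as np
import torch

def set_seed_strict(seed: int = 42) -> None:
    """Seed Python, NumPy and PyTorch RNGs and request deterministic cuDNN kernels."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)            # silently ignored when CUDA is unavailable
    torch.backends.cudnn.deterministic = True   # prefer deterministic convolution algorithms
    torch.backends.cudnn.benchmark = False      # disable the autotuner, which may pick different kernels

# Re-seeding reproduces the same random draw.
set_seed_strict(123)
first = torch.rand(4)
set_seed_strict(123)
second = torch.rand(4)
print(torch.equal(first, second))
```

Expect slower training with these settings, as the note above says; they are most useful when comparing small changes between runs.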


In [6]:
class YOLOCropDataset(Dataset):
    """
    images_dir: .../images/train
    labels_dir: .../labels/train
    Returns EVERY bbox in each image as a separate sample.
    """
    def __init__(self, images_dir: str, labels_dir: str, img_size=224, split="train"):
        self.img_root = Path(images_dir)
        self.lbl_root = Path(labels_dir)
        self.items: List[Tuple[str, int, Tuple[int,int,int,int]]] = []
        self.img_size = img_size
        self.split = split

        img_paths = sorted(glob.glob(str(self.img_root / "*.*")))
        for ip in img_paths:
            ipath = Path(ip)
            stem = ipath.stem
            txt = self.lbl_root / f"{stem}.txt"
            boxes = load_yolo_txt(txt)
            if not boxes:
                continue
            # open once to read the image size
            with Image.open(ip) as _im:
                _im = _im.convert("RGB")
                W, H = _im.size
            for b in boxes:
                c, x1, y1, x2, y2 = norm_to_xyxy(b, W, H)
                # minimum box-size filter: skip very small boxes (optional)
                if (x2 - x1) < 3 or (y2 - y1) < 3:
                    continue
                self.items.append((ip, c, (x1,y1,x2,y2)))

        # Augmentations: NO flip/rotate
        if split == "train":
            self.tf = transforms.Compose([
                transforms.ToTensor(),
                transforms.RandomApply([transforms.GaussianBlur(kernel_size=3)], p=0.1),
                transforms.Normalize(mean=MEAN, std=STD),
                transforms.RandomErasing(p=0.05, scale=(0.02, 0.06), ratio=(0.3, 3.3), value='random'),
            ])
        else:
            self.tf = transforms.Compose([
                transforms.ToTensor(),
                transforms.Normalize(mean=MEAN, std=STD),
            ])

    def __len__(self):
        return len(self.items)

    def __getitem__(self, idx):
        ipath, c, (x1,y1,x2,y2) = self.items[idx]
        with Image.open(ipath) as img:
            img = img.convert("RGB")
            crop = img.crop((x1,y1,x2,y2))
        crop = pad_to_square(crop, fill=0).resize((self.img_size,self.img_size), Image.BILINEAR)
        x = self.tf(crop)
        y = torch.tensor(c, dtype=torch.long)
        return x, y

def make_loaders(root: str, batch_size=128, num_workers=2):
    images_train = os.path.join(root, "images/train")
    labels_train = os.path.join(root, "labels/train")
    images_val   = os.path.join(root, "images/val")
    labels_val   = os.path.join(root, "labels/val")

    ds_tr = YOLOCropDataset(images_train, labels_train, img_size=IMG_SIZE, split="train")
    ds_va = YOLOCropDataset(images_val,   labels_val,   img_size=IMG_SIZE, split="val")

    # sampling weights to counter class imbalance (optional)
    labels = [y for _,y,_ in ds_tr.items]
    if len(labels) > 0:
        counts = np.bincount(labels, minlength=NUM_CLASSES).astype(np.float32)
        inv = (1.0 / np.maximum(counts, 1))
        weights = [inv[y] for y in labels]
        sampler = WeightedRandomSampler(weights, num_samples=len(weights), replacement=True)
        dl_tr = DataLoader(ds_tr, batch_size=batch_size, sampler=sampler, num_workers=num_workers, pin_memory=True)
    else:
        dl_tr = DataLoader(ds_tr, batch_size=batch_size, shuffle=True, num_workers=num_workers, pin_memory=True)

    dl_va = DataLoader(ds_va, batch_size=batch_size, shuffle=False, num_workers=num_workers, pin_memory=True)
    return ds_tr, ds_va, dl_tr, dl_va

ds_tr, ds_va, dl_tr, dl_va = make_loaders(ROOT, batch_size=128, num_workers=2)
len(ds_tr), len(ds_va)


(112906, 52126)

## What this cell does
- Defines `YOLOCropDataset`, a dataset that **reads YOLO-formatted labels** for each image, converts normalized boxes to pixel coordinates, and **returns per-character crops** with class IDs (`0=Normal, 1=Reversal, 2=Corrected`).
- Applies **safe augmentations** for the dyslexia setting (no flips/rotations), pads crops to squares, and resizes them to `IMG_SIZE`.
- Provides `make_loaders(...)` to build **train/val datasets and dataloaders**, including an optional **class-balanced sampler** to mitigate label imbalance.


## Key Design Choices
- **Per-bbox sampling:** Each bounding box becomes a training sample. This matches label granularity (letter-level) and avoids ambiguity from page/line-level labels.
- **No geometric flips/rotations:** Preserves dyslexia-relevant cues (e.g., reversals). Augmentations are limited to light blur and erasing to improve robustness without altering orientation.
- **Square padding + resize:** `pad_to_square → resize(IMG_SIZE)` keeps character geometry stable and avoids aspect-ratio distortion.
- **Tiny-box filter:** Discards extremely small boxes (`<3 px` in width/height) to reduce noisy crops that can harm learning.
- **Class rebalancing (optional):** `WeightedRandomSampler` uses inverse class frequency from `np.bincount` to reduce bias toward dominant classes in minibatches.
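
The effect of inverse-frequency weights can be checked on a tiny example. The sketch below uses a hypothetical label list and plain Python to reproduce the weight computation from `make_loaders`, then shows that each class ends up with the same total sampling probability.

```python
from collections import Counter

# Hypothetical label list: 6 Normal, 3 Reversal, 1 Corrected.
labels = [0] * 6 + [1] * 3 + [2] * 1
num_classes = 3

counts = Counter(labels)
# Inverse class frequency; max(..., 1) guards against division by zero for empty classes.
inv = [1.0 / max(counts.get(k, 0), 1) for k in range(num_classes)]
# One weight per sample, looked up by that sample's class.
weights = [inv[y] for y in labels]

# Share of the total sampling weight held by each class.
total = sum(weights)
class_mass = [
    sum(w for w, y in zip(weights, labels) if y == k) / total
    for k in range(num_classes)
]
print([round(m, 3) for m in class_mass])
```

Because every class holds one third of the weight, a sampler drawing with replacement from these weights produces roughly class-balanced minibatches, at the cost of repeating samples from the rare classes.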

## Notes & Pitfalls
- **Label presence:** If an image has no valid boxes (missing/empty `.txt` or all tiny), it contributes no samples; this is intentional to prevent crashes during indexing.
- **Transforms:** Train split applies `GaussianBlur` before normalization and `RandomErasing` after it; val split is deterministic (no stochastic transforms).
- **I/O considerations:** Opening images on-the-fly may bottleneck training. For large datasets, consider pre-cropping to an `ImageFolder` structure under `/content/` for faster epochs.
- **DataLoader settings:** `num_workers=2` is conservative and `pin_memory=True` is already set. On Colab GPUs, try `4–8` workers and (optionally) `persistent_workers=True, prefetch_factor=4` to increase throughput.
- **Memory format (later):** When moving batches/models to GPU, using `memory_format=torch.channels_last` can provide minor speedups with AMP.
- **Reproducibility:** Global seeding is handled in the setup cell (`set_seed(42)`). Because that cell leaves cuDNN in non-deterministic mode, results can still vary slightly across runs.

## Outputs
- `ds_tr, ds_va` are `Dataset` objects with per-bbox samples.
- `dl_tr` may use a class-balanced sampler (if labels exist); `dl_va` is a standard sequential loader.
- `len(ds_tr), len(ds_va)` reports the **number of cropped samples** (not the number of images): 112,906 training and 52,126 validation crops.



In [8]:
def build_resnet18(num_classes=3, pretrained=True):
    m = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1 if pretrained else None)
    m.fc = nn.Sequential(nn.Dropout(0.2), nn.Linear(m.fc.in_features, num_classes))
    return m

@torch.no_grad()
def evaluate(model, loader, device):
    model.eval()
    preds, gts = [], []
    for xb, yb in loader:
        xb = xb.to(device, non_blocking=True)
        yb = yb.to(device, non_blocking=True)
        with torch.cuda.amp.autocast(enabled=(device.type=='cuda')):
            logits = model(xb)
        pr = logits.argmax(1)
        preds.append(pr.cpu()); gts.append(yb.cpu())
    preds = torch.cat(preds).numpy()
    gts   = torch.cat(gts).numpy()
    acc = accuracy_score(gts, preds)
    f1  = f1_score(gts, preds, average="macro")
    cm  = confusion_matrix(gts, preds, labels=[0,1,2])
    rep = classification_report(gts, preds, labels=[0,1,2], target_names=["Normal","Reversal","Corrected"])
    return acc, f1, cm, rep

def train_resnet18(
    epochs=20, lr=3e-4, weight_decay=1e-4, batch_size=128, out_dir=SAVE_DIR
):
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    print("Device:", device)

    ds_tr, ds_va, dl_tr, dl_va = make_loaders(ROOT, batch_size=batch_size, num_workers=2)
    print(f"Train samples (crops): {len(ds_tr)} | Val: {len(ds_va)}")

    model = build_resnet18(NUM_CLASSES, pretrained=True).to(device)
    opt = torch.optim.AdamW(model.parameters(), lr=lr, weight_decay=weight_decay)
    sch = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=epochs)
    crit = nn.CrossEntropyLoss(label_smoothing=0.05)
    scaler = torch.cuda.amp.GradScaler(enabled=(device.type=='cuda'))

    best_f1 = -1.0
    best_path = os.path.join(out_dir, "best.pt")

    for ep in range(1, epochs+1):
        model.train()
        losses = []
        for xb, yb in dl_tr:
            xb = xb.to(device, non_blocking=True); yb = yb.to(device, non_blocking=True)
            opt.zero_grad(set_to_none=True)
            with torch.cuda.amp.autocast(enabled=(device.type=='cuda')):
                logits = model(xb)
                loss = crit(logits, yb)
            scaler.scale(loss).backward()
            scaler.step(opt); scaler.update()
            losses.append(loss.item())
        sch.step()

        va_acc, va_f1, va_cm, _ = evaluate(model, dl_va, device)
        print(f"Epoch {ep:02d} | loss={np.mean(losses):.4f} | val_acc={va_acc:.4f} | val_f1={va_f1:.4f}")
        print("Val CM:\n", va_cm)

        if va_f1 > best_f1:
            best_f1 = va_f1
            torch.save({"model": model.state_dict()}, best_path)

    print("Best macro-F1:", best_f1)
    print("Saved:", best_path)
    return best_path

best_ckpt = train_resnet18(epochs=20, lr=3e-4, batch_size=128)
best_ckpt


Device: cuda
Train samples (crops): 112906 | Val: 52126


  scaler = torch.cuda.amp.GradScaler(enabled=(device.type=='cuda'))
  with torch.cuda.amp.autocast(enabled=(device.type=='cuda')):
  with torch.cuda.amp.autocast(enabled=(device.type=='cuda')):


Epoch 01 | loss=0.2446 | val_acc=0.8568 | val_f1=0.8587
Val CM:
 [[16890   640    42]
 [ 5109 12302    61]
 [ 1532    83 15467]]


Epoch 02 | loss=0.2051 | val_acc=0.8489 | val_f1=0.8500
Val CM:
 [[16617   632   323]
 [ 5492 11925    55]
 [ 1275   101 15706]]


Epoch 03 | loss=0.1969 | val_acc=0.8580 | val_f1=0.8606
Val CM:
 [[17004   540    28]
 [ 4864 12588    20]
 [ 1863    89 15130]]


Epoch 04 | loss=0.1916 | val_acc=0.8606 | val_f1=0.8627
Val CM:
 [[16295  1039   238]
 [ 4548 12896    28]
 [ 1294   117 15671]]


Epoch 05 | loss=0.1870 | val_acc=0.8442 | val_f1=0.8462
Val CM:
 [[16240  1060   272]
 [ 5263 12187    22]
 [ 1383   122 15577]]


Epoch 06 | loss=0.1834 | val_acc=0.8702 | val_f1=0.8721
Val CM:
 [[16384   847   341]
 [ 3926 13387   159]
 [ 1437    57 15588]]


Epoch 07 | loss=0.1827 | val_acc=0.8671 | val_f1=0.8692
Val CM:
 [[16739   767    66]
 [ 4612 12841    19]
 [ 1386    75 15621]]


Epoch 08 | loss=0.1777 | val_acc=0.8699 | val_f1=0.8722
Val CM:
 [[16481  1060    31]
 [ 4258 13193    21]
 [ 1352    61 15669]]


Epoch 09 | loss=0.1749 | val_acc=0.8679 | val_f1=0.8696
Val CM:
 [[16737   677   158]
 [ 4687 12739    46]
 [ 1266    50 15766]]


Epoch 10 | loss=0.1750 | val_acc=0.8734 | val_f1=0.8757
Val CM:
 [[16354  1146    72]
 [ 3974 13482    16]
 [ 1327    65 15690]]


Epoch 11 | loss=0.1731 | val_acc=0.8683 | val_f1=0.8705
Val CM:
 [[16574   949    49]
 [ 4418 13041    13]
 [ 1373    64 15645]]


Epoch 12 | loss=0.1723 | val_acc=0.8656 | val_f1=0.8678
Val CM:
 [[16670   879    23]
 [ 4629 12837     6]
 [ 1398    70 15614]]


Epoch 13 | loss=0.1711 | val_acc=0.8687 | val_f1=0.8707
Val CM:
 [[16611   863    98]
 [ 4514 12942    16]
 [ 1281    73 15728]]


Epoch 14 | loss=0.1707 | val_acc=0.8665 | val_f1=0.8687
Val CM:
 [[16473  1037    62]
 [ 4442 13021     9]
 [ 1332    78 15672]]


Epoch 15 | loss=0.1705 | val_acc=0.8624 | val_f1=0.8645
Val CM:
 [[16560   934    78]
 [ 4765 12702     5]
 [ 1318    70 15694]]


Epoch 16 | loss=0.1704 | val_acc=0.8598 | val_f1=0.8617
Val CM:
 [[16739   787    46]
 [ 5048 12412    12]
 [ 1367    48 15667]]


Epoch 17 | loss=0.1702 | val_acc=0.8601 | val_f1=0.8624
Val CM:
 [[16532  1011    29]
 [ 4791 12676     5]
 [ 1386    69 15627]]


Epoch 18 | loss=0.1701 | val_acc=0.8646 | val_f1=0.8669
Val CM:
 [[16564   978    30]
 [ 4609 12858     5]
 [ 1369    65 15648]]


Epoch 19 | loss=0.1700 | val_acc=0.8630 | val_f1=0.8652
Val CM:
 [[16630   903    39]
 [ 4766 12700     6]
 [ 1351    74 15657]]


Epoch 20 | loss=0.1700 | val_acc=0.8642 | val_f1=0.8664
Val CM:
 [[16669   873    30]
 [ 4720 12748     4]
 [ 1376    76 15630]]
Best macro-F1: 0.8756921285880935
Saved: /content/drive/MyDrive/kaggle/working/synthdata/runs_resnet18_yolocrops/best.pt


'/content/drive/MyDrive/kaggle/working/synthdata/runs_resnet18_yolocrops/best.pt'

## Validation Results (Epoch 20)

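- **Val Accuracy:** `0.8642`
- **Val macro-F1:** `0.8664`
- **Best macro-F1 across training:** `0.8757` (epoch 10, saved as best checkpoint)
- **Confusion Matrix (rows = true, cols = pred):**

| True ↓ / Pred → | Normal | Reversal | Corrected |
|-----------------|--------|----------|-----------|
| Normal          | 16669  | 873      | 30        |
| Reversal        | 4720   | 12748    | 4         |
| Corrected       | 1376   | 76       | 15630     |
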
### Per-class Precision / Recall
| Class         | Precision | Recall |
|---------------|-----------|--------|
| Normal (0)    | 0.732     | 0.949  |
| Reversal (1)  | 0.931     | 0.730  |
| Corrected (2) | 0.998     | 0.915  |

> Computed from the confusion matrix. High precision for **Reversal** but lower recall indicates many true Reversal samples are predicted as **Normal**.
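
The derivation of those figures can be reproduced with a few lines of plain Python. The sketch below takes the epoch-20 confusion matrix from the training log and computes precision as the diagonal entry divided by its column sum, and recall as the diagonal entry divided by its row sum; `precision_recall` is an illustrative helper name.

```python
# Epoch-20 validation confusion matrix (rows = true class, columns = predicted class).
cm = [
    [16669, 873, 30],
    [4720, 12748, 4],
    [1376, 76, 15630],
]
names = ["Normal", "Reversal", "Corrected"]

def precision_recall(cm):
    """Return a (precision, recall) pair for every class of a square confusion matrix."""
    n = len(cm)
    results = []
    for k in range(n):
        tp = cm[k][k]
        predicted_k = sum(cm[r][k] for r in range(n))  # column sum: everything predicted as k
        true_k = sum(cm[k])                            # row sum: everything that truly is k
        results.append((tp / predicted_k, tp / true_k))
    return results

for name, (p, r) in zip(names, precision_recall(cm)):
    print(f"{name:<10} precision={p:.3f} recall={r:.3f}")
```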

## Error Analysis (from CM)
- **Reversal → Normal** is the dominant confusion (`4720` cases), suggesting the model under-detects reversal cues when visual evidence is subtle or cropped too tightly.
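- **Normal → Corrected** confusion is rare (`30` cases), and **Reversal ↔ Corrected** is rarer still (`4` and `76` cases). **Corrected → Normal** (`1376` cases) is the second-largest error, though far smaller than Reversal → Normal.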

## Takeaways & Next Steps
- **Context preservation:** Slightly **inflate boxes** (e.g., `+10%`) during training crops to retain local context that signals reversals.
- **Class balance:** Keep `WeightedRandomSampler` (or add class-weighted loss / focal loss) to further improve recall on **Reversal**.
- **Light appearance jitter:** A small `ColorJitter(brightness=0.05, contrast=0.05)` (no flips/rotations) can improve robustness to scan/lighting variations without breaking dyslexia-relevant geometry.
- **Architecture comparison:** Try EfficientNet-B0 or MobileNetV3 as alternative backbones; monitor macro-F1 and specifically **Reversal recall**.
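
The box-inflation idea from the first bullet could be sketched as follows. This is one possible implementation under stated assumptions, not the notebook's code: `inflate_box` is a hypothetical helper, the `+10%` is split evenly between opposite sides, and the result is clamped to the image bounds (the right and bottom limits are the image width and height, since `Image.crop` treats them as exclusive).

```python
def inflate_box(x1, y1, x2, y2, img_w, img_h, ratio=0.10):
    """Grow a pixel box by `ratio` of its width and height, clamped to the image."""
    dw = int(round((x2 - x1) * ratio / 2))   # extra pixels on the left and on the right
    dh = int(round((y2 - y1) * ratio / 2))   # extra pixels on the top and on the bottom
    return (
        max(0, x1 - dw),
        max(0, y1 - dh),
        min(img_w, x2 + dw),
        min(img_h, y2 + dh),
    )

# A box in the middle of the image grows on all four sides.
inner = inflate_box(100, 50, 140, 110, img_w=640, img_h=480)
# A box touching the top-left corner can only grow right and down.
corner = inflate_box(0, 0, 40, 60, img_w=640, img_h=480)
print(inner, corner)
```

A helper like this would be called on the output of `norm_to_xyxy` before `img.crop(...)`, so the tiny-box filter and the square padding keep working unchanged.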


In [9]:
def load_and_report(ckpt_path):
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    _, ds_va, _, dl_va = make_loaders(ROOT, batch_size=128, num_workers=2)
    model = build_resnet18(NUM_CLASSES, pretrained=False).to(device)
    sd = torch.load(ckpt_path, map_location="cpu")["model"]
    model.load_state_dict(sd)
    acc, f1, cm, rep = evaluate(model, dl_va, device)
    print("VAL accuracy:", f"{acc:.4f}")
    print("VAL macro-F1:", f"{f1:.4f}")
    print("Confusion Matrix:\n", cm)
    print(rep)

load_and_report(best_ckpt)


  with torch.cuda.amp.autocast(enabled=(device.type=='cuda')):


VAL accuracy: 0.8734
VAL macro-F1: 0.8757
Confusion Matrix:
 [[16354  1146    72]
 [ 3974 13482    16]
 [ 1327    65 15690]]
              precision    recall  f1-score   support

      Normal       0.76      0.93      0.83     17572
    Reversal       0.92      0.77      0.84     17472
   Corrected       0.99      0.92      0.95     17082

    accuracy                           0.87     52126
   macro avg       0.89      0.87      0.88     52126
weighted avg       0.89      0.87      0.88     52126



## What this cell does
- Rebuilds the validation dataloader and **reconstructs the model** with the same architecture as during training.
- **Loads the saved checkpoint** (`best_ckpt`) into the model (using `map_location="cpu"` for portability).
- Runs the shared `evaluate(...)` routine to print **Validation Accuracy**, **macro-F1**, **Confusion Matrix**, and a full **classification report**.

## Why it matters
- Separates evaluation from training so that no leftover training state (e.g., a model still in `train()` mode with active dropout) can affect the reported metrics.
- Guarantees that the saved weights **deserialize correctly** into the intended architecture (sanity check for future reuse/deployment).
- Produces stable, comparable metrics for model selection across different runs or model variants.

## Validation Results
- **VAL accuracy:** `0.8734`  
- **VAL macro-F1:** `0.8757`  
- **Confusion Matrix (rows = true, cols = pred):**

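| True ↓ / Pred → | Normal | Reversal | Corrected |
|-----------------|--------|----------|-----------|
| Normal          | 16354  | 1146     | 72        |
| Reversal        | 3974   | 13482    | 16        |
| Corrected       | 1327   | 65       | 15690     |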


### Quick error analysis
- The dominant confusion is **Reversal → Normal** (`3974` cases). This indicates the model **under-calls reversals** when cues are subtle or the crop is too tight.
- **Corrected → Normal** (`1327`) is the next notable confusion; still much smaller than Reversal→Normal.
- Column totals show a **bias toward predicting “Normal”** (predicted Normal = 21,655 vs. true Normal = 17,572), while **Reversal** is under-predicted (predicted = 14,693 vs. true = 17,472).
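
The prediction bias noted in the last bullet can be read directly from the row and column totals of the best-checkpoint confusion matrix. The short sketch below does this in plain Python; the variable names are illustrative.

```python
# Best-checkpoint validation confusion matrix (rows = true class, columns = predicted class).
cm = [
    [16354, 1146, 72],
    [3974, 13482, 16],
    [1327, 65, 15690],
]
names = ["Normal", "Reversal", "Corrected"]

true_counts = [sum(row) for row in cm]                             # row totals
pred_counts = [sum(row[k] for row in cm) for k in range(len(cm))]  # column totals

for name, t, p in zip(names, true_counts, pred_counts):
    # A positive difference means the class is predicted more often than it occurs.
    print(f"{name:<10} true={t:6d} predicted={p:6d} diff={p - t:+d}")
```

Normal is the only class with a positive difference, which is consistent with the low Normal precision and the low Reversal recall in the classification report.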




