# E1 Test & Evaluation

**Purpose:** This notebook runs Experiment 1 (E1) and processes its results.

This notebook:

1. Downloads the dataset (Roboflow)
2. Downloads the COCO-pretrained models (YOLOv8m and RT-DETR-L)
3. Builds evaluation indices
4. Runs the experiment
5. Generates predictions on the test set
6. Runs all 3 evaluation metrics
7. Generates plots


## 0. Setup


### 0.1 Clone Repo to Colab /content

In [None]:
# Check if in Colab
import sys
IN_COLAB = 'google.colab' in sys.modules

if IN_COLAB:
    print("Running in Google Colab")
    # Clone repo if not already cloned
    import os
    if not os.path.exists('Deep_Learning_Gil_Alon'):
        !git clone https://github.com/gil-attar/Deep_Learning_Project_Gil_Alon.git Deep_Learning_Gil_Alon
    %cd Deep_Learning_Gil_Alon
else:
    print("Running locally")
    import os
    from pathlib import Path
    # Navigate to project root if in notebooks/
    if os.path.basename(os.getcwd()) == 'notebooks':
        os.chdir('..')

print(f"Working directory: {os.getcwd()}")

### 0.2 Mounting Google Drive & Setting up Folder Structure for Results

In [None]:
# Mount Google Drive for saving run results
from google.colab import drive
drive.mount("/content/drive")

In [None]:
import time
from pathlib import Path

PROJECT_NAME = "Deep_Learning_Project_Gil_Alon"

# IMPORTANT:
# - For a fresh folder each time, keep the timestamp.
# - To continue the SAME sweep after a disconnect, set RUN_ID to a fixed string
#   (e.g., RUN_ID="E1_fullsweep_50ep_v1") and reuse it after reconnect.
RUN_ID = time.strftime("E1_%Y%m%d_%H%M%S")

DRIVE_ROOT = Path("/content/drive/MyDrive/Colab_Outputs") / PROJECT_NAME / RUN_ID

PERSIST_E1_RUNS = DRIVE_ROOT / "E1_runs"       # where Experiment 1 outputs will live
PERSIST_WEIGHTS = DRIVE_ROOT / "pretrained"    # optional cache for pretrained weights

PERSIST_E1_RUNS.mkdir(parents=True, exist_ok=True)
PERSIST_WEIGHTS.mkdir(parents=True, exist_ok=True)

print("Drive root:", DRIVE_ROOT)
print("E1 runs:", PERSIST_E1_RUNS)
print("Weights cache:", PERSIST_WEIGHTS)
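
The resume rule described in the comments above can be made explicit in code. The following is a minimal sketch, assuming a hypothetical helper `resolve_run_id` and a hypothetical environment variable `E1_RUN_ID`; neither is part of the repo.

```python
import os
import time


def resolve_run_id(fixed_run_id=None, env_var="E1_RUN_ID"):
    """Return a fixed run ID when one is supplied, else a fresh timestamped one.

    Both the function name and the E1_RUN_ID variable are illustrative
    assumptions, not something the project defines.
    """
    # Priority: explicit argument, then environment variable, then timestamp.
    candidate = fixed_run_id or os.environ.get(env_var)
    if candidate:
        return candidate
    return time.strftime("E1_%Y%m%d_%H%M%S")


# Fresh folder on every execution (unless E1_RUN_ID is set):
print(resolve_run_id())
# Same folder again after a reconnect:
print(resolve_run_id("E1_fullsweep_50ep_v1"))
```

Passing the same string after a disconnect makes `DRIVE_ROOT` resolve to the existing folder, so earlier run outputs stay in place.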


### 0.3 Symlink the repo’s runs/ to Drive

In [None]:
from pathlib import Path

REPO = Path.cwd()  
print("REPO =", REPO)

E1_RUNS_IN_REPO = REPO / "experiments" / "Experiment_1" / "runs"

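# Safety check: confirm we are at the repo root before deleting anything
assert E1_RUNS_IN_REPO.parent.exists(), f"Not at repo root (missing {E1_RUNS_IN_REPO.parent}); cwd is {REPO}"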

# Remove local runs dir if it exists, then link to Drive
!rm -rf "{E1_RUNS_IN_REPO}"
!ln -s "{PERSIST_E1_RUNS}" "{E1_RUNS_IN_REPO}"

print("Symlink created:")
!ls -la "{E1_RUNS_IN_REPO}"
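
The same remove-then-link step can be written in plain Python, which also works outside IPython shell magics. This is a sketch only: `link_dir_to_persistent` is a hypothetical helper name, and the demo runs in a temporary directory rather than on the real repo or Drive paths.

```python
import os
import shutil
import tempfile
from pathlib import Path


def link_dir_to_persistent(link_path, target_dir):
    """Replace `link_path` with a symlink to `target_dir` (hypothetical helper)."""
    link_path, target_dir = Path(link_path), Path(target_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    link_path.parent.mkdir(parents=True, exist_ok=True)

    if link_path.is_symlink() or link_path.is_file():
        # Remove only the link (or stray file); the link target is untouched.
        link_path.unlink()
    elif link_path.is_dir():
        # A real local directory is discarded, like `rm -rf` in the cell above.
        shutil.rmtree(link_path)

    os.symlink(target_dir, link_path, target_is_directory=True)
    return link_path


# Demo in a throwaway directory so nothing real is deleted.
with tempfile.TemporaryDirectory() as tmp:
    drive_dir = Path(tmp) / "drive" / "E1_runs"
    repo_runs = Path(tmp) / "repo" / "runs"
    link_dir_to_persistent(repo_runs, drive_dir)
    (repo_runs / "marker.txt").write_text("written through the link")
    print("File visible on the target side:", (drive_dir / "marker.txt").exists())
```

Checking `is_symlink()` first matters: an existing link to a directory must be unlinked, not passed to `shutil.rmtree`.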


### 0.4 Re-routing Model Weights to Drive

In [None]:
WEIGHTS_IN_REPO = REPO / "artifacts" / "weights"

!rm -rf "{WEIGHTS_IN_REPO}"
!ln -s "{PERSIST_WEIGHTS}" "{WEIGHTS_IN_REPO}"

print("Weights dir now points to Drive:")
!ls -la "{WEIGHTS_IN_REPO}"


In [None]:
# Install dependencies
!pip install -q ultralytics roboflow pyyaml pillow numpy matplotlib pandas tqdm


In [None]:
# Check GPU
import torch
print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")

## 1. Download Dataset

In [None]:
# Set Roboflow API key
import os
os.environ["ROBOFLOW_API_KEY"] = "zEF9icmDY2oTcPkaDcQY"  # Your API key

# Download dataset
!python scripts/download_dataset.py --output_dir data/raw
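
The cell above writes the key directly into the notebook. One alternative, sketched here under the assumption that the download script reads `ROBOFLOW_API_KEY` from the environment (as the cell above implies), is to prompt for the key only when it is not already set. `ensure_api_key` is a hypothetical helper, not part of the repo.

```python
import os
from getpass import getpass


def ensure_api_key(env_var="ROBOFLOW_API_KEY", prompt=getpass):
    """Return the key from the environment, prompting once if it is unset."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        key = prompt(f"Enter {env_var}: ").strip()
        if not key:
            raise ValueError(f"{env_var} is empty")
        # Export it so child processes started from this session inherit it.
        os.environ[env_var] = key
    return key
```

Calling `ensure_api_key()` before the `!python scripts/download_dataset.py ...` line keeps the key out of the saved notebook while leaving the rest of the cell unchanged.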

In [None]:
# Verify dataset downloaded
!echo "Train images: $(ls data/raw/train/images/ 2>/dev/null | wc -l)"
!echo "Valid images: $(ls data/raw/valid/images/ 2>/dev/null | wc -l)"
!echo "Test images: $(ls data/raw/test/images/ 2>/dev/null | wc -l)"
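
The shell check above prints 0 for a missing folder without failing. A small Python sketch of the same count, which returns the numbers so later cells can assert on them; `count_images` and the suffix list are assumptions made for illustration.

```python
from pathlib import Path

# Assumed set of image extensions; adjust to match the actual dataset export.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp", ".webp"}


def count_images(dataset_root, splits=("train", "valid", "test")):
    """Count image files under <dataset_root>/<split>/images for each split."""
    counts = {}
    for split in splits:
        images_dir = Path(dataset_root) / split / "images"
        if images_dir.is_dir():
            counts[split] = sum(
                1
                for p in images_dir.iterdir()
                if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
            )
        else:
            # A missing folder counts as zero images.
            counts[split] = 0
    return counts


counts = count_images("data/raw")
print(counts)
empty_splits = [split for split, n in counts.items() if n == 0]
if empty_splits:
    print("WARNING: no images found for:", ", ".join(empty_splits))
```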

## 2. Fetch COCO-Pretrained Weights (YOLOv8m + RT-DETR-L)

The weights are stored under `artifacts/weights/`, which Section 0.4 links to Drive.


In [None]:
# Fetch pretrained weights (idempotent)
!bash scripts/fetch_weights.sh


## 3. Build Evaluation Indices

In [None]:
# Build train/val/test indices (with ACTUAL image dimensions - this is critical!)
# Remove old indices first to ensure fresh rebuild
import shutil
from pathlib import Path

if Path("data/processed/evaluation").exists():
    shutil.rmtree("data/processed/evaluation")
    print("✓ Removed old indices")

!python scripts/build_evaluation_indices.py \
    --dataset_root data/raw \
    --output_dir data/processed/evaluation

In [None]:
# Verify indices created
import json
from pathlib import Path

test_index_path = "data/processed/evaluation/test_index.json"

if Path(test_index_path).exists():
    with open(test_index_path) as f:
        test_data = json.load(f)
    print(f"✓ Test index: {test_data['metadata']['num_images']} images")
    print(f"  Total objects: {test_data['metadata']['total_objects']}")
    print(f"  Classes: {test_data['metadata']['num_classes']}")
else:
    print(f"❌ Test index not found!")
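
The check above covers only the test index. A sketch of the same check applied to every split follows. It assumes the other index files are named `train_index.json` and `val_index.json` (only `test_index.json` is confirmed by the cell above) and that all of them share the same `metadata` keys. `check_index` is a hypothetical helper.

```python
import json
from pathlib import Path

# Keys read from the test index in the cell above.
REQUIRED_METADATA_KEYS = ("num_images", "total_objects", "num_classes")


def check_index(index_path):
    """Return a list of problems found in one index file (empty list means OK)."""
    index_path = Path(index_path)
    if not index_path.exists():
        return [f"missing file: {index_path}"]
    with open(index_path) as f:
        index = json.load(f)
    metadata = index.get("metadata", {}) if isinstance(index, dict) else {}
    return [f"metadata lacks '{k}'" for k in REQUIRED_METADATA_KEYS if k not in metadata]


# train_index.json and val_index.json are assumed file names.
for name in ("train_index.json", "val_index.json", "test_index.json"):
    problems = check_index(Path("data/processed/evaluation") / name)
    print(name, "OK" if not problems else problems)
```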

### 3.1 Create data.yaml for Training

In [None]:
# Create data.yaml with absolute paths for Colab
import yaml
from pathlib import Path
import os

# Get absolute path to dataset
dataset_root = Path('data/raw').resolve()

# Read original data.yaml to get class names
with open(dataset_root / 'data.yaml', 'r') as f:
    config = yaml.safe_load(f)

# Create config with ABSOLUTE paths
train_config = {
    'path': str(dataset_root),  # Absolute base path
    'train': 'train/images',
    'val': 'valid/images', 
    'test': 'test/images',
    'names': config['names'],
    'nc': len(config['names'])
}

# Save to data/processed/
output_path = Path('data/processed/data.yaml')
output_path.parent.mkdir(parents=True, exist_ok=True)

with open(output_path, 'w') as f:
    yaml.dump(train_config, f, default_flow_style=False, sort_keys=False)

print(f"✓ Created data.yaml with absolute paths")
print(f"  Path: {train_config['path']}")
print(f"  Classes: {train_config['nc']}")
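
Because the split entries are relative to `path`, a wrong base directory only shows up once training starts. A quick sketch of a pre-flight check, using a hypothetical helper `missing_split_dirs` that takes the config dictionary built above:

```python
from pathlib import Path


def missing_split_dirs(config, splits=("train", "val", "test")):
    """Return the split keys whose image directory does not exist on disk."""
    base = Path(config["path"])
    missing = []
    for split in splits:
        rel = config.get(split)
        # Each split entry is resolved against the absolute base path.
        if rel is None or not (base / rel).is_dir():
            missing.append(split)
    return missing
```

For example, `missing_split_dirs(train_config)` should return an empty list when the dataset was downloaded correctly.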

## 4. Run Experiment 1 (Freeze Ladder: YOLOv8m vs RT-DETR-L)

This section runs the full E1 sweep using the experiment runner scripts. Each run trains the model, exports predictions, and runs the custom evaluator.


In [None]:
# E1 sweep configuration
DRY_RUN = True   # True for quick testing: 1 epoch for each of the 8 runs. False for the real sweep.
EPOCHS  = 1 if DRY_RUN else 50
IMGSZ   = 640
SEED    = 42

print(f"DRY_RUN={DRY_RUN} | EPOCHS={EPOCHS} | IMGSZ={IMGSZ} | SEED={SEED}")


In [None]:
# Run the full E1 matrix (8 runs)
# This calls experiments/Experiment_1/runOneTest.py for each (model, freeze_id).
!EPOCHS={EPOCHS} IMGSZ={IMGSZ} SEED={SEED} bash experiments/Experiment_1/run_experiment1.sh
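
The 8 runs are the product of 2 models and 4 freeze regimes (F0 to F3). A short sketch that enumerates this matrix; the model labels `yolov8m` and `rtdetr-l` are assumed here and may differ from the directory names the runner script actually writes.

```python
from itertools import product

MODELS = ("yolov8m", "rtdetr-l")        # assumed labels, not confirmed by the repo
FREEZE_IDS = ("F0", "F1", "F2", "F3")   # matches the F0-F3 run folders


def e1_run_matrix(models=MODELS, freeze_ids=FREEZE_IDS):
    """Return every (model, freeze_id) pair in the E1 sweep."""
    return list(product(models, freeze_ids))


runs = e1_run_matrix()
for i, (model, freeze_id) in enumerate(runs, start=1):
    print(f"run {i}/{len(runs)}: {model} / {freeze_id}")
```

Comparing `len(runs)` with the number of run directories found in Section 5 is a quick way to spot a sweep that stopped early.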


## 5. Verify Run Outputs

Sanity checks: verify that each run directory contains manifests, predictions, and evaluation outputs.


In [None]:
from pathlib import Path
import json

runs_root = Path("experiments/Experiment_1/runs")
assert runs_root.exists(), f"Missing runs root: {runs_root}"

expected_files = [
    "run_manifest.json",
    "train_summary.json",
    "predictions/val_predictions.json",
    "predictions/test_predictions.json",
    "eval/val/metrics.json",
    "eval/test/metrics.json",
]

missing = []
run_dirs = sorted([p for p in runs_root.glob("**/F[0-3]") if p.is_dir()])
print(f"Found {len(run_dirs)} run directories")
for rd in run_dirs:
    for ef in expected_files:
        if not (rd / ef).exists():
            missing.append((str(rd), ef))

if missing:
    print("WARNING: missing expected artifacts in some runs:")
    for rd, ef in missing[:40]:
        print(f"  - {rd} :: {ef}")
else:
    print("✓ All runs contain the expected artifacts.")


## 6. Aggregate Metrics Across Runs

Build a single table across the 8 runs, reading `run_manifest.json`, `eval/test/metrics.json`, and the predictions timing metadata.


In [None]:
import pandas as pd
import numpy as np
from pathlib import Path

def find_results_csv(run_dir: Path) -> Path:
    """
    Ultralytics typically writes: <run_dir>/results.csv
    We also support fallback searches.
    """
    direct = run_dir / "results.csv"
    if direct.exists():
        return direct

    # fallback: sometimes results*.csv exists
    cands = list(run_dir.glob("results*.csv")) + list(run_dir.rglob("results*.csv"))
    cands = [p for p in cands if p.is_file()]
    if not cands:
        raise FileNotFoundError(f"No results.csv found under: {run_dir}")

    # pick the shortest path (usually the main one), then newest
    cands.sort(key=lambda p: (len(str(p)), -p.stat().st_mtime))
    return cands[0]

def load_epoch_log(run_dir: Path) -> pd.DataFrame:
    csv_path = find_results_csv(run_dir)
    df = pd.read_csv(csv_path)

    # Normalize column names (strip whitespace)
    df.columns = [c.strip() for c in df.columns]

    # Ensure epoch exists
    if "epoch" not in df.columns:
        # sometimes epoch is implicit index
        df.insert(0, "epoch", np.arange(len(df)))

    return df

def pick_first_existing_col(df: pd.DataFrame, candidates: list[str]) -> str | None:
    for c in candidates:
        if c in df.columns:
            return c
    return None

def get_map50_col(df: pd.DataFrame) -> str | None:
    # Common Ultralytics naming variants
    candidates = [
        "metrics/mAP50(B)", "metrics/mAP50", "metrics/mAP_0.5",
        "metrics/mAP50-95(B)",  # not map50, but used as fallback
        "metrics/mAP50(M)",     # if masks exist (unlikely here)
    ]
    c = pick_first_existing_col(df, candidates)
    return c

def get_lr_cols(df: pd.DataFrame) -> list[str]:
    # Ultralytics often logs lr/pg0, lr/pg1, lr/pg2
    return [c for c in df.columns if c.startswith("lr/")]

def get_train_loss_cols(df: pd.DataFrame) -> list[str]:
    return [c for c in df.columns if c.startswith("train/") and "loss" in c]

def get_val_loss_cols(df: pd.DataFrame) -> list[str]:
    return [c for c in df.columns if c.startswith("val/") and "loss" in c]

# Load per-epoch logs for all discovered runs
epoch_logs = {}  # key: (model, freeze_id) -> DataFrame
for rd in run_dirs:
    model = rd.parts[-2]
    freeze_id = rd.parts[-1]
    try:
        epoch_logs[(model, freeze_id)] = load_epoch_log(rd)
    except Exception as e:
        print(f"[WARN] Could not load epoch log for {model}/{freeze_id}: {e}")

print(f"Loaded epoch logs: {len(epoch_logs)} / {len(run_dirs)} runs")
list(epoch_logs.keys())[:10]


In [None]:
import json
import math
import pandas as pd
import numpy as np

def _safe_get(d, keys, default=None):
    cur = d
    for k in keys:
        if not isinstance(cur, dict) or k not in cur:
            return default
        cur = cur[k]
    return cur

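def auc_trapz(x, y):
    """Area under y(x) by the trapezoidal rule; NaN if fewer than 2 points."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if len(x) < 2:
        return np.nan
    # NumPy 2.0 added np.trapezoid and deprecated np.trapz; support both.
    trapezoid = getattr(np, "trapezoid", None) or np.trapz
    return float(trapezoid(y, x))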

rows = []
for rd in run_dirs:
    parts = rd.parts
    model = parts[-2]
    freeze_id = parts[-1]

    manifest = json.loads((rd / "run_manifest.json").read_text())
    test_metrics = json.loads((rd / "eval/test/metrics.json").read_text())
    preds_test = json.loads((rd / "predictions/test_predictions.json").read_text())

    # Per-epoch log (Ultralytics)
    ep = epoch_logs.get((model, freeze_id))
    map50_col = None
    best_val_map50 = np.nan
    best_epoch = np.nan
    epoch_to_90pct = np.nan
    auc_map50 = np.nan

    train_loss_cols, val_loss_cols = [], []
    best_train_loss = np.nan
    best_val_loss = np.nan
    gen_gap_at_best = np.nan

    if ep is not None and len(ep) > 0:
        map50_col = get_map50_col(ep)
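        if map50_col is not None:
            y = ep[map50_col].astype(float).to_numpy()
            x = ep["epoch"].astype(float).to_numpy()

            # np.nanargmax raises on an all-NaN column, so skip the summary in that case
            if not np.all(np.isnan(y)):
                # Index positionally into x instead of relying on the DataFrame index labels
                best_idx = int(np.nanargmax(y))
                best_epoch = int(x[best_idx])
                best_val_map50 = float(y[best_idx])
                auc_map50 = auc_trapz(x, y)

                # First epoch at which mAP reaches 90% of its best value
                target = 0.9 * best_val_map50
                idxs = np.where(y >= target)[0]
                if len(idxs) > 0:
                    epoch_to_90pct = int(x[idxs[0]])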

        # Loss summaries + generalization gap (if val losses exist)
        train_loss_cols = get_train_loss_cols(ep)
        val_loss_cols = get_val_loss_cols(ep)

        if train_loss_cols:
            ep["_train_total_loss"] = ep[train_loss_cols].sum(axis=1)
        if val_loss_cols:
            ep["_val_total_loss"] = ep[val_loss_cols].sum(axis=1)

        if (train_loss_cols and val_loss_cols and not math.isnan(best_epoch)):
            # pick the row at best_epoch if exists, else closest
            row = ep[ep["epoch"] == best_epoch]
            if len(row) == 0:
                row = ep.iloc[[int(np.nanargmax(ep[map50_col].astype(float).to_numpy()))]]
            best_train_loss = float(row["_train_total_loss"].iloc[0])
            best_val_loss = float(row["_val_total_loss"].iloc[0])
            gen_gap_at_best = float(best_val_loss - best_train_loss)

    # Primary evaluator metric from eval/test/metrics.json (used for final test reporting).
    # Returns the first numeric value found among the candidate keys, in priority order.
    def extract_primary_metric(metrics_json: dict):
        candidates = [
            ("overall_f1", ["overall", "f1"]),
            ("overall_precision", ["overall", "precision"]),
            ("overall_recall", ["overall", "recall"]),
            ("f1", ["f1"]),
            ("precision", ["precision"]),
            ("recall", ["recall"]),
            ("map50", ["map50"]),
            ("map", ["map"]),
        ]
        for name, path in candidates:
            val = _safe_get(metrics_json, path)
            if isinstance(val, (int, float)) and not (isinstance(val, float) and math.isnan(val)):
                return name, float(val)
        for k, v in metrics_json.items():
            if isinstance(v, (int, float)):
                return str(k), float(v)
        return None, None

    primary_metric_name, primary_metric_val = extract_primary_metric(test_metrics)

    rows.append({
        "model": model,
        "freeze_id": freeze_id,

        "trainable_params": _safe_get(manifest, ["param_counts", "trainable_params"]),
        "total_params": _safe_get(manifest, ["param_counts", "total_params"]),

        # Per-epoch (Ultralytics)
        "map50_col": map50_col,
        "best_val_map50": best_val_map50,
        "best_epoch": best_epoch,
        "epoch_to_90pct_best": epoch_to_90pct,
        "auc_val_map50": auc_map50,

        "best_train_loss_at_best_epoch": best_train_loss,
        "best_val_loss_at_best_epoch": best_val_loss,
        "gen_gap_val_minus_train_at_best": gen_gap_at_best,

        # Evaluator final (test-side; depends on your evaluation contract)
        "primary_metric_name": primary_metric_name,
        "primary_metric_test": primary_metric_val,

        "avg_inference_time_ms": _safe_get(preds_test, ["inference_time_ms", "avg_inference_time_ms"]),
        "num_images": _safe_get(preds_test, ["inference_time_ms", "num_images"]),
    })

df = pd.DataFrame(rows).sort_values(["model", "freeze_id"])
display(df)
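
The three per-run summaries in the table (best epoch, epoch to 90% of best, and area under the mAP curve) plus the generalization gap are easier to read against a worked example. The sketch below recomputes them in plain Python on a made-up five-epoch curve. The numbers are illustrative only, not results from E1, and NaN handling is omitted.

```python
def trapezoid_auc(x, y):
    """Area under y(x) using the trapezoidal rule."""
    return sum((x[i + 1] - x[i]) * (y[i] + y[i + 1]) / 2 for i in range(len(x) - 1))


def convergence_summary(epochs, map50, fraction=0.9):
    """Best epoch, first epoch reaching `fraction` of the best mAP, and AUC."""
    best_index = max(range(len(map50)), key=lambda i: map50[i])
    target = fraction * map50[best_index]
    first_hit = next(i for i, v in enumerate(map50) if v >= target)
    return {
        "best_index": best_index,
        "best_epoch": epochs[best_index],
        "best_map50": map50[best_index],
        "epoch_to_90pct": epochs[first_hit],
        "auc": trapezoid_auc(epochs, map50),
    }


# Toy curves (made-up values for illustration).
epochs = [0, 1, 2, 3, 4]
map50 = [0.10, 0.30, 0.46, 0.50, 0.48]
train_loss = [2.0, 1.5, 1.2, 1.0, 0.9]
val_loss = [2.1, 1.7, 1.5, 1.4, 1.45]

summary = convergence_summary(epochs, map50)
# Generalization gap: validation loss minus training loss at the best-mAP epoch.
gap = val_loss[summary["best_index"]] - train_loss[summary["best_index"]]
print(summary)
print("generalization gap:", gap)
```

On this toy curve the best epoch is 3, and 90% of the best value (0.45) is first reached at epoch 2. A larger AUC for the same best value indicates a run that reached high mAP earlier.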


## 7. Plots

Key plots for E1:
1. Best Validation mAP@0.5 vs. Trainable Parameters (Freeze Ladder Curve)
2. Speed–Accuracy Tradeoff: mAP@0.5 vs. Avg Inference Time (ms/image)
3. Per-Model Training Loss vs. Epoch (F0–F3 Overlay)
4. Per-Model Validation Loss vs. Epoch (F0–F3 Overlay)
5. Per-Model Validation mAP@0.5 vs. Epoch (F0–F3 Overlay)
6. Cross-Model Validation mAP@0.5 vs. Epoch (YOLO vs RT-DETR, per Freeze Regime)
7. Generalization Gap vs. Freeze Regime (Val Loss − Train Loss at Best Epoch)
8. Convergence Speed vs. Freeze Regime (Epoch to 90% of Best Val mAP@0.5)
9. Learning Rate Schedule vs. Epoch (Representative Runs)
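
For item 2 in the list above (speed vs. accuracy), a possible sketch is shown here. It assumes the aggregated table columns `model`, `freeze_id`, `avg_inference_time_ms`, and `best_val_map50`, and it pairs validation mAP with the timing recorded for the test predictions, which is an assumption about what the plot should show. Both function names are hypothetical.

```python
def speed_accuracy_series(rows):
    """Group (latency_ms, map50, freeze_id) points by model, sorted by latency."""
    series = {}
    for r in rows:
        t = r.get("avg_inference_time_ms")
        acc = r.get("best_val_map50")
        # Skip rows with missing or NaN values (NaN is the only value != itself).
        if t is None or acc is None or t != t or acc != acc:
            continue
        series.setdefault(r["model"], []).append((float(t), float(acc), r["freeze_id"]))
    for points in series.values():
        points.sort()
    return series


def plot_speed_accuracy(rows, out_path=None):
    """Draw one latency/accuracy curve per model, annotated with freeze IDs."""
    # Imported lazily so the grouping helper above has no plotting dependency.
    import matplotlib.pyplot as plt

    plt.figure(figsize=(8, 4))
    for model, points in speed_accuracy_series(rows).items():
        plt.plot([p[0] for p in points], [p[1] for p in points], marker="o", label=model)
        for x, y, freeze_id in points:
            plt.annotate(freeze_id, (x, y))
    plt.xlabel("Avg inference time (ms/image)")
    plt.ylabel("Best Val mAP@0.5")
    plt.title("E1: Speed-Accuracy Tradeoff")
    plt.grid(True)
    plt.legend()
    if out_path is not None:
        plt.savefig(out_path, bbox_inches="tight", dpi=200)
    plt.show()
```

With the table from Section 6 in memory, `plot_speed_accuracy(df.to_dict("records"))` would draw the figure.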


In [None]:
import matplotlib.pyplot as plt
import numpy as np
from pathlib import Path

PLOTS_DIR = Path("experiments/Experiment_1/runs") / "_plots"
PLOTS_DIR.mkdir(parents=True, exist_ok=True)

def savefig(name: str):
    png = PLOTS_DIR / f"{name}.png"
    pdf = PLOTS_DIR / f"{name}.pdf"
    plt.savefig(png, bbox_inches="tight", dpi=200)
    plt.savefig(pdf, bbox_inches="tight")
    print(f"Saved: {png} and {pdf}")

def get_run_df(model: str, freeze_id: str):
    return epoch_logs.get((model, freeze_id))

def plot_loss_overlays_for_model(model: str):
    # Plots 3-4 in the Section 7 list: train/val loss curves vs epoch, overlay F0-F3
    freezes = sorted([f for (m, f) in epoch_logs.keys() if m == model])
    if not freezes:
        print(f"[SKIP] No epoch logs for model={model}")
        return

    # Total train loss
    plt.figure(figsize=(8,4))
    for fz in freezes:
        ep = get_run_df(model, fz).copy()
        train_cols = get_train_loss_cols(ep)
        if not train_cols:
            continue
        ep["_train_total_loss"] = ep[train_cols].sum(axis=1)
        plt.plot(ep["epoch"], ep["_train_total_loss"], label=fz)
    plt.xlabel("Epoch")
    plt.ylabel("Total Train Loss (sum of train/*loss cols)")
    plt.title(f"{model}: Train Loss vs Epoch (F0–F3)")
    plt.grid(True)
    plt.legend()
    savefig(f"{model}_train_loss_overlay")
    plt.show()

    # Total val loss (if exists)
    plt.figure(figsize=(8,4))
    any_val = False
    for fz in freezes:
        ep = get_run_df(model, fz).copy()
        val_cols = get_val_loss_cols(ep)
        if not val_cols:
            continue
        any_val = True
        ep["_val_total_loss"] = ep[val_cols].sum(axis=1)
        plt.plot(ep["epoch"], ep["_val_total_loss"], label=fz)
    if any_val:
        plt.xlabel("Epoch")
        plt.ylabel("Total Val Loss (sum of val/*loss cols)")
        plt.title(f"{model}: Val Loss vs Epoch (F0–F3)")
        plt.grid(True)
        plt.legend()
        savefig(f"{model}_val_loss_overlay")
        plt.show()
    else:
        plt.close()
        print(f"[INFO] No val/*loss columns logged for model={model}.")

def plot_map_overlays_for_model(model: str):
    # Plot 5 in the Section 7 list: per-model val mAP@0.5 vs epoch, overlay F0-F3
    freezes = sorted([f for (m, f) in epoch_logs.keys() if m == model])
    if not freezes:
        return
    plt.figure(figsize=(8,4))
    any_map = False
    for fz in freezes:
        ep = get_run_df(model, fz)
        map_col = get_map50_col(ep)
        if map_col is None:
            continue
        any_map = True
        plt.plot(ep["epoch"], ep[map_col], label=fz)
    if any_map:
        plt.xlabel("Epoch")
        plt.ylabel("Val mAP@0.5 (from results.csv)")
        plt.title(f"{model}: Val mAP@0.5 vs Epoch (F0–F3)")
        plt.grid(True)
        plt.legend()
        savefig(f"{model}_map50_overlay")
        plt.show()
    else:
        plt.close()
        print(f"[INFO] No mAP column found for model={model}.")

def plot_cross_model_map_per_freeze():
    # Plot 6 in the Section 7 list: compare YOLO vs RT-DETR per freeze regime over epochs
    models = sorted(set(m for (m, _) in epoch_logs.keys()))
    if len(models) < 2:
        print("[SKIP] Need both models present for cross-model plots.")
        return

    freezes = sorted(set(f for (_, f) in epoch_logs.keys()))
    for fz in freezes:
        plt.figure(figsize=(8,4))
        any_curve = False
        for model in models:
            ep = epoch_logs.get((model, fz))
            if ep is None:
                continue
            map_col = get_map50_col(ep)
            if map_col is None:
                continue
            any_curve = True
            plt.plot(ep["epoch"], ep[map_col], label=model)
        if any_curve:
            plt.xlabel("Epoch")
            plt.ylabel("Val mAP@0.5")
            plt.title(f"Cross-Model: Val mAP@0.5 vs Epoch ({fz})")
            plt.grid(True)
            plt.legend()
            savefig(f"cross_model_map50_{fz}")
            plt.show()
        else:
            plt.close()

def plot_perf_vs_trainable_params():
    # Plot 1 in the Section 7 list: best val mAP vs trainable parameters (log scale)
    d = df.copy()
    d = d.dropna(subset=["trainable_params", "best_val_map50"])
    if len(d) == 0:
        print("[SKIP] Missing trainable_params or best_val_map50.")
        return

    plt.figure(figsize=(8,4))
    for model, g in d.groupby("model"):
        g = g.sort_values("trainable_params")
        plt.plot(g["trainable_params"], g["best_val_map50"], marker="o", label=model)
        for _, r in g.iterrows():
            plt.annotate(r["freeze_id"], (r["trainable_params"], r["best_val_map50"]))
    plt.xscale("log")
    plt.xlabel("Trainable parameters (log scale)")
    plt.ylabel("Best Val mAP@0.5")
    plt.title("E1: Best Val mAP@0.5 vs Trainable Parameters")
    plt.grid(True)
    plt.legend()
    savefig("best_map50_vs_trainable_params")
    plt.show()

def plot_generalization_gap():
    # Plot 7 in the Section 7 list: generalization gap at best epoch (val_loss - train_loss)
    d = df.copy()
    d = d.dropna(subset=["gen_gap_val_minus_train_at_best"])
    if len(d) == 0:
        print("[SKIP] No gen gap computed (need train/*loss and val/*loss in results.csv).")
        return

    plt.figure(figsize=(8,4))
    for model, g in d.groupby("model"):
        # order F0-F3 nicely
        g = g.sort_values("freeze_id")
        plt.plot(g["freeze_id"], g["gen_gap_val_minus_train_at_best"], marker="o", label=model)
    plt.xlabel("Freeze regime")
    plt.ylabel("Val Loss - Train Loss (at best-mAP epoch)")
    plt.title("E1: Generalization Gap vs Freeze Regime")
    plt.grid(True)
    plt.legend()
    savefig("generalization_gap_vs_freeze")
    plt.show()

def plot_convergence_speed():
    # Plot 8 in the Section 7 list: epoch to reach 90% of best val mAP
    d = df.copy()
    d = d.dropna(subset=["epoch_to_90pct_best"])
    if len(d) == 0:
        print("[SKIP] No convergence speed computed (need mAP column in results.csv).")
        return

    plt.figure(figsize=(8,4))
    for model, g in d.groupby("model"):
        g = g.sort_values("freeze_id")
        plt.plot(g["freeze_id"], g["epoch_to_90pct_best"], marker="o", label=model)
    plt.xlabel("Freeze regime")
    plt.ylabel("Epoch to reach 90% of best Val mAP@0.5")
    plt.title("E1: Convergence Speed vs Freeze Regime")
    plt.grid(True)
    plt.legend()
    savefig("convergence_speed_vs_freeze")
    plt.show()

def plot_lr_and_map_example(model: str, freeze_id: str):
    # Plot 9 in the Section 7 list: LR curve + mAP curve (same run), shown as two figures (cleaner than dual axis)
    ep = epoch_logs.get((model, freeze_id))
    if ep is None:
        print(f"[SKIP] Missing epoch log for {model}/{freeze_id}")
        return

    lr_cols = get_lr_cols(ep)
    if not lr_cols:
        print(f"[INFO] No lr/* columns found for {model}/{freeze_id}")
    else:
        plt.figure(figsize=(8,4))
        for c in lr_cols:
            plt.plot(ep["epoch"], ep[c], label=c)
        plt.xlabel("Epoch")
        plt.ylabel("Learning rate")
        plt.title(f"{model}/{freeze_id}: LR Schedule")
        plt.grid(True)
        plt.legend()
        savefig(f"{model}_{freeze_id}_lr")
        plt.show()

    map_col = get_map50_col(ep)
    if map_col is None:
        print(f"[INFO] No mAP column found for {model}/{freeze_id}")
    else:
        plt.figure(figsize=(8,4))
        plt.plot(ep["epoch"], ep[map_col], label=map_col)
        plt.xlabel("Epoch")
        plt.ylabel("Val mAP@0.5")
        plt.title(f"{model}/{freeze_id}: Val mAP@0.5")
        plt.grid(True)
        plt.legend()
        savefig(f"{model}_{freeze_id}_map50")
        plt.show()

# -----------------------------
# Produce the plots (numbers refer to the list at the top of Section 7)
# Plot 2 (speed-accuracy tradeoff) is not generated by this cell.
# -----------------------------
models_present = sorted(set(m for (m, _) in epoch_logs.keys()))

# Plots 3-4: per-model train/val loss overlays; plot 5: per-model val mAP overlay
for m in models_present:
    plot_loss_overlays_for_model(m)
    plot_map_overlays_for_model(m)

# Plot 6: cross-model mAP per freeze
plot_cross_model_map_per_freeze()

# Plot 1: best mAP vs trainable params
plot_perf_vs_trainable_params()

# Plot 7: generalization gap
plot_generalization_gap()

# Plot 8: convergence speed
plot_convergence_speed()

# Plot 9: LR schedule (plus mAP curve) for one representative run (edit the selection as needed)
if models_present:
    # choose first available run in epoch_logs
    m0, f0 = list(epoch_logs.keys())[0]
    plot_lr_and_map_example(m0, f0)

print("All plots saved under:", PLOTS_DIR.resolve())


## 8. Inspect One Run (Optional)

Display a few evaluator plots for a selected run directory.


In [None]:
from IPython.display import Image, display

# Choose a run to inspect
inspect_run = run_dirs[0] if run_dirs else None
print(f"Inspecting: {inspect_run}")

if inspect_run:
    for rel in [
        "eval/test/threshold_sweep.png",
        "eval/test/per_class_f1.png",
        "eval/test/confusion_matrix.png",
        "eval/test/count_mae_comparison.png",
    ]:
        p = inspect_run / rel
        if p.exists():
            print(f"\n{rel}:")
            display(Image(filename=str(p)))
        else:
            print(f"Missing: {rel}")


## Summary

- Stages 1–3 prepare the dataset, build ground-truth indices, and generate `data/processed/data.yaml`.
- Stage 4 runs E1 (8 runs) using the experiment runner scripts.
- Stages 5–8 verify artifacts and aggregate results into tables and plots suitable for reporting.
