# Week 2 — Part 01: ML Training Loop Lab

**Estimated time:** 120–150 minutes

---

## Pre-study (Self-learn)

The Foundations Course assumes you have completed Self-learn. If you need a refresher on the evaluation mindset and metrics:

- [Foundations Course Pre-study index](../PRESTUDY.md)
- [Self-learn — Evaluation metrics (accuracy/precision/recall/F1)](../self_learn/Chapters/4/02_core_concepts.md)

---

## What success looks like (end of Part 01)

- You can run a full loop:
  - split → train → evaluate
- You save artifacts under `output/`:
  - one run file under a timestamped folder (e.g. `output/run_.../result.json`)
  - one summary file (e.g. `output/training_loop_summary.json`)

### Checkpoint

After running this notebook, you should be able to point to:

- the exact `result.json` that produced one metric
- the `training_loop_summary.json` that ranks multiple configs
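
To make the checkpoint concrete, here is a minimal sketch of how you might locate both artifacts from Python. The helper names `latest_run_result` and `best_from_summary` are hypothetical (they are not part of the notebook). The sketch assumes run folders follow the `run_YYYYMMDD_HHMMSS` naming used later in this lab and that the summary file has a `best` key.

```python
import json
from pathlib import Path


def latest_run_result(output_dir: Path) -> tuple[Path, dict]:
    """Return the path and parsed contents of the newest run's result.json."""
    # Folder names embed a sortable timestamp, so name order is time order.
    run_dirs = sorted(
        p for p in output_dir.glob("run_*") if (p / "result.json").is_file()
    )
    if not run_dirs:
        raise FileNotFoundError(f"no run folder with result.json under {output_dir}")
    path = run_dirs[-1] / "result.json"
    return path, json.loads(path.read_text(encoding="utf-8"))


def best_from_summary(output_dir: Path) -> dict:
    """Return the top-ranked entry stored in training_loop_summary.json."""
    summary_path = output_dir / "training_loop_summary.json"
    summary = json.loads(summary_path.read_text(encoding="utf-8"))
    return summary["best"]
```

After running the notebook, `latest_run_result(Path("output"))` should hand back the file behind the single-run metric, and `best_from_summary(Path("output"))` the winning config with its metrics.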

## Learning Objectives

- Implement a complete ML training loop (split → train → evaluate → save)
- Understand train/validation splits
- Practice model evaluation metrics
- Save and reload artifacts for reproducibility
- Compare model configurations

### What this part covers
This notebook implements the **ML training loop** — the core engineering pattern for running, evaluating, and saving machine learning experiments.

The loop has 5 steps: **Load → Split → Train → Evaluate → Save artifacts**

Each step produces something concrete you can inspect. By the end you will have a timestamped folder under `output/` whose `result.json` records the config and metrics of a run, plus a summary file that ranks every candidate config, so you can always trace back "what produced this result?"

## Overview

This lab is a minimal end-to-end baseline:

1. Load data
2. Split train/validation (fixed seed)
3. Train a baseline model
4. Evaluate on validation
5. Save artifacts to `output/`
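
The "fixed seed" in step 2 is what makes a run repeatable. The sketch below illustrates this under a few assumptions: it splits row indices rather than features, and the helper name `val_indices` and the seed values are chosen only for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)


def val_indices(seed: int) -> np.ndarray:
    """Return the sorted row indices that land in the validation set."""
    idx = np.arange(len(y))
    _, val_idx = train_test_split(
        idx, test_size=0.25, random_state=seed, stratify=y
    )
    return np.sort(val_idx)


# The same seed reproduces the same validation rows on every call.
same = np.array_equal(val_indices(7), val_indices(7))
# A different seed will almost always select different rows.
differs = not np.array_equal(val_indices(7), val_indices(13))

print("same seed gives same split:", same)
print("different seed gives different split:", differs)
```

Because the split is reproducible, a metric stored with its seed can be regenerated later from the config alone.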

If you need a refresher on why we split data and how to interpret metrics, use the Self-learn links at the top of the notebook.

### What this cell does
Defines `TrainConfig`, a typed dataclass holding all hyperparameters. The cell after it defines `run_once()`, which executes one full pass of the training loop and then saves the artifacts.

**Key design decisions:**
- **`TrainConfig` dataclass:** All parameters in one place. The settings are written into `result.json` under the `config` key, so every result carries the exact configuration that produced it. No guessing later.
- **`stratify=y` in `train_test_split`:** Ensures each class appears proportionally in both train and validation sets. Without this, a small dataset might have all examples of one class in train and none in validation.
- **`StandardScaler`:** Fit on train data only, then applied to validation. Fitting on the full dataset would "leak" validation statistics into training — a subtle but common mistake.
- **Artifacts saved:** `result.json` (one run, inside its timestamped folder) and `training_loop_summary.json` (all candidates ranked by accuracy). This is your audit trail.
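
The stratification and scaling decisions above can be checked directly. This is a small sketch, not the notebook's own cell: it assumes `test_size=0.2` (so the class counts divide evenly) and uses the plain NumPy arrays from `load_iris(return_X_y=True)`.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=7, stratify=y
)

# With stratify=y, each class keeps its share in both subsets.
print("train class counts:", np.bincount(y_train).tolist())
print("val class counts:  ", np.bincount(y_val).tolist())

# Fit the scaler on the training rows only, then reuse it for validation.
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_val_s = scaler.transform(X_val)

# Training features are centered by construction; validation features are
# only approximately centered because their statistics were never seen.
print("train feature means:", X_train_s.mean(axis=0).round(3))
print("val feature means:  ", X_val_s.mean(axis=0).round(3))
```

Iris has 50 examples per class, so this split should put 40 of each class in train and 10 of each in validation.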

**What to check:** After running, open `output/run_.../result.json` and verify the metrics match what's printed.
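
One way to automate that check is to reload the file right after writing it and compare it with the in-memory result. The sketch below is illustrative only: `save_and_verify` is a hypothetical helper, the metric values are placeholders rather than real results, and `dataclasses.asdict` is used here as one possible way to turn the config into a JSON-friendly dict.

```python
import json
import tempfile
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass
class TrainConfig:
    seed: int = 7
    test_size: float = 0.25
    max_iter: int = 250


def save_and_verify(result: dict, path: Path) -> bool:
    """Write result as JSON, reload it, and report whether it round-trips."""
    path.write_text(json.dumps(result, indent=2), encoding="utf-8")
    reloaded = json.loads(path.read_text(encoding="utf-8"))
    return reloaded == result


# Placeholder metrics for illustration; real values come from an actual run.
result = {
    "config": asdict(TrainConfig()),
    "metrics": {"accuracy": 0.0, "f1_macro": 0.0},
}

# A temporary folder keeps this sketch from touching the real output/ folder.
with tempfile.TemporaryDirectory() as tmp:
    ok = save_and_verify(result, Path(tmp) / "result.json")

print("round trip ok:", ok)
```

If the comparison ever fails, the saved artifact no longer describes the run that was printed, which is exactly the situation the audit trail is meant to prevent.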

In [None]:
from dataclasses import dataclass
from pathlib import Path
import json

import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.preprocessing import StandardScaler


OUTPUT_DIR = Path("output")
OUTPUT_DIR.mkdir(exist_ok=True)


@dataclass
class TrainConfig:
    seed: int = 7
    test_size: float = 0.25
    max_iter: int = 250


cfg = TrainConfig()
print(cfg)

### Task 1.1: Load Data

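Load a dataset and inspect its basic shapes and labels. We use Iris because it is small and reproducible. The cell below loads it inside `run_once()`, which also performs the split, training, and evaluation steps.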
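
Before training, a quick inspection might look like the sketch below. It assumes the default `load_iris()` return value (NumPy arrays in a `Bunch`); the notebook's `as_frame=True` variant returns pandas objects with the same shapes.

```python
import numpy as np
from sklearn.datasets import load_iris

data = load_iris()
X, y = data.data, data.target

# Basic shapes: rows are flowers, columns are measurements.
print("X shape:", X.shape)
print("y shape:", y.shape)

# Feature and class names help you read the metrics later.
print("feature names:", list(data.feature_names))
print("class names:", list(data.target_names))

# Class balance tells you whether accuracy is a fair headline metric.
print("class counts:", np.bincount(y).tolist())
```

Iris has 150 rows, 4 features, and 3 balanced classes, which is why accuracy and macro F1 tend to agree closely on it.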

In [None]:
from datetime import datetime

import numpy as np


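def run_once(seed: int, test_size: float, max_iter: int) -> dict:
    """Run one split → train → evaluate pass on Iris.

    Returns a dict with the config, validation metrics, and class counts.
    """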
    data = load_iris(as_frame=True)
    X = data.data
    y = data.target

    X_train, X_val, y_train, y_val = train_test_split(
        X, y, test_size=test_size, random_state=seed, stratify=y
    )

    # Fit the scaler on train only, then reuse it on validation to avoid leakage.
    scaler = StandardScaler()
    X_train_s = scaler.fit_transform(X_train)
    X_val_s = scaler.transform(X_val)

    model = LogisticRegression(max_iter=max_iter, random_state=seed)
    model.fit(X_train_s, y_train)

    pred = model.predict(X_val_s)
    metrics = {
        "accuracy": float(accuracy_score(y_val, pred)),
        "f1_macro": float(f1_score(y_val, pred, average="macro")),
    }

    return {
        "config": {"seed": seed, "test_size": test_size, "max_iter": max_iter},
        "metrics": metrics,
        "class_counts_train": [int(x) for x in np.bincount(y_train)],
        "class_counts_val": [int(x) for x in np.bincount(y_val)],
    }


single = run_once(seed=cfg.seed, test_size=cfg.test_size, max_iter=cfg.max_iter)
print("single run:", single["config"], single["metrics"])

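# datetime.utcnow() is deprecated since Python 3.12; use an aware UTC timestamp.
from datetime import timezone

run_id = datetime.now(timezone.utc).strftime("run_%Y%m%d_%H%M%S")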
run_dir = OUTPUT_DIR / run_id
run_dir.mkdir(exist_ok=True)

(run_dir / "result.json").write_text(json.dumps(single, indent=2), encoding="utf-8")
print("wrote:", run_dir / "result.json")

candidates = [
    {"seed": 7, "test_size": 0.25, "max_iter": 100},
    {"seed": 7, "test_size": 0.25, "max_iter": 400},
    {"seed": 13, "test_size": 0.20, "max_iter": 250},
]

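# Run every candidate config, then rank by validation accuracy (best first).
# sorted() is stable, so candidates with equal accuracy keep their listed order.
results = [run_once(**c) for c in candidates]
results_sorted = sorted(results, key=lambda r: r["metrics"]["accuracy"], reverse=True)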

summary = {
    "best": results_sorted[0],
    "all": results_sorted,
}

(OUTPUT_DIR / "training_loop_summary.json").write_text(json.dumps(summary, indent=2), encoding="utf-8")
print("wrote:", OUTPUT_DIR / "training_loop_summary.json")
print("best config:", summary["best"]["config"], "metrics:", summary["best"]["metrics"])