# Level 1 — Week 2 Practice (Starter Notebook)

This notebook gives you starter code for the **ML training loop** using scikit-learn.

## What success looks like (end of practice)

- You ran at least two experiments that differ in exactly one variable.
- You saved artifacts under `output/`:
  - `metrics.json`
  - `config.json`
  - `report.md` (your comparison write-up)

### Checkpoint

After running the notebook, you should be able to open:

- `output/metrics.json`
- `output/config.json`
- `output/report.md`

## References (docs)
- scikit-learn getting started: https://scikit-learn.org/stable/getting_started.html
- scikit-learn train/test split: https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html
- scikit-learn model evaluation: https://scikit-learn.org/stable/modules/model_evaluation.html
- scikit-learn cross-validation concepts: https://scikit-learn.org/stable/modules/cross_validation.html
- F1 score (Wikipedia): https://en.wikipedia.org/wiki/F1_score
- scikit-learn model persistence: https://scikit-learn.org/stable/model_persistence.html

## Setup

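Run this in an environment with `scikit-learn` installed. `load_iris(as_frame=True)` also requires `pandas`; `joblib` is installed automatically as a scikit-learn dependency.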
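Before running the rest of the notebook, you can confirm that the libraries it relies on import cleanly. This is a minimal sketch; the notebook does not state minimum versions, so it only prints what is installed.

```python
# Confirm the libraries used in this notebook import cleanly and report their versions.
import importlib

for name in ("sklearn", "pandas", "joblib"):
    module = importlib.import_module(name)  # raises ModuleNotFoundError if missing
    print(name, module.__version__)
```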


In [None]:
from dataclasses import dataclass
from pathlib import Path
import json

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, classification_report


In [None]:
OUTPUT_DIR = Path('output')
OUTPUT_DIR.mkdir(exist_ok=True)
OUTPUT_DIR


## Load data

We use Iris as a starter dataset. Replace it later with your own dataset as needed.
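When you swap in your own data, the rest of the notebook only needs a feature table `X` and a label series `y`. The sketch below assumes a CSV with a header row and a label column named `label`; both the column name and the `load_tabular` helper are hypothetical, so adjust them to your file.

```python
# Sketch: load a CSV into the same (X, y) shape the notebook uses.
# Assumption: the label column is called "label" (hypothetical; change TARGET_COL).
import io

import pandas as pd

TARGET_COL = "label"  # hypothetical column name


def load_tabular(source, target_col: str = TARGET_COL):
    """Return (X, y): every non-target column as features, the target column as labels."""
    df = pd.read_csv(source)
    X = df.drop(columns=[target_col])
    y = df[target_col]
    return X, y


# Demo with an in-memory CSV; pass a file path instead for a real dataset.
csv_text = "feat_a,feat_b,label\n0.1,1.0,a\n0.2,0.9,b\n0.3,0.8,a\n"
X_demo, y_demo = load_tabular(io.StringIO(csv_text))
print(X_demo.shape, list(y_demo))
```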


In [None]:
data = load_iris(as_frame=True)
X = data.data
y = data.target
X.head(), y.head()


## Parameterize experiment config

In your assignment, this becomes CLI args (e.g., `--seed`, `--model_type`).


In [None]:
@dataclass
class Config:
    seed: int = 42
    test_size: float = 0.2
    max_iter: int = 200

cfg = Config()
cfg
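The CLI version of this config can be sketched with the standard-library `argparse` module. This assumes one flag per `Config` field (repeated here so the cell runs on its own); `parse_config` is a hypothetical helper, and a `--model_type` flag would need a matching field that this starter `Config` does not have yet.

```python
# Sketch: map CLI flags onto Config fields with argparse.
import argparse
from dataclasses import dataclass


@dataclass
class Config:  # same fields as the notebook's Config
    seed: int = 42
    test_size: float = 0.2
    max_iter: int = 200


def parse_config(argv=None) -> Config:
    """Build a Config from command-line flags, falling back to the dataclass defaults."""
    defaults = Config()
    parser = argparse.ArgumentParser(description="Run one training experiment.")
    parser.add_argument("--seed", type=int, default=defaults.seed)
    parser.add_argument("--test_size", type=float, default=defaults.test_size)
    parser.add_argument("--max_iter", type=int, default=defaults.max_iter)
    args = parser.parse_args(argv)
    return Config(**vars(args))


# In a notebook, pass an explicit list so argparse does not read the kernel's sys.argv.
cfg_cli = parse_config(["--seed", "7", "--max_iter", "400"])
print(cfg_cli)
```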


## Split -> train -> evaluate

Notes:
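- Use a fixed `random_state` so the same rows land in the same split on every run.
- Pass `stratify=y` so each class keeps the same proportion in both splits.
- Evaluate on the held-out validation set, never on the training data.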
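Both notes can be checked directly. This sketch splits Iris twice with the same seed and confirms the splits are identical, then counts classes in the hold-out set; Iris has 50 samples per class, so a stratified 20% split should keep 10 of each. It uses NumPy arrays (`return_X_y=True`) and its own variable names so it does not overwrite the notebook's `X` and `y`.

```python
# Sketch: a fixed random_state makes the split repeatable; stratify keeps class balance.
from collections import Counter

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X_iris, y_iris = load_iris(return_X_y=True)

split_1 = train_test_split(X_iris, y_iris, test_size=0.2, random_state=42, stratify=y_iris)
split_2 = train_test_split(X_iris, y_iris, test_size=0.2, random_state=42, stratify=y_iris)

# Index 1 is X_val and index 3 is y_val in train_test_split's return order.
same_split = (split_1[1] == split_2[1]).all() and (split_1[3] == split_2[3]).all()
print("identical splits:", same_split)
print("validation class counts:", Counter(split_1[3].tolist()))
```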


In [None]:
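# Hold out a stratified validation set; the fixed seed makes the split repeatable.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=cfg.test_size, random_state=cfg.seed, stratify=y
)

# Fit on the training split only.
model = LogisticRegression(max_iter=cfg.max_iter)
model.fit(X_train, y_train)

# Score on the hold-out split; macro F1 weights every class equally.
pred = model.predict(X_val)
acc = accuracy_score(y_val, pred)
f1 = f1_score(y_val, pred, average='macro')

acc, f1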
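`average='macro'` means: compute F1 separately for each class, then take the unweighted mean. The sketch below does that by hand on a tiny made-up label list and compares it with `f1_score`; `per_class_f1` is a hypothetical helper written only for illustration.

```python
# Sketch: macro F1 = unweighted mean of per-class F1,
# where F1 = 2 * precision * recall / (precision + recall).
from sklearn.metrics import f1_score

y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]


def per_class_f1(y_true, y_pred, label):
    """F1 for one class, treating that class as 'positive' and all others as 'negative'."""
    tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
    fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
    fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


labels = sorted(set(y_true))
manual_macro = sum(per_class_f1(y_true, y_pred, c) for c in labels) / len(labels)
sk_macro = f1_score(y_true, y_pred, average="macro")
print(round(manual_macro, 4), round(sk_macro, 4))
```

Here the per-class F1 scores are 0.5, 0.8, and 2/3, so both values are their mean.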


In [None]:
print(classification_report(y_val, pred))
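If you want the per-class numbers in your artifacts rather than only on screen, `classification_report` accepts `output_dict=True` and returns nested dicts. This sketch re-creates the notebook's split on NumPy arrays so it runs on its own; `default=float` is there because some scikit-learn versions store counts as NumPy integers, which `json` cannot serialize directly.

```python
# Sketch: get the classification report as a dict so it can be saved as JSON.
import json

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

X_iris, y_iris = load_iris(return_X_y=True)
X_tr, X_va, y_tr, y_va = train_test_split(
    X_iris, y_iris, test_size=0.2, random_state=42, stratify=y_iris
)
clf = LogisticRegression(max_iter=200).fit(X_tr, y_tr)

report = classification_report(y_va, clf.predict(X_va), output_dict=True)
# Keys: one entry per class label ("0", "1", "2"), plus "accuracy",
# "macro avg", and "weighted avg".
print(json.dumps(report["macro avg"], indent=2, default=float))
```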


## Save artifacts

In a real project you should save:
- model file
- config used
- metrics

This is the minimum evidence that supports your report.
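Writing straight into `output/` means a second experiment overwrites the first run's `metrics.json` and `config.json`. One way to keep both is a folder per run. The layout and the `save_run` helper below are hypothetical choices, not part of the assignment spec, and the demo writes placeholder values to a temporary directory.

```python
# Sketch: save each run's config and metrics in its own folder so runs don't overwrite each other.
import json
import tempfile
from pathlib import Path


def save_run(run_dir: Path, config: dict, metrics: dict) -> list[Path]:
    """Write config.json and metrics.json under run_dir and return the written paths."""
    run_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for name, payload in (("config.json", config), ("metrics.json", metrics)):
        path = run_dir / name
        path.write_text(json.dumps(payload, indent=2), encoding="utf-8")
        written.append(path)
    return written


# In the notebook you would pass something like OUTPUT_DIR / "run_a".
with tempfile.TemporaryDirectory() as tmp:
    # Placeholder values, not real results.
    paths = save_run(Path(tmp) / "run_a", {"seed": 42, "max_iter": 200}, {"accuracy": 0.9})
    names = sorted(p.name for p in paths)
    loaded = json.loads((Path(tmp) / "run_a" / "metrics.json").read_text(encoding="utf-8"))
print(names, loaded)
```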


In [None]:
from dataclasses import asdict

metrics = {
    'accuracy': float(acc),
    'f1_macro': float(f1),
}

# asdict() converts the dataclass instance into a plain, JSON-serializable dict.
(OUTPUT_DIR / 'metrics.json').write_text(json.dumps(metrics, indent=2), encoding='utf-8')
(OUTPUT_DIR / 'config.json').write_text(json.dumps(asdict(cfg), indent=2), encoding='utf-8')

# Save the fitted model. joblib is installed as a scikit-learn dependency,
# so the guard below only matters in an unusual environment.
try:
    import joblib
    joblib.dump(model, OUTPUT_DIR / 'model.joblib')
    saved_model = True
except ModuleNotFoundError:
    saved_model = False

metrics, asdict(cfg), saved_model
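A saved model is only useful if it loads back and behaves the same. This sketch saves and reloads a model in a temporary directory, then checks the predictions match. The scikit-learn model persistence page (linked in the references) warns that these files should only be loaded from trusted sources and that loading with a different scikit-learn version is not guaranteed to work.

```python
# Sketch: round-trip a model through joblib and confirm predictions are unchanged.
import tempfile
from pathlib import Path

import joblib
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X_iris, y_iris = load_iris(return_X_y=True)
clf = LogisticRegression(max_iter=200).fit(X_iris, y_iris)

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "model.joblib"
    joblib.dump(clf, path)
    restored = joblib.load(path)

same_predictions = np.array_equal(clf.predict(X_iris), restored.predict(X_iris))
print("round-trip predictions match:", same_predictions)
```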


## Exercise: Compare two experiments (TODO)

Goal:

- Run **two** experiments that differ by exactly one change.
- Write a short `output/report.md` explaining:
  - what changed
  - what happened (metrics)
  - what you think caused it
  - what you’d try next

Checkpoint:

- `output/report.md` exists and mentions both experiments.
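A comparison is only clean if the two configs really differ in one field. This sketch checks that before you compare metrics; `config_diff` is a hypothetical helper, and `Config` is repeated so the cell runs on its own.

```python
# Sketch: verify two experiment configs differ in exactly one field.
from dataclasses import asdict, dataclass, replace


@dataclass
class Config:  # same fields as the notebook's Config
    seed: int = 42
    test_size: float = 0.2
    max_iter: int = 200


def config_diff(a: Config, b: Config) -> dict:
    """Return {field: (value_in_a, value_in_b)} for every field that differs."""
    da, db = asdict(a), asdict(b)
    return {k: (da[k], db[k]) for k in da if da[k] != db[k]}


demo_a = Config()
demo_b = replace(demo_a, max_iter=demo_a.max_iter * 2)
diff = config_diff(demo_a, demo_b)
assert len(diff) == 1, f"expected exactly one change, got {diff}"
print(diff)
```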

In [None]:
from dataclasses import asdict, replace


def run_experiment(cfg: Config) -> dict:
    """Train and evaluate one model; return its config and hold-out metrics."""
    X_train, X_val, y_train, y_val = train_test_split(
        X, y, test_size=cfg.test_size, random_state=cfg.seed, stratify=y
    )
    m = LogisticRegression(max_iter=cfg.max_iter)
    m.fit(X_train, y_train)
    pred = m.predict(X_val)

    return {
        "config": asdict(cfg),
        "metrics": {
            "accuracy": float(accuracy_score(y_val, pred)),
            "f1_macro": float(f1_score(y_val, pred, average="macro")),
        },
    }


# Experiment B changes exactly one field: it doubles max_iter.
cfg_a = cfg
cfg_b = replace(cfg, max_iter=cfg.max_iter * 2)

run_a = run_experiment(cfg_a)
run_b = run_experiment(cfg_b)

report_md = "\n".join(
    [
        "# Experiment Comparison Report",
        "",
        "## What changed",
        "TODO: Describe the one change you made (max_iter / solver / model type).",
        "",
        "## Results",
        f"- Experiment A config: {run_a['config']}",
        f"- Experiment A metrics: {run_a['metrics']}",
        f"- Experiment B config: {run_b['config']}",
        f"- Experiment B metrics: {run_b['metrics']}",
        "",
        "## Why you think it happened",
        "TODO: Write 2-5 sentences.",
        "",
        "## Next experiment",
        "TODO: What will you try next?",
        "",
    ]
)

(OUTPUT_DIR / "report.md").write_text(report_md, encoding="utf-8")
print("wrote", OUTPUT_DIR / "report.md")
run_a, run_b
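To explain a `max_iter` result, it helps to know whether each fit stopped because it converged or because it hit the cap. This sketch records the iterations the solver actually used (`n_iter_`) and whether scikit-learn raised a `ConvergenceWarning`; `fit_and_check` is a hypothetical helper. With the default `lbfgs` solver the fit is deterministic, so if the smaller cap already converged, the larger cap yields the same model and the same accuracy.

```python
# Sketch: did the fit converge, or did it stop at max_iter?
import warnings

from sklearn.datasets import load_iris
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X_iris, y_iris = load_iris(return_X_y=True)
X_tr, X_va, y_tr, y_va = train_test_split(
    X_iris, y_iris, test_size=0.2, random_state=42, stratify=y_iris
)


def fit_and_check(max_iter: int) -> dict:
    """Fit once and report iterations used, any ConvergenceWarning, and hold-out accuracy."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always", ConvergenceWarning)
        clf = LogisticRegression(max_iter=max_iter).fit(X_tr, y_tr)
    warned = any(issubclass(w.category, ConvergenceWarning) for w in caught)
    return {
        "max_iter": max_iter,
        "n_iter": int(clf.n_iter_.max()),  # iterations the solver actually ran
        "convergence_warning": warned,
        "val_accuracy": clf.score(X_va, y_va),
    }


for max_iter in (200, 400):
    print(fit_and_check(max_iter))
```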

## Appendix: Solutions (peek only after trying)

Reference approach for comparing experiments and writing a short report.

In [None]:
def format_run(run: dict) -> str:
    # Use local names that don't shadow the notebook's global `cfg`.
    run_cfg = run["config"]
    run_metrics = run["metrics"]
    return "\n".join(
        [
            f"- config: {run_cfg}",
            f"- accuracy: {run_metrics['accuracy']}",
            f"- f1_macro: {run_metrics['f1_macro']}",
        ]
    )


report_solution = "\n".join(
    [
        "# Experiment Comparison Report",
        "",
        "## What changed",
        "In Experiment B, I doubled `max_iter` while holding `seed` and `test_size` constant.",
        "",
        "## Results",
        "### Experiment A",
        format_run(run_a),
        "",
        "### Experiment B",
        format_run(run_b),
        "",
        "## Why you think it happened",
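        "`max_iter` only caps the number of solver iterations. If Experiment A stopped at that cap (scikit-learn reports this with a `ConvergenceWarning`), raising it lets the optimizer finish and can change the metrics; if A had already converged, B fits the same model and the metrics should match.",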
        "",
        "## Next experiment",
        "Try a different solver (e.g. `lbfgs` vs `liblinear`) or add feature scaling and compare again.",
        "",
    ]
)

(OUTPUT_DIR / "report_solution.md").write_text(report_solution, encoding="utf-8")
print("wrote", OUTPUT_DIR / "report_solution.md")
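The "add feature scaling" follow-up can be run as another one-change experiment. This sketch puts `StandardScaler` inside a pipeline, so the scaler is fit on the training split only, and compares it with the unscaled baseline on the same split, holding `max_iter` and the seed fixed. It reports macro F1 and the iterations the solver used; it makes no claim about which variant wins.

```python
# Sketch: compare an unscaled baseline with a scaled pipeline on the same split.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline, make_pipeline
from sklearn.preprocessing import StandardScaler

X_iris, y_iris = load_iris(return_X_y=True)
X_tr, X_va, y_tr, y_va = train_test_split(
    X_iris, y_iris, test_size=0.2, random_state=42, stratify=y_iris
)

candidates = {
    "baseline": LogisticRegression(max_iter=200),
    "scaled": make_pipeline(StandardScaler(), LogisticRegression(max_iter=200)),
}

results = {}
for name, estimator in candidates.items():
    estimator.fit(X_tr, y_tr)  # the scaler (if any) only sees training data
    pred = estimator.predict(X_va)
    # In a pipeline, the fitted LogisticRegression is the last step.
    lr = estimator[-1] if isinstance(estimator, Pipeline) else estimator
    results[name] = {
        "f1_macro": float(f1_score(y_va, pred, average="macro")),
        "n_iter": int(lr.n_iter_.max()),
    }

print(results)
```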