# Level 1 — Week 2 Practice (Starter Notebook)

This notebook provides starter code for a basic **ML training loop** with scikit-learn: load data, split, train, evaluate, and save artifacts.

## References (docs)
- scikit-learn getting started: https://scikit-learn.org/stable/getting_started.html
- scikit-learn train/test split: https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html
- scikit-learn model evaluation: https://scikit-learn.org/stable/modules/model_evaluation.html
- scikit-learn cross-validation concepts: https://scikit-learn.org/stable/modules/cross_validation.html
- F1 score (Wikipedia): https://en.wikipedia.org/wiki/F1_score
- scikit-learn model persistence: https://scikit-learn.org/stable/model_persistence.html


## Setup

Run this in an environment with `scikit-learn` and `pandas` installed; `load_iris(as_frame=True)` below returns pandas objects and fails without pandas.


In [None]:
from dataclasses import dataclass
from pathlib import Path
import json

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, classification_report


In [None]:
OUTPUT_DIR = Path('output')
OUTPUT_DIR.mkdir(exist_ok=True)
OUTPUT_DIR


## Load data

We use Iris as a starter dataset. Replace it later with your own dataset as needed.


In [None]:
data = load_iris(as_frame=True)
X = data.data
y = data.target
X.head(), y.head()
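
One way to make the dataset swappable later is to hide loading behind a small function that always returns `(X, y)`. The sketch below is only an illustration: `load_xy` and the `LOADERS` registry are hypothetical names, not part of the assignment, and scikit-learn's built-in Wine dataset stands in for "your own dataset".

```python
from sklearn.datasets import load_iris, load_wine

# Hypothetical registry mapping a dataset name to a scikit-learn loader.
LOADERS = {"iris": load_iris, "wine": load_wine}


def load_xy(name="iris"):
    """Return (X, y) as a pandas DataFrame and Series for a built-in dataset."""
    if name not in LOADERS:
        raise ValueError(f"unknown dataset {name!r}; expected one of {sorted(LOADERS)}")
    bunch = LOADERS[name](as_frame=True)
    return bunch.data, bunch.target


# Swap the dataset by changing a single argument.
X_demo, y_demo = load_xy("wine")
print(X_demo.shape, y_demo.nunique())
```

The rest of the notebook only needs `X` and `y`, so a custom loader (for example one that reads a CSV) could be added to the registry without touching the training code.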


## Parameterize experiment config

In your assignment, this becomes CLI args (e.g., `--seed`, `--model_type`).


In [None]:
@dataclass
class Config:
    seed: int = 42
    test_size: float = 0.2
    max_iter: int = 200

cfg = Config()
cfg
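
A possible shape for the CLI version, using only the standard library's `argparse`. This is a sketch under assumptions: `CliConfig`, `parse_config`, and the `logreg`/`tree` choices for `--model_type` are illustrative names, and your assignment may specify different flags.

```python
import argparse
from dataclasses import asdict, dataclass


@dataclass
class CliConfig:
    seed: int = 42
    test_size: float = 0.2
    max_iter: int = 200
    model_type: str = "logreg"  # illustrative; the allowed values are an assumption


def parse_config(argv=None):
    """Build a CliConfig from command-line style arguments."""
    parser = argparse.ArgumentParser(description="Train a classifier.")
    parser.add_argument("--seed", type=int, default=42)
    parser.add_argument("--test_size", type=float, default=0.2)
    parser.add_argument("--max_iter", type=int, default=200)
    parser.add_argument("--model_type", choices=["logreg", "tree"], default="logreg")
    args = parser.parse_args(argv)
    return CliConfig(**vars(args))


# Passing an explicit list keeps this runnable inside a notebook.
demo_cfg = parse_config(["--seed", "7", "--model_type", "tree"])
print(asdict(demo_cfg))
```

In a script you would call `parse_config()` with no arguments so that it reads `sys.argv`. Inside Jupyter, `sys.argv` holds the kernel's own arguments, which is why the demo passes a list explicitly.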


## Split -> train -> evaluate

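Notes:
- Use a fixed `random_state` so the split is the same on every run.
- Use `stratify=y` so each class keeps roughly the same proportion in both splits.
- Evaluate on the hold-out set, not the training set; training-set scores overstate how well the model generalizes.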
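
To see what a fixed `random_state` buys you, the sketch below splits row indices twice with the same seed and checks that the hold-out rows are identical. The helper name `holdout_indices` is made up for this illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

_, y_chk = load_iris(return_X_y=True)
row_ids = np.arange(len(y_chk))


def holdout_indices(seed):
    """Return the row indices that land in the hold-out set for a given seed."""
    _, test_ids = train_test_split(
        row_ids, test_size=0.2, random_state=seed, stratify=y_chk
    )
    return test_ids.tolist()


first = holdout_indices(42)
second = holdout_indices(42)

# The same seed reproduces exactly the same hold-out rows.
print(first == second)
print(len(first))
```

With `random_state=None` the split would change between runs, so metric differences could come from the split rather than from the change you are testing.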


In [None]:
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=cfg.test_size, random_state=cfg.seed, stratify=y
)

model = LogisticRegression(max_iter=cfg.max_iter)
model.fit(X_train, y_train)

pred = model.predict(X_val)
acc = accuracy_score(y_val, pred)
f1 = f1_score(y_val, pred, average='macro')

acc, f1
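
A single hold-out split on a small dataset gives a noisy estimate. As an optional complement (covered in the cross-validation reference above), the sketch below scores the same kind of model with 5-fold stratified cross-validation. The choice of 5 folds and the variable names are assumptions made for this example.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

X_cv, y_cv = load_iris(return_X_y=True)

# Shuffle with a fixed seed so the folds are reproducible.
folds = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

cv_scores = cross_val_score(
    LogisticRegression(max_iter=200), X_cv, y_cv, cv=folds, scoring="f1_macro"
)

# One macro-F1 score per fold, plus the mean and spread across folds.
print(cv_scores)
print(cv_scores.mean(), cv_scores.std())
```

The spread across folds gives a rough sense of how much a metric can move from the split alone, which is useful context when comparing two experiments.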


In [None]:
# Show class names instead of integer labels in the report.
print(classification_report(y_val, pred, target_names=data.target_names))
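
To make the `average='macro'` setting concrete, here is a plain-Python sketch of macro F1: compute F1 per class from precision and recall, then take the unweighted mean. The function name `macro_f1` and the toy labels are made up for illustration; in practice `f1_score` does this for you.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    per_class = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        per_class.append(2 * precision * recall / denom if denom else 0.0)
    return sum(per_class) / len(per_class)


toy_true = [0, 0, 1, 1, 2, 2]
toy_pred = [0, 1, 1, 1, 2, 0]

# Per-class F1 is 0.5, 0.8, and 2/3, so the macro average is 59/90.
print(round(macro_f1(toy_true, toy_pred), 4))  # → 0.6556
```

Because every class counts equally, macro F1 drops noticeably when a small class is predicted poorly, even if overall accuracy stays high.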


## Save artifacts

In a real project you should save:
- model file
- config used
- metrics

This is the minimum evidence that supports your report.


In [None]:
metrics = {
    'accuracy': float(acc),
    'f1_macro': float(f1),
}

(OUTPUT_DIR / 'metrics.json').write_text(json.dumps(metrics, indent=2), encoding='utf-8')
(OUTPUT_DIR / 'config.json').write_text(json.dumps(cfg.__dict__, indent=2), encoding='utf-8')

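# Optional: save the model. joblib is installed as a scikit-learn dependency,
# so this import normally succeeds; the fallback only guards unusual environments.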
try:
    import joblib
    joblib.dump(model, OUTPUT_DIR / 'model.joblib')
    saved_model = True
except ModuleNotFoundError:
    saved_model = False

metrics, cfg.__dict__, saved_model
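
Saved artifacts are only useful if they load back correctly. The sketch below is a self-contained round-trip check that writes to a temporary directory rather than `output/`. The variable names are illustrative, and the accuracy stored here is a training-set score used only to exercise the JSON round trip.

```python
import json
import tempfile
from pathlib import Path

import joblib
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X_rt, y_rt = load_iris(return_X_y=True)
original = LogisticRegression(max_iter=200).fit(X_rt, y_rt)

# Training-set accuracy, stored only to demonstrate saving and loading.
saved_metrics = {"train_accuracy": float(original.score(X_rt, y_rt))}

with tempfile.TemporaryDirectory() as tmp:
    tmp_dir = Path(tmp)
    joblib.dump(original, tmp_dir / "model.joblib")
    (tmp_dir / "metrics.json").write_text(json.dumps(saved_metrics), encoding="utf-8")

    # Load both artifacts back from disk.
    restored = joblib.load(tmp_dir / "model.joblib")
    loaded_metrics = json.loads((tmp_dir / "metrics.json").read_text(encoding="utf-8"))

# The restored model should reproduce the original predictions exactly.
predictions_match = bool((original.predict(X_rt) == restored.predict(X_rt)).all())
print(predictions_match, loaded_metrics == saved_metrics)
```

As the model persistence reference notes, joblib files are pickle-based: load them only from sources you trust, and ideally with the same scikit-learn version that saved them.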


## TODO: Compare two experiments

- Change **one thing** (e.g., `max_iter`, solver, or model type).
- Re-run and compare metrics.
- Write a short `report.md`: what changed, what happened, why you think it happened, and what you’ll try next.
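
One way to structure the comparison is to wrap the whole loop in a function that takes a config and returns metrics, then call it twice. This is a sketch, not the required solution: `ExpConfig` and `run_experiment` are hypothetical names, and changing `C` (the inverse regularization strength of `LogisticRegression`) is just one example of a single change.

```python
from dataclasses import dataclass

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split


@dataclass
class ExpConfig:
    seed: int = 42
    test_size: float = 0.2
    max_iter: int = 200
    C: float = 1.0  # the one setting varied in this example


def run_experiment(cfg):
    """Split, train, and evaluate once; return hold-out metrics."""
    X, y = load_iris(return_X_y=True)
    X_tr, X_va, y_tr, y_va = train_test_split(
        X, y, test_size=cfg.test_size, random_state=cfg.seed, stratify=y
    )
    model = LogisticRegression(C=cfg.C, max_iter=cfg.max_iter).fit(X_tr, y_tr)
    pred = model.predict(X_va)
    return {
        "accuracy": float(accuracy_score(y_va, pred)),
        "f1_macro": float(f1_score(y_va, pred, average="macro")),
    }


baseline = run_experiment(ExpConfig())
variant = run_experiment(ExpConfig(C=0.01))  # same seed and split, only C differs

for name in baseline:
    delta = variant[name] - baseline[name]
    print(f"{name}: baseline={baseline[name]:.3f} variant={variant[name]:.3f} delta={delta:+.3f}")
```

Keeping the seed and split fixed across both runs means any metric difference comes from the one setting you changed, which makes the "why" section of `report.md` easier to argue.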
