# Week 11 — Experiment Design & Evaluation

*Last updated:* 2025-09-09

## Objectives
- [ ] Understand experiment design & evaluation: cross-validation, A/B testing, and reproducibility
- [ ] Complete guided exercises (theory → code → evaluation)
- [ ] Apply learning in a small project or lab
- [ ] Reflect using self-assessment checklist

## Mini-Theory (Deep Dive)
- Cross-validation variants; nested CV; ablations
- A/B testing basics; power & effect size; sequential testing cautions
- Reproducibility playbook; experiment tracking taxonomy
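To make the "power & effect size" bullet concrete, here is a minimal sketch of a sample-size calculation for an A/B test on conversion rates. It assumes a two-sided two-proportion z-test with equal-sized arms and uses the common normal-approximation formula. The function names `sample_size_per_arm` and `cohens_h` and the example rates (10% vs. 12%) are illustrative, not part of the course material.

```python
import math
from statistics import NormalDist


def cohens_h(p1: float, p2: float) -> float:
    """Cohen's h: an effect size for the difference between two proportions."""
    return 2 * math.asin(math.sqrt(p2)) - 2 * math.asin(math.sqrt(p1))


def sample_size_per_arm(p_control: float, p_treatment: float,
                        alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate per-arm sample size for a two-sided two-proportion z-test.

    Assumes equal-sized arms and the normal approximation to the binomial.
    """
    if not (0 < p_control < 1 and 0 < p_treatment < 1):
        raise ValueError("rates must lie strictly between 0 and 1")
    if p_control == p_treatment:
        raise ValueError("rates must differ; a zero effect needs infinite samples")

    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for significance
    z_beta = NormalDist().inv_cdf(power)           # quantile for the desired power
    p_bar = (p_control + p_treatment) / 2          # pooled rate under the null

    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(p_control * (1 - p_control)
                             + p_treatment * (1 - p_treatment))
    ) ** 2
    return math.ceil(numerator / (p_treatment - p_control) ** 2)


# Hypothetical example: baseline 10% conversion, hoping to detect a lift to 12%.
n = sample_size_per_arm(0.10, 0.12)
print("users needed per arm:", n)
print("Cohen's h:", round(cohens_h(0.10, 0.12), 3))
```

Fix the sample size before the experiment starts. Repeatedly checking significance and stopping at the first "win" inflates the false-positive rate, which is the sequential-testing caution noted above.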

## Guided Exercises
The following exercises are structured to help you learn by doing. Each has **starter code**, **hints**, and **checks**.

```python
# Exercise: Train/validate a model with solid CV and leakage checks
import numpy as np
import pandas as pd  # not used by the synthetic demo; handy once you load a real dataset
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LogisticRegression

# Synthetic demo dataset (replace with real dataset).
# A seeded generator keeps the run reproducible.
rng = np.random.default_rng(42)
X = rng.standard_normal((1000, 10))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

# Leakage guard: the scaler lives inside the pipeline, so it is refit on the
# training folds only and never sees the held-out fold.
pipe = Pipeline([("scaler", StandardScaler()), ("clf", LogisticRegression(max_iter=1000))])
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(pipe, X, y, cv=cv, scoring="roc_auc")
print("ROC-AUC (5-fold):", scores.mean(), "+/-", scores.std())
# TODO: add calibration curve & threshold selection
```
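One possible way to approach the TODO above is sketched here. It assumes the same synthetic data and pipeline as the exercise, collects out-of-fold probabilities with `cross_val_predict`, summarizes calibration with `calibration_curve`, and picks the threshold that maximizes F1. Maximizing F1 is an assumption made for illustration; the right criterion depends on the costs in your project.

```python
import numpy as np
from sklearn.calibration import calibration_curve
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_recall_curve
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Same synthetic setup as the exercise (replace with your real dataset).
rng = np.random.default_rng(42)
X = rng.standard_normal((1000, 10))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

pipe = Pipeline([("scaler", StandardScaler()), ("clf", LogisticRegression(max_iter=1000))])
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

# Out-of-fold probabilities: each row is scored by a model that never saw it.
proba = cross_val_predict(pipe, X, y, cv=cv, method="predict_proba")[:, 1]

# Calibration: within each probability bin, compare the mean predicted
# probability with the observed fraction of positives. Empty bins are dropped.
frac_pos, mean_pred = calibration_curve(y, proba, n_bins=10)
for predicted, observed in zip(mean_pred, frac_pos):
    print(f"predicted {predicted:.2f} -> observed {observed:.2f}")

# Threshold selection: maximize F1 over the candidate thresholds.
precision, recall, thresholds = precision_recall_curve(y, proba)
# precision and recall carry one extra trailing element with no threshold.
p, r = precision[:-1], recall[:-1]
f1 = 2 * p * r / np.clip(p + r, 1e-12, None)
best = int(np.argmax(f1))
best_threshold = float(thresholds[best])
print("best threshold:", best_threshold, "F1:", f1[best])
```

A threshold chosen on the same predictions used to report performance gives an optimistic estimate. For a final number, select the threshold on a validation split and report on a separate held-out test set.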

## Project Work
- This week connects to: `projects/10-tabular-ml/README.md`
- Implement the **Build** task described in the project README. Tie your notebook experiments into that code (e.g., import your module or save artifacts for the project).

### Deliverable
- A short write-up (5–10 bullets) on **what worked, what didn’t, and what you’ll try next**.

## Self-Assessment Checklist
- [ ] I can explain the key concepts of **Experiment Design & Evaluation** in my own words.
- [ ] I completed the guided exercises and validated outputs.
- [ ] I produced a small artifact (code, plot, or report) and linked it to the project.
- [ ] I captured 3–5 learnings and 2 next steps.

---
**Tip:** Keep each week to ~10 hours: ~3h study, ~3h coding, ~3h project, ~1h reflection.