# Lightweight AutoML Demo — Iris Dataset

This notebook demonstrates how to:

1. Run the **Lightweight AutoML** system on the Iris dataset.
2. Inspect the resulting leaderboard of models.
3. Load the saved **best model checkpoint** for Iris.
4. Evaluate the model and visualize performance.


## 1. Setup

Make sure you've installed the project requirements (from the repo root):

```bash
pip install -r requirements.txt
```

Then run this notebook **from the project root directory**, so paths like `data/iris.csv` and `automl_orchestrator.py` resolve correctly.
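
A quick way to confirm the working directory before running any cells is to check for the files the notebook relies on. The helper below is a minimal sketch; `missing_project_files` and `EXPECTED_FILES` are illustrative names, not part of the project.

```python
from pathlib import Path

# Relative paths this notebook expects to find under the project root.
EXPECTED_FILES = ("data/iris.csv", "automl_orchestrator.py")


def missing_project_files(root, expected=EXPECTED_FILES):
    """Return the expected relative paths that are not files under `root`."""
    root = Path(root)
    return [rel for rel in expected if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_project_files(Path.cwd())
    if missing:
        print("Probably not the project root. Missing:", ", ".join(missing))
    else:
        print("Working directory looks like the project root.")
```

An empty list means both paths resolve from the current directory, so the later cells should find the data file and the orchestrator module.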

In [None]:
# Imports and path setup
import sys
from pathlib import Path

import matplotlib.pyplot as plt
import pandas as pd
from sklearn.metrics import classification_report, confusion_matrix

# Assumes the notebook was started from the project root (see Setup above).
PROJECT_ROOT = Path.cwd().resolve()
print("Project root:", PROJECT_ROOT)

# Make project modules such as automl_orchestrator importable.
if str(PROJECT_ROOT) not in sys.path:
    sys.path.append(str(PROJECT_ROOT))

from automl_orchestrator import run_automl


## 2. Run AutoML on Iris

Calling `run_automl` will:
- Load `data/iris.csv`
- Automatically infer schema and build preprocessing
- Run hyperparameter search over the model zoo
- Build ensembles
- Save a leaderboard CSV in `results/`
- Save the **best overall model** checkpoint to `checkpoints/best_model_iris.pkl`
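
To make the schema inference step more concrete, here is a rough sketch of one simple heuristic: treat a feature column as numeric when every non-empty value parses as a float, and as categorical otherwise. This is not the project's implementation. `infer_schema`, `_is_float`, and the sample column names are hypothetical and chosen only for illustration.

```python
import csv
import io


def _is_float(value):
    """Return True when `value` can be parsed as a float."""
    try:
        float(value)
    except ValueError:
        return False
    return True


def infer_schema(csv_text, target_column):
    """Label each feature column as 'numeric' or 'categorical'.

    The target column is skipped. A column with no non-empty values
    falls back to 'categorical'.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    schema = {}
    for column in reader.fieldnames or []:
        if column == target_column:
            continue
        values = [row[column] for row in rows if row[column] not in ("", None)]
        is_numeric = bool(values) and all(_is_float(v) for v in values)
        schema[column] = "numeric" if is_numeric else "categorical"
    return schema


# Toy rows, made up for this example (not taken from data/iris.csv).
sample = "sepal_length,color,class\n5.1,red,a\n4.9,blue,b\n"
print(infer_schema(sample, target_column="class"))
```

A schema like this is typically what decides which preprocessing each column receives, for example scaling for numeric columns and encoding for categorical ones.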

In [None]:
iris_path = "data/iris.csv"
target_col = "class"

leaderboard_df = run_automl(data_path=iris_path, target_column=target_col)

leaderboard_df

## 3. Load the Best Iris Model Checkpoint

The orchestrator saves the best model (by held-out test score) for each dataset as:

```text
checkpoints/best_model_<dataset_name>.pkl
```

For Iris, this should be `checkpoints/best_model_iris.pkl`.
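
The naming pattern can be expressed as a small helper. This is a sketch under one assumption: that `<dataset_name>` is the data file's name without its extension, which is consistent with `data/iris.csv` mapping to `best_model_iris.pkl`. The function `checkpoint_path_for` is hypothetical and not part of the orchestrator.

```python
from pathlib import Path


def checkpoint_path_for(data_path, checkpoint_dir="checkpoints"):
    """Build the expected checkpoint path for a dataset file."""
    # Assumption: the dataset name is the file stem, e.g. "iris" for "data/iris.csv".
    dataset_name = Path(data_path).stem
    return Path(checkpoint_dir) / f"best_model_{dataset_name}.pkl"


print(checkpoint_path_for("data/iris.csv").as_posix())
```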

In [None]:
import joblib

# Path is already imported in the setup cell.
checkpoint_path = Path("checkpoints") / "best_model_iris.pkl"
print("Checkpoint path:", checkpoint_path)

# joblib.load unpickles the saved estimator object.
best_model_iris = joblib.load(checkpoint_path)
best_model_iris


## 4. Evaluate the Best Model on the Full Iris Dataset

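Because the full dataset includes rows the model was trained on, the metrics below are optimistic compared with the held-out test scores in the leaderboard. Here we: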
- Reload `iris.csv`
- Separate features and target
- Use the saved best model to predict
- Print a classification report and confusion matrix
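
To show what the confusion matrix and the per-class figures in the report measure, the sketch below computes them by hand on a handful of made-up labels. It follows the same layout as scikit-learn's `confusion_matrix`: rows are true labels, columns are predicted labels, and labels are sorted. The function names and toy labels are illustrative only.

```python
def confusion_counts(y_true, y_pred):
    """Return (labels, matrix) where matrix[i][j] counts samples
    with true label labels[i] that were predicted as labels[j]."""
    labels = sorted(set(y_true) | set(y_pred))
    index = {label: k for k, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for true_label, predicted_label in zip(y_true, y_pred):
        matrix[index[true_label]][index[predicted_label]] += 1
    return labels, matrix


def precision_recall(matrix, k):
    """Precision and recall for the class at row/column k."""
    true_positives = matrix[k][k]
    predicted_as_k = sum(row[k] for row in matrix)  # column sum
    actually_k = sum(matrix[k])                     # row sum
    precision = true_positives / predicted_as_k if predicted_as_k else 0.0
    recall = true_positives / actually_k if actually_k else 0.0
    return precision, recall


# Toy labels, made up for this example.
y_true = ["a", "a", "b", "b", "c"]
y_pred = ["a", "b", "b", "b", "c"]

labels, matrix = confusion_counts(y_true, y_pred)
print(labels)   # ['a', 'b', 'c']
print(matrix)   # [[1, 1, 0], [0, 2, 0], [0, 0, 1]]
for k, label in enumerate(labels):
    print(label, precision_recall(matrix, k))
```

Off-diagonal entries are the misclassifications: here one true `a` was predicted as `b`, which lowers recall for `a` and precision for `b`.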


In [None]:
iris_df = pd.read_csv(iris_path)
X_iris = iris_df.drop(columns=[target_col])
y_iris = iris_df[target_col]

y_pred = best_model_iris.predict(X_iris)

print("Classification report (best Iris model):\n")
print(classification_report(y_iris, y_pred))

cm = confusion_matrix(y_iris, y_pred)
cm

### Confusion Matrix Heatmap

Let's visualize the confusion matrix for a clearer view of which classes are confused.

In [None]:
fig, ax = plt.subplots(figsize=(4, 4))
im = ax.imshow(cm, interpolation="nearest")
ax.set_title("Iris — Confusion Matrix (Best Model)")
plt.colorbar(im, ax=ax)

# confusion_matrix orders rows and columns by sorted label, so the ticks use the same order.
classes = sorted(y_iris.unique())
ax.set_xticks(range(len(classes)))
ax.set_yticks(range(len(classes)))
ax.set_xticklabels(classes, rotation=45, ha="right")
ax.set_yticklabels(classes)
ax.set_xlabel("Predicted label")
ax.set_ylabel("True label")

# Annotate each cell with its count (row i = true label, column j = predicted label).
for i in range(cm.shape[0]):
    for j in range(cm.shape[1]):
        ax.text(j, i, cm[i, j], ha="center", va="center")

plt.tight_layout()
plt.show()

## 5. Visualize Leaderboard Scores

Finally, let's make a simple bar chart of the test scores for each model on Iris. This corresponds directly to the model zoo performance described in the report.
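
The same ranking can also be recovered from the leaderboard CSV saved in `results/` without rerunning the search. The sketch below assumes the file has `model` and `test_score` columns (the names used by the plotting cell) and that a higher score is better. `load_ranked_leaderboard` is a hypothetical helper, and the CSV's file name is left to the caller because it depends on the orchestrator.

```python
import csv


def load_ranked_leaderboard(csv_path):
    """Read a leaderboard CSV and return (model, test_score) pairs, best first.

    Assumes `model` and `test_score` columns and that higher scores are better.
    """
    with open(csv_path, newline="", encoding="utf-8") as handle:
        rows = list(csv.DictReader(handle))
    scored = [(row["model"], float(row["test_score"])) for row in rows]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Passing the path of the CSV written by the run returns the models in descending score order, which is also a convenient order for the bars in the chart.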

In [None]:
plt.figure(figsize=(6, 4))
plt.bar(leaderboard_df["model"], leaderboard_df["test_score"])
plt.xticks(rotation=45, ha="right")
plt.ylabel("Test Score")
plt.title("Model Zoo Performance on Iris (Test Scores)")
plt.tight_layout()
plt.show()

## 6. Summary

In this notebook, we:
- Ran the **Lightweight AutoML** system on the Iris dataset.
- Inspected the model leaderboard.
- Loaded the **best model checkpoint** and evaluated it on the full dataset.
- Visualized the confusion matrix and model zoo performance.

This example demonstrates the full pipeline from **AutoML orchestration** to **saved checkpoints** and **downstream analysis**, matching the behavior described in the final report.