# ML Assignment 2 — Model Evaluation Notebook

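This notebook trains all **6 required classification models** on the **Breast Cancer Wisconsin (Diagnostic)** dataset. Naive Bayes is run in both its Gaussian and Multinomial variants, so the results contain seven rows. For each model the notebook produces: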

- Accuracy
- AUC
- Precision
- Recall
- F1
- MCC
- A Markdown comparison table ready to paste into README.md

---

In [1]:
import pandas as pd
import numpy as np

from model.data import load_dataset, split_data, make_scaled
from model.evaluate import compute_metrics

from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB, MultinomialNB
from sklearn.ensemble import RandomForestClassifier
from xgboost import XGBClassifier

## Load Data

In [2]:
X, y, feature_names, target_names = load_dataset()
X_train, X_test, y_train, y_test = split_data(X, y, test_size=0.2)
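
The `model.data` helpers are not shown in this notebook. As a rough sketch only, assuming they wrap scikit-learn's bundled copy of the dataset and a stratified split, they might look like the following. The function bodies, the `random_state=42` default, and the stratification are assumptions, not the project's confirmed code.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split


def load_dataset():
    """Hypothetical stand-in: return features, labels, feature names, class names."""
    data = load_breast_cancer()
    return data.data, data.target, list(data.feature_names), list(data.target_names)


def split_data(X, y, test_size=0.2, random_state=42):
    """Hypothetical stand-in: stratified split so both sets keep the class ratio."""
    return train_test_split(
        X, y, test_size=test_size, random_state=random_state, stratify=y
    )


X, y, feature_names, target_names = load_dataset()
X_train, X_test, y_train, y_test = split_data(X, y, test_size=0.2)
```

With 569 samples and `test_size=0.2`, scikit-learn holds out 114 rows for testing and keeps 455 for training.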

## Define Models
These models use the same configuration as the Streamlit app.

In [3]:
models = {
    # Scale-sensitive models are wrapped with make_scaled.
    "Logistic Regression": make_scaled(LogisticRegression(max_iter=500)),
    "Decision Tree": DecisionTreeClassifier(max_depth=5, random_state=42),
    "KNN": make_scaled(KNeighborsClassifier(n_neighbors=5)),
    "Naive Bayes (Gaussian)": GaussianNB(),
    # MultinomialNB needs non-negative features, so it is fit on the raw values.
    "Naive Bayes (Multinomial)": MultinomialNB(),
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=42),
    "XGBoost": XGBClassifier(
        n_estimators=200,
        learning_rate=0.1,
        max_depth=4,
        subsample=0.9,
        colsample_bytree=0.9,
        eval_metric="logloss",
        tree_method="hist",
        random_state=42,
    ),
}
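
`make_scaled` comes from `model.data` and its body is not shown here. A minimal sketch, assuming it simply chains a `StandardScaler` in front of the estimator (this body is a guess at the behaviour, not the project's actual implementation):

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def make_scaled(estimator):
    """Hypothetical stand-in: standardize features, then fit the estimator."""
    return make_pipeline(StandardScaler(), estimator)


# The pipeline exposes fit/predict/predict_proba like the wrapped estimator.
knn = make_scaled(KNeighborsClassifier(n_neighbors=5))
```

Wrapping the scaler and the model in one pipeline means the scaling statistics are learned from the training data only when `fit` is called, and then reused unchanged at prediction time.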

## Train & Evaluate All Models

In [4]:
results = []

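for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    # Class-1 probabilities feed the AUC; models without predict_proba get None.
    y_proba = model.predict_proba(X_test)[:, 1] if hasattr(model, "predict_proba") else None
    m = compute_metrics(y_test, y_pred, y_proba)
    # One result row per model: its name followed by the metric values.
    row = {"ML Model Name": name}
    row.update(m)
    results.append(row)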

df_results = pd.DataFrame(results)
df_results
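
`compute_metrics` lives in `model.evaluate` and is not reproduced in this notebook. The sketch below is one plausible shape for it, built on `sklearn.metrics` and returning the keys the table cell reads (`accuracy`, `roc_auc`, `precision`, `recall`, `f1`, `mcc`). The `average="weighted"` default is an assumption: it would explain why recall matches accuracy in every reported row, but the source does not confirm it.

```python
from sklearn.metrics import (
    accuracy_score,
    f1_score,
    matthews_corrcoef,
    precision_score,
    recall_score,
    roc_auc_score,
)


def compute_metrics(y_true, y_pred, y_proba=None, average="weighted"):
    """Hypothetical stand-in: return the six metrics as a plain dict."""
    # AUC needs scores, not hard labels; report NaN when none are available.
    auc = roc_auc_score(y_true, y_proba) if y_proba is not None else float("nan")
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "roc_auc": auc,
        "precision": precision_score(y_true, y_pred, average=average, zero_division=0),
        "recall": recall_score(y_true, y_pred, average=average, zero_division=0),
        "f1": f1_score(y_true, y_pred, average=average, zero_division=0),
        "mcc": matthews_corrcoef(y_true, y_pred),
    }
```

Returning NaN for a missing AUC keeps the column numeric, and the table cell's `pd.isna` check then renders it as `-`.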

## Generate README-Friendly Markdown Table

In [5]:
def fmt(x):
    """Format a metric to four decimals; show "-" when it is missing."""
    return "-" if pd.isna(x) else f"{x:.4f}"


md = "| ML Model Name | Accuracy | AUC | Precision | Recall | F1 | MCC |\n"
md += "|---|---:|---:|---:|---:|---:|---:|\n"

for _, r in df_results.iterrows():
    md += (
        f"| {r['ML Model Name']} | {fmt(r['accuracy'])} | {fmt(r.get('roc_auc'))} "
        f"| {fmt(r['precision'])} | {fmt(r['recall'])} | {fmt(r['f1'])} | {fmt(r['mcc'])} |\n"
    )

print(md)

| ML Model Name | Accuracy | AUC | Precision | Recall | F1 | MCC |
|---|---:|---:|---:|---:|---:|---:|
| Logistic Regression | 0.9649 | 0.9938 | 0.9649 | 0.9649 | 0.9649 | 0.9297 |
| Decision Tree | 0.9123 | 0.9217 | 0.9217 | 0.9123 | 0.9123 | 0.8258 |
| KNN | 0.9561 | 0.9842 | 0.9561 | 0.9561 | 0.9561 | 0.9105 |
| Naive Bayes (Gaussian) | 0.9386 | 0.9764 | 0.9386 | 0.9386 | 0.9386 | 0.8777 |
| Naive Bayes (Multinomial) | 0.8947 | 0.9351 | 0.8947 | 0.8947 | 0.8947 | 0.7897 |
| Random Forest | 0.9737 | 0.9928 | 0.9737 | 0.9737 | 0.9737 | 0.9473 |
| XGBoost | 0.9737 | 0.9971 | 0.9737 | 0.9737 | 0.9737 | 0.9473 |

### Copy the above Markdown table into your `README.md`

This satisfies the assignment’s requirement for presenting a comparison table across all 6 models.
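
If copying by hand is error-prone, the table string can also be appended to the file from the notebook. The helper below is a small sketch, not part of the project: the `append_table` name and the `## Model Comparison` heading are made up for illustration.

```python
from pathlib import Path


def append_table(readme_path, table_md, heading="## Model Comparison"):
    """Hypothetical helper: append a heading and a Markdown table to a file."""
    path = Path(readme_path)
    existing = path.read_text(encoding="utf-8") if path.exists() else ""
    # Make sure existing content ends with a newline before adding a blank line.
    if existing and not existing.endswith("\n"):
        existing += "\n"
    separator = "\n" if existing else ""
    path.write_text(
        f"{existing}{separator}{heading}\n\n{table_md.rstrip()}\n",
        encoding="utf-8",
    )
```

Calling `append_table("README.md", md)` after the table cell would add the section at the end of the file. It appends on every call, so rerunning the cell produces a duplicate section.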