Evaluate Tests
--------------
This script evaluates the trained Logistic Regression spam classifier
on the held-out TEST split and saves evaluation artifacts.

Steps:
1. Load Data & Model
   - Reads the test split from ../DATA/splits/test.csv.
   - Loads the trained pipeline (TF-IDF + RandomOverSampler + LogisticRegression)
     from ../OUTPUT/04_Training_Results/logreg.joblib.

2. Generate Predictions
   - Predicts class labels (ham/spam) for all test messages.
   - Computes predicted probabilities for spam.

3. Compute Metrics
   - Creates a classification report with precision, recall, F1 for both classes.
   - Computes ROC-AUC from the predicted spam probabilities, a threshold-independent
     measure of how well the model ranks spam above ham.
   - Stores all metrics in ../OUTPUT/05_Test_Eval/test_report.json.

4. Confusion Matrix
   - Generates a confusion matrix to visualize true vs. predicted labels.
   - Saves the matrix as a PNG figure in ../OUTPUT/05_Test_Eval/confusion_matrix.png.

Outputs:
   - ../OUTPUT/05_Test_Eval/test_report.json     (classification metrics + ROC-AUC)
   - ../OUTPUT/05_Test_Eval/confusion_matrix.png (confusion matrix plot)

This final evaluation step provides unbiased performance metrics on data
never seen during training or validation, satisfying the rubric requirement
for test-set reporting and explanatory figures.
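
To make the ROC-AUC figure easier to interpret, here is a minimal pure-Python sketch of what the number means: the fraction of (spam, ham) pairs in which the spam message receives the higher score, with ties counted as half. The helper `pairwise_auc` and the toy scores are hypothetical and only for illustration; the notebook itself relies on `sklearn.metrics.roc_auc_score`.

```python
def pairwise_auc(y_true, y_score):
    """Fraction of (spam, ham) pairs where spam scores higher; ties count 0.5."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]  # scores of spam messages
    neg = [s for t, s in zip(y_true, y_score) if t == 0]  # scores of ham messages
    if not pos or not neg:
        raise ValueError("need at least one example of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = spam, 0 = ham
toy_true = [0, 0, 1, 1]
toy_score = [0.1, 0.4, 0.35, 0.8]
print(pairwise_auc(toy_true, toy_score))  # 3 of the 4 pairs are ranked correctly -> 0.75
```

A value of 1.0 means every spam message outranks every ham message, while 0.5 is what random scoring would give on average.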

In [1]:
#!/usr/bin/env python3
# -*- coding: utf-8 -*-

import json
from pathlib import Path

import joblib
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.metrics import classification_report, confusion_matrix, roc_auc_score

splits_dir = Path("../DATA/splits")
out_dir = Path("../OUTPUT/05_Test_Eval")
out_dir.mkdir(parents=True, exist_ok=True)

model_path = Path("../OUTPUT/04_Training_Results/logreg.joblib")
test = pd.read_csv(splits_dir / "test.csv")

In [2]:
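# Raw message text; the saved pipeline applies its own TF-IDF vectorization
X_test = test["SMS_Message"].tolist()
# Encode labels as integers: 1 = spam, 0 = ham
y_test = (test["Label"].str.lower() == "spam").astype(int).values

# Load the fitted pipeline saved by the training step
model = joblib.load(model_path)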


In [3]:
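# Predict hard class labels (0 = ham, 1 = spam)
y_pred = model.predict(X_test)
# Probability of the positive class; column 1 is spam when model.classes_ is [0, 1]
y_prob = model.predict_proba(X_test)[:, 1]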
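
The hard labels and the spam probabilities are related through a decision threshold. The sketch below assumes that a message is labelled spam when its probability exceeds a cutoff (0.5 corresponds to picking the more likely class in a binary model) and shows how moving that cutoff trades false positives for false negatives. The helper `errors_at_threshold` and the toy numbers are hypothetical, not taken from this notebook's data.

```python
def errors_at_threshold(y_true, spam_probs, threshold=0.5):
    """Return (false_positives, false_negatives) when spam means prob > threshold."""
    y_pred = [1 if p > threshold else 0 for p in spam_probs]
    false_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    false_neg = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return false_pos, false_neg


# Hypothetical toy data: 1 = spam, 0 = ham
toy_true = [0, 0, 0, 1, 1, 1]
toy_probs = [0.05, 0.30, 0.60, 0.45, 0.70, 0.95]

for threshold in (0.4, 0.5, 0.8):
    fp, fn = errors_at_threshold(toy_true, toy_probs, threshold)
    print(f"threshold={threshold}: ham flagged as spam={fp}, spam missed={fn}")
```

A higher cutoff flags fewer legitimate messages but lets more spam through, which is why ROC-AUC (computed from the probabilities) is reported alongside the threshold-dependent precision and recall.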

In [4]:
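# Metrics: per-class precision/recall/F1 as a nested dict, plus ROC-AUC
report = classification_report(y_test, y_pred, target_names=["ham", "spam"], output_dict=True)
# ROC-AUC uses the spam probabilities rather than the hard labels
report["roc_auc"] = float(roc_auc_score(y_test, y_prob))

# Persist the full report as JSON
with open(out_dir / "test_report.json", "w", encoding="utf-8") as f:
    json.dump(report, f, indent=2)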
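
For reference, a minimal sketch of reading the saved report back. It assumes the key layout that `classification_report(..., output_dict=True)` produces (one entry per class, plus `accuracy`, `macro avg`, and `weighted avg`) together with the `roc_auc` key added above. The helper `summarize_report` is hypothetical and every number in `sample_report` is made up, so the example writes to a temporary file instead of touching the real output.

```python
import json
import tempfile
from pathlib import Path

# Made-up values that only mirror the key layout of test_report.json
sample_report = {
    "ham": {"precision": 0.96, "recall": 0.95, "f1-score": 0.955, "support": 100.0},
    "spam": {"precision": 0.76, "recall": 0.80, "f1-score": 0.78, "support": 20.0},
    "accuracy": 0.925,
    "macro avg": {"precision": 0.86, "recall": 0.875, "f1-score": 0.8675, "support": 120.0},
    "weighted avg": {"precision": 0.927, "recall": 0.925, "f1-score": 0.926, "support": 120.0},
    "roc_auc": 0.97,
}


def summarize_report(path):
    """Return one summary line per class plus a line with the overall scores."""
    with open(path, encoding="utf-8") as f:
        report = json.load(f)
    lines = []
    for label in ("ham", "spam"):
        row = report[label]
        lines.append(
            f"{label}: precision={row['precision']:.3f} "
            f"recall={row['recall']:.3f} f1={row['f1-score']:.3f} "
            f"support={int(row['support'])}"
        )
    lines.append(f"accuracy={report['accuracy']:.3f} roc_auc={report['roc_auc']:.3f}")
    return lines


with tempfile.TemporaryDirectory() as tmp:
    report_path = Path(tmp) / "test_report.json"
    report_path.write_text(json.dumps(sample_report, indent=2), encoding="utf-8")
    summary = summarize_report(report_path)

for line in summary:
    print(line)
```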

In [5]:
# Confusion matrix PNG
# labels=[0, 1] fixes the row/column order (ham, spam) and keeps the matrix 2x2
# even if one class is never predicted
cm = confusion_matrix(y_test, y_pred, labels=[0, 1])
fig = plt.figure()
plt.imshow(cm, interpolation="nearest")
plt.title("Confusion Matrix (Test)")
plt.xticks([0, 1], ["ham", "spam"])
plt.yticks([0, 1], ["ham", "spam"])
# Annotate each cell with its count; rows are true labels, columns are predictions
for (i, j), v in np.ndenumerate(cm):
    plt.text(j, i, str(v), ha="center", va="center")
plt.xlabel("Predicted")
plt.ylabel("True")
fig.savefig(out_dir / "confusion_matrix.png", bbox_inches="tight")
plt.close(fig)

print("Wrote:", out_dir / "test_report.json")
print("Wrote:", out_dir / "confusion_matrix.png")

Wrote: ../OUTPUT/05_Test_Eval/test_report.json
Wrote: ../OUTPUT/05_Test_Eval/confusion_matrix.png
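
As an aid to reading the saved figure, the sketch below shows how the spam-class precision, recall, and F1 follow from the four cells of a 2x2 confusion matrix laid out as in the plot (rows are true labels, columns are predictions, order ham then spam). The helper `spam_metrics_from_cm` is hypothetical and the counts are placeholders, not this notebook's results.

```python
def spam_metrics_from_cm(cm):
    """cm[i][j] = messages with true class i predicted as class j (0 = ham, 1 = spam)."""
    (tn, fp), (fn, tp) = cm
    precision = tp / (tp + fp) if (tp + fp) else 0.0  # flagged messages that are spam
    recall = tp / (tp + fn) if (tp + fn) else 0.0     # spam messages that were caught
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Placeholder counts: 95 ham kept, 5 ham flagged, 4 spam missed, 16 spam caught
toy_cm = [[95, 5], [4, 16]]
metrics = spam_metrics_from_cm(toy_cm)
print({name: round(value, 3) for name, value in metrics.items()})
```

The top-right cell (ham predicted as spam) lowers precision, and the bottom-left cell (spam predicted as ham) lowers recall.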


Explain Results: