# Lesson 5.6: Model Evaluation

## Why Accuracy Isn't Enough

Imagine that 95% of water filters are fine and 5% need maintenance.
A model that ALWAYS says "fine" gets 95% accuracy, yet it misses every single filter that needs maintenance.
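
To make that concrete, here is a minimal sketch in plain Python. It assumes a hypothetical fleet of exactly 100 filters (95 fine, 5 needing maintenance); the variable names are made up for illustration and are not used elsewhere in the lesson.

```python
# Hypothetical fleet of 100 filters: 0 = fine, 1 = needs maintenance
baseline_actual = [0] * 95 + [1] * 5

# A "model" that always says fine, no matter what
baseline_predicted = [0] * len(baseline_actual)

pairs = list(zip(baseline_actual, baseline_predicted))

# Accuracy: share of all predictions that match the actual label
baseline_accuracy = sum(a == p for a, p in pairs) / len(pairs)

# Recall: share of filters needing maintenance that the model flagged
caught = sum(a == 1 and p == 1 for a, p in pairs)
baseline_recall = caught / sum(baseline_actual)

print(f"Accuracy: {baseline_accuracy:.0%}")
print(f"Recall:   {baseline_recall:.0%}")
```

The accuracy looks great, but recall shows that not one bad filter was caught.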

### PHP Parallel
It's like monitoring your app: no single number tells the whole story, so you track multiple metrics:
- Response time (fast but wrong?)
- Error rate (few errors but missing edge cases?)
- Uptime (available but slow?)

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import (accuracy_score, precision_score, recall_score, 
                             f1_score, confusion_matrix, classification_report)

%matplotlib inline

In [None]:
# Simulated predictions for a water filter model
y_actual =    [0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 1]
y_predicted = [0, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1]
# 0=OK, 1=Needs maintenance

In [None]:
# Confusion Matrix - the foundation of all metrics
cm = confusion_matrix(y_actual, y_predicted)

plt.figure(figsize=(6, 5))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues',
            xticklabels=['Predicted OK', 'Predicted Maint.'],
            yticklabels=['Actually OK', 'Actually Maint.'])
plt.title('Confusion Matrix')
plt.show()

# Rows are actual labels, columns are predicted labels: [[TN, FP], [FN, TP]]
tn, fp, fn, tp = cm.ravel()

print("Reading the matrix:")
print(f"  True Negatives (TN):  {tn} - Correctly said OK")
print(f"  False Positives (FP): {fp} - Said maintenance but was OK (false alarm)")
print(f"  False Negatives (FN): {fn} - Said OK but needed maintenance (DANGEROUS!)")
print(f"  True Positives (TP):  {tp} - Correctly caught bad filter")
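
If the matrix feels like a black box, it may help to count the four cells by hand. The sketch below uses only plain Python on the same simulated predictions. It is a rough illustration of what the counts mean, not a description of how scikit-learn computes them internally.

```python
# Same simulated data as above: 0 = OK, 1 = needs maintenance
y_actual =    [0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 1]
y_predicted = [0, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1]

pairs = list(zip(y_actual, y_predicted))

# Each cell is just a count of (actual, predicted) combinations
tn_count = sum(a == 0 and p == 0 for a, p in pairs)  # OK, predicted OK
fp_count = sum(a == 0 and p == 1 for a, p in pairs)  # OK, predicted maintenance
fn_count = sum(a == 1 and p == 0 for a, p in pairs)  # maintenance, predicted OK
tp_count = sum(a == 1 and p == 1 for a, p in pairs)  # maintenance, predicted maintenance

print(f"TN={tn_count}  FP={fp_count}  FN={fn_count}  TP={tp_count}")
# → TN=11  FP=1  FN=2  TP=6
```

These four numbers should match the heatmap cells, read left to right and top to bottom.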

In [None]:
# The Key Metrics
print(f"Accuracy:  {accuracy_score(y_actual, y_predicted):.1%}")
print("  → Of ALL predictions, how many were correct?")
print()
print(f"Precision: {precision_score(y_actual, y_predicted):.1%}")
print("  → When we flagged maintenance, how often were we right?")
print()
print(f"Recall:    {recall_score(y_actual, y_predicted):.1%}")
print("  → Of ALL filters needing maintenance, how many did we catch?")
print()
print(f"F1 Score:  {f1_score(y_actual, y_predicted):.1%}")
print("  → Balance between precision and recall")
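
Each of these metrics is simple arithmetic on the four confusion matrix cells. As a sketch, the standard formulas can be written out by hand using the counts from the matrix above (TN=11, FP=1, FN=2, TP=6). The `manual_` variable names are invented for this example.

```python
# Counts read off the confusion matrix for the simulated predictions
tn, fp, fn, tp = 11, 1, 2, 6
total = tn + fp + fn + tp

# Accuracy: correct predictions out of all predictions
manual_accuracy = (tp + tn) / total

# Precision: of everything flagged for maintenance, how much really needed it
manual_precision = tp / (tp + fp)

# Recall: of everything that really needed maintenance, how much we flagged
manual_recall = tp / (tp + fn)

# F1: harmonic mean of precision and recall
manual_f1 = 2 * manual_precision * manual_recall / (manual_precision + manual_recall)

print(f"Accuracy:  {manual_accuracy:.1%}")   # → 85.0%
print(f"Precision: {manual_precision:.1%}")  # → 85.7%
print(f"Recall:    {manual_recall:.1%}")     # → 75.0%
print(f"F1 Score:  {manual_f1:.1%}")         # → 80.0%
```

Precision and recall both divide by a count that can be zero (a model that never flags anything has `tp + fp == 0`), so real code needs to handle that case.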

In [None]:
# For water filters: RECALL matters most!
# Missing a bad filter (False Negative) is DANGEROUS - unsafe water!
# A false alarm (False Positive) just means an unnecessary check - much less harmful.

print("=== Classification Report (all metrics at once) ===")
print(classification_report(y_actual, y_predicted, target_names=['OK', 'Maintenance']))
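
One way to see why recall matters most here is to put a rough price on each kind of mistake. The sketch below is purely illustrative: the costs (50 for a missed bad filter, 1 for a false alarm) and the second "cautious" model are assumptions invented for this example, not values from real data.

```python
# ASSUMED costs, for illustration only: adjust to your own situation
COST_FALSE_NEGATIVE = 50  # missed bad filter: unsafe water
COST_FALSE_POSITIVE = 1   # false alarm: one unnecessary check


def total_cost(fp, fn):
    """Total cost of a model's mistakes under the assumed costs."""
    return fp * COST_FALSE_POSITIVE + fn * COST_FALSE_NEGATIVE


def accuracy(tn, fp, fn, tp):
    return (tn + tp) / (tn + fp + fn + tp)


def recall(fn, tp):
    return tp / (tp + fn)


models = {
    # Counts from the confusion matrix earlier in this lesson
    "lesson model": {"tn": 11, "fp": 1, "fn": 2, "tp": 6},
    # HYPOTHETICAL model that flags more filters: more false alarms, no misses
    "cautious model": {"tn": 8, "fp": 4, "fn": 0, "tp": 8},
}

for name, c in models.items():
    print(f"{name}: accuracy={accuracy(**c):.0%}, "
          f"recall={recall(c['fn'], c['tp']):.0%}, "
          f"cost={total_cost(c['fp'], c['fn'])}")
```

Under these assumed costs, the cautious model has lower accuracy but much cheaper mistakes, because it never misses a bad filter.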

## Exercise

1. Suppose we have 100 filters: 90 are OK and 10 need maintenance. A model says ALL are OK. What are its accuracy, precision, and recall?
2. Why is recall usually more important than precision for medical screening tests?
3. Build a model on water filter data and generate a full `classification_report`.

In [None]:
# YOUR CODE HERE