# Lesson 06: Model Evaluation

**What you'll learn:**
- Why accuracy alone isn't enough
- Confusion matrix (what the model got right/wrong)
- Precision, Recall, F1-score
- When to use which metric

---

## Section 1: The Problem with Accuracy

### READ

**Accuracy can be MISLEADING with imbalanced data!**

Example: 100 emails, 95 normal, 5 spam
- A model that predicts "normal" for EVERYTHING gets 95% accuracy!
- But it never detects spam - useless as a spam filter.

We need better metrics.

### TRY IT

```python
from sklearn.metrics import accuracy_score

# Simulate imbalanced predictions
actual = ['normal']*95 + ['spam']*5
predicted = ['normal']*100  # Lazy model: always predict normal

print(f"Accuracy: {accuracy_score(actual, predicted):.0%}")
print("\nBut this model NEVER detects spam!")
print("95% accuracy, 0% spam detection - useless!")
```
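To put a number on "never detects spam", here is a minimal sketch that measures recall for the spam class on the same 95/5 example. It rebuilds the toy lists so it runs on its own; the variable names `acc` and `spam_recall` are just illustrative.

```python
from sklearn.metrics import accuracy_score, recall_score

# Same toy data as above: 95 normal emails, 5 spam emails
actual = ['normal'] * 95 + ['spam'] * 5
predicted = ['normal'] * 100  # Lazy model: always predict normal

acc = accuracy_score(actual, predicted)

# Recall for spam = spam emails caught / all spam emails
spam_recall = recall_score(actual, predicted, pos_label='spam')

print(f"Accuracy:    {acc:.0%}")
print(f"Spam recall: {spam_recall:.0%}")
```

The accuracy looks great, but the spam recall is zero because none of the 5 spam emails were caught.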

---

## Section 2: Confusion Matrix

### READ

A **Confusion Matrix** shows exactly what the model got right and wrong.

```
                    Predicted
                 Normal    Attack
Actual Normal      TN        FP
       Attack      FN        TP
```

- **TN (True Negative)**: Correctly predicted Normal
- **TP (True Positive)**: Correctly predicted Attack
- **FP (False Positive)**: Predicted Attack, was Normal (false alarm)
- **FN (False Negative)**: Predicted Normal, was Attack (MISSED attack!)
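As a small illustration of the four cells, the sketch below builds a confusion matrix for eight made-up predictions and unpacks it into TN, FP, FN and TP. The data is invented for this example, and passing `labels=['normal', 'attack']` is what puts Normal first so the layout matches the diagram above.

```python
from sklearn.metrics import confusion_matrix

# Made-up labels for illustration only
actual    = ['normal', 'normal', 'normal', 'normal', 'attack', 'attack', 'attack', 'normal']
predicted = ['normal', 'attack', 'normal', 'normal', 'attack', 'normal', 'attack', 'normal']

# Rows = actual, columns = predicted, in the order given by `labels`
cm = confusion_matrix(actual, predicted, labels=['normal', 'attack'])

# For a 2x2 matrix, ravel() flattens it row by row: TN, FP, FN, TP
tn, fp, fn, tp = cm.ravel()

print(cm)
print(f"TN={tn}  FP={fp}  FN={fn}  TP={tp}")
```

Without the `labels` argument, scikit-learn sorts the class names, so 'attack' would come before 'normal' and the cells would be in a different order.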

### TRY IT

```python
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay

# Load and prepare data
df = pd.read_csv('../datasets/tomatjus.csv')
X = df.drop('quality', axis=1)
y = df['quality']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train model
model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)
predictions = model.predict(X_test)

# Create confusion matrix
# labels=model.classes_ keeps the row/column order in sync with the class
# names used for plotting, even if a class is missing from the test set
cm = confusion_matrix(y_test, predictions, labels=model.classes_)
print("Confusion Matrix:")
print(cm)
```

```python
# Visualize confusion matrix
fig, ax = plt.subplots(figsize=(8, 6))
disp = ConfusionMatrixDisplay(cm, display_labels=model.classes_)
disp.plot(cmap='Blues', ax=ax)  # Draw on our axes; otherwise plot() opens a second figure
ax.set_title('Confusion Matrix')
plt.show()
```

### EXPLAIN

Reading the matrix:
- **Diagonal** (top-left to bottom-right) = Correct predictions
- **Off-diagonal** = Mistakes
- Look for which classes are confused with each other
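The reading rules above can be sketched in a few lines of NumPy. The 3-class matrix here is hypothetical (it is not the output of the model in this lesson): the diagonal sum divided by the total gives accuracy, and the largest off-diagonal cell shows which pair of classes is confused most often.

```python
import numpy as np

# Hypothetical confusion matrix (rows = actual class, columns = predicted class)
cm = np.array([
    [50,  3,  2],
    [ 4, 40,  6],
    [ 1,  9, 35],
])

# Diagonal = correct predictions
correct = int(np.trace(cm))
total = int(cm.sum())
accuracy = correct / total

# Off-diagonal = mistakes: zero out the diagonal and find the biggest cell
errors = cm.copy()
np.fill_diagonal(errors, 0)
row, col = np.unravel_index(errors.argmax(), errors.shape)
actual_idx, predicted_idx = int(row), int(col)

print(f"Correct: {correct} of {total} ({accuracy:.1%})")
print(f"Most common mistake: actual class {actual_idx} predicted as class {predicted_idx} "
      f"({errors[actual_idx, predicted_idx]} times)")
```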

---

## Section 3: Precision, Recall, F1-Score

### READ

**PRECISION**: Of all samples predicted as Attack, how many really were attacks?
```
Precision = TP / (TP + FP)
"When I say Attack, am I usually right?"
```

**RECALL**: Of all actual Attacks, how many did we catch?
```
Recall = TP / (TP + FN)
"Am I catching most attacks?"
```

**F1-SCORE**: Harmonic mean of Precision and Recall (high only when both are high)
```
F1 = 2 * (Precision * Recall) / (Precision + Recall)
```
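To see the three formulas in action, here is a worked example in plain Python. The counts (8 true positives, 2 false positives, 4 false negatives) are made up for illustration.

```python
# Made-up counts for the Attack class
tp = 8  # attacks correctly flagged
fp = 2  # normal traffic flagged as attack (false alarms)
fn = 4  # attacks that were missed

precision = tp / (tp + fp)  # 8 / 10
recall = tp / (tp + fn)     # 8 / 12
f1 = 2 * (precision * recall) / (precision + recall)

print(f"Precision: {precision:.3f}")
print(f"Recall:    {recall:.3f}")
print(f"F1-score:  {f1:.3f}")
```

Notice that F1 lands between precision and recall, but closer to the lower of the two. That is why a model cannot get a high F1 by being good at only one of them.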

### TRY IT

```python
from sklearn.metrics import classification_report

# Get the full classification report
print("Classification Report:")
print("="*60)
print(classification_report(y_test, predictions))
```

### EXPLAIN

Reading the report:
- **precision**: When model predicts this class, how often is it correct?
- **recall**: Of all actual samples of this class, how many did we find?
- **f1-score**: Harmonic mean of precision and recall for this class
- **support**: Number of actual samples in each class

**Averages:**
- **macro avg**: Unweighted mean of the per-class scores (treats all classes equally, however rare)
- **weighted avg**: Mean of the per-class scores, weighted by each class's support
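The difference between the two averages shows up as soon as the classes are imbalanced. The sketch below uses a made-up 8/2 split in which the model does well on the large class and poorly on the small one; the data and variable names are illustrative only.

```python
from sklearn.metrics import f1_score

# Made-up data: 8 normal samples, 2 attack samples
actual    = ['normal'] * 8 + ['attack'] * 2
# The model gets every normal sample right but misses one of the two attacks
predicted = ['normal'] * 8 + ['normal', 'attack']

# Per-class F1, in the order given by `labels`
f1_normal, f1_attack = f1_score(actual, predicted, labels=['normal', 'attack'], average=None)

macro_f1 = f1_score(actual, predicted, average='macro')
weighted_f1 = f1_score(actual, predicted, average='weighted')

print(f"F1 (normal):   {f1_normal:.3f}")
print(f"F1 (attack):   {f1_attack:.3f}")
print(f"Macro avg:     {macro_f1:.3f}")
print(f"Weighted avg:  {weighted_f1:.3f}")
```

The weighted average comes out higher because the well-predicted majority class counts for 80% of it, while the macro average gives the weak attack class an equal say.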

---

## Section 4: Which Metric to Use?

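| Situation | Use This Metric | Why |
|-----------|-----------------|-----|
| Balanced data | Accuracy | Classes are similar in size, so no single class dominates the score |
| Imbalanced data, every class matters | F1-score (macro) | Averages per-class F1, so rare classes count as much as common ones |
| Missing attacks is costly | Recall | Measures the share of real attacks that were caught |
| False alarms are costly | Precision | Measures the share of alerts that were real attacks |
| Imbalanced data, overall performance | F1-score (weighted) | Weights each class's F1 by its number of samples |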

**For your assignment (intrusion detection):**
- Missing attacks (FN) is dangerous → Focus on **Recall**
- But you also need reasonable precision → Use **F1-score** to balance the two

```python
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

# average='weighted' weights each class by its support;
# use average='macro' to give every class the same weight
print("Individual Metrics:")
print(f"Accuracy:  {accuracy_score(y_test, predictions):.3f}")
print(f"Precision: {precision_score(y_test, predictions, average='weighted'):.3f}")
print(f"Recall:    {recall_score(y_test, predictions, average='weighted'):.3f}")
print(f"F1-score:  {f1_score(y_test, predictions, average='weighted'):.3f}")
```
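The assignment advice above (recall first, F1 as a balance) can be sketched by comparing two hypothetical detectors on the attack class only. Both "models" are just hand-written prediction lists, not trained classifiers, and the helper `attack_metrics` is made up for this example.

```python
from sklearn.metrics import precision_score, recall_score, f1_score

# Made-up test set: 10 attacks, 90 normal connections
actual = ['attack'] * 10 + ['normal'] * 90

# Hypothetical cautious detector: catches 5 attacks, raises no false alarms
cautious = ['attack'] * 5 + ['normal'] * 5 + ['normal'] * 90

# Hypothetical aggressive detector: catches 9 attacks, raises 9 false alarms
aggressive = ['attack'] * 9 + ['normal'] * 1 + ['attack'] * 9 + ['normal'] * 81


def attack_metrics(predicted):
    """Return (precision, recall, F1) for the 'attack' class only."""
    return (
        precision_score(actual, predicted, pos_label='attack'),
        recall_score(actual, predicted, pos_label='attack'),
        f1_score(actual, predicted, pos_label='attack'),
    )


p_cautious, r_cautious, f_cautious = attack_metrics(cautious)
p_aggressive, r_aggressive, f_aggressive = attack_metrics(aggressive)

print(f"Cautious:   precision={p_cautious:.2f}  recall={r_cautious:.2f}  F1={f_cautious:.2f}")
print(f"Aggressive: precision={p_aggressive:.2f}  recall={r_aggressive:.2f}  F1={f_aggressive:.2f}")
```

The aggressive detector misses far fewer attacks but pays for it in false alarms. Which one is "better" depends on what a missed attack costs compared to a false alarm, which is why the table asks about the situation first.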

---

## Quick Reference

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    classification_report,
    precision_score,
    recall_score,
    f1_score
)

# Basic metrics
accuracy_score(y_test, predictions)
confusion_matrix(y_test, predictions)
classification_report(y_test, predictions)

# For multi-class, specify average
f1_score(y_test, predictions, average='weighted')
f1_score(y_test, predictions, average='macro')
```

---

## Next Lesson

In **Lesson 07: Hyperparameter Tuning**, you'll learn:
- What hyperparameters are
- How to find the best values
- GridSearchCV (automated tuning)