# Chapter 3.4: Confusion Matrix and Metrics

Goal: Build and interpret confusion matrices, and reason about when to prioritize precision vs recall.

### Topics:
- True positives, true negatives, false positives, false negatives
- Accuracy, precision, recall, and F1 score
- Choosing the right metric for the problem
- Normalized confusion matrices

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, classification_report

## Quick Recap

- **TP** (True Positive): Predicted positive, actually positive
- **TN** (True Negative): Predicted negative, actually negative
- **FP** (False Positive): Predicted positive, actually negative — "false alarm"
- **FN** (False Negative): Predicted negative, actually positive — "missed case"
- **Accuracy** = (TP + TN) / Total — overall correctness
- **Precision** = TP / (TP + FP) — "when I say positive, am I right?"
- **Recall** = TP / (TP + FN) — "did I catch all the positives?"
- **F1 Score** = 2 × Precision × Recall / (Precision + Recall): the harmonic mean of precision and recall
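To make the formulas above concrete, here is a minimal worked example. The four counts are hypothetical numbers chosen only for illustration; they do not come from the Titanic data.

```python
# Hypothetical counts, chosen only for illustration (not from the Titanic data)
tp, tn, fp, fn = 40, 90, 10, 20

total = tp + tn + fp + fn

accuracy = (tp + tn) / total                         # share of all predictions that were correct
precision = tp / (tp + fp)                           # of predicted positives, how many were right
recall = tp / (tp + fn)                              # of actual positives, how many were caught
f1 = 2 * precision * recall / (precision + recall)   # harmonic mean of precision and recall

print(f"accuracy={accuracy:.3f}  precision={precision:.3f}  recall={recall:.3f}  f1={f1:.3f}")
```

Notice that accuracy uses all four cells, while precision and recall each ignore the true negatives entirely.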

## Data

We'll use the **Titanic** dataset — survival prediction makes the TP/FP/FN concepts very concrete.

In [None]:
# Load Titanic data
titanic = pd.read_csv('../../Textbook/data/titanic.csv')
titanic.head()

## Practice

### 1. Use AI — Prepare data and fit a model

Select the numeric features (`Pclass`, `Age`, `SibSp`, `Parch`, `Fare`), fill missing `Age` values with the median, split the data into train/test sets (80/20), and fit a `LogisticRegression` model. Then get predictions on the test set.
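If you want something to compare your AI-generated code against, the sketch below runs the same four steps on a small synthetic table. Everything about the data is made up: the `demo` frame is random, and the target column name `Survived` is an assumption about how the real file labels survival. Treat it as one possible shape for the solution, not the expected answer.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (NOT the real Titanic file); column names mirror the exercise
rng = np.random.default_rng(0)
n = 200
demo = pd.DataFrame({
    "Pclass": rng.integers(1, 4, size=n),
    "Age": rng.uniform(1, 80, size=n),
    "SibSp": rng.integers(0, 4, size=n),
    "Parch": rng.integers(0, 3, size=n),
    "Fare": rng.uniform(5, 100, size=n),
})
# Blank out 20% of the ages to simulate missing values
demo.loc[demo.sample(frac=0.2, random_state=0).index, "Age"] = np.nan
# Assumed target column name; labels here are random
demo["Survived"] = rng.integers(0, 2, size=n)

# Step 1: select features and target, fill missing Age with the median
features = ["Pclass", "Age", "SibSp", "Parch", "Fare"]
X = demo[features].copy()
X["Age"] = X["Age"].fillna(X["Age"].median())
y = demo["Survived"]

# Step 2: 80/20 train/test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Step 3: fit the model (max_iter raised because the features are unscaled)
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Step 4: predictions on the test set
y_pred = model.predict(X_test)
print(y_pred[:10])
```

Because the synthetic labels are random, the model's predictions here carry no meaning; only the sequence of steps is the point.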

In [None]:
# Step 1: Select features and target, fill missing Age


# Step 2: Train/test split (80/20, random_state=42)


# Step 3: Fit LogisticRegression


# Step 4: Get predictions on test set


### 2. Use AI — Create and display a confusion matrix

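Use `confusion_matrix()` on the true and predicted test labels to create the confusion matrix, then display it as a heatmap by passing the matrix to `sns.heatmap()` with `annot=True, fmt='d'`. In scikit-learn's layout, rows are the actual classes and columns are the predicted classes.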
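Before plotting, it can help to see where each of the four counts lands in the matrix. The sketch below uses eight hand-written labels (hypothetical, with 1 as the positive class) rather than the Titanic predictions.

```python
from sklearn.metrics import confusion_matrix

# Hand-written toy labels (hypothetical); 1 = positive class, 0 = negative class
y_true_demo = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred_demo = [1, 0, 0, 1, 0, 1, 1, 0]

# Rows are actual classes, columns are predicted classes, both in sorted label order
cm_demo = confusion_matrix(y_true_demo, y_pred_demo)
print(cm_demo)

# For a binary problem, flattening the matrix gives TN, FP, FN, TP in that order
tn, fp, fn, tp = cm_demo.ravel()
print(f"TN={tn}  FP={fp}  FN={fn}  TP={tp}")
```

The top-left cell is therefore the true negatives, not the true positives, which is easy to misread on a heatmap.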

In [None]:
# Step 1: Create confusion matrix


# Step 2: Display as heatmap


### 3. Interpretation — What do the errors mean?

In the Titanic context (positive = survived):
- What does a **false positive** mean? (Model predicted ___ but actually ___)
- What does a **false negative** mean?
- Which type of error feels "worse" to you in this context?

**Your answer:**

(Write your answer here)

### 4. Use AI — Generate a classification report

Use `classification_report()` to see precision, recall, and F1 for each class.
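As a rough guide to reading the report, the sketch below runs it on ten hand-written labels. The labels and the class names passed to `target_names` are hypothetical; your real call would use the test labels and predictions from Practice 1.

```python
from sklearn.metrics import classification_report

# Hand-written toy labels (hypothetical); 1 = survived, 0 = did not survive
y_true_demo = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred_demo = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1]

# target_names follow the sorted label order: 0 first, then 1
names = ["died", "survived"]

# Text version: one row per class, plus accuracy and averaged rows
print(classification_report(y_true_demo, y_pred_demo, target_names=names))

# Dictionary version: handy when you need a single number programmatically
report = classification_report(
    y_true_demo, y_pred_demo, target_names=names, output_dict=True
)
print(report["survived"]["recall"])
```

Each class gets its own precision and recall because the report treats every class in turn as "the positive class".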

In [None]:
# Print the classification report


### 5. By hand — Disease screening scenario

You're building a model to screen patients for a disease.
- A **false negative** means a sick patient is sent home untreated
- A **false positive** means a healthy patient gets additional (unnecessary) tests

Which metric should you prioritize — **precision** or **recall**? Why?

**Your answer:**

(Write your answer here)

### 6. By hand — Spam filter scenario

You're building an email spam filter.
- A **false positive** means a legitimate email goes to the spam folder
- A **false negative** means a spam email reaches the inbox

Which metric matters more here — **precision** or **recall**? Why?

**Your answer:**

(Write your answer here)

### 7. Use AI — Normalized confusion matrix

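Normalize the confusion matrix by row (use `normalize='true'` in `confusion_matrix()`) and display it as a heatmap. Each row then sums to 1 and shows the **proportion** of that actual class that was classified correctly or incorrectly. The values are fractions between 0 and 1, so `fmt='d'` no longer applies; a format such as `fmt='.1%'` displays them as percentages.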
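To see what row normalization does to the numbers, here is a small sketch on ten hand-written labels (hypothetical, not the Titanic predictions). It skips the heatmap and prints the values directly.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hand-written toy labels (hypothetical); 1 = positive class, 0 = negative class
y_true_demo = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred_demo = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1]

# Raw counts for comparison
cm_counts = confusion_matrix(y_true_demo, y_pred_demo)

# normalize="true" divides each row by the number of actual samples in that class
cm_norm = confusion_matrix(y_true_demo, y_pred_demo, normalize="true")

print(cm_counts)
print(np.round(cm_norm, 3))

# Every row sums to 1, and the diagonal holds each class's recall
print(cm_norm.sum(axis=1))
print(np.diag(cm_norm))
```

Reading the diagonal of the normalized matrix is the same as reading the recall column of the classification report.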

In [None]:
# Step 1: Create normalized confusion matrix


# Step 2: Display as heatmap with percentage format


**Your interpretation:** What percentage of actual survivors did the model correctly identify? What about non-survivors?

(Write your answer here)

### 8. Interpretation — The "95% accuracy" question

If someone told you "my model is 95% accurate," what follow-up question would you ask before being impressed?

**Your answer:**

(Write your answer here)

## Discussion

Can a model have high accuracy but terrible recall for one class? When would that happen?

(Discuss with a neighbor)
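After your discussion, one way to test your reasoning is a quick experiment with imbalanced labels. The sketch below assumes a made-up dataset with 95 negatives and 5 positives, and a "model" that simply predicts the majority class every time.

```python
import numpy as np
from sklearn.metrics import accuracy_score, recall_score

# Made-up imbalanced labels: 95 negatives (0) and 5 positives (1)
y_true_demo = np.array([0] * 95 + [1] * 5)

# A trivial "model" that always predicts the majority class
y_pred_demo = np.zeros_like(y_true_demo)

acc = accuracy_score(y_true_demo, y_pred_demo)
recall_pos = recall_score(y_true_demo, y_pred_demo, pos_label=1)  # positives caught
recall_neg = recall_score(y_true_demo, y_pred_demo, pos_label=0)  # negatives caught

print(f"accuracy={acc:.2f}  recall(positive)={recall_pos:.2f}  recall(negative)={recall_neg:.2f}")
```

Compare the three numbers and ask which of them you would want reported if the positive class were the one that mattered.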