# Confusion Matrix - Sensitivity & Specificity

YT video 1 - https://www.youtube.com/watch?v=Kdsp6soqA7o&list=PLblh5JKOoLUICTaGLRoHQDuF_7q2GfuJF&index=3

YT video 2 - https://www.youtube.com/watch?v=vP06aMoz4v8&list=PLblh5JKOoLUICTaGLRoHQDuF_7q2GfuJF&index=4

A confusion matrix is a table that shows how well a classification model performs by comparing predicted values with actual values. It displays four types of predictions:

True Positive (TP): Predicted positive, and the actual value is positive

True Negative (TN): Predicted negative, and the actual value is negative

False Positive (FP): Predicted positive, but the actual value is negative

False Negative (FN): Predicted negative, but the actual value is positive
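
As a rough illustration of these four outcomes, the sketch below counts them by hand. The labels are made up for this example (they are not the heart disease data used later), and the names `y_true_demo`, `y_pred_demo` and `counts` are hypothetical.

```python
import numpy as np

# Hypothetical toy labels (not the heart disease data): 1 = positive, 0 = negative
y_true_demo = np.array([1, 1, 1, 0, 0, 0, 0, 1])
y_pred_demo = np.array([1, 0, 1, 0, 1, 0, 0, 1])

# Each outcome is a combination of the actual value and the predicted value
counts = {
    "TP": int(np.sum((y_true_demo == 1) & (y_pred_demo == 1))),
    "TN": int(np.sum((y_true_demo == 0) & (y_pred_demo == 0))),
    "FP": int(np.sum((y_true_demo == 0) & (y_pred_demo == 1))),
    "FN": int(np.sum((y_true_demo == 1) & (y_pred_demo == 0))),
}
print(counts)
```

Every sample lands in exactly one of the four cells, so the counts always add up to the number of samples.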

## Confusion Matrix Using Cross Validation - 

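First, we split our data into Training and Testing sets. We use the Training data to build the model, and then we use the model to make predictions on the Testing data. With k-fold cross-validation, this split is repeated k times: the data is divided into k parts (folds), and each fold takes one turn as the Testing set while the remaining folds form the Training set, so every sample gets exactly one prediction from a model that did not see it during training.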
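
To make the splitting concrete, here is a minimal sketch that uses scikit-learn's `KFold` on ten placeholder samples and checks that each sample is used for testing exactly once. The array `X_demo` and the counter `test_counts` are hypothetical names for this illustration. Note that for classifiers, `cross_val_predict` with an integer `cv` uses stratified folds, so the fold membership in the real run below will differ from this plain `KFold` picture.

```python
import numpy as np
from sklearn.model_selection import KFold

X_demo = np.arange(10).reshape(-1, 1)  # 10 placeholder samples
kf = KFold(n_splits=5)                 # no shuffling: folds are contiguous blocks

# Count how many times each sample ends up in a test fold
test_counts = np.zeros(len(X_demo), dtype=int)
for fold, (train_idx, test_idx) in enumerate(kf.split(X_demo), start=1):
    test_counts[test_idx] += 1
    print(f"Fold {fold}: train={train_idx.tolist()}, test={test_idx.tolist()}")

print("Times each sample was tested:", test_counts.tolist())
```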

### Create simple data - 

In [2]:
import numpy as np
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import cross_val_predict
from sklearn.ensemble import RandomForestClassifier

# Create simple heart disease data
np.random.seed(42)

# Generate 100 people with random ages and cholesterol levels
age = np.random.normal(55, 15, 100)  # Mean=55, SD=15, n=100
cholesterol = np.random.normal(200, 50, 100)  # Mean=200, SD=50, n=100

# Simple rule: high age + high cholesterol = heart disease
risk = (age - 50) / 20 + (cholesterol - 200) / 100 # Create a simple rule to simulate risk  

# (risk > 0.5) yields booleans; astype(int) converts them to integers (True -> 1, False -> 0)
actual = (risk > 0.5).astype(int) # 1 = has heart disease, 0 = no heart disease

# Combine age and cholesterol into a 2D feature array
X = np.column_stack((age, cholesterol))

If we have N classes/categories to predict, the confusion matrix will have N rows and N columns, creating an N×N matrix.
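
A small sketch of the N×N shape, assuming three classes encoded as 0, 1 and 2. The label lists are invented for this example, and `labels=[0, 1, 2]` is passed so the row and column order is explicit.

```python
from sklearn.metrics import confusion_matrix

# Hypothetical labels: 0 = Healthy, 1 = Mild, 2 = Severe
y_true_3 = [0, 0, 1, 1, 2, 2, 2]
y_pred_3 = [0, 1, 1, 1, 2, 1, 2]

# Rows are actual classes, columns are predicted classes
cm_3 = confusion_matrix(y_true_3, y_pred_3, labels=[0, 1, 2])
print(cm_3.shape)
print(cm_3)
```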

### Train Models and Get Predictions -

In [3]:
# Use cross validation to get predictions (5-fold CV)
model = RandomForestClassifier(n_estimators=10, random_state=42)  # Use 10 trees in the forest, same trees every run 
predicted = cross_val_predict(model, X, actual, cv=5)  # Get predictions using CV, each part of CV is used once as a test set

# Create confusion matrix
cm = confusion_matrix(actual, predicted)

### Display Results -

In [4]:
# Display confusion matrix
print("Confusion Matrix (with Cross-Validation):")
print("                Predicted")
print("Actual    | No Disease | Disease")
print("----------|------------|--------")
print(f"No Disease|     {cm[0,0]:3d}     |   {cm[0,1]:3d}")
print(f"Disease   |     {cm[1,0]:3d}     |   {cm[1,1]:3d}")

# Extract values from confusion matrix
tn, fp, fn, tp = cm.ravel() # cm.ravel() flattens the matrix into a 1D array

# Show metrics derived from the confusion matrix
print(f"\nMetrics:")
print(f"True Negatives (TN): {tn} - Correctly predicted no disease")
print(f"False Positives (FP): {fp} - Incorrectly predicted disease")
print(f"False Negatives (FN): {fn} - Incorrectly predicted no disease")
print(f"True Positives (TP): {tp} - Correctly predicted disease")


Confusion Matrix (with Cross-Validation):
                Predicted
Actual    | No Disease | Disease
----------|------------|--------
No Disease|      67     |     2
Disease   |       9     |    22

Metrics:
True Negatives (TN): 67 - Correctly predicted no disease
False Positives (FP): 2 - Incorrectly predicted disease
False Negatives (FN): 9 - Incorrectly predicted no disease
True Positives (TP): 22 - Correctly predicted disease


### Comparing Confusion Matrices Across Different Machine Learning Models

Logistic Regression vs Random Forest

In [5]:
import numpy as np
from sklearn.metrics import confusion_matrix
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_predict

# Create data again (same as above)
np.random.seed(42)
age = np.random.normal(55, 15, 100)
cholesterol = np.random.normal(200, 50, 100)
risk = (age - 50) / 20 + (cholesterol - 200) / 100
actual = (risk > 0.5).astype(int)
X = np.column_stack([age, cholesterol])

### Display Function -

In [6]:
# Function to extract and display confusion matrix details for a model
def show_results(cm, model_name):
    tn, fp, fn, tp = cm.ravel()
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    
    print(f"\n{model_name}:")
    print(f"TN: {tn}, FP: {fp}, FN: {fn}, TP: {tp}")
    print(f"Accuracy: {accuracy:.3f} ({accuracy*100:.1f}%)")
    return accuracy

### Test Logistic Regression - 

In [7]:
# Initialize and evaluate Logistic Regression
lr_model = LogisticRegression(random_state=42)
lr_predicted = cross_val_predict(lr_model, X, actual, cv=5)
lr_cm = confusion_matrix(actual, lr_predicted)
lr_accuracy = show_results(lr_cm, "Logistic Regression")


Logistic Regression:
TN: 69, FP: 0, FN: 0, TP: 31
Accuracy: 1.000 (100.0%)


### Test Random Forest -

In [8]:
# Initialize and evaluate Random Forest
rf_model = RandomForestClassifier(n_estimators=10, random_state=42)
rf_predicted = cross_val_predict(rf_model, X, actual, cv=5) # 5-fold cv
rf_cm = confusion_matrix(actual, rf_predicted)
rf_accuracy = show_results(rf_cm, "Random Forest")


Random Forest:
TN: 67, FP: 2, FN: 9, TP: 22
Accuracy: 0.890 (89.0%)


### Compare Both Models - 

In [9]:
# Compare accuracies between the two models
print(f"\nComparison:")
print(f"Logistic Regression: {lr_accuracy:.3f}")
print(f"Random Forest: {rf_accuracy:.3f}")

# Determine which model wins (a tie is reported separately)
if lr_accuracy > rf_accuracy:
    print("Logistic Regression wins!")
elif rf_accuracy > lr_accuracy:
    print("Random Forest wins!")
else:
    print("It's a tie!")


Comparison:
Logistic Regression: 1.000
Random Forest: 0.890
Logistic Regression wins!


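Confusion matrices help compare different ML models on the same dataset. By testing multiple algorithms, we can see which performs best and what types of errors each makes. Here, Logistic Regression scores perfectly because the labels were generated from a linear rule on age and cholesterol, which suits a linear model; real data is rarely this clean. Looking at error types is especially important in medical applications, where false negatives (missing disease) are usually more dangerous than false positives (false alarm). The confusion matrix guides us to choose the right model based on the error patterns that matter most for our specific problem, not on accuracy alone.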
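
To look at error types and not only accuracy, the sketch below takes the two confusion matrices printed above and computes the miss rate (share of sick people predicted healthy) and the false alarm rate (share of healthy people predicted sick). The helper `error_rates` and the dictionary `reported` are hypothetical names introduced for this illustration.

```python
import numpy as np

# Confusion matrices copied from the outputs above (rows = actual, columns = predicted)
reported = {
    "Logistic Regression": np.array([[69, 0], [0, 31]]),
    "Random Forest": np.array([[67, 2], [9, 22]]),
}

def error_rates(cm_model):
    """Return (miss rate, false alarm rate) for a 2x2 confusion matrix."""
    tn, fp, fn, tp = cm_model.ravel()
    miss_rate = fn / (fn + tp) if (fn + tp) > 0 else 0.0
    false_alarm_rate = fp / (fp + tn) if (fp + tn) > 0 else 0.0
    return miss_rate, false_alarm_rate

for name, cm_model in reported.items():
    miss, alarm = error_rates(cm_model)
    print(f"{name}: miss rate = {miss:.3f}, false alarm rate = {alarm:.3f}")
```

With these counts, Random Forest misses about 29% of the disease cases while raising false alarms for about 3% of healthy people, a difference that the single accuracy figure hides.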

## Precision - 

When our model predicts "Yes" (has disease), how often is it correct?

From the confusion matrix:

True Positives (TP): Model correctly predicted disease.

False Positives (FP): Model incorrectly predicted disease (but they were healthy).

Formula: TP / (TP + FP)

In [10]:
# Calculate precision
precision = tp / (tp + fp) if (tp + fp) > 0 else 0

# Display precision
print(f"Precision: {precision:.3f}")


Precision: 0.917


High precision means few false alarms.
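
The `if (tp + fp) > 0 else 0` guard in the cell above matters when a model never predicts "Yes": precision is then 0/0. The sketch below, on made-up labels, shows that edge case and compares the manual guard with scikit-learn's `precision_score` using its `zero_division=0` option. The `*_edge` names are hypothetical.

```python
import numpy as np
from sklearn.metrics import precision_score

# Hypothetical edge case: the model never predicts the positive class
y_true_edge = np.array([0, 0, 1, 1])
y_pred_edge = np.array([0, 0, 0, 0])

tp_edge = int(np.sum((y_true_edge == 1) & (y_pred_edge == 1)))
fp_edge = int(np.sum((y_true_edge == 0) & (y_pred_edge == 1)))

# Manual guard, same pattern as the cell above
manual_precision = tp_edge / (tp_edge + fp_edge) if (tp_edge + fp_edge) > 0 else 0

# zero_division=0 tells scikit-learn to return 0 instead of warning on 0/0
library_precision = precision_score(y_true_edge, y_pred_edge, zero_division=0)

print(f"Manual: {manual_precision}, scikit-learn: {library_precision}")
```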

### Sensitivity (True Positive Rate):

The proportion of actual positive cases that were correctly identified

Formula: TP / (TP + FN)

Measures how well the model identifies people who actually have the condition

### Recall

Recall is another name for sensitivity. High recall means we found most of the real cases.

For example, in medical screening it is usually better to catch every possible case, even if that includes a few false alarms.

Formula: Recall = True Positives / (True Positives + False Negatives)



In [11]:
# Recall is the same as sensitivity
recall = tp / (tp + fn) if (tp + fn) > 0 else 0

# Display recall
print(f"Recall (Sensitivity): {recall:.3f}")


Recall (Sensitivity): 0.710


### The Precision-Recall Trade-Off
Improving one often lowers the other:

High Precision → fewer false positives (but might miss some real cases)

High Recall → catch more true positives (but might get more false alarms)
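
One way to see this trade-off is to move the decision threshold of a model that outputs a probability-like score. The sketch below assumes such scores exist; the `scores` and `labels` arrays are invented for illustration, and `precision_recall_at` is a hypothetical helper.

```python
import numpy as np

# Hypothetical predicted scores and true labels (1 = disease)
scores = np.array([0.95, 0.85, 0.70, 0.60, 0.45, 0.40, 0.30, 0.10])
labels = np.array([1, 1, 0, 1, 1, 0, 0, 0])

def precision_recall_at(threshold):
    """Predict 'disease' when score >= threshold, then compute precision and recall."""
    pred = (scores >= threshold).astype(int)
    tp = np.sum((pred == 1) & (labels == 1))
    fp = np.sum((pred == 1) & (labels == 0))
    fn = np.sum((pred == 0) & (labels == 1))
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    return precision, recall

# A strict threshold favours precision, a lenient one favours recall
for t in (0.8, 0.5, 0.2):
    p, r = precision_recall_at(t)
    print(f"threshold={t:.1f}: precision={p:.3f}, recall={r:.3f}")
```

On this toy data, lowering the threshold from 0.8 to 0.2 raises recall from 0.5 to 1.0 while precision falls from 1.0 to about 0.57.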

### Specificity (True Negative Rate):

The proportion of actual negative cases that were correctly identified

Formula: TN / (TN + FP)

Measures how well the model identifies people who don't have the condition

### Calculating Sensitivity & Specificity - Binary Classification

In [10]:
import numpy as np

# Confusion matrix from the Random Forest run above: [[TN, FP], [FN, TP]]
cm = np.array([[67, 2], [9, 22]])

# Extract values from the confusion matrix
tn, fp, fn, tp = cm.ravel()

# Calculate Sensitivity (True positive rate)
sensitivity = tp / (tp + fn)

# Calculate Specificity (True Negative rate)
specificity = tn / (tn + fp)

print(f"Confusion Matrix:")
print(f"TN: {tn}, FP: {fp}")
print(f"FN: {fn}, TP: {tp}")
print(f"\nSensitivity = {tp} / ({tp} + {fn}) = {sensitivity:.3f} ({sensitivity*100:.1f}%)")
print(f"Specificity = {tn} / ({tn} + {fp}) = {specificity:.3f} ({specificity*100:.1f}%)")
print(f"\nInterpretation:")
print(f"- {sensitivity*100:.1f}% of actual positive cases were correctly identified")
print(f"- {specificity*100:.1f}% of actual negative cases were correctly identified")

Confusion Matrix:
TN: 67, FP: 2
FN: 9, TP: 22

Sensitivity = 22 / (22 + 9) = 0.710 (71.0%)
Specificity = 67 / (67 + 2) = 0.971 (97.1%)

Interpretation:
- 71.0% of actual positive cases were correctly identified
- 97.1% of actual negative cases were correctly identified
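
As a cross-check, the same two numbers can be obtained from scikit-learn's `recall_score`: sensitivity is recall for class 1, and specificity is recall with class 0 treated as the "positive" class via `pos_label=0`. The sketch rebuilds label arrays that reproduce the counts above (TN=67, FP=2, FN=9, TP=22); the sample order is arbitrary and the names `y_true_rf`, `y_pred_rf`, `sens` and `spec` are hypothetical.

```python
import numpy as np
from sklearn.metrics import recall_score

# 69 actual negatives followed by 31 actual positives
y_true_rf = np.array([0] * 69 + [1] * 31)
# Predictions arranged to give TN=67, FP=2, FN=9, TP=22
y_pred_rf = np.array([0] * 67 + [1] * 2 + [0] * 9 + [1] * 22)

sens = recall_score(y_true_rf, y_pred_rf)               # recall of class 1
spec = recall_score(y_true_rf, y_pred_rf, pos_label=0)  # recall of class 0

print(f"Sensitivity: {sens:.3f}")
print(f"Specificity: {spec:.3f}")
```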


## F1 Score – The Balance Between Precision and Recall

The F1 score measures a classification model's performance with a single number: the harmonic mean of precision and recall, which balances the trade-off between them.

The F1 score ranges from 0 (worst performance) to 1 (best performance).

Formula: F1 Score = 2 * (Precision * Recall) / (Precision + Recall)

In [12]:
# Precision and Recall
precision = tp / (tp + fp) if (tp + fp) > 0 else 0
recall = tp / (tp + fn) if (tp + fn) > 0 else 0

# F1 Score
f1 = 2 * (precision * recall) / (precision + recall) if (precision + recall) > 0 else 0

# Print Results
print(f"Precision: {precision:.3f}")
print(f"Recall: {recall:.3f}")
print(f"F1 Score: {f1:.3f}")


Precision: 0.917
Recall: 0.710
F1 Score: 0.800
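
The F1 formula can also be written directly in terms of counts as 2·TP / (2·TP + FP + FN). The sketch below checks that both forms agree on the counts used above, and shows how F1 stays low when precision and recall are far apart. The helper names `f1_from_counts` and `f1_from_rates` are hypothetical.

```python
def f1_from_counts(tp, fp, fn):
    """F1 from raw counts: 2*TP / (2*TP + FP + FN)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom > 0 else 0.0

def f1_from_rates(precision, recall):
    """F1 as the harmonic mean of precision and recall."""
    total = precision + recall
    return 2 * precision * recall / total if total > 0 else 0.0

f1_counts = f1_from_counts(tp=22, fp=2, fn=9)
f1_rates = f1_from_rates(22 / 24, 22 / 31)
print(f"F1 from counts: {f1_counts:.3f}, F1 from rates: {f1_rates:.3f}")

# A lopsided model: perfect precision but very low recall
lopsided = f1_from_rates(1.0, 0.1)
print(f"Arithmetic mean: {(1.0 + 0.1) / 2:.3f}, F1: {lopsided:.3f}")
```

Because the harmonic mean is pulled toward the smaller value, a model cannot reach a high F1 score by excelling at only one of the two metrics.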


### Multi-class Confusion Matrix

What if there are more than two categories? 

A confusion matrix isn't limited to just two categories! If we were trying to predict whether a patient is healthy or has mild or severe disease, our confusion matrix would be 3x3.

In [None]:
import numpy as np

# Multi-class example: disease classification (Healthy=0, Mild=1, Severe=2)
# Rows are actual classes, columns are predicted classes
cm_multi = np.array([[45, 5, 0],    # Healthy patients
                     [3, 32, 5],     # Mild patients  
                     [0, 2, 8]])     # Severe patients

print("Multi-class Confusion Matrix:")
print("Predicted:  Healthy  Mild  Severe")
print("Actual:")
print(f"Healthy    {cm_multi[0,0]:3d}    {cm_multi[0,1]:3d}    {cm_multi[0,2]:3d}")
print(f"Mild       {cm_multi[1,0]:3d}    {cm_multi[1,1]:3d}    {cm_multi[1,2]:3d}")
print(f"Severe     {cm_multi[2,0]:3d}    {cm_multi[2,1]:3d}    {cm_multi[2,2]:3d}")

# Calculate sensitivity for each class (one-vs-rest approach)
for i, class_name in enumerate(['Healthy', 'Mild', 'Severe']): # enumerate is used to get the index and the value of the class_name
    # True positives for this class
    tp = cm_multi[i, i]
    # False negatives (sum of row minus true positives)
    fn = np.sum(cm_multi[i, :]) - tp
    sensitivity = tp / (tp + fn) if (tp + fn) > 0 else 0
    
    print(f"\n{class_name} Sensitivity = {tp} / ({tp} + {fn}) = {sensitivity:.3f} ({sensitivity*100:.1f}%)")

Multi-class Confusion Matrix:
Predicted:  Healthy  Mild  Severe
Actual:
Healthy     45      5      0
Mild         3     32      5
Severe       0      2      8

Healthy Sensitivity = 45 / (45 + 5) = 0.900 (90.0%)

Mild Sensitivity = 32 / (32 + 8) = 0.800 (80.0%)

Severe Sensitivity = 8 / (8 + 2) = 0.800 (80.0%)
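
The same one-vs-rest idea extends to the other metrics. In the sketch below, false positives for a class are the rest of its column, and true negatives are everything outside its row and column. It reuses the 3x3 matrix above; the `_i` variable names and the `per_class` dictionary are hypothetical.

```python
import numpy as np

# Same matrix as above: rows = actual, columns = predicted
cm_multi = np.array([[45, 5, 0],
                     [3, 32, 5],
                     [0, 2, 8]])
class_names = ["Healthy", "Mild", "Severe"]
total = cm_multi.sum()

per_class = {}
for i, name in enumerate(class_names):
    tp_i = cm_multi[i, i]
    fn_i = cm_multi[i, :].sum() - tp_i   # rest of the row
    fp_i = cm_multi[:, i].sum() - tp_i   # rest of the column
    tn_i = total - tp_i - fn_i - fp_i    # everything else
    precision_i = tp_i / (tp_i + fp_i) if (tp_i + fp_i) > 0 else 0.0
    specificity_i = tn_i / (tn_i + fp_i) if (tn_i + fp_i) > 0 else 0.0
    per_class[name] = (precision_i, specificity_i)
    print(f"{name}: precision = {precision_i:.3f}, specificity = {specificity_i:.3f}")
```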


Trade-off Relationship: Sensitivity and specificity often have an inverse relationship. For a given model, shifting the decision threshold to improve one typically decreases the other.


A Confusion Matrix is a fundamental tool for understanding the performance of a classification model.

--> It tells you what your algorithm did right.

--> It tells you what your algorithm did wrong.

--> It works for both binary (2x2) and multi-class (NxN) problems.

--> It is the foundation for other important metrics like Sensitivity and Specificity.