## 🧠 Classification Metrics Overview

Classification metrics help evaluate how well a model performs when predicting categorical (discrete) outcomes, such as spam vs. not spam or disease vs. no disease.

---

### 🔹 1. Confusion Matrix

A 2x2 matrix for binary classification:

|                | Predicted Positive | Predicted Negative |
|----------------|--------------------|--------------------|
| **Actual Positive** | True Positive (TP)   | False Negative (FN)  |
| **Actual Negative** | False Positive (FP)  | True Negative (TN)   |
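
As a minimal sketch, the four cells can be counted with scikit-learn's `confusion_matrix`. The labels below are made up for illustration. One thing to watch: scikit-learn orders rows and columns by sorted label, so for labels `0` and `1` its output is `[[TN, FP], [FN, TP]]`, which is the table above with both rows and columns reversed.

```python
from sklearn.metrics import confusion_matrix

# Illustrative labels (not from a real dataset): 1 = positive, 0 = negative
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_hat = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

# Rows are actual classes, columns are predicted classes, both in sorted label order
cm = confusion_matrix(y_true, y_hat)

# For binary labels {0, 1}, ravel() yields the cells in the order TN, FP, FN, TP
tn, fp, fn, tp = cm.ravel()
print(f"TP={tp} FN={fn} FP={fp} TN={tn}")  # TP=3 FN=1 FP=1 TN=5
```

Passing `labels=[1, 0]` to `confusion_matrix` should give the same layout as the table, with the positive class first.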

---

### 🔸 2. Accuracy

$$
\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}
$$

- Measures overall correctness
- Can be **misleading** if classes are imbalanced
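
To see why accuracy can mislead, consider a rough sketch with made-up data: 95 negatives, 5 positives, and a "model" that always predicts the majority class.

```python
# Illustrative imbalanced data: 95 negatives (0) and 5 positives (1)
y_true = [0] * 95 + [1] * 5

# A trivial "model" that always predicts the majority class
y_hat = [0] * 100

# Accuracy: fraction of predictions that match the true label
accuracy = sum(t == p for t, p in zip(y_true, y_hat)) / len(y_true)

# Count how many of the actual positives were found
true_positives = sum(t == 1 and p == 1 for t, p in zip(y_true, y_hat))

print(accuracy)        # 0.95
print(true_positives)  # 0
```

The accuracy is 95% even though not a single positive case is detected.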

---

### 🔸 3. Precision

$$
\text{Precision} = \frac{TP}{TP + FP}
$$

- Of all predicted positives, how many were actually correct?
- High precision = **few false positives** among the predicted positives (not the same as a low false positive rate, which is FP / (FP + TN))
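
A small worked example of the formula, written as a plain helper. The function name and the spam-filter counts are hypothetical, and returning `0.0` when nothing is predicted positive is a convention chosen for this sketch (the ratio is undefined in that case).

```python
def precision(tp: int, fp: int) -> float:
    """Precision = TP / (TP + FP)."""
    predicted_positives = tp + fp
    if predicted_positives == 0:
        # Undefined when no positives are predicted; 0.0 is this sketch's convention
        return 0.0
    return tp / predicted_positives


# Hypothetical spam filter: 40 messages flagged, 30 of them really spam
print(precision(tp=30, fp=10))  # 0.75
```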

---

### 🔸 4. Recall (Sensitivity or True Positive Rate)

$$
\text{Recall} = \frac{TP}{TP + FN}
$$

- Of all actual positives, how many were correctly predicted?
- High recall = **low false negative rate**
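
Recall depends on the decision threshold applied to the model's scores. The sketch below uses made-up labels and scores and a hypothetical helper `recall_at`; it assumes `y_true` contains at least one positive.

```python
def recall_at(threshold, y_true, scores):
    """Recall = TP / (TP + FN), predicting positive when score >= threshold."""
    tp = sum(t == 1 and s >= threshold for t, s in zip(y_true, scores))
    fn = sum(t == 1 and s < threshold for t, s in zip(y_true, scores))
    return tp / (tp + fn)


# Made-up labels and predicted scores
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.2, 0.6, 0.3, 0.2, 0.1]

print(recall_at(0.5, y_true, scores))  # 0.5
print(recall_at(0.3, y_true, scores))  # 0.75
```

Lowering the threshold from 0.5 to 0.3 finds one more positive, but in this data it also flags one more negative, so recall usually has to be read together with precision.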

---

### 🔸 5. F1 Score

$$
\text{F1 Score} = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}
$$

- Harmonic mean of precision and recall
- More informative than accuracy when classes are **imbalanced**, because it ignores true negatives
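
The harmonic mean is pulled toward the smaller of the two values, so a model cannot get a high F1 by excelling at only one of precision or recall. A minimal sketch with illustrative numbers (the helper name `f1_from` is hypothetical):

```python
def f1_from(precision: float, recall: float) -> float:
    """F1 = 2 * P * R / (P + R); returns 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Illustrative case: perfect precision, very poor recall
p, r = 1.0, 0.1

arithmetic_mean = (p + r) / 2
f1 = f1_from(p, r)

print(round(arithmetic_mean, 3))  # 0.55
print(round(f1, 3))               # 0.182
```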

---

### 🔸 6. ROC Curve and AUC

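- **ROC Curve**: Plots True Positive Rate (Recall) vs. False Positive Rate, FP / (FP + TN), as the decision threshold varies
- **AUC (Area Under the Curve)**: Summarizes the curve in one number, interpretable as the probability that a randomly chosen positive is scored higher than a randomly chosen negative (0.5 = random guessing, 1.0 = perfect separation)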
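
The following sketch computes the ROC points and the AUC with scikit-learn on four made-up samples, then checks the result against a direct pairwise ranking count. The pairwise count here ignores ties, which is fine for this data because no positive and negative share a score.

```python
from itertools import product

from sklearn.metrics import roc_auc_score, roc_curve

# Made-up labels and predicted scores for the positive class
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

# Each point on the ROC curve is the (FPR, TPR) pair at one threshold
fpr, tpr, thresholds = roc_curve(y_true, scores)

# Area under that curve
auc = roc_auc_score(y_true, scores)

# Ranking view: fraction of (positive, negative) pairs where the positive scores higher
pos = [s for t, s in zip(y_true, scores) if t == 1]
neg = [s for t, s in zip(y_true, scores) if t == 0]
pairwise = sum(p > n for p, n in product(pos, neg)) / (len(pos) * len(neg))

print(auc, pairwise)  # both are 0.75 for this data
```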

---

### 🔸 7. Additional Metrics

- **Specificity** = \(\frac{TN}{TN + FP}\): True negative rate
- **Log Loss**: Penalizes confident but wrong predicted probabilities most heavily
- **Balanced Accuracy**: Average of recall across all classes (good for imbalanced data)
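
A rough sketch of these three metrics on made-up labels. Specificity is computed here as the recall of the negative class via `recall_score(..., pos_label=0)`; the log loss is written out by hand for a single prediction (the helper `log_loss_single` is hypothetical) to show how the penalty grows with confidence.

```python
import math

from sklearn.metrics import balanced_accuracy_score, recall_score

# Illustrative labels: 1 = positive, 0 = negative
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_hat = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

# Specificity is the recall of the negative class: TN / (TN + FP)
specificity = recall_score(y_true, y_hat, pos_label=0)
sensitivity = recall_score(y_true, y_hat, pos_label=1)

# Balanced accuracy is the mean of the per-class recalls
balanced = balanced_accuracy_score(y_true, y_hat)


def log_loss_single(y: int, p: float) -> float:
    """Log loss of one prediction, where p is the predicted P(y = 1)."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


# Both predictions are wrong for a true positive, but one is far more confident
hesitant_wrong = log_loss_single(1, 0.4)
confident_wrong = log_loss_single(1, 0.01)

print(round(specificity, 3), round(balanced, 3))               # 0.833 0.792
print(round(hesitant_wrong, 3), round(confident_wrong, 3))     # 0.916 4.605
```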



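---

### 🔸 8. Example: KNN with scikit-learn

The code below assumes a feature matrix `X` and a label vector `y` have already been loaded.
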
```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, confusion_matrix

knn = KNeighborsClassifier(n_neighbors=6)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)

# Fit the model to the training data
knn.fit(X_train, y_train)

# Predict the labels of the test data: y_pred
y_pred = knn.predict(X_test)

# Generate the confusion matrix and classification report
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))
```

```python
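# Import roc_auc_score
from sklearn.metrics import roc_auc_score

# Predicted probability of the positive class (binary case), using the fitted knn from above
y_pred_probs = knn.predict_proba(X_test)[:, 1]

# Calculate roc_auc_score from the probabilities, not the hard labels
print(roc_auc_score(y_test, y_pred_probs))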

# Calculate the confusion matrix
print(confusion_matrix(y_test, y_pred))

# Calculate the classification report
print(classification_report(y_test, y_pred))

```