
# Performance Metrics

## 1. **Accuracy**

$$
\text{Accuracy} = \frac{\text{Number of correct predictions}}{\text{Total predictions}}
$$

* Measures overall correctness.
* Works well when classes are **balanced**.
* Misleading for **imbalanced datasets**.

  * Example: If 95% of emails are "ham", predicting "ham" always gives 95% accuracy but is useless.
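
The email example can be checked with a few lines of plain Python. This is a minimal sketch on made-up data (95 "ham" and 5 "spam" labels, chosen only to mirror the 95% figure); the variable names are illustrative and not part of any library.

```python
# Hypothetical toy data: 95 "ham" and 5 "spam" emails, mirroring the 95% example.
y_true = ["ham"] * 95 + ["spam"] * 5

# A degenerate "classifier" that ignores its input and always answers "ham".
y_pred = ["ham"] * len(y_true)

# Accuracy = correct predictions / total predictions.
correct = sum(t == p for t, p in zip(y_true, y_pred))
accuracy = correct / len(y_true)

# Spam recall = spam emails caught / spam emails present.
spam_total = sum(t == "spam" for t in y_true)
spam_caught = sum(t == "spam" and p == "spam" for t, p in zip(y_true, y_pred))
spam_recall = spam_caught / spam_total

print(f"accuracy    = {accuracy:.2f}")     # accuracy    = 0.95
print(f"spam recall = {spam_recall:.2f}")  # spam recall = 0.00
```

The accuracy looks excellent, yet not a single spam email is caught, which is why accuracy alone is not enough on imbalanced data.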

---

## 2. **Confusion Matrix**

A **table** comparing predicted vs actual classes.

For binary classification:

|                     | Predicted Positive  | Predicted Negative  |
| ------------------- | ------------------- | ------------------- |
| **Actual Positive** | True Positive (TP)  | False Negative (FN) |
| **Actual Negative** | False Positive (FP) | True Negative (TN)  |

From this, we compute other metrics.
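
To make the four cells concrete, the sketch below counts them by hand for a small set of made-up labels. The helper `confusion_counts` is a hypothetical name introduced here for illustration, and it assumes binary labels with `1` as the positive class.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FN, FP, TN), treating `positive` as the positive class."""
    tp = fn = fp = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual == positive:
            if predicted == positive:
                tp += 1   # actual positive, predicted positive
            else:
                fn += 1   # actual positive, predicted negative
        else:
            if predicted == positive:
                fp += 1   # actual negative, predicted positive
            else:
                tn += 1   # actual negative, predicted negative
    return tp, fn, fp, tn


# Hypothetical labels, made up for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]

tp, fn, fp, tn = confusion_counts(y_true, y_pred)
print(f"TP={tp} FN={fn} FP={fp} TN={tn}")  # TP=3 FN=1 FP=1 TN=3
```

Note that scikit-learn's `confusion_matrix` sorts labels in ascending order, so for labels 0/1 it returns `[[TN, FP], [FN, TP]]`, with the negative class in the first row rather than the positive class as in the table above.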

---

## 3. **Precision**

$$
\text{Precision} = \frac{TP}{TP + FP}
$$

* Of all items predicted positive, how many are truly positive?
* Prioritize it when **false positives** are costly (e.g., a spam filter, with spam as the positive class, that sends legitimate "ham" mail to the spam folder).

---

## 4. **Recall (Sensitivity, True Positive Rate)**

$$
\text{Recall} = \frac{TP}{TP + FN}
$$

* Of all actual positives, how many did we correctly find?
* Good when **false negatives** are costly (e.g., missing a cancer diagnosis).

---

## 5. **F1 Score**

$$
F1 = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}
$$

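* Harmonic mean of precision and recall, so it is high only when both are high.
* Useful for **imbalanced data**: it ignores true negatives, so a large majority class cannot inflate it.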
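
The precision, recall, and F1 formulas from the last three sections can be sketched as one small function. The counts below are hypothetical and chosen to be extreme; returning `0.0` when a denominator is zero is a convention assumed here, not something the formulas dictate.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall and F1 from confusion-matrix counts."""
    # Assumption: an undefined ratio (zero denominator) is reported as 0.0.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical counts: every positive prediction is right (no FP),
# but 90 of the 100 actual positives are missed.
precision, recall, f1 = precision_recall_f1(tp=10, fp=0, fn=90)
arithmetic_mean = (precision + recall) / 2

print(f"precision       = {precision:.2f}")
print(f"recall          = {recall:.2f}")
print(f"arithmetic mean = {arithmetic_mean:.2f}")
print(f"F1 (harmonic)   = {f1:.2f}")
```

With precision 1.0 and recall 0.1, the arithmetic mean is 0.55 while F1 is about 0.18: the harmonic mean is pulled toward the weaker of the two, so a model cannot score well on F1 by excelling at only one of them.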

---

## 6. **ROC Curve & AUC**

* **ROC Curve** → plots True Positive Rate (Recall) vs False Positive Rate (FP / (FP+TN)) for different probability thresholds.
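* **AUC (Area Under Curve)** → measures how well the model separates classes; it equals the probability that a randomly chosen positive receives a higher score than a randomly chosen negative.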

  * AUC = 1 → perfect.
  * AUC = 0.5 → random guessing.

Naïve Bayes outputs probabilities ($P(y|x)$), so you can directly use ROC-AUC.
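
The threshold sweep behind the ROC curve can be sketched in plain Python. This is an illustrative implementation, not the one scikit-learn uses: `roc_points` and `trapezoid_auc` are hypothetical helper names, the scores are made up, and the code assumes 0/1 labels with at least one sample of each class.

```python
def roc_points(y_true, scores):
    """Return (FPR, TPR) points, one per distinct threshold, starting at (0, 0)."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    points = [(0.0, 0.0)]
    # Lower the threshold step by step; a sample is predicted positive
    # when its score is at or above the current threshold.
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def trapezoid_auc(points):
    """Area under a piecewise-linear curve given as (x, y) points sorted by x."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical labels and predicted probabilities of the positive class.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

points = roc_points(y_true, scores)
auc_value = trapezoid_auc(points)
print(points)     # [(0.0, 0.0), (0.0, 0.5), (0.5, 0.5), (0.5, 1.0), (1.0, 1.0)]
print(auc_value)  # 0.75
```

One negative (score 0.4) outranks one positive (score 0.35), so three of the four positive/negative pairs are ordered correctly, which matches the AUC of 0.75.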

---

## 7. **Log Loss (Cross-Entropy Loss)**

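$$
\text{LogLoss} = -\frac{1}{m} \sum_{j=1}^m \log P(y^{(j)} | x^{(j)})
$$

where $m$ is the number of samples and $P(y^{(j)} | x^{(j)})$ is the probability the model assigns to the true label of sample $j$.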

* Evaluates the **probabilistic predictions**, not just labels.
* Penalizes confident but wrong predictions.
* Useful when probability calibration matters (e.g., medical risk prediction).
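
A small sketch shows how log loss separates two models that make the same label predictions. The function name `binary_log_loss`, the clipping constant `eps`, and both probability lists are assumptions made for illustration.

```python
import math


def binary_log_loss(y_true, p_positive, eps=1e-15):
    """Mean negative log-probability assigned to the true label (binary case)."""
    total = 0.0
    for y, p in zip(y_true, p_positive):
        p = min(max(p, eps), 1 - eps)    # clip so that log(0) never occurs
        p_true = p if y == 1 else 1 - p  # probability given to the true label
        total += -math.log(p_true)
    return total / len(y_true)


# Hypothetical predictions: both models get the third sample wrong at a
# 0.5 threshold, but one is cautious and the other is confident.
y_true = [1, 0, 1]
cautious = [0.6, 0.4, 0.4]
confident = [0.99, 0.01, 0.01]

loss_cautious = binary_log_loss(y_true, cautious)
loss_confident = binary_log_loss(y_true, confident)
print(f"cautious model : {loss_cautious:.3f}")
print(f"confident model: {loss_confident:.3f}")
```

Both models have the same accuracy (2 of 3 correct), but the confident model's single wrong prediction at probability 0.01 contributes $-\log 0.01 \approx 4.6$ on its own, so its log loss is more than twice that of the cautious model.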

---

## 8. **Calibration Metrics**

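Naïve Bayes often produces **poorly calibrated probabilities** (too extreme, close to 0 or 1), because correlated features are treated as independent evidence and counted more than once.

* Tools like **calibration curves** or the **Brier score** check whether predicted probabilities match observed outcome frequencies.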
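
Both tools can be sketched in a few lines. The Brier score is the mean squared difference between the predicted probability and the 0/1 outcome; a calibration curve compares, per probability bin, the mean predicted probability with the observed fraction of positives. The helper names, the equal-width binning, and the overconfident toy predictions below are all assumptions for illustration.

```python
def brier_score(y_true, p_positive):
    """Mean squared error between predicted probability and the 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, p_positive)) / len(y_true)


def calibration_bins(y_true, p_positive, n_bins=5):
    """Return (mean predicted, observed fraction positive, count) per non-empty bin."""
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, p_positive):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((p, y))
    rows = []
    for members in bins:
        if members:
            mean_p = sum(p for p, _ in members) / len(members)
            frac_pos = sum(y for _, y in members) / len(members)
            rows.append((mean_p, frac_pos, len(members)))
    return rows


# Hypothetical, deliberately overconfident predictions (all near 0 or 1).
y_true = [1, 1, 0, 1, 0, 0, 0, 1]
p_pos = [0.99, 0.98, 0.97, 0.95, 0.02, 0.03, 0.05, 0.04]

brier = brier_score(y_true, p_pos)
rows = calibration_bins(y_true, p_pos)
print(f"Brier score = {brier:.4f}")
for mean_p, frac_pos, count in rows:
    print(f"predicted {mean_p:.3f}  observed {frac_pos:.2f}  (n={count})")
```

In this toy data the top bin predicts about 0.97 on average while only 0.75 of its samples are positive, and the bottom bin predicts about 0.04 while 0.25 are positive. A well-calibrated model would have the two columns roughly equal in every bin.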

---

**Summary**

For Naïve Bayes classification, use:

* **Accuracy** → if classes balanced.
* **Precision, Recall, F1** → if data imbalanced.
* **ROC-AUC** → for probability-based evaluation.
* **Log Loss** → if probability quality matters.
* **Calibration** → if decision thresholds rely on well-calibrated probabilities.



```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import (
    accuracy_score, confusion_matrix, classification_report,
    roc_curve, auc, log_loss
)
from sklearn.datasets import make_classification

# Generate synthetic binary classification dataset
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=5, n_redundant=2,
    n_classes=2, weights=[0.7, 0.3], random_state=42
)

# Split dataset
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Train Naive Bayes
nb = GaussianNB()
nb.fit(X_train, y_train)

# Predictions
y_pred = nb.predict(X_test)
y_proba = nb.predict_proba(X_test)[:, 1]

# Metrics
acc = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)
report = classification_report(y_test, y_pred, target_names=["Class 0", "Class 1"])
logloss_val = log_loss(y_test, y_proba)

# ROC-AUC
fpr, tpr, thresholds = roc_curve(y_test, y_proba)
roc_auc = auc(fpr, tpr)

(acc, cm, report, logloss_val, roc_auc)
```

Output:

```
(0.8866666666666667,
 array([[100,   4],
        [ 13,  33]]),
 '              precision    recall  f1-score   support\n\n     Class 0       0.88      0.96      0.92       104\n     Class 1       0.89      0.72      0.80        46\n\n    accuracy                           0.89       150\n   macro avg       0.89      0.84      0.86       150\nweighted avg       0.89      0.89      0.88       150\n',
 0.4033059439714829,
 0.8760451505016723)
```

### Results

* **Accuracy**: `0.887` (\~89%)

* **Confusion Matrix**:

  ```
  [[100   4]
   [ 13  33]]
  ```

  * True Negatives = 100
  * False Positives = 4
  * False Negatives = 13
  * True Positives = 33

* **Classification Report**:

  ```
                precision    recall  f1-score   support

       Class 0       0.88      0.96      0.92       104
       Class 1       0.89      0.72      0.80        46

      accuracy                           0.89       150
     macro avg       0.89      0.84      0.86       150
  weighted avg       0.89      0.89      0.88       150
  ```

* **Log Loss**: `0.403` (lower is better; penalizes wrong confident predictions)
* **ROC-AUC**: `0.876` (good separation; 1.0 = perfect, 0.5 = random)

---

These metrics show:

* The model is strong overall: ~89% accuracy, well above the ~69% obtained by always predicting Class 0 (104 of 150 test samples).
* Recall is uneven across classes → Class 1 (minority) has lower recall (0.72 vs 0.96), meaning 13 of the 46 actual positives are missed.
* A ROC-AUC of 0.876 indicates that the predicted probabilities rank positives above negatives well.


### **Macro Average (`macro avg`)**

* **Definition**: Takes the **arithmetic mean** of the metric across all classes **without considering class imbalance**.

* Formula for precision (example):

  $$
  \text{Precision}_{macro} = \frac{1}{C} \sum_{i=1}^{C} \text{Precision}_i
  $$

  where $C$ = number of classes.

* **Effect**:

  * Treats **all classes equally**.
  * Useful when you want to evaluate **performance per class fairly**, even if one class has fewer samples.

👉 In your Naïve Bayes example:

* `macro avg precision = 0.89`
* `macro avg recall = 0.84`
* Shows average performance across **Class 0 and Class 1**, equally weighted.

---

### **Weighted Average (`weighted avg`)**

* **Definition**: Takes the **support (number of true samples per class)** into account while averaging.

* Formula for precision (example):

  $$
  \text{Precision}_{weighted} = \frac{\sum_{i=1}^{C} ( \text{Support}_i \times \text{Precision}_i )}{\sum_{i=1}^{C} \text{Support}_i}
  $$

* **Effect**:

  * Gives **more importance to larger classes**.
  * If the dataset is imbalanced, the metric is skewed toward the majority class.

👉 In your Naïve Bayes example:

* `weighted avg precision = 0.89`
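* `weighted avg recall = 0.89` (identical to accuracy, since support-weighted recall is total correct predictions divided by total samples)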
* Since **Class 0 has 104 samples vs Class 1 has 46**, Class 0 has more influence on the weighted averages.

---

**Summary**:

* **Macro Avg** → Equal weight to each class (good for imbalanced dataset evaluation).
* **Weighted Avg** → Weighted by class size (good for overall performance reflection).
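
Both averages can be reproduced by hand from the confusion matrix reported in the results (`[[100, 4], [13, 33]]`, rows = actual class, columns = predicted class). This is a minimal sketch in plain Python; `macro` and `weighted` are hypothetical helper names, not scikit-learn functions.

```python
# Confusion matrix from the results: rows = actual class, columns = predicted class.
cm = [[100, 4],
      [13, 33]]

n_classes = len(cm)
support = [sum(row) for row in cm]  # true samples per class: [104, 46]
predicted = [sum(cm[r][c] for r in range(n_classes)) for c in range(n_classes)]

# Per-class precision and recall: the diagonal holds the correct predictions.
precision = [cm[i][i] / predicted[i] for i in range(n_classes)]
recall = [cm[i][i] / support[i] for i in range(n_classes)]


def macro(values):
    """Unweighted mean: every class counts equally."""
    return sum(values) / len(values)


def weighted(values, weights):
    """Mean weighted by support: larger classes count more."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


print(f"macro precision    = {macro(precision):.2f}")              # 0.89
print(f"macro recall       = {macro(recall):.2f}")                 # 0.84
print(f"weighted precision = {weighted(precision, support):.2f}")  # 0.89
print(f"weighted recall    = {weighted(recall, support):.2f}")     # 0.89
```

The macro recall (0.84) sits midway between the two class recalls, while the weighted recall (0.89) is pulled toward Class 0 because it contributes 104 of the 150 samples.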

