

# Classification Metrics: Definitions and Use Cases

In supervised learning, and binary classification in particular, a single metric such as accuracy rarely describes model performance adequately. Which metrics to report depends on the **class imbalance**, the **misclassification cost**, and the **application domain**.

Assume the binary classification confusion matrix is as follows:

|                        | Predicted Positive | Predicted Negative |
|------------------------|--------------------|--------------------|
| **Actual Positive**    | True Positive (TP) | False Negative (FN)|
| **Actual Negative**    | False Positive (FP)| True Negative (TN) |
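
As a minimal sketch of how the four cells are counted, the snippet below tallies them from paired labels. It assumes labels are encoded as 1 (positive) and 0 (negative); the function name `confusion_counts` and the toy data are hypothetical, chosen only for illustration.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 = positive, 0 = negative."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual == 1 and predicted == 1:
            tp += 1  # positive correctly flagged
        elif actual == 0 and predicted == 1:
            fp += 1  # negative wrongly flagged
        elif actual == 1 and predicted == 0:
            fn += 1  # positive missed
        else:
            tn += 1  # negative correctly passed over
    return tp, fp, fn, tn


# Hypothetical toy labels, invented for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
print(tp, fp, fn, tn)  # 3 1 1 3
```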

We now define each metric rigorously.

---

## 1. Accuracy

**Definition**: The proportion of all instances that are classified correctly.

\[
\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}
\]

**When to use**: When classes are balanced and misclassification costs are symmetric.

**Limitations**: Misleading on imbalanced datasets. A model that always predicts the majority class can reach high accuracy while never detecting a single minority-class instance.
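
To make the limitation concrete, here is a small sketch with made-up counts: a hypothetical dataset of 990 negatives and 10 positives, scored by a classifier that always predicts "negative". The helper name `accuracy` is an assumption for illustration.

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of all instances classified correctly."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


# Hypothetical counts: every instance is predicted negative.
tp, fn = 0, 10    # all 10 positives are missed
tn, fp = 990, 0   # all 990 negatives are (trivially) correct

acc = accuracy(tp, tn, fp, fn)
minority_recall = tp / (tp + fn)
print(acc, minority_recall)  # 0.99 0.0
```

The classifier is 99% accurate and still useless for finding positives.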

---

## 2. Recall (Sensitivity, True Positive Rate)

**Definition**: The proportion of actual positive cases that are correctly identified.

\[
\text{Recall} = \frac{TP}{TP + FN}
\]

**When to use**: In scenarios where failing to detect positive cases is costly (e.g., medical diagnosis, fraud detection).
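
When missed positives are the main concern, recall is typically raised by lowering the decision threshold. The sketch below illustrates this, assuming the model emits a score per instance and predicts positive when `score >= threshold`; the scores, labels, and function name are hypothetical.

```python
def recall_at_threshold(y_true, scores, threshold):
    """Recall when an instance is predicted positive if score >= threshold."""
    tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(y_true, scores) if y == 1 and s < threshold)
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical labels and model scores, invented for illustration.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.2, 0.6, 0.3, 0.2, 0.1]

print(recall_at_threshold(y_true, scores, 0.5))  # 0.5
print(recall_at_threshold(y_true, scores, 0.3))  # 0.75
```

Lowering the threshold recovers one more positive here, but it also admits more negatives, which is why recall is usually read alongside precision.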

---

## 3. Precision

**Definition**: The proportion of predicted positive cases that are truly positive.

\[
\text{Precision} = \frac{TP}{TP + FP}
\]

**When to use**: In scenarios where false positives are costly (e.g., spam filtering, legal applications).
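
One practical detail: precision is undefined when the model makes no positive predictions, since \( TP + FP = 0 \). The sketch below returns `None` in that case. This is only one possible convention (returning 0 is also common), and the function name is hypothetical.

```python
def precision(tp, fp):
    """Precision from counts; None when there are no positive predictions."""
    predicted_positive = tp + fp
    if predicted_positive == 0:
        return None  # undefined: nothing was predicted positive
    return tp / predicted_positive


print(precision(8, 2))  # 0.8
print(precision(0, 0))  # None
```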

---

## 4. F1 Score

**Definition**: The harmonic mean of Precision and Recall, balancing the two.

\[
\text{F1} = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}
\]

**When to use**: When a balance between false positives and false negatives is important, especially under class imbalance.
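
The harmonic mean matters because it is pulled toward the smaller of the two values. A short sketch with made-up inputs (Precision = 1.0, Recall = 0.1) compares it with the arithmetic mean; treating F1 as 0 when both inputs are 0 is an assumption of this example.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


p, r = 1.0, 0.1  # hypothetical values: very precise, but misses most positives
arithmetic_mean = (p + r) / 2
f1 = f1_score(p, r)
print(round(arithmetic_mean, 3), round(f1, 3))  # 0.55 0.182
```

The arithmetic mean suggests a mediocre model, while F1 stays close to the weak recall.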

---

## 5. Specificity (True Negative Rate)

**Definition**: The proportion of actual negatives that are correctly classified.

\[
\text{Specificity} = \frac{TN}{TN + FP}
\]

**When to use**: In domains where false positives must be minimized (e.g., judicial systems, quality control).

---

## 6. False Positive Rate (FPR)

**Definition**: The proportion of actual negatives that are incorrectly classified as positives.

\[
\text{FPR} = \frac{FP}{FP + TN}
\]

**Note**: \( \text{FPR} = 1 - \text{Specificity} \)

**When to use**: Often used in ROC analysis, where the trade-off between TPR and FPR is examined.
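
The complement relationship between FPR and Specificity can be checked numerically. The counts below are hypothetical, and the function names are chosen for illustration.

```python
def specificity(tn, fp):
    """True negative rate: share of actual negatives predicted negative."""
    return tn / (tn + fp) if (tn + fp) else 0.0


def false_positive_rate(fp, tn):
    """Share of actual negatives predicted positive."""
    return fp / (fp + tn) if (fp + tn) else 0.0


tn, fp = 90, 10  # hypothetical counts over 100 actual negatives
spec = specificity(tn, fp)
fpr = false_positive_rate(fp, tn)
print(spec, fpr)  # 0.9 0.1
print(abs(fpr - (1 - spec)) < 1e-12)  # True
```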

---

## 7. ROC AUC (Receiver Operating Characteristic - Area Under Curve)

**Definition**: The area under the ROC curve, which plots True Positive Rate (TPR) vs. False Positive Rate (FPR) at various threshold levels.

\[
\text{ROC AUC} = \int_{0}^{1} \text{TPR}(\text{FPR}) \, d(\text{FPR})
\]

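**When to use**: To evaluate how well the model ranks positive cases above negative ones across all thresholds. Under severe class imbalance it can look optimistic, because a large number of true negatives keeps FPR low even when false positives outnumber true positives.

**Advantages**: Threshold-independent, and unaffected by the class ratio because TPR and FPR are each computed within a single actual class.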
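
ROC AUC also has a probabilistic reading: it equals the chance that a randomly chosen positive receives a higher score than a randomly chosen negative, with ties counted as one half. The sketch below uses that pairwise formulation instead of numerical integration. It is quadratic in the number of examples, so it is meant for small illustrations only; the data and function name are hypothetical.

```python
def roc_auc(y_true, scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(pos) * len(neg))


# Hypothetical labels and scores, invented for illustration.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.2, 0.6, 0.3, 0.2, 0.1]
print(roc_auc(y_true, scores))  # 0.78125
```

No threshold appears anywhere in the computation, which is what "threshold-independent" means in practice.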

---

## 8. PR AUC (Precision-Recall Area Under Curve)

**Definition**: The area under the Precision-Recall curve, which plots Precision vs. Recall across thresholds.

\[
\text{PR AUC} = \int_{0}^{1} \text{Precision}(\text{Recall}) \, d(\text{Recall})
\]

**When to use**: Preferable to ROC AUC in highly imbalanced datasets, where the minority class is of greater interest.
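
One common way to summarise the Precision-Recall curve is average precision: a step-wise sum that adds the precision observed each time a new positive is retrieved. It is an estimator of the area, not the exact integral. The sketch below assumes there are no tied scores; the data and function name are hypothetical.

```python
def average_precision(y_true, scores):
    """Step-wise PR-curve summary: mean of precision at each retrieved positive."""
    total_pos = sum(y_true)
    if total_pos == 0:
        raise ValueError("need at least one positive example")
    # Rank instances from highest to lowest score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = 0
    precision_sum = 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            tp += 1
            precision_sum += tp / rank  # precision at this cut-off
    # Each positive contributes a recall step of 1 / total_pos.
    return precision_sum / total_pos


# Hypothetical imbalanced sample: 3 positives among 10 instances.
y_true = [1, 0, 1, 0, 0, 1, 0, 0, 0, 0]
scores = [0.95, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1]
print(round(average_precision(y_true, scores), 3))  # 0.722
```

A useful reference point is the positive rate (0.3 here), which is roughly what a classifier with random scores would achieve.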

---

## 9. G-Mean (Geometric Mean)

**Definition**: The geometric mean of Recall (TPR) and Specificity (TNR). Reflects balance in performance across both classes.

\[
\text{G-Mean} = \sqrt{ \text{Recall} \cdot \text{Specificity} } = \sqrt{ \frac{TP}{TP + FN} \cdot \frac{TN}{TN + FP} }
\]

**When to use**: Particularly useful in imbalanced classification, where balanced performance on both classes is desired.
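
Because the two rates are multiplied, G-Mean collapses to zero whenever either class is ignored entirely. The sketch below illustrates this with hypothetical counts; treating an empty class as a rate of 0 is an assumption of this example.

```python
import math


def g_mean(tp, fn, tn, fp):
    """Geometric mean of recall and specificity."""
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(recall * specificity)


# Hypothetical classifier that never predicts the positive class.
print(g_mean(tp=0, fn=10, tn=990, fp=0))  # 0.0

# Hypothetical classifier that trades some specificity for recall.
print(round(g_mean(tp=8, fn=2, tn=900, fp=90), 3))  # 0.853
```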

---

# Comparative Summary Table

| **Metric**      | **Formula**                                                                 | **Sensitive to Imbalance** | **Threshold Dependent** | **Best Use Case**                                      |
|-----------------|------------------------------------------------------------------------------|-----------------------------|--------------------------|---------------------------------------------------------|
| Accuracy        | \( \frac{TP + TN}{TP + TN + FP + FN} \)                                     | Yes                         | Yes                      | Balanced datasets                                       |
| Recall (TPR)    | \( \frac{TP}{TP + FN} \)                                                     | No                          | Yes                      | Prioritize detecting positives                          |
| Precision       | \( \frac{TP}{TP + FP} \)                                                     | Yes                         | Yes                      | Prioritize avoiding false positives                     |
| F1 Score        | \( 2 \cdot \frac{P \cdot R}{P + R} \)                                        | Yes                         | Yes                      | Balance between Precision and Recall                    |
| Specificity     | \( \frac{TN}{TN + FP} \)                                                     | No                          | Yes                      | Important to avoid false positives                      |
| FPR             | \( \frac{FP}{FP + TN} \)                                                     | No                          | Yes                      | ROC analysis and cost-sensitive evaluation              |
| ROC AUC         | Area under ROC curve (TPR vs. FPR)                                           | No                          | No                       | Ranking ability of classifier across all thresholds     |
| PR AUC          | Area under Precision-Recall curve                                            | Yes                         | No                       | Rare class detection (e.g., fraud, disease)             |
| G-Mean          | \( \sqrt{ \text{Recall} \cdot \text{Specificity} } \)                        | No                          | Yes                      | Balanced performance across classes in imbalanced tasks |
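
One way to probe how a metric reacts to class imbalance is to hold a classifier's Recall and Specificity fixed and change only the class ratio. The sketch below does this for Precision and Accuracy. The rates (0.8 and 0.9), the class sizes, and the function name are hypothetical choices for illustration.

```python
def metrics_at_class_ratio(n_pos, n_neg, recall, specificity):
    """Precision and accuracy implied by fixed per-class rates and class sizes."""
    tp = recall * n_pos          # expected true positives
    tn = specificity * n_neg     # expected true negatives
    fp = n_neg - tn              # remaining negatives are false positives
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (n_pos + n_neg)
    return precision, accuracy


# Same classifier behaviour, increasingly rare positive class.
for n_pos, n_neg in [(5000, 5000), (1000, 9000), (100, 9900)]:
    precision, accuracy = metrics_at_class_ratio(n_pos, n_neg, recall=0.8, specificity=0.9)
    print(n_pos, n_neg, round(precision, 3), round(accuracy, 3))
# 5000 5000 0.889 0.85
# 1000 9000 0.471 0.89
# 100 9900 0.075 0.899
```

Precision falls sharply as positives become rare even though the per-class rates never change, while accuracy drifts toward the majority-class rate.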
