# Evaluation Metrics for Classification

1. Confusion Matrix
2. Accuracy / Misclassification Rate
3. Precision
4. Recall
5. F-Beta Score
6. True Positive Rate (Sensitivity)
7. False Positive Rate
8. True Negative Rate (Specificity)
9. False Negative Rate
10. ROC-AUC
11. Precision-Recall Trade-off
---

## 1. Confusion Matrix


For binary classification, a **confusion matrix** is a 2x2 table that compares the model's predictions with the actual labels.
It shows where the model is correct and where it is confused.

Consider a binary classification problem (Yes/No).


- **Confusion Matrix**

| Predicted \ Actual | Actual 1 (Positive) | Actual 0 (Negative) |
|--------------------|----------------------|----------------------|
| **Predicted 1**    | TP (True Positive)   | FP (False Positive)  |
| **Predicted 0**    | FN (False Negative)  | TN (True Negative)   |


### TP, FP, FN, TN Explanation Table

| Term | Full Form        | Model Prediction | Actual Value | Meaning                  |
|------|------------------|------------------|--------------|---------------------------|
| TP   | True Positive     | 1                | 1            | Correct positive prediction |
| FP   | False Positive    | 1                | 0            | Incorrect positive prediction |
| FN   | False Negative    | 0                | 1            | Model missed a real positive   |
| TN   | True Negative     | 0                | 0            | Correct negative prediction |



### Reading the Matrix

- **TP high** : model is good at catching positives
- **TN high** : model is good at recognizing negatives
- **FP high** : model gives too many false alarms
- **FN high** : model misses many real positives
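
As a minimal sketch of how the four cells are tallied, the snippet below counts TP, FP, FN and TN from two lists of 0/1 labels. The function name `confusion_counts` and the sample labels are made up for illustration and do not come from any library.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, FN, TN for binary labels where 1 = positive, 0 = negative."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == 1 and actual == 1:
            tp += 1  # correct positive prediction
        elif predicted == 1 and actual == 0:
            fp += 1  # false alarm
        elif predicted == 0 and actual == 1:
            fn += 1  # missed a real positive
        else:
            tn += 1  # correct negative prediction
    return tp, fp, fn, tn


# Hypothetical labels, chosen only so that all four cells are non-empty.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
print(f"TP={tp} FP={fp} FN={fn} TN={tn}")
```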
---

## 2. Accuracy

Accuracy tells how often the model is correct overall.
It is the fraction of all predictions that are correct.

- **Formula**
- Accuracy = `(TP + TN) / (TP + FP + FN + TN)`

- Where:
    - TP = True Positive
    - TN = True Negative
    - FP = False Positive
    - FN = False Negative

- **Interpretation**
    - High Accuracy : Model is performing well overall.
    - Low Accuracy : Model is making many mistakes.
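    - Most reliable when the dataset is balanced (roughly equal positives and negatives); on imbalanced data, always predicting the majority class can still give high accuracy.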

- **Example Understanding**
- If model made:
    - TP = 30  
    - TN = 50  
    - FP = 10  
    - FN = 10  

- Then:
    - Accuracy = (30 + 50) / (30 + 50 + 10 + 10)  
    - Accuracy = 80 / 100  
    - Accuracy = **0.80 (80%)**

- **Misclassification Rate**
    - Accuracy is the correct classification rate.
    - Its opposite is the Misclassification Rate = `(FP + FN) / (TP + FP + FN + TN)` = `1 - Accuracy`.
    - For the example above: (10 + 10) / 100 = **0.20 (20%)**.
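
The arithmetic above can be checked with a small sketch. The helper names `accuracy` and `misclassification_rate` are illustrative, and the second call uses hypothetical imbalanced counts (5 positives, 95 negatives, a model that always predicts negative) to show why accuracy alone can mislead.

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that were correct."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("no predictions to evaluate")
    return (tp + tn) / total


def misclassification_rate(tp, tn, fp, fn):
    """Fraction of all predictions that were wrong; equals 1 - accuracy."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("no predictions to evaluate")
    return (fp + fn) / total


# Worked example from this section.
acc = accuracy(tp=30, tn=50, fp=10, fn=10)
err = misclassification_rate(tp=30, tn=50, fp=10, fn=10)
print(f"accuracy={acc:.2f} misclassification={err:.2f}")

# Hypothetical imbalanced case: the model never predicts positive,
# so it catches no positives at all, yet accuracy still looks high.
always_negative_acc = accuracy(tp=0, tn=95, fp=0, fn=5)
print(f"always-negative accuracy={always_negative_acc:.2f}")
```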
---

## 3. Precision

- **Formula**
Precision = `TP / (TP + FP)`

- **Meaning of Precision Components**

| Term | Full Form       | Meaning                                      |
|------|------------------|----------------------------------------------|
| TP   | True Positive     | Correctly predicted positive cases           |
| FP   | False Positive    | Incorrectly predicted positive cases         |


- **What Precision Tells**
    - Out of all **predicted positives**, how many were actually positive?
    - Focuses on *positive prediction correctness*.
    - High Precision = fewer false alarms.


- **Example**
    - If TP = 45 and FP = 5:

    - Precision = 45 / (45 + 5)  
    - Precision = 45 / 50  
    - Precision = **0.90 (90%)**

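- **When Precision Matters?**
    - When false positives are costly
  (like fraud alerts that block genuine customers, or spam detection).
    - When a model predicts almost everything as class 1: all class 1 data points are predicted correctly, so Recall looks perfect, but the class 0 data points are predicted wrongly, and Precision is the metric that reveals it.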
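
A minimal sketch of the precision formula follows. Returning `0.0` when the model predicts no positives at all is an assumed convention for this example, not something the formula itself defines. The second call uses hypothetical counts for a model that labels every point as class 1 on data with 20 positives and 80 negatives.

```python
def precision(tp, fp):
    """TP / (TP + FP). Returns 0.0 if nothing was predicted positive (assumed convention)."""
    predicted_positives = tp + fp
    if predicted_positives == 0:
        return 0.0
    return tp / predicted_positives


# Worked example from this section: TP = 45, FP = 5.
example = precision(tp=45, fp=5)

# Hypothetical "predict everything as class 1" model:
# all 20 positives become TP and all 80 negatives become FP.
predict_all_positive = precision(tp=20, fp=80)

print(f"example={example:.2f} predict-all-positive={predict_all_positive:.2f}")
```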
---


## 4. Recall

- **Formula**
Recall = `TP / (TP + FN)`

- **Meaning of Recall Components**

| Term | Full Form        | Meaning                                      |
|------|-------------------|----------------------------------------------|
| TP   | True Positive     | Correctly predicted positive cases           |
| FN   | False Negative    | Model missed actual positive cases           |

- **What Recall Tells**
    - Out of all **actual positives**, how many did the model correctly catch?
    - Measures the model’s ability to **find positives**.
    - High Recall = model rarely misses positives.

- **Example**
    - If TP = 30 and FN = 20:

    - Recall = 30 / (30 + 20)  
    - Recall = 30 / 50  
    - Recall = **0.60 (60%)**

- **When Recall Matters?**
    - When missing a positive is dangerous  
  (disease detection, fraud detection, cancer detection).


- **Note** :
    - Both **Precision** and **Recall** are defined in terms of class 1 (the positive class).
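
To illustrate the note above, the sketch below computes precision and recall from label lists and lets the caller choose which class counts as "positive". The function name, the `positive` parameter and the sample labels are assumptions made for this example. Treating class 0 as the positive class gives different numbers for the same predictions.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Precision and recall with `positive` treated as the class of interest."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t != positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if p != positive and t == positive)
    # Assumed convention: report 0.0 when a denominator would be zero.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical labels: 4 actual positives, 6 actual negatives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]

p1, r1 = precision_recall(y_true, y_pred, positive=1)
p0, r0 = precision_recall(y_true, y_pred, positive=0)
print(f"class 1 as positive: precision={p1:.3f} recall={r1:.3f}")
print(f"class 0 as positive: precision={p0:.3f} recall={r0:.3f}")
```

With these labels, class 1 gives precision 2/3 and recall 1/2, while class 0 gives precision 5/7 and recall 5/6.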

---

## 5. F-Beta Score

- **Definition**

F-Beta Score is a metric that combines **Precision** and **Recall**.
It allows you to decide which one is more important by adjusting **β (beta)**.

- If β > 1 : Give more importance to **Recall**
- If β < 1 : Give more importance to **Precision**
- If β = 1 : It becomes **F1 Score** (balanced)

- **Formula**

`Fβ = (1 + β²) * ( (Precision * Recall) / ( (β² * Precision) + Recall ) )`


- **Meaning of Beta (β)**

| β Value | Priority | Meaning                                      |
|---------|----------|----------------------------------------------|
| β < 1   | Precision | You care more about minimizing false positives |
| β = 1   | Balanced  | Precision and Recall equally important (F1)   |
| β > 1   | Recall    | You care more about minimizing false negatives |


- **Component Terms**

| Term       | Meaning                                      |
|------------|----------------------------------------------|
| Precision  | TP / (TP + FP) — correctness of predicted positives |
| Recall     | TP / (TP + FN) — coverage of actual positives     |

- **Example**
Suppose:
    - Precision = 0.80  
    - Recall = 0.60  
    - β = 2 → Recall is more important

- F2 = (1 + 2²) * ( (0.80 * 0.60) / ( (2² * 0.80) + 0.60 ) )  
- F2 = 5 * (0.48 / (3.2 + 0.6))  
- F2 = 5 * (0.48 / 3.8)  
- F2 ≈ **0.63**
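
The calculation above can be reproduced with a short sketch. The function name `f_beta` is illustrative, and returning `0.0` when the denominator is zero is an assumed convention for this example.

```python
def f_beta(precision, recall, beta=1.0):
    """F-beta score: (1 + beta^2) * P * R / (beta^2 * P + R)."""
    if beta <= 0:
        raise ValueError("beta must be positive")
    beta_sq = beta * beta
    denominator = beta_sq * precision + recall
    if denominator == 0:
        return 0.0  # assumed convention when both precision and recall are 0
    return (1 + beta_sq) * precision * recall / denominator


# Same inputs as the worked example: Precision = 0.80, Recall = 0.60.
f2 = f_beta(0.80, 0.60, beta=2)      # recall weighted more
f1 = f_beta(0.80, 0.60, beta=1)      # balanced
f_half = f_beta(0.80, 0.60, beta=0.5)  # precision weighted more

print(f"F2={f2:.2f} F1={f1:.2f} F0.5={f_half:.2f}")
```

Because Recall is the weaker of the two values here, the score drops as β grows (F0.5 = 0.75, F1 ≈ 0.69, F2 ≈ 0.63).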

- **When F-Beta is Useful?**

| Scenario                          | Metric Focus                  |
|-----------------------------------|-------------------------------|
| Missing positives is dangerous    | High Recall (β > 1)           |
| False alarms are dangerous        | High Precision (β < 1)        |
| Balanced tasks                    | β = 1 (F1 Score)              |

- **Quick Summary**
- F-Beta adjusts the balance between Precision and Recall.  
- β controls “which metric matters more”.  
- Useful when the dataset is imbalanced and Accuracy alone would be misleading.

---

## 6. True Positive Rate

- **Other Names**
    - TPR = Recall
    - TPR = Sensitivity
    - TPR = Hit Rate

All three mean the same thing.


- **Formula**
`TPR = TP / (TP + FN)`


- **Meaning of Terms**

| Term | Full Form        | Meaning                                      |
|------|-------------------|----------------------------------------------|
| TP   | True Positive     | Model correctly predicted positive cases     |
| FN   | False Negative    | Model missed actual positive cases           |


- **What TPR Tells**
    - Out of **all actual positive cases**, how many the model successfully caught.
    - Measures **how sensitive** the model is to detecting positives.

    - High TPR : Model rarely misses positives  
    - Low TPR : Model misses many actual positives

- **Example**
- If TP = 45 and FN = 5:

    - TPR = 45 / (45 + 5)  
    - TPR = 45 / 50  
    - TPR = **0.90 (90%)**

- **When TPR Matters**
    - Disease detection
    - Fraud detection
    - Any case where **missing a positive is dangerous**

---


## 7. False Positive Rate

- **Other Names**
    - FPR = Fall-Out
    - FPR = False Alarm Rate

All refer to the same concept.

- **Formula**
`FPR = FP / (FP + TN)`

- **Meaning of Terms**

| Term | Full Form        | Meaning                                      |
|------|-------------------|----------------------------------------------|
| FP   | False Positive    | Model predicted positive but was wrong       |
| TN   | True Negative     | Model correctly predicted negative           |

- **What FPR Tells**
    - Out of all **actual negative cases**, how many the model incorrectly marked as positive.
    - Measures how often the model **raises false alarms**.

    - High FPR : Model gives too many false positives  
    - Low FPR : Model avoids false alarms well

- **Example**
- If FP = 10 and TN = 90:

    - FPR = 10 / (10 + 90)  
    - FPR = 10 / 100  
    - FPR = **0.10 (10%)**

- **When FPR Matters?**
    - Spam filters (don’t mark real emails as spam)
    - Security systems (don’t trigger alerts unnecessarily)
    - Any scenario where false alarms cause problems

---


## 8. True Negative Rate

- **Formula**
`TNR = TN / (TN + FP)`

- **Meaning of Terms**

| Term | Full Form        | Meaning                                        |
|------|-------------------|------------------------------------------------|
| TN   | True Negative     | Model correctly predicted negative cases       |
| FP   | False Positive    | Model predicted positive but was wrong         |


- **What TNR Tells**
    - Out of all **actual negative cases**, how many were correctly predicted as negative.
    - Measures the model’s ability to **avoid false alarms**.

    - High TNR : Model rarely gives false positives  
    - Low TNR : Model keeps raising unnecessary alerts

- **Example**
- If TN = 90 and FP = 10:

    - TNR = 90 / (90 + 10)  
    - TNR = 90 / 100  
    - TNR = **0.90 (90%)**
- **Other Names**
    - Specificity
    - True Negative Proportion
---

## 9. False Negative Rate

- **Formula**
`FNR = FN / (TP + FN)`


- **Meaning of Terms**

| Term | Full Form        | Meaning                                      |
|------|-------------------|----------------------------------------------|
| FN   | False Negative    | Model predicted negative but it was positive |
| TP   | True Positive     | Model correctly predicted positive           |


- **What FNR Tells**
    - Out of all **actual positive cases**, how many the model **failed to detect**.
    - Measures how often the model **misses positive cases**.

    - High FNR : Model misses many positives  
    - Low FNR : Model catches positives well

- **Example**
- If FN = 20 and TP = 80:

    - FNR = 20 / (80 + 20)  
    - FNR = 20 / 100  
    - FNR = **0.20 (20%)**


- **Other Names**
    - Miss Rate
    - False Negative Proportion
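
The four rates from sections 6 to 9 come in complementary pairs: TPR and FNR split the actual positives, while TNR and FPR split the actual negatives. The sketch below (the function name `rates` is an assumption) computes all four from the counts and reuses this document's example numbers: TP = 80, FN = 20 from the FNR example and FP = 10, TN = 90 from the FPR and TNR examples.

```python
def rates(tp, fp, fn, tn):
    """Return TPR, FNR, TNR and FPR as a dict."""
    actual_positives = tp + fn
    actual_negatives = tn + fp
    if actual_positives == 0 or actual_negatives == 0:
        raise ValueError("need at least one actual positive and one actual negative")
    return {
        "TPR": tp / actual_positives,  # recall / sensitivity
        "FNR": fn / actual_positives,  # miss rate
        "TNR": tn / actual_negatives,  # specificity
        "FPR": fp / actual_negatives,  # false alarm rate
    }


r = rates(tp=80, fp=10, fn=20, tn=90)
for name, value in r.items():
    print(f"{name} = {value:.2f}")

# Each pair sums to 1 because it covers one actual class completely.
print(f"TPR + FNR = {r['TPR'] + r['FNR']:.2f}")
print(f"TNR + FPR = {r['TNR'] + r['FPR']:.2f}")
```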
---

## 10. ROC-AUC

- **ROC**
    - ROC = Receiver Operating Characteristic Curve

    - It is a graph that shows the performance of a classification model at different threshold values.


- **ROC Curve Axes**

| Axis | Full Form               | What It Represents                     |
|------|--------------------------|-----------------------------------------|
| X    | False Positive Rate (FPR) | Wrong positives (FP / (FP + TN))        |
| Y    | True Positive Rate (TPR)  | Correct positives (TP / (TP + FN))      |

- ROC curve = plot of **TPR (y-axis)** vs **FPR (x-axis)**.


- **What is AUC?**
    - AUC = Area Under the ROC Curve

    - It measures **how well the model separates classes**.


- **AUC Interpretation** (rough rule of thumb)

| AUC Value | Model Meaning                                      |
|-----------|-----------------------------------------------------|
| 1.0       | Perfect classifier                                  |
| 0.9 - 1.0 | Excellent                                           |
| 0.8 - 0.9 | Good                                               |
| 0.7 - 0.8 | Fair                                               |
| 0.5 - 0.7 | Poor                                               |
| 0.5       | No skill (same as random guessing)                  |
| < 0.5     | Worse than random                                   |

- Higher AUC = Better model performance.
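
As a rough sketch of how the curve and its area are obtained, the code below sweeps the threshold over every distinct score, records one (FPR, TPR) point per threshold, and then integrates with the trapezoidal rule. The function names, the rule "predict positive when score >= threshold" and the sample scores are assumptions for illustration; library implementations may handle ties and edge cases differently.

```python
def roc_points(y_true, scores):
    """(FPR, TPR) points from sweeping the threshold over every distinct score."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative example")
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the curve by the trapezoidal rule; points must be ordered by FPR."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical scores: higher means "more likely positive".
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

points = roc_points(y_true, scores)
area = auc(points)
print(points)
print(f"AUC = {area:.3f}")
```

For this toy data the area is 8/9 ≈ 0.889, which matches the share of (positive, negative) pairs in which the positive example receives the higher score: 8 of 9 pairs.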


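- **Why ROC-AUC is Useful?**

    - Does not depend on the class ratio, because TPR and FPR are each computed within one actual class. On heavily imbalanced datasets it can look optimistic, so check Precision and Recall as well.  
    - Looks at **all threshold values**, not just one  
    - Helps compare multiple models easily  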


- **Key Terms Used in ROC**

| Term | Full Form            | Formula                      |
|------|-----------------------|-------------------------------|
| TPR  | True Positive Rate    | TP / (TP + FN) (Recall)       |
| FPR  | False Positive Rate   | FP / (FP + TN)                |
| TNR  | True Negative Rate    | TN / (TN + FP)                |
| FNR  | False Negative Rate   | FN / (TP + FN)                |

---



## 11. Precision-Recall Trade-off

- **What is the Trade-off?**
    - Precision and Recall usually cannot both be increased at the same time.
    - When you try to increase one, the other often decreases.

- **Reason**
    - Both depend on the **classification threshold**.
    - Changing the threshold changes TP, FP, and FN, which affects Precision and Recall differently.


- **Effect of Threshold on Precision & Recall**

| Threshold | Precision Effect                     | Recall Effect                          |
|-----------|---------------------------------------|-----------------------------------------|
| High      | Usually higher Precision              | Lower (never higher) Recall             |
| Low       | Usually lower Precision               | Higher (never lower) Recall             |


- **Why This Happens?**

| Metric     | Increases When…                                 |
|------------|--------------------------------------------------|
| Precision  | You predict fewer positives → fewer false alarms |
| Recall     | You predict more positives → catch more positives |

So:

- Predict **more positives** : Recall ↑ , Precision ↓  
- Predict **fewer positives** : Precision ↑ , Recall ↓  


- **Example Understanding**
    - If you want **high precision** (fewer false positives):
        - Model becomes very strict.
        - Predicts positive only when very sure.
        - Misses some real positives → Recall drops.

    - If you want **high recall** (catch all positives):
        - Model becomes generous.
        - Predicts positive more often.
        - Creates more false positives → Precision drops.
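
The strict-versus-generous behaviour described above can be seen by evaluating one set of scores at several thresholds. This is a sketch on hypothetical data; the function name `precision_recall_at` and the rule "predict positive when score >= threshold" are assumptions for the example.

```python
def precision_recall_at(y_true, scores, threshold):
    """Precision and recall when every score >= threshold is predicted positive."""
    tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
    fn = sum(1 for y, s in zip(y_true, scores) if s < threshold and y == 1)
    # Assumed convention: report 0.0 when a denominator would be zero.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical model scores for 4 positives and 4 negatives.
y_true = [1, 1, 0, 1, 0, 1, 0, 0]
scores = [0.95, 0.85, 0.75, 0.65, 0.55, 0.45, 0.35, 0.10]

results = {}
for threshold in (0.8, 0.5, 0.3):
    results[threshold] = precision_recall_at(y_true, scores, threshold)
    p, r = results[threshold]
    print(f"threshold={threshold:.1f} precision={p:.2f} recall={r:.2f}")
```

On this data the strict threshold (0.8) gives precision 1.00 with recall 0.50, while the generous threshold (0.3) gives recall 1.00 with precision of about 0.57.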

- **When to Prefer What?**

| Situation                                 | Focus On     | Reason                                      |
|--------------------------------------------|--------------|---------------------------------------------|
| Disease detection                          | Recall ↑     | Missing a positive is dangerous             |
| Fraud detection                            | Recall ↑     | Better to investigate extra cases           |
| Spam detection                             | Precision ↑  | Don't mark real emails as spam              |
| Job candidate filtering                    | Precision ↑  | Only select strong candidates               |


- **Summary**
    - Precision and Recall pull the threshold in opposite directions.  
    - Increase one → the other often decreases.  
    - Choose based on which mistake is more expensive:
        - **False Positive expensive : focus on Precision**
        - **False Negative expensive : focus on Recall**


# Summary of Evaluation Metrics

## MEGA SUMMARY TABLE — ALL CLASSIFICATION METRICS

| Metric | Full Form / Other Names | Formula | Uses / What It Tells | High Value Means | Low Value Means |
|--------|---------------------------|---------|------------------------|-------------------|------------------|
| **TPR** | True Positive Rate / Recall / Sensitivity | TP / (TP + FN) | How many actual positives the model catches | Few positives missed | Many positives missed |
| **FNR** | False Negative Rate / Miss Rate | FN / (TP + FN) | How many positives the model failed to detect | Many missed positives | Few missed positives |
| **TNR** | True Negative Rate / Specificity | TN / (TN + FP) | How well the model avoids false alarms | Few false positives | Many false positives |
| **FPR** | False Positive Rate / Fall-Out | FP / (FP + TN) | How often model raises false alarms | Many false positives | Few false positives |
| **Precision** | Positive Predictive Value | TP / (TP + FP) | How many predicted positives were correct | Accurate positive predictions | Many wrong positive predictions |
| **Accuracy** | Overall correctness | (TP + TN) / (TP + FP + FN + TN) | How often model is right overall | Good general performance | Many overall mistakes |
| **F1 Score** | Harmonic mean of Precision & Recall | 2 * (P * R) / (P + R) | Balances precision and recall | Both P and R are high | At least one of P or R is low |
| **F-Beta** | Weighted F-score | (1+β²) * (PR / (β²P + R)) | Weights P or R according to β | Good P and R under the chosen weighting | Low P, low R, or both (weighted by β) |
| **ROC Curve** | Receiver Operating Characteristic | TPR vs FPR plot | Performance at all thresholds | Curve near the top-left corner: better separation | Curve near the diagonal: poor separation |
| **AUC** | Area Under ROC Curve | Area under the TPR vs FPR curve (0 to 1) | How well model separates classes | Near 1: excellent classifier | Near 0.5: no better than random guessing |



## QUICK MAPPING OF TERMS

| Term | Meaning |
|------|---------|
| TP | Correct positive prediction |
| TN | Correct negative prediction |
| FP | Wrong positive prediction |
| FN | Missed positive (wrongly predicted negative) |


## MEMORY CHEATSHEET (Super Short)

- **Precision** : Out of predicted positives, how many are real  
- **Recall (TPR)** : Out of actual positives, how many caught  
- **TNR (Specificity)** : Out of actual negatives, how many correctly predicted negative  
- **FPR** : Out of actual negatives, how many wrongly marked positive  
- **FNR** : Out of actual positives, how many missed  
- **F1** : Balance of Precision + Recall  
- **AUC** : Overall class separation quality  

