```{contents}
```

# Evaluation Metrics 

## `accuracy_score`

**Definition:**

* Measures the **overall proportion of correct predictions**.

$$
\text{Accuracy} = \frac{\text{Number of correct predictions}}{\text{Total predictions}}
$$

**Example:**

```python
from sklearn.metrics import accuracy_score

y_true = [0, 0, 1, 1, 1]
y_pred = [0, 0, 1, 0, 1]

accuracy_score(y_true, y_pred)  # Output: 0.8
```

**Interpretation:**

* 4 of the 5 predictions are correct, so accuracy is 80%.
* **Limitation:** For **imbalanced datasets**, accuracy can be misleading.

  * Example: If 90% of samples are class 0, predicting everything as 0 gives 90% accuracy, but the minority class is completely ignored.
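The 90% example above can be reproduced with a small sketch. The data here is made up purely for illustration (90 samples of class 0, 10 of class 1), and the "model" is just a constant prediction of the majority class.

```python
from sklearn.metrics import accuracy_score, recall_score

# Hypothetical imbalanced data: 90 samples of class 0, 10 of class 1.
y_true = [0] * 90 + [1] * 10
# A trivial "model" that always predicts the majority class.
y_pred = [0] * 100

acc = accuracy_score(y_true, y_pred)
# Recall for class 1: the fraction of actual class-1 samples that were found.
minority_recall = recall_score(y_true, y_pred, pos_label=1)

print(acc)              # 0.9
print(minority_recall)  # 0.0
```

Accuracy looks good, yet none of the class 1 samples is detected, which is why accuracy should not be read on its own here.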

---

## `confusion_matrix`

**Definition:**

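* Shows the **count of samples for each combination of true and predicted label**. In scikit-learn, rows are true labels and columns are predicted labels.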
* For binary classification:

|          | Predicted 0         | Predicted 1         |
| -------- | ------------------- | ------------------- |
| Actual 0 | True Negative (TN)  | False Positive (FP) |
| Actual 1 | False Negative (FN) | True Positive (TP)  |

**Example:**

```python
from sklearn.metrics import confusion_matrix

y_true = [0, 0, 1, 1, 1]
y_pred = [0, 0, 1, 0, 1]

confusion_matrix(y_true, y_pred)
# Output:
# [[2 0]
#  [1 2]]
```

**Interpretation:**

* TN = 2 → Two class 0 samples correctly predicted as 0
* FP = 0 → No class 0 samples incorrectly predicted as 1
* FN = 1 → One class 1 sample incorrectly predicted as 0
* TP = 2 → Two class 1 samples correctly predicted as 1
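For a binary problem, the four counts can be pulled out of the matrix directly. A minimal sketch, assuming the labels are 0 and 1 so that the flattened matrix reads TN, FP, FN, TP in row order:

```python
from sklearn.metrics import confusion_matrix

y_true = [0, 0, 1, 1, 1]
y_pred = [0, 0, 1, 0, 1]

# ravel() flattens the 2x2 matrix row by row: [TN, FP, FN, TP].
# tolist() converts the NumPy integers to plain Python ints.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel().tolist()

print(tn, fp, fn, tp)  # 2 0 1 2
```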

**Why it matters:**

* Shows **which kinds of errors** the model makes for each class
* Essential for **imbalanced datasets**, where accuracy alone may be misleading.

---

## `classification_report`

**Definition:**

* Provides **precision, recall, F1-score, and support** for each class.

**Example:**

```python
from sklearn.metrics import classification_report

y_true = [0, 0, 1, 1, 1]
y_pred = [0, 0, 1, 0, 1]

print(classification_report(y_true, y_pred))
```

**Output:**

```
              precision    recall  f1-score   support

           0       0.67      1.00      0.80         2
           1       1.00      0.67      0.80         3

    accuracy                           0.80         5
   macro avg       0.83      0.83      0.80         5
weighted avg       0.87      0.80      0.80         5
```

**Interpretation:**

| Metric        | Meaning                                                                                                    |
| ------------- | ---------------------------------------------------------------------------------------------------------- |
| **Precision** | Of all samples predicted as class X, the fraction that are actually X. <br> High precision → few false positives. |
| **Recall**    | Of all samples that are actually class X, the fraction predicted as X. <br> High recall → few false negatives.    |
| **F1-score**  | Harmonic mean of precision and recall. Balances the two.                                                   |
| **Support**   | Number of samples whose true label is that class.                                                          |
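The definitions in the table can be checked by hand for one class. The sketch below treats class 1 as the positive class, counts TP, FP, and FN with plain Python, and compares the results with scikit-learn's `precision_score`, `recall_score`, and `f1_score`.

```python
from sklearn.metrics import precision_score, recall_score, f1_score

y_true = [0, 0, 1, 1, 1]
y_pred = [0, 0, 1, 0, 1]

pairs = list(zip(y_true, y_pred))
# Counts with class 1 treated as the positive class.
tp = sum(t == 1 and p == 1 for t, p in pairs)  # 2
fp = sum(t == 0 and p == 1 for t, p in pairs)  # 0
fn = sum(t == 1 and p == 0 for t, p in pairs)  # 1

precision = tp / (tp + fp)                          # 2 / 2
recall = tp / (tp + fn)                             # 2 / 3
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean

print(round(precision, 2), round(recall, 2), round(f1, 2))  # 1.0 0.67 0.8

# The library functions use pos_label=1 by default for binary labels.
sk_precision = precision_score(y_true, y_pred)
sk_recall = recall_score(y_true, y_pred)
sk_f1 = f1_score(y_true, y_pred)
```

These match the class 1 row of the report (1.00, 0.67, 0.80).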

**Imbalanced datasets:**

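* F1-score is usually more informative than accuracy, because it reflects both false positives and false negatives for each class.
* The macro average weights every class equally, while the weighted average weights each class by its support, so the weighted average can be dominated by the majority class.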
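To see how the macro and weighted averages differ, here is a small sketch that rebuilds both precision averages from the per-class values. It uses `precision_recall_fscore_support` for the per-class numbers and NumPy for the averaging; the variable names are illustrative only.

```python
import numpy as np
from sklearn.metrics import precision_recall_fscore_support, precision_score

y_true = [0, 0, 1, 1, 1]
y_pred = [0, 0, 1, 0, 1]

# Per-class arrays, ordered by label (class 0 first, then class 1).
precision, recall, f1, support = precision_recall_fscore_support(y_true, y_pred)

# Macro average: plain mean, every class counts equally.
macro_precision = float(precision.mean())
# Weighted average: mean weighted by each class's support.
weighted_precision = float(np.average(precision, weights=support))

print(round(macro_precision, 2))     # 0.83
print(round(weighted_precision, 2))  # 0.87

# Cross-check against scikit-learn's built-in averaging.
sk_macro = precision_score(y_true, y_pred, average="macro")
sk_weighted = precision_score(y_true, y_pred, average="weighted")
```

Class 1 has the larger support (3 of 5 samples) and the higher precision, so the weighted average is pulled above the macro average.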

---

## Summary Table for Quick Reference

| Metric                  | Best Used For                         | Interpretation in Imbalanced Data           |
| ----------------------- | ------------------------------------- | ------------------------------------------- |
| `accuracy_score`        | Overall correctness                   | Can be misleading if classes are imbalanced |
| `confusion_matrix`      | Counts of TP, TN, FP, FN              | Shows where the model is failing            |
| `classification_report` | Precision, Recall, F1-score per class | Exposes weak minority-class performance     |

