# Performance Metrics in Classification

In this video, we are going to discuss **performance metrics** which are specifically used in:

- **Binary Classification**
- **Multi-class Classification**

We will cover:

- Confusion Matrix  
- Accuracy  
- Precision  
- Recall  
- F-beta Score  

---

## Logistic Regression Recap

Logistic regression is used for **classification problems**.  
We separate the categories using a **linear decision boundary** (a straight line when there are two features).  

Example:  
- If a point lies above the line → category 1  
- If a point lies below the line → category 0  

To evaluate how the model is performing, we need **performance metrics**.  

For regression problems, we used:  
- $R^2$ score  
- Adjusted $R^2$ score  

For classification problems, we use:  
- Confusion Matrix  
- Accuracy  
- Precision  
- Recall  
- F-beta Score  

---

## Confusion Matrix

The **confusion matrix** is the foundation of classification metrics.  
For **binary classification**, it is a **2 × 2 matrix**.

|                | **Predicted 1** | **Predicted 0** |
|----------------|-----------------|-----------------|
| **Actual 1**   | True Positive (TP) | False Negative (FN) |
| **Actual 0**   | False Positive (FP) | True Negative (TN) |

- **TP (True Positive):** Model predicts **1**, actual is **1**  
- **TN (True Negative):** Model predicts **0**, actual is **0**  
- **FP (False Positive):** Model predicts **1**, actual is **0** (Type I Error)  
- **FN (False Negative):** Model predicts **0**, actual is **1** (Type II Error)  
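
The four cells can be counted directly from a list of actual labels and a list of predicted labels. The snippet below is a minimal sketch; the function name `confusion_counts` and the sample labels are made up for illustration, and it assumes the labels are only 0 and 1.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels, where 1 is the positive class."""
    tp = tn = fp = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == 1 and actual == 1:
            tp += 1
        elif predicted == 0 and actual == 0:
            tn += 1
        elif predicted == 1 and actual == 0:
            fp += 1  # Type I error
        else:
            fn += 1  # Type II error (predicted 0, actual 1)
    return tp, tn, fp, fn


# Made-up labels, for illustration only.
y_true = [1, 1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 0]

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
print(tp, tn, fp, fn)  # 3 1 2 1
```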

---

## Accuracy

Accuracy measures the fraction of **correct predictions**:

$$
\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}
$$

Example: If  
- $TP = 3$, $TN = 1$, $FP = 2$, $FN = 1$

Then:

$$
\text{Accuracy} = \frac{3 + 1}{3 + 1 + 2 + 1} = \frac{4}{7}
$$
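
The same calculation can be sketched in a few lines of Python. The helper name `accuracy` is an assumption for illustration; `Fraction` is used only so the result prints as the exact ratio from the formula.

```python
from fractions import Fraction


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return Fraction(tp + tn, tp + tn + fp + fn)


acc = accuracy(tp=3, tn=1, fp=2, fn=1)
print(acc, round(float(acc), 3))  # 4/7 0.571
```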

---

## Problem with Accuracy: Imbalanced Dataset

If a dataset has **imbalanced categories**, accuracy can give a **misleading result**.

Example:  
- 1000 samples → 900 = Class 1, 100 = Class 0  
- If a model predicts **all as Class 1**, accuracy = $90\%$  
- But the model is **useless** for Class 0: it misclassifies all 100 Class 0 samples.  

Thus, for **imbalanced datasets**, we rely on **Precision and Recall** rather than on accuracy alone.
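
To make the 900/100 example concrete, here is a small sketch. The "model" is just a list of all-1 predictions, a stand-in for any classifier that always outputs Class 1. It also reports how many of the actual Class 0 samples are found, which is the recall for Class 0.

```python
# 900 samples of Class 1 and 100 samples of Class 0, as in the example above.
y_true = [1] * 900 + [0] * 100

# A stand-in "model" that ignores its input and always predicts Class 1.
y_pred = [1] * len(y_true)

correct = sum(1 for a, p in zip(y_true, y_pred) if a == p)
accuracy = correct / len(y_true)

# Of the actual Class 0 samples, how many were predicted as Class 0?
class0_found = sum(1 for a, p in zip(y_true, y_pred) if a == 0 and p == 0)
class0_recall = class0_found / y_true.count(0)

print(accuracy)       # 0.9
print(class0_recall)  # 0.0
```

Accuracy looks high only because Class 1 dominates the data.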

---

## Precision

Precision focuses on **False Positives**.  

Formula:

$$
\text{Precision} = \frac{TP}{TP + FP}
$$

Interpretation:  
- Out of all **predicted positives**, how many are actually positive?  
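
As a worked example, reusing the counts from the accuracy example ($TP = 3$, $FP = 2$):

$$
\text{Precision} = \frac{3}{3 + 2} = \frac{3}{5} = 0.6
$$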

---

## Recall

Recall focuses on **False Negatives**.  

Formula:

$$
\text{Recall} = \frac{TP}{TP + FN}
$$

Interpretation:  
- Out of all **actual positives**, how many are correctly predicted?  
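
As a worked example, reusing the counts from the accuracy example ($TP = 3$, $FN = 1$):

$$
\text{Recall} = \frac{3}{3 + 1} = \frac{3}{4} = 0.75
$$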

---

## Precision vs Recall: Use Cases

### 1. Spam Classification
- If a mail is **not spam** but is predicted as **spam** → **False Positive**  
- This is a **big blunder** (an important mail is lost).  
- **Focus:** Reduce False Positives → Use **Precision**  

### 2. Medical Diagnosis (e.g., Diabetes Detection)
- If a person **has diabetes** but the model predicts **no diabetes** → **False Negative**  
- This is **dangerous** (the disease is missed).  
- **Focus:** Reduce False Negatives → Use **Recall**

---

## F-beta Score

When **both FP and FN are important**, we use the **F-beta score**, where $\beta$ controls how much weight recall gets relative to precision:

$$
F_\beta = \frac{(1 + \beta^2) \cdot (\text{Precision} \cdot \text{Recall})}{(\beta^2 \cdot \text{Precision}) + \text{Recall}}
$$

Special cases:

1. **F1 Score** (FP and FN equally important):  
   $\beta = 1$

   $$
   F_1 = \frac{2 \cdot (\text{Precision} \cdot \text{Recall})}{\text{Precision} + \text{Recall}}
   $$

   → Harmonic mean of Precision and Recall  

2. **F0.5 Score** (FP more important than FN):  
   $\beta = 0.5$

   $$
   F_{0.5} = \frac{1.25 \cdot (\text{Precision} \cdot \text{Recall})}{0.25 \cdot \text{Precision} + \text{Recall}}
   $$

3. **F2 Score** (FN more important than FP):  
   $\beta = 2$

   $$
   F_2 = \frac{5 \cdot (\text{Precision} \cdot \text{Recall})}{4 \cdot \text{Precision} + \text{Recall}}
   $$
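
The three special cases can be checked with one small function. This is a sketch only: the name `f_beta` is made up for illustration, and returning 0 when the denominator is 0 is an assumed convention, not something stated in these notes. The inputs are the precision and recall that follow from $TP = 3$, $FP = 2$, $FN = 1$.

```python
def f_beta(precision, recall, beta):
    """F-beta score: beta > 1 weights recall more, beta < 1 weights precision more."""
    denominator = beta**2 * precision + recall
    if denominator == 0:
        return 0.0  # assumed convention when precision and recall are both 0
    return (1 + beta**2) * precision * recall / denominator


# Precision and recall for TP = 3, FP = 2, FN = 1.
precision, recall = 3 / 5, 3 / 4

for beta in (0.5, 1, 2):
    print(beta, round(f_beta(precision, recall, beta), 3))
# 0.5 0.625
# 1 0.667
# 2 0.714
```

Because recall (0.75) is higher than precision (0.6) here, the score rises as $\beta$ grows and recall gets more weight.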

---

## Summary

- **Accuracy** → Use when dataset is **balanced**  
- **Precision** → Use when **False Positives** are critical (e.g., spam classification)  
- **Recall** → Use when **False Negatives** are critical (e.g., medical diagnosis)  
- **F-beta Score** → Use when **both FP and FN are important**  

---

> Next: We will discuss the **ROC Curve** and **AUC** as performance metrics. 🚀


# Logistic Regression for Multi-Class Classification
---

## Recap: Binary Classification with Logistic Regression

- In binary classification, the goal was to find a **decision boundary** (a best-fit line) that separates the two classes.  
- Example: Classes **A** and **B** divided using a line.

But what if we have **more than two categories**?  
For example, three categories: **Class 1, Class 2, Class 3**.

---

## Multi-Class Classification with Logistic Regression

When the output feature has **three or more categories**, we need a way to extend logistic regression.  

A common technique is:

### **One-vs-Rest (OvR)**

- For **$K$ output categories**, create **$K$ binary classifiers**.  
- Each classifier decides:  
  - **One class vs. all the others combined**  

---

## Example Dataset

Suppose we have features $f_1, f_2, f_3$ and an output variable with **3 categories**:

$$
y \in \{1, 2, 3\}
$$

### One-Hot Encoding of Output

Convert categories into **one-hot encoded vectors**:

- If $y = 1 \quad \rightarrow \quad [1, 0, 0]$  
- If $y = 2 \quad \rightarrow \quad [0, 1, 0]$  
- If $y = 3 \quad \rightarrow \quad [0, 0, 1]$
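
The mapping above can be sketched as a one-line helper. The name `one_hot` is chosen here for illustration and is not from any particular library.

```python
def one_hot(label, classes):
    """Return a list with 1 at the position of `label` and 0 elsewhere."""
    return [1 if label == c else 0 for c in classes]


classes = [1, 2, 3]
for y in classes:
    print(y, one_hot(y, classes))
# 1 [1, 0, 0]
# 2 [0, 1, 0]
# 3 [0, 0, 1]
```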

---

## OvR Model Training

We create **3 internal models**: $M_1, M_2, M_3$.

1. **Model $M_1$**  
   - Input: $(f_1, f_2, f_3)$  
   - Output: $y_1 \in \{0,1\}$  
   - Learns to separate **Class 1 vs {Class 2, Class 3}**

2. **Model $M_2$**  
   - Input: $(f_1, f_2, f_3)$  
   - Output: $y_2 \in \{0,1\}$  
   - Learns to separate **Class 2 vs {Class 1, Class 3}**

3. **Model $M_3$**  
   - Input: $(f_1, f_2, f_3)$  
   - Output: $y_3 \in \{0,1\}$  
   - Learns to separate **Class 3 vs {Class 1, Class 2}**
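
Each model sees the same inputs but a different 0/1 target. A minimal sketch of that relabeling step follows; the helper name `ovr_targets` and the five sample labels are made up for illustration.

```python
def ovr_targets(labels, classes):
    """For each class k, relabel the data as 1 (class k) or 0 (the rest)."""
    return {k: [1 if y == k else 0 for y in labels] for k in classes}


labels = [1, 3, 2, 1, 3]  # made-up labels for five samples
targets = ovr_targets(labels, classes=[1, 2, 3])

print(targets[1])  # [1, 0, 0, 1, 0]  -> target for M_1
print(targets[2])  # [0, 0, 1, 0, 0]  -> target for M_2
print(targets[3])  # [0, 1, 0, 0, 1]  -> target for M_3
```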

---

## Logistic Regression Hypothesis

For each model $M_k$:

$$
h_{\theta}^{(k)}(x) = \sigma(\theta^{(k)T} x) = \frac{1}{1 + e^{-\theta^{(k)T}x}}
$$

Where:

- $x = (f_1, f_2, f_3, 1)$ (with bias term)  
- $\theta^{(k)}$ are the parameters for model $M_k$  
- $\sigma(z) = \dfrac{1}{1 + e^{-z}}$ is the sigmoid function  
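
The hypothesis can be sketched in plain Python as follows. The parameter values in `theta` are hypothetical, chosen only to show the call. The sigmoid is split into two branches so that `math.exp` never receives a large positive argument.

```python
import math


def sigmoid(z):
    """Logistic function 1 / (1 + e^{-z}), written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def hypothesis(theta, features):
    """h(x) = sigmoid(theta . x), where x is the features plus a trailing 1 (bias)."""
    x = list(features) + [1.0]
    z = sum(t * xi for t, xi in zip(theta, x))
    return sigmoid(z)


theta = [0.5, -0.25, 0.0, 0.0]  # hypothetical parameters for one model M_k
print(hypothesis(theta, [0.0, 0.0, 0.0]))            # 0.5
print(round(hypothesis(theta, [2.0, 0.0, 0.0]), 3))  # 0.731
```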

---

## Training Objective

For each classifier $M_k$, we minimize the **binary cross-entropy loss**:

$$
J(\theta^{(k)}) = - \frac{1}{m} \sum_{i=1}^m \Big[ y^{(i)}_k \cdot \log(h_{\theta}^{(k)}(x^{(i)})) \; + \; (1 - y^{(i)}_k) \cdot \log(1 - h_{\theta}^{(k)}(x^{(i)})) \Big]
$$

Where:

- $m$ = number of training samples  
- $y^{(i)}_k$ = 1 if sample $i$ belongs to class $k$, else 0  
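
As a rough illustration of how this loss behaves, the sketch below evaluates it for two sets of made-up predicted probabilities. The function name and the clipping constant `eps` are assumptions added to keep `log` away from 0. They are not part of the formula above.

```python
import math


def binary_cross_entropy(y_true, probs, eps=1e-12):
    """Mean binary cross-entropy J for one classifier M_k."""
    total = 0.0
    for y, p in zip(y_true, probs):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(y_true)


y_k = [1, 0, 0, 1]                # made-up 0/1 targets for class k
confident = [0.9, 0.1, 0.2, 0.8]  # predictions close to the targets
unsure = [0.5, 0.5, 0.5, 0.5]     # predictions that carry no information

print(round(binary_cross_entropy(y_k, confident), 3))  # 0.164
print(round(binary_cross_entropy(y_k, unsure), 3))     # 0.693
```

Predictions closer to the true labels give a lower loss, which is what training each $M_k$ aims for.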

---

## Prediction Phase

For a **new test data point $x$**:

1. Pass $x$ to all models $M_1, M_2, M_3$  
2. Each model outputs a probability:  

$$
p_1 = h_{\theta}^{(1)}(x), \quad 
p_2 = h_{\theta}^{(2)}(x), \quad 
p_3 = h_{\theta}^{(3)}(x)
$$

Example (the three models are trained independently, so in general their outputs need not sum to 1):

- $p_1 = 0.25$  
- $p_2 = 0.20$  
- $p_3 = 0.55$

3. Choose the class with **maximum probability**:

$$
\hat{y} = \arg \max_{k \in \{1,2,3\}} \; p_k
$$

Here:

$$
\hat{y} = 3 \quad \text{(since $p_3 = 0.55$ is highest)}
$$

So, the predicted category is **Class 3**.
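
The arg-max step is a one-liner in Python. The helper name `predict_class` is made up for illustration, and the dictionary holds the example probabilities from above.

```python
def predict_class(probabilities):
    """Return the class label whose model gave the highest probability."""
    return max(probabilities, key=probabilities.get)


# Outputs of M_1, M_2, M_3 from the example above.
probabilities = {1: 0.25, 2: 0.20, 3: 0.55}
print(predict_class(probabilities))  # 3
```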

---

## Summary: One-vs-Rest (OvR) Logistic Regression

- For $K$ classes, train **$K$ logistic regression models**  
- Each model is a **binary classifier (one class vs. the rest)**  
- During prediction, compare probabilities from all models  
- Final prediction = **class with highest probability**  
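
Putting the pieces together, here is a minimal end-to-end sketch in plain Python. Several things in it are assumptions made for illustration and do not come from these notes: the toy 2-feature dataset, the function names, and the use of plain batch gradient descent with a fixed learning rate and epoch count to minimize the cross-entropy loss.

```python
import math


def sigmoid(z):
    """Logistic function, written to avoid overflow in math.exp."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(theta, features):
    """Probability from one model; a trailing 1 is appended for the bias."""
    x = list(features) + [1.0]
    return sigmoid(sum(t * xi for t, xi in zip(theta, x)))


def train_binary(X, y, lr=0.1, epochs=10000):
    """Fit one logistic regression model with batch gradient descent (assumed optimizer)."""
    theta = [0.0] * (len(X[0]) + 1)  # last entry is the bias
    m = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(theta)
        for features, target in zip(X, y):
            error = score(theta, features) - target
            for j, xi in enumerate(list(features) + [1.0]):
                grad[j] += error * xi / m
        theta = [t - lr * g for t, g in zip(theta, grad)]
    return theta


def train_ovr(X, labels):
    """Train one binary model per class: class k vs. the rest."""
    return {
        k: train_binary(X, [1 if y == k else 0 for y in labels])
        for k in sorted(set(labels))
    }


def predict_ovr(models, features):
    """Pick the class whose model gives the highest probability."""
    probs = {k: score(theta, features) for k, theta in models.items()}
    return max(probs, key=probs.get)


# Toy data: three well-separated clusters, made up for illustration.
X = [(0, 0), (1, 0), (0, 1), (5, 0), (6, 0), (5, 1), (0, 5), (1, 5), (0, 6)]
labels = [1, 1, 1, 2, 2, 2, 3, 3, 3]

models = train_ovr(X, labels)
print([predict_ovr(models, x) for x in X])  # [1, 1, 1, 2, 2, 2, 3, 3, 3]
```

On data this cleanly separated, each of the three models learns its own class-vs-rest boundary, so the arg-max recovers every training label. Real datasets are rarely this tidy.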

---

