# ROC Curve (Receiver Operating Characteristic)

## 1. Sample Dataset

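Consider a binary classification problem with five observations. The columns $\hat{y}_{0.7}$, $\hat{y}_{0.5}$, and $\hat{y}_{0.3}$ give the predicted labels at thresholds 0.7, 0.5, and 0.3.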

| ID | Actual Value ($y$) | Predicted Probability ($\hat{y}_{prob}$) | $\hat{y}_{0.7}$ | $\hat{y}_{0.5}$ | $\hat{y}_{0.3}$ |
|----|------------------|------------------------------------------|----------------|----------------|----------------|
| 1  | 1 | 0.90 | 1 | 1 | 1 |
| 2  | 0 | 0.80 | 1 | 1 | 1 |
| 3  | 1 | 0.60 | 0 | 1 | 1 |
| 4  | 0 | 0.40 | 0 | 0 | 1 |
| 5  | 1 | 0.20 | 0 | 0 | 0 |

---

## 2. Threshold-based Classification Rule

$$
\hat{y} =
\begin{cases}
1 & \text{if } \hat{y}_{prob} \ge \text{threshold} \\
0 & \text{otherwise}
\end{cases}
$$
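As a minimal sketch of this rule, assuming the predicted probabilities from the sample dataset are held in a plain Python list, the thresholding could be written as below. The names `classify` and `y_prob` are illustrative choices, not part of the original notes.

```python
# Predicted probabilities for IDs 1 to 5 of the sample dataset.
y_prob = [0.90, 0.80, 0.60, 0.40, 0.20]


def classify(probs, threshold):
    """Return 1 where the probability is at or above the threshold, else 0."""
    return [1 if p >= threshold else 0 for p in probs]


for t in (0.7, 0.5, 0.3):
    print(t, classify(y_prob, t))
# 0.7 [1, 1, 0, 0, 0]
# 0.5 [1, 1, 1, 0, 0]
# 0.3 [1, 1, 1, 1, 0]
```

The three printed lists reproduce the last three columns of the table in Section 1.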

---

## 3. Confusion Matrix Definitions

- **True Positive (TP)**: $y=1$ and $\hat{y}=1$
- **False Positive (FP)**: $y=0$ and $\hat{y}=1$
- **True Negative (TN)**: $y=0$ and $\hat{y}=0$
- **False Negative (FN)**: $y=1$ and $\hat{y}=0$
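The four definitions above can be sketched as a small counting function. This is an illustrative helper (the name `confusion_counts` is an assumption, not from the original notes), applied here to the sample labels and the predictions at threshold 0.5.

```python
def confusion_counts(y_true, y_pred):
    """Count (TP, FP, TN, FN) for binary labels encoded as 0 and 1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    return tp, fp, tn, fn


y_true = [1, 0, 1, 0, 1]     # actual values from the sample dataset
y_pred_05 = [1, 1, 1, 0, 0]  # predicted labels at threshold 0.5

print(confusion_counts(y_true, y_pred_05))  # (2, 1, 1, 1)
```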

---

## 4. Evaluation Metrics

### True Positive Rate (TPR)

$$
TPR = \frac{TP}{TP + FN}
$$

### False Positive Rate (FPR)

$$
FPR = \frac{FP}{FP + TN}
$$
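Both formulas translate directly into code. The sketch below assumes the dataset contains at least one actual positive and one actual negative, so neither denominator is zero; the function names `tpr` and `fpr` are illustrative.

```python
def tpr(tp, fn):
    """True Positive Rate: share of actual positives predicted as positive."""
    return tp / (tp + fn)


def fpr(fp, tn):
    """False Positive Rate: share of actual negatives predicted as positive."""
    return fp / (fp + tn)


# Counts at threshold 0.5 on the sample dataset: TP=2, FP=1, TN=1, FN=1.
print(round(tpr(tp=2, fn=1), 2))  # 0.67
print(round(fpr(fp=1, tn=1), 2))  # 0.5
```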

---

## 5. ROC Points from Different Thresholds

| Threshold | TP | FP | TN | FN | FPR | TPR |
|---------|----|----|----|----|-----|-----|
| 0.7 | 1 | 1 | 1 | 2 | 0.50 | 0.33 |
| 0.5 | 2 | 1 | 1 | 1 | 0.50 | 0.67 |
| 0.3 | 2 | 2 | 0 | 1 | 1.00 | 0.67 |
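The rows of this table can be reproduced with a short sweep over the three thresholds. This is a sketch under the assumption that the data are stored as plain lists; `roc_point` is a hypothetical helper name.

```python
y_true = [1, 0, 1, 0, 1]
y_prob = [0.90, 0.80, 0.60, 0.40, 0.20]


def roc_point(y_true, y_prob, threshold):
    """Return (FPR, TPR) for one threshold; assumes both classes are present."""
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    tp = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 1)
    tn = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 0)
    fn = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 0)
    return fp / (fp + tn), tp / (tp + fn)


for t in (0.7, 0.5, 0.3):
    fpr, tpr = roc_point(y_true, y_prob, t)
    print(f"threshold={t:.1f}  FPR={fpr:.2f}  TPR={tpr:.2f}")
# threshold=0.7  FPR=0.50  TPR=0.33
# threshold=0.5  FPR=0.50  TPR=0.67
# threshold=0.3  FPR=1.00  TPR=0.67
```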

---

## 6. ROC Curve Explanation

The **ROC Curve** is a plot of:

- **X-axis**: False Positive Rate (FPR)
- **Y-axis**: True Positive Rate (TPR)

Each point on the ROC curve corresponds to a **different threshold**.

### Key Insights:
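- A **random classifier** lies on the diagonal ($TPR = FPR$)
- A **better model** has a curve that bends toward the **top-left corner** (low FPR, high TPR)
- The ROC curve summarizes performance across **all thresholds**, so it does not depend on any single threshold choice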
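To trace a full curve rather than a few isolated points, one common approach is to use every distinct predicted probability as a threshold, plus one threshold above the maximum so that the curve starts at $(0, 0)$. The sketch below follows that approach on the sample dataset; `roc_curve_points` is an illustrative name, and the code assumes both classes are present.

```python
y_true = [1, 0, 1, 0, 1]
y_prob = [0.90, 0.80, 0.60, 0.40, 0.20]


def roc_curve_points(y_true, y_prob):
    """Return (FPR, TPR) points ordered from the strictest threshold down."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    # An infinite threshold predicts no positives and yields the point (0, 0).
    thresholds = [float("inf")] + sorted(set(y_prob), reverse=True)
    points = []
    for t in thresholds:
        tp = sum(1 for y, p in zip(y_true, y_prob) if y == 1 and p >= t)
        fp = sum(1 for y, p in zip(y_true, y_prob) if y == 0 and p >= t)
        points.append((fp / n_neg, tp / n_pos))
    return points


for fpr, tpr in roc_curve_points(y_true, y_prob):
    print(f"FPR={fpr:.2f}  TPR={tpr:.2f}")
```

The resulting list runs from $(0, 0)$ to $(1, 1)$ and can be passed to any plotting library to draw the curve.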

---

## 7. Area Under Curve (AUC)

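The **AUC** is the area under the ROC curve and takes values between 0 and 1.

- **AUC = 1.0** → Perfect classifier
- **AUC = 0.5** → Random classifier
- A higher AUC indicates better class separability: the model ranks positive observations above negative ones more often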
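One common way to read the AUC is as the fraction of positive-negative pairs in which the positive observation receives the higher score, with ties counting one half. The sketch below computes the AUC that way for the sample dataset; `pairwise_auc` is an illustrative name, and this pairwise count is a swapped-in equivalent of integrating the curve, not a method stated in the original notes.

```python
y_true = [1, 0, 1, 0, 1]
y_prob = [0.90, 0.80, 0.60, 0.40, 0.20]


def pairwise_auc(y_true, y_prob):
    """AUC as the share of (positive, negative) pairs ranked correctly."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0  # positive scored above negative
            elif p == n:
                wins += 0.5  # tie counts as half
    return wins / (len(pos) * len(neg))


print(pairwise_auc(y_true, y_prob))  # 0.5
```

For the five sample observations, 3 of the 6 positive-negative pairs are ranked correctly, so this toy model scores no better than a random classifier.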


![ROC_Curve](./roc_curve_2.png)