# ROC Curve

An ROC curve (receiver operating characteristic curve) is a graph that shows the performance of a binary classification model across all classification thresholds.

# Classification Threshold
A threshold in classification (especially logistic regression) is the cutoff probability that decides whether a sample is predicted as class 1 or class 0.

# ROC Curve

In binary classification, a model outputs a probability score indicating the likelihood that a given instance belongs to the positive class. To make a final classification decision (positive or negative), we need to set a threshold value. If the predicted probability is greater than or equal to the threshold, the instance is classified as positive; otherwise, it is classified as negative.
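The decision rule described above can be sketched in a few lines of Python. The function name `classify` and the sample scores are hypothetical, chosen only for illustration:

```python
def classify(probabilities, threshold=0.5):
    """Return 1 (positive) when the score is >= threshold, else 0 (negative)."""
    return [1 if p >= threshold else 0 for p in probabilities]


# Hypothetical predicted probabilities for five instances.
scores = [0.10, 0.35, 0.50, 0.72, 0.90]

labels_default = classify(scores)      # threshold 0.5
labels_strict = classify(scores, 0.8)  # higher threshold, fewer positives

print(labels_default)  # [0, 0, 1, 1, 1]
print(labels_strict)   # [0, 0, 0, 0, 1]
```

Raising the threshold can only turn predicted positives into predicted negatives, which is why both TPR and FPR fall as the threshold increases.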

## True Positive Rate (TPR) / Sensitivity / Recall
TPR = TP / (TP + FN)

It is the proportion of actual positives that are correctly identified by the model.

Example: A medical test correctly identifies 90 out of 100 patients who have the disease and misses the other 10 (TP = 90, FN = 10). The TPR (sensitivity) of the test is 90 / (90 + 10) = 0.9, or 90%.

## False Positive Rate (FPR)
FPR = FP / (FP + TN)

It is the proportion of actual negatives that are incorrectly classified as positives by the model.

Example: In the same medical test, if 20 out of 100 healthy patients are incorrectly identified as having the disease (FP = 20, TN = 80), the FPR of the test is 20 / (20 + 80) = 0.2, or 20%.
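As a quick check of the two formulas, the medical-test numbers can be plugged into small helper functions. The function names are hypothetical; only the counts come from the examples above:

```python
def true_positive_rate(tp, fn):
    """TPR = TP / (TP + FN): share of actual positives that are detected."""
    return tp / (tp + fn)


def false_positive_rate(fp, tn):
    """FPR = FP / (FP + TN): share of actual negatives flagged as positive."""
    return fp / (fp + tn)


tpr = true_positive_rate(tp=90, fn=10)   # 90 of 100 sick patients detected
fpr = false_positive_rate(fp=20, tn=80)  # 20 of 100 healthy patients flagged

print(tpr, fpr)  # 0.9 0.2
```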

## Example
Consider a binary classification model that predicts whether a patient has a certain disease (positive class) or not (negative class). The model outputs probabilities, and we can set different thresholds to classify patients as positive or negative.

| Threshold | TP | FP | TN | FN | TPR (Recall) | FPR |
|-----------|----|----|----|----|--------------|-----|
| 0.0       | 100| 100| 0  | 0  | 1.0          | 1.0 |
| 0.2       | 95 | 80 | 20 | 5  | 0.95         | 0.8 |
| 0.4       | 85 | 60 | 40 | 15 | 0.85         | 0.6 |
| 0.6       | 70 | 30 | 70 | 30 | 0.7          | 0.3 |
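The TPR and FPR columns follow directly from the counts in each row. A minimal sketch that recomputes them, using the rows of the table as input (the variable names are illustrative):

```python
# (threshold, TP, FP, TN, FN) rows copied from the table above.
rows = [
    (0.0, 100, 100, 0, 0),
    (0.2, 95, 80, 20, 5),
    (0.4, 85, 60, 40, 15),
    (0.6, 70, 30, 70, 30),
]

roc_points = []  # one (FPR, TPR) pair per threshold
for threshold, tp, fp, tn, fn in rows:
    tpr = tp / (tp + fn)
    fpr = fp / (fp + tn)
    roc_points.append((fpr, tpr))
    print(f"threshold={threshold:.1f}  TPR={tpr:.2f}  FPR={fpr:.2f}")
```

Each `(FPR, TPR)` pair is one point on the ROC curve.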


## Plotting the ROC Curve
To plot the ROC curve, we plot the TPR (y-axis) against the FPR (x-axis) at various threshold settings. The curve starts at (0,0), where the threshold is so high that no instance is classified as positive, and ends at (1,1), where the threshold is so low that every instance is classified as positive. A model that performs no better than random guessing will produce a diagonal line from (0,0) to (1,1). A model with good predictive performance will have a curve that bows towards the top-left corner of the plot.

![ROC Curve Example](./resources/ROC%20curve%20example.png)
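The points of the curve can be computed by sweeping the threshold over the scores a model produced. The following is a minimal pure-Python sketch, not a reference implementation. The function name `roc_points`, the toy labels, and the toy scores are all hypothetical, and it assumes both classes appear in `y_true`:

```python
def roc_points(y_true, y_score):
    """Return (FPR, TPR) pairs, ordered from (0, 0) to (1, 1).

    Assumes labels are 0/1 and that both classes are present.
    """
    positives = sum(y_true)
    negatives = len(y_true) - positives
    # Sweep thresholds from high to low; infinity yields the (0, 0) point.
    thresholds = [float("inf")] + sorted(set(y_score), reverse=True)
    points = []
    for t in thresholds:
        tp = sum(1 for y, s in zip(y_true, y_score) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(y_true, y_score) if s >= t and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


# Toy data: two negatives and two positives with hypothetical scores.
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]

points = roc_points(y_true, y_score)
print(points)
# [(0.0, 0.0), (0.0, 0.5), (0.5, 0.5), (0.5, 1.0), (1.0, 1.0)]
```

The resulting pairs can be passed to any plotting library, with FPR on the x-axis and TPR on the y-axis.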


## Area Under the Curve (AUC)
The Area Under the ROC Curve (AUC) is a single scalar value that summarizes the overall performance of the model. The AUC ranges from 0 to 1, with a value of 1 indicating perfect classification and a value of 0.5 indicating no discriminative ability (random guessing). A value below 0.5 means the model ranks instances worse than random guessing. A higher AUC value indicates better model performance.
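One common way to estimate the area is the trapezoidal rule over the ROC points. The sketch below applies it to the four (FPR, TPR) pairs from the example table. The (0, 0) endpoint is an assumption added here to close the curve, and the function name `auc_trapezoid` is hypothetical:

```python
def auc_trapezoid(points):
    """Approximate the area under a curve given as (FPR, TPR) points."""
    pts = sorted(points)  # order by FPR, then TPR
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        # Area of the trapezoid between two neighbouring points.
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# (FPR, TPR) pairs from the example table, plus the assumed (0, 0) endpoint.
table_points = [(0.0, 0.0), (0.3, 0.7), (0.6, 0.85), (0.8, 0.95), (1.0, 1.0)]

auc = auc_trapezoid(table_points)
print(round(auc, 4))  # 0.7125

# Sanity checks: the random-guess diagonal and a perfect classifier.
diagonal_auc = auc_trapezoid([(0.0, 0.0), (1.0, 1.0)])
perfect_auc = auc_trapezoid([(0.0, 0.0), (0.0, 1.0), (1.0, 1.0)])
print(diagonal_auc, perfect_auc)  # 0.5 1.0
```

With only a handful of thresholds the estimate is coarse; evaluating more thresholds gives a closer approximation of the true area.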


# Confusion Matrix
A confusion matrix is a table used to evaluate the performance of a classification model. It summarizes the counts of true positive (TP), true negative (TN), false positive (FP), and false negative (FN) predictions made by the model.

|                 | Predicted Positive | Predicted Negative |
|-----------------|--------------------|--------------------|
| Actual Positive | TP                 | FN                 |
| Actual Negative | FP                 | TN                 |

![Confusion Matrix](./resources/confusion%20matrix.avif)
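The four cells can be counted directly from paired actual and predicted labels. This is a minimal sketch assuming labels are encoded as 1 (positive) and 0 (negative); the function name `confusion_counts` and the sample labels are hypothetical:

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FN, FP, TN for binary labels encoded as 0/1."""
    tp = fn = fp = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual == 1 and predicted == 1:
            tp += 1
        elif actual == 1 and predicted == 0:
            fn += 1
        elif actual == 0 and predicted == 1:
            fp += 1
        else:  # actual == 0 and predicted == 0 under the 0/1 assumption
            tn += 1
    return {"TP": tp, "FN": fn, "FP": fp, "TN": tn}


# Hypothetical labels for eight instances.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]

counts = confusion_counts(y_true, y_pred)
print(counts)  # {'TP': 3, 'FN': 1, 'FP': 1, 'TN': 3}
```

From these counts, TPR = 3 / (3 + 1) = 0.75 and FPR = 1 / (1 + 3) = 0.25, which gives one point on the ROC curve for the threshold that produced `y_pred`.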