# ROC Curve

The following are the components of a ROC curve:
- The $y$ axis represents $tpr$, the True Positive Rate
- The $x$ axis represents $fpr$, the False Positive Rate


## TPR | $y$ axis
- $tpr = \frac{tp}{tp + fn}$
- Where
    - $tp \rightarrow$ Number of actual positives classified as positives
    - $fn \rightarrow$ Number of actual positives classified as negatives

## FPR | $x$ axis
- $fpr = \frac{fp}{fp + tn}$
- Where
    - $fp \rightarrow$ Number of actual negatives classified as positives
    - $tn \rightarrow$ Number of actual negatives classified as negatives
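
As a quick check of the two formulas, here is a minimal sketch that computes both rates from the four counts. The helper name `rates` and the counts are hypothetical and chosen only for illustration.

```python
def rates(tp, fn, fp, tn):
    """Return (tpr, fpr) from the four confusion-matrix counts."""
    # Fall back to 0.0 when a class is empty so the ratio stays defined.
    tpr = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    fpr = fp / (fp + tn) if (fp + tn) > 0 else 0.0
    return tpr, fpr


# Hypothetical counts: 10 actual positives, 20 actual negatives.
tpr, fpr = rates(tp=8, fn=2, fp=5, tn=15)
print(f"tpr = {tpr:.2f}, fpr = {fpr:.2f}")  # tpr = 0.80, fpr = 0.25
```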

## However $...$
- At a single fixed threshold, a classifier produces only one set of counts $(tp, fn, fp, tn)$, and therefore only one point $(fpr, tpr)$ rather than a curve


## Threshold Sweep
- To create the plot, we need to evaluate the classifier at various thresholds
    - within the range $[0, 1]$
    - each threshold yields one $(fpr, tpr)$ point
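
The sweep can be sketched in plain Python as follows. The function name `roc_points`, the labels, the scores, and the 11-step threshold grid are assumptions made for illustration. A score is treated as a positive prediction when it is greater than or equal to the threshold.

```python
def roc_points(labels, scores, thresholds):
    """Return one (threshold, fpr, tpr) tuple per threshold."""
    points = []
    for t in thresholds:
        # Predict positive when the score reaches the threshold.
        predicted = [1 if s >= t else 0 for s in scores]
        pairs = list(zip(labels, predicted))
        tp = sum(1 for y, p in pairs if y == 1 and p == 1)
        fn = sum(1 for y, p in pairs if y == 1 and p == 0)
        fp = sum(1 for y, p in pairs if y == 0 and p == 1)
        tn = sum(1 for y, p in pairs if y == 0 and p == 0)
        tpr = tp / (tp + fn) if (tp + fn) > 0 else 0.0
        fpr = fp / (fp + tn) if (fp + tn) > 0 else 0.0
        points.append((t, fpr, tpr))
    return points


# Illustrative data: one negative (0.65) and one positive (0.45) overlap.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
scores = [0.15, 0.25, 0.35, 0.65, 0.45, 0.75, 0.85, 0.95]
thresholds = [i / 10 for i in range(11)]  # 0.0, 0.1, ..., 1.0

points = roc_points(labels, scores, thresholds)
for t, fpr, tpr in points:
    print(f"threshold = {t:.1f} | fpr = {fpr:.2f} | tpr = {tpr:.2f}")
```

At threshold 0.0 every sample is predicted positive, giving the point $(1, 1)$. At threshold 1.0 no sample is predicted positive, giving $(0, 0)$. The thresholds in between trace the curve connecting those two corners.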

## Summary
- The ROC curve is the plot of $tpr$ versus $fpr$ obtained from a threshold sweep over $[0, 1]$
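
For comparison, scikit-learn provides `roc_curve` in `sklearn.metrics`, which performs the sweep and picks its thresholds from the observed scores rather than from a fixed grid. The following is a minimal sketch, assuming scikit-learn is installed. The labels and scores are illustrative, non-overlapping values.

```python
from sklearn.metrics import roc_curve

# Illustrative data: every negative scores below every positive.
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_score = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]

# roc_curve returns three arrays of equal length.
fpr, tpr, thresholds = roc_curve(y_true, y_score)

for f, t, th in zip(fpr, tpr, thresholds):
    print(f"threshold = {th:.2f} | fpr = {f:.2f} | tpr = {t:.2f}")
```

Because these scores do not overlap, one of the returned points is $(fpr, tpr) = (0, 1)$, the top-left corner of the plot.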

In [6]:
import numpy as np
import pandas as pd
from sklearn.metrics import confusion_matrix


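def confusion(df, threshold):
    # Scores at or above the threshold are predicted positive (1).
    # Note: this adds a 'predicted' column to the caller's DataFrame.
    df['predicted'] = (df['scores'] >= threshold).astype(int)
    # labels=[0, 1] keeps the matrix 2x2 even when only one class appears,
    # so the ravel() below always yields four values.
    cm = confusion_matrix(df['labels'], df['predicted'], labels=[0, 1])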
    print(df, end='\n\n')

    # Extract values from confusion matrix
    TN, FP, FN, TP = cm.ravel()

    # Calculate metrics (each ratio falls back to 0 when its denominator is 0)
    TPR = TP / (TP + FN) if (TP + FN) > 0 else 0
    FPR = FP / (FP + TN) if (FP + TN) > 0 else 0
    TNR = TN / (TN + FP) if (TN + FP) > 0 else 0  # Specificity
    precision = TP / (TP + FP) if (TP + FP) > 0 else 0
    accuracy = (TP + TN) / (TP + TN + FP + FN)

    # Print metrics clearly
    print(" " * 10 + "METRICS")
    print("-" * 25)
    print(f"Threshold         | {threshold:.3f}")
    print(f"TPR (Recall)      | {TPR:.3f}")
    print(f"FPR               | {FPR:.3f}")
    print(f"TNR (Specificity) | {TNR:.3f}")
    print(f"Precision         | {precision:.3f}")
    print(f"Accuracy          | {accuracy:.3f}")
    print("-" * 25, end='\n\n')

    # Create confusion matrix DataFrame
    cm_df = pd.DataFrame(
        cm,
        columns=['Predicted Negative', 'Predicted Positive'],
        index=['Actual Negative', 'Actual Positive']
    )

    return cm_df


# Perfect classifier - no overlap
perfect_negatives = [0.1, 0.2, 0.3, 0.4]  # All low scores
perfect_positives = [0.6, 0.7, 0.8, 0.9]  # All high scores

labels = [0, 0, 0, 0, 1, 1, 1, 1]  # 4 negatives, 4 positives
scores = perfect_negatives + perfect_positives

df = pd.DataFrame({"labels": labels, "scores": scores})

confusion(df, 0.5)

   labels  scores  predicted
0       0     0.1          0
1       0     0.2          0
2       0     0.3          0
3       0     0.4          0
4       1     0.6          1
5       1     0.7          1
6       1     0.8          1
7       1     0.9          1

          METRICS
-------------------------
Threshold         | 0.500
TPR (Recall)      | 1.000
FPR               | 0.000
TNR (Specificity) | 1.000
Precision         | 1.000
Accuracy          | 1.000
-------------------------



                 Predicted Negative  Predicted Positive
Actual Negative                   4                   0
Actual Positive                   0                   4
