# Recall, Precision, PR Curve and ROC Curve Explained

Sources:

- Doug Steen, Precision-Recall Curves, https://medium.com/@douglaspsteen/precision-recall-curves-d32e5b290248

- Juan C Olamendy, Choosing the Right Metrics: Recall, Precision, PR Curve and ROC Curve Explained, https://medium.com/@douglaspsteen/precision-recall-curves-d32e5b290248

- Chris Kuo/Dr. Dataman, Revisiting the ROC and the Precision-Recall Curves, https://medium.com/dataman-in-ai/revisiting-the-roc-and-the-precision-recall-curves-f9c4975b1dd

- Maria Gusarova, Understanding AUC — ROC and Precision-Recall Curves, https://medium.com/@data.science.enthusiast/auc-roc-curve-ae9180eaf4f7

- Juan Esteban de la Calle, How and Why I Switched from the ROC Curve to the Precision-Recall Curve to Analyze My Imbalanced Models: A Deep Dive, https://juandelacalle.medium.com/how-and-why-i-switched-from-the-roc-curve-to-the-precision-recall-curve-to-analyze-my-imbalanced-6171da91c6b8


#### Precision and Recall

**Recall**:

Recall = True Positives / (True Positives + False Negatives)

**Precision:**

Precision = True Positives / (True Positives + False Positives)

**Example:**
- If there were 100 people with a disease and the test correctly identified 80 of them, the recall would be 0.8.
- If the test predicted that 50 people had the disease, but only 30 of them actually did, the precision would be 0.6.
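
The arithmetic behind the two bullets above can be checked with a minimal sketch. The helper names `recall_from_counts` and `precision_from_counts` are made up for illustration, and the two bullets are treated as separate scenarios (the false-negative and false-positive counts are derived from the numbers given).

```python
def recall_from_counts(tp, fn):
    """Recall = TP / (TP + FN)."""
    return tp / (tp + fn)


def precision_from_counts(tp, fp):
    """Precision = TP / (TP + FP)."""
    return tp / (tp + fp)


# Scenario 1: 100 people have the disease, the test finds 80 of them,
# so 20 sick people are missed (false negatives).
recall_example = recall_from_counts(tp=80, fn=20)

# Scenario 2: the test flags 50 people, 30 of whom are truly sick,
# so 20 healthy people are flagged (false positives).
precision_example = precision_from_counts(tp=30, fp=20)

print(f"Recall: {recall_example:.2f}")
print(f"Precision: {precision_example:.2f}")
```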


In [None]:
from IPython.display import Image
Image(filename="recallprecision1.png")

In [None]:
from sklearn.metrics import precision_score, recall_score
import matplotlib.pyplot as plt

In [None]:
# Assume you have the true labels (y_true) and predicted labels (y_pred)
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 1, 1, 1, 0, 0, 0, 1, 1, 0]
# Calculate precision
precision = precision_score(y_true, y_pred)
print(f"Precision: {precision:.2f}")
# Calculate recall
recall = recall_score(y_true, y_pred)
print(f"Recall: {recall:.2f}")

#### The Precision-Recall (PR) Curve


- In a **PR curve**, precision is plotted on the y-axis, and recall is plotted on the x-axis. 
- Each point on the curve represents a different threshold value. 
- As the threshold varies, the balance between precision and recall changes.
- **High Precision and Low Recall:** This indicates that the model is very accurate in its positive predictions but fails to capture a significant number of actual positive cases.
- **Low Precision and High Recall:** This suggests that the model captures most of the positive cases but at the expense of making more false positive errors.

In [None]:
from IPython.display import Image
Image(filename="recallprecision2.png")

In [None]:
from sklearn.metrics import precision_recall_curve
import matplotlib.pyplot as plt

In [None]:
# Train a Logistic Regression classifier
# model = LogisticRegression()
# model.fit(X_train, y_train)
# Predict probabilities for the test set
# y_scores = model.predict_proba(X_test)[:, 1]  # Get probabilities for the positive class
# Assume you have the true labels (y_true) and predicted probabilities (y_scores)
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_scores = [0.8, 0.6, 0.9, 0.7, 0.4, 0.6, 0.3, 0.5, 0.8, 0.2]
# Compute precision-recall curve
# precision is an array of precision values at different thresholds.
# recall is an array of recall values at different thresholds.
# thresholds is an array of threshold values used to compute precision and recall.
precision, recall, thresholds = precision_recall_curve(y_true, y_scores)

In [None]:
# Plot precision-recall curve
plt.figure(figsize=(8, 6))
plt.plot(recall, precision, marker='.', label='PR curve')  # label is needed for plt.legend()
plt.xlabel('Recall')
plt.ylabel('Precision')
plt.title('Precision-Recall Curve')
plt.legend(loc='lower left')
plt.grid(True)
plt.show()

- The **PR curve** can be used to select an appropriate threshold for making predictions. 
- By examining the curve, you can find the point where precision begins to drop significantly and set the threshold just before this drop.
- This allows you to balance both precision and recall effectively. 
- Once the threshold is identified, predictions can be made by checking whether the model’s score for each instance is greater than or equal to this threshold.
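
One way to turn the threshold-selection idea above into code is sketched below, reusing the toy labels and scores from the earlier cell. The precision requirement `min_precision = 0.8` is a hypothetical value chosen for illustration; the sketch picks the lowest threshold that still meets it, which is also the one with the highest recall among the acceptable thresholds.

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 0])
y_scores = np.array([0.8, 0.6, 0.9, 0.7, 0.4, 0.6, 0.3, 0.5, 0.8, 0.2])

precision, recall, thresholds = precision_recall_curve(y_true, y_scores)

# Hypothetical requirement: at least 80% of positive predictions must be correct.
min_precision = 0.8

# precision and recall have one more entry than thresholds; drop the final
# (precision=1, recall=0) point so the arrays line up with thresholds.
acceptable = np.flatnonzero(precision[:-1] >= min_precision)
if acceptable.size == 0:
    raise ValueError("No threshold reaches the requested precision")

# thresholds are sorted in increasing order, so the first acceptable index
# is the lowest threshold, which keeps recall as high as possible.
chosen = acceptable[0]
chosen_threshold = thresholds[chosen]

# Predict positive when the score is greater than or equal to the threshold.
y_pred = (y_scores >= chosen_threshold).astype(int)

print(f"Threshold: {chosen_threshold:.2f}")
print(f"Precision: {precision[chosen]:.2f}, Recall: {recall[chosen]:.2f}")
```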

**PR-AUC (Area Under the PR Curve):**
- Summary metric that captures the model’s performance across all thresholds.
- **Perfect classifier: PR-AUC = 1.0** (perfect precision and recall at all thresholds).
- **Random classifier: PR-AUC equal to the proportion of positive labels in the dataset** (no better than chance performance).
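
As a rough illustration of these two reference points, the sketch below uses `average_precision_score`, a common step-wise estimate of the area under the PR curve. The first part reuses the toy labels and scores from the earlier cell; the second part uses synthetic labels with about 10% positives and purely random scores, both invented here only to show that the result lands near the positive rate.

```python
import numpy as np
from sklearn.metrics import average_precision_score

y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 0])
y_scores = np.array([0.8, 0.6, 0.9, 0.7, 0.4, 0.6, 0.3, 0.5, 0.8, 0.2])

# PR-AUC estimate for the toy model, and the chance-level baseline
# (the proportion of positive labels).
pr_auc = average_precision_score(y_true, y_scores)
baseline = y_true.mean()
print(f"PR-AUC (average precision): {pr_auc:.3f}, baseline: {baseline:.3f}")

# Synthetic check of the random-classifier claim: labels with roughly
# 10% positives and scores that carry no information about the label.
rng = np.random.default_rng(0)
y_random = (rng.random(20000) < 0.10).astype(int)
random_scores = rng.random(20000)
random_pr_auc = average_precision_score(y_random, random_scores)
print(f"Random scores PR-AUC: {random_pr_auc:.3f}, "
      f"positive rate: {y_random.mean():.3f}")
```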

#### The Receiver Operating Characteristic (ROC) Curve

- The ROC curve is used to evaluate binary classification models.

- It plots the **True Positive Rate (TPR)** against the **False Positive Rate (FPR)** at various threshold settings.

- **True Positive Rate (TPR):**
    * Also called **recall** or **sensitivity.**
    * Ratio of positive instances correctly classified as positive
    
- **True Negative Rate (TNR):**
    * Also called **specificity.**
    * Ratio of negative instances correctly classified as negative

- **False Positive Rate (FPR):**
    * Ratio of negative instances that are incorrectly classified as positive. 
    * Equal to 1 - True Negative Rate (TNR).
      
- **ROC-AUC (Area Under the ROC Curve):**
    * A single scalar value that summarizes the overall ability of the model to discriminate between the positive and negative classes over all possible thresholds.
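
The rates defined above can be computed directly from confusion-matrix counts. The following is a minimal sketch in plain Python that reuses the toy `y_true` and `y_pred` lists from the precision and recall cell; it evaluates a single set of hard predictions, i.e. one point on the ROC curve.

```python
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 1, 1, 1, 0, 0, 0, 1, 1, 0]

# Count the four confusion-matrix cells.
tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)

tpr = tp / (tp + fn)  # recall / sensitivity
tnr = tn / (tn + fp)  # specificity
fpr = fp / (fp + tn)  # equals 1 - TNR

print(f"TPR: {tpr:.2f}, TNR: {tnr:.2f}, FPR: {fpr:.2f}")
```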

In [None]:
from sklearn.metrics import roc_curve, roc_auc_score

In [None]:
# Predict probabilities for the test set
# y_scores = model.predict_proba(X_test)[:, 1]  # probabilities for the positive class
# Assume you have the true labels (y_true) and predicted probabilities (y_scores)
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_scores = [0.8, 0.6, 0.9, 0.7, 0.4, 0.6, 0.3, 0.5, 0.8, 0.2]
# Compute ROC curve and AUC score
fpr, tpr, thresholds = roc_curve(y_true, y_scores)
roc_auc = roc_auc_score(y_true, y_scores)

In [None]:
# Plot ROC curve
plt.figure(figsize=(8, 6))
plt.plot(fpr, tpr, color='blue', lw=2, label=f'ROC curve (AUC = {roc_auc:.2f})')
plt.plot([0, 1], [0, 1], color='red', lw=2, linestyle='--', label='Random Guess')
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Receiver Operating Characteristic (ROC) Curve')
plt.legend(loc="lower right")
plt.grid(True)
plt.show()

#### Curve Analysis of ROC Curve

- **A curve closer to the top-left corner** indicates a high sensitivity and specificity: the model is effective in classifying both classes correctly.
- **A higher curve** indicates better performance, with the ideal point being in the top left corner of the plot (high TPR, low FPR).
- **A curve near the diagonal line** (from bottom-left to top-right) indicates that the classifier is performing no better than random guessing.
- **ROC-AUC (Area Under the ROC Curve)** ranges from 0.0 to 1.0:
    * 0.5: This indicates a model with no discriminative ability, equivalent to random guessing.
    * 1.0: This represents a perfect model that correctly classifies all positive and negative instances.
    * < 0.5: This suggests a model that performs worse than random chance, often indicating serious issues in model training or data handling.
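    * **ROC-AUC** is not affected by the proportion of positive and negative instances, because TPR and FPR are each computed within a single class.
    * This makes it easier to compare across datasets with different class balance than accuracy, but when positives are rare it can look optimistic, so it is worth checking the PR curve as well.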

#### Key Benefits of Using **ROC-AUC**

- **Robust to Class Imbalance:** Unlike accuracy, ROC-AUC is not influenced by the number of cases in each class, making it suitable for imbalanced datasets.
- **Threshold Independence:** It evaluates the model’s performance across all possible thresholds, providing a comprehensive measure of its effectiveness.
- **Scale Invariance:** The ROC-AUC is not affected by the scale of the scores or probabilities generated by the model, assessing performance based on the ranking of predictions.
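
The scale-invariance point can be illustrated with a short sketch: applying a strictly increasing transformation to the scores leaves the ranking, and therefore the ROC-AUC, unchanged. The two transformations below (a linear rescaling and a logarithm) are arbitrary choices made for illustration, applied to the toy scores used earlier.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 0])
y_scores = np.array([0.8, 0.6, 0.9, 0.7, 0.4, 0.6, 0.3, 0.5, 0.8, 0.2])

# ROC-AUC on the original scores.
auc_raw = roc_auc_score(y_true, y_scores)

# Strictly increasing transformations keep the ordering of the scores,
# so the ROC-AUC should be identical.
auc_scaled = roc_auc_score(y_true, 100 * y_scores - 7)
auc_log = roc_auc_score(y_true, np.log(y_scores))

print(f"raw: {auc_raw:.2f}, rescaled: {auc_scaled:.2f}, log: {auc_log:.2f}")
```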

#### Threshold Selection using the ROC Curve

The ROC curve can be used to select an appropriate threshold for making predictions:

- Lowering the threshold means the model classifies more instances as positive, which increases recall (TPR) but also raises the FPR and usually lowers precision.
- This trade-off needs to be managed based on the application’s tolerance for false positives and false negatives.
- If precision and recall are each plotted against the threshold, the point where the two curves cross can be considered a reasonable balance, especially when false positives and false negatives carry similar costs.
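
The crossing-point heuristic can be sketched as follows, again on the toy labels and scores used earlier. Because the curves are only evaluated at a finite set of thresholds, the sketch looks for the threshold where precision and recall are closest rather than exactly equal; this is one simple reading of the heuristic, not the only possible one.

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 0])
y_scores = np.array([0.8, 0.6, 0.9, 0.7, 0.4, 0.6, 0.3, 0.5, 0.8, 0.2])

precision, recall, thresholds = precision_recall_curve(y_true, y_scores)

# Drop the final (precision=1, recall=0) point, which has no threshold,
# then find where the two curves are closest to each other.
gap = np.abs(precision[:-1] - recall[:-1])
best = int(np.argmin(gap))
crossing_threshold = thresholds[best]

print(f"Threshold: {crossing_threshold:.2f}")
print(f"Precision: {precision[best]:.2f}, Recall: {recall[best]:.2f}")
```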

#### PR Curve vs. ROC Curve

**When to Use the PR Curve:**

- **Imbalanced Datasets:** 
    * When the positive class is rare, and the dataset is heavily imbalanced, the PR curve is more informative than the ROC curve. 
    * In imbalanced datasets with rare positive instances, the ROC curve can be misleading, showing high performance even if the model performs poorly on the minority class.
    * Examples: fraud detection, disease diagnosis. 
- **Costly False Positives:** If false positives are more costly or significant than false negatives, the PR curve is more suitable as it focuses on precision. Example: spam email detection.

**When to Use the ROC Curve:**

- **More Balanced Datasets:** When the dataset is more balanced or when equal emphasis is placed on the performance regarding both false positives and false negatives, the ROC curve is preferred.
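
To make the contrast concrete, the sketch below scores the same hypothetical model on a balanced and a heavily imbalanced dataset. Everything here is synthetic and assumed for illustration: scores are drawn from two normal distributions (positives centred at 1.5, negatives at 0), and the helper `simulate` is not part of any library. Under these assumptions the ROC-AUC should stay roughly the same in both settings, while the average precision should drop sharply once positives become rare.

```python
import numpy as np
from sklearn.metrics import average_precision_score, roc_auc_score

rng = np.random.default_rng(0)


def simulate(n_pos, n_neg):
    """Return (ROC-AUC, average precision) for synthetic scores."""
    y = np.concatenate([np.ones(n_pos, dtype=int), np.zeros(n_neg, dtype=int)])
    # Assumed score distributions: same separation in both settings.
    scores = np.concatenate([
        rng.normal(1.5, 1.0, n_pos),  # positives
        rng.normal(0.0, 1.0, n_neg),  # negatives
    ])
    return roc_auc_score(y, scores), average_precision_score(y, scores)


roc_balanced, ap_balanced = simulate(n_pos=5000, n_neg=5000)
roc_imbalanced, ap_imbalanced = simulate(n_pos=100, n_neg=9900)

print(f"Balanced:   ROC-AUC = {roc_balanced:.2f}, AP = {ap_balanced:.2f}")
print(f"Imbalanced: ROC-AUC = {roc_imbalanced:.2f}, AP = {ap_imbalanced:.2f}")
```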

#### Example: Heart Disease Diagnosis

In [None]:
!pip install ucimlrepo

In [None]:
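import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from ucimlrepo import fetch_ucirepo

# fetch dataset
heart_disease = fetch_ucirepo(id=45)

# data (as pandas dataframes)
X = heart_disease.data.features
y = heart_disease.data.targets

# The cells below expect complete features and a binary target stored as a Series.
# Assumption: the single target column is 0 for no disease and 1-4 for
# increasing severity, so any value above 0 is treated as the positive class.
complete_rows = X.notna().all(axis=1)
X = X[complete_rows]
y = (y.loc[complete_rows].iloc[:, 0] > 0).astype(int)

# metadata
print(heart_disease.metadata)

# variable information
print(heart_disease.variables)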

In [None]:
# Split data into train and test sets
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=56)

In [None]:
# Fit a vanilla Logistic Regression classifier and make predictions
from sklearn.linear_model import LogisticRegression

clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)
y_pred_test = clf.predict(X_test)

In [None]:
# Function to calculate Precision and Recall

def calc_precision_recall(y_true, y_pred):
    
    # Convert predictions to series with index matching y_true:
    y_pred = pd.Series(y_pred, index=y_true.index)
    
    # Instantiate counters:
    TP = 0
    FP = 0
    FN = 0

    # Determine whether each prediction is TP, FP, or FN (TNs are not needed here):
    for i in y_true.index:
        if y_true[i] == y_pred[i] == 1:
            TP += 1
        if y_pred[i] == 1 and y_true[i] != y_pred[i]:
            FP += 1
        if y_pred[i] == 0 and y_true[i] != y_pred[i]:
            FN += 1

    # Calculate precision and recall.
    # Fall back to 1 when the denominator is 0 (no positive predictions or no positives):
    try:
        precision = TP / (TP + FP)
    except ZeroDivisionError:
        precision = 1

    try:
        recall = TP / (TP + FN)
    except ZeroDivisionError:
        recall = 1

    return precision, recall

# Test the function:
calc_precision_recall(y_test, y_pred_test)

In [None]:
# LOGISTIC REGRESSION (NO REGULARIZATION)

# Fit and predict test class probabilities.
# penalty=None disables regularization (the string 'none' was removed in recent scikit-learn releases):
lr = LogisticRegression(max_iter=10000, penalty=None)
lr.fit(X_train, y_train)
y_test_probs = lr.predict_proba(X_test)[:,1]

# Containers for precision / recall scores:
precision_scores = []
recall_scores = []

# Define probability thresholds to use, between 0 and 1:
probability_thresholds = np.linspace(0, 1, num=100)

# Find precision / recall for each threshold:
for p in probability_thresholds:
    
    y_test_preds = []
    
    for prob in y_test_probs:
        if prob > p:
            y_test_preds.append(1)
        else:
            y_test_preds.append(0)
            
    precision, recall = calc_precision_recall(y_test, y_test_preds)
        
    precision_scores.append(precision)
    recall_scores.append(recall)

In [None]:
# LOGISTIC REGRESSION (L2 REGULARIZATION)

# Fit and predict test class probabilities
lr_l2 = LogisticRegression(max_iter=1000, penalty='l2')
lr_l2.fit(X_train, y_train)
y_test_probs = lr_l2.predict_proba(X_test)[:,1]

# Containers for precision / recall scores
l2_precision_scores = []
l2_recall_scores = []

# Define probability thresholds to use, between 0 and 1
probability_thresholds = np.linspace(0,1,num=100)

# Find precision / recall for each threshold
for p in probability_thresholds:
    
    y_test_preds = []
    
    for prob in y_test_probs:
        if prob > p:
            y_test_preds.append(1)
        else:
            y_test_preds.append(0)
            
    precision, recall = calc_precision_recall(y_test, y_test_preds)
        
    l2_precision_scores.append(precision)
    l2_recall_scores.append(recall)

In [None]:
# Plot precision-recall curve

fig, ax = plt.subplots(figsize=(6,6))
ax.plot(recall_scores, precision_scores, label='Logistic Regression')
ax.plot(l2_recall_scores, l2_precision_scores, label='L2 Logistic Regression')
baseline = len(y_test[y_test==1]) / len(y_test)
ax.plot([0, 1], [baseline, baseline], linestyle='--', label='Baseline')
ax.set_xlabel('Recall')
ax.set_ylabel('Precision')
ax.legend(loc='center left')

In [None]:
# Get AUC-PR scores

from sklearn.metrics import auc, average_precision_score

print(f'LR (No reg.) AUC-PR: {round(auc(recall_scores, precision_scores),2)}')
print(f'LR (L2 reg.) AUC-PR: {round(auc(l2_recall_scores, l2_precision_scores),2)}')
print('\n')
print(f'LR (No reg.) Avg. Prec.: {round(average_precision_score(y_test, lr.predict_proba(X_test)[:,1]),2)}')
print(f'LR (L2 reg.) Avg. Prec.: {round(average_precision_score(y_test, lr_l2.predict_proba(X_test)[:,1]),2)}')

In [None]:
# Use sklearn to plot precision-recall curves and ROC curves

# from sklearn.metrics import plot_precision_recall_curve
# plot_precision_recall_curve(lr, X_test, y_test, name = 'Logistic Regression')
# plot_precision_recall_curve(lr_l2, X_test, y_test, name = 'L2 Logistic Regression')

from sklearn.metrics import PrecisionRecallDisplay, RocCurveDisplay
PrecisionRecallDisplay.from_estimator(lr, X_test, y_test, name = 'Logistic Regression')
PrecisionRecallDisplay.from_estimator(lr_l2, X_test, y_test, name = 'L2 Logistic Regression')

RocCurveDisplay.from_estimator(lr, X_test, y_test, name = 'Logistic Regression')
RocCurveDisplay.from_estimator(lr_l2, X_test, y_test, name = 'L2 Logistic Regression')