# Binary Classification Threshold Tuning

In this notebook we'll use another dataset to look at threshold tuning. This time we won't fall into the same trap as before, where we kept retraining a model without a validation and test set. Instead, we train one model and keep using it.

## Import data and create model

You did this before, so the following code should hold no surprises.

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import (
    confusion_matrix, classification_report, roc_curve, auc
)

# Load dataset
data = load_breast_cancer()
X, y = data.data, data.target

# Split dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, test_size=0.2, random_state=42)

# Fit model
model = LogisticRegression(max_iter=10000)
model.fit(X_train, y_train)

# Predict probabilities
probs = model.predict_proba(X_test)[:, 1]

## The ROC-curve

Plot the ROC-curve again.

In [None]:
# Up to you!



This is a very good ROC-curve. The area under the curve is approximately 1, which means the model separates the two classes almost perfectly on the test set. But even so (perhaps especially so) it's a good idea to investigate which threshold is best for our use case.
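
One possible way to produce the curve and its AUC is sketched below. It is not the only solution to the exercise. It repeats the setup from the first cell so that it runs on its own, and it relies on a notebook's inline display (in a plain script you would call `plt.show()` at the end).

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import auc, roc_curve
from sklearn.model_selection import train_test_split

# Same setup as in the first cell, repeated so this sketch is self-contained
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, test_size=0.2, random_state=42
)
model = LogisticRegression(max_iter=10000)
model.fit(X_train, y_train)
probs = model.predict_proba(X_test)[:, 1]

# roc_curve returns one (fpr, tpr) point per candidate threshold
fpr, tpr, thresholds = roc_curve(y_test, probs)
roc_auc = auc(fpr, tpr)

fig, ax = plt.subplots()
ax.plot(fpr, tpr, label=f"ROC curve (AUC = {roc_auc:.3f})")
ax.plot([0, 1], [0, 1], linestyle="--", label="Random guess")
ax.set_xlabel("False positive rate (1 - specificity)")
ax.set_ylabel("True positive rate (sensitivity)")
ax.set_title("ROC curve")
ax.legend(loc="lower right")
```

Each point on the curve corresponds to one entry in `thresholds`, which is why the curve alone does not tell you which threshold to pick.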

Start with the following function.

In [None]:
# Evaluate one threshold; the positive class is label 1 (benign in this dataset)
def evaluate_threshold(threshold):
    y_pred_thresh = (probs >= threshold).astype(int)
    cm = confusion_matrix(y_test, y_pred_thresh)
    tn, fp, fn, tp = cm.ravel()

    sensitivity = tp / (tp + fn) if (tp + fn) > 0 else 0  # Recall
    specificity = tn / (tn + fp) if (tn + fp) > 0 else 0

    print(f"\nThreshold: {threshold:.2f}")
    # print("Confusion Matrix:")
    print(cm)
    print(f"Sensitivity (Recall): {sensitivity:.3f}")
    print(f"Specificity:          {specificity:.3f}")
    # print("\nClassification Report:")
    # print(classification_report(y_test, y_pred_thresh))
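
The function above reads `probs` and `y_test` from the notebook's global scope and only prints its results. A minimal sketch of a variant that takes the data as arguments and returns the numbers could look like this. `threshold_metrics` is a hypothetical name, not part of the notebook, and the toy arrays are made up purely to show the mechanics.

```python
import numpy as np
from sklearn.metrics import confusion_matrix


def threshold_metrics(y_true, scores, threshold):
    """Return confusion counts, sensitivity and specificity at one threshold.

    Hypothetical helper (not part of the notebook): label 1 is treated as
    the positive class, exactly as in evaluate_threshold.
    """
    y_pred = (np.asarray(scores) >= threshold).astype(int)
    # labels=[0, 1] pins the matrix to 2x2 whatever classes occur in the inputs
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    sensitivity = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    specificity = tn / (tn + fp) if (tn + fp) > 0 else 0.0
    return {
        "tn": int(tn), "fp": int(fp), "fn": int(fn), "tp": int(tp),
        "sensitivity": float(sensitivity), "specificity": float(specificity),
    }


# Tiny made-up example, only to show the mechanics
toy_labels = [0, 0, 1, 1]
toy_scores = [0.10, 0.40, 0.35, 0.80]
print(threshold_metrics(toy_labels, toy_scores, 0.5))
print(threshold_metrics(toy_labels, toy_scores, 0.3))
```

On the notebook's data this would be called as `threshold_metrics(y_test, probs, 0.5)`. Returning a dictionary makes it easy to collect the results for several thresholds and compare them side by side.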


Try the following thresholds:

.15, .3, .5, .7, .9

In [None]:
# Up to you!



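(You can uncomment the confusion matrix heading or the full classification report in the function if you'd like.) When you scroll through the output from the lowest to the highest threshold, you'll notice that, with malignant as the class we want to detect, the precision goes down and the recall (or sensitivity) goes up. Be careful when reading the printed numbers: in `load_breast_cancer` label 1 means benign, so `probs` holds the probability of a benign tumor. The sensitivity printed by `evaluate_threshold` therefore refers to the benign class and moves in the opposite direction, while the printed specificity is the recall for malignant tumors. Remember we are working with a breast cancer dataset. We don't want to leave cancer untreated, but the treatment is also nothing to look forward to.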

* Threshold 0.15: No false positives, 5 false negatives.
    * We won't start treating anybody who's not sick, but we send 5 women home without treatment.
* ...
* Threshold 0.90: 11 false positives, no false negatives.
    * We'll treat 11 healthy people, but nobody goes home with cancer.
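
The trade-off in the list above can be tabulated for all five thresholds at once. The sketch below repeats the notebook's setup so it runs on its own, and it counts errors in medical terms: in scikit-learn's copy of this dataset label 0 is malignant and label 1 is benign, so `probs` is the predicted probability of a benign tumor. `count_errors` is a hypothetical helper, and the exact counts depend on the fitted model, so no expected output is shown for them.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

data = load_breast_cancer()
print(", ".join(data.target_names))  # prints: malignant, benign (labels 0 and 1)

# Same setup as in the first cell, repeated so this sketch is self-contained
X, y = data.data, data.target
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, test_size=0.2, random_state=42
)
model = LogisticRegression(max_iter=10000)
model.fit(X_train, y_train)
probs = model.predict_proba(X_test)[:, 1]  # P(label 1) = P(benign)


def count_errors(y_true, p_benign, threshold):
    """Count both kinds of mistakes when 'benign' is predicted for p_benign >= threshold."""
    pred_benign = p_benign >= threshold
    # Malignant tumors (label 0) that were predicted benign: patients sent home
    missed_cancers = int(np.sum((y_true == 0) & pred_benign))
    # Benign tumors (label 1) that were predicted malignant: healthy people treated
    healthy_treated = int(np.sum((y_true == 1) & ~pred_benign))
    return missed_cancers, healthy_treated


results = []
for threshold in [0.15, 0.3, 0.5, 0.7, 0.9]:
    missed, treated = count_errors(y_test, probs, threshold)
    results.append((threshold, missed, treated))
    print(f"threshold={threshold:.2f}  missed cancers={missed}  healthy treated={treated}")
```

Raising the threshold can only move predictions from benign to malignant, so the number of missed cancers never increases and the number of healthy people treated never decreases. Where to stop along that path is the medical judgment call.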

Luckily, this is not a decision an IT professional should make. It should be worked out carefully together with people who have a medical background (in other words, domain specialists).