# Machine Learning Models – From Static Rules to Adaptive Risk Scoring

This notebook introduces machine learning models to address the limitations
observed in the rule-based baseline.

The goal is **not** to maximize model metrics in isolation,
but to demonstrate how adaptive risk scoring improves
fraud detection **decision quality and business outcomes**.


In [1]:
import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from sklearn.model_selection import train_test_split
from sklearn.metrics import (
    confusion_matrix,
    precision_score,
    recall_score,
    roc_auc_score
)
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import IsolationForest

pd.set_option("display.max_columns", 200)
pd.set_option("display.width", 120)

# Load dataset
paths = ["data/creditcard.csv", "creditcard.csv"]
data_path = next((p for p in paths if os.path.exists(p)), None)

if data_path is None:
    raise FileNotFoundError("creditcard.csv not found.")

df = pd.read_csv(data_path)
df.shape


(284807, 31)

## Modeling Strategy

We use two complementary approaches:

### 1. Unsupervised Learning (Isolation Forest)
- Detects **unknown or emerging fraud patterns**
- Does not rely on historical fraud labels
- Useful for anomaly discovery and early warning

### 2. Supervised Learning (XGBoost)
- Learns from known fraud examples
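- Uses class weighting (`scale_pos_weight`) to favor fraud recall; false positives are then controlled through the decision threshold
- Produces a continuous risk score; because class weighting inflates predicted probabilities, the score is best read as a ranking unless it is calibrated separately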

This mirrors real-world fraud systems,
where multiple signals are combined rather than relying on a single model.


In [2]:
X = df.drop(columns=["Class"])
y = df["Class"]

# Standardize features for Isolation Forest
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)


## Train–Test Split

We hold out 30% of the data as a test set to evaluate how well the models generalize.
Stratifying on the label keeps the fraud rate the same in the training and test sets,
so the evaluation reflects the real-world class imbalance.


In [3]:
X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, y, test_size=0.3, stratify=y, random_state=42
)
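In the cells above, the scaler is fitted on the full dataset before the split, so test-set statistics feed into the transformation. Tree-based models such as the two used here are largely insensitive to per-feature scaling, so the impact is likely negligible, but fitting on training rows only is the safer habit. A minimal sketch of that ordering, using synthetic data (`X_demo`, `y_demo`, and `demo_scaler` are hypothetical stand-ins, not part of the notebook):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the feature matrix and labels (hypothetical data)
rng = np.random.default_rng(42)
X_demo = rng.normal(loc=5.0, scale=3.0, size=(1000, 4))
y_demo = (rng.random(1000) < 0.05).astype(int)

# Split the raw features first ...
X_tr_raw, X_te_raw, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, stratify=y_demo, random_state=42
)

# ... then learn the scaling statistics from the training rows only
demo_scaler = StandardScaler().fit(X_tr_raw)
X_tr = demo_scaler.transform(X_tr_raw)
X_te = demo_scaler.transform(X_te_raw)

print(X_tr.mean(axis=0).round(6))  # zero up to floating-point error
print(X_te.mean(axis=0).round(3))  # near zero, but not exactly
```

The test set is transformed with the training mean and standard deviation, which is what would happen to new transactions in production.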


## Unsupervised Fraud Detection – Isolation Forest

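Isolation Forest builds an ensemble of random trees and scores each observation
by how many random splits it takes to isolate it.
Anomalies sit far from dense regions, so they are isolated in fewer splits than typical observations.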

From a business perspective:
- It can surface **previously unseen fraud patterns**
- It acts as a complementary signal rather than a final decision-maker
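The isolation idea can be illustrated on a toy dataset. The sketch below uses synthetic two-dimensional points (`X_toy` and `toy_forest` are hypothetical names) and scikit-learn's `score_samples`, where lower scores indicate points that are easier to isolate:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
inliers = rng.normal(0.0, 1.0, size=(500, 2))   # dense cluster
outliers = np.array([[8.0, 8.0], [-9.0, 7.0]])  # far from the cluster
X_toy = np.vstack([inliers, outliers])

toy_forest = IsolationForest(n_estimators=100, random_state=0).fit(X_toy)

# score_samples: lower values mean the point is isolated in fewer splits
scores = toy_forest.score_samples(X_toy)
print("median inlier score:", np.median(scores[:500]).round(3))
print("outlier scores:", scores[500:].round(3))
```

The two distant points should receive clearly lower scores than the bulk of the cluster, without any labels being supplied.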


In [4]:
iso_forest = IsolationForest(
    n_estimators=200,
    contamination=0.002,  # approximate fraud rate
    random_state=42,
    n_jobs=-1
)

iso_forest.fit(X_train)


| Parameter | Value |
|-----------|-------|
| n_estimators | 200 |
| max_samples | 'auto' |
| contamination | 0.002 |
| max_features | 1.0 |
| bootstrap | False |
| n_jobs | -1 |
| random_state | 42 |
| verbose | 0 |
| warm_start | False |


In [5]:
iso_preds = iso_forest.predict(X_test)
iso_preds_binary = np.where(iso_preds == -1, 1, 0)

iso_recall = recall_score(y_test, iso_preds_binary)
iso_precision = precision_score(y_test, iso_preds_binary)
iso_review_rate = iso_preds_binary.mean()

iso_recall, iso_precision, iso_review_rate


(0.25675675675675674, 0.2389937106918239, np.float64(0.0018608897159509848))

## Interpreting Isolation Forest Results

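- Isolation Forest captures **some fraud cases without labels**: on the test set it flagged 159 of 85,443 transactions and caught 38 of 148 frauds (recall ≈ 26%)
- Precision is low (≈ 24%) because anomalous does not mean fraudulent: most flagged transactions are unusual but legitimate
- The review rate (≈ 0.19%) is driven largely by `contamination`, so that parameter should be set to match review capacity and avoid operational overload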

Business takeaway:
Isolation Forest is best used as an **early-warning or supporting signal**,
not as a standalone decision engine.
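How `contamination` drives review volume can be checked directly. The sketch below fits Isolation Forest on synthetic, unlabeled data (`X_syn` is a hypothetical stand-in) with a few contamination values and reports the share of training rows flagged:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(1)
X_syn = rng.normal(size=(5000, 5))  # synthetic features, no labels needed

flagged_share = {}
for contamination in (0.002, 0.01, 0.05):
    forest = IsolationForest(
        n_estimators=100, contamination=contamination, random_state=1
    ).fit(X_syn)
    # predict returns -1 for anomalies and 1 for inliers
    flagged_share[contamination] = (forest.predict(X_syn) == -1).mean()
    print(f"contamination={contamination:<6} "
          f"flagged share={flagged_share[contamination]:.4f}")
```

On the training data the flagged share tracks `contamination` closely; on held-out data it is only approximately the same, as with the 0.19% test review rate against `contamination=0.002` above.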


## Supervised Fraud Detection – XGBoost

XGBoost is used to learn complex, non-linear fraud patterns
from labeled historical data.

Key business advantage:
It enables **continuous risk scoring** rather than binary rule decisions.


In [7]:
from xgboost import XGBClassifier

scale_pos_weight = (y_train == 0).sum() / (y_train == 1).sum()

xgb_model = XGBClassifier(
    n_estimators=200,
    max_depth=5,
    learning_rate=0.05,
    subsample=0.8,
    colsample_bytree=0.8,
    scale_pos_weight=scale_pos_weight,
    eval_metric="logloss",
    random_state=42
)

xgb_model.fit(X_train, y_train)


| Parameter | Value |
|-----------|-------|
| objective | 'binary:logistic' |
| base_score | |
| booster | |
| callbacks | |
| colsample_bylevel | |
| colsample_bynode | |
| colsample_bytree | 0.8 |
| device | |
| early_stopping_rounds | |
| enable_categorical | False |


In [8]:
y_proba = xgb_model.predict_proba(X_test)[:, 1]

roc_auc = roc_auc_score(y_test, y_proba)
roc_auc


0.971664913345258

## From Scores to Decisions

Instead of a single threshold,
we evaluate how different cutoffs affect fraud recall
and operational review volume.


In [9]:
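def evaluate_threshold(y_true, y_scores, threshold):
    """Return recall, precision, and review rate when flagging scores >= threshold."""
    preds = (y_scores >= threshold).astype(int)
    # labels=[0, 1] keeps the matrix 2x2 even if one class is never predicted
    tn, fp, fn, tp = confusion_matrix(y_true, preds, labels=[0, 1]).ravel()
    flagged = tp + fp

    return {
        "threshold": threshold,
        "recall": tp / (tp + fn),
        # precision is undefined when nothing is flagged
        "precision": tp / flagged if flagged else float("nan"),
        "review_rate": flagged / len(y_true)
    }

# Compare a range of cutoffs rather than committing to one
thresholds = [0.1, 0.3, 0.5, 0.7]
results = [evaluate_threshold(y_test, y_proba, t) for t in thresholds]
pd.DataFrame(results)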


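|   | threshold | recall | precision | review_rate |
|---|-----------|--------|-----------|-------------|
| 0 | 0.1 | 0.858108 | 0.279736 | 0.005313 |
| 1 | 0.3 | 0.844595 | 0.523013 | 0.002797 |
| 2 | 0.5 | 0.837838 | 0.677596 | 0.002142 |
| 3 | 0.7 | 0.817568 | 0.780645 | 0.001814 |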
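The table reads from threshold to review rate. In practice the question often runs the other way: given a fixed review capacity, which cutoff fits it? One possible sketch takes a quantile of the scores. The helper `threshold_for_review_rate`, the synthetic `demo_scores`, and the 0.3% budget are all assumptions for illustration, not part of the notebook:

```python
import numpy as np

def threshold_for_review_rate(scores, review_rate):
    """Score cutoff that flags roughly `review_rate` of transactions."""
    return float(np.quantile(scores, 1.0 - review_rate))

# Hypothetical scores standing in for y_proba
rng = np.random.default_rng(7)
demo_scores = rng.beta(0.5, 20.0, size=100_000)

budget = 0.003  # assumed capacity: review 0.3% of transactions
cutoff = threshold_for_review_rate(demo_scores, budget)
flagged_share = (demo_scores >= cutoff).mean()
print(f"cutoff={cutoff:.4f}, flagged share={flagged_share:.4f}")
```

With real model scores, ties are more common than in this synthetic sample, so the flagged share may overshoot the budget slightly. The cutoff should be derived from a validation set rather than the data it is evaluated on.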


## Why ML Outperforms Static Rules

Compared to rule-based thresholds:
- Risk scores allow **flexible decision policies**
- Thresholds can be adjusted to reflect fraud trends, operational capacity, and business priorities
- The same model supports multiple decision layers (block, review, allow)

This adaptability is what static rules fundamentally lack.
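A layered policy of this kind can be sketched by mapping one risk score to three actions. The function name `assign_action` and the cutoffs 0.3 and 0.7 are illustrative assumptions borrowed from the threshold grid above, not recommended values:

```python
import numpy as np

def assign_action(scores, review_at=0.3, block_at=0.7):
    """Map risk scores to 'allow', 'review', or 'block' (cutoffs are illustrative)."""
    scores = np.asarray(scores)
    # Conditions are checked in order, so the block tier takes priority
    return np.select(
        [scores >= block_at, scores >= review_at],
        ["block", "review"],
        default="allow",
    )

actions = assign_action([0.02, 0.35, 0.69, 0.70, 0.95])
print(actions.tolist())
# ['allow', 'review', 'review', 'block', 'block']
```

Changing policy then means changing two numbers, while the model itself stays untouched.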


## Model Comparison Summary

| Approach | Strength | Limitation | Business Role |
|--------|---------|------------|---------------|
| Rules | Simple & interpretable | Brittle, static | Baseline protection |
| Isolation Forest | Detects unknown fraud | High noise | Supporting signal |
| XGBoost | High recall & control | Needs labels | Core risk engine |

Real-world fraud systems combine all three.
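One way such a combination might look is sketched below. The policy itself is an assumption for illustration: block on a high supervised score, and send a transaction to review if any weaker signal fires. `combined_decision` and its cutoffs are hypothetical.

```python
import numpy as np

def combined_decision(rule_hit, anomaly_flag, risk_score,
                      review_at=0.3, block_at=0.7):
    """Hypothetical policy: block on high score; review on any weaker signal."""
    rule_hit = np.asarray(rule_hit, dtype=bool)
    anomaly_flag = np.asarray(anomaly_flag, dtype=bool)
    risk_score = np.asarray(risk_score, dtype=float)

    block = risk_score >= block_at
    # A rule hit or anomaly flag escalates to review even when the score is low
    review = ~block & ((risk_score >= review_at) | rule_hit | anomaly_flag)
    return np.where(block, "block", np.where(review, "review", "allow"))

decisions = combined_decision(
    rule_hit=[False, True, False, False],
    anomaly_flag=[False, False, True, False],
    risk_score=[0.05, 0.10, 0.20, 0.90],
)
print(decisions.tolist())
# ['allow', 'review', 'review', 'block']
```

Here the rule and the anomaly flag act as supporting signals, matching the roles in the table, while the supervised score alone can trigger a block.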


## Key Takeaways

- Machine learning enables **adaptive fraud detection**
  where static rules fail.
- Risk scoring allows the business to balance
  fraud loss, customer experience, and operational cost.
- Unsupervised and supervised models play **complementary roles**.
- ML models support decision-making, not just prediction.

> Effective fraud detection is not about catching every fraud,
> but about making the **right decision at scale**.
