# Payment Fraud Detection for Treasury

## Use Case Overview

**Business Problem:** Treasury processes millions of dollars in payments daily. Fraudulent payments can result from:
- Business Email Compromise (BEC)
- Invoice manipulation
- Unauthorized payments
- Sanctions violations

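**Impact:** The US Treasury reported that enhanced fraud detection processes, including machine learning, prevented and recovered **over $4 billion** in fraud and improper payments in fiscal year 2024.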

---

## Detection Approaches

| Approach | Method | Use Case |
|----------|--------|----------|
| **Rules-Based** | Static thresholds | Known fraud patterns |
| **Anomaly Detection** | Isolation Forest, Autoencoders | Unknown/novel fraud |
| **Supervised ML** | XGBoost, Neural Networks | When labeled data exists |
| **Hybrid** | Rules + ML | Production systems |

---

## Learning Objectives

1. Generate payment data with realistic fraud patterns
2. Build rules-based fraud detection
3. Implement Isolation Forest for anomaly detection
4. Combine approaches into a hybrid system
5. Evaluate with precision, recall, and ROC curves

## 1. Setup and Imports

In [None]:
# Core imports
import pandas as pd
import numpy as np
from datetime import datetime, timedelta
import warnings
warnings.filterwarnings('ignore')

# Visualization
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots

# ML
from sklearn.ensemble import IsolationForest
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.metrics import (
    classification_report, confusion_matrix, 
    precision_recall_curve, roc_curve, auc,
    precision_score, recall_score, f1_score
)

# Set style
plt.style.use('seaborn-v0_8-whitegrid')

print("✅ All imports successful!")

In [None]:
# Add parent directory to path
import sys
sys.path.append('..')

from src.treasury_sim.generators import generate_payments, set_seed

print("✅ Custom generators loaded!")

## 2. Generate Payment Data with Fraud Labels

### Thinking Trace 🧠

> **Fraud patterns we're simulating:**
> 
> | Pattern | Description | Real-World Example |
> |---------|-------------|--------------------|
> | **Unusual Amount** | 5-20x normal transaction | BEC requesting urgent large payment |
> | **New Beneficiary** | Unknown/unverified recipient | Fake vendor setup |
> | **High-Risk Country** | Sanctioned or high-risk jurisdiction | Sanctions violation |
> | **Unusual Time** | Outside business hours | Compromised credentials |
> | **Round Amount** | Suspiciously round numbers | Invoice manipulation |
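
`generate_payments` comes from the project's `src.treasury_sim.generators` module, which this notebook does not show. As an illustration only, here is a hypothetical stand-in, `toy_payments` (not the project's function). Its distributions, country lists and the 5-20x multiplier are assumptions chosen to mirror the table above. It injects one or two of the five patterns into a small fraction of otherwise normal payments and records them in a `|`-separated `anomaly_reasons` column, which is the format the breakdown cell below parses.

```python
import numpy as np
import pandas as pd

# Assumed, illustrative values (NOT taken from src.treasury_sim)
NORMAL_COUNTRIES = ["US", "GB", "DE", "FR", "JP"]
HIGH_RISK_COUNTRIES = ["RU", "IR", "KP"]
# round_amount is listed last so rounding happens after any amount inflation
PATTERNS = ["unusual_amount", "new_beneficiary", "high_risk_country",
            "unusual_time", "round_amount"]


def toy_payments(n=1_000, anomaly_rate=0.03, seed=0):
    """Hypothetical payment generator with injected fraud patterns."""
    rng = np.random.default_rng(seed)

    # Normal behaviour: log-normal amounts, business hours, known vendors
    amount = rng.lognormal(mean=10.0, sigma=1.0, size=n)
    hour = rng.integers(9, 18, size=n)          # 09:00-17:59
    day = rng.integers(0, 180, size=n)          # 180 days of history
    country = rng.choice(NORMAL_COUNTRIES, size=n).astype(object)
    name = np.array([f"Vendor_{v:03d}" for v in rng.integers(0, 200, size=n)],
                    dtype=object)

    is_anomaly = rng.random(n) < anomaly_rate
    reasons = np.full(n, None, dtype=object)

    for i in np.flatnonzero(is_anomaly):
        # Each fraudulent payment exhibits one or two patterns
        picked = rng.choice(PATTERNS, size=int(rng.integers(1, 3)), replace=False)
        chosen = sorted((str(p) for p in picked), key=PATTERNS.index)
        for p in chosen:
            if p == "unusual_amount":
                amount[i] *= rng.uniform(5, 20)
            elif p == "new_beneficiary":
                name[i] = f"NEW_Vendor_{i}"
            elif p == "high_risk_country":
                country[i] = str(rng.choice(HIGH_RISK_COUNTRIES))
            elif p == "unusual_time":
                hour[i] = rng.choice([0, 1, 2, 3, 4, 5, 21, 22, 23])
            elif p == "round_amount":
                amount[i] = max(1.0, float(np.round(amount[i] / 10_000))) * 10_000
        reasons[i] = "|".join(chosen)

    timestamp = (pd.Timestamp("2024-01-01")
                 + pd.to_timedelta(day, unit="D")
                 + pd.to_timedelta(hour, unit="h"))
    return pd.DataFrame({
        "timestamp": timestamp,
        "amount": amount.round(2),
        "beneficiary_name": name,
        "beneficiary_country": country,
        "is_anomaly": is_anomaly,
        "anomaly_reasons": reasons,
    })


sample = toy_payments(n=2_000, anomaly_rate=0.05, seed=1)
print(f"Anomaly rate: {sample['is_anomaly'].mean():.3f}")
print(sample.loc[sample["is_anomaly"], "anomaly_reasons"]
      .str.split("|").explode().value_counts())
```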

In [None]:
# Generate 6 months of payment data
set_seed(42)

payments = generate_payments(
    days=180,
    daily_count=100,
    anomaly_rate=0.03,  # 3% fraud rate
    seed=42
)

print(f"📊 Generated {len(payments):,} payments")
print(f"🚨 Anomalies: {payments['is_anomaly'].sum()} ({payments['is_anomaly'].mean()*100:.1f}%)")
print(f"\n📅 Date range: {payments['timestamp'].min().date()} to {payments['timestamp'].max().date()}")

In [None]:
# Preview data
print("=" * 70)
print("PAYMENT DATA SAMPLE")
print("=" * 70)
payments.head(10)

In [None]:
# Anomaly breakdown
print("\n🚨 ANOMALY TYPES BREAKDOWN")
print("=" * 50)

anomalies = payments[payments['is_anomaly']]
all_reasons = []
for reasons in anomalies['anomaly_reasons'].dropna():
    all_reasons.extend(reasons.split('|'))

reason_counts = pd.Series(all_reasons).value_counts()
print(reason_counts)

## 3. Exploratory Data Analysis

### Thinking Trace 🧠

> **What distinguishes fraudulent payments?**
> 1. **Amount distribution** - Fraudulent payments are often outliers
> 2. **Timing patterns** - Outside normal business hours
> 3. **Beneficiary patterns** - New or unusual recipients
> 4. **Geographic patterns** - High-risk countries

In [None]:
# Add derived features for analysis
payments['hour'] = payments['timestamp'].dt.hour
payments['day_of_week'] = payments['timestamp'].dt.dayofweek
payments['is_business_hours'] = payments['hour'].between(9, 17)
payments['log_amount'] = np.log1p(payments['amount'])

# Amount statistics by fraud status
print("💰 AMOUNT STATISTICS")
print("=" * 50)
print(payments.groupby('is_anomaly')['amount'].describe().round(2))

In [None]:
# Comprehensive visualization
fig = make_subplots(
    rows=2, cols=2,
    subplot_titles=(
        'Amount Distribution (Log Scale)',
        'Payments by Hour (Fraud Highlighted)',
        'Country Distribution',
        'Anomaly Score Distribution'
    )
)

# 1. Amount distribution
for is_fraud, color, name in [(False, 'blue', 'Normal'), (True, 'red', 'Anomaly')]:
    subset = payments[payments['is_anomaly'] == is_fraud]
    fig.add_trace(
        go.Histogram(x=subset['log_amount'], name=name, opacity=0.7,
                    marker_color=color, nbinsx=50),
        row=1, col=1
    )

# 2. Hourly distribution
hourly = payments.groupby(['hour', 'is_anomaly']).size().unstack(fill_value=0)
fig.add_trace(
    go.Bar(x=hourly.index, y=hourly[False], name='Normal', marker_color='blue'),
    row=1, col=2
)
fig.add_trace(
    go.Bar(x=hourly.index, y=hourly[True], name='Anomaly', marker_color='red'),
    row=1, col=2
)

# 3. Country distribution
country_fraud = payments.groupby(['beneficiary_country', 'is_anomaly']).size().unstack(fill_value=0)
country_fraud['fraud_rate'] = country_fraud[True] / (country_fraud[True] + country_fraud[False]) * 100
country_fraud = country_fraud.sort_values('fraud_rate', ascending=True)
fig.add_trace(
    go.Bar(y=country_fraud.index, x=country_fraud['fraud_rate'], 
           orientation='h', name='Fraud Rate %', marker_color='orange'),
    row=2, col=1
)

# 4. Anomaly score
fig.add_trace(
    go.Histogram(x=payments['anomaly_score'], nbinsx=20, 
                name='Anomaly Score', marker_color='purple'),
    row=2, col=2
)

fig.update_layout(height=700, title_text='Payment Fraud - Exploratory Analysis', barmode='stack')
fig.show()

### 📊 Key EDA Insights

1. **Amount**: Fraudulent payments show higher amounts (right tail of distribution)
2. **Timing**: Some anomalies occur outside business hours (before 9am, after 6pm)
3. **Geography**: High-risk countries (RU, IR, KP, etc.) show the highest fraud rates in the simulated data
4. **Multiple signals**: Many frauds trigger multiple anomaly reasons

## 4. Approach 1: Rules-Based Detection

### Thinking Trace 🧠

> **Why start with rules?**
> - **Interpretable**: Easy to explain to auditors/compliance
> - **Fast**: No training required
> - **Deterministic**: Same input = same output
> - **Domain knowledge**: Captures known fraud patterns
>
> **Rules we'll implement:**
> 1. Amount > $500,000 (large payment)
> 2. High-risk country
> 3. Outside business hours (before 8am or from 8pm onward, i.e. hour not in 8-19)
> 4. New beneficiary (beneficiary name contains 'NEW')
> 5. Round amount (multiple of $10,000 and above $50,000)
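
pandas `Series.between` includes both endpoints by default (`inclusive="both"`), which makes it easy to misread exactly which hours Rule 3 flags and which hours the EDA treats as business hours. A quick check over all 24 hours:

```python
import pandas as pd

hours = pd.Series(range(24))

# Rule 3 in rules_based_detection: flag hours NOT in [8, 19]
rule3_flagged = hours[~hours.between(8, 19)].tolist()

# EDA / feature definition of business hours: hours in [9, 17], i.e. 09:00-17:59
business_hours = hours[hours.between(9, 17)].tolist()

print("Rule 3 flags hours:", rule3_flagged)
print("Business hours:   ", business_hours)
```

So a payment at 19:30 (hour 19) is *not* flagged by Rule 3. If the intended cut-off is 7pm, `between(8, 18)` would express that.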

In [None]:
def rules_based_detection(df):
    """
    Apply rules-based fraud detection.
    
    Returns DataFrame with rule flags and overall risk score.
    """
    result = df.copy()
    
    # Define high-risk countries
    high_risk_countries = ['RU', 'IR', 'KP', 'SY', 'VE', 'CU', 'XX']
    
    # Rule 1: Large amount
    result['rule_large_amount'] = result['amount'] > 500000
    
    # Rule 2: High-risk country
    result['rule_high_risk_country'] = result['beneficiary_country'].isin(high_risk_countries)
    
    # Rule 3: Outside business hours
    result['rule_unusual_time'] = ~result['hour'].between(8, 19)
    
    # Rule 4: New beneficiary
    result['rule_new_beneficiary'] = result['beneficiary_name'].str.contains('NEW', na=False)
    
    # Rule 5: Round amount
    result['rule_round_amount'] = (result['amount'] % 10000 == 0) & (result['amount'] > 50000)
    
    # Calculate risk score (number of rules triggered)
    rule_columns = [col for col in result.columns if col.startswith('rule_')]
    result['rules_risk_score'] = result[rule_columns].sum(axis=1)
    
    # Flag as suspicious if any rule triggered
    result['rules_flagged'] = result['rules_risk_score'] > 0
    
    return result

# Apply rules
payments_rules = rules_based_detection(payments)

# Evaluate rules-based approach
print("📋 RULES-BASED DETECTION RESULTS")
print("=" * 50)
print(f"\nPayments flagged: {payments_rules['rules_flagged'].sum()} ({payments_rules['rules_flagged'].mean()*100:.1f}%)")
print(f"Actual anomalies: {payments_rules['is_anomaly'].sum()} ({payments_rules['is_anomaly'].mean()*100:.1f}%)")

In [None]:
# Rules performance metrics
y_true = payments_rules['is_anomaly'].astype(int)
y_pred_rules = payments_rules['rules_flagged'].astype(int)

print("\n📊 RULES-BASED PERFORMANCE")
print("=" * 50)
print(classification_report(y_true, y_pred_rules, target_names=['Normal', 'Anomaly']))

# Confusion matrix
cm_rules = confusion_matrix(y_true, y_pred_rules)
print("\nConfusion Matrix:")
print(pd.DataFrame(cm_rules, 
                   index=['Actual Normal', 'Actual Anomaly'],
                   columns=['Pred Normal', 'Pred Anomaly']))

In [None]:
# Rule-by-rule analysis
rule_columns = [col for col in payments_rules.columns if col.startswith('rule_')]

rule_stats = []
for rule in rule_columns:
    triggered = payments_rules[rule].sum()
    true_positives = ((payments_rules[rule]) & (payments_rules['is_anomaly'])).sum()
    precision = true_positives / triggered if triggered > 0 else 0
    recall = true_positives / payments_rules['is_anomaly'].sum()
    
    rule_stats.append({
        'Rule': rule.replace('rule_', '').replace('_', ' ').title(),
        'Triggered': triggered,
        'True Positives': true_positives,
        'Precision': f"{precision:.1%}",
        'Recall': f"{recall:.1%}"
    })

print("\n📋 RULE-BY-RULE PERFORMANCE")
print("=" * 70)
print(pd.DataFrame(rule_stats).to_string(index=False))

## 5. Approach 2: Isolation Forest (Anomaly Detection)

### How Isolation Forest Works

Isolation Forest detects anomalies by **isolating** observations:

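1. **Build trees**: Recursively split random subsamples of the data, each time picking a random feature and a random split value between that feature's minimum and maximum
2. **Measure isolation**: Anomalies are easier to isolate, so they end up closer to the root (shorter paths, fewer splits)
3. **Score**: Average each observation's path length across all trees and normalize it into the score below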

$$s(x, n) = 2^{-\frac{E(h(x))}{c(n)}}$$

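Where:
- $h(x)$ = path length of observation $x$ in one tree (number of edges from the root to the leaf that isolates it)
- $E(h(x))$ = average of $h(x)$ over all trees
- $c(n) = 2H(n-1) - \frac{2(n-1)}{n}$ = average path length for $n$ samples (the subsample size used to build each tree), where $H(i)$ is the $i$-th harmonic number ($\approx \ln(i) + 0.5772$)
- Score close to 1 = anomaly; scores well below 0.5 = normal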
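
To make the normalization concrete, here is a small from-scratch sketch of $c(n)$ and $s(x, n)$ as defined above. The helper names are ours, not scikit-learn's. It uses the exact harmonic number rather than the $\ln$ approximation, and $n = 256$, which is scikit-learn's default cap on the per-tree subsample size (`max_samples="auto"`). An average path length equal to $c(n)$ gives exactly $s = 0.5$, and shorter paths push $s$ toward 1. Note that scikit-learn's `IsolationForest.score_samples` returns the *negative* of this score, which is why the training cell negates it.

```python
def harmonic(i):
    """Exact harmonic number H(i) = 1 + 1/2 + ... + 1/i, with H(0) = 0."""
    return sum(1.0 / k for k in range(1, i + 1))


def c(n):
    """Average path length of an unsuccessful BST search over n samples."""
    if n <= 1:
        return 0.0
    return 2.0 * harmonic(n - 1) - 2.0 * (n - 1) / n


def anomaly_score(mean_path_length, n):
    """Isolation Forest score s(x, n) = 2 ** (-E(h(x)) / c(n))."""
    return 2.0 ** (-mean_path_length / c(n))


n = 256  # per-tree subsample size
for e_h in (2.0, c(n), 15.0):
    # Short paths (easy to isolate) -> s near 1; long paths -> s below 0.5
    print(f"E(h(x)) = {e_h:5.2f}  ->  s = {anomaly_score(e_h, n):.3f}")
```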

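### Thinking Trace 🧠

> **Feature engineering for Isolation Forest:**
> - Numerical features only (encode categoricals)
> - Scaling is optional: each split is drawn uniformly within a feature's range, so per-feature linear rescaling leaves the result essentially unchanged; the notebook standardizes anyway, which does no harm
> - Behavioral features (e.g., amount relative to the beneficiary's history) would help; the feature set below omits them for simplicity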
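
One way to build an "amount relative to history" feature is a per-beneficiary z-score computed only from *earlier* payments, so no future information leaks into the feature. `amount_vs_history` is a hypothetical helper, not part of `prepare_features`. It assumes the `timestamp`, `beneficiary_name` and `amount` columns used elsewhere in this notebook, and its `min_history=3` default is an arbitrary choice.

```python
import pandas as pd


def amount_vs_history(df, min_history=3):
    """Z-score of each amount against the same beneficiary's EARLIER payments (hypothetical helper)."""
    ordered = df.sort_values("timestamp")
    grouped = ordered.groupby("beneficiary_name")["amount"]
    # shift(1) excludes the current payment, so only past payments are used (no look-ahead leakage)
    prior_mean = grouped.transform(lambda s: s.shift(1).expanding().mean())
    prior_std = grouped.transform(lambda s: s.shift(1).expanding().std())
    prior_count = grouped.cumcount()  # number of earlier payments to this beneficiary
    z = (ordered["amount"] - prior_mean) / prior_std
    # Too little history or zero spread: fall back to 0 ("no evidence either way")
    z = z.where((prior_count >= min_history) & (prior_std > 0), 0.0)
    return z.reindex(df.index)


demo = pd.DataFrame({
    "timestamp": pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-03",
                                 "2024-01-04", "2024-01-05", "2024-01-03"]),
    "beneficiary_name": ["A", "A", "A", "A", "A", "B"],
    "amount": [100.0, 100.0, 110.0, 90.0, 1_000.0, 5_000.0],
})
demo["amount_vs_history"] = amount_vs_history(demo)
print(demo)
```

Brand-new beneficiaries get 0 here because they have no history; the existing `is_new_beneficiary` flag already covers that case. On the notebook's data, the feature could be added with `X['amount_vs_history'] = amount_vs_history(payments)`.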

In [None]:
# Prepare features for Isolation Forest
def prepare_features(df):
    """
    Engineer features for anomaly detection.
    """
    features = pd.DataFrame()
    
    # Amount features
    features['amount'] = df['amount']
    features['log_amount'] = np.log1p(df['amount'])
    
    # Time features
    features['hour'] = df['timestamp'].dt.hour
    features['day_of_week'] = df['timestamp'].dt.dayofweek
    features['is_business_hours'] = df['timestamp'].dt.hour.between(9, 17).astype(int)
    
    # Encode currency
    le_currency = LabelEncoder()
    features['currency_encoded'] = le_currency.fit_transform(df['currency'])
    
    # Encode country (with risk weighting)
    high_risk = ['RU', 'IR', 'KP', 'SY', 'VE', 'CU', 'XX']
    features['is_high_risk_country'] = df['beneficiary_country'].isin(high_risk).astype(int)
    
    # New beneficiary flag
    features['is_new_beneficiary'] = df['beneficiary_name'].str.contains('NEW', na=False).astype(int)
    
    # Round amount flag
    features['is_round_amount'] = ((df['amount'] % 10000 == 0) & (df['amount'] > 10000)).astype(int)
    
    return features

# Prepare features
X = prepare_features(payments)
y = payments['is_anomaly'].astype(int)

print(f"📊 Feature matrix shape: {X.shape}")
print(f"\nFeatures used:")
for i, col in enumerate(X.columns, 1):
    print(f"  {i}. {col}")

In [None]:
# Scale features
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Train Isolation Forest
iso_forest = IsolationForest(
    n_estimators=100,
    contamination=0.03,  # Expected fraud rate
    random_state=42,
    n_jobs=-1
)

print("🔧 Training Isolation Forest...")
iso_forest.fit(X_scaled)

# Get predictions and scores
# Isolation Forest returns -1 for anomalies, 1 for normal
predictions_if = iso_forest.predict(X_scaled)
# score_samples returns the negated paper score; negate it so higher = more anomalous (equals s(x, n))
scores_if = -iso_forest.score_samples(X_scaled)

# Convert to binary (1 = anomaly)
y_pred_if = (predictions_if == -1).astype(int)

print("✅ Isolation Forest trained!")
print(f"\nAnomalies detected: {y_pred_if.sum()} ({y_pred_if.mean()*100:.1f}%)")
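
Isolation Forest draws each split threshold uniformly between a feature's minimum and maximum within a node. An increasing per-feature affine transform such as `StandardScaler` therefore maps every split to an equivalent one, and with the same `random_state` the fitted forest should flag essentially the same points. The sketch below checks this on synthetic data. The two toy features stand in for amount and hour and are not the notebook's data. Any residual differences can only come from floating-point rounding, since scikit-learn fits trees on float32 copies of the input.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Toy features on very different scales: a log-normal "amount" and an integer "hour"
X_raw = np.column_stack([
    rng.lognormal(mean=10.0, sigma=1.0, size=2_000),
    rng.integers(0, 24, size=2_000),
])
X_std = StandardScaler().fit_transform(X_raw)


def if_flags(X):
    """Fit an Isolation Forest with fixed settings and return boolean anomaly flags."""
    model = IsolationForest(n_estimators=100, contamination=0.03, random_state=42)
    return model.fit_predict(X) == -1


agreement = (if_flags(X_raw) == if_flags(X_std)).mean()
print(f"Share of rows with identical flags (raw vs. standardized): {agreement:.4f}")
```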

In [None]:
# Evaluate Isolation Forest
print("\n📊 ISOLATION FOREST PERFORMANCE")
print("=" * 50)
print(classification_report(y, y_pred_if, target_names=['Normal', 'Anomaly']))

# Confusion matrix
cm_if = confusion_matrix(y, y_pred_if)
print("\nConfusion Matrix:")
print(pd.DataFrame(cm_if, 
                   index=['Actual Normal', 'Actual Anomaly'],
                   columns=['Pred Normal', 'Pred Anomaly']))

In [None]:
# Add scores to dataframe
payments_rules['if_score'] = scores_if
payments_rules['if_flagged'] = y_pred_if

# Visualize score distribution
fig = go.Figure()

fig.add_trace(go.Histogram(
    x=payments_rules[~payments_rules['is_anomaly']]['if_score'],
    name='Normal', opacity=0.7, nbinsx=50
))

fig.add_trace(go.Histogram(
    x=payments_rules[payments_rules['is_anomaly']]['if_score'],
    name='Anomaly', opacity=0.7, nbinsx=50
))

# Add threshold line
threshold = np.percentile(scores_if, 97)  # Top 3%
fig.add_vline(x=threshold, line_dash="dash", line_color="red",
              annotation_text="Detection Threshold")

fig.update_layout(
    title='Isolation Forest Anomaly Scores',
    xaxis_title='Anomaly Score (higher = more suspicious)',
    yaxis_title='Count',
    barmode='overlay',
    height=400
)
fig.show()

## 6. Approach 3: Hybrid System

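### Thinking Trace 🧠

> **Why combine approaches?**
> - Rules catch **known patterns** with high confidence
> - ML catches **novel patterns** that rules miss
> - Tiered risk levels let reviewers handle the most suspicious alerts first (OR-combining the two signals raises recall, but also the number of alerts)
>
> **Combination strategy** (the first matching tier wins):
> - **High Alert**: 2+ rules triggered, OR Isolation Forest score in the top 1%
> - **Medium Alert**: 1 rule triggered, OR Isolation Forest score in the top 3%
> - **Low Alert**: Isolation Forest score in the top 5% only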
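
The tiers rely on `np.select` returning the choice for the *first* condition that is true, so a payment that satisfies several tiers lands in the highest one. A toy check with five payments (the percentile values are made up for illustration):

```python
import numpy as np

# Toy inputs for five payments (values made up for illustration)
rules_risk_score = np.array([2, 0, 1, 0, 0])
if_percentile = np.array([50.0, 99.5, 98.0, 96.0, 10.0])  # percentile rank of each IF score

conditions = [
    (rules_risk_score >= 2) | (if_percentile > 99),   # HIGH
    (rules_risk_score >= 1) | (if_percentile > 97),   # MEDIUM
    if_percentile > 95,                               # LOW
]
levels = np.select(conditions, ["HIGH", "MEDIUM", "LOW"], default="NORMAL")

# The second payment meets all three conditions but is labeled HIGH
print(levels.tolist())  # ['HIGH', 'HIGH', 'MEDIUM', 'LOW', 'NORMAL']
```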

In [None]:
def hybrid_detection(df, if_threshold_percentile=95):
    """
    Combine rules and Isolation Forest scores into tiered risk levels.

    Expects 'rules_risk_score' (number of rules triggered, 0-5) and
    'if_score' (higher = more anomalous) columns.
    """
    result = df.copy()
    
    # Percentile cut-offs on the Isolation Forest score
    if_p99, if_p97 = np.percentile(result['if_score'], [99, 97])
    if_threshold = np.percentile(result['if_score'], if_threshold_percentile)
    
    # Continuous score for ranking alerts:
    # rules (0-5) weighted 0.4, IF score rescaled to at most 3 and weighted 0.6
    result['hybrid_score'] = (
        result['rules_risk_score'] * 0.4 +
        (result['if_score'] / result['if_score'].max()) * 3 * 0.6
    )
    
    # Risk levels: np.select assigns the FIRST condition that matches
    conditions = [
        (result['rules_risk_score'] >= 2) | (result['if_score'] > if_p99),
        (result['rules_risk_score'] >= 1) | (result['if_score'] > if_p97),
        result['if_score'] > if_threshold,
    ]
    choices = ['HIGH', 'MEDIUM', 'LOW']
    result['risk_level'] = np.select(conditions, choices, default='NORMAL')
    
    # Binary flag for evaluation
    result['hybrid_flagged'] = result['risk_level'] != 'NORMAL'
    
    return result

# Apply hybrid detection
payments_hybrid = hybrid_detection(payments_rules)

print("📊 HYBRID DETECTION RESULTS")
print("=" * 50)
print(f"\nRisk Level Distribution:")
print(payments_hybrid['risk_level'].value_counts())

In [None]:
# Evaluate hybrid approach
y_pred_hybrid = payments_hybrid['hybrid_flagged'].astype(int)

print("\n📊 HYBRID SYSTEM PERFORMANCE")
print("=" * 50)
print(classification_report(y, y_pred_hybrid, target_names=['Normal', 'Anomaly']))

# Confusion matrix
cm_hybrid = confusion_matrix(y, y_pred_hybrid)
print("\nConfusion Matrix:")
print(pd.DataFrame(cm_hybrid, 
                   index=['Actual Normal', 'Actual Anomaly'],
                   columns=['Pred Normal', 'Pred Anomaly']))

## 7. Model Comparison Dashboard

In [None]:
# Compare all approaches
def evaluate_model(y_true, y_pred, name):
    return {
        'Model': name,
        'Precision': precision_score(y_true, y_pred),
        'Recall': recall_score(y_true, y_pred),
        'F1 Score': f1_score(y_true, y_pred),
        'False Positives': ((y_pred == 1) & (y_true == 0)).sum(),
        'False Negatives': ((y_pred == 0) & (y_true == 1)).sum()
    }

comparison = pd.DataFrame([
    evaluate_model(y, y_pred_rules, 'Rules-Based'),
    evaluate_model(y, y_pred_if, 'Isolation Forest'),
    evaluate_model(y, y_pred_hybrid, 'Hybrid System')
])

print("📊 MODEL COMPARISON")
print("=" * 80)
print(comparison.to_string(index=False))
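
The comparison above scores each approach at a single cut-off. The notebook already imports `roc_curve`, `precision_recall_curve` and `auc`, so a threshold-free view of the continuous scores is a natural complement. `threshold_free_metrics` is a hypothetical helper. With roughly 3% positives, the precision-recall summaries are usually more informative than ROC AUC, which can look strong even when most alerts are false positives.

```python
import numpy as np
from sklearn.metrics import auc, average_precision_score, precision_recall_curve, roc_curve


def threshold_free_metrics(y_true, scores):
    """ROC AUC and precision-recall summaries for a continuous risk score (higher = more suspicious)."""
    fpr, tpr, _ = roc_curve(y_true, scores)
    precision, recall, _ = precision_recall_curve(y_true, scores)
    return {
        "roc_auc": auc(fpr, tpr),
        "pr_auc": auc(recall, precision),               # trapezoidal area under the PR curve
        "average_precision": average_precision_score(y_true, scores),
    }


# Tiny toy example: two negatives, two positives
toy = threshold_free_metrics(np.array([0, 0, 1, 1]), np.array([0.1, 0.4, 0.35, 0.8]))
print(toy)
```

On the notebook's data, the same helper could be applied to `scores_if`, `payments_hybrid['hybrid_score']` and `payments_rules['rules_risk_score']` with `y` as the labels.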

In [None]:
# Visualization
fig = make_subplots(
    rows=1, cols=2,
    subplot_titles=('Precision vs Recall', 'False Positives vs False Negatives')
)

colors = ['#636EFA', '#EF553B', '#00CC96']

# Precision vs Recall
fig.add_trace(
    go.Bar(x=comparison['Model'], y=comparison['Precision'], 
           name='Precision', marker_color=colors[0]),
    row=1, col=1
)
fig.add_trace(
    go.Bar(x=comparison['Model'], y=comparison['Recall'], 
           name='Recall', marker_color=colors[1]),
    row=1, col=1
)

# False Positives vs False Negatives
fig.add_trace(
    go.Bar(x=comparison['Model'], y=comparison['False Positives'], 
           name='False Positives', marker_color='orange', showlegend=True),
    row=1, col=2
)
fig.add_trace(
    go.Bar(x=comparison['Model'], y=comparison['False Negatives'], 
           name='False Negatives', marker_color='red', showlegend=True),
    row=1, col=2
)

fig.update_layout(height=400, title_text='Fraud Detection Model Comparison', barmode='group')
fig.show()

## 8. Production Alert Dashboard (Mockup)

### Thinking Trace 🧠

> **What would a production system look like?**
> - Real-time payment scoring
> - Alert queue for review
> - Risk-based prioritization
> - Audit trail for compliance
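
A production scorer evaluates one payment at a time and records *why* it was flagged. The sketch below applies the notebook's five rules to a single payment and returns an auditable record. `Payment` and `score_payment` are hypothetical names. The Isolation Forest component is omitted, and the HIGH/MEDIUM tiers reuse only the rule-count part of the hybrid logic. It is not a real-time system: there is no queueing, persistence or latency guarantee.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

HIGH_RISK_COUNTRIES = {"RU", "IR", "KP", "SY", "VE", "CU", "XX"}


@dataclass
class Payment:
    amount: float
    beneficiary_name: str
    beneficiary_country: str
    timestamp: datetime


def score_payment(p: Payment) -> dict:
    """Evaluate the notebook's five rules on one payment and return an auditable record."""
    checks = {
        "large_amount": p.amount > 500_000,
        "high_risk_country": p.beneficiary_country in HIGH_RISK_COUNTRIES,
        "unusual_time": not (8 <= p.timestamp.hour <= 19),
        "new_beneficiary": "NEW" in p.beneficiary_name,
        "round_amount": p.amount % 10_000 == 0 and p.amount > 50_000,
    }
    triggered = [name for name, hit in checks.items() if hit]
    if len(triggered) >= 2:
        level = "HIGH"
    elif triggered:
        level = "MEDIUM"
    else:
        level = "NORMAL"
    return {
        "rules_triggered": triggered,            # explainability: which rules fired
        "rules_risk_score": len(triggered),
        "risk_level": level,
        "scored_at": datetime.now(timezone.utc).isoformat(),  # audit-trail timestamp
    }


suspicious = score_payment(Payment(750_000.0, "NEW Acme Ltd", "RU", datetime(2024, 3, 1, 23, 15)))
routine = score_payment(Payment(12_345.67, "Acme Ltd", "US", datetime(2024, 3, 1, 10, 0)))
print(suspicious)
print(routine)
```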

In [None]:
# Create alert dashboard view
alerts = payments_hybrid[payments_hybrid['risk_level'] != 'NORMAL'].copy()
alerts = alerts.sort_values('hybrid_score', ascending=False)

# Select columns for display
alert_display = alerts[[
    'payment_id', 'timestamp', 'amount', 'currency',
    'beneficiary_name', 'beneficiary_country',
    'risk_level', 'hybrid_score', 'anomaly_reasons'
]].head(20)

print("🚨 FRAUD ALERT DASHBOARD")
print("=" * 100)
print(f"\nTotal Alerts: {len(alerts)}")
print(f"  HIGH:   {(alerts['risk_level'] == 'HIGH').sum()}")
print(f"  MEDIUM: {(alerts['risk_level'] == 'MEDIUM').sum()}")
print(f"  LOW:    {(alerts['risk_level'] == 'LOW').sum()}")
print("\n" + "=" * 100)
print("\nTOP 20 ALERTS (by risk score):")
alert_display

In [None]:
# Alert summary visualization
fig = make_subplots(
    rows=2, cols=2,
    subplot_titles=(
        'Alerts by Risk Level',
        'Alert Amount Distribution',
        'Alerts by Country',
        'Daily Alert Trend'
    ),
    specs=[[{"type": "pie"}, {"type": "histogram"}],
           [{"type": "bar"}, {"type": "scatter"}]]
)

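# 1. Pie chart - risk levels (fixed order so each level keeps its color)
risk_counts = alerts['risk_level'].value_counts().reindex(['HIGH', 'MEDIUM', 'LOW'], fill_value=0)
fig.add_trace(
    go.Pie(labels=risk_counts.index, values=risk_counts.values,
           marker_colors=['red', 'orange', 'yellow'], sort=False),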
    row=1, col=1
)

# 2. Amount distribution
fig.add_trace(
    go.Histogram(x=alerts['amount'], nbinsx=30, marker_color='red'),
    row=1, col=2
)

# 3. By country
country_alerts = alerts['beneficiary_country'].value_counts().head(10)
fig.add_trace(
    go.Bar(x=country_alerts.index, y=country_alerts.values, marker_color='orange'),
    row=2, col=1
)

# 4. Daily trend
daily_alerts = alerts.groupby(alerts['timestamp'].dt.date).size()
fig.add_trace(
    go.Scatter(x=daily_alerts.index, y=daily_alerts.values, mode='lines+markers'),
    row=2, col=2
)

fig.update_layout(height=600, title_text='Fraud Alert Dashboard', showlegend=False)
fig.show()

## 9. Key Takeaways

### Model Selection Guide

| Scenario | Recommended Approach | Rationale |
|----------|---------------------|----------|
| **Quick deployment** | Rules-Based | Fast, interpretable |
| **Novel fraud types** | Isolation Forest | Detects unknown patterns |
| **Production system** | Hybrid | Best of both worlds |
| **Labeled historical data** | Supervised ML (XGBoost) | Learns directly from confirmed fraud cases; typically more precise on known patterns |

### Business Recommendations

1. **Start with rules** for immediate protection
2. **Add ML layer** to catch what rules miss
3. **Tune thresholds** based on operational capacity
4. **Track false positives** to maintain trust in the system
5. **Human review** for high-risk alerts (never auto-block)
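
Tuning thresholds to operational capacity can be made concrete. Given historical scores and an analyst budget of *k* alerts per day, pick the cut-off that would have produced about that many alerts. `capacity_threshold` is a hypothetical helper. When scores are tied at the cut-off, it yields slightly fewer alerts than the budget.

```python
import numpy as np


def capacity_threshold(scores, n_days, alerts_per_day):
    """Score cut-off that would have produced about `alerts_per_day` alerts on historical data."""
    scores = np.asarray(scores, dtype=float)
    budget = alerts_per_day * n_days  # total alerts analysts can review over the period
    if budget >= scores.size:
        return -np.inf  # capacity exceeds volume: review everything
    # The (budget + 1)-th highest score; payments scoring strictly above it are alerted
    return np.sort(scores)[::-1][budget]


scores = np.arange(1_000, dtype=float)
cut = capacity_threshold(scores, n_days=10, alerts_per_day=5)
print(cut, (scores > cut).sum())
```

With this notebook's data, something like `capacity_threshold(payments_hybrid['hybrid_score'], n_days=180, alerts_per_day=20)` would give the hybrid-score cut-off for a team that can review 20 alerts per day.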

### Production Considerations

| Aspect | Implementation |
|--------|----------------|
| **Latency** | <100ms for real-time scoring |
| **Feedback loop** | Incorporate investigation outcomes |
| **Model refresh** | Retrain monthly with new data |
| **Explainability** | Log which rules/features triggered |
| **Audit trail** | Store all decisions for compliance |

---

*Notebook created for Treasury AI educational purposes*

*Author: Ozgur Guler (ozgur.guler1@gmail.com)*