---
title: "Lab 8: Blockchain Transaction Analysis & Fraud Detection"
subtitle: "Anomaly detection and network analytics for financial surveillance"
format:
  html:
    toc: true
    number-sections: true
execute:
  echo: true
  eval: false
  warning: false
  message: false
---

::: callout-note
### Expected Time

- FIN510: Exercises 1-2 ≈ 75 min
- FIN720: All exercises ≈ 110 min
- Directed learning extensions ≈ 60 min
:::

[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/quinfer/fin510-colab-notebooks/blob/main/labs/lab08_blockchain_fraud.ipynb)

## Before You Code: The Big Picture

Blockchain's transparency is a double-edged sword: **all transactions are public**, but **identities are pseudonymous**. Can we detect fraud and money laundering at scale using data science?

::: {.callout-note}
## The Blockchain Transparency Paradox

**The Promise:**
1. **Transparency**: Every transaction recorded immutably on public ledger
2. **Traceability**: Follow the money from source to destination
3. **Auditability**: Regulators can inspect entire transaction history

**The Reality:**
- **Pseudonymity**: Addresses ≠ identities (hard to link to real people)
- **Mixing services**: Tumblers obscure transaction trails
- **Privacy coins**: Monero, Zcash use cryptography to hide amounts/recipients
- **Volume**: Billions of transactions → needle-in-haystack problem

**The Scale of Crypto Crime:**
- $14 billion in crypto scams/hacks in 2021 (Chainalysis 2022)
- ~2.7% of crypto transaction volume illicit in 2022
- Ransomware payments: $602M in 2021 (up 78% YoY)
- North Korea's Lazarus Group: $1.7B stolen in 2022 (largest year ever)

**Regulatory Response:**
- **Travel Rule**: FATF requires identity sharing for transfers >$1000
- **AML/KYC**: Exchanges must screen users, report suspicious activity
- **OFAC sanctions**: Block transactions to/from sanctioned addresses
:::

### What You'll Build Today

By the end of this lab, you will have:

- ✅ Statistical anomaly detection (Z-scores, Mahalanobis distance)
- ✅ Machine learning methods (Isolation Forest, Autoencoders)
- ✅ Network analysis to detect fraud rings
- ✅ Trade-off analysis (detection rate vs. false positives)
- ✅ Understanding of real-world compliance challenges

**Time estimate:** 75 minutes (FIN510) | 110 minutes (FIN720 with all exercises)

::: {.callout-important}
## Why This Matters
Financial crime surveillance is a $100B+ industry (Bloomberg 2023). Traditional banks use rules-based systems (slow, high false positives). Your skills in anomaly detection and network analysis are directly applicable to FinCrime teams at banks, exchanges, and regulators.
:::

## Learning Objectives

By the end of this lab, you will be able to:

- Analyze blockchain transaction data and identify suspicious patterns
- Implement statistical anomaly detection methods for fraud screening
- Apply machine learning algorithms (Isolation Forest, Autoencoders) for anomaly detection
- Conduct network analysis to detect fraud rings and money laundering
- Visualize transaction flows and suspicious subnetworks
- Evaluate trade-offs between detection accuracy and false positive rates
- Connect technical detection methods to regulatory compliance (AML/KYC)

## Setup and Dependencies

In [None]:
# Core libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
from datetime import datetime, timedelta
import warnings
warnings.filterwarnings('ignore')

# Machine learning
try:
    from sklearn.ensemble import IsolationForest
    from sklearn.preprocessing import StandardScaler
    from sklearn.decomposition import PCA
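except ImportError:
    print("Installing scikit-learn...")
    !pip install -q scikit-learn
    # Retry the imports so the names exist after installation
    from sklearn.ensemble import IsolationForest
    from sklearn.preprocessing import StandardScaler
    from sklearn.decomposition import PCA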

# Network analysis
try:
    import networkx as nx
except ImportError:
    print("Installing networkx...")
    !pip install -q networkx
    import networkx as nx

# Deep learning (for autoencoders)
try:
    import tensorflow as tf
    from tensorflow import keras
    from tensorflow.keras import layers
except ImportError:
    print("Installing tensorflow...")
    !pip install -q tensorflow
    import tensorflow as tf
    from tensorflow import keras
    from tensorflow.keras import layers

# Visualization settings
sns.set_style("whitegrid")
plt.rcParams['figure.figsize'] = (12, 6)
plt.rcParams['font.size'] = 11

print("✓ Setup complete - ready for fraud detection analysis")

## Exercise 1: Transaction Data Analysis and Pattern Identification

### Understanding Blockchain Transaction Structure

Blockchain transactions differ fundamentally from traditional banking transactions. Rather than simple transfers between accounts, Bitcoin transactions consume "unspent transaction outputs" (UTXOs) and create new ones. Understanding this structure is essential for effective fraud detection—patterns that seem normal in account-based systems might indicate suspicious activity in UTXO-based systems.
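
The UTXO idea can be sketched in a few lines. This is a simplified illustration, not Bitcoin's actual data format: the `UTXO` class, the `spend` helper, the addresses, and the amounts (in satoshis) are all hypothetical. It shows why one payment typically has several inputs and at least two outputs (payment plus change), which is where the `n_inputs` and `n_outputs` features used in this lab come from.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UTXO:
    """An unspent output: a fixed amount locked to one address."""
    owner: str
    amount: int  # satoshis


def spend(inputs, payments, change_owner, fee):
    """Consume whole UTXOs and create new ones (hypothetical helper).

    inputs   : list of UTXO objects, each spent in full
    payments : list of (recipient, amount) pairs
    Returns the newly created outputs, including any change output.
    """
    total_in = sum(u.amount for u in inputs)
    total_out = sum(amount for _, amount in payments)
    change = total_in - total_out - fee
    if change < 0:
        raise ValueError("inputs do not cover payments plus fee")
    outputs = [UTXO(owner, amount) for owner, amount in payments]
    if change > 0:
        outputs.append(UTXO(change_owner, change))
    return outputs


# Alice holds two UTXOs and pays Bob 50,000 satoshis with a 1,000 fee.
alice_utxos = [UTXO("alice", 50_000), UTXO("alice", 25_000)]
new_outputs = spend(alice_utxos, [("bob", 50_000)], change_owner="alice", fee=1_000)

n_inputs, n_outputs = len(alice_utxos), len(new_outputs)
print(n_inputs, n_outputs)
print(new_outputs)
```

The fee is never an explicit output: it is whatever the inputs hold beyond the outputs. A transaction with many inputs and many outputs is therefore not suspicious on its own, but unusual combinations can be.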

We'll analyze simulated Bitcoin-like transaction data capturing realistic patterns whilst avoiding issues with actual blockchain data (size, privacy, changing patterns). The simulation includes normal transactions and injected fraud examples.

### Loading and Exploring Transaction Data

In [None]:
# Generate synthetic blockchain transaction data with fraud examples
np.random.seed(42)

# Normal transaction parameters
n_normal = 9500
normal_amounts = np.random.lognormal(mean=3, sigma=1.5, size=n_normal)
normal_inputs = np.random.poisson(lam=1.5, size=n_normal) + 1
normal_outputs = np.random.poisson(lam=1.8, size=n_normal) + 1
normal_fees = normal_amounts * 0.001 + np.random.normal(0, 0.0001, n_normal)

# Fraud transaction parameters (unusual patterns)
n_fraud = 500
fraud_amounts = np.random.lognormal(mean=6, sigma=2, size=n_fraud)  # Larger amounts
fraud_inputs = np.random.poisson(lam=5, size=n_fraud) + 1  # More inputs (mixing)
fraud_outputs = np.random.poisson(lam=8, size=n_fraud) + 1  # More outputs (distribution)
fraud_fees = fraud_amounts * 0.005 + np.random.normal(0, 0.0005, n_fraud)  # Higher fees

# Combine into DataFrame
transactions = pd.DataFrame({
    'amount': np.concatenate([normal_amounts, fraud_amounts]),
    'n_inputs': np.concatenate([normal_inputs, fraud_inputs]),
    'n_outputs': np.concatenate([normal_outputs, fraud_outputs]),
    'fee': np.concatenate([normal_fees, fraud_fees]),
    'is_fraud': np.concatenate([np.zeros(n_normal), np.ones(n_fraud)])
})

# Add temporal features
base_time = datetime(2024, 1, 1)
transactions['timestamp'] = [
    base_time + timedelta(seconds=int(x)) 
    for x in np.random.uniform(0, 86400*30, len(transactions))
]
transactions = transactions.sort_values('timestamp').reset_index(drop=True)

# Add derived features
transactions['hour'] = transactions['timestamp'].dt.hour
transactions['fee_rate'] = transactions['fee'] / transactions['amount']
transactions['total_io'] = transactions['n_inputs'] + transactions['n_outputs']

# Display summary
print("="*70)
print("BLOCKCHAIN TRANSACTION DATASET")
print("="*70)
print(f"\nTotal transactions: {len(transactions):,}")
print(f"Fraud transactions: {int(transactions['is_fraud'].sum()):,} ({transactions['is_fraud'].mean()*100:.1f}%)")
print(f"Normal transactions: {int((~transactions['is_fraud'].astype(bool)).sum()):,}")

print("\n" + "-"*70)
print("STATISTICAL SUMMARY")
print("-"*70)
print(transactions[['amount', 'n_inputs', 'n_outputs', 'fee', 'fee_rate']].describe())

print("\n" + "-"*70)
print("SAMPLE TRANSACTIONS")
print("-"*70)
print(transactions.head(10))

### Visualizing Transaction Patterns

In [None]:
fig, axes = plt.subplots(2, 3, figsize=(16, 10))

# Amount distribution (log scale)
axes[0, 0].hist(np.log10(transactions[transactions['is_fraud']==0]['amount']), 
                bins=50, alpha=0.7, label='Normal', color='blue', edgecolor='black')
axes[0, 0].hist(np.log10(transactions[transactions['is_fraud']==1]['amount']), 
                bins=50, alpha=0.7, label='Fraud', color='red', edgecolor='black')
axes[0, 0].set_xlabel('Log10(Amount)')
axes[0, 0].set_ylabel('Frequency')
axes[0, 0].set_title('Transaction Amount Distribution', fontweight='bold')
axes[0, 0].legend()
axes[0, 0].grid(alpha=0.3)

# Input count distribution
axes[0, 1].hist(transactions[transactions['is_fraud']==0]['n_inputs'], 
                bins=range(0, 20), alpha=0.7, label='Normal', color='blue', edgecolor='black')
axes[0, 1].hist(transactions[transactions['is_fraud']==1]['n_inputs'], 
                bins=range(0, 20), alpha=0.7, label='Fraud', color='red', edgecolor='black')
axes[0, 1].set_xlabel('Number of Inputs')
axes[0, 1].set_ylabel('Frequency')
axes[0, 1].set_title('Transaction Inputs Distribution', fontweight='bold')
axes[0, 1].legend()
axes[0, 1].grid(alpha=0.3)

# Output count distribution
axes[0, 2].hist(transactions[transactions['is_fraud']==0]['n_outputs'], 
                bins=range(0, 25), alpha=0.7, label='Normal', color='blue', edgecolor='black')
axes[0, 2].hist(transactions[transactions['is_fraud']==1]['n_outputs'], 
                bins=range(0, 25), alpha=0.7, label='Fraud', color='red', edgecolor='black')
axes[0, 2].set_xlabel('Number of Outputs')
axes[0, 2].set_ylabel('Frequency')
axes[0, 2].set_title('Transaction Outputs Distribution', fontweight='bold')
axes[0, 2].legend()
axes[0, 2].grid(alpha=0.3)

# Fee rate distribution
axes[1, 0].hist(transactions[transactions['is_fraud']==0]['fee_rate']*100, 
                bins=50, alpha=0.7, label='Normal', color='blue', edgecolor='black')
axes[1, 0].hist(transactions[transactions['is_fraud']==1]['fee_rate']*100, 
                bins=50, alpha=0.7, label='Fraud', color='red', edgecolor='black')
axes[1, 0].set_xlabel('Fee Rate (%)')
axes[1, 0].set_ylabel('Frequency')
axes[1, 0].set_title('Transaction Fee Rate Distribution', fontweight='bold')
axes[1, 0].legend()
axes[1, 0].grid(alpha=0.3)

# Hourly transaction volume
hourly_counts = transactions.groupby('hour').size()
hourly_fraud = transactions[transactions['is_fraud']==1].groupby('hour').size()
axes[1, 1].bar(hourly_counts.index, hourly_counts.values, alpha=0.7, label='Total', color='blue')
axes[1, 1].bar(hourly_fraud.index, hourly_fraud.values, alpha=0.9, label='Fraud', color='red')
axes[1, 1].set_xlabel('Hour of Day')
axes[1, 1].set_ylabel('Transaction Count')
axes[1, 1].set_title('Hourly Transaction Patterns', fontweight='bold')
axes[1, 1].legend()
axes[1, 1].grid(alpha=0.3)

# Scatter: Amount vs Total I/O
axes[1, 2].scatter(transactions[transactions['is_fraud']==0]['amount'], 
                   transactions[transactions['is_fraud']==0]['total_io'],
                   alpha=0.5, s=10, label='Normal', color='blue')
axes[1, 2].scatter(transactions[transactions['is_fraud']==1]['amount'], 
                   transactions[transactions['is_fraud']==1]['total_io'],
                   alpha=0.7, s=20, label='Fraud', color='red', marker='^')
axes[1, 2].set_xlabel('Amount')
axes[1, 2].set_ylabel('Total Inputs + Outputs')
axes[1, 2].set_title('Amount vs Transaction Complexity', fontweight='bold')
axes[1, 2].set_xscale('log')
axes[1, 2].legend()
axes[1, 2].grid(alpha=0.3)

plt.tight_layout()
plt.show()

# Calculate separation between normal and fraud
print("\n" + "="*70)
print("FRAUD vs NORMAL COMPARISON")
print("="*70)

for col in ['amount', 'n_inputs', 'n_outputs', 'fee_rate', 'total_io']:
    normal_mean = transactions[transactions['is_fraud']==0][col].mean()
    fraud_mean = transactions[transactions['is_fraud']==1][col].mean()
    pct_diff = ((fraud_mean - normal_mean) / normal_mean) * 100
    
    print(f"\n{col}:")
    print(f"  Normal mean: {normal_mean:.4f}")
    print(f"  Fraud mean:  {fraud_mean:.4f}")
    print(f"  Difference:  {pct_diff:+.1f}%")

### Reflection Questions (Exercise 1)

Write 200-250 words addressing:

1. **Pattern Identification**: What transaction features distinguish fraud from normal activity in this dataset? How might real-world fraud patterns differ from these simulated examples?

2. **Detection Challenges**: Why might simple rule-based detection (e.g., "flag all transactions >$X") miss sophisticated fraud whilst generating many false positives?

3. **Temporal Patterns**: Timestamps in this simulation are drawn uniformly at random, so any hourly variation in the plot is sampling noise rather than behaviour. How might real criminals deliberately time transactions to avoid detection? What temporal features might improve fraud detection?
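
As a starting point for question 2, a single-threshold rule can be evaluated directly. The sketch below regenerates amounts with the same lognormal parameters as the lab data but a separate random generator, so its numbers will not match the `transactions` DataFrame exactly. The thresholds and the `rule_metrics` helper are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: amounts follow the lab's lognormal shapes
normal = rng.lognormal(mean=3, sigma=1.5, size=9500)
fraud = rng.lognormal(mean=6, sigma=2, size=500)
amount = np.concatenate([normal, fraud])
is_fraud = np.concatenate([np.zeros(9500, dtype=bool), np.ones(500, dtype=bool)])


def rule_metrics(threshold):
    """Precision and recall of the rule 'flag every amount above threshold'."""
    flagged = amount > threshold
    true_positives = np.sum(flagged & is_fraud)
    precision = true_positives / max(flagged.sum(), 1)
    recall = true_positives / is_fraud.sum()
    return precision, recall


for threshold in (100, 1_000, 10_000):
    p, r = rule_metrics(threshold)
    print(f"amount > {threshold:>6,}: precision={p:.2f}, recall={r:.2f}")
```

Raising the threshold can only shrink the flagged set, so recall never increases. Because the two amount distributions overlap, no single cutoff separates them cleanly.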

## Exercise 2: Statistical and Machine Learning Anomaly Detection

### Statistical Outlier Detection

We'll start with simple statistical methods establishing baseline performance before implementing more sophisticated machine learning approaches.
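
Before running the detectors, it helps to know how many flags a 3σ rule produces by chance alone. The arithmetic below assumes independent, exactly Gaussian features. Neither assumption holds for this dataset, so treat the result as a rough lower bound rather than a prediction.

```python
from scipy.stats import norm

threshold = 3.0
n_features = 7  # matches the seven columns in feature_cols

# Two-sided tail probability for one standard-normal feature
p_single = 2 * norm.sf(threshold)

# Probability that at least one of d independent features exceeds the cutoff
p_union = 1 - (1 - p_single) ** n_features

print(f"per-feature flag rate: {p_single:.4%}")
print(f"'any'-rule flag rate:  {p_union:.4%}")
```

Under these assumptions a single feature is flagged about 0.27% of the time, and the "any feature" rule across seven features about 1.9% of the time. Heavy-tailed features such as `amount` push the real rate higher.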

In [None]:
# Prepare features for anomaly detection
feature_cols = ['amount', 'n_inputs', 'n_outputs', 'fee', 'fee_rate', 'total_io', 'hour']
X = transactions[feature_cols].values
y_true = transactions['is_fraud'].values

# Standardize features (zero mean, unit variance)
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Method 1: Z-score based detection (univariate)
def zscore_anomaly_detection(data, threshold=3):
    """
    Detect anomalies using Z-scores (standard deviations from mean).
    
    Classic statistical method: flag transactions with extreme values on
    any feature. Fast and interpretable, but assumes independence.
    
    Parameters
    ----------
    data : ndarray, shape (n_samples, n_features)
        Scaled transaction features (zero mean, unit variance)
    threshold : float, default=3
        Number of standard deviations to use as cutoff
        Common choices: 2.5 (broader), 3 (standard), 4 (conservative)
        
    Returns
    -------
    ndarray, shape (n_samples,)
        Boolean array: True = anomaly, False = normal
        
    Notes
    -----
    **How it works:**
    - Compute Z-score for each feature: z = (x - μ) / σ
    - Flag transaction if |z| > threshold on ANY feature
    - Uses "any" aggregation (union rule), not "all" (intersection)
    
    **Pros:**
    - Fast: O(n × d) where n = samples, d = features
    - Interpretable: Can explain why each transaction flagged
    - No model to train: only μ and σ per feature need estimating
    
    **Cons:**
    - Assumes features independent (ignores correlations)
    - Sensitive to outliers in training data (affects μ, σ)
    - False positives compound under the "any" rule: ~0.27% per Gaussian
      feature at threshold=3, more across d features and on heavy tails
    
    **Real-World Usage:**
    - First-line screening for manual review queue
    - Rule-based systems in traditional banks
    - Often combined with domain rules (e.g., "amount > $10k AND ...")
    
    Examples
    --------
    >>> X_scaled = scaler.fit_transform(transactions[feature_cols])
    >>> anomalies = zscore_anomaly_detection(X_scaled, threshold=3)
    >>> print(f"Flagged {anomalies.sum()} / {len(anomalies)} transactions")  # doctest: +SKIP
    Flagged 45 / 10000 transactions
    """
    z_scores = np.abs(stats.zscore(data, axis=0, nan_policy='omit'))
    # Flag if ANY feature exceeds threshold
    anomalies = (z_scores > threshold).any(axis=1)
    return anomalies

# Method 2: Percentile-based detection (multivariate using Mahalanobis distance)
def mahalanobis_anomaly_detection(data, threshold_percentile=95):
    """
    Detect anomalies using Mahalanobis distance (multivariate outlier detection).
    
    Accounts for feature correlations by measuring distance in transformed space
    where features are uncorrelated. More sophisticated than Z-scores.
    
    Parameters
    ----------
    data : ndarray, shape (n_samples, n_features)
        Scaled transaction features (zero mean, unit variance)
    threshold_percentile : float, default=95
        Percentile cutoff (e.g., 95 = flag top 5% most distant)
        Higher percentile = more conservative (fewer flags)
        
    Returns
    -------
    tuple of (anomalies, distances)
        anomalies : ndarray, shape (n_samples,), boolean flags
        distances : ndarray, shape (n_samples,), Mahalanobis distances
        
    Notes
    -----
    **How it works:**
    - Compute covariance matrix Σ capturing feature correlations
    - Mahalanobis distance: d = sqrt((x - μ)ᵀ Σ⁻¹ (x - μ))
    - Flag transactions with distances > percentile threshold
    
    **Why better than Z-scores:**
    - Accounts for correlations (e.g., high amount + low fee = suspicious)
    - Single distance metric (not "any" rule over features)
    - Scale-invariant (like Z-scores, but multivariate)
    
    **Pros:**
    - Captures multivariate patterns Z-scores miss
    - Probabilistic interpretation (Chi-squared distribution)
    - Lower false positive rate than Z-scores (for same recall)
    
    **Cons:**
    - Assumes Gaussian distribution (fails on heavy tails)
    - Requires covariance matrix inversion (singular if d > n or if features
      are collinear, e.g. total_io = n_inputs + n_outputs; hence pinv below)
    - Computationally slower: O(n × d²) vs O(n × d) for Z-scores
    
    **Real-World Usage:**
    - Second-line screening after Z-score pre-filter
    - Works well when fraud exhibits correlated anomalies
    - Used by payment processors (Visa, Mastercard)
    
    **Mathematical Note:**
    Under Gaussian assumption, squared Mahalanobis distance follows χ²(d)
    distribution. Can use this for principled p-values instead of percentiles.
    
    Examples
    --------
    >>> X_scaled = scaler.fit_transform(transactions[feature_cols])
    >>> anomalies, distances = mahalanobis_anomaly_detection(X_scaled, threshold_percentile=95)
    >>> print(f"Flagged {anomalies.sum()} transactions")
    Flagged 500 transactions
    >>> print(f"Max distance: {distances.max():.2f}")  # doctest: +SKIP
    Max distance: 12.34
    """
    # Calculate covariance matrix
    cov_matrix = np.cov(data, rowvar=False)
    inv_cov = np.linalg.pinv(cov_matrix)
    mean = np.mean(data, axis=0)
    
    # Calculate Mahalanobis distance for each point
    diff = data - mean
    mahal_dist = np.sqrt(np.sum(diff @ inv_cov * diff, axis=1))
    
    # Flag based on percentile threshold
    threshold = np.percentile(mahal_dist, threshold_percentile)
    anomalies = mahal_dist > threshold
    
    return anomalies, mahal_dist

# Apply statistical methods
print("="*70)
print("STATISTICAL ANOMALY DETECTION RESULTS")
print("="*70)

# Z-score method
z_anomalies = zscore_anomaly_detection(X_scaled, threshold=3)
z_precision = y_true[z_anomalies].mean()
z_recall = (y_true * z_anomalies).sum() / y_true.sum()
z_fpr = ((1-y_true) * z_anomalies).sum() / (1-y_true).sum()

print(f"\nZ-Score Method (threshold=3σ):")
print(f"  Flagged: {z_anomalies.sum()} transactions ({z_anomalies.mean()*100:.1f}%)")
print(f"  Precision: {z_precision:.3f} (of flagged, what % are fraud?)")
print(f"  Recall: {z_recall:.3f} (of fraud, what % detected?)")
print(f"  False Positive Rate: {z_fpr:.3f}")

# Mahalanobis method
mahal_anomalies, mahal_dist = mahalanobis_anomaly_detection(X_scaled, threshold_percentile=95)
mahal_precision = y_true[mahal_anomalies].mean()
mahal_recall = (y_true * mahal_anomalies).sum() / y_true.sum()
mahal_fpr = ((1-y_true) * mahal_anomalies).sum() / (1-y_true).sum()

print(f"\nMahalanobis Distance Method (95th percentile threshold):")
print(f"  Flagged: {mahal_anomalies.sum()} transactions ({mahal_anomalies.mean()*100:.1f}%)")
print(f"  Precision: {mahal_precision:.3f}")
print(f"  Recall: {mahal_recall:.3f}")
print(f"  False Positive Rate: {mahal_fpr:.3f}")
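
The Mahalanobis docstring notes that squared distances follow a χ² distribution under a Gaussian assumption. A minimal sketch of that alternative to a percentile cutoff follows, using hypothetical Gaussian data rather than the lab's features. For the lab data the assumption is doubtful: the features are heavy-tailed, and because `total_io` is an exact linear combination of two other columns, the degrees of freedom would be the rank of the covariance matrix, not the column count.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Hypothetical data: 5,000 correlated Gaussian points in d = 3 dimensions
d = 3
cov = np.array([[1.0, 0.6, 0.0],
                [0.6, 1.0, 0.3],
                [0.0, 0.3, 1.0]])
X = rng.multivariate_normal(mean=np.zeros(d), cov=cov, size=5000)

# Squared Mahalanobis distance of each row from the sample mean
diff = X - X.mean(axis=0)
inv_cov = np.linalg.inv(np.cov(X, rowvar=False))
d2 = np.einsum("ij,jk,ik->i", diff, inv_cov, diff)

# Under the Gaussian assumption, d2 ~ chi-squared with d degrees of freedom
alpha = 0.01
cutoff = stats.chi2.ppf(1 - alpha, df=d)
flagged = d2 > cutoff
p_values = stats.chi2.sf(d2, df=d)  # per-transaction p-values

print(f"cutoff on d^2: {cutoff:.2f}")
print(f"flagged share: {flagged.mean():.4f} (nominal {alpha})")
```

Unlike a percentile threshold, which always flags a fixed share of the data, the χ² cutoff flags more rows when the data contain more extreme points.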

### Isolation Forest Anomaly Detection

In [None]:
# Isolation Forest - tree-based anomaly detection
print("\n" + "="*70)
print("ISOLATION FOREST ANOMALY DETECTION")
print("="*70)

# Train Isolation Forest
iso_forest = IsolationForest(
    n_estimators=100,
    contamination=0.05,  # Expected proportion of anomalies
    random_state=42,
    n_jobs=-1
)

iso_forest.fit(X_scaled)

# Predict anomalies (-1 for anomaly, 1 for normal)
iso_predictions = iso_forest.predict(X_scaled)
iso_anomalies = (iso_predictions == -1)

# Calculate scores (more negative = more anomalous)
iso_scores = iso_forest.score_samples(X_scaled)

# Evaluate performance
iso_precision = y_true[iso_anomalies].mean()
iso_recall = (y_true * iso_anomalies).sum() / y_true.sum()
iso_fpr = ((1-y_true) * iso_anomalies).sum() / (1-y_true).sum()

print(f"\nIsolation Forest Results:")
print(f"  Flagged: {iso_anomalies.sum()} transactions ({iso_anomalies.mean()*100:.1f}%)")
print(f"  Precision: {iso_precision:.3f}")
print(f"  Recall: {iso_recall:.3f}")
print(f"  False Positive Rate: {iso_fpr:.3f}")

# Visualize anomaly scores
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Score distribution
axes[0].hist(iso_scores[y_true==0], bins=50, alpha=0.7, label='Normal', color='blue', edgecolor='black')
axes[0].hist(iso_scores[y_true==1], bins=50, alpha=0.7, label='Fraud', color='red', edgecolor='black')
axes[0].set_xlabel('Anomaly Score (more negative = more anomalous)')
axes[0].set_ylabel('Frequency')
axes[0].set_title('Isolation Forest Anomaly Scores', fontweight='bold')
axes[0].legend()
axes[0].grid(alpha=0.3)

# ROC-like curve: varying threshold
thresholds = np.percentile(iso_scores, range(0, 100, 5))
precisions, recalls, fprs = [], [], []

for threshold in thresholds:
    preds = (iso_scores < threshold)
    if preds.sum() > 0:
        precision = y_true[preds].mean()
        recall = (y_true * preds).sum() / y_true.sum()
        fpr = ((1-y_true) * preds).sum() / (1-y_true).sum()
        precisions.append(precision)
        recalls.append(recall)
        fprs.append(fpr)

axes[1].plot(fprs, recalls, marker='o', linewidth=2, markersize=4)
axes[1].set_xlabel('False Positive Rate')
axes[1].set_ylabel('True Positive Rate (Recall)')
axes[1].set_title('Detection Trade-off Curve', fontweight='bold')
axes[1].grid(alpha=0.3)
axes[1].plot([0, 1], [0, 1], 'k--', alpha=0.3, label='Random')
axes[1].legend()

plt.tight_layout()
plt.show()
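
The trade-off curve above samples twenty thresholds. scikit-learn's metrics can summarise the full ranking in a single number instead. The sketch below uses small hypothetical 2-D data rather than the lab's `X_scaled`; the same two metric calls would apply to `y_true` and `-iso_scores`.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.metrics import average_precision_score, roc_auc_score

rng = np.random.default_rng(2)

# Hypothetical data: 950 inliers near the origin, 50 outliers spread widely
inliers = rng.normal(0, 1, size=(950, 2))
outliers = rng.uniform(-8, 8, size=(50, 2))
X = np.vstack([inliers, outliers])
y = np.concatenate([np.zeros(950), np.ones(50)])

forest = IsolationForest(n_estimators=100, random_state=42).fit(X)

# score_samples is lower for more anomalous points, so negate it
anomaly_score = -forest.score_samples(X)

auc = roc_auc_score(y, anomaly_score)
ap = average_precision_score(y, anomaly_score)
print(f"ROC AUC:           {auc:.3f}")
print(f"Average precision: {ap:.3f}")
```

With rare positives, average precision is usually the more informative of the two, because ROC AUC can look strong even when most flagged transactions are false alarms.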

### Autoencoder-based Anomaly Detection

In [None]:
print("\n" + "="*70)
print("AUTOENCODER ANOMALY DETECTION")
print("="*70)

# Build autoencoder architecture
input_dim = X_scaled.shape[1]
encoding_dim = 4  # Bottleneck dimension

# Encoder
encoder = keras.Sequential([
    layers.Dense(12, activation='relu', input_shape=(input_dim,)),
    layers.Dense(8, activation='relu'),
    layers.Dense(encoding_dim, activation='relu', name='encoding')
])

# Decoder
decoder = keras.Sequential([
    layers.Dense(8, activation='relu', input_shape=(encoding_dim,)),
    layers.Dense(12, activation='relu'),
    layers.Dense(input_dim, activation='linear')
])

# Complete autoencoder
autoencoder = keras.Sequential([encoder, decoder])

# Compile
autoencoder.compile(
    optimizer='adam',
    loss='mse'
)

print("\nAutoencoder Architecture:")
autoencoder.summary()

# Train on normal transactions only. Filtering by label makes this
# semi-supervised novelty detection, not a fully unsupervised method.
X_train_normal = X_scaled[y_true == 0]

print(f"\nTraining on {len(X_train_normal)} normal transactions...")

history = autoencoder.fit(
    X_train_normal, X_train_normal,
    epochs=50,
    batch_size=32,
    validation_split=0.2,
    verbose=0
)

# Calculate reconstruction error for all transactions
reconstructions = autoencoder.predict(X_scaled, verbose=0)
reconstruction_errors = np.mean(np.square(X_scaled - reconstructions), axis=1)

# Flag anomalies based on reconstruction error threshold (95th percentile)
error_threshold = np.percentile(reconstruction_errors, 95)
ae_anomalies = reconstruction_errors > error_threshold

# Evaluate
ae_precision = y_true[ae_anomalies].mean()
ae_recall = (y_true * ae_anomalies).sum() / y_true.sum()
ae_fpr = ((1-y_true) * ae_anomalies).sum() / (1-y_true).sum()

print(f"\nAutoencoder Results (95th percentile threshold):")
print(f"  Error threshold: {error_threshold:.4f}")
print(f"  Flagged: {ae_anomalies.sum()} transactions ({ae_anomalies.mean()*100:.1f}%)")
print(f"  Precision: {ae_precision:.3f}")
print(f"  Recall: {ae_recall:.3f}")
print(f"  False Positive Rate: {ae_fpr:.3f}")

# Visualize reconstruction errors
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Error distribution
axes[0].hist(reconstruction_errors[y_true==0], bins=50, alpha=0.7, label='Normal', color='blue', edgecolor='black')
axes[0].hist(reconstruction_errors[y_true==1], bins=50, alpha=0.7, label='Fraud', color='red', edgecolor='black')
axes[0].axvline(error_threshold, color='green', linestyle='--', linewidth=2, label=f'Threshold ({error_threshold:.3f})')
axes[0].set_xlabel('Reconstruction Error')
axes[0].set_ylabel('Frequency')
axes[0].set_title('Autoencoder Reconstruction Errors', fontweight='bold')
axes[0].legend()
axes[0].grid(alpha=0.3)

# Training history
axes[1].plot(history.history['loss'], label='Training Loss', linewidth=2)
axes[1].plot(history.history['val_loss'], label='Validation Loss', linewidth=2)
axes[1].set_xlabel('Epoch')
axes[1].set_ylabel('MSE Loss')
axes[1].set_title('Autoencoder Training History', fontweight='bold')
axes[1].legend()
axes[1].grid(alpha=0.3)

plt.tight_layout()
plt.show()
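
Reconstruction error does not require a neural network. As a lightweight comparison, the sketch below swaps the autoencoder for PCA, a linear method that the setup cell already imports. The data are hypothetical: normal points lie near a 2-D plane inside 5-D space and the odd points do not. The model is fitted on normal data only, mirroring the autoencoder cell.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(3)

# Hypothetical data: normal points lie near a 2-D plane inside 5-D space
latent = rng.normal(size=(2000, 2))
mixing = rng.normal(size=(2, 5))
X_normal = latent @ mixing + 0.05 * rng.normal(size=(2000, 5))
X_odd = 2.0 * rng.normal(size=(40, 5))  # points scattered off the plane

# Fit on normal data only
pca = PCA(n_components=2).fit(X_normal)


def reconstruction_error(X):
    """Mean squared error after projecting to 2 components and back."""
    X_hat = pca.inverse_transform(pca.transform(X))
    return np.mean((X - X_hat) ** 2, axis=1)


err_normal = reconstruction_error(X_normal)
err_odd = reconstruction_error(X_odd)
threshold = np.percentile(err_normal, 95)  # cutoff set on normal data alone
flag_rate_odd = (err_odd > threshold).mean()

print(f"median error (normal): {np.median(err_normal):.4f}")
print(f"median error (odd):    {np.median(err_odd):.4f}")
print(f"odd points flagged:    {flag_rate_odd:.0%}")
```

Setting the threshold on normal-only errors fixes the expected false positive rate at about 5%. The cell above instead takes the percentile over all transactions, which fixes the share flagged.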

### Comparing Detection Methods

In [None]:
# Summary comparison
print("\n" + "="*70)
print("METHOD COMPARISON SUMMARY")
print("="*70)

comparison = pd.DataFrame({
    'Method': ['Z-Score', 'Mahalanobis', 'Isolation Forest', 'Autoencoder'],
    'Precision': [z_precision, mahal_precision, iso_precision, ae_precision],
    'Recall': [z_recall, mahal_recall, iso_recall, ae_recall],
    'FPR': [z_fpr, mahal_fpr, iso_fpr, ae_fpr],
    'Flagged': [
        z_anomalies.sum(),
        mahal_anomalies.sum(),
        iso_anomalies.sum(),
        ae_anomalies.sum()
    ]
})

print("\n", comparison.to_string(index=False))

# Calculate F1 scores
comparison['F1'] = 2 * (comparison['Precision'] * comparison['Recall']) / (comparison['Precision'] + comparison['Recall'])

print("\n" + "-"*70)
print("F1 Scores (harmonic mean of precision and recall):")
print("-"*70)
for idx, row in comparison.iterrows():
    print(f"  {row['Method']:20s}: {row['F1']:.3f}")

# Visualize comparison
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Precision-Recall comparison
x_pos = np.arange(len(comparison))
width = 0.35

axes[0].bar(x_pos - width/2, comparison['Precision'], width, label='Precision', alpha=0.8)
axes[0].bar(x_pos + width/2, comparison['Recall'], width, label='Recall', alpha=0.8)
axes[0].set_xlabel('Method')
axes[0].set_ylabel('Score')
axes[0].set_title('Precision vs Recall by Method', fontweight='bold')
axes[0].set_xticks(x_pos)
axes[0].set_xticklabels(comparison['Method'], rotation=45, ha='right')
axes[0].legend()
axes[0].grid(alpha=0.3, axis='y')

# F1 score comparison
axes[1].bar(x_pos, comparison['F1'], color='green', alpha=0.8)
axes[1].set_xlabel('Method')
axes[1].set_ylabel('F1 Score')
axes[1].set_title('Overall Performance (F1 Score)', fontweight='bold')
axes[1].set_xticks(x_pos)
axes[1].set_xticklabels(comparison['Method'], rotation=45, ha='right')
axes[1].grid(alpha=0.3, axis='y')

plt.tight_layout()
plt.show()

print("\n💡 Key Insights:")
print("  - No single method dominates all metrics")
print("  - Precision-recall trade-off varies by method")
print("  - Production systems often ensemble multiple methods")
print("  - False positive management is critical for operational deployment")
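
One simple way to ensemble detectors whose scores have different units is to average their ranks. The sketch below is one possible scheme, not a standard the lab prescribes. The `rank_ensemble` helper and the three score arrays are hypothetical, and every input must be oriented so that higher means more anomalous (for Isolation Forest, that means negating `score_samples`).

```python
import numpy as np
from scipy.stats import rankdata


def rank_ensemble(*score_arrays):
    """Average the normalised ranks of several anomaly scores.

    Ranks put every detector on a common 0-1 scale regardless of units.
    """
    ranks = [rankdata(s) / len(s) for s in score_arrays]
    return np.mean(ranks, axis=0)


# Hypothetical scores from three detectors for five transactions
mahalanobis = np.array([1.2, 0.8, 6.5, 1.0, 2.1])
recon_error = np.array([0.02, 0.01, 0.40, 0.03, 0.05])
neg_iforest = np.array([0.45, 0.40, 0.70, 0.42, 0.60])

combined = rank_ensemble(mahalanobis, recon_error, neg_iforest)
most_anomalous = int(np.argmax(combined))

print(combined.round(2))
print("most anomalous index:", most_anomalous)
```

Rank averaging ignores how far apart the scores are, which makes it robust to one detector's extreme values but discards magnitude information.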

### Reflection Questions (Exercise 2)

Write 250-300 words addressing:

1. **Method Comparison**: Which anomaly detection method performed best on this dataset? Consider both quantitative metrics (precision, recall, F1) and qualitative factors (interpretability, computational cost, ease of deployment).

2. **Trade-offs**: The methods show different precision-recall trade-offs. For financial fraud detection, which is more important—catching all fraud (high recall) or minimizing false alarms (high precision)? How does this answer depend on context (transaction amount, customer relationship, regulatory requirements)?

3. **Adversarial Adaptation**: Criminals who know the detection system might craft transactions to evade detection. Which method is most robust to adversarial attacks? Which is most vulnerable? How can detection systems adapt to evolving fraud tactics?
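
For question 2, it can help to turn the precision-recall trade-off into an explicit cost. The sketch below picks the threshold that minimises total cost on hypothetical scores. Both unit costs are invented for illustration and would differ widely by institution and transaction type.

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical anomaly scores: fraud scores tend to be higher
scores = np.concatenate([rng.normal(0, 1, 9500), rng.normal(2.5, 1, 500)])
is_fraud = np.concatenate([np.zeros(9500, dtype=bool), np.ones(500, dtype=bool)])

# Assumed unit costs (illustrative only)
COST_MISSED_FRAUD = 500.0  # loss per undetected fraud
COST_FALSE_ALARM = 10.0    # analyst time per wrongly flagged transaction


def total_cost(threshold):
    """Cost of flagging every transaction whose score exceeds threshold."""
    flagged = scores > threshold
    missed = np.sum(is_fraud & ~flagged)
    false_alarms = np.sum(~is_fraud & flagged)
    return COST_MISSED_FRAUD * missed + COST_FALSE_ALARM * false_alarms


candidates = np.linspace(scores.min(), scores.max(), 200)
costs = np.array([total_cost(t) for t in candidates])
best = candidates[np.argmin(costs)]

flagged = scores > best
recall = (flagged & is_fraud).sum() / is_fraud.sum()
precision = (flagged & is_fraud).sum() / flagged.sum()
print(f"cost-minimising threshold: {best:.2f}")
print(f"recall={recall:.2f}, precision={precision:.2f}")
```

Changing the ratio of the two costs moves the chosen threshold, which is one way to express how context (transaction size, customer relationship, regulatory exposure) shifts the preferred balance.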

## Exercise 3: Network Analysis for Fraud Detection

### Building Transaction Networks

Fraud often involves networks—money mules, laundering chains, coordinated account takeovers. Graph analytics reveals patterns invisible to transaction-level analysis.
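
A toy example of what the network view adds: below, thirty hypothetical "mule" addresses each send a small amount to one collector. No single transfer stands out, but the collector's in-degree and total inflow do. All node names and amounts are made up for illustration.

```python
import networkx as nx

G = nx.DiGraph()

# Ordinary activity: a few unrelated payments of varying size (hypothetical)
G.add_weighted_edges_from(
    [("a1", "shop", 120.0), ("a2", "shop", 80.0),
     ("a3", "a4", 950.0), ("a5", "a6", 400.0)],
    weight="amount",
)

# Structuring: 30 mules each send a small amount to one collector
for i in range(30):
    G.add_edge(f"mule_{i:02d}", "collector", amount=90.0)

# Per-transaction view: the largest single transfer is unrelated to the scheme
largest_edge = max(G.edges(data="amount"), key=lambda e: e[2])

# Per-node view: count and total value of incoming transfers
in_count = dict(G.in_degree())
in_value = dict(G.in_degree(weight="amount"))
top_receiver = max(in_value, key=in_value.get)

print("largest single transfer:", largest_edge)
print("top receiver:", top_receiver, in_count[top_receiver], in_value[top_receiver])
```

A transaction-level detector scoring each 90.0 transfer on its own has nothing to flag. Aggregating by node surfaces the collector immediately.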

In [None]:
# Generate synthetic transaction network
np.random.seed(42)

# Create addresses (nodes)
n_addresses = 200
addresses = [f"addr_{i:04d}" for i in range(n_addresses)]

# Generate transactions (edges) with different patterns
edges = []

# Normal transactions: uniform random senders, popularity-weighted receivers.
# Receiver weights fall off as 1/(rank+1), a static stand-in for the hubs
# (exchanges, merchants) that preferential attachment would produce.
weights = np.array([1 / (i+1) for i in range(n_addresses)])
weights = weights / weights.sum()
for _ in range(300):
    source = np.random.choice(addresses)
    target = np.random.choice(addresses, p=weights)
    
    if source != target:
        amount = np.random.lognormal(3, 1.5)
        edges.append((source, target, amount, 'normal'))

# Fraud pattern 1: Mixing service (one source, many outputs)
mixer_source = addresses[0]
for i in range(20):
    target = np.random.choice(addresses[50:150])
    amount = np.random.lognormal(4, 1)
    edges.append((mixer_source, target, amount, 'mixing'))

# Fraud pattern 2: Layering (sequential chain)
chain_start = addresses[150]
chain = [chain_start] + list(np.random.choice(addresses[151:170], size=10, replace=False))
for i in range(len(chain)-1):
    amount = np.random.lognormal(5, 0.5)
    edges.append((chain[i], chain[i+1], amount, 'layering'))

# Fraud pattern 3: Tight fraud ring (densely connected subgraph)
fraud_ring = addresses[170:180]
for i, addr1 in enumerate(fraud_ring):
    for addr2 in fraud_ring[i+1:]:
        if np.random.random() < 0.4:  # 40% connection probability within ring
            amount = np.random.lognormal(4, 1)
            edges.append((addr1, addr2, amount, 'fraud_ring'))

# Convert to DataFrame
edge_df = pd.DataFrame(edges, columns=['source', 'target', 'amount', 'pattern'])

print("="*70)
print("TRANSACTION NETWORK")
print("="*70)
print(f"\nNodes (addresses): {n_addresses}")
print(f"Edges (transactions): {len(edge_df)}")
print("\nPattern distribution:")
print(edge_df['pattern'].value_counts())

### Network Metrics and Centrality Analysis

In [None]:
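# Build directed graph. A DiGraph keeps one edge per ordered (source, target)
# pair, so repeated transfers between the same pair overwrite earlier ones
# and the edge count below can be smaller than len(edge_df).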
G = nx.DiGraph()

for _, row in edge_df.iterrows():
    G.add_edge(row['source'], row['target'], 
               amount=row['amount'], pattern=row['pattern'])

print("\n" + "="*70)
print("NETWORK STATISTICS")
print("="*70)
print(f"\nNodes: {G.number_of_nodes()}")
print(f"Edges: {G.number_of_edges()}")
print(f"Density: {nx.density(G):.4f}")
print(f"Average degree: {sum(dict(G.degree()).values()) / G.number_of_nodes():.2f}")

# Calculate centrality metrics
degree_centrality = nx.degree_centrality(G)
in_degree_centrality = nx.in_degree_centrality(G)
out_degree_centrality = nx.out_degree_centrality(G)
betweenness_centrality = nx.betweenness_centrality(G)

# Identify suspicious nodes based on centrality
print("\n" + "-"*70)
print("TOP NODES BY CENTRALITY")
print("-"*70)

# Degree centrality (most connected)
top_degree = sorted(degree_centrality.items(), key=lambda x: x[1], reverse=True)[:10]
print("\nDegree Centrality (most connected):")
for addr, score in top_degree:
    in_deg = G.in_degree(addr)
    out_deg = G.out_degree(addr)
    print(f"  {addr}: {score:.4f} (in={in_deg}, out={out_deg})")

# Betweenness centrality (bridges/brokers), top 5
top_betweenness = sorted(betweenness_centrality.items(), key=lambda x: x[1], reverse=True)[:5]
print("\nBetweenness Centrality (bridges between communities):")
for addr, score in top_betweenness:
    print(f"  {addr}: {score:.4f}")

# Out-degree centrality (distributors)
top_out = sorted(out_degree_centrality.items(), key=lambda x: x[1], reverse=True)[:5]
print("\nOut-Degree Centrality (high output activity):")
for addr, score in top_out:
    print(f"  {addr}: {score:.4f} (out_degree={G.out_degree(addr)})")
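
Centrality highlights hubs such as the mixer, but a layering chain has no hub: each hop receives once and sends once. One heuristic, sketched below with a hypothetical `pass_through_chains` helper, is to follow runs of nodes whose in-degree and out-degree are both 1. The degree condition and the `min_length` cutoff are assumptions. In the lab's network, chain addresses may also take part in normal transactions, so this strict version could miss parts of the injected chain.

```python
import networkx as nx


def pass_through_chains(G, min_length=4):
    """Find directed paths whose interior nodes all have in-degree 1 and
    out-degree 1. Returns a list of node lists with at least min_length edges."""
    relay = {n for n in G if G.in_degree(n) == 1 and G.out_degree(n) == 1}
    chains = []
    for start in G:
        if start in relay:
            continue  # chains begin at a non-relay node
        for nxt in G.successors(start):
            path = [start, nxt]
            # Keep following the single outgoing edge of each relay node
            while path[-1] in relay:
                path.append(next(iter(G.successors(path[-1]))))
            if len(path) - 1 >= min_length:
                chains.append(path)
    return chains


G = nx.DiGraph()
nx.add_path(G, ["src", "h1", "h2", "h3", "h4", "dst"])      # layering chain
G.add_edges_from([("src", "x"), ("x", "y"), ("y", "dst")])  # shorter route

chains = pass_through_chains(G, min_length=4)
print(chains)
```

The shorter route through `x` and `y` is also a pass-through path, but it falls below `min_length` and is ignored. Tuning that cutoff trades missed chains against false alarms on ordinary forwarding.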

### Community Detection

In [None]:
# Convert to undirected for community detection
G_undirected = G.to_undirected()

# Detect communities using Louvain method
try:
    import community as community_louvain
except ImportError:
    print("Installing python-louvain...")
    !pip install -q python-louvain
    import community as community_louvain

# Detect communities (Louvain is randomized; fix random_state for reproducible results)
communities = community_louvain.best_partition(G_undirected, random_state=42)

# Analyze community structure
community_sizes = {}
for node, comm_id in communities.items():
    community_sizes[comm_id] = community_sizes.get(comm_id, 0) + 1

print("\n" + "="*70)
print("COMMUNITY DETECTION")
print("="*70)
print(f"\nNumber of communities detected: {len(community_sizes)}")
print(f"\nCommunity sizes:")
for comm_id, size in sorted(community_sizes.items(), key=lambda x: x[1], reverse=True)[:10]:
    print(f"  Community {comm_id}: {size} nodes")

# Identify suspicious communities (very small tight groups)
suspicious_communities = [comm_id for comm_id, size in community_sizes.items() if 5 <= size <= 15]

print(f"\nSuspicious communities (5-15 nodes): {len(suspicious_communities)}")

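# Analyze fraud ring community (addresses[170:180], i.e. indices 170-179)
# Skip any ring address that never appears in an edge: it has no community label
fraud_ring_communities = [communities[addr] for addr in addresses[170:180] if addr in communities]
fraud_ring_comm_id = max(set(fraud_ring_communities), key=fraud_ring_communities.count)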

print(f"\nFraud ring detection:")
print(f"  Known fraud ring addresses assigned to community {fraud_ring_comm_id}")
print(f"  Community size: {community_sizes[fraud_ring_comm_id]} nodes")

### Network Visualization

In [None]:
# Visualize subnetwork focusing on high-activity nodes
fig, axes = plt.subplots(1, 2, figsize=(16, 7))

# Subgraph 1: High degree nodes
high_degree_nodes = [node for node, degree in G.degree() if degree >= 5]
G_high_degree = G.subgraph(high_degree_nodes)

pos1 = nx.spring_layout(G_high_degree, k=0.5, iterations=50, seed=42)
node_colors1 = ['red' if node in addresses[0:1] or node in addresses[170:180] 
                else 'lightblue' for node in G_high_degree.nodes()]

nx.draw_networkx_nodes(G_high_degree, pos1, node_color=node_colors1, 
                       node_size=300, alpha=0.8, ax=axes[0])
nx.draw_networkx_edges(G_high_degree, pos1, alpha=0.3, arrows=True, 
                       arrowsize=10, ax=axes[0])
nx.draw_networkx_labels(G_high_degree, pos1, font_size=7, ax=axes[0])

axes[0].set_title('High-Activity Nodes Subnetwork\n(Red = suspicious)', 
                  fontweight='bold')
axes[0].axis('off')

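# Subgraph 2: Known fraud ring
fraud_ring_members = list(addresses[170:180])
# Add direct neighbors (successors in a DiGraph) for context.
# Collect into a separate set: extending a list while iterating over it
# would keep visiting the newly added nodes and may never terminate.
fraud_ring_extended = set(fraud_ring_members)
for addr in fraud_ring_members:
    if addr in G:
        fraud_ring_extended.update(G.neighbors(addr))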

G_fraud_ring = G.subgraph(fraud_ring_extended)

pos2 = nx.spring_layout(G_fraud_ring, k=0.8, iterations=50, seed=42)
node_colors2 = ['red' if node in addresses[170:180] else 'lightgray' 
                for node in G_fraud_ring.nodes()]

nx.draw_networkx_nodes(G_fraud_ring, pos2, node_color=node_colors2, 
                       node_size=400, alpha=0.8, ax=axes[1])
nx.draw_networkx_edges(G_fraud_ring, pos2, alpha=0.4, arrows=True, 
                       arrowsize=12, ax=axes[1])
nx.draw_networkx_labels(G_fraud_ring, pos2, font_size=8, ax=axes[1])

axes[1].set_title('Fraud Ring and Neighbors\n(Red = fraud ring members)', 
                  fontweight='bold')
axes[1].axis('off')

plt.tight_layout()
plt.show()

# Calculate graph metrics for fraud detection
print("\n" + "="*70)
print("FRAUD PATTERN DETECTION SUMMARY")
print("="*70)

# Check if mixer was identified
mixer_score = out_degree_centrality[mixer_source]
print(f"\nMixing service detection:")
print(f"  Address: {mixer_source}")
print(f"  Out-degree centrality: {mixer_score:.4f} (rank: {sorted(out_degree_centrality.values(), reverse=True).index(mixer_score)+1})")
print(f"  Out-degree: {G.out_degree(mixer_source)} (should be high for mixer)")

# Check if layering chain was identified
chain_between = [betweenness_centrality[addr] for addr in chain[1:-1]]  # Middle nodes
avg_chain_between = np.mean(chain_between)
print(f"\nLayering chain detection:")
print(f"  Average betweenness of chain nodes: {avg_chain_between:.4f}")
print(f"  These nodes should have high betweenness (act as bridges)")

# Check if fraud ring was identified as community
# .get() avoids a KeyError for ring addresses that are absent from the graph
fraud_ring_in_community = sum(1 for addr in addresses[170:180] 
                               if communities.get(addr) == fraud_ring_comm_id)
print(f"\nFraud ring detection:")
print(f"  Fraud ring members in same community: {fraud_ring_in_community}/10")
print(f"  Community density should be high for fraud rings")

print("\n💡 Network analytics revealed patterns invisible to transaction-level analysis!")

### Reflection Questions (Exercise 3)

Write 200-250 words addressing:

1. **Network Patterns**: How do mixing services, layering chains, and fraud rings manifest differently in network structure? Which centrality metrics are most effective for detecting each pattern?

2. **Scalability**: Real blockchain networks have millions of nodes and billions of edges. What technical challenges arise when scaling these analyses to production systems? What approximations or sampling strategies might be necessary?

3. **Privacy Trade-offs**: Network analysis on transparent blockchains enables sophisticated fraud detection but reveals transaction relationships. How should regulators and system designers balance fraud detection capabilities against user privacy? Are there techniques that preserve privacy whilst enabling effective monitoring?
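
As a starting point for the scalability question, one approximation you could experiment with is pivot sampling for betweenness centrality. Exact betweenness runs a shortest-path search from every node, whereas NetworkX's `k` argument restricts the search to `k` randomly sampled source nodes. The sketch below is illustrative only: `G_demo` is a random graph standing in for a transaction network, and the graph size, `k=60`, and the `top_nodes` helper are assumptions rather than part of the lab's dataset.

```python
import networkx as nx

# Hypothetical stand-in for a transaction graph (not the lab's data)
G_demo = nx.gnp_random_graph(300, 0.02, seed=42, directed=True)

# Exact betweenness: one shortest-path search per node
exact = nx.betweenness_centrality(G_demo)

# Sampled betweenness: shortest-path searches from k=60 pivot nodes only
approx = nx.betweenness_centrality(G_demo, k=60, seed=42)

def top_nodes(scores, n=10):
    """Return the set of the n highest-scoring nodes."""
    ranked = sorted(scores.items(), key=lambda x: x[1], reverse=True)
    return {node for node, _ in ranked[:n]}

# How many of the exact top-10 nodes does the sampled estimate recover?
overlap = len(top_nodes(exact) & top_nodes(approx))
print(f"Top-10 overlap between exact and sampled betweenness: {overlap}/10")
```

Try varying `k` and observing how the overlap changes. The cost of the sampled version grows with `k` rather than with the number of nodes, which is the trade-off to discuss in your answer.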

## Summary and Integration

### What We've Learned

Through these exercises, you've:

1. **Analyzed blockchain transaction data**, identifying features that distinguish fraudulent from normal activity

2. **Implemented multiple anomaly detection methods** (statistical, Isolation Forest, autoencoders) and compared their effectiveness

3. **Evaluated precision-recall trade-offs**, recognizing that no method detects all fraud without producing false positives

4. **Applied network analysis** to detect fraud patterns invisible to transaction-level methods: mixing, layering, and fraud rings

5. **Visualized suspicious subnetworks**, making abstract analytics interpretable for fraud analysts

6. **Considered operational deployment**, thinking beyond algorithms to false positive management, interpretability, and adversarial robustness

### Connections to Course Themes

- **Week 7 (Cryptocurrency)**: Blockchain transparency enables analyses impossible with traditional banking data, but pseudonymity limits identification

- **Week 6 (Financial Inclusion)**: Fraud detection systems must balance security with access—overly aggressive detection excludes legitimate users, especially marginalized populations

- **Week 3 (Platforms)**: Exchanges and payment platforms need fraud detection at scale, balancing automated systems with human review

- **Week 2 (APIs)**: Real-world deployment requires integrating fraud detection with blockchain APIs, streaming transaction feeds, and alert systems

### Critical Evaluation Framework

When evaluating fraud detection systems:

1. **Quantitative performance**: Precision, recall, F1 scores, false positive rates
2. **Operational feasibility**: Computational cost, latency, scalability, integration complexity
3. **Interpretability**: Can analysts understand why transactions were flagged?
4. **Adversarial robustness**: How easily can criminals evade detection?
5. **Fairness**: Does the system discriminate against certain users or transaction types?
6. **Regulatory compliance**: Does the system meet AML/KYC requirements?
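
The quantitative measures in item 1 all follow from the four confusion-matrix counts. The sketch below shows one way to compute them by hand. The function name `detection_metrics` and the toy labels are hypothetical, chosen for illustration rather than taken from the lab's results.

```python
def detection_metrics(y_true, y_pred):
    """Compute precision, recall, F1, and false positive rate.

    y_true, y_pred: sequences of 0/1 labels, where 1 = fraud (flagged).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # fraud correctly flagged
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # legitimate but flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # fraud that was missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # legitimate, not flagged

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "fpr": fpr}

# Toy example (hypothetical labels): 3 fraud cases among 10 transactions
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

metrics = detection_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because fraud is rare, the false positive rate can look small while precision remains low, so it helps to report both.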

### Assessment Preparation

**FIN510 Coursework 2**: You can analyze transaction patterns in cryptocurrency or traditional payment data, applying anomaly detection or network analysis techniques from this lab.

**FIN720**: Critical evaluation of blockchain fraud detection claims is a strong topic for reflective analysis: compare promised benefits with empirical evidence, identify limitations, and propose improvements grounded in academic research.

### Further Exploration

If interested in extending your analysis:

- **Temporal dynamics**: How do fraud patterns evolve? Can models detect pattern shifts?
- **Deep learning on graphs**: Graph neural networks (GNNs) for end-to-end fraud detection
- **Privacy-preserving detection**: Federated learning, differential privacy for fraud detection without exposing transaction details
- **Adversarial ML**: How do criminals evade detection? Can we make systems more robust?
- **Cross-chain analysis**: Detecting fraud spanning multiple blockchains (bridges, wrapped tokens)

---

**Excellent work! You've implemented the core techniques behind production fraud detection systems and connected technical methods to operational realities and regulatory requirements.**
