---
title: "Lab 11: Market Surveillance & Detection Algorithms"
format:
  html:
    code-fold: false
    code-tools: true
    toc: true
  ipynb: default
bibliography: ../resources/reading.bib
jupyter: python3
execute:
  eval: false
---

## Before You Code: The Big Picture

**Every trade you make is watched.** Exchanges run real-time surveillance systems scanning millions of orders per day for manipulation. Can you build algorithms that catch bad actors without drowning investigators in false positives?

::: {.callout-note}
## The Market Surveillance Challenge

**Why Surveillance Exists:**
- **Spoofing**: Place large orders to move prices, cancel before execution
- **Wash trading**: Trade with yourself to inflate volume, manipulate prices
- **Layering**: Stack multiple fake orders to create false supply/demand
- **Front-running**: Use non-public information to trade ahead of clients

**The Regulatory Mandate:**
- **MAR (Europe)**: Market Abuse Regulation requires surveillance of all trading venues
- **SEC/FINRA (US)**: Exchanges must detect and report suspicious activity
- **MiFID II/MiFIR**: Transaction reports due by the close of the next working day (T+1); records retained for at least 5 years

**The Problem:**
- **Scale**: NYSE processes ~10B quotes per day, NASDAQ ~20B
- **False positives**: 95-99% of alerts are legitimate activity (Nasdaq 2019)
- **Cost**: $10-30K to investigate each alert (compliance team time)
- **Adversarial**: Manipulators adapt to detection systems

**The Evidence:**
- $3.4B in fines for market manipulation (2020-2022, FCA/SEC/CFTC)
- JP Morgan: $920M fine for spoofing precious metals (2020)
- Navinder Sarao: $41M penalty for Flash Crash spoofing (2010)
- Most violations go undetected (estimates: <10% detection rate)
:::
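
To make the cost figures in the callout concrete, here is a minimal sketch of the arithmetic. It takes the quoted false-positive rates and per-alert costs at face value; the function name and the batch of 1,000 alerts are illustrative assumptions, not figures from any regulator or vendor.

```python
def cost_per_true_violation(n_alerts, false_positive_rate, cost_per_alert):
    """Return (expected true violations, investigation cost per true violation)."""
    if not 0 <= false_positive_rate < 1:
        raise ValueError("false_positive_rate must be in [0, 1)")
    true_violations = n_alerts * (1 - false_positive_rate)
    total_cost = n_alerts * cost_per_alert  # every alert is investigated
    return true_violations, total_cost / true_violations


# Hypothetical batch of 1,000 alerts at the two ends of the quoted ranges
for fp_rate, cost in [(0.95, 10_000), (0.99, 30_000)]:
    found, per_hit = cost_per_true_violation(1_000, fp_rate, cost)
    print(f"FP rate {fp_rate:.0%}: ~{found:.0f} real cases, "
          f"${per_hit:,.0f} per real case")
```

Under these assumptions each confirmed case costs roughly $200,000 at the optimistic end and roughly $3 million at the pessimistic end, which is why small changes in precision matter so much operationally.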

### What You'll Build Today

By the end of this lab, you will have:

- ✅ Spoofing detector analyzing order book patterns
- ✅ Wash trading detector using network analysis
- ✅ ML alert triage system prioritizing investigations
- ✅ Understanding of precision-recall tradeoffs in surveillance

**Time estimate:** FIN510: 75 min (Ex 1-2) | FIN720: 100 min (all exercises)

::: {.callout-important}
## Why This Matters
Financial surveillance is a $50B+ industry (compliance, RegTech, monitoring systems). Every bank, exchange, and broker needs surveillance analysts and systems engineers. If you can build detection algorithms and manage false positive rates, you're employable at FINRA, exchanges (NYSE, NASDAQ), or compliance teams at any bank.
:::

## Introduction

This lab applies market surveillance concepts to simulated scenarios in which you detect manipulation patterns. You'll implement simplified versions of the kinds of detection algorithms used in production surveillance systems, see first-hand how false positives arise, and evaluate the trade-off between sensitivity and specificity. The exercises connect technical detection to the regulatory requirements and operational constraints that shape real-world surveillance deployment.

Market surveillance protects market integrity by detecting and deterring manipulation: spoofing, layering, wash trading, front-running, and other prohibited behaviors. Regulatory frameworks (MAR in Europe, SEC/FINRA rules in the US) require surveillance systems that generate alerts for suspicious activity. Perfect detection is impossible, however, so surveillance must balance catching violations against generating false positives that overwhelm investigators. This lab makes these trade-offs concrete through hands-on implementation.

We'll work through three increasingly sophisticated scenarios. First, spoofing detection using order book data: you implement a rule-based detector and calibrate its thresholds to balance false positives against false negatives. Second, wash trading detection using transaction data and graph analytics: you identify circular trading networks and quantify volume inflation. Third, alert triage using machine learning: you prioritize investigator attention by predicting which alerts represent genuine violations rather than false positives.

**Prerequisites**: Understanding of Week 11 material (surveillance frameworks, manipulation patterns, detection approaches), Python programming, pandas data manipulation, and basic machine learning concepts from previous labs.

**Learning Objectives**: By completing this lab, you will be able to implement pattern detection algorithms for market manipulation, evaluate detection performance using appropriate metrics, apply graph analytics to transaction surveillance, use machine learning for alert prioritization, and assess surveillance system effectiveness considering operational constraints.

::: {.callout-note}
### FIN510 vs FIN720

- **FIN510 students**: Complete Exercises 1-2 (pattern detection)
- **FIN720 students**: Complete all three exercises including ML triage
:::

## Exercise 1: Detecting Spoofing Patterns

### Context

Spoofing manipulates order books by placing orders without execution intent—creating false supply/demand signals inducing price movements that benefit the spoofer. Regulators prohibit spoofing under MAR (Europe) and Dodd-Frank (US); violations carry severe penalties including imprisonment. Surveillance systems must detect spoofing patterns whilst avoiding false positives from legitimate market making strategies that also involve high cancellation rates.

This exercise uses simulated order data from a single security over one trading session. The data contains both legitimate market making (providing liquidity and cancelling as market conditions change) and injected spoofing (large orders placed on the opposite side of the intended trade, then cancelled quickly once they have triggered the desired price movement). Your task is to implement a spoofing detector and calibrate its thresholds.
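
The spoofing signature described above (unusually large orders that are cancelled within seconds) can be written as a simple order-level filter. The sketch below uses plain Python so it can be checked by hand. The function name `flag_spoof_like`, the 5x size multiple and the 2-second window are illustrative assumptions, not calibrated values.

```python
from statistics import median


def flag_spoof_like(orders, size_multiple=5.0, max_seconds=2.0):
    """Return IDs of cancelled orders that are both large and short-lived.

    `orders` is a list of dicts with keys: order_id, quantity, outcome,
    time_to_cancel (seconds, or None for filled orders).
    """
    typical_size = median(o["quantity"] for o in orders)
    return [
        o["order_id"]
        for o in orders
        if o["outcome"] == "cancelled"
        and o["time_to_cancel"] is not None
        and o["time_to_cancel"] <= max_seconds
        and o["quantity"] >= size_multiple * typical_size
    ]


# Hypothetical toy data: only order C is both large and quickly cancelled
sample = [
    {"order_id": "A", "quantity": 500, "outcome": "filled", "time_to_cancel": None},
    {"order_id": "B", "quantity": 800, "outcome": "cancelled", "time_to_cancel": 120.0},
    {"order_id": "C", "quantity": 12000, "outcome": "cancelled", "time_to_cancel": 0.6},
    {"order_id": "D", "quantity": 900, "outcome": "cancelled", "time_to_cancel": 1.1},
    {"order_id": "E", "quantity": 700, "outcome": "filled", "time_to_cancel": None},
]
print(flag_spoof_like(sample))
```

Order D is cancelled quickly but is ordinary in size, so it is not flagged. Requiring both conditions is one way to avoid alerting on market makers, who also cancel quickly.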

### Generating Simulated Order Data

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime, timedelta

# Set random seed for reproducibility
np.random.seed(42)

# Simulation parameters
n_participants = 10
n_orders = 5000
session_start = datetime(2024, 10, 1, 9, 30)

# Generate orders
orders = []
for i in range(n_orders):
    # Participant (two are spoofers)
    participant_id = np.random.choice(range(n_participants), 
                                     p=[0.15, 0.15, 0.10, 0.10, 0.08, 
                                        0.08, 0.08, 0.08, 0.09, 0.09])
    
    # Spoofers are participants 0 and 1
    is_spoofer = participant_id in [0, 1]
    
    # Time
    timestamp = session_start + timedelta(
        microseconds=np.random.uniform(0, 6.5*3600*1e6)
    )
    
    # Side
    if is_spoofer:
        # Spoofers skew to the sell side: ~70% sells (mostly spoofs), ~30% genuine buys
        side = 'buy' if np.random.random() < 0.3 else 'sell'
        if side == 'sell':  # Spoof sell orders (want to buy)
            is_spoof = np.random.random() < 0.8
        else:  # Real buy orders
            is_spoof = False
    else:
        side = 'buy' if np.random.random() < 0.5 else 'sell'
        is_spoof = False
    
    # Price (around 100 with noise)
    mid_price = 100 + np.sin(i/200) * 0.5 + np.random.normal(0, 0.1)
    
    if side == 'buy':
        price = mid_price - np.random.uniform(0.01, 0.10)
    else:
        price = mid_price + np.random.uniform(0.01, 0.10)
    
    # Quantity
    if is_spoof:
        quantity = np.random.randint(5000, 20000)  # Large spoof orders
    else:
        quantity = np.random.randint(100, 2000)
    
    # Outcome: spoof orders cancelled quickly, others fill or cancel based on market
    if is_spoof:
        time_to_cancel = np.random.uniform(0.1, 2.0)  # 100ms - 2s
        outcome = 'cancelled'
        fill_price = None
    elif is_spoofer and not is_spoof:
        # Spoofer's real orders fill (benefiting from manipulation)
        time_to_cancel = None
        outcome = 'filled'
        fill_price = price
    else:
        # Legitimate orders: 40% fill, 60% cancel after longer time
        if np.random.random() < 0.4:
            outcome = 'filled'
            fill_price = price
            time_to_cancel = None
        else:
            outcome = 'cancelled'
            fill_price = None
            time_to_cancel = np.random.uniform(10, 300)  # 10s - 5min
    
    orders.append({
        'order_id': f'ORD_{i:06d}',
        'participant_id': f'P{participant_id:02d}',
        'timestamp': timestamp,
        'side': side,
        'price': round(price, 2),
        'quantity': quantity,
        'outcome': outcome,
        'fill_price': fill_price,
        'time_to_cancel': time_to_cancel,
        'is_spoof_actual': is_spoof  # Ground truth (not observable in practice)
    })

orders_df = pd.DataFrame(orders)
orders_df = orders_df.sort_values('timestamp').reset_index(drop=True)

print(f"Generated {len(orders_df)} orders from {n_participants} participants")
print(f"\nSpoof orders (ground truth): {orders_df['is_spoof_actual'].sum()}")
print(f"\nFirst few orders:")
print(orders_df[['order_id', 'participant_id', 'timestamp', 
                 'side', 'price', 'quantity', 'outcome']].head(10))

### Task 1.1: Participant-Level Statistics

Calculate participant-level statistics indicative of spoofing behavior:

- **Cancel rate**: Fraction of orders cancelled vs filled
- **Average time to cancel**: How quickly cancelled orders are cancelled
- **Opposite-side fills**: Whether participant has fills opposite their cancelled orders
- **Order size distribution**: Cancelled orders vs filled orders
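
Before running the statistics on the full simulation, it can help to compute them for one participant on data small enough to verify by hand. This is a minimal sketch in plain Python. `participant_summary` and the toy orders are hypothetical and cover only three of the four statistics listed above.

```python
from statistics import mean, median


def participant_summary(orders):
    """Cancel rate, median time-to-cancel and size ratio for one participant."""
    cancelled = [o for o in orders if o["outcome"] == "cancelled"]
    filled = [o for o in orders if o["outcome"] == "filled"]
    cancel_rate = len(cancelled) / len(orders) if orders else 0.0
    median_ttc = median(o["time_to_cancel"] for o in cancelled) if cancelled else None
    # Ratio of average cancelled size to average filled size
    size_ratio = (
        mean(o["quantity"] for o in cancelled) / mean(o["quantity"] for o in filled)
        if cancelled and filled
        else None
    )
    return {
        "cancel_rate": cancel_rate,
        "median_time_to_cancel": median_ttc,
        "size_ratio": size_ratio,
    }


toy = [
    {"outcome": "cancelled", "quantity": 10000, "time_to_cancel": 0.5},
    {"outcome": "cancelled", "quantity": 8000, "time_to_cancel": 1.5},
    {"outcome": "cancelled", "quantity": 12000, "time_to_cancel": 1.0},
    {"outcome": "filled", "quantity": 1000, "time_to_cancel": None},
]
print(participant_summary(toy))
```

Three of the four toy orders are cancelled (cancel rate 0.75), the median time to cancel is 1.0 second, and cancelled orders are on average ten times the size of the filled order.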

In [None]:
# Calculate participant statistics
def calculate_participant_stats(orders_df):
    """
    Calculate participant-level statistics for spoofing detection.
    """
    stats_list = []
    
    for participant in orders_df['participant_id'].unique():
        p_orders = orders_df[orders_df['participant_id'] == participant]
        
        # Total orders
        n_orders = len(p_orders)
        
        # Cancel rate
        n_cancelled = (p_orders['outcome'] == 'cancelled').sum()
        n_filled = (p_orders['outcome'] == 'filled').sum()
        cancel_rate = n_cancelled / n_orders if n_orders > 0 else 0
        
        # Average time to cancel (for cancelled orders)
        cancelled_orders = p_orders[p_orders['outcome'] == 'cancelled']
        avg_time_to_cancel = cancelled_orders['time_to_cancel'].mean() if len(cancelled_orders) > 0 else np.nan
        
        # Median time to cancel (more robust to outliers)
        median_time_to_cancel = cancelled_orders['time_to_cancel'].median() if len(cancelled_orders) > 0 else np.nan
        
        # Average cancelled order size vs filled order size
        cancelled_size = cancelled_orders['quantity'].mean() if len(cancelled_orders) > 0 else 0
        filled_orders = p_orders[p_orders['outcome'] == 'filled']
        filled_size = filled_orders['quantity'].mean() if len(filled_orders) > 0 else 0
        size_ratio = cancelled_size / filled_size if filled_size > 0 else np.inf
        
        # Check for opposite-side activity
        has_buy_fills = ((p_orders['side'] == 'buy') & (p_orders['outcome'] == 'filled')).any()
        has_sell_fills = ((p_orders['side'] == 'sell') & (p_orders['outcome'] == 'filled')).any()
        has_buy_cancels = ((p_orders['side'] == 'buy') & (p_orders['outcome'] == 'cancelled')).any()
        has_sell_cancels = ((p_orders['side'] == 'sell') & (p_orders['outcome'] == 'cancelled')).any()
        
        # Spoof pattern: cancelled on one side, filled on other
        potential_spoof_pattern = (has_buy_fills and has_sell_cancels) or (has_sell_fills and has_buy_cancels)
        
        # Profitability proxy: did fills benefit from cancelled orders?
        # Simplified: count opposite-side cancelled orders entered in the 10s
        # before each filled order ('timestamp' is order entry time, not
        # cancel or fill time)
        profitability_score = 0
        if potential_spoof_pattern:
            for _, fill_order in filled_orders.iterrows():
                # Count recent cancels on opposite side
                opposite_side = 'sell' if fill_order['side'] == 'buy' else 'buy'
                recent_cancels = cancelled_orders[
                    (cancelled_orders['side'] == opposite_side) &
                    (cancelled_orders['timestamp'] < fill_order['timestamp']) &
                    (cancelled_orders['timestamp'] > fill_order['timestamp'] - timedelta(seconds=10))
                ]
                profitability_score += len(recent_cancels)
        
        stats_list.append({
            'participant_id': participant,
            'n_orders': n_orders,
            'cancel_rate': cancel_rate,
            'avg_time_to_cancel_sec': avg_time_to_cancel,
            'median_time_to_cancel_sec': median_time_to_cancel,
            'cancelled_size': cancelled_size,
            'filled_size': filled_size,
            'size_ratio': size_ratio,
            'potential_spoof_pattern': potential_spoof_pattern,
            'profitability_score': profitability_score
        })
    
    return pd.DataFrame(stats_list)

# Calculate statistics
participant_stats = calculate_participant_stats(orders_df)

# Sort by cancel rate (high to low)
participant_stats = participant_stats.sort_values('cancel_rate', ascending=False)

print("Participant Statistics:")
print(participant_stats.to_string(index=False))

# Visualize
fig, axes = plt.subplots(2, 2, figsize=(14, 10))

# Cancel rate
axes[0, 0].barh(participant_stats['participant_id'], 
                participant_stats['cancel_rate'])
axes[0, 0].set_xlabel('Cancel Rate')
axes[0, 0].set_title('Cancel Rate by Participant')
axes[0, 0].axvline(0.95, color='red', linestyle='--', label='High threshold (0.95)')
axes[0, 0].legend()

# Time to cancel
axes[0, 1].barh(participant_stats['participant_id'], 
                participant_stats['median_time_to_cancel_sec'])
axes[0, 1].set_xlabel('Median Time to Cancel (seconds)')
axes[0, 1].set_title('Cancellation Speed by Participant')
axes[0, 1].axvline(5, color='red', linestyle='--', label='Fast threshold (5s)')
axes[0, 1].legend()

# Size ratio
axes[1, 0].barh(participant_stats['participant_id'], 
                participant_stats['size_ratio'])
axes[1, 0].set_xlabel('Cancelled Size / Filled Size')
axes[1, 0].set_title('Order Size Ratio by Participant')
axes[1, 0].axvline(2, color='red', linestyle='--', label='Suspicious threshold (2x)')
axes[1, 0].set_xlim(0, 10)
axes[1, 0].legend()

# Profitability score
axes[1, 1].barh(participant_stats['participant_id'], 
                participant_stats['profitability_score'])
axes[1, 1].set_xlabel('Profitability Score')
axes[1, 1].set_title('Opposite-Side Profitability by Participant')

plt.tight_layout()
plt.savefig('spoofing_participant_stats.png', dpi=300, bbox_inches='tight')
plt.show()

print("\n📊 Visualization saved as 'spoofing_participant_stats.png'")

### Task 1.2: Implement Spoofing Detection Rule

Implement a rule-based spoofing detector using thresholds on calculated statistics:

In [None]:
def detect_spoofing(participant_stats, 
                   cancel_rate_threshold=0.90,
                   time_to_cancel_threshold=5.0,
                   size_ratio_threshold=2.0,
                   min_profitability_score=1):
    """
    Detect potential spoofing based on participant statistics.
    
    Parameters:
    -----------
    participant_stats : DataFrame
        Participant-level statistics
    cancel_rate_threshold : float
        Minimum cancel rate to trigger alert
    time_to_cancel_threshold : float
        Maximum median time to cancel (seconds) to trigger alert
    size_ratio_threshold : float
        Minimum cancelled/filled size ratio to trigger alert
    min_profitability_score : int
        Minimum profitability score to trigger alert
        
    Returns:
    --------
    DataFrame with alerts and scores
    """
    alerts = []
    
    for _, row in participant_stats.iterrows():
        # Rule checks
        high_cancel_rate = row['cancel_rate'] >= cancel_rate_threshold
        fast_cancellation = row['median_time_to_cancel_sec'] <= time_to_cancel_threshold
        large_cancelled_orders = row['size_ratio'] >= size_ratio_threshold
        profitable_pattern = row['profitability_score'] >= min_profitability_score
        has_spoof_pattern = row['potential_spoof_pattern']
        
        # Scoring: sum of conditions met
        conditions_met = sum([high_cancel_rate, fast_cancellation, 
                             large_cancelled_orders, profitable_pattern, 
                             has_spoof_pattern])
        
        # Generate alert if multiple conditions met
        if conditions_met >= 3:
            alert_priority = 'HIGH' if conditions_met >= 4 else 'MEDIUM'
            
            alerts.append({
                'participant_id': row['participant_id'],
                'alert_priority': alert_priority,
                'conditions_met': conditions_met,
                'high_cancel_rate': high_cancel_rate,
                'fast_cancellation': fast_cancellation,
                'large_cancelled_orders': large_cancelled_orders,
                'profitable_pattern': profitable_pattern,
                'has_spoof_pattern': has_spoof_pattern,
                'cancel_rate': row['cancel_rate'],
                'median_time_to_cancel': row['median_time_to_cancel_sec'],
                'size_ratio': row['size_ratio'],
                'profitability_score': row['profitability_score']
            })
    
    return pd.DataFrame(alerts)

# Detect spoofing with default thresholds
alerts = detect_spoofing(participant_stats)

print(f"\n🚨 Generated {len(alerts)} alerts")
if len(alerts) > 0:
    print("\nAlerts:")
    print(alerts[['participant_id', 'alert_priority', 'conditions_met', 
                  'cancel_rate', 'median_time_to_cancel', 'size_ratio']].to_string(index=False))

# Evaluate against ground truth
# Count actual spoofers (P00 and P01)
actual_spoofers = ['P00', 'P01']
detected_participants = alerts['participant_id'].tolist() if len(alerts) > 0 else []

true_positives = sum(1 for p in detected_participants if p in actual_spoofers)
false_positives = sum(1 for p in detected_participants if p not in actual_spoofers)
false_negatives = sum(1 for p in actual_spoofers if p not in detected_participants)
true_negatives = sum(1 for p in participant_stats['participant_id'] 
                    if p not in detected_participants and p not in actual_spoofers)

precision = true_positives / (true_positives + false_positives) if (true_positives + false_positives) > 0 else 0
recall = true_positives / (true_positives + false_negatives) if (true_positives + false_negatives) > 0 else 0
f1 = 2 * (precision * recall) / (precision + recall) if (precision + recall) > 0 else 0

print(f"\n📊 Detection Performance:")
print(f"True Positives:  {true_positives}")
print(f"False Positives: {false_positives}")
print(f"False Negatives: {false_negatives}")
print(f"True Negatives:  {true_negatives}")
print(f"\nPrecision: {precision:.2%}")
print(f"Recall:    {recall:.2%}")
print(f"F1 Score:  {f1:.2%}")

### Task 1.3: Threshold Calibration

Experiment with different threshold combinations and observe the precision/recall trade-off:

In [None]:
# Test multiple threshold combinations
threshold_experiments = [
    {'cancel_rate': 0.85, 'time': 10.0, 'size_ratio': 1.5, 'prof': 1, 'name': 'Sensitive'},
    {'cancel_rate': 0.90, 'time': 5.0, 'size_ratio': 2.0, 'prof': 1, 'name': 'Moderate'},
    {'cancel_rate': 0.95, 'time': 2.0, 'size_ratio': 3.0, 'prof': 2, 'name': 'Conservative'},
    {'cancel_rate': 0.98, 'time': 1.0, 'size_ratio': 5.0, 'prof': 3, 'name': 'Very Conservative'}
]

results = []

for exp in threshold_experiments:
    alerts = detect_spoofing(participant_stats,
                           cancel_rate_threshold=exp['cancel_rate'],
                           time_to_cancel_threshold=exp['time'],
                           size_ratio_threshold=exp['size_ratio'],
                           min_profitability_score=exp['prof'])
    
    detected = alerts['participant_id'].tolist() if len(alerts) > 0 else []
    tp = sum(1 for p in detected if p in actual_spoofers)
    fp = sum(1 for p in detected if p not in actual_spoofers)
    fn = sum(1 for p in actual_spoofers if p not in detected)
    
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0
    f1 = 2 * (precision * recall) / (precision + recall) if (precision + recall) > 0 else 0
    
    results.append({
        'configuration': exp['name'],
        'n_alerts': len(alerts),
        'true_positives': tp,
        'false_positives': fp,
        'false_negatives': fn,
        'precision': precision,
        'recall': recall,
        'f1_score': f1
    })

results_df = pd.DataFrame(results)
print("\n📊 Threshold Calibration Results:")
print(results_df.to_string(index=False))

# Visualize precision-recall trade-off
fig, ax = plt.subplots(figsize=(10, 6))
ax.plot(results_df['recall'], results_df['precision'], 'o-', linewidth=2, markersize=10)

for _, row in results_df.iterrows():
    ax.annotate(row['configuration'], 
               (row['recall'], row['precision']),
               textcoords="offset points", 
               xytext=(0,10), 
               ha='center',
               fontsize=9)

ax.set_xlabel('Recall (True Positive Rate)')
ax.set_ylabel('Precision')
ax.set_title('Precision-Recall Trade-off: Spoofing Detection Calibration')
ax.grid(True, alpha=0.3)
ax.set_xlim(-0.05, 1.05)
ax.set_ylim(-0.05, 1.05)

plt.tight_layout()
plt.savefig('spoofing_precision_recall.png', dpi=300, bbox_inches='tight')
plt.show()

print("\n📊 Visualization saved as 'spoofing_precision_recall.png'")
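
The precision, recall and F1 arithmetic repeated in the calibration loop can be factored into one helper that works on sets of participant IDs. This is a minimal sketch; the `loose` and `strict` alert sets are hypothetical outputs chosen to illustrate the trade-off, not results from the simulation.

```python
def precision_recall_f1(flagged, actual):
    """Compute precision, recall and F1 from two collections of participant IDs."""
    flagged, actual = set(flagged), set(actual)
    tp = len(flagged & actual)   # flagged and truly manipulative
    fp = len(flagged - actual)   # flagged but legitimate
    fn = len(actual - flagged)   # manipulative but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


actual = {"P00", "P01"}
loose = {"P00", "P01", "P04", "P07"}   # hypothetical: catches both, plus two bystanders
strict = {"P00"}                       # hypothetical: no bystanders, but misses P01
for name, flagged in [("loose", loose), ("strict", strict)]:
    p, r, f1 = precision_recall_f1(flagged, actual)
    print(f"{name:>6}: precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

Both hypothetical configurations score the same F1 (about 0.67), yet the loose one doubles the investigation workload while the strict one misses half the spoofers. F1 alone therefore cannot settle the deployment decision.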

### Reflection Questions

1. **Threshold Selection**: Which threshold configuration would you deploy in production? Consider both detection effectiveness (F1 score) and operational constraints (investigation capacity). How would you justify your choice to regulators?

2. **False Positives**: Examine false positive alerts. What legitimate trading behaviors trigger alerts? How could detection be refined to reduce false positives without sacrificing recall?

3. **Evidence Requirements**: Suppose an alert escalates to regulatory investigation. What additional evidence beyond statistical patterns would strengthen the case? (Consider: trader communications, algorithm code, market impact analysis, pattern repetition)

4. **Adversarial Adaptation**: If spoofers know detection thresholds, how might they adapt tactics to evade detection whilst still benefiting from manipulation? What surveillance enhancements would counter such adaptation?

---

## Exercise 2: Wash Trading Detection

### Context

Wash trading creates a false impression of volume through self-dealing or coordinated trading between related parties, which violates market manipulation rules worldwide. Detection requires identifying transactions in which the beneficial owner is the same on both sides, or in which related parties trade in a prearranged manner. Graph analytics reveal circular trading networks and relationships that are difficult to detect from individual transactions.
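
The simplest coordinated pattern is a pair of accounts that repeatedly trade with each other in both directions. A minimal sketch of counting such pairs in plain Python follows. `reciprocal_pairs` and its `min_each_way` parameter (assumed to be at least 1) are illustrative names, not part of the lab code.

```python
from collections import Counter


def reciprocal_pairs(trades, min_each_way=2):
    """Return {(a, b): (a_buys_from_b, b_buys_from_a)} for pairs trading both ways.

    `trades` is an iterable of (buyer, seller) tuples.
    """
    directed = Counter(trades)
    result = {}
    for (buyer, seller), n in directed.items():
        back = directed.get((seller, buyer), 0)
        # buyer < seller reports each unordered pair exactly once
        if buyer < seller and n >= min_each_way and back >= min_each_way:
            result[(buyer, seller)] = (n, back)
    return result


sample = [
    ("T15", "T16"), ("T16", "T15"), ("T15", "T16"), ("T16", "T15"),
    ("T03", "T09"), ("T09", "T04"), ("T15", "T16"),
]
print(reciprocal_pairs(sample))
```

Only the T15/T16 pair trades at least twice in each direction, so it is the only pair reported. Pairwise counts miss rings of three or more accounts, which is where graph methods become useful.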

This exercise provides simulated transaction data including legitimate trading and injected wash trading. You'll construct transaction networks, calculate network metrics identifying suspicious subgraphs, and quantify volume inflation from wash trades.
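
One of the network metrics used later in this exercise is insularity: the share of a group's trades in which both sides belong to the group. A minimal sketch, with a hypothetical `insularity` helper and toy trades:

```python
def insularity(trades, group):
    """Fraction of a group's trades where buyer and seller are both in the group."""
    group = set(group)
    internal = sum(1 for b, s in trades if b in group and s in group)
    touching = sum(1 for b, s in trades if b in group or s in group)
    return internal / touching if touching else 0.0


# Toy data: five trades inside the pair, two with outsiders, one unrelated
trades = (
    [("T15", "T16")] * 3
    + [("T16", "T15")] * 2
    + [("T15", "T03"), ("T07", "T16"), ("T03", "T07")]
)
print(f"{insularity(trades, {'T15', 'T16'}):.2f}")
```

Here five of the seven trades involving T15 or T16 stay inside the pair, giving an insularity of about 0.71. Traders who deal mostly with the wider market score much lower.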

### Generating Transaction Network Data

In [None]:
import networkx as nx

# Generate transaction data with wash trading
np.random.seed(42)

n_traders = 20
n_transactions = 1000

# Define wash trading groups (coordinated traders)
wash_groups = [
    ['T00', 'T01', 'T02'],  # Group 1: circular trading
    ['T15', 'T16']  # Group 2: simple wash trading pair
]

transactions = []

for i in range(n_transactions):
    # Determine if this is wash trade
    is_wash = np.random.random() < 0.15
    
    if is_wash:
        # Pick wash group
        group = wash_groups[np.random.choice(len(wash_groups))]
        buyer = np.random.choice(group)
        # Seller is different member of same group
        seller = np.random.choice([t for t in group if t != buyer])
        price = 100 + np.random.normal(0, 0.5)  # Tight price clustering
    else:
        # Legitimate trade
        buyer = f'T{np.random.randint(0, n_traders):02d}'
        seller = f'T{np.random.randint(0, n_traders):02d}'
        while seller == buyer:  # Avoid self-trading in legitimate trades
            seller = f'T{np.random.randint(0, n_traders):02d}'
        price = 100 + np.random.normal(0, 2.0)  # More price dispersion
    
    quantity = np.random.randint(100, 1000) if not is_wash else np.random.randint(500, 2000)
    
    transactions.append({
        'transaction_id': f'TX{i:04d}',
        'buyer': buyer,
        'seller': seller,
        'price': round(price, 2),
        'quantity': quantity,
        'value': round(price * quantity, 2),
        'timestamp': session_start + timedelta(seconds=np.random.uniform(0, 6.5*3600)),
        'is_wash_actual': is_wash  # Ground truth
    })

transactions_df = pd.DataFrame(transactions)
transactions_df = transactions_df.sort_values('timestamp').reset_index(drop=True)

print(f"Generated {len(transactions_df)} transactions")
print(f"Wash trades (ground truth): {transactions_df['is_wash_actual'].sum()}")
print(f"\nFirst few transactions:")
print(transactions_df[['transaction_id', 'buyer', 'seller', 
                       'price', 'quantity', 'value']].head(10))

### Task 2.1: Construct and Visualize Transaction Network

Build a directed graph where nodes are traders and edges are transactions:

In [None]:
# Construct transaction network
def build_transaction_network(transactions_df):
    """
    Build directed graph from transaction data.
    """
    G = nx.DiGraph()
    
    # Add edges with attributes
    for _, txn in transactions_df.iterrows():
        if G.has_edge(txn['buyer'], txn['seller']):
            # Update existing edge
            G[txn['buyer']][txn['seller']]['weight'] += 1
            G[txn['buyer']][txn['seller']]['total_value'] += txn['value']
        else:
            # New edge
            G.add_edge(txn['buyer'], txn['seller'], 
                      weight=1, 
                      total_value=txn['value'])
    
    return G

G = build_transaction_network(transactions_df)

print(f"\nNetwork Statistics:")
print(f"Nodes (traders): {G.number_of_nodes()}")
print(f"Edges (trader pairs): {G.number_of_edges()}")
print(f"Network density: {nx.density(G):.3f}")

# Calculate node metrics
in_degree = dict(G.in_degree())
out_degree = dict(G.out_degree())
reciprocity_scores = {}

for node in G.nodes():
    # Reciprocity: how many connections go both ways?
    neighbors_out = set(G.successors(node))
    neighbors_in = set(G.predecessors(node))
    reciprocal = neighbors_out.intersection(neighbors_in)
    
    total_connections = len(neighbors_out.union(neighbors_in))
    reciprocity = len(reciprocal) / total_connections if total_connections > 0 else 0
    reciprocity_scores[node] = reciprocity

# Visualize network
plt.figure(figsize=(14, 10))

# Use spring layout
pos = nx.spring_layout(G, k=0.5, iterations=50, seed=42)

# Node sizes based on degree
node_sizes = [300 * (in_degree[node] + out_degree[node]) for node in G.nodes()]

# Node colors based on reciprocity (red = high reciprocity = suspicious)
node_colors = [reciprocity_scores[node] for node in G.nodes()]

# Draw network
nx.draw_networkx_nodes(G, pos, node_size=node_sizes, 
                       node_color=node_colors, cmap='YlOrRd',
                       alpha=0.7, vmin=0, vmax=1)
nx.draw_networkx_labels(G, pos, font_size=8)
nx.draw_networkx_edges(G, pos, alpha=0.3, edge_color='gray',
                       arrows=True, arrowsize=10, arrowstyle='->')

plt.title('Transaction Network (Node Color = Reciprocity Score)', fontsize=14)
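# Pass ax explicitly: recent Matplotlib versions cannot infer which Axes to
# attach a colorbar to when given a standalone ScalarMappable
plt.colorbar(plt.cm.ScalarMappable(cmap='YlOrRd', 
                                   norm=plt.Normalize(vmin=0, vmax=1)),
            ax=plt.gca(), label='Reciprocity Score', shrink=0.8)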
plt.axis('off')
plt.tight_layout()
plt.savefig('wash_trading_network.png', dpi=300, bbox_inches='tight')
plt.show()

print("\n📊 Visualization saved as 'wash_trading_network.png'")
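
The reciprocity score computed above is the share of a trader's counterparties that it trades with in both directions. The sketch below reproduces that definition in plain Python on a graph small enough to verify by hand. It follows the lab's convention that an edge points from buyer to seller; `node_reciprocity` is a hypothetical helper name.

```python
from collections import defaultdict


def node_reciprocity(edges):
    """Share of each node's counterparties that it trades with in both directions.

    `edges` is an iterable of (buyer, seller) pairs; repeated pairs are fine.
    """
    successors, predecessors = defaultdict(set), defaultdict(set)
    for buyer, seller in edges:
        successors[buyer].add(seller)
        predecessors[seller].add(buyer)
    scores = {}
    for node in set(successors) | set(predecessors):
        out_nb, in_nb = successors[node], predecessors[node]
        scores[node] = len(out_nb & in_nb) / len(out_nb | in_nb)
    return scores


edges = [("A", "B"), ("B", "A"), ("A", "C"), ("D", "A")]
scores = node_reciprocity(edges)
for node in sorted(scores):
    print(node, round(scores[node], 2))
```

B trades only with A and does so in both directions, so it scores 1.0. A has three counterparties but only one two-way relationship, so it scores about 0.33. In a dense market most pairs eventually trade both ways, so high reciprocity is weak evidence unless combined with trade counts.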

### Task 2.2: Detect Wash Trading Using Network Metrics

Calculate network metrics identifying suspicious trading patterns:

In [None]:
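def detect_wash_trading_network(G, transactions_df, 
                               reciprocity_threshold=0.5,
                               min_transactions=3,
                               min_edge_weight=8):
    """
    Detect potential wash trading using network analysis.

    min_edge_weight is the minimum number of trades on a directed trader
    pair for that edge to count towards a cycle. The default of 8 is a
    starting point for this simulation, not a calibrated value.
    """
    alerts = []
    
    # Keep only repeatedly traded edges before looking for cycles. With
    # ~1,000 trades among 20 traders almost every pair has traded at least
    # once, so the unfiltered graph forms one giant component.
    frequent_edges = [(u, v) for u, v, w in G.edges(data='weight')
                      if w >= min_edge_weight]
    G_frequent = G.edge_subgraph(frequent_edges)
    
    # Find strongly connected components (circular trading)
    sccs = list(nx.strongly_connected_components(G_frequent))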
    
    for scc in sccs:
        if len(scc) >= 2:  # Ignore single nodes
            subgraph = G.subgraph(scc)
            
            # Calculate metrics for this component
            scc_nodes = list(scc)
            
            # Count transactions within component
            internal_txns = transactions_df[
                (transactions_df['buyer'].isin(scc_nodes)) &
                (transactions_df['seller'].isin(scc_nodes))
            ]
            n_internal = len(internal_txns)
            
            # Calculate total volume
            internal_volume = internal_txns['value'].sum()
            
            # External transactions
            external_txns = transactions_df[
                (transactions_df['buyer'].isin(scc_nodes)) |
                (transactions_df['seller'].isin(scc_nodes))
            ]
            external_txns = external_txns[
                ~((external_txns['buyer'].isin(scc_nodes)) &
                  (external_txns['seller'].isin(scc_nodes)))
            ]
            n_external = len(external_txns)
            
            # Calculate insularity: internal vs external trading ratio
            insularity = n_internal / (n_internal + n_external) if (n_internal + n_external) > 0 else 0
            
            # Average reciprocity in component
            avg_reciprocity = np.mean([reciprocity_scores[node] for node in scc_nodes])
            
            # Detect wash trading if:
            # 1. High reciprocity
            # 2. Significant internal trading
            # 3. High insularity (trades mostly within group)
            if (avg_reciprocity >= reciprocity_threshold and 
                n_internal >= min_transactions and
                insularity > 0.3):
                
                alert_priority = 'HIGH' if insularity > 0.6 else 'MEDIUM'
                
                alerts.append({
                    'traders': ', '.join(sorted(scc_nodes)),
                    'n_traders': len(scc_nodes),
                    'alert_priority': alert_priority,
                    'internal_transactions': n_internal,
                    'external_transactions': n_external,
                    'insularity': insularity,
                    'avg_reciprocity': avg_reciprocity,
                    'internal_volume': internal_volume
                })
    
    return pd.DataFrame(alerts)

# Detect wash trading
wash_alerts = detect_wash_trading_network(G, transactions_df)

print(f"\n🚨 Generated {len(wash_alerts)} wash trading alerts")
if len(wash_alerts) > 0:
    print("\nAlerts:")
    print(wash_alerts.to_string(index=False))
    
    # Calculate volume inflation
    total_volume = transactions_df['value'].sum()
    wash_volume = wash_alerts['internal_volume'].sum()
    inflation_pct = (wash_volume / total_volume) * 100
    
    print(f"\n📊 Volume Inflation Analysis:")
    print(f"Total volume: ${total_volume:,.0f}")
    print(f"Suspected wash volume: ${wash_volume:,.0f}")
    print(f"Inflation: {inflation_pct:.1f}% of reported volume")
    
    # Ground truth comparison
    actual_wash_volume = transactions_df[transactions_df['is_wash_actual']]['value'].sum()
    print(f"\nActual wash volume (ground truth): ${actual_wash_volume:,.0f}")
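    # Note: this ratio can exceed 100% because the internal volume of a flagged
    # group also includes any legitimate trades between its members
    print(f"Suspected volume as % of true wash volume: {(wash_volume / actual_wash_volume * 100):.1f}%")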
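
Strongly connected components are sensitive to one-off trades: a single incidental trade in each direction is enough to pull an unrelated trader into a component. The sketch below illustrates this on toy data using a deliberately simple pure-Python reachability check instead of `networkx`. The function name and the `min_edge_count` parameter are assumptions made for illustration.

```python
from collections import Counter, defaultdict


def strongly_connected_groups(trades, min_edge_count=1):
    """Group traders who can reach each other along sufficiently frequent edges."""
    counts = Counter(trades)
    graph = defaultdict(set)
    nodes = set()
    for (buyer, seller), n in counts.items():
        nodes.update((buyer, seller))
        if n >= min_edge_count:          # drop rarely traded edges
            graph[buyer].add(seller)

    def reachable(start):
        seen, stack = {start}, [start]
        while stack:
            for nxt in graph.get(stack.pop(), ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    reach = {n: reachable(n) for n in nodes}
    # Two nodes share a component when each can reach the other
    groups = {frozenset(m for m in reach[n] if n in reach[m]) for n in nodes}
    return sorted(sorted(g) for g in groups if len(g) > 1)


ring = [("T00", "T01"), ("T01", "T02"), ("T02", "T00")] * 4   # repeated circular trades
one_offs = [("T00", "T05"), ("T05", "T06"), ("T06", "T00")]   # incidental single trades
trades = ring + one_offs

print(strongly_connected_groups(trades, min_edge_count=1))
print(strongly_connected_groups(trades, min_edge_count=3))
```

With every edge counted, the three incidental trades merge T05 and T06 into the same component as the ring. Requiring at least three trades per edge isolates T00, T01 and T02. The right cut-off depends on how dense the legitimate trading network is.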

### Task 2.3: Visualize Suspicious Subnetworks

Highlight detected wash trading networks:

In [None]:
if len(wash_alerts) > 0:
    fig, axes = plt.subplots(1, len(wash_alerts), figsize=(7*len(wash_alerts), 6))
    
    if len(wash_alerts) == 1:
        axes = [axes]
    
    for idx, (_, alert) in enumerate(wash_alerts.iterrows()):
        traders = alert['traders'].split(', ')
        subG = G.subgraph(traders)
        
        pos = nx.circular_layout(subG)
        
        # Get edge weights for this subgraph
        edge_weights = [subG[u][v]['weight'] for u, v in subG.edges()]
        
        nx.draw_networkx_nodes(subG, pos, node_size=800, 
                             node_color='red', alpha=0.7, ax=axes[idx])
        nx.draw_networkx_labels(subG, pos, font_size=10, ax=axes[idx])
        nx.draw_networkx_edges(subG, pos, width=2, alpha=0.6, 
                             edge_color='darkred',
                             arrows=True, arrowsize=20, 
                             arrowstyle='->', ax=axes[idx])
        
        # Add edge labels (transaction counts)
        edge_labels = {(u, v): f"{subG[u][v]['weight']}" for u, v in subG.edges()}
        nx.draw_networkx_edge_labels(subG, pos, edge_labels, ax=axes[idx])
        
        axes[idx].set_title(f"Alert {idx+1}: {alert['n_traders']} traders\n"
                          f"Insularity: {alert['insularity']:.0%}, "
                          f"{alert['internal_transactions']} transactions",
                          fontsize=11)
        axes[idx].axis('off')
    
    plt.tight_layout()
    plt.savefig('wash_trading_subnetworks.png', dpi=300, bbox_inches='tight')
    plt.show()
    
    print("\n📊 Visualization saved as 'wash_trading_subnetworks.png'")

### Reflection Questions

1. **Network Metrics**: Which network metrics were most effective at identifying wash trading? Why do reciprocity and insularity indicate coordinated trading?

2. **False Positives**: Could legitimate trading patterns resemble wash trading networks? (Consider: market makers trading with each other, broker internalization, algorithmic trading across related accounts). How would you distinguish legitimate from manipulative patterns?

3. **Volume Inflation Impact**: Quantify the market harm from detected wash trading. How does false volume affect other participants' decisions? Should penalties relate to volume inflation magnitude?

4. **Beneficial Ownership**: Real-world detection requires knowing beneficial ownership, that is, which accounts are controlled by a single entity. How would incomplete beneficial ownership data affect detection? What additional data sources would improve accuracy?
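
To reason about question 1, it can help to compute the two metrics by hand on a tiny graph. The sketch below uses one plausible pair of definitions, which are assumptions and may differ from those in `detect_wash_trading_network`: reciprocity of a pair is the smaller directed trade count divided by the larger, and insularity is the share of the group's transactions that stay inside the group. The edge counts and trader names are hypothetical.

```python
# Directed trade counts: (seller, buyer) -> number of transactions (toy values)
edge_counts = {
    ("A", "B"): 10, ("B", "A"): 9,
    ("B", "C"): 8, ("C", "B"): 8,
    ("A", "X"): 2, ("Y", "C"): 1,
}
group = {"A", "B", "C"}  # hypothetical suspicious group


def pair_reciprocity(edges, u, v):
    """min/max of the two directed counts; 1.0 means perfectly balanced flow."""
    fwd, back = edges.get((u, v), 0), edges.get((v, u), 0)
    larger = max(fwd, back)
    return min(fwd, back) / larger if larger > 0 else 0.0


def group_metrics(edges, members):
    """Internal/external counts, insularity, and average pair reciprocity."""
    internal = sum(n for (u, v), n in edges.items()
                   if u in members and v in members)
    # External: exactly one endpoint is a group member
    external = sum(n for (u, v), n in edges.items()
                   if (u in members) != (v in members))
    pairs = {tuple(sorted((u, v))) for (u, v) in edges
             if u in members and v in members}
    avg_recip = (sum(pair_reciprocity(edges, u, v) for u, v in pairs) / len(pairs)
                 if pairs else 0.0)
    total = internal + external
    return {
        "internal": internal,
        "external": external,
        "insularity": internal / total if total else 0.0,
        "avg_reciprocity": avg_recip,
    }


metrics = group_metrics(edge_counts, group)
print(metrics)
```

A group that trades back and forth in balanced amounts and rarely deals with outsiders scores high on both metrics, which is the pattern the detector's thresholds look for. Two market makers quoting the same instrument could score similarly, which is the concern raised in question 2.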

---

## Exercise 3: Alert Triage and ML Scoring (FIN720)

### Context

Surveillance systems generate thousands of alerts daily; investigators can review hundreds. Effective triage prioritizes alerts by likelihood of being genuine violations, maximizing detection given limited investigation capacity. Machine learning can predict investigation outcomes from alert characteristics, automating prioritization and improving surveillance efficiency.

This exercise provides simulated alerts from multiple detection systems (spoofing, wash trading, layering, marking the close). Each alert has features describing the participant, the pattern characteristics, and the market context. You'll train ML classifiers to predict investigation outcomes and compare ML triage against baseline prioritization approaches.
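
The triage objective can be stated as recall at capacity: of all true violations, what fraction falls within the first k alerts the team reviews? The sketch below is a minimal illustration with made-up scores and labels; `recall_at_k` and `toy_alerts` are hypothetical names, not part of the lab code.

```python
def recall_at_k(labels_in_review_order, k):
    """Fraction of all true violations found in the first k reviewed alerts."""
    total = sum(labels_in_review_order)
    if total == 0:
        return 0.0
    return sum(labels_in_review_order[:k]) / total


# Toy alert queue: (score, is_violation). Scores and labels are made up.
toy_alerts = [(0.91, 1), (0.85, 0), (0.80, 1), (0.40, 0), (0.35, 0),
              (0.30, 1), (0.22, 0), (0.15, 0), (0.10, 0), (0.05, 0)]

# Review alerts from highest score to lowest
by_score = [label for _, label in
            sorted(toy_alerts, key=lambda a: a[0], reverse=True)]
capacity = 3  # alerts the team can review

scored_recall = recall_at_k(by_score, capacity)
# With a random ordering, the expected recall at k is k / n
expected_random_recall = capacity / len(toy_alerts)

print(f"Recall@{capacity} with scoring: {scored_recall:.1%}")
print(f"Expected recall@{capacity} at random: {expected_random_recall:.1%}")
```

Random ordering gives an expected recall of k/n regardless of the violation rate, so it is a natural floor against which any scoring method can be judged.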

### Generating Alert Data with Investigation Outcomes

In [None]:
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix, roc_auc_score, roc_curve

# Generate alert data
np.random.seed(42)

n_alerts = 500
violation_rate = 0.12  # 12% of alerts are actual violations

alerts_data = []

for i in range(n_alerts):
    # Violation determination (ground truth)
    is_violation = np.random.random() < violation_rate
    
    # Alert type
    alert_type = np.random.choice(['spoofing', 'wash_trading', 'layering', 'marking_close'],
                                  p=[0.35, 0.25, 0.25, 0.15])
    
    # Participant characteristics (violations more likely from certain profiles)
    if is_violation:
        participant_size = np.random.choice(['large', 'medium', 'small'], p=[0.2, 0.3, 0.5])
        prior_violations = np.random.choice([0, 1, 2, 3], p=[0.3, 0.4, 0.2, 0.1])
        trader_experience = np.random.choice(['junior', 'mid', 'senior'], p=[0.5, 0.3, 0.2])
    else:
        participant_size = np.random.choice(['large', 'medium', 'small'], p=[0.4, 0.4, 0.2])
        prior_violations = np.random.choice([0, 1, 2, 3], p=[0.7, 0.25, 0.04, 0.01])
        trader_experience = np.random.choice(['junior', 'mid', 'senior'], p=[0.3, 0.4, 0.3])
    
    # Pattern characteristics
    if is_violation:
        pattern_clarity = np.random.uniform(0.6, 0.95)
        n_occurrences = np.random.randint(3, 20)
        profitability = np.random.uniform(5000, 50000)
    else:
        pattern_clarity = np.random.uniform(0.3, 0.7)
        n_occurrences = np.random.randint(1, 5)
        profitability = np.random.uniform(-5000, 10000)
    
    # Market context
    volatility = np.random.uniform(0.01, 0.05)
    liquidity_score = np.random.uniform(0.2, 1.0)
    
    # Rule confidence (violations match rules more closely)
    if is_violation:
        rule_confidence = np.random.uniform(0.7, 0.98)
        n_rules_triggered = np.random.randint(2, 6)
    else:
        rule_confidence = np.random.uniform(0.4, 0.75)
        n_rules_triggered = np.random.randint(1, 3)
    
    alerts_data.append({
        'alert_id': f'A{i:04d}',
        'alert_type': alert_type,
        'participant_size': participant_size,
        'prior_violations': prior_violations,
        'trader_experience': trader_experience,
        'pattern_clarity': pattern_clarity,
        'n_occurrences': n_occurrences,
        'profitability': profitability,
        'volatility': volatility,
        'liquidity_score': liquidity_score,
        'rule_confidence': rule_confidence,
        'n_rules_triggered': n_rules_triggered,
        'is_violation': is_violation  # Ground truth
    })

alerts_df = pd.DataFrame(alerts_data)

print(f"Generated {len(alerts_df)} alerts")
print(f"True violations: {alerts_df['is_violation'].sum()} ({alerts_df['is_violation'].mean():.1%})")
print(f"\nFirst few alerts:")
print(alerts_df[['alert_id', 'alert_type', 'participant_size', 
                 'prior_violations', 'pattern_clarity', 'is_violation']].head(10))

### Task 3.1: Prepare Features and Train ML Models

Encode categorical features and train multiple classifiers:

In [None]:
# Prepare features
def prepare_features(df):
    """
    Encode categorical variables and prepare features.
    """
    df_encoded = df.copy()
    
    # One-hot encode categorical variables
    df_encoded = pd.get_dummies(df_encoded, 
                               columns=['alert_type', 'participant_size', 'trader_experience'],
                               drop_first=False)
    
    # Feature columns
    feature_cols = [col for col in df_encoded.columns 
                   if col not in ['alert_id', 'is_violation']]
    
    X = df_encoded[feature_cols]
    y = df_encoded['is_violation'].astype(int)
    
    return X, y, feature_cols

X, y, feature_cols = prepare_features(alerts_df)

print(f"\nFeatures: {len(feature_cols)}")
print(f"Feature names: {feature_cols}")

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

print(f"\nTraining set: {len(X_train)} alerts ({y_train.sum()} violations)")
print(f"Test set: {len(X_test)} alerts ({y_test.sum()} violations)")

# Train multiple models
models = {
    'Logistic Regression': LogisticRegression(random_state=42, max_iter=1000),
    'Random Forest': RandomForestClassifier(n_estimators=100, random_state=42, max_depth=10),
    'Gradient Boosting': GradientBoostingClassifier(n_estimators=100, random_state=42, max_depth=5)
}

model_results = {}

for name, model in models.items():
    print(f"\n{'='*60}")
    print(f"Training {name}...")
    
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    y_pred_proba = model.predict_proba(X_test)[:, 1]
    
    # Metrics
    print(f"\nClassification Report:")
    print(classification_report(y_test, y_pred, target_names=['Not Violation', 'Violation']))
    
    print(f"\nConfusion Matrix:")
    cm = confusion_matrix(y_test, y_pred)
    print(cm)
    print(f"(TN={cm[0,0]}, FP={cm[0,1]}, FN={cm[1,0]}, TP={cm[1,1]})")
    
    auc = roc_auc_score(y_test, y_pred_proba)
    print(f"\nROC AUC: {auc:.3f}")
    
    model_results[name] = {
        'model': model,
        'y_pred': y_pred,
        'y_pred_proba': y_pred_proba,
        'auc': auc
    }

# Feature importance (Random Forest)
print(f"\n{'='*60}")
print("Feature Importance (Random Forest):")
importances = models['Random Forest'].feature_importances_
feature_importance_df = pd.DataFrame({
    'feature': feature_cols,
    'importance': importances
}).sort_values('importance', ascending=False)

print(feature_importance_df.head(10).to_string(index=False))

### Task 3.2: Compare ML Triage to Baselines

Evaluate ML prioritization against baseline approaches:

In [None]:
# Baseline 1: Random ordering
np.random.seed(42)
baseline_random = np.random.permutation(len(X_test))

# Baseline 2: Rule confidence ordering
# Orderings must be positional (0..len(X_test)-1) because cumulative_detection
# indexes y_test with .iloc; sort_values(...).index would return the original
# row labels (up to n_alerts-1) and raise an out-of-bounds IndexError.
alerts_test = alerts_df.loc[X_test.index].copy()
baseline_rule = np.argsort(-alerts_test['rule_confidence'].to_numpy(), kind='stable')

# ML ordering: Gradient Boosting predicted probability
ml_proba = model_results['Gradient Boosting']['y_pred_proba']
alerts_test_ml = alerts_test.copy()
alerts_test_ml['ml_score'] = ml_proba
ml_ordering = np.argsort(-ml_proba, kind='stable')

# Evaluate: Cumulative violations detected in top-N alerts
def cumulative_detection(y_true, ordering, top_n_list):
    """
    Calculate cumulative violations detected for different top-N thresholds.
    """
    y_ordered = y_true.iloc[ordering].values
    cumulative = np.cumsum(y_ordered)
    
    results = []
    for top_n in top_n_list:
        # Cap top_n at the number of alerts available
        k = min(top_n, len(y_ordered))
        results.append(cumulative[k - 1])
    
    return results

top_n_list = [10, 25, 50, 100, len(X_test)]

random_detection = cumulative_detection(y_test, baseline_random, top_n_list)
rule_detection = cumulative_detection(y_test, baseline_rule, top_n_list)
ml_detection = cumulative_detection(y_test, ml_ordering, top_n_list)

comparison_df = pd.DataFrame({
    'top_n_alerts': top_n_list,
    'random_triage': random_detection,
    'rule_confidence_triage': rule_detection,
    'ml_triage': ml_detection,
    'total_violations': [y_test.sum()] * len(top_n_list)
})

comparison_df['random_rate'] = comparison_df['random_triage'] / comparison_df['total_violations']
comparison_df['rule_rate'] = comparison_df['rule_confidence_triage'] / comparison_df['total_violations']
comparison_df['ml_rate'] = comparison_df['ml_triage'] / comparison_df['total_violations']

print(f"\n📊 Triage Comparison: Violations Detected in Top-N Alerts")
print(comparison_df[['top_n_alerts', 'random_triage', 'rule_confidence_triage', 
                    'ml_triage', 'total_violations']].to_string(index=False))

print(f"\n📊 Detection Rates:")
print(comparison_df[['top_n_alerts', 'random_rate', 'rule_rate', 'ml_rate']].to_string(index=False))

# Visualize
fig, ax = plt.subplots(figsize=(12, 7))

ax.plot(comparison_df['top_n_alerts'], comparison_df['random_rate'], 
       'o-', label='Random Triage', linewidth=2)
ax.plot(comparison_df['top_n_alerts'], comparison_df['rule_rate'], 
       's-', label='Rule Confidence Triage', linewidth=2)
ax.plot(comparison_df['top_n_alerts'], comparison_df['ml_rate'], 
       '^-', label='ML Triage (Gradient Boosting)', linewidth=2)

# Perfect triage (all violations first)
perfect_detection = [min(n, y_test.sum()) / y_test.sum() for n in top_n_list]
ax.plot(comparison_df['top_n_alerts'], perfect_detection, 
       '--', color='green', label='Perfect Triage', linewidth=2, alpha=0.5)

ax.set_xlabel('Number of Alerts Investigated (Top-N)', fontsize=12)
ax.set_ylabel('Fraction of Violations Detected', fontsize=12)
ax.set_title('Triage Effectiveness: Violations Detected vs Investigation Effort', fontsize=14)
ax.legend(fontsize=10)
ax.grid(True, alpha=0.3)
ax.set_xlim(0, max(top_n_list))
ax.set_ylim(0, 1.05)

plt.tight_layout()
plt.savefig('triage_comparison.png', dpi=300, bbox_inches='tight')
plt.show()

print("\n📊 Visualization saved as 'triage_comparison.png'")

# Calculate improvement
print(f"\n💡 ML Triage Improvement:")
for idx, top_n in enumerate([10, 25, 50]):
    random_rate = comparison_df.iloc[idx]['random_rate']
    rule_rate = comparison_df.iloc[idx]['rule_rate']
    ml_rate = comparison_df.iloc[idx]['ml_rate']
    
    improvement_vs_random = ((ml_rate - random_rate) / random_rate * 100) if random_rate > 0 else 0
    improvement_vs_rule = ((ml_rate - rule_rate) / rule_rate * 100) if rule_rate > 0 else 0
    
    print(f"\nTop-{top_n} alerts investigated:")
    print(f"  ML detection rate: {ml_rate:.1%}")
    # Use the sign-aware format so a negative change does not print as "+-5%"
    print(f"  Improvement vs random: {improvement_vs_random:+.0f}%")
    print(f"  Improvement vs rule: {improvement_vs_rule:+.0f}%")

### Reflection Questions

1. **ML Value Proposition**: Quantify the operational benefit of ML triage. If investigators can review 50 alerts per day, how many more violations does ML detect compared to baselines? How would you present this ROI to management justifying ML investment?

2. **Explainability Requirements**: Regulators may question ML triage: why was a specific alert deprioritized, potentially causing a violation to be missed? Using feature importance and SHAP values, how would you explain individual triage decisions? What documentation would support a regulatory audit?

3. **Adversarial Adaptation**: If manipulators learn that certain characteristics (participant size, prior violations, pattern clarity) increase triage priority, how might they adapt tactics to score lower? What surveillance enhancements would detect such adaptation?

4. **Cost Asymmetry**: False negatives (missing violations) and false positives (wasting investigator time) have different costs. How would you adjust ML decision thresholds accounting for asymmetric costs? Should thresholds vary by violation type (insider trading vs spoofing)?
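
As a starting point for question 4, the standard expected-cost argument gives a decision threshold directly. Investigating an alert with violation probability p wastes the investigation cost with probability 1 - p, while skipping it incurs the missed-violation cost with probability p, so investigation is worthwhile when p exceeds C_FP / (C_FP + C_FN). The sketch below assumes the model's probabilities are well calibrated, and both cost figures are hypothetical placeholders.

```python
def cost_optimal_threshold(cost_fp, cost_fn):
    """Probability above which investigating has lower expected cost."""
    return cost_fp / (cost_fp + cost_fn)


def should_investigate(p_violation, cost_fp, cost_fn):
    """Compare expected cost of skipping against expected cost of investigating."""
    expected_cost_skip = p_violation * cost_fn
    expected_cost_investigate = (1 - p_violation) * cost_fp
    return expected_cost_skip > expected_cost_investigate


# Hypothetical costs: a wasted investigation vs a missed violation
COST_FALSE_POSITIVE = 20_000
COST_FALSE_NEGATIVE = 500_000

threshold = cost_optimal_threshold(COST_FALSE_POSITIVE, COST_FALSE_NEGATIVE)
print(f"Investigate when P(violation) > {threshold:.3f}")

for p in (0.03, 0.05, 0.50):
    decision = should_investigate(p, COST_FALSE_POSITIVE, COST_FALSE_NEGATIVE)
    print(f"p={p:.2f}: investigate={decision}")
```

When a missed violation is far more costly than a wasted review, the threshold falls well below the default 0.5 used by `model.predict`. Varying the missed-violation cost by violation type yields a different threshold per alert type.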

---

## Conclusion

This lab applied market surveillance concepts to realistic detection scenarios. You implemented spoofing detection rules, calibrated thresholds to manage false positive trade-offs, used graph analytics to detect wash trading networks, and trained ML models to improve alert triage efficiency. The exercises demonstrated that effective surveillance requires not just technical detection algorithms but also operational judgment that balances detection sensitivity against investigation capacity and regulatory requirements.

Key insights:

1. Rules-based detection provides the explainability that regulatory acceptance requires, but it needs continuous maintenance and misses novel tactics.
2. Network analytics reveal collusion patterns that are invisible in individual transactions, though legitimate coordinated trading produces false positives.
3. Machine learning can substantially improve triage efficiency, detecting more violations with the same investigation resources, but regulatory explainability requirements constrain model complexity.
4. Surveillance effectiveness depends on operational context: detection algorithms must be calibrated to investigator capacity, violation costs, and regulatory expectations, not just technical metrics.

Real-world surveillance systems operate at institutional scale (billions of events daily), integrate data from multiple sources (order books, trades, communications, external intelligence), and support complex investigation workflows (case management, regulatory reporting, evidence documentation). This lab provided simplified scenarios illustrating core concepts; production deployment requires additional engineering (scalability, reliability, security), domain expertise (market microstructure, regulatory requirements), and operational processes (governance, tuning, training).

As Week 11 concludes, reflect on surveillance's role in market integrity—protecting participants from manipulation whilst imposing compliance costs. The challenge is achieving effective deterrence without excessive burden, particularly for smaller firms lacking large compliance teams. RegTech promises automation reducing costs whilst improving effectiveness, but adoption faces barriers from data sensitivity, regulatory conservatism, and integration complexity. The future surveillance landscape will increasingly incorporate AI, blockchain analytics, and cross-border coordination addressing manipulation in fragmented global markets.

**Next steps**: Week 12 synthesizes course material, discusses ethical implications of FinTech innovations, and prepares for assessments. Consider how surveillance concepts apply to assessment work—trading strategies should avoid patterns appearing manipulative; FinTech evaluations should assess regulatory compliance and identify gaps between technological capabilities and legal obligations.

## Additional Resources

**Regulatory guidance:**

- FCA Market Watch newsletters: Current enforcement actions and emerging risks
- ESMA MAR Q&A: Common questions on Market Abuse Regulation
- FINRA Surveillance Reports: US broker-dealer surveillance best practices
- SEC Division of Enforcement: Case studies of prosecuted manipulation

**Technical resources:**

- NetworkX documentation: Graph analytics in Python
- Scikit-learn: Machine learning model documentation
- SHAP: Explainable AI for model interpretability
- Production surveillance vendors: FICO, NICE Actimize, SAS (white papers and case studies)

**Academic papers:**

- Scopino (2015): "The (Questionable) Legality of High-Speed Pinging" - HFT manipulation debate
- Comerton-Forde & Putniņš (2014): "Stock Price Manipulation" - Survey of manipulation research
- Jiang et al. (2018): "Machine Learning in Market Manipulation Detection" - ML applications

**Professional development:**

- CAMS certification: Certified Anti-Money Laundering Specialist
- SIE and Series 7 exams: Securities Industry Essentials and General Securities Representative qualifications (US market knowledge)
- AML/Surveillance conferences: ACAMS, FINRA Annual Conference
