# Fat Sandwich (B91) vs Multi-Hop Arbitrage Classification

## Overview

This notebook demonstrates how to differentiate between Fat Sandwich attacks (B91 Pattern) and Multi-Hop Arbitrage (Cycle Trading) within the Solana pAMM ecosystem.

### Key Dimensions

1. **Presence of Victims**: Fat Sandwich requires 2+ wrapped victims; Multi-Hop does not
2. **Token Path Structure**: Fat Sandwich uses same pair (A→B, B→A); Multi-Hop uses cycles (A→B→C→A)
3. **Pool Routing Logic**: Fat Sandwich uses 1-2 pools; Multi-Hop uses 3+ diverse pools
4. **Timing & Triggers**: Fat Sandwich correlates with Oracle bursts; Multi-Hop with pool imbalances

---

## Section 1: Import Required Libraries and Data

Load necessary libraries and transaction data for Solana pAMM analysis.

In [None]:
import pandas as pd
import numpy as np
from collections import Counter, defaultdict
import warnings
warnings.filterwarnings('ignore')

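# Import our enhanced detection functions.
# NOTE: this path is specific to the author's machine; point it at your own
# checkout of the analysis code before running.
import sys
sys.path.insert(0, '/Users/aileen/Downloads/pamm/solana-pamm-analysis/solana-pamm-MEV-binary-monte-analysis')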

from improved_fat_sandwich_detection import (
    detect_cycle_routing,
    identify_token_structure,
    analyze_pool_diversity,
    detect_victims_in_cluster,
    classify_mev_attack,
    classify_mev_attacks_batch
)

print("✓ Libraries imported successfully")
print("✓ Enhanced detection functions loaded")

## Section 2: Cluster Transaction Parsing

Parse transaction clusters to extract key details: signer, tokens, pools, timing.

In [None]:
# Example: Load your TRADE data
# Assuming you have a parquet file or can construct a test dataframe

# For demonstration, we'll create a sample cluster with known patterns
# You would typically load: df_trades = pd.read_parquet('path/to/pamm_clean_final.parquet')
# df_trades = df_trades[df_trades['kind'] == 'TRADE']

print("Data Loading Template:")
print("""\ndf_trades = pd.read_parquet('data/pamm_clean_final.parquet')
df_trades = df_trades[df_trades['kind'] == 'TRADE']

Required columns:
  - 'signer': Address of transaction executor
  - 'from_token': Token being sold
  - 'to_token': Token being bought
  - 'ms_time': Millisecond timestamp
  - 'slot': Solana slot number
  - 'validator': Block proposer
  - 'amm_trade': Pool/AMM identifier
  - 'amount_in': Amount of from_token
  - 'amount_out': Amount of to_token
""")

# Create a demonstration cluster showing Fat Sandwich pattern.
# The attacker front-runs with WSOL→PUMP and back-runs with PUMP→WSOL,
# so the two attacker legs form the same-pair round trip (A→B, B→A).
demo_fs_cluster = pd.DataFrame({
    'signer': ['ATTACKER', 'VICTIM1', 'VICTIM2', 'VICTIM1', 'VICTIM2', 'ATTACKER'],
    'from_token': ['WSOL', 'WSOL', 'WSOL', 'PUMP', 'PUMP', 'PUMP'],
    'to_token': ['PUMP', 'PUMP', 'PUMP', 'WSOL', 'WSOL', 'WSOL'],
    'ms_time': [1000, 1050, 1100, 1150, 1200, 1250],
    'slot': [1000, 1000, 1000, 1000, 1000, 1000],
    'amm_trade': ['RAYDIUM_POOL_1', 'RAYDIUM_POOL_1', 'RAYDIUM_POOL_1', 'RAYDIUM_POOL_1', 'RAYDIUM_POOL_1', 'RAYDIUM_POOL_1'],
    'validator': ['VALIDATOR_A'] * 6
})

# Create a demonstration cluster showing Multi-Hop Arbitrage pattern
demo_mh_cluster = pd.DataFrame({
    'signer': ['BOT'] * 5,
    'from_token': ['SOL', 'TOKEN_A', 'TOKEN_B', 'TOKEN_C', 'TOKEN_A'],
    'to_token': ['TOKEN_A', 'TOKEN_B', 'TOKEN_C', 'TOKEN_A', 'SOL'],
    'ms_time': [2000, 2050, 2100, 2150, 2200],
    'slot': [2000, 2000, 2000, 2000, 2000],
    'amm_trade': ['ORCA_POOL_1', 'ORCA_POOL_2', 'MARINADE_POOL', 'ORCA_POOL_3', 'ORCA_POOL_1'],
    'validator': ['VALIDATOR_B'] * 5
})

print("\n" + "="*80)
print("DEMO CLUSTER 1: Fat Sandwich Pattern (signer order A-B-C-...-A with wrapped victims)")
print("="*80)
print(demo_fs_cluster)

print("\n" + "="*80)
print("DEMO CLUSTER 2: Multi-Hop Arbitrage Pattern (Cycle trading)")
print("="*80)
print(demo_mh_cluster)

## Section 3: Victim Detection Logic

Identify wrapped victims between attacker's front-run and back-run transactions. At least 2 victims are required for a Fat Sandwich.
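
The wrapping rule can be sketched in a few lines of plain Python. This is a minimal illustration, not the implementation of `detect_victims_in_cluster()`: the helper name `count_wrapped_victims` is hypothetical, and it assumes each trade is a dict with `signer` and `ms_time` keys (for example, the rows of `cluster_df.to_dict("records")`).

```python
def count_wrapped_victims(trades, attacker):
    """Illustrative helper (not the library function): count distinct
    non-attacker signers between the attacker's first and last trade."""
    # Order by time so "between" is well defined.
    ordered = sorted(trades, key=lambda t: t["ms_time"])
    attacker_pos = [i for i, t in enumerate(ordered) if t["signer"] == attacker]
    if len(attacker_pos) < 2:
        # Without both a front-run and a back-run nothing can be wrapped.
        return {"victim_count": 0, "victim_signers": [], "has_mandatory_victims": False}
    wrapped = ordered[attacker_pos[0] + 1 : attacker_pos[-1]]
    victims = sorted({t["signer"] for t in wrapped} - {attacker})
    return {
        "victim_count": len(victims),
        "victim_signers": victims,
        "has_mandatory_victims": len(victims) >= 2,
    }


cluster = [
    {"signer": "ATTACKER", "ms_time": 1000},
    {"signer": "VICTIM1", "ms_time": 1050},
    {"signer": "VICTIM2", "ms_time": 1100},
    {"signer": "ATTACKER", "ms_time": 1150},
]
result = count_wrapped_victims(cluster, "ATTACKER")
print(result)
```

Counting distinct signers rather than trades means one victim trading twice still counts once, which matches the "2+ victims" wording but is an interpretation on our part.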

In [None]:
def analyze_cluster_victims(cluster_df, attacker_signer):
    """
    Detailed victim analysis for a cluster.
    """
    print(f"Analyzing victims for attacker: {attacker_signer}")
    print()
    
    # Use our built-in function
    victim_info = detect_victims_in_cluster(cluster_df, attacker_signer)
    
    print(f"Victim Count: {victim_info['victim_count']}")
    print(f"Victim Signers: {victim_info['victim_signers']}")
    print(f"Victim Ratio: {victim_info['victim_ratio']:.2%}")
    print(f"Has Mandatory Victims (≥2): {victim_info['has_mandatory_victims']}")
    print()
    
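    # Show the trades wrapped between the attacker's first and last trade.
    # Sort by time and reset the index so positional comparisons stay valid
    # even when the input cluster is not already in chronological order.
    ordered = cluster_df.sort_values('ms_time').reset_index(drop=True)
    attacker_idx = ordered.index[ordered['signer'] == attacker_signer]
    if len(attacker_idx) >= 2:
        first_idx, last_idx = attacker_idx[0], attacker_idx[-1]
        # Exclude the attacker's own intermediate trades from the listing
        between_trades = ordered[
            (ordered.index > first_idx)
            & (ordered.index < last_idx)
            & (ordered['signer'] != attacker_signer)
        ]
        print("Other signers' trades between attacker's front-run and back-run:")
        print(between_trades[['signer', 'from_token', 'to_token']])
    print()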
    return victim_info

# Test on both clusters
print("="*80)
print("VICTIM ANALYSIS: Fat Sandwich Cluster")
print("="*80)
victim_fs = analyze_cluster_victims(demo_fs_cluster, 'ATTACKER')

print("\n" + "="*80)
print("VICTIM ANALYSIS: Multi-Hop Arbitrage Cluster")
print("="*80)
victim_mh = analyze_cluster_victims(demo_mh_cluster, 'BOT')

## Section 4: Token Path Structure Analysis

Analyze the sequence of from_token and to_token to distinguish between:
- **Same Pair** (Fat Sandwich): A→B, then B→A
- **Cycle** (Multi-Hop): A→B→C→A
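
As a rough sketch of this distinction, the helper below labels a signer's trades by how many unordered token pairs they touch and whether the path returns to its starting token. The name `classify_token_structure` and the labels it returns are illustrative assumptions; `identify_token_structure()` may use different rules.

```python
def classify_token_structure(trades):
    """Illustrative helper. `trades` is a time-ordered list of
    (from_token, to_token) tuples for a single signer."""
    if not trades:
        return "unknown"
    # Treat A→B and B→A as the same unordered pair.
    pairs = {frozenset(trade) for trade in trades}
    returns_to_start = trades[0][0] == trades[-1][1]
    if len(pairs) == 1:
        return "same_pair"
    if len(pairs) >= 3 and returns_to_start:
        return "cycle"
    return "other"


sandwich_legs = [("WSOL", "PUMP"), ("PUMP", "WSOL")]
cycle_legs = [("SOL", "TOKEN_A"), ("TOKEN_A", "TOKEN_B"), ("TOKEN_B", "SOL")]
print(classify_token_structure(sandwich_legs))
print(classify_token_structure(cycle_legs))
```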

In [None]:
def analyze_token_paths(cluster_df, signer):
    """
    Detailed token path analysis for a signer.
    """
    print(f"Token Path Analysis for: {signer}")
    print()
    
    # Get token structure
    token_struct = identify_token_structure(cluster_df, signer)
    
    print(f"Unique Token Pairs: {token_struct['unique_token_pairs']}")
    print(f"Token Pairs Used: {token_struct['token_pairs']}")
    print(f"Same Pair Throughout: {token_struct['is_same_pair_throughout']}")
    print(f"Pair Consistency: {token_struct['pair_consistency']:.2%}")
    print(f"Pattern Type: {token_struct['pattern_type'].upper()}")
    print()
    
    # Show the path
    signer_trades = cluster_df[cluster_df['signer'] == signer].sort_values('ms_time')
    print("Token Flow Path:")
    path = []
    for idx, trade in signer_trades.iterrows():
        if len(path) == 0:
            path.append(trade['from_token'])
        path.append(trade['to_token'])
    
    path_str = " → ".join(path)
    print(f"  {path_str}")
    
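    # Determine whether the path closes. Ending in a base asset without
    # returning to the starting token is only a partial round trip, so it
    # is reported separately rather than as a full cycle. Using `path`
    # instead of `signer_trades.iloc[-1]` also avoids an IndexError when
    # the signer has no trades.
    if len(path) > 1 and path[0] == path[-1]:
        print("  ✓ Cycle detected: Returns to starting token")
    elif path and path[-1] in ['SOL', 'USDC']:
        print("  ~ Ends in a base asset, but not the starting token")
    else:
        print("  ✗ Not a true cycle")
    print()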
    return token_struct

print("="*80)
print("TOKEN PATH ANALYSIS: Fat Sandwich")
print("="*80)
token_fs = analyze_token_paths(demo_fs_cluster, 'ATTACKER')

print("\n" + "="*80)
print("TOKEN PATH ANALYSIS: Multi-Hop Arbitrage")
print("="*80)
token_mh = analyze_token_paths(demo_mh_cluster, 'BOT')

## Section 5: Pool Routing and Signer Diversity Analysis

Count unique pools and analyze routing patterns:
- **Fat Sandwich**: 1-2 pools targeting same pair
- **Multi-Hop Arbitrage**: 3+ pools with different pairs
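
The pool-count threshold above reduces to a small rule. The sketch below is an illustration under that threshold only; `pool_routing_hint` is a hypothetical name, and `analyze_pool_diversity()` reports additional metrics that are not reproduced here.

```python
def pool_routing_hint(pools, max_sandwich_pools=2):
    """Illustrative helper: map the number of distinct pools a signer used
    to the attack type that count suggests."""
    unique = sorted(set(pools))
    if not unique:
        hint = "unknown"
    elif len(unique) <= max_sandwich_pools:
        hint = "fat_sandwich"
    else:
        hint = "multi_hop_arbitrage"
    return {"unique_pools": len(unique), "pools": unique, "hint": hint}


fs_pools = ["RAYDIUM_POOL_1", "RAYDIUM_POOL_1"]
mh_pools = ["ORCA_POOL_1", "ORCA_POOL_2", "MARINADE_POOL", "ORCA_POOL_3", "ORCA_POOL_1"]
print(pool_routing_hint(fs_pools))
print(pool_routing_hint(mh_pools))
```

Pool count alone is a weak signal, so the hint is best read together with the token path and victim checks.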

In [None]:
def analyze_pool_routing(cluster_df, signer):
    """
    Detailed pool routing analysis for a signer.
    """
    print(f"Pool Routing Analysis for: {signer}")
    print()
    
    # Get pool diversity
    pool_info = analyze_pool_diversity(cluster_df, signer)
    
    print(f"Unique Pools Used: {pool_info['unique_pools']}")
    print(f"Pools: {pool_info['pools']}")
    print(f"Avg Pools per Pair: {pool_info['avg_pools_per_pair']:.2f}")
    print(f"Pool Diversity Score: {pool_info['pool_diversity_score']:.2f}")
    print(f"Likely Attack Type: {pool_info['likely_attack_type'].upper()}")
    print()
    
    # Interpretation
    if pool_info['unique_pools'] <= 2:
        print("  ➜ Low pool count (≤2) suggests Fat Sandwich")
        print("    Focus: Extract slippage from victims within same pair")
    else:
        print(f"  ➜ High pool count ({pool_info['unique_pools']}) suggests Multi-Hop")
        print("    Focus: Route through multiple pools to exploit imbalances")
    print()
    return pool_info

print("="*80)
print("POOL ROUTING ANALYSIS: Fat Sandwich")
print("="*80)
pool_fs = analyze_pool_routing(demo_fs_cluster, 'ATTACKER')

print("\n" + "="*80)
print("POOL ROUTING ANALYSIS: Multi-Hop Arbitrage")
print("="*80)
pool_mh = analyze_pool_routing(demo_mh_cluster, 'BOT')

## Section 6: Timing and Trigger Signal Analysis

Analyze temporal patterns and correlations:
- **Fat Sandwich**: Highly correlated with Oracle bursts (99.8%)
- **Multi-Hop**: Triggered by pool imbalances rather than Oracle signals
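
One way to derive an Oracle burst flag per slot is sketched below. Several things here are assumptions rather than facts about the dataset: that Oracle updates appear as rows with `kind == 'ORACLE'` (only `'TRADE'` is shown in this notebook), that a "burst" means at least `min_updates` updates in one slot, and the helper name `oracle_burst_slots` itself.

```python
from collections import Counter


def oracle_burst_slots(events, min_updates=3):
    """Illustrative helper: return the set of slots containing at least
    `min_updates` Oracle updates. The 'ORACLE' label and the threshold
    are assumptions, not values taken from the dataset."""
    counts = Counter(e["slot"] for e in events if e["kind"] == "ORACLE")
    return {slot for slot, n in counts.items() if n >= min_updates}


events = [
    {"kind": "ORACLE", "slot": 1000},
    {"kind": "ORACLE", "slot": 1000},
    {"kind": "ORACLE", "slot": 1000},
    {"kind": "TRADE", "slot": 1000},
    {"kind": "ORACLE", "slot": 2000},
    {"kind": "TRADE", "slot": 2000},
]
burst_slots = oracle_burst_slots(events)
print(burst_slots)
```

A membership test such as `1000 in burst_slots` then yields the boolean that `classify_mev_attack()` accepts as `oracle_burst_in_slot`.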

In [None]:
def analyze_timing_triggers(cluster_df, signer):
    """
    Analyze timing patterns and potential triggers.
    """
    print(f"Timing Analysis for: {signer}")
    print()
    
    signer_trades = cluster_df[cluster_df['signer'] == signer].sort_values('ms_time')
    
    if len(signer_trades) < 2:
        print("  Insufficient trades for timing analysis")
        return None
    
    # Calculate timing metrics
    times = signer_trades['ms_time'].values
    time_diffs = np.diff(times)
    
    avg_gap = time_diffs.mean()
    max_gap = time_diffs.max()
    min_gap = time_diffs.min()
    total_span = times[-1] - times[0]
    
    print(f"Total Time Span: {total_span}ms ({total_span/1000:.3f}s)")
    print(f"Average Gap Between Trades: {avg_gap:.1f}ms")
    print(f"Max Gap: {max_gap:.1f}ms")
    print(f"Min Gap: {min_gap:.1f}ms")
    print()
    
    # Trigger analysis
    if total_span < 50:
        print("  ➜ Sub-50ms execution: Strong Fat Sandwich indicator")
        print("    Tight back-running window suggests victim exploitation")
    else:
        print(f"  ➜ {total_span}ms execution: Could be either type")
    
    if max_gap > 100:
        print(f"  ➜ Large gap detected ({max_gap}ms): May indicate multi-hop routing")
    
    # Oracle burst check (example)
    print()
    print("Oracle Burst Correlation:")
    print("  (In production, check if Oracle was updated in this slot)")
    print("  - Fat Sandwich: 99.8% follow Oracle bursts")
    print("  - Multi-Hop: ~50% follow Oracle updates")
    print()
    return {'avg_gap': avg_gap, 'total_span': total_span, 'max_gap': max_gap}

print("="*80)
print("TIMING ANALYSIS: Fat Sandwich")
print("="*80)
timing_fs = analyze_timing_triggers(demo_fs_cluster, 'ATTACKER')

print("\n" + "="*80)
print("TIMING ANALYSIS: Multi-Hop Arbitrage")
print("="*80)
timing_mh = analyze_timing_triggers(demo_mh_cluster, 'BOT')

## Section 7: Cycle Routing Detection Function

Use the imported `detect_cycle_routing()` to verify whether a signer's net token balance (excluding the starting token) is zero, which supports a Multi-Hop Arbitrage classification.
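
The net-balance idea can be illustrated with hop counts: each trade subtracts one unit from `from_token` and adds one to `to_token`. This is a simplified sketch with hypothetical helper names (`net_hop_balance`, `closes_cycle`); whether `detect_cycle_routing()` counts hops or actual amounts is not shown in this notebook.

```python
from collections import defaultdict


def net_hop_balance(trades):
    """Illustrative helper: net hop count per token for a list of
    (from_token, to_token) tuples."""
    balance = defaultdict(int)
    for from_token, to_token in trades:
        balance[from_token] -= 1
        balance[to_token] += 1
    return dict(balance)


def closes_cycle(trades):
    """True when every token except the starting one nets to zero."""
    if not trades:
        return False
    start = trades[0][0]
    balance = net_hop_balance(trades)
    return all(change == 0 for token, change in balance.items() if token != start)


cycle_legs = [("SOL", "TOKEN_A"), ("TOKEN_A", "TOKEN_B"), ("TOKEN_B", "SOL")]
open_legs = [("SOL", "TOKEN_A"), ("TOKEN_A", "TOKEN_B")]
print(net_hop_balance(cycle_legs), closes_cycle(cycle_legs))
print(net_hop_balance(open_legs), closes_cycle(open_legs))
```

Note that a same-pair round trip (A→B, B→A) also passes this check, so it should be combined with the token-pair count before concluding Multi-Hop.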

In [None]:
def detailed_cycle_analysis(cluster_df, signer):
    """
    Detailed cycle routing analysis.
    """
    print(f"Cycle Routing Analysis for: {signer}")
    print()
    
    # Use our built-in cycle detection
    cycle_info = detect_cycle_routing(cluster_df, signer)
    
    print(f"Is Cycle: {cycle_info['is_cycle']}")
    print(f"Cycle Confidence: {cycle_info['confidence']:.2%}")
    print(f"Cycle Length: {cycle_info['cycle_length']} hops")
    print()
    
    print(f"Path: {' → '.join(cycle_info['cycle_path'])}")
    print(f"Starting Token: {cycle_info['starting_token']}")
    print(f"Ending Token: {cycle_info['ending_token']}")
    print()
    
    print("Net Balance Change (by token):")
    for token, change in cycle_info['net_balance_change'].items():
        status = "✓" if change == 0 else "✗"
        print(f"  {status} {token:20s}: {change:+3d}")
    print()
    
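    # Verification: every token other than the starting token must net to
    # zero. Summing all changes would let offsetting entries cancel out.
    start = cycle_info['starting_token']
    residual = {
        token: change
        for token, change in cycle_info['net_balance_change'].items()
        if token != start and change != 0
    }
    if not residual:
        print("  ✓ Perfect cycle: Net balance is zero for all non-starting tokens")
        print("  → Strong Multi-Hop Arbitrage indicator")
    else:
        print(f"  ✗ Not a perfect cycle: Non-zero balances remain ({residual})")
        print("  → Suggests Fat Sandwich or incomplete data")
    print()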
    return cycle_info

print("="*80)
print("CYCLE DETECTION: Fat Sandwich")
print("="*80)
cycle_fs = detailed_cycle_analysis(demo_fs_cluster, 'ATTACKER')

print("\n" + "="*80)
print("CYCLE DETECTION: Multi-Hop Arbitrage")
print("="*80)
cycle_mh = detailed_cycle_analysis(demo_mh_cluster, 'BOT')

## Section 8: Summary Comparison Table Generation

Generate comprehensive comparison of both clusters based on all analysis dimensions.

In [None]:
def create_comparison_table():
    """
    Create side-by-side comparison of Fat Sandwich vs Multi-Hop characteristics.
    """
    comparison_data = {
        'Feature': [
            'Intermediate Victims',
            'Token Structure',
            'Token Pair Pattern',
            'Pool Count',
            'Pool Protocol Diversity',
            'Primary Goal',
            'Timing (ms)',
            'Oracle Burst Correlation',
            'Trigger Type',
            'Back-Run Window',
            'Net Balance Zero',
            'Cycle Path',
        ],
        'Fat Sandwich (B91)': [
            'Mandatory (≥2)',
            'Signer order A-B-C-...-A (attacker at both ends)',
            'Same pair (A→B, B→A)',
            '1-2 pools',
            'Low (same pair across different protocols)',
            'Extract victim slippage',
            '<50ms (aggressive back-run)',
            '99.8% (DeezNode style)',
            'Oracle burst signal',
            '<50ms window required',
            'No (attacker profits)',
            'Not applicable',
        ],
        'Multi-Hop Arbitrage (Cycle)': [
            'None required',
            'A→B→C→A (no victims)',
            'Different pairs in cycle',
            '3+ pools',
            'High (many protocols/pairs)',
            'Exploit pool imbalances',
            'Variable (not time-critical)',
            '~50% (optional)',
            'Pool imbalance signal',
            'Not time-dependent',
            'Yes (arbitrage closure)',
            'Returns to starting token',
        ]
    }
    
    comparison_df = pd.DataFrame(comparison_data)
    return comparison_df

comparison_table = create_comparison_table()
print("\n" + "="*120)
print("COMPREHENSIVE FEATURE COMPARISON")
print("="*120)
print(comparison_table.to_string(index=False))
print("\n")

## Integrated Classification Demo

Use the comprehensive `classify_mev_attack()` function to automatically classify both clusters.

In [None]:
# Classify Fat Sandwich cluster
print("\n" + "="*80)
print("INTEGRATED CLASSIFICATION: Fat Sandwich")
print("="*80)
print()

classification_fs = classify_mev_attack(
    demo_fs_cluster,
    'ATTACKER',
    oracle_burst_in_slot=True,  # Simulating oracle burst
    verbose=True
)

print("\n" + "="*80)
print("INTEGRATED CLASSIFICATION: Multi-Hop Arbitrage")
print("="*80)
print()

classification_mh = classify_mev_attack(
    demo_mh_cluster,
    'BOT',
    oracle_burst_in_slot=False,  # No oracle burst
    verbose=True
)

## Production Usage: Batch Classification

In production, use `classify_mev_attacks_batch()` to classify all detected attacks at once.

In [None]:
print("\n" + "="*80)
print("BATCH CLASSIFICATION WORKFLOW")
print("="*80)
print()
# Raw string so the "\n" escapes inside the template are shown literally
print(r"""
## Production Workflow:

# Step 1: Load and detect fat sandwiches
from improved_fat_sandwich_detection import detect_fat_sandwich_time_window

df_trades = pd.read_parquet('data/pamm_clean_final.parquet')
df_trades = df_trades[df_trades['kind'] == 'TRADE']

detected_attacks, stats = detect_fat_sandwich_time_window(
    df_trades,
    window_seconds=[1, 2, 5, 10],
    min_trades=5,
    max_victim_ratio=0.8,
    verbose=True
)

# Step 2: Classify all detected attacks
from improved_fat_sandwich_detection import classify_mev_attacks_batch

classified_attacks = classify_mev_attacks_batch(
    df_all_trades=df_trades,
    detected_attacks_df=detected_attacks,
    verbose=False,
    show_progress=True
)

# Step 3: Analyze results by type
fat_sandwiches = classified_attacks[classified_attacks['attack_type'] == 'fat_sandwich']
multi_hops = classified_attacks[classified_attacks['attack_type'] == 'multi_hop_arbitrage']
ambiguous = classified_attacks[classified_attacks['attack_type'] == 'ambiguous']

print(f"Fat Sandwiches: {len(fat_sandwiches)}")
print(f"Multi-Hop Arbitrage: {len(multi_hops)}")
print(f"Ambiguous: {len(ambiguous)}")

# Step 4: Further analysis
print("\nTop Fat Sandwich Attackers:")
print(fat_sandwiches['attacker_signer'].value_counts().head(10))

print("\nMulti-Hop Arbitrage Statistics:")
print(multi_hops[['unique_pools_used', 'classification_confidence']].describe())
""")

print("\n✓ Batch classification workflow ready for production")

## Key Insights and Decision Rules

### Primary Differentiator: Wrapped Victims
- **Fat Sandwich REQUIRES** at least 2 victim trades between front-run and back-run
- **Multi-Hop NEVER** wraps victims; it is pure arbitrage
- If no victims are found → Strong indicator of Multi-Hop

### Token Path Validation
- **Fat Sandwich**: Same token pair throughout (e.g., PUMP/WSOL only)
- **Multi-Hop**: Cycle of different pairs (e.g., SOL→TokenA→TokenB→SOL)
- Check that `to_token[N] == from_token[N+1]` holds for each pair of consecutive trades by the same signer (each hop feeds the next)
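
The continuity check in the last bullet can be sketched as follows. `first_chain_break` is a hypothetical helper name, and the input is assumed to be a time-ordered list of `(from_token, to_token)` tuples for one signer.

```python
def first_chain_break(trades):
    """Illustrative helper: return the index N of the first hop where
    to_token[N] != from_token[N+1], or None if every hop feeds the next."""
    for n, (current, following) in enumerate(zip(trades, trades[1:])):
        if current[1] != following[0]:
            return n
    return None


chained = [("SOL", "TOKEN_A"), ("TOKEN_A", "TOKEN_B"), ("TOKEN_B", "SOL")]
broken = [("SOL", "TOKEN_A"), ("TOKEN_B", "SOL")]
print(first_chain_break(chained))
print(first_chain_break(broken))
```

Returning the index of the break, rather than a plain boolean, makes it easier to see where trades are missing from a cluster.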

### Pool Diversity Heuristic
- **Fat Sandwich**: 1-2 pools (focused attack on single pair)
- **Multi-Hop**: 3+ pools (distributed routing across protocols)
- If unique_pools ≥ 3 AND token_pairs ≥ 3 → Likely Multi-Hop

### Timing and Trigger Signals
- **Fat Sandwich**: 99.8% correlation with Oracle bursts (DeezNode style)
- **Multi-Hop**: Triggered by pool imbalances, not time-critical
- Check if Oracle was updated in same slot

### Net Balance Check (Ultimate Validator)
- If the signer's net balance is zero for every non-starting token → Strong Multi-Hop indicator
- If the attacker retains a positive balance in other tokens → Fat Sandwich (extracted profit)

---

## Classification Confidence Model

**Fat Sandwich Score** (0-1):
- Wrapped victims present: +0.35
- Same token pair: +0.25
- Low pool diversity: +0.20
- Oracle burst detected: +0.20

**Multi-Hop Score** (0-1):
- Cycle routing pattern: +0.35
- Multiple token pairs: +0.25
- High pool diversity: +0.20
- No wrapped victims: +0.20

**Final Classification**:
- If |FS_score - MH_score| > 0.15 → Confident classification
- Otherwise → Ambiguous (manual review recommended)
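
The scoring model above can be written out directly. The sketch below uses the listed weights and the 0.15 margin; the signal names, the `classify` helper, and the output layout are illustrative assumptions and may not match how `classify_mev_attack()` is implemented.

```python
FS_WEIGHTS = {
    "wrapped_victims": 0.35,
    "same_pair": 0.25,
    "low_pool_diversity": 0.20,
    "oracle_burst": 0.20,
}
MH_WEIGHTS = {
    "cycle_routing": 0.35,
    "multiple_pairs": 0.25,
    "high_pool_diversity": 0.20,
    "no_wrapped_victims": 0.20,
}


def score(signals, weights):
    """Sum the weights of the signals that are present (truthy)."""
    return sum(w for name, w in weights.items() if signals.get(name, False))


def classify(signals, margin=0.15):
    """Compare both scores; call it ambiguous when they are within `margin`."""
    fs = score(signals, FS_WEIGHTS)
    mh = score(signals, MH_WEIGHTS)
    if abs(fs - mh) > margin:
        label = "fat_sandwich" if fs > mh else "multi_hop_arbitrage"
    else:
        label = "ambiguous"
    return {"attack_type": label, "fs_score": round(fs, 2), "mh_score": round(mh, 2)}


clear_sandwich = {"wrapped_victims": True, "same_pair": True,
                  "low_pool_diversity": True, "oracle_burst": True}
mixed = {"wrapped_victims": True, "cycle_routing": True}
print(classify(clear_sandwich))
print(classify(mixed))
```

In the mixed case both scores are 0.35, so the difference is below the margin and the cluster is flagged for manual review.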

## Summary

This notebook demonstrates the complete framework for differentiating Fat Sandwich attacks (B91 Pattern) from Multi-Hop Arbitrage (Cycle Trading) in Solana pAMM.

### Methods Implemented:
1. ✓ **Victim Detection**: Identify wrapped victims between front-run and back-run
2. ✓ **Token Path Analysis**: Distinguish same-pair vs cyclic routing
3. ✓ **Pool Routing Analysis**: Measure pool diversity and protocol distribution
4. ✓ **Cycle Detection**: Validate net balance = 0 for arbitrage
5. ✓ **Timing Analysis**: Correlate with Oracle bursts and pool imbalances
6. ✓ **Classification Scoring**: Weighted multi-factor confidence model
7. ✓ **Batch Processing**: Scale to full attack datasets

### Use Cases:
- **Regulatory**: Identify sandwich attacks targeting retail users
- **Risk Analysis**: Understand arbitrage-induced pool volatility
- **MEV Quantification**: Separate victim extraction vs pure arbitrage profits
- **Protocol Optimization**: Design defenses against specific attack types