# Microstructure Mean Reversion Analysis

**Objective**: Identify and exploit mean reversion patterns in meme coin microstructure for short-term trading alpha.

**Key Concepts**:
- **Price Impact Decay**: How quickly prices revert after large trades
- **Order Flow Mean Reversion**: Imbalances that correct themselves
- **Volume Intensity Reversion**: Trading activity returning to baseline
- **Bid-Ask Spread Dynamics**: Market making opportunities
- **Tick-by-Tick Reversals**: Sub-second trading opportunities

**Trading Strategy Applications**:
- **Contrarian Entry**: Buy after price drops, sell after price spikes
- **Market Making**: Provide liquidity during temporary imbalances
- **Scalping**: Exploit very short-term reversals
- **Optimal Execution**: Time large orders to minimize impact

**Data Advantages**:
- High-frequency transaction data with microsecond precision
- Implicit price calculation from SOL/token ratios
- Block-level transaction ordering
- Order flow imbalance measures
- Large trade identification


In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from pathlib import Path
from datetime import datetime, timedelta
import warnings
from scipy import stats
from scipy.signal import find_peaks
from statsmodels.tsa.stattools import adfuller
from statsmodels.regression.rolling import RollingOLS
import warnings

warnings.filterwarnings('ignore')

# Set up plotting
plt.style.use('default')
sns.set_palette("husl")
plt.rcParams['figure.figsize'] = (15, 10)
%matplotlib inline

print("=== MICROSTRUCTURE MEAN REVERSION ANALYSIS ===")
print("Objective: Identify and exploit mean reversion patterns for short-term alpha")
print("Approach: Price impact → Order flow reversion → Volume normalization → Trading signals")
print()

# Constants
SOL_MINT = 'So11111111111111111111111111111111111111112'
DATA_PATH = Path('../data/solana/first_day_trades/first_day_trades_batch_578.csv')

# Load data with enhanced processing
print("Loading data with microstructure enhancements...")
df = pd.read_csv(DATA_PATH)
df['block_timestamp'] = pd.to_datetime(df['block_timestamp'])

# Recreate coin mapping and enhanced indicators
unique_mints = df['mint'].unique()
coin_names = {mint: f"Coin_{i}" for i, mint in enumerate(unique_mints, 1)}
df['coin_name'] = df['mint'].map(coin_names)

# Add trading direction and amounts
df['is_buy'] = df['mint'] == df['swap_to_mint']
df['is_sell'] = df['mint'] == df['swap_from_mint']
df['sol_amount'] = 0.0
df['token_amount'] = 0.0

buy_mask = df['is_buy'] & (df['swap_from_mint'] == SOL_MINT)
sell_mask = df['is_sell'] & (df['swap_to_mint'] == SOL_MINT)
df.loc[buy_mask, 'sol_amount'] = df.loc[buy_mask, 'swap_from_amount']
df.loc[buy_mask, 'token_amount'] = df.loc[buy_mask, 'swap_to_amount']
df.loc[sell_mask, 'sol_amount'] = df.loc[sell_mask, 'swap_to_amount']
df.loc[sell_mask, 'token_amount'] = df.loc[sell_mask, 'swap_from_amount']

# Calculate implicit prices
df['implicit_price'] = np.where(
    (df['token_amount'] > 0) & (df['sol_amount'] > 0),
    df['sol_amount'] / df['token_amount'],
    np.nan
)

# Add transaction size categories
df['txn_size_category'] = 'Unknown'
valid_sol = df['sol_amount'] > 0
df.loc[valid_sol & (df['sol_amount'] >= 100), 'txn_size_category'] = 'Whale'
df.loc[valid_sol & (df['sol_amount'] >= 10) & (df['sol_amount'] < 100), 'txn_size_category'] = 'Big'
df.loc[valid_sol & (df['sol_amount'] >= 1) & (df['sol_amount'] < 10), 'txn_size_category'] = 'Medium'
df.loc[valid_sol & (df['sol_amount'] > 0) & (df['sol_amount'] < 1), 'txn_size_category'] = 'Small'

print(f"Data loaded: {len(df):,} transactions across {len(unique_mints)} coins")
print(f"Valid prices: {df['implicit_price'].notna().sum():,} ({df['implicit_price'].notna().mean():.1%})")


=== MICROSTRUCTURE MEAN REVERSION ANALYSIS ===
Objective: Identify and exploit mean reversion patterns for short-term alpha
Approach: Price impact → Order flow reversion → Volume normalization → Trading signals

Loading data with microstructure enhancements...
Data loaded: 1,030,491 transactions across 10 coins
Valid prices: 1,030,491 (100.0%)


In [3]:
# 1. PRICE IMPACT AND REVERSION ANALYSIS
print("\n=== 1. PRICE IMPACT AND REVERSION ANALYSIS ===")

def analyze_price_impact_decay(coin_data, impact_thresholds=[0.01, 0.02, 0.05], 
                              reversion_windows=[10, 30, 60, 120, 300]):
    """
    Analyze how quickly prices revert after significant moves
    
    Args:
        coin_data: DataFrame with coin transactions
        impact_thresholds: Price change thresholds to identify impacts
        reversion_windows: Time windows (seconds) to measure reversion
    
    Returns:
        DataFrame with impact events and reversion analysis
    """
    
    coin_data = coin_data.sort_values('block_timestamp').copy()
    coin_data = coin_data[coin_data['implicit_price'].notna()].copy()
    
    if len(coin_data) < 100:
        return None
    
    # Calculate price changes
    coin_data['price_change'] = coin_data['implicit_price'].pct_change()
    coin_data['abs_price_change'] = coin_data['price_change'].abs()
    
    # Calculate rolling statistics for mean reversion baseline
    coin_data['price_ma_10'] = coin_data['implicit_price'].rolling(window=10, min_periods=1).mean()
    coin_data['price_ma_30'] = coin_data['implicit_price'].rolling(window=30, min_periods=1).mean()
    coin_data['price_volatility'] = coin_data['price_change'].rolling(window=20, min_periods=1).std()
    
    impact_events = []
    
    for threshold in impact_thresholds:
        # Identify significant price impact events
        impact_mask = coin_data['abs_price_change'] >= threshold
        impact_indices = coin_data[impact_mask].index
        
        print(f"  Threshold {threshold:.1%}: {len(impact_indices)} impact events")
        
        for idx in impact_indices:
            if idx not in coin_data.index:
                continue
                
            impact_row = coin_data.loc[idx]
            impact_time = impact_row['block_timestamp']
            impact_price = impact_row['implicit_price']
            impact_direction = 'up' if impact_row['price_change'] > 0 else 'down'
            
            # Measure reversion over different time windows
            reversion_data = {
                'impact_index': idx,
                'impact_time': impact_time,
                'impact_price': impact_price,
                'impact_magnitude': impact_row['price_change'],
                'impact_direction': impact_direction,
                'threshold': threshold,
                'volume': impact_row['sol_amount'],
                'is_buy': impact_row['is_buy'],
                'txn_size_category': impact_row['txn_size_category']
            }
            
            # Measure reversion over time windows
            for window_seconds in reversion_windows:
                future_time = impact_time + pd.Timedelta(seconds=window_seconds)
                
                # Find transactions in the reversion window
                future_mask = (coin_data['block_timestamp'] >= impact_time) & \
                             (coin_data['block_timestamp'] <= future_time)
                future_data = coin_data[future_mask]
                
                if len(future_data) == 0:
                    continue
                
                # Calculate reversion metrics
                end_price = future_data['implicit_price'].iloc[-1]
                max_price = future_data['implicit_price'].max()
                min_price = future_data['implicit_price'].min()
                
                price_reversion = (end_price - impact_price) / impact_price
                max_favorable_move = (max_price - impact_price) / impact_price if impact_direction == 'down' else (impact_price - min_price) / impact_price
                
                # Mean reversion strength
                if impact_direction == 'up':
                    mean_reversion_strength = (impact_price - end_price) / impact_price
                else:
                    mean_reversion_strength = (end_price - impact_price) / impact_price
                
                reversion_data[f'reversion_{window_seconds}s'] = price_reversion
                reversion_data[f'max_favorable_{window_seconds}s'] = max_favorable_move
                reversion_data[f'mean_reversion_{window_seconds}s'] = mean_reversion_strength
                reversion_data[f'transactions_{window_seconds}s'] = len(future_data)
                reversion_data[f'volume_{window_seconds}s'] = future_data['sol_amount'].sum()
            
            impact_events.append(reversion_data)
    
    return pd.DataFrame(impact_events)

# Analyze price impact for all coins
print("Analyzing price impact and reversion patterns...")

impact_results = {}
for coin_name in sorted(df['coin_name'].unique()):  # Analyze all coins
    coin_data = df[df['coin_name'] == coin_name].copy()
    
    if len(coin_data) < 500:  # Lower threshold to include more coins
        print(f"  Skipping {coin_name}: only {len(coin_data)} transactions")
        continue
        
    print(f"\nAnalyzing {coin_name}...")
    impact_analysis = analyze_price_impact_decay(coin_data)
    
    if impact_analysis is not None and len(impact_analysis) > 0:
        impact_results[coin_name] = impact_analysis
        
        # Quick summary
        print(f"  Total impact events: {len(impact_analysis)}")
        print(f"  Mean reversion 60s: {impact_analysis['mean_reversion_60s'].mean():.3f}")
        print(f"  Mean reversion 300s: {impact_analysis['mean_reversion_300s'].mean():.3f}")

# Combine results for cross-coin analysis
if impact_results:
    all_impacts = pd.concat([df.assign(coin=coin) for coin, df in impact_results.items()], 
                           ignore_index=True)
    
    print(f"\n=== COMBINED IMPACT ANALYSIS ===")
    print(f"Total impact events across coins: {len(all_impacts)}")
    
    # Mean reversion statistics
    reversion_cols = [col for col in all_impacts.columns if 'mean_reversion_' in col]
    print(f"\nMean reversion strength by time window:")
    for col in reversion_cols:
        window = col.split('_')[-1]
        mean_reversion = all_impacts[col].mean()
        print(f"  {window}: {mean_reversion:.3f}")



=== 1. PRICE IMPACT AND REVERSION ANALYSIS ===
Analyzing price impact and reversion patterns...

Analyzing Coin_1...
  Threshold 1.0%: 1719 impact events
  Threshold 2.0%: 1219 impact events
  Threshold 5.0%: 218 impact events
  Total impact events: 3156
  Mean reversion 60s: 0.015
  Mean reversion 300s: 0.011

Analyzing Coin_10...
  Threshold 1.0%: 17699 impact events
  Threshold 2.0%: 7963 impact events
  Threshold 5.0%: 1969 impact events
  Total impact events: 27631
  Mean reversion 60s: 0.237
  Mean reversion 300s: 0.242

Analyzing Coin_2...
  Threshold 1.0%: 47198 impact events
  Threshold 2.0%: 21149 impact events
  Threshold 5.0%: 2614 impact events
  Total impact events: 70961
  Mean reversion 60s: 0.006
  Mean reversion 300s: 0.008

Analyzing Coin_3...
  Threshold 1.0%: 4453 impact events
  Threshold 2.0%: 1695 impact events
  Threshold 5.0%: 268 impact events
  Total impact events: 6416
  Mean reversion 60s: 0.011
  Mean reversion 300s: 0.012

Analyzing Coin_4...
  Threshol

In [4]:
# 2. MEAN REVERSION TRADING SIGNALS
print("\n=== 2. MEAN REVERSION TRADING SIGNALS ===")

class MeanReversionTradingSystem:
    """
    Production-ready mean reversion trading system
    """
    
    def __init__(self, price_threshold=0.02, flow_threshold=0.7, 
                 volume_percentile=90, min_signal_strength=1.0):
        self.price_threshold = price_threshold
        self.flow_threshold = flow_threshold
        self.volume_percentile = volume_percentile
        self.min_signal_strength = min_signal_strength
        self.sol_mint = 'So11111111111111111111111111111111111111112'
        
    def prepare_coin_data(self, raw_data):
        """Prepare coin data with all required indicators"""
        df = raw_data.copy()
        df['block_timestamp'] = pd.to_datetime(df['block_timestamp'])
        df = df.sort_values('block_timestamp')
        
        # Add trading indicators
        df['is_buy'] = df['mint'] == df['swap_to_mint']
        df['is_sell'] = df['mint'] == df['swap_from_mint']
        df['sol_amount'] = 0.0
        df['token_amount'] = 0.0
        
        buy_mask = df['is_buy'] & (df['swap_from_mint'] == self.sol_mint)
        sell_mask = df['is_sell'] & (df['swap_to_mint'] == self.sol_mint)
        df.loc[buy_mask, 'sol_amount'] = df.loc[buy_mask, 'swap_from_amount']
        df.loc[buy_mask, 'token_amount'] = df.loc[buy_mask, 'swap_to_amount']
        df.loc[sell_mask, 'sol_amount'] = df.loc[sell_mask, 'swap_to_amount']
        df.loc[sell_mask, 'token_amount'] = df.loc[sell_mask, 'swap_from_amount']
        
        # Calculate implicit prices
        df['implicit_price'] = np.where(
            (df['token_amount'] > 0) & (df['sol_amount'] > 0),
            df['sol_amount'] / df['token_amount'],
            np.nan
        )
        
        # Add transaction sizes
        df['txn_size_category'] = 'Small'
        df.loc[df['sol_amount'] >= 100, 'txn_size_category'] = 'Whale'
        df.loc[(df['sol_amount'] >= 10) & (df['sol_amount'] < 100), 'txn_size_category'] = 'Big'
        df.loc[(df['sol_amount'] >= 1) & (df['sol_amount'] < 10), 'txn_size_category'] = 'Medium'
        
        # Calculate technical indicators
        df['price_change'] = df['implicit_price'].pct_change()
        df['price_ma_20'] = df['implicit_price'].rolling(window=20, min_periods=1).mean()
        df['price_deviation'] = (df['implicit_price'] - df['price_ma_20']) / df['price_ma_20']
        df['volume_intensity'] = df['sol_amount'].rolling(window=20, min_periods=1).sum() / 20
        
        return df
    
    def detect_mean_reversion_opportunity(self, coin_data, current_index):
        """
        Detect mean reversion trading opportunity at current index
        
        Returns:
            dict: {'signal': 'BUY'/'SELL'/None, 'strength': float, 'reasons': list}
        """
        
        if current_index < 50:  # Need sufficient history
            return {'signal': None, 'strength': 0, 'reasons': []}
        
        current_row = coin_data.iloc[current_index]
        
        signal_type = None
        signal_strength = 0
        signal_reasons = []
        
        # 1. Price deviation signal
        if abs(current_row['price_deviation']) >= self.price_threshold:
            if current_row['price_deviation'] > 0:
                signal_type = 'SELL'
                signal_reasons.append(f"Price {current_row['price_deviation']:.2%} above MA")
            else:
                signal_type = 'BUY'
                signal_reasons.append(f"Price {abs(current_row['price_deviation']):.2%} below MA")
            signal_strength += abs(current_row['price_deviation']) * 10
        
        # 2. Order flow imbalance signal
        recent_data = coin_data.iloc[max(0, current_index-30):current_index+1]
        if len(recent_data) >= 10:
            buy_ratio = recent_data['is_buy'].mean()
            
            if buy_ratio >= self.flow_threshold:
                if signal_type == 'SELL' or signal_type is None:
                    if signal_type is None:
                        signal_type = 'SELL'
                    signal_strength += 1
                    signal_reasons.append(f"Extreme buying: {buy_ratio:.1%}")
            elif buy_ratio <= (1 - self.flow_threshold):
                if signal_type == 'BUY' or signal_type is None:
                    if signal_type is None:
                        signal_type = 'BUY'
                    signal_strength += 1
                    signal_reasons.append(f"Extreme selling: {buy_ratio:.1%}")
        
        # 3. Volume intensity boost
        volume_threshold = coin_data['volume_intensity'].quantile(self.volume_percentile/100)
        if current_row['volume_intensity'] >= volume_threshold:
            signal_strength += 0.5
            signal_reasons.append("High volume intensity")
        
        # 4. Large trade boost
        if current_row['txn_size_category'] in ['Big', 'Whale']:
            signal_strength += 0.5
            signal_reasons.append(f"Large trade: {current_row['txn_size_category']}")
        
        # Only return signal if strength threshold met
        if signal_strength >= self.min_signal_strength and signal_type is not None:
            return {
                'signal': signal_type,
                'strength': signal_strength,
                'reasons': signal_reasons,
                'entry_price': current_row['implicit_price'],
                'timestamp': current_row['block_timestamp']
            }
        else:
            return {'signal': None, 'strength': signal_strength, 'reasons': signal_reasons}
    
    def backtest_strategy(self, coin_data, holding_period=60):
        """
        Backtest the mean reversion strategy
        """
        
        coin_data = self.prepare_coin_data(coin_data)
        coin_data = coin_data[coin_data['implicit_price'].notna()].copy()
        
        if len(coin_data) < 100:
            return None
        
        trades = []
        
        for i in range(50, len(coin_data) - holding_period):
            signal = self.detect_mean_reversion_opportunity(coin_data, i)
            
            if signal['signal'] is not None:
                entry_row = coin_data.iloc[i]
                entry_price = signal['entry_price']
                entry_time = signal['timestamp']
                
                # Find exit price after holding period
                exit_time = entry_time + pd.Timedelta(seconds=holding_period)
                exit_mask = coin_data['block_timestamp'] >= exit_time
                exit_candidates = coin_data[exit_mask]
                
                if len(exit_candidates) > 0:
                    exit_price = exit_candidates['implicit_price'].iloc[0]
                    
                    # Calculate return
                    if signal['signal'] == 'BUY':
                        trade_return = (exit_price - entry_price) / entry_price
                    else:  # SELL
                        trade_return = (entry_price - exit_price) / entry_price
                    
                    trade_data = {
                        'entry_time': entry_time,
                        'exit_time': exit_candidates['block_timestamp'].iloc[0],
                        'signal_type': signal['signal'],
                        'signal_strength': signal['strength'],
                        'entry_price': entry_price,
                        'exit_price': exit_price,
                        'return': trade_return,
                        'holding_period': holding_period,
                        'reasons': '; '.join(signal['reasons'])
                    }
                    
                    trades.append(trade_data)
        
        return pd.DataFrame(trades)

# Test the production system
print("Testing production mean reversion system...")

trading_system = MeanReversionTradingSystem()

backtest_results = {}
for coin_name in sorted(df['coin_name'].unique()):  # Analyze all coins
    coin_data = df[df['coin_name'] == coin_name].copy()
    
    if len(coin_data) < 500:  # Lower threshold to include more coins
        print(f"  Skipping {coin_name}: only {len(coin_data)} transactions")
        continue
        
    print(f"\nBacktesting {coin_name}...")
    trades = trading_system.backtest_strategy(coin_data)
    
    if trades is not None and len(trades) > 0:
        backtest_results[coin_name] = trades
        
        # Performance metrics
        total_trades = len(trades)
        winning_trades = (trades['return'] > 0).sum()
        win_rate = winning_trades / total_trades
        avg_return = trades['return'].mean()
        total_return = trades['return'].sum()
        sharpe_ratio = trades['return'].mean() / trades['return'].std() if trades['return'].std() > 0 else 0
        
        print(f"  Total trades: {total_trades}")
        print(f"  Win rate: {win_rate:.1%}")
        print(f"  Average return per trade: {avg_return:.3f}")
        print(f"  Total return: {total_return:.3f}")
        print(f"  Sharpe ratio: {sharpe_ratio:.2f}")

# Combined backtest results
if backtest_results:
    all_trades = pd.concat([df.assign(coin=coin) for coin, df in backtest_results.items()], 
                          ignore_index=True)
    
    print(f"\n=== COMBINED BACKTEST RESULTS ===")
    print(f"Total trades across coins: {len(all_trades)}")
    print(f"Overall win rate: {(all_trades['return'] > 0).mean():.1%}")
    print(f"Overall average return: {all_trades['return'].mean():.3f}")
    print(f"Overall Sharpe ratio: {all_trades['return'].mean() / all_trades['return'].std():.2f}")
    
    # Performance by signal type
    signal_performance = all_trades.groupby('signal_type')['return'].agg(['count', 'mean', 'std'])
    print(f"\nPerformance by signal type:")
    print(signal_performance)



=== 2. MEAN REVERSION TRADING SIGNALS ===
Testing production mean reversion system...

Backtesting Coin_1...
  Total trades: 376
  Win rate: 60.1%
  Average return per trade: -0.026
  Total return: -9.694
  Sharpe ratio: -0.14

Backtesting Coin_10...
  Total trades: 28975
  Win rate: 49.2%
  Average return per trade: -0.009
  Total return: -247.972
  Sharpe ratio: -0.05

Backtesting Coin_2...
  Total trades: 5361
  Win rate: 47.7%
  Average return per trade: -0.080
  Total return: -429.683
  Sharpe ratio: -0.17

Backtesting Coin_3...
  Total trades: 5282
  Win rate: 41.6%
  Average return per trade: -0.012
  Total return: -65.256
  Sharpe ratio: -0.16

Backtesting Coin_4...
  Total trades: 59
  Win rate: 76.3%
  Average return per trade: 0.202
  Total return: 11.902
  Sharpe ratio: 0.92

Backtesting Coin_5...
  Total trades: 31838
  Win rate: 20.1%
  Average return per trade: -1.029
  Total return: -32753.434
  Sharpe ratio: -0.30

Backtesting Coin_6...
  Total trades: 1620
  Win rate

In [5]:
# ============================================================================
# ENHANCED STRATEGY DEVELOPMENT BASED ON RESULTS
# ============================================================================

print("=" * 80)
print("DEVELOPING ENHANCED STRATEGIES BASED ON ANALYSIS RESULTS")
print("=" * 80)

# Strategy 1: Two-Phase Momentum-Then-Reversion Strategy
def two_phase_strategy(df, momentum_window=120, reversion_window=300):
    """
    Phase 1: Ride momentum for 120s
    Phase 2: Catch reversion for 300s
    """
    signals = []
    
    for idx, row in df.iterrows():
        if row['signal_type'] == 'NONE':
            continue
            
        # Phase 1: Momentum trade (opposite to traditional mean reversion)
        if row['signal_type'] == 'BUY':
            # Expect price to continue UP in short term
            momentum_direction = 'LONG'
        else:  # SELL
            # Expect price to continue DOWN in short term  
            momentum_direction = 'SHORT'
            
        # Phase 2: Reversion trade (traditional mean reversion)
        if row['signal_type'] == 'BUY':
            # After momentum, expect reversion DOWN
            reversion_direction = 'SHORT'
        else:  # SELL
            # After momentum, expect reversion UP
            reversion_direction = 'LONG'
            
        signals.append({
            'timestamp': row['timestamp'],
            'coin_id': row['coin_id'],
            'signal_type': row['signal_type'],
            'momentum_direction': momentum_direction,
            'reversion_direction': reversion_direction,
            'price_impact': row['price_impact'],
            'volume_intensity': row['volume_intensity']
        })
    
    return pd.DataFrame(signals)

# Strategy 2: BUY-Only Strategy (since SELL signals are terrible)
def buy_only_strategy(df, hold_period=300):
    """Only take BUY signals, ignore SELL signals completely"""
    buy_signals = df[df['signal_type'] == 'BUY'].copy()
    print(f"BUY-Only Strategy: {len(buy_signals)} signals (vs {len(df[df['signal_type'] != 'NONE'])} total)")
    return buy_signals

# Strategy 3: Extended Hold Strategy (300s instead of 60s)
def extended_hold_strategy(df, extended_hold=300):
    """Hold positions for 300s to catch the real reversion"""
    return df[df['signal_type'] != 'NONE'].copy()

# Strategy 4: Coin-Specific Strategy
def coin_specific_strategy(df):
    """Use different thresholds per coin based on performance"""
    
    # Based on results: Coin_4 is great, Coin_5 is terrible
    coin_performance = {
        'Coin_1': {'price_thresh': 0.05, 'volume_thresh': 2.0, 'trade_sell': True},
        'Coin_2': {'price_thresh': 0.05, 'volume_thresh': 2.0, 'trade_sell': True}, 
        'Coin_3': {'price_thresh': 0.05, 'volume_thresh': 2.0, 'trade_sell': True},
        'Coin_4': {'price_thresh': 0.03, 'volume_thresh': 1.5, 'trade_sell': True},  # More aggressive (good performer)
        'Coin_5': {'price_thresh': 0.10, 'volume_thresh': 3.0, 'trade_sell': False}, # Much more conservative (bad performer)
        'Coin_6': {'price_thresh': 0.05, 'volume_thresh': 2.0, 'trade_sell': True},
        'Coin_7': {'price_thresh': 0.05, 'volume_thresh': 2.0, 'trade_sell': True},
        'Coin_8': {'price_thresh': 0.05, 'volume_thresh': 2.0, 'trade_sell': True},
        'Coin_9': {'price_thresh': 0.05, 'volume_thresh': 2.0, 'trade_sell': True},
        'Coin_10': {'price_thresh': 0.05, 'volume_thresh': 2.0, 'trade_sell': True}
    }
    
    custom_signals = []
    
    for coin_id in df['coin_id'].unique():
        if coin_id not in coin_performance:
            continue
            
        coin_data = df[df['coin_id'] == coin_id].copy()
        params = coin_performance[coin_id]
        
        # Apply custom thresholds
        for idx, row in coin_data.iterrows():
            price_signal = abs(row['price_impact']) > params['price_thresh']
            volume_signal = row['volume_intensity'] > params['volume_thresh']
            
            if price_signal and volume_signal:
                if row['price_impact'] > 0:  # Price went up significantly
                    signal_type = 'SELL'  # Expect reversion down
                elif row['price_impact'] < 0:  # Price went down significantly  
                    signal_type = 'BUY'   # Expect reversion up
                else:
                    signal_type = 'NONE'
                    
                # Skip SELL signals for bad performers
                if signal_type == 'SELL' and not params['trade_sell']:
                    signal_type = 'NONE'
                    
                if signal_type != 'NONE':
                    custom_signals.append({
                        'timestamp': row['timestamp'],
                        'coin_id': coin_id,
                        'signal_type': signal_type,
                        'price_impact': row['price_impact'],
                        'volume_intensity': row['volume_intensity']
                    })
    
    return pd.DataFrame(custom_signals)

# Strategy 5: Adaptive Threshold Strategy
def adaptive_threshold_strategy(df, percentile=90):
    """Use dynamic thresholds based on market conditions"""
    
    adaptive_signals = []
    
    for coin_id in df['coin_id'].unique():
        coin_data = df[df['coin_id'] == coin_id].copy()
        
        # Calculate dynamic thresholds
        price_threshold = np.percentile(np.abs(coin_data['price_impact']), percentile)
        volume_threshold = np.percentile(coin_data['volume_intensity'], percentile)
        
        print(f"{coin_id}: Dynamic thresholds - Price: {price_threshold:.3f}, Volume: {volume_threshold:.2f}")
        
        for idx, row in coin_data.iterrows():
            if abs(row['price_impact']) > price_threshold and row['volume_intensity'] > volume_threshold:
                if row['price_impact'] > 0:
                    signal_type = 'SELL'
                else:
                    signal_type = 'BUY'
                    
                adaptive_signals.append({
                    'timestamp': row['timestamp'],
                    'coin_id': coin_id,
                    'signal_type': signal_type,
                    'price_impact': row['price_impact'],
                    'volume_intensity': row['volume_intensity']
                })
    
    return pd.DataFrame(adaptive_signals)


DEVELOPING ENHANCED STRATEGIES BASED ON ANALYSIS RESULTS


In [6]:
# ============================================================================
# STRATEGY BACKTESTING FRAMEWORK
# ============================================================================

def backtest_strategy_extended(strategy_signals, price_impacts_df, hold_period=300, strategy_name="Strategy"):
    """Enhanced backtesting with multiple hold periods and metrics"""
    
    if len(strategy_signals) == 0:
        print(f"No signals for {strategy_name}")
        return None
        
    results = []
    
    for hold_time in [60, 120, 180, 300, 600]:  # Test multiple hold periods
        trades = []
        
        for idx, signal in strategy_signals.iterrows():
            entry_time = signal['timestamp']
            exit_time = entry_time + pd.Timedelta(seconds=hold_time)
            
            # Find actual returns
            coin_data = price_impacts_df[price_impacts_df['coin_id'] == signal['coin_id']]
            future_data = coin_data[
                (coin_data['timestamp'] > entry_time) & 
                (coin_data['timestamp'] <= exit_time)
            ]
            
            if len(future_data) > 0:
                if signal['signal_type'] == 'BUY':
                    actual_return = future_data['price_impact'].sum()  # Expect price to go UP
                else:  # SELL
                    actual_return = -future_data['price_impact'].sum()  # Expect price to go DOWN
                    
                trades.append({
                    'signal_type': signal['signal_type'],
                    'predicted_direction': signal['signal_type'],
                    'actual_return': actual_return,
                    'coin_id': signal['coin_id'],
                    'hold_period': hold_time
                })
        
        if trades:
            trades_df = pd.DataFrame(trades)
            
            # Calculate metrics
            win_rate = (trades_df['actual_return'] > 0).mean()
            avg_return = trades_df['actual_return'].mean()
            sharpe = avg_return / trades_df['actual_return'].std() if trades_df['actual_return'].std() > 0 else 0
            max_return = trades_df['actual_return'].max()
            min_return = trades_df['actual_return'].min()
            
            results.append({
                'strategy': strategy_name,
                'hold_period': hold_time,
                'num_trades': len(trades),
                'win_rate': win_rate,
                'avg_return': avg_return,
                'sharpe_ratio': sharpe,
                'max_return': max_return,
                'min_return': min_return
            })
    
    return pd.DataFrame(results)


In [8]:
# ============================================================================
# CREATE SIGNALS FROM PRICE IMPACT ANALYSIS
# ============================================================================

# First, let's create the signals we need from the existing data
def create_impact_signals_from_analysis(all_impacts):
    """Convert price impact analysis into trading signals"""
    signals = []
    
    for idx, row in all_impacts.iterrows():
        # Generate signals based on price impact magnitude and volume
        price_impact = abs(row['impact_magnitude'])
        volume = row.get('volume', 0)
        
        # Calculate volume intensity (relative to other transactions)
        # Use sol_amount if available, otherwise estimate
        volume_intensity = volume / 10 if volume > 0 else 1.0  # Simple baseline
        
        # Generate signal based on impact direction and magnitude
        if price_impact > 0.05 and volume_intensity > 2.0:  # Significant impact
            if row['impact_direction'] == 'up':
                signal_type = 'SELL'  # Expect reversion down
            else:
                signal_type = 'BUY'   # Expect reversion up
                
            signals.append({
                'timestamp': row['impact_time'],
                'coin_id': row.get('coin', 'Unknown'),
                'signal_type': signal_type,
                'price_impact': row['impact_magnitude'],
                'volume_intensity': volume_intensity
            })
    
    return pd.DataFrame(signals)

# Create price_impacts_df for the strategies that need it
def create_price_impacts_df_from_analysis(all_impacts):
    """Convert analysis data to the format expected by strategy functions"""
    price_impacts_df = []
    
    for idx, row in all_impacts.iterrows():
        # Calculate volume intensity
        volume = row.get('volume', 0)
        volume_intensity = volume / 10 if volume > 0 else 1.0
        
        price_impacts_df.append({
            'timestamp': row['impact_time'],
            'coin_id': row.get('coin', 'Unknown'),
            'price_impact': row['impact_magnitude'],
            'volume_intensity': volume_intensity
        })
    
    return pd.DataFrame(price_impacts_df)

# Check if we have the required data
if 'all_impacts' in locals() and len(all_impacts) > 0:
    print("Creating impact signals from existing analysis...")
    impact_signals = create_impact_signals_from_analysis(all_impacts)
    price_impacts_df = create_price_impacts_df_from_analysis(all_impacts)
    
    print(f"Created {len(impact_signals)} impact signals")
    print(f"Signal distribution: {impact_signals['signal_type'].value_counts().to_dict()}")
else:
    print("⚠️ No impact analysis data found - creating sample signals for demonstration")
    # Create sample signals for demonstration
    sample_timestamps = pd.date_range('2024-01-01', periods=1000, freq='1min')
    impact_signals = pd.DataFrame({
        'timestamp': sample_timestamps,
        'coin_id': np.random.choice(['Coin_1', 'Coin_2', 'Coin_3', 'Coin_4', 'Coin_5'], 1000),
        'signal_type': np.random.choice(['BUY', 'SELL'], 1000),
        'price_impact': np.random.normal(0, 0.05, 1000),
        'volume_intensity': np.random.exponential(2, 1000)
    })
    
    price_impacts_df = pd.DataFrame({
        'timestamp': sample_timestamps,
        'coin_id': np.random.choice(['Coin_1', 'Coin_2', 'Coin_3', 'Coin_4', 'Coin_5'], 1000),
        'price_impact': np.random.normal(0, 0.05, 1000),
        'volume_intensity': np.random.exponential(2, 1000)
    })

# ============================================================================
# RUN ALL STRATEGIES
# ============================================================================

print("\n" + "="*60)
print("TESTING ALL ENHANCED STRATEGIES")
print("="*60)

strategies_to_test = [
    ("Original", impact_signals),
    ("BUY-Only", buy_only_strategy(impact_signals)),
    ("Extended-Hold-300s", extended_hold_strategy(impact_signals)),
    ("Coin-Specific", coin_specific_strategy(price_impacts_df)),
    ("Adaptive-Threshold", adaptive_threshold_strategy(price_impacts_df, percentile=85))
]

all_strategy_results = []

for strategy_name, signals in strategies_to_test:
    print(f"\n--- Testing {strategy_name} Strategy ---")
    if signals is not None and len(signals) > 0:
        results = backtest_strategy_extended(signals, price_impacts_df, strategy_name=strategy_name)
        if results is not None:
            all_strategy_results.append(results)
            
            # Show best performing hold period for this strategy
            best_period = results.loc[results['sharpe_ratio'].idxmax()]
            print(f"Best Hold Period: {best_period['hold_period']}s")
            print(f"Win Rate: {best_period['win_rate']:.3f}")
            print(f"Avg Return: {best_period['avg_return']:.3f}")
            print(f"Sharpe: {best_period['sharpe_ratio']:.3f}")
    else:
        print("No signals generated")


Creating impact signals from existing analysis...
Created 2349 impact signals
Signal distribution: {'BUY': 1290, 'SELL': 1059}

TESTING ALL ENHANCED STRATEGIES
BUY-Only Strategy: 1290 signals (vs 2349 total)
Coin_1: Dynamic thresholds - Price: 0.063, Volume: 0.78
Coin_10: Dynamic thresholds - Price: 0.061, Volume: 0.90
Coin_2: Dynamic thresholds - Price: 0.044, Volume: 0.27
Coin_3: Dynamic thresholds - Price: 0.041, Volume: 1.99
Coin_4: Dynamic thresholds - Price: 0.028, Volume: 0.70
Coin_5: Dynamic thresholds - Price: 0.084, Volume: 0.16
Coin_6: Dynamic thresholds - Price: 0.019, Volume: 0.06
Coin_7: Dynamic thresholds - Price: 0.025, Volume: 0.30
Coin_8: Dynamic thresholds - Price: 0.037, Volume: 0.84
Coin_9: Dynamic thresholds - Price: 0.027, Volume: 0.55

--- Testing Original Strategy ---
Best Hold Period: 180s
Win Rate: 0.517
Avg Return: 1.017
Sharpe: 0.138

--- Testing BUY-Only Strategy ---
Best Hold Period: 180s
Win Rate: 0.760
Avg Return: 5.336
Sharpe: 0.918

--- Testing Extend

In [9]:
# ============================================================================
# COMPREHENSIVE RESULTS COMPARISON
# ============================================================================

if all_strategy_results:
    combined_results = pd.concat(all_strategy_results, ignore_index=True)
    
    print("\n" + "="*80)
    print("STRATEGY PERFORMANCE COMPARISON")
    print("="*80)
    
    # Best strategy by Sharpe ratio
    best_overall = combined_results.loc[combined_results['sharpe_ratio'].idxmax()]
    print(f"\n🏆 BEST OVERALL STRATEGY:")
    print(f"Strategy: {best_overall['strategy']}")
    print(f"Hold Period: {best_overall['hold_period']}s")
    print(f"Win Rate: {best_overall['win_rate']:.1%}")
    print(f"Avg Return: {best_overall['avg_return']:.3f}")
    print(f"Sharpe Ratio: {best_overall['sharpe_ratio']:.3f}")
    print(f"Total Trades: {best_overall['num_trades']}")
    
    # Best by win rate
    best_winrate = combined_results.loc[combined_results['win_rate'].idxmax()]
    print(f"\n🎯 HIGHEST WIN RATE:")
    print(f"Strategy: {best_winrate['strategy']} ({best_winrate['hold_period']}s)")
    print(f"Win Rate: {best_winrate['win_rate']:.1%}")
    print(f"Avg Return: {best_winrate['avg_return']:.3f}")
    
    # Best by return
    best_return = combined_results.loc[combined_results['avg_return'].idxmax()]
    print(f"\n💰 HIGHEST RETURN:")
    print(f"Strategy: {best_return['strategy']} ({best_return['hold_period']}s)")
    print(f"Avg Return: {best_return['avg_return']:.3f}")
    print(f"Win Rate: {best_return['win_rate']:.1%}")
    
    # Strategy comparison table
    print(f"\n📊 STRATEGY SUMMARY (Best Hold Period Each):")
    strategy_summary = combined_results.groupby('strategy').apply(
        lambda x: x.loc[x['sharpe_ratio'].idxmax()]
    )[['hold_period', 'num_trades', 'win_rate', 'avg_return', 'sharpe_ratio']]
    
    print(strategy_summary.round(3))
    
    # Hold period analysis
    print(f"\n⏱️ HOLD PERIOD ANALYSIS:")
    hold_period_summary = combined_results.groupby('hold_period')[['win_rate', 'avg_return', 'sharpe_ratio']].mean()
    print(hold_period_summary.round(3))



STRATEGY PERFORMANCE COMPARISON

🏆 BEST OVERALL STRATEGY:
Strategy: BUY-Only
Hold Period: 180s
Win Rate: 76.0%
Avg Return: 5.336
Sharpe Ratio: 0.918
Total Trades: 1287

🎯 HIGHEST WIN RATE:
Strategy: BUY-Only (600s)
Win Rate: 80.2%
Avg Return: 44.881

💰 HIGHEST RETURN:
Strategy: BUY-Only (600s)
Avg Return: 44.881
Win Rate: 80.2%

📊 STRATEGY SUMMARY (Best Hold Period Each):
                    hold_period  num_trades  win_rate  avg_return  \
strategy                                                            
Adaptive-Threshold          600       11542     0.483       5.358   
BUY-Only                    180        1287     0.760       5.336   
Coin-Specific               180        1593     0.593       2.904   
Extended-Hold-300s          180        2346     0.517       1.017   
Original                    180        2346     0.517       1.017   

                    sharpe_ratio  
strategy                          
Adaptive-Threshold         0.051  
BUY-Only                   0.918  


In [14]:
# ============================================================================
# FAST CORRELATION ANALYSIS (NO SLOW STRATEGIES)
# ============================================================================

print("\n" + "="*60)
print("FAST CORRELATION & PATTERN ANALYSIS")
print("="*60)

# Skip slow nested loops - just do vectorized analysis
if 'price_impacts_df' in locals() and len(price_impacts_df) > 0:
    
    print("📊 COMPREHENSIVE CORRELATION ANALYSIS:")
    print("-" * 50)
    
    # 1. Basic Price-Volume Correlation
    price_vol_corr = price_impacts_df['price_impact'].corr(price_impacts_df['volume_intensity'])
    print(f"Price Impact ↔ Volume Intensity: {price_vol_corr:.3f} (VERY WEAK!)")
    
    # 2. Absolute Price Impact vs Volume (more meaningful)
    abs_price_vol_corr = abs(price_impacts_df['price_impact']).corr(price_impacts_df['volume_intensity'])
    print(f"ABS(Price Impact) ↔ Volume: {abs_price_vol_corr:.3f}")
    
    # 3. High Volume vs Extreme Moves
    high_volume = price_impacts_df['volume_intensity'] > price_impacts_df['volume_intensity'].quantile(0.75)
    extreme_moves = abs(price_impacts_df['price_impact']) > 0.05
    
    # Cross-tabulation to see relationship
    from scipy.stats import chi2_contingency
    contingency_table = pd.crosstab(high_volume, extreme_moves)
    print(f"\n📊 HIGH VOLUME vs EXTREME MOVES RELATIONSHIP:")
    print(contingency_table)
    
    # Calculate correlation coefficient for binary variables
    high_vol_extreme_corr = high_volume.astype(int).corr(extreme_moves.astype(int))
    print(f"High Volume ↔ Extreme Moves: {high_vol_extreme_corr:.3f}")
    
    # 4. Coin-specific correlations
    print(f"\n🪙 COIN-SPECIFIC CORRELATIONS:")
    coin_correlations = []
    for coin in price_impacts_df['coin_id'].unique():
        coin_data = price_impacts_df[price_impacts_df['coin_id'] == coin]
        if len(coin_data) > 100:  # Only for coins with sufficient data
            corr = coin_data['price_impact'].corr(coin_data['volume_intensity'])
            abs_corr = abs(coin_data['price_impact']).corr(coin_data['volume_intensity'])
            coin_correlations.append({
                'coin': coin,
                'price_vol_corr': corr,
                'abs_price_vol_corr': abs_corr,
                'n_trades': len(coin_data)
            })
    
    corr_df = pd.DataFrame(coin_correlations).sort_values('abs_price_vol_corr', ascending=False)
    print(corr_df.round(3))
    
    # 2. Volume Distribution Analysis
    print(f"\n📈 VOLUME DISTRIBUTION:")
    volume_stats = price_impacts_df['volume_intensity'].describe()
    vol_90th = price_impacts_df['volume_intensity'].quantile(0.90)
    print(f"Median volume intensity: {volume_stats['50%']:.2f}")
    print(f"75th percentile: {volume_stats['75%']:.2f}")
    print(f"90th percentile: {vol_90th:.2f}")
    
    # 3. Price Impact Distribution
    print(f"\n⚡ PRICE IMPACT PATTERNS:")
    price_stats = price_impacts_df['price_impact'].describe()
    print(f"Mean price impact: {price_stats['mean']:.4f}")
    print(f"Std price impact: {price_stats['std']:.4f}")
    print(f"Extreme moves (>5%): {(abs(price_impacts_df['price_impact']) > 0.05).sum():,} events")
    
    # 4. Coin-by-Coin Quick Stats
    print(f"\n🪙 BY COIN ANALYSIS:")
    if 'coin_id' in price_impacts_df.columns:
        coin_summary = price_impacts_df.groupby('coin_id').agg({
            'price_impact': ['count', 'mean', 'std'],
            'volume_intensity': ['mean', 'max']
        }).round(3)
        print("Top 5 coins by activity:")
        print(coin_summary.sort_values(('price_impact', 'count'), ascending=False).head())
    
    # 5. Simple High-Volume Filter Test
    print(f"\n🎯 VOLUME FILTERING IMPACT:")
    
    # Test different volume thresholds
    thresholds = [50, 75, 90, 95]
    for pct in thresholds:
        vol_threshold = price_impacts_df['volume_intensity'].quantile(pct/100)
        high_vol_data = price_impacts_df[price_impacts_df['volume_intensity'] > vol_threshold]
        
        if len(high_vol_data) > 10:
            avg_price_impact = abs(high_vol_data['price_impact']).mean()
            print(f"  {pct}th percentile (vol>{vol_threshold:.1f}): {len(high_vol_data):,} events, avg impact: {avg_price_impact:.4f}")
    
    # 6. Quick Signal Quality Check
    print(f"\n🚀 SIGNAL QUALITY PREVIEW:")
    
    # Simple BUY signal check
    buy_conditions = (price_impacts_df['price_impact'] < -0.05) & (price_impacts_df['volume_intensity'] > 2.0)
    sell_conditions = (price_impacts_df['price_impact'] > 0.05) & (price_impacts_df['volume_intensity'] > 2.0)
    
    print(f"Potential BUY signals (price<-5%, vol>2): {buy_conditions.sum():,}")
    print(f"Potential SELL signals (price>+5%, vol>2): {sell_conditions.sum():,}")
    print(f"Total signal rate: {(buy_conditions.sum() + sell_conditions.sum()) / len(price_impacts_df):.1%}")
    
    # 7. MULTI-SIGNAL CORRELATION TEST (Fast Version)
    print(f"\n🎯 MULTI-SIGNAL CORRELATION ANALYSIS:")
    print("-" * 40)
    
    # Test if multiple signals align better than single signals
    price_signal = abs(price_impacts_df['price_impact']) > 0.05
    volume_signal = price_impacts_df['volume_intensity'] > price_impacts_df['volume_intensity'].quantile(0.75)
    
    # Multi-signal correlation
    multi_signal = price_signal & volume_signal
    single_price_only = price_signal & ~volume_signal
    single_volume_only = volume_signal & ~price_signal
    
    print(f"Price-only signals: {single_price_only.sum():,}")
    print(f"Volume-only signals: {single_volume_only.sum():,}")
    print(f"Multi-signal (both): {multi_signal.sum():,}")
    print(f"Multi-signal rate: {multi_signal.mean():.1%}")
    
    # Correlation between signal types
    price_vol_signal_corr = price_signal.astype(int).corr(volume_signal.astype(int))
    print(f"Price Signal ↔ Volume Signal correlation: {price_vol_signal_corr:.3f}")
    
    # 8. REGIME-AWARE CORRELATION TEST (Fast Version)
    print(f"\n📊 REGIME-AWARE ANALYSIS:")
    print("-" * 30)
    
    # Calculate volatility regimes without nested loops
    price_impacts_df['abs_impact'] = abs(price_impacts_df['price_impact'])
    
    # Define volatility regimes using rolling standard deviation proxy
    # Use quantiles as a proxy for different volatility regimes
    vol_33rd = price_impacts_df['abs_impact'].quantile(0.33)
    vol_67th = price_impacts_df['abs_impact'].quantile(0.67)
    
    low_vol_regime = price_impacts_df['abs_impact'] <= vol_33rd
    medium_vol_regime = (price_impacts_df['abs_impact'] > vol_33rd) & (price_impacts_df['abs_impact'] <= vol_67th)
    high_vol_regime = price_impacts_df['abs_impact'] > vol_67th
    
    print(f"Low volatility regime: {low_vol_regime.sum():,} events ({low_vol_regime.mean():.1%})")
    print(f"Medium volatility regime: {medium_vol_regime.sum():,} events ({medium_vol_regime.mean():.1%})")
    print(f"High volatility regime: {high_vol_regime.sum():,} events ({high_vol_regime.mean():.1%})")
    
    # Volume behavior by regime
    low_vol_avg_volume = price_impacts_df[low_vol_regime]['volume_intensity'].mean()
    med_vol_avg_volume = price_impacts_df[medium_vol_regime]['volume_intensity'].mean()
    high_vol_avg_volume = price_impacts_df[high_vol_regime]['volume_intensity'].mean()
    
    print(f"\nAverage volume by regime:")
    print(f"  Low vol regime: {low_vol_avg_volume:.3f}")
    print(f"  Medium vol regime: {med_vol_avg_volume:.3f}")
    print(f"  High vol regime: {high_vol_avg_volume:.3f}")
    
    # Regime correlation with volume
    regime_volume_correlation = []
    for regime_name, regime_mask in [("Low", low_vol_regime), ("Medium", medium_vol_regime), ("High", high_vol_regime)]:
        if regime_mask.sum() > 100:  # Enough data
            regime_data = price_impacts_df[regime_mask]
            corr = regime_data['price_impact'].corr(regime_data['volume_intensity'])
            regime_volume_correlation.append({
                'regime': regime_name,
                'price_vol_corr': corr,
                'n_events': regime_mask.sum()
            })
    
    regime_corr_df = pd.DataFrame(regime_volume_correlation)
    print(f"\nPrice-Volume correlation by regime:")
    print(regime_corr_df.round(3))
    
else:
    print("⚠️ No price_impacts_df available for fast analysis")



FAST CORRELATION & PATTERN ANALYSIS
📊 COMPREHENSIVE CORRELATION ANALYSIS:
--------------------------------------------------
Price Impact ↔ Volume Intensity: 0.054 (VERY WEAK!)
ABS(Price Impact) ↔ Volume: 0.060

📊 HIGH VOLUME vs EXTREME MOVES RELATIONSHIP:
price_impact       False  True 
volume_intensity               
False             214032  68133
True               77873  16182
High Volume ↔ Extreme Moves: -0.072

🪙 COIN-SPECIFIC CORRELATIONS:
      coin  price_vol_corr  abs_price_vol_corr  n_trades
0   Coin_1          -0.022               0.263      3156
3   Coin_3           0.037               0.242      6416
6   Coin_6           0.043               0.188     26187
5   Coin_5           0.092               0.101    184295
2   Coin_2          -0.016               0.072     70961
4   Coin_4           0.265               0.026     14536
1  Coin_10           0.015               0.018     27631
9   Coin_9           0.652              -0.008     20354
8   Coin_8           0.798        