## Section 4: Visualize Orderbook Patterns

In this section, we'll create visualizations to better understand the orderbook patterns before large return events.
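
The plots in this section read several derived columns (`mid_price`, `spread`, `relative_spread`, `volume_imbalance`, `book_depth`, `bid_volume_total`, `ask_volume_total`, `returns`) from `squid_data_with_features`. Their definitions are not shown in this section, so the sketch below is only one plausible way to compute them. The helper name `add_basic_orderbook_features` and every formula in it are assumptions; the sign convention for `volume_imbalance` (positive when bid volume exceeds ask volume) is the one the conclusions rely on.

```python
import numpy as np
import pandas as pd


def add_basic_orderbook_features(book: pd.DataFrame, levels: int = 3) -> pd.DataFrame:
    """Hypothetical helper: derive the feature columns used by the plots.

    The formulas are assumptions and may differ from the definitions
    used earlier in the notebook.
    """
    out = book.copy()
    bid_cols = [f"bid_volume_{i}" for i in range(1, levels + 1)]
    ask_cols = [f"ask_volume_{i}" for i in range(1, levels + 1)]

    # Row-wise sums skip NaN, so unquoted levels count as zero volume
    out["bid_volume_total"] = out[bid_cols].sum(axis=1)
    out["ask_volume_total"] = out[ask_cols].sum(axis=1)
    out["book_depth"] = out["bid_volume_total"] + out["ask_volume_total"]

    # Best-quote features
    out["mid_price"] = (out["bid_price_1"] + out["ask_price_1"]) / 2
    out["spread"] = out["ask_price_1"] - out["bid_price_1"]
    out["relative_spread"] = out["spread"] / out["mid_price"]

    # Positive when bid volume exceeds ask volume; NaN for an empty book
    out["volume_imbalance"] = (
        out["bid_volume_total"] - out["ask_volume_total"]
    ) / out["book_depth"].replace(0, np.nan)

    # Simple return from the previous row to this row
    out["returns"] = out["mid_price"] / out["mid_price"].shift(1) - 1
    return out


# Toy snapshots with some unquoted levels stored as NaN
example = pd.DataFrame({
    "timestamp": [0, 100, 200],
    "bid_price_1": [99.0, 100.0, 101.0],
    "ask_price_1": [101.0, 102.0, 103.0],
    "bid_volume_1": [30.0, 10.0, 20.0],
    "bid_volume_2": [10.0, np.nan, 5.0],
    "bid_volume_3": [np.nan, np.nan, np.nan],
    "ask_volume_1": [10.0, 30.0, 25.0],
    "ask_volume_2": [10.0, np.nan, np.nan],
    "ask_volume_3": [np.nan, np.nan, np.nan],
})
features = add_basic_orderbook_features(example)
print(features[["mid_price", "spread", "volume_imbalance", "returns"]])
```

With this convention, the return stored on row `i + 1` is the move that follows the book state on row `i`, which matches how the functions below read the "return after event".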

In [None]:
# Function to visualize the orderbook state
def visualize_orderbook(data, timestamp, window=5):
    """
    Visualize the orderbook state around a specific timestamp.
    
    Parameters:
        data (pd.DataFrame): DataFrame containing orderbook data
        timestamp (int): Timestamp to visualize
        window (int): Number of observations before and after the timestamp to include
    """
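    # Find the integer position of the timestamp. The slicing below uses iloc,
    # which needs a position rather than an index label, so this also works
    # when the DataFrame index is not a default RangeIndex.
    positions = (data['timestamp'] == timestamp).to_numpy().nonzero()[0]
    if len(positions) == 0:
        raise ValueError(f"Timestamp {timestamp} not found in data")
    idx = positions[0]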
    
    # Get data around the timestamp
    start_idx = max(0, idx - window)
    end_idx = min(len(data), idx + window + 1)
    window_data = data.iloc[start_idx:end_idx].copy()
    
    # Mark the event timestamp
    window_data['is_event'] = window_data['timestamp'] == timestamp
    
    # Create a figure with multiple subplots
    fig, axes = plt.subplots(3, 1, figsize=(14, 12), sharex=True)
    
    # Plot 1: Mid price
    axes[0].plot(window_data['timestamp'], window_data['mid_price'], marker='o')
    event_price = window_data.loc[window_data['is_event'], 'mid_price'].values[0]
    axes[0].scatter(timestamp, event_price, color='red', s=100, zorder=5)
    axes[0].set_title('Mid Price Around Event')
    axes[0].set_ylabel('Mid Price')
    axes[0].grid(True)
    
    # Plot 2: Bid-Ask Spread
    axes[1].plot(window_data['timestamp'], window_data['spread'], marker='o')
    event_spread = window_data.loc[window_data['is_event'], 'spread'].values[0]
    axes[1].scatter(timestamp, event_spread, color='red', s=100, zorder=5)
    axes[1].set_title('Bid-Ask Spread Around Event')
    axes[1].set_ylabel('Spread')
    axes[1].grid(True)
    
    # Plot 3: Volume Imbalance
    axes[2].plot(window_data['timestamp'], window_data['volume_imbalance'], marker='o')
    event_imbalance = window_data.loc[window_data['is_event'], 'volume_imbalance'].values[0]
    axes[2].scatter(timestamp, event_imbalance, color='red', s=100, zorder=5)
    axes[2].set_title('Volume Imbalance Around Event')
    axes[2].set_ylabel('Volume Imbalance')
    axes[2].set_xlabel('Timestamp')
    axes[2].grid(True)
    
    # Add a horizontal line at y=0 for volume imbalance
    axes[2].axhline(y=0, color='gray', linestyle='--', alpha=0.7)
    
    # Add event marker lines
    for ax in axes:
        ax.axvline(x=timestamp, color='red', linestyle='--', alpha=0.5)
    
    plt.tight_layout()
    plt.show()
    
    # Print additional information about the event
    event_data = window_data.loc[window_data['is_event']].iloc[0]
    next_idx = idx + 1
    if next_idx < len(data):
        next_data = data.iloc[next_idx]
        return_value = next_data['returns']
        print(f"Event timestamp: {timestamp}")
        print(f"Return after event: {return_value:.6f}")
        print(f"Mid price before event: {event_data['mid_price']:.2f}")
        print(f"Mid price after event: {next_data['mid_price']:.2f}")
        print(f"Spread: {event_data['spread']:.2f}")
        print(f"Volume imbalance: {event_data['volume_imbalance']:.4f}")
        print(f"Book depth: {event_data['book_depth']:.0f}")
        print(f"Bid volume: {event_data['bid_volume_total']:.0f}")
        print(f"Ask volume: {event_data['ask_volume_total']:.0f}")

In [None]:
# Visualize a few examples of large positive return events
positive_examples = positive_returns.sort_values('return_value', ascending=False).head(3)['timestamp'].values

for timestamp in positive_examples:
    print(f"\nVisualizing large positive return event at timestamp {timestamp}:")
    visualize_orderbook(squid_data_with_features, timestamp, window=10)

In [None]:
# Visualize a few examples of large negative return events
negative_examples = negative_returns.sort_values('return_value').head(3)['timestamp'].values

for timestamp in negative_examples:
    print(f"\nVisualizing large negative return event at timestamp {timestamp}:")
    visualize_orderbook(squid_data_with_features, timestamp, window=10)

### Visualize Orderbook Depth

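Next, we plot the orderbook depth at the snapshot just before a large return: the three quoted bid and ask levels are drawn as horizontal bars at their prices, with bid volumes mirrored to the left of zero and ask volumes to the right.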
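
The depth plot below reads three price levels per side directly from the row. If some snapshots quote fewer than three levels and the missing ones are stored as NaN (an assumption about this dataset), it can help to drop those levels before plotting. A minimal sketch, using a hypothetical helper `collect_book_levels` that is not part of the original notebook:

```python
import math


def collect_book_levels(row, side, max_levels=3):
    """Return [(price, volume), ...] for the levels actually quoted on one side.

    `row` can be a dict or a pandas Series; `side` is "bid" or "ask".
    Levels whose price or volume is missing or NaN are skipped.
    """
    levels = []
    for i in range(1, max_levels + 1):
        price = row.get(f"{side}_price_{i}")
        volume = row.get(f"{side}_volume_{i}")
        if price is None or volume is None:
            continue
        if math.isnan(price) or math.isnan(volume):
            continue
        levels.append((float(price), float(volume)))
    return levels


# Toy snapshot: only the best bid has both a price and a volume
snapshot = {
    "bid_price_1": 100.0, "bid_volume_1": 12.0,
    "bid_price_2": 99.0, "bid_volume_2": float("nan"),
    "bid_price_3": float("nan"), "bid_volume_3": float("nan"),
}
print(collect_book_levels(snapshot, "bid"))
```

The returned pairs can be unzipped into the price and volume lists that `barh` expects, so unquoted levels never reach the plot.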

In [None]:
# Function to visualize orderbook depth
def visualize_orderbook_depth(data, timestamp):
    """
    Visualize the orderbook depth at a specific timestamp.
    
    Parameters:
        data (pd.DataFrame): DataFrame containing orderbook data
        timestamp (int): Timestamp to visualize
    """
    # Get the row for the timestamp
    row = data[data['timestamp'] == timestamp].iloc[0]
    
    # Extract bid and ask prices and volumes
    bid_prices = [row['bid_price_1'], row['bid_price_2'], row['bid_price_3']]
    bid_volumes = [row['bid_volume_1'], row['bid_volume_2'], row['bid_volume_3']]
    ask_prices = [row['ask_price_1'], row['ask_price_2'], row['ask_price_3']]
    ask_volumes = [row['ask_volume_1'], row['ask_volume_2'], row['ask_volume_3']]
    
    # Create a figure
    fig, ax = plt.subplots(figsize=(12, 6))
    
    # Plot bid side (negative volumes for visualization)
    ax.barh(bid_prices, [-vol for vol in bid_volumes], height=0.5, color='green', alpha=0.7, label='Bids')
    
    # Plot ask side
    ax.barh(ask_prices, ask_volumes, height=0.5, color='red', alpha=0.7, label='Asks')
    
    # Add mid price line
    mid_price = row['mid_price']
    ax.axhline(y=mid_price, color='blue', linestyle='-', alpha=0.7, label='Mid Price')
    
    # Set labels and title
    ax.set_title(f'Orderbook Depth at Timestamp {timestamp}')
    ax.set_xlabel('Volume')
    ax.set_ylabel('Price')
    
    # Add legend
    ax.legend()
    
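    # Adjust x-axis labels to show absolute values. Fix the tick positions
    # first so the custom labels stay aligned with them (matplotlib warns
    # when set_xticklabels is called without fixed ticks).
    xticks = ax.get_xticks()
    ax.set_xticks(xticks)
    ax.set_xticklabels([str(abs(int(x))) for x in xticks])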
    
    # Add grid
    ax.grid(True, alpha=0.3)
    
    plt.tight_layout()
    plt.show()
    
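    # Print additional information. Use the integer position of the timestamp,
    # since iloc below expects a position rather than an index label.
    next_idx = (data['timestamp'] == timestamp).to_numpy().nonzero()[0][0] + 1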
    if next_idx < len(data):
        next_row = data.iloc[next_idx]
        return_value = next_row['returns']
        print(f"Return after event: {return_value:.6f}")
        print(f"Volume imbalance: {row['volume_imbalance']:.4f}")
        print(f"Bid-Ask Spread: {row['spread']:.2f}")

In [None]:
# Visualize orderbook depth for a few examples
# Combine positive and negative examples
examples = list(positive_examples[:2]) + list(negative_examples[:2])

for timestamp in examples:
    print(f"\nVisualizing orderbook depth at timestamp {timestamp}:")
    visualize_orderbook_depth(squid_data_with_features, timestamp)

### Visualize Average Orderbook Patterns

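Individual events are noisy, so we now average the orderbook features across all large positive and all large negative return events. Each event window is aligned by its position relative to the event (0 is the event row) before averaging.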

In [None]:
# Function to extract orderbook features around events
def extract_features_around_events(data, event_timestamps, window=5):
    """
    Extract orderbook features around event timestamps.
    
    Parameters:
        data (pd.DataFrame): DataFrame containing orderbook data
        event_timestamps (list): List of event timestamps
        window (int): Number of observations before and after the event to include
        
    Returns:
        dict: Dictionary with arrays of features aligned relative to the event
    """
    # Initialize lists to store features
    all_mid_prices = []
    all_spreads = []
    all_volume_imbalances = []
    all_book_depths = []
    
    # Loop through each event timestamp
    for timestamp in event_timestamps:
        # Find the integer position of the timestamp (iloc needs positions,
        # not index labels, so this also works for a non-default index)
        positions = (data['timestamp'] == timestamp).to_numpy().nonzero()[0]
        if len(positions) == 0:
            # Skip if timestamp not found
            continue
        idx = positions[0]
        
        # Get data around the timestamp
        start_idx = max(0, idx - window)
        end_idx = min(len(data), idx + window + 1)
        window_data = data.iloc[start_idx:end_idx].copy()
        
        # Calculate relative position to the event (0 is the event row)
        window_data['relative_position'] = list(range(start_idx - idx, end_idx - idx))
        
        # Normalize mid price to the event price
        event_price = data['mid_price'].iloc[idx]
        window_data['normalized_price'] = window_data['mid_price'] / event_price - 1
        
        # Group by relative position
        grouped = window_data.groupby('relative_position')
        
        # Extract features
        mid_prices = grouped['normalized_price'].mean()
        spreads = grouped['relative_spread'].mean()
        volume_imbalances = grouped['volume_imbalance'].mean()
        book_depths = grouped['book_depth'].mean()
        
        # Append to lists
        all_mid_prices.append(mid_prices)
        all_spreads.append(spreads)
        all_volume_imbalances.append(volume_imbalances)
        all_book_depths.append(book_depths)
    
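    # Combine all events (pd.concat fails on an empty list, so stop early
    # with a clear message if no event timestamp was found)
    if not all_mid_prices:
        raise ValueError("None of the event timestamps were found in data")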
    combined_mid_prices = pd.concat(all_mid_prices, axis=1).mean(axis=1)
    combined_spreads = pd.concat(all_spreads, axis=1).mean(axis=1)
    combined_volume_imbalances = pd.concat(all_volume_imbalances, axis=1).mean(axis=1)
    combined_book_depths = pd.concat(all_book_depths, axis=1).mean(axis=1)
    
    return {
        'mid_prices': combined_mid_prices,
        'spreads': combined_spreads,
        'volume_imbalances': combined_volume_imbalances,
        'book_depths': combined_book_depths
    }
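
The function above returns only the mean of each feature at every relative position, which hides how much individual events differ. One possible extension, sketched here with NumPy under the assumption that events are given as integer row positions, also reports a standard error per position. The helper name `average_event_windows` is hypothetical and not part of the notebook.

```python
import numpy as np


def average_event_windows(values, event_positions, window=5):
    """Align windows around each event and average them position by position.

    Returns (offsets, mean, sem), where offsets run from -window to +window
    and sem is the standard error of the mean across events.
    """
    values = np.asarray(values, dtype=float)
    offsets = np.arange(-window, window + 1)

    # One row per event; positions that fall outside the data stay NaN
    stacked = np.full((len(event_positions), offsets.size), np.nan)
    for row, pos in enumerate(event_positions):
        idx = pos + offsets
        valid = (idx >= 0) & (idx < values.size)
        stacked[row, valid] = values[idx[valid]]

    counts = np.sum(~np.isnan(stacked), axis=0)
    mean = np.nanmean(stacked, axis=0)
    # Sample standard deviation (ddof=1) divided by sqrt(number of events)
    sem = np.nanstd(stacked, axis=0, ddof=1) / np.sqrt(counts)
    return offsets, mean, sem


# Toy series 0..9 with events at rows 2 and 7
offsets, mean, sem = average_event_windows(np.arange(10), [2, 7], window=2)
print(offsets, mean, sem)
```

Plotting `mean` with a band of plus or minus one or two `sem` makes it easier to judge whether the gap between the positive-event and negative-event curves is larger than the event-to-event noise.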

In [None]:
# Extract features around positive and negative return events
positive_features = extract_features_around_events(squid_data_with_features, positive_returns['timestamp'].values, window=10)
negative_features = extract_features_around_events(squid_data_with_features, negative_returns['timestamp'].values, window=10)

# Create a figure with multiple subplots
fig, axes = plt.subplots(2, 2, figsize=(16, 12))

# Plot 1: Normalized Mid Price
axes[0, 0].plot(positive_features['mid_prices'].index, positive_features['mid_prices'].values, label='Before Positive Returns', color='green')
axes[0, 0].plot(negative_features['mid_prices'].index, negative_features['mid_prices'].values, label='Before Negative Returns', color='red')
axes[0, 0].axvline(x=0, color='black', linestyle='--', alpha=0.5)
axes[0, 0].axhline(y=0, color='gray', linestyle='--', alpha=0.5)
axes[0, 0].set_title('Average Normalized Mid Price Around Events')
axes[0, 0].set_xlabel('Relative Position to Event')
axes[0, 0].set_ylabel('Normalized Price')
axes[0, 0].legend()
axes[0, 0].grid(True)

# Plot 2: Relative Spread
axes[0, 1].plot(positive_features['spreads'].index, positive_features['spreads'].values, label='Before Positive Returns', color='green')
axes[0, 1].plot(negative_features['spreads'].index, negative_features['spreads'].values, label='Before Negative Returns', color='red')
axes[0, 1].axvline(x=0, color='black', linestyle='--', alpha=0.5)
axes[0, 1].set_title('Average Relative Spread Around Events')
axes[0, 1].set_xlabel('Relative Position to Event')
axes[0, 1].set_ylabel('Relative Spread')
axes[0, 1].legend()
axes[0, 1].grid(True)

# Plot 3: Volume Imbalance
axes[1, 0].plot(positive_features['volume_imbalances'].index, positive_features['volume_imbalances'].values, label='Before Positive Returns', color='green')
axes[1, 0].plot(negative_features['volume_imbalances'].index, negative_features['volume_imbalances'].values, label='Before Negative Returns', color='red')
axes[1, 0].axvline(x=0, color='black', linestyle='--', alpha=0.5)
axes[1, 0].axhline(y=0, color='gray', linestyle='--', alpha=0.5)
axes[1, 0].set_title('Average Volume Imbalance Around Events')
axes[1, 0].set_xlabel('Relative Position to Event')
axes[1, 0].set_ylabel('Volume Imbalance')
axes[1, 0].legend()
axes[1, 0].grid(True)

# Plot 4: Book Depth
axes[1, 1].plot(positive_features['book_depths'].index, positive_features['book_depths'].values, label='Before Positive Returns', color='green')
axes[1, 1].plot(negative_features['book_depths'].index, negative_features['book_depths'].values, label='Before Negative Returns', color='red')
axes[1, 1].axvline(x=0, color='black', linestyle='--', alpha=0.5)
axes[1, 1].set_title('Average Book Depth Around Events')
axes[1, 1].set_xlabel('Relative Position to Event')
axes[1, 1].set_ylabel('Book Depth')
axes[1, 1].legend()
axes[1, 1].grid(True)

plt.tight_layout()
plt.show()

## Conclusions

Based on our visualizations and analysis, we can draw the following conclusions about orderbook patterns before large returns:

1. **Volume Imbalance**: Large positive returns tend to be preceded by positive volume imbalance (more bid volume than ask volume), while large negative returns tend to be preceded by negative volume imbalance (more ask volume than bid volume).

2. **Bid-Ask Spread**: The spread tends to widen before large price movements, especially before negative returns. This suggests increased uncertainty or volatility in the market.

3. **Book Depth**: There are noticeable differences in book depth before positive versus negative returns. Lower book depth (less liquidity) may indicate potential for larger price movements, because less resting volume has to be consumed before the best quotes move.

4. **Price Patterns**: We can observe subtle price patterns before large returns. Prices often drift slightly in the opposite direction just before a large move, so the large return partly reverses the preceding drift (a mean-reversion pattern).

These patterns could be used to develop trading strategies that anticipate large price movements based on orderbook features.
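
The conclusions above are read off the plots. One simple way to put a number on the volume imbalance claim is a hit rate: the fraction of rows where the sign of the imbalance matches the sign of the next return. The sketch below assumes that `returns` on row `i + 1` is the return following row `i`, as in the functions above; the helper name `imbalance_sign_hit_rate` and the `min_abs_return` threshold are hypothetical.

```python
import numpy as np
import pandas as pd


def imbalance_sign_hit_rate(data: pd.DataFrame, min_abs_return: float = 0.0) -> float:
    """Fraction of rows where sign(volume_imbalance) equals sign(next return).

    Only rows whose next return is at least `min_abs_return` in absolute
    value are counted; zero or missing values are ignored.
    """
    imbalance = data["volume_imbalance"].to_numpy(dtype=float)[:-1]
    next_return = data["returns"].to_numpy(dtype=float)[1:]

    usable = ~np.isnan(imbalance) & ~np.isnan(next_return)
    usable &= (imbalance != 0) & (next_return != 0)
    usable &= np.abs(next_return) >= min_abs_return
    if not usable.any():
        return float("nan")

    hits = np.sign(imbalance[usable]) == np.sign(next_return[usable])
    return float(hits.mean())


# Toy data: three of the four usable pairs agree in sign
toy = pd.DataFrame({
    "volume_imbalance": [0.5, -0.2, 0.3, -0.4, 0.1],
    "returns": [np.nan, 0.01, -0.02, -0.01, -0.03],
})
print(imbalance_sign_hit_rate(toy))
print(imbalance_sign_hit_rate(toy, min_abs_return=0.02))
```

A hit rate near 0.5 would suggest the imbalance carries little directional information, while a value that rises as `min_abs_return` increases would support the pattern described in the first conclusion.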

## Next Steps

Based on our findings, here are some potential next steps for further analysis:

1. Develop a predictive model using orderbook features to forecast large price movements
2. Test trading strategies that exploit the patterns we've identified
3. Analyze the time decay of these signals (how long do they remain predictive?)
4. Investigate whether these patterns are consistent across different market conditions
5. Combine orderbook features with other data sources (e.g., trade history) for improved predictions
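
As a starting point for step 3, the time decay of a signal can be sketched as the correlation between the signal and the forward return at several horizons. This is only an illustration: the helper name `signal_decay`, the default horizons, and the use of simple mid-price returns are assumptions rather than part of the analysis above.

```python
import pandas as pd


def signal_decay(data, signal_col="volume_imbalance", price_col="mid_price",
                 horizons=(1, 2, 5, 10)):
    """Correlation between a signal and the forward return at each horizon.

    The forward return at horizon h is price[t + h] / price[t] - 1, measured
    in rows. Rows without a full horizon ahead are dropped by Series.corr.
    """
    result = {}
    for h in horizons:
        forward_return = data[price_col].shift(-h) / data[price_col] - 1
        result[h] = data[signal_col].corr(forward_return)
    return pd.Series(result, name="correlation")


# Toy data where the next one-step return is exactly 1% times the signal
signal = [1, -1, 1, 1, -1, -1, 1, -1]
prices = [100.0]
for s in signal[:-1]:
    prices.append(prices[-1] * (1 + 0.01 * s))
toy_decay = pd.DataFrame({"volume_imbalance": signal, "mid_price": prices})

decay = signal_decay(toy_decay, horizons=(1, 2))
print(decay)
```

On real data, a correlation that shrinks toward zero as the horizon grows would indicate how quickly the orderbook signal loses its predictive value.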