# QuantJourney Bid-Ask Spread Estimator - Usage Examples

## Complete Guide to EDGE Estimator Implementation

This notebook walks through the `quantjourney-bidask` library, which estimates bid-ask spreads from open, high, low, and close (OHLC) prices using the EDGE methodology of Ardia, Guidotti, & Kroencke (2024).

**Author:** Jakub Polec  
**Date:** 2025-06-28  
**Framework:** QuantJourney - Advanced quantitative finance tools and insights


## Table of Contents

1. [Installation and Setup](#installation)
2. [Basic EDGE Estimation](#basic-estimation) 
3. [Rolling Window Analysis](#rolling-windows)
4. [Expanding Window Analysis](#expanding-windows)
5. [Real-Time Data Integration](#realtime-data)
6. [Cryptocurrency Multi-Asset Analysis](#crypto-analysis)
7. [Advanced Visualizations](#visualizations)
8. [Performance Monitoring](#performance)
9. [Risk Management Applications](#risk-management)

## 1. Installation and Setup {#installation}

First, install the quantjourney-bidask package and import required libraries:

In [None]:
# Install the package (run this if not already installed)
# !pip install quantjourney-bidask

# Core imports
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import asyncio
from datetime import datetime, timedelta
import warnings
warnings.filterwarnings('ignore')

# QuantJourney EDGE estimators
from quantjourney_bidask import edge, edge_rolling, edge_expanding

# Data fetching utilities (for examples)
try:
    from data.fetch import DataFetcher, get_stock_data, get_crypto_data
    data_available = True
except ImportError:
    print("Data fetching utilities not available. Using synthetic data for examples.")
    data_available = False

print(f"✅ QuantJourney Bid-Ask Spread Estimator loaded successfully")
print(f"📊 Real data fetching: {'Available' if data_available else 'Using synthetic data'}")

## 2. Basic EDGE Estimation {#basic-estimation}

The core `edge()` function takes open, high, low, and close price arrays and returns a single spread estimate expressed as a fraction of price, so `0.001` corresponds to 10 bps.

In [None]:
# Generate synthetic OHLC data for demonstration
np.random.seed(42)
n_periods = 100
base_price = 100.0
volatility = 0.02

# Generate realistic OHLC data
returns = np.random.normal(0, volatility, n_periods)
prices = base_price * np.exp(np.cumsum(returns))

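# Build OHLC bars by adding uniform noise scaled by `spread_width` around each price.
# This is a rough proxy for microstructure noise, not a bid-ask bounce simulation,
# so the estimate below is not guaranteed to match `spread_width`.
spread_width = 0.001  # 10 bps noise scale, used as the reference spread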
open_prices = prices + np.random.uniform(-spread_width, spread_width, n_periods) * prices
high_prices = prices + np.random.uniform(0, spread_width*2, n_periods) * prices
low_prices = prices - np.random.uniform(0, spread_width*2, n_periods) * prices
close_prices = prices + np.random.uniform(-spread_width, spread_width, n_periods) * prices

# Ensure high >= low consistency
high_prices = np.maximum(high_prices, np.maximum(open_prices, close_prices))
low_prices = np.minimum(low_prices, np.minimum(open_prices, close_prices))

print("=== Basic EDGE Estimation ===")
print(f"Sample size: {n_periods} periods")
print(f"Price range: ${low_prices.min():.2f} - ${high_prices.max():.2f}")

# Calculate single spread estimate
spread_estimate = edge(open_prices, high_prices, low_prices, close_prices)
spread_bps = spread_estimate * 10000  # Convert to basis points

print(f"\n📊 EDGE Spread Estimate: {spread_estimate:.6f} ({spread_bps:.2f} bps)")
print(f"🎯 True spread (simulated): {spread_width:.6f} ({spread_width*10000:.2f} bps)")
print(f"📈 Estimation accuracy: {abs(spread_estimate - spread_width)/spread_width*100:.1f}% difference")
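
The synthetic bars above add uniform noise to a price path. For a sanity check against a known spread, one common alternative is to simulate trades that bounce between bid and ask around an efficient price and then aggregate them into OHLC bars. The following is a minimal sketch under that assumption; `simulate_bounce_ohlc` and its parameters are hypothetical names for illustration and are not part of `quantjourney-bidask`.

```python
import numpy as np

def simulate_bounce_ohlc(n_bars=100, trades_per_bar=60, spread=0.001,
                         vol_per_trade=0.0005, start_price=100.0, seed=0):
    """Simulate OHLC bars from trades that bounce between bid and ask.

    Hypothetical helper: each trade executes at the efficient price
    times (1 + spread/2) or (1 - spread/2) with equal probability.
    """
    rng = np.random.default_rng(seed)
    n_trades = n_bars * trades_per_bar

    # Efficient (mid) price follows a geometric random walk
    log_mid = np.log(start_price) + np.cumsum(rng.normal(0.0, vol_per_trade, n_trades))
    mid = np.exp(log_mid)

    # Buyer- or seller-initiated trade: +1 hits the ask, -1 hits the bid
    side = rng.choice([-1.0, 1.0], size=n_trades)
    trade_price = mid * (1.0 + side * spread / 2.0)

    # Aggregate consecutive trades into bars
    bars = trade_price.reshape(n_bars, trades_per_bar)
    return bars[:, 0], bars.max(axis=1), bars.min(axis=1), bars[:, -1]

sim_open, sim_high, sim_low, sim_close = simulate_bounce_ohlc()
```

The resulting arrays can be passed to `edge()` in the same argument order as in the cell above, and the estimate compared against the `spread` argument.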

## 3. Rolling Window Analysis {#rolling-windows}

The `edge_rolling()` function provides time-varying spread estimates using rolling windows.

In [None]:
# Create DataFrame for rolling analysis
df = pd.DataFrame({
    'open': open_prices,
    'high': high_prices,
    'low': low_prices,
    'close': close_prices
})

# Add timestamps
df.index = pd.date_range(start='2024-01-01', periods=len(df), freq='1h')  # 'h' replaces the 'H' alias deprecated in pandas 2.2

print("=== Rolling Window Analysis ===")

# Calculate rolling spreads with different window sizes
window_sizes = [10, 20, 50]
rolling_results = {}

for window in window_sizes:
    rolling_spreads = edge_rolling(df, window=window)
    rolling_results[window] = rolling_spreads
    
    # Calculate statistics
    mean_spread = rolling_spreads.mean()
    std_spread = rolling_spreads.std()
    
    print(f"Window {window:2d}: Mean={mean_spread:.6f} ({mean_spread*10000:.2f} bps), Std={std_spread:.6f}")

# Visualization
fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(12, 8))

# Plot prices
ax1.plot(df.index, df['close'], label='Close Price', alpha=0.7)
ax1.fill_between(df.index, df['low'], df['high'], alpha=0.3, label='High-Low Range')
ax1.set_title('Price Evolution')
ax1.set_ylabel('Price ($)')
ax1.legend()
ax1.grid(True, alpha=0.3)

# Plot rolling spreads
for window in window_sizes:
    spreads_bps = rolling_results[window] * 10000
    ax2.plot(df.index, spreads_bps, label=f'Window {window}', alpha=0.8)

ax2.axhline(y=spread_width*10000, color='red', linestyle='--', alpha=0.7, label='True Spread')
ax2.set_title('Rolling EDGE Spread Estimates')
ax2.set_ylabel('Spread (bps)')
ax2.set_xlabel('Time')
ax2.legend()
ax2.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print(f"\n📈 Rolling analysis complete for {len(window_sizes)} different window sizes")
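
To make the rolling-window idea concrete, the sketch below applies an arbitrary per-window estimator to an OHLC DataFrame. It does not show how `edge_rolling()` is implemented; `rolling_estimate` is a hypothetical helper, and the stand-in statistic (mean relative high-low range) is chosen only because it is easy to verify by hand.

```python
import numpy as np
import pandas as pd

def rolling_estimate(ohlc, window, estimator):
    """Apply `estimator` to each trailing window of `ohlc` (hypothetical helper).

    The first `window - 1` rows have no full window and are left as NaN.
    """
    out = pd.Series(np.nan, index=ohlc.index, dtype=float)
    for end in range(window, len(ohlc) + 1):
        chunk = ohlc.iloc[end - window:end]
        out.iloc[end - 1] = estimator(chunk)
    return out

def mean_relative_range(chunk):
    """Stand-in statistic, NOT a spread estimator: mean of (high - low) / close."""
    return float(((chunk["high"] - chunk["low"]) / chunk["close"]).mean())

demo = pd.DataFrame({
    "open":  [10.0, 10.1, 10.2, 10.1, 10.3],
    "high":  [10.2, 10.3, 10.4, 10.3, 10.5],
    "low":   [9.9, 10.0, 10.1, 10.0, 10.2],
    "close": [10.1, 10.2, 10.1, 10.3, 10.4],
})
rolled = rolling_estimate(demo, window=3, estimator=mean_relative_range)
```

Shorter windows respond faster to changes but average over fewer bars, so their estimates tend to be noisier.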

## 4. Expanding Window Analysis {#expanding-windows}

The `edge_expanding()` function shows how spread estimates converge as more data becomes available.

In [None]:
print("=== Expanding Window Analysis ===")

# Calculate expanding window spreads
expanding_spreads = edge_expanding(df, min_periods=10)
expanding_spreads_bps = expanding_spreads * 10000

# Calculate convergence statistics
final_estimate = expanding_spreads.iloc[-1]
convergence_periods = len(expanding_spreads.dropna())

print(f"Expanding window periods: {convergence_periods}")
print(f"Final estimate: {final_estimate:.6f} ({final_estimate*10000:.2f} bps)")
print(f"Convergence to true spread: {abs(final_estimate - spread_width)/spread_width*100:.1f}% difference")

# Visualization
fig, ax = plt.subplots(figsize=(12, 6))

# Plot expanding spreads
ax.plot(df.index, expanding_spreads_bps, label='Expanding EDGE Estimate', linewidth=2, color='blue')
ax.axhline(y=spread_width*10000, color='red', linestyle='--', alpha=0.7, label='True Spread', linewidth=2)

# Add a ±2σ variability band from the rolling standard deviation of the estimate
# (a visual guide, not a formal confidence interval)
rolling_std = expanding_spreads.rolling(window=20).std() * 10000
upper_band = expanding_spreads_bps + 2*rolling_std
lower_band = expanding_spreads_bps - 2*rolling_std

ax.fill_between(df.index, lower_band, upper_band, alpha=0.2, color='blue', label='±2σ Rolling Band')

ax.set_title('Expanding Window EDGE Spread Estimates - Convergence Analysis')
ax.set_ylabel('Spread (bps)')
ax.set_xlabel('Time')
ax.legend()
ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# Convergence analysis
initial_periods = 20
if len(expanding_spreads.dropna()) > initial_periods:
    initial_est = expanding_spreads.dropna().iloc[initial_periods]
    improvement = abs(initial_est - spread_width) - abs(final_estimate - spread_width)
    print(f"\n📊 Improvement from period {initial_periods} to final: {improvement/spread_width*100:.1f}%")
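
One way to quantify the convergence shown in the plot is to find the first observation after which the expanding estimate stays within a tolerance of its final value. The sketch below does this for any pandas Series; `first_stable_index` and the `tol` parameter are illustrative names, not library functions, and the sample values are made up.

```python
import numpy as np
import pandas as pd

def first_stable_index(estimates, tol=0.10):
    """Return the first index label after which every estimate stays within
    `tol` (relative) of the final estimate. Hypothetical helper."""
    clean = estimates.dropna()
    if clean.empty:
        return None
    final = clean.iloc[-1]
    if final == 0:
        return None
    outside = ((clean - final).abs() / abs(final)) > tol
    if not outside.any():
        return clean.index[0]
    # Position of the last out-of-tolerance value; stability starts right after it
    last_bad = np.flatnonzero(outside.to_numpy())[-1]
    return clean.index[last_bad + 1]

# Made-up expanding estimates that settle towards 0.0010
demo_est = pd.Series([np.nan, 0.0020, 0.0014, 0.00108, 0.00105, 0.0010])
stable_from = first_stable_index(demo_est, tol=0.10)
```

Applied to `expanding_spreads`, the returned label would be a timestamp from `df.index`.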

## 5. Real-Time Data Integration {#realtime-data}

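This section applies the estimators to real stock data when the optional `data.fetch` helpers are available, and falls back to illustrative values otherwise. The cryptocurrency example uses simulated prices in both cases.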

In [None]:
print("=== Real-Time Data Integration ===")

if data_available:
    # Stock data example
    print("\n📈 Stock Data Analysis (Apple Inc.)")
    try:
        stock_df = get_stock_data("AAPL", period="1mo", interval="1d")
        if not stock_df.empty:
            stock_spread = edge(stock_df['open'], stock_df['high'], stock_df['low'], stock_df['close'])
            stock_rolling = edge_rolling(stock_df, window=10)
            
            print(f"AAPL overall spread: {stock_spread:.6f} ({stock_spread*10000:.2f} bps)")
            print(f"AAPL rolling avg: {stock_rolling.mean():.6f} ({stock_rolling.mean()*10000:.2f} bps)")
            print(f"Data period: {stock_df.index[0].date()} to {stock_df.index[-1].date()}")
        else:
            print("❌ No stock data available")
    except Exception as e:
        print(f"❌ Stock data error: {e}")
    
    # Cryptocurrency data example
    print("\n₿ Cryptocurrency Analysis (BTC/USDT)")
    try:
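        # Note: the fetcher API is async, so this cell does not call it.
        # The fetcher is only instantiated to confirm that it can be constructed.
        fetcher = DataFetcher()
        print("🔄 Live BTC fetch skipped (async API); simulating hourly data instead...")
        
        # For demonstration, simulate one week of hourly BTC prices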
        btc_base = 45000
        crypto_returns = np.random.normal(0, 0.03, 168)  # 1 week of hourly data
        btc_prices = btc_base * np.exp(np.cumsum(crypto_returns))
        
        # Crypto typically has tighter spreads
        crypto_spread = 0.0002  # 2 bps
        btc_open = btc_prices + np.random.uniform(-crypto_spread, crypto_spread, 168) * btc_prices
        btc_high = btc_prices + np.random.uniform(0, crypto_spread*2, 168) * btc_prices
        btc_low = btc_prices - np.random.uniform(0, crypto_spread*2, 168) * btc_prices
        btc_close = btc_prices + np.random.uniform(-crypto_spread, crypto_spread, 168) * btc_prices
        
        # Ensure OHLC consistency
        btc_high = np.maximum(btc_high, np.maximum(btc_open, btc_close))
        btc_low = np.minimum(btc_low, np.minimum(btc_open, btc_close))
        
        btc_spread = edge(btc_open, btc_high, btc_low, btc_close)
        print(f"BTC/USDT spread estimate: {btc_spread:.6f} ({btc_spread*10000:.2f} bps)")
        print(f"🎯 Expected crypto spread: ~{crypto_spread*10000:.1f} bps")
        
    except Exception as e:
        print(f"❌ Crypto data error: {e}")

else:
    print("\n⚠️  Real data fetching not available. Using synthetic examples.")
    
    # Create synthetic "real-world" examples
    print("\n📊 Synthetic Real-World Examples:")
    
    # Large cap stock (tight spreads)
    large_cap_spread = 0.0005  # 5 bps
    print(f"Large Cap Stock (simulated): ~{large_cap_spread*10000:.1f} bps")
    
    # Small cap stock (wider spreads) 
    small_cap_spread = 0.0025  # 25 bps
    print(f"Small Cap Stock (simulated): ~{small_cap_spread*10000:.1f} bps")
    
    # Major cryptocurrency
    crypto_spread = 0.0002  # 2 bps
    print(f"Major Cryptocurrency (simulated): ~{crypto_spread*10000:.1f} bps")

print("\n✅ Real-time data integration examples complete")
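
Real market data often arrives with capitalised column names, missing rows, or inconsistent bars. The following is a minimal pre-processing sketch, assuming the estimators expect lowercase `open`, `high`, `low`, and `close` columns as used throughout this notebook; `prepare_ohlc` is a hypothetical helper and the sample frame is made up.

```python
import pandas as pd

REQUIRED = ["open", "high", "low", "close"]

def prepare_ohlc(raw):
    """Lowercase column names, keep OHLC columns, and drop unusable rows.
    Hypothetical helper, not part of quantjourney-bidask."""
    df = raw.rename(columns=lambda c: str(c).strip().lower())
    missing = [c for c in REQUIRED if c not in df.columns]
    if missing:
        raise ValueError(f"Missing OHLC columns: {missing}")
    df = df[REQUIRED].astype(float).dropna()
    # Keep only bars where high/low bracket open and close and prices are positive
    ok = (
        (df["high"] >= df[["open", "close"]].max(axis=1))
        & (df["low"] <= df[["open", "close"]].min(axis=1))
        & (df["low"] > 0)
    )
    return df[ok]

raw = pd.DataFrame({
    "Open":   [100.0, 101.0, None, 103.0],
    "High":   [101.0, 102.0, 103.0, 102.0],   # last bar: high below open
    "Low":    [99.0, 100.0, 101.0, 101.0],
    "Close":  [100.5, 101.5, 102.0, 102.5],
    "Volume": [10, 20, 30, 40],
})
clean = prepare_ohlc(raw)
```

Here the row with a missing open and the row whose high lies below its open are both dropped, leaving two usable bars.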

## 6. Cryptocurrency Multi-Asset Analysis {#crypto-analysis}

Compare spreads across multiple cryptocurrency pairs to identify liquidity patterns.

In [None]:
print("=== Cryptocurrency Multi-Asset Analysis ===")

# Simulate multiple cryptocurrency pairs with realistic characteristics
crypto_pairs = {
    'BTC/USDT': {'base_price': 45000, 'volatility': 0.03, 'spread': 0.0002},
    'ETH/USDT': {'base_price': 2800, 'volatility': 0.04, 'spread': 0.0003},
    'BNB/USDT': {'base_price': 320, 'volatility': 0.05, 'spread': 0.0005},
    'ADA/USDT': {'base_price': 0.45, 'volatility': 0.06, 'spread': 0.0008},
    'SOL/USDT': {'base_price': 95, 'volatility': 0.07, 'spread': 0.0010}
}

results = {}
periods = 100

for pair, params in crypto_pairs.items():
    # Generate price series
    returns = np.random.normal(0, params['volatility'], periods)
    prices = params['base_price'] * np.exp(np.cumsum(returns))
    
    # Generate OHLC with realistic spreads
    spread_factor = params['spread']
    open_p = prices + np.random.uniform(-spread_factor, spread_factor, periods) * prices
    high_p = prices + np.random.uniform(0, spread_factor*2, periods) * prices
    low_p = prices - np.random.uniform(0, spread_factor*2, periods) * prices
    close_p = prices + np.random.uniform(-spread_factor, spread_factor, periods) * prices
    
    # Ensure OHLC consistency
    high_p = np.maximum(high_p, np.maximum(open_p, close_p))
    low_p = np.minimum(low_p, np.minimum(open_p, close_p))
    
    # Calculate spreads
    edge_spread = edge(open_p, high_p, low_p, close_p)
    
    # Create DataFrame for rolling analysis
    crypto_df = pd.DataFrame({
        'open': open_p,
        'high': high_p,
        'low': low_p,
        'close': close_p
    })
    
    rolling_spreads = edge_rolling(crypto_df, window=20)
    
    results[pair] = {
        'edge_spread': edge_spread,
        'true_spread': params['spread'],
        'rolling_mean': rolling_spreads.mean(),
        'rolling_std': rolling_spreads.std(),
        'final_price': prices[-1],
        'volatility': params['volatility']
    }

# Display results
print("\n📊 Cryptocurrency Spread Analysis Results:")
print(f"{'Pair':<12} {'EDGE (bps)':<12} {'True (bps)':<12} {'Roll Avg':<12} {'Roll Std':<12} {'Error':<10}")
print("-" * 75)

for pair, data in results.items():
    edge_bps = data['edge_spread'] * 10000
    true_bps = data['true_spread'] * 10000
    roll_avg_bps = data['rolling_mean'] * 10000
    roll_std_bps = data['rolling_std'] * 10000
    # Absolute relative error versus the simulated spread (lower is better)
    error_pct = abs(data['edge_spread'] - data['true_spread']) / data['true_spread'] * 100
    
    print(f"{pair:<12} {edge_bps:<12.2f} {true_bps:<12.2f} {roll_avg_bps:<12.2f} {roll_std_bps:<12.2f} {error_pct:<10.1f}%")

# Visualization
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 6))

# Spread comparison
pairs = list(results.keys())
edge_spreads = [results[p]['edge_spread'] * 10000 for p in pairs]
true_spreads = [results[p]['true_spread'] * 10000 for p in pairs]

x = np.arange(len(pairs))
width = 0.35

ax1.bar(x - width/2, edge_spreads, width, label='EDGE Estimate', alpha=0.8)
ax1.bar(x + width/2, true_spreads, width, label='True Spread', alpha=0.8)
ax1.set_xlabel('Cryptocurrency Pairs')
ax1.set_ylabel('Spread (bps)')
ax1.set_title('EDGE vs True Spreads by Crypto Pair')
ax1.set_xticks(x)
ax1.set_xticklabels([p.replace('/USDT', '') for p in pairs], rotation=45)
ax1.legend()
ax1.grid(True, alpha=0.3)

# Volatility vs Spread relationship
volatilities = [results[p]['volatility'] * 100 for p in pairs]
ax2.scatter(volatilities, edge_spreads, s=100, alpha=0.7, c='blue')
for i, pair in enumerate(pairs):
    ax2.annotate(pair.replace('/USDT', ''), (volatilities[i], edge_spreads[i]), 
                xytext=(5, 5), textcoords='offset points', fontsize=9)

ax2.set_xlabel('Volatility (%)')
ax2.set_ylabel('EDGE Spread (bps)')
ax2.set_title('Volatility vs Spread Relationship')
ax2.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print(f"\n✅ Analyzed {len(crypto_pairs)} cryptocurrency pairs")
print(f"📈 Spread range: {min(edge_spreads):.2f} - {max(edge_spreads):.2f} bps")
print(f"📊 Average absolute error: {sum(abs(results[p]['edge_spread'] - results[p]['true_spread'])/results[p]['true_spread'] for p in pairs)/len(pairs)*100:.1f}%")
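
The per-pair dictionary built above can also be collected into a DataFrame, which makes error metrics easier to compute and sort. The sketch below uses made-up numbers and pair names; they are placeholders, not output from the estimator.

```python
import pandas as pd

# Placeholder values for illustration only; in the notebook these would
# come from the `results` dictionary built in the cell above.
example_results = {
    "AAA/USDT": {"edge_spread": 0.00025, "true_spread": 0.0002},
    "BBB/USDT": {"edge_spread": 0.00027, "true_spread": 0.0003},
}

summary = pd.DataFrame.from_dict(example_results, orient="index")
summary["edge_bps"] = summary["edge_spread"] * 1e4
summary["true_bps"] = summary["true_spread"] * 1e4
# Absolute relative error in percent: lower is better
summary["abs_error_pct"] = (
    (summary["edge_spread"] - summary["true_spread"]).abs()
    / summary["true_spread"] * 100
)
summary = summary.sort_values("abs_error_pct")
```

Sorting by `abs_error_pct` puts the pairs with the closest estimates first.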

## 7. Advanced Visualizations {#visualizations}

Create comprehensive visualizations for spread analysis and monitoring.

In [None]:
print("=== Advanced Visualizations ===")

# Based on examples/animated_spread_monitor.py and examples/crypto_spread_comparison.py
# Create comprehensive dataset for visualization
np.random.seed(123)
viz_periods = 200
viz_base_price = 100.0

# Simulate time-varying volatility (realistic market conditions)
time_varying_vol = 0.01 + 0.02 * np.sin(np.linspace(0, 4*np.pi, viz_periods)) + \
                   0.005 * np.random.randn(viz_periods)
time_varying_vol = np.abs(time_varying_vol)  # Ensure positive volatility

# Generate price series with time-varying volatility
viz_returns = np.random.normal(0, time_varying_vol)
viz_prices = viz_base_price * np.exp(np.cumsum(viz_returns))

# Time-varying spreads (spreads widen during volatile periods)
base_spread = 0.001
spread_multiplier = 1 + 2 * time_varying_vol / np.mean(time_varying_vol)
time_varying_spread = base_spread * spread_multiplier

# Generate OHLC with time-varying spreads
viz_open = viz_prices + np.random.uniform(-1, 1, viz_periods) * time_varying_spread * viz_prices
viz_high = viz_prices + np.random.uniform(0, 2, viz_periods) * time_varying_spread * viz_prices
viz_low = viz_prices - np.random.uniform(0, 2, viz_periods) * time_varying_spread * viz_prices
viz_close = viz_prices + np.random.uniform(-1, 1, viz_periods) * time_varying_spread * viz_prices

# Ensure OHLC consistency
viz_high = np.maximum(viz_high, np.maximum(viz_open, viz_close))
viz_low = np.minimum(viz_low, np.minimum(viz_open, viz_close))

# Create DataFrame
viz_df = pd.DataFrame({
    'open': viz_open,
    'high': viz_high,
    'low': viz_low,
    'close': viz_close
})
viz_df.index = pd.date_range(start='2024-01-01', periods=len(viz_df), freq='1h')  # 'h' replaces the 'H' alias deprecated in pandas 2.2

# Calculate various spread estimates
viz_rolling_10 = edge_rolling(viz_df, window=10)
viz_rolling_30 = edge_rolling(viz_df, window=30)
viz_expanding = edge_expanding(viz_df, min_periods=10)

# Create comprehensive visualization dashboard
fig = plt.figure(figsize=(16, 12))
gs = fig.add_gridspec(4, 2, height_ratios=[2, 1, 1, 1], hspace=0.3, wspace=0.3)

# 1. Price with per-bar high-low range
ax1 = fig.add_subplot(gs[0, :])
ax1.plot(viz_df.index, viz_df['close'], label='Close Price', color='black', linewidth=1.5)
ax1.fill_between(viz_df.index, viz_df['low'], viz_df['high'], alpha=0.3, color='gray', label='High-Low Range')
ax1.set_title('Price Evolution with Intraday Ranges', fontsize=14, fontweight='bold')
ax1.set_ylabel('Price ($)')
ax1.legend()
ax1.grid(True, alpha=0.3)

# 2. Spread estimates comparison
ax2 = fig.add_subplot(gs[1, :])
ax2.plot(viz_df.index, viz_rolling_10*10000, label='10-period Rolling', alpha=0.8, linewidth=2)
ax2.plot(viz_df.index, viz_rolling_30*10000, label='30-period Rolling', alpha=0.8, linewidth=2)
ax2.plot(viz_df.index, viz_expanding*10000, label='Expanding Window', alpha=0.7, linewidth=1)
ax2.plot(viz_df.index, time_varying_spread*10000, label='True Spread', color='red', linestyle='--', alpha=0.7)
ax2.set_title('Multi-Window Spread Estimates Comparison')
ax2.set_ylabel('Spread (bps)')
ax2.legend()
ax2.grid(True, alpha=0.3)

# 3. Volatility analysis
ax3 = fig.add_subplot(gs[2, 0])
rolling_vol = viz_df['close'].pct_change().rolling(window=24).std() * np.sqrt(24) * 100
ax3.plot(viz_df.index, rolling_vol, color='orange', linewidth=2)
ax3.set_title('Rolling 24H Volatility (%)')
ax3.set_ylabel('Volatility (%)')
ax3.grid(True, alpha=0.3)

# 4. Spread distribution
ax4 = fig.add_subplot(gs[2, 1])
spread_data = viz_rolling_30.dropna() * 10000
ax4.hist(spread_data, bins=30, alpha=0.7, color='skyblue', edgecolor='black')
ax4.axvline(spread_data.mean(), color='red', linestyle='--', linewidth=2, label=f'Mean: {spread_data.mean():.2f} bps')
ax4.set_title('Spread Distribution (30-period)')
ax4.set_xlabel('Spread (bps)')
ax4.set_ylabel('Frequency')
ax4.legend()
ax4.grid(True, alpha=0.3)

# 5. Spread vs Volatility scatter
ax5 = fig.add_subplot(gs[3, 0])
vol_aligned = rolling_vol.reindex(viz_rolling_30.index, method='nearest')
spread_aligned = viz_rolling_30 * 10000
valid_mask = ~(vol_aligned.isna() | spread_aligned.isna())
ax5.scatter(vol_aligned[valid_mask], spread_aligned[valid_mask], alpha=0.6, c='purple')
ax5.set_xlabel('Volatility (%)')
ax5.set_ylabel('Spread (bps)')
ax5.set_title('Volatility vs Spread Relationship')
ax5.grid(True, alpha=0.3)

# 6. Estimation accuracy over time
ax6 = fig.add_subplot(gs[3, 1])
true_spread_aligned = pd.Series(time_varying_spread*10000, index=viz_df.index)
accuracy = (1 - abs(viz_rolling_30*10000 - true_spread_aligned.reindex(viz_rolling_30.index, method='nearest')) / 
           true_spread_aligned.reindex(viz_rolling_30.index, method='nearest')) * 100
ax6.plot(viz_rolling_30.index, accuracy, color='green', linewidth=2)
ax6.axhline(y=90, color='red', linestyle='--', alpha=0.7, label='90% Accuracy')
ax6.set_ylabel('Accuracy (%)')
ax6.set_xlabel('Time')
ax6.set_title('Estimation Accuracy Over Time')
ax6.legend()
ax6.grid(True, alpha=0.3)

plt.suptitle('QuantJourney EDGE Estimator - Comprehensive Analysis Dashboard', 
             fontsize=16, fontweight='bold', y=0.98)
plt.tight_layout()
plt.show()

# Summary statistics
print(f"\n📊 Visualization Analysis Summary:")
print(f"Data period: {viz_df.index[0].date()} to {viz_df.index[-1].date()}")
print(f"Total periods: {len(viz_df)}")
print(f"Price range: ${viz_df['low'].min():.2f} - ${viz_df['high'].max():.2f}")
print(f"Average spread (30-period): {spread_data.mean():.2f} ± {spread_data.std():.2f} bps")
print(f"Volatility range: {rolling_vol.min():.2f}% - {rolling_vol.max():.2f}%")
print(f"Average accuracy: {accuracy.mean():.1f}%")

print("\n✅ Advanced visualization dashboard complete")
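
The volatility panel scales the hourly standard deviation by `np.sqrt(24)` to express it per 24-hour period. This square-root-of-time rule assumes returns are uncorrelated across hours. A small numerical check of that scaling on simulated i.i.d. returns (all values here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
hourly_sigma = 0.01

# 2,000 simulated days of 24 i.i.d. hourly log returns each
hourly = rng.normal(0.0, hourly_sigma, size=(2000, 24))
daily = hourly.sum(axis=1)  # log returns add across hours

# Hourly volatility scaled to a 24-hour horizon vs. directly measured daily volatility
scaled_from_hourly = hourly.std(ddof=1) * np.sqrt(24)
measured_daily = daily.std(ddof=1)
```

The two figures should agree closely for i.i.d. returns; with autocorrelated returns the scaled value can drift away from the measured one.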

## 8. Performance Monitoring {#performance}

Analyze the computational performance and efficiency of different EDGE estimator variants.

In [None]:
import time
print("=== Performance Monitoring ===")

# Performance testing with different data sizes
test_sizes = [100, 500, 1000, 5000, 10000]
performance_results = {}

for size in test_sizes:
    print(f"\n🔄 Testing with {size:,} observations...")
    
    # Generate test data
    np.random.seed(42)
    test_returns = np.random.normal(0, 0.02, size)
    test_prices = 100 * np.exp(np.cumsum(test_returns))
    
    spread_width = 0.001
    test_open = test_prices + np.random.uniform(-spread_width, spread_width, size) * test_prices
    test_high = test_prices + np.random.uniform(0, spread_width*2, size) * test_prices
    test_low = test_prices - np.random.uniform(0, spread_width*2, size) * test_prices
    test_close = test_prices + np.random.uniform(-spread_width, spread_width, size) * test_prices
    
    # Ensure OHLC consistency
    test_high = np.maximum(test_high, np.maximum(test_open, test_close))
    test_low = np.minimum(test_low, np.minimum(test_open, test_close))
    
    test_df = pd.DataFrame({
        'open': test_open,
        'high': test_high,
        'low': test_low,
        'close': test_close
    })
    
    # Benchmark different functions
    benchmarks = {}
    
    # 1. Single EDGE estimate
    # perf_counter() is a monotonic, high-resolution clock suited to benchmarking.
    # If the library JIT-compiles on first use, the first call includes that one-off cost.
    start_time = time.perf_counter()
    single_result = edge(test_open, test_high, test_low, test_close)
    benchmarks['edge'] = time.perf_counter() - start_time
    
    # 2. Rolling window (if data size allows)
    if size >= 50:
        window_size = min(50, size // 4)
        start_time = time.perf_counter()
        rolling_result = edge_rolling(test_df, window=window_size)
        benchmarks['edge_rolling'] = time.perf_counter() - start_time
    
    # 3. Expanding window
    if size >= 20:
        start_time = time.perf_counter()
        expanding_result = edge_expanding(test_df, min_periods=10)
        benchmarks['edge_expanding'] = time.perf_counter() - start_time
    
    performance_results[size] = benchmarks
    
    # Display results
    for func_name, exec_time in benchmarks.items():
        throughput = size / exec_time if exec_time > 0 else float('inf')
        print(f"  {func_name:<15}: {exec_time*1000:8.2f} ms ({throughput:10,.0f} obs/sec)")

# Performance visualization
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 6))

# Execution time scaling
for func_name in ['edge', 'edge_rolling', 'edge_expanding']:
    sizes = []
    times = []
    
    for size, bench_times in performance_results.items():  # avoid shadowing `results` from section 6
        if func_name in bench_times:
            sizes.append(size)
            times.append(bench_times[func_name] * 1000)  # Convert to ms
    
    if sizes:
        ax1.loglog(sizes, times, 'o-', label=func_name, linewidth=2, markersize=6)

ax1.set_xlabel('Number of Observations')
ax1.set_ylabel('Execution Time (ms)')
ax1.set_title('Performance Scaling Analysis')
ax1.legend()
ax1.grid(True, alpha=0.3)

# Throughput comparison
func_names = ['edge', 'edge_rolling', 'edge_expanding']
throughputs = []
labels = []

# Use largest successful test size for throughput comparison
largest_size = max(performance_results.keys())
for func_name in func_names:
    if func_name in performance_results[largest_size]:
        exec_time = performance_results[largest_size][func_name]
        throughput = largest_size / exec_time if exec_time > 0 else 0
        throughputs.append(throughput)
        labels.append(func_name)

bars = ax2.bar(labels, throughputs, color=['blue', 'orange', 'green'], alpha=0.7)
ax2.set_ylabel('Throughput (obs/sec)')
ax2.set_title(f'Throughput Comparison ({largest_size:,} observations)')
ax2.grid(True, alpha=0.3)

# Add value labels on bars
for bar, value in zip(bars, throughputs):
    height = bar.get_height()
    ax2.text(bar.get_x() + bar.get_width()/2., height + height*0.01,
             f'{value:,.0f}', ha='center', va='bottom')

plt.tight_layout()
plt.show()

# Performance summary
print(f"\n📈 Performance Summary:")
print(f"Largest test size: {largest_size:,} observations")
print(f"Best throughput: {max(throughputs):,.0f} obs/sec ({labels[throughputs.index(max(throughputs))]})")

# Memory efficiency note
print(f"\n💾 Memory Efficiency Notes:")
print(f"- EDGE estimator uses vectorized NumPy operations")
print(f"- Numba JIT compilation provides near-C performance")
print(f"- Memory usage scales linearly with input size")
print(f"- Rolling/expanding windows reuse calculations efficiently")

print("\n✅ Performance monitoring complete")
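
Single-shot timings are noisy, and a JIT-compiled function pays a one-off compilation cost on its first call. A common remedy is to run one untimed warm-up call and report the best of several repeats. The following is a minimal sketch using only the standard library; `best_of` is a hypothetical helper and `sum` stands in for the function under test.

```python
import time

def best_of(func, *args, repeats=5, warmup=1, **kwargs):
    """Return the minimum wall-clock time (seconds) over `repeats` timed calls,
    after `warmup` untimed calls. Hypothetical helper for illustration."""
    for _ in range(warmup):
        func(*args, **kwargs)
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        func(*args, **kwargs)
        timings.append(time.perf_counter() - start)
    return min(timings)

# Stand-in workload; in the notebook this could be edge(...) or edge_rolling(...)
elapsed = best_of(sum, range(100_000), repeats=3)
```

Taking the minimum rather than the mean reduces the influence of unrelated system activity on the reported figure.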

## 9. Risk Management Applications {#risk-management}

Demonstrate practical risk management applications using spread estimates for liquidity monitoring and alert systems.

In [None]:
print("=== Risk Management Applications ===")

# Based on examples/liquidity_risk_monitor.py and examples/threshold_alert_monitor.py
# Create a realistic market scenario with stress periods
np.random.seed(456)
risk_periods = 500

# Simulate market stress events
normal_vol = 0.015
stress_vol = 0.045
stress_periods = [100, 150, 200, 250, 300, 350]  # Start indices of stress events

volatility_regime = np.full(risk_periods, normal_vol)
for stress_start in stress_periods:
    stress_end = min(stress_start + 20, risk_periods)  # 20-period stress
    volatility_regime[stress_start:stress_end] = stress_vol

# Generate price series
risk_returns = np.random.normal(0, volatility_regime)
risk_prices = 100 * np.exp(np.cumsum(risk_returns))

# Spreads widen during stress (realistic market behavior)
base_spread = 0.0008  # 8 bps normal spread
stress_multiplier = 1 + 3 * (volatility_regime - normal_vol) / normal_vol
dynamic_spreads = base_spread * stress_multiplier

# Generate OHLC data
risk_open = risk_prices + np.random.uniform(-1, 1, risk_periods) * dynamic_spreads * risk_prices
risk_high = risk_prices + np.random.uniform(0, 2, risk_periods) * dynamic_spreads * risk_prices
risk_low = risk_prices - np.random.uniform(0, 2, risk_periods) * dynamic_spreads * risk_prices
risk_close = risk_prices + np.random.uniform(-1, 1, risk_periods) * dynamic_spreads * risk_prices

# Ensure OHLC consistency
risk_high = np.maximum(risk_high, np.maximum(risk_open, risk_close))
risk_low = np.minimum(risk_low, np.minimum(risk_open, risk_close))

# Create DataFrame with timestamps
risk_df = pd.DataFrame({
    'open': risk_open,
    'high': risk_high,
    'low': risk_low,
    'close': risk_close
})
risk_df.index = pd.date_range(start='2024-01-01', periods=len(risk_df), freq='1h')  # 'h' replaces the 'H' alias deprecated in pandas 2.2

# Calculate spread estimates for risk monitoring
risk_rolling_20 = edge_rolling(risk_df, window=20)
risk_rolling_50 = edge_rolling(risk_df, window=50)

# Define risk thresholds
normal_threshold = np.percentile(risk_rolling_20.dropna() * 10000, 75)  # 75th percentile
warning_threshold = np.percentile(risk_rolling_20.dropna() * 10000, 90)  # 90th percentile  
critical_threshold = np.percentile(risk_rolling_20.dropna() * 10000, 95)  # 95th percentile

print(f"\n🎯 Risk Thresholds (in-sample percentiles of the 20-period rolling spread):")
print(f"Normal: <= {normal_threshold:.2f} bps")
print(f"Warning: {normal_threshold:.2f} - {warning_threshold:.2f} bps")
print(f"High: {warning_threshold:.2f} - {critical_threshold:.2f} bps")
print(f"Critical: > {critical_threshold:.2f} bps")

# Risk alert system
current_spreads = risk_rolling_20 * 10000
risk_alerts = pd.DataFrame({
    'spread_bps': current_spreads,
    'risk_level': 'Normal'
}, index=current_spreads.index)

# Classify risk levels
risk_alerts.loc[current_spreads > normal_threshold, 'risk_level'] = 'Warning'
risk_alerts.loc[current_spreads > warning_threshold, 'risk_level'] = 'High'
risk_alerts.loc[current_spreads > critical_threshold, 'risk_level'] = 'Critical'

# Count alerts by type
alert_counts = risk_alerts['risk_level'].value_counts()
print(f"\n🚨 Alert Summary:")
for level, count in alert_counts.items():
    percentage = count / len(risk_alerts) * 100
    print(f"{level}: {count} periods ({percentage:.1f}%)")

# Identify stress periods
critical_periods = risk_alerts[risk_alerts['risk_level'] == 'Critical']
if not critical_periods.empty:
    print(f"\n⚠️  Critical stress periods detected:")
    for idx, row in critical_periods.head(5).iterrows():
        print(f"  {idx}: {row['spread_bps']:.2f} bps")

# Risk visualization dashboard
fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2, figsize=(16, 10))

# 1. Price and spread evolution
ax1_twin = ax1.twinx()
ax1.plot(risk_df.index, risk_df['close'], color='black', linewidth=1, label='Price')
ax1_twin.plot(risk_df.index, current_spreads, color='red', alpha=0.7, linewidth=1.5, label='Spread (20-period)')
ax1_twin.axhline(y=critical_threshold, color='red', linestyle='--', alpha=0.8, label='Critical Threshold')
ax1_twin.axhline(y=warning_threshold, color='orange', linestyle='--', alpha=0.8, label='Warning Threshold')
ax1.set_ylabel('Price ($)', color='black')
ax1_twin.set_ylabel('Spread (bps)', color='red')
ax1.set_title('Price and Spread Evolution with Risk Thresholds')
ax1.legend(loc='upper left')
ax1_twin.legend(loc='upper right')
ax1.grid(True, alpha=0.3)

# 2. Risk level heatmap
risk_colors = {'Normal': 'green', 'Warning': 'yellow', 'High': 'orange', 'Critical': 'red'}
risk_numeric = risk_alerts['risk_level'].map({'Normal': 0, 'Warning': 1, 'High': 2, 'Critical': 3})

# Create heatmap-style visualization
for idx, level in risk_alerts['risk_level'].items():
    color = risk_colors[level]
    ax2.bar(idx, 1, width=pd.Timedelta(hours=1), color=color, alpha=0.7, edgecolor='none')

ax2.set_title('Risk Level Timeline')
ax2.set_ylabel('Risk Alert')
ax2.set_ylim(0, 1)

# Create legend
from matplotlib.patches import Patch
legend_elements = [Patch(facecolor=color, label=level) for level, color in risk_colors.items()]
ax2.legend(handles=legend_elements, loc='upper right')

# 3. Spread distribution by risk level
for level, color in risk_colors.items():
    level_data = risk_alerts[risk_alerts['risk_level'] == level]['spread_bps'].dropna()
    if not level_data.empty:
        ax3.hist(level_data, bins=20, alpha=0.6, label=level, color=color, density=True)

ax3.set_xlabel('Spread (bps)')
ax3.set_ylabel('Density')
ax3.set_title('Spread Distribution by Risk Level')
ax3.legend()
ax3.grid(True, alpha=0.3)

# 4. Risk metrics over time
rolling_mean = current_spreads.rolling(window=50).mean()
rolling_std = current_spreads.rolling(window=50).std()
upper_band = rolling_mean + 2*rolling_std
lower_band = rolling_mean - 2*rolling_std

ax4.plot(risk_df.index, current_spreads, alpha=0.6, color='blue', linewidth=1, label='Current Spread')
ax4.plot(risk_df.index, rolling_mean, color='black', linewidth=2, label='50-period Average')
ax4.fill_between(risk_df.index, lower_band, upper_band, alpha=0.3, color='gray', label='±2σ Band')
ax4.set_ylabel('Spread (bps)')
ax4.set_xlabel('Time')
ax4.set_title('Statistical Risk Monitoring')
ax4.legend()
ax4.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# Risk metrics calculation
print(f"\n📊 Risk Metrics Summary:")
print(f"Average spread: {current_spreads.mean():.2f} bps")
print(f"Spread volatility: {current_spreads.std():.2f} bps")
print(f"Maximum spread: {current_spreads.max():.2f} bps")
print(f"99th percentile: {np.percentile(current_spreads.dropna(), 99):.2f} bps")

# Liquidity stress testing
stress_impact = (current_spreads.max() - current_spreads.mean()) / current_spreads.mean() * 100
print(f"\n🔥 Stress Test Results:")
print(f"Maximum stress impact: {stress_impact:.1f}% increase in spreads")
print(f"Critical periods flagged: {len(critical_periods)}")
print(f"Average critical periods per simulated stress event: {len(critical_periods) / len(stress_periods):.1f}")

# Risk recommendations
print(f"\n💡 Risk Management Recommendations:")
if stress_impact > 200:
    print(f"⚠️  HIGH RISK: Spreads show extreme stress sensitivity (+{stress_impact:.0f}%)")
    print(f"   - Implement dynamic position sizing based on spread levels")
    print(f"   - Consider spread-adjusted VaR calculations")
elif stress_impact > 100:
    print(f"⚡ MODERATE RISK: Spreads widen significantly during stress (+{stress_impact:.0f}%)")
    print(f"   - Monitor rolling spread metrics for early warning")
    print(f"   - Adjust execution strategies during high-spread periods")
else:
    print(f"✅ LOW RISK: Spreads remain relatively stable (+{stress_impact:.0f}%)")
    print(f"   - Maintain current risk monitoring framework")

print("\n✅ Risk management analysis complete")
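
The tiered classification and the notion of a stress "event" can be expressed compactly with pandas. The sketch below bins spreads with `pd.cut` and then groups consecutive Critical periods into runs, so that events are counted rather than individual periods. The thresholds and spread values are placeholders, and `count_runs` is a hypothetical helper.

```python
import numpy as np
import pandas as pd

spreads_bps = pd.Series([5.0, 6.0, 12.0, 13.0, 7.0, 9.5, 14.0, 5.5])

# Placeholder thresholds (bps); the notebook derives them from percentiles
normal_t, warning_t, critical_t = 8.0, 10.0, 11.0
levels = pd.cut(
    spreads_bps,
    bins=[-np.inf, normal_t, warning_t, critical_t, np.inf],
    labels=["Normal", "Warning", "High", "Critical"],
)

def count_runs(mask):
    """Count runs of consecutive True values in a boolean Series."""
    starts = mask & ~mask.shift(1, fill_value=False)
    return int(starts.sum())

is_critical = levels == "Critical"
n_critical_periods = int(is_critical.sum())
n_critical_events = count_runs(is_critical)
```

Unlike the loop-based classification above, `pd.cut` leaves NaN spreads unclassified instead of labelling them Normal, which keeps the warm-up rows of a rolling estimate out of the alert counts.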

## Conclusion

This guide walked through the main features of the QuantJourney Bid-Ask Spread Estimator library, using synthetic data throughout and real stock data where the optional fetchers are available. Key takeaways:

### 🎯 Core Functionality
- **EDGE Estimator**: Accurate spread estimation from OHLC data using academic methodology
- **Rolling Windows**: Time-varying spread analysis for dynamic market conditions
- **Expanding Windows**: Convergence analysis and estimate refinement over time

### 📊 Real-World Applications
- **Multi-Asset Analysis**: Compare spreads across stocks and cryptocurrencies
- **Performance Monitoring**: High-throughput analysis with Numba optimization
- **Risk Management**: Liquidity monitoring and stress testing capabilities

### 🚀 Advanced Features
- **Real-Time Integration**: WebSocket support for live market data
- **Comprehensive Visualizations**: Professional charts and dashboards
- **Risk Alert Systems**: Automated threshold monitoring and notifications

### 📈 Performance Benefits
- **Numba JIT Compilation**: Near-C performance for core calculations
- **Vectorized Operations**: Efficient handling of large datasets
- **Memory Optimization**: Linear scaling with dataset size

For more information, visit:
- **Documentation**: [GitHub Repository](https://github.com/QuantJourneyOrg/quantjourney-bidask)
- **PyPI Package**: [quantjourney-bidask](https://pypi.org/project/quantjourney-bidask/)
- **QuantJourney**: [Advanced Quantitative Finance Framework](https://quantjourney.substack.com/)

---
*Created with QuantJourney framework - Advanced quantitative finance tools and insights*