# Notebook 04: Real-World Finance Applications

## Putting It All Together

In this notebook, we'll build complete, practical finance applications:
1. **Parallel Backtesting Engine** - Test trading strategies across parameter grids
2. **Bootstrap Confidence Intervals** - Statistical inference for Sharpe ratios
3. **Rolling Correlation Analysis** - Multi-asset correlation computation

---

In [None]:
import numpy as np
import pandas as pd
import time
import os
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
import matplotlib.pyplot as plt
from itertools import product

n_cores = os.cpu_count()
print(f"Available CPU cores: {n_cores}")

---

## Application 1: Parallel Backtesting Engine

### The Problem

You want to test a moving average crossover strategy with different parameter combinations:
- Short window: 5-50 days
- Long window: 20-200 days

That's potentially hundreds of combinations to test!

In [None]:
# Generate realistic stock price data
def generate_stock_data(n_days=2520, seed=42):  # 10 years of daily data
    """Generate synthetic stock price data with realistic properties."""
    np.random.seed(seed)
    
    # Parameters for a typical stock
    mu = 0.08 / 252  # Daily expected return (8% annual)
    sigma = 0.20 / np.sqrt(252)  # Daily volatility (20% annual)
    
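    # Generate i.i.d. fat-tailed returns (Student-t, df=5; no autocorrelation).
    # A t(5) draw has variance 5/3, so rescale to unit variance before applying sigma.
    dof = 5
    t_draws = np.random.standard_t(df=dof, size=n_days) * np.sqrt((dof - 2) / dof)
    returns = t_draws * sigma + mu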
    
    # Convert to prices
    prices = 100 * np.cumprod(1 + returns)
    
    # Business-day frequency, so 2520 rows span roughly 10 calendar years
    dates = pd.date_range('2014-01-01', periods=n_days, freq='B')
    
    return pd.DataFrame({
        'date': dates,
        'close': prices
    }).set_index('date')

# Generate data
stock_data = generate_stock_data()
print(f"Generated {len(stock_data)} days of price data")
print(f"Date range: {stock_data.index[0].date()} to {stock_data.index[-1].date()}")
print(f"Price range: ${stock_data['close'].min():.2f} to ${stock_data['close'].max():.2f}")

In [None]:
# Plot the price data
plt.figure(figsize=(12, 4))
plt.plot(stock_data.index, stock_data['close'], linewidth=0.8)
plt.title('Synthetic Stock Price Data (10 Years)')
plt.xlabel('Date')
plt.ylabel('Price ($)')
plt.grid(True, alpha=0.3)
plt.show()

In [None]:
def backtest_ma_crossover(args):
    """
    Backtest a moving average crossover strategy.
    
    Strategy:
    - Go long when short MA crosses above long MA
    - Go flat when short MA crosses below long MA
    
    Parameters:
    -----------
    args : tuple of (prices, short_window, long_window)
    
    Returns:
    --------
    dict with strategy performance metrics
    """
    prices, short_window, long_window = args
    
    # Skip invalid combinations
    if short_window >= long_window:
        return None
    
    # Calculate moving averages
    short_ma = prices.rolling(window=short_window).mean()
    long_ma = prices.rolling(window=long_window).mean()
    
    # Generate signals: 1 = long, 0 = flat
    signal = (short_ma > long_ma).astype(int)
    signal = signal.shift(1)  # Lag by one day: today's signal earns tomorrow's close-to-close return (avoids look-ahead)
    
    # Calculate returns
    daily_returns = prices.pct_change()
    strategy_returns = signal * daily_returns
    
    # Remove NaN values
    strategy_returns = strategy_returns.dropna()
    
    if len(strategy_returns) == 0:
        return None
    
    # Calculate metrics
    total_return = (1 + strategy_returns).prod() - 1
    annual_return = (1 + total_return) ** (252 / len(strategy_returns)) - 1
    volatility = strategy_returns.std() * np.sqrt(252)
    sharpe = annual_return / volatility if volatility > 0 else 0  # risk-free rate taken as zero
    
    # Maximum drawdown
    cumulative = (1 + strategy_returns).cumprod()
    rolling_max = cumulative.cummax()
    drawdown = (cumulative - rolling_max) / rolling_max
    max_drawdown = drawdown.min()
    
    # Number of round-trip trades (one entry plus one exit counts as a single trade)
    trades = signal.diff().abs().sum() / 2
    
    return {
        'short_window': short_window,
        'long_window': long_window,
        'total_return': total_return * 100,
        'annual_return': annual_return * 100,
        'volatility': volatility * 100,
        'sharpe_ratio': sharpe,
        'max_drawdown': max_drawdown * 100,
        'n_trades': trades
    }
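The one-day lag on the signal is what keeps the backtest honest. As a minimal sketch on a made-up five-day price series (the prices and the 1/2-day windows are illustrative assumptions, not values from this notebook), we can compare the lagged strategy with one that applies the signal to the same day's return:

```python
import pandas as pd

# Toy prices, chosen only so the arithmetic is easy to follow
toy_prices = pd.Series([100.0, 102.0, 101.0, 105.0, 110.0])

short_ma = toy_prices.rolling(window=1).mean()
long_ma = toy_prices.rolling(window=2).mean()
raw_signal = (short_ma > long_ma).astype(int)   # 1 = long, 0 = flat
toy_returns = toy_prices.pct_change()

# Lagged: yesterday's signal is applied to today's return
lagged = (raw_signal.shift(1) * toy_returns).dropna()
# Unlagged: today's signal is applied to today's return (look-ahead bias)
unlagged = (raw_signal * toy_returns).dropna()

lagged_total = (1 + lagged).prod() - 1
unlagged_total = (1 + unlagged).prod() - 1

print(f"Lagged total return:   {lagged_total:.2%}")
print(f"Unlagged total return: {unlagged_total:.2%}")
```

The unlagged version looks better only because it uses information that was not available when the position was taken.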

In [None]:
# Define parameter grid
short_windows = range(5, 51, 5)    # 5, 10, 15, ..., 50
long_windows = range(20, 201, 10)  # 20, 30, 40, ..., 200

# Create all combinations
param_combinations = list(product(short_windows, long_windows))
print(f"Testing {len(param_combinations)} parameter combinations")
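Not every pair in the grid is a usable strategy, because `backtest_ma_crossover` skips any pair whose short window is not strictly shorter than the long window. A quick sketch of the count, using the same ranges as above (the names `grid_pairs` and `valid_pairs` are illustrative):

```python
from itertools import product

# Same ranges as the parameter grid above
grid_short = range(5, 51, 5)     # 10 values
grid_long = range(20, 201, 10)   # 19 values

grid_pairs = list(product(grid_short, grid_long))
# Keep only pairs the backtest will actually evaluate
valid_pairs = [(s, l) for s, l in grid_pairs if s < l]

print(f"Grid size: {len(grid_pairs)}")
print(f"Valid (short < long): {len(valid_pairs)}")
```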

In [None]:
# Prepare arguments for parallel execution
prices = stock_data['close']
backtest_args = [(prices, short, long) for short, long in param_combinations]

# Sequential backtesting
print("Sequential backtesting:")
start = time.time()
sequential_results = [backtest_ma_crossover(args) for args in backtest_args]
sequential_results = [r for r in sequential_results if r is not None]
seq_time = time.time() - start
print(f"  Time: {seq_time:.2f}s")
print(f"  Valid combinations: {len(sequential_results)}")

In [None]:
# Parallel backtesting
print("\nParallel backtesting:")
start = time.time()
with ProcessPoolExecutor(max_workers=n_cores) as executor:
    parallel_results = list(executor.map(backtest_ma_crossover, backtest_args))
parallel_results = [r for r in parallel_results if r is not None]
par_time = time.time() - start
print(f"  Time: {par_time:.2f}s")
print(f"  Speedup: {seq_time/par_time:.2f}x")
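Each task in `backtest_args` carries its own copy of the price series, so every call pickles the full series across the process boundary. With tasks this short, that overhead can eat much of the speedup. Below is a minimal sketch of one alternative, assuming hypothetical helpers `init_worker`, `sharpe_for_pair`, and `run_grid` (none of them are part of the notebook above): send the prices once per worker through the executor's `initializer`, and batch tasks with `chunksize`. For brevity it returns a simplified mean/std Sharpe rather than the full metrics dict.

```python
from concurrent.futures import ProcessPoolExecutor

import numpy as np
import pandas as pd

WORKER_PRICES = None  # populated once in each worker process by init_worker


def init_worker(prices):
    """Store the shared price series in a module-level global inside the worker."""
    global WORKER_PRICES
    WORKER_PRICES = prices


def sharpe_for_pair(pair):
    """Simplified lagged MA-crossover Sharpe for one (short, long) pair."""
    short_window, long_window = pair
    if short_window >= long_window:
        return None
    short_ma = WORKER_PRICES.rolling(short_window).mean()
    long_ma = WORKER_PRICES.rolling(long_window).mean()
    signal = (short_ma > long_ma).astype(int).shift(1)
    strat = (signal * WORKER_PRICES.pct_change()).dropna()
    vol = strat.std()
    sharpe = float(strat.mean() / vol * np.sqrt(252)) if vol > 0 else 0.0
    return short_window, long_window, sharpe


def run_grid(prices, pairs, max_workers=None, chunksize=8):
    """Evaluate all pairs in a process pool, shipping prices once per worker."""
    with ProcessPoolExecutor(max_workers=max_workers,
                             initializer=init_worker,
                             initargs=(prices,)) as executor:
        results = executor.map(sharpe_for_pair, list(pairs), chunksize=chunksize)
        return [r for r in results if r is not None]
```

Calling `run_grid(stock_data['close'], param_combinations)` would return a list of `(short, long, sharpe)` tuples. On platforms that start workers with `spawn` (Windows, and macOS by default), worker functions generally need to live in an importable module rather than a notebook cell.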

In [None]:
# Analyze results
results_df = pd.DataFrame(parallel_results)

print("\nTop 10 Strategies by Sharpe Ratio:")
print(results_df.nlargest(10, 'sharpe_ratio')[[
    'short_window', 'long_window', 'annual_return', 'sharpe_ratio', 'max_drawdown'
]].to_string(index=False))

In [None]:
# Create heatmap of Sharpe ratios
pivot = results_df.pivot(index='short_window', columns='long_window', values='sharpe_ratio')

plt.figure(figsize=(14, 8))
plt.imshow(pivot.values, aspect='auto', cmap='RdYlGn', origin='lower')
plt.colorbar(label='Sharpe Ratio')

# Set tick labels
plt.xticks(range(len(pivot.columns))[::2], pivot.columns[::2])
plt.yticks(range(len(pivot.index)), pivot.index)

plt.xlabel('Long Window (days)')
plt.ylabel('Short Window (days)')
plt.title('Strategy Performance Heatmap (Sharpe Ratio)\nMoving Average Crossover Strategy')

# Mark best strategy
best = results_df.loc[results_df['sharpe_ratio'].idxmax()]
best_short_idx = list(pivot.index).index(best['short_window'])
best_long_idx = list(pivot.columns).index(best['long_window'])
plt.plot(best_long_idx, best_short_idx, 'k*', markersize=15, label=f"Best: {int(best['short_window'])}/{int(best['long_window'])}")
plt.legend()

plt.tight_layout()
plt.show()

print(f"\nBest Strategy: Short={int(best['short_window'])}, Long={int(best['long_window'])}")
print(f"  Annual Return: {best['annual_return']:.1f}%")
print(f"  Sharpe Ratio: {best['sharpe_ratio']:.2f}")
print(f"  Max Drawdown: {best['max_drawdown']:.1f}%")

---

## Application 2: Bootstrap Confidence Intervals

### The Problem

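Suppose you've calculated a Sharpe ratio of 1.5. How confident are you in that estimate?

Bootstrap resampling gives us confidence intervals without assuming normally distributed returns. The simple bootstrap used here does still assume that daily returns are independent and identically distributed.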

In [None]:
# Generate sample returns data
np.random.seed(42)
n_days = 756  # 3 years of daily returns

# Simulate i.i.d. fat-tailed daily returns (Student-t, df=5)
daily_returns = np.random.standard_t(df=5, size=n_days) * 0.015 + 0.0003

# Calculate the observed (sample) Sharpe ratio, taking the risk-free rate as zero
actual_sharpe = (np.mean(daily_returns) * 252) / (np.std(daily_returns) * np.sqrt(252))
print(f"Sample size: {n_days} days")
print(f"Observed Sharpe Ratio: {actual_sharpe:.3f}")

In [None]:
def calculate_sharpe(returns):
    """Calculate annualized Sharpe ratio."""
    return (np.mean(returns) * 252) / (np.std(returns) * np.sqrt(252))

def bootstrap_sharpe_batch(args):
    """
    Generate bootstrap samples and calculate Sharpe ratios.
    
    Parameters:
    -----------
    args : tuple of (returns, n_samples, seed)
    
    Returns:
    --------
    array of bootstrap Sharpe ratios
    """
    returns, n_samples, seed = args
    np.random.seed(seed)
    
    n = len(returns)
    bootstrap_sharpes = np.zeros(n_samples)
    
    for i in range(n_samples):
        # Resample with replacement
        sample_idx = np.random.randint(0, n, size=n)
        sample_returns = returns[sample_idx]
        bootstrap_sharpes[i] = calculate_sharpe(sample_returns)
    
    return bootstrap_sharpes
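The Python loop above is easy to read, but the same resampling can be done in one shot with a 2-D index array. Here is a minimal sketch, assuming a hypothetical helper `bootstrap_sharpe_vectorized` and NumPy's `Generator` API in place of `np.random.seed`. It trades memory (an `n_samples × n` array) for speed, so it suits moderate batch sizes.

```python
import numpy as np


def bootstrap_sharpe_vectorized(returns, n_samples, rng):
    """Draw n_samples bootstrap resamples at once and return their annualized Sharpe ratios."""
    returns = np.asarray(returns)
    n = len(returns)
    # Each row holds the indices of one resample (with replacement)
    idx = rng.integers(0, n, size=(n_samples, n))
    samples = returns[idx]
    return samples.mean(axis=1) / samples.std(axis=1) * np.sqrt(252)


# Illustrative data with the same shape as the notebook's simulated returns
demo_rng = np.random.default_rng(0)
demo_returns = demo_rng.standard_t(df=5, size=756) * 0.015 + 0.0003
demo_sharpes = bootstrap_sharpe_vectorized(demo_returns, 1_000, demo_rng)

print(demo_sharpes.shape)
```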

In [None]:
# Bootstrap parameters
total_samples = 10_000
n_batches = 8
samples_per_batch = total_samples // n_batches

print(f"Total bootstrap samples: {total_samples:,}")
print(f"Batches: {n_batches}")

# Prepare batch arguments
batch_args = [
    (daily_returns, samples_per_batch, seed)
    for seed in range(n_batches)
]
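Seeding each batch with consecutive integers works for this demonstration. For parallel work, NumPy's documentation suggests deriving the per-batch streams from a single `SeedSequence`, which is designed to give statistically independent child streams. A minimal sketch (the root seed 42 and the variable names are illustrative assumptions):

```python
import numpy as np

n_streams = 8  # one stream per batch

# Spawn independent child seeds from one root seed
child_seeds = np.random.SeedSequence(42).spawn(n_streams)
batch_rngs = [np.random.default_rng(seed) for seed in child_seeds]

# Each generator can now be handed to a different batch
first_draws = [rng.integers(0, 756, size=3) for rng in batch_rngs]
for i, draw in enumerate(first_draws):
    print(f"Batch {i}: first resample indices {draw}")
```

A batch function could accept one of these child seeds in place of the integer `seed` and build its own generator with `np.random.default_rng(seed)`.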

In [None]:
# Sequential bootstrap
print("Sequential bootstrap:")
start = time.time()
seq_sharpes = [bootstrap_sharpe_batch(args) for args in batch_args]
seq_sharpes = np.concatenate(seq_sharpes)
seq_time = time.time() - start
print(f"  Time: {seq_time:.2f}s")

In [None]:
# Parallel bootstrap
print("\nParallel bootstrap:")
start = time.time()
with ProcessPoolExecutor(max_workers=n_cores) as executor:
    par_sharpes = list(executor.map(bootstrap_sharpe_batch, batch_args))
par_sharpes = np.concatenate(par_sharpes)
par_time = time.time() - start
print(f"  Time: {par_time:.2f}s")
print(f"  Speedup: {seq_time/par_time:.2f}x")

In [None]:
# Calculate confidence intervals
ci_90 = np.percentile(par_sharpes, [5, 95])
ci_95 = np.percentile(par_sharpes, [2.5, 97.5])
ci_99 = np.percentile(par_sharpes, [0.5, 99.5])

print(f"\n{'='*50}")
print(f"BOOTSTRAP RESULTS")
print(f"{'='*50}")
print(f"Observed Sharpe Ratio: {actual_sharpe:.3f}")
print(f"Bootstrap Mean: {np.mean(par_sharpes):.3f}")
print(f"Bootstrap Std: {np.std(par_sharpes):.3f}")
print(f"\nConfidence Intervals:")
print(f"  90% CI: [{ci_90[0]:.3f}, {ci_90[1]:.3f}]")
print(f"  95% CI: [{ci_95[0]:.3f}, {ci_95[1]:.3f}]")
print(f"  99% CI: [{ci_99[0]:.3f}, {ci_99[1]:.3f}]")
print(f"\nProbability Sharpe > 0: {(par_sharpes > 0).mean()*100:.1f}%")
print(f"Probability Sharpe > 1: {(par_sharpes > 1).mean()*100:.1f}%")

In [None]:
# Visualize the bootstrap distribution
fig, ax = plt.subplots(figsize=(10, 5))

ax.hist(par_sharpes, bins=50, density=True, alpha=0.7, color='steelblue', edgecolor='white')

# Add vertical lines for CIs and observed value
ax.axvline(actual_sharpe, color='red', linewidth=2, linestyle='-', label=f'Observed: {actual_sharpe:.3f}')
ax.axvline(ci_95[0], color='orange', linewidth=2, linestyle='--', label=f'95% CI: [{ci_95[0]:.2f}, {ci_95[1]:.2f}]')
ax.axvline(ci_95[1], color='orange', linewidth=2, linestyle='--')
ax.axvline(0, color='black', linewidth=1, linestyle=':')

ax.set_xlabel('Sharpe Ratio')
ax.set_ylabel('Density')
ax.set_title('Bootstrap Distribution of Sharpe Ratio')
ax.legend()
ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

---

## Application 3: Rolling Correlation Analysis

### The Problem

Calculate rolling correlations for a multi-asset portfolio. With n assets, each window involves n(n-1)/2 distinct pairs (190 for the 20 assets used here), so the work grows quickly as assets and windows are added.

In [None]:
# Generate multi-asset return data
np.random.seed(42)
n_days = 2520  # 10 years
n_assets = 20

# Asset names
assets = [f'Asset_{i+1:02d}' for i in range(n_assets)]

# Build a target correlation structure
# Start from a random symmetric matrix with unit diagonal
base_corr = np.random.uniform(0.2, 0.6, (n_assets, n_assets))
base_corr = (base_corr + base_corr.T) / 2  # Make symmetric
np.fill_diagonal(base_corr, 1.0)

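# Ensure positive definite (required for Cholesky) by flooring the eigenvalues
eigenvalues, eigenvectors = np.linalg.eigh(base_corr)
eigenvalues = np.maximum(eigenvalues, 0.01)
corr_matrix = eigenvectors @ np.diag(eigenvalues) @ eigenvectors.T

# Flooring can move the diagonal away from 1, so rescale to a valid correlation matrix
diag_scale = np.sqrt(np.diag(corr_matrix))
corr_matrix = corr_matrix / np.outer(diag_scale, diag_scale)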

# Cholesky decomposition
L = np.linalg.cholesky(corr_matrix)

# Generate correlated returns
volatilities = np.random.uniform(0.15, 0.35, n_assets) / np.sqrt(252)
means = np.random.uniform(0.05, 0.15, n_assets) / 252

uncorrelated = np.random.standard_normal((n_days, n_assets))
correlated = uncorrelated @ L.T
returns = correlated * volatilities + means

returns_df = pd.DataFrame(returns, columns=assets)

print(f"Generated returns for {n_assets} assets over {n_days} days")
print(f"\nShape: {returns_df.shape}")

In [None]:
def calculate_rolling_corr_for_window(args):
    """
    Calculate correlation matrix for a specific time window.
    
    Parameters:
    -----------
    args : tuple of (returns_array, start_idx, window_size)
    
    Returns:
    --------
    tuple of (start_idx, correlation_matrix)
    """
    returns_array, start_idx, window_size = args
    
    window_returns = returns_array[start_idx:start_idx + window_size]
    corr_matrix = np.corrcoef(window_returns.T)
    
    return start_idx, corr_matrix
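As a sanity check on the window logic, pandas can compute the same matrices with `DataFrame.rolling(...).corr()`. A minimal sketch on a small illustrative frame (the three-column data and the names are assumptions, not the notebook's data):

```python
import numpy as np
import pandas as pd

demo_rng = np.random.default_rng(0)
demo_df = pd.DataFrame(demo_rng.standard_normal((200, 3)), columns=["A", "B", "C"])
window = 63

# Result has a MultiIndex: (row label, column name) -> pairwise correlations
rolling_corr = demo_df.rolling(window).corr()

# The matrix labelled with row 62 covers rows 0..62, i.e. the first full window
pandas_matrix = rolling_corr.loc[window - 1].to_numpy()
numpy_matrix = np.corrcoef(demo_df.to_numpy()[:window].T)

print(np.allclose(pandas_matrix, numpy_matrix, atol=1e-6))
```

The pandas version computes a matrix for every row, whereas the function above only evaluates the windows we ask for, which is what makes it easy to split across processes.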

In [None]:
# Parameters for rolling correlation
window_size = 63  # Quarterly rolling window
step_size = 21    # Calculate every month

# Create list of window start indices (+1 so the final full window is included)
start_indices = list(range(0, n_days - window_size + 1, step_size))
print(f"Number of windows (one correlation matrix each): {len(start_indices)}")

# Prepare arguments
returns_array = returns_df.values
corr_args = [(returns_array, idx, window_size) for idx in start_indices]

In [None]:
# Sequential calculation
print("Sequential calculation:")
start = time.time()
seq_results = [calculate_rolling_corr_for_window(args) for args in corr_args]
seq_time = time.time() - start
print(f"  Time: {seq_time:.3f}s")

In [None]:
# Parallel calculation
print("\nParallel calculation:")
start = time.time()
with ProcessPoolExecutor(max_workers=n_cores) as executor:
    par_results = list(executor.map(calculate_rolling_corr_for_window, corr_args))
par_time = time.time() - start
print(f"  Time: {par_time:.3f}s")
print(f"  Speedup: {seq_time/par_time:.2f}x")

In [None]:
# Extract average correlation over time
avg_correlations = []
for idx, corr_matrix in par_results:
    # Get upper triangle (excluding diagonal)
    upper_tri = corr_matrix[np.triu_indices(n_assets, k=1)]
    avg_correlations.append({
        'window_start': idx,
        'avg_correlation': np.mean(upper_tri),
        'min_correlation': np.min(upper_tri),
        'max_correlation': np.max(upper_tri)
    })

corr_df = pd.DataFrame(avg_correlations)
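The upper-triangle trick is worth seeing on a matrix small enough to read. In this sketch the 3×3 matrix is made up for illustration. `np.triu_indices(n, k=1)` returns the row and column indices of each distinct pair exactly once, skipping the diagonal:

```python
import numpy as np

toy_corr = np.array([
    [1.0,  0.2,  0.5],
    [0.2,  1.0, -0.1],
    [0.5, -0.1,  1.0],
])

rows, cols = np.triu_indices(3, k=1)
toy_pairs = list(zip(rows.tolist(), cols.tolist()))  # [(0, 1), (0, 2), (1, 2)]
toy_upper = toy_corr[rows, cols]                     # [0.2, 0.5, -0.1]

print(toy_pairs)
print(toy_upper, toy_upper.mean())

# For 20 assets the same call yields 20 * 19 / 2 = 190 distinct pairs
n_pairs_20 = len(np.triu_indices(20, k=1)[0])
print(n_pairs_20)
```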

In [None]:
# Visualize correlation over time
fig, axes = plt.subplots(2, 1, figsize=(14, 8))

# Average correlation over time
ax1 = axes[0]
ax1.plot(corr_df['window_start'], corr_df['avg_correlation'], 'b-', linewidth=1.5, label='Average')
ax1.fill_between(corr_df['window_start'], corr_df['min_correlation'], corr_df['max_correlation'], 
                  alpha=0.3, color='blue', label='Min-Max Range')
ax1.set_xlabel('Trading Day')
ax1.set_ylabel('Correlation')
ax1.set_title('Rolling Pairwise Correlations Over Time')
ax1.legend()
ax1.grid(True, alpha=0.3)

# Final correlation heatmap
ax2 = axes[1]
final_corr = par_results[-1][1]
im = ax2.imshow(final_corr, cmap='RdBu_r', vmin=-1, vmax=1, aspect='auto')
plt.colorbar(im, ax=ax2, label='Correlation')
ax2.set_xticks(range(0, n_assets, 2))
ax2.set_yticks(range(0, n_assets, 2))
ax2.set_xticklabels([assets[i] for i in range(0, n_assets, 2)], rotation=45, ha='right')
ax2.set_yticklabels([assets[i] for i in range(0, n_assets, 2)])
ax2.set_title('Final Correlation Matrix')

plt.tight_layout()
plt.show()

---

## Summary

### What We Built

1. **Parallel Backtesting Engine**
   - Evaluated a grid of 190 parameter combinations (174 valid, with short window < long window)
   - Found the best in-sample moving average parameters
   - Visualized performance across parameter space

2. **Bootstrap Confidence Intervals**
   - Generated 10,000 bootstrap samples
   - Calculated 95% confidence interval for Sharpe ratio
   - Quantified the uncertainty in our estimate

3. **Rolling Correlation Analysis**
   - Computed correlation matrices for 100+ time windows
   - Tracked average correlation over time
   - Visualized correlation dynamics

### Key Patterns Used

| Application | Pattern | Why |
|-------------|---------|-----|
| Backtesting | `ProcessPoolExecutor.map()` | CPU-bound, many independent tasks |
| Bootstrap | Batch processing with seeds | Control randomness, reduce overhead |
| Correlations | Map over time windows | Each window is independent |