# Strategy Optimization and Parameter Tuning

This notebook demonstrates how to optimize trading strategy parameters using grid search, out-of-sample validation, parameter sensitivity analysis, and Monte Carlo sampling.

## Table of Contents
1. [Setup and Data Loading](#setup)
2. [Grid Search Optimization](#grid-search)
3. [Out-of-Sample Testing](#out-of-sample)
4. [Parameter Sensitivity Analysis](#sensitivity)
5. [Monte Carlo Analysis](#monte-carlo)
6. [Final Recommendations](#recommendations)

## 1. Setup and Data Loading

In [None]:
# Import required libraries
import sys
import os
sys.path.append('../src')

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime, timedelta
from itertools import product
import warnings
warnings.filterwarnings('ignore')

# Optimization libraries
try:
    import optuna
    OPTUNA_AVAILABLE = True
except ImportError:
    OPTUNA_AVAILABLE = False
    print("Optuna not available. Install with: pip install optuna")

from sklearn.model_selection import ParameterGrid

# Import our custom modules
from data.data_loader import DataLoader
from backtesting.engine import BacktestEngine
from strategies.moving_average import MovingAverageCrossover, ExponentialMovingAverageCrossover
from utils.metrics import PerformanceAnalyzer
from config import config

# Set up plotting
plt.style.use('seaborn-v0_8')
plt.rcParams['figure.figsize'] = (12, 8)

print("✓ All modules imported successfully")
print(f"✓ Optuna available: {OPTUNA_AVAILABLE}")

### Load and Prepare Data

In [None]:
# Load data for optimization
SYMBOL = 'AAPL'  # Focus on single asset for optimization
START_DATE = '2018-01-01'
END_DATE = '2023-12-31'

print(f"Loading data for {SYMBOL} from {START_DATE} to {END_DATE}...")

loader = DataLoader()
data = loader.fetch_stock_data([SYMBOL], START_DATE, END_DATE)
stock_data = data[SYMBOL]

print(f"✓ Loaded {len(stock_data)} records")
print(f"  Date range: {stock_data.index[0].strftime('%Y-%m-%d')} to {stock_data.index[-1].strftime('%Y-%m-%d')}")

# Split data for optimization and validation
split_date = '2021-01-01'
train_data = stock_data[stock_data.index < split_date]
test_data = stock_data[stock_data.index >= split_date]

print(f"\n📊 Data Split:")
print(f"  Training: {len(train_data)} records ({train_data.index[0].strftime('%Y-%m-%d')} to {train_data.index[-1].strftime('%Y-%m-%d')})")
print(f"  Testing: {len(test_data)} records ({test_data.index[0].strftime('%Y-%m-%d')} to {test_data.index[-1].strftime('%Y-%m-%d')})")

## 2. Grid Search Optimization

Grid search backtests every combination in a predefined parameter grid and ranks the results by the chosen objective.
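
As a rough illustration of how large the search space is, the sketch below expands the same grid values used in this section with `itertools.product` and drops combinations where the short window is not strictly shorter than the long window. The helper name `expand_grid` is hypothetical and only stands in for what `ParameterGrid` does in the notebook.

```python
from itertools import product

# Same grid values as the notebook's param_grid.
param_grid = {
    "short_window": [5, 10, 15, 20, 25, 30],
    "long_window": [30, 40, 50, 60, 70, 80],
    "position_size_pct": [0.05, 0.1, 0.15, 0.2],
    "signal_threshold": [0.005, 0.01, 0.015, 0.02],
}


def expand_grid(grid):
    """Yield one dict per point in the Cartesian product of the grid values."""
    names = list(grid)
    for values in product(*(grid[name] for name in names)):
        yield dict(zip(names, values))


all_combos = list(expand_grid(param_grid))
# Keep only combinations where the short window is strictly shorter.
valid_combos = [p for p in all_combos if p["short_window"] < p["long_window"]]

print(len(all_combos))    # 576 = 6 * 6 * 4 * 4
print(len(valid_combos))  # 560: only short=30, long=30 is rejected (16 combos)
```

Each valid combination costs one full backtest, so adding a value to any list multiplies the runtime accordingly.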

In [None]:
def optimize_ma_strategy_grid_search(data, param_grid, objective='sharpe_ratio'):
    """
    Optimize moving average strategy using grid search.
    """
    engine = BacktestEngine()
    results = []
    
    # Generate all parameter combinations
    param_combinations = list(ParameterGrid(param_grid))
    total_combinations = len(param_combinations)
    
    print(f"🔍 Testing {total_combinations} parameter combinations...")
    
    for i, params in enumerate(param_combinations):
        # Skip invalid combinations
        if params['short_window'] >= params['long_window']:
            continue
            
        try:
            # Create and test strategy
            strategy = MovingAverageCrossover(**params)
            result = engine.run_backtest(strategy, data)
            
            # Store results
            result_row = {
                'short_window': params['short_window'],
                'long_window': params['long_window'],
                'position_size_pct': params.get('position_size_pct', 0.1),
                'signal_threshold': params.get('signal_threshold', 0.01),
                'total_return': result['total_return'],
                'sharpe_ratio': result['sharpe_ratio'],
                'max_drawdown': result['max_drawdown'],
                'volatility': result['volatility'],
                'total_trades': result['total_trades'],
                # Simplified Calmar: total (not annualized) return over max drawdown
                'calmar_ratio': result['total_return'] / abs(result['max_drawdown']) if result['max_drawdown'] != 0 else 0
            }
            results.append(result_row)
            
        except Exception as e:
            print(f"Error with params {params}: {e}")
            continue
        
        # Progress update
        if (i + 1) % max(1, total_combinations // 10) == 0:
            progress = (i + 1) / total_combinations * 100
            print(f"  Progress: {progress:.1f}%")
    
    results_df = pd.DataFrame(results)
    
    if not results_df.empty:
        # Sort by objective
        results_df = results_df.sort_values(objective, ascending=False)
        print(f"\n✅ Grid search completed. Best {objective}: {results_df.iloc[0][objective]:.4f}")
    
    return results_df

# Define parameter grid
param_grid = {
    'short_window': [5, 10, 15, 20, 25, 30],
    'long_window': [30, 40, 50, 60, 70, 80],
    'position_size_pct': [0.05, 0.1, 0.15, 0.2],
    'signal_threshold': [0.005, 0.01, 0.015, 0.02]
}

# Run grid search optimization
grid_results = optimize_ma_strategy_grid_search(train_data, param_grid, 'sharpe_ratio')

# Display top results
if not grid_results.empty:
    print("\n🏆 Top 10 Parameter Combinations (by Sharpe Ratio):")
    top_results = grid_results.head(10)
    print(top_results[['short_window', 'long_window', 'position_size_pct', 'sharpe_ratio', 'total_return', 'max_drawdown']].to_string(index=False))

## 3. Out-of-Sample Testing

Test the optimized parameters on unseen data.
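
The buy-and-hold comparison in this section annualizes a daily Sharpe ratio as mean over standard deviation, scaled by the square root of 252 trading days, with the risk-free rate taken as zero. A minimal stdlib sketch of that arithmetic follows. The helper names and the price list are made up for illustration, and `statistics.stdev` is the sample standard deviation, which is also the pandas default.

```python
import math
import statistics

TRADING_DAYS_PER_YEAR = 252  # same annualization factor as the notebook


def pct_change(prices):
    """Simple period-over-period returns, like Series.pct_change().dropna()."""
    return [b / a - 1 for a, b in zip(prices, prices[1:])]


def annualized_sharpe(daily_returns):
    """Mean / sample std of daily returns, scaled by sqrt(252); risk-free rate assumed 0."""
    if len(daily_returns) < 2:
        return 0.0  # a standard deviation needs at least two observations
    std = statistics.stdev(daily_returns)
    if std == 0:
        return 0.0  # mirror the notebook's guard against division by zero
    return statistics.mean(daily_returns) / std * math.sqrt(TRADING_DAYS_PER_YEAR)


prices = [100.0, 101.0, 100.5, 102.0, 103.0]  # made-up closing prices
returns = pct_change(prices)
print(round(annualized_sharpe(returns), 3))
```

Because the strategy and the benchmark should be compared on the same basis, it is worth checking that the backtest engine's `sharpe_ratio` uses the same convention before reading much into the difference.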

In [None]:
if not grid_results.empty:
    # Get best parameters from grid search
    best_params = grid_results.iloc[0]
    
    print(f"🎯 Testing Best Parameters on Out-of-Sample Data:")
    print(f"  Short Window: {int(best_params['short_window'])}")
    print(f"  Long Window: {int(best_params['long_window'])}")
    print(f"  Position Size: {best_params['position_size_pct']:.1%}")
    
    # Test on out-of-sample data
    strategy = MovingAverageCrossover(
        short_window=int(best_params['short_window']),
        long_window=int(best_params['long_window']),
        position_size_pct=best_params['position_size_pct'],
        signal_threshold=best_params['signal_threshold']
    )
    
    engine = BacktestEngine()
    oos_result = engine.run_backtest(strategy, test_data)
    
    print(f"\n📊 Out-of-Sample Results:")
    print(f"  Total Return: {oos_result['total_return']:.2%}")
    print(f"  Sharpe Ratio: {oos_result['sharpe_ratio']:.3f}")
    print(f"  Max Drawdown: {oos_result['max_drawdown']:.2%}")
    print(f"  Total Trades: {oos_result['total_trades']}")
    
    # Compare in-sample vs out-of-sample
    comparison_data = {
        'Metric': ['Total Return', 'Sharpe Ratio', 'Max Drawdown', 'Volatility'],
        'In-Sample': [f"{best_params['total_return']:.2%}", 
                     f"{best_params['sharpe_ratio']:.3f}",
                     f"{best_params['max_drawdown']:.2%}",
                     f"{best_params['volatility']:.2%}"],
        'Out-of-Sample': [f"{oos_result['total_return']:.2%}",
                         f"{oos_result['sharpe_ratio']:.3f}",
                         f"{oos_result['max_drawdown']:.2%}",
                         f"{oos_result['volatility']:.2%}"]
    }
    
    comparison_df = pd.DataFrame(comparison_data)
    print(f"\n📈 In-Sample vs Out-of-Sample Comparison:")
    print(comparison_df.to_string(index=False))
    
    # Plot portfolio evolution
    if 'portfolio_history' in oos_result:
        portfolio_df = oos_result['portfolio_history']
        
        plt.figure(figsize=(12, 6))
        plt.plot(portfolio_df['timestamp'], portfolio_df['total_value'], linewidth=2, label='Strategy')
        
        # Add buy and hold comparison
        initial_price = test_data['close'].iloc[0]
        buy_hold_values = (test_data['close'] / initial_price) * config.initial_capital
        plt.plot(test_data.index, buy_hold_values, linewidth=2, alpha=0.7, label='Buy & Hold')
        
        plt.title('Out-of-Sample Performance: Optimized Strategy vs Buy & Hold')
        plt.xlabel('Date')
        plt.ylabel('Portfolio Value ($)')
        plt.legend()
        plt.grid(True, alpha=0.3)
        plt.show()
        
        # Calculate buy and hold metrics for comparison
        bh_return = (test_data['close'].iloc[-1] / test_data['close'].iloc[0]) - 1
        bh_returns = test_data['close'].pct_change().dropna()
        bh_sharpe = (bh_returns.mean() / bh_returns.std()) * np.sqrt(252) if bh_returns.std() > 0 else 0
        
        print(f"\n🆚 Strategy vs Buy & Hold:")
        print(f"  Strategy Return: {oos_result['total_return']:.2%}")
        print(f"  Buy & Hold Return: {bh_return:.2%}")
        print(f"  Strategy Sharpe: {oos_result['sharpe_ratio']:.3f}")
        print(f"  Buy & Hold Sharpe: {bh_sharpe:.3f}")
        
        outperformance = oos_result['total_return'] - bh_return
        print(f"  Outperformance: {outperformance:.2%}")

## 4. Parameter Sensitivity Analysis

Analyze how sensitive the strategy is to parameter changes.
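
One hedged way to put a number on "stable across a range" is to average the objective over a parameter value and its immediate neighbors instead of reading the peak alone. The sketch below does this for a made-up Sharpe sweep. `neighborhood_score` is a hypothetical helper, not part of the notebook's modules.

```python
def neighborhood_score(values, scores, center, radius=1):
    """Average score over the grid points within `radius` positions of `center`."""
    idx = values.index(center)
    lo = max(0, idx - radius)
    hi = min(len(values), idx + radius + 1)
    window = scores[lo:hi]
    return sum(window) / len(window)


# Made-up Sharpe ratios for a short_window sweep, with a sharp peak at 15.
short_windows = [5, 10, 15, 20, 25]
sharpe = [0.40, 0.45, 1.20, 0.30, 0.35]

peak = short_windows[sharpe.index(max(sharpe))]
print(peak, round(neighborhood_score(short_windows, sharpe, peak), 3))
```

Here the peak Sharpe is 1.20 but the neighborhood average is only about 0.65, which suggests the peak is fragile. A parameter whose neighborhood average stays close to its own score is a safer choice.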

In [None]:
def parameter_sensitivity_analysis(data, base_params, param_name, param_range):
    """
    Analyze sensitivity to a specific parameter.
    """
    engine = BacktestEngine()
    results = []
    
    for param_value in param_range:
        # Create modified parameters
        test_params = base_params.copy()
        test_params[param_name] = param_value
        
        # Skip invalid combinations
        if param_name in ['short_window', 'long_window']:
            if test_params['short_window'] >= test_params['long_window']:
                continue
        
        try:
            strategy = MovingAverageCrossover(**test_params)
            result = engine.run_backtest(strategy, data)
            
            results.append({
                param_name: param_value,
                'total_return': result['total_return'],
                'sharpe_ratio': result['sharpe_ratio'],
                'max_drawdown': result['max_drawdown'],
                'total_trades': result['total_trades']
            })
        except Exception:
            # Skip failed backtests, but let KeyboardInterrupt and SystemExit propagate
            continue
    
    return pd.DataFrame(results)

if not grid_results.empty:
    # Use best parameters as base
    base_params = {
        'short_window': int(best_params['short_window']),
        'long_window': int(best_params['long_window']),
        'position_size_pct': best_params['position_size_pct'],
        'signal_threshold': best_params['signal_threshold']
    }
    
    print("🔬 Running Parameter Sensitivity Analysis...")
    
    # Test sensitivity to different parameters
    sensitivity_tests = {
        'short_window': range(5, 35, 2),
        'long_window': range(30, 100, 5),
        'position_size_pct': np.arange(0.05, 0.35, 0.02),
        'signal_threshold': np.arange(0.001, 0.05, 0.003)
    }
    
    fig, axes = plt.subplots(2, 2, figsize=(16, 12))
    axes = axes.flatten()
    
    for i, (param_name, param_range) in enumerate(sensitivity_tests.items()):
        if i >= len(axes):
            break
            
        sensitivity_results = parameter_sensitivity_analysis(train_data, base_params, param_name, param_range)
        
        if not sensitivity_results.empty:
            ax = axes[i]
            
            # Plot Sharpe ratio
            ax.plot(sensitivity_results[param_name], sensitivity_results['sharpe_ratio'], 'o-', linewidth=2, markersize=4)
            
            # Highlight optimal value
            optimal_value = base_params[param_name]
            ax.axvline(optimal_value, color='red', linestyle='--', alpha=0.7, label=f'Optimal: {optimal_value}')
            
            ax.set_xlabel(param_name.replace('_', ' ').title())
            ax.set_ylabel('Sharpe Ratio')
            ax.set_title(f'Sensitivity to {param_name.replace("_", " ").title()}')
            ax.grid(True, alpha=0.3)
            ax.legend()
    
    plt.tight_layout()
    plt.show()
    
    print("✅ Sensitivity analysis completed.")
    print("\n💡 Key Insights:")
    print("  - Look for parameters where performance is stable across a range")
    print("  - Avoid parameters that show high sensitivity (sharp peaks/valleys)")
    print("  - Consider using parameter ranges rather than fixed values")

## 5. Monte Carlo Analysis

Test strategy robustness using random parameter sampling.
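
Random sampling is only repeatable if the random number generator is seeded. The sketch below draws parameter sets from the same ranges as this section, using the standard library's `random.Random` rather than `np.random` so that it runs without dependencies. The function names are hypothetical.

```python
import random


def sample_params(rng):
    """Draw one parameter set using the same ranges as the notebook's sampler."""
    short_window = rng.randrange(5, 30)                   # 5..29, upper bound exclusive
    long_window = rng.randrange(short_window + 10, 100)   # at least 10 bars longer
    return {
        "short_window": short_window,
        "long_window": long_window,
        "position_size_pct": rng.uniform(0.05, 0.3),
        "signal_threshold": rng.uniform(0.001, 0.05),
    }


def sample_trials(n_trials, seed):
    """Return n_trials parameter sets drawn from a generator seeded with `seed`."""
    rng = random.Random(seed)
    return [sample_params(rng) for _ in range(n_trials)]


run_a = sample_trials(5, seed=42)
run_b = sample_trials(5, seed=42)
print(run_a == run_b)  # True: the same seed reproduces the same draws
```

With NumPy, the analogous approach would be to seed the generator once before the trial loop so that a rerun explores the same parameter sets.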

In [None]:
def monte_carlo_optimization(data, n_trials=1000):
    """
    Monte Carlo parameter optimization.
    """
    engine = BacktestEngine()
    results = []
    
    print(f"🎲 Running Monte Carlo optimization with {n_trials} trials...")
    
    for i in range(n_trials):
        # Random parameter sampling
        short_window = np.random.randint(5, 30)
        long_window = np.random.randint(short_window + 10, 100)
        position_size_pct = np.random.uniform(0.05, 0.3)
        signal_threshold = np.random.uniform(0.001, 0.05)
        
        try:
            strategy = MovingAverageCrossover(
                short_window=short_window,
                long_window=long_window,
                position_size_pct=position_size_pct,
                signal_threshold=signal_threshold
            )
            
            result = engine.run_backtest(strategy, data)
            
            results.append({
                'trial': i,
                'short_window': short_window,
                'long_window': long_window,
                'position_size_pct': position_size_pct,
                'signal_threshold': signal_threshold,
                'total_return': result['total_return'],
                'sharpe_ratio': result['sharpe_ratio'],
                'max_drawdown': result['max_drawdown'],
                'total_trades': result['total_trades']
            })
            
        except Exception:
            # Skip failed backtests, but let KeyboardInterrupt and SystemExit propagate
            continue
        
        if (i + 1) % max(1, n_trials // 10) == 0:  # max() avoids modulo by zero when n_trials < 10
            progress = (i + 1) / n_trials * 100
            print(f"  Progress: {progress:.0f}%")
    
    return pd.DataFrame(results)

# Run Monte Carlo optimization
mc_results = monte_carlo_optimization(train_data, n_trials=500)

if not mc_results.empty:
    print(f"\n📊 Monte Carlo Results Summary:")
    print(f"  Trials completed: {len(mc_results)}")
    print(f"  Best Sharpe Ratio: {mc_results['sharpe_ratio'].max():.3f}")
    print(f"  Average Sharpe Ratio: {mc_results['sharpe_ratio'].mean():.3f}")
    print(f"  Std Dev Sharpe Ratio: {mc_results['sharpe_ratio'].std():.3f}")
    
    # Plot results distribution
    fig, axes = plt.subplots(2, 2, figsize=(16, 10))
    
    # Sharpe ratio distribution
    axes[0, 0].hist(mc_results['sharpe_ratio'], bins=50, alpha=0.7, edgecolor='black')
    axes[0, 0].axvline(mc_results['sharpe_ratio'].mean(), color='red', linestyle='--', label='Mean')
    axes[0, 0].set_title('Sharpe Ratio Distribution')
    axes[0, 0].set_xlabel('Sharpe Ratio')
    axes[0, 0].legend()
    
    # Return vs Risk scatter
    scatter = axes[0, 1].scatter(mc_results['max_drawdown'], mc_results['total_return'], 
                                c=mc_results['sharpe_ratio'], alpha=0.6, cmap='viridis')
    axes[0, 1].set_xlabel('Max Drawdown')
    axes[0, 1].set_ylabel('Total Return')
    axes[0, 1].set_title('Risk-Return Scatter')
    plt.colorbar(scatter, ax=axes[0, 1], label='Sharpe Ratio')
    
    # Parameter correlation with performance
    correlation_matrix = mc_results[['short_window', 'long_window', 'position_size_pct', 'signal_threshold', 'sharpe_ratio']].corr()
    sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', center=0, ax=axes[1, 0])
    axes[1, 0].set_title('Parameter Correlation Matrix')
    
    # Top percentile analysis
    top_10_pct = mc_results[mc_results['sharpe_ratio'] >= mc_results['sharpe_ratio'].quantile(0.9)]
    
    param_stats = top_10_pct[['short_window', 'long_window', 'position_size_pct', 'signal_threshold']].describe()
    
    axes[1, 1].axis('off')
    table_data = param_stats.round(3).T
    table = axes[1, 1].table(cellText=table_data.values, 
                            rowLabels=table_data.index, 
                            colLabels=table_data.columns,
                            cellLoc='center', loc='center')
    table.auto_set_font_size(False)
    table.set_fontsize(8)
    axes[1, 1].set_title('Top 10% Parameter Statistics')
    
    plt.tight_layout()
    plt.show()
    
    # Best Monte Carlo result
    best_mc = mc_results.loc[mc_results['sharpe_ratio'].idxmax()]
    print(f"\n🏆 Best Monte Carlo Parameters:")
    print(f"  Short Window: {int(best_mc['short_window'])}")
    print(f"  Long Window: {int(best_mc['long_window'])}")
    print(f"  Position Size: {best_mc['position_size_pct']:.2%}")
    print(f"  Signal Threshold: {best_mc['signal_threshold']:.4f}")
    print(f"  Sharpe Ratio: {best_mc['sharpe_ratio']:.3f}")

## 6. Final Recommendations

Summary of optimization results and recommendations.

In [None]:
print("📋 STRATEGY OPTIMIZATION SUMMARY")
print("=" * 50)

if not grid_results.empty:
    print(f"\n🔍 Grid Search Results:")
    print(f"  Best Sharpe Ratio: {best_params['sharpe_ratio']:.3f}")
    print(f"  Optimal Parameters: SW={int(best_params['short_window'])}, LW={int(best_params['long_window'])}, PS={best_params['position_size_pct']:.1%}")

if not mc_results.empty:
    print(f"\n🎲 Monte Carlo Results:")
    print(f"  Best Sharpe Ratio: {best_mc['sharpe_ratio']:.3f}")
    print(f"  Optimal Parameters: SW={int(best_mc['short_window'])}, LW={int(best_mc['long_window'])}, PS={best_mc['position_size_pct']:.1%}")

print(f"\n💡 Key Recommendations:")
print(f"  1. Use parameter ranges rather than fixed values for robustness")
print(f"  2. Always test on out-of-sample data to avoid overfitting")
print(f"  3. Consider transaction costs and slippage in optimization")
print(f"  4. Monitor parameter stability over different market conditions")
print(f"  5. Use multiple optimization objectives (Sharpe, Calmar, etc.)")

print(f"\n⚠️  Important Notes:")
print(f"  - Optimization results are specific to the training period")
print(f"  - Parameters may need adjustment for different market regimes")
print(f"  - Consider ensemble approaches using multiple parameter sets")
print(f"  - Regular re-optimization may be necessary for live trading")

## Conclusion

This notebook demonstrated comprehensive strategy optimization techniques:

1. **Grid Search**: Systematic exploration of parameter space
2. **Out-of-Sample Testing**: Validation on unseen data to avoid overfitting
3. **Sensitivity Analysis**: Understanding parameter robustness
4. **Monte Carlo Optimization**: Random sampling for broader exploration

### Key Takeaways:

- **Avoid Overfitting**: Always validate on out-of-sample data
- **Parameter Stability**: Choose parameters whose performance stays stable across neighboring values
- **Multiple Objectives**: Consider various performance metrics, not just returns
- **Market Regime Awareness**: Parameters may need adjustment for different market conditions
- **Transaction Costs**: Include realistic costs in optimization
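
As one possible way to act on the "Multiple Objectives" point, the sketch below ranks candidates separately by Sharpe ratio and by the simplified Calmar ratio used in the grid search (total return over absolute max drawdown), then averages the ranks. The candidate numbers are made up, and rank averaging is just one simple aggregation choice among many.

```python
def rank_descending(values):
    """Rank 1 = largest value; ties keep their original order."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    return ranks


# Made-up backtest summaries for three parameter sets.
candidates = [
    {"name": "A", "sharpe_ratio": 1.4, "total_return": 0.30, "max_drawdown": -0.30},
    {"name": "B", "sharpe_ratio": 1.2, "total_return": 0.28, "max_drawdown": -0.10},
    {"name": "C", "sharpe_ratio": 0.9, "total_return": 0.15, "max_drawdown": -0.12},
]

for c in candidates:
    drawdown = abs(c["max_drawdown"])
    c["calmar_ratio"] = c["total_return"] / drawdown if drawdown != 0 else 0.0

sharpe_rank = rank_descending([c["sharpe_ratio"] for c in candidates])
calmar_rank = rank_descending([c["calmar_ratio"] for c in candidates])
for c, s, k in zip(candidates, sharpe_rank, calmar_rank):
    c["mean_rank"] = (s + k) / 2  # lower is better

best = min(candidates, key=lambda c: c["mean_rank"])
print(best["name"])  # B: second on Sharpe, first on Calmar
```

Candidate A wins on Sharpe alone, but its deep drawdown pushes it behind B once both metrics count.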

### Next Steps:

1. **Walk-Forward Analysis**: Test parameter stability over time
2. **Multi-Asset Optimization**: Extend to portfolio-level optimization
3. **Machine Learning**: Use ML techniques for parameter selection
4. **Risk-Adjusted Optimization**: Incorporate risk constraints
5. **Live Trading**: Implement automated re-optimization for live strategies
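
For the walk-forward step, a minimal sketch of how rolling train/test windows could be generated is shown below. It works on integer positions only; the window sizes are arbitrary assumptions, and `walk_forward_windows` is a hypothetical helper rather than something provided by the notebook's modules.

```python
def walk_forward_windows(n_obs, train_size, test_size):
    """Yield (train_start, train_end, test_end) index triples; ranges are half-open."""
    start = 0
    while start + train_size + test_size <= n_obs:
        train_end = start + train_size
        yield start, train_end, train_end + test_size
        start += test_size  # roll forward by one test window


windows = list(walk_forward_windows(n_obs=1000, train_size=500, test_size=125))
for train_start, train_end, test_end in windows:
    print(f"train [{train_start}:{train_end})  test [{train_end}:{test_end})")
```

Each triple could be used to slice the price data with `iloc`, re-run the optimization on the training slice, and evaluate the chosen parameters on the test slice that immediately follows it.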