# 🎯 Hockey Prediction System - Strategy Optimization Analysis

**Deep dive into parameter optimization and strategy comparison**

---

## 📊 Overview

This notebook provides comprehensive strategy optimization analysis:
- ✅ **Parameter Sensitivity Analysis** - ROI impact of each parameter
- 🔥 **Interactive Heatmaps** - Multi-dimensional parameter relationships
- 🏆 **Strategy Comparison** - Head-to-head performance analysis
- ⚡ **A/B Testing Framework** - Statistical significance testing
- 🎯 **Optimization Insights** - Actionable parameter recommendations

**Location:** `notebooks/analysis/strategy_optimization.ipynb`

---

## 🔧 Setup & Configuration

In [None]:
# Core libraries
import pandas as pd
import numpy as np
import json
import glob
import os
from datetime import datetime, timedelta
from pathlib import Path
import warnings
warnings.filterwarnings('ignore')

# Visualization
import plotly.graph_objects as go
import plotly.express as px
from plotly.subplots import make_subplots
import plotly.figure_factory as ff

# Statistics and analysis
from scipy import stats
from scipy.stats import pearsonr, spearmanr
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from itertools import combinations

# Configuration
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)
pd.set_option('display.max_colwidth', 50)

# Plotly theme
import plotly.io as pio
pio.templates.default = "plotly_white"

print("📚 Libraries imported successfully!")
print(f"📁 Working directory: {os.getcwd()}")
print(f"📅 Analysis timestamp: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")

## 📂 Enhanced Data Discovery & Loading

In [None]:
# Define paths
RESULTS_DIR = Path("../../models/experiments")
CHARTS_EXPORT_DIR = Path("../../models/experiments/charts")
# parents=True so the call also works when models/experiments does not exist yet
CHARTS_EXPORT_DIR.mkdir(parents=True, exist_ok=True)

print(f"📁 Results directory: {RESULTS_DIR.absolute()}")
print(f"📊 Charts export directory: {CHARTS_EXPORT_DIR.absolute()}")

# Enhanced optimization file discovery
def discover_optimization_files():
    """Discover and categorize optimization result files"""
    
    files = {
        'detailed_csv': [],
        'best_csv': [],
        'results_json': [],
        'analysis_json': []
    }
    
    patterns = {
        'detailed_csv': ['*optimization*detailed*.csv', '*parameter*detailed*.csv'],
        'best_csv': ['*optimization*best*.csv', '*parameter*best*.csv'], 
        'results_json': ['*optimization*results*.json', '*parameter*results*.json'],
        'analysis_json': ['*performance_analysis*.json']
    }
    
    for file_type, pattern_list in patterns.items():
        for pattern in pattern_list:
            found_files = list(RESULTS_DIR.glob(pattern))
            files[file_type].extend(found_files)
        
        # Remove duplicates and sort by modification time
        files[file_type] = list(set(files[file_type]))
        files[file_type] = sorted(files[file_type], key=lambda x: x.stat().st_mtime, reverse=True)
    
    return files

# Safe loading functions
def safe_load_optimization_csv(file_path):
    """Safely load optimization CSV with enhanced error handling"""
    try:
        df = pd.read_csv(file_path)
        
        # Convert numeric columns
        numeric_cols = ['roi', 'max_drawdown', 'sharpe_ratio', 'total_bets', 'win_rate', 
                       'edge_threshold', 'min_odds', 'stake_size']
        
        for col in numeric_cols:
            if col in df.columns:
                df[col] = pd.to_numeric(df[col], errors='coerce')
        
        # Clean invalid data
        df = df.dropna(subset=['roi'])  # ROI is essential
        
        return df
    except Exception as e:
        print(f"⚠️ Error loading {file_path.name}: {e}")
        return None

def safe_load_json(file_path):
    """Safely load JSON with error handling"""
    try:
        with open(file_path, 'r', encoding='utf-8') as f:
            return json.load(f)
    except Exception as e:
        print(f"⚠️ Error loading {file_path.name}: {e}")
        return None

# Discover optimization files
optimization_files = discover_optimization_files()

print("\n🔍 Available optimization files:")
for file_type, file_list in optimization_files.items():
    print(f"  📄 {file_type}: {len(file_list)} files")
    for i, file_path in enumerate(file_list[:2]):
        size_mb = file_path.stat().st_size / (1024*1024)
        mod_time = datetime.fromtimestamp(file_path.stat().st_mtime)
        print(f"    [{i+1}] {file_path.name} ({size_mb:.1f}MB, {mod_time.strftime('%Y-%m-%d %H:%M')})")
    if len(file_list) > 2:
        print(f"    ... and {len(file_list) - 2} more")

In [None]:
# Enhanced optimization data loading with intelligent source selection
def load_optimization_data():
    """Load optimization data from best available source"""
    
    print("📊 Loading optimization data with priority: CSV detailed > JSON results")
    
    # Priority 1: CSV detailed (fastest and most complete)
    if optimization_files['detailed_csv']:
        csv_path = optimization_files['detailed_csv'][0]
        print(f"🎯 Loading CSV detailed data: {csv_path.name}")
        
        df = safe_load_optimization_csv(csv_path)
        if df is not None and len(df) > 0:
            print(f"✅ Loaded {len(df)} optimization records from CSV detailed")
            return df, "CSV_detailed", csv_path.name
    
    # Priority 2: JSON results (fallback)
    if optimization_files['results_json']:
        json_path = optimization_files['results_json'][0]
        print(f"🔄 Fallback to JSON results: {json_path.name}")
        
        data = safe_load_json(json_path)
        if data and data.get('optimization_results'):
            try:
                # Convert optimization results to DataFrame
                opt_records = []
                for result in data['optimization_results']:
                    if 'error' not in result and result.get('performance', {}).get('roi') is not None:
                        record = {
                            **result.get('parameters', {}),
                            **result.get('performance', {}),
                            **result.get('statistics', {})
                        }
                        opt_records.append(record)
                
                if opt_records:
                    df = pd.DataFrame(opt_records)
                    
                    # Convert numeric columns
                    numeric_cols = ['roi', 'max_drawdown', 'sharpe_ratio', 'total_bets', 'win_rate',
                                   'edge_threshold', 'min_odds', 'stake_size']
                    for col in numeric_cols:
                        if col in df.columns:
                            df[col] = pd.to_numeric(df[col], errors='coerce')
                    
                    print(f"✅ Loaded {len(df)} optimization records from JSON")
                    return df, "JSON_converted", json_path.name
                    
            except Exception as e:
                print(f"⚠️ Error converting JSON to DataFrame: {e}")
    
    print("❌ No valid optimization data available")
    return None, None, None

# Load optimization data
print("\n📥 Loading optimization data...")
opt_df, opt_source, opt_filename = load_optimization_data()

# Data quality assessment
if opt_df is not None:
    print("\n📊 Optimization Data Quality Assessment:")
    print("="*50)
    
    print(f"✅ Source: {opt_source}")
    print(f"✅ File: {opt_filename}")
    print(f"✅ Records loaded: {len(opt_df):,}")
    print(f"✅ Columns available: {len(opt_df.columns)}")
    
    # Check key columns
    key_columns = ['roi', 'edge_threshold', 'min_odds', 'stake_method', 'ev_method']
    missing_cols = [col for col in key_columns if col not in opt_df.columns]
    present_cols = [col for col in key_columns if col in opt_df.columns]
    
    print(f"✅ Key columns present: {', '.join(present_cols)}")
    if missing_cols:
        print(f"⚠️ Missing columns: {', '.join(missing_cols)}")
    
    # Data completeness
    if 'roi' in opt_df.columns:
        valid_roi = opt_df['roi'].notna().sum()
        roi_completeness = valid_roi / len(opt_df)
        print(f"✅ ROI data completeness: {roi_completeness:.1%} ({valid_roi:,}/{len(opt_df):,})")
        
        # ROI distribution summary
        roi_stats = opt_df['roi'].describe()
        profitable_pct = (opt_df['roi'] > 0).mean()
        print(f"✅ Profitable strategies: {profitable_pct:.1%}")
        print(f"✅ ROI range: {roi_stats['min']:.2%} to {roi_stats['max']:.2%}")
        print(f"✅ Mean ROI: {roi_stats['mean']:.2%}")
    
    print(f"\n🎯 Ready for strategy optimization analysis!")
    
else:
    print("❌ Cannot proceed with analysis - no optimization data available")
    print("\nPlease ensure you have optimization results in one of these formats:")
    print("  - CSV: optimization_*_detailed_*.csv")
    print("  - JSON: optimization_*_results_*.json")
    print(f"  - Location: {RESULTS_DIR.absolute()}")
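
The JSON fallback above merges the `parameters`, `performance` and `statistics` dictionaries of each result into one flat record. As a small sketch of that step, the example below runs the same filter and merge on a made-up payload (the values, and the assumption that `statistics` may repeat a key from `performance`, are hypothetical). It shows that failed runs and runs without an ROI are dropped, and that on a key collision the later dictionary wins.

```python
import pandas as pd

# Hypothetical payload mirroring the keys the loader reads; the values are made up.
payload = {
    "optimization_results": [
        {"parameters": {"edge_threshold": 0.05, "min_odds": 1.8},
         "performance": {"roi": 0.031, "total_bets": 120},
         "statistics": {"total_bets": 118}},
        {"parameters": {"edge_threshold": 0.07}, "error": "no bets placed"},
        {"parameters": {"edge_threshold": 0.03}, "performance": {"roi": None}},
    ]
}

records = []
for result in payload["optimization_results"]:
    if "error" in result or result.get("performance", {}).get("roi") is None:
        continue  # skip failed runs and runs without an ROI
    records.append({
        **result.get("parameters", {}),
        **result.get("performance", {}),
        **result.get("statistics", {}),  # later dicts win on key collisions
    })

df = pd.DataFrame(records)
print(df)
```

If the real files do repeat a metric under both `performance` and `statistics`, it may be worth prefixing one group of keys so that neither value is silently overwritten.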

## 🎯 Parameter Sensitivity Analysis

In [None]:
# Comprehensive parameter sensitivity analysis
if opt_df is not None and 'roi' in opt_df.columns:
    
    print("🎯 Parameter Sensitivity Analysis")
    print("="*60)
    
    # Detect available parameters for analysis
    numeric_params = []
    categorical_params = []
    
    # Standard parameter mappings
    param_candidates = {
        'edge_threshold': ['edge_threshold', 'edge_thresh', 'threshold'],
        'min_odds': ['min_odds', 'minimum_odds', 'odds_min'],
        'stake_size': ['stake_size', 'stake', 'bet_size'],
        'stake_method': ['stake_method', 'staking_method', 'stake_type'],
        'ev_method': ['ev_method', 'ev_calculation', 'expected_value_method']
    }
    
    # Find available parameters
    available_params = {}
    for param_name, possible_names in param_candidates.items():
        for col_name in possible_names:
            if col_name in opt_df.columns:
                available_params[param_name] = col_name
                # Classify by dtype; anything non-numeric (object, string, category) is categorical
                if pd.api.types.is_numeric_dtype(opt_df[col_name]):
                    numeric_params.append(param_name)
                else:
                    categorical_params.append(param_name)
                break
    
    print(f"📊 Found parameters:")
    print(f"   Numeric: {', '.join(numeric_params)}")
    print(f"   Categorical: {', '.join(categorical_params)}")
    
    if len(available_params) == 0:
        print("⚠️ No standard parameters found. Available columns:")
        print(f"   {', '.join(opt_df.columns[:15])}...")
    else:
        # Correlation analysis for numeric parameters
        correlations = {}
        
        for param_name in numeric_params:
            col_name = available_params[param_name]
            param_data = opt_df[[col_name, 'roi']].dropna()
            
            if len(param_data) > 10:  # Minimum data for meaningful correlation
                try:
                    # Pearson correlation
                    pearson_r, pearson_p = pearsonr(param_data[col_name], param_data['roi'])
                    
                    # Spearman correlation (rank-based, more robust)
                    spearman_r, spearman_p = spearmanr(param_data[col_name], param_data['roi'])
                    
                    correlations[param_name] = {
                        'pearson': {'correlation': pearson_r, 'p_value': pearson_p},
                        'spearman': {'correlation': spearman_r, 'p_value': spearman_p},
                        'data_points': len(param_data),
                        'column_name': col_name
                    }
                    
                except Exception as e:
                    print(f"⚠️ Error calculating correlation for {param_name}: {e}")
        
        # Display correlation results
        if correlations:
            print(f"\n📈 Parameter-ROI Correlations:")
            print("-" * 70)
            print(f"{'Parameter':<15} {'Pearson':<10} {'P-value':<10} {'Spearman':<10} {'P-value':<10} {'N':<8}")
            print("-" * 70)
            
            # Sort by absolute Pearson correlation
            sorted_correlations = sorted(correlations.items(), 
                                       key=lambda x: abs(x[1]['pearson']['correlation']), 
                                       reverse=True)
            
            for param_name, corr_data in sorted_correlations:
                pearson_r = corr_data['pearson']['correlation']
                pearson_p = corr_data['pearson']['p_value']
                spearman_r = corr_data['spearman']['correlation']
                spearman_p = corr_data['spearman']['p_value']
                n_points = corr_data['data_points']
                
                print(f"{param_name:<15} {pearson_r:<10.3f} {pearson_p:<10.3f} {spearman_r:<10.3f} {spearman_p:<10.3f} {n_points:<8}")
                
                # Interpretation
                if abs(pearson_r) > 0.3 and pearson_p < 0.05:
                    direction = "positively" if pearson_r > 0 else "negatively"
                    strength = "strongly" if abs(pearson_r) > 0.5 else "moderately"
                    print(f"   ✅ {param_name} is {strength} {direction} correlated with ROI")
                elif abs(pearson_r) > 0.1 and pearson_p < 0.1:
                    direction = "positively" if pearson_r > 0 else "negatively"
                    print(f"   📊 {param_name} shows weak {direction} correlation with ROI")
                else:
                    print(f"   ⚪ {param_name} shows no significant correlation with ROI")
        
        # Categorical parameter analysis
        if categorical_params:
            print(f"\n📊 Categorical Parameter Analysis:")
            
            for param_name in categorical_params:
                col_name = available_params[param_name]
                param_analysis = opt_df.groupby(col_name)['roi'].agg([
                    'count', 'mean', 'std', 'min', 'max'
                ]).round(4)
                
                print(f"\n🔍 {param_name} ({col_name}):")
                print(param_analysis.to_string())
                
                # Find best performing category
                best_category = param_analysis['mean'].idxmax()
                best_roi = param_analysis.loc[best_category, 'mean']
                best_count = param_analysis.loc[best_category, 'count']
                
                print(f"   🏆 Best performing: {best_category} (ROI: {best_roi:.2%}, N: {best_count})")
                
                # Statistical test (ANOVA) if multiple categories
                if len(param_analysis) > 1:
                    try:
                        groups = [group['roi'].values for name, group in opt_df.groupby(col_name) if len(group) > 2]
                        if len(groups) > 1:
                            f_stat, p_value = stats.f_oneway(*groups)
                            print(f"   📈 ANOVA F-statistic: {f_stat:.3f}, p-value: {p_value:.3f}")
                            if p_value < 0.05:
                                print(f"   ✅ Significant difference between {param_name} categories")
                            else:
                                print(f"   ⚪ No significant difference between {param_name} categories")
                    except Exception as e:
                        print(f"   ⚠️ ANOVA test failed: {e}")
else:
    print("❌ Cannot perform parameter sensitivity analysis - no optimization data or ROI column")
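
The cell above reports both Pearson and Spearman coefficients. As a rough illustration of why the two can disagree, the sketch below uses synthetic data (the `edge` and `roi` arrays are invented, not taken from the optimization results) and computes Spearman as the Pearson correlation of the ranks. The helper names `pearson` and `spearman` are hypothetical and only wrap numpy and pandas calls.

```python
import numpy as np
import pandas as pd

def pearson(x, y):
    """Pearson correlation coefficient via numpy."""
    return float(np.corrcoef(x, y)[0, 1])

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the (average) ranks."""
    return pearson(pd.Series(x).rank(), pd.Series(y).rank())

# Synthetic, purely illustrative data: ROI grows monotonically but
# non-linearly with the edge threshold.
edge = np.linspace(0.01, 0.10, 20)
roi = np.exp(60 * edge) / 1000.0

r_pearson = pearson(edge, roi)
r_spearman = spearman(edge, roi)
print(f"Pearson:  {r_pearson:.3f}")
print(f"Spearman: {r_spearman:.3f}")
```

A monotone but curved relationship gives a perfect rank correlation while the linear coefficient stays below 1, so a large gap between the two columns in the table above may hint at a non-linear parameter effect.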

## 🔥 Interactive Parameter Heatmaps

In [None]:
# Create interactive parameter heatmaps
if opt_df is not None and len(numeric_params) >= 2:
    
    print("🔥 Creating Parameter Heatmaps")
    print("="*50)
    
    # Create heatmaps for all pairs of numeric parameters
    param_pairs = list(combinations(numeric_params, 2))
    
    print(f"📊 Generating {len(param_pairs)} parameter combination heatmaps...")
    
    for i, (param1, param2) in enumerate(param_pairs, 1):
        
        col1 = available_params[param1]
        col2 = available_params[param2]
        
        print(f"\n📈 [{i}/{len(param_pairs)}] Creating heatmap: {param1} vs {param2}")
        
        try:
            # Prepare data
            # .copy() so the bin columns added below are written to an independent frame
            heatmap_data = opt_df[[col1, col2, 'roi']].dropna().copy()
            
            if len(heatmap_data) < 10:
                print(f"   ⚠️ Insufficient data for {param1} vs {param2} heatmap (only {len(heatmap_data)} points)")
                continue
            
            # Create bins for both parameters
            n_bins = min(10, int(np.sqrt(len(heatmap_data))))
            
            # Bin the data
            try:
                heatmap_data[f'{param1}_bin'] = pd.cut(heatmap_data[col1], bins=n_bins, precision=3)
                heatmap_data[f'{param2}_bin'] = pd.cut(heatmap_data[col2], bins=n_bins, precision=3)
            except ValueError:
                # Fallback to quantile-based binning
                heatmap_data[f'{param1}_bin'] = pd.qcut(heatmap_data[col1], q=n_bins, duplicates='drop', precision=3)
                heatmap_data[f'{param2}_bin'] = pd.qcut(heatmap_data[col2], q=n_bins, duplicates='drop', precision=3)
            
            # Create pivot table
            # observed=True skips bin pairs that contain no rows
            pivot_table = heatmap_data.groupby(
                [f'{param1}_bin', f'{param2}_bin'], observed=True
            )['roi'].agg(['mean', 'count']).reset_index()
            
            # Filter out bins with too few data points
            pivot_table = pivot_table[pivot_table['count'] >= 3]
            
            if len(pivot_table) == 0:
                print(f"   ⚠️ No sufficient data bins for {param1} vs {param2} heatmap")
                continue
            
            # Create pivot for heatmap
            pivot_for_heatmap = pivot_table.pivot(index=f'{param1}_bin', 
                                                 columns=f'{param2}_bin', 
                                                 values='mean')
            
            # Create interactive heatmap
            fig_heatmap = px.imshow(
                pivot_for_heatmap.values,
                x=[str(col) for col in pivot_for_heatmap.columns],
                y=[str(idx) for idx in pivot_for_heatmap.index],
                color_continuous_scale='RdYlGn',
                aspect='auto',
                title=f"🔥 ROI Heatmap: {param1.replace('_', ' ').title()} vs {param2.replace('_', ' ').title()}",
                labels={'color': 'ROI', 'x': param2.replace('_', ' ').title(), 'y': param1.replace('_', ' ').title()}
            )
            
            # Add text annotations for ROI values
            # (row_idx/col_idx avoid shadowing the outer loop counter `i`)
            text_threshold = pivot_for_heatmap.abs().quantile(0.7).max()
            for row_idx in range(len(pivot_for_heatmap.index)):
                for col_idx in range(len(pivot_for_heatmap.columns)):
                    value = pivot_for_heatmap.iloc[row_idx, col_idx]
                    if not pd.isna(value):
                        fig_heatmap.add_annotation(
                            x=col_idx, y=row_idx,
                            text=f"{value:.1%}",
                            showarrow=False,
                            font=dict(color="white" if abs(value) > text_threshold else "black", size=10)
                        )
            
            fig_heatmap.update_layout(
                height=600,
                xaxis_title=param2.replace('_', ' ').title(),
                yaxis_title=param1.replace('_', ' ').title()
            )
            
            fig_heatmap.show()
            
            # Save heatmap
            heatmap_filename = f"heatmap_{param1}_vs_{param2}.html"
            fig_heatmap.write_html(CHARTS_EXPORT_DIR / heatmap_filename)
            print(f"   💾 Heatmap saved: {heatmap_filename}")
            
            # Find best performing combination
            best_combo_idx = pivot_table['mean'].idxmax()
            best_combo = pivot_table.loc[best_combo_idx]
            
            print(f"   🏆 Best combination:")
            print(f"      {param1}: {best_combo[f'{param1}_bin']}")
            print(f"      {param2}: {best_combo[f'{param2}_bin']}")
            print(f"      ROI: {best_combo['mean']:.2%} (N: {best_combo['count']})")
            
        except Exception as e:
            print(f"   ⚠️ Error creating heatmap for {param1} vs {param2}: {e}")
    
    print(f"\n✅ Parameter heatmap analysis completed!")
    
elif opt_df is not None:
    print("⚠️ Need at least 2 numeric parameters for heatmap analysis")
    print(f"Available numeric parameters: {', '.join(numeric_params) if numeric_params else 'None'}")
else:
    print("❌ Cannot create heatmaps - no optimization data available")
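
The heatmap cell bins two numeric parameters, averages ROI per bin pair, drops sparse cells and pivots the result into a matrix. The sketch below reproduces only that bin-and-pivot step on synthetic data, without plotting. The random grid, the ROI formula and the names `grid`, `cells` and `heat` are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic grid of parameter combinations (illustrative values only).
rng = np.random.default_rng(0)
grid = pd.DataFrame({
    "edge_threshold": rng.uniform(0.01, 0.10, 400),
    "min_odds": rng.uniform(1.5, 3.0, 400),
})
grid["roi"] = (
    0.5 * grid["edge_threshold"]
    - 0.01 * grid["min_odds"]
    + rng.normal(0, 0.01, 400)
)

n_bins = 4
binned = grid.assign(
    edge_bin=pd.cut(grid["edge_threshold"], bins=n_bins),
    odds_bin=pd.cut(grid["min_odds"], bins=n_bins),
)

# observed=True keeps only bin pairs that actually contain rows.
cells = (
    binned.groupby(["edge_bin", "odds_bin"], observed=True)["roi"]
    .agg(["mean", "count"])
    .reset_index()
)
cells = cells[cells["count"] >= 3]  # same minimum-count filter as the notebook

# Rows are edge_threshold bins, columns are min_odds bins, values are mean ROI.
heat = cells.pivot(index="edge_bin", columns="odds_bin", values="mean")
print(heat.round(3))
```

Equal-width bins from `pd.cut` can leave some cells nearly empty when a parameter was sampled unevenly, which is presumably why the notebook falls back to `pd.qcut` and filters on the per-cell count.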

## 🏆 Strategy Comparison & Ranking

In [None]:
# Comprehensive strategy comparison
if opt_df is not None and 'roi' in opt_df.columns:
    
    print("🏆 Strategy Comparison & Ranking")
    print("="*60)
    
    # Define multiple ranking criteria
    ranking_criteria = {
        'roi': {'ascending': False, 'label': 'Highest ROI'},
        'sharpe_ratio': {'ascending': False, 'label': 'Best Risk-Adjusted Return'},
        'total_bets': {'ascending': False, 'label': 'Most Active Strategy'},
        'win_rate': {'ascending': False, 'label': 'Highest Win Rate'}
    }
    
    # Filter to only profitable strategies
    profitable_strategies = opt_df[opt_df['roi'] > 0].copy()
    
    if len(profitable_strategies) == 0:
        print("⚠️ No profitable strategies found - analyzing all strategies")
        strategy_pool = opt_df.copy()
        pool_description = "all strategies"
    else:
        strategy_pool = profitable_strategies.copy()
        pool_description = "profitable strategies"
        print(f"📊 Analyzing {len(strategy_pool)} profitable strategies out of {len(opt_df)} total")
    
    # Create comprehensive ranking analysis
    strategy_rankings = {}
    
    for metric, criteria in ranking_criteria.items():
        if metric in strategy_pool.columns:
            
            # Clean data for this metric
            metric_data = strategy_pool.dropna(subset=[metric])
            
            if len(metric_data) > 0:
                top_strategies = metric_data.nlargest(10, metric) if not criteria['ascending'] else metric_data.nsmallest(10, metric)
                
                strategy_rankings[metric] = {
                    'label': criteria['label'],
                    'top_strategies': top_strategies,
                    'metric_stats': metric_data[metric].describe(),
                    'data_points': len(metric_data)
                }
                
                print(f"\n🎯 {criteria['label']} (Top 5 {pool_description}):")
                print("-" * 80)
                
                # Display top strategies
                display_cols = [metric]
                for col in ['roi', 'max_drawdown', 'total_bets', 'win_rate', 'edge_threshold', 'min_odds']:
                    if col in top_strategies.columns and col != metric:
                        display_cols.append(col)
                
                top_5 = top_strategies.head(5)[display_cols]
                
                # Format for display
                display_df = top_5.copy()
                for col in ['roi', 'win_rate', 'max_drawdown']:
                    if col in display_df.columns:
                        display_df[col] = display_df[col].apply(lambda x: f"{x:.2%}" if pd.notna(x) else "N/A")
                
                for col in ['edge_threshold', 'min_odds']:
                    if col in display_df.columns:
                        display_df[col] = display_df[col].apply(lambda x: f"{x:.3f}" if pd.notna(x) else "N/A")
                
                print(display_df.to_string(index=False))
                
                # Best strategy insight
                best_strategy = top_strategies.iloc[0]
                best_value = best_strategy[metric]
                
                # Ratio-like metrics print as percentages, everything else as plain numbers
                if metric in ['roi', 'win_rate', 'max_drawdown']:
                    print(f"   🏆 Best {metric}: {best_value:.2%}")
                else:
                    print(f"   🏆 Best {metric}: {best_value:.3f}")
                
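                # Strategy parameters (format only when a value is present;
                # applying a float format to the 'N/A' fallback would raise ValueError)
                if 'edge_threshold' in best_strategy:
                    edge_val = best_strategy.get('edge_threshold')
                    odds_val = best_strategy.get('min_odds')
                    edge_str = f"{edge_val:.3f}" if pd.notna(edge_val) else "N/A"
                    odds_str = f"{odds_val:.2f}" if pd.notna(odds_val) else "N/A"
                    print(f"   📊 Parameters: edge={edge_str}, odds={odds_str}")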
    
    # Cross-ranking analysis
    if len(strategy_rankings) > 1:
        print(f"\n🔄 Cross-Ranking Analysis:")
        print("-" * 50)
        
        # Find strategies that appear in multiple top 10 lists
        all_top_strategies = set()
        metric_appearances = {}
        
        for metric, ranking_data in strategy_rankings.items():
            top_indices = set(ranking_data['top_strategies'].index)
            all_top_strategies.update(top_indices)
            
            for idx in top_indices:
                if idx not in metric_appearances:
                    metric_appearances[idx] = []
                metric_appearances[idx].append(metric)
        
        # Find multi-metric winners
        multi_winners = {idx: metrics for idx, metrics in metric_appearances.items() if len(metrics) > 1}
        
        if multi_winners:
            print(f"🏅 Strategies appearing in multiple top-10 lists:")
            
            for idx, metrics in sorted(multi_winners.items(), key=lambda x: len(x[1]), reverse=True)[:5]:
                strategy = strategy_pool.loc[idx]
                metrics_str = ', '.join(metrics)
                print(f"   Strategy {idx}: Top-10 in {len(metrics)} metrics ({metrics_str})")
                print(f"      ROI: {strategy.get('roi', 0):.2%}, Total Bets: {strategy.get('total_bets', 0)}")
        else:
            print(f"   No strategies appear in multiple top-10 lists")
    
else:
    print("❌ Cannot perform strategy comparison - no optimization data or ROI column")
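
The cross-ranking step looks for strategies that land in the top list of more than one metric. A compact sketch of that idea, using a hypothetical helper `multi_metric_winners` and five invented strategies, might look like this:

```python
import pandas as pd

def multi_metric_winners(df, metrics, top_n=3):
    """Map row index -> list of metrics for which the row is in the top_n."""
    appearances = {}
    for metric in metrics:
        for idx in df[metric].dropna().nlargest(top_n).index:
            appearances.setdefault(idx, []).append(metric)
    # Keep only rows that rank highly on more than one metric.
    return {idx: hits for idx, hits in appearances.items() if len(hits) > 1}

# Hypothetical strategies, for illustration only.
strategies = pd.DataFrame(
    {
        "roi": [0.08, 0.05, 0.02, 0.01, -0.03],
        "sharpe_ratio": [0.4, 1.2, 1.1, 0.2, 0.1],
        "win_rate": [0.48, 0.55, 0.51, 0.57, 0.40],
    },
    index=["A", "B", "C", "D", "E"],
)

winners = multi_metric_winners(
    strategies, ["roi", "sharpe_ratio", "win_rate"], top_n=2
)
print(winners)
```

Here strategy `B` is never the single best ROI, yet it is the only one in the top two on all three metrics, which is the kind of balanced candidate the cross-ranking is meant to surface.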

## ⚡ A/B Testing Framework

In [None]:
# A/B Testing Framework for strategy comparison
if opt_df is not None and len(categorical_params) > 0:
    
    print("⚡ A/B Testing Framework")
    print("="*50)
    
    # Test each categorical parameter
    for param_name in categorical_params:
        col_name = available_params[param_name]
        
        print(f"\n🧪 A/B Testing: {param_name} ({col_name})")
        print("-" * 60)
        
        # Get unique categories
        categories = opt_df[col_name].value_counts()
        
        if len(categories) < 2:
            print(f"   ⚠️ Only one category found for {param_name}, skipping A/B test")
            continue
        
        print(f"   📊 Categories found: {', '.join([f'{cat}({count})' for cat, count in categories.items()])}")
        
        # Perform pairwise comparisons for all category pairs
        category_list = list(categories.index)
        
        for i, cat_a in enumerate(category_list):
            for cat_b in category_list[i+1:]:
                
                # Get data for both categories
                group_a = opt_df[opt_df[col_name] == cat_a]['roi'].dropna()
                group_b = opt_df[opt_df[col_name] == cat_b]['roi'].dropna()
                
                if len(group_a) < 5 or len(group_b) < 5:
                    print(f"   ⚠️ Insufficient data for {cat_a} vs {cat_b} (need ≥5 samples each)")
                    continue
                
                print(f"\n   🔬 Test: {cat_a} vs {cat_b}")
                
                # Descriptive statistics
                stats_a = {
                    'mean': group_a.mean(),
                    'std': group_a.std(),
                    'median': group_a.median(),
                    'count': len(group_a)
                }
                
                stats_b = {
                    'mean': group_b.mean(),
                    'std': group_b.std(),
                    'median': group_b.median(),
                    'count': len(group_b)
                }
                
                print(f"      {cat_a:>15}: μ={stats_a['mean']:.3%}, σ={stats_a['std']:.3%}, median={stats_a['median']:.3%}, n={stats_a['count']}")
                print(f"      {cat_b:>15}: μ={stats_b['mean']:.3%}, σ={stats_b['std']:.3%}, median={stats_b['median']:.3%}, n={stats_b['count']}")
                
                # Effect size (Cohen's d)
                pooled_std = np.sqrt(((stats_a['count']-1)*stats_a['std']**2 + (stats_b['count']-1)*stats_b['std']**2) / (stats_a['count'] + stats_b['count'] - 2))
                cohens_d = (stats_a['mean'] - stats_b['mean']) / pooled_std if pooled_std > 0 else 0
                
                # Statistical tests
                try:
                    # T-test (parametric)
                    t_stat, t_p_value = stats.ttest_ind(group_a, group_b)
                    
                    # Mann-Whitney U test (non-parametric)
                    u_stat, u_p_value = stats.mannwhitneyu(group_a, group_b, alternative='two-sided')
                    
                    print(f"      T-test: t={t_stat:.3f}, p={t_p_value:.4f}")
                    print(f"      Mann-Whitney U: U={u_stat:.1f}, p={u_p_value:.4f}")
                    print(f"      Effect size (Cohen's d): {cohens_d:.3f}")
                    
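                    # Interpretation
                    # Note: using the smaller of the two p-values is lenient. It flags a
                    # difference when either test passes, which raises the false-positive rate.
                    significant = min(t_p_value, u_p_value) < 0.05
                    practical = abs(cohens_d) > 0.2  # Small effect size threshold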
                    
                    if significant and practical:
                        winner = cat_a if stats_a['mean'] > stats_b['mean'] else cat_b
                        diff_pct = abs(stats_a['mean'] - stats_b['mean'])
                        print(f"      ✅ SIGNIFICANT DIFFERENCE: {winner} is better by {diff_pct:.2%}")
                        
                        # Effect size interpretation
                        if abs(cohens_d) < 0.5:
                            print(f"         Small effect size")
                        elif abs(cohens_d) < 0.8:
                            print(f"         Medium effect size")
                        else:
                            print(f"         Large effect size")
                            
                    elif significant:
                        print(f"      📊 Statistically significant but small practical effect")
                    elif practical:
                        print(f"      📊 Practical difference but not statistically significant (need more data)")
                    else:
                        print(f"      ⚪ No significant difference between {cat_a} and {cat_b}")
                    
                except Exception as e:
                    print(f"      ⚠️ Statistical test failed: {e}")
        
        # Overall recommendation for this parameter
        print(f"\n   💡 {param_name} Recommendation:")
        overall_analysis = opt_df.groupby(col_name)['roi'].agg(['mean', 'count', 'std']).sort_values('mean', ascending=False)
        
        if len(overall_analysis) > 0:
            best_category = overall_analysis.index[0]
            best_roi = overall_analysis.loc[best_category, 'mean']
            best_count = overall_analysis.loc[best_category, 'count']
            
            print(f"      🏆 Recommended: {best_category}")
            print(f"      📊 Average ROI: {best_roi:.2%} (based on {best_count} strategies)")
            
            # Confidence assessment
            if best_count >= 20:
                print(f"      ✅ High confidence (≥20 data points)")
            elif best_count >= 10:
                print(f"      📊 Medium confidence (10-19 data points)")
            else:
                print(f"      ⚠️ Low confidence (<10 data points)")
    
    print(f"\n✅ A/B Testing framework analysis completed!")
    
else:
    print("❌ Cannot perform A/B testing - no optimization data or categorical parameters")
    if opt_df is not None:
        print(f"Available categorical parameters: {', '.join(categorical_params) if categorical_params else 'None'}")
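
The A/B cell runs one test per pair of categories and compares each p-value to 0.05. With several categories the number of pairwise tests grows quickly, so some "significant" results are expected by chance alone. The sketch below is a suggestion, not part of the original framework: it repeats the pooled-standard-deviation Cohen's d used above and adds a Holm step-down adjustment that could be applied to the collected p-values. The helpers `cohens_d` and `holm_adjust` and the sample numbers are hypothetical.

```python
import numpy as np

def cohens_d(a, b):
    """Cohen's d using the pooled sample standard deviation (ddof=1)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    pooled_var = (
        (len(a) - 1) * a.var(ddof=1) + (len(b) - 1) * b.var(ddof=1)
    ) / (len(a) + len(b) - 2)
    return (a.mean() - b.mean()) / np.sqrt(pooled_var) if pooled_var > 0 else 0.0

def holm_adjust(p_values):
    """Holm step-down adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, and so on;
        # the running maximum keeps the adjusted values monotone.
        running_max = max(running_max, (m - rank) * p_values[i])
        adjusted[i] = min(1.0, running_max)
    return adjusted

# Invented numbers, for illustration only.
d = cohens_d([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])
adj = holm_adjust([0.01, 0.04, 0.03])
print(f"Cohen's d: {d:.2f}")
print("Holm-adjusted p-values:", [round(p, 3) for p in adj])
```

In this toy case only the first comparison would remain below 0.05 after adjustment, whereas all three raw p-values pass the unadjusted threshold.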

## 🎯 Optimization Insights & Recommendations

In [None]:
# Generate comprehensive optimization insights and actionable recommendations
if opt_df is not None and not opt_df.empty:
    
    print("🎯 OPTIMIZATION INSIGHTS & RECOMMENDATIONS")
    print("="*70)
    
    # Collect key insights
    insights = []
    recommendations = []
    
    # 1. Overall Performance Insights
    total_strategies = len(opt_df)
    profitable_count = (opt_df['roi'] > 0).sum()
    profitable_rate = profitable_count / total_strategies
    
    best_roi = opt_df['roi'].max()
    worst_roi = opt_df['roi'].min()
    mean_roi = opt_df['roi'].mean()
    
    print(f"\n📊 OVERALL PERFORMANCE SUMMARY:")
    print(f"   Strategies tested: {total_strategies:,}")
    print(f"   Profitable strategies: {profitable_count:,} ({profitable_rate:.1%})")
    print(f"   ROI range: {worst_roi:.2%} to {best_roi:.2%}")
    print(f"   Mean ROI: {mean_roi:.2%}")
    
    if profitable_rate > 0.3:
        insights.append("✅ Strong optimization results: >30% of strategies are profitable")
        recommendations.append("Proceed with top strategies for live testing")
    elif profitable_rate > 0.1:
        insights.append("📊 Moderate optimization results: 10-30% of strategies are profitable")
        recommendations.append("Focus on top 5% of strategies only")
    else:
        insights.append("🔴 Poor optimization results: <10% of strategies are profitable")
        recommendations.append("Re-evaluate model and parameter ranges")
    
    # 2. Parameter-Specific Insights
    if correlations:
        print(f"\n📈 PARAMETER IMPACT ANALYSIS:")
        
        # Most impactful parameters
        impact_ranking = sorted(correlations.items(), 
                              key=lambda x: abs(x[1]['pearson']['correlation']), 
                              reverse=True)
        
        for i, (param, corr_data) in enumerate(impact_ranking[:3], 1):
            correlation = corr_data['pearson']['correlation']
            p_value = corr_data['pearson']['p_value']
            
            print(f"   {i}. {param}: correlation = {correlation:.3f}, p = {p_value:.4f}")
            
            if abs(correlation) > 0.3 and p_value < 0.05:
                direction = "increase" if correlation > 0 else "decrease"
                insights.append(f"🎯 {param} strongly impacts ROI - {direction} to improve performance")
            elif abs(correlation) > 0.1:
                insights.append(f"📊 {param} has moderate impact on ROI")
    
    # 3. Optimal Parameter Ranges
    if len(profitable_strategies) > 0:
        print(f"\n🎯 OPTIMAL PARAMETER RANGES (from profitable strategies):")
        
        for param_name in numeric_params:
            col_name = available_params[param_name]
            
            if col_name in profitable_strategies.columns:
                param_data = profitable_strategies[col_name].dropna()
                
                if len(param_data) > 0:
                    q25 = param_data.quantile(0.25)
                    q75 = param_data.quantile(0.75)
                    median = param_data.median()
                    
                    print(f"   {param_name}: {q25:.3f} - {q75:.3f} (median: {median:.3f})")
                    recommendations.append(f"Set {param_name} between {q25:.3f} and {q75:.3f}")
    
    # 4. Strategy Method Recommendations
    if categorical_params:
        print(f"\n🔧 STRATEGY METHOD RECOMMENDATIONS:")
        
        for param_name in categorical_params:
            col_name = available_params[param_name]
            
            method_performance = opt_df.groupby(col_name)['roi'].agg(['mean', 'count']).sort_values('mean', ascending=False)
            
            if len(method_performance) > 0:
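                best_method = method_performance.index[0]
                # Use distinct names so the overall best_roi (max ROI) from section 1
                # is not overwritten before the final assessment reads it
                method_roi = method_performance.loc[best_method, 'mean']
                method_count = method_performance.loc[best_method, 'count']
                
                print(f"   {param_name}: {best_method} (mean ROI: {method_roi:.2%}, N: {method_count})")
                
                if method_count >= 10: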
                    recommendations.append(f"Use {best_method} for {param_name}")
                else:
                    recommendations.append(f"Consider {best_method} for {param_name} (limited data)")
    
    # 5. Risk Assessment
    if 'max_drawdown' in opt_df.columns:
        drawdown_data = opt_df['max_drawdown'].dropna()
        
        if len(drawdown_data) > 0:
            avg_drawdown = drawdown_data.mean()
            max_drawdown = drawdown_data.max()
            
            print(f"\n⚠️ RISK ASSESSMENT:")
            print(f"   Average max drawdown: {avg_drawdown:.2%}")
            print(f"   Worst max drawdown: {max_drawdown:.2%}")
            
            if max_drawdown > 0.5:
                insights.append("🔴 High risk detected: some strategies have >50% drawdown")
                recommendations.append("Implement strict risk management (max 2% position size)")
            elif max_drawdown > 0.3:
                insights.append("⚠️ Moderate risk: some strategies have >30% drawdown")
                recommendations.append("Use conservative position sizing (max 5% risk per bet)")
            else:
                insights.append("✅ Acceptable risk levels: drawdowns under control")
    
    # 6. Implementation Priority
    if len(profitable_strategies) > 0:
        # Rank by ROI weighted by ln(sample size); assumes 100 bets when 'total_bets'
        # is missing. assign() returns a copy, so profitable_strategies is not mutated.
        top_implementation = profitable_strategies.assign(
            score=profitable_strategies['roi'] * np.log(profitable_strategies.get('total_bets', 100))
        ).nlargest(5, 'score')
        
        print(f"\n🚀 TOP 5 STRATEGIES FOR IMPLEMENTATION:")
        for i, (idx, strategy) in enumerate(top_implementation.iterrows(), 1):
            roi = strategy.get('roi', 0)
            bets = strategy.get('total_bets', 0)
            drawdown = strategy.get('max_drawdown', 0)
            
            print(f"   {i}. ROI: {roi:.2%}, Bets: {bets}, Drawdown: {drawdown:.2%}")
    
    # Summary of insights and recommendations
    print(f"\n💡 KEY INSIGHTS:")
    for i, insight in enumerate(insights, 1):
        print(f"   {i}. {insight}")
    
    print(f"\n🎯 ACTIONABLE RECOMMENDATIONS:")
    for i, recommendation in enumerate(recommendations, 1):
        print(f"   {i}. {recommendation}")
    
    # Generate final assessment
    print(f"\n🏁 FINAL ASSESSMENT:")
    
    if profitable_rate > 0.2 and best_roi > 0.1:
        assessment = "🟢 PROCEED WITH CONFIDENCE"
        action = "Ready for careful live implementation with top strategies"
    elif profitable_rate > 0.1 and best_roi > 0.05:
        assessment = "🟡 PROCEED WITH CAUTION"
        action = "Paper trade first, then implement with very conservative sizing"
    else:
        assessment = "🔴 DO NOT IMPLEMENT YET"
        action = "Further optimization and model improvement needed"
    
    print(f"   Status: {assessment}")
    print(f"   Action: {action}")
    
    print(f"\n📅 Analysis completed: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
    print(f"📁 Charts exported to: {CHARTS_EXPORT_DIR.absolute()}")
    
else:
    print("❌ Cannot generate insights - no optimization data available")
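
The implementation score in the cell above multiplies ROI by the natural log of the bet count, so a high ROI earned over very few bets is discounted relative to a somewhat lower ROI backed by many bets. A small illustration with made-up numbers follows; the strategy names and values are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical strategies: names, ROI values and bet counts are illustrative only
strategies = pd.DataFrame({
    "name": ["few_bets_high_roi", "many_bets_mid_roi", "many_bets_low_roi"],
    "roi": [0.30, 0.15, 0.05],
    "total_bets": [5, 400, 1000],
})

# Same weighting as the notebook cell: ROI scaled by ln(number of bets)
ranked = (
    strategies
    .assign(score=strategies["roi"] * np.log(strategies["total_bets"]))
    .sort_values("score", ascending=False)
    .reset_index(drop=True)
)

print(ranked[["name", "roi", "total_bets", "score"]])
```

Here the 15% ROI strategy with 400 bets ranks above the 30% ROI strategy with only 5 bets. Because the logarithm grows slowly, ROI still dominates once sample sizes are comparable, and a strategy with a single bet scores 0 because ln(1) = 0.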

## 📤 Export & Summary

In [None]:
# Create optimization summary dashboard
def create_optimization_dashboard():
    """Create comprehensive optimization dashboard"""
    
    if opt_df is None:
        print("❌ Cannot create dashboard - no optimization data")
        return None
    
    try:
        # Create 3x2 dashboard (3 rows, 2 columns)
        fig = make_subplots(
            rows=3, cols=2,
            subplot_titles=('ROI Distribution', 'Parameter Correlations',
                          'Top Strategies Performance', 'Risk vs Return',
                          'Strategy Method Comparison', 'Optimization Summary'),
            specs=[[{"type": "histogram"}, {"type": "bar"}],
                   [{"type": "scatter"}, {"type": "scatter"}],
                   [{"type": "bar"}, {"type": "table"}]]
        )
        
        # 1. ROI Distribution
        fig.add_trace(
            go.Histogram(x=opt_df['roi'], name='ROI Distribution', nbinsx=30, showlegend=False),
            row=1, col=1
        )
        
        # 2. Parameter Correlations (if available)
        if correlations:
            params = list(correlations.keys())
            corr_values = [correlations[p]['pearson']['correlation'] for p in params]
            
            fig.add_trace(
                go.Bar(x=params, y=corr_values, name='ROI Correlation', showlegend=False),
                row=1, col=2
            )
        
        # 3. Top Strategies Performance
        top_strategies = opt_df.nlargest(20, 'roi')
        fig.add_trace(
            # Rank axis follows the actual row count, which may be fewer than 20
            go.Scatter(x=list(range(1, len(top_strategies) + 1)), y=top_strategies['roi'],
                     mode='markers+lines', name='Top 20 ROI', showlegend=False),
            row=2, col=1
        )
        
        # 4. Risk vs Return (if drawdown available)
        if 'max_drawdown' in opt_df.columns:
            risk_return_data = opt_df[opt_df['roi'] > 0].nlargest(100, 'roi')  # Top 100 profitable by ROI
            fig.add_trace(
                go.Scatter(x=risk_return_data['max_drawdown'], y=risk_return_data['roi'],
                         mode='markers', name='Risk vs Return', showlegend=False),
                row=2, col=2
            )
        
        # 5. Strategy Method Comparison (if categorical params available)
        if categorical_params and len(categorical_params) > 0:
            param = categorical_params[0]
            col_name = available_params[param]
            method_perf = opt_df.groupby(col_name)['roi'].mean().sort_values(ascending=False)
            
            fig.add_trace(
                go.Bar(x=list(method_perf.index), y=method_perf.values, 
                      name=f'{param} Performance', showlegend=False),
                row=3, col=1
            )
        
        # 6. Summary Table
        summary_data = [
            ['Total Strategies', f"{len(opt_df):,}"],
            ['Profitable', f"{(opt_df['roi'] > 0).sum():,}"],
            ['Profitability Rate', f"{(opt_df['roi'] > 0).mean():.1%}"],
            ['Best ROI', f"{opt_df['roi'].max():.2%}"],
            ['Mean ROI', f"{opt_df['roi'].mean():.2%}"],
            ['Std ROI', f"{opt_df['roi'].std():.2%}"]
        ]
        
        fig.add_trace(
            go.Table(
                header=dict(values=['Metric', 'Value'], font=dict(size=12)),
                cells=dict(values=[[row[0] for row in summary_data], 
                                  [row[1] for row in summary_data]], font=dict(size=10))
            ),
            row=3, col=2
        )
        
        fig.update_layout(
            height=1200,
            title_text="🎯 Hockey Prediction System - Strategy Optimization Dashboard",
            showlegend=False
        )
        
        # Update axis labels
        fig.update_xaxes(title_text="ROI", row=1, col=1)
        fig.update_yaxes(title_text="Count", row=1, col=1)
        
        if correlations:
            fig.update_xaxes(title_text="Parameters", row=1, col=2)
            fig.update_yaxes(title_text="Correlation with ROI", row=1, col=2)
        
        fig.update_xaxes(title_text="Strategy Rank", row=2, col=1)
        fig.update_yaxes(title_text="ROI", row=2, col=1)
        
        if 'max_drawdown' in opt_df.columns:
            fig.update_xaxes(title_text="Max Drawdown", row=2, col=2)
            fig.update_yaxes(title_text="ROI", row=2, col=2)
        
        # Save dashboard
        dashboard_path = CHARTS_EXPORT_DIR / "strategy_optimization_dashboard.html"
        fig.write_html(dashboard_path)
        
        return dashboard_path
        
    except Exception as e:
        print(f"⚠️ Error creating optimization dashboard: {e}")
        return None

# Create optimization dashboard
print("📊 Creating strategy optimization dashboard...")
dashboard_path = create_optimization_dashboard()

if dashboard_path:
    print(f"✅ Optimization dashboard created: {dashboard_path.name}")
    print(f"🌐 Open in browser: {dashboard_path.resolve().as_uri()}")

# List all exported files
print("\n📋 Exported Files:")
chart_files = list(CHARTS_EXPORT_DIR.glob("*.html"))
optimization_files = [f for f in chart_files if 'optimization' in f.name.lower() or 'heatmap' in f.name.lower()]

for chart_file in sorted(optimization_files):
    size_kb = chart_file.stat().st_size / 1024
    print(f"  📊 {chart_file.name} ({size_kb:.1f}KB)")

print(f"\n🎯 Strategy optimization analysis completed!")
print(f"📁 {len(optimization_files)} charts exported to: {CHARTS_EXPORT_DIR.absolute()}")
print(f"\n📚 Analysis Summary:")
print(f"   ✅ Parameter sensitivity analysis")
print(f"   ✅ Interactive heatmaps for parameter combinations")
print(f"   ✅ Strategy comparison and ranking")
print(f"   ✅ A/B testing framework for categorical parameters")
print(f"   ✅ Comprehensive optimization insights and recommendations")
print(f"   ✅ Interactive optimization dashboard")

print(f"\n🚀 Next Steps:")
print(f"   1. Review optimization dashboard for key insights")
print(f"   2. Implement recommended parameter ranges")
print(f"   3. Run risk_assessment.ipynb for detailed risk analysis")
print(f"   4. Consider live testing with top-performing strategies")

---

## 🎯 Strategy Optimization Analysis Summary

This notebook provides comprehensive strategy optimization analysis with:

✅ **Parameter Sensitivity Analysis** - Correlation analysis and impact assessment  
✅ **Interactive Heatmaps** - Multi-dimensional parameter relationships visualization  
✅ **Strategy Comparison** - Multi-criteria ranking and performance analysis  
✅ **A/B Testing Framework** - Statistical significance testing for categorical parameters  
✅ **Optimization Insights** - Actionable parameter recommendations  
✅ **Comprehensive Dashboard** - Interactive overview of all optimization results  

## 📊 Key Features

- **Robust Data Loading** - Intelligent source selection (CSV preferred, JSON fallback)
- **Statistical Rigor** - Pearson/Spearman correlations, t-tests, Mann-Whitney U tests
- **Visual Analytics** - Interactive heatmaps, scatter plots, and comparison charts
- **Actionable Insights** - Clear recommendations based on statistical analysis
- **Risk Assessment** - Drawdown analysis and risk-adjusted performance metrics
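
The insights cell reads correlation results from a nested dictionary of the form `correlations[param]['pearson']['correlation']` and `correlations[param]['pearson']['p_value']`. A minimal sketch of how such a structure could be built with `scipy.stats.pearsonr` and `scipy.stats.spearmanr` follows. The helper name `correlate_with_roi`, the `'spearman'` key, the column name `edge_threshold`, and the sample data are assumptions, not the notebook's actual implementation.

```python
import pandas as pd
from scipy.stats import pearsonr, spearmanr


def correlate_with_roi(df, param_cols, roi_col="roi"):
    """Return {param: {'pearson': {...}, 'spearman': {...}}} for each usable column."""
    results = {}
    for col in param_cols:
        pair = df[[col, roi_col]].dropna()
        # Skip columns with too few points or no variation: correlation is undefined
        if len(pair) < 3 or pair[col].nunique() < 2 or pair[roi_col].nunique() < 2:
            continue
        r, r_p = pearsonr(pair[col], pair[roi_col])
        rho, rho_p = spearmanr(pair[col], pair[roi_col])
        results[col] = {
            "pearson": {"correlation": float(r), "p_value": float(r_p)},
            "spearman": {"correlation": float(rho), "p_value": float(rho_p)},
        }
    return results


# Hypothetical data: ROI rises monotonically, but not linearly, with the parameter
demo = pd.DataFrame({"edge_threshold": [1, 2, 3, 4, 5, 6]})
demo["roi"] = demo["edge_threshold"] ** 3 / 1000

correlations_demo = correlate_with_roi(demo, ["edge_threshold"])
print(correlations_demo["edge_threshold"])
```

Reporting both coefficients is useful because Spearman captures any monotonic relationship while Pearson measures only the linear part; in this example the Spearman coefficient is 1 and the Pearson coefficient is lower.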

## 📁 Installation

**Save as:** `notebooks/analysis/strategy_optimization.ipynb`

**Dependencies:** `pip install plotly pandas numpy scipy scikit-learn`

**Usage:** Run the cells in order, since later cells reuse variables such as `opt_df` and `correlations` defined earlier. The notebook detects optimization data automatically.
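
Before running the notebook it can help to confirm that the dependencies are importable. The following is a minimal sketch using only the standard library; the helper `missing_packages` is hypothetical. Note that the pip package `scikit-learn` is imported as `sklearn`.

```python
from importlib.util import find_spec

# pip package name -> import name (they differ for scikit-learn)
REQUIRED = {
    "plotly": "plotly",
    "pandas": "pandas",
    "numpy": "numpy",
    "scipy": "scipy",
    "scikit-learn": "sklearn",
}


def missing_packages(required=REQUIRED):
    """Return the pip names of packages whose top-level module cannot be found."""
    return [pip_name for pip_name, module in required.items() if find_spec(module) is None]


missing = missing_packages()
if missing:
    print("Missing packages, install with: pip install " + " ".join(missing))
else:
    print("All dependencies are available")
```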

## 🔗 Related Notebooks

- **main_analysis.ipynb** - Core backtesting overview
- **risk_assessment.ipynb** - Detailed risk metrics and drawdown analysis
- **model_validation.ipynb** - Prediction accuracy and model performance

---

*Hockey Prediction System - Strategy Optimization Analysis*  
*Location: notebooks/analysis/strategy_optimization.ipynb*  
*Specialized analysis for parameter optimization and strategy comparison*