# 🏒 Hockey Prediction System - Main Backtesting Analysis

**Core overview with robust data loading and key insights**

---

## 📊 Overview

This notebook provides a comprehensive overview of the backtesting results:
- ✅ **Robust Data Loading** - Multi-format support with error handling
- 🏆 **Strategy Performance** - ROI analysis and top strategies
- 🎯 **Parameter Insights** - Key parameter optimization findings
- 📈 **Performance Summary** - Time-series overview
- 💡 **Executive Summary** - Actionable recommendations

**Location:** `notebooks/analysis/main_analysis.ipynb`

---

## 🔧 Setup & Configuration

In [None]:
# Core libraries
import pandas as pd
import numpy as np
import json
import glob
import os
from datetime import datetime, timedelta
from pathlib import Path
import warnings
warnings.filterwarnings('ignore')

# Visualization
import plotly.graph_objects as go
import plotly.express as px
from plotly.subplots import make_subplots

# Statistics (for basic metrics)
from scipy import stats

# Configuration
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)
pd.set_option('display.max_colwidth', 50)

# Plotly theme
import plotly.io as pio
pio.templates.default = "plotly_white"

print("📚 Libraries imported successfully!")
print(f"📁 Working directory: {os.getcwd()}")
print(f"📅 Analysis timestamp: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")

## 📂 Enhanced Data Discovery & Loading
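
The discovery cell below sorts every glob match by modification time, newest first, and the loaders then try the newest CSV before falling back to JSON. The selection order can be sketched in isolation as follows. This is a minimal illustration, not the notebook's own code: the helper name `newest_match`, the temporary directory, and the file names are hypothetical.

```python
import os
import tempfile
from pathlib import Path


def newest_match(directory, pattern):
    """Return the most recently modified file matching pattern, or None."""
    matches = sorted(directory.glob(pattern),
                     key=lambda p: p.stat().st_mtime, reverse=True)
    return matches[0] if matches else None


with tempfile.TemporaryDirectory() as tmp:
    tmp_dir = Path(tmp)
    # Two illustrative result files with explicit, different modification times
    older = tmp_dir / "backtest_bets_20240101.csv"
    newer = tmp_dir / "backtest_bets_20240201.csv"
    for path, mtime in [(older, 1_700_000_000), (newer, 1_700_000_600)]:
        path.write_text("date,net_result\n")
        os.utime(path, (mtime, mtime))

    picked_name = newest_match(tmp_dir, "backtest_bets_*.csv").name
    # No JSON files exist here, so this lookup returns None (the "no data" case)
    missing = newest_match(tmp_dir, "backtest_results_*.json")

print(picked_name, missing)
```

Sorting by modification time rather than by file name means a re-run of an old experiment counts as the newest result, which may or may not be what you want.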

In [None]:
# Define paths
RESULTS_DIR = Path("../../models/experiments")
CHARTS_EXPORT_DIR = Path("../../models/experiments/charts")
CHARTS_EXPORT_DIR.mkdir(parents=True, exist_ok=True)  # also create missing parent directories

print(f"📁 Results directory: {RESULTS_DIR.absolute()}")
print(f"📊 Charts export directory: {CHARTS_EXPORT_DIR.absolute()}")

# Enhanced file discovery
def discover_result_files():
    """Discover and categorize all available result files"""
    
    files = {
        'optimization_results': [],
        'optimization_detailed': [],
        'optimization_best': [],
        'backtest_single': [],
        'backtest_bets': [],
        'backtest_daily': [],
        'performance_analysis': []
    }
    
    patterns = {
        'optimization_results': 'optimization_*_results_*.json',
        'optimization_detailed': 'optimization_*_detailed_*.csv',
        'optimization_best': 'optimization_*_best_*.csv',
        'backtest_single': 'backtest_results_*.json',
        'backtest_bets': 'backtest_bets_*.csv',
        'backtest_daily': 'backtest_daily_*.csv',
        'performance_analysis': 'performance_analysis_*.json'
    }
    
    for file_type, pattern in patterns.items():
        found_files = list(RESULTS_DIR.glob(pattern))
        files[file_type] = sorted(found_files, key=lambda x: x.stat().st_mtime, reverse=True)
    
    return files

# Safe loading functions
def safe_load_json(file_path):
    """Safely load JSON with error handling"""
    try:
        with open(file_path, 'r', encoding='utf-8') as f:
            return json.load(f)
    except Exception as e:
        print(f"⚠️ Error loading {file_path.name}: {e}")
        return None

def safe_load_csv(file_path):
    """Safely load CSV with error handling"""
    try:
        df = pd.read_csv(file_path)
        # Convert date columns if present
        for col in ['date', 'timestamp', 'created_at']:
            if col in df.columns:
                try:
                    df[col] = pd.to_datetime(df[col])
                except (ValueError, TypeError):
                    # Leave the column unparsed if it does not contain valid dates
                    pass
        return df
    except Exception as e:
        print(f"⚠️ Error loading {file_path.name}: {e}")
        return None

# Discover files
available_files = discover_result_files()

print("\n🔍 Available result files:")
for file_type, file_list in available_files.items():
    print(f"  📄 {file_type}: {len(file_list)} files")
    for i, file_path in enumerate(file_list[:2]):
        size_mb = file_path.stat().st_size / (1024*1024)
        mod_time = datetime.fromtimestamp(file_path.stat().st_mtime)
        print(f"    [{i+1}] {file_path.name} ({size_mb:.1f}MB, {mod_time.strftime('%Y-%m-%d %H:%M')})")
    if len(file_list) > 2:
        print(f"    ... and {len(file_list) - 2} more")

In [None]:
# Enhanced data loading with intelligent source selection
def load_optimization_data():
    """Load optimization data from best available source"""
    
    # Try CSV detailed first (fastest and most complete)
    if available_files['optimization_detailed']:
        csv_path = available_files['optimization_detailed'][0]
        print(f"📊 Loading CSV optimization data: {csv_path.name}")
        
        df = safe_load_csv(csv_path)
        if df is not None:
            print(f"✅ Loaded {len(df)} strategies from CSV")
            return df, "CSV"
    
    # Fallback to JSON
    if available_files['optimization_results']:
        json_path = available_files['optimization_results'][0]
        print(f"📊 Loading JSON optimization data: {json_path.name}")
        
        data = safe_load_json(json_path)
        if isinstance(data, dict) and data.get('optimization_results'):
            try:
                df = pd.DataFrame(data['optimization_results'])
                print(f"✅ Loaded {len(df)} strategies from JSON")
                return df, "JSON"
            except Exception as e:
                print(f"⚠️ Error converting JSON to DataFrame: {e}")
    
    print("❌ No optimization data available")
    return None, None

def load_backtest_data():
    """Load backtest data from best available source"""
    
    # Try CSV bets first (fastest for time-series)
    if available_files['backtest_bets']:
        csv_path = available_files['backtest_bets'][0]
        print(f"📊 Loading CSV bet data: {csv_path.name}")
        
        df = safe_load_csv(csv_path)
        if df is not None:
            print(f"✅ Loaded {len(df)} bet records from CSV")
            return df, "CSV"
    
    # Fallback to JSON
    if available_files['backtest_single']:
        json_path = available_files['backtest_single'][0]
        print(f"📊 Loading JSON backtest data: {json_path.name}")
        
        data = safe_load_json(json_path)
        if isinstance(data, dict) and data.get('bet_history'):
            try:
                df = pd.DataFrame(data['bet_history'])
                df['date'] = pd.to_datetime(df['date'])
                print(f"✅ Loaded {len(df)} bet records from JSON")
                return df, "JSON"
            except Exception as e:
                print(f"⚠️ Error converting JSON bet history: {e}")
    
    print("❌ No backtest data available")
    return None, None

# Load primary datasets
print("\n📥 Loading primary datasets...")

# Load optimization data
opt_df, opt_source = load_optimization_data()

# Load backtest data
bets_df, bets_source = load_backtest_data()

# Summary
print("\n📊 Data Loading Summary:")
print("="*40)
print(f"✅ Optimization data: {'Yes' if opt_df is not None else 'No'} ({opt_source if opt_source else 'N/A'})")
print(f"✅ Backtest data: {'Yes' if bets_df is not None else 'No'} ({bets_source if bets_source else 'N/A'})")

if opt_df is not None:
    print(f"   📊 Strategies loaded: {len(opt_df):,}")
if bets_df is not None:
    print(f"   📊 Bet records loaded: {len(bets_df):,}")

## 🏆 Strategy Performance Overview
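
The analysis cell below looks for the ROI and Sharpe columns under several possible names and uses the first one present. A minimal sketch of that alias lookup, together with the profitable-share calculation, is shown here in plain Python. The helper `first_present` and the sample column names and ROI values are hypothetical.

```python
def first_present(candidates, columns):
    """Return the first candidate name found in columns, or None."""
    available = set(columns)
    for name in candidates:
        if name in available:
            return name
    return None


# Hypothetical export that names its ROI column 'return_pct'
columns = ["strategy_id", "return_pct", "sharpe", "total_bets"]
roi_col = first_present(["roi", "total_return", "return_pct", "final_roi", "return"], columns)
sharpe_col = first_present(["sharpe_ratio", "sharpe", "risk_adjusted_return"], columns)

# Share of profitable strategies, with ROI expressed as a fraction (0.05 = 5%)
rois = [0.12, -0.03, 0.00, 0.05, -0.10]
profitable_share = sum(r > 0 for r in rois) / len(rois)

print(roi_col, sharpe_col, f"{profitable_share:.1%}")
```

The `:.2%` formatting used throughout the cell assumes ROI is stored as a fraction. If an export stores percentages (5.0 for 5%), the printed figures will be off by a factor of 100.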

In [None]:
# Analyze optimization results with flexible column detection
if opt_df is not None:
    
    print(f"📊 Strategy Performance Analysis ({len(opt_df)} strategies)")
    print("="*60)
    
    # Flexible ROI column detection
    roi_col = None
    possible_roi_cols = ['roi', 'total_return', 'return_pct', 'final_roi', 'return']
    for col in possible_roi_cols:
        if col in opt_df.columns:
            roi_col = col
            break
    
    if roi_col is None:
        print("⚠️ ROI column not found. Available columns:")
        print(f"   {list(opt_df.columns)[:10]}...")
    else:
        # Standardize and clean ROI data
        if roi_col != 'roi':
            opt_df['roi'] = opt_df[roi_col]
            print(f"📊 Using '{roi_col}' as ROI column")
        
        opt_df['roi'] = pd.to_numeric(opt_df['roi'], errors='coerce')
        opt_df = opt_df.dropna(subset=['roi']).copy()  # copy so later column assignments do not touch a view
        
        if len(opt_df) == 0:
            print("❌ No valid ROI data after cleaning")
        else:
            # Key statistics
            roi_stats = opt_df['roi'].describe()
            
            print(f"\n📈 ROI Statistics ({len(opt_df)} valid strategies):")
            print(f"   Mean ROI: {roi_stats['mean']:.2%}")
            print(f"   Median ROI: {roi_stats['50%']:.2%}")
            print(f"   Best ROI: {roi_stats['max']:.2%}")
            print(f"   Worst ROI: {roi_stats['min']:.2%}")
            print(f"   Std Dev: {roi_stats['std']:.2%}")
            
            # Profitability analysis
            profitable = opt_df[opt_df['roi'] > 0]
            profitable_pct = len(profitable) / len(opt_df)
            
            print(f"\n💰 Profitability Analysis:")
            print(f"   Profitable strategies: {len(profitable)}/{len(opt_df)} ({profitable_pct:.1%})")
            if len(profitable) > 0:
                print(f"   Average profit (profitable only): {profitable['roi'].mean():.2%}")
                print(f"   Best profitable ROI: {profitable['roi'].max():.2%}")
            
            # Sharpe ratio analysis (if available)
            sharpe_cols = ['sharpe_ratio', 'sharpe', 'risk_adjusted_return']
            sharpe_col = None
            
            for col in sharpe_cols:
                if col in opt_df.columns:
                    sharpe_col = col
                    break
            
            if sharpe_col:
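                if sharpe_col != 'sharpe_ratio':
                    opt_df['sharpe_ratio'] = opt_df[sharpe_col]
                # Coerce to numeric so the comparisons below cannot fail on stray strings
                opt_df['sharpe_ratio'] = pd.to_numeric(opt_df['sharpe_ratio'], errors='coerce')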
                
                sharpe_data = opt_df[opt_df['sharpe_ratio'].notna()]
                if len(sharpe_data) > 0:
                    sharpe_stats = sharpe_data['sharpe_ratio'].describe()
                    print(f"\n📊 Risk-Adjusted Performance ({len(sharpe_data)} strategies):")
                    print(f"   Mean Sharpe Ratio: {sharpe_stats['mean']:.2f}")
                    print(f"   Best Sharpe Ratio: {sharpe_stats['max']:.2f}")
                    
                    good_sharpe = sharpe_data[sharpe_data['sharpe_ratio'] > 1.0]
                    print(f"   Strategies with Sharpe > 1.0: {len(good_sharpe)}/{len(sharpe_data)} ({len(good_sharpe)/len(sharpe_data):.1%})")
else:
    print("❌ No optimization data available for strategy analysis")

In [None]:
# Key visualizations
if opt_df is not None and 'roi' in opt_df.columns:
    
    print("📊 Creating key performance visualizations...")
    
    # 1. ROI Distribution
    try:
        fig_roi = px.histogram(
            opt_df, 
            x='roi',
            nbins=50,
            title="📊 ROI Distribution - All Tested Strategies",
            labels={'roi': 'Return on Investment', 'count': 'Number of Strategies'}
        )
        
        fig_roi.add_vline(x=0, line_dash="dash", line_color="red", 
                         annotation_text="Break-even")
        fig_roi.update_xaxes(tickformat=".1%")
        fig_roi.update_layout(height=500)
        
        fig_roi.show()
        fig_roi.write_html(CHARTS_EXPORT_DIR / "roi_distribution.html")
        print(f"💾 Chart exported: roi_distribution.html")
        
    except Exception as e:
        print(f"⚠️ Error creating ROI distribution: {e}")
    
    # 2. Top 10 Strategies Table
    try:
        print("\n🏆 Top 10 Strategies by ROI:")
        
        display_cols = ['roi']
        for col in ['sharpe_ratio', 'total_bets', 'win_rate', 'edge_threshold', 'min_odds']:
            if col in opt_df.columns:
                display_cols.append(col)
        
        top_strategies = opt_df.nlargest(10, 'roi')[display_cols].round(4)
        
        # Format for display
        display_df = top_strategies.copy()
        for col in ['roi', 'win_rate']:
            if col in display_df.columns:
                display_df[col] = display_df[col].apply(lambda x: f"{x:.2%}" if pd.notna(x) else "N/A")
        
        print(display_df.to_string(index=False))
        
    except Exception as e:
        print(f"⚠️ Error creating top strategies table: {e}")
    
    # 3. ROI vs Sharpe (if available)
    if 'sharpe_ratio' in opt_df.columns:
        try:
            sharpe_clean = opt_df[
                (opt_df['sharpe_ratio'].notna()) & 
                (opt_df['sharpe_ratio'].between(-5, 10)) & 
                (opt_df['roi'].between(-1, 1))
            ]
            
            if len(sharpe_clean) > 0:
                fig_scatter = px.scatter(
                    sharpe_clean,
                    x='roi',
                    y='sharpe_ratio',
                    title="📈 Risk vs Return: ROI vs Sharpe Ratio",
                    labels={'roi': 'Return on Investment', 'sharpe_ratio': 'Sharpe Ratio'}
                )
                
                fig_scatter.add_hline(y=1, line_dash="dash", line_color="green", 
                                     annotation_text="Sharpe = 1.0")
                fig_scatter.add_vline(x=0, line_dash="dash", line_color="red", 
                                     annotation_text="Break-even")
                
                fig_scatter.update_xaxes(tickformat=".1%")
                fig_scatter.update_layout(height=500)
                
                fig_scatter.show()
                fig_scatter.write_html(CHARTS_EXPORT_DIR / "roi_vs_sharpe.html")
                print(f"💾 Chart exported: roi_vs_sharpe.html")
                
        except Exception as e:
            print(f"⚠️ Error creating ROI vs Sharpe chart: {e}")
            
else:
    print("❌ Cannot create visualizations - no valid optimization data")

## 🎯 Parameter Insights
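
For numerical parameters, the cell below reports the 25th to 75th percentile of each parameter among profitable strategies. The sketch below reproduces that calculation with the standard library on hypothetical `(edge_threshold, roi)` pairs. `statistics.quantiles` with `method="inclusive"` interpolates linearly between order statistics, which should agree with the default behaviour of `Series.quantile` in pandas.

```python
from statistics import quantiles

# Hypothetical optimization rows: (edge_threshold, roi)
rows = [
    (0.01, -0.04), (0.02, 0.01), (0.03, -0.02), (0.04, 0.03),
    (0.06, 0.05), (0.08, 0.02), (0.10, 0.07), (0.12, -0.01),
]

# Keep the parameter values of profitable strategies only
profitable_values = [edge for edge, roi in rows if roi > 0]

# n=4 returns the three quartile cut points: Q1, median, Q3
q1, _median, q3 = quantiles(profitable_values, n=4, method="inclusive")

print(f"edge_threshold middle 50%: {q1:.3f} - {q3:.3f}")
```

This range only describes where profitable strategies cluster. It does not account for how many strategies were tested at each value, so it is a starting point for a sensitivity study rather than an optimum.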

In [None]:
# Quick parameter analysis
if opt_df is not None and 'roi' in opt_df.columns:
    
    print("🎯 Parameter Analysis Summary")
    print("="*50)
    
    # Detect available parameters
    param_mappings = {
        'edge_threshold': ['edge_threshold', 'edge_thresh', 'threshold'],
        'min_odds': ['min_odds', 'minimum_odds', 'odds_min'],
        'stake_method': ['stake_method', 'staking_method', 'stake_type'],
        'ev_method': ['ev_method', 'ev_calculation', 'expected_value_method']
    }
    
    found_params = {}
    for param_name, possible_names in param_mappings.items():
        for col_name in possible_names:
            if col_name in opt_df.columns:
                found_params[param_name] = col_name
                break
    
    if found_params:
        print(f"\n📊 Available Parameters: {', '.join(found_params.keys())}")
        
        # Parameter ranges and profitable insights
        profitable_strategies = opt_df[opt_df['roi'] > 0]
        
        if len(profitable_strategies) > 0:
            print(f"\n💰 Profitable Strategy Insights ({len(profitable_strategies)} strategies):")
            
            for param_name, col_name in found_params.items():
                try:
                    if not pd.api.types.is_numeric_dtype(opt_df[col_name]):
                        # Categorical parameter: report the value seen most often among profitable strategies
                        profitable_methods = profitable_strategies[col_name].value_counts()
                        if len(profitable_methods) > 0:
                            best_method = profitable_methods.index[0]
                            count = profitable_methods.iloc[0]
                            print(f"   {param_name}: '{best_method}' most common among profitable strategies ({count} strategies)")
                    else:
                        # Numerical parameter
                        param_data = profitable_strategies[col_name].dropna()
                        if len(param_data) > 0:
                            # Interquartile range (25th to 75th percentile) of profitable strategies
                            iqr_range = (param_data.quantile(0.25), param_data.quantile(0.75))
                            print(f"   {param_name}: middle 50% of profitable strategies {iqr_range[0]:.3f} - {iqr_range[1]:.3f}")
                            
                except Exception as e:
                    print(f"   ⚠️ Error analyzing {param_name}: {e}")
        else:
            print("\n⚠️ No profitable strategies found for parameter analysis")
            
        # Overall parameter ranges
        print(f"\n📊 Overall Parameter Ranges:")
        for param_name, col_name in found_params.items():
            try:
                if not pd.api.types.is_numeric_dtype(opt_df[col_name]):
                    unique_values = opt_df[col_name].value_counts().head(3)
                    values_str = ', '.join([f'{v}({c})' for v, c in unique_values.items()])
                    print(f"   {param_name}: {values_str}")
                else:
                    param_clean = opt_df[col_name].dropna()
                    if len(param_clean) > 0:
                        print(f"   {param_name}: {param_clean.min():.3f} - {param_clean.max():.3f}")
            except Exception as e:
                print(f"   ⚠️ Error summarizing {param_name}: {e}")
    else:
        print("⚠️ No standard parameter columns detected")
        print(f"Available columns: {', '.join(opt_df.columns[:10])}...")
        
else:
    print("❌ No optimization data available for parameter analysis")

## 📈 Performance Overview
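
The cumulative return chart in the cell below is a running sum of per-bet profit and loss taken in date order. A minimal sketch of that calculation on hypothetical bet records follows. The field names mirror the notebook's `date` and `net_result` columns, and the values are made up.

```python
from datetime import date
from itertools import accumulate

# Hypothetical bet records, deliberately out of chronological order
bets = [
    {"date": date(2024, 1, 3), "net_result": -10.0},
    {"date": date(2024, 1, 1), "net_result": 25.0},
    {"date": date(2024, 1, 2), "net_result": -10.0},
    {"date": date(2024, 1, 4), "net_result": 15.0},
]

# Sort first: a running total is only meaningful in chronological order
ordered = sorted(bets, key=lambda bet: bet["date"])
cumulative = list(accumulate(bet["net_result"] for bet in ordered))

print(cumulative)
```

The final value of the running sum equals the total return, but the path in between depends on the ordering, which is why the sort comes before the accumulation.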

In [None]:
# Basic time-series overview
if bets_df is not None:
    
    print("📈 Performance Time-Series Overview")
    print("="*50)
    
    try:
        # Detect profit/loss column
        pnl_col = None
        for col in ['net_result', 'profit_loss', 'pnl', 'result']:
            if col in bets_df.columns:
                pnl_col = col
                break
        
        if pnl_col and 'date' in bets_df.columns:
            # Clean and sort data
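            bets_clean = bets_df.copy()
            bets_clean[pnl_col] = pd.to_numeric(bets_clean[pnl_col], errors='coerce')
            # Ensure real datetimes; unparseable values become NaT and are dropped below
            bets_clean['date'] = pd.to_datetime(bets_clean['date'], errors='coerce')
            bets_clean = bets_clean.dropna(subset=[pnl_col, 'date'])
            bets_clean = bets_clean.sort_values('date')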
            
            if len(bets_clean) > 0:
                print(f"📊 Analyzing {len(bets_clean)} bet records")
                
                # Basic statistics
                total_return = bets_clean[pnl_col].sum()
                avg_return = bets_clean[pnl_col].mean()
                
                print(f"\n💰 Performance Summary:")
                print(f"   Total Return: ${total_return:,.2f}")
                print(f"   Average per Bet: ${avg_return:,.2f}")
                print(f"   Period: {bets_clean['date'].min().strftime('%Y-%m-%d')} to {bets_clean['date'].max().strftime('%Y-%m-%d')}")
                
                # Win rate (if available)
                win_cols = ['bet_won', 'won', 'is_win', 'win']
                win_col = None
                for col in win_cols:
                    if col in bets_clean.columns:
                        win_col = col
                        break
                
                if win_col:
                    try:
                        win_rate = bets_clean[win_col].mean()
                        winning_bets = int(bets_clean[win_col].sum())
                        print(f"   Win Rate: {win_rate:.1%} ({winning_bets}/{len(bets_clean)})")
                    except (TypeError, ValueError) as e:
                        print(f"   ⚠️ Could not compute win rate from '{win_col}': {e}")
                
                # Create cumulative return chart
                bets_clean['cumulative_return'] = bets_clean[pnl_col].cumsum()
                
                fig_cumulative = px.line(
                    bets_clean,
                    x='date',
                    y='cumulative_return',
                    title="📈 Cumulative Return Over Time",
                    labels={'cumulative_return': 'Cumulative Return ($)', 'date': 'Date'}
                )
                
                fig_cumulative.add_hline(y=0, line_dash="dash", line_color="red", 
                                       annotation_text="Break-even")
                fig_cumulative.update_layout(height=500)
                
                fig_cumulative.show()
                fig_cumulative.write_html(CHARTS_EXPORT_DIR / "cumulative_return.html")
                print(f"💾 Chart exported: cumulative_return.html")
                
            else:
                print("⚠️ No valid bet data after cleaning")
        else:
            print("⚠️ Required columns (profit/loss and date) not found")
            print(f"Available columns: {list(bets_df.columns)}")
            
    except Exception as e:
        print(f"❌ Error in performance analysis: {e}")
        
else:
    print("❌ No backtest data available for performance analysis")

## 💡 Executive Summary
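
The summary cell below estimates realized ROI as total net result divided by total stake, then chooses one of three recommendation sets from the share of profitable strategies and the best ROI. The sketch below isolates both rules using the same thresholds as the cell. The function names and the tier labels (`live_test`, `paper_trade`, `hold`) are hypothetical shorthand for the three recommendation lists.

```python
def realized_roi(net_results, stakes):
    """Total net result divided by total amount staked; None if nothing was staked."""
    total_staked = sum(stakes)
    if total_staked <= 0:
        return None
    return sum(net_results) / total_staked


def recommendation_tier(profitability_rate, best_roi):
    """Apply the same thresholds as the summary cell, in the same order."""
    if profitability_rate > 0.2 and best_roi > 0.1:
        return "live_test"
    if profitability_rate > 0.1:
        return "paper_trade"
    return "hold"


# Four hypothetical bets of 10 units each
roi = realized_roi(net_results=[25.0, -10.0, -10.0, 15.0],
                   stakes=[10.0, 10.0, 10.0, 10.0])

print(f"{roi:.2%}",
      recommendation_tier(0.25, 0.15),
      recommendation_tier(0.25, 0.05),
      recommendation_tier(0.05, 0.30))
```

Note that a high profitability rate with a weak best ROI drops to the middle tier, and a low profitability rate leads to the most cautious tier regardless of how good the single best strategy looks.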

In [None]:
# Comprehensive executive summary
print("🎯 EXECUTIVE SUMMARY - Hockey Prediction System")
print("="*60)

# Collect key metrics
summary_metrics = {}

# Optimization metrics
if opt_df is not None and 'roi' in opt_df.columns:
    summary_metrics['total_strategies'] = len(opt_df)
    summary_metrics['profitable_strategies'] = len(opt_df[opt_df['roi'] > 0])
    summary_metrics['profitability_rate'] = summary_metrics['profitable_strategies'] / summary_metrics['total_strategies']
    summary_metrics['best_roi'] = opt_df['roi'].max()
    summary_metrics['mean_roi'] = opt_df['roi'].mean()
    summary_metrics['median_roi'] = opt_df['roi'].median()

# Backtest metrics
if bets_df is not None:
    pnl_col = None
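    for col in ['net_result', 'profit_loss', 'pnl', 'result']:  # same candidates as the time-series cell
        if col in bets_df.columns:
            pnl_col = col
            break
    
    if pnl_col:
        try:
            bets_clean = bets_df.copy()
            bets_clean[pnl_col] = pd.to_numeric(bets_clean[pnl_col], errors='coerce')
            bets_clean = bets_clean.dropna(subset=[pnl_col])
            summary_metrics['total_bets'] = len(bets_clean)
            summary_metrics['total_return'] = bets_clean[pnl_col].sum()
            
            # Estimate ROI as total return divided by total amount staked
            if 'stake' in bets_clean.columns:
                total_staked = pd.to_numeric(bets_clean['stake'], errors='coerce').sum()
                if total_staked > 0:
                    summary_metrics['actual_roi'] = summary_metrics['total_return'] / total_staked
        except Exception as e:
            print(f"⚠️ Could not compute backtest metrics: {e}")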

# Print summary
print("\n📊 KEY PERFORMANCE INDICATORS:")
print("-" * 40)

if 'total_strategies' in summary_metrics:
    print(f"🔬 OPTIMIZATION ANALYSIS:")
    print(f"   Strategies Tested: {summary_metrics['total_strategies']:,}")
    print(f"   Profitable Strategies: {summary_metrics['profitable_strategies']:,} ({summary_metrics['profitability_rate']:.1%})")
    print(f"   Best ROI: {summary_metrics['best_roi']:.2%}")
    print(f"   Mean ROI: {summary_metrics['mean_roi']:.2%}")
    print(f"   Median ROI: {summary_metrics['median_roi']:.2%}")

if 'total_bets' in summary_metrics:
    print(f"\n📈 BACKTEST PERFORMANCE:")
    print(f"   Total Bets Analyzed: {summary_metrics['total_bets']:,}")
    print(f"   Total Return: ${summary_metrics['total_return']:,.2f}")
    if 'actual_roi' in summary_metrics:
        print(f"   Actual ROI: {summary_metrics['actual_roi']:.2%}")

# Key insights and recommendations
print("\n💡 KEY INSIGHTS:")
print("-" * 20)

insights = []

# Profitability insights
if 'profitability_rate' in summary_metrics:
    if summary_metrics['profitability_rate'] > 0.3:
        insights.append(f"✅ Strong profitability: {summary_metrics['profitability_rate']:.1%} of strategies are profitable")
    elif summary_metrics['profitability_rate'] > 0.1:
        insights.append(f"⚠️ Moderate profitability: {summary_metrics['profitability_rate']:.1%} of strategies are profitable")
    else:
        insights.append(f"🔴 Low profitability: Only {summary_metrics['profitability_rate']:.1%} of strategies are profitable")

# ROI insights
if 'best_roi' in summary_metrics:
    if summary_metrics['best_roi'] > 0.2:
        insights.append(f"✅ Excellent top performance: Best strategy achieves {summary_metrics['best_roi']:.1%} ROI")
    elif summary_metrics['best_roi'] > 0.05:
        insights.append(f"✅ Good top performance: Best strategy achieves {summary_metrics['best_roi']:.1%} ROI")
    else:
        insights.append(f"⚠️ Limited upside: Best strategy only achieves {summary_metrics['best_roi']:.1%} ROI")

# Mean performance insight
if 'mean_roi' in summary_metrics:
    if summary_metrics['mean_roi'] > 0:
        insights.append(f"✅ Positive average performance: Mean ROI is {summary_metrics['mean_roi']:.2%}")
    else:
        insights.append(f"🔴 Negative average performance: Mean ROI is {summary_metrics['mean_roi']:.2%}")

for insight in insights:
    print(f"   {insight}")

# Recommendations
print("\n🚀 RECOMMENDATIONS:")
print("-" * 25)

recommendations = []

if 'profitability_rate' in summary_metrics:
    if summary_metrics['profitability_rate'] > 0.2 and summary_metrics.get('best_roi', 0) > 0.1:
        recommendations.extend([
            "1. ✅ Proceed with live testing using top 5-10 strategies",
            "2. 📊 Implement strict risk management (max 2% per bet)",
            "3. 🔄 Monitor performance and adjust parameters monthly"
        ])
    elif summary_metrics['profitability_rate'] > 0.1:
        recommendations.extend([
            "1. ⚠️ Paper trade top strategies before real money",
            "2. 🔍 Focus only on top 5% of strategies",
            "3. 📉 Use very conservative stake sizes (1% max)"
        ])
    else:
        recommendations.extend([
            "1. 🔴 Do NOT proceed with live trading yet",
            "2. 🔬 Investigate model accuracy and edge detection",
            "3. 📚 Consider alternative approaches or markets"
        ])
else:
    recommendations.append("1. 📊 Need more comprehensive optimization data for recommendations")

for recommendation in recommendations:
    print(f"   {recommendation}")

# Analysis metadata
print(f"\n📅 Analysis completed: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
print(f"📁 Charts exported to: {CHARTS_EXPORT_DIR.absolute()}")

# Data sources used
data_sources = []
if opt_source:
    data_sources.append(f"Optimization: {opt_source}")
if bets_source:
    data_sources.append(f"Backtest: {bets_source}")

if data_sources:
    print(f"📊 Data sources: {', '.join(data_sources)}")

print("\n🎯 Analysis Summary: Main overview completed successfully!")
print("💡 For detailed analysis, run specialized notebooks:")
print("   - strategy_optimization.ipynb (parameter deep dive)")
print("   - risk_assessment.ipynb (drawdown & risk metrics)")
print("   - model_validation.ipynb (prediction performance)")
print("   - time_series_analysis.ipynb (performance trends & seasonality)")

## 📤 Export & Next Steps

In [None]:
# Create summary dashboard
def create_summary_dashboard():
    """Create a compact summary dashboard"""
    
    if opt_df is None or 'roi' not in opt_df.columns:
        print("❌ Cannot create dashboard - no optimization data with an ROI column")
        return None
    
    try:
        # Create 2x2 dashboard
        fig = make_subplots(
            rows=2, cols=2,
            subplot_titles=('ROI Distribution', 'Top 10 Strategies', 
                          'Performance Over Time', 'Summary Stats'),
            specs=[[{"type": "histogram"}, {"type": "bar"}],
                   [{"type": "scatter"}, {"type": "table"}]]
        )
        
        # 1. ROI Distribution
        fig.add_trace(
            go.Histogram(x=opt_df['roi'], name='ROI Distribution', nbinsx=30),
            row=1, col=1
        )
        
        # 2. Top 10 ROIs
        top_10 = opt_df.nlargest(10, 'roi')
        fig.add_trace(
            go.Bar(x=list(range(1, len(top_10) + 1)), y=top_10['roi'], name='Top 10 ROIs'),
            row=1, col=2
        )
        
        # 3. Performance over time (if bet data available)
        if bets_df is not None and 'date' in bets_df.columns:
            pnl_col = None
            for col in ['net_result', 'profit_loss', 'pnl']:
                if col in bets_df.columns:
                    pnl_col = col
                    break
            
            if pnl_col:
                try:
                    # Sort chronologically first, then keep the first 1,000 bets
                    bets_sample = bets_df.dropna(subset=[pnl_col, 'date']).sort_values('date').head(1000)
                    cumulative = bets_sample[pnl_col].cumsum()
                    
                    fig.add_trace(
                        go.Scatter(x=bets_sample['date'], y=cumulative, 
                                 mode='lines', name='Cumulative Return'),
                        row=2, col=1
                    )
                except Exception as e:
                    print(f"⚠️ Skipping performance-over-time panel: {e}")
        
        # 4. Summary table
        summary_data = [
            ['Strategies Tested', f"{len(opt_df):,}"],
            ['Profitable', f"{len(opt_df[opt_df['roi'] > 0]):,}"],
            ['Best ROI', f"{opt_df['roi'].max():.2%}"],
            ['Mean ROI', f"{opt_df['roi'].mean():.2%}"],
            ['Median ROI', f"{opt_df['roi'].median():.2%}"]
        ]
        
        fig.add_trace(
            go.Table(
                header=dict(values=['Metric', 'Value']),
                cells=dict(values=[[row[0] for row in summary_data], 
                                  [row[1] for row in summary_data]])
            ),
            row=2, col=2
        )
        
        fig.update_layout(
            height=800,
            title_text="🏒 Hockey Prediction System - Summary Dashboard",
            showlegend=False
        )
        
        # Save dashboard
        dashboard_path = CHARTS_EXPORT_DIR / "summary_dashboard.html"
        fig.write_html(dashboard_path)
        
        return dashboard_path
        
    except Exception as e:
        print(f"⚠️ Error creating summary dashboard: {e}")
        return None

# Create dashboard
print("📊 Creating summary dashboard...")
dashboard_path = create_summary_dashboard()

if dashboard_path:
    print(f"✅ Summary dashboard created: {dashboard_path.name}")
    print(f"🌐 Open in browser: file://{dashboard_path.absolute()}")

# List all exported files
print("\n📋 Exported Files:")
chart_files = list(CHARTS_EXPORT_DIR.glob("*.html"))
for chart_file in sorted(chart_files):
    size_kb = chart_file.stat().st_size / 1024
    print(f"  📊 {chart_file.name} ({size_kb:.1f}KB)")

print(f"\n🎯 Main analysis completed!")
print(f"📁 {len(chart_files)} charts exported to: {CHARTS_EXPORT_DIR.absolute()}")
print(f"\n📚 Next steps:")
print(f"   1. Review summary dashboard for key insights")
print(f"   2. Run specialized notebooks for detailed analysis")
print(f"   3. Implement recommended strategies based on findings")

---

## 🎯 Main Analysis Summary

This notebook provides a core backtesting analysis with:

✅ **Robust Data Loading** - Multi-format support with intelligent fallbacks  
✅ **Strategy Performance Overview** - ROI analysis and top strategies identification  
✅ **Parameter Insights** - Key optimization findings and profitable ranges  
✅ **Performance Overview** - Time-series analysis and key metrics  
✅ **Executive Summary** - Actionable insights and recommendations  
✅ **Interactive Visualizations** - Core charts for decision making  

## 📚 Specialized Notebooks

For detailed analysis, use these specialized notebooks:

- **strategy_optimization.ipynb** - Deep parameter analysis and sensitivity studies
- **risk_assessment.ipynb** - Comprehensive risk metrics and drawdown analysis  
- **model_validation.ipynb** - Prediction accuracy and calibration analysis
- **time_series_analysis.ipynb** - Detailed performance trends and seasonality

## 📁 Installation

**Save as:** `notebooks/analysis/main_analysis.ipynb`

**Dependencies:** `pip install plotly pandas numpy scipy`

**Usage:** Run the cells in order; the notebook automatically selects the best available data source.

---

*Hockey Prediction System - Main Backtesting Analysis*  
*Location: notebooks/analysis/main_analysis.ipynb*  