# Stock Prediction Performance Analysis

This notebook analyzes your stock prediction model's performance across individual stocks, time periods, and confidence levels.

## Setup and Data Loading

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import json
import glob
from datetime import datetime, timedelta
import yfinance as yf
from pathlib import Path
import warnings
warnings.filterwarnings('ignore')

# Set up plotting style
plt.style.use('seaborn-v0_8')
sns.set_palette("husl")
plt.rcParams['figure.figsize'] = (12, 8)
plt.rcParams['font.size'] = 11

print("📊 Stock Prediction Analysis Setup Complete")

## 1. Load Prediction Data

Load prediction results from your deployed service or local experiments.

In [None]:
def load_prediction_files(prediction_dir="./predictions"):
    """
    Load all prediction files from directory
    """
    prediction_files = glob.glob(f"{prediction_dir}/predictions_*.json")
    
    if not prediction_files:
        print(f"❌ No prediction files found in {prediction_dir}")
        print("💡 Tip: If running from local experiments, check if files are in current directory")
        return pd.DataFrame()
    
    all_predictions = []
    
    for file_path in sorted(prediction_files):
        try:
            with open(file_path, 'r') as f:
                predictions = json.load(f)
            
            # Convert to DataFrame and add file info
            df = pd.DataFrame(predictions)
            df['source_file'] = file_path
            all_predictions.append(df)
            
        except Exception as e:
            print(f"⚠️  Error loading {file_path}: {e}")
    
    if all_predictions:
        combined_df = pd.concat(all_predictions, ignore_index=True)
        print(f"✅ Loaded {len(combined_df)} predictions from {len(prediction_files)} files")
        return combined_df
    else:
        return pd.DataFrame()

def load_sample_results():
    """
    Create sample prediction results for demonstration
    Replace this with your actual prediction loading
    """
    np.random.seed(42)
    
    tickers = ['AAPL', 'MSFT', 'GOOGL', 'AMZN', 'NVDA', 'META', 'TSLA', 'JPM', 'V', 'PG']
    dates = pd.date_range('2024-01-01', '2024-03-01', freq='B')  # Business days only
    
    predictions = []
    
    for date in dates:
        for ticker in tickers:
            # Simulate predictions with realistic patterns
            base_accuracy = np.random.uniform(0.45, 0.65)  # Drawn fresh for each prediction, not fixed per stock
            confidence = np.random.uniform(0.5, 0.95)
            
            # Higher confidence should correlate with better accuracy
            if confidence > 0.8:
                prediction_correct = np.random.random() < (base_accuracy + 0.1)
            elif confidence > 0.6:
                prediction_correct = np.random.random() < base_accuracy
            else:
                prediction_correct = np.random.random() < (base_accuracy - 0.05)
            
            prediction = np.random.choice([0, 1])
            actual = prediction if prediction_correct else (1 - prediction)
            
            predictions.append({
                'ticker': ticker,
                'prediction_date': date.strftime('%Y-%m-%d'),
                'target_date': (date + pd.offsets.BDay(1)).strftime('%Y-%m-%d'),  # next business day, so Friday targets Monday
                'prediction': prediction,
                'actual': actual,
                'confidence': confidence,
                'correct': prediction == actual,
                'latest_close': np.random.uniform(100, 300)
            })
    
    return pd.DataFrame(predictions)

# Try to load actual prediction files, fall back to sample data
predictions_df = load_prediction_files()

if predictions_df.empty:
    print("📝 Using sample data for demonstration")
    predictions_df = load_sample_results()

# Convert date columns
predictions_df['prediction_date'] = pd.to_datetime(predictions_df['prediction_date'])
predictions_df['target_date'] = pd.to_datetime(predictions_df['target_date'])

print(f"\n📊 Dataset Overview:")
print(f"   Total Predictions: {len(predictions_df):,}")
print(f"   Date Range: {predictions_df['prediction_date'].min().date()} to {predictions_df['prediction_date'].max().date()}")
print(f"   Unique Stocks: {predictions_df['ticker'].nunique()}")
print(f"   Overall Accuracy: {predictions_df['correct'].mean():.1%}")

predictions_df.head()
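`load_prediction_files()` expects each `predictions_*.json` file to hold a list of records. The following is a minimal sketch of that layout, assuming the same field names that `load_sample_results()` produces. The file name `predictions_2024-01-02.json` and all record values are made up for illustration, and the schema your service actually writes may differ.

```python
import glob
import json
import tempfile
from pathlib import Path

# Hypothetical records mirroring the columns produced by load_sample_results().
example_records = [
    {"ticker": "AAPL", "prediction_date": "2024-01-02", "target_date": "2024-01-03",
     "prediction": 1, "actual": 1, "confidence": 0.72, "correct": True, "latest_close": 185.6},
    {"ticker": "MSFT", "prediction_date": "2024-01-02", "target_date": "2024-01-03",
     "prediction": 0, "actual": 1, "confidence": 0.58, "correct": False, "latest_close": 370.9},
]

with tempfile.TemporaryDirectory() as tmp_dir:
    # Write one file using the naming pattern the loader globs for.
    out_path = Path(tmp_dir) / "predictions_2024-01-02.json"
    out_path.write_text(json.dumps(example_records, indent=2))

    # Same glob pattern as load_prediction_files().
    found = sorted(glob.glob(f"{tmp_dir}/predictions_*.json"))
    with open(found[0], "r") as f:
        loaded = json.load(f)

# Check that every record carries the columns the later cells rely on.
required_keys = {"ticker", "prediction_date", "target_date",
                 "prediction", "confidence", "correct"}
missing = [required_keys - set(record) for record in loaded]
print(missing)
```

An empty set for every record means the file can be passed through the rest of the notebook without a `KeyError` on those columns.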

## 2. Download Actual Market Data for Verification

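Download actual stock prices to verify prediction accuracy and calculate realized returns. In the code below, the realized direction is measured from open to close on the first trading day on or after the target date.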

In [None]:
def download_actual_market_data(tickers, start_date, end_date):
    """
    Download actual market data to verify predictions
    """
    print(f"📥 Downloading actual market data for {len(tickers)} stocks...")
    
    market_data = {}
    failed_tickers = []
    
    for ticker in tickers:
        try:
            stock = yf.Ticker(ticker)
            hist = stock.history(start=start_date, end=end_date)
            
            if not hist.empty:
                market_data[ticker] = hist
                print(f"✓ {ticker}: {len(hist)} days")
            else:
                failed_tickers.append(ticker)
                
        except Exception as e:
            failed_tickers.append(ticker)
            print(f"✗ {ticker}: {e}")
    
    print(f"\n✅ Downloaded data for {len(market_data)}/{len(tickers)} stocks")
    if failed_tickers:
        print(f"❌ Failed: {failed_tickers}")
    
    return market_data

def verify_predictions_with_actual_data(predictions_df, market_data):
    """
    Verify predictions against actual market data
    """
    verified_predictions = []
    
    for _, row in predictions_df.iterrows():
        ticker = row['ticker']
        target_date = row['target_date']
        
        if ticker in market_data:
            stock_data = market_data[ticker]
            
            # Find the target trading day (handle weekends/holidays)
            target_dates = pd.date_range(target_date, target_date + timedelta(days=3), freq='B')
            
            for check_date in target_dates:
                if check_date.date() in stock_data.index.date:
                    day_data = stock_data.loc[stock_data.index.date == check_date.date()].iloc[0]
                    
                    # Calculate actual direction and return
                    actual_direction = 1 if day_data['Close'] > day_data['Open'] else 0
                    actual_return = (day_data['Close'] - day_data['Open']) / day_data['Open']
                    
                    # Update prediction record
                    verified_row = row.copy()
                    verified_row['actual_verified'] = actual_direction
                    verified_row['actual_return'] = actual_return
                    verified_row['correct_verified'] = (row['prediction'] == actual_direction)
                    verified_row['actual_open'] = day_data['Open']
                    verified_row['actual_close'] = day_data['Close']
                    verified_row['actual_volume'] = day_data['Volume']
                    
                    verified_predictions.append(verified_row)
                    break
    
    return pd.DataFrame(verified_predictions)

# Download actual market data for verification
unique_tickers = predictions_df['ticker'].unique()
start_date = predictions_df['target_date'].min() - timedelta(days=5)
end_date = predictions_df['target_date'].max() + timedelta(days=5)

market_data = download_actual_market_data(unique_tickers, start_date, end_date)

# Verify predictions (only if we have real market data)
if market_data:
    verified_df = verify_predictions_with_actual_data(predictions_df, market_data)
    
    if not verified_df.empty:
        print(f"\n🔍 Verification Results:")
        print(f"   Verified Predictions: {len(verified_df):,}")
        print(f"   Verified Accuracy: {verified_df['correct_verified'].mean():.1%}")
        
        # Use verified data if available
        if 'actual_verified' in verified_df.columns:
            predictions_df = verified_df
            predictions_df['actual'] = predictions_df['actual_verified']
            predictions_df['correct'] = predictions_df['correct_verified']
else:
    print("📊 Using original prediction data (no market data verification)")
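The row-by-row lookup in `verify_predictions_with_actual_data()` can be slow on large prediction sets. One possible alternative is `pd.merge_asof` with `direction="forward"`, which matches each target date to the first price row on or after it. The sketch below uses synthetic prices and a made-up ticker `XYZ`; it assumes timezone-naive dates, so a tz-aware price index would need its timezone removed first, and real multi-ticker data would also need `by="ticker"`. The 3-day tolerance only approximates the business-day window used above.

```python
import pandas as pd

# Synthetic prices: Friday 2024-01-05 and Monday 2024-01-08 (no weekend rows).
prices = pd.DataFrame({
    "Date": pd.to_datetime(["2024-01-05", "2024-01-08"]),
    "Open": [100.0, 102.0],
    "Close": [101.0, 101.5],
})

# One prediction targets a trading day, the other a Saturday.
preds = pd.DataFrame({
    "ticker": ["XYZ", "XYZ"],
    "target_date": pd.to_datetime(["2024-01-05", "2024-01-06"]),
    "prediction": [1, 1],
})

# Forward as-of join: each target date picks the first price row on or after it,
# but no more than 3 calendar days later.
matched = pd.merge_asof(
    preds.sort_values("target_date"),
    prices.sort_values("Date"),
    left_on="target_date",
    right_on="Date",
    direction="forward",
    tolerance=pd.Timedelta(days=3),
)

# Same direction rule as the notebook: up if the close is above the open.
matched["actual_verified"] = (matched["Close"] > matched["Open"]).astype(int)
matched["correct_verified"] = matched["prediction"] == matched["actual_verified"]
print(matched[["target_date", "Date", "actual_verified", "correct_verified"]])
```

The Saturday target is matched to Monday's price row, which is the behavior the loop above implements by scanning forward over business days.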

## 3. Overall Performance Analysis

In [None]:
def analyze_overall_performance(df):
    """
    Comprehensive overall performance analysis
    """
    print("🎯 OVERALL MODEL PERFORMANCE")
    print("="*50)
    
    total_predictions = len(df)
    overall_accuracy = df['correct'].mean()
    
    # Basic metrics
    print(f"📊 Basic Metrics:")
    print(f"   Total Predictions: {total_predictions:,}")
    print(f"   Overall Accuracy: {overall_accuracy:.1%}")
    print(f"   Improvement over Random: {(overall_accuracy - 0.5) * 100:+.1f} percentage points")
    
    # Statistical significance (two-sided exact binomial test against a 50% hit rate)
    from scipy.stats import binomtest
    p_value = binomtest(int(df['correct'].sum()), total_predictions, 0.5).pvalue
    is_significant = p_value < 0.05
    
    print(f"\n📈 Statistical Analysis:")
    print(f"   P-value vs Random: {p_value:.4f}")
    print(f"   Statistically Significant: {'✅ YES' if is_significant else '❌ NO'}")
    
    # Precision, Recall, F1
    from sklearn.metrics import precision_score, recall_score, f1_score
    precision = precision_score(df['actual'], df['prediction'])
    recall = recall_score(df['actual'], df['prediction'])
    f1 = f1_score(df['actual'], df['prediction'])
    
    print(f"\n🎯 Classification Metrics:")
    print(f"   Precision: {precision:.3f}")
    print(f"   Recall: {recall:.3f}")
    print(f"   F1-Score: {f1:.3f}")
    
    # Prediction distribution
    up_predictions = (df['prediction'] == 1).sum()
    down_predictions = (df['prediction'] == 0).sum()
    actual_up = (df['actual'] == 1).sum()
    actual_down = (df['actual'] == 0).sum()
    
    print(f"\n📊 Prediction Distribution:")
    print(f"   Predicted UP: {up_predictions:,} ({up_predictions/total_predictions:.1%})")
    print(f"   Predicted DOWN: {down_predictions:,} ({down_predictions/total_predictions:.1%})")
    print(f"   Actual UP: {actual_up:,} ({actual_up/total_predictions:.1%})")
    print(f"   Actual DOWN: {actual_down:,} ({actual_down/total_predictions:.1%})")
    
    # Confidence analysis
    avg_confidence = df['confidence'].mean()
    high_conf_mask = df['confidence'] >= 0.7
    high_conf_accuracy = df[high_conf_mask]['correct'].mean() if high_conf_mask.sum() > 0 else 0
    
    print(f"\n🎯 Confidence Analysis:")
    print(f"   Average Confidence: {avg_confidence:.3f}")
    print(f"   High Confidence (≥0.7) Predictions: {high_conf_mask.sum():,} ({high_conf_mask.sum()/total_predictions:.1%})")
    print(f"   High Confidence Accuracy: {high_conf_accuracy:.1%}")
    
    return {
        'total_predictions': total_predictions,
        'overall_accuracy': overall_accuracy,
        'p_value': p_value,
        'is_significant': is_significant,
        'precision': precision,
        'recall': recall,
        'f1_score': f1,
        'high_confidence_accuracy': high_conf_accuracy
    }

overall_metrics = analyze_overall_performance(predictions_df)
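A p-value alone does not show how far the accuracy might plausibly sit from 50%. As a complement, here is a minimal sketch of a Wilson score interval for the hit rate, written with the standard library only. The function name `wilson_interval` and the example counts are illustrative and not part of the notebook's pipeline.

```python
import math

def wilson_interval(successes, n, z=1.959964):
    """Wilson score interval for a binomial proportion (z = 1.96 gives roughly 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = successes / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width

# The same 55% hit rate at two sample sizes (made-up counts).
small_lo, small_hi = wilson_interval(55, 100)
large_lo, large_hi = wilson_interval(550, 1000)

print(f"n=100:  [{small_lo:.3f}, {small_hi:.3f}]")
print(f"n=1000: [{large_lo:.3f}, {large_hi:.3f}]")
```

With 100 predictions the interval still contains 0.5, while with 1,000 predictions at the same hit rate it does not, which is why the total prediction count matters when reading the accuracy figure above.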

## 4. Performance by Individual Stock

In [None]:
def analyze_stock_performance(df):
    """
    Analyze performance for each individual stock
    """
    stock_metrics = []
    
    for ticker in df['ticker'].unique():
        stock_data = df[df['ticker'] == ticker]
        
        if len(stock_data) < 5:  # Skip stocks with too few predictions
            continue
        
        accuracy = stock_data['correct'].mean()
        total_preds = len(stock_data)
        avg_confidence = stock_data['confidence'].mean()
        
        # Separate accuracy for UP vs DOWN predictions
        up_preds = stock_data[stock_data['prediction'] == 1]
        down_preds = stock_data[stock_data['prediction'] == 0]
        
        up_accuracy = up_preds['correct'].mean() if len(up_preds) > 0 else 0
        down_accuracy = down_preds['correct'].mean() if len(down_preds) > 0 else 0
        
        # Prediction bias
        up_prediction_rate = (stock_data['prediction'] == 1).mean()
        actual_up_rate = (stock_data['actual'] == 1).mean()
        prediction_bias = up_prediction_rate - actual_up_rate
        
        # High confidence performance
        high_conf_data = stock_data[stock_data['confidence'] >= 0.7]
        high_conf_accuracy = high_conf_data['correct'].mean() if len(high_conf_data) > 0 else accuracy
        
        stock_metrics.append({
            'ticker': ticker,
            'accuracy': accuracy,
            'total_predictions': total_preds,
            'avg_confidence': avg_confidence,
            'up_accuracy': up_accuracy,
            'down_accuracy': down_accuracy,
            'up_prediction_rate': up_prediction_rate,
            'actual_up_rate': actual_up_rate,
            'prediction_bias': prediction_bias,
            'high_conf_accuracy': high_conf_accuracy,
            'high_conf_count': len(high_conf_data)
        })
    
    stock_df = pd.DataFrame(stock_metrics).sort_values('accuracy', ascending=False)
    
    print("📈 STOCK-BY-STOCK PERFORMANCE")
    print("="*50)
    
    print(f"\n🏆 TOP 10 PERFORMING STOCKS:")
    for i, (_, row) in enumerate(stock_df.head(10).iterrows(), 1):
        print(f"{i:2d}. {row['ticker']:5s} - {row['accuracy']:.1%} accuracy ({row['total_predictions']:3d} predictions, {row['avg_confidence']:.2f} avg conf)")
    
    print(f"\n📉 BOTTOM 5 PERFORMING STOCKS:")
    for i, (_, row) in enumerate(stock_df.tail(5).iterrows(), 1):
        print(f"{i:2d}. {row['ticker']:5s} - {row['accuracy']:.1%} accuracy ({row['total_predictions']:3d} predictions, {row['avg_confidence']:.2f} avg conf)")
    
    # Performance categories
    excellent = stock_df[stock_df['accuracy'] > 0.65]
    good = stock_df[(stock_df['accuracy'] > 0.55) & (stock_df['accuracy'] <= 0.65)]
    average = stock_df[(stock_df['accuracy'] > 0.45) & (stock_df['accuracy'] <= 0.55)]
    poor = stock_df[stock_df['accuracy'] <= 0.45]
    
    print(f"\n📊 PERFORMANCE CATEGORIES:")
    print(f"   Excellent (>65%): {len(excellent)} stocks")
    print(f"   Good (55-65%): {len(good)} stocks")
    print(f"   Average (45-55%): {len(average)} stocks")
    print(f"   Poor (<45%): {len(poor)} stocks")
    
    if len(poor) > 0:
        print(f"\n⚠️  Consider excluding poor performers: {', '.join(poor['ticker'].tolist())}")
    
    return stock_df

stock_performance = analyze_stock_performance(predictions_df)
stock_performance.head(10)
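The per-ticker loop in `analyze_stock_performance()` can also be expressed as a single `groupby` with named aggregations. The sketch below reproduces a subset of the metrics on a toy frame; the tickers `AAA` and `BBB` and all values are invented, and the minimum-count filter and high-confidence columns are left out for brevity.

```python
import pandas as pd

# Toy predictions for two made-up tickers.
toy = pd.DataFrame({
    "ticker": ["AAA", "AAA", "AAA", "BBB", "BBB"],
    "prediction": [1, 0, 1, 1, 0],
    "actual": [1, 1, 1, 0, 0],
    "confidence": [0.9, 0.6, 0.8, 0.7, 0.55],
})
toy["correct"] = toy["prediction"] == toy["actual"]

per_stock = (
    toy.groupby("ticker")
    .agg(
        accuracy=("correct", "mean"),
        total_predictions=("correct", "size"),
        avg_confidence=("confidence", "mean"),
        up_prediction_rate=("prediction", "mean"),
        actual_up_rate=("actual", "mean"),
    )
    # Positive bias means the model predicts UP more often than UP occurs.
    .assign(prediction_bias=lambda d: d["up_prediction_rate"] - d["actual_up_rate"])
    .sort_values("accuracy", ascending=False)
    .reset_index()
)
print(per_stock)
```

Because `prediction` and `actual` are coded as 0/1, their means are the UP rates directly, which is what makes the bias column a one-line subtraction.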

## 5. Visualization: Stock Performance Bar Chart

In [None]:
# Stock Performance Bar Chart
plt.figure(figsize=(15, 10))

# Sort stocks by accuracy for better visualization
stock_sorted = stock_performance.sort_values('accuracy', ascending=True)

# Color code based on performance
colors = ['red' if x < 0.45 else 'orange' if x < 0.5 else 'lightblue' if x < 0.55 else 'lightgreen' if x < 0.65 else 'green' 
          for x in stock_sorted['accuracy']]

# Create horizontal bar chart
bars = plt.barh(range(len(stock_sorted)), stock_sorted['accuracy'], color=colors, alpha=0.7)

# Add reference lines
plt.axvline(x=0.5, color='black', linestyle='--', alpha=0.5, label='Random (50%)')
plt.axvline(x=0.55, color='blue', linestyle='--', alpha=0.5, label='Good (55%)')
plt.axvline(x=0.65, color='green', linestyle='--', alpha=0.5, label='Excellent (65%)')

# Customize chart
plt.yticks(range(len(stock_sorted)), stock_sorted['ticker'])
plt.xlabel('Prediction Accuracy')
plt.title('Stock Prediction Accuracy by Ticker\n(Higher is Better)', fontsize=14, fontweight='bold')
plt.legend()
plt.grid(axis='x', alpha=0.3)

# Add accuracy labels on bars
for i, (bar, accuracy) in enumerate(zip(bars, stock_sorted['accuracy'])):
    plt.text(accuracy + 0.005, i, f'{accuracy:.1%}', 
             va='center', fontsize=9, fontweight='bold')

plt.tight_layout()
plt.show()

# Summary statistics
print(f"\n📊 Stock Performance Summary:")
print(f"   Best Performing Stock: {stock_performance.iloc[0]['ticker']} ({stock_performance.iloc[0]['accuracy']:.1%})")
print(f"   Worst Performing Stock: {stock_performance.iloc[-1]['ticker']} ({stock_performance.iloc[-1]['accuracy']:.1%})")
print(f"   Average Accuracy Across Stocks: {stock_performance['accuracy'].mean():.1%}")
print(f"   Standard Deviation: {stock_performance['accuracy'].std():.1%}")

## 6. Temporal Performance Analysis

In [None]:
def analyze_temporal_performance(df):
    """
    Analyze how performance changes over time
    """
    # Daily performance
    daily_performance = df.groupby('prediction_date').agg({
        'correct': ['mean', 'count'],
        'confidence': 'mean'
    }).round(3)
    
    daily_performance.columns = ['accuracy', 'prediction_count', 'avg_confidence']
    daily_performance = daily_performance.reset_index()
    
    # Weekly performance
    df['week'] = df['prediction_date'].dt.isocalendar().week
    weekly_performance = df.groupby('week')['correct'].mean().reset_index()
    
    # Day of week performance
    df['day_of_week'] = df['prediction_date'].dt.day_name()
    dow_performance = df.groupby('day_of_week')['correct'].agg(['mean', 'count']).round(3)
    
    print("📅 TEMPORAL PERFORMANCE ANALYSIS")
    print("="*50)
    
    print(f"\n📊 Performance by Day of Week:")
    day_order = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday']
    for day in day_order:
        if day in dow_performance.index:
            row = dow_performance.loc[day]
            print(f"   {day:10s}: {row['mean']:.1%} accuracy ({int(row['count']):3d} predictions)")
    
    # Best and worst days
    best_days = daily_performance.nlargest(5, 'accuracy')
    worst_days = daily_performance.nsmallest(5, 'accuracy')
    
    print(f"\n🏆 Best Performing Days:")
    for _, day in best_days.iterrows():
        print(f"   {day['prediction_date'].strftime('%Y-%m-%d')}: {day['accuracy']:.1%} ({day['prediction_count']} predictions)")
    
    print(f"\n📉 Worst Performing Days:")
    for _, day in worst_days.iterrows():
        print(f"   {day['prediction_date'].strftime('%Y-%m-%d')}: {day['accuracy']:.1%} ({day['prediction_count']} predictions)")
    
    return daily_performance, weekly_performance, dow_performance

daily_perf, weekly_perf, dow_perf = analyze_temporal_performance(predictions_df)
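The weekly grouping above keys on the ISO week number alone. That is fine for the three-month sample data, but if your predictions span more than one year, week 1 of each year would be pooled together. A minimal sketch of keying on ISO year and week instead, using made-up dates:

```python
import pandas as pd

# Two Mondays that are both ISO week 1, one year apart, plus one extra day.
toy = pd.DataFrame({
    "prediction_date": pd.to_datetime(["2023-01-02", "2024-01-01", "2024-01-02"]),
    "correct": [True, False, True],
})

iso = toy["prediction_date"].dt.isocalendar()  # columns: year, week, day

# Week number alone merges week 1 of 2023 with week 1 of 2024.
by_week_only = toy.groupby(iso["week"])["correct"].mean()

# Keying on (ISO year, ISO week) keeps the two years apart.
by_year_week = toy.groupby([iso["year"], iso["week"]])["correct"].mean()

print(by_week_only)
print(by_year_week)
```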

In [None]:
# Plot temporal performance
fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2, figsize=(16, 12))

# 1. Daily accuracy over time
ax1.plot(daily_perf['prediction_date'], daily_perf['accuracy'], marker='o', linewidth=2, markersize=4)
ax1.axhline(y=0.5, color='red', linestyle='--', alpha=0.5, label='Random (50%)')
ax1.axhline(y=daily_perf['accuracy'].mean(), color='blue', linestyle='--', alpha=0.7, label=f'Average ({daily_perf["accuracy"].mean():.1%})')
ax1.set_title('Daily Prediction Accuracy Over Time')
ax1.set_ylabel('Accuracy')
ax1.tick_params(axis='x', rotation=45)
ax1.legend()
ax1.grid(True, alpha=0.3)

# 2. Day of week performance
day_order = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday']
dow_data = [dow_perf.loc[day, 'mean'] if day in dow_perf.index else 0 for day in day_order]
dow_colors = ['lightgreen' if x > 0.5 else 'lightcoral' for x in dow_data]

bars = ax2.bar(day_order, dow_data, color=dow_colors, alpha=0.7)
ax2.axhline(y=0.5, color='red', linestyle='--', alpha=0.5, label='Random (50%)')
ax2.set_title('Accuracy by Day of Week')
ax2.set_ylabel('Accuracy')
ax2.tick_params(axis='x', rotation=45)
ax2.legend()
ax2.grid(True, alpha=0.3)

# Add accuracy labels on bars
for bar, accuracy in zip(bars, dow_data):
    if accuracy > 0:
        ax2.text(bar.get_x() + bar.get_width()/2, bar.get_height() + 0.005, 
                f'{accuracy:.1%}', ha='center', va='bottom', fontweight='bold')

# 3. Confidence vs Accuracy Scatter
ax3.scatter(predictions_df['confidence'], predictions_df['correct'], alpha=0.5, s=10)

# Add binned-average line (edges span 0.50 to 1.00 so confidences above 0.95 are not dropped)
confidence_bins = np.linspace(0.5, 1.0, 11)
bin_centers = []
bin_accuracies = []

for i in range(len(confidence_bins)-1):
    mask = (predictions_df['confidence'] >= confidence_bins[i]) & (predictions_df['confidence'] < confidence_bins[i+1])
    if mask.sum() > 0:
        bin_centers.append((confidence_bins[i] + confidence_bins[i+1]) / 2)
        bin_accuracies.append(predictions_df[mask]['correct'].mean())

if bin_centers:
    ax3.plot(bin_centers, bin_accuracies, color='red', linewidth=3, marker='o', label='Binned Average')

ax3.axhline(y=0.5, color='black', linestyle='--', alpha=0.5, label='Random (50%)')
ax3.set_xlabel('Prediction Confidence')
ax3.set_ylabel('Actual Accuracy (0=Wrong, 1=Correct)')
ax3.set_title('Model Calibration: Confidence vs Accuracy')
ax3.legend()
ax3.grid(True, alpha=0.3)

# 4. Prediction count over time
ax4.bar(daily_perf['prediction_date'], daily_perf['prediction_count'], alpha=0.7, color='skyblue')
ax4.set_title('Number of Predictions Per Day')
ax4.set_ylabel('Prediction Count')
ax4.tick_params(axis='x', rotation=45)
ax4.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()
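The calibration panel can be summarized in a single number. One common choice is the expected calibration error (ECE): the bin-size-weighted average gap between mean confidence and observed accuracy. The sketch below is an illustrative helper, not part of the notebook; it assumes `confidence` is the model's estimated probability of being correct, and the inputs are synthetic.

```python
import numpy as np

def expected_calibration_error(confidence, correct, edges):
    """Weighted mean of |accuracy - mean confidence| over confidence bins."""
    confidence = np.asarray(confidence, dtype=float)
    correct = np.asarray(correct, dtype=float)
    # Digitize against the interior edges so a confidence equal to the top edge
    # falls in the last bin (values below the first edge fall in the first bin).
    bin_idx = np.digitize(confidence, edges[1:-1])
    ece = 0.0
    for b in range(len(edges) - 1):
        in_bin = bin_idx == b
        if in_bin.any():
            gap = abs(correct[in_bin].mean() - confidence[in_bin].mean())
            ece += in_bin.mean() * gap  # in_bin.mean() is the bin's share of rows
    return ece

edges = np.linspace(0.5, 1.0, 6)  # 0.5, 0.6, ..., 1.0

# 80% confident and right 8 times out of 10: calibrated.
well_calibrated = expected_calibration_error([0.8] * 10, [1] * 8 + [0] * 2, edges)
# 90% confident but right only 5 times out of 10: overconfident.
overconfident = expected_calibration_error([0.9] * 10, [1] * 5 + [0] * 5, edges)

print(f"{well_calibrated:.3f} {overconfident:.3f}")
```

On the notebook's data the same helper could be called as `expected_calibration_error(predictions_df['confidence'], predictions_df['correct'], edges)`.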

## 7. Confidence Analysis

In [None]:
def analyze_confidence_patterns(df):
    """
    Detailed analysis of confidence score patterns
    """
    print("🎯 CONFIDENCE ANALYSIS")
    print("="*50)
    
    # Confidence distribution
    conf_stats = df['confidence'].describe()
    print(f"\n📊 Confidence Distribution:")
    print(f"   Mean: {conf_stats['mean']:.3f}")
    print(f"   Median: {conf_stats['50%']:.3f}")
    print(f"   Std Dev: {conf_stats['std']:.3f}")
    print(f"   Min: {conf_stats['min']:.3f}")
    print(f"   Max: {conf_stats['max']:.3f}")
    
    # Confidence bins analysis
    confidence_bins = [0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
    df['confidence_bin'] = pd.cut(df['confidence'], bins=confidence_bins, include_lowest=True)
    
    confidence_analysis = df.groupby('confidence_bin').agg({
        'correct': ['mean', 'count'],
        'confidence': 'mean'
    }).round(3)
    
    confidence_analysis.columns = ['accuracy', 'count', 'avg_confidence']
    
    print(f"\n🎯 Performance by Confidence Level:")
    for bin_name, row in confidence_analysis.iterrows():
        if row['count'] > 0:
            print(f"   {str(bin_name):15s}: {row['accuracy']:.1%} accuracy ({int(row['count']):4d} predictions)")
    
    # Find optimal confidence threshold
    thresholds = np.arange(0.5, 1.0, 0.05)
    threshold_analysis = []
    
    for threshold in thresholds:
        high_conf_preds = df[df['confidence'] >= threshold]
        if len(high_conf_preds) > 10:
            accuracy = high_conf_preds['correct'].mean()
            count = len(high_conf_preds)
            coverage = count / len(df)
            improvement = accuracy - df['correct'].mean()
            
            threshold_analysis.append({
                'threshold': threshold,
                'accuracy': accuracy,
                'count': count,
                'coverage': coverage,
                'improvement': improvement
            })
    
    if threshold_analysis:
        threshold_df = pd.DataFrame(threshold_analysis)
        # Find best threshold (balance accuracy improvement and coverage)
        threshold_df['score'] = threshold_df['improvement'] * np.sqrt(threshold_df['coverage'])
        best_threshold = threshold_df.loc[threshold_df['score'].idxmax()]
        
        print(f"\n🏆 Optimal Confidence Threshold Analysis:")
        print(f"   Recommended Threshold: {best_threshold['threshold']:.2f}")
        print(f"   Accuracy at Threshold: {best_threshold['accuracy']:.1%}")
        print(f"   Predictions Remaining: {int(best_threshold['count']):,} ({best_threshold['coverage']:.1%} coverage)")
        print(f"   Accuracy Improvement: {best_threshold['improvement']:+.1%}")
        
        return threshold_df, best_threshold
    
    return None, None

threshold_analysis, best_threshold = analyze_confidence_patterns(predictions_df)
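The threshold score used above, `improvement * sqrt(coverage)`, rewards accuracy gains but discounts thresholds that keep very few predictions. A small worked example with invented numbers shows how the trade-off plays out; the baseline accuracy and the candidate rows are hypothetical.

```python
import math

baseline_accuracy = 0.54  # hypothetical accuracy over all predictions

# Hypothetical (threshold, accuracy at or above threshold, coverage) rows.
candidates = [
    (0.60, 0.56, 0.80),
    (0.80, 0.60, 0.30),
    (0.90, 0.62, 0.05),
]

scored = []
for threshold, accuracy, coverage in candidates:
    improvement = accuracy - baseline_accuracy
    score = improvement * math.sqrt(coverage)  # same formula as the cell above
    scored.append((threshold, round(score, 4)))

# Pick the threshold with the highest score.
best_threshold_value = max(scored, key=lambda item: item[1])[0]

for threshold, score in scored:
    print(f"threshold={threshold:.2f} score={score:.4f}")
print("best:", best_threshold_value)
```

Here the 0.90 threshold has the highest accuracy but keeps only 5% of predictions, so it scores about the same as 0.60 and loses to 0.80.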

## 8. Trading Strategy Simulation

In [None]:
def simulate_trading_strategies(df, market_data=None):
    """
    Simulate simple trading strategies based on predictions
    """
    print("💰 TRADING STRATEGY SIMULATION")
    print("="*50)
    
    strategies = {}
    
    # Strategy 1: Trade all predictions
    df['strategy_return'] = np.where(
        df['correct'], 
        0.01,  # 1% return for correct prediction
        -0.01  # -1% return for incorrect prediction
    )
    
    total_return_all = df['strategy_return'].sum()
    win_rate_all = df['correct'].mean()
    total_trades_all = len(df)
    
    strategies['all_predictions'] = {
        'name': 'Trade All Predictions',
        'total_return': total_return_all,
        'win_rate': win_rate_all,
        'total_trades': total_trades_all,
        'avg_return_per_trade': total_return_all / total_trades_all if total_trades_all > 0 else 0
    }
    
    # Strategy 2: Trade only high confidence predictions
    high_conf_mask = df['confidence'] >= 0.7
    high_conf_df = df[high_conf_mask]
    
    if len(high_conf_df) > 0:
        total_return_hc = high_conf_df['strategy_return'].sum()
        win_rate_hc = high_conf_df['correct'].mean()
        total_trades_hc = len(high_conf_df)
        
        strategies['high_confidence'] = {
            'name': 'High Confidence Only (≥0.7)',
            'total_return': total_return_hc,
            'win_rate': win_rate_hc,
            'total_trades': total_trades_hc,
            'avg_return_per_trade': total_return_hc / total_trades_hc
        }
    
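    # Strategy 3: Trade only best performing stocks
    # Note: the stocks are chosen using accuracy measured on the same predictions
    # that are traded here, so this result is optimistic (in-sample selection).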
    if 'stock_performance' in globals():
        best_stocks = stock_performance[stock_performance['accuracy'] > 0.55]['ticker'].tolist()
        best_stocks_df = df[df['ticker'].isin(best_stocks)]
        
        if len(best_stocks_df) > 0:
            total_return_bs = best_stocks_df['strategy_return'].sum()
            win_rate_bs = best_stocks_df['correct'].mean()
            total_trades_bs = len(best_stocks_df)
            
            strategies['best_stocks'] = {
                'name': 'Best Stocks Only (>55% accuracy)',
                'total_return': total_return_bs,
                'win_rate': win_rate_bs,
                'total_trades': total_trades_bs,
                'avg_return_per_trade': total_return_bs / total_trades_bs
            }
    
    # Strategy 4: Combined (High confidence + Best stocks)
    if 'best_stocks' in strategies:
        combined_mask = (df['confidence'] >= 0.7) & (df['ticker'].isin(best_stocks))
        combined_df = df[combined_mask]
        
        if len(combined_df) > 0:
            total_return_cb = combined_df['strategy_return'].sum()
            win_rate_cb = combined_df['correct'].mean()
            total_trades_cb = len(combined_df)
            
            strategies['combined'] = {
                'name': 'Combined (High Conf + Best Stocks)',
                'total_return': total_return_cb,
                'win_rate': win_rate_cb,
                'total_trades': total_trades_cb,
                'avg_return_per_trade': total_return_cb / total_trades_cb
            }
    
    # Display results
    print(f"\n📊 Strategy Comparison (Simplified Returns):")
    print(f"{'Strategy':<35} {'Return':<10} {'Win Rate':<10} {'Trades':<8} {'Avg/Trade':<10}")
    print("-" * 75)
    
    for strategy in strategies.values():
        print(f"{strategy['name']:<35} {strategy['total_return']:>+7.1%} {strategy['win_rate']:>8.1%} {strategy['total_trades']:>6d} {strategy['avg_return_per_trade']:>+8.2%}")
    
    # Risk metrics
    print(f"\n📈 Risk Metrics:")
    
    for name, strategy in strategies.items():
        if name == 'all_predictions':
            strategy_df = df
        elif name == 'high_confidence':
            strategy_df = df[df['confidence'] >= 0.7]
        elif name == 'best_stocks':
            strategy_df = df[df['ticker'].isin(best_stocks)]
        elif name == 'combined':
            strategy_df = df[(df['confidence'] >= 0.7) & (df['ticker'].isin(best_stocks))]
        else:
            continue
            
        if len(strategy_df) > 1:
            returns = strategy_df['strategy_return']
            volatility = returns.std()
            sharpe = returns.mean() / volatility if volatility > 0 else 0
            max_loss_streak = 0
            current_streak = 0
            
            # Calculate maximum loss streak
            for ret in returns:
                if ret < 0:
                    current_streak += 1
                    max_loss_streak = max(max_loss_streak, current_streak)
                else:
                    current_streak = 0
            
            print(f"   {strategy['name']:<35}: Volatility={volatility:.3f}, Sharpe={sharpe:.2f}, Max Loss Streak={max_loss_streak}")
    
    return strategies

trading_strategies = simulate_trading_strategies(predictions_df)
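Because every trade in this simulation pays a fixed +1% or -1%, the per-trade mean, volatility, and Sharpe-like ratio are all determined by the win rate alone. The sketch below works that out in closed form. The helper `per_trade_stats` is illustrative, and it uses the population standard deviation, so its values differ slightly from the sample standard deviation that pandas `.std()` reports.

```python
import math

def per_trade_stats(win_rate, payoff=0.01):
    """Mean, std, and mean/std of a +/-payoff bet that wins with probability win_rate."""
    # E[X] = w*r - (1-w)*r = (2w - 1) * r
    mean_return = (2 * win_rate - 1) * payoff
    # Var[X] = r^2 - E[X]^2 = 4 * w * (1 - w) * r^2
    std_return = 2 * payoff * math.sqrt(win_rate * (1 - win_rate))
    ratio = mean_return / std_return if std_return > 0 else 0.0
    return mean_return, std_return, ratio

mean_55, std_55, ratio_55 = per_trade_stats(0.55)
mean_50, std_50, ratio_50 = per_trade_stats(0.50)

print(f"55% win rate: mean={mean_55:.4f}, std={std_55:.4f}, ratio={ratio_55:.3f}")
print(f"50% win rate: mean={mean_50:.4f}, std={std_50:.4f}, ratio={ratio_50:.3f}")
```

The ratio printed in the Risk Metrics section is therefore a per-trade, unannualized figure, and it carries no information beyond the win rate until real returns replace the fixed payoff.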

## 9. Advanced Analysis: Market Conditions Impact

In [None]:
def analyze_market_conditions_impact(df, market_data=None):
    """
    Analyze how different market conditions affect prediction accuracy
    """
    print("📊 MARKET CONDITIONS IMPACT ANALYSIS")
    print("="*50)
    
    if market_data is None or len(market_data) == 0:
        print("⚠️  No market data available for market conditions analysis")
        print("💡 To enable this analysis, ensure market data is downloaded successfully")
        return
    
    # Analyze performance during different volatility regimes
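    # Get SPY data as market proxy. Start 30 calendar days early so the 10-day rolling
    # volatility is defined from the first prediction date, and extend the end by one
    # day because yfinance treats `end` as exclusive.
    try:
        spy_data = yf.Ticker('SPY').history(
            start=df['prediction_date'].min() - timedelta(days=30),
            end=df['prediction_date'].max() + timedelta(days=1))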
        
        if not spy_data.empty:
            # Calculate daily market returns and volatility
            spy_data['returns'] = spy_data['Close'].pct_change()
            spy_data['volatility'] = spy_data['returns'].rolling(window=10).std()
            
            # Add market regime to predictions
            df_with_market = df.copy()
            df_with_market['market_return'] = df_with_market['prediction_date'].map(
                lambda x: spy_data.loc[spy_data.index.date == x.date(), 'returns'].iloc[0] 
                if x.date() in spy_data.index.date else np.nan
            )
            
            df_with_market['market_volatility'] = df_with_market['prediction_date'].map(
                lambda x: spy_data.loc[spy_data.index.date == x.date(), 'volatility'].iloc[0] 
                if x.date() in spy_data.index.date else np.nan
            )
            
            # Remove NaN values
            df_with_market = df_with_market.dropna(subset=['market_return', 'market_volatility'])
            
            if len(df_with_market) > 0:
                # Classify market conditions
                vol_high_threshold = df_with_market['market_volatility'].quantile(0.75)
                vol_low_threshold = df_with_market['market_volatility'].quantile(0.25)
                
                def classify_volatility(vol):
                    if vol > vol_high_threshold:
                        return 'High Volatility'
                    elif vol < vol_low_threshold:
                        return 'Low Volatility'
                    else:
                        return 'Medium Volatility'
                
                def classify_market_direction(ret):
                    if ret > 0.005:  # >0.5% up
                        return 'Strong Up'
                    elif ret > 0:
                        return 'Weak Up'
                    elif ret > -0.005:  # >-0.5% down
                        return 'Weak Down'
                    else:
                        return 'Strong Down'
                
                df_with_market['volatility_regime'] = df_with_market['market_volatility'].apply(classify_volatility)
                df_with_market['market_regime'] = df_with_market['market_return'].apply(classify_market_direction)
                
                # Performance by volatility regime
                vol_performance = df_with_market.groupby('volatility_regime')['correct'].agg(['mean', 'count'])
                print(f"\n📈 Performance by Volatility Regime:")
                for regime, row in vol_performance.iterrows():
                    print(f"   {regime:18s}: {row['mean']:.1%} accuracy ({int(row['count']):3d} predictions)")
                
                # Performance by market direction
                market_performance = df_with_market.groupby('market_regime')['correct'].agg(['mean', 'count'])
                print(f"\n📊 Performance by Market Regime:")
                for regime, row in market_performance.iterrows():
                    print(f"   {regime:12s}: {row['mean']:.1%} accuracy ({int(row['count']):3d} predictions)")
                    
                return df_with_market
            
    except Exception as e:
        print(f"⚠️  Error in market conditions analysis: {e}")
    
    return None

market_analysis = analyze_market_conditions_impact(predictions_df, market_data)

## 10. Action Items and Recommendations

In [None]:
def generate_actionable_recommendations(overall_metrics, stock_performance, threshold_analysis, trading_strategies):
    """
    Generate specific, actionable recommendations based on analysis
    """
    print("🎯 ACTIONABLE RECOMMENDATIONS")
    print("="*50)
    
    recommendations = []
    
    # Overall performance recommendations
    if overall_metrics['is_significant'] and overall_metrics['overall_accuracy'] > 0.55:
        recommendations.append({
            'priority': 'HIGH',
            'category': 'Model Deployment',
            'action': f"Deploy model for live trading - shows {overall_metrics['overall_accuracy']:.1%} accuracy with statistical significance",
            'details': [
                "Set up automated daily prediction pipeline",
                "Implement position sizing based on confidence scores",
                "Monitor performance decay weekly"
            ]
        })
    elif overall_metrics['overall_accuracy'] > 0.52:
        recommendations.append({
            'priority': 'MEDIUM',
            'category': 'Model Improvement',
            'action': f"Model shows promise ({overall_metrics['overall_accuracy']:.1%}) but needs optimization before deployment",
            'details': [
                "Focus on high-confidence predictions only",
                "Experiment with ensemble methods",
                "Consider additional feature engineering"
            ]
        })
    else:
        recommendations.append({
            'priority': 'HIGH',
            'category': 'Model Redesign',
            'action': f"Model performance ({overall_metrics['overall_accuracy']:.1%}) requires significant improvement",
            'details': [
                "Gather more training data",
                "Try different prediction targets (volatility, relative performance)",
                "Consider market regime awareness",
                "Add macroeconomic features"
            ]
        })
    
    # Stock selection recommendations
    poor_performers = stock_performance[stock_performance['accuracy'] < 0.45]
    if len(poor_performers) > 0:
        recommendations.append({
            'priority': 'HIGH',
            'category': 'Stock Selection',
            'action': f"Exclude {len(poor_performers)} poor-performing stocks from trading universe",
            'details': [
                f"Blacklist: {', '.join(poor_performers['ticker'].tolist()[:10])}",
                "Focus on stocks with >55% accuracy",
                "Investigate sector-specific patterns"
            ]
        })
    
    best_performers = stock_performance[stock_performance['accuracy'] > 0.6]
    if len(best_performers) > 0:
        recommendations.append({
            'priority': 'MEDIUM',
            'category': 'Portfolio Concentration',
            'action': f"Consider concentrating on {len(best_performers)} top-performing stocks",
            'details': [
                f"Focus on: {', '.join(best_performers['ticker'].tolist()[:10])}",
                "Allocate more capital to high-accuracy stocks",
                "Monitor if performance persists out-of-sample"
            ]
        })
    
    # Confidence threshold recommendations
    if threshold_analysis is not None and not threshold_analysis.empty:
        best_threshold = threshold_analysis.loc[threshold_analysis['score'].idxmax()]
        recommendations.append({
            'priority': 'HIGH',
            'category': 'Risk Management',
            'action': f"Implement confidence threshold of {best_threshold['threshold']:.2f} for trade filtering",
            'details': [
                f"Expected accuracy: {best_threshold['accuracy']:.1%} ({best_threshold['improvement']:+.1%} improvement)",
                f"Trade coverage: {best_threshold['coverage']:.1%} of all signals",
                "Use confidence scores for position sizing"
            ]
        })
    
    # Trading strategy recommendations
    if trading_strategies:
        best_strategy = max(trading_strategies.values(), key=lambda x: x['total_return'])
        recommendations.append({
            'priority': 'MEDIUM',
            'category': 'Trading Strategy',
            'action': f"Implement '{best_strategy['name']}' strategy (best backtested performance)",
            'details': [
                f"Backtested return: {best_strategy['total_return']:+.1%} over testing period",
                f"Win rate: {best_strategy['win_rate']:.1%}",
                f"Number of trades: {best_strategy['total_trades']}"
            ]
        })
    
    # Technical recommendations
    recommendations.extend([
        {
            'priority': 'HIGH',
            'category': 'Monitoring',
            'action': 'Set up comprehensive monitoring and alerting system',
            'details': [
                "Daily accuracy tracking",
                "Model performance decay alerts",
                "Data quality monitoring",
                "Exception handling and logging"
            ]
        },
        {
            'priority': 'MEDIUM',
            'category': 'Model Maintenance', 
            'action': 'Establish regular model retraining schedule',
            'details': [
                "Weekly performance review",
                "Monthly model retraining",
                "Quarterly feature engineering review",
                "Document model version changes"
            ]
        }
    ])
    
    # Display recommendations
    for i, rec in enumerate(recommendations, 1):
        priority_emoji = "🔴" if rec['priority'] == 'HIGH' else "🟡" if rec['priority'] == 'MEDIUM' else "🟢"
        print(f"\n{priority_emoji} {rec['priority']} PRIORITY: {rec['category']}")
        print(f"   Action: {rec['action']}")
        print(f"   Details:")
        for detail in rec['details']:
            print(f"     • {detail}")
    
    return recommendations

recommendations = generate_actionable_recommendations(
    overall_metrics, stock_performance, threshold_analysis, trading_strategies
)
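
The deployment branch above depends on `overall_metrics['is_significant']`, which is computed earlier in the notebook. One common way to derive such a flag is a one-sided exact binomial test against a 50% coin flip. This is an assumption for illustration, not necessarily the test this notebook uses. `binomial_p_value` is a hypothetical helper and the counts are made up.

```python
from math import comb

def binomial_p_value(n_correct, n_total, p_null=0.5):
    """One-sided exact P(X >= n_correct) for X ~ Binomial(n_total, p_null)."""
    return sum(
        comb(n_total, k) * p_null**k * (1 - p_null)**(n_total - k)
        for k in range(n_correct, n_total + 1)
    )

# Illustrative numbers only: 56% accuracy at two different sample sizes
p_small = binomial_p_value(56, 100)
p_large = binomial_p_value(560, 1000)
print(f"n=100:  p = {p_small:.4f}")
print(f"n=1000: p = {p_large:.6f}")
```

The same 56% hit rate is compatible with chance on 100 predictions but not on 1,000, which is why the first branch requires both the significance flag and the accuracy cutoff.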

## 11. Export Analysis Results

In [None]:
def export_analysis_results():
    """
    Export analysis results for future reference
    """
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    
    # Create results directory
    results_dir = Path("analysis_results")
    results_dir.mkdir(exist_ok=True)
    
    # Export stock performance
    stock_performance.to_csv(results_dir / f"stock_performance_{timestamp}.csv", index=False)
    
    # Export daily performance
    if 'daily_perf' in globals():
        daily_perf.to_csv(results_dir / f"daily_performance_{timestamp}.csv", index=False)
    
    # Export summary report
    summary_report = {
        'analysis_date': datetime.now().isoformat(),
        'data_period': {
            'start_date': predictions_df['prediction_date'].min().isoformat(),
            'end_date': predictions_df['prediction_date'].max().isoformat(),
            'total_days': (predictions_df['prediction_date'].max() - predictions_df['prediction_date'].min()).days
        },
        'overall_metrics': overall_metrics,
        'top_performing_stocks': stock_performance.nlargest(10, 'accuracy').to_dict('records'),
        'poor_performing_stocks': stock_performance.nsmallest(5, 'accuracy').to_dict('records'),
        'confidence_analysis': {
            'average_confidence': float(predictions_df['confidence'].mean()),
            'confidence_std': float(predictions_df['confidence'].std()),
            'high_confidence_count': int((predictions_df['confidence'] >= 0.7).sum()),
            'high_confidence_accuracy': float(predictions_df[predictions_df['confidence'] >= 0.7]['correct'].mean()) if (predictions_df['confidence'] >= 0.7).sum() > 0 else 0
        },
        'recommendations': recommendations
    }
    
    with open(results_dir / f"analysis_summary_{timestamp}.json", 'w') as f:
        json.dump(summary_report, f, indent=2, default=str)
    
    print(f"\n💾 Analysis results exported to: {results_dir}")
    print(f"   📊 Stock performance: stock_performance_{timestamp}.csv")
    if 'daily_perf' in globals():
        print(f"   📅 Daily performance: daily_performance_{timestamp}.csv")
    print(f"   📋 Summary report: analysis_summary_{timestamp}.json")
    
    return results_dir

results_dir = export_analysis_results()
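
Because each export is timestamped, later runs can be compared against earlier ones by reading the summaries back. This is a minimal sketch, assuming the `analysis_summary_<timestamp>.json` naming used above. `load_latest_summary` is a hypothetical helper, and the demo writes made-up files to a temporary directory rather than touching `analysis_results`.

```python
import json
import tempfile
from pathlib import Path

def load_latest_summary(results_dir="analysis_results"):
    """Return (path, summary dict) for the newest summary, or (None, None)."""
    # %Y%m%d_%H%M%S timestamps sort chronologically as plain strings
    files = sorted(Path(results_dir).glob("analysis_summary_*.json"))
    if not files:
        return None, None
    with open(files[-1]) as f:
        return files[-1], json.load(f)

# Demo against a throwaway directory with made-up contents
with tempfile.TemporaryDirectory() as tmp:
    for stamp, acc in [("20240101_090000", 0.53), ("20240108_090000", 0.56)]:
        payload = {"overall_metrics": {"overall_accuracy": acc}}
        target = Path(tmp) / f"analysis_summary_{stamp}.json"
        target.write_text(json.dumps(payload))
    latest_path, latest = load_latest_summary(tmp)
    latest_name = latest_path.name
    latest_accuracy = latest["overall_metrics"]["overall_accuracy"]
```

Note that the export uses `default=str`, so any value that was not natively JSON-serializable comes back as a string and may need converting before comparison.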

## Summary

This notebook provides a comprehensive analysis of your stock prediction model's performance. Here's what we've covered:

### Key Analysis Areas:
1. **📊 Overall Performance**: Statistical significance, accuracy metrics, classification performance
2. **📈 Stock-by-Stock Analysis**: Individual stock performance, best/worst performers, exclusion recommendations
3. **📅 Temporal Patterns**: Daily performance trends, day-of-week effects, performance decay analysis
4. **🎯 Confidence Analysis**: Optimal thresholds, calibration assessment, filtering strategies
5. **💰 Trading Simulation**: Strategy backtesting, risk metrics, return projections
6. **🌊 Market Conditions**: Performance under different volatility and market regimes
7. **🎯 Actionable Recommendations**: Prioritized action items for model improvement and deployment

### Next Steps:
1. **Review the recommendations** generated in section 10
2. **Implement the suggested improvements** based on your analysis results
3. **Set up monitoring** using the deployment guide (README_DEPLOYMENT.md)
4. **Re-run this analysis weekly** to track model performance over time
5. **Adjust your trading strategy** based on the confidence and stock-specific insights
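
For the weekly re-run, a rolling accuracy check is one simple way to spot performance decay between full analyses. This is a sketch under stated assumptions: the input is a Series of per-day accuracy indexed by date, and the `window` and `floor` values are illustrative rather than tuned. `flag_accuracy_decay` is a hypothetical helper and the numbers are made up.

```python
import pandas as pd

def flag_accuracy_decay(daily_accuracy, window=5, floor=0.50):
    """Return rolling mean accuracy and a boolean alert Series.

    daily_accuracy: Series of per-day accuracy indexed by date (assumed shape).
    window and floor are illustrative defaults, not tuned values.
    """
    rolling = daily_accuracy.rolling(window, min_periods=window).mean()
    # NaN < floor is False, so days without a full window never alert
    alerts = rolling < floor
    return rolling, alerts

# Made-up accuracies: a healthy stretch followed by a slump
dates = pd.date_range("2024-01-01", periods=10, freq="D")
demo_accuracy = pd.Series(
    [0.58, 0.55, 0.60, 0.57, 0.56, 0.48, 0.45, 0.42, 0.46, 0.43],
    index=dates,
)
rolling_demo, alerts_demo = flag_accuracy_decay(demo_accuracy)
print(alerts_demo[alerts_demo].index.tolist())
```

This averages days equally regardless of how many predictions each day contains; weighting by daily prediction count may be preferable when volume varies a lot.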

### Files Generated:
- Stock performance CSV for detailed stock-level analysis
- Daily performance CSV for temporal trend analysis
- Summary JSON with key metrics and recommendations

**Remember**: Past performance doesn't guarantee future results. Always use proper risk management and consider this analysis as one input in your trading decisions.