# 🎯 NFL Player Movement - Prediction & Evaluation

**Final Predictions and Comprehensive Model Evaluation**

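This notebook generates predictions on the evaluation data (read from `../data/raw/train`, as set in `Config.DATA_DIR`) and performs detailed evaluation and error analysis. If the models were trained on these same files, the reported metrics will be optimistic.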

---

## 📋 Table of Contents

1. [Setup & Configuration](#1-setup)
2. [Load Trained Models](#2-load-models)
3. [Load Test Data](#3-test-data)
4. [Generate Predictions](#4-predictions)
5. [Overall Evaluation](#5-evaluation)
6. [Residual Analysis](#6-residuals)
7. [Error by Categories](#7-categories)
8. [Prediction Visualization](#8-visualization)
9. [Export Results](#9-export)

---

## 1. Setup & Configuration 🔧

In [None]:
# Standard libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from pathlib import Path
import warnings
import pickle
from datetime import datetime
warnings.filterwarnings('ignore')

# ML libraries
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
from scipy import stats

# Set plotting style
sns.set_style('darkgrid')
plt.rcParams['figure.figsize'] = (14, 6)

print("✅ Libraries imported successfully")

In [None]:
# Configuration
class Config:
    """Prediction and evaluation configuration"""
    
    # Paths
    DATA_DIR = Path('../data/raw/train')
    MODEL_DIR = Path('../outputs/model_comparison')
    OUTPUT_DIR = Path('../outputs/predictions')
    
    # Data settings
    USE_SAMPLE = True
    SAMPLE_SIZE = 50000
    MAX_FILES = 2
    
    # Evaluation settings
    CONFIDENCE_LEVEL = 0.95
    RANDOM_STATE = 42

config = Config()
config.OUTPUT_DIR.mkdir(parents=True, exist_ok=True)

print("✅ Configuration loaded")
print(f"   Model directory: {config.MODEL_DIR}")
print(f"   Output directory: {config.OUTPUT_DIR}")

## 2. Load Trained Models 📦

In [None]:
print("📦 Loading trained models...\n")

# Find model files (sorted so the choice is deterministic if several match)
model_x_files = sorted(config.MODEL_DIR.glob('best_model_x_*.pkl'))
model_y_files = sorted(config.MODEL_DIR.glob('best_model_y_*.pkl'))

if model_x_files and model_y_files:
    # Load models
    with open(model_x_files[0], 'rb') as f:
        model_x = pickle.load(f)
    
    with open(model_y_files[0], 'rb') as f:
        model_y = pickle.load(f)
    
    model_x_name = model_x_files[0].stem.replace('best_model_x_', '')
    model_y_name = model_y_files[0].stem.replace('best_model_y_', '')
    
    print(f"   ✓ X model loaded: {model_x_name}")
    print(f"   ✓ Y model loaded: {model_y_name}")
    
    # Load feature names
    feature_file = config.MODEL_DIR / 'feature_names.pkl'
    if feature_file.exists():
        with open(feature_file, 'rb') as f:
            feature_names = pickle.load(f)
        print(f"   ✓ Feature names loaded: {len(feature_names)} features")
    else:
        feature_names = None
        print("   ⚠️  Feature names not found")
    
    MODELS_LOADED = True
else:
    print("⚠️  No trained models found. Run 04_model_comparison.ipynb first.")
    MODELS_LOADED = False

if MODELS_LOADED:
    print("\n✅ Models loaded")

## 3. Load Test Data 📂
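
The feature step in this section splits speed `s` into x and y components using the direction angle `dir`. As a minimal sketch on made-up values (the helper name `decompose_speed` is hypothetical), the snippet below mirrors the notebook's formula, which treats `dir` as degrees measured counterclockwise from the +x axis. If the tracking data defines `dir` differently (for example, clockwise from the +y axis), the sine and cosine would swap. Whichever convention is used must match the one used when the models were trained.

```python
import numpy as np
import pandas as pd


def decompose_speed(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper using the same formula as create_features."""
    out = df.copy()
    angle = np.radians(out["dir"])  # assumes dir is in degrees
    out["velocity_x"] = out["s"] * np.cos(angle)
    out["velocity_y"] = out["s"] * np.sin(angle)
    return out


# Invented rows: same speed at 0 and 90 degrees, a faster one at 180 degrees
toy = pd.DataFrame({"s": [2.0, 2.0, 5.0], "dir": [0.0, 90.0, 180.0]})
features = decompose_speed(toy)

# Sanity check: the components should recombine to the original speed
recombined = np.hypot(features["velocity_x"], features["velocity_y"])
print(features.round(3))
print(np.allclose(recombined, toy["s"]))
```

The recombination check holds under either angle convention, so it catches unit mistakes (degrees versus radians applied twice, for instance) but not a swapped axis.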

In [None]:
def load_and_prepare_data(data_dir, max_files=None, sample_size=None):
    """
    Load and prepare test data
    """
    print("📂 Loading test data...\n")
    
    # Load files
    input_files = sorted(data_dir.glob('input_*.csv'))[:max_files] if max_files else sorted(data_dir.glob('input_*.csv'))
    output_files = sorted(data_dir.glob('output_*.csv'))[:max_files] if max_files else sorted(data_dir.glob('output_*.csv'))
    
    input_df = pd.concat([pd.read_csv(f) for f in input_files], ignore_index=True)
    output_df = pd.concat([pd.read_csv(f) for f in output_files], ignore_index=True)
    
    print(f"   Input: {input_df.shape}")
    print(f"   Output: {output_df.shape}")
    
    # Sample
    if sample_size and len(input_df) > sample_size:
        input_df = input_df.sample(n=sample_size, random_state=42)
        sampled_keys = input_df[['game_id', 'play_id', 'nfl_id', 'frame_id']]
        output_df = output_df.merge(sampled_keys, on=['game_id', 'play_id', 'nfl_id', 'frame_id'])
    
    # Merge
    df = input_df.merge(
        output_df[['game_id', 'play_id', 'nfl_id', 'frame_id', 'x', 'y']],
        on=['game_id', 'play_id', 'nfl_id', 'frame_id'],
        suffixes=('', '_target')
    )
    df = df.rename(columns={'x_target': 'target_x', 'y_target': 'target_y'})
    
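    # Handle missing values: assign the result back, because in-place fillna
    # on a column selection is deprecated in recent pandas versions
    numeric_cols = df.select_dtypes(include=[np.number]).columns
    for col in numeric_cols:
        if df[col].isnull().any():
            df[col] = df[col].fillna(df[col].median())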
    
    print(f"\n✅ Data loaded: {df.shape}")
    return df


def create_features(df):
    """
    Create features for prediction
    """
    print("\n⚙️  Creating features...\n")
    df_copy = df.copy()
    
    # Physics features
    if 's' in df_copy.columns and 'dir' in df_copy.columns:
        df_copy['velocity_x'] = df_copy['s'] * np.cos(np.radians(df_copy['dir']))
        df_copy['velocity_y'] = df_copy['s'] * np.sin(np.radians(df_copy['dir']))
    
    if 'a' in df_copy.columns and 'dir' in df_copy.columns:
        df_copy['acceleration_x'] = df_copy['a'] * np.cos(np.radians(df_copy['dir']))
        df_copy['acceleration_y'] = df_copy['a'] * np.sin(np.radians(df_copy['dir']))
    
    if 'player_weight' in df_copy.columns and 's' in df_copy.columns:
        df_copy['momentum'] = df_copy['player_weight'] * df_copy['s']
        df_copy['kinetic_energy'] = 0.5 * df_copy['player_weight'] * (df_copy['s'] ** 2)
    
    # Spatial features
    if all(col in df_copy.columns for col in ['x', 'y', 'ball_land_x', 'ball_land_y']):
        df_copy['dist_to_ball'] = np.sqrt((df_copy['x'] - df_copy['ball_land_x'])**2 + (df_copy['y'] - df_copy['ball_land_y'])**2)
        df_copy['dx_to_ball'] = df_copy['ball_land_x'] - df_copy['x']
        df_copy['dy_to_ball'] = df_copy['ball_land_y'] - df_copy['y']
    
    if 'x' in df_copy.columns:
        df_copy['field_position_norm'] = df_copy['x'] / 120.0
    if 'y' in df_copy.columns:
        df_copy['dist_to_sideline'] = np.minimum(df_copy['y'], 53.3 - df_copy['y'])
    
    # NFL features
    if 'player_role' in df_copy.columns:
        df_copy['is_targeted_receiver'] = (df_copy['player_role'] == 'Targeted Receiver').astype(int)
        df_copy['is_passer'] = (df_copy['player_role'] == 'Passer').astype(int)
    
    print(f"   ✓ Features created: {len([c for c in df_copy.columns if c not in df.columns])} new")
    return df_copy


# Load and prepare test data
test_df = load_and_prepare_data(
    config.DATA_DIR,
    max_files=config.MAX_FILES,
    sample_size=config.SAMPLE_SIZE if config.USE_SAMPLE else None
)

test_df = create_features(test_df)

## 4. Generate Predictions 🎯
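
A fitted model expects the same columns, in the same order, as at training time. As a hedged sketch (the helper `align_features` and the toy column names are hypothetical), `DataFrame.reindex` can enforce that order and report any missing columns instead of failing with a `KeyError`.

```python
import pandas as pd


def align_features(df, feature_names, fill_value=0.0):
    """Hypothetical helper: order columns as in training and report gaps."""
    missing = [c for c in feature_names if c not in df.columns]
    # reindex adds absent columns as NaN and drops columns not requested
    aligned = df.reindex(columns=feature_names).fillna(fill_value)
    return aligned, missing


toy = pd.DataFrame({"s": [1.0, 2.0], "x": [10.0, None], "extra": [0, 1]})
X, missing = align_features(toy, ["x", "s", "dist_to_ball"])
print(list(X.columns))
print(missing)
```

Filling an absent feature with a constant is itself an assumption. When `missing` is non-empty, stopping with an error may be the safer choice.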

In [None]:
if MODELS_LOADED:
    print("🎯 Generating predictions...\n")
    
    # Prepare features
    exclude_cols = ['game_id', 'play_id', 'nfl_id', 'frame_id', 'target_x', 'target_y',
                    'player_name', 'player_position', 'player_role', 'player_side',
                    'play_direction', 'player_birth_date', 'player_to_predict']
    
    if feature_names:
        X_test = test_df[feature_names].fillna(0)
    else:
        feature_cols = [col for col in test_df.columns if col not in exclude_cols and test_df[col].dtype in ['int64', 'float64']]
        X_test = test_df[feature_cols].fillna(0)
    
    y_test_x = test_df['target_x'].fillna(test_df['target_x'].median())
    y_test_y = test_df['target_y'].fillna(test_df['target_y'].median())
    
    print(f"   Features shape: {X_test.shape}")
    print(f"   Targets shape: ({len(y_test_x)}, 2)")
    
    # Generate predictions
    pred_x = model_x.predict(X_test)
    pred_y = model_y.predict(X_test)
    
    print(f"\n✅ Predictions generated")
    print(f"   X predictions: {len(pred_x):,}")
    print(f"   Y predictions: {len(pred_y):,}")

## 5. Overall Evaluation 📊

Calculate comprehensive performance metrics.
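
As a minimal sketch of what these metrics measure, the example below computes them on a handful of invented values (`y_true` and `y_pred` are illustrative, not notebook outputs). It uses plain NumPy so the formulas are visible. MAPE divides by the true value, so it is undefined where a target is exactly zero; this sketch assumes that skipping those rows is acceptable.

```python
import numpy as np

y_true = np.array([10.0, 20.0, 0.0, 40.0])
y_pred = np.array([12.0, 18.0, 1.0, 40.0])
errors = y_true - y_pred

# Root mean squared error and mean absolute error
rmse = np.sqrt(np.mean(errors ** 2))
mae = np.mean(np.abs(errors))

# R^2 = 1 - (residual sum of squares / total sum of squares)
ss_res = np.sum(errors ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2 = 1.0 - ss_res / ss_tot

# MAPE over rows with a non-zero target only
nonzero = y_true != 0
mape = np.mean(np.abs(errors[nonzero] / y_true[nonzero])) * 100

print(f"RMSE={rmse:.4f}  MAE={mae:.4f}  R2={r2:.4f}  MAPE={mape:.2f}%")
```

For field coordinates, MAPE depends on where the origin sits: the same one-yard miss counts far more near `y = 0` than near midfield, so RMSE, MAE, and Euclidean error are usually easier to interpret.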

In [None]:
if MODELS_LOADED:
    print("="*70)
    print("OVERALL MODEL PERFORMANCE")
    print("="*70 + "\n")
    
    # Calculate metrics
    # MAPE is undefined where the target is 0, so those rows are masked out
    metrics = {
        'X Coordinate': {
            'RMSE': np.sqrt(mean_squared_error(y_test_x, pred_x)),
            'MAE': mean_absolute_error(y_test_x, pred_x),
            'R²': r2_score(y_test_x, pred_x),
            'MAPE': np.abs((y_test_x - pred_x) / y_test_x.replace(0, np.nan)).mean() * 100
        },
        'Y Coordinate': {
            'RMSE': np.sqrt(mean_squared_error(y_test_y, pred_y)),
            'MAE': mean_absolute_error(y_test_y, pred_y),
            'R²': r2_score(y_test_y, pred_y),
            'MAPE': np.abs((y_test_y - pred_y) / y_test_y.replace(0, np.nan)).mean() * 100
        }
    }
    
    # Euclidean error
    euclidean_error = np.sqrt((y_test_x - pred_x)**2 + (y_test_y - pred_y)**2)
    
    # Display metrics
    for coord, vals in metrics.items():
        print(f"📊 {coord}:")
        for metric, value in vals.items():
            print(f"   {metric:6s}: {value:.4f}")
        print()
    
    print(f"📏 Euclidean Distance Error:")
    print(f"   Mean:   {euclidean_error.mean():.4f} yards")
    print(f"   Median: {np.median(euclidean_error):.4f} yards")
    print(f"   Std:    {euclidean_error.std():.4f} yards")
    print(f"   Max:    {euclidean_error.max():.4f} yards")
    
    print("\n" + "="*70)
    
    # Create metrics dataframe
    metrics_df = pd.DataFrame(metrics).T
    metrics_df.to_csv(config.OUTPUT_DIR / 'overall_metrics.csv')
    print(f"\n✅ Metrics saved to: {config.OUTPUT_DIR / 'overall_metrics.csv'}")

## 6. Residual Analysis 📉

Analyze prediction residuals for model diagnostics.
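
The plots in this section are read by eye. As a rough numeric companion, the sketch below summarizes a residual vector with its mean (bias), standard deviation, skewness, and the share of points within two standard deviations of the mean (about 95% for normally distributed residuals). The function name `summarize_residuals` and the synthetic data are assumptions for illustration.

```python
import numpy as np


def summarize_residuals(residuals):
    """Hypothetical helper: basic numeric diagnostics for a residual vector."""
    r = np.asarray(residuals, dtype=float)
    mean = r.mean()
    std = r.std(ddof=1)
    centered = r - mean
    # Moment-based skewness: third central moment over variance^1.5
    skew = np.mean(centered ** 3) / np.mean(centered ** 2) ** 1.5
    within_2sd = np.mean(np.abs(centered) <= 2 * std)
    return {"mean": mean, "std": std, "skew": skew, "within_2sd": within_2sd}


# Synthetic, normally distributed residuals stand in for real ones
rng = np.random.default_rng(42)
summary = summarize_residuals(rng.normal(loc=0.0, scale=1.5, size=10_000))
for name, value in summary.items():
    print(f"{name:>10s}: {value:.4f}")
```

A mean far from zero suggests systematic bias, while a `within_2sd` share well below 0.95 points to heavy tails that the Q-Q plot should also show.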

In [None]:
if MODELS_LOADED:
    print("📉 Analyzing residuals...\n")
    
    # Calculate residuals
    residuals_x = y_test_x - pred_x
    residuals_y = y_test_y - pred_y
    
    # Visualize residuals
    fig, axes = plt.subplots(2, 3, figsize=(18, 12))
    
    # Row 1: X coordinate
    # 1. Residuals vs Predicted
    axes[0, 0].scatter(pred_x, residuals_x, alpha=0.3, s=1, c='blue')
    axes[0, 0].axhline(0, color='red', linestyle='--', linewidth=2)
    axes[0, 0].set_xlabel('Predicted X', fontsize=11)
    axes[0, 0].set_ylabel('Residuals', fontsize=11)
    axes[0, 0].set_title('X: Residuals vs Predicted', fontsize=12, fontweight='bold')
    axes[0, 0].grid(alpha=0.3)
    
    # 2. Histogram
    axes[0, 1].hist(residuals_x, bins=50, edgecolor='black', alpha=0.7, color='blue')
    axes[0, 1].axvline(0, color='red', linestyle='--', linewidth=2)
    axes[0, 1].axvline(residuals_x.mean(), color='green', linestyle='--', linewidth=2, label=f'Mean: {residuals_x.mean():.3f}')
    axes[0, 1].set_xlabel('Residuals', fontsize=11)
    axes[0, 1].set_ylabel('Frequency', fontsize=11)
    axes[0, 1].set_title('X: Residual Distribution', fontsize=12, fontweight='bold')
    axes[0, 1].legend()
    
    # 3. Q-Q Plot
    stats.probplot(residuals_x, dist="norm", plot=axes[0, 2])
    axes[0, 2].set_title('X: Q-Q Plot', fontsize=12, fontweight='bold')
    axes[0, 2].grid(alpha=0.3)
    
    # Row 2: Y coordinate
    # 4. Residuals vs Predicted
    axes[1, 0].scatter(pred_y, residuals_y, alpha=0.3, s=1, c='orange')
    axes[1, 0].axhline(0, color='red', linestyle='--', linewidth=2)
    axes[1, 0].set_xlabel('Predicted Y', fontsize=11)
    axes[1, 0].set_ylabel('Residuals', fontsize=11)
    axes[1, 0].set_title('Y: Residuals vs Predicted', fontsize=12, fontweight='bold')
    axes[1, 0].grid(alpha=0.3)
    
    # 5. Histogram
    axes[1, 1].hist(residuals_y, bins=50, edgecolor='black', alpha=0.7, color='orange')
    axes[1, 1].axvline(0, color='red', linestyle='--', linewidth=2)
    axes[1, 1].axvline(residuals_y.mean(), color='green', linestyle='--', linewidth=2, label=f'Mean: {residuals_y.mean():.3f}')
    axes[1, 1].set_xlabel('Residuals', fontsize=11)
    axes[1, 1].set_ylabel('Frequency', fontsize=11)
    axes[1, 1].set_title('Y: Residual Distribution', fontsize=12, fontweight='bold')
    axes[1, 1].legend()
    
    # 6. Q-Q Plot
    stats.probplot(residuals_y, dist="norm", plot=axes[1, 2])
    axes[1, 2].set_title('Y: Q-Q Plot', fontsize=12, fontweight='bold')
    axes[1, 2].grid(alpha=0.3)
    
    plt.tight_layout()
    plt.savefig(config.OUTPUT_DIR / 'residual_analysis.png', dpi=150, bbox_inches='tight')
    plt.show()
    
    print("✅ Residual analysis complete")

## 7. Error by Categories 🏷️

Analyze errors by player position, field region, speed, and distance to ball.
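
The grouping pattern behind these breakdowns is `pd.cut` followed by `groupby`. The sketch below uses invented speeds and errors. By default `pd.cut` excludes the lowest bin edge, so a stationary player (`s == 0`) falls outside the first bin unless `include_lowest=True` is passed.

```python
import pandas as pd

toy = pd.DataFrame({
    "s": [0.0, 1.5, 3.0, 3.5, 9.0],
    "euclidean_error": [0.2, 0.4, 1.0, 2.0, 4.0],
})
bins = [0, 2, 4, 6, 8, 100]
labels = ["0-2", "2-4", "4-6", "6-8", "8+"]

# Default: intervals are (0, 2], (2, 4], ... so s == 0 is unassigned (NaN)
default_bins = pd.cut(toy["s"], bins=bins, labels=labels)

# include_lowest=True closes the first interval on the left: [0, 2]
inclusive_bins = pd.cut(toy["s"], bins=bins, labels=labels, include_lowest=True)

# observed=True keeps only bins that actually contain rows
mean_error = toy.groupby(inclusive_bins, observed=True)["euclidean_error"].mean()
print(default_bins.isna().sum(), "row(s) dropped by the default binning")
print(mean_error)
```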

In [None]:
if MODELS_LOADED:
    # Add predictions to dataframe
    test_df['pred_x'] = pred_x
    test_df['pred_y'] = pred_y
    test_df['error_x'] = np.abs(y_test_x - pred_x)
    test_df['error_y'] = np.abs(y_test_y - pred_y)
    test_df['euclidean_error'] = euclidean_error

In [None]:
if MODELS_LOADED:
    print("🏷️  Analyzing errors by categories...\n")
    
    fig, axes = plt.subplots(2, 2, figsize=(16, 12))
    
    # 1. Error by Player Position
    if 'player_position' in test_df.columns:
        pos_errors = test_df.groupby('player_position')['euclidean_error'].agg(['mean', 'median', 'count'])
        pos_errors = pos_errors[pos_errors['count'] >= 50].sort_values('mean', ascending=False).head(10)
        
        axes[0, 0].barh(range(len(pos_errors)), pos_errors['mean'], alpha=0.7, color='steelblue')
        axes[0, 0].set_yticks(range(len(pos_errors)))
        axes[0, 0].set_yticklabels(pos_errors.index)
        axes[0, 0].set_xlabel('Mean Euclidean Error (yards)', fontsize=11)
        axes[0, 0].set_title('Error by Player Position (Top 10)', fontsize=12, fontweight='bold')
        axes[0, 0].grid(axis='x', alpha=0.3)
        
        print("📊 Error by Position (Top 10):")
        display(pos_errors.head(10))
    
    # 2. Error by Field Region
    if 'x' in test_df.columns:
        test_df['field_region'] = pd.cut(test_df['x'], bins=[0, 20, 40, 60, 80, 100, 120],
                                         labels=['0-20', '20-40', '40-60', '60-80', '80-100', '100-120'],
                                         include_lowest=True)  # keep x == 0 in the first bin
        region_errors = test_df.groupby('field_region')['euclidean_error'].mean()
        
        axes[0, 1].bar(range(len(region_errors)), region_errors.values, alpha=0.7, color='orange')
        axes[0, 1].set_xticks(range(len(region_errors)))
        axes[0, 1].set_xticklabels(region_errors.index, rotation=45)
        axes[0, 1].set_ylabel('Mean Euclidean Error (yards)', fontsize=11)
        axes[0, 1].set_xlabel('Field Region (X yards)', fontsize=11)
        axes[0, 1].set_title('Error by Field Region', fontsize=12, fontweight='bold')
        axes[0, 1].grid(axis='y', alpha=0.3)
    
    # 3. Error by Speed Range
    if 's' in test_df.columns:
        test_df['speed_range'] = pd.cut(test_df['s'], bins=[0, 2, 4, 6, 8, 100],
                                        labels=['0-2', '2-4', '4-6', '6-8', '8+'],
                                        include_lowest=True)  # keep stationary players (s == 0)
        speed_errors = test_df.groupby('speed_range')['euclidean_error'].mean()
        
        axes[1, 0].bar(range(len(speed_errors)), speed_errors.values, alpha=0.7, color='green')
        axes[1, 0].set_xticks(range(len(speed_errors)))
        axes[1, 0].set_xticklabels(speed_errors.index)
        axes[1, 0].set_ylabel('Mean Euclidean Error (yards)', fontsize=11)
        axes[1, 0].set_xlabel('Speed Range (yards/sec)', fontsize=11)
        axes[1, 0].set_title('Error by Speed Range', fontsize=12, fontweight='bold')
        axes[1, 0].grid(axis='y', alpha=0.3)
    
    # 4. Error by Distance to Ball
    if 'dist_to_ball' in test_df.columns:
        test_df['dist_to_ball_range'] = pd.cut(test_df['dist_to_ball'], bins=[0, 10, 20, 30, 40, 1000],
                                               labels=['0-10', '10-20', '20-30', '30-40', '40+'],
                                               include_lowest=True)  # keep distance == 0 in the first bin
        ball_errors = test_df.groupby('dist_to_ball_range')['euclidean_error'].mean()
        
        axes[1, 1].bar(range(len(ball_errors)), ball_errors.values, alpha=0.7, color='purple')
        axes[1, 1].set_xticks(range(len(ball_errors)))
        axes[1, 1].set_xticklabels(ball_errors.index)
        axes[1, 1].set_ylabel('Mean Euclidean Error (yards)', fontsize=11)
        axes[1, 1].set_xlabel('Distance to Ball (yards)', fontsize=11)
        axes[1, 1].set_title('Error by Distance to Ball', fontsize=12, fontweight='bold')
        axes[1, 1].grid(axis='y', alpha=0.3)
    
    plt.tight_layout()
    plt.savefig(config.OUTPUT_DIR / 'error_by_categories.png', dpi=150, bbox_inches='tight')
    plt.show()
    
    print("\n✅ Category analysis complete")

## 8. Prediction Visualization 🎨

Visualize predictions on the field and compare with actual positions.

In [None]:
if MODELS_LOADED:
    print("🎨 Visualizing predictions...\n")
    
    fig, axes = plt.subplots(2, 2, figsize=(16, 12))
    
    # 1. Predicted vs Actual (X)
    axes[0, 0].scatter(y_test_x, pred_x, alpha=0.3, s=1, c='blue')
    axes[0, 0].plot([y_test_x.min(), y_test_x.max()], [y_test_x.min(), y_test_x.max()], 'r--', lw=2)
    axes[0, 0].set_xlabel('Actual X', fontsize=12)
    axes[0, 0].set_ylabel('Predicted X', fontsize=12)
    axes[0, 0].set_title(f'X Coordinate: Predicted vs Actual\n(R² = {metrics["X Coordinate"]["R²"]:.4f})', 
                         fontsize=13, fontweight='bold')
    axes[0, 0].grid(alpha=0.3)
    
    # 2. Predicted vs Actual (Y)
    axes[0, 1].scatter(y_test_y, pred_y, alpha=0.3, s=1, c='orange')
    axes[0, 1].plot([y_test_y.min(), y_test_y.max()], [y_test_y.min(), y_test_y.max()], 'r--', lw=2)
    axes[0, 1].set_xlabel('Actual Y', fontsize=12)
    axes[0, 1].set_ylabel('Predicted Y', fontsize=12)
    axes[0, 1].set_title(f'Y Coordinate: Predicted vs Actual\n(R² = {metrics["Y Coordinate"]["R²"]:.4f})', 
                         fontsize=13, fontweight='bold')
    axes[0, 1].grid(alpha=0.3)
    
    # 3. Error heatmap on field
    sample = test_df.sample(min(5000, len(test_df)), random_state=config.RANDOM_STATE)
    scatter = axes[1, 0].scatter(sample['x'], sample['y'], c=sample['euclidean_error'], 
                                 cmap='RdYlGn_r', alpha=0.5, s=10, vmin=0, vmax=np.percentile(euclidean_error, 95))
    axes[1, 0].set_xlabel('X Position (yards)', fontsize=12)
    axes[1, 0].set_ylabel('Y Position (yards)', fontsize=12)
    axes[1, 0].set_title('Error Heatmap on Field', fontsize=13, fontweight='bold')
    axes[1, 0].set_xlim(0, 120)
    axes[1, 0].set_ylim(0, 53.3)
    plt.colorbar(scatter, ax=axes[1, 0], label='Error (yards)')
    
    # 4. Error distribution (Euclidean)
    axes[1, 1].hist(euclidean_error, bins=50, edgecolor='black', alpha=0.7, color='purple')
    axes[1, 1].axvline(euclidean_error.mean(), color='red', linestyle='--', linewidth=2, 
                       label=f'Mean: {euclidean_error.mean():.2f}')
    axes[1, 1].axvline(np.median(euclidean_error), color='green', linestyle='--', linewidth=2, 
                       label=f'Median: {np.median(euclidean_error):.2f}')
    axes[1, 1].set_xlabel('Euclidean Error (yards)', fontsize=12)
    axes[1, 1].set_ylabel('Frequency', fontsize=12)
    axes[1, 1].set_title('Euclidean Distance Error Distribution', fontsize=13, fontweight='bold')
    axes[1, 1].legend()
    axes[1, 1].grid(alpha=0.3)
    
    plt.tight_layout()
    plt.savefig(config.OUTPUT_DIR / 'prediction_visualization.png', dpi=150, bbox_inches='tight')
    plt.show()
    
    print("✅ Predictions visualized")

In [None]:
if MODELS_LOADED:
    # Visualize sample play trajectories
    print("\n🏈 Visualizing sample play trajectories...\n")
    
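    # Select a random play with at least 10 rows.
    # play_id repeats across games, so a play is identified by (game_id, play_id).
    if 'play_id' in test_df.columns:
        play_counts = test_df.groupby(['game_id', 'play_id']).size()
        sample_game, sample_play = play_counts[play_counts >= 10].sample(1, random_state=config.RANDOM_STATE).index[0]
        play_data = test_df[(test_df['game_id'] == sample_game) & (test_df['play_id'] == sample_play)]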
        
        fig, ax = plt.subplots(figsize=(16, 8))
        
        # Plot field boundaries
        ax.plot([0, 120], [0, 0], 'k-', linewidth=2)
        ax.plot([0, 120], [53.3, 53.3], 'k-', linewidth=2)
        ax.plot([0, 0], [0, 53.3], 'k-', linewidth=2)
        ax.plot([120, 120], [0, 53.3], 'k-', linewidth=2)
        
        # Plot yard lines
        for yard in range(10, 120, 10):
            ax.plot([yard, yard], [0, 53.3], 'k-', linewidth=0.5, alpha=0.3)
        
        # Plot players
        for idx, row in play_data.iterrows():
            # Current position
            ax.plot(row['x'], row['y'], 'bo', markersize=8, alpha=0.7)
            
            # Actual future position
            ax.plot(row['target_x'], row['target_y'], 'go', markersize=10, alpha=0.7)
            
            # Predicted future position
            ax.plot(row['pred_x'], row['pred_y'], 'r^', markersize=10, alpha=0.7)
            
            # Lines
            ax.plot([row['x'], row['target_x']], [row['y'], row['target_y']], 'g--', alpha=0.3, linewidth=1)
            ax.plot([row['x'], row['pred_x']], [row['y'], row['pred_y']], 'r--', alpha=0.3, linewidth=1)
        
        # Legend
        ax.plot([], [], 'bo', markersize=8, label='Current Position')
        ax.plot([], [], 'go', markersize=10, label='Actual Future')
        ax.plot([], [], 'r^', markersize=10, label='Predicted Future')
        
        ax.set_xlabel('X Position (yards)', fontsize=12)
        ax.set_ylabel('Y Position (yards)', fontsize=12)
        ax.set_title(f'Sample Play Trajectories (Play ID: {sample_play})', fontsize=14, fontweight='bold')
        ax.legend(fontsize=11, loc='upper right')
        ax.set_xlim(-5, 125)
        ax.set_ylim(-5, 58)
        ax.grid(alpha=0.3)
        
        plt.tight_layout()
        plt.savefig(config.OUTPUT_DIR / 'sample_play_trajectories.png', dpi=150, bbox_inches='tight')
        plt.show()
        
        print("✅ Sample play visualized")

## 9. Export Results 💾

Export predictions and create submission file.
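
Before writing the submission, it can help to confirm that each key appears once and that no prediction is missing. The sketch below assumes the key columns used throughout this notebook (`game_id`, `play_id`, `nfl_id`, `frame_id`). The function name `check_submission` and the toy rows are hypothetical, and the actual competition format may impose further requirements.

```python
import pandas as pd

KEYS = ["game_id", "play_id", "nfl_id", "frame_id"]


def check_submission(sub: pd.DataFrame) -> list:
    """Hypothetical helper: return a list of problems (empty if none found)."""
    missing = [c for c in KEYS + ["x", "y"] if c not in sub.columns]
    if missing:
        return [f"missing columns: {missing}"]
    problems = []
    if sub.duplicated(subset=KEYS).any():
        problems.append("duplicate keys")
    if sub[["x", "y"]].isna().any().any():
        problems.append("NaN predictions")
    return problems


good = pd.DataFrame({
    "game_id": [1, 1], "play_id": [10, 10], "nfl_id": [7, 8], "frame_id": [1, 1],
    "x": [50.0, 52.0], "y": [20.0, 25.0],
})
# Build a faulty copy: repeat the first row and blank out one prediction
bad = pd.concat([good, good.iloc[[0]]], ignore_index=True)
bad.loc[1, "x"] = float("nan")

good_problems = check_submission(good)
bad_problems = check_submission(bad)
print(good_problems)
print(bad_problems)
```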

In [None]:
if MODELS_LOADED:
    print("💾 Exporting results...\n")
    
    # Create submission file
    submission = test_df[['game_id', 'play_id', 'nfl_id', 'frame_id']].copy()
    submission['x'] = pred_x
    submission['y'] = pred_y
    
    submission_file = config.OUTPUT_DIR / 'submission.csv'
    submission.to_csv(submission_file, index=False)
    print(f"   ✓ Submission saved: {submission_file}")
    print(f"      Shape: {submission.shape}")
    
    # Save detailed predictions with errors
    detailed_predictions = test_df[['game_id', 'play_id', 'nfl_id', 'frame_id', 
                                     'x', 'y', 'target_x', 'target_y', 
                                     'pred_x', 'pred_y', 'error_x', 'error_y', 'euclidean_error']].copy()
    
    detailed_file = config.OUTPUT_DIR / 'detailed_predictions.csv'
    detailed_predictions.to_csv(detailed_file, index=False)
    print(f"   ✓ Detailed predictions saved: {detailed_file}")
    
    # Save error summary by categories
    error_summary = {
        'by_position': test_df.groupby('player_position')['euclidean_error'].mean().to_dict() if 'player_position' in test_df.columns else {},
        'by_field_region': test_df.groupby('field_region')['euclidean_error'].mean().to_dict() if 'field_region' in test_df.columns else {},
        'by_speed_range': test_df.groupby('speed_range')['euclidean_error'].mean().to_dict() if 'speed_range' in test_df.columns else {},
        'by_distance_to_ball': test_df.groupby('dist_to_ball_range')['euclidean_error'].mean().to_dict() if 'dist_to_ball_range' in test_df.columns else {}
    }
    
    import json
    with open(config.OUTPUT_DIR / 'error_summary.json', 'w') as f:
        json.dump(error_summary, f, indent=2, default=str)
    
    print(f"   ✓ Error summary saved: {config.OUTPUT_DIR / 'error_summary.json'}")
    
    # Generate final report
    report = {
        'timestamp': datetime.now().strftime('%Y-%m-%d %H:%M:%S'),
        'model_x': model_x_name,
        'model_y': model_y_name,
        'num_predictions': len(submission),
        'metrics': {
            'x_rmse': float(metrics['X Coordinate']['RMSE']),
            'y_rmse': float(metrics['Y Coordinate']['RMSE']),
            'x_mae': float(metrics['X Coordinate']['MAE']),
            'y_mae': float(metrics['Y Coordinate']['MAE']),
            'x_r2': float(metrics['X Coordinate']['R²']),
            'y_r2': float(metrics['Y Coordinate']['R²']),
            'mean_euclidean_error': float(euclidean_error.mean()),
            'median_euclidean_error': float(np.median(euclidean_error))
        },
        'files': {
            'submission': str(submission_file),
            'detailed_predictions': str(detailed_file),
            'error_summary': str(config.OUTPUT_DIR / 'error_summary.json')
        }
    }
    
    with open(config.OUTPUT_DIR / 'final_report.json', 'w') as f:
        json.dump(report, f, indent=2)
    
    print(f"   ✓ Final report saved: {config.OUTPUT_DIR / 'final_report.json'}")
    
    print(f"\n✅ All results exported to: {config.OUTPUT_DIR}")

In [None]:
if MODELS_LOADED:
    # Display final summary
    print("\n" + "="*70)
    print("FINAL SUMMARY")
    print("="*70 + "\n")
    
    print(f"📊 Models Used:")
    print(f"   X coordinate: {model_x_name}")
    print(f"   Y coordinate: {model_y_name}")
    
    print(f"\n📈 Performance:")
    print(f"   Average RMSE: {(metrics['X Coordinate']['RMSE'] + metrics['Y Coordinate']['RMSE']) / 2:.4f} yards")
    print(f"   Mean Euclidean Error: {euclidean_error.mean():.4f} yards")
    print(f"   Median Euclidean Error: {np.median(euclidean_error):.4f} yards")
    
    print(f"\n📁 Output Files:")
    print(f"   Submission: {submission_file}")
    print(f"   Detailed predictions: {detailed_file}")
    print(f"   Final report: {config.OUTPUT_DIR / 'final_report.json'}")
    
    print(f"\n🎯 Next Steps:")
    print(f"   1. Review error analysis for improvement opportunities")
    print(f"   2. Consider ensemble methods to combine multiple models")
    print(f"   3. Experiment with feature engineering")
    print(f"   4. Submit predictions to competition (if applicable)")
    
    print("\n" + "="*70)

---

## 🎉 Prediction & Evaluation Complete!

### Summary:

✅ **Models Loaded**: Best performing models from model comparison  
✅ **Predictions Generated**: Future player positions predicted  
✅ **Comprehensive Evaluation**: RMSE, MAE, R², MAPE calculated  
✅ **Residual Analysis**: Diagnostic plots for model quality  
✅ **Category Analysis**: Errors by position, region, speed, distance  
✅ **Visualizations**: Field heatmaps, trajectories, error distributions  
✅ **Results Exported**: Submission file and detailed analysis  

### Key Insights:

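1. **Model Performance**: Compare the RMSE, MAE, and R² values reported in Section 5 for each coordinate
2. **Error Patterns**: Some positions and field regions are harder to predict than others; see the breakdowns in Section 7
3. **Residuals**: Use the Q-Q plots in Section 6 to check whether the residuals are approximately normal
4. **Field Coverage**: Errors may vary by field location; see the error heatmap in Section 8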

### Files Created:

- `submission.csv`: Competition submission file
- `detailed_predictions.csv`: Predictions with errors
- `final_report.json`: Summary of results
- `overall_metrics.csv`: Performance metrics
- `error_summary.json`: Errors by categories
- Multiple visualization PNGs

### Recommendations:

1. **High Error Positions**: Focus on improving predictions for difficult positions
2. **Field Regions**: Consider region-specific models
3. **Speed Ranges**: Consider separate models for different speed ranges
4. **Ensemble**: Combine multiple models for better performance
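
As one way to read the ensemble recommendation, the sketch below blends the predictions of two models with fixed weights. The arrays and weights are made up for illustration. In practice the weights would be chosen on a validation set, and nothing here reflects results from this notebook.

```python
import numpy as np


def weighted_average(predictions, weights):
    """Blend model predictions; predictions has shape (n_models, n_samples)."""
    preds = np.asarray(predictions, dtype=float)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()  # normalize so the weights sum to 1
    return w @ preds


def rmse(y, p):
    return float(np.sqrt(np.mean((np.asarray(y) - np.asarray(p)) ** 2)))


# Invented targets and two invented sets of model predictions
y_true = np.array([10.0, 20.0, 30.0])
model_a = np.array([11.0, 19.0, 33.0])
model_b = np.array([9.0, 22.0, 28.0])

blend = weighted_average([model_a, model_b], weights=[0.5, 0.5])
print("A:", rmse(y_true, model_a))
print("B:", rmse(y_true, model_b))
print("Blend:", rmse(y_true, blend))
```

The blend does better here only because the two toy models err in opposite directions. Averaging models with similar errors gives little or no gain.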

---

**Congratulations! You've completed the full NFL Player Movement Prediction pipeline!** 🏈🎉

---