# 🧠 NFL Player Movement - LSTM Sequence Modeling

**Deep Learning for Trajectory Prediction using LSTM Networks**

This notebook trains an LSTM (Long Short-Term Memory) network to predict a player's next (x, y) position from a sliding window of that player's previous tracking frames.

---

## 📋 Table of Contents

1. [Setup & Configuration](#1-setup)
2. [Data Loading](#2-data)
3. [Sequence Creation](#3-sequences)
4. [LSTM Architecture](#4-architecture)
5. [Model Training](#5-training)
6. [Predictions & Visualization](#6-predictions)
7. [Comparison with Traditional Models](#7-comparison)
8. [Error Analysis](#8-analysis)

---

## 1. Setup & Configuration 🔧

In [None]:
# Standard libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from pathlib import Path
import warnings
import pickle
warnings.filterwarnings('ignore')

# PyTorch
try:
    import torch
    import torch.nn as nn
    import torch.optim as optim
    from torch.utils.data import Dataset, DataLoader, TensorDataset
    HAS_PYTORCH = True
    print(f"✅ PyTorch {torch.__version__} available")
    print(f"   Device: {'cuda' if torch.cuda.is_available() else 'cpu'}")
except ImportError:
    HAS_PYTORCH = False
    print("⚠️  PyTorch not installed. Install with: pip install torch")

# ML libraries
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

# Set plotting style
sns.set_style('darkgrid')
plt.rcParams['figure.figsize'] = (14, 6)

print("\n✅ Libraries imported successfully")

In [None]:
# Configuration
class Config:
    """LSTM modeling configuration"""
    
    # Paths
    DATA_DIR = Path('../data/raw/train')
    OUTPUT_DIR = Path('../outputs/lstm_modeling')
    
    # Data settings
    USE_SAMPLE = True
    SAMPLE_SIZE = 30000
    MAX_FILES = 2
    RANDOM_STATE = 42
    
    # Sequence settings
    SEQUENCE_LENGTH = 10  # Number of frames to look back
    
    # Model settings
    HIDDEN_SIZE = 64
    NUM_LAYERS = 2
    DROPOUT = 0.2
    
    # Training settings
    BATCH_SIZE = 128
    EPOCHS = 20
    LEARNING_RATE = 0.001
    EARLY_STOPPING_PATIENCE = 5
    
    # Device (check HAS_PYTORCH first: `torch` is undefined if the import failed)
    DEVICE = torch.device('cuda' if torch.cuda.is_available() else 'cpu') if HAS_PYTORCH else 'cpu'

config = Config()
config.OUTPUT_DIR.mkdir(parents=True, exist_ok=True)

print("✅ Configuration loaded")
print(f"   Sequence length: {config.SEQUENCE_LENGTH} frames")
print(f"   Device: {config.DEVICE}")
print(f"   Batch size: {config.BATCH_SIZE}")
print(f"   Epochs: {config.EPOCHS}")

## 2. Data Loading 📂

In [None]:
def load_and_prepare_data(data_dir, max_files=None, sample_size=None):
    """
    Load and prepare data for sequence modeling
    """
    print("📂 Loading data...\n")
    
    # Load files (`max_files or None` keeps every file when max_files is unset)
    input_files = sorted(data_dir.glob('input_*.csv'))[:max_files or None]
    output_files = sorted(data_dir.glob('output_*.csv'))[:max_files or None]
    
    input_df = pd.concat([pd.read_csv(f) for f in input_files], ignore_index=True)
    output_df = pd.concat([pd.read_csv(f) for f in output_files], ignore_index=True)
    
    print(f"   Input: {input_df.shape}")
    print(f"   Output: {output_df.shape}")
    
    # Sample by games (to keep sequences intact)
    if sample_size and len(input_df) > sample_size:
        unique_games = input_df['game_id'].unique()
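        np.random.seed(config.RANDOM_STATE)
        # Keep at least one game so a small sample_size cannot produce an empty frame
        n_games = max(1, int(len(unique_games) * (sample_size / len(input_df))))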
        sampled_games = np.random.choice(unique_games, n_games, replace=False)
        input_df = input_df[input_df['game_id'].isin(sampled_games)]
        sampled_keys = input_df[['game_id', 'play_id', 'nfl_id', 'frame_id']]
        output_df = output_df.merge(sampled_keys, on=['game_id', 'play_id', 'nfl_id', 'frame_id'])
    
    # Merge
    df = input_df.merge(
        output_df[['game_id', 'play_id', 'nfl_id', 'frame_id', 'x', 'y']],
        on=['game_id', 'play_id', 'nfl_id', 'frame_id'],
        suffixes=('', '_target')
    )
    df = df.rename(columns={'x_target': 'target_x', 'y_target': 'target_y'})
    
    # Sort for sequence creation
    df = df.sort_values(['game_id', 'play_id', 'nfl_id', 'frame_id']).reset_index(drop=True)
    
    # Handle missing values
    numeric_cols = df.select_dtypes(include=[np.number]).columns
    for col in numeric_cols:
        if df[col].isnull().any():
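            # Assign back instead of using inplace=True on a column selection,
            # which newer pandas versions warn about and may not apply to df
            df[col] = df[col].fillna(df[col].median())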
    
    print(f"\n✅ Data loaded: {df.shape}")
    return df


# Load data
df = load_and_prepare_data(
    config.DATA_DIR,
    max_files=config.MAX_FILES,
    sample_size=config.SAMPLE_SIZE if config.USE_SAMPLE else None
)
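
Sampling whole games, rather than individual rows, is what keeps each player's frame sequence unbroken. The sketch below illustrates the idea on toy IDs using NumPy only. It is not the notebook's exact rule: the cell above picks a number of games proportional to `sample_size`, whereas this hypothetical helper `sample_group_ids` adds games until a row budget is reached.

```python
import numpy as np

def sample_group_ids(group_ids, target_rows, seed=42):
    """Pick whole groups (e.g. games) at random until about `target_rows` rows are covered.

    Hypothetical helper for illustration; returns (chosen_ids, rows_covered).
    """
    group_ids = np.asarray(group_ids)
    unique_ids, counts = np.unique(group_ids, return_counts=True)
    rng = np.random.default_rng(seed)
    chosen, rows = [], 0
    for idx in rng.permutation(len(unique_ids)):
        if rows >= target_rows:
            break
        chosen.append(unique_ids[idx])
        rows += int(counts[idx])
    return np.array(chosen), rows

# Toy data: 20 rows spread across 4 games
toy_game_ids = np.repeat([101, 102, 103, 104], [5, 3, 4, 8])
picked, n_rows = sample_group_ids(toy_game_ids, target_rows=6)

# Every row of a picked game is kept, so no sequence is cut in the middle
keep_mask = np.isin(toy_game_ids, picked)
print(picked, n_rows, keep_mask.sum())
```

Because selection happens at the game level, the number of rows kept can overshoot the budget by up to one game's worth of rows.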

## 3. Sequence Creation 📊

Create temporal sequences from tracking data. Each sequence contains the last N frames for a player.
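
As a rough illustration of the windowing idea, the sketch below builds sliding windows for one toy player with NumPy. It simplifies the notebook's setup in one respect, which is an assumption of this example: the target is read from the same array (columns 0 and 1 as x and y) instead of from separate `target_x` / `target_y` columns. The name `make_windows` is hypothetical.

```python
import numpy as np

def make_windows(frames, window):
    """Return (inputs, targets) for a single player's time-ordered frames.

    inputs[k] holds frames k .. k+window-1 and targets[k] is the (x, y)
    of frame k+window, i.e. the frame right after that window.
    """
    frames = np.asarray(frames, dtype=np.float32)
    n_windows = len(frames) - window
    if n_windows <= 0:
        # Too few frames to fill even one window
        return (np.empty((0, window, frames.shape[1]), dtype=np.float32),
                np.empty((0, 2), dtype=np.float32))
    inputs = np.stack([frames[k:k + window] for k in range(n_windows)])
    targets = frames[window:, :2]  # assumes columns 0 and 1 are x and y
    return inputs, targets

# Toy player: 5 frames, 3 features (x, y, speed)
toy_frames = np.array([[0, 0, 1], [1, 0, 1], [2, 1, 1], [3, 1, 2], [5, 2, 2]])
toy_inputs, toy_targets = make_windows(toy_frames, window=3)
print(toy_inputs.shape, toy_targets.shape)  # (2, 3, 3) (2, 2)
print(toy_targets)
```

A player with `n` frames yields `n - window` sequences, so players with `window` frames or fewer contribute nothing.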

In [None]:
def create_sequences(df, sequence_length=10):
    """
    Create sequences from tracking data
    
    For each player at each frame, create a sequence of the previous N frames.
    
    Args:
        df: Dataframe with tracking data
        sequence_length: Number of frames to include in each sequence
    
    Returns:
        sequences: Array of shape (n_samples, sequence_length, n_features)
        targets: Array of shape (n_samples, 2) for (x, y) targets
        valid_indices: Indices of valid sequences
        available_features: Names of the feature columns used, in column order
    """
    print(f"\n🔄 Creating sequences (length={sequence_length})...\n")
    
    # Select features for sequences
    feature_cols = ['x', 'y', 's', 'a', 'dir', 'o']
    available_features = [col for col in feature_cols if col in df.columns]
    
    print(f"   Features: {available_features}")
    
    sequences = []
    targets_x = []
    targets_y = []
    valid_indices = []
    
    # Group by player within each play
    grouped = df.groupby(['game_id', 'play_id', 'nfl_id'])
    
    for (game_id, play_id, nfl_id), group in grouped:
        # Keep df's row labels (no reset_index) so valid_indices can point back into df
        group = group.sort_values('frame_id')
        
        # Create sequences
        for i in range(sequence_length, len(group)):
            # Extract the sequence_length frames that precede frame i
            seq = group.iloc[i-sequence_length:i][available_features].values
            
            # Target is the position at frame i, the frame right after the window
            target_x = group.iloc[i]['target_x']
            target_y = group.iloc[i]['target_y']
            
            if seq.shape[0] == sequence_length and not np.isnan(target_x) and not np.isnan(target_y):
                sequences.append(seq)
                targets_x.append(target_x)
                targets_y.append(target_y)
                valid_indices.append(group.index[i])
    
    sequences = np.array(sequences, dtype=np.float32)
    targets_x = np.array(targets_x, dtype=np.float32)
    targets_y = np.array(targets_y, dtype=np.float32)
    targets = np.stack([targets_x, targets_y], axis=1)
    
    print(f"   ✓ Sequences created: {sequences.shape}")
    print(f"   ✓ Targets: {targets.shape}")
    print(f"   ✓ Features per frame: {sequences.shape[2]}")
    
    return sequences, targets, valid_indices, available_features


# Create sequences
sequences, targets, valid_indices, feature_names = create_sequences(df, sequence_length=config.SEQUENCE_LENGTH)

print(f"\n✅ Sequence creation complete")
print(f"   Total sequences: {len(sequences):,}")
print(f"   Sequence shape: {sequences.shape}")
print(f"   Target shape: {targets.shape}")

In [None]:
# Normalize sequences
print("\n📏 Normalizing sequences...\n")

n_samples, seq_len, n_features = sequences.shape

# Train/val split (80/20). Split before scaling so that validation
# frames do not contribute to the scaler's mean and variance.
split_idx = int(0.8 * n_samples)

# Fit the scaler on training frames only (flattened to 2D), then apply it to both splits
scaler = StandardScaler()
X_train = scaler.fit_transform(sequences[:split_idx].reshape(-1, n_features))
X_train = X_train.reshape(-1, seq_len, n_features)
X_val = scaler.transform(sequences[split_idx:].reshape(-1, n_features))
X_val = X_val.reshape(-1, seq_len, n_features)
y_train = targets[:split_idx]
y_val = targets[split_idx:]

print(f"   ✓ Sequences normalized")
print(f"   ✓ Train mean: {X_train.mean():.4f}")
print(f"   ✓ Train std: {X_train.std():.4f}")

print(f"\n✅ Train/Val split:")
print(f"   Train: {X_train.shape[0]:,} sequences")
print(f"   Val: {X_val.shape[0]:,} sequences")
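
Scaling a 3-D sequence array per feature amounts to pooling all frames of all sequences and computing one mean and standard deviation per feature column. The NumPy sketch below shows this on random toy data. Manual standardization stands in for `StandardScaler`, and taking the statistics from the training slice only is an assumption of this sketch, chosen so that validation data does not influence the scaling. All `toy_*` names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy array shaped (samples, frames, features)
toy_sequences = rng.normal(loc=50.0, scale=10.0, size=(10, 4, 3))
toy_split = 8
toy_n_features = toy_sequences.shape[2]

# Per-feature statistics from the training slice, with frames pooled across sequences
toy_train_flat = toy_sequences[:toy_split].reshape(-1, toy_n_features)
toy_mean = toy_train_flat.mean(axis=0)
toy_std = toy_train_flat.std(axis=0)  # population std (ddof=0)

def scale(seqs):
    """Standardize each feature; broadcasting applies the stats along the last axis."""
    return (seqs - toy_mean) / toy_std

def unscale(seqs):
    """Inverse of scale(), back to the original units."""
    return seqs * toy_std + toy_mean

toy_train = scale(toy_sequences[:toy_split])
toy_val = scale(toy_sequences[toy_split:])

# Training features are centered; validation features are only approximately so
print(np.allclose(toy_train.reshape(-1, toy_n_features).mean(axis=0), 0.0))
print(np.allclose(unscale(toy_val), toy_sequences[toy_split:]))
```

The inverse step matters later in the notebook, where normalized histories are converted back to yards before plotting.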

## 4. LSTM Architecture 🏗️

Define the LSTM neural network architecture for trajectory prediction.

In [None]:
if HAS_PYTORCH:
    class PlayerLSTM(nn.Module):
        """
        LSTM model for player trajectory prediction
        
        Architecture:
        - Input: (batch, sequence_length, n_features)
        - LSTM layers with dropout
        - Fully connected layer
        - Output: (batch, 2) for (x, y) coordinates
        """
        
        def __init__(self, input_size, hidden_size=64, num_layers=2, dropout=0.2):
            super(PlayerLSTM, self).__init__()
            
            self.hidden_size = hidden_size
            self.num_layers = num_layers
            
            # LSTM layers
            self.lstm = nn.LSTM(
                input_size=input_size,
                hidden_size=hidden_size,
                num_layers=num_layers,
                batch_first=True,
                dropout=dropout if num_layers > 1 else 0
            )
            
            # Dropout
            self.dropout = nn.Dropout(dropout)
            
            # Fully connected layer
            self.fc = nn.Linear(hidden_size, 2)  # Output: (x, y)
        
        def forward(self, x):
            """
            Forward pass
            
            Args:
                x: Input tensor (batch, sequence_length, input_size)
            
            Returns:
                output: Predicted (x, y) coordinates (batch, 2)
            """
            # LSTM forward pass
            lstm_out, (hidden, cell) = self.lstm(x)
            
            # Take output from last time step
            last_output = lstm_out[:, -1, :]
            
            # Apply dropout
            last_output = self.dropout(last_output)
            
            # Fully connected layer
            output = self.fc(last_output)
            
            return output
    
    
    # Create model instance
    input_size = X_train.shape[2]
    model = PlayerLSTM(
        input_size=input_size,
        hidden_size=config.HIDDEN_SIZE,
        num_layers=config.NUM_LAYERS,
        dropout=config.DROPOUT
    ).to(config.DEVICE)
    
    # Print model summary
    print("🏗️  LSTM Model Architecture:")
    print("="*70)
    print(model)
    print("="*70)
    
    # Count parameters
    total_params = sum(p.numel() for p in model.parameters())
    trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
    
    print(f"\n📊 Model Statistics:")
    print(f"   Total parameters: {total_params:,}")
    print(f"   Trainable parameters: {trainable_params:,}")
    print(f"   Input size: {input_size}")
    print(f"   Hidden size: {config.HIDDEN_SIZE}")
    print(f"   Num layers: {config.NUM_LAYERS}")
    print(f"   Output size: 2 (x, y)")
    
else:
    print("⚠️  PyTorch not available. Please install PyTorch to use LSTM models.")
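
The parameter count printed above can be cross-checked by hand. The sketch below assumes a unidirectional `nn.LSTM` with biases, which stores, per layer, input weights, recurrent weights, and two bias vectors for each of its four gates. It also assumes all six features (`x`, `y`, `s`, `a`, `dir`, `o`) were found in the data. The helper names are hypothetical.

```python
def lstm_param_count(input_size, hidden_size, num_layers):
    """Parameter count of a unidirectional LSTM stack with biases enabled."""
    total = 0
    for layer in range(num_layers):
        # Layers after the first take the previous layer's hidden state as input
        layer_input = input_size if layer == 0 else hidden_size
        # 4 gates, each with input weights, recurrent weights, and two bias vectors
        total += 4 * (layer_input * hidden_size + hidden_size * hidden_size + 2 * hidden_size)
    return total

def linear_param_count(in_features, out_features):
    """Weights plus biases of a fully connected layer."""
    return in_features * out_features + out_features

expected_lstm = lstm_param_count(input_size=6, hidden_size=64, num_layers=2)
expected_fc = linear_param_count(64, 2)
print(expected_lstm, expected_fc, expected_lstm + expected_fc)  # 51712 130 51842
```

Dropout layers add no parameters, so with six input features the model total should come to 51,842. A different number usually means some feature columns were missing from the data.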

## 5. Model Training 🎓

Train the LSTM model with early stopping.
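
The early-stopping rule can be studied in isolation from the model. The pure-Python sketch below replays a list of made-up validation losses through the same patience logic; `early_stopping_epoch` is a hypothetical helper, not part of the notebook.

```python
def early_stopping_epoch(val_losses, patience):
    """Return (stop_epoch, best_epoch), both 1-based.

    Training stops once `patience` consecutive epochs fail to improve on the
    best validation loss; if that never happens, stop_epoch is the last epoch.
    """
    best_loss = float('inf')
    best_epoch = 0
    bad_epochs = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss, best_epoch, bad_epochs = loss, epoch, 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                return epoch, best_epoch
    return len(val_losses), best_epoch

toy_losses = [4.0, 3.1, 2.9, 3.0, 2.95, 3.2, 3.3, 3.4, 2.0]
print(early_stopping_epoch(toy_losses, patience=5))  # (8, 3)
print(early_stopping_epoch(toy_losses, patience=6))  # (9, 9)
```

The toy series shows the trade-off: with a patience of 5 the run stops at epoch 8 and never sees the improvement at epoch 9, while a patience of 6 reaches it.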

In [None]:
if HAS_PYTORCH:
    # Create data loaders
    train_dataset = TensorDataset(
        torch.FloatTensor(X_train),
        torch.FloatTensor(y_train)
    )
    val_dataset = TensorDataset(
        torch.FloatTensor(X_val),
        torch.FloatTensor(y_val)
    )
    
    train_loader = DataLoader(train_dataset, batch_size=config.BATCH_SIZE, shuffle=True)
    val_loader = DataLoader(val_dataset, batch_size=config.BATCH_SIZE, shuffle=False)
    
    # Loss and optimizer
    criterion = nn.MSELoss()
    optimizer = optim.Adam(model.parameters(), lr=config.LEARNING_RATE)
    
    print("✅ Data loaders created")
    print(f"   Train batches: {len(train_loader)}")
    print(f"   Val batches: {len(val_loader)}")
    print(f"   Batch size: {config.BATCH_SIZE}")

In [None]:
if HAS_PYTORCH:
    # Training loop
    print("\n🎓 Training LSTM model...\n")
    print("="*70)
    
    train_losses = []
    val_losses = []
    best_val_loss = float('inf')
    patience_counter = 0
    
    for epoch in range(config.EPOCHS):
        # Training
        model.train()
        train_loss = 0.0
        
        for batch_X, batch_y in train_loader:
            batch_X = batch_X.to(config.DEVICE)
            batch_y = batch_y.to(config.DEVICE)
            
            # Forward pass
            optimizer.zero_grad()
            outputs = model(batch_X)
            loss = criterion(outputs, batch_y)
            
            # Backward pass
            loss.backward()
            optimizer.step()
            
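            # Weight each batch by its size so the epoch average stays exact
            # even when the last batch is smaller than the others
            train_loss += loss.item() * batch_X.size(0)
        
        train_loss /= len(train_loader.dataset)
        train_losses.append(train_loss)
        
        # Validation
        model.eval()
        val_loss = 0.0
        
        with torch.no_grad():
            for batch_X, batch_y in val_loader:
                batch_X = batch_X.to(config.DEVICE)
                batch_y = batch_y.to(config.DEVICE)
                
                outputs = model(batch_X)
                loss = criterion(outputs, batch_y)
                val_loss += loss.item() * batch_X.size(0)
        
        val_loss /= len(val_loader.dataset)
        val_losses.append(val_loss)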
        val_losses.append(val_loss)
        
        # Calculate RMSE
        train_rmse = np.sqrt(train_loss)
        val_rmse = np.sqrt(val_loss)
        
        # Print progress
        print(f"Epoch {epoch+1:02d}/{config.EPOCHS} | Train RMSE: {train_rmse:.4f} | Val RMSE: {val_rmse:.4f}", end='')
        
        # Early stopping
        if val_loss < best_val_loss:
            best_val_loss = val_loss
            patience_counter = 0
            # Save best model
            torch.save(model.state_dict(), config.OUTPUT_DIR / 'best_lstm_model.pth')
            print(" ✓ (saved)")
        else:
            patience_counter += 1
            print(f" (patience: {patience_counter}/{config.EARLY_STOPPING_PATIENCE})")
            
            if patience_counter >= config.EARLY_STOPPING_PATIENCE:
                print(f"\n⏹️  Early stopping triggered at epoch {epoch+1}")
                break
    
    print("="*70)
    print(f"\n✅ Training complete")
    print(f"   Best validation RMSE: {np.sqrt(best_val_loss):.4f}")
    print(f"   Model saved to: {config.OUTPUT_DIR / 'best_lstm_model.pth'}")

In [None]:
if HAS_PYTORCH:
    # Visualize training curves
    fig, ax = plt.subplots(figsize=(12, 6))
    
    epochs_range = range(1, len(train_losses) + 1)
    train_rmse = [np.sqrt(loss) for loss in train_losses]
    val_rmse = [np.sqrt(loss) for loss in val_losses]
    
    ax.plot(epochs_range, train_rmse, 'b-o', label='Train RMSE', linewidth=2, markersize=6)
    ax.plot(epochs_range, val_rmse, 'r-o', label='Validation RMSE', linewidth=2, markersize=6)
    ax.set_xlabel('Epoch', fontsize=12)
    ax.set_ylabel('RMSE', fontsize=12)
    ax.set_title('LSTM Training Curves', fontsize=14, fontweight='bold')
    ax.legend(fontsize=11)
    ax.grid(alpha=0.3)
    
    plt.tight_layout()
    plt.savefig(config.OUTPUT_DIR / 'training_curves.png', dpi=150, bbox_inches='tight')
    plt.show()
    
    print("✅ Training curves visualized")

## 6. Predictions & Visualization 🎯

Generate predictions and visualize trajectories.

In [None]:
if HAS_PYTORCH:
    # Load best model
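    # map_location lets a checkpoint saved on GPU load on a CPU-only machine
    model.load_state_dict(torch.load(config.OUTPUT_DIR / 'best_lstm_model.pth', map_location=config.DEVICE))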
    model.eval()
    
    # Generate predictions
    print("🎯 Generating predictions...\n")
    
    with torch.no_grad():
        X_val_tensor = torch.FloatTensor(X_val).to(config.DEVICE)
        predictions = model(X_val_tensor).cpu().numpy()
    
    # Calculate metrics
    pred_x = predictions[:, 0]
    pred_y = predictions[:, 1]
    true_x = y_val[:, 0]
    true_y = y_val[:, 1]
    
    rmse_x = np.sqrt(mean_squared_error(true_x, pred_x))
    rmse_y = np.sqrt(mean_squared_error(true_y, pred_y))
    mae_x = mean_absolute_error(true_x, pred_x)
    mae_y = mean_absolute_error(true_y, pred_y)
    r2_x = r2_score(true_x, pred_x)
    r2_y = r2_score(true_y, pred_y)
    
    print("📊 LSTM Model Performance:")
    print("="*70)
    print(f"   X Coordinate:")
    print(f"      RMSE: {rmse_x:.4f}")
    print(f"      MAE:  {mae_x:.4f}")
    print(f"      R²:   {r2_x:.4f}")
    print(f"\n   Y Coordinate:")
    print(f"      RMSE: {rmse_y:.4f}")
    print(f"      MAE:  {mae_y:.4f}")
    print(f"      R²:   {r2_y:.4f}")
    print(f"\n   Average RMSE: {(rmse_x + rmse_y) / 2:.4f}")
    print("="*70)
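
For reference, the three metrics reported above can be reproduced by hand. The NumPy sketch below uses made-up values and the standard definitions (R² as one minus the ratio of residual to total sum of squares); `regression_metrics` is a hypothetical helper.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Return (rmse, mae, r2) for two equal-length 1-D sequences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    rmse = np.sqrt(np.mean(residuals ** 2))
    mae = np.mean(np.abs(residuals))
    ss_res = np.sum(residuals ** 2)                  # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total variation around the mean
    r2 = 1.0 - ss_res / ss_tot                       # undefined if y_true is constant
    return rmse, mae, r2

toy_true = [10.0, 20.0, 30.0, 40.0]
toy_pred = [12.0, 18.0, 33.0, 39.0]
toy_rmse, toy_mae, toy_r2 = regression_metrics(toy_true, toy_pred)
print(f"RMSE={toy_rmse:.4f}  MAE={toy_mae:.4f}  R2={toy_r2:.4f}")
```

RMSE is never smaller than MAE because squaring gives large residuals more weight, so a wide gap between the two points to a few large misses.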

In [None]:
if HAS_PYTORCH:
    # Visualize predictions
    fig, axes = plt.subplots(2, 2, figsize=(16, 12))
    
    # 1. X predictions vs actual
    axes[0, 0].scatter(true_x, pred_x, alpha=0.3, s=1, c='blue')
    axes[0, 0].plot([true_x.min(), true_x.max()], [true_x.min(), true_x.max()], 'r--', lw=2)
    axes[0, 0].set_xlabel('Actual X', fontsize=12)
    axes[0, 0].set_ylabel('Predicted X', fontsize=12)
    axes[0, 0].set_title(f'X Predictions (RMSE: {rmse_x:.4f})', fontsize=14, fontweight='bold')
    axes[0, 0].grid(alpha=0.3)
    
    # 2. Y predictions vs actual
    axes[0, 1].scatter(true_y, pred_y, alpha=0.3, s=1, c='orange')
    axes[0, 1].plot([true_y.min(), true_y.max()], [true_y.min(), true_y.max()], 'r--', lw=2)
    axes[0, 1].set_xlabel('Actual Y', fontsize=12)
    axes[0, 1].set_ylabel('Predicted Y', fontsize=12)
    axes[0, 1].set_title(f'Y Predictions (RMSE: {rmse_y:.4f})', fontsize=14, fontweight='bold')
    axes[0, 1].grid(alpha=0.3)
    
    # 3. X residuals
    residuals_x = true_x - pred_x
    axes[1, 0].hist(residuals_x, bins=50, edgecolor='black', alpha=0.7, color='blue')
    axes[1, 0].axvline(0, color='red', linestyle='--', linewidth=2)
    axes[1, 0].set_xlabel('Residual (Actual - Predicted)', fontsize=12)
    axes[1, 0].set_ylabel('Frequency', fontsize=12)
    axes[1, 0].set_title(f'X Residual Distribution (Mean: {residuals_x.mean():.4f})', fontsize=14, fontweight='bold')
    
    # 4. Y residuals
    residuals_y = true_y - pred_y
    axes[1, 1].hist(residuals_y, bins=50, edgecolor='black', alpha=0.7, color='orange')
    axes[1, 1].axvline(0, color='red', linestyle='--', linewidth=2)
    axes[1, 1].set_xlabel('Residual (Actual - Predicted)', fontsize=12)
    axes[1, 1].set_ylabel('Frequency', fontsize=12)
    axes[1, 1].set_title(f'Y Residual Distribution (Mean: {residuals_y.mean():.4f})', fontsize=14, fontweight='bold')
    
    plt.tight_layout()
    plt.savefig(config.OUTPUT_DIR / 'lstm_predictions.png', dpi=150, bbox_inches='tight')
    plt.show()
    
    print("✅ Predictions visualized")

In [None]:
if HAS_PYTORCH:
    # Visualize sample trajectories
    print("\n🛤️  Visualizing sample trajectories...\n")
    
    fig, axes = plt.subplots(2, 3, figsize=(18, 12))
    axes = axes.flatten()
    
    # Select 6 random samples
    n_samples = 6
    sample_indices = np.random.choice(len(X_val), n_samples, replace=False)
    
    for idx, sample_idx in enumerate(sample_indices):
        # Get sequence (historical trajectory)
        sequence = X_val[sample_idx]  # Shape: (seq_len, n_features)
        
        # Denormalize sequence (inverse transform)
        sequence_denorm = scaler.inverse_transform(sequence)
        
        # Extract x, y coordinates (first 2 features)
        x_history = sequence_denorm[:, 0]
        y_history = sequence_denorm[:, 1]
        
        # True and predicted future positions
        x_true = true_x[sample_idx]
        y_true = true_y[sample_idx]
        x_pred = pred_x[sample_idx]
        y_pred = pred_y[sample_idx]
        
        # Plot
        axes[idx].plot(x_history, y_history, 'b-o', label='Historical trajectory', linewidth=2, markersize=4)
        axes[idx].plot(x_true, y_true, 'go', label='Actual future', markersize=12)
        axes[idx].plot(x_pred, y_pred, 'r^', label='Predicted future', markersize=12)
        axes[idx].plot([x_history[-1], x_true], [y_history[-1], y_true], 'g--', alpha=0.5, linewidth=1)
        axes[idx].plot([x_history[-1], x_pred], [y_history[-1], y_pred], 'r--', alpha=0.5, linewidth=1)
        
        error = np.sqrt((x_true - x_pred)**2 + (y_true - y_pred)**2)
        axes[idx].set_xlabel('X Position (yards)', fontsize=10)
        axes[idx].set_ylabel('Y Position (yards)', fontsize=10)
        axes[idx].set_title(f'Sample {idx+1} (Error: {error:.2f} yards)', fontsize=11, fontweight='bold')
        axes[idx].legend(fontsize=8)
        axes[idx].grid(alpha=0.3)
        axes[idx].set_xlim(0, 120)
        axes[idx].set_ylim(0, 53.3)
    
    plt.tight_layout()
    plt.savefig(config.OUTPUT_DIR / 'sample_trajectories.png', dpi=150, bbox_inches='tight')
    plt.show()
    
    print("✅ Sample trajectories visualized")

## 7. Comparison with Traditional Models 📊

Compare LSTM performance with traditional machine learning models.

In [None]:
# Load comparison from previous notebook (if available)
comparison_file = Path('../outputs/model_comparison/model_comparison.csv')

if comparison_file.exists():
    traditional_comparison = pd.read_csv(comparison_file)
    
    print("📊 Comparing LSTM with traditional models...\n")
    
    # Create comparison
    lstm_results = pd.DataFrame([{
        'model': 'LSTM',
        'val_rmse_x': rmse_x,
        'val_rmse_y': rmse_y,
        'val_mae_x': mae_x,
        'val_mae_y': mae_y,
        'val_r2_x': r2_x,
        'val_r2_y': r2_y,
        'avg_rmse': (rmse_x + rmse_y) / 2
    }]) if HAS_PYTORCH else pd.DataFrame()
    
    if not lstm_results.empty:
        # Combine results
        all_models = pd.concat([traditional_comparison, lstm_results], ignore_index=True)
        all_models = all_models.sort_values('avg_rmse')
        
        print("🏆 Model Ranking (by average RMSE):\n")
        display(all_models[['model', 'val_rmse_x', 'val_rmse_y', 'avg_rmse']])
        
        # Visualize comparison
        fig, ax = plt.subplots(figsize=(14, 6))
        
        models_list = all_models['model'].tolist()
        x_pos = np.arange(len(models_list))
        
        # Highlight LSTM
        colors = ['red' if m == 'LSTM' else 'steelblue' for m in models_list]
        
        ax.bar(x_pos, all_models['avg_rmse'], color=colors, alpha=0.7)
        ax.set_xticks(x_pos)
        ax.set_xticklabels([m.upper() for m in models_list], rotation=45, ha='right')
        ax.set_ylabel('Average RMSE', fontsize=12)
        ax.set_title('Model Comparison: LSTM vs Traditional ML', fontsize=14, fontweight='bold')
        ax.grid(axis='y', alpha=0.3)
        
        plt.tight_layout()
        plt.savefig(config.OUTPUT_DIR / 'lstm_vs_traditional.png', dpi=150, bbox_inches='tight')
        plt.show()
        
        print("\n✅ Comparison visualized")
else:
    print("⚠️  Traditional model comparison not found. Run 04_model_comparison.ipynb first.")

## 8. Error Analysis 📉

Analyze the distribution of LSTM prediction errors, measured as the Euclidean distance (in yards) between predicted and actual positions.

In [None]:
if HAS_PYTORCH:
    print("📉 Analyzing prediction errors...\n")
    
    # Calculate Euclidean error
    euclidean_errors = np.sqrt((true_x - pred_x)**2 + (true_y - pred_y)**2)
    
    print("📊 Error Statistics:")
    print(f"   Mean error: {euclidean_errors.mean():.4f} yards")
    print(f"   Median error: {np.median(euclidean_errors):.4f} yards")
    print(f"   Std error: {euclidean_errors.std():.4f} yards")
    print(f"   Min error: {euclidean_errors.min():.4f} yards")
    print(f"   Max error: {euclidean_errors.max():.4f} yards")
    print(f"   95th percentile: {np.percentile(euclidean_errors, 95):.4f} yards")
    
    # Visualize error distribution
    fig, axes = plt.subplots(1, 2, figsize=(16, 6))
    
    # 1. Error histogram
    axes[0].hist(euclidean_errors, bins=50, edgecolor='black', alpha=0.7, color='purple')
    axes[0].axvline(euclidean_errors.mean(), color='red', linestyle='--', linewidth=2, label=f'Mean: {euclidean_errors.mean():.2f}')
    axes[0].axvline(np.median(euclidean_errors), color='green', linestyle='--', linewidth=2, label=f'Median: {np.median(euclidean_errors):.2f}')
    axes[0].set_xlabel('Euclidean Error (yards)', fontsize=12)
    axes[0].set_ylabel('Frequency', fontsize=12)
    axes[0].set_title('LSTM Prediction Error Distribution', fontsize=14, fontweight='bold')
    axes[0].legend()
    axes[0].grid(alpha=0.3)
    
    # 2. Error by percentile
    percentiles = np.arange(0, 101, 5)
    error_percentiles = np.percentile(euclidean_errors, percentiles)
    
    axes[1].plot(percentiles, error_percentiles, 'b-o', linewidth=2, markersize=6)
    axes[1].axhline(euclidean_errors.mean(), color='red', linestyle='--', linewidth=1, label='Mean')
    axes[1].set_xlabel('Percentile', fontsize=12)
    axes[1].set_ylabel('Error (yards)', fontsize=12)
    axes[1].set_title('Error by Percentile', fontsize=14, fontweight='bold')
    axes[1].legend()
    axes[1].grid(alpha=0.3)
    
    plt.tight_layout()
    plt.savefig(config.OUTPUT_DIR / 'error_analysis.png', dpi=150, bbox_inches='tight')
    plt.show()
    
    print("\n✅ Error analysis complete")
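
One possible extension is to break the errors down by a property of each sequence, such as the player's speed in the last input frame. The NumPy sketch below does this on made-up numbers; the bin edges are arbitrary and `mean_error_by_bin` is a hypothetical helper.

```python
import numpy as np

def mean_error_by_bin(errors, values, bin_edges):
    """Mean error for each half-open bin [edge_i, edge_{i+1}) of `values`.

    Returns a list of (low, high, count, mean_error); empty bins get NaN.
    Values outside the outermost edges are ignored.
    """
    errors = np.asarray(errors, dtype=float)
    values = np.asarray(values, dtype=float)
    # np.digitize returns 1 for values in the first bin, 2 for the second, ...
    bin_ids = np.digitize(values, bin_edges)
    rows = []
    for i in range(1, len(bin_edges)):
        mask = bin_ids == i
        mean_err = errors[mask].mean() if mask.any() else float('nan')
        rows.append((bin_edges[i - 1], bin_edges[i], int(mask.sum()), mean_err))
    return rows

toy_speed = [0.5, 1.5, 2.5, 3.5, 4.5, 7.0]
toy_error = [0.2, 0.4, 0.3, 0.9, 1.1, 2.0]
toy_rows = mean_error_by_bin(toy_error, toy_speed, [0, 2, 4, 8])
for low, high, count, err in toy_rows:
    print(f"speed bin [{low}, {high}): n={count}, mean error={err:.2f}")
```

In this notebook, the last-frame speed could plausibly be recovered with `scaler.inverse_transform(X_val[:, -1, :])[:, feature_names.index('s')]`, assuming the `s` column is present, and passed in alongside `euclidean_errors`.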

In [None]:
if HAS_PYTORCH:
    # Save LSTM results
    print("\n💾 Saving LSTM results...")
    
    results = {
        'rmse_x': float(rmse_x),
        'rmse_y': float(rmse_y),
        'mae_x': float(mae_x),
        'mae_y': float(mae_y),
        'r2_x': float(r2_x),
        'r2_y': float(r2_y),
        'avg_rmse': float((rmse_x + rmse_y) / 2),
        'mean_euclidean_error': float(euclidean_errors.mean()),
        'median_euclidean_error': float(np.median(euclidean_errors)),
        'sequence_length': config.SEQUENCE_LENGTH,
        'hidden_size': config.HIDDEN_SIZE,
        'num_layers': config.NUM_LAYERS,
        'total_parameters': total_params
    }
    
    import json
    with open(config.OUTPUT_DIR / 'lstm_results.json', 'w') as f:
        json.dump(results, f, indent=2)
    
    # Save scaler
    with open(config.OUTPUT_DIR / 'scaler.pkl', 'wb') as f:
        pickle.dump(scaler, f)
    
    print(f"\n✅ Results saved to: {config.OUTPUT_DIR}")
    print(f"   Model: best_lstm_model.pth")
    print(f"   Results: lstm_results.json")
    print(f"   Scaler: scaler.pkl")

---

## 🎉 LSTM Sequence Modeling Complete!

### Summary:

✅ **Sequence Creation**: Created temporal sequences from tracking data  
✅ **LSTM Architecture**: 2-layer LSTM with dropout for trajectory prediction  
✅ **Training**: Trained with early stopping and saved best model  
✅ **Predictions**: Generated predictions on validation set  
✅ **Comparison**: Compared with traditional ML models  
✅ **Error Analysis**: Analyzed prediction errors and distributions  

### Key Findings:

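1. **LSTM Performance**: The LSTM predicts a player's next (x, y) position from the previous 10 frames; the validation RMSE, MAE, and R² in Section 6 show how well it does on this sample
2. **Sequence Length**: A 10-frame window was used throughout; other lengths were not tested here, so whether 10 is the best choice remains open
3. **Training**: Early stopping (patience of 5 epochs) halts training once validation loss stops improving, which limits overfitting; the best checkpoint is the one evaluated
4. **Comparison**: Whether the LSTM beats the traditional models depends on the data; the ranking in Section 7 gives the result for this sample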

### When LSTM Works Best:

- Long, consistent trajectories
- Players with predictable movement patterns
- When temporal context is important

### When Traditional Models May Win:

- Short sequences with limited history
- Sudden direction changes
- When non-temporal features (position, role) are most important

### Next Steps:

1. Generate final predictions in `06_prediction_and_evaluation.ipynb`
2. Experiment with different sequence lengths
3. Try bidirectional LSTM or GRU architectures
4. Combine LSTM with traditional models in ensemble

---