# Player Performance Trend Analysis

This notebook demonstrates how to use the NBA MCP Econometric Suite to analyze player performance trends over time using advanced time series methods.

## Objectives

1. Load and prepare player scoring data across multiple seasons
2. Test for stationarity using ADF and KPSS tests
3. Fit ARIMA models for forecasting
4. Apply Kalman filtering for real-time performance tracking
5. Use structural decomposition to separate level, trend, and seasonal components
6. Compare multiple methods using the EconometricSuite

## Prerequisites

```bash
pip install pandas numpy matplotlib seaborn statsmodels
```

In [None]:
# Import required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime, timedelta
import warnings
warnings.filterwarnings('ignore')

# Import econometric modules
from mcp_server.time_series import TimeSeriesAnalyzer
from mcp_server.advanced_time_series import AdvancedTimeSeriesAnalyzer
from mcp_server.econometric_suite import EconometricSuite

# Set plotting style
plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette("husl")
%matplotlib inline

## 1. Generate Sample Player Performance Data

For this demonstration, we'll generate synthetic player performance data that mimics real NBA statistics. The series is the sum of a base level and four components:
- Upward trend (player improvement)
- Seasonal pattern (playoff performance boost)
- Cyclical pattern (hot/cold streaks)
- Random noise (game-to-game variability)

In [None]:
def generate_player_performance_data(n_games=200, player_name="LeBron James"):
    """
    Generate synthetic player performance data with trend, seasonality, and noise.
    
    Parameters:
    -----------
    n_games : int
        Number of games to simulate
    player_name : str
        Player name for metadata
    
    Returns:
    --------
    pd.DataFrame
        DataFrame with date, points_per_game, and other metrics
    """
    np.random.seed(42)
    
    # Generate dates (one game every 3 days, so 200 games span ~600 days across two seasons)
    start_date = datetime(2023, 10, 1)
    dates = [start_date + timedelta(days=i*3) for i in range(n_games)]
    
    # Components of player performance
    # 1. Base level (starting performance)
    base_level = 22.0
    
    # 2. Linear trend (player improvement over time)
    trend = np.linspace(0, 5, n_games)  # +5 PPG improvement over period
    
    # 3. Seasonal component (playoff boost every ~41 games)
    seasonal = 3 * np.sin(2 * np.pi * np.arange(n_games) / 41)
    
    # 4. Cyclical component (hot/cold streaks)
    cyclical = 2 * np.sin(2 * np.pi * np.arange(n_games) / 15)
    
    # 5. Random noise (game-to-game variability)
    noise = np.random.normal(0, 2.5, n_games)
    
    # Combine components
    ppg = base_level + trend + seasonal + cyclical + noise
    
    # Create DataFrame
    df = pd.DataFrame({
        'date': dates,
        'player_name': player_name,
        'points_per_game': ppg,
        'season': ['2023-24' if d < datetime(2024, 6, 1) else '2024-25' for d in dates],
        'game_number': range(1, n_games + 1)
    })
    
    return df

# Generate data for LeBron James
player_data = generate_player_performance_data(n_games=200, player_name="LeBron James")

print(f"Generated {len(player_data)} games of data")
print(f"Date range: {player_data['date'].min()} to {player_data['date'].max()}")
print(f"\nFirst 5 games:")
player_data.head()

In [None]:
# Visualize the raw data
fig, ax = plt.subplots(figsize=(14, 6))
ax.plot(player_data['date'], player_data['points_per_game'], marker='o', 
        linestyle='-', linewidth=1, markersize=3, alpha=0.7)
ax.set_xlabel('Date', fontsize=12)
ax.set_ylabel('Points Per Game', fontsize=12)
ax.set_title('LeBron James - Points Per Game Over Time', fontsize=14, fontweight='bold')
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

# Summary statistics
print("\nSummary Statistics:")
print(player_data['points_per_game'].describe())

## 2. Stationarity Testing

Before fitting time series models, we need to test if the data is stationary (constant mean and variance over time).

We'll use:
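- **ADF Test** (Augmented Dickey-Fuller): The null hypothesis is a unit root (non-stationarity), so a small p-value is evidence of stationarity
- **KPSS Test**: The null hypothesis is stationarity (around a level or a deterministic trend), so a small p-value is evidence of non-stationarity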

In [None]:
# Initialize TimeSeriesAnalyzer
ts_data = player_data.set_index('date')['points_per_game']
analyzer = TimeSeriesAnalyzer(data=ts_data, freq='3D')  # games are spaced 3 days apart

# Run ADF test
print("=" * 60)
print("AUGMENTED DICKEY-FULLER TEST")
print("=" * 60)
adf_result = analyzer.adf_test()
print(f"ADF Statistic: {adf_result['adf_statistic']:.4f}")
print(f"P-value: {adf_result['p_value']:.4f}")
print(f"Critical Values:")
for key, value in adf_result['critical_values'].items():
    print(f"  {key}: {value:.4f}")
print(f"\nConclusion: {'Stationary' if adf_result['stationary'] else 'Non-Stationary'}")
print(f"Interpretation: {adf_result['interpretation']}")

print("\n" + "=" * 60)
print("KPSS TEST")
print("=" * 60)
kpss_result = analyzer.kpss_test()
print(f"KPSS Statistic: {kpss_result['kpss_statistic']:.4f}")
print(f"P-value: {kpss_result['p_value']:.4f}")
print(f"Critical Values:")
for key, value in kpss_result['critical_values'].items():
    print(f"  {key}: {value:.4f}")
print(f"\nConclusion: {'Stationary' if kpss_result['stationary'] else 'Non-Stationary'}")

## 3. Seasonal Decomposition

Break down the time series into:
- **Trend**: Long-term movement
- **Seasonal**: Repeating patterns
- **Residual**: Random noise
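
To make the three components concrete, here is a minimal NumPy sketch of a classical additive decomposition: a centered moving average for the trend, per-phase averages for the seasonal part, and whatever is left over as the residual. It is an assumption that `analyzer.decompose` works along these lines; the function name `additive_decompose` is hypothetical.

```python
import numpy as np


def additive_decompose(y, period):
    """Classical additive decomposition: y = trend + seasonal + resid."""
    y = np.asarray(y, dtype=float)
    n = len(y)

    # Centered moving average over one full period.
    # An even period needs half weights at both ends to stay centered.
    if period % 2 == 0:
        weights = np.r_[0.5, np.ones(period - 1), 0.5] / period
    else:
        weights = np.ones(period) / period
    half = len(weights) // 2

    # The moving average is undefined at the edges, so those stay NaN
    trend = np.full(n, np.nan)
    trend[half:n - half] = np.convolve(y, weights, mode="valid")

    # Average the detrended values at each position within the cycle
    detrended = y - trend
    phase_means = np.array(
        [np.nanmean(detrended[i::period]) for i in range(period)]
    )
    phase_means -= phase_means.mean()  # seasonal effects sum to zero

    seasonal = np.tile(phase_means, n // period + 1)[:n]
    resid = y - trend - seasonal
    return trend, seasonal, resid


# Noise-free check: linear trend plus a period-12 sine wave
t = np.arange(120)
y = 10 + 0.1 * t + 2 * np.sin(2 * np.pi * t / 12)
trend, seasonal, resid = additive_decompose(y, period=12)
print(np.nanmax(np.abs(resid)))
```

The series needs at least two full periods so that every phase of the cycle has a detrended value to average.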

In [None]:
# Perform seasonal decomposition
decomposition = analyzer.decompose(model='additive', period=41)  # 41 games ~ half season

# Visualize components
fig, axes = plt.subplots(4, 1, figsize=(14, 10))

# Original
axes[0].plot(decomposition['observed'].index, decomposition['observed'], color='blue')
axes[0].set_ylabel('Original', fontsize=11)
axes[0].set_title('Seasonal Decomposition - LeBron James PPG', fontsize=14, fontweight='bold')
axes[0].grid(True, alpha=0.3)

# Trend
axes[1].plot(decomposition['trend'].index, decomposition['trend'], color='green')
axes[1].set_ylabel('Trend', fontsize=11)
axes[1].grid(True, alpha=0.3)

# Seasonal
axes[2].plot(decomposition['seasonal'].index, decomposition['seasonal'], color='orange')
axes[2].set_ylabel('Seasonal', fontsize=11)
axes[2].grid(True, alpha=0.3)

# Residual
axes[3].plot(decomposition['resid'].index, decomposition['resid'], color='red', alpha=0.6)
axes[3].set_ylabel('Residual', fontsize=11)
axes[3].set_xlabel('Date', fontsize=11)
axes[3].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print("\nComponent Variability (standard deviation):")
print(f"Trend: {decomposition['trend'].std():.2f}")
print(f"Seasonal: {decomposition['seasonal'].std():.2f}")
print(f"Residual: {decomposition['resid'].dropna().std():.2f}")

## 4. ARIMA Modeling and Forecasting

Fit an ARIMA model to forecast future performance.

We'll use Auto-ARIMA to automatically select the best parameters.
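
Automatic order selection typically fits several candidate orders and keeps the one with the lowest information criterion. The sketch below shows that idea in its simplest form: pure AR(p) models fit by least squares and ranked by AIC. It is not a full ARIMA search (no MA terms, no differencing step), and `select_ar_order` is a hypothetical name, not the suite's `auto_arima`.

```python
import numpy as np


def select_ar_order(x, max_p=3):
    """Fit AR(0)..AR(max_p) by OLS and return the order with the lowest AIC."""
    x = np.asarray(x, dtype=float)
    # Use the same target sample for every order so the AICs are comparable
    y = x[max_p:]
    results = {}
    for p in range(max_p + 1):
        # Design matrix: intercept plus lags 1..p aligned with y
        cols = [np.ones(len(y))]
        cols += [x[max_p - k: len(x) - k] for k in range(1, p + 1)]
        X = np.column_stack(cols)
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ beta
        sigma2 = resid @ resid / len(y)
        # Gaussian AIC up to an additive constant; p lags + intercept + variance
        aic = len(y) * np.log(sigma2) + 2 * (p + 2)
        results[p] = {"coef": beta, "aic": aic}
    best_p = min(results, key=lambda p: results[p]["aic"])
    return best_p, results


# Simulate an AR(1) process with coefficient 0.7
rng = np.random.default_rng(0)
e = rng.normal(size=2000)
x = np.zeros(2000)
for i in range(1, len(x)):
    x[i] = 0.7 * x[i - 1] + e[i]

best_p, results = select_ar_order(x, max_p=3)
print(best_p, results[1]["coef"])
```

For a trending series, the same search would be run on `np.diff(series)`, which corresponds to choosing d=1 in ARIMA notation.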

In [None]:
# Auto-select best ARIMA model
print("Fitting Auto-ARIMA model...")
print("This may take a minute as it tests multiple parameter combinations.\n")

arima_model = analyzer.auto_arima(seasonal=False, max_p=3, max_q=3, max_d=2)

print(f"Best ARIMA Order: {arima_model['order']}")
print(f"AIC: {arima_model['aic']:.2f}")
print(f"BIC: {arima_model['bic']:.2f}")
print(f"\nModel Summary:")
print(arima_model['model_summary'])

In [None]:
# Forecast next 20 games
forecast_result = analyzer.forecast(model=arima_model, steps=20)

# Visualize forecast
fig, ax = plt.subplots(figsize=(14, 6))

# Historical data
ax.plot(ts_data.index, ts_data, label='Historical', color='blue', linewidth=2)

# Forecast
forecast_dates = pd.date_range(start=ts_data.index[-1] + timedelta(days=3), 
                                periods=20, freq='3D')
ax.plot(forecast_dates, forecast_result['forecast'], 
        label='Forecast', color='red', linewidth=2, linestyle='--')

# Confidence interval
ax.fill_between(forecast_dates,
                forecast_result['lower_bound'],
                forecast_result['upper_bound'],
                alpha=0.2, color='red', label='95% Confidence Interval')

ax.axvline(x=ts_data.index[-1], color='gray', linestyle=':', alpha=0.7)
ax.set_xlabel('Date', fontsize=12)
ax.set_ylabel('Points Per Game', fontsize=12)
ax.set_title('ARIMA Forecast - LeBron James PPG', fontsize=14, fontweight='bold')
ax.legend(fontsize=10)
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

print(f"\nForecast Statistics:")
print(f"Mean forecast: {forecast_result['forecast'].mean():.2f} PPG")
print(f"Forecast range: [{forecast_result['forecast'].min():.2f}, {forecast_result['forecast'].max():.2f}]")

## 5. Kalman Filtering for Real-Time Tracking

Use Kalman filters to track the latent (true) performance level in real-time, filtering out noise.
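
The local level model treats each observation as a slowly drifting latent level plus noise. A minimal NumPy sketch of the filter recursion follows, assuming the two variances are known. The suite presumably estimates them from the data, which this sketch does not do, and `local_level_filter` is a hypothetical name.

```python
import numpy as np


def local_level_filter(y, obs_var, level_var, init_var=1e6):
    """Kalman filter for y_t = level_t + noise, level_t = level_{t-1} + drift."""
    y = np.asarray(y, dtype=float)
    level = y[0]       # start at the first observation
    var = init_var     # large initial variance = vague prior on the level
    filtered = np.empty(len(y))
    for t, obs in enumerate(y):
        # Predict: the level is a random walk, so only its uncertainty grows
        var_pred = var + level_var
        # Update: the gain weighs the new observation against the prediction
        gain = var_pred / (var_pred + obs_var)
        level = level + gain * (obs - level)
        var = (1.0 - gain) * var_pred
        filtered[t] = level
    return filtered


# Noisy observations around a slowly rising level
rng = np.random.default_rng(1)
true_level = np.linspace(22, 27, 200)
obs = true_level + rng.normal(0, 2.5, size=200)
filtered = local_level_filter(obs, obs_var=2.5 ** 2, level_var=0.05)
print(np.std(np.diff(obs)), np.std(np.diff(filtered)))
```

A larger `level_var` relative to `obs_var` makes the filter follow each game more closely; a smaller one gives a smoother but slower estimate.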

In [None]:
# Initialize Advanced Time Series Analyzer
adv_analyzer = AdvancedTimeSeriesAnalyzer(data=ts_data.values)

# Fit Kalman filter (local level model)
print("Fitting Kalman Filter (Local Level Model)...\n")
kalman_result = adv_analyzer.fit_kalman_filter(model='local_level')

print(f"Model: Local Level Kalman Filter")
print(f"Log-Likelihood: {kalman_result['log_likelihood']:.2f}")
print(f"AIC: {kalman_result['aic']:.2f}")
print(f"\nEstimated Parameters:")
for param, value in kalman_result['parameters'].items():
    print(f"  {param}: {value:.4f}")

In [None]:
# Visualize Kalman filtered states
fig, ax = plt.subplots(figsize=(14, 6))

# Raw observations
ax.plot(ts_data.index, ts_data, label='Observed PPG', 
        color='blue', alpha=0.4, linewidth=1.5)

# Filtered state (estimate of the level using only the games observed so far)
ax.plot(ts_data.index, kalman_result['filtered_state'], 
        label='Kalman Filtered State', color='red', linewidth=2)

ax.set_xlabel('Date', fontsize=12)
ax.set_ylabel('Points Per Game', fontsize=12)
ax.set_title('Kalman Filter - True Performance Level Estimation', fontsize=14, fontweight='bold')
ax.legend(fontsize=11)
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

# Compare variability before and after filtering
# (a rough proxy for the noise removed, since both series still contain the trend)
noise_reduction = ts_data.std() - pd.Series(kalman_result['filtered_state']).std()
print(f"\nNoise Reduction:")
print(f"Original std dev: {ts_data.std():.2f}")
print(f"Filtered std dev: {pd.Series(kalman_result['filtered_state']).std():.2f}")
print(f"Reduction: {noise_reduction:.2f} ({(noise_reduction/ts_data.std())*100:.1f}%)")

## 6. Structural Time Series Decomposition

Use state-space models to decompose the series into level, trend, seasonal, and cycle components simultaneously.
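
For reference, a common textbook form of such a model is written below. Whether `fit_structural_ts` uses exactly this parameterization is an assumption; the notebook does not state it.

$$
\begin{aligned}
y_t &= \mu_t + \gamma_t + \psi_t + \varepsilon_t, & \varepsilon_t &\sim N(0, \sigma_\varepsilon^2) \\
\mu_{t+1} &= \mu_t + \beta_t + \eta_t, & \eta_t &\sim N(0, \sigma_\eta^2) \\
\beta_{t+1} &= \beta_t + \zeta_t, & \zeta_t &\sim N(0, \sigma_\zeta^2)
\end{aligned}
$$

Here $y_t$ is points per game, $\mu_t$ is the level, $\beta_t$ is the slope (trend), $\gamma_t$ is the seasonal component with period 41, $\psi_t$ is the cycle, and $\varepsilon_t$ is observation noise.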

In [None]:
# Fit structural time series model
print("Fitting Structural Time Series Model...\n")
structural_result = adv_analyzer.fit_structural_ts(
    level=True, 
    trend=True, 
    seasonal=41,  # Half-season periodicity
    cycle=True
)

print(f"Model: Structural Time Series")
print(f"Components: Level + Trend + Seasonal(41) + Cycle")
print(f"Log-Likelihood: {structural_result['log_likelihood']:.2f}")
print(f"AIC: {structural_result['aic']:.2f}")
print(f"BIC: {structural_result['bic']:.2f}")

In [None]:
# Visualize structural components
fig, axes = plt.subplots(5, 1, figsize=(14, 12))

# Observed
axes[0].plot(ts_data.index, ts_data, color='blue')
axes[0].set_ylabel('Observed', fontsize=10)
axes[0].set_title('Structural Time Series Decomposition', fontsize=14, fontweight='bold')
axes[0].grid(True, alpha=0.3)

# Level
axes[1].plot(ts_data.index, structural_result['level'], color='green')
axes[1].set_ylabel('Level', fontsize=10)
axes[1].grid(True, alpha=0.3)

# Trend
axes[2].plot(ts_data.index, structural_result['trend'], color='purple')
axes[2].set_ylabel('Trend', fontsize=10)
axes[2].grid(True, alpha=0.3)

# Seasonal
axes[3].plot(ts_data.index, structural_result['seasonal'], color='orange')
axes[3].set_ylabel('Seasonal', fontsize=10)
axes[3].grid(True, alpha=0.3)

# Cycle
axes[4].plot(ts_data.index, structural_result['cycle'], color='brown')
axes[4].set_ylabel('Cycle', fontsize=10)
axes[4].set_xlabel('Date', fontsize=11)
axes[4].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print("\nComponent Variability:")
print(f"Level std: {pd.Series(structural_result['level']).std():.2f}")
print(f"Trend std: {pd.Series(structural_result['trend']).std():.2f}")
print(f"Seasonal std: {pd.Series(structural_result['seasonal']).std():.2f}")
print(f"Cycle std: {pd.Series(structural_result['cycle']).std():.2f}")

## 7. Unified Analysis with EconometricSuite

Use the EconometricSuite to automatically detect the data structure and compare multiple methods.
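
The comparison in this section ranks models by AIC. As a reminder of what that metric does, the sketch below computes AIC = 2k - 2 log L from a log-likelihood and a parameter count and sorts the candidates. The function name `rank_by_aic` and the numbers in the example are hypothetical and are not output from the suite.

```python
def rank_by_aic(fits):
    """Rank models by AIC.

    fits maps a model name to (log_likelihood, n_params).
    Returns a list of (name, aic, delta_from_best), best model first.
    """
    # AIC rewards fit (log-likelihood) and penalizes parameter count
    scored = [(name, 2 * k - 2 * ll) for name, (ll, k) in fits.items()]
    best = min(aic for _, aic in scored)
    ranked = [(name, aic, aic - best) for name, aic in scored]
    return sorted(ranked, key=lambda row: row[1])


# Hypothetical log-likelihoods and parameter counts, for illustration only
example_fits = {
    "arima_111": (-480.0, 3),
    "local_level": (-490.0, 2),
}
for name, aic, delta in rank_by_aic(example_fits):
    print(f"{name}: AIC={aic:.1f} (delta={delta:.1f})")
```

AIC values are only comparable between models fit to the same observations. A differenced ARIMA model and a Kalman filter on the raw series may use different effective samples, so small AIC gaps between them should be read with caution.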

In [None]:
# Prepare data for the suite (a DataFrame indexed by date)
suite_data = player_data[['date', 'points_per_game']].copy()
suite_data = suite_data.set_index('date')

# Initialize EconometricSuite
suite = EconometricSuite(
    data=suite_data,
    target='points_per_game',
    time_col=None  # Already set as index
)

print("EconometricSuite Initialized")
print(f"Data structure detected: {suite.data_structure}")
print(f"Recommended methods: {suite.recommended_methods}")

In [None]:
# Auto-analyze with best method
auto_result = suite.analyze(method='auto')

print("=" * 60)
print("AUTO-ANALYSIS RESULT")
print("=" * 60)
print(auto_result.summary())

In [None]:
# Compare multiple methods
print("Comparing multiple time series methods...\n")

comparison = suite.compare_methods(
    methods=[
        {'category': 'time_series', 'method': 'arima', 'params': {'order': (1, 1, 1)}},
        {'category': 'time_series', 'method': 'auto_arima', 'params': {}},
        {'category': 'advanced_time_series', 'method': 'kalman', 'params': {'model': 'local_level'}}
    ],
    metric='aic'
)

print("\nMethod Comparison Results:")
print(comparison)

## 8. Summary and Insights

### Key Findings

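1. **Stationarity**: The mean drifts upward by construction (the linear trend), so the series is not stationary in level; the ADF and KPSS output in Section 2 shows how each test responds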
2. **Components**: 
   - Strong upward trend indicating player improvement
   - Seasonal pattern aligned with half-season (playoff) cycles
   - Cyclical hot/cold streaks
3. **Forecasting**: ARIMA provides reasonable point forecasts with uncertainty quantification
4. **Real-time Tracking**: The Kalman filter damps game-to-game noise and gives a running estimate of the latent performance level
5. **Structural Model**: Decomposes performance into interpretable components

### Recommendations

1. **For Forecasting**: Use ARIMA for short-term predictions with confidence intervals
2. **For Real-Time Monitoring**: Use Kalman filter to track current performance level
3. **For Understanding Patterns**: Use structural decomposition to analyze long-term trends
4. **For Model Selection**: Use EconometricSuite to compare methods and select best approach

### NBA Applications

- **Player Development**: Track improvement trends over seasons
- **Injury Recovery**: Monitor return to form after injuries
- **Contract Valuation**: Project future performance for contract negotiations
- **Fantasy Basketball**: Forecast upcoming game performance
- **Coaching Decisions**: Identify hot/cold streaks for lineup optimization

## Next Steps

- Explore **Notebook 2**: Career Longevity Modeling with Survival Analysis
- Explore **Notebook 3**: Coaching Change Impact with Causal Inference
- Try with real NBA data from the MCP server
- Extend analysis to compare multiple players
- Add Markov switching models for regime detection