# Notebook 4: Injury Recovery Tracking with Markov Switching Models

**Use Case**: Track player performance recovery after injury using regime-switching models  
**Methods**: Markov Switching Models, Kalman Filter, Structural Time Series  
**Business Value**: Predict return-to-form timeline, optimize return-to-play decisions

---

## Table of Contents

1. [Problem Statement](#1-problem-statement)
2. [Data Setup](#2-data-setup)
3. [Method 1: Markov Switching Models](#3-method-1-markov-switching-models)
4. [Method 2: Kalman Filter for Trajectory](#4-method-2-kalman-filter-for-trajectory)
5. [Method 3: Structural Decomposition](#5-method-3-structural-decomposition)
6. [Recovery Timeline Prediction](#6-recovery-timeline-prediction)
7. [Business Recommendations](#7-business-recommendations)
8. [Production Deployment](#8-production-deployment)
9. [Summary](#9-summary)

---

## 1. Problem Statement

### The Challenge

When NBA players return from injury, their performance typically follows distinct phases:
- **Regime 1 (Struggling)**: Below pre-injury baseline, limited mobility
- **Regime 2 (Recovering)**: Improving but inconsistent performance
- **Regime 3 (Recovered)**: Back to pre-injury level

### Key Questions

1. **Detection**: Which recovery regime is the player currently in?
2. **Prediction**: When will they transition to full recovery?
3. **Monitoring**: Are they progressing as expected?
4. **Risk**: Is the player regressing to the struggling phase?

### Why Traditional Methods Fail

- **Linear models**: Assume a single trend with a constant slope, so they cannot represent abrupt shifts in level
- **Simple averages**: Miss regime changes and transitions
- **Threshold rules**: Arbitrary cutoffs, no uncertainty quantification

### Our Solution

**Markov Switching Models** allow performance to switch between discrete regimes, with transition probabilities governing regime changes.

---

## 2. Data Setup

We'll generate synthetic player performance data with:
- Pre-injury baseline (Regime 3)
- Post-injury struggling phase (Regime 1)
- Recovery period (transition 1 → 2 → 3)
- Return to full form (Regime 3)

In [None]:
# Standard imports
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime, timedelta

# Set style
plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette("husl")

# Seed for reproducibility
np.random.seed(42)

In [None]:
def generate_injury_recovery_data(n_games=80, injury_game=20):
    """
    Generate synthetic player performance with injury recovery phases.
    
    Regime 1 (Struggling): mean=15, std=3
    Regime 2 (Recovering): mean=22, std=4
    Regime 3 (Healthy): mean=28, std=3
    """
    dates = [datetime(2024, 10, 1) + timedelta(days=2*i) for i in range(n_games)]
    
    points = []
    true_regime = []
    
    for i in range(n_games):
        if i < injury_game:
            # Pre-injury: Healthy (Regime 3)
            points.append(np.random.normal(28, 3))
            true_regime.append(3)
        elif i < injury_game + 15:
            # Post-injury: Struggling (Regime 1)
            points.append(np.random.normal(15, 3))
            true_regime.append(1)
        elif i < injury_game + 35:
            # Recovery: Recovering (Regime 2)
            points.append(np.random.normal(22, 4))
            true_regime.append(2)
        else:
            # Recovered: Healthy (Regime 3)
            points.append(np.random.normal(28, 3))
            true_regime.append(3)
    
    df = pd.DataFrame({
        'game_date': dates,
        'points': points,
        'true_regime': true_regime,
        'game_num': range(1, n_games + 1)
    })
    
    return df, injury_game

# Generate data
recovery_df, injury_game = generate_injury_recovery_data()

print("Player Recovery Data:")
print(recovery_df.head(10))
print(f"\nInjury occurred after game {injury_game}")
print(f"Total games: {len(recovery_df)}")

In [None]:
# Visualize the data
fig, ax = plt.subplots(figsize=(14, 6))

# Plot points with color by regime
regime_colors = {1: 'red', 2: 'orange', 3: 'green'}
for regime in [1, 2, 3]:
    mask = recovery_df['true_regime'] == regime
    regime_label = {1: 'Struggling', 2: 'Recovering', 3: 'Healthy'}[regime]
    ax.scatter(recovery_df.loc[mask, 'game_num'], 
              recovery_df.loc[mask, 'points'],
              c=regime_colors[regime], label=regime_label, alpha=0.6, s=50)

# Mark injury event
ax.axvline(injury_game, color='black', linestyle='--', linewidth=2, 
          label='Injury Event', alpha=0.7)

ax.set_xlabel('Game Number', fontsize=12)
ax.set_ylabel('Points Per Game', fontsize=12)
ax.set_title('Player Performance: Injury Recovery Trajectory', fontsize=14, fontweight='bold')
ax.legend(loc='best', fontsize=10)
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

print("\nRegime Distribution:")
print(recovery_df['true_regime'].value_counts().sort_index())

## 3. Method 1: Markov Switching Models

### What is a Markov Switching Model?

A **Markov Switching Model** (also called a Regime-Switching Model) allows a time series to switch between distinct regimes, each with its own statistical properties.

### Mathematical Framework

$$y_t = \mu_{s_t} + \epsilon_t, \quad \epsilon_t \sim N(0, \sigma^2_{s_t})$$

Where:
- $s_t \in \{1, 2, 3\}$ is the latent regime at time $t$
- $\mu_{s_t}$ is the mean in regime $s_t$
- $\sigma^2_{s_t}$ is the variance in regime $s_t$

### Transition Probabilities

$$P(s_t = j | s_{t-1} = i) = p_{ij}$$

The **transition matrix** $P$ governs regime switches:

```
         To: Struggling  Recovering  Healthy
From:
Struggling   [  p11        p12        p13   ]
Recovering   [  p21        p22        p23   ]
Healthy      [  p31        p32        p33   ]
```
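To make the two pieces of the model concrete, the sketch below draws a regime path from a transition matrix and then draws points from the regime-specific normal distributions. The matrix `P`, the helper `simulate_regime_path`, and all numeric values are illustrative assumptions, not estimates from the notebook's fitted model.

```python
import numpy as np

# Hypothetical row-stochastic transition matrix (rows: from, columns: to).
# Regime order: Struggling, Recovering, Healthy. Values are illustrative only.
P = np.array([
    [0.90, 0.10, 0.00],
    [0.03, 0.90, 0.07],
    [0.01, 0.02, 0.97],
])
assert np.allclose(P.sum(axis=1), 1.0)  # each row is a probability distribution

# Regime-specific means and standard deviations (same values as the data generator)
mu = np.array([15.0, 22.0, 28.0])
sigma = np.array([3.0, 4.0, 3.0])


def simulate_regime_path(P, mu, sigma, n_games, start_regime, rng):
    """Draw s_t from the Markov chain, then y_t ~ N(mu[s_t], sigma[s_t]^2)."""
    regimes = np.empty(n_games, dtype=int)
    regimes[0] = start_regime
    for t in range(1, n_games):
        # The next regime depends only on the current one (Markov property)
        regimes[t] = rng.choice(len(mu), p=P[regimes[t - 1]])
    points = rng.normal(mu[regimes], sigma[regimes])
    return regimes, points


rng = np.random.default_rng(0)
sim_regimes, sim_points = simulate_regime_path(
    P, mu, sigma, n_games=60, start_regime=0, rng=rng
)
print(sim_regimes[:20])
```

Because `P[0, 2]` is zero in this made-up matrix, a simulated player can never jump straight from Struggling to Healthy; every recovery passes through the Recovering regime.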

### Expected Regime Duration

$$E[\text{Duration in regime } i] = \frac{1}{1 - p_{ii}}$$
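The formula follows because the number of consecutive games $D_i$ spent in regime $i$ is geometrically distributed: each game the player stays with probability $p_{ii}$ and leaves with probability $1 - p_{ii}$. A short derivation, followed by a worked value that uses an illustrative $p_{ii} = 0.9$ (an assumption, not an estimate from the data):

$$P(D_i = k) = p_{ii}^{\,k-1}(1 - p_{ii}), \quad k = 1, 2, \dots$$

$$E[D_i] = \sum_{k=1}^{\infty} k \, p_{ii}^{\,k-1}(1 - p_{ii}) = \frac{1 - p_{ii}}{(1 - p_{ii})^2} = \frac{1}{1 - p_{ii}}$$

$$p_{ii} = 0.9 \;\Rightarrow\; E[D_i] = \frac{1}{1 - 0.9} = 10 \text{ games}$$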

---

### Fitting Markov Switching Model

We'll use `statsmodels` to fit a 3-regime Markov Switching model.

In [None]:
from statsmodels.tsa.regime_switching.markov_regression import MarkovRegression

# Fit 3-regime Markov Switching model
# Note: This fits intercept-only model (no covariates)
ms_model = MarkovRegression(
    endog=recovery_df['points'],
    k_regimes=3,
    switching_variance=True  # Allow variance to differ by regime
)

print("Fitting 3-regime Markov Switching model...")
print("(This may take 30-60 seconds)\n")

ms_result = ms_model.fit()

print("Model fitted successfully!\n")
print(ms_result.summary())

### Extracting Regime Parameters

In [None]:
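# Extract regime parameters
# ms_result.params is a pandas Series indexed by parameter name
# (e.g. 'const[0]', 'sigma2[0]'), so select entries by label.
regime_means = ms_result.params.filter(like='const').values
regime_stds = np.sqrt(ms_result.params.filter(like='sigma2').values)

# Note: the fitted regimes are not guaranteed to be ordered by mean, so check
# the printed means before reading regime 0/1/2 as Struggling/Recovering/Healthy.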

print("Estimated Regime Parameters:")
print("="*50)
for i in range(3):
    regime_label = ["Struggling", "Recovering", "Healthy"][i]
    print(f"\nRegime {i} ({regime_label}):")
    print(f"  Mean: {regime_means[i]:.2f} points")
    print(f"  Std:  {regime_stds[i]:.2f} points")

# Compare to true values
print("\n" + "="*50)
print("True Parameters (Data Generation):")
print("  Regime 1 (Struggling): mean=15, std=3")
print("  Regime 2 (Recovering): mean=22, std=4")
print("  Regime 3 (Healthy): mean=28, std=3")

### Transition Probabilities

The transition matrix tells us how likely players are to move between regimes.

In [None]:
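# Extract transition matrix
# statsmodels stores regime_transition with shape (k_regimes, k_regimes, 1), where
# entry [i, j, 0] is P(s_t = i | s_{t-1} = j). Drop the time axis and transpose so
# that rows are "from" regimes and columns are "to" regimes.
transition_matrix = ms_result.regime_transition[:, :, 0].T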

print("Transition Probability Matrix:")
print("="*60)
print("\n         To: Regime 0   Regime 1   Regime 2")
print("From:")
for i in range(3):
    regime_label = ["Struggling", "Recovering", "Healthy"][i]
    print(f"{regime_label:>12}   {transition_matrix[i,0]:>8.3f}   {transition_matrix[i,1]:>8.3f}   {transition_matrix[i,2]:>8.3f}")

# Expected regime durations
print("\n" + "="*60)
print("Expected Regime Durations:")
for i in range(3):
    regime_label = ["Struggling", "Recovering", "Healthy"][i]
    expected_duration = 1 / (1 - transition_matrix[i, i])
    print(f"  {regime_label}: {expected_duration:.1f} games")

### Regime Probabilities Over Time

The **smoothed probabilities** tell us the probability of being in each regime at each time point, given the full data.
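For intuition about where these probabilities come from, here is a minimal NumPy sketch of the two recursions involved: a forward (Hamilton) filter that uses only games played so far, and a backward (Kim) smoother that revises each estimate using the full series. The functions, the parameters `P_demo`, `mu_demo`, `sigma_demo`, and the short series `y_demo` are hypothetical; `statsmodels` performs the equivalent computation internally with its own estimated parameters.

```python
import numpy as np


def gaussian_pdf(y, mu, sigma):
    """Normal density of y under each regime's (mu, sigma)."""
    return np.exp(-0.5 * ((y - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))


def hamilton_filter(y, P, mu, sigma, init_probs):
    """Filtered probabilities P(s_t = j | y_1..y_t); P[i, j] = P(j | i)."""
    filtered = np.zeros((len(y), len(mu)))
    prev = np.asarray(init_probs, dtype=float)
    for t, obs in enumerate(y):
        predicted = prev @ P                       # P(s_t | y_1..y_{t-1})
        joint = predicted * gaussian_pdf(obs, mu, sigma)
        filtered[t] = joint / joint.sum()          # Bayes update with y_t
        prev = filtered[t]
    return filtered


def kim_smoother(filtered, P):
    """Smoothed probabilities P(s_t = j | y_1..y_T) via backward recursion."""
    smoothed = np.zeros_like(filtered)
    smoothed[-1] = filtered[-1]                    # last game: nothing to revise
    for t in range(len(filtered) - 2, -1, -1):
        predicted_next = filtered[t] @ P           # P(s_{t+1} | y_1..y_t)
        ratio = smoothed[t + 1] / predicted_next
        smoothed[t] = filtered[t] * (P @ ratio)
    return smoothed


# Hypothetical parameters and a short illustrative series
P_demo = np.array([[0.90, 0.08, 0.02],
                   [0.05, 0.85, 0.10],
                   [0.03, 0.02, 0.95]])
mu_demo = np.array([15.0, 22.0, 28.0])
sigma_demo = np.array([3.0, 4.0, 3.0])
y_demo = [28, 27, 15, 14, 16, 22, 23, 28, 29]

filtered_demo = hamilton_filter(y_demo, P_demo, mu_demo, sigma_demo,
                                init_probs=np.ones(3) / 3)
smoothed_demo = kim_smoother(filtered_demo, P_demo)
print(np.round(smoothed_demo, 3))
```

Filtered probabilities are the ones available in real time; smoothed probabilities are better suited to retrospective analysis because they also use later games.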

In [None]:
# Get smoothed probabilities (inferred regime probabilities)
smoothed_probs = ms_result.smoothed_marginal_probabilities

# Determine most likely regime at each time
predicted_regime = smoothed_probs.idxmax(axis=1)

# Add to dataframe
recovery_df['regime_0_prob'] = smoothed_probs.iloc[:, 0].values
recovery_df['regime_1_prob'] = smoothed_probs.iloc[:, 1].values
recovery_df['regime_2_prob'] = smoothed_probs.iloc[:, 2].values
recovery_df['predicted_regime'] = predicted_regime.values

# Show sample
print("Regime Probabilities (Sample):")
print(recovery_df[['game_num', 'points', 'true_regime', 'predicted_regime', 
                   'regime_0_prob', 'regime_1_prob', 'regime_2_prob']].head(25))

In [None]:
# Visualize regime probabilities over time
fig, axes = plt.subplots(2, 1, figsize=(14, 10), sharex=True)

# Top panel: Performance with predicted regimes
ax1 = axes[0]
ax1.plot(recovery_df['game_num'], recovery_df['points'], 'o-', 
         color='gray', alpha=0.5, label='Actual Points')
ax1.axvline(injury_game, color='black', linestyle='--', linewidth=2, 
           label='Injury Event', alpha=0.7)

# Overlay predicted regime means
for regime in [0, 1, 2]:
    mask = recovery_df['predicted_regime'] == regime
    ax1.scatter(recovery_df.loc[mask, 'game_num'],
               recovery_df.loc[mask, 'points'],
               c=['red', 'orange', 'green'][regime],
               s=100, alpha=0.6, edgecolors='black', linewidth=1,
               label=f"Regime {regime}")

ax1.set_ylabel('Points Per Game', fontsize=12)
ax1.set_title('Predicted Recovery Regimes', fontsize=14, fontweight='bold')
ax1.legend(loc='best', fontsize=10)
ax1.grid(True, alpha=0.3)

# Bottom panel: Regime probabilities
ax2 = axes[1]
ax2.plot(recovery_df['game_num'], recovery_df['regime_0_prob'], 
        'r-', linewidth=2, label='Struggling (0)', alpha=0.7)
ax2.plot(recovery_df['game_num'], recovery_df['regime_1_prob'], 
        color='orange', linewidth=2, label='Recovering (1)', alpha=0.7)
ax2.plot(recovery_df['game_num'], recovery_df['regime_2_prob'], 
        'g-', linewidth=2, label='Healthy (2)', alpha=0.7)
ax2.axvline(injury_game, color='black', linestyle='--', linewidth=2, alpha=0.7)
ax2.axhline(0.5, color='gray', linestyle=':', alpha=0.5)

ax2.set_xlabel('Game Number', fontsize=12)
ax2.set_ylabel('Regime Probability', fontsize=12)
ax2.set_title('Regime Probability Evolution', fontsize=14, fontweight='bold')
ax2.legend(loc='best', fontsize=10)
ax2.grid(True, alpha=0.3)
ax2.set_ylim([0, 1])

plt.tight_layout()
plt.show()

### Model Accuracy Assessment

In [None]:
# Map predicted regimes to match true regimes (they may be permuted)
# Find best mapping by maximizing agreement
from itertools import permutations

best_accuracy = 0
best_mapping = None

for perm in permutations([0, 1, 2]):
    mapping = dict(zip(perm, [1, 2, 3]))  # Map to true regime labels
    mapped_pred = recovery_df['predicted_regime'].map(mapping)
    accuracy = (mapped_pred == recovery_df['true_regime']).mean()
    if accuracy > best_accuracy:
        best_accuracy = accuracy
        best_mapping = mapping

print(f"Best regime mapping: {best_mapping}")
print(f"Classification accuracy: {best_accuracy:.1%}")

# Apply best mapping
recovery_df['predicted_regime_mapped'] = recovery_df['predicted_regime'].map(best_mapping)

# Confusion matrix
from sklearn.metrics import confusion_matrix, classification_report

cm = confusion_matrix(recovery_df['true_regime'], recovery_df['predicted_regime_mapped'])

print("\nConfusion Matrix:")
print("="*40)
print("         Predicted")
print("         Struggle  Recover  Healthy")
print(f"Actual:")
print(f"Struggle    {cm[0,0]:>3}      {cm[0,1]:>3}      {cm[0,2]:>3}")
print(f"Recover     {cm[1,0]:>3}      {cm[1,1]:>3}      {cm[1,2]:>3}")
print(f"Healthy     {cm[2,0]:>3}      {cm[2,1]:>3}      {cm[2,2]:>3}")

print("\nClassification Report:")
print(classification_report(recovery_df['true_regime'], 
                           recovery_df['predicted_regime_mapped'],
                           target_names=['Struggling', 'Recovering', 'Healthy']))

## 4. Method 2: Kalman Filter for Trajectory

### What is a Kalman Filter?

The **Kalman Filter** gives minimum mean-squared-error state estimates for linear systems with Gaussian noise. It suits real-time tracking of a player's performance trajectory because each new game updates the estimate recursively.

### Model

**State Equation**:
$$x_t = A x_{t-1} + w_t, \quad w_t \sim N(0, Q)$$

**Observation Equation**:
$$y_t = H x_t + v_t, \quad v_t \sim N(0, R)$$

Where:
- $x_t$ is the latent true performance level
- $y_t$ is the observed performance (with noise)
- $Q$ is the process noise (how much performance varies game-to-game)
- $R$ is the observation noise (measurement error)
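The recursion behind these two equations is short enough to write out by hand. The sketch below is a scalar predict/update loop under assumed values for $A$, $H$, $Q$, $R$, and the initial state; the function name `scalar_kalman_filter` and all defaults are hypothetical and chosen only for illustration.

```python
import numpy as np


def scalar_kalman_filter(y, A=1.0, H=1.0, Q=1.0, R=9.0, x0=0.0, var0=100.0):
    """Run the Kalman predict/update recursion for a one-dimensional state."""
    means, variances = [], []
    x, var = x0, var0
    for obs in y:
        # Predict: push the state and its variance through the state equation
        x_pred = A * x
        var_pred = A * var * A + Q
        # Update: weight the new observation by the Kalman gain
        gain = var_pred * H / (H * var_pred * H + R)
        x = x_pred + gain * (obs - H * x_pred)
        var = (1.0 - gain * H) * var_pred
        means.append(x)
        variances.append(var)
    return np.array(means), np.array(variances)


# A flat series at 20 points: the estimate should move from x0 toward 20
kf_means, kf_vars = scalar_kalman_filter(np.full(50, 20.0))
print(kf_means[:3], kf_vars[-1])
```

A large `R` relative to `Q` makes the gain small, so the estimate reacts slowly to any single game; a large `Q` does the opposite.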

### Benefits

- **Real-time**: Updates state estimate as new data arrives
- **Smooths noise**: Filters out game-to-game variance
- **Uncertainty quantification**: Provides confidence intervals
- **Forecasting**: Can project future performance

---

In [None]:
from pykalman import KalmanFilter

# Initialize Kalman Filter
# We'll use a simple random walk model for the latent performance
kf = KalmanFilter(
    initial_state_mean=recovery_df['points'].iloc[0],
    n_dim_obs=1,
    n_dim_state=1,
    transition_matrices=[1],  # Random walk: x_t = x_{t-1} + noise
    observation_matrices=[1],  # Direct observation: y_t = x_t + noise
    em_vars=['transition_covariance', 'observation_covariance', 'initial_state_covariance']
)

# Fit using EM algorithm
print("Fitting Kalman Filter...\n")
kf = kf.em(recovery_df['points'].values, n_iter=10)

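# Get smoothed state estimates (smoothing conditions on all games;
# use kf.filter(...) instead for estimates that only use games played so far)
state_means, state_covariances = kf.smooth(recovery_df['points'].values)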

# Extract standard errors
state_stds = np.sqrt(state_covariances.squeeze())

# Add to dataframe
recovery_df['kalman_estimate'] = state_means.squeeze()
recovery_df['kalman_lower'] = recovery_df['kalman_estimate'] - 1.96 * state_stds
recovery_df['kalman_upper'] = recovery_df['kalman_estimate'] + 1.96 * state_stds

print("Kalman Filter Parameters:")
print(f"  Process Noise (Q): {kf.transition_covariance[0,0]:.2f}")
print(f"  Observation Noise (R): {kf.observation_covariance[0,0]:.2f}")
print(f"\nSignal-to-Noise Ratio: {kf.transition_covariance[0,0] / kf.observation_covariance[0,0]:.2f}")

In [None]:
# Visualize Kalman filter trajectory
fig, ax = plt.subplots(figsize=(14, 6))

# Plot observed data
ax.plot(recovery_df['game_num'], recovery_df['points'], 'o', 
       color='gray', alpha=0.4, label='Observed Performance', markersize=6)

# Plot Kalman smoothed estimate
ax.plot(recovery_df['game_num'], recovery_df['kalman_estimate'], 
       'b-', linewidth=3, label='Kalman Smoothed', alpha=0.8)

# Plot confidence interval
ax.fill_between(recovery_df['game_num'], 
               recovery_df['kalman_lower'],
               recovery_df['kalman_upper'],
               color='blue', alpha=0.2, label='95% Confidence Interval')

# Mark injury
ax.axvline(injury_game, color='black', linestyle='--', linewidth=2, 
          label='Injury Event', alpha=0.7)

# Add regime reference lines (the regime means used to generate the data)
ax.axhline(15, color='red', linestyle=':', alpha=0.4, label='Struggling Mean')
ax.axhline(22, color='orange', linestyle=':', alpha=0.4, label='Recovering Mean')
ax.axhline(28, color='green', linestyle=':', alpha=0.4, label='Healthy Mean')

ax.set_xlabel('Game Number', fontsize=12)
ax.set_ylabel('Points Per Game', fontsize=12)
ax.set_title('Kalman Filter: Smoothed Performance Trajectory', fontsize=14, fontweight='bold')
ax.legend(loc='best', fontsize=10)
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

### Kalman Filter Insights

The Kalman filter provides:
1. **Smooth trajectory**: Filters out game-to-game noise
2. **Confidence intervals**: Quantifies uncertainty at each point
3. **Real-time updates**: Can be updated incrementally as new games occur
4. **Trend detection**: Identifies upward/downward momentum

## 5. Method 3: Structural Decomposition

### Structural Time Series Models

Decompose performance into interpretable components:

$$y_t = \mu_t + \gamma_t + \epsilon_t$$

Where:
- $\mu_t$ is the **trend** (long-term recovery trajectory)
- $\gamma_t$ is the **seasonal** component (back-to-back games, home/away)
- $\epsilon_t$ is the **irregular** component (random noise)

This helps separate:
- **Systematic recovery** (trend)
- **External factors** (seasonal)
- **Random variation** (irregular)
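As a rough illustration of the additive form $y_t = \mu_t + \gamma_t + \epsilon_t$, the sketch below splits a series by hand using a centered moving average for the trend and per-phase averages for the seasonal component. The function `additive_decompose` and the synthetic series are assumptions made for this example; this is the classical decomposition idea, not a full state-space structural model.

```python
import numpy as np


def additive_decompose(y, period):
    """Split y into trend + seasonal + residual with a centered moving average."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    # Centered moving-average weights (half weights at the ends when period is even)
    if period % 2 == 0:
        weights = np.r_[0.5, np.ones(period - 1), 0.5] / period
    else:
        weights = np.ones(period) / period
    half = len(weights) // 2
    trend = np.full(n, np.nan)  # the first and last `half` points have no full window
    trend[half:n - half] = np.convolve(y, weights, mode="valid")
    # Seasonal effect: mean detrended value at each cycle position, centered on zero
    detrended = y - trend
    phase_means = np.array([np.nanmean(detrended[p::period]) for p in range(period)])
    phase_means -= phase_means.mean()
    seasonal = phase_means[np.arange(n) % period]
    residual = y - trend - seasonal
    return trend, seasonal, residual


# Synthetic check: linear trend plus a zero-mean pattern that repeats every 4 games
t_demo = np.arange(40)
pattern_demo = np.array([2.0, -1.0, 0.0, -1.0])
series_demo = 20.0 + 0.1 * t_demo + pattern_demo[t_demo % 4]
trend_c, seasonal_c, residual_c = additive_decompose(series_demo, period=4)
print(np.round(seasonal_c[:4], 3))
```

On this noise-free series the residual is zero wherever the trend is defined, which is a quick sanity check that the three components add back up to the original.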

---

In [None]:
from statsmodels.tsa.seasonal import seasonal_decompose

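# Perform classical additive decomposition (moving-average trend plus a fixed
# periodic component), a lightweight stand-in for a full structural model
# Note: period=10 is an arbitrary choice to capture short-term cycles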
decomposition = seasonal_decompose(
    recovery_df['points'], 
    model='additive', 
    period=10,
    extrapolate_trend='freq'
)

# Add components to dataframe
recovery_df['trend'] = decomposition.trend
recovery_df['seasonal'] = decomposition.seasonal
recovery_df['residual'] = decomposition.resid

print("Structural Decomposition Complete\n")
print("Component Statistics:")
print(f"  Trend range: [{recovery_df['trend'].min():.1f}, {recovery_df['trend'].max():.1f}]")
print(f"  Seasonal range: [{recovery_df['seasonal'].min():.1f}, {recovery_df['seasonal'].max():.1f}]")
print(f"  Residual std: {recovery_df['residual'].std():.2f}")

In [None]:
# Visualize decomposition
fig, axes = plt.subplots(4, 1, figsize=(14, 12), sharex=True)

# Original series
axes[0].plot(recovery_df['game_num'], recovery_df['points'], 'k-', linewidth=1.5)
axes[0].axvline(injury_game, color='red', linestyle='--', alpha=0.5)
axes[0].set_ylabel('Observed', fontsize=11)
axes[0].set_title('Structural Decomposition of Recovery', fontsize=14, fontweight='bold')
axes[0].grid(True, alpha=0.3)

# Trend component
axes[1].plot(recovery_df['game_num'], recovery_df['trend'], 'b-', linewidth=2)
axes[1].axvline(injury_game, color='red', linestyle='--', alpha=0.5)
axes[1].axhline(15, color='red', linestyle=':', alpha=0.3)
axes[1].axhline(22, color='orange', linestyle=':', alpha=0.3)
axes[1].axhline(28, color='green', linestyle=':', alpha=0.3)
axes[1].set_ylabel('Trend', fontsize=11)
axes[1].grid(True, alpha=0.3)

# Seasonal component
axes[2].plot(recovery_df['game_num'], recovery_df['seasonal'], 'g-', linewidth=1.5)
axes[2].axvline(injury_game, color='red', linestyle='--', alpha=0.5)
axes[2].axhline(0, color='black', linestyle='-', alpha=0.3, linewidth=1)
axes[2].set_ylabel('Seasonal', fontsize=11)
axes[2].grid(True, alpha=0.3)

# Residual component
axes[3].plot(recovery_df['game_num'], recovery_df['residual'], 'r-', linewidth=1, alpha=0.7)
axes[3].axvline(injury_game, color='red', linestyle='--', alpha=0.5)
axes[3].axhline(0, color='black', linestyle='-', alpha=0.3, linewidth=1)
axes[3].set_ylabel('Residual', fontsize=11)
axes[3].set_xlabel('Game Number', fontsize=12)
axes[3].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

## 6. Recovery Timeline Prediction

### Current Status Assessment

We'll create a function to assess recovery status at any point in time.

In [None]:
def assess_recovery_status(game_num, recovery_df):
    """
    Assess player recovery status at a given game number.
    """
    if not 1 <= game_num <= len(recovery_df):
        return "Game number is outside the available data"
    
    row = recovery_df.iloc[game_num - 1]
    
    # Get regime probabilities
    prob_struggling = row['regime_0_prob']
    prob_recovering = row['regime_1_prob']
    prob_healthy = row['regime_2_prob']
    
    # Get Kalman estimate
    kalman_est = row['kalman_estimate']
    kalman_ci = (row['kalman_lower'], row['kalman_upper'])
    
    # Determine most likely regime
    probs = [prob_struggling, prob_recovering, prob_healthy]
    regime_names = ['Struggling', 'Recovering', 'Healthy']
    most_likely_regime = regime_names[np.argmax(probs)]
    
    print(f"Recovery Status Assessment - Game {game_num}")
    print("="*60)
    print(f"\nObserved Performance: {row['points']:.1f} points")
    print(f"Kalman Estimate: {kalman_est:.1f} points (95% CI: [{kalman_ci[0]:.1f}, {kalman_ci[1]:.1f}])")
    print(f"\nMost Likely Regime: {most_likely_regime} ({max(probs):.1%} probability)")
    print(f"\nRegime Probabilities:")
    print(f"  Struggling:  {prob_struggling:.1%}")
    print(f"  Recovering:  {prob_recovering:.1%}")
    print(f"  Healthy:     {prob_healthy:.1%}")
    
    # Recovery recommendation
    if prob_healthy > 0.7:
        print(f"\n✅ RECOMMENDATION: Player appears fully recovered")
    elif prob_recovering > 0.5:
        print(f"\n⚠️  RECOMMENDATION: Player is recovering, monitor closely")
    else:
        print(f"\n🚫 RECOMMENDATION: Player is struggling, consider load management")
    
    return {
        'regime': most_likely_regime,
        'kalman_estimate': kalman_est,
        'probabilities': dict(zip(regime_names, probs))
    }

# Example assessments at key timepoints
print("\n" + "#"*60)
print("# Assessment Immediately Post-Injury")
print("#"*60 + "\n")
assess_recovery_status(injury_game + 5, recovery_df)

print("\n\n" + "#"*60)
print("# Assessment Mid-Recovery")
print("#"*60 + "\n")
assess_recovery_status(injury_game + 25, recovery_df)

print("\n\n" + "#"*60)
print("# Assessment Post-Recovery")
print("#"*60 + "\n")
assess_recovery_status(injury_game + 45, recovery_df)

### Expected Time to Full Recovery

In [None]:
def estimate_recovery_timeline(current_game, recovery_df, transition_matrix, healthy_threshold=0.8):
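    """
    Estimate the number of games until the forecast probability of the healthy
    regime first reaches healthy_threshold, propagating the current regime
    probabilities forward with the transition matrix (rows = from, columns = to).
    """
    if not 1 <= current_game <= len(recovery_df):
        return "Game number is outside the available data"
    
    # Get current regime probabilities as a float array
    prob_cols = ['regime_0_prob', 'regime_1_prob', 'regime_2_prob']
    current_probs = recovery_df.iloc[current_game - 1][prob_cols].to_numpy(dtype=float)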
    
    # Simulate forward using transition matrix
    max_horizon = 50  # Maximum games to simulate
    prob_healthy_over_time = []
    
    state_probs = current_probs.copy()
    
    for t in range(max_horizon):
        # Transition to next period
        state_probs = state_probs @ transition_matrix
        prob_healthy_over_time.append(state_probs[2])  # Regime 2 = Healthy
        
        # Stop if we reach healthy threshold
        if state_probs[2] >= healthy_threshold:
            break
    
    # Find expected time to recovery
    if state_probs[2] >= healthy_threshold:
        games_to_recovery = t + 1
    else:
        games_to_recovery = None  # Did not recover in horizon
    
    print(f"Recovery Timeline Estimate - Starting from Game {current_game}")
    print("="*60)
    print(f"\nCurrent Regime Probabilities:")
    print(f"  Struggling:  {current_probs[0]:.1%}")
    print(f"  Recovering:  {current_probs[1]:.1%}")
    print(f"  Healthy:     {current_probs[2]:.1%}")
    
    if games_to_recovery:
        print(f"\nExpected Games to Recovery (>{healthy_threshold:.0%} healthy prob): {games_to_recovery}")
        print(f"Expected Recovery Date: Game {current_game + games_to_recovery}")
    else:
        print(f"\nRecovery not expected within {max_horizon} games (may need medical intervention)")
    
    return games_to_recovery

# Example timeline estimates
estimate_recovery_timeline(injury_game + 10, recovery_df, transition_matrix)
print("\n")
estimate_recovery_timeline(injury_game + 25, recovery_df, transition_matrix)
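The forward propagation above tracks the marginal probability of being healthy at each future game. A complementary quantity is the expected number of games until the player first enters the healthy regime, which for a Markov chain comes from a small linear system. The sketch below assumes a row-stochastic matrix (rows = from, columns = to); `expected_games_to_regime` and `P_hit` are hypothetical names with made-up values.

```python
import numpy as np


def expected_games_to_regime(P, target):
    """Expected number of games until the chain first enters `target`.

    Solves (I - Q) h = 1, where Q is P restricted to the non-target regimes.
    The entry for the target regime itself is 0.
    """
    P = np.asarray(P, dtype=float)
    others = [i for i in range(P.shape[0]) if i != target]
    Q = P[np.ix_(others, others)]
    h = np.linalg.solve(np.eye(len(others)) - Q, np.ones(len(others)))
    result = np.zeros(P.shape[0])
    result[others] = h
    return result


# Hypothetical transition matrix (Struggling, Recovering, Healthy)
P_hit = np.array([
    [0.90, 0.10, 0.00],
    [0.03, 0.90, 0.07],
    [0.01, 0.02, 0.97],
])
games_to_healthy = expected_games_to_regime(P_hit, target=2)
print(np.round(games_to_healthy, 1))
```

With these illustrative numbers the expected wait is about 28.6 games from Struggling and about 18.6 games from Recovering. Unlike the threshold-based estimate, this figure is always finite when the healthy regime is reachable, so the two are best read side by side.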

## 7. Business Recommendations

### For Medical/Training Staff

1. **Load Management**
   - If `prob_struggling > 0.5`: Reduce minutes, implement rest days
   - If `prob_recovering > 0.5`: Monitor workload, avoid back-to-backs
   - If `prob_healthy > 0.8`: Resume normal workload

2. **Return-to-Play Decisions**
   - Use Kalman filter confidence intervals to assess stability
   - Require 5+ consecutive games with `prob_healthy > 0.7`
   - Monitor for regression (sudden drop in Kalman estimate)

3. **Personalized Recovery Plans**
   - Use player-specific transition matrices (fit per player)
   - Identify players with slow recovery (low $p_{12}$, $p_{23}$)
   - Flag regression risk (high $p_{32}$, $p_{21}$)
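The threshold rules listed for medical and training staff can be encoded as small helpers. This is a minimal sketch: the function names, the fallback tier for ambiguous cases, and the order in which the rules are checked are assumptions; only the numeric thresholds come from the list above.

```python
def recent_healthy_streak(prob_healthy, threshold=0.7):
    """Count the most recent consecutive games with prob_healthy above threshold."""
    streak = 0
    for p in reversed(list(prob_healthy)):
        if p > threshold:
            streak += 1
        else:
            break
    return streak


def cleared_to_return(prob_healthy, min_games=5, threshold=0.7):
    """Apply the '5+ consecutive games with prob_healthy > 0.7' rule."""
    return recent_healthy_streak(prob_healthy, threshold) >= min_games


def load_management_tier(prob_struggling, prob_recovering, prob_healthy):
    """Map regime probabilities to the load-management tiers."""
    if prob_healthy > 0.8:
        return "resume normal workload"
    if prob_struggling > 0.5:
        return "reduce minutes, add rest days"
    if prob_recovering > 0.5:
        return "monitor workload, avoid back-to-backs"
    # Assumed fallback: no regime is dominant, so defer to staff judgment
    return "no dominant regime, review manually"


recent_probs = [0.20, 0.75, 0.80, 0.90, 0.85, 0.95]
print(cleared_to_return(recent_probs))
print(load_management_tier(0.05, 0.10, 0.85))
```

Keeping the thresholds as function arguments makes it easy to tune them per player or per injury type without touching the logic.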

---

### For Coaching Staff

4. **Rotation Management**
   - Adjust playing time based on regime probabilities
   - Avoid high-pressure situations during recovering phase
   - Gradually increase usage as confidence in recovery grows

5. **Game Planning**
   - Reduce offensive load (fewer plays called for recovering player)
   - Assign easier defensive matchups
   - Monitor in-game fatigue more closely

---

### For Front Office

6. **Contract Decisions**
   - Players with fast recovery profiles are more valuable
   - Include injury recovery metrics in contract negotiations
   - Structure incentives around health/availability

7. **Trade Deadline**
   - Assess recovery status before trading for injured players
   - Use recovery timeline estimates to value acquisitions
   - Avoid players with high regression risk

8. **Insurance & Salary Cap**
   - Use regime probabilities to trigger insurance claims
   - Document recovery progression for medical exceptions
   - Optimize salary cap relief timing

---

## 8. Production Deployment

### Real-Time Monitoring System

```python
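# Sketch of a production monitor (thresholds and refit cadence are illustrative)
from dataclasses import dataclass

import numpy as np
import pandas as pd
from pykalman import KalmanFilter
from statsmodels.tsa.regime_switching.markov_regression import MarkovRegression


@dataclass
class Alert:
    severity: str
    message: str
    action: str


class InjuryRecoveryMonitor:
    def __init__(self, player_id, injury_date, process_var=1.0, obs_var=9.0):
        self.player_id = player_id
        self.injury_date = pd.Timestamp(injury_date)
        self.data = []          # one dict of game stats per game
        self.ms_result = None   # Markov switching fit, refreshed periodically
        # Random-walk Kalman filter; the variances are placeholders to tune or learn via EM
        self.kf = KalmanFilter(
            transition_matrices=[[1.0]], observation_matrices=[[1.0]],
            transition_covariance=[[process_var]], observation_covariance=[[obs_var]],
        )
        self.state_mean = None
        self.state_cov = np.array([[obs_var]])

    def update_after_game(self, game_stats, refit_every=10, min_games=30):
        """Update models with new game data (expects 'points' and 'game_date' keys)."""
        self.data.append(game_stats)
        y = np.array([float(game_stats['points'])])

        # Kalman filter: one online step (the first game initializes the state)
        if self.state_mean is None:
            self.state_mean = y
        else:
            self.state_mean, self.state_cov = self.kf.filter_update(
                self.state_mean, self.state_cov, observation=y)

        # Markov switching: periodic full refit (statsmodels has no online update),
        # so regime probabilities are only refreshed on refit games
        n = len(self.data)
        if n >= min_games and n % refit_every == 0:
            points = pd.Series([g['points'] for g in self.data], dtype=float)
            self.ms_result = MarkovRegression(
                points, k_regimes=3, switching_variance=True).fit()

        regime_probs = self.get_regime_probabilities()
        return {
            'regime': regime_probs,
            'kalman_estimate': float(self.state_mean[0]),
            'alert': self.generate_alert(regime_probs, game_stats['game_date']),
        }

    def get_regime_probabilities(self):
        """Latest filtered regime probabilities, labeled by ascending regime mean."""
        if self.ms_result is None:
            return None
        probs = self.ms_result.filtered_marginal_probabilities.iloc[-1].to_numpy()
        order = np.argsort(self.ms_result.params.filter(like='const').to_numpy())
        return dict(zip(['struggling', 'recovering', 'healthy'], probs[order]))

    def generate_alert(self, regime_probs, game_date):
        """Generate alert if regression detected"""
        days_since_injury = (pd.Timestamp(game_date) - self.injury_date).days
        if regime_probs and regime_probs['struggling'] > 0.3 and days_since_injury > 30:
            return Alert(
                severity='HIGH',
                message=f"{self.player_id} showing signs of regression",
                action='Medical evaluation recommended',
            )
        return None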
```

---

### Dashboard Visualization

**Key Metrics to Display**:
1. Current regime probabilities (gauge chart)
2. Kalman-smoothed performance trajectory (line chart)
3. Days since injury / expected days to recovery (progress bar)
4. Regime transition history (timeline)
5. Comparison to typical recovery curve (benchmark)

---

### Automated Alerts

**Trigger Conditions**:
- Regression detected: `prob_struggling` increases after being < 0.1
- Slow recovery: In `recovering` regime for > expected duration
- Full recovery: `prob_healthy > 0.8` for 5+ consecutive games
- Unexpected setback: Kalman estimate drops by > 2 std devs
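The four trigger conditions can be checked in one pass after each game. The sketch below is one possible reading of them: the function name, the alert labels, the `regression_level` cutoff, and the choice to compare the game-to-game drop against twice the previous state standard deviation are all assumptions layered on top of the list above.

```python
import numpy as np


def check_alert_triggers(prob_struggling, prob_healthy, kalman_estimates, kalman_stds,
                         games_in_recovering, expected_recovering_duration,
                         regression_level=0.3, streak=5):
    """Return the names of the alerts triggered by the most recent game."""
    ps = np.asarray(prob_struggling, dtype=float)
    ph = np.asarray(prob_healthy, dtype=float)
    est = np.asarray(kalman_estimates, dtype=float)
    std = np.asarray(kalman_stds, dtype=float)
    alerts = []

    # Regression: struggling probability climbs back after having been below 0.1
    if len(ps) >= 2 and ps[:-1].min() < 0.1 and ps[-1] >= regression_level:
        alerts.append("regression")

    # Slow recovery: longer in the recovering regime than its expected duration
    if games_in_recovering > expected_recovering_duration:
        alerts.append("slow_recovery")

    # Full recovery: prob_healthy above 0.8 for the last `streak` games
    if len(ph) >= streak and np.all(ph[-streak:] > 0.8):
        alerts.append("full_recovery")

    # Setback: latest Kalman estimate drops by more than 2 standard deviations
    if len(est) >= 2 and est[-2] - est[-1] > 2.0 * std[-2]:
        alerts.append("setback")

    return alerts


alerts_demo = check_alert_triggers(
    prob_struggling=[0.60, 0.05, 0.02, 0.40],
    prob_healthy=[0.10, 0.50, 0.60, 0.20],
    kalman_estimates=[20.0, 22.0, 23.0, 15.0],
    kalman_stds=[1.5, 1.5, 1.5, 1.5],
    games_in_recovering=12,
    expected_recovering_duration=10,
)
print(alerts_demo)
```

Returning a list rather than a single alert lets the dashboard show several conditions at once, for example a setback that coincides with a slow recovery.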

---

### Model Retraining Schedule

- **Kalman Filter**: Update after every game (online learning)
- **Markov Switching**: Refit every 10 games to update transition probabilities
- **Regime Parameters**: Refit monthly to capture seasonal effects

---

## 9. Summary

### What We Learned

1. **Markov Switching Models**
   - Detect distinct recovery regimes (struggling, recovering, healthy)
   - Estimate regime-specific parameters (mean, variance)
   - Quantify transition probabilities between regimes
   - Calculate expected regime durations

2. **Kalman Filtering**
   - Smooth noisy performance data
   - Provide real-time state estimates with confidence intervals
   - Ideal for real-time monitoring and forecasting

3. **Structural Decomposition**
   - Separate trend, seasonal, and irregular components
   - Isolate systematic recovery from random variation
   - Identify external factors affecting performance

---

### Business Impact

**Medical Staff**:
- Objective return-to-play decisions
- Early detection of complications
- Personalized recovery timelines

**Coaching Staff**:
- Data-driven rotation management
- Optimized workload during recovery
- Reduced re-injury risk

**Front Office**:
- Better contract valuation
- Informed trade deadline decisions
- Optimized insurance/cap relief timing

---

### Key Metrics

- **Regime Probabilities**: P(Struggling), P(Recovering), P(Healthy)
- **Kalman Estimate**: Smoothed performance level with 95% CI
- **Expected Recovery Time**: Games until P(Healthy) > 0.8
- **Regime Durations**: Average games spent in each regime

---

### When to Use This Method

✅ **Use Markov Switching When**:
- Performance has distinct phases/regimes
- Transitions between regimes are expected
- You need probability of being in each regime

✅ **Use Kalman Filter When**:
- You need real-time monitoring
- Data is noisy but underlying state evolves smoothly
- You want confidence intervals on estimates

✅ **Use Structural Decomposition When**:
- You need to separate trend from noise
- External factors (schedule, rest) affect performance
- You want interpretable components

---

### Next Steps

1. **Extend to Multiple Players**: Fit player-specific models
2. **Add Covariates**: Include injury type, age, position
3. **Compare Injury Types**: ACL vs. ankle vs. shoulder recovery
4. **Survival Analysis**: Model probability of re-injury (see Notebook 2)
5. **Real-Time Dashboard**: Build monitoring system for medical staff

---

### Related Notebooks

- **Notebook 1**: Player Performance Trend Analysis (time series methods)
- **Notebook 2**: Career Longevity Modeling (survival analysis)
- **Notebook 3**: Coaching Change Causal Impact (causal inference)
- **Notebook 5**: Team Chemistry Factor Analysis (dynamic factor models)

---

### Further Reading

- Hamilton (1989): "A New Approach to the Economic Analysis of Nonstationary Time Series and the Business Cycle" (Markov Switching)
- Kalman (1960): "A New Approach to Linear Filtering and Prediction Problems"
- Durbin & Koopman (2012): "Time Series Analysis by State Space Methods"
- Kim & Nelson (1999): "State-Space Models with Regime Switching"

---

**End of Notebook 4** 🏀📊