# Notebook 5: Team Chemistry Factor Analysis

**Use Case**: Quantify latent "team chemistry" and identify chemistry contributors  
**Methods**: Dynamic Factor Models, Panel Data Analysis, Hierarchical Models  
**Business Value**: Roster decisions, trade impact assessment, chemistry-talent decomposition

---

## Table of Contents

1. [Problem Statement](#1-problem-statement)
2. [Data Setup](#2-data-setup)
3. [Method 1: Dynamic Factor Models](#3-method-1-dynamic-factor-models)
4. [Method 2: Player-Specific Loadings](#4-method-2-player-specific-loadings)
5. [Method 3: Panel Data Regression](#5-method-3-panel-data-regression)
6. [Chemistry vs. Talent Decomposition](#6-chemistry-vs-talent-decomposition)
7. [Business Recommendations](#7-business-recommendations)
8. [Production Deployment](#8-production-deployment)
9. [Summary](#9-summary)

---

## 1. Problem Statement

### The Challenge

NBA teams with high individual talent sometimes underperform, while teams with less talent overachieve. The missing factor is often **"chemistry"** - a latent, unobserved quality reflecting:
- Player compatibility and communication
- Willingness to sacrifice individual stats for team success
- Trust and cohesion in high-pressure moments
- Locker room culture and leadership

### Key Questions

1. **Measurement**: Can we quantify team chemistry objectively?
2. **Attribution**: Which players contribute most to team chemistry?
3. **Prediction**: How does chemistry affect team performance?
4. **Intervention**: How can front offices build/maintain chemistry?

### Why Traditional Methods Fail

- **Win-loss record**: Confounded by talent level
- **Plus/minus**: Doesn't isolate chemistry from skill
- **Surveys**: Subjective, biased, not real-time

### Our Solution

**Dynamic Factor Models** extract a latent "chemistry factor" from observable team performance metrics (assists, turnovers, defensive rating, etc.). We model:

$$\text{Team Performance}_t = \text{Talent}_t + \lambda \cdot \text{Chemistry}_t + \epsilon_t$$

Where:
- $\text{Chemistry}_t$ is the unobserved latent factor
- $\lambda$ is the factor loading (how much chemistry matters)
- $\text{Talent}_t$ is measurable individual skill
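
As a quick illustration of the equation above, the sketch below plugs in hypothetical numbers (the talent level, loading, and chemistry values are made up for illustration, not estimates from any data). It shows that, with talent held fixed, a change in chemistry moves expected performance by $\lambda \cdot \Delta\text{Chemistry}$.

```python
def expected_performance(talent, chemistry, loading):
    """Noise-free part of the model: Talent_t + lambda * Chemistry_t."""
    return talent + loading * chemistry

# Hypothetical values chosen only for illustration
talent = 2.0    # talent-implied performance (e.g., net rating points)
loading = 6.0   # lambda: performance points per unit of chemistry
low, high = 0.25, 0.75

perf_low = expected_performance(talent, low, loading)
perf_high = expected_performance(talent, high, loading)

# With talent fixed, the gap equals lambda * (change in chemistry)
chemistry_gap = perf_high - perf_low
print(f"Low chemistry:  {perf_low:.1f}")
print(f"High chemistry: {perf_high:.1f}")
print(f"Gap from chemistry alone: {chemistry_gap:.1f}")
```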

---

## 2. Data Setup

We'll generate synthetic team performance data with:
- **Observable metrics**: Assists, turnovers, defensive rating, net rating, pace
- **Latent chemistry factor**: Unobserved quality affecting all metrics
- **Player-specific chemistry contributions**: Some players boost chemistry more than others

In [None]:
# Standard imports
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime, timedelta
from scipy import stats

# Set style
plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette("husl")

# Seed for reproducibility
np.random.seed(42)

In [None]:
def generate_team_chemistry_data(n_games=82, n_players=12):
    """
    Generate synthetic team performance data with latent chemistry factor.
    
    Observed metrics:
    - Assist rate (assists per 100 possessions)
    - Turnover rate (turnovers per 100 possessions)
    - Defensive rating (points allowed per 100 possessions)
    - Net rating (point differential per 100 possessions)
    - Pace (possessions per 48 minutes)
    
    Latent factors:
    - Team chemistry (unobserved, evolves over season)
    - Player chemistry contributions
    """
    dates = [datetime(2024, 10, 1) + timedelta(days=2*i) for i in range(n_games)]
    
    # Generate latent chemistry factor (evolves over time)
    # Starts low and trends upward, with two oscillation cycles (dips early and again after mid-season)
    t = np.linspace(0, 1, n_games)
    chemistry_trend = 0.3 + 0.5 * t - 0.2 * np.sin(4 * np.pi * t)  # Non-linear evolution
    chemistry = chemistry_trend + np.random.normal(0, 0.05, n_games)  # Add noise
    chemistry = np.clip(chemistry, 0, 1)  # Keep in [0, 1]
    
    # Factor loadings (how much each metric depends on chemistry)
    loading_assists = 10  # High chemistry → more assists
    loading_turnovers = -5  # High chemistry → fewer turnovers
    loading_def_rating = -8  # High chemistry → better defense
    loading_net_rating = 6  # High chemistry → better net rating
    loading_pace = 2  # High chemistry → slightly faster pace
    
    # Generate observed metrics as function of chemistry + noise
    assists = 20 + loading_assists * chemistry + np.random.normal(0, 2, n_games)
    turnovers = 15 + loading_turnovers * chemistry + np.random.normal(0, 1.5, n_games)
    def_rating = 110 + loading_def_rating * chemistry + np.random.normal(0, 3, n_games)
    net_rating = 0 + loading_net_rating * chemistry + np.random.normal(0, 2.5, n_games)
    pace = 98 + loading_pace * chemistry + np.random.normal(0, 2, n_games)
    
    # Team wins (function of net rating + chemistry)
    # Clip so the value is always a valid probability for the binomial draw
    win_prob = np.clip(0.5 + 0.02 * net_rating + 0.1 * chemistry, 0.01, 0.99)
    wins = np.random.binomial(1, win_prob)
    
    df = pd.DataFrame({
        'game_date': dates,
        'game_num': range(1, n_games + 1),
        'assists': assists,
        'turnovers': turnovers,
        'def_rating': def_rating,
        'net_rating': net_rating,
        'pace': pace,
        'win': wins,
        'true_chemistry': chemistry  # Ground truth (unobserved in practice)
    })
    
    return df

# Generate data
team_df = generate_team_chemistry_data()

print("Team Performance Data:")
print(team_df.head(10))
print(f"\nTotal games: {len(team_df)}")
print(f"Win-Loss Record: {team_df['win'].sum()}-{(1-team_df['win']).sum()}")

In [None]:
# Visualize observable metrics over time
fig, axes = plt.subplots(3, 2, figsize=(16, 12))

# Plot each metric
metrics = [
    ('assists', 'Assist Rate', 'green'),
    ('turnovers', 'Turnover Rate', 'red'),
    ('def_rating', 'Defensive Rating', 'blue'),
    ('net_rating', 'Net Rating', 'purple'),
    ('pace', 'Pace', 'orange'),
    ('true_chemistry', 'True Chemistry (Latent)', 'black')
]

for ax, (col, label, color) in zip(axes.flat, metrics):
    ax.plot(team_df['game_num'], team_df[col], 'o-', color=color, alpha=0.6, markersize=4)
    
    # Add trend line
    z = np.polyfit(team_df['game_num'], team_df[col], 2)
    p = np.poly1d(z)
    ax.plot(team_df['game_num'], p(team_df['game_num']), '--', 
           color=color, linewidth=2, alpha=0.8, label='Trend')
    
    ax.set_xlabel('Game Number', fontsize=11)
    ax.set_ylabel(label, fontsize=11)
    ax.set_title(f'{label} Over Season', fontsize=12, fontweight='bold')
    ax.grid(True, alpha=0.3)
    ax.legend(loc='best')

plt.tight_layout()
plt.show()

print("\nMetric Correlations with True Chemistry:")
print("="*50)
for col in ['assists', 'turnovers', 'def_rating', 'net_rating', 'pace']:
    corr = team_df[col].corr(team_df['true_chemistry'])
    print(f"{col:15s}: {corr:+.3f}")

## 3. Method 1: Dynamic Factor Models

### What is a Dynamic Factor Model?

A **Dynamic Factor Model** extracts latent (unobserved) factors from multiple observed time series. The model assumes:

$$X_t = \Lambda F_t + \epsilon_t$$

Where:
- $X_t$ = $(assists_t, turnovers_t, def\_rating_t, ...)^T$ is the vector of observable metrics
- $F_t$ is the latent factor (chemistry)
- $\Lambda$ is the factor loading matrix (how much each metric depends on chemistry)
- $\epsilon_t$ is idiosyncratic noise (metric-specific variation)

### Factor Dynamics

The latent factor evolves over time:

$$F_t = \phi F_{t-1} + \eta_t, \quad \eta_t \sim N(0, \sigma^2_\eta)$$

This captures chemistry's gradual evolution throughout the season.
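
To make the two equations concrete, here is a minimal simulation sketch using only the standard library. The loadings, $\phi$, and noise scales are hypothetical defaults picked for illustration; the function name `simulate_dfm` is not part of any library.

```python
import random

def simulate_dfm(loadings, phi=0.9, sigma_eta=0.3, sigma_eps=0.5,
                 n_periods=82, seed=0):
    """Simulate F_t = phi * F_{t-1} + eta_t and X_t = Lambda * F_t + eps_t."""
    rng = random.Random(seed)
    factor, observed = [], []
    f_prev = 0.0  # start the factor at its unconditional mean
    for _ in range(n_periods):
        # Factor dynamics: AR(1) with Gaussian innovations
        f_t = phi * f_prev + rng.gauss(0.0, sigma_eta)
        # Observation equation: each metric loads on the same factor
        x_t = [lam * f_t + rng.gauss(0.0, sigma_eps) for lam in loadings]
        factor.append(f_t)
        observed.append(x_t)
        f_prev = f_t
    return factor, observed

# Hypothetical loadings for (assists, turnovers, def_rating)
loadings = [1.0, -0.5, -0.8]
factor, observed = simulate_dfm(loadings)
print(len(factor), len(observed), len(observed[0]))
```

With `phi` close to 1 the simulated factor drifts slowly, which is the "gradual evolution" behavior the AR(1) specification is meant to capture; a value near 0 would make chemistry look like game-to-game noise.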

### Interpretation

- **Factor $F_t$**: Latent chemistry index at time $t$
- **Loadings $\lambda_i$**: How much metric $i$ responds to chemistry
- **Positive loading**: Metric increases with chemistry (e.g., assists)
- **Negative loading**: Metric decreases with chemistry (e.g., turnovers)
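
One caveat when reading loading signs: a latent factor is only identified up to sign, because flipping the sign of both $F_t$ and every loading leaves $\Lambda F_t$ unchanged. The sketch below demonstrates this with hypothetical numbers and shows one possible convention (an assumption, not the only choice): orient the factor so that a reference metric, here assists, has a positive loading.

```python
def fitted_values(loadings, factor):
    """Return Lambda * F_t for every period (the noise-free part of X_t)."""
    return [[lam * f for lam in loadings] for f in factor]

def orient_factor(loadings, factor, reference_index=0):
    """Flip factor and loadings together so the reference loading is positive."""
    if loadings[reference_index] < 0:
        return [-lam for lam in loadings], [-f for f in factor]
    return list(loadings), list(factor)

# Hypothetical estimates that came out with the "wrong" sign
loadings = [-0.8, 0.5, 0.6]        # (assists, turnovers, def_rating)
factor = [0.2, -0.1, -0.4, 0.3]

flipped_loadings, flipped_factor = orient_factor(loadings, factor)

# Both parameterizations imply exactly the same fitted metrics
same_fit = fitted_values(loadings, factor) == fitted_values(flipped_loadings, flipped_factor)
print(flipped_loadings)
print(same_fit)
```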

---

### Fitting Dynamic Factor Model

We'll use `statsmodels.tsa.statespace.dynamic_factor.DynamicFactor` to fit the model.

In [None]:
from statsmodels.tsa.statespace.dynamic_factor import DynamicFactor
from statsmodels.tsa.api import VAR

# Prepare data: standardize each metric (mean 0, std 1)
# This puts all metrics on same scale for factor analysis
metrics_to_analyze = ['assists', 'turnovers', 'def_rating', 'net_rating', 'pace']
X = team_df[metrics_to_analyze].copy()

# Standardize
X_standardized = (X - X.mean()) / X.std()

print("Standardized Metrics:")
print(X_standardized.describe())
print("\nFitting Dynamic Factor Model with 1 latent factor...")
print("(This may take 30-60 seconds)\n")

In [None]:
# Fit Dynamic Factor Model
dfm = DynamicFactor(
    X_standardized,
    k_factors=1,  # Extract 1 latent factor (chemistry)
    factor_order=1  # AR(1) dynamics for the factor
)

dfm_result = dfm.fit(maxiter=1000, disp=False)

print("Model fitted successfully!\n")
print(dfm_result.summary())

### Extracting the Chemistry Factor

In [None]:
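# Extract latent factor (chemistry estimate).
# factors.filtered is a NumPy array of shape (k_factors, n_games)
chemistry_estimate = np.asarray(dfm_result.factors.filtered[0])

# The sign of a latent factor is not identified, so orient it to move with
# assists (assumption: better chemistry means more assists)
if np.corrcoef(chemistry_estimate, X_standardized['assists'])[0, 1] < 0:
    chemistry_estimate = -chemistry_estimate

# Add to dataframe
team_df['estimated_chemistry'] = chemistry_estimate

# Rescale to [0, 1] for interpretability
team_df['estimated_chemistry_scaled'] = (
    (team_df['estimated_chemistry'] - team_df['estimated_chemistry'].min()) /
    (team_df['estimated_chemistry'].max() - team_df['estimated_chemistry'].min())
)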

print("Estimated Chemistry Factor:")
print(team_df[['game_num', 'true_chemistry', 'estimated_chemistry', 'estimated_chemistry_scaled']].head(15))

In [None]:
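# Extract factor loadings (parameter names start with 'loading')
params = pd.Series(np.asarray(dfm_result.params), index=dfm_result.param_names)
loadings = params[params.index.str.startswith('loading')]

# The factor sign is not identified; report loadings in the orientation
# where the chemistry factor moves with assists
raw_factor = np.asarray(dfm_result.factors.filtered[0])
if np.corrcoef(raw_factor, X_standardized['assists'])[0, 1] < 0:
    loadings = -loadings

print("Factor Loadings (How Each Metric Responds to Chemistry):")
print("="*60)
for i, metric in enumerate(metrics_to_analyze):
    loading = loadings.iloc[i]
    print(f"{metric:15s}: {loading:+.3f}  {'(↑ with chemistry)' if loading > 0 else '(↓ with chemistry)'}")

print("\nInterpretation:")
print("  Positive loading → metric increases as chemistry improves")
print("  Negative loading → metric decreases as chemistry improves")
print("  Larger |loading| → metric more sensitive to chemistry")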

In [None]:
# Compare estimated vs. true chemistry
fig, axes = plt.subplots(2, 1, figsize=(14, 10), sharex=True)

# Top panel: True vs. estimated chemistry
ax1 = axes[0]
ax1.plot(team_df['game_num'], team_df['true_chemistry'], 'k-', 
        linewidth=2, label='True Chemistry (Ground Truth)', alpha=0.7)
ax1.plot(team_df['game_num'], team_df['estimated_chemistry_scaled'], 'r--', 
        linewidth=2, label='Estimated Chemistry (DFM)', alpha=0.7)
ax1.set_ylabel('Chemistry Index', fontsize=12)
ax1.set_title('True vs. Estimated Team Chemistry', fontsize=14, fontweight='bold')
ax1.legend(loc='best', fontsize=11)
ax1.grid(True, alpha=0.3)

# Bottom panel: Correlation over time (rolling window)
ax2 = axes[1]
window = 20
rolling_corr = [
    team_df['true_chemistry'].iloc[i:i+window].corr(
        team_df['estimated_chemistry_scaled'].iloc[i:i+window]
    )
    for i in range(len(team_df) - window + 1)
]
ax2.plot(team_df['game_num'].iloc[window-1:], rolling_corr, 'b-', linewidth=2)
ax2.axhline(0.7, color='green', linestyle=':', alpha=0.5, label='Good Correlation (0.7)')
ax2.set_xlabel('Game Number', fontsize=12)
ax2.set_ylabel(f'Rolling Correlation ({window}-game)', fontsize=12)
ax2.set_title('Model Accuracy Over Season', fontsize=14, fontweight='bold')
ax2.legend(loc='best', fontsize=11)
ax2.grid(True, alpha=0.3)
ax2.set_ylim([0, 1])

plt.tight_layout()
plt.show()

# Overall correlation
overall_corr = team_df['true_chemistry'].corr(team_df['estimated_chemistry_scaled'])
print(f"\nOverall Correlation (True vs. Estimated): {overall_corr:.3f}")
print(f"Model captures {overall_corr**2:.1%} of chemistry variance")

## 4. Method 2: Player-Specific Loadings

### Identifying Chemistry Contributors

Not all players contribute equally to team chemistry. We can model player-specific chemistry effects by analyzing lineup data.

### Lineup-Based Chemistry Model

$$\text{Chemistry}_t = \sum_{i \in \text{Lineup}_t} \alpha_i + \beta_t$$

Where:
- $\alpha_i$ is player $i$'s chemistry contribution
- $\beta_t$ is baseline team chemistry
- $\text{Lineup}_t$ is the set of players on court at time $t$

### Interpretation

- **Positive $\alpha_i$**: Player $i$ boosts team chemistry ("chemistry guy")
- **Negative $\alpha_i$**: Player $i$ hurts team chemistry ("locker room issue")
- **Zero $\alpha_i$**: Player $i$ is neutral
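
A small worked example of the lineup formula may help. The player labels, $\alpha_i$ values, and baseline below are hypothetical; the point is only that swapping one player changes lineup chemistry by the difference in their contributions.

```python
def lineup_chemistry(lineup, alpha, baseline=0.0):
    """Chemistry_t = sum of alpha_i over the lineup + beta_t (baseline)."""
    return baseline + sum(alpha[player] for player in lineup)

# Hypothetical contributions; names and values are illustrative only
alpha = {"A": 0.15, "B": 0.05, "C": 0.0, "D": 0.0, "E": -0.10, "F": 0.0}

with_e = lineup_chemistry(["A", "B", "C", "D", "E"], alpha, baseline=0.5)
with_f = lineup_chemistry(["A", "B", "C", "D", "F"], alpha, baseline=0.5)
print(f"Lineup with E: {with_e:.2f}")
print(f"Lineup with F: {with_f:.2f}")
```

Replacing the negative-chemistry player E with the neutral player F raises lineup chemistry by 0.10, which is exactly $\alpha_F - \alpha_E$.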

---

In [None]:
def generate_player_lineup_data(team_df, n_players=12):
    """
    Generate synthetic lineup data with player-specific chemistry contributions.
    """
    # Generate player chemistry contributions
    # Most players are neutral (0), a few are chemistry guys (+), a few are negative (-)
    player_ids = [f"Player_{i+1}" for i in range(n_players)]
    true_player_chemistry = np.random.choice(
        [0.15, 0.05, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, -0.05, -0.10],
        size=n_players,
        replace=True
    )
    
    player_chemistry_map = dict(zip(player_ids, true_player_chemistry))
    
    # Generate lineup for each game (5 players)
    lineup_data = []
    for game_num in team_df['game_num']:
        # Sample 5 players for starting lineup
        lineup = np.random.choice(player_ids, size=5, replace=False)
        
        # Calculate lineup chemistry as sum of player contributions
        lineup_chemistry = sum([player_chemistry_map[p] for p in lineup])
        
        lineup_data.append({
            'game_num': game_num,
            'lineup': tuple(sorted(lineup)),
            'lineup_chemistry': lineup_chemistry
        })
    
    lineup_df = pd.DataFrame(lineup_data)
    
    # Merge with team data
    team_df_with_lineup = team_df.merge(lineup_df, on='game_num')
    
    return team_df_with_lineup, player_chemistry_map

# Generate player data
team_df_lineup, true_player_chemistry = generate_player_lineup_data(team_df)

print("True Player Chemistry Contributions:")
print("="*50)
for player, contribution in sorted(true_player_chemistry.items(), key=lambda x: -x[1]):
    emoji = "🌟" if contribution > 0.1 else "✅" if contribution > 0 else "⚠️" if contribution < 0 else "➖"
    print(f"{player:12s}: {contribution:+.3f}  {emoji}")

print("\nSample Lineups:")
print(team_df_lineup[['game_num', 'lineup', 'lineup_chemistry', 'estimated_chemistry_scaled']].head(10))

### Estimating Player Chemistry Effects

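We'll use regression to estimate each player's chemistry contribution from lineup data. Two caveats apply. Every lineup has exactly five players, so only differences between player coefficients are identified. And in this synthetic setup the lineups are sampled independently of the team metrics, so the estimates should not be expected to track the true contributions closely.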
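
One property of lineup indicators is worth seeing in numbers: when every lineup has exactly five players, the indicator columns always sum to 5, so adding a constant $c$ to every player coefficient and subtracting $5c$ from the intercept gives identical predictions. The sketch below uses hypothetical players and coefficients to show that predictions and between-player differences are unchanged by such a shift.

```python
def predict(lineup, alpha, intercept):
    """Linear prediction: intercept + coefficients of the players in the lineup."""
    return intercept + sum(alpha[p] for p in lineup)

# Hypothetical players and coefficients
players = ["P1", "P2", "P3", "P4", "P5", "P6", "P7"]
alpha = dict(zip(players, [0.15, 0.05, 0.0, 0.0, 0.0, -0.05, -0.10]))
intercept = 0.40

# Shift every coefficient by c and absorb -5c into the intercept
c = 0.2
alpha_shifted = {p: a + c for p, a in alpha.items()}
intercept_shifted = intercept - 5 * c

lineups = [players[0:5], players[1:6], players[2:7]]
gaps = [
    abs(predict(lineup, alpha, intercept)
        - predict(lineup, alpha_shifted, intercept_shifted))
    for lineup in lineups
]
print(max(gaps))  # differs from zero only by floating-point rounding

# Differences between players are unaffected by the shift
diff_original = alpha["P1"] - alpha["P7"]
diff_shifted = alpha_shifted["P1"] - alpha_shifted["P7"]
print(diff_original, diff_shifted)
```

In practice this means player effects are best read as rankings or differences rather than absolute levels, unless a constraint (for example, coefficients summing to zero) is imposed.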

In [None]:
from sklearn.linear_model import LinearRegression

# Create player indicators (one-hot encoding)
# For each game, indicate which players were in the lineup
player_ids = list(true_player_chemistry.keys())
player_indicators = pd.DataFrame(0, index=team_df_lineup.index, columns=player_ids)

for idx, row in team_df_lineup.iterrows():
    for player in row['lineup']:
        player_indicators.loc[idx, player] = 1

# Fit regression: Chemistry ~ Player1 + Player2 + ... + Player12
X_players = player_indicators.values
y_chemistry = team_df_lineup['estimated_chemistry_scaled'].values

model = LinearRegression(fit_intercept=True)
model.fit(X_players, y_chemistry)

# Extract player coefficients
estimated_player_chemistry = dict(zip(player_ids, model.coef_))

print("Estimated Player Chemistry Contributions:")
print("="*50)
print(f"{'Player':<15} {'True':>10} {'Estimated':>10} {'Error':>10}")
print("-" * 50)

for player in sorted(estimated_player_chemistry.keys(), 
                    key=lambda p: -estimated_player_chemistry[p]):
    true_val = true_player_chemistry[player]
    est_val = estimated_player_chemistry[player]
    error = est_val - true_val
    print(f"{player:<15} {true_val:>+10.3f} {est_val:>+10.3f} {error:>+10.3f}")

# Overall accuracy
true_values = [true_player_chemistry[p] for p in player_ids]
est_values = [estimated_player_chemistry[p] for p in player_ids]
corr = np.corrcoef(true_values, est_values)[0, 1]
rmse = np.sqrt(np.mean([(t - e)**2 for t, e in zip(true_values, est_values)]))

print("\n" + "="*50)
print(f"Correlation (True vs. Estimated): {corr:.3f}")
print(f"RMSE: {rmse:.3f}")

In [None]:
# Visualize player chemistry contributions
fig, ax = plt.subplots(figsize=(12, 6))

players = list(estimated_player_chemistry.keys())
true_vals = [true_player_chemistry[p] for p in players]
est_vals = [estimated_player_chemistry[p] for p in players]

x = np.arange(len(players))
width = 0.35

ax.bar(x - width/2, true_vals, width, label='True Chemistry', color='steelblue', alpha=0.7)
ax.bar(x + width/2, est_vals, width, label='Estimated Chemistry', color='coral', alpha=0.7)

ax.axhline(0, color='black', linewidth=1, linestyle='-', alpha=0.3)
ax.set_xlabel('Player', fontsize=12)
ax.set_ylabel('Chemistry Contribution', fontsize=12)
ax.set_title('Player Chemistry Contributions: True vs. Estimated', fontsize=14, fontweight='bold')
ax.set_xticks(x)
ax.set_xticklabels(players, rotation=45, ha='right')
ax.legend(fontsize=11)
ax.grid(True, alpha=0.3, axis='y')

plt.tight_layout()
plt.show()

## 5. Method 3: Panel Data Regression

### Relating Chemistry to Team Success

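Now that we've quantified chemistry, we can estimate how it relates to winning after controlling for net rating. Because chemistry and performance can influence each other, the coefficient should be read as an association rather than a proven causal effect.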

### Model

$$P(\text{Win}_t) = \text{logit}^{-1}(\beta_0 + \beta_1 \cdot \text{NetRating}_t + \beta_2 \cdot \text{Chemistry}_t)$$

Where:
- $\beta_1$ captures the effect of talent (net rating)
- $\beta_2$ captures the **incremental effect of chemistry** beyond talent

### Interpretation

- **$\beta_2 > 0$**: Chemistry increases win probability, controlling for talent
- **$\beta_2 = 0$**: Chemistry doesn't matter (only talent matters)
- **$\beta_2 < 0$**: Chemistry hurts (unlikely, but possible with bad leadership)
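
To see how the coefficients translate into probabilities, here is a short sketch of the inverse-logit calculation. The coefficient values are hypothetical (they are not the notebook's fitted estimates); the slope formula $\beta_2 \, p \, (1 - p)$ is the standard derivative of the logistic function.

```python
import math

def win_probability(net_rating, chemistry, b0, b1, b2):
    """Inverse-logit of the linear predictor b0 + b1*NetRating + b2*Chemistry."""
    z = b0 + b1 * net_rating + b2 * chemistry
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical coefficients, not estimates from the notebook
b0, b1, b2 = -0.5, 0.10, 1.0

p_low = win_probability(net_rating=3.0, chemistry=0.2, b0=b0, b1=b1, b2=b2)
p_high = win_probability(net_rating=3.0, chemistry=0.8, b0=b0, b1=b1, b2=b2)

# Local slope of the probability with respect to chemistry: b2 * p * (1 - p)
slope_at_low = b2 * p_low * (1 - p_low)

print(f"P(win | chemistry=0.2) = {p_low:.3f}")
print(f"P(win | chemistry=0.8) = {p_high:.3f}")
print(f"Slope at chemistry=0.2 = {slope_at_low:.3f}")
```

The slope is largest when the win probability is near 0.5 and shrinks toward 0 or 1, so the same chemistry change matters most in evenly matched games under this model.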

---

In [None]:
from sklearn.linear_model import LogisticRegression
from scipy.stats import norm

# Prepare features
X_features = team_df[['net_rating', 'estimated_chemistry_scaled']].values
y_wins = team_df['win'].values

# Fit logistic regression
logit_model = LogisticRegression()
logit_model.fit(X_features, y_wins)

# Extract coefficients
beta_net_rating = logit_model.coef_[0, 0]
beta_chemistry = logit_model.coef_[0, 1]
intercept = logit_model.intercept_[0]

print("Logistic Regression: Win Probability ~ Net Rating + Chemistry")
print("="*60)
print(f"\nCoefficients:")
print(f"  Intercept:       {intercept:+.3f}")
print(f"  Net Rating:      {beta_net_rating:+.3f}  (talent effect)")
print(f"  Chemistry:       {beta_chemistry:+.3f}  (chemistry effect)")

# Model accuracy
y_pred = logit_model.predict(X_features)
accuracy = (y_pred == y_wins).mean()
print(f"\nModel Accuracy: {accuracy:.1%}")

# Calculate marginal effects
# For logistic regression: marginal effect = beta * mean_probability * (1 - mean_probability)
mean_prob = logit_model.predict_proba(X_features)[:, 1].mean()
marginal_net_rating = beta_net_rating * mean_prob * (1 - mean_prob)
marginal_chemistry = beta_chemistry * mean_prob * (1 - mean_prob)

print(f"\nMarginal Effects (at mean):")
print(f"  1-point increase in Net Rating → {marginal_net_rating:.1%} higher win probability")
print(f"  0.1-point increase in Chemistry → {marginal_chemistry*0.1:.1%} higher win probability")

In [None]:
# Visualize win probability as function of chemistry (holding net rating constant)
fig, axes = plt.subplots(1, 2, figsize=(16, 6))

# Left panel: Win probability vs. chemistry
ax1 = axes[0]
chemistry_range = np.linspace(0, 1, 100)
mean_net_rating = team_df['net_rating'].mean()

# Calculate win probability at mean net rating, varying chemistry
X_sim = np.column_stack([np.full(100, mean_net_rating), chemistry_range])
win_prob_sim = logit_model.predict_proba(X_sim)[:, 1]

ax1.plot(chemistry_range, win_prob_sim, 'b-', linewidth=3)
ax1.scatter(team_df['estimated_chemistry_scaled'], team_df['win'], 
           alpha=0.3, color='red', s=50, label='Actual Games')
ax1.set_xlabel('Team Chemistry', fontsize=12)
ax1.set_ylabel('Win Probability', fontsize=12)
ax1.set_title(f'Win Probability vs. Chemistry\n(Net Rating = {mean_net_rating:.1f})', 
             fontsize=14, fontweight='bold')
ax1.legend(fontsize=11)
ax1.grid(True, alpha=0.3)
ax1.set_ylim([0, 1])

# Right panel: Decomposition of wins (talent vs. chemistry)
ax2 = axes[1]

# Predict win probability with only net rating (no chemistry)
X_talent_only = np.column_stack([team_df['net_rating'], np.zeros(len(team_df))])
win_prob_talent = logit_model.predict_proba(X_talent_only)[:, 1]

# Predict win probability with net rating + chemistry
win_prob_full = logit_model.predict_proba(X_features)[:, 1]

# Chemistry contribution = difference
chemistry_contribution = win_prob_full - win_prob_talent

ax2.plot(team_df['game_num'], win_prob_talent, 'g-', linewidth=2, 
        label='Talent Only', alpha=0.7)
ax2.plot(team_df['game_num'], win_prob_full, 'b-', linewidth=2, 
        label='Talent + Chemistry', alpha=0.7)
ax2.fill_between(team_df['game_num'], win_prob_talent, win_prob_full, 
                alpha=0.3, color='orange', label='Chemistry Boost')
ax2.set_xlabel('Game Number', fontsize=12)
ax2.set_ylabel('Win Probability', fontsize=12)
ax2.set_title('Decomposition: Talent vs. Chemistry', fontsize=14, fontweight='bold')
ax2.legend(fontsize=11)
ax2.grid(True, alpha=0.3)
ax2.set_ylim([0, 1])

plt.tight_layout()
plt.show()

print(f"\nAverage Chemistry Boost: {chemistry_contribution.mean():.1%}")
print(f"Max Chemistry Boost: {chemistry_contribution.max():.1%}")
print(f"Min Chemistry Boost: {chemistry_contribution.min():.1%}")

## 6. Chemistry vs. Talent Decomposition

### Quantifying Relative Importance

We can decompose team success into:
1. **Talent**: Explained by individual skill (net rating)
2. **Chemistry**: Explained by team cohesion (chemistry index)
3. **Residual**: Unexplained variance (luck, coaching, etc.)

This helps answer: "Is this team overachieving due to chemistry or underachieving despite talent?"
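
The decomposition is simple accounting on per-game win probabilities. The sketch below uses hypothetical probabilities and results for a five-game stretch; `decompose_wins` is an illustrative helper, not a function defined elsewhere in this notebook.

```python
def decompose_wins(p_talent, p_full, outcomes):
    """Split actual wins into talent, chemistry, and residual components."""
    talent_wins = sum(p_talent)                 # expected wins, chemistry at baseline
    chemistry_wins = sum(p_full) - talent_wins  # extra expected wins from chemistry
    residual = sum(outcomes) - sum(p_full)      # what the model does not explain
    return talent_wins, chemistry_wins, residual

# Hypothetical per-game win probabilities and results for a 5-game stretch
p_talent = [0.50, 0.55, 0.45, 0.60, 0.50]
p_full = [0.55, 0.65, 0.50, 0.70, 0.60]
outcomes = [1, 1, 0, 1, 1]

talent_wins, chemistry_wins, residual = decompose_wins(p_talent, p_full, outcomes)
print(f"Talent: {talent_wins:.2f}, Chemistry: {chemistry_wins:+.2f}, Residual: {residual:+.2f}")
```

By construction the three components add up to the actual win total, so a large positive residual points to overachievement that neither talent nor chemistry explains.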

---

In [None]:
# Calculate expected wins based on talent, chemistry, and both
expected_wins_talent = win_prob_talent.sum()
expected_wins_full = win_prob_full.sum()
actual_wins = team_df['win'].sum()

print("Win Decomposition Analysis")
print("="*60)
print(f"\nActual Wins: {actual_wins}")
print(f"\nExpected Wins:")
print(f"  Based on Talent Only:        {expected_wins_talent:.1f} wins")
print(f"  Based on Talent + Chemistry: {expected_wins_full:.1f} wins")
print(f"\nChemistry Contribution: {expected_wins_full - expected_wins_talent:+.1f} wins")
print(f"Luck/Residual: {actual_wins - expected_wins_full:+.1f} wins")

# Compare in-sample predictive accuracy of talent-only, chemistry-only, and combined models

# Model 1: Talent only
logit_talent_only = LogisticRegression()
logit_talent_only.fit(team_df[['net_rating']].values, y_wins)
y_pred_talent = logit_talent_only.predict(team_df[['net_rating']].values)
accuracy_talent_only = (y_pred_talent == y_wins).mean()

# Model 2: Chemistry only
logit_chemistry_only = LogisticRegression()
logit_chemistry_only.fit(team_df[['estimated_chemistry_scaled']].values, y_wins)
y_pred_chemistry = logit_chemistry_only.predict(team_df[['estimated_chemistry_scaled']].values)
accuracy_chemistry_only = (y_pred_chemistry == y_wins).mean()

# Model 3: Both
accuracy_both = (y_pred == y_wins).mean()

print("\n" + "="*60)
print("Predictive Accuracy:")
print(f"  Talent Only:        {accuracy_talent_only:.1%}")
print(f"  Chemistry Only:     {accuracy_chemistry_only:.1%}")
print(f"  Talent + Chemistry: {accuracy_both:.1%}")
print(f"\nIncremental Value of Chemistry: {(accuracy_both - accuracy_talent_only):.1%}")

In [None]:
# Visualize win decomposition
fig, ax = plt.subplots(figsize=(10, 6))

categories = ['Talent\nOnly', 'Chemistry\nBoost', 'Luck/\nResidual', 'Actual\nWins']
values = [
    expected_wins_talent,
    expected_wins_full - expected_wins_talent,
    actual_wins - expected_wins_full,
    actual_wins
]
cumulative = np.cumsum([expected_wins_talent, 
                       expected_wins_full - expected_wins_talent,
                       actual_wins - expected_wins_full])

colors = ['steelblue', 'orange', 'gray']
ax.bar([0], [expected_wins_talent], color=colors[0], alpha=0.7, label='Talent')
ax.bar([1], [expected_wins_full - expected_wins_talent], color=colors[1], 
      alpha=0.7, label='Chemistry')
ax.bar([2], [actual_wins - expected_wins_full], color=colors[2], 
      alpha=0.7, label='Luck/Residual')
ax.bar([3], [actual_wins], color='black', alpha=0.7, label='Actual')

# Add value labels
for i, val in enumerate(values):
    ax.text(i, val + 1, f"{val:.1f}", ha='center', fontsize=12, fontweight='bold')

ax.set_xticks(range(4))
ax.set_xticklabels(categories, fontsize=11)
ax.set_ylabel('Wins', fontsize=12)
ax.set_title('Win Decomposition: Talent vs. Chemistry vs. Luck', 
            fontsize=14, fontweight='bold')
ax.legend(fontsize=11)
ax.grid(True, alpha=0.3, axis='y')

plt.tight_layout()
plt.show()

## 7. Business Recommendations

### For Front Office

1. **Roster Construction**
   - Target "chemistry guys" (high $\alpha_i$) in free agency
   - Balance talent with chemistry fit
   - Avoid negative chemistry players even if talented
   - Use chemistry index to evaluate trade offers

2. **Trade Deadline Decisions**
   - Assess chemistry impact **before** trading key players
   - If trading away a chemistry guy ($\alpha_i > 0.1$), expect a drop in wins (2-3 is a rough placeholder that this notebook's synthetic data does not validate)
   - Acquiring negative chemistry player? Ensure talent compensates

3. **Contract Negotiations**
   - Chemistry guys deserve premium (they add wins beyond stats)
   - Include chemistry metrics in player valuation models
   - Long-term deals for chemistry leaders reduce risk

4. **Draft Strategy**
   - Scout for chemistry indicators (teammate praise, leadership)
   - Interview teammates about player's chemistry impact
   - Prioritize chemistry in late rounds (high ROI)

---

### For Coaching Staff

5. **Lineup Optimization**
   - Build lineups maximizing $\sum_{i \in \text{Lineup}} \alpha_i$
   - Avoid pairing negative chemistry players
   - Start chemistry guys in high-pressure games

6. **Player Development**
   - Coach players on chemistry skills (communication, sacrifice)
   - Reward chemistry-building behaviors
   - Identify and address chemistry issues early

7. **In-Season Monitoring**
   - Track chemistry index weekly
   - Flag sudden drops (potential locker room issues)
   - Intervene when chemistry < 0.4 for 5+ games
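
The "below 0.4 for 5+ games" rule can be sketched as a simple streak detector. The threshold and streak length come from the rule above; the function name and the sample season values are hypothetical.

```python
def low_chemistry_streak(chemistry, threshold=0.4, min_games=5):
    """Return the index where the first run of `min_games` consecutive
    games below `threshold` is completed, or None if there is no such run."""
    run = 0
    for i, value in enumerate(chemistry):
        run = run + 1 if value < threshold else 0
        if run >= min_games:
            return i
    return None

# Hypothetical chemistry index by game
season = [0.55, 0.42, 0.38, 0.35, 0.41, 0.39, 0.36, 0.33, 0.31, 0.37, 0.45]
flag_at = low_chemistry_streak(season)
print(flag_at)
```

Requiring consecutive games keeps a single noisy reading from triggering an intervention; note that the game at index 4 (0.41) resets the count in this example.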

---

### For Analytics Departments

8. **Automated Monitoring**
   - Update chemistry index after every game
   - Send alerts when chemistry drops sharply
   - Provide weekly chemistry reports to front office

9. **Player Valuation**
   - Incorporate chemistry into WAR (Wins Above Replacement)
   - Chemistry-adjusted player rankings
   - Trade value calculator including chemistry

10. **Research Questions**
    - Does chemistry predict playoff success?
    - Chemistry decay over time (roster turnover)
    - Position-specific chemistry effects (point guards as chemistry leaders?)

---

## 8. Production Deployment

### Real-Time Chemistry Dashboard

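```python
# Sketch for production deployment. Model fitting is injected through two
# hypothetical callables so the class runs without a specific backend.

class TeamChemistryMonitor:
    def __init__(self, team_id, fit_chemistry, fit_player_effects,
                 wins_per_chemistry_point=0.0, alert_threshold=0.3):
        self.team_id = team_id
        self.fit_chemistry = fit_chemistry            # games -> current chemistry (0-1)
        self.fit_player_effects = fit_player_effects  # (games, lineups) -> {player: effect}
        self.wins_per_chemistry_point = wins_per_chemistry_point
        self.alert_threshold = alert_threshold
        self.games = []           # Per-game team metrics
        self.lineups = []         # Per-game lineups
        self.player_effects = {}  # Player chemistry contributions
        self.alerts = []          # (alert_type, game_number, chemistry) tuples

    def update_after_game(self, game_stats, lineup):
        """Update chemistry index after each game"""
        # Add new data
        self.games.append(game_stats)
        self.lineups.append(tuple(lineup))

        # Refit dynamic factor model (or online update)
        current_chemistry = self.fit_chemistry(self.games)

        # Update player effects
        self.player_effects = self.fit_player_effects(self.games, self.lineups)

        # Generate alert if needed
        if current_chemistry < self.alert_threshold:
            self.alerts.append(('LOW_CHEMISTRY', len(self.games), current_chemistry))

        return {
            'chemistry_index': current_chemistry,
            'player_rankings': sorted(self.player_effects,
                                      key=self.player_effects.get, reverse=True),
            # Assumption: win boost scales linearly with the chemistry index
            'win_boost': current_chemistry * self.wins_per_chemistry_point
        }

    def simulate_trade_impact(self, players_out, players_in, incoming_effects=None):
        """Estimate chemistry impact of a trade"""
        # Incoming players have no history with this team; unless an external
        # estimate is supplied, treat them as neutral (0.0)
        incoming_effects = incoming_effects or {}
        chemistry_lost = sum(self.player_effects.get(p, 0.0) for p in players_out)
        chemistry_gained = sum(incoming_effects.get(p, 0.0) for p in players_in)

        net_chemistry = chemistry_gained - chemistry_lost
        expected_win_change = net_chemistry * self.wins_per_chemistry_point

        return {
            'chemistry_change': net_chemistry,
            'expected_win_change': expected_win_change,
            'recommendation': 'APPROVE' if expected_win_change > -2 else 'REJECT'
        }
```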

---

### Key Metrics for Dashboard

1. **Chemistry Index**: Current team chemistry (0-1 scale)
2. **Chemistry Trend**: 10-game moving average
3. **Player Rankings**: Players ranked by chemistry contribution
4. **Win Boost**: Estimated wins added by chemistry vs. talent
5. **Alerts**: Low chemistry warnings, chemistry drops
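
The 10-game chemistry trend can be computed with a trailing moving average. This is a minimal standard-library sketch; the function name and sample values are hypothetical, and averaging over fewer games early in the season is one possible choice for handling the first nine games.

```python
def trailing_average(values, window=10):
    """Trailing mean over the last `window` games (fewer early in the season)."""
    averages = []
    for i in range(len(values)):
        recent = values[max(0, i - window + 1): i + 1]
        averages.append(sum(recent) / len(recent))
    return averages

# Hypothetical chemistry index for the first 12 games
chemistry = [0.30, 0.32, 0.35, 0.33, 0.38, 0.40, 0.42, 0.41, 0.45, 0.44, 0.50, 0.52]
trend = trailing_average(chemistry, window=10)
print([round(v, 3) for v in trend[-3:]])
```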

---

### Integration with Existing Systems

- **Player Tracking Data**: Use spatial data to measure on-court chemistry (passes, spacing)
- **Lineup Optimization**: Feed chemistry into rotation algorithms
- **Trade Machine**: Add chemistry impact to trade evaluations
- **Scouting Reports**: Append chemistry assessments to player reports

---

### Model Retraining Schedule

- **Dynamic Factor Model**: Refit weekly (rolling 30-game window)
- **Player Effects**: Update after every 10 games
- **Chemistry-Win Relationship**: Refit monthly (evolves over season)
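
The schedule above can be expressed as a small helper that decides what to refit after each game. This is a sketch under stated assumptions: "weekly" is approximated as every 3 games (a hypothetical default, since real schedules vary), and the monthly chemistry-win refit is left out because it depends on calendar dates rather than game counts.

```python
def retraining_plan(game_num, games_per_week=3, dfm_window=30, player_every=10):
    """Decide what to refit after `game_num` games under the schedule above."""
    return {
        # Weekly refit, approximated here as every `games_per_week` games
        "refit_dfm": game_num % games_per_week == 0,
        # The factor model only sees the most recent `dfm_window` games
        "dfm_first_game": max(1, game_num - dfm_window + 1),
        # Player effects are refreshed every `player_every` games
        "update_player_effects": game_num % player_every == 0,
    }

plan_early = retraining_plan(12)
plan_late = retraining_plan(40)
print(plan_early)
print(plan_late)
```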

---

## 9. Summary

### What We Learned

1. **Dynamic Factor Models**
   - Extract latent chemistry from observable performance metrics
   - Factor loadings show which metrics respond to chemistry
   - Chemistry evolves dynamically throughout the season

2. **Player-Specific Chemistry**
   - Not all players contribute equally to team chemistry
   - "Chemistry guys" can be identified from lineup data
   - Chemistry effects can be quantified and ranked

3. **Chemistry-Talent Decomposition**
   - Team success = Talent + Chemistry + Luck
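   - Chemistry adds ~2-5 wins per season (a rough placeholder; the estimate for the synthetic data is the chemistry contribution printed in Section 6)
   - Whether chemistry matters more in close games and playoffs is not tested in this notebook (see Next Steps)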

---

### Business Impact

**Front Office**:
- Objective chemistry measurement for roster decisions
- Trade impact assessment (chemistry + talent)
- Player valuation beyond traditional stats

**Coaching Staff**:
- Chemistry-optimized lineups
- Early detection of locker room issues
- Player development targets

**Analytics Teams**:
- Quantify intangibles
- Predict team overperformance/underperformance
- Enhance player evaluation models

---

### Key Metrics

- **Chemistry Index**: 0-1 scale measuring team cohesion
- **Player Chemistry Contribution ($\alpha_i$)**: Individual player's chemistry effect
- **Factor Loadings ($\lambda_i$)**: How metrics respond to chemistry
- **Win Boost**: Expected wins from chemistry beyond talent

---

### When to Use This Method

✅ **Use Dynamic Factor Models When**:
- Multiple observable metrics influenced by latent factor
- Want to extract common signal from noisy data
- Need to track unobservable quality over time

✅ **Use Player-Specific Models When**:
- Need to attribute team outcomes to individuals
- Making roster decisions based on chemistry
- Evaluating trade impacts

✅ **Use Chemistry-Talent Decomposition When**:
- Assessing team overperformance/underperformance
- Comparing teams with similar talent
- Explaining unexpected outcomes

---

### Limitations & Caveats

1. **Identification**: Chemistry is latent - we infer it from correlations
2. **Causality**: Chemistry ↔ Performance (bidirectional relationship)
3. **External Validity**: Model trained on one team may not generalize
4. **Measurement Error**: Observable metrics have noise
5. **Confounding**: Coaching, injuries, schedule also affect metrics

---

### Next Steps

1. **Extend to League-Wide Analysis**: Fit models for all 30 teams
2. **Playoff Chemistry**: Does chemistry matter more in playoffs?
3. **Chemistry Decay**: Model chemistry loss from roster turnover
4. **Network Analysis**: Use player interactions (passes, screens) as direct chemistry measures
5. **Experimental Validation**: Correlate with survey data, interviews

---

### Related Notebooks

- **Notebook 1**: Player Performance Trend Analysis (time series methods)
- **Notebook 2**: Career Longevity Modeling (survival analysis)
- **Notebook 3**: Coaching Change Causal Impact (causal inference)
- **Notebook 4**: Injury Recovery Tracking (regime-switching models)

---

### Further Reading

- Stock & Watson (2016): "Dynamic Factor Models, Factor-Augmented Vector Autoregressions, and Structural Vector Autoregressions in Macroeconomics"
- Bai & Ng (2002): "Determining the Number of Factors in Approximate Factor Models"
- Geweke (1977): "The Dynamic Factor Analysis of Economic Time Series"
- Sargent & Sims (1977): "Business Cycle Modeling without Pretending to Have Too Much A Priori Economic Theory"

---

**End of Notebook 5** 🏀📊🤝