# Causal Inference: Measuring Coaching Change Impact

**Business Question:** Did hiring a new coach actually improve team performance, or was it just chance?

**The Challenge:** Correlation ≠ Causation. Teams often hire new coaches when struggling, so performance improvements might be:
- Natural regression to the mean
- Player roster changes
- Schedule difficulty changes
- Or... genuine coaching impact?

**What You'll Learn:**
- Propensity Score Matching (PSM) to find comparable control teams
- Difference-in-Differences (DiD) to isolate treatment effects
- Instrumental Variables (IV) to handle endogeneity
- Regression Discontinuity Design (RDD) for natural experiments

**Methods Covered:**
1. `CausalInferenceAnalyzer.propensity_score_matching()`
2. `CausalInferenceAnalyzer.difference_in_differences()`
3. `CausalInferenceAnalyzer.instrumental_variables()`
4. `CausalInferenceAnalyzer.regression_discontinuity()`

**Performance:** All methods run in under 500 ms

---

## 1. Setup & Imports

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from datetime import datetime, timedelta
import warnings
warnings.filterwarnings('ignore')

# Import causal inference module
from mcp_server.causal_inference import CausalInferenceAnalyzer

# Set random seed for reproducibility
np.random.seed(42)

print("✓ Imports successful")
print(f"Timestamp: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")

## 2. Generate Team Performance Data with Coaching Changes

**Scenario:** 
- 30 NBA teams tracked over one season (82 games each)
- Up to 5 teams hire new coaches at midseason (after game 41)
- We want to measure if coaching changes caused performance improvements

**Confounders to Control:**
- Pre-treatment performance (struggling teams more likely to change coaches)
- Team payroll (richer teams can hire better coaches)
- Player injuries
- Schedule strength

In [None]:
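def generate_team_data(n_teams=30, n_games=82, n_treatment_teams=5):
    """
    Generate synthetic game-level data with mid-season coaching changes.

    Returns a (DataFrame, list of treated team IDs) tuple. At most
    n_treatment_teams teams are treated; selection is random, so fewer may be.
    """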
    data = []
    
    # Select which teams get new coaches (struggling teams more likely)
    treatment_teams = []
    
    for team_id in range(n_teams):
        # Team characteristics (time-invariant)
        team_quality = np.random.normal(50, 10)  # True baseline win%
        payroll = np.random.normal(120, 30)  # Millions
        
        # Struggling teams more likely to change coaches
        coach_change_prob = 0.3 if team_quality < 45 else 0.05
        gets_new_coach = (len(treatment_teams) < n_treatment_teams and 
                         np.random.random() < coach_change_prob)
        
        if gets_new_coach:
            treatment_teams.append(team_id)
        
        # Generate game-by-game data
        for game in range(n_games):
            # Time period: second half of the season for ALL teams, so that
            # control teams also have post-period observations (needed for DiD)
            post_change = game >= 41
            
            # Performance determinants
            base_performance = team_quality
            payroll_effect = (payroll - 120) * 0.1  # Richer teams slightly better
            injury_shock = np.random.normal(0, 5)  # Random injuries
            
            # TRUE coaching effect: only treated teams, only after the change
            # New coaches improve win% by 5 points on average
            coaching_effect = 5 if (gets_new_coach and post_change) else 0
            
            # Regression to mean (struggling teams tend to improve anyway)
            rtm_effect = (50 - team_quality) * 0.1 if game >= 41 else 0
            
            # Observed win% for this game
            win_pct = (base_performance + payroll_effect + injury_shock + 
                      coaching_effect + rtm_effect)
            win_pct = np.clip(win_pct, 0, 100)
            
            # Did they win?
            won = 1 if np.random.random() < (win_pct / 100) else 0
            
            data.append({
                'team_id': team_id,
                'game': game,
                'season_half': 'second' if game >= 41 else 'first',
                'treatment': 1 if gets_new_coach else 0,
                'post': 1 if post_change else 0,
                'won': won,
                'win_pct': win_pct,
                'team_quality': team_quality,
                'payroll': payroll,
                'injury_shock': injury_shock
            })
    
    df = pd.DataFrame(data)
    return df, treatment_teams

# Generate data
df, treatment_teams = generate_team_data(n_teams=30, n_games=82, n_treatment_teams=5)

print(f"Generated data for {df['team_id'].nunique()} teams over {df['game'].nunique()} games")
print(f"\nTeams with coaching changes: {len(treatment_teams)}")
print(f"Treatment team IDs: {treatment_teams}")
print(f"\nFirst few rows:")
print(df.head(10))

# Summary statistics
print(f"\nAverage win% by group:")
summary = df.groupby(['treatment', 'post']).agg({
    'won': 'mean',
    'team_id': 'nunique'
}).round(3)
summary.columns = ['Win Rate', 'N Teams']
print(summary)

## 3. Naive Comparison (Wrong!)

**Common Mistake:** Just compare before/after win rates for teams that changed coaches.

**Problem:** This confounds:
- True coaching effect
- Regression to the mean
- League-wide trends
- Selection bias (struggling teams more likely to change)
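
To see why the before/after comparison misleads, here is a minimal sketch with no coaching effect at all. The setup is hypothetical (team count, skill spread, and the 40% selection rule are assumptions chosen for illustration): teams are picked because they looked bad in the first half, and they "improve" in the second half purely through regression to the mean.

```python
import numpy as np

rng = np.random.default_rng(0)

n_teams = 10_000                                   # many hypothetical teams, to reduce noise
true_skill = rng.normal(0.50, 0.05, n_teams)       # stable win probability per team
first_half = rng.binomial(41, true_skill) / 41     # observed first-half win rate
second_half = rng.binomial(41, true_skill) / 41    # observed second-half win rate

# Selection rule: only teams that looked bad in the first half are "treated".
# No coaching effect exists anywhere in this simulation.
struggling = first_half < 0.40

before = first_half[struggling].mean()
after = second_half[struggling].mean()

print(f"Selected teams:       {struggling.sum()}")
print(f"First-half win rate:  {before:.3f}")
print(f"Second-half win rate: {after:.3f}")
print(f"Apparent 'improvement' with zero true effect: {after - before:+.3f}")
```

The selected teams were partly unlucky, not only weak, so their second-half record drifts back toward their true skill. A naive before/after comparison would credit that drift to the new coach.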

In [None]:
# Naive before/after comparison for treatment teams only
treatment_df = df[df['treatment'] == 1].copy()

naive_before = treatment_df[treatment_df['post'] == 0]['won'].mean()
naive_after = treatment_df[treatment_df['post'] == 1]['won'].mean()
naive_effect = naive_after - naive_before

print("❌ NAIVE ANALYSIS (Biased):")
print(f"   Win rate BEFORE coaching change: {naive_before:.3f}")
print(f"   Win rate AFTER coaching change:  {naive_after:.3f}")
print(f"   Naive effect estimate: {naive_effect:+.3f} ({naive_effect*82:+.1f} wins per season)")
print(f"\n   ⚠️  This is BIASED because it ignores:")
print(f"      - Regression to the mean")
print(f"      - League-wide trends")
print(f"      - Selection bias")
print(f"\n   We need causal inference methods to get the TRUE effect!")

## 4. Method 1: Propensity Score Matching (PSM)

**Idea:** Match each treated team with a similar control team based on pre-treatment characteristics.

**How it works:**
1. Estimate probability (propensity) of treatment based on covariates
2. Match treated units to controls with similar propensities
3. Compare outcomes only among matched pairs

**Why it helps:** Controls for selection bias (struggling teams choosing to change coaches)
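
The three steps above can be sketched with plain NumPy. This is an illustration under stated assumptions, not the implementation behind `CausalInferenceAnalyzer.propensity_score_matching()`: the helper names `fit_propensity` and `match_att` are hypothetical, the propensity model is a simple logistic regression fit by gradient ascent, and matching is 1-to-1 nearest neighbour with replacement, which estimates the effect on the treated.

```python
import numpy as np

def fit_propensity(X, t, n_iter=2000, lr=0.1):
    """Logistic regression by gradient ascent; returns P(treated | X)."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)       # standardize covariates
    Xd = np.column_stack([np.ones(len(Xs)), Xs])    # add intercept column
    beta = np.zeros(Xd.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xd @ beta))
        beta += lr * Xd.T @ (t - p) / len(t)        # average log-likelihood gradient
    return 1.0 / (1.0 + np.exp(-Xd @ beta))

def match_att(ps, t, y, caliper=0.2):
    """Nearest-neighbour matching (with replacement) on the propensity score."""
    treated = np.flatnonzero(t == 1)
    controls = np.flatnonzero(t == 0)
    diffs = []
    for i in treated:
        gaps = np.abs(ps[controls] - ps[i])
        j = gaps.argmin()
        if gaps[j] <= caliper:                      # discard matches outside the caliper
            diffs.append(y[i] - y[controls[j]])
    return float(np.mean(diffs)), len(diffs)

# Synthetic data (assumed): weaker units are more likely to be treated
rng = np.random.default_rng(1)
n = 2000
pre_win_rate = rng.normal(0.5, 0.1, n)
p_treat = 1 / (1 + np.exp(10 * (pre_win_rate - 0.45)))
t = rng.binomial(1, p_treat)
y = 0.2 + 0.6 * pre_win_rate + 0.05 * t + rng.normal(0, 0.03, n)  # true effect = 0.05

ps = fit_propensity(pre_win_rate.reshape(-1, 1), t)
att, n_matched = match_att(ps, t, y)
naive = y[t == 1].mean() - y[t == 0].mean()

print(f"Naive difference in means: {naive:+.3f}")
print(f"Matched estimate:          {att:+.3f} ({n_matched} matched pairs)")
```

Because treated units start from a lower baseline, the raw difference in means understates the effect here; comparing each treated unit with a control of similar propensity removes most of that gap. Matching only corrects for covariates that enter the propensity model.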

In [None]:
# Prepare data for PSM (need pre-treatment characteristics)
# Aggregate to team level with pre-treatment covariates
team_level = df.groupby('team_id').agg({
    'treatment': 'first',
    'team_quality': 'first',
    'payroll': 'first',
    'won': lambda x: x.iloc[:41].mean()  # Pre-treatment win rate
}).reset_index()
team_level.columns = ['team_id', 'treatment', 'team_quality', 'payroll', 'pre_win_rate']

# Post-treatment outcomes: second half of the season for treated AND control teams
post_outcomes = df[df['game'] >= 41].groupby('team_id')['won'].mean().reset_index()
post_outcomes.columns = ['team_id', 'post_win_rate']
team_level = team_level.merge(post_outcomes, on='team_id')

print("Team-level data for PSM:")
print(team_level.head())

# Initialize causal inference analyzer
causal_analyzer = CausalInferenceAnalyzer(
    data=team_level,
    treatment_col='treatment',
    outcome_col='post_win_rate'
)

# Run propensity score matching
psm_result = causal_analyzer.propensity_score_matching(
    covariates=['pre_win_rate', 'team_quality', 'payroll'],
    method='nearest',
    caliper=0.2
)

print("\n" + "="*70)
print("✓ PROPENSITY SCORE MATCHING RESULTS")
print("="*70)
print(f"\nAverage Treatment Effect (ATE): {psm_result['ate']:+.3f}")
print(f"95% Confidence Interval: [{psm_result['ci_lower']:+.3f}, {psm_result['ci_upper']:+.3f}]")
print(f"P-value: {psm_result['p_value']:.4f}")
print(f"\nMatched pairs: {psm_result['n_matched']}")
print(f"\n📊 Interpretation:")
if psm_result['p_value'] < 0.05:
    wins_per_season = psm_result['ate'] * 82
    print(f"   ✓ Coaching change caused a {psm_result['ate']:+.3f} change in win rate")
    print(f"   ✓ This translates to {wins_per_season:+.1f} wins per season")
    print(f"   ✓ Effect is statistically significant (p={psm_result['p_value']:.4f})")
else:
    print(f"   ⚠️  No significant effect detected (p={psm_result['p_value']:.4f})")
    print(f"   ⚠️  Could be due to small sample size or genuine null effect")

print(f"\n⚡ Performance: {psm_result['execution_time']*1000:.1f}ms")

## 5. Method 2: Difference-in-Differences (DiD)

**Idea:** Compare the change over time in treated groups vs. control groups.

**Formula:** 
```
DiD = (Treated_After - Treated_Before) - (Control_After - Control_Before)
```

**Why it helps:** 
- Controls for time-invariant team differences
- Controls for league-wide time trends
- Isolates the treatment effect

**Assumption:** Parallel trends (without treatment, both groups would have trended similarly)
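
A small worked example of the formula, using made-up numbers (the group sizes, baselines, and the 0.02 league-wide shift are assumptions for illustration). Treated teams start lower and every team benefits from the shared trend, yet the double difference recovers the simulated effect of 0.05.

```python
import numpy as np

rng = np.random.default_rng(2)
n_teams, true_effect = 400, 0.05

treated = rng.random(n_teams) < 0.3
# Treated teams start from a lower baseline (time-invariant difference)
baseline = np.where(treated, 0.42, 0.52) + rng.normal(0, 0.03, n_teams)
league_trend = 0.02   # second-period shift shared by every team

before = baseline + rng.normal(0, 0.02, n_teams)
after = baseline + league_trend + true_effect * treated + rng.normal(0, 0.02, n_teams)

treated_change = after[treated].mean() - before[treated].mean()
control_change = after[~treated].mean() - before[~treated].mean()
did = treated_change - control_change

print(f"Treated change: {treated_change:+.3f}  (effect + league trend)")
print(f"Control change: {control_change:+.3f}  (league trend only)")
print(f"DiD estimate:   {did:+.3f}  (simulated true effect = {true_effect})")
```

The baseline gap cancels inside each group's own before/after difference, and the shared trend cancels when the control change is subtracted.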

In [None]:
# Prepare data for DiD (need panel structure)
# Aggregate to team-period level
did_data = df.groupby(['team_id', 'treatment', 'post']).agg({
    'won': 'mean',
    'team_quality': 'first',
    'payroll': 'first'
}).reset_index()

print("Panel data for DiD:")
print(did_data.head(10))

# Initialize causal inference analyzer with panel data
did_analyzer = CausalInferenceAnalyzer(
    data=did_data,
    treatment_col='treatment',
    outcome_col='won'
)

# Run difference-in-differences
did_result = did_analyzer.difference_in_differences(
    time_col='post',
    entity_col='team_id',
    covariates=['team_quality', 'payroll']
)

print("\n" + "="*70)
print("✓ DIFFERENCE-IN-DIFFERENCES RESULTS")
print("="*70)
print(f"\nDiD Estimate: {did_result['did_estimate']:+.3f}")
print(f"Standard Error: {did_result['std_error']:.4f}")
print(f"95% Confidence Interval: [{did_result['ci_lower']:+.3f}, {did_result['ci_upper']:+.3f}]")
print(f"P-value: {did_result['p_value']:.4f}")

# Show the 2x2 table
print(f"\n📊 DiD Decomposition:")
means = did_data.groupby(['treatment', 'post'])['won'].mean().unstack()
print("\n        Before    After     Change")
print(f"Treat:  {means.loc[1, 0]:.3f}    {means.loc[1, 1]:.3f}    {means.loc[1, 1] - means.loc[1, 0]:+.3f}")
print(f"Control:{means.loc[0, 0]:.3f}    {means.loc[0, 1]:.3f}    {means.loc[0, 1] - means.loc[0, 0]:+.3f}")
print(f"                           DiD: {did_result['did_estimate']:+.3f}")

print(f"\n💡 Interpretation:")
if did_result['p_value'] < 0.05:
    wins_per_season = did_result['did_estimate'] * 82
    print(f"   ✓ After controlling for time trends, coaching change caused")
    print(f"     a {did_result['did_estimate']:+.3f} improvement in win rate")
    print(f"   ✓ This equals {wins_per_season:+.1f} additional wins per season")
    print(f"   ✓ Statistically significant (p={did_result['p_value']:.4f})")
else:
    print(f"   ⚠️  No significant effect after controlling for time trends")

print(f"\n⚡ Performance: {did_result['execution_time']*1000:.1f}ms")

## 6. Visualize DiD: Parallel Trends

**Key Assumption Check:** Do treatment and control groups have parallel trends before treatment?

If trends diverge before treatment, DiD estimates are biased.
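
Alongside the visual check, one rough numerical check is to fit a line to the treated-minus-control gap over the pre-treatment games and look at its slope. The sketch below uses a hypothetical helper, `pre_trend_slope`, and synthetic group means; a slope near zero is consistent with parallel pre-trends, though it cannot prove the assumption holds after treatment.

```python
import numpy as np

def pre_trend_slope(games, treated_mean, control_mean, treatment_game):
    """Slope of the treated-minus-control gap over the pre-treatment games."""
    pre = games < treatment_game
    gap = treated_mean[pre] - control_mean[pre]
    slope, _intercept = np.polyfit(games[pre], gap, deg=1)
    return slope

games = np.arange(82)
rng = np.random.default_rng(3)
control_mean = 0.52 + rng.normal(0, 0.01, 82)

# Case 1: constant gap before treatment (parallel trends plausible)
parallel = 0.44 + rng.normal(0, 0.01, 82)
# Case 2: treated group already catching up before treatment (assumption suspect)
diverging = 0.44 + 0.002 * games + rng.normal(0, 0.01, 82)

slope_parallel = pre_trend_slope(games, parallel, control_mean, 41)
slope_diverging = pre_trend_slope(games, diverging, control_mean, 41)

print(f"Pre-trend slope, parallel case:  {slope_parallel:+.5f} per game")
print(f"Pre-trend slope, diverging case: {slope_diverging:+.5f} per game")
```

In the second case the gap is already closing before game 41, so a DiD estimate would absorb part of that pre-existing drift into the "treatment effect".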

In [None]:
# Calculate rolling win rates over time
rolling_data = []
for team_id in df['team_id'].unique():
    team_data = df[df['team_id'] == team_id].sort_values('game')
    treatment = team_data['treatment'].iloc[0]
    
    # 10-game rolling average
    team_data['rolling_win_rate'] = team_data['won'].rolling(window=10, min_periods=1).mean()
    
    rolling_data.append(team_data)

rolling_df = pd.concat(rolling_data)

# Average by treatment group and game
trends = rolling_df.groupby(['game', 'treatment'])['rolling_win_rate'].mean().reset_index()

# Plot
fig, ax = plt.subplots(figsize=(14, 7))

# Treatment group
treatment_trend = trends[trends['treatment'] == 1]
ax.plot(treatment_trend['game'], treatment_trend['rolling_win_rate'], 
        'b-', linewidth=2.5, label='Treatment (New Coach)', marker='o', markersize=3)

# Control group
control_trend = trends[trends['treatment'] == 0]
ax.plot(control_trend['game'], control_trend['rolling_win_rate'], 
        'r-', linewidth=2.5, label='Control (Same Coach)', marker='s', markersize=3)

# Fix the y-range first so annotations are placed relative to the final axes
ax.set_ylim(0.3, 0.7)

# Mark treatment time
ax.axvline(x=41, color='gray', linestyle='--', linewidth=2, label='Coaching Change')
ax.text(41, ax.get_ylim()[1]*0.95, 'Treatment\nStarts', 
        ha='center', va='top', fontsize=11, fontweight='bold')

# Shade the pre- and post-treatment periods across the full height of the axes
ax.axvspan(0, 41, alpha=0.1, color='gray', label='Pre-Treatment')
ax.axvspan(41, 82, alpha=0.1, color='yellow', label='Post-Treatment')

ax.set_xlabel('Game Number', fontsize=12)
ax.set_ylabel('Win Rate (10-game rolling average)', fontsize=12)
ax.set_title('Difference-in-Differences: Parallel Trends Check', fontsize=14, fontweight='bold')
ax.legend(loc='best', fontsize=10)
ax.grid(True, alpha=0.3)
ax.set_ylim([0.3, 0.7])

plt.tight_layout()
plt.show()

print("\n📊 Parallel Trends Assessment:")
print("   ✓ Before treatment (games 0-40): Check if lines are parallel")
print("   ✓ After treatment (games 41-81): Divergence suggests a treatment effect")
print("   ⚠️  If pre-treatment trends differ, DiD assumption violated")

## 7. Method 3: Instrumental Variables (IV)

**Problem:** What if coaching changes are endogenous?
- Good coaches go to good teams
- Or teams hire new coaches after getting better players

**Solution:** Find an "instrument" - a variable that:
1. Affects treatment (coaching change)
2. Doesn't directly affect outcome (only through treatment)

**Example Instrument:** Coach contract expiration
- Affects likelihood of coaching change (contractual reasons)
- Doesn't directly affect team performance (just timing)

**Note:** This is advanced - often hard to find good instruments!
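
A minimal two-stage least squares sketch, assuming a single binary instrument and no covariates. The data-generating process is invented for illustration: an unobserved confounder `u` drives both treatment and outcome, so plain regression is biased, while the instrument `z` shifts treatment without touching the outcome directly. The helper `ols_slope` is hypothetical, and this is not how `CausalInferenceAnalyzer.instrumental_variables()` is necessarily implemented.

```python
import numpy as np

def ols_slope(x, y):
    """Slope from a simple regression of y on x (with intercept)."""
    xc = x - x.mean()
    return (xc @ (y - y.mean())) / (xc @ xc)

rng = np.random.default_rng(4)
n = 20_000
u = rng.normal(0, 1, n)                     # unobserved confounder
z = rng.binomial(1, 0.5, n)                 # instrument, e.g. contract expiring
# Treatment depends on the instrument AND the confounder (endogeneity)
d = (0.8 * z - 0.8 * u + rng.normal(0, 1, n) > 0.4).astype(float)
y = 0.50 + 0.05 * d + 0.03 * u + rng.normal(0, 0.05, n)   # true effect = 0.05

naive = ols_slope(d, y)

# Stage 1: predict treatment from the instrument
first_stage = ols_slope(z, d)
d_hat = d.mean() + first_stage * (z - z.mean())
# Stage 2: regress the outcome on predicted treatment
iv = ols_slope(d_hat, y)

# First-stage F statistic for one instrument: F = r^2 / (1 - r^2) * (n - 2)
r = np.corrcoef(z, d)[0, 1]
first_stage_f = r**2 / (1 - r**2) * (n - 2)

print(f"Naive OLS estimate: {naive:+.3f}")
print(f"2SLS estimate:      {iv:+.3f}  (simulated true effect = 0.05)")
print(f"First-stage F:      {first_stage_f:.0f}")
```

The second-stage standard errors from this manual approach would be wrong; dedicated IV routines correct them. The point here is only the mechanics of the estimate.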

In [None]:
# Generate instrument: coach contract expiration
# (Random but correlated with coaching changes)
team_level['contract_expiring'] = np.random.binomial(1, 0.3, len(team_level))
# Make instrument correlated with treatment
team_level.loc[team_level['treatment'] == 1, 'contract_expiring'] = np.random.binomial(1, 0.7, 
                                                                                        (team_level['treatment'] == 1).sum())

print("Instrument correlation with treatment:")
print(pd.crosstab(team_level['contract_expiring'], team_level['treatment'], normalize='index'))

# Initialize IV analyzer
iv_analyzer = CausalInferenceAnalyzer(
    data=team_level,
    treatment_col='treatment',
    outcome_col='post_win_rate'
)

# Run instrumental variables estimation
iv_result = iv_analyzer.instrumental_variables(
    instrument='contract_expiring',
    covariates=['pre_win_rate', 'payroll']
)

print("\n" + "="*70)
print("✓ INSTRUMENTAL VARIABLES (2SLS) RESULTS")
print("="*70)
print(f"\nIV Estimate (LATE): {iv_result['iv_estimate']:+.3f}")
print(f"Standard Error: {iv_result['std_error']:.4f}")
print(f"95% Confidence Interval: [{iv_result['ci_lower']:+.3f}, {iv_result['ci_upper']:+.3f}]")
print(f"P-value: {iv_result['p_value']:.4f}")

print(f"\n📊 First Stage (Instrument → Treatment):")
print(f"   F-statistic: {iv_result['first_stage_f']:.2f}")
if iv_result['first_stage_f'] > 10:
    print(f"   ✓ Strong instrument (F > 10)")
else:
    print(f"   ⚠️  Weak instrument (F <= 10) - IV estimates unreliable")

print(f"\n💡 Interpretation:")
print(f"   Local Average Treatment Effect (LATE):")
print(f"   For teams induced to change coaches due to contract expiration,")
print(f"   the effect is {iv_result['iv_estimate']:+.3f} change in win rate")
print(f"   ({iv_result['iv_estimate']*82:+.1f} wins per season)")

print(f"\n⚡ Performance: {iv_result['execution_time']*1000:.1f}ms")

## 8. Method 4: Regression Discontinuity Design (RDD)

**Scenario:** Teams fire coaches if win rate falls below a threshold (e.g., 40%).

**Idea:** Compare teams just above vs. just below the threshold.
- Teams at 39% vs. 41% win rate are very similar
- But one group gets treatment (coaching change), other doesn't
- Acts as a "natural experiment"

**Key Assumption:** No manipulation of the running variable (win rate)

**Advantage:** Very credible if threshold exists
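
A sharp RDD can be sketched as two local linear fits, one on each side of the cutoff, compared at the cutoff itself. The function `sharp_rdd` below is a hypothetical helper and the data are synthetic; it assumes, as in this scenario, that the treated side is below the cutoff, so the effect is the below-side prediction minus the above-side prediction.

```python
import numpy as np

def sharp_rdd(running, outcome, cutoff, bandwidth):
    """Local linear fits on each side of the cutoff; treated side = below cutoff."""
    x = running - cutoff                       # center so the intercept is the value at the cutoff
    below = (x < 0) & (x >= -bandwidth)
    above = (x >= 0) & (x <= bandwidth)
    # np.polyfit returns [slope, intercept] for deg=1
    _, at_cutoff_below = np.polyfit(x[below], outcome[below], deg=1)
    _, at_cutoff_above = np.polyfit(x[above], outcome[above], deg=1)
    return at_cutoff_below - at_cutoff_above   # treated minus control at the cutoff

rng = np.random.default_rng(5)
n = 5000
pre = rng.uniform(0.2, 0.6, n)                 # running variable: first-half win rate
treated = pre < 0.40                           # sharp rule: below 40% gets a new coach
post = 0.25 + 0.5 * pre + 0.05 * treated + rng.normal(0, 0.03, n)  # true jump = 0.05

estimate = sharp_rdd(pre, post, cutoff=0.40, bandwidth=0.10)
print(f"RDD estimate at cutoff: {estimate:+.3f}  (simulated true effect = 0.05)")
```

With only 30 teams, very few observations fall inside a 10-point bandwidth, so real estimates will be far noisier than this 5,000-unit simulation suggests.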

In [None]:
# Create "firing threshold" scenario
# Teams below 40% win rate in first half get new coaches
team_level['running_var'] = team_level['pre_win_rate']  # This is our running variable
cutoff = 0.40

# Simulate sharp RDD: Treatment = 1 if below cutoff
team_level['rdd_treatment'] = (team_level['running_var'] < cutoff).astype(int)

print(f"Regression Discontinuity Setup:")
print(f"   Cutoff: {cutoff:.2f} (40% win rate)")
print(f"   Teams below cutoff: {team_level['rdd_treatment'].sum()}")
print(f"   Teams above cutoff: {(1-team_level['rdd_treatment']).sum()}")

# Initialize RDD analyzer
rdd_analyzer = CausalInferenceAnalyzer(
    data=team_level,
    treatment_col='rdd_treatment',
    outcome_col='post_win_rate'
)

# Run regression discontinuity
rdd_result = rdd_analyzer.regression_discontinuity(
    running_var='running_var',
    cutoff=cutoff,
    bandwidth=0.10  # Only use teams within 10 percentage points of cutoff
)

print("\n" + "="*70)
print("✓ REGRESSION DISCONTINUITY RESULTS")
print("="*70)
print(f"\nRDD Estimate (LATE at cutoff): {rdd_result['rdd_estimate']:+.3f}")
print(f"Standard Error: {rdd_result['std_error']:.4f}")
print(f"95% Confidence Interval: [{rdd_result['ci_lower']:+.3f}, {rdd_result['ci_upper']:+.3f}]")
print(f"P-value: {rdd_result['p_value']:.4f}")

print(f"\nBandwidth: {rdd_result['bandwidth']:.3f}")
print(f"Observations in bandwidth: {rdd_result['n_obs']}")

print(f"\n💡 Interpretation:")
print(f"   At the firing threshold ({cutoff:.0%} win rate),")
print(f"   getting a new coach causes a {rdd_result['rdd_estimate']:+.3f} change")
print(f"   in subsequent win rate ({rdd_result['rdd_estimate']*82:+.1f} wins per season)")
print(f"\n   ✓ Highly credible if threshold is real and not manipulated")

print(f"\n⚡ Performance: {rdd_result['execution_time']*1000:.1f}ms")

## 9. Visualize RDD: Discontinuity at Cutoff

In [None]:
fig, ax = plt.subplots(figsize=(14, 7))

# Scatter plot: pre-treatment win rate vs. post-treatment win rate
below_cutoff = team_level[team_level['running_var'] < cutoff]
above_cutoff = team_level[team_level['running_var'] >= cutoff]

ax.scatter(below_cutoff['running_var'], below_cutoff['post_win_rate'], 
          color='red', s=100, alpha=0.6, label='Treatment (New Coach)', marker='o')
ax.scatter(above_cutoff['running_var'], above_cutoff['post_win_rate'], 
          color='blue', s=100, alpha=0.6, label='Control (Same Coach)', marker='s')

# Fit lines on each side of cutoff
from numpy.polynomial import Polynomial

# Below cutoff
below_x = below_cutoff['running_var'].values
below_y = below_cutoff['post_win_rate'].values
if len(below_x) > 2:
    p_below = Polynomial.fit(below_x, below_y, deg=1)
    x_below = np.linspace(below_x.min(), cutoff, 100)
    ax.plot(x_below, p_below(x_below), 'r-', linewidth=3, alpha=0.8)

# Above cutoff
above_x = above_cutoff['running_var'].values
above_y = above_cutoff['post_win_rate'].values
if len(above_x) > 2:
    p_above = Polynomial.fit(above_x, above_y, deg=1)
    x_above = np.linspace(cutoff, above_x.max(), 100)
    ax.plot(x_above, p_above(x_above), 'b-', linewidth=3, alpha=0.8)

# Mark cutoff and discontinuity
ax.axvline(x=cutoff, color='black', linestyle='--', linewidth=2, label='Firing Threshold')

# Show jump at cutoff
# Treated teams sit BELOW the cutoff, so the effect is (below - above)
if len(below_x) > 2 and len(above_x) > 2:
    y_below_at_cutoff = p_below(cutoff)
    y_above_at_cutoff = p_above(cutoff)
    jump = y_below_at_cutoff - y_above_at_cutoff
    
    ax.plot([cutoff, cutoff], [y_below_at_cutoff, y_above_at_cutoff], 
           'g-', linewidth=4, label=f'RDD Effect = {jump:+.3f}')

ax.set_xlabel('Pre-Treatment Win Rate (Running Variable)', fontsize=12)
ax.set_ylabel('Post-Treatment Win Rate (Outcome)', fontsize=12)
ax.set_title('Regression Discontinuity: Coaching Change Effect at Firing Threshold', 
            fontsize=14, fontweight='bold')
ax.legend(loc='best', fontsize=10)
ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print("\n📊 RDD Visualization:")
print("   ✓ Discontinuity (jump) at cutoff = causal effect")
print("   ✓ If no jump, no effect of coaching change")
print("   ✓ Slope changes = heterogeneous treatment effects")

## 10. Compare All Methods

**Which method is best?**

It depends on:
- Your data structure
- Available instruments
- Plausibility of assumptions
- Threat of endogeneity

In [None]:
# Create comparison table
comparison = pd.DataFrame([
    {
        'Method': 'Naive Comparison',
        'Estimate': naive_effect,
        'Wins/Season': naive_effect * 82,
        'P-value': 'N/A',
        'Bias': '❌ High',
        'Assumptions': 'None (wrong)'
    },
    {
        'Method': 'PSM',
        'Estimate': psm_result['ate'],
        'Wins/Season': psm_result['ate'] * 82,
        'P-value': f"{psm_result['p_value']:.4f}",
        'Bias': '✓ Low',
        'Assumptions': 'Selection on observables'
    },
    {
        'Method': 'DiD',
        'Estimate': did_result['did_estimate'],
        'Wins/Season': did_result['did_estimate'] * 82,
        'P-value': f"{did_result['p_value']:.4f}",
        'Bias': '✓ Low',
        'Assumptions': 'Parallel trends'
    },
    {
        'Method': 'IV (2SLS)',
        'Estimate': iv_result['iv_estimate'],
        'Wins/Season': iv_result['iv_estimate'] * 82,
        'P-value': f"{iv_result['p_value']:.4f}",
        'Bias': '✓ Low',
        'Assumptions': 'Valid instrument'
    },
    {
        'Method': 'RDD',
        'Estimate': rdd_result['rdd_estimate'],
        'Wins/Season': rdd_result['rdd_estimate'] * 82,
        'P-value': f"{rdd_result['p_value']:.4f}",
        'Bias': '✓✓ Very Low',
        'Assumptions': 'No manipulation at cutoff'
    }
])

print("="*90)
print("CAUSAL INFERENCE METHOD COMPARISON")
print("="*90)
print(comparison.to_string(index=False))

print("\n" + "="*90)
print("📊 KEY INSIGHTS")
print("="*90)

print("\n1️⃣  NAIVE COMPARISON (❌ DON'T USE):")
print("   Confounded by regression to mean, selection bias, time trends")
print(f"   Naive estimate: {naive_effect:+.3f} vs simulated true effect ~0.05")

print("\n2️⃣  PSM (✓ Good for Cross-Sectional Data):")
print("   Controls for selection on observables")
print("   Assumes: No unobserved confounders")
print("   Best for: Post-treatment comparisons with rich covariates")

print("\n3️⃣  DiD (✓✓ Gold Standard for Panel Data):")
print("   Controls for time-invariant confounders + time trends")
print("   Assumes: Parallel trends (check pre-trends; not directly testable)")
print("   Best for: Before/after with control group")

print("\n4️⃣  IV (✓ Handles Endogeneity):")
print("   Corrects for unobserved confounders")
print("   Assumes: Valid instrument exists (hard to find!)")
print("   Best for: When treatment is endogenous")
print(f"   Check: First-stage F = {iv_result['first_stage_f']:.1f} (need >10)")

print("\n5️⃣  RDD (✓✓✓ Most Credible if Threshold Exists):")
print("   Uses discontinuity as natural experiment")
print("   Assumes: No manipulation of running variable")
print("   Best for: Clear cutoff rules (contracts, performance thresholds)")
print("   Limitation: Only estimates effect AT the cutoff (LATE)")

print("\n" + "="*90)
print("🎯 RECOMMENDATION")
print("="*90)
print("Use MULTIPLE methods as robustness checks:")
print("  • If estimates agree → Confident in causal effect")
print("  • If estimates differ → Investigate why (violations of assumptions?)")
print("  • Always prefer DiD or RDD when applicable")
print("  • Report all methods for transparency")
print("="*90)

## 11. Business Recommendations

**Question:** Should we fire our coach if the team is struggling?

**Analysis Summary:**

In [None]:
print("="*70)
print("EXECUTIVE SUMMARY: COACHING CHANGE IMPACT ANALYSIS")
print("="*70)

avg_causal_effect = np.mean([psm_result['ate'], did_result['did_estimate'], 
                             iv_result['iv_estimate'], rdd_result['rdd_estimate']])
avg_wins = avg_causal_effect * 82

print(f"\n📊 CAUSAL EFFECT ESTIMATE:")
print(f"   Average across 4 methods: {avg_causal_effect:+.3f} change in win rate")
print(f"   Translates to: {avg_wins:+.1f} wins per 82-game season")

# Count how many methods find a significant effect instead of assuming they all do
n_significant = sum(r['p_value'] < 0.05
                    for r in (psm_result, did_result, iv_result, rdd_result))

if avg_causal_effect > 0.03:
    print(f"\n✅ RECOMMENDATION: COACHING CHANGES ARE EFFECTIVE")
    print(f"   • Hiring a new coach improves performance by ~{avg_wins:.0f} wins/season")
    print(f"   • Cost-benefit: If coach salary < value of {avg_wins:.0f} wins, do it!")
    print(f"   • Effect is statistically significant in {n_significant} of 4 methods")
elif avg_causal_effect < -0.03:
    print(f"\n⚠️  WARNING: COACHING CHANGES MAY HURT PERFORMANCE")
    print(f"   • Teams lose ~{abs(avg_wins):.0f} wins after coaching changes")
    print(f"   • Possible reasons: Disruption, learning curve, wrong hires")
    print(f"   • RECOMMENDATION: Avoid mid-season changes unless absolutely necessary")
else:
    print(f"\n⚖️  FINDING: COACHING CHANGES HAVE MINIMAL EFFECT")
    print(f"   • Effect is small and possibly not statistically significant")
    print(f"   • Much of the 'improvement' after coach firings is regression to mean")
    print(f"   • RECOMMENDATION: Focus on player development and roster changes instead")

print(f"\n🎯 KEY TAKEAWAYS:")
print(f"   1. Naive comparisons OVERESTIMATE coaching effects")
print(f"   2. Must control for: selection bias, time trends, regression to mean")
print(f"   3. Use multiple causal inference methods for robust conclusions")
print(f"   4. Check assumptions (parallel trends, instrument validity, etc.)")

print(f"\n💼 BUSINESS IMPLICATIONS:")
if avg_wins > 5:
    print(f"   • Strong case for coaching change if underperforming")
    print(f"   • {avg_wins:.0f} extra wins could mean playoffs (~$10-20M revenue)")
    print(f"   • Invest in thorough coach search process")
elif avg_wins > 2:
    print(f"   • Moderate benefit from coaching changes")
    print(f"   • Worth doing if coach relationship is broken")
    print(f"   • But don't expect miracles - roster quality still dominates")
else:
    print(f"   • Minimal or no benefit from coaching changes")
    print(f"   • Better to invest in player development, scouting, analytics")
    print(f"   • Only change coach if culture/relationship issues exist")

print("\n" + "="*70)
print("✓ Causal inference reveals the TRUE effect, beyond naive correlation")
print("="*70)

## 12. Summary: When to Use Each Method

| Method | Best Use Case | Key Assumption | Difficulty |
|--------|--------------|----------------|------------|
| **PSM** | Cross-sectional comparison | No unobserved confounders | ⭐⭐ Easy |
| **DiD** | Panel data with treatment timing | Parallel trends | ⭐⭐ Easy |
| **IV** | Endogenous treatment | Valid instrument | ⭐⭐⭐⭐ Hard |
| **RDD** | Sharp cutoff rule exists | No manipulation at cutoff | ⭐⭐⭐ Medium |

### Performance
All methods run in **<500ms** - suitable for interactive analysis.

### Next Steps
- Try survival analysis for career longevity (`notebooks/05_survival_analysis.ipynb`)
- Explore ensemble forecasting for playoff predictions
- See `docs/QUICK_REFERENCE.md` for all available methods

---

**Key Lesson:** Always think causally! Correlation is not causation. Use proper causal inference methods to make better decisions.