# NBA Timeout Effect Analysis

This notebook analyzes the effect of timeouts on opponent scoring momentum in NBA games. We investigate whether timeouts effectively "stop the bleeding" by examining changes in offensive efficiency before and after a timeout is called during an opponent's scoring run.

## Research Question
Do timeouts called during an opponent's scoring run actually disrupt momentum by decreasing their offensive efficiency?

## Hypothesis
- **Null Hypothesis (H₀):** When the opponent team makes a scoring run and a timeout is called, the opponent's average offensive efficiency from the start of the period to the timeout equals its average offensive efficiency from the timeout to the end of the period.
- **Alternative Hypothesis (H₁):** Under the same scenario, the opponent's average offensive efficiency changes after a timeout is called.
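
Stated a little more formally, and assuming the per-timeout change is defined as post-timeout minus pre-timeout efficiency (this sign convention is an assumption, not something confirmed by the data collector), the two hypotheses can be written as:

$$
d_i = \mathrm{OE}^{\text{post}}_i - \mathrm{OE}^{\text{pre}}_i, \qquad
H_0:\ \mu_d = 0 \quad \text{vs.} \quad H_1:\ \mu_d \neq 0
$$

Here $\mu_d$ is the mean of $d_i$ over all timeouts called during an opponent run. The alternative is two-sided, so it covers both a drop and a rise in efficiency after the timeout.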

## 1. Setup and Imports

In [None]:
# Import required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
import os
import warnings

# Suppress warnings for clean output
warnings.filterwarnings('ignore')

# Create folder for figures if it doesn't exist
os.makedirs("outputs/figures", exist_ok=True)

## 2. Data Loading and Cleaning

First, we load the timeout analysis data that was collected using the `data_collector.py` script. This dataset contains information about timeouts called during opponent scoring runs, including offensive efficiency metrics before and after each timeout.
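
Before cleaning, it can help to confirm that the file has the columns the rest of the notebook relies on. The sketch below is one possible check; the column list is inferred from the cleaning and analysis code in this notebook, and `missing_columns` is a hypothetical helper, not part of `data_collector.py`.

```python
import pandas as pd

# Column names taken from the cleaning and analysis code in this notebook;
# whether every one of them is present in a given CSV is an assumption.
EXPECTED_COLUMNS = [
    "pre_timeout_oe", "post_timeout_oe", "efficiency_change",
    "effective", "run_terminated", "run_points", "score_diff",
]


def missing_columns(df, expected=EXPECTED_COLUMNS):
    """Return the expected columns that are absent from df (hypothetical helper)."""
    return [col for col in expected if col not in df.columns]


# Tiny made-up frame, for illustration only (not real game data)
demo = pd.DataFrame(
    {"pre_timeout_oe": [1.2], "post_timeout_oe": [0.9], "effective": [True]}
)
print(missing_columns(demo))
```

An empty list means every expected column is present; anything else names the columns that the later steps would silently skip or fail on.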

In [None]:
def load_and_clean_data(file_path):
    """Load and clean timeout analysis data"""
    
    if not os.path.exists(file_path):
        print(f"Error: Data file {file_path} not found")
        return None
    
    print(f"Loading data from {file_path}...")
    
    # Load data
    df = pd.read_csv(file_path)
    
    # Check for empty DataFrame
    if df.empty:
        print("Data file is empty!")
        return df
    
    print(f"Loaded {len(df)} timeout records")
    
    # Data cleaning
    # Convert boolean columns. Text values are compared explicitly, because
    # astype(bool) would turn any non-empty string (even "False") into True.
    for col in ['effective', 'run_terminated']:
        if col in df.columns and df[col].dtype != bool:
            if pd.api.types.is_numeric_dtype(df[col]):
                df[col] = df[col].astype(bool)
            else:
                df[col] = df[col].astype(str).str.strip().str.lower().isin(['true', '1'])
    
    # Handle missing values
    for col in df.columns:
        if df[col].dtype in [np.float64, np.int64]:
            missing = df[col].isna().sum()
            if missing > 0:
                print(f"Found {missing} missing values in '{col}' column, filled with zeros")
                df[col] = df[col].fillna(0)
    
    # Check and handle outliers
    for col in ['pre_timeout_oe', 'post_timeout_oe', 'efficiency_change']:
        if col in df.columns:
            mean = df[col].mean()
            std = df[col].std()
            outliers = df[(df[col] > mean + 3*std) | (df[col] < mean - 3*std)]
            if len(outliers) > 0:
                df[col] = df[col].clip(mean - 3*std, mean + 3*std)
    
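    # Create run size categories (left-closed bins so integer runs match the labels,
    # e.g. [6, 8) -> '6-7'; runs below 6 points stay uncategorized)
    if 'run_points' in df.columns and 'run_size_bin' not in df.columns:
        df['run_size_bin'] = pd.cut(
            df['run_points'],
            bins=[6, 8, 10, 12, 15, 20, 100],
            labels=['6-7', '8-9', '10-11', '12-14', '15-19', '20+'],
            right=False
        )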
    
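    # Create score difference bins. Half-integer edges keep integer margins in the
    # bin their label names (e.g. +5 -> 'Up 5-9') and give a tied score its own bin.
    if 'score_diff' in df.columns:
        df['score_situation'] = pd.cut(
            df['score_diff'],
            bins=[-100.5, -19.5, -9.5, -4.5, -0.5, 0.5, 4.5, 9.5, 19.5, 100.5],
            labels=['Down 20+', 'Down 10-19', 'Down 5-9', 'Down 1-4', 'Tied',
                   'Up 1-4', 'Up 5-9', 'Up 10-19', 'Up 20+']
        )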
    
    return df
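
The binning steps depend on how `pd.cut` treats bin edges: by default each interval is open on the left and closed on the right. The following sketch, using a handful of made-up run sizes, shows how that choice decides which label an integer value on an edge receives.

```python
import pandas as pd

run_points = pd.Series([6, 7, 8, 9, 10])  # illustrative values, not real data

# Default right=True: intervals are (6, 8] and (8, 10], so 8 lands in the
# first bin and 6 falls outside every bin (NaN)
right_closed = pd.cut(run_points, bins=[6, 8, 10], labels=["6-7", "8-9"])

# right=False: intervals are [6, 8) and [8, 10), which match the integer
# labels; the top edge (10) is now the value left outside
left_closed = pd.cut(run_points, bins=[6, 8, 10], labels=["6-7", "8-9"], right=False)

print(pd.DataFrame({
    "run_points": run_points,
    "right=True": right_closed,
    "right=False": left_closed,
}))
```

Values that fall outside every bin become `NaN` rather than raising an error, so it is worth counting missing categories after binning.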

In [None]:
# Prefer the full results file, then the partial one, then a copy in the working directory
if os.path.exists('outputs/timeout_analysis_results.csv'):
    file_path = 'outputs/timeout_analysis_results.csv'
elif os.path.exists('outputs/timeout_analysis_results_partial.csv'):
    file_path = 'outputs/timeout_analysis_results_partial.csv'
else:
    file_path = 'timeout_analysis_results.csv'
results_df = load_and_clean_data(file_path)

In [None]:
# Display first few rows
results_df.head()

In [None]:
# Display summary statistics
results_df.describe()

## 3. Statistical Analysis
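We now test whether timeouts significantly affect opponent offensive efficiency. The analysis reports the share of timeouts flagged as `effective` and runs a two-sided one-sample t-test of `efficiency_change` against zero at the 0.05 significance level.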
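
A one-sample t-test on the per-timeout change is the same test as a paired t-test on the pre- and post-timeout values. The sketch below illustrates this on synthetic numbers (not real game data), assuming the change is computed as post minus pre.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Synthetic efficiencies for illustration only (not real game data)
pre = rng.normal(1.10, 0.30, size=200)
post = rng.normal(1.05, 0.30, size=200)
change = post - pre  # assumed sign convention: post minus pre

# One-sample test on the differences vs. paired test on the two columns
t_one, p_one = stats.ttest_1samp(change, 0)
t_rel, p_rel = stats.ttest_rel(post, pre)

print(np.isclose(t_one, t_rel), np.isclose(p_one, p_rel))
```

The two tests should agree to floating-point precision, so either form answers the hypothesis stated at the top of the notebook.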

In [None]:
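def perform_statistical_analysis(df):
    """Summarize timeout effectiveness and test whether the mean efficiency change is zero."""
    analysis_results = {}

    # Share of timeouts flagged as effective (guard against an empty DataFrame)
    total = len(df)
    effective = int(df['effective'].sum())
    analysis_results['overall'] = {
        'total_timeouts': total,
        'effective_timeouts': effective,
        'effectiveness_rate': effective / total if total else float('nan'),
    }

    # Two-sided one-sample t-test of efficiency_change against 0 (alpha = 0.05)
    t_stat, p_val = stats.ttest_1samp(df['efficiency_change'], 0)
    analysis_results['t_test'] = {'t_statistic': t_stat, 'p_value': p_val, 'significant': bool(p_val < 0.05)}
    return analysis_results

analysis_results = perform_statistical_analysis(results_df)
analysis_results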
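
A p-value alone does not show how large the change is. As a complement, the sketch below computes a t-based confidence interval for the mean change; `mean_change_interval` is a hypothetical helper, and the input values are synthetic, for illustration only.

```python
import numpy as np
from scipy import stats


def mean_change_interval(changes, confidence=0.95):
    """t-based confidence interval for the mean change (hypothetical helper)."""
    changes = np.asarray(changes, dtype=float)
    changes = changes[~np.isnan(changes)]  # drop missing values
    n = changes.size
    mean = changes.mean()
    sem = stats.sem(changes)  # standard error of the mean (ddof=1)
    low, high = stats.t.interval(confidence, n - 1, loc=mean, scale=sem)
    return mean, low, high


# Synthetic values for illustration only (not real game data)
rng = np.random.default_rng(1)
demo_changes = rng.normal(-0.05, 0.40, size=150)

mean, low, high = mean_change_interval(demo_changes)
print(f"mean change = {mean:.3f}, 95% CI = [{low:.3f}, {high:.3f}]")
```

With the notebook's data this could be called as `mean_change_interval(results_df['efficiency_change'])`. A 95% interval that excludes zero corresponds to a two-sided t-test with p < 0.05.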