# NBA Timeout Effectiveness Analysis

## Project Overview

### Course Information
- **Course**: DSA 210 Introduction to Data Science
- **Term**: 2024-2025 Spring
- **Institution**: [Your University Name]

### Project Team
- **Lead Researcher**: Orkun Yucel
- **Contributors**: [Add any team members]

### Abstract
This project investigates how effective timeouts are in NBA games. Using exploratory analysis and a Random Forest classifier trained on game-state features (pre-timeout offensive efficiency, shooting percentages, score difference, quarter, and a timeout pressure index), we aim to quantify and predict the impact of timeouts on game momentum, providing insights for coaches and sports analysts.

### Research Objectives
1. Quantify the effectiveness of NBA timeouts
2. Identify key factors influencing timeout success
3. Develop a predictive model for timeout effectiveness

## 1. Data Preparation and Environment Setup

### 1.1 Required Libraries and Dependencies

```python
# Import essential libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Statistical and machine learning libraries
from scipy import stats
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (
    classification_report,
    confusion_matrix,
    roc_curve,
    roc_auc_score,
    precision_recall_curve,
    f1_score
)

# Visualization and styling
# The bare 'seaborn' style name was deprecated in Matplotlib 3.6 and later removed,
# so apply seaborn's own theme instead of plt.style.use('seaborn').
sns.set_theme(style='darkgrid')
plt.rcParams['figure.figsize'] = (16, 10)
plt.rcParams['font.size'] = 12
```

### 1.2 Data Loading and Initial Exploration

```python
# Load dataset
def load_and_preprocess_data(filepath='dsa project/ml_ready_timeout_data.csv'):
    """
    Load the NBA timeout effectiveness dataset, cast the target to int,
    and print a short summary of its size and class balance.
    """
    # Read the CSV file
    df = pd.read_csv(filepath)

    # Convert target variable to integer (0 = ineffective, 1 = effective)
    df['effective'] = df['effective'].astype(int)

    # Print dataset metadata (the column count includes the target)
    print("🏀 Dataset Metadata 🏀")
    print(f"Total Observations: {len(df)}")
    print(f"Number of Columns (features + target): {len(df.columns)}")

    # Display target variable distribution as proportions
    print("\n📊 Timeout Effectiveness Distribution:")
    effectiveness_dist = df['effective'].value_counts(normalize=True)
    print(effectiveness_dist)

    return df
```
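
Before any analysis, it can help to confirm that the loaded frame actually contains the columns the later cells rely on. The following is a minimal sketch, not part of the original pipeline: `validate_timeout_data` and `REQUIRED_COLUMNS` are hypothetical names, the column list is taken from the plotting code in this notebook, and the demo frame is synthetic.

```python
import pandas as pd

# Columns referenced by later cells in this notebook; adjust if the CSV schema differs.
REQUIRED_COLUMNS = [
    'effective', 'quarter', 'score_diff', 'pre_timeout_oe',
    'pre_timeout_fg_pct', 'pre_timeout_ts', 'timeout_pressure_index',
]


def validate_timeout_data(df, required=REQUIRED_COLUMNS):
    """Return basic data-quality findings (hypothetical helper)."""
    missing_cols = [c for c in required if c not in df.columns]
    present = [c for c in required if c in df.columns]
    null_counts = df[present].isna().sum()
    return {
        'missing_columns': missing_cols,
        'null_counts': null_counts[null_counts > 0].to_dict(),
        'duplicate_rows': int(df.duplicated().sum()),
        'target_is_binary': (
            'effective' in df.columns
            and set(df['effective'].dropna().unique()) <= {0, 1}
        ),
    }


# Tiny synthetic frame for illustration only (not project data)
demo = pd.DataFrame({
    'effective': [1, 0, 1],
    'quarter': [1, 2, 4],
    'score_diff': [-3, 5, None],
    'pre_timeout_oe': [0.9, 1.1, 1.0],
})
report = validate_timeout_data(demo)
print(report)
```

On the real dataset, the same call would be `validate_timeout_data(df)` right after `load_and_preprocess_data()`.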

## 2. Exploratory Data Analysis (EDA)

### 2.1 Descriptive Statistics

```python
import os

# Load data
df = load_and_preprocess_data()

# Detailed descriptive statistics
print("\n📈 Comprehensive Descriptive Statistics:")
desc_stats = df.describe()
print(desc_stats)

# Make sure the output directory exists before saving figures
os.makedirs('dsa project/outputs/figures', exist_ok=True)

# Feature correlation matrix
# numeric_only=True skips text columns, which raise an error in pandas >= 2.0
plt.figure(figsize=(20, 16))
correlation_matrix = df.corr(numeric_only=True)
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', linewidths=0.5, fmt='.2f', square=True)
plt.title('Feature Correlation Matrix', fontsize=16)
plt.tight_layout()
plt.savefig('dsa project/outputs/figures/correlation_matrix.png')
plt.close()
```
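
A full heatmap can be hard to read when there are many features. One complementary view, sketched below under the assumption that the target column is named `effective`, ranks numeric features by the absolute value of their correlation with the target (for a binary target this Pearson value is the point-biserial correlation). The helper name `rank_by_target_correlation` and the synthetic data are illustrative, not part of the original analysis.

```python
import numpy as np
import pandas as pd


def rank_by_target_correlation(df, target='effective'):
    """Return feature-target correlations, ordered by absolute value (hypothetical helper)."""
    numeric = df.select_dtypes(include='number')
    corr = numeric.corr()[target].drop(target)
    order = corr.abs().sort_values(ascending=False).index
    return corr.reindex(order)


# Synthetic example: one informative feature and one pure-noise feature
rng = np.random.default_rng(0)
n = 200
signal = rng.normal(size=n)
demo = pd.DataFrame({
    'pre_timeout_oe': signal,
    'score_diff': rng.normal(size=n),
    'effective': (signal + rng.normal(scale=0.5, size=n) > 0).astype(int),
})
ranking = rank_by_target_correlation(demo)
print(ranking)
```

Correlation only captures linear, pairwise association, so a low rank here does not rule out a feature being useful to a tree-based model.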

### 2.2 Comprehensive Visualization

```python
def create_comprehensive_visualizations(df):
    """
    Generate a 2x3 grid of plots for the timeout effectiveness analysis
    and save it to the figures directory.
    """
    # Set up the figure grid
    fig, axes = plt.subplots(2, 3, figsize=(24, 16))
    fig.suptitle('NBA Timeout Effectiveness: Multidimensional Analysis', fontsize=20, fontweight='bold')

    # 1. Timeout Effectiveness Distribution
    # Reindex to a fixed [0, 1] order so labels and colors always match the class,
    # regardless of which class is more frequent.
    effectiveness_counts = df['effective'].value_counts().reindex([0, 1], fill_value=0)
    axes[0, 0].bar(['Ineffective', 'Effective'],
                   effectiveness_counts.values,
                   color=['coral', 'teal'])
    axes[0, 0].set_title('Timeout Effectiveness Distribution')
    axes[0, 0].set_ylabel('Count')

    # 2. Box Plot of Key Features
    key_features = ['pre_timeout_oe', 'timeout_pressure_index']
    df_melted = df.melt(id_vars='effective', value_vars=key_features,
                        var_name='Feature', value_name='Value')
    sns.boxplot(x='Feature', y='Value', hue='effective',
                data=df_melted, ax=axes[0, 1], palette='Set2')
    axes[0, 1].set_title('Key Features by Timeout Effectiveness')

    # 3. Score Difference Distribution
    sns.violinplot(x='effective', y='score_diff', data=df,
                   ax=axes[0, 2], palette='Set2')
    axes[0, 2].set_title('Score Difference Distribution')

    # 4. Pre-Timeout Metrics Scatter
    scatter = axes[1, 0].scatter(df['pre_timeout_fg_pct'], df['pre_timeout_ts'],
                                 c=df['effective'], cmap='viridis', alpha=0.7)
    axes[1, 0].set_title('Pre-Timeout Field Goal vs True Shooting')
    axes[1, 0].set_xlabel('Pre-Timeout Field Goal %')
    axes[1, 0].set_ylabel('Pre-Timeout True Shooting %')
    plt.colorbar(scatter, ax=axes[1, 0], label='Effectiveness')

    # 5. Timeout Pressure Index Distribution
    sns.kdeplot(data=df, x='timeout_pressure_index', hue='effective',
                fill=True, common_norm=False, ax=axes[1, 1], palette='Set2')
    axes[1, 1].set_title('Timeout Pressure Index Distribution')

    # 6. Quarter-wise Effectiveness (share of effective timeouts per quarter)
    quarter_effectiveness = df.groupby('quarter')['effective'].mean()
    quarter_effectiveness.plot(kind='bar', ax=axes[1, 2])
    axes[1, 2].set_title('Timeout Effectiveness by Quarter')
    axes[1, 2].set_xlabel('Quarter')
    axes[1, 2].set_ylabel('Effectiveness Rate')

    plt.tight_layout()
    plt.savefig('dsa project/outputs/figures/comprehensive_analysis.png')
    plt.close()

# Generate visualizations
create_comprehensive_visualizations(df)
```
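
The box and violin plots show differences between effective and ineffective timeouts visually, but they do not say whether those differences could be due to chance. As one possible follow-up, the sketch below applies a two-sided Mann-Whitney U test from `scipy.stats` to a single feature. The helper `compare_groups` is a hypothetical name, the choice of test is an assumption (it avoids assuming normality), and the data are synthetic.

```python
import numpy as np
import pandas as pd
from scipy import stats


def compare_groups(df, feature, target='effective'):
    """Compare one feature between effective and ineffective timeouts (hypothetical helper)."""
    effective = df.loc[df[target] == 1, feature].dropna()
    ineffective = df.loc[df[target] == 0, feature].dropna()
    u_stat, p_value = stats.mannwhitneyu(effective, ineffective, alternative='two-sided')
    return {
        'feature': feature,
        'median_effective': float(effective.median()),
        'median_ineffective': float(ineffective.median()),
        'u_statistic': float(u_stat),
        'p_value': float(p_value),
    }


# Synthetic example: the effective group is shifted upward by one unit
rng = np.random.default_rng(1)
demo = pd.DataFrame({
    'effective': [1] * 100 + [0] * 100,
    'timeout_pressure_index': np.concatenate([
        rng.normal(loc=1.0, size=100),
        rng.normal(loc=0.0, size=100),
    ]),
})
result = compare_groups(demo, 'timeout_pressure_index')
print(result)
```

When several features are tested this way, the p-values should be corrected for multiple comparisons before drawing conclusions.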

## 3. Machine Learning Model

### 3.1 Model Training and Evaluation

```python
def train_timeout_effectiveness_model(df):
    """
    Train and evaluate a Random Forest Classifier for timeout effectiveness prediction.
    Returns the fitted model and a DataFrame of feature importances.
    """
    # Prepare features and target.
    # 'effective' is the target and 'efficiency_change' is kept out of the feature set;
    # only numeric columns are used so StandardScaler does not fail on text columns.
    excluded = ['effective', 'efficiency_change']
    features = [
        col for col in df.select_dtypes(include='number').columns
        if col not in excluded
    ]
    X = df[features]
    y = df['effective']

    # Split data (stratified to preserve the class balance in both sets)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=42, stratify=y
    )

    # Scale features (fit on the training set only to avoid leakage)
    scaler = StandardScaler()
    X_train_scaled = scaler.fit_transform(X_train)
    X_test_scaled = scaler.transform(X_test)

    # Train Random Forest
    rf_classifier = RandomForestClassifier(
        n_estimators=300, max_depth=10, random_state=42
    )
    rf_classifier.fit(X_train_scaled, y_train)

    # Predictions: hard labels and probability of the positive class
    y_pred = rf_classifier.predict(X_test_scaled)
    y_proba = rf_classifier.predict_proba(X_test_scaled)[:, 1]

    # Print model evaluation metrics
    print("\n🏀 Model Performance Metrics 🏀")
    print(classification_report(y_test, y_pred))

    # ROC Curve
    plt.figure(figsize=(10, 8))
    fpr, tpr, _ = roc_curve(y_test, y_proba)
    roc_auc = roc_auc_score(y_test, y_proba)

    plt.plot(fpr, tpr, color='darkorange', lw=2,
             label=f'ROC Curve (AUC = {roc_auc:.2f})')
    plt.plot([0, 1], [0, 1], color='navy', lw=2, linestyle='--')
    plt.xlim([0.0, 1.0])
    plt.ylim([0.0, 1.05])
    plt.xlabel('False Positive Rate')
    plt.ylabel('True Positive Rate')
    plt.title('Receiver Operating Characteristic (ROC) Curve')
    plt.legend(loc="lower right")
    plt.savefig('dsa project/outputs/figures/roc_curve.png')
    plt.close()

    # Feature Importance
    feature_importance = pd.DataFrame({
        'feature': features,
        'importance': rf_classifier.feature_importances_
    }).sort_values('importance', ascending=False)

    # Reverse the top 10 so the most important feature is drawn at the top of the chart
    top_features = feature_importance.head(10).iloc[::-1]
    plt.figure(figsize=(12, 8))
    plt.barh(top_features['feature'],
             top_features['importance'],
             color='steelblue')
    plt.title('Top 10 Features Predicting Timeout Effectiveness')
    plt.xlabel('Feature Importance')
    plt.tight_layout()
    plt.savefig('dsa project/outputs/figures/feature_importance.png')
    plt.close()

    return rf_classifier, feature_importance

# Train the model
model, feature_importance = train_timeout_effectiveness_model(df)
```
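
A single 70/30 split gives one estimate of performance, which can vary with the random seed. The notebook imports `cross_val_score` but does not use it, so the sketch below shows one way it could be applied: scaling and the classifier are wrapped in a pipeline so the scaler is refit inside every fold. The synthetic data from `make_classification`, the 5-fold setting, and the smaller `n_estimators` are assumptions chosen to keep the example self-contained and fast.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the timeout features and the 'effective' label
X_demo, y_demo = make_classification(
    n_samples=300, n_features=8, n_informative=4, random_state=42
)

# Putting the scaler inside the pipeline keeps each fold's test data
# out of the scaling statistics.
pipeline = make_pipeline(
    StandardScaler(),
    RandomForestClassifier(n_estimators=100, max_depth=10, random_state=42),
)

# Stratified folds preserve the class balance, mirroring stratify=y above
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
auc_scores = cross_val_score(pipeline, X_demo, y_demo, cv=cv, scoring='roc_auc')

print(f"ROC AUC per fold: {auc_scores.round(3)}")
print(f"Mean ROC AUC: {auc_scores.mean():.3f} (std {auc_scores.std():.3f})")
```

With the project data, `X_demo` and `y_demo` would be replaced by the feature matrix and `df['effective']`; the spread across folds indicates how stable the reported AUC is.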

## 4. Results and Insights

### 4.1 Research Insights
```python
# Summarize key research findings
def summarize_findings(df, feature_importance, model):
    """
    Compile and print comprehensive research insights
    """
    print("\n🔍 Comprehensive Research Insights 🔍")
    
    # Timeout Effectiveness Overview
    total_timeouts = len(df)
    effective_timeouts = df['effective'].sum()
    effectiveness_rate = effective_timeouts / total_timeouts * 100
    
    print(f"\n1. Timeout Effectiveness Metrics:")
    print(f"   - Total Timeouts Analyzed: {total_timeouts}")
    print(f"   - Effective Timeouts: {effective_timeouts}")
    print(f"   - Overall Effectiveness Rate: {effectiveness_rate:.2f}%")
    
    # Top Predictive Features
    print("\n2. Top 5 Predictive Features:")
    for i, (feature, importance) in enumerate(feature_importance.head().values, 1):
        print(f"   {i}. {feature}: {importance*100:.2f}%")
    
    # Contextual Insights
    print("\n3. Contextual Insights:")
    # Quarter-wise effectiveness
    quarter_effectiveness = df.groupby('quarter')['effective'].mean() * 100
    print("   Quarter-wise Effectiveness Rates:")
    for quarter, rate in quarter_effectiveness.items():
        print(f"      - Quarter {quarter}: {rate:.2f}%")

# Generate insights
summarize_findings(df, feature_importance, model)
```
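
The overall effectiveness rate printed above is a point estimate, and its reliability depends on how many timeouts were analyzed. One way to express that uncertainty is a Wilson score interval for a binomial proportion, sketched below. The function name `wilson_interval` is hypothetical, the 95% level (`z = 1.96`) is an assumption, and the counts in the example are made up for illustration rather than taken from the dataset.

```python
import math


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives about 95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p_hat = successes / total
    denom = 1 + z**2 / total
    center = (p_hat + z**2 / (2 * total)) / denom
    half_width = z * math.sqrt(
        p_hat * (1 - p_hat) / total + z**2 / (4 * total**2)
    ) / denom
    return center - half_width, center + half_width


# Made-up counts for illustration: 55 effective timeouts out of 100
low, high = wilson_interval(successes=55, total=100)
print(f"Effectiveness rate: 55.00% (95% CI {low * 100:.2f}% to {high * 100:.2f}%)")
```

Inside `summarize_findings`, the same function could be called with `effective_timeouts` and `total_timeouts`, and separately per quarter, where smaller counts produce wider intervals.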

## 5. Practical Implications and Recommendations

### 5.1 Recommendations for Coaches
1. **Strategic Timeout Deployment**
   - Prioritize timeouts during critical momentum shifts
   - Pay special attention to timeout strategies in early quarters

2. **Performance Indicators**
   - Monitor pre-timeout offensive efficiency
   - Consider timeout pressure index when making decisions

### 5.2 Limitations and Future Research
- **Data Limitations**
  - Analysis based on specific NBA seasons
  - Potential unaccounted game complexities

- **Future Research Directions**
  1. Expand dataset across more seasons
  2. Incorporate player-level performance data
  3. Develop real-time timeout effectiveness prediction tool

## 6. Conclusion
This research provides data-driven insights into NBA timeout effectiveness and into the role of strategic decision-making in basketball. Quantifying the impact of timeouts can help coaches make more informed decisions about when to disrupt opponent momentum and potentially influence game outcomes.

## 7. References
1. [List any academic or sports analytics references]

## 8. Appendix
- Detailed code repository available at: [GitHub Repository Link]
- Raw data and additional visualizations available upon request

## Project Metadata
- **Developed by**: Orkun Yucel
- **Date**: May 2025