# Defensive Blitz Prediction - Proof of Concept

This notebook tests whether we can predict when NFL defenses will blitz with **statistical significance**.

**Key Question:** Can we predict defensive blitz plays better than just guessing the majority class?

**What we'll show:**
1. Load 106,796 real NFL plays
2. Build a baseline model (always predict "no blitz")
3. Build real models (Logistic Regression and Random Forest)
4. Compare accuracy, precision, recall
5. Visualize what features matter most
6. Show real predictions on actual game situations

## 1. Setup & Load Data

In [1]:
import sys
from pathlib import Path
sys.path.insert(0, str(Path.cwd().parent))

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (
    accuracy_score, precision_score, recall_score, f1_score,
    confusion_matrix, roc_auc_score, classification_report
)

# Style settings
sns.set_style('whitegrid')
plt.rcParams['figure.figsize'] = (12, 6)

print("✓ Libraries loaded")

✓ Libraries loaded


In [6]:
# Load the cleaned data
data_path = Path("../data/processed/blitz_data_cleaned.csv")
df = pd.read_csv(data_path)

print(f"Dataset loaded: {df.shape[0]:,} plays × {df.shape[1]} features")
print(f"\nTarget variable distribution:")
print(f"  No Blitz (0): {(df['blitz'] == 0).sum():,} plays ({(df['blitz'] == 0).sum() / len(df) * 100:.1f}%)")
print(f"  Blitz (1):    {(df['blitz'] == 1).sum():,} plays ({(df['blitz'] == 1).sum() / len(df) * 100:.1f}%)")
print(f"\nFeatures: {list(df.columns)}")

Dataset loaded: 106,796 plays × 12 features

Target variable distribution:
  No Blitz (0): 89,747 plays (84.0%)
  Blitz (1):    17,049 plays (16.0%)

Features: ['down', 'ydstogo', 'yardline_100', 'quarter', 'game_seconds_remaining', 'score_differential', 'offense_personnel', 'defense_personnel', 'formation', 'shotgun', 'motion', 'blitz']


In [7]:
# Handle remaining NaN values
initial_rows = len(df)
df = df.dropna()
rows_dropped = initial_rows - len(df)

print(f"✓ Dropped {rows_dropped:,} rows with NaN values")
print(f"  Dataset now: {len(df):,} plays × {df.shape[1]} features")

✓ Dropped 411 rows with NaN values
  Dataset now: 106,385 plays × 12 features


## 2. Prepare Data for Modeling

In [8]:
# Separate target and features
y = df['blitz']
X = df.drop('blitz', axis=1)

# Encode categorical variables
categorical_cols = X.select_dtypes(include=['object']).columns
label_encoders = {}

for col in categorical_cols:
    le = LabelEncoder()
    X[col] = le.fit_transform(X[col].astype(str))
    label_encoders[col] = le

print(f"✓ Encoded {len(categorical_cols)} categorical features: {list(categorical_cols)}")

# Split data: 80% train, 20% test
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

print(f"\nTrain/Test Split:")
print(f"  Training set: {len(X_train):,} plays")
print(f"  Test set:     {len(X_test):,} plays")

✓ Encoded 3 categorical features: ['offense_personnel', 'defense_personnel', 'formation']

Train/Test Split:
  Training set: 85,108 plays
  Test set:     21,277 plays


## 3. Baseline Model (The Dumb Approach)

**What is the baseline?** A model that always predicts the majority class ("no blitz").

Since 84% of plays are no-blitz, a dumb model that always guesses "no blitz" gets 84% accuracy.

**Our goal:** Beat this baseline.
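
The 84% figure follows directly from the class counts printed when the data was loaded. A quick arithmetic check, using those counts as plain numbers rather than the DataFrame (the variable names are only for illustration):

```python
# Class counts as printed by the data-loading cell (before NaN rows are dropped).
no_blitz_plays = 89_747
blitz_plays = 17_049
total_plays = no_blitz_plays + blitz_plays

# A classifier that always answers "no blitz" is right exactly on the no-blitz plays.
majority_accuracy = no_blitz_plays / total_plays

# It never predicts blitz, so it catches none of the actual blitz plays.
majority_recall = 0 / blitz_plays

print(f"Majority-class accuracy: {majority_accuracy:.1%}")
print(f"Majority-class recall:   {majority_recall:.1%}")
```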

In [9]:
# Baseline: always predict the majority class (0 = no blitz)
y_test_baseline = np.zeros_like(y_test)

baseline_accuracy = accuracy_score(y_test, y_test_baseline)
baseline_precision = precision_score(y_test, y_test_baseline, zero_division=0)
baseline_recall = recall_score(y_test, y_test_baseline, zero_division=0)

print("BASELINE MODEL: Always predict 'No Blitz'")
print("=" * 50)
print(f"Accuracy:  {baseline_accuracy:.1%}")
print(f"Precision: {baseline_precision:.1%} (undefined - never predicts blitz; reported as 0)")
print(f"Recall:    {baseline_recall:.1%} (catches no blitz plays)")
print("\n→ This is our floor. Our real model must beat this.")

BASELINE MODEL: Always predict 'No Blitz'
Accuracy:  84.0%
Precision: 0.0% (undefined - never predicts blitz; reported as 0)
Recall:    0.0% (catches no blitz plays)

→ This is our floor. Our real model must beat this.


## 4. Train Models

We'll train two models:
1. **Logistic Regression** - simple, interpretable
2. **Random Forest** - more complex, usually more accurate

In [10]:
# Train Logistic Regression
print("Training Logistic Regression...")
lr_model = LogisticRegression(max_iter=1000, random_state=42)
lr_model.fit(X_train, y_train)
y_pred_lr = lr_model.predict(X_test)
y_pred_proba_lr = lr_model.predict_proba(X_test)[:, 1]

print("✓ Logistic Regression trained")

Training Logistic Regression...
✓ Logistic Regression trained


STOP: TOTAL NO. OF ITERATIONS REACHED LIMIT

Increase the number of iterations to improve the convergence (max_iter=1000).
You might also want to scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
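
The warning above means the solver stopped at `max_iter=1000` without converging, and it suggests scaling the data. A minimal sketch of that suggestion, assuming a `StandardScaler` placed in front of the model in a pipeline; the data here is synthetic and the names `X_toy`, `y_toy`, and `scaled_lr` are hypothetical:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the play features: one column on a 0-3600 scale
# (like game_seconds_remaining) and one on a 1-4 scale (like down).
rng = np.random.default_rng(42)
seconds = rng.uniform(0, 3600, size=500)
down = rng.integers(1, 5, size=500)
X_toy = np.column_stack([seconds, down])
y_toy = (down >= 3).astype(int)  # made-up rule, only so the model has something to learn

# StandardScaler is fit inside the pipeline, so it only sees the data passed to fit().
scaled_lr = make_pipeline(
    StandardScaler(),
    LogisticRegression(max_iter=1000, random_state=42),
)
scaled_lr.fit(X_toy, y_toy)

iterations = scaled_lr.named_steps["logisticregression"].n_iter_[0]
print(f"Solver iterations used: {iterations} (limit 1000)")
```

In the notebook the same pipeline could be fit on `X_train` and `y_train` in place of the bare `LogisticRegression`; whether it removes the warning on the real data would need to be confirmed by running it.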


In [None]:
# Train Random Forest
print("Training Random Forest...")
rf_model = RandomForestClassifier(n_estimators=100, random_state=42, n_jobs=-1)
rf_model.fit(X_train, y_train)
y_pred_rf = rf_model.predict(X_test)
y_pred_proba_rf = rf_model.predict_proba(X_test)[:, 1]

print("✓ Random Forest trained")

## 5. Compare Models

In [None]:
# Calculate metrics for both models
lr_accuracy = accuracy_score(y_test, y_pred_lr)
lr_precision = precision_score(y_test, y_pred_lr, zero_division=0)
lr_recall = recall_score(y_test, y_pred_lr, zero_division=0)
lr_f1 = f1_score(y_test, y_pred_lr, zero_division=0)
lr_auc = roc_auc_score(y_test, y_pred_proba_lr)

rf_accuracy = accuracy_score(y_test, y_pred_rf)
rf_precision = precision_score(y_test, y_pred_rf, zero_division=0)
rf_recall = recall_score(y_test, y_pred_rf, zero_division=0)
rf_f1 = f1_score(y_test, y_pred_rf, zero_division=0)
rf_auc = roc_auc_score(y_test, y_pred_proba_rf)

# Build comparison table
comparison = pd.DataFrame({
    'Model': ['Baseline', 'Logistic Regression', 'Random Forest'],
    'Accuracy': [baseline_accuracy, lr_accuracy, rf_accuracy],
    'Precision': [0, lr_precision, rf_precision],
    'Recall': [0, lr_recall, rf_recall],
    'F1 Score': [0, lr_f1, rf_f1],
    'ROC-AUC': [0.5, lr_auc, rf_auc]
})

print("\nMODEL COMPARISON")
print("=" * 80)
print(comparison.to_string(index=False))
print("\n📊 What these metrics mean:")
print("  - Accuracy: Overall correctness (but misleading with imbalanced data)")
print("  - Precision: Of predicted blitz, how many actually were blitz?")
print("  - Recall: Of actual blitz plays, how many did we catch?")
print("  - F1 Score: Harmonic mean of precision and recall")
print("  - ROC-AUC: How well does the model rank blitz vs no-blitz?")
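
To make the metric definitions concrete, here is a worked example computed from confusion-matrix counts. The counts are hypothetical and chosen for round numbers; they are not results from this notebook.

```python
# Hypothetical confusion-matrix counts for 1,000 plays (not this notebook's results).
true_pos, false_pos = 60, 20    # predicted blitz: right / wrong
false_neg, true_neg = 40, 880   # predicted no blitz: wrong / right

total = true_pos + false_pos + false_neg + true_neg
accuracy = (true_pos + true_neg) / total
precision = true_pos / (true_pos + false_pos)
recall = true_pos / (true_pos + false_neg)
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy={accuracy:.3f} precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

In this example accuracy is 94% even though 40 of the 100 blitz plays are missed, which is why accuracy alone can mislead on imbalanced data.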

In [None]:
# Visualize comparison
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Accuracy comparison
ax1 = axes[0]
models = ['Baseline', 'Logistic\nRegression', 'Random\nForest']
accuracies = [baseline_accuracy, lr_accuracy, rf_accuracy]
colors = ['#ff9999', '#ffcc99', '#99cc99']

bars1 = ax1.bar(models, accuracies, color=colors, edgecolor='black', linewidth=2)
ax1.axhline(y=baseline_accuracy, color='red', linestyle='--', linewidth=2, label='Baseline')
ax1.set_ylabel('Accuracy', fontsize=12, fontweight='bold')
ax1.set_title('Model Accuracy Comparison', fontsize=14, fontweight='bold')
ax1.set_ylim([0.7, 0.95])
for i, v in enumerate(accuracies):
    ax1.text(i, v + 0.01, f'{v:.1%}', ha='center', fontweight='bold')

# Precision vs Recall
ax2 = axes[1]
precision_vals = [lr_precision, rf_precision]
recall_vals = [lr_recall, rf_recall]
model_names = ['Logistic Regression', 'Random Forest']

x_pos = np.arange(len(model_names))
width = 0.35

bars2 = ax2.bar(x_pos - width/2, precision_vals, width, label='Precision', color='#99ccff', edgecolor='black', linewidth=1.5)
bars3 = ax2.bar(x_pos + width/2, recall_vals, width, label='Recall', color='#cc99ff', edgecolor='black', linewidth=1.5)

ax2.set_ylabel('Score', fontsize=12, fontweight='bold')
ax2.set_title('Precision vs Recall (Blitz Detection)', fontsize=14, fontweight='bold')
ax2.set_xticks(x_pos)
ax2.set_xticklabels(model_names)
ax2.legend(fontsize=11)
ax2.set_ylim([0, 1])

for i, (p, r) in enumerate(zip(precision_vals, recall_vals)):
    ax2.text(i - width/2, p + 0.02, f'{p:.1%}', ha='center', fontweight='bold', fontsize=10)
    ax2.text(i + width/2, r + 0.02, f'{r:.1%}', ha='center', fontweight='bold', fontsize=10)

plt.tight_layout()
plt.show()

print("✓ Visualization complete")

## 6. Feature Importance (What Actually Predicts Blitz?)

**Question:** Which of our 11 features actually matter for predicting blitz?

In [None]:
# Get feature importance from Random Forest
feature_importance = pd.DataFrame({
    'Feature': X.columns,
    'Importance': rf_model.feature_importances_
}).sort_values('Importance', ascending=False)

print("\nFEATURE IMPORTANCE (Random Forest)")
print("=" * 50)
for _, row in feature_importance.iterrows():
    print(f"{row['Feature']:.<25} {row['Importance']:.4f} {'█' * int(row['Importance'] * 100)}")

print("\n📍 Key Insight: These features drive blitz predictions:")
# Number by rank; the DataFrame index still holds each feature's original column position
for rank, feature in enumerate(feature_importance['Feature'].head(5), start=1):
    print(f"  {rank}. {feature}")

In [None]:
# Visualize feature importance
fig, ax = plt.subplots(figsize=(10, 6))

top_features = feature_importance.head(10)
colors_importance = plt.cm.viridis(np.linspace(0.3, 0.9, len(top_features)))

bars = ax.barh(range(len(top_features)), top_features['Importance'].values, color=colors_importance, edgecolor='black', linewidth=1.5)
ax.set_yticks(range(len(top_features)))
ax.set_yticklabels(top_features['Feature'].values)
ax.set_xlabel('Importance Score', fontsize=12, fontweight='bold')
ax.set_title('Top 10 Features Predicting Blitz Defense', fontsize=14, fontweight='bold')
ax.invert_yaxis()

for i, v in enumerate(top_features['Importance'].values):
    ax.text(v + 0.005, i, f'{v:.4f}', va='center', fontweight='bold')

plt.tight_layout()
plt.show()

print("✓ Feature importance visualization complete")
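
`feature_importances_` is impurity-based and computed on the training data, and it can overstate features with many unique values (here, columns such as `game_seconds_remaining` are candidates). Permutation importance on held-out data is one way to cross-check the ranking. The sketch below runs on synthetic data with hypothetical names (`signal`, `noise`, `forest`, `perm`) and is not the notebook's method.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: one binary feature that drives the label and one
# continuous feature (many unique values) that is pure noise.
rng = np.random.default_rng(0)
n_plays = 1000
signal = rng.integers(0, 2, size=n_plays)
noise = rng.uniform(0, 3600, size=n_plays)
flipped = rng.random(n_plays) < 0.2            # 20% label noise
y_toy = np.where(flipped, 1 - signal, signal)
X_toy = np.column_stack([signal, noise])

X_tr, X_te, y_tr, y_te = train_test_split(X_toy, y_toy, test_size=0.25, random_state=0)
forest = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_tr, y_tr)

# Shuffle one column at a time on held-out data and measure the drop in accuracy.
perm = permutation_importance(forest, X_te, y_te, n_repeats=10, random_state=0)

for name, impurity, permuted in zip(
    ["signal", "noise"], forest.feature_importances_, perm.importances_mean
):
    print(f"{name:<8} impurity={impurity:.3f}  permutation={permuted:.3f}")
```

Applied to this notebook, the call would be `permutation_importance(rf_model, X_test, y_test, ...)`; if the two rankings disagree, the permutation ranking is usually the safer one to report.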

## 7. Confusion Matrix (What Are We Getting Right/Wrong?)

In [None]:
# Calculate confusion matrix for Random Forest
cm = confusion_matrix(y_test, y_pred_rf)

print("\nCONFUSION MATRIX (Random Forest on Test Set)")
print("=" * 50)
print(f"\n  True Negatives:  {cm[0,0]:>6,}  (correctly predicted NO BLITZ)")
print(f"  False Positives: {cm[0,1]:>6,}  (predicted BLITZ but wasn't)")
print(f"  False Negatives: {cm[1,0]:>6,}  (missed actual BLITZ)")
print(f"  True Positives:  {cm[1,1]:>6,}  (correctly predicted BLITZ)")

# Calculate rates
true_neg_rate = cm[0,0] / (cm[0,0] + cm[0,1])
false_pos_rate = cm[0,1] / (cm[0,0] + cm[0,1])
false_neg_rate = cm[1,0] / (cm[1,0] + cm[1,1])
true_pos_rate = cm[1,1] / (cm[1,0] + cm[1,1])

print(f"\n  True Neg Rate:   {true_neg_rate:.1%}  (correctly identified NO BLITZ)")
print(f"  False Pos Rate:  {false_pos_rate:.1%}  (false alarms)")
print(f"  False Neg Rate:  {false_neg_rate:.1%}  (missed blitzes)")
print(f"  True Pos Rate:   {true_pos_rate:.1%}  (correctly identified BLITZ)")
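
The rates above come from `predict`, which for a binary problem effectively labels a play as blitz only when the predicted probability exceeds 0.5. With blitz at roughly 16% of plays, a lower cutoff on `y_pred_proba_rf` may trade some precision for recall. A small sketch of that trade-off, using made-up probabilities and a hypothetical helper `precision_recall_at`:

```python
def precision_recall_at(threshold, probabilities, labels):
    """Precision and recall for the positive class at a given probability cutoff."""
    predicted = [p >= threshold for p in probabilities]
    tp = sum(1 for pred, y in zip(predicted, labels) if pred and y == 1)
    fp = sum(1 for pred, y in zip(predicted, labels) if pred and y == 0)
    fn = sum(1 for pred, y in zip(predicted, labels) if not pred and y == 1)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Made-up blitz probabilities and outcomes for eight plays.
toy_proba = [0.05, 0.10, 0.20, 0.30, 0.35, 0.45, 0.55, 0.70]
toy_labels = [0, 0, 0, 1, 0, 0, 1, 1]

for threshold in (0.5, 0.3):
    precision, recall = precision_recall_at(threshold, toy_proba, toy_labels)
    print(f"threshold={threshold:.1f}  precision={precision:.2f}  recall={recall:.2f}")
```

Any cutoff chosen this way should be selected on a validation split rather than the test set, so the reported test metrics stay unbiased.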

In [None]:
# Visualize confusion matrix
fig, ax = plt.subplots(figsize=(8, 7))

sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', cbar=False,
            xticklabels=['No Blitz', 'Blitz'],
            yticklabels=['No Blitz', 'Blitz'],
            annot_kws={'size': 14, 'weight': 'bold'},
            ax=ax)

ax.set_ylabel('Actual', fontsize=12, fontweight='bold')
ax.set_xlabel('Predicted', fontsize=12, fontweight='bold')
ax.set_title('Confusion Matrix: Random Forest Model', fontsize=14, fontweight='bold')

plt.tight_layout()
plt.show()

print("✓ Confusion matrix visualization complete")

## 8. Real Predictions on Actual Game Situations

Let's look at real plays from our test set and see what our model predicted.

In [None]:
# Get some random test samples
np.random.seed(42)
sample_indices = np.random.choice(len(X_test), 5, replace=False)

print("\nREAL PREDICTION EXAMPLES")
print("=" * 80)

correct_count = 0

for i, idx in enumerate(sample_indices, 1):
    actual = y_test.iloc[idx]
    pred = y_pred_rf[idx]
    blitz_proba = y_pred_proba_rf[idx]
    # predict_proba gives P(blitz); confidence in a "no blitz" call is its complement
    confidence = blitz_proba if pred == 1 else 1 - blitz_proba
    is_correct = actual == pred
    correct_count += int(is_correct)

    # Get feature values for this play
    play_features = X_test.iloc[idx]

    actual_label = "🔴 BLITZ" if actual == 1 else "🟢 NO BLITZ"
    pred_label = "🔴 BLITZ" if pred == 1 else "🟢 NO BLITZ"
    result = "✅ CORRECT" if is_correct else "❌ WRONG"

    print(f"\nPlay #{i}: {result}")
    print("-" * 80)
    print(f"  Predicted: {pred_label} ({confidence:.1%} confidence)")
    print(f"  Actual:    {actual_label}")
    print(f"\n  Game Situation:")
    print(f"    • Down: {int(play_features['down'])}")
    print(f"    • Yards to Go: {int(play_features['ydstogo'])}")
    print(f"    • Yardline: {int(play_features['yardline_100'])}")
    print(f"    • Quarter: {int(play_features['quarter'])}")
    print(f"    • Score Differential: {int(play_features['score_differential'])}")

print(f"\n{'=' * 80}")
print(f"\nAccuracy on these 5 examples: {correct_count}/5 ({correct_count/5:.0%})")

## 9. Final Verdict: Proof of Concept

**Can we predict defensive blitz with statistical significance?**

In [None]:
print("\n" + "=" * 80)
print("PROOF OF CONCEPT SUMMARY")
print("=" * 80)

improvement = rf_accuracy - baseline_accuracy

print(f"\n📊 METRICS COMPARISON:")
print(f"  Baseline (always predict no-blitz):")
print(f"    • Accuracy: {baseline_accuracy:.1%}")
print(f"\n  Best Model (Random Forest):")
print(f"    • Accuracy: {rf_accuracy:.1%}")
print(f"    • Precision: {rf_precision:.1%} (of predicted blitz, this % are correct)")
print(f"    • Recall: {rf_recall:.1%} (we catch this % of actual blitz plays)")
print(f"    • F1 Score: {rf_f1:.3f}")
print(f"    • ROC-AUC: {rf_auc:.3f}")

print(f"\n🎯 IMPROVEMENT:")
print(f"  {improvement:+.1%} over baseline")

print(f"\n🔑 KEY FINDINGS:")
print(f"  1. Top 3 predictors: {', '.join(feature_importance.head(3)['Feature'].tolist())}")
print(f"  2. We catch {rf_recall:.0%} of actual blitz plays")
print(f"  3. When we predict blitz, we're right {rf_precision:.0%} of the time")

poc_success = (improvement > 0.05) and (rf_recall > 0.60)

print(f"\n{'=' * 80}")
if poc_success:
    print("\n✅ PROOF OF CONCEPT: SUCCESSFUL")
    print(f"\nWe have successfully demonstrated that:")
    print(f"  • Our model beats the baseline by {improvement:.1%}")
    print(f"  • We can identify defensive blitz plays with {rf_recall:.0%} recall")
    print(f"  • Game situation features (down, yards, field position) matter")
    print(f"\nThis is ready for next phase: production deployment & real-time prediction")
else:
    print("\n⚠️ PROOF OF CONCEPT: NEEDS REFINEMENT")
    print(f"\nModel shows promise but needs:")
    print(f"  • Better feature engineering")
    print(f"  • Hyperparameter tuning")
    print(f"  • Additional data or features")

print(f"\n{'=' * 80}")
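
The summary compares accuracies against fixed thresholds but does not run a significance test. One option, offered here as an assumption rather than the notebook's method, is an exact McNemar test on the plays where the model and the baseline disagree. Because the baseline always predicts "no blitz", the model is right where the baseline is wrong exactly on true positives (`cm[1,1]`), and wrong where the baseline is right exactly on false positives (`cm[0,1]`). The function name `mcnemar_exact_p` and the counts below are hypothetical.

```python
from math import comb

def mcnemar_exact_p(model_only_correct: int, baseline_only_correct: int) -> float:
    """Two-sided exact McNemar p-value from the two discordant counts."""
    n = model_only_correct + baseline_only_correct
    if n == 0:
        return 1.0
    k = min(model_only_correct, baseline_only_correct)
    # P(X <= k) for X ~ Binomial(n, 0.5), computed with exact integers.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2**n
    return min(1.0, 2 * tail)

# Hypothetical counts, not results from this notebook.
true_positives = 300   # model right, baseline wrong
false_positives = 220  # model wrong, baseline right
p_value = mcnemar_exact_p(true_positives, false_positives)
print(f"p = {p_value:.4g}")
```

With the real confusion matrix, the call would be `mcnemar_exact_p(cm[1,1], cm[0,1])`. The same identity shows that the Random Forest beats the baseline on accuracy only if its true positives outnumber its false positives.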

## Next Steps

If POC is successful:
1. ✅ Deploy model as API endpoint
2. ✅ Create real-time prediction dashboard
3. ✅ Monitor model performance over time
4. ✅ Collect feedback from coaches/analysts
5. ✅ Iterate on features and model

If POC needs work:
1. 🔍 Analyze failure cases
2. 🔍 Engineer new features
3. 🔍 Try different models/hyperparameters
4. 🔍 Get more data
5. 🔍 Re-test