# Decision-Making Under Pressure: Evidence from the National Football League

## Abstract

How does time pressure influence the quality of human decision-making? This study leverages detailed play-by-play data from the 2024 National Football League (NFL) season to model how decision outcomes vary under different levels of temporal constraint, situational stress, and risk. Using machine learning models trained on variables such as time_to_throw, quarter_seconds_remaining, score_differential, down, ydstogo, and was_pressure, we estimate the probability of a "successful" play—defined by positive expected points added (EPA) or first-down conversion—conditional on the decision context and available time.

Our central research question asks: To what extent does decision quality degrade or improve as available decision time decreases, and under what conditions do individuals or organizations perform optimally under stress? By examining NFL play-calling and execution as a repeated, high-stakes, time-constrained decision environment, we aim to isolate the behavioral and structural determinants of performance under pressure.
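
The binary notion of a "successful" play described above (positive EPA or a first-down conversion) can be sketched as follows. This is a minimal illustration on toy data, assuming nflfastR-style column names `epa` and `first_down`; it is not the outcome variable built later in this notebook.

```python
import numpy as np
import pandas as pd

# Toy plays; the column names are assumptions that follow nflfastR conventions.
plays = pd.DataFrame({
    "epa": [0.8, -0.4, -0.1, np.nan],
    "first_down": [0, 0, 1, 0],
})

# A play counts as successful if EPA is positive OR it converted a first down.
# A missing EPA compares as False, so such rows depend on first_down alone.
plays["success"] = ((plays["epa"] > 0) | (plays["first_down"] == 1)).astype(int)
print(plays)
```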

## Policy Relevance

The findings will inform public policy domains where rapid, high-impact decision-making is critical—such as emergency response, military command, public safety, and crisis management. Policymakers often assume that training or hierarchical command can offset stress-induced performance degradation, but evidence on this relationship remains limited and context-dependent.

For instance:

- Emergency management agencies could use insights from this study to calibrate training scenarios that simulate the optimal level of time stress, improving coordination under pressure.

- Law enforcement and defense institutions could refine standard operating procedures to allow decision autonomy or time buffers when reaction time is linked to error rates.

- Fiscal policy planners could apply these findings to understand how decision compression near budget deadlines affects accuracy or risk aversion.

In practical terms, the study provides a behavioral and data-driven framework for designing systems that preserve decision quality when time is scarce—a persistent governance challenge from disaster response to financial regulation.

## Theoretical and Disciplinary Contribution

Within the broader social science and behavioral decision-making literature, this research advances the empirical study of bounded rationality and performance under time constraints. Prior work on time pressure often relies on laboratory experiments with limited ecological validity. By contrast, this project introduces a novel, real-world, high-frequency dataset that captures thousands of independent decision episodes—each with quantifiable outcomes, risk levels, and contextual stressors.

It contributes to the literature on:

- **Behavioral economics:** testing how stress and framing affect risk preferences.

- **Organizational behavior:** modeling how teams coordinate information and execute plans under compressed timelines.

- **Public administration:** translating insights from competitive sport into the design of resilient, high-performance bureaucracies.

In doing so, the project situates sports analytics as a powerful natural laboratory for understanding institutional decision processes at scale.

## Why NFL-Level Data Is Uniquely Suited

NFL play-by-play data provides a uniquely rich empirical context to study decision-making under pressure:

- **High frequency of structured, consequential decisions** — each play represents a controlled decision with clear inputs, timing, and outcomes.

- **Quantifiable stress indicators** — the data contain proxies for time pressure (quarter_seconds_remaining, score_differential, down), situational difficulty (ydstogo, defense_personnel), and physical/psychological stressors (was_pressure, temperature, crowd noise where available).

- **Repeated trials under consistent rules** — thousands of similar decisions allow for causal inference and model training across standardized conditions.

- **Observable outcomes** — each decision has an objective result (EPA, success/failure), enabling direct measurement of decision quality.

- **Human and organizational parallels** — NFL coaching and player decision-making occur within complex hierarchies, teamwork constraints, and real-time communication systems analogous to public institutions under stress.

Thus, professional football offers a rare "natural experiment" for modeling decision-making dynamics that are otherwise difficult to observe in real policy settings—where stakes are high, conditions are dynamic, and data are scarce.

## In Summary

This project bridges sports analytics and behavioral public policy, demonstrating how machine learning applied to NFL decision data can generate actionable insights for improving human and institutional performance under stress. By quantifying the relationship between time pressure and decision quality, the study contributes both to academic theory and to the practical design of systems that safeguard judgment in moments when it matters most.

# Predictor Variables (Features)

We can categorize predictors into five conceptual groups: temporal/pressure, situational context, decision characteristics, risk and aggression, and environmental factors.

## A. Temporal / Pressure Features

Capture how much time or situational stress was present:

- `quarter_seconds_remaining`
- `half_seconds_remaining`
- `game_seconds_remaining`
- `score_differential`
- `quarter` (proxy for rising time pressure)
- `timeout_team` or number of `timeouts_remaining`
- `time_to_throw` (for QB decisions)
- `play_clock` (if available)
- `crowd_noise` (can be appended from stadium / attendance data)

**Interpretation:** These variables stand in for time pressure, analogous to the way decision quality in a crisis can deteriorate as the available time shrinks.

## B. Situational Context Features

Reflect the strategic environment or difficulty:

- `down`
- `ydstogo`
- `yardline_100` (distance from end zone)
- `goal_to_go`
- `posteam` vs. `defteam` strength indicators (can be appended from team EPA averages)
- `score_differential_post`

**Interpretation:** How situational constraints interact with decision pressure.

## C. Decision Characteristics

What kind of action was chosen:

- `play_type` (categorical: pass, run, field_goal, etc.)
- `qb_dropback`, `shotgun`, `no_huddle`
- `pass_length`, `pass_location`, `run_gap`, `run_location`
- `offense_personnel` (number of RBs, TEs, WRs on field)
- `defense_personnel` or `number_of_pass_rushers`

**Interpretation:** The structure of the decision — strategy under stress.

## D. Risk and Aggression Indicators

Proxies for risk appetite:

- `air_yards` (longer passes = riskier)
- `fg_prob`, `td_prob`, `no_score_prob`
- `fourth_down_converted` / `fourth_down_failed` (rare risk decisions)
- `xpass` and `pass_oe` (model-based pass probability and over-expectation)

**Interpretation:** How risk preferences evolve under pressure (risk aversion vs. overconfidence).

## E. Environmental / Exogenous Variables

Contextual conditions outside decision-maker control:

- `temp`, `wind`, `roof`, `surface`, `weather`
- `home_team` / `away_team` (home-field stress factor)
- `stadium` or `stadium_id` (for fixed effects)
- `week` (season progression, fatigue)

**Interpretation:** How external conditions moderate performance under stress.



# What the Predictors Represent in Football Terms

The model's predictors might look football-specific, but conceptually they map cleanly onto universal features of human decision-making:

| Variable | Football Meaning | Decision Science Analogue | Policy Analogue |
|----------|------------------|---------------------------|-----------------|
| `time_to_throw` | Time (in seconds)<br>QB takes before<br>making a decision | Decision latency:<br>time pressure vs<br>deliberation | Time available before<br>making a judgment —<br>e.g. emergency responder<br>deciding whether to evacuate |
| `quarter_seconds_remaining` | Time remaining<br>in the quarter | Temporal scarcity | How deadline proximity<br>affects risk tolerance<br>(budget cycles,<br>crisis escalation) |
| `score_differential` | Margin between<br>teams | Performance feedback /<br>situational stress | How perceived success<br>or failure affects risk<br>behavior (fiscal surpluses<br>vs deficits) |
| `down` and `ydstogo` | Options and<br>constraints on<br>next decision | Constraint<br>complexity | How limited options<br>(few resources, limited<br>tools) shape policy<br>decisions |
| `was_pressure` | Binary indicator<br>of defensive<br>pressure | Acute stress<br>event | External shocks<br>(market crash, natural<br>disaster, hostile event)<br>that reduce cognitive<br>bandwidth |
| `pass_length`, `air_yards` | Aggressiveness<br>of decision | Risk appetite | How decision-makers<br>adjust between conservative<br>vs bold actions under<br>uncertainty |
| `home_away` | Environment | Familiarity /<br>environmental<br>comfort | Bureaucrats working<br>in home vs foreign<br>context; institutional<br>familiarity |
| `temperature`, `wind`, `roof` | Physical<br>conditions | Environmental<br>stressors | Fatigue, heat, or<br>information overload<br>that degrades<br>performance |

These predictors — while drawn from the NFL — are really structured proxies for the general dimensions that drive human performance under pressure: time pressure, environmental stress, feedback loops, resource constraints, and risk-taking vs. conservatism.

---

# What Policymakers Could Learn from the Model

The insight isn't how football players behave, but how humans make complex, time-bounded, feedback-driven decisions when stakes are high.

## A. Understanding the "decision-speed vs. quality" trade-off

If the model finds that shorter `time_to_throw` correlates with lower EPA/WPA only under certain stress conditions (e.g., trailing by a small margin), it mirrors findings in crisis response: faster isn't always better — speed improves outcomes up to a point, then sharply degrades quality.

**Policy takeaway:**
- In emergency operations, build in a few structured seconds for deliberation when possible.
- Overly rigid time targets (e.g., "decide within 30 seconds") can lead to worse outcomes.
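
One way to look for the speed-versus-quality pattern described above is to bin `time_to_throw` and compare mean EPA across bins, separately for trailing and non-trailing situations. The sketch below uses made-up numbers, and the bin edges (2.5 s and 3.0 s) are hypothetical cut points chosen only for illustration.

```python
import pandas as pd

# Toy data; values are invented for illustration only.
toy = pd.DataFrame({
    "time_to_throw": [1.8, 2.1, 2.4, 2.9, 3.3, 3.8, 2.0, 3.5],
    "score_differential": [-3, -3, 7, 7, -3, 7, 7, -3],
    "epa": [-0.2, 0.1, 0.4, 0.3, -0.5, 0.2, 0.0, -0.1],
})

# Hypothetical cut points (seconds) for quick / normal / extended throws.
toy["ttt_bin"] = pd.cut(
    toy["time_to_throw"],
    bins=[0, 2.5, 3.0, 10],
    labels=["quick", "normal", "extended"],
)
toy["situation"] = (toy["score_differential"] < 0).map(
    {True: "trailing", False: "not_trailing"}
)

# Mean EPA in each (situation, time-to-throw bin) cell; empty cells become NaN.
summary = (
    toy.groupby(["situation", "ttt_bin"], observed=True)["epa"]
       .mean()
       .unstack("ttt_bin")
)
print(summary)
```

A table like this is only descriptive; whether differences across cells hold up would still need the model and controls discussed elsewhere in the notebook.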

## B. Stress and situational awareness degradation

The variable `was_pressure` quantifies plays made under acute stress. If the model shows significantly worse decision outcomes when `was_pressure = 1`, even controlling for time and score, it suggests stress distinctly erodes performance, beyond just time pressure.

**Policy takeaway:**
- Crisis training should separate stress resilience from decision timing — they are not interchangeable.
- For example, FEMA or military training could prioritize realistic stress inoculation over purely procedural drills.
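
"Controlling for time and score" can be illustrated with a plain least-squares fit in which `was_pressure` enters alongside `time_to_throw` and `score_differential`. The sketch below uses synthetic data with a known, made-up pressure effect of -0.4, simply to show that the coefficient on `was_pressure` is read off separately from the other terms. It is a stand-in for illustration, not the model used in this project.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 200

# Synthetic predictors (assumed column names follow the notebook's variables).
toy = pd.DataFrame({
    "time_to_throw": rng.uniform(1.5, 4.0, n),
    "score_differential": rng.integers(-14, 15, n).astype(float),
    "was_pressure": rng.integers(0, 2, n).astype(float),
})

# Synthetic, noise-free outcome with a known pressure effect of -0.4.
toy["epa"] = (
    0.3
    + 0.1 * toy["time_to_throw"]
    + 0.01 * toy["score_differential"]
    - 0.4 * toy["was_pressure"]
)

# Ordinary least squares via numpy: intercept plus the three predictors.
features = ["time_to_throw", "score_differential", "was_pressure"]
X = np.column_stack([np.ones(n), toy[features].to_numpy()])
beta, *_ = np.linalg.lstsq(X, toy["epa"].to_numpy(), rcond=None)
coef = pd.Series(beta, index=["intercept"] + features)
print(coef.round(3))
```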

## C. Feedback loops and overcorrection

When behind (`score_differential < 0`), QBs often take more risks (`pass_length` ↑), but the model could show that these don't necessarily improve outcomes (`wpa` ↓).

**Policy analogy:**
- Leaders facing "deficits" (budgetary, political, or reputational) may overcorrect by making riskier policy bets that statistically worsen long-term outcomes.
- This argues for institutional designs that buffer against overreaction to short-term losses.
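
A first descriptive check of the overcorrection idea is to compare average risk-taking (`air_yards`) and average outcome (`wpa`) between trailing and non-trailing plays. The numbers below are invented for illustration.

```python
import pandas as pd

# Toy plays; values are made up.
toy = pd.DataFrame({
    "score_differential": [-7, -3, -10, 0, 3, 7, 14, -1],
    "air_yards": [18, 12, 25, 6, 5, 8, 3, 15],
    "wpa": [-0.03, 0.01, -0.05, 0.02, 0.01, 0.00, 0.01, -0.02],
})

# Label each play by game state from the offense's point of view.
toy["game_state"] = (toy["score_differential"] < 0).map(
    {True: "trailing", False: "tied_or_leading"}
)

# Mean risk proxy and mean outcome per game state.
by_state = toy.groupby("game_state")[["air_yards", "wpa"]].mean()
print(by_state)
```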

## D. Adaptive risk calibration

Some QBs consistently make better decisions (higher `epa`/`wpa`) under similar conditions, which can be read as a measure of expert adaptability.

**Policy takeaway:**
- Identify and train for "decision composure profiles" — people who maintain consistent judgment under stress should be cultivated for roles like crisis negotiators, emergency managers, and diplomats.
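
One simple, hypothetical way to express a "composure profile" is the gap between a passer's mean EPA under pressure and their mean EPA from a clean pocket. The sketch assumes a `passer` identifier column and uses invented values.

```python
import pandas as pd

# Toy data: two hypothetical passers, with and without pressure.
toy = pd.DataFrame({
    "passer": ["QB_A"] * 4 + ["QB_B"] * 4,
    "was_pressure": [0, 0, 1, 1, 0, 0, 1, 1],
    "epa": [0.3, 0.1, 0.2, 0.0, 0.4, 0.2, -0.5, -0.3],
})

# Mean EPA per passer, split by pressure (columns 0 = clean, 1 = pressured).
by_qb = toy.pivot_table(
    index="passer", columns="was_pressure", values="epa", aggfunc="mean"
)

# Gap closer to zero = outcomes hold up better under pressure.
by_qb["pressure_gap"] = by_qb[1] - by_qb[0]
print(by_qb)
```

With real data, small per-passer sample sizes would make such gaps noisy, so some form of shrinkage or a minimum-attempts filter would likely be needed.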

In [27]:
# !pip install --upgrade --force-reinstall nfl_data_py

# Setup



In [28]:
import nfl_data_py as nfl
import pandas as pd

# Load NFL play-by-play data for the 2022-2024 seasons
df = nfl.import_pbp_data([2024, 2023, 2022])
print("Successfully loaded data for 2024, 2023, and 2022.")

# df.to_csv('data.csv')
# df = pd.read_csv('/content/data.csv')
# df['season'].value_counts()


2024 done.
2023 done.
2022 done.
Downcasting floats.
Successfully loaded data for 2024, 2023, and 2022.


**Create "Thus Far" Indicators for each game**

In [29]:
EVENTS_FOR_OFFENSE = [
    # special teams – negative/rare events for the punting/kicking offense
    "punt_blocked",

    # downs/outcomes (offense-oriented)
    "first_down_rush","first_down_pass","first_down_penalty",
    "third_down_converted","third_down_failed",
    "fourth_down_converted","fourth_down_failed",

    # pass outcomes
    "incomplete_pass",

    # kick/punt results credited to the kicking team (posteam)
    "touchback",              # we’ll scope it to punts/kickoffs for the kicking team
    "punt_inside_twenty","punt_in_endzone","punt_out_of_bounds","punt_downed","punt_fair_catch",

    # ball security / turnovers (against the offense)
    "interception",
    "fumble_forced","fumble_not_forced","fumble_out_of_bounds","fumble_lost","fumble",

    # defensive production against the offense (on offense snaps)
    "solo_tackle","assist_tackle","tackled_for_loss","qb_hit","sack",

    # penalties (offense only)
    "penalty",

    # scoring / conversions (offense)
    "touchdown","pass_touchdown","rush_touchdown","return_touchdown",
    "extra_point_attempt","two_point_attempt","field_goal_attempt",

    # special teams usage (offense as kicking team)
    "kickoff_attempt","punt_attempt",

    # usage (offense snaps)
    "rush_attempt","pass_attempt","complete_pass",

    # laterals (offense context)
    "lateral_reception","lateral_rush",

    # safety against offense
    "safety",
]

# numeric “running totals” (sum) attributed to the offense
NUMERIC_FOR_OFFENSE = [
    "return_yards",   # only when return_team == posteam
    "penalty_yards",  # only when penalty_team == posteam
]

In [30]:
def add_offense_so_far(
    df: pd.DataFrame,
    event_cols = EVENTS_FOR_OFFENSE,
    numeric_cols = NUMERIC_FOR_OFFENSE,
    include_current: bool = False,
    exclude_kneels_spikes_from_usage: bool = True,
):
    """
    Add cumulative '..._so_far' columns for the possessing offense (posteam) within each game.
    Context-aware attribution rules as discussed.
    """
    out = df.copy()

    # ---- helpers (float, never int) ----
    def _ind(name: str) -> pd.Series:
        if name in out.columns:
            # coerce weird types -> numeric 0/1; treat missing as 0
            return pd.to_numeric(out[name], errors="coerce").fillna(0.0).astype(float)
        return pd.Series(0.0, index=out.index)

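    def _bool(name: str) -> pd.Series:
        # missing column -> all-False Series (a scalar default has no .fillna)
        if name not in out.columns:
            return pd.Series(False, index=out.index)
        return out[name].fillna(0).astype(bool)

    def _str(name: str) -> pd.Series:
        # missing column -> empty strings
        if name not in out.columns:
            return pd.Series("", index=out.index, dtype="string")
        return out[name].astype("string")

    def _num(name: str) -> pd.Series:
        # missing column -> zeros
        if name not in out.columns:
            return pd.Series(0.0, index=out.index)
        return pd.to_numeric(out[name], errors="coerce").fillna(0.0).astype(float)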

    def _eq_team(series_name: str, team_series: pd.Series) -> pd.Series:
        if series_name not in out.columns:
            return pd.Series(False, index=out.index)
        return _str(series_name) == team_series.fillna("")

    def _any_eq_team(cols: list[str], team_series: pd.Series) -> pd.Series:
        m = pd.Series(False, index=out.index)
        for c in cols:
            if c in out.columns:
                m |= (_str(c) == team_series.fillna(""))
        return m

    # ---- ordering (for deterministic output) ----
    if "order_sequence" in out.columns:
        out = out.sort_values(["game_id","posteam","order_sequence"])
    else:
        out = out.sort_values(["game_id","posteam","game_seconds_remaining","play_id"],
                              ascending=[True, True, False, True])

    posteam = _str("posteam")
    defteam = _str("defteam")

    play_type = _str("play_type").str.lower()
    is_run = play_type.eq("run")
    is_pass = play_type.eq("pass")
    is_punt = play_type.eq("punt") | (_ind("punt_attempt").gt(0))
    is_kickoff = play_type.eq("kickoff") | (_ind("kickoff_attempt").gt(0))

    return_team_is_off = _eq_team("return_team", posteam)
    on_offense_scrimmage = is_run | is_pass

    valid_usage = pd.Series(True, index=out.index)
    if exclude_kneels_spikes_from_usage:
        valid_usage &= (~_bool("qb_kneel")) & (~_bool("qb_spike"))

    # ---- offense-attributed indicators (float 0/1) ----
    off = {}

    off["punt_blocked"] = (_ind("punt_blocked") * ((is_punt | is_kickoff).astype(float)))

    for c in ["first_down_rush","first_down_pass","first_down_penalty",
              "third_down_converted","third_down_failed",
              "fourth_down_converted","fourth_down_failed",
              "incomplete_pass"]:
        if c in event_cols:
            off[c] = _ind(c)

    if "touchback" in event_cols:
        off["touchback"] = _ind("touchback") * ((is_punt | is_kickoff).astype(float))
    for c in ["punt_inside_twenty","punt_in_endzone","punt_out_of_bounds","punt_downed","punt_fair_catch"]:
        if c in event_cols:
            off[c] = _ind(c) * (is_punt.astype(float))

    off["interception"] = _ind("interception")

    fumbled_by_off = _any_eq_team([c for c in ["fumbled_1_team","fumbled_2_team"] if c in out.columns], posteam)
    off["fumble"] = _ind("fumble") * (fumbled_by_off.astype(float))
    off["fumble_lost"] = (
        _ind("fumble_lost")
        * (fumbled_by_off & _any_eq_team(
            [c for c in ["fumble_recovery_1_team","fumble_recovery_2_team"] if c in out.columns], defteam
          )).astype(float)
    )
    forced_by_def = _any_eq_team([c for c in ["forced_fumble_player_1_team","forced_fumble_player_2_team"] if c in out.columns], defteam)
    off["fumble_forced"] = _ind("fumble_forced") * ((forced_by_def & (on_offense_scrimmage | return_team_is_off)).astype(float))
    off["fumble_not_forced"] = _ind("fumble_not_forced") * (fumbled_by_off.astype(float))
    off["fumble_out_of_bounds"] = _ind("fumble_out_of_bounds") * (fumbled_by_off.astype(float))

    solo_def = _any_eq_team([c for c in ["solo_tackle_1_team","solo_tackle_2_team"] if c in out.columns], defteam)
    assist_def = _any_eq_team(
        [c for c in ["assist_tackle_1_team","assist_tackle_2_team","assist_tackle_3_team","assist_tackle_4_team"] if c in out.columns],
        defteam,
    )
    off["solo_tackle"]   = (_ind("solo_tackle").gt(0) & solo_def & (on_offense_scrimmage | return_team_is_off)).astype(float)
    off["assist_tackle"] = (_ind("assist_tackle").gt(0) & assist_def & (on_offense_scrimmage | return_team_is_off)).astype(float)
    off["tackled_for_loss"] = _ind("tackled_for_loss") * (on_offense_scrimmage.astype(float))
    off["qb_hit"] = _ind("qb_hit") * (on_offense_scrimmage.astype(float))
    off["sack"]   = _ind("sack") * (on_offense_scrimmage.astype(float))

    off["penalty"] = _ind("penalty") * (_eq_team("penalty_team", posteam).astype(float))

    for c in ["touchdown","pass_touchdown","rush_touchdown","return_touchdown",
              "extra_point_attempt","two_point_attempt","field_goal_attempt"]:
        if c in event_cols:
            if c == "return_touchdown":
                off[c] = _ind(c) * (_eq_team("return_team", posteam).astype(float))
            elif c == "touchdown":
                off[c] = _ind(c) * (
                    (_eq_team("td_team", posteam) | _ind("pass_touchdown").gt(0) | _ind("rush_touchdown").gt(0)).astype(float)
                )
            else:
                off[c] = _ind(c)

    off["kickoff_attempt"] = _ind("kickoff_attempt")
    off["punt_attempt"]    = _ind("punt_attempt")

    off["rush_attempt"] = _ind("rush_attempt") * (valid_usage.astype(float))
    off["pass_attempt"] = _ind("pass_attempt") * (valid_usage.astype(float))
    off["complete_pass"] = _ind("complete_pass")

    off["lateral_reception"] = _ind("lateral_reception") * (is_pass.astype(float))
    off["lateral_rush"]      = _ind("lateral_rush") * (is_run.astype(float))

    off["safety"] = _ind("safety")

    # ---- numeric running totals (float) ----
    num = {}
    if "return_yards" in numeric_cols:
        num["return_yards"] = _num("return_yards").where(return_team_is_off, 0.0)
    if "penalty_yards" in numeric_cols:
        num["penalty_yards"] = _num("penalty_yards").where(_eq_team("penalty_team", posteam), 0.0)

    # ---- cumulative per (game_id, posteam) without integer casting ----
    so_far_cols = []

    # helper: groupby cumsum for a Series (not necessarily in out)
    def gb_cumsum(series: pd.Series) -> pd.Series:
        return series.groupby([out["game_id"], out["posteam"]]).cumsum()

    # events
    for c in event_cols:
        ser = off.get(c, _ind(c))  # offense-attributed if we built it; else raw indicator
        ser = ser.fillna(0.0)
        cum = gb_cumsum(ser)
        if not include_current:
            cum = cum - ser
        out[f"{c}_so_far"] = cum.fillna(0.0).astype(float)
        so_far_cols.append(f"{c}_so_far")

    # numeric sums
    for c, ser in num.items():
        ser = ser.fillna(0.0)
        cum = gb_cumsum(ser)
        if not include_current:
            cum = cum - ser
        out[f"{c}_so_far"] = cum.fillna(0.0).astype(float)
        so_far_cols.append(f"{c}_so_far")

    # print the list so you can paste into your feature set
    print("New cumulative columns created:")
    print(so_far_cols)

    return out

In [31]:
df = add_offense_so_far(df)

New cumulative columns created:
['punt_blocked_so_far', 'first_down_rush_so_far', 'first_down_pass_so_far', 'first_down_penalty_so_far', 'third_down_converted_so_far', 'third_down_failed_so_far', 'fourth_down_converted_so_far', 'fourth_down_failed_so_far', 'incomplete_pass_so_far', 'touchback_so_far', 'punt_inside_twenty_so_far', 'punt_in_endzone_so_far', 'punt_out_of_bounds_so_far', 'punt_downed_so_far', 'punt_fair_catch_so_far', 'interception_so_far', 'fumble_forced_so_far', 'fumble_not_forced_so_far', 'fumble_out_of_bounds_so_far', 'fumble_lost_so_far', 'fumble_so_far', 'solo_tackle_so_far', 'assist_tackle_so_far', 'tackled_for_loss_so_far', 'qb_hit_so_far', 'sack_so_far', 'penalty_so_far', 'touchdown_so_far', 'pass_touchdown_so_far', 'rush_touchdown_so_far', 'return_touchdown_so_far', 'extra_point_attempt_so_far', 'two_point_attempt_so_far', 'field_goal_attempt_so_far', 'kickoff_attempt_so_far', 'punt_attempt_so_far', 'rush_attempt_so_far', 'pass_attempt_so_far', 'complete_pass_so_

In [32]:
import numpy as np
df["completion_rate_so_far"] = df["complete_pass_so_far"] / df["pass_attempt_so_far"].replace(0, np.nan)
df["pass_rate_so_far"] = df["pass_attempt_so_far"] / (df["rush_attempt_so_far"] + df["pass_attempt_so_far"]).replace(0, np.nan)
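
To make the "so far" logic concrete: with `include_current=False`, each running total is the within-group cumulative sum minus the current row, so a play only sees what happened before it. A minimal sketch on toy data (not the full attribution rules of `add_offense_so_far`):

```python
import pandas as pd

# Five plays from one game, alternating between two offenses.
toy = pd.DataFrame({
    "game_id": ["g1"] * 5,
    "posteam": ["A", "A", "B", "A", "B"],
    "pass_attempt": [1, 0, 1, 1, 1],
    "complete_pass": [1, 0, 0, 1, 1],
})

keys = ["game_id", "posteam"]
for c in ["pass_attempt", "complete_pass"]:
    # cumulative count per (game, offense), excluding the current play
    toy[f"{c}_so_far"] = toy.groupby(keys)[c].cumsum() - toy[c]

# Rate is undefined (NaN) until the offense has at least one prior attempt.
toy["completion_rate_so_far"] = (
    toy["complete_pass_so_far"]
    / toy["pass_attempt_so_far"].replace(0, float("nan"))
)
print(toy)
```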

**Create "Last Drive" Indicators**

In [33]:
import numpy as np

# 1) Last play of each (game, team, drive)
last_idx = (
    df.dropna(subset=["posteam"])
      .groupby(["game_id","posteam","drive"])["order_sequence"]
      .idxmax()
)
ends = df.loc[last_idx, ["game_id","posteam","drive","fixed_drive_result"]].copy()
v = ends["fixed_drive_result"].fillna("").str.lower()

# 2) Drive-ending flags for that drive
# Exact matches: a substring test would also flag values such as
# "opp touchdown" or "missed field goal" as offensive scores.
ends["drive_td"]   = v.eq("touchdown").astype(int)
ends["drive_fg"]   = v.eq("field goal").astype(int)
ends["drive_punt"] = v.eq("punt").astype(int)
ends["drive_tod"]  = v.eq("turnover on downs").astype(int)     # turnover on downs

# 3) Team-relative drive number (1,2,3,…) using earliest play in that drive
starts = (
    df.dropna(subset=["posteam"])
      .groupby(["game_id","posteam","drive"])["order_sequence"]
      .min()
      .reset_index(name="drive_start_order")
      .sort_values(["game_id","posteam","drive_start_order"])
)
starts["team_drive_number"] = starts.groupby(["game_id","posteam"]).cumcount() + 1

# attach team_drive_number to drive ends
ends = ends.merge(starts, on=["game_id","posteam","drive"], how="left")

# 4) Previous-drive flags → shift forward one drive number
prev = (
    ends[["game_id","posteam","team_drive_number","drive_td","drive_fg","drive_punt","drive_tod"]]
      .rename(columns={
          "drive_td":"prev_drive_td", "drive_fg":"prev_drive_fg",
          "drive_punt":"prev_drive_punt", "drive_tod":"prev_drive_tod"
      })
)
prev["team_drive_number"] += 1  # these belong to the *next* drive

# 5) Put team_drive_number on every play row, then merge prev-drive flags
df = df.merge(
    starts[["game_id","posteam","drive","team_drive_number"]],
    on=["game_id","posteam","drive"], how="left"
).merge(
    prev, on=["game_id","posteam","team_drive_number"], how="left"
)

# 6) First drive has no previous drive → zeros
df[["prev_drive_td","prev_drive_fg","prev_drive_punt","prev_drive_tod"]] = \
    df[["prev_drive_td","prev_drive_fg","prev_drive_punt","prev_drive_tod"]].fillna(0).astype(int)
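
The previous-drive logic can also be illustrated at the drive level with `groupby(...).shift(1)`, which is a different technique from the merge-based approach above but produces the same kind of "what happened on this team's last drive" flags. A toy sketch with one row per drive:

```python
import pandas as pd

# One row per drive; teams alternate possessions within a single game.
drives = pd.DataFrame({
    "game_id": ["g1"] * 5,
    "posteam": ["A", "B", "A", "B", "A"],
    "drive": [1, 2, 3, 4, 5],
    "fixed_drive_result": [
        "Punt", "Touchdown", "Field goal", "Turnover on downs", "Touchdown",
    ],
})

v = drives["fixed_drive_result"].str.lower()
drives["drive_td"] = v.eq("touchdown").astype(int)
drives["drive_punt"] = v.eq("punt").astype(int)

# Shift each team's flags forward by one of its own drives;
# a team's first drive has no predecessor, so it gets 0.
drives = drives.sort_values(["game_id", "posteam", "drive"])
for flag in ["drive_td", "drive_punt"]:
    drives[f"prev_{flag}"] = (
        drives.groupby(["game_id", "posteam"])[flag].shift(1).fillna(0).astype(int)
    )

drives = drives.sort_values("drive")
print(drives[["posteam", "drive", "prev_drive_td", "prev_drive_punt"]])
```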

**Calculate Avg Passing Yard Statistics**

In [34]:
# Pass plays only (ignore deleted/aborted); first pass: yards allowed by each defense
f = (
    (df["play_type"] == "pass")
    & (df.get("play_deleted", 0) == 0)
    & (df.get("aborted_play", 0) == 0)
)
cols = ["season", "defteam", "game_id", "passing_yards", "sack", "yards_gained"]
p = df.loc[f, cols].copy()

per_game = (
    p.groupby(["season", "defteam", "game_id"])["passing_yards"]
     .sum(min_count=1)              # sums completed-pass yards; NaNs ignored
     .fillna(0)                     # all-incomplete games become 0
     .reset_index(name="pass_yards_allowed")
)
avg_pass_allowed = (
    per_game.groupby(["season", "defteam"])["pass_yards_allowed"]
            .mean()
            .reset_index(name="avg_pass_yards_allowed_per_game")
)
# Filter to offensive pass plays (ignore deleted/aborted)
mask = (
    (df["play_type"] == "pass")
    & (df.get("play_deleted", 0) == 0)
    & (df.get("aborted_play", 0) == 0)
)

# Keep only relevant columns
cols = ["season", "posteam", "game_id", "passing_yards"]
p = df.loc[mask, cols].copy()

# Sum passing yards per game for each offense
per_game = (
    p.groupby(["season", "posteam", "game_id"])["passing_yards"]
     .sum(min_count=1)
     .fillna(0)
     .reset_index(name="pass_yards_gained")
)

# Average per team, per season
avg_pass_gained = (
    per_game.groupby(["season", "posteam"])["pass_yards_gained"]
            .mean()
            .reset_index(name="avg_pass_yards_gained_per_game")
)
df = df.merge(
    avg_pass_gained,
    how="left",
    left_on=["season", "posteam"],
    right_on=["season", "posteam"]
).merge(
    avg_pass_allowed,
    how="left",
    left_on=["season", "defteam"],
    right_on=["season", "defteam"]
).rename(
    columns={
        "avg_pass_yards_gained_per_game": "team_avg_pass_yards_gained_per_game",
        "avg_pass_yards_allowed_per_game": "team_avg_pass_yards_allowed_per_game"
    }
)
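
Note that the season-level averages above are computed over every game in the season, so for a play in an early week they include games played afterwards. If that matters for the modeling goal, one possible alternative (not what this notebook does) is an expanding average over prior games only. A minimal sketch on toy per-game totals, with `week` as the assumed ordering column:

```python
import pandas as pd

# Toy per-game passing totals for two hypothetical teams.
per_game_toy = pd.DataFrame({
    "season": [2024] * 6,
    "posteam": ["A", "A", "A", "A", "B", "B"],
    "week": [1, 2, 3, 4, 1, 2],
    "pass_yards_gained": [200, 300, 250, 150, 100, 300],
}).sort_values(["season", "posteam", "week"])

# Average of all *earlier* games in the same season; NaN for a team's first game.
per_game_toy["avg_pass_yards_prior_games"] = (
    per_game_toy.groupby(["season", "posteam"])["pass_yards_gained"]
                .transform(lambda s: s.shift(1).expanding().mean())
)
print(per_game_toy)
```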


**Create Continuous Success Measure**

In [35]:
# Earlier draft (no bounds or overrides), superseded by compute_success_outcome below:
# df['success_outcome'] = df['yards_gained'] / df['yardline_100']

Bounded and interpretable: 0 = worst (turnover, safety, or no forward progress), 1 = TD or full distance to the goal, everything else is the share of the remaining field gained on the play.

Handles edge cases: caps long gains at 1, floors losses at 0, protects against `yardline_100 == 0`, and doesn't punish intentional spikes/kneels (NaN by default).

QB-relevant overrides: INT / fumble lost / 4th-down fail / safety are hard 0s; offensive penalties are 0; defensive penalties count as positive yardage.

Non-scrimmage plays are out of scope by default (they can be included, but the interpretation changes).
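
A small worked example of the base measure may help. The rows below are contrived to hit the main cases: an ordinary gain, a loss, a gain larger than the remaining field (to show the cap), an interception, and a spike.

```python
import pandas as pd

# Contrived plays covering the main cases.
toy = pd.DataFrame({
    "yards_gained": [5, -3, 40, 12, 0],
    "yardline_100": [50, 20, 30, 75, 10],
    "interception": [0, 0, 0, 1, 0],
    "qb_spike": [0, 0, 0, 0, 1],
})

# Base: share of remaining field gained, floored at 0 and capped at 1.
share = (
    toy["yards_gained"].clip(lower=0) / toy["yardline_100"].clip(lower=1)
).clip(0, 1)

# Overrides: interception -> 0, spike -> NaN (excluded rather than penalized).
share = share.where(toy["interception"] == 0, 0.0)
share = share.where(toy["qb_spike"] == 0)
print(share)
```

Here the five plays score 0.1, 0.0, 1.0, 0.0 and NaN respectively.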

In [36]:
def compute_success_outcome(
    df: pd.DataFrame,
    drop_non_scrimmage: bool = True,
    treat_spike_kneel_as_na: bool = True,
) -> pd.Series:
    """
    0..1 success for the possessing offense on this snap.

    Base: share of field-to-go gained on the snap: max(yards_gained, 0) / max(yardline_100, 1)
      • Caps at 1 (TD-level gain), floors at 0 (no forward progress or loss).
    Overrides (QB-intuitive):
      • Offensive TD (pass/rush/td_team==posteam) -> 1.0
      • Turnover against offense (interception OR fumble_lost by posteam OR 4th_down_failed OR safety) -> 0.0
      • Offensive penalty (accepted) -> 0.0
      • Defensive penalty (accepted) -> use max(gain, penalty_yards) for the base share
      • Spikes/Kneels -> NaN (don’t teach the model they’re “bad”); toggle via treat_spike_kneel_as_na
      • Non-scrimmage (kickoff/punt/FG/XP/2pt/no_play) -> NaN by default; toggle via drop_non_scrimmage
    """

    # helpers that always return Series aligned to df.index
    def col(name, fill=0, dtype=None):
        s = df[name] if name in df.columns else pd.Series(fill, index=df.index)
        return s.astype(dtype) if dtype is not None else s

    posteam = col("posteam", "", dtype="string")
    defteam = col("defteam", "", dtype="string")

    # play type handling (nflfastR 'play_type' is lower-case in canonical data; normalize just in case)
    play_type = col("play_type", "", dtype="string").str.lower()
    is_scrimmage = play_type.isin(["run", "pass"])
    if not drop_non_scrimmage:
        is_scrimmage = pd.Series(True, index=df.index)

    # spikes / kneels
    is_spike = col("qb_spike", 0).fillna(0).astype(bool)
    is_kneel = col("qb_kneel", 0).fillna(0).astype(bool)
    make_na = (is_spike | is_kneel) if treat_spike_kneel_as_na else pd.Series(False, index=df.index)

    # penalties
    penalty = col("penalty", 0).fillna(0).astype(bool)
    penalty_team = col("penalty_team", "", dtype="string")
    pen_yds = col("penalty_yards", 0).fillna(0)

    off_pen = penalty & (penalty_team == posteam)
    def_pen = penalty & (penalty_team == defteam)

    # turnovers against offense
    interception = col("interception", 0).fillna(0).astype(bool)
    fumble_lost = col("fumble_lost", 0).fillna(0).astype(bool)
    f1 = col("fumbled_1_team", "", dtype="string")
    f2 = col("fumbled_2_team", "", dtype="string")
    fumbled_by_off = (f1.eq(posteam) | f2.eq(posteam))
    fourth_failed = col("fourth_down_failed", 0).fillna(0).astype(bool)
    safety = col("safety", 0).fillna(0).astype(bool)
    turnover = interception | (fumble_lost & fumbled_by_off) | fourth_failed | safety

    # touchdowns for offense
    td_any = col("touchdown", 0).fillna(0).astype(bool)
    pass_td = col("pass_touchdown", 0).fillna(0).astype(bool)
    rush_td = col("rush_touchdown", 0).fillna(0).astype(bool)
    td_team = col("td_team", "", dtype="string")
    td_off = td_any & ((td_team == posteam) | pass_td | rush_td)

    # base share-of-field gained (bounded 0..1)
    pre = col("yardline_100").fillna(100).clip(lower=1)  # avoid div/0; treat unknown as far from goal
    gain = col("yards_gained").fillna(0)
    base_share = (gain.clip(lower=0).divide(pre)).clip(0, 1)

    # if defensive penalty, let penalty yards count if larger than recorded gain
    pen_share = (pd.Series(np.maximum(gain, pen_yds), index=df.index)
                 .divide(pre).clip(0, 1))
    y = base_share.where(~def_pen, pen_share).astype(float)
    y.name = "success_outcome"

    # apply overrides (keep as Series to avoid numpy scalar/assignment issues)
    y = y.where(~turnover, 0.0)   # turnovers -> 0
    y = y.where(~off_pen, 0.0)    # offensive penalty -> 0
    y = y.where(~td_off, 1.0)     # offensive TD -> 1

    # mask out non-scrimmage and spikes/kneels if desired
    y = y.where(is_scrimmage)
    y = y.where(~make_na)

    # also drop explicit no_play rows if present
    no_play = col("play_type", "", dtype="string").str.lower().eq("no_play")
    y = y.where(~no_play)

    return y

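df["success_outcome"] = compute_success_outcome(df)

# NOTE: the two assignments below each overwrite df["success_outcome"], so any
# later use of the column sees the final simplified version, not the output of
# compute_success_outcome().

# Simplified version 1: share of field-to-go gained, no overrides
df["success_outcome"] = (
    df["yards_gained"].clip(lower=0, upper=df["yardline_100"])   # no overshoot
    .divide(df["yardline_100"].clip(lower=1))                    # avoid /0
    .clip(0, 1)                                                  # enforce bounds
)

# Simplified version 2: same share, plus hard overrides
s = (
    df["yards_gained"].clip(lower=0, upper=df["yardline_100"])
    / df["yardline_100"].clip(lower=1)
).clip(0, 1)

# hard overrides, applied in order (a later line wins when several flags are set;
# `touchdown` is not restricted to the offense here, unlike in the function above)
s.loc[(df["interception"]==1) | (df["fumble_lost"]==1)] = 0
s.loc[df["touchdown"]==1] = 1
s.loc[df["fourth_down_failed"]==1] = 0
s.loc[df["safety"]==1] = 0

df["success_outcome"] = s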


In [37]:
df['success_outcome'].describe()

count    137851.000000
mean          0.124383
std           0.223858
min           0.000000
25%           0.000000
50%           0.021739
75%           0.142857
max           1.000000
Name: success_outcome, dtype: float64
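
To make the share-of-field definition concrete, the following minimal sketch applies the same clipping and override logic to a hypothetical four-play frame. The `toy` frame and its values are invented for illustration and are not drawn from the play-by-play data.

```python
import pandas as pd

# Hypothetical toy plays (values invented for illustration only)
toy = pd.DataFrame({
    "yards_gained": [5, -3, 40, 10],
    "yardline_100": [50, 20, 40, 80],   # yards to the opponent's end zone before the snap
    "interception": [0, 0, 0, 1],
    "fumble_lost":  [0, 0, 0, 0],
    "touchdown":    [0, 0, 1, 0],
})

# Share of the remaining field gained, bounded to [0, 1]
share = (
    toy["yards_gained"].clip(lower=0, upper=toy["yardline_100"])
    / toy["yardline_100"].clip(lower=1)
).clip(0, 1)

# Overrides: touchdowns score 1, turnovers score 0
share.loc[toy["touchdown"] == 1] = 1
share.loc[(toy["interception"] == 1) | (toy["fumble_lost"] == 1)] = 0

toy["success_outcome"] = share
print(toy)
```

A 5-yard gain from 50 yards out scores 0.1, a loss scores 0, a touchdown scores 1, and the intercepted pass scores 0 regardless of the yards recorded.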

**Recode Categorical**

In [38]:
df['under_two_minute_warning'] = (df['half_seconds_remaining'] < 120).astype(int)

In [39]:
df["offense_drive_number"] = (
    df.sort_values(["game_id", "drive"])
      .groupby(["game_id", "posteam"])["drive"]
      .transform(lambda x: pd.factorize(x)[0] + 1)
)
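
The renumbering above can be checked on a small example. This is a minimal sketch on a hypothetical one-game frame (team labels and drive numbers are invented): game-level drive numbers alternate between teams, and `pd.factorize` within each `(game_id, posteam)` group turns them into a per-offense count starting at 1.

```python
import pandas as pd

# Hypothetical single game: 'drive' is the game-level drive number
toy = pd.DataFrame({
    "game_id": ["g1"] * 6,
    "posteam": ["KC", "KC", "BUF", "KC", "BUF", "BUF"],
    "drive":   [1, 1, 2, 3, 4, 4],
})

# Same pattern as the cell above: rank each offense's drives in order of appearance
toy["offense_drive_number"] = (
    toy.sort_values(["game_id", "drive"])
       .groupby(["game_id", "posteam"])["drive"]
       .transform(lambda x: pd.factorize(x)[0] + 1)
)
print(toy)
```

Here KC's drives 1 and 3 become offensive drives 1 and 2, and BUF's drives 2 and 4 become 1 and 2.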

In [40]:
df['down'].value_counts()

1.0    49983
2.0    37793
3.0    24167
4.0    13069
Name: down, dtype: int64

In [41]:
df['route'].value_counts()

                      53347
QUICK OUT              7247
HITCH/CURL             6550
SCREEN                 5930
GO                     5100
IN/DIG                 3579
SLANT                  3567
HITCH                  2937
DEEP OUT               2770
SHALLOW CROSS/DRAG     2629
FLAT                   2601
POST                   2532
OUT                    2528
CORNER                 2285
CROSS                  2076
SWING                  1659
IN                     1178
ANGLE                   694
WHEEL                   626
TEXAS/ANGLE             414
Name: route, dtype: int64

In [42]:
# Make dummies
cols_to_dummy = ["pass_location", "game_half",'roof','surface','offense_formation','route','defense_man_zone_type','defense_coverage_type']

all_dummy_cols = []

for col in cols_to_dummy:
    # make dummies
    df[col] = df[col].replace("", pd.NA)
    dummies = pd.get_dummies(df[col], prefix=col, drop_first=False)

    # record the new columns
    all_dummy_cols.extend(dummies.columns.tolist())

    # replace the original column with the dummies
    df = pd.concat([df.drop(columns=[col]), dummies], axis=1)

# print everything in clean list format
print("\nDummy columns created:")
print(all_dummy_cols)

# Recode Hometeam
df["posteam_home"] = (df["posteam_type"] == "home").astype(int)

# Recode postseason
df["season_postseason"] = (df["season_type"] == "POST").astype(int)



Dummy columns created:
['pass_location_left', 'pass_location_middle', 'pass_location_right', 'game_half_Half1', 'game_half_Half2', 'game_half_Overtime', 'roof_closed', 'roof_dome', 'roof_open', 'roof_outdoors', 'surface_a_turf', 'surface_astroturf', 'surface_fieldturf', 'surface_grass', 'surface_matrixturf', 'surface_sportturf', 'offense_formation_EMPTY', 'offense_formation_I_FORM', 'offense_formation_JUMBO', 'offense_formation_PISTOL', 'offense_formation_SHOTGUN', 'offense_formation_SINGLEBACK', 'offense_formation_UNDER CENTER', 'offense_formation_WILDCAT', 'route_ANGLE', 'route_CORNER', 'route_CROSS', 'route_DEEP OUT', 'route_FLAT', 'route_GO', 'route_HITCH', 'route_HITCH/CURL', 'route_IN', 'route_IN/DIG', 'route_OUT', 'route_POST', 'route_QUICK OUT', 'route_SCREEN', 'route_SHALLOW CROSS/DRAG', 'route_SLANT', 'route_SWING', 'route_TEXAS/ANGLE', 'route_WHEEL', 'defense_man_zone_type_MAN_COVERAGE', 'defense_man_zone_type_ZONE_COVERAGE', 'defense_coverage_type_2_MAN', 'defense_coverage



**Time Columns**

In [43]:
import pandas as pd
import numpy as np

NFL_TEAM_TZ = {
    "BUF":"America/New_York","MIA":"America/New_York","NE":"America/New_York","NYJ":"America/New_York",
    "BAL":"America/New_York","CIN":"America/New_York","CLE":"America/New_York","PIT":"America/New_York",
    "HOU":"America/Chicago","IND":"America/Indiana/Indianapolis","JAX":"America/New_York","TEN":"America/Chicago",
    "KC":"America/Chicago","LV":"America/Los_Angeles","LAC":"America/Los_Angeles","DEN":"America/Denver",
    "DAL":"America/Chicago","NYG":"America/New_York","PHI":"America/New_York","WAS":"America/New_York","WSH":"America/New_York",
    "CHI":"America/Chicago","DET":"America/Detroit","GB":"America/Chicago","MIN":"America/Chicago",
    "ATL":"America/New_York","CAR":"America/New_York","NO":"America/Chicago","TB":"America/New_York",
    "ARI":"America/Phoenix","SF":"America/Los_Angeles","SEA":"America/Los_Angeles",
    "LA":"America/Los_Angeles","LAR":"America/Los_Angeles"
}

def add_time_features(
    df: pd.DataFrame,
    time_col: str = "time_of_day",
    tz_key_col: str = "home_team",
    prefix: str = "tod_",
    print_cols: bool = True
) -> pd.DataFrame:
    """
    Parse ISO 8601 timestamps -> UTC; convert to local (by tz_key_col) -> tz-naive;
    derive hour/weekday/flags + cyclical features. Avoids .dt errors.
    """
    out = df.copy()

    # 1) Parse to UTC (handles 'Z' and offsets; invalid -> NaT)
    ts_utc = pd.to_datetime(out.get(time_col), errors="coerce", utc=True)
    out[f"{prefix}utc"] = ts_utc

    # 2) Map to timezone names; fallback to UTC
    tz_series = out.get(tz_key_col).astype("string").map(NFL_TEAM_TZ).fillna("UTC")
    out[f"{prefix}tz_name"] = tz_series

    # 3) Convert per-timezone group, then DROP tz (tz-naive) to keep dtype uniform
    local_naive = pd.Series(pd.NaT, index=out.index, dtype="datetime64[ns]")
    for tz_name, idx in out.groupby(f"{prefix}tz_name").groups.items():
        if len(idx) == 0:
            continue
        # convert UTC -> local tz, then drop tz (tz_localize(None))
        loc = ts_utc.loc[idx].dt.tz_convert(tz_name).dt.tz_localize(None)
        local_naive.loc[idx] = loc.values
    out[f"{prefix}local"] = local_naive  # dtype: datetime64[ns] (uniform -> .dt works)

    # 4) Derive features from local time (safe .dt access)
    ts_loc = out[f"{prefix}local"]

    out[f"{prefix}hour"] = ts_loc.dt.hour
    out[f"{prefix}minute"] = ts_loc.dt.minute
    out[f"{prefix}weekday"] = ts_loc.dt.weekday  # Mon=0..Sun=6
    out[f"{prefix}is_weekend"] = out[f"{prefix}weekday"].isin([5, 6]).astype("int8")
    out[f"{prefix}is_night"] = ((out[f"{prefix}hour"] >= 20) | (out[f"{prefix}hour"] <= 6)).astype("int8")

    # Seconds since local midnight (float; rows where ts_loc is NaT become 0.0 because of fillna(0))
    sec_midnight = (
        ts_loc.dt.hour.fillna(0)*3600
        + ts_loc.dt.minute.fillna(0)*60
        + ts_loc.dt.second.fillna(0)
    ).astype(float)
    out[f"{prefix}sec_midnight"] = sec_midnight

    # Cyclical encodings (time-of-day)
    angle = 2 * np.pi * (sec_midnight / 86400.0)
    out[f"{prefix}sin_time"] = np.sin(angle)
    out[f"{prefix}cos_time"] = np.cos(angle)

    # Optional seasonality & epoch seconds
    out[f"{prefix}month"] = ts_loc.dt.month
    # seconds since 1970-01-01 on the local wall clock (ts_loc is tz-naive local time,
    # so this is not a true UTC epoch); NaT -> NaN
    out[f"{prefix}unix"] = (ts_loc - pd.Timestamp("1970-01-01")).dt.total_seconds()

    new_cols = [
        f"{prefix}utc", f"{prefix}tz_name", f"{prefix}local",
        f"{prefix}hour", f"{prefix}minute", f"{prefix}weekday",
        f"{prefix}is_weekend", f"{prefix}is_night",
        f"{prefix}sec_midnight", f"{prefix}sin_time", f"{prefix}cos_time",
        f"{prefix}month", f"{prefix}unix"
    ]
    if print_cols:
        print("Time feature columns (copy-paste):\n[\n  '" + "',\n  '".join(new_cols) + "'\n]")

    return out
df = add_time_features(df, time_col="time_of_day", tz_key_col="home_team")


Time feature columns (copy-paste):
[
  'tod_utc',
  'tod_tz_name',
  'tod_local',
  'tod_hour',
  'tod_minute',
  'tod_weekday',
  'tod_is_weekend',
  'tod_is_night',
  'tod_sec_midnight',
  'tod_sin_time',
  'tod_cos_time',
  'tod_month',
  'tod_unix'
]
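
The sine/cosine pair is used because raw seconds since midnight treat 23:59 and 00:01 as far apart even though they are two minutes from each other. A minimal sketch of that property, using a hypothetical helper `encode_time_of_day` that mirrors the angle formula in `add_time_features`:

```python
import numpy as np

def encode_time_of_day(seconds_since_midnight):
    """Map seconds since midnight onto the unit circle as (sin, cos) columns."""
    secs = np.asarray(seconds_since_midnight, dtype=float)
    angle = 2 * np.pi * (secs / 86400.0)
    return np.column_stack([np.sin(angle), np.cos(angle)])

# 00:01, 23:59, and 12:00 expressed as seconds since midnight
secs = np.array([60.0, 86340.0, 43200.0])
enc = encode_time_of_day(secs)

# Euclidean distance in the encoded space
dist_midnight_pair = np.linalg.norm(enc[0] - enc[1])  # 00:01 vs 23:59
dist_to_noon = np.linalg.norm(enc[0] - enc[2])        # 00:01 vs 12:00
print(dist_midnight_pair, dist_to_noon)
```

The two times on either side of midnight land almost on top of each other, while midnight and noon sit on opposite sides of the circle.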


**Clean Weather**

In [44]:
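def add_weather_flags(df: pd.DataFrame) -> pd.DataFrame:
    """Add 0/1 weather flags from the free-text `weather` field and numeric `wind` (mutates df in place)."""
    w = df["weather"].fillna("").str.lower()
    df["is_precip"] = w.str.contains("rain|snow|sleet|shower|wet").astype(int)
    # Caution: if every weather string carries a "Wind: ..." field, the text match
    # below is true for all such rows and only the numeric threshold discriminates.
    df["is_windy"]  = (w.str.contains("wind") | (df["wind"].fillna(0) > 15)).astype(int)
    df["is_clear"]  = ((w.str.contains("clear|sun")) & (df["is_precip"] == 0)).astype(int)
    return df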
df = add_weather_flags(df)
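
A minimal sketch of how the text match and the numeric threshold can disagree. The two weather strings are hypothetical and assume a format in which the description always includes a wind field; whether the real `weather` column looks like this should be checked against the data.

```python
import pandas as pd

# Hypothetical weather descriptions (format is an assumption, not taken from the data)
toy = pd.DataFrame({
    "weather": [
        "Sunny Temp: 60 F, Wind: NW 5 mph",
        "Light rain Temp: 45 F, Wind: S 20 mph",
        None,
    ],
    "wind": [5, 20, None],
})

w = toy["weather"].fillna("").str.lower()
toy["is_precip"] = w.str.contains("rain|snow|sleet|shower|wet").astype(int)

# Rule used in add_weather_flags: text match OR wind speed above 15
toy["is_windy_text_or_speed"] = (
    w.str.contains("wind") | (toy["wind"].fillna(0) > 15)
).astype(int)

# Alternative that relies on the numeric wind speed alone
toy["is_windy_speed_only"] = (toy["wind"].fillna(0) > 15).astype(int)
print(toy)
```

Under this assumed format the calm, sunny game is flagged as windy by the text rule but not by the speed-only rule.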

**Clean Air Yards**

In [45]:
def add_airyards_clean(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()

    # text field for fallback matching
    desc = out.get("desc", pd.Series("", index=out.index)).fillna("").str.lower()

    # basic play-type flags (Series defaults so .fillna still works if a column is absent)
    zeros = pd.Series(0, index=out.index)
    is_pass = out.get("pass_attempt", zeros).fillna(0).astype(bool)
    is_spike = out.get("qb_spike", zeros).fillna(0).astype(bool)

    # heuristic throwaway: incomplete pass with no receiver + throwaway language
    no_receiver = out.get("receiver_player_id", pd.Series(np.nan, index=out.index)).isna()
    incomplete = ~out.get("complete_pass", zeros).fillna(0).astype(bool)
    is_throwaway = is_pass & incomplete & no_receiver & (
        desc.str.contains("throwaway|thrown away|threw it away|out of bounds")
    )

    # Some feeds also mark "batted" or "spiked"—exclude those from 'intended depth'
    is_batted = desc.str.contains("batted|tipped") & is_pass & incomplete

    # single control flag
    out["is_throwaway_or_spike"] = (is_throwaway | is_spike).astype(int)

    # Keep true air_yards for real targets; null them for throwaways/spikes
    air = out.get("air_yards", pd.Series(np.nan, index=out.index))
    out["air_yards_clean"] = air.where(~out["is_throwaway_or_spike"].astype(bool))

    # For tree models: provide a numeric-imputed version + the flag
    # (RF can't take NaN; the flag preserves information)
    out["air_yards_for_model"] = out["air_yards_clean"].fillna(0)

    # Optional: cap extreme tails but KEEP negatives (screens)
    out["air_yards_for_model"] = out["air_yards_for_model"].clip(lower=-15, upper=60)

    # Optional extras the QB would know:
    out["is_batted_pass"] = is_batted.astype(int)
    out["is_screen_like"]  = (air <= -1).fillna(False).astype(int)  # behind LOS

    return out
df = add_airyards_clean(df)
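
The null-then-impute step can be illustrated on a few hypothetical values (the numbers below are invented): air yards are blanked for throwaways and spikes, filled with 0 for the model, and clipped to the range -15 to 60, with the flag column recording that the 0 is not a real target depth.

```python
import pandas as pd

# Hypothetical pass attempts: a normal throw, a throwaway, a screen, an extreme value
toy = pd.DataFrame({
    "air_yards": [12.0, 35.0, -4.0, 70.0],
    "is_throwaway_or_spike": [0, 1, 0, 0],
})

# Blank out air yards where there was no intended target
toy["air_yards_clean"] = toy["air_yards"].where(~toy["is_throwaway_or_spike"].astype(bool))

# Model-ready version: impute 0, cap the tails, keep negatives (screens)
toy["air_yards_for_model"] = toy["air_yards_clean"].fillna(0).clip(lower=-15, upper=60)
print(toy)
```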

**Filter Only Plays We Care to Observe**

In [46]:
# Find plays where yards_gained exceeds the distance to the end zone (yardline_100)
weird_plays = df[df['yards_gained'] > df['yardline_100']]
# Drop the weird_plays from the original DataFrame
df = df.drop(weird_plays.index)
df = df.query('special_teams_play==0')
df = df.query("play_type_nfl == 'PASS'")
# Inspect the non-missing time_to_throw values (display only; df is not modified)
df['time_to_throw'].dropna()

0         3.370
1         1.768
4         1.802
7         2.068
9         2.536
          ...  
148549    7.900
148552    3.500
148557    1.600
148559    2.500
148575    2.100
Name: time_to_throw, Length: 55115, dtype: float32

In [47]:
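# Position of each remaining play within its drive; the frame now holds pass plays only,
# so this counts pass plays rather than all snaps
df = df.sort_values(['game_id', 'drive', 'play_id'])
df['drive_play_index'] = df.groupby(['game_id', 'drive']).cumcount() + 1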

In [48]:
X = ['posteam_home',
     'under_two_minute_warning',
     'season_postseason',
     'week',
     'quarter_seconds_remaining',
     'half_seconds_remaining',
     'game_seconds_remaining',
     'offense_drive_number',
     'qtr',
     'down',
     'yards_after_catch',
     'posteam_timeouts_remaining',
     'defteam_timeouts_remaining',
     'posteam_score',
     'defteam_score',
     'score_differential',
     'air_yards_for_model',
     'is_throwaway_or_spike',
     'is_batted_pass',
     'is_screen_like',

    #In-Game Counts
     'punt_blocked_so_far', 'first_down_rush_so_far', 'first_down_pass_so_far', 'first_down_penalty_so_far', 'third_down_converted_so_far', 'third_down_failed_so_far', 'fourth_down_converted_so_far', 'fourth_down_failed_so_far', 'incomplete_pass_so_far', 'touchback_so_far', 'punt_inside_twenty_so_far', 'punt_in_endzone_so_far', 'punt_out_of_bounds_so_far', 'punt_downed_so_far', 'punt_fair_catch_so_far', 'interception_so_far', 'fumble_forced_so_far', 'fumble_not_forced_so_far', 'fumble_out_of_bounds_so_far', 'fumble_lost_so_far', 'fumble_so_far', 'solo_tackle_so_far', 'assist_tackle_so_far', 'tackled_for_loss_so_far', 'qb_hit_so_far', 'sack_so_far', 'penalty_so_far', 'touchdown_so_far', 'pass_touchdown_so_far', 'rush_touchdown_so_far', 'return_touchdown_so_far', 'extra_point_attempt_so_far', 'two_point_attempt_so_far', 'field_goal_attempt_so_far', 'kickoff_attempt_so_far', 'punt_attempt_so_far', 'rush_attempt_so_far', 'pass_attempt_so_far', 'complete_pass_so_far', 'lateral_reception_so_far', 'lateral_rush_so_far', 'safety_so_far', 'return_yards_so_far', 'penalty_yards_so_far',

    #In-Game Rates
     'completion_rate_so_far',
     'pass_rate_so_far',

     'season',
     'tod_hour',
     'tod_minute',
     'tod_weekday',
     'tod_is_weekend',
     'tod_is_night',
     'tod_sec_midnight',
     'tod_sin_time',
     'tod_cos_time',
     'tod_month',
     'tod_unix',
     'is_precip',
     'is_windy',
     'is_clear',
     'drive_play_index',
     'temp',
     'wind',
     'defenders_in_box',
     'number_of_pass_rushers',

     #Things of Interest
     'time_to_throw',
     'was_pressure',

     #Dummies
     'pass_location_left', 'pass_location_middle', 'pass_location_right', 'game_half_Half1', 'game_half_Half2', 'game_half_Overtime', 'roof_closed', 'roof_dome', 'roof_open', 'roof_outdoors', 'surface_a_turf', 'surface_astroturf', 'surface_fieldturf', 'surface_grass', 'surface_matrixturf', 'surface_sportturf', 'offense_formation_EMPTY', 'offense_formation_I_FORM', 'offense_formation_JUMBO', 'offense_formation_PISTOL', 'offense_formation_SHOTGUN', 'offense_formation_SINGLEBACK', 'offense_formation_UNDER CENTER', 'offense_formation_WILDCAT', 'route_ANGLE', 'route_CORNER', 'route_CROSS', 'route_DEEP OUT', 'route_FLAT', 'route_GO', 'route_HITCH', 'route_HITCH/CURL', 'route_IN', 'route_IN/DIG', 'route_OUT', 'route_POST', 'route_QUICK OUT', 'route_SCREEN', 'route_SHALLOW CROSS/DRAG', 'route_SLANT', 'route_SWING', 'route_TEXAS/ANGLE', 'route_WHEEL', 'defense_man_zone_type_MAN_COVERAGE', 'defense_man_zone_type_ZONE_COVERAGE', 'defense_coverage_type_2_MAN', 'defense_coverage_type_BLOWN', 'defense_coverage_type_COMBO', 'defense_coverage_type_COVER_0', 'defense_coverage_type_COVER_1', 'defense_coverage_type_COVER_2', 'defense_coverage_type_COVER_3', 'defense_coverage_type_COVER_4', 'defense_coverage_type_COVER_6', 'defense_coverage_type_COVER_9', 'defense_coverage_type_PREVENT']

y = 'success_outcome'

In [49]:
# Drop any rows missing the target
df = df.dropna(subset=["success_outcome"])

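# Ensure all X columns exist and are numeric; report and skip any that are missing
missing = [c for c in X if c not in df.columns]
for c in missing:
    print(f"Missing feature: {c}")
X_present = [c for c in X if c in df.columns]
X_data = df[X_present].apply(pd.to_numeric, errors="coerce").fillna(0).astype(float)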
y_data = df["success_outcome"].astype(float)
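
Because plays from the same game share context, a validation split that keeps whole games together avoids leaking game-level information between training and validation rows. The commented-out utilities further down do this with scikit-learn's `GroupShuffleSplit`; the sketch below is a NumPy-only stand-in for the same idea. `split_by_group` is a hypothetical helper, not part of the notebook, and the game labels are invented.

```python
import numpy as np
import pandas as pd

def split_by_group(groups, test_size=0.2, seed=42):
    """Boolean mask that is True for validation rows; whole groups are held out together."""
    groups = pd.Series(groups).reset_index(drop=True)
    uniq = np.asarray(groups.unique(), dtype=object)
    n_val = max(1, int(round(len(uniq) * test_size)))
    rng = np.random.default_rng(seed)
    val_groups = rng.choice(uniq, size=n_val, replace=False)
    return groups.isin(val_groups).to_numpy()

# Hypothetical game ids: two plays per game, five games
toy_games = pd.Series(["g1", "g1", "g2", "g2", "g3", "g3", "g4", "g4", "g5", "g5"])
is_val = split_by_group(toy_games, test_size=0.2)

train_games = set(toy_games[~is_val])
val_games = set(toy_games[is_val])
print(train_games, val_games)
```

With `df` in place, the same mask could be built from `df["game_id"]` and applied to `X_data` and `y_data` by position.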

In [50]:
df.to_csv('cleaned_data.csv')

In [51]:
# from google.colab import drive
# drive.mount('/content/drive')
# OUTDIR = "/content/drive/MyDrive/qb_success_results"  # change if you like

In [52]:
# import os, json, joblib, numpy as np, pandas as pd, matplotlib.pyplot as plt
# from pathlib import Path

# from sklearn.model_selection import GroupKFold, GroupShuffleSplit, RandomizedSearchCV
# from sklearn.pipeline import Pipeline
# from sklearn.compose import ColumnTransformer
# from sklearn.preprocessing import FunctionTransformer
# from sklearn.impute import SimpleImputer
# from sklearn.feature_selection import SelectFromModel
# from sklearn.ensemble import RandomForestRegressor, HistGradientBoostingRegressor
# from sklearn.metrics import r2_score, mean_absolute_error, mean_squared_error, make_scorer
# from sklearn.inspection import permutation_importance

# import shap

# # ---------------------------
# # Utils
# # ---------------------------
# def _ensure_numeric_frame(df: pd.DataFrame, cols: list[str]) -> pd.DataFrame:
#     cols_existing = [c for c in cols if c in df.columns]
#     if len(cols_existing) < len(cols):
#         missing = sorted(set(cols) - set(cols_existing))
#         print(f"[warn] missing features dropped: {missing[:10]}{' ...' if len(missing)>10 else ''}")
#     X = df[cols_existing].copy()
#     X = X.apply(pd.to_numeric, errors="coerce")
#     return X

# def _train_val_split_by_game(df: pd.DataFrame, test_size=0.2, random_state=42):
#     splitter = GroupShuffleSplit(n_splits=1, test_size=test_size, random_state=random_state)
#     idx_tr, idx_val = next(splitter.split(df, groups=df["game_id"]))
#     return idx_tr, idx_val

# def _mkdir(path):
#     Path(path).mkdir(parents=True, exist_ok=True)

# # ---------------------------
# # Main builder (overnight-ready)
# # ---------------------------
# def build_and_train_qb_success_model(
#     df: pd.DataFrame,
#     feature_cols: list[str],
#     target_col: str = "success_outcome",
#     group_col: str = "game_id",
#     outdir: str = "./qb_success_results",
#     random_state: int = 42,
#     n_iter_search: int = 40,
#     shap_sample: int = 5000,      # subsample for SHAP to keep runtime reasonable
#     perm_repeats: int = 10,       # permutation importance repeats
#     verbose: int = 3,             # more logs from RandomizedSearchCV
# ):
#     _mkdir(outdir)

#     # ---------------- Sanity
#     if target_col not in df.columns:
#         raise KeyError(f"Target column '{target_col}' not found")
#     df = df.dropna(subset=[target_col]).copy()

#     # ---------------- Data
#     X_raw = _ensure_numeric_frame(df, feature_cols)
#     y = pd.to_numeric(df[target_col], errors="coerce").astype(float)
#     groups = df[group_col]

#     # ---------------- Pipeline
#     # func=None makes FunctionTransformer an identity step; unlike a lambda it can be pickled by joblib.dump
#     num_selector = FunctionTransformer(feature_names_out="one-to-one")
#     pre = ColumnTransformer(
#         transformers=[("num", num_selector, X_raw.columns)],
#         remainder="drop",
#         sparse_threshold=0.0,
#         verbose_feature_names_out=False,
#     )

#     imputer = SimpleImputer(strategy="median", add_indicator=True)

#     selector = SelectFromModel(
#         RandomForestRegressor(
#             n_estimators=300, max_depth=None, n_jobs=-1, random_state=random_state
#         ),
#         threshold="median",
#         prefit=False
#     )

#     hgb = HistGradientBoostingRegressor(
#         random_state=random_state, early_stopping=True, validation_fraction=0.1, max_iter=300
#     )

#     pipe = Pipeline([
#         ("pre", pre),
#         ("imputer", imputer),
#         ("selector", selector),
#         ("model", hgb),
#     ])

#     # ---------------- Search space
#     param_dist = {
#         "selector__threshold": ["median","0.75*median","1.0*median","1.25*median","2*median"],
#         "model__learning_rate":  np.geomspace(0.01, 0.2, 10),
#         "model__max_depth":      [None, 4, 6, 8, 10],
#         "model__max_leaf_nodes": [None, 31, 63, 127],
#         "model__min_samples_leaf": [10, 20, 50, 100],
#         "model__l2_regularization": np.geomspace(1e-6, 1e-1, 8),
#     }

#     gkf = GroupKFold(n_splits=5)
#     neg_mae = make_scorer(mean_absolute_error, greater_is_better=False)

#     search = RandomizedSearchCV(
#         estimator=pipe,
#         param_distributions=param_dist,
#         n_iter=n_iter_search,
#         scoring={"neg_mae": neg_mae, "r2": "r2"},
#         refit="neg_mae",
#         cv=gkf.split(X_raw, y, groups),
#         n_jobs=-1,
#         verbose=verbose,
#         random_state=random_state,
#         return_train_score=True,
#     )

#     print("[info] starting RandomizedSearchCV...")
#     search.fit(X_raw, y)
#     print("[info] search complete.")

#     # save cv results
#     cv_df = pd.DataFrame(search.cv_results_)
#     cv_df.to_csv(os.path.join(outdir, "cv_results.csv"), index=False)
#     joblib.dump(search, os.path.join(outdir, "search.pkl"))

#     # ---------------- Hold-out eval
#     idx_tr, idx_val = _train_val_split_by_game(df, test_size=0.2, random_state=random_state)
#     X_tr, X_val = X_raw.iloc[idx_tr], X_raw.iloc[idx_val]
#     y_tr, y_val = y.iloc[idx_tr], y.iloc[idx_val]

#     best_pipe = search.best_estimator_
#     best_pipe.fit(X_tr, y_tr)

#     y_pred = best_pipe.predict(X_val)

#     r2  = r2_score(y_val, y_pred)
#     mae = mean_absolute_error(y_val, y_pred)
#     mse = mean_squared_error(y_val, y_pred)
#     rmse = np.sqrt(mse)

#     metrics = {
#         "val_r2": r2,
#         "val_mae": mae,
#         "val_mse": mse,
#         "val_rmse": rmse,
#         "cv_best_neg_mae": float(search.best_score_),
#         "cv_best_params": search.best_params_,
#         "n_features_input": int(X_raw.shape[1]),
#     }
#     print("[metrics]", json.dumps(metrics, indent=2))
#     with open(os.path.join(outdir, "metrics.json"), "w") as f:
#         json.dump(metrics, f, indent=2)

#     # save model
#     joblib.dump(best_pipe, os.path.join(outdir, "best_pipeline.pkl"))

#     # ---------------- Permutation importance
#     print("[info] computing permutation importance ...")
#     perm = permutation_importance(best_pipe, X_val, y_val,
#                                   n_repeats=perm_repeats, random_state=random_state, n_jobs=-1)
#     # names of the columns that actually reach the model (after imputation,
#     # any missing-indicator columns the imputer added, and feature selection)
#     selected_feature_names = list(best_pipe[:-1].get_feature_names_out())

#     # permutation_importance shuffles the pipeline's input columns, so the
#     # importances line up with X_val.columns rather than the selected features
#     pi = pd.Series(perm.importances_mean, index=X_val.columns).sort_values(ascending=False)
#     pi.to_csv(os.path.join(outdir, "permutation_importance.csv"))
#     print("[top PI]\n", pi.head(25))

#     # ---------------- Residual diagnostics
#     resid = y_val - y_pred
#     diag = pd.DataFrame({"y_true": y_val, "y_pred": y_pred, "residual": resid}, index=X_val.index)
#     diag.to_csv(os.path.join(outdir, "residuals_val.csv"))

#     plt.figure()
#     plt.scatter(y_pred, resid, s=4, alpha=0.3)
#     plt.axhline(0, color="black", linewidth=1)
#     plt.xlabel("Predicted success_outcome"); plt.ylabel("Residual")
#     plt.title("Residuals vs Predicted")
#     plt.tight_layout()
#     plt.savefig(os.path.join(outdir, "residuals_vs_pred.png"), dpi=300)
#     plt.close()

#     # ---------------- SHAP (summaries + plots)
#     # transform X_val through pre+imputer+selector to align with model inputs
#     print("[info] computing SHAP on validation subsample ...")
#     # pipeline slicing: everything except last step (model)
#     pre_to_sel = Pipeline(best_pipe.steps[:-1])
#     Xt_val = pre_to_sel.transform(X_val)
#     # align names to selected features
#     Xt_feature_names = selected_feature_names

#     # subsample for speed
#     if shap_sample and Xt_val.shape[0] > shap_sample:
#         samp_idx = np.random.RandomState(random_state).choice(Xt_val.shape[0], shap_sample, replace=False)
#         Xt_val_shap = Xt_val[samp_idx]
#     else:
#         Xt_val_shap = Xt_val

#     # Tree-based explainer works with HGB
#     model_core = best_pipe.named_steps["model"]
#     explainer = shap.Explainer(model_core, feature_names=Xt_feature_names)
#     shap_values = explainer(Xt_val_shap)

#     # save raw shap values summary (mean |shap|)
#     mean_abs_shap = np.abs(shap_values.values).mean(axis=0)
#     shap_summary = pd.Series(mean_abs_shap, index=Xt_feature_names).sort_values(ascending=False)
#     shap_summary.to_csv(os.path.join(outdir, "shap_mean_abs.csv"))

#     # plots
#     shap.plots.bar(shap_values, max_display=30, show=False)
#     plt.tight_layout()
#     plt.savefig(os.path.join(outdir, "shap_bar_top30.png"), dpi=300)
#     plt.close()

#     shap.plots.beeswarm(shap_values, max_display=30, show=False)
#     plt.tight_layout()
#     plt.savefig(os.path.join(outdir, "shap_beeswarm_top30.png"), dpi=300)
#     plt.close()

#     # store a small table for notebook printing
#     top20 = shap_summary.head(20)
#     top20.to_csv(os.path.join(outdir, "shap_top20.csv"))

#     # ---------------- Return summary
#     report = {
#         "metrics": metrics,
#         "permutation_importance_head": pi.head(25),
#         "selected_features": selected_feature_names,
#         "cv_results_file": os.path.join(outdir, "cv_results.csv"),
#         "model_file": os.path.join(outdir, "best_pipeline.pkl"),
#         "shap_bar": os.path.join(outdir, "shap_bar_top30.png"),
#         "shap_beeswarm": os.path.join(outdir, "shap_beeswarm_top30.png"),
#         "residuals_plot": os.path.join(outdir, "residuals_vs_pred.png"),
#     }
#     return search, best_pipe, report
