## Predictive Modeling 
This notebook series builds predictive modeling datasets for **each round and game** of the NBA Playoffs from 2015 to 2025. Each stage of the playoffs has its own strategic and statistical context, so we generate customized features for every round and game number.

#### Modeling Strategy by Round and Game
Each playoff round and game requires a tailored dataset, based on the game context and the statistics available at that point in the playoffs.

- **Round 1**:
  - *Game 1*: Regular season stats, seeding, head-to-head records.
  - *Game 2*: Adds Game 1 results and momentum.
  - *Games 3–7*: Uses rolling stats and current series score.

- **Rounds 2–4**:
  - *Game 1*: Combines regular season and prior round averages, series length, and rest advantage.
  - *Games 2–7*: Adds rolling stats from current round and series progress.

This structure ensures that features reflect real-world conditions as teams advance through the playoffs.
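The round/game breakdown above amounts to a small routing table. The sketch below shows one way to map a (round, game) pair to its feature group. The function name and the returned labels are hypothetical; the notebook does not use them elsewhere, and they only mirror the strategy list.

```python
def feature_group(round_num: int, game_num: int) -> str:
    """Map a playoff (round, game) pair to its feature group (hypothetical labels)."""
    if not (1 <= round_num <= 4 and 1 <= game_num <= 7):
        raise ValueError(f"invalid round/game: {round_num}/{game_num}")
    if round_num == 1:
        if game_num == 1:
            return "season_seed_h2h"        # regular season, seeding, head-to-head
        if game_num == 2:
            return "season_plus_game1"      # adds Game 1 results and momentum
        return "series_rolling"             # rolling stats + current series score
    if game_num == 1:
        return "season_prior_round_rest"    # prior-round averages, series length, rest
    return "current_round_rolling"          # rolling stats from the current round


for rnd, game in [(1, 1), (1, 2), (1, 5), (3, 1), (4, 6)]:
    print(rnd, game, feature_group(rnd, game))
```

Keeping the routing in one function makes it easy to confirm that every playoff game falls into exactly one dataset.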

#### Why Separate Models?

By treating each round and game as a unique scenario, this approach:

- Avoids data leakage from future games
- Accounts for strategic changes as series progress
- Improves model realism and interpretability

This modular approach allows flexibility in choosing the most appropriate features and models depending on the playoff context.

#### Build Modeling Dataset - Round 1 Game 1
This section generates the feature dataset for predicting outcomes of **Round 1 Game 1** matchups in the NBA Playoffs. It constructs team-level inputs by combining regular season stats, team seedings, head-to-head records, and home-court advantage. 

##### Overview of Steps:
1. **Import Data**: Read the `playoffs_games.csv` and `team_statistics_regular_season.csv` files. The head-to-head file, `regular_season_head_to_head_stats.csv`, is loaded in step 7.
2. **Filter for Round 1, Game 1**: Select only Round 1, Game 1 playoff games from the 2015–2025 seasons.
3. **Merge Regular Season Statistics**: Add the regular season Four Factor statistics (eFG%, TOV%, ORB%, DRB%, FT/FGA) for both the home and away teams.
4. **Rename Columns for Clarity**: Append `_home_season` or `_away_season` suffixes for clear team stat tracking.
5. **Drop Redundant Columns**: Remove team name columns no longer needed after merging.
6. **Select Features for Modeling**: Keep only relevant identifiers, game context, and statistical features.
7. **Merge Head-to-Head Statistics**: Join in regular season head-to-head matchup data between the home and away teams.
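The Four Factors in step 3 are Dean Oliver's standard rate statistics. The processed CSVs already contain them, so this sketch only shows what each column measures. It assumes the usual box-score formulas, including the common 0.44 free-throw weight in the possession estimate. It returns fractions, although the CSVs may store the values on a 0–100 scale. The `BoxScore` container and the input counts are illustrative and are not taken from the data.

```python
from dataclasses import dataclass


@dataclass
class BoxScore:
    """Raw team box-score counts for one game (hypothetical container)."""
    fgm: int   # field goals made
    fg3m: int  # three-pointers made
    fga: int   # field goal attempts
    ftm: int   # free throws made
    fta: int   # free throw attempts
    tov: int   # turnovers
    orb: int   # offensive rebounds
    drb: int   # defensive rebounds


def four_factors(team: BoxScore, opp: BoxScore) -> dict:
    """Four Factor rates for `team` as fractions (0-1), using standard box-score formulas."""
    possessions_used = team.fga + 0.44 * team.fta + team.tov  # common possession estimate
    return {
        "eFG%": (team.fgm + 0.5 * team.fg3m) / team.fga,  # a made 3 counts as 1.5 FGs
        "TOV%": team.tov / possessions_used,
        "ORB%": team.orb / (team.orb + opp.drb),          # share of available offensive boards
        "DRB%": team.drb / (team.drb + opp.orb),          # share of available defensive boards
        "FT/FGA": team.ftm / team.fga,
    }


# Illustrative box scores (made-up counts)
home = BoxScore(fgm=40, fg3m=10, fga=85, ftm=16, fta=20, tov=12, orb=10, drb=35)
away = BoxScore(fgm=38, fg3m=12, fga=88, ftm=18, fta=22, tov=14, orb=9, drb=30)
factors = four_factors(home, away)
print({k: round(v, 3) for k, v in factors.items()})
```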

In [3]:
# Import libraries
import pandas as pd

# Load the Playoff Games and Regular Season Team Statistics datasets
playoffs_path = "../data/processed/playoffs_games.csv"
playoff_games_df = pd.read_csv(playoffs_path)

regular_season_path = "../data/processed/team_statistics_regular_season.csv"
regular_season_stats_df = pd.read_csv(regular_season_path)

# Filter for Round 1, Game 1
round1_game1_df = playoff_games_df[
    (playoff_games_df["round"] == 1) &
    (playoff_games_df["gameNumber"] == 1) &
    (playoff_games_df["season"].between(2015, 2025))
].copy()

# Merge Home Team Regular Season Statistics
round1_game1_df = round1_game1_df.merge(
    regular_season_stats_df,
    left_on=["season", "homeTeam"],
    right_on=["season", "teamName"],
    how="left",
    suffixes=("", "_home")
)

# Rename Home Team Columns
round1_game1_df = round1_game1_df.rename(columns={
    "eFG%": "eFG%_home_season",
    "TOV%": "TOV%_home_season",
    "ORB%": "ORB%_home_season",
    "DRB%": "DRB%_home_season",
    "FT/FGA": "FT/FGA_home_season"
})

# Merge Away Team Regular Season Statistics
round1_game1_df = round1_game1_df.merge(
    regular_season_stats_df,
    left_on=["season", "awayTeam"],
    right_on=["season", "teamName"],
    how="left",
    suffixes=("", "_away")
)

# Rename Away Team Columns
round1_game1_df = round1_game1_df.rename(columns={
    "eFG%": "eFG%_away_season",
    "TOV%": "TOV%_away_season",
    "ORB%": "ORB%_away_season",
    "DRB%": "DRB%_away_season",
    "FT/FGA": "FT/FGA_away_season"
})

# Drop unnecessary columns
round1_game1_df = round1_game1_df.drop(columns=["teamName", "teamName_away"])

# Select features for modeling
model_df = round1_game1_df[[
    "gameId", "season", "homeTeam", "awayTeam",
    "homeSeed", "awaySeed", "homeCourt", "homeWin", "gameNumber", "round",
    "eFG%_home_season", "TOV%_home_season", "ORB%_home_season", "DRB%_home_season", "FT/FGA_home_season",
    "eFG%_away_season", "TOV%_away_season", "ORB%_away_season", "DRB%_away_season", "FT/FGA_away_season"
]]

# Load Head-to-Head Regular Season Dataset
h2h_path = "../data/processed/regular_season_head_to_head_stats.csv"
h2h_df = pd.read_csv(h2h_path)

# Rename columns for clarity
h2h_df = h2h_df.rename(columns={
    "teamName": "homeTeam",
    "opponentName": "awayTeam",
    "wins": "h2h_home_wins",
    "games_played": "h2h_games_played",
    "win_rate": "h2h_home_winrate"
})

# Merge Head-to-Head Statistics into Round 1 Game 1 Dataset
model_df = model_df.merge(
    h2h_df[["season", "homeTeam", "awayTeam", "h2h_games_played", "h2h_home_wins", "h2h_home_winrate"]],
    on=["season", "homeTeam", "awayTeam"],
    how="left"
)

# Export the final dataset
model_df.to_csv("../data/processed/modeling_round1_game1.csv", index=False)
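Steps 3–5 merge the same stats table twice and then rename and drop columns. An alternative is to rename before merging, so that each merge adds columns that already carry their suffix and leaves no `teamName` columns behind. This sketch assumes the stats table has `season`, `teamName`, and the five Four Factor columns. `attach_team_stats` is a hypothetical helper, and the teams and values below are toy data.

```python
import pandas as pd

FOUR_FACTORS = ["eFG%", "TOV%", "ORB%", "DRB%", "FT/FGA"]


def attach_team_stats(games: pd.DataFrame, stats: pd.DataFrame,
                      team_col: str, suffix: str) -> pd.DataFrame:
    """Left-join one side's season stats onto `games`, renaming each stat to <stat><suffix>."""
    renames = {"teamName": team_col, **{c: f"{c}{suffix}" for c in FOUR_FACTORS}}
    side = stats[["season", "teamName", *FOUR_FACTORS]].rename(columns=renames)
    # validate= raises if a team appears more than once per season in `stats`
    return games.merge(side, on=["season", team_col], how="left", validate="many_to_one")


# Toy inputs (illustrative values only)
games = pd.DataFrame({"season": [2024], "homeTeam": ["Team A"], "awayTeam": ["Team B"]})
stats = pd.DataFrame({
    "season": [2024, 2024],
    "teamName": ["Team A", "Team B"],
    "eFG%": [0.56, 0.53], "TOV%": [0.12, 0.13], "ORB%": [0.26, 0.24],
    "DRB%": [0.77, 0.75], "FT/FGA": [0.20, 0.19],
})

model = attach_team_stats(games, stats, "homeTeam", "_home_season")
model = attach_team_stats(model, stats, "awayTeam", "_away_season")
print(model.T)
```

The `validate="many_to_one"` check is optional. It guards against a duplicated team row silently multiplying matchups.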

#### Build Modeling Dataset - Round 1 Game 2
This section builds the feature dataset used to predict **Round 1 Game 2** outcomes in the NBA Playoffs. It incorporates playoff game statistics, regular season team performance, seeding, home-court advantage, and the results of **Game 1**.

##### Overview of Steps:
1. **Import Data**: Read `team_statistics_regular_season.csv`, `playoffs_games.csv`, and `team_statistics_playoff_games.csv` files.
2. **Filter for Round 1 Game 2**: Subset the playoff games to keep only Round 1 Game 2 matchups.
3. **Get Round 1 Game 1 Results**: Identify Game 1 matchups, then pull and merge the home and away team statistics from those games.
4. **Attach Game 1 Stats to Game 2 Matchups**: Merge the Game 1 team stats (e.g., eFG%, ORB%, score) into the Game 2 dataframe.
5. **Create Momentum Feature**: Add a binary feature (`homeLostRound1Game1`) to indicate whether the home team lost Game 1.
6. **Merge Regular Season Stats**: Attach regular season Four Factor statistics for both home and away teams.

In [6]:
# Import libraries
import pandas as pd

# Load Regular Season Games Team Statistics, Playoff Games Statistics, and Playoff Games Team Statistics.
regular_season_stats = pd.read_csv("../data/processed/team_statistics_regular_season.csv")
playoff_games = pd.read_csv("../data/processed/playoffs_games.csv")
playoff_stats = pd.read_csv("../data/processed/team_statistics_playoff_games.csv")

# Filter for seasons 2015-2025
playoff_games = playoff_games[
    (playoff_games["season"] >= 2015) & (playoff_games["season"] <= 2025)
]
playoff_stats = playoff_stats[
    (playoff_stats["season"] >= 2015) & (playoff_stats["season"] <= 2025)
]
regular_season_stats = regular_season_stats[
    (regular_season_stats["season"] >= 2015) & (regular_season_stats["season"] <= 2025)
]

# Filter for Round 1, Game 2
round1_game2_df = playoff_games[
    (playoff_games["round"] == 1) & (playoff_games["gameNumber"] == 2)
].copy()

# Pull Game 1 IDs and Teams
game1_df = playoff_games[
    (playoff_games["round"] == 1) & (playoff_games["gameNumber"] == 1)
][["gameId", "season", "homeTeam", "awayTeam"]].copy()

# Merge Game 1 Home Team Statistics
# playoff_stats has its own "season" column; drop it so the merges don't create season_x/season_y
game1_team_stats = playoff_stats.drop(columns=["season"])

home_g1 = game1_df.merge(
    game1_team_stats,
    left_on=["gameId", "homeTeam"],
    right_on=["gameId", "teamName"],
    how="left"
).rename(columns={
    "eFG%": "eFG%_home_r1g1",
    "TOV%": "TOV%_home_r1g1",
    "FT/FGA": "FT/FGA_home_r1g1",
    "ORB%": "ORB%_home_r1g1",
    "DRB%": "DRB%_home_r1g1",
    "homeScore": "homeScore_r1g1",
    "awayScore": "awayScore_r1g1"
}).drop(columns=["teamName"])

# Merge Game 1 Away Team Statistics
away_g1 = home_g1.merge(
    game1_team_stats,
    left_on=["gameId", "awayTeam"],
    right_on=["gameId", "teamName"],
    how="left"
).rename(columns={
    "eFG%": "eFG%_away_r1g1",
    "TOV%": "TOV%_away_r1g1",
    "FT/FGA": "FT/FGA_away_r1g1",
    "ORB%": "ORB%_away_r1g1",
    "DRB%": "DRB%_away_r1g1"
}).drop(columns=["teamName"])

# Merge Game 1 Statistics into Game 2 Matchups
round1_game2_df = round1_game2_df.merge(
    away_g1[
        ["season", "homeTeam", "awayTeam",
         "eFG%_home_r1g1", "TOV%_home_r1g1", "FT/FGA_home_r1g1", "ORB%_home_r1g1", "DRB%_home_r1g1",
         "eFG%_away_r1g1", "TOV%_away_r1g1", "FT/FGA_away_r1g1", "ORB%_away_r1g1", "DRB%_away_r1g1",
         "homeScore_r1g1", "awayScore_r1g1"
        ]
    ],
    on=["season", "homeTeam", "awayTeam"],
    how="left"
)

# Add Momentum Feature - homeLostRound1Game1
round1_game2_df["homeLostRound1Game1"] = (
    round1_game2_df["homeScore_r1g1"] < round1_game2_df["awayScore_r1g1"]
).astype(int)

# Merge Regular Season Statistics (Home Team and Away Team)
round1_game2_df = round1_game2_df.merge(
    regular_season_stats,
    left_on=["season", "homeTeam"],
    right_on=["season", "teamName"],
    how="left"
).rename(columns={
    "eFG%": "eFG%_home_season",
    "TOV%": "TOV%_home_season",
    "FT/FGA": "FT/FGA_home_season",
    "ORB%": "ORB%_home_season",
    "DRB%": "DRB%_home_season"
}).drop(columns=["teamName"])

round1_game2_df = round1_game2_df.merge(
    regular_season_stats,
    left_on=["season", "awayTeam"],
    right_on=["season", "teamName"],
    how="left"
).rename(columns={
    "eFG%": "eFG%_away_season",
    "TOV%": "TOV%_away_season",
    "FT/FGA": "FT/FGA_away_season",
    "ORB%": "ORB%_away_season",
    "DRB%": "DRB%_away_season"
}).drop(columns=["teamName"])

# Select Final Columns
round1_game2_df["round"] = 1
round1_game2_df["gameNumber"] = 2

final_cols = [
    "gameId", "season", "homeTeam", "awayTeam", "homeSeed", "awaySeed", "homeCourt", "homeWin", "round", "gameNumber",

    # Regular Season Stats
    "eFG%_home_season", "TOV%_home_season", "FT/FGA_home_season", "ORB%_home_season", "DRB%_home_season",
    "eFG%_away_season", "TOV%_away_season", "FT/FGA_away_season", "ORB%_away_season", "DRB%_away_season",

    # Game 1 Stats
    "eFG%_home_r1g1", "TOV%_home_r1g1", "FT/FGA_home_r1g1", "ORB%_home_r1g1", "DRB%_home_r1g1",
    "eFG%_away_r1g1", "TOV%_away_r1g1", "FT/FGA_away_r1g1", "ORB%_away_r1g1", "DRB%_away_r1g1",

    # Momentum Features
    "homeScore_r1g1", "awayScore_r1g1", "homeLostRound1Game1"
]

round1_game2_df = round1_game2_df[final_cols].copy()

# Export the final dataset
round1_game2_df.to_csv("../data/processed/modeling_round1_game2.csv", index=False)
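Step 4 joins Game 1 onto Game 2 by `season`, `homeTeam`, and `awayTeam`. This works because the same team hosts Games 1 and 2 under the NBA's 2-2-1-1-1 format. An orientation-independent alternative is useful if the idea is reused for later games, where hosts alternate. The sketch builds a series key from the sorted team pair and expresses the momentum feature relative to the current home team. `add_series_key`, `homeLostGame1`, and the toy games are hypothetical.

```python
import pandas as pd


def add_series_key(games: pd.DataFrame) -> pd.DataFrame:
    """Add a `seriesKey` that is identical for every game of a series, whoever hosts."""
    out = games.copy()
    out["seriesKey"] = [
        f"{season}-R{rnd}-{'|'.join(sorted((home, away)))}"
        for season, rnd, home, away in zip(out["season"], out["round"], out["homeTeam"], out["awayTeam"])
    ]
    return out


# Toy games: two Round 1 series (illustrative values only)
games = add_series_key(pd.DataFrame({
    "season":     [2024, 2024, 2024, 2024],
    "round":      [1, 1, 1, 1],
    "gameNumber": [1, 2, 1, 2],
    "homeTeam":   ["A", "A", "C", "C"],
    "awayTeam":   ["B", "B", "D", "D"],
    "homeWin":    [0, 1, 1, 1],
}))

game1 = games.loc[games["gameNumber"] == 1, ["seriesKey", "homeTeam", "homeWin"]].rename(
    columns={"homeTeam": "game1Host", "homeWin": "game1HostWon"}
)
game2 = games[games["gameNumber"] == 2].merge(game1, on="seriesKey", how="left", validate="many_to_one")

# The current home team lost Game 1 if it hosted and lost, or visited and the host won (XOR)
hosted_game1 = game2["homeTeam"] == game2["game1Host"]
host_won_game1 = game2["game1HostWon"] == 1
game2["homeLostGame1"] = (hosted_game1 != host_won_game1).astype(int)
print(game2[["seriesKey", "homeTeam", "homeLostGame1"]])
```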


#### Build Modeling Dataset - Round 1 Games 3 to 7

This section constructs the modeling dataset used to predict outcomes of **Round 1 Games 3–7** in the NBA Playoffs. For each game, it averages both teams' playoff statistics over all earlier games in the same series, counts the current series score, and adds regular season team statistics.

Key idea: We use game-level rolling averages and evolving series context (e.g., series score, home-court shifts) to generate realistic features as the series progresses.

##### Overview of Steps:

1. **Import Data**: Read `team_statistics_regular_season.csv`, `team_statistics_playoff_games.csv`, and `playoffs_games.csv` files.
2. **Filter for Round 1 Games 3 to 7**: Subset the playoff games dataframe to only include Round 1 matchups for Games 3 through 7.
3. **Generate Rolling Playoff Stats**: For each game, average each team's statistics over all prior games in the same series (eFG%, ORB%, DRB%, TOV%, FT/FGA).
4. **Track Series Score**: Count the wins each team has in the series before the current game (`homeWins`, `awayWins`).
5. **Identify Home-Court Shifts**: Determine home-court changes across the series and encode them.
6. **Attach Regular Season Stats**: Merge Four Factor statistics from the regular season for both home and away teams using `team_statistics_regular_season`.
7. **Attach Team Information**: Merge team seeds, season identifiers, and other contextual info from `playoff_games`.
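Series-score features must use only results from games already played. The sketch below assumes a hypothetical `seriesKey` column that identifies each series; any season/round/team-pair identifier works. It also uses the `homeWin` flag from `playoffs_games.csv`. Each row gets the wins recorded *before* that game, so a game's own result never leaks into its features. `homeWins` and `awayWins` match the columns the next cell exports, and `series_score_diff` is simply home minus away wins.

```python
import pandas as pd


def add_series_score(games: pd.DataFrame) -> pd.DataFrame:
    """Add the series score *entering* each game (no result from the game itself).

    Expects columns: seriesKey, gameNumber, homeTeam, awayTeam, homeWin (1 = home won).
    """
    out = games.sort_values(["seriesKey", "gameNumber"]).copy()
    wins = {}  # (seriesKey, team) -> wins so far in that series
    home_wins, away_wins = [], []
    for key, home, away, home_won in zip(out["seriesKey"], out["homeTeam"],
                                         out["awayTeam"], out["homeWin"]):
        home_wins.append(wins.get((key, home), 0))
        away_wins.append(wins.get((key, away), 0))
        winner = home if home_won == 1 else away
        wins[(key, winner)] = wins.get((key, winner), 0) + 1  # update only after recording
    out["homeWins"] = home_wins
    out["awayWins"] = away_wins
    out["series_score_diff"] = out["homeWins"] - out["awayWins"]
    return out


# Toy series (illustrative): A hosts Games 1-2, B hosts Games 3-4
series = pd.DataFrame({
    "seriesKey":  ["S"] * 4,
    "gameNumber": [1, 2, 3, 4],
    "homeTeam":   ["A", "A", "B", "B"],
    "awayTeam":   ["B", "B", "A", "A"],
    "homeWin":    [1, 0, 1, 0],
})
print(add_series_score(series)[["gameNumber", "homeTeam", "homeWins", "awayWins", "series_score_diff"]])
```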

In [9]:
# Import libraries
import pandas as pd

# Load Regular Season Games Team Statistics, Playoff Games Statistics, and Playoff Games Team Statistics.
regular_season_stats = pd.read_csv("../data/processed/team_statistics_regular_season.csv")
playoff_games = pd.read_csv("../data/processed/playoffs_games.csv")
playoff_stats = pd.read_csv("../data/processed/team_statistics_playoff_games.csv")

# Merge Playoffs Games Statistics with Playoff Games Team Statistics
playoff_stats = playoff_stats.merge(
    playoff_games[["gameId", "season", "round", "gameNumber", "homeTeam", "awayTeam"]],
    on="gameId", how="left"
)

# Add Team Score and Opponent Score Columns (vectorized: choose the home or away score for each row)
is_home = playoff_stats["teamName"] == playoff_stats["homeTeam"]
playoff_stats["teamScore"] = playoff_stats["homeScore"].where(is_home, playoff_stats["awayScore"])
playoff_stats["opponentScore"] = playoff_stats["awayScore"].where(is_home, playoff_stats["homeScore"])

# Filter for Round 1, Games 3-7 (2015–2025 seasons)
games3to7_df = playoff_games[
    (playoff_games["round"] == 1) & (playoff_games["gameNumber"] >= 3) &
    (playoff_games["season"].between(2015, 2025))
].copy()

# Initialize Dataset
final_rows = []

# Loop Through Games and Build Feature Rows
for idx, row in games3to7_df.iterrows():
    game_id = row["gameId"]
    season = row["season"]
    home_team = row["homeTeam"]
    away_team = row["awayTeam"]
    game_num = row["gameNumber"]
    round_num = row["round"]

    # Previous games in same series
    prev_games = playoff_games[
        (playoff_games["round"] == 1) &
        (playoff_games["season"] == season) &
        (playoff_games["gameNumber"] < game_num) &
        (
            ((playoff_games["homeTeam"] == home_team) & (playoff_games["awayTeam"] == away_team)) |
            ((playoff_games["homeTeam"] == away_team) & (playoff_games["awayTeam"] == home_team))
        )
    ]["gameId"].tolist()

    prior_stats = playoff_stats[
        (playoff_stats["gameId"].isin(prev_games)) &
        (playoff_stats["teamName"].isin([home_team, away_team]))
    ].copy()

    team_rolling = prior_stats.groupby("teamName")[["eFG%", "TOV%", "FT/FGA", "ORB%", "DRB%"]].mean().reset_index()
    team_rolling = team_rolling.rename(columns={
        "eFG%": "eFG%_roll", "TOV%": "TOV%_roll", "FT/FGA": "FT/FGA_roll",
        "ORB%": "ORB%_roll", "DRB%": "DRB%_roll"
    })

    home_season = regular_season_stats[
        (regular_season_stats["season"] == season) & (regular_season_stats["teamName"] == home_team)
    ]
    away_season = regular_season_stats[
        (regular_season_stats["season"] == season) & (regular_season_stats["teamName"] == away_team)
    ]

    if home_season.empty or away_season.empty:
        continue

    home_row = team_rolling[team_rolling["teamName"] == home_team].merge(home_season, on="teamName")
    away_row = team_rolling[team_rolling["teamName"] == away_team].merge(away_season, on="teamName")

    if home_row.empty or away_row.empty:
        continue

    home_wins = prior_stats[
        (prior_stats["teamName"] == home_team) & (prior_stats["teamScore"] > prior_stats["opponentScore"])
    ].shape[0]

    away_wins = prior_stats[
        (prior_stats["teamName"] == away_team) & (prior_stats["teamScore"] > prior_stats["opponentScore"])
    ].shape[0]

    # Append All Features for Each Game
    final_rows.append({
        "gameId": game_id,
        "season": season,
        "homeTeam": home_team,
        "awayTeam": away_team,
        "homeSeed": row["homeSeed"],
        "awaySeed": row["awaySeed"],
        "homeCourt": row["homeCourt"],
        "homeWin": row["homeWin"],
        "gameNumber": game_num,
        "round": round_num,

        # Rolling Stats
        "eFG%_home_r1_roll": home_row["eFG%_roll"].values[0],
        "TOV%_home_r1_roll": home_row["TOV%_roll"].values[0],
        "FT/FGA_home_r1_roll": home_row["FT/FGA_roll"].values[0],
        "ORB%_home_r1_roll": home_row["ORB%_roll"].values[0],
        "DRB%_home_r1_roll": home_row["DRB%_roll"].values[0],

        "eFG%_away_r1_roll": away_row["eFG%_roll"].values[0],
        "TOV%_away_r1_roll": away_row["TOV%_roll"].values[0],
        "FT/FGA_away_r1_roll": away_row["FT/FGA_roll"].values[0],
        "ORB%_away_r1_roll": away_row["ORB%_roll"].values[0],
        "DRB%_away_r1_roll": away_row["DRB%_roll"].values[0],

        # Regular Season Stats
        "eFG%_home_season": home_row["eFG%"].values[0],
        "TOV%_home_season": home_row["TOV%"].values[0],
        "FT/FGA_home_season": home_row["FT/FGA"].values[0],
        "ORB%_home_season": home_row["ORB%"].values[0],
        "DRB%_home_season": home_row["DRB%"].values[0],

        "eFG%_away_season": away_row["eFG%"].values[0],
        "TOV%_away_season": away_row["TOV%"].values[0],
        "FT/FGA_away_season": away_row["FT/FGA"].values[0],
        "ORB%_away_season": away_row["ORB%"].values[0],
        "DRB%_away_season": away_row["DRB%"].values[0],

        # Series Score
        "homeWins": home_wins,
        "awayWins": away_wins
    })

# Export the final dataset
final_df = pd.DataFrame(final_rows)
final_df.to_csv("../data/processed/modeling_round1_games3to7.csv", index=False)
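The loop above rebuilds each game's rolling averages by filtering earlier games one matchup at a time. The same leakage-safe averages can be computed in one pass. Within each team-series group, `shift(1)` drops the current game and `expanding().mean()` averages everything before it, so a Game g row only sees Games 1 to g-1. This sketch assumes a long table with one row per team per game and a hypothetical `seriesKey` column. The stat values are toy numbers.

```python
import pandas as pd


def prior_game_means(team_games: pd.DataFrame, stats: list) -> pd.DataFrame:
    """Add <stat>_roll = mean of the team's *earlier* games in the same series.

    Expects one row per team per game with seriesKey, teamName, gameNumber, and `stats`.
    Game 1 rows get NaN because no earlier games exist.
    """
    out = team_games.sort_values(["seriesKey", "teamName", "gameNumber"]).copy()
    by_team = out.groupby(["seriesKey", "teamName"])
    for col in stats:
        # shift(1) excludes the current game; expanding().mean() averages all prior ones
        out[f"{col}_roll"] = by_team[col].transform(lambda s: s.shift(1).expanding().mean())
    return out


# Toy long-format stats for one team in one series (illustrative values)
team_games = pd.DataFrame({
    "seriesKey":  ["S", "S", "S"],
    "teamName":   ["A", "A", "A"],
    "gameNumber": [1, 2, 3],
    "eFG%":       [0.50, 0.60, 0.40],
})
print(prior_game_means(team_games, ["eFG%"]))
```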

#### Build Modeling Dataset - Round 2 Game 1

This section builds the modeling dataset to predict **Round 2 Game 1** outcomes in the NBA Playoffs. It uses playoff performance from Round 1, regular season statistics, rest/fatigue metrics, and series context.

##### Overview of Steps:

1. **Import Data**: Read `team_statistics_regular_season.csv`, `team_statistics_playoff_games.csv`, and `playoffs_games.csv` files.
2. **Filter for Round 2 Game 1**: Keep only Round 2 Game 1 matchups.
3. **Attach Round 1 Performance**: Add aggregated team statistics from Round 1 (e.g., eFG%, ORB%, TOV%, FT/FGA, DRB%).
4. **Compute Round 1 Series Length**: Count the games each team played in Round 1 (`homeSeriesLength`, `awaySeriesLength`).
5. **Identify Home-Court Shifts**: Determine home-court changes across the series and encode them.
6. **Attach Regular Season Stats**: Merge Four Factor statistics from the regular season for both teams.
7. **Calculate Rest Advantage**: Use the difference in Round 1 series length (`restAdvantage = awaySeriesLength - homeSeriesLength`) as a proxy for rest; a positive value means the home team played fewer games.

In [12]:
# Import libraries
import pandas as pd

# Load Regular Season Games Team Statistics, Playoff Games Statistics, and Playoff Games Team Statistics.
regular_season_stats = pd.read_csv("../data/processed/team_statistics_regular_season.csv")
playoff_games = pd.read_csv("../data/processed/playoffs_games.csv")
playoff_stats = pd.read_csv("../data/processed/team_statistics_playoff_games.csv")

# Filter Round 2 Game 1 Matchups (2015–2025 seasons)
round2_game1 = playoff_games[
    (playoff_games["round"] == 2) & (playoff_games["gameNumber"] == 1) &
    (playoff_games["season"].between(2015, 2025))
].copy()

# Get all Round 1 games for each team
round1_meta = playoff_games[playoff_games["round"] == 1][["gameId", "season", "round", "gameNumber", "homeTeam", "awayTeam"]].copy()

round1_home = round1_meta[["gameId", "homeTeam"]].rename(columns={"homeTeam": "teamName"})
round1_away = round1_meta[["gameId", "awayTeam"]].rename(columns={"awayTeam": "teamName"})
round1_long = pd.concat([round1_home, round1_away], axis=0).drop_duplicates()

round1_long = round1_long.merge(playoff_games[["gameId", "season"]], on="gameId", how="left")

# Merge team statistics with Round 1 games
# round1_long already carries "season"; dropping playoff_stats' copy avoids season_x/season_y
round1_stats = playoff_stats.drop(columns=["season"]).merge(
    round1_long, on=["gameId", "teamName"], how="inner"
)

# Compute average Round 1 stats per team
round1_avg_stats = round1_stats.groupby(["season", "teamName"])[["eFG%", "TOV%", "FT/FGA", "ORB%", "DRB%"]].mean().reset_index()

# Merge with Round 2 Game 1 Teams
final_rows = []

for _, row in round2_game1.iterrows():
    season = row["season"]
    game_id = row["gameId"]
    home_team = row["homeTeam"]
    away_team = row["awayTeam"]
    round_num = row["round"]
    game_num = row["gameNumber"]

    # Get Round 1 average stats
    home_r1 = round1_avg_stats[(round1_avg_stats["season"] == season) & (round1_avg_stats["teamName"] == home_team)]
    away_r1 = round1_avg_stats[(round1_avg_stats["season"] == season) & (round1_avg_stats["teamName"] == away_team)]

    # Get regular season stats
    home_reg = regular_season_stats[(regular_season_stats["season"] == season) & (regular_season_stats["teamName"] == home_team)]
    away_reg = regular_season_stats[(regular_season_stats["season"] == season) & (regular_season_stats["teamName"] == away_team)]

    if home_r1.empty or away_r1.empty or home_reg.empty or away_reg.empty:
        continue

    # Compute series lengths
    home_games = round1_long[(round1_long["season"] == season) & (round1_long["teamName"] == home_team)]
    away_games = round1_long[(round1_long["season"] == season) & (round1_long["teamName"] == away_team)]
    home_series_len = home_games["gameId"].nunique()
    away_series_len = away_games["gameId"].nunique()
    rest_diff = away_series_len - home_series_len

    final_rows.append({
        "gameId": game_id,
        "season": season,
        "homeTeam": home_team,
        "awayTeam": away_team,
        "homeCourt": row["homeCourt"],
        "homeWin": row["homeWin"],
        "round": round_num,
        "gameNumber": game_num,

        # Round 1 averages
        "eFG%_home_r1_avg": home_r1["eFG%"].values[0],
        "TOV%_home_r1_avg": home_r1["TOV%"].values[0],
        "FT/FGA_home_r1_avg": home_r1["FT/FGA"].values[0],
        "ORB%_home_r1_avg": home_r1["ORB%"].values[0],
        "DRB%_home_r1_avg": home_r1["DRB%"].values[0],

        "eFG%_away_r1_avg": away_r1["eFG%"].values[0],
        "TOV%_away_r1_avg": away_r1["TOV%"].values[0],
        "FT/FGA_away_r1_avg": away_r1["FT/FGA"].values[0],
        "ORB%_away_r1_avg": away_r1["ORB%"].values[0],
        "DRB%_away_r1_avg": away_r1["DRB%"].values[0],

        # Regular season
        "eFG%_home_season": home_reg["eFG%"].values[0],
        "TOV%_home_season": home_reg["TOV%"].values[0],
        "FT/FGA_home_season": home_reg["FT/FGA"].values[0],
        "ORB%_home_season": home_reg["ORB%"].values[0],
        "DRB%_home_season": home_reg["DRB%"].values[0],

        "eFG%_away_season": away_reg["eFG%"].values[0],
        "TOV%_away_season": away_reg["TOV%"].values[0],
        "FT/FGA_away_season": away_reg["FT/FGA"].values[0],
        "ORB%_away_season": away_reg["ORB%"].values[0],
        "DRB%_away_season": away_reg["DRB%"].values[0],

        # Rest/fatigue
        "homeSeriesLength": home_series_len,
        "awaySeriesLength": away_series_len,
        "restAdvantage": rest_diff
    })

# Export the final dataset
final_df = pd.DataFrame(final_rows)
final_df.to_csv("../data/processed/modeling_round2_game1.csv", index=False)
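The cell above approximates rest with Round 1 series length. Calendar rest can instead be computed from game dates. The Round 3 cell later in this notebook reads a `gameDate` column from `playoffs_games.csv`, and this sketch assumes that column parses as dates. `days_of_rest` is a hypothetical helper that expects games from a single season, and the dates below are toy values.

```python
import pandas as pd


def days_of_rest(games: pd.DataFrame, team: str, before: pd.Timestamp) -> float:
    """Days between `before` and the team's most recent earlier game (NaN if none).

    `games` should already be limited to one season.
    """
    played = games[(games["homeTeam"] == team) | (games["awayTeam"] == team)]
    earlier = played.loc[played["gameDate"] < before, "gameDate"]
    if earlier.empty:
        return float("nan")
    return float((before - earlier.max()).days)


# Toy schedule: A's and C's last Round 1 games, then their Round 2 Game 1
games = pd.DataFrame({
    "homeTeam": ["A", "C", "A"],
    "awayTeam": ["B", "D", "C"],
    "gameDate": pd.to_datetime(["2024-04-28", "2024-05-03", "2024-05-06"]),
})
r2g1 = games.iloc[2]
home_rest = days_of_rest(games, r2g1["homeTeam"], r2g1["gameDate"])
away_rest = days_of_rest(games, r2g1["awayTeam"], r2g1["gameDate"])
print(home_rest, away_rest, home_rest - away_rest)  # positive = home team is more rested
```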

#### Build Modeling Dataset - Round 2 Games 2–7

This section builds the modeling dataset to predict **Round 2 Games 2–7** outcomes in the NBA Playoffs. It uses rolling Round 2 playoff statistics, Round 1 performance, regular season stats, rest/fatigue metrics, and evolving series context.

##### Overview of Steps:

1. **Import Data**: Read `team_statistics_regular_season.csv`, `team_statistics_playoff_games.csv`, and `playoffs_games.csv` files.
2. **Filter for Round 2 Games 2–7**: Keep only Round 2 Game 2 through Game 7 matchups.
3. **Generate Rolling Round 2 Stats**: For each team, average the statistics from earlier games in the Round 2 series (eFG%, ORB%, TOV%, FT/FGA, DRB%).
4. **Attach Series Status**: Add the number of series wins each team has before the current game (`homeWins`, `awayWins`).
5. **Attach Round 1 Performance**: Merge in aggregated team statistics from Round 1.
6. **Attach Regular Season Stats**: Merge Four Factor statistics from the regular season for both teams.
7. **Identify Home-Court Shifts**: Determine if home-court advantage has shifted mid-series and encode this change.
8. **Calculate Rest Advantage**: Compute days of rest since each team's previous Round 2 game.

In [15]:
# Import libraries
import pandas as pd

# Load Regular Season Games Team Statistics, Playoff Games Statistics, and Playoff Games Team Statistics.
# Regular season statistics can still be included but weighted less because playoff performance is more predictive.
playoff_games = pd.read_csv("../data/processed/playoffs_games.csv")
playoff_stats = pd.read_csv("../data/processed/team_statistics_playoff_games.csv")
regular_season_stats = pd.read_csv("../data/processed/team_statistics_regular_season.csv")

# Filter for 2015–2025 seasons
playoff_games = playoff_games[(playoff_games["season"] >= 2015) & (playoff_games["season"] <= 2025)]
playoff_stats = playoff_stats[(playoff_stats["season"] >= 2015) & (playoff_stats["season"] <= 2025)]

# Filter Round 2 Games 2–7
round2_g2to7 = playoff_games[(playoff_games["round"] == 2) & (playoff_games["gameNumber"] > 1)].copy()

# Compute Round 1 Average Stats
r1_games = playoff_games[playoff_games["round"] == 1][["gameId", "homeTeam", "awayTeam", "season"]].copy()
r1_home = r1_games[["gameId", "homeTeam", "season"]].rename(columns={"homeTeam": "teamName"})
r1_away = r1_games[["gameId", "awayTeam", "season"]].rename(columns={"awayTeam": "teamName"})
r1_long = pd.concat([r1_home, r1_away], ignore_index=True)

# r1_long already carries "season"; dropping playoff_stats' copy avoids season_x/season_y
r1_stats = r1_long.merge(playoff_stats.drop(columns=["season"]), on=["gameId", "teamName"], how="inner")
r1_stats["season"] = r1_stats["season"].astype(int)
r1_avg = r1_stats.groupby(["season", "teamName"])[["eFG%", "TOV%", "FT/FGA", "ORB%", "DRB%"]].mean().reset_index()

# Build rolling stats for Round 2 Games 2–7
final_rows = []

for _, row in round2_g2to7.iterrows():
    game_id = row["gameId"]
    season = row["season"]
    game_num = row["gameNumber"]
    home_team = row["homeTeam"]
    away_team = row["awayTeam"]
    round_num = row["round"]

    # Prior games between the same teams in Round 2
    prior_games = playoff_games[
        (playoff_games["season"] == season) &
        (playoff_games["round"] == 2) &
        (playoff_games["gameNumber"] < game_num) &
        (
            ((playoff_games["homeTeam"] == home_team) & (playoff_games["awayTeam"] == away_team)) |
            ((playoff_games["homeTeam"] == away_team) & (playoff_games["awayTeam"] == home_team))
        )
    ]["gameId"].tolist()

    # Stats from those games
    prior_stats = playoff_stats[
        (playoff_stats["season"] == season) &
        (playoff_stats["gameId"].isin(prior_games)) &
        (playoff_stats["teamName"].isin([home_team, away_team]))
    ]

    if prior_stats.empty:
        continue

    # Rolling stats
    rolling = prior_stats.groupby("teamName")[["eFG%", "TOV%", "FT/FGA", "ORB%", "DRB%"]].mean().reset_index()

    # Series score
    prior_stats = prior_stats.merge(playoff_games[["gameId", "homeTeam", "awayTeam", "homeWin"]], on="gameId", how="left")
    prior_stats["teamWin"] = prior_stats.apply(
        lambda x: (x["teamName"] == x["homeTeam"] and x["homeWin"] == 1) or
                  (x["teamName"] == x["awayTeam"] and x["homeWin"] == 0),
        axis=1
    )
    home_wins = prior_stats[(prior_stats["teamName"] == home_team) & (prior_stats["teamWin"])].shape[0]
    away_wins = prior_stats[(prior_stats["teamName"] == away_team) & (prior_stats["teamWin"])].shape[0]

    # R1 average
    home_r1 = r1_avg[(r1_avg["season"] == season) & (r1_avg["teamName"] == home_team)]
    away_r1 = r1_avg[(r1_avg["season"] == season) & (r1_avg["teamName"] == away_team)]

    # Regular season
    home_reg = regular_season_stats[
        (regular_season_stats["season"] == season) & (regular_season_stats["teamName"] == home_team)
    ]
    away_reg = regular_season_stats[
        (regular_season_stats["season"] == season) & (regular_season_stats["teamName"] == away_team)
    ]

    # Build final row
    row_data = {
        "gameId": game_id,
        "season": season,
        "gameNumber": game_num,
        "homeTeam": home_team,
        "awayTeam": away_team,
        "homeCourt": row["homeCourt"],
        "homeWin": row["homeWin"],
        "homeWins": home_wins,
        "awayWins": away_wins,
        "round": round_num,
    }

    for team, prefix in zip([home_team, away_team], ["home", "away"]):
        team_stats = rolling[rolling["teamName"] == team]
        if not team_stats.empty:
            for col in ["eFG%", "TOV%", "FT/FGA", "ORB%", "DRB%"]:
                row_data[f"{col}_{prefix}_r2_roll"] = team_stats[col].values[0]

        r1_stats_team = home_r1 if prefix == "home" else away_r1
        if not r1_stats_team.empty:
            for col in ["eFG%", "TOV%", "FT/FGA", "ORB%", "DRB%"]:
                row_data[f"{col}_{prefix}_r1_avg"] = r1_stats_team[col].values[0]

        reg_stats_team = home_reg if prefix == "home" else away_reg
        if not reg_stats_team.empty:
            for col in ["eFG%", "TOV%", "FT/FGA", "ORB%", "DRB%"]:
                row_data[f"{col}_{prefix}_season"] = reg_stats_team[col].values[0]

    final_rows.append(row_data)

# Export the final dataset
final_df = pd.DataFrame(final_rows)
final_df.to_csv("../data/processed/modeling_round2_game2to7.csv", index=False)
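Step 7 (home-court shifts) can be made concrete with a simple flag. Under the NBA's 2-2-1-1-1 format, the team with home-court advantage hosts Games 1, 2, 5, and 7. This sketch relies on that fact and flags whether the current host is the team that hosted Game 1. The notebook already carries a `homeCourt` column whose exact definition isn't shown here, so this flag may overlap with it. `seriesKey` and `hostHasHomeCourt` are hypothetical names.

```python
import pandas as pd


def add_home_court_flag(games: pd.DataFrame) -> pd.DataFrame:
    """Add hostHasHomeCourt = 1 if this game's host also hosted Game 1 of the series."""
    game1_hosts = games.loc[games["gameNumber"] == 1, ["seriesKey", "homeTeam"]].rename(
        columns={"homeTeam": "game1Host"}
    )
    out = games.merge(game1_hosts, on="seriesKey", how="left", validate="many_to_one")
    out["hostHasHomeCourt"] = (out["homeTeam"] == out["game1Host"]).astype(int)
    return out.drop(columns="game1Host")


# Toy seven-game series in the 2-2-1-1-1 format (A has home-court advantage)
series = pd.DataFrame({
    "seriesKey":  ["S"] * 7,
    "gameNumber": [1, 2, 3, 4, 5, 6, 7],
    "homeTeam":   ["A", "A", "B", "B", "A", "B", "A"],
})
print(add_home_court_flag(series)[["gameNumber", "homeTeam", "hostHasHomeCourt"]])
```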

#### Build Modeling Dataset - Round 3 Game 1
This section builds the modeling dataset to predict **Round 3 Game 1** outcomes in the NBA Playoffs. It uses cumulative playoff performance from Rounds 1 and 2, regular season stats, rest/fatigue metrics, and updated series context.

##### Overview of Steps:

1. **Import Data**: Read `team_statistics_regular_season.csv`, `team_statistics_playoff_games.csv`, and `playoffs_games.csv` files.  
2. **Filter for Round 3 Game 1**: Keep only Round 3 Game 1 matchups.  
3. **Aggregate Rounds 1 & 2 Playoff Stats**: For each team, calculate average performance from Rounds 1 and 2 combined (eFG%, ORB%, TOV%, FT/FGA, DRB%).  
4. **Attach Regular Season Stats**: Merge Four Factor statistics from the regular season for both home and away teams.  
5. **Attach Series Status**: Add contextual features such as `homeTeam_series_wins`, `awayTeam_series_wins`, and `series_score_diff` from previous rounds.  
6. **Calculate Rest Advantage**: Compute days of rest since each team’s most recent playoff game (end of Round 2).  
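Step 3 pools every Round 1 and Round 2 game into one average. Each game therefore counts equally, and a long first-round series outweighs a short second-round one. The sketch below uses toy numbers to contrast that pooled mean with an average of per-round means. Which weighting suits the model is a judgment call; the notebook does not prescribe one.

```python
import pandas as pd

# Toy eFG% for one team: a 4-game Round 1 and a 2-game Round 2 (illustrative values)
games = pd.DataFrame({
    "teamName": ["A"] * 6,
    "round":    [1, 1, 1, 1, 2, 2],
    "eFG%":     [0.50, 0.52, 0.48, 0.50, 0.60, 0.58],
})

# Pooled: every game weighted equally (longer series dominate)
pooled = games.groupby("teamName")["eFG%"].mean()

# Per-round: average within each round first, then average the round means
per_round = games.groupby(["teamName", "round"])["eFG%"].mean().groupby(level="teamName").mean()

print(f"pooled={pooled['A']:.3f}  per_round={per_round['A']:.3f}")
```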

In [18]:
# Import libraries
import pandas as pd

# Load Regular Season Games Team Statistics, Playoff Games Statistics, and Playoff Games Team Statistics.
playoff_games = pd.read_csv("../data/processed/playoffs_games.csv", parse_dates=["gameDate"])  # parse dates so comparisons are chronological
playoff_stats = pd.read_csv("../data/processed/team_statistics_playoff_games.csv")
regular_season_stats = pd.read_csv("../data/processed/team_statistics_regular_season.csv")

# Filter for 2015–2025 seasons
playoff_games = playoff_games[(playoff_games["season"] >= 2015) & (playoff_games["season"] <= 2025)]
playoff_stats = playoff_stats[(playoff_stats["season"] >= 2015) & (playoff_stats["season"] <= 2025)]
regular_season_stats = regular_season_stats[(regular_season_stats["season"] >= 2015) & (regular_season_stats["season"] <= 2025)]

# Filter Round 3 Game 1 matchups and collect all Round 2 games
round3_g1 = playoff_games[(playoff_games["round"] == 3) & (playoff_games["gameNumber"] == 1)].copy()
round2_long = playoff_games[(playoff_games["round"] == 2)].copy()

# Aggregate playoff stats Rounds 1 and 2
prior_playoffs = playoff_games[playoff_games["round"] < 3]
prior_game_ids = prior_playoffs["gameId"].unique()

team_playoff_stats = playoff_stats[playoff_stats["gameId"].isin(prior_game_ids)].copy()
team_playoff_avg = team_playoff_stats.groupby(["season", "teamName"])[["eFG%", "TOV%", "FT/FGA", "ORB%", "DRB%"]].mean().reset_index()

# Compute rest days and build final dataset
final_rows = []

for _, row in round3_g1.iterrows():
    game_id = row["gameId"]
    season = row["season"]
    home = row["homeTeam"]
    away = row["awayTeam"]
    date = row["gameDate"]
    round_num = row["round"]
    game_num = row["gameNumber"]

    def get_last_game(team):
        games = playoff_games[(playoff_games["season"] == season) & 
                              ((playoff_games["homeTeam"] == team) | (playoff_games["awayTeam"] == team)) &
                              (playoff_games["gameDate"] < date)]
        if games.empty:
            return None
        return pd.to_datetime(games["gameDate"].max())

    home_last = get_last_game(home)
    away_last = get_last_game(away)

    # Round 2 series length: number of games each team played in Round 2
    home_games = round2_long[(round2_long["season"] == season) & 
                              ((round2_long["homeTeam"] == home) | (round2_long["awayTeam"] == home))]
    away_games = round2_long[(round2_long["season"] == season) & 
                              ((round2_long["homeTeam"] == away) | (round2_long["awayTeam"] == away))]
    home_series_len = home_games["gameId"].nunique()
    away_series_len = away_games["gameId"].nunique()

    # Rest days since each team's last playoff game (end of Round 2); positive values favor the home team
    home_rest = (pd.to_datetime(date) - home_last).days if home_last is not None else None
    away_rest = (pd.to_datetime(date) - away_last).days if away_last is not None else None
    rest_advantage = home_rest - away_rest if (home_rest is not None and away_rest is not None) else None

    # Pull team stats
    home_playoff = team_playoff_avg[(team_playoff_avg["season"] == season) & (team_playoff_avg["teamName"] == home)]
    away_playoff = team_playoff_avg[(team_playoff_avg["season"] == season) & (team_playoff_avg["teamName"] == away)]

    home_reg = regular_season_stats[(regular_season_stats["season"] == season) & (regular_season_stats["teamName"] == home)]
    away_reg = regular_season_stats[(regular_season_stats["season"] == season) & (regular_season_stats["teamName"] == away)]

    row_data = {
        "gameId": game_id,
        "season": season,
        "homeTeam": home,
        "awayTeam": away,
        "homeCourt": row["homeCourt"],
        "homeWin": row["homeWin"],
        "restAdvantage": rest_advantage,
        "homeSeriesLength": home_series_len,
        "awaySeriesLength": away_series_len,
        "round": round_num,
        "gameNumber": game_num,
    }

    for team, prefix, playoff_df, reg_df in zip(
        [home, away],
        ["home", "away"],
        [home_playoff, away_playoff],
        [home_reg, away_reg]
    ):
        if not playoff_df.empty:
            for col in ["eFG%", "TOV%", "FT/FGA", "ORB%", "DRB%"]:
                row_data[f"{col}_{prefix}_r1r2_avg"] = playoff_df[col].values[0]

        if not reg_df.empty:
            for col in ["eFG%", "TOV%", "FT/FGA", "ORB%", "DRB%"]:
                row_data[f"{col}_{prefix}_season"] = reg_df[col].values[0]

    final_rows.append(row_data)

# Export the final dataset
final_df = pd.DataFrame(final_rows)
final_df.to_csv("../data/processed/modeling_round3_game1.csv", index=False)
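The series-length lookup in the cell above rescans `round2_long` once for every target game. A grouped count can compute all lengths in one pass and then merge them in. Called with `prior_round=3`, the same helper would cover the Round 4 Game 1 features. This is a sketch, and `series_length` and `attach_series_length` are hypothetical names. It assumes each row of `playoffs_games.csv` is one game with `season`, `round`, `gameId`, `homeTeam` and `awayTeam` columns. The toy bracket is made up.

```python
import pandas as pd


def series_length(games: pd.DataFrame, round_num: int) -> pd.DataFrame:
    """Games each team played in round `round_num`, one row per (season, team)."""
    r = games[games["round"] == round_num]
    # Stack home and away appearances so every team-game becomes one row
    appearances = pd.concat(
        [
            r[["season", "gameId", "homeTeam"]].rename(columns={"homeTeam": "team"}),
            r[["season", "gameId", "awayTeam"]].rename(columns={"awayTeam": "team"}),
        ],
        ignore_index=True,
    )
    return (
        appearances.groupby(["season", "team"])["gameId"].nunique()
        .rename("seriesLength")
        .reset_index()
    )


def attach_series_length(targets: pd.DataFrame, games: pd.DataFrame, prior_round: int) -> pd.DataFrame:
    """Add homeSeriesLength / awaySeriesLength taken from the previous round."""
    lengths = series_length(games, prior_round)
    for side in ("home", "away"):
        side_len = lengths.rename(
            columns={"team": f"{side}Team", "seriesLength": f"{side}SeriesLength"}
        )
        targets = targets.merge(side_len, on=["season", f"{side}Team"], how="left")
    return targets


# Toy bracket (made up): A swept C in 4 games, B beat D in 6, then A hosts B in Round 3
rows = [(2, "A", "C")] * 4 + [(2, "B", "D")] * 6 + [(3, "A", "B")]
games = pd.DataFrame(rows, columns=["round", "homeTeam", "awayTeam"])
games["season"] = 2020
games["gameId"] = range(len(games))

print(attach_series_length(games[games["round"] == 3], games, prior_round=2))
```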

#### Build Modeling Dataset – Round 3 Games 2–7

This section builds the modeling dataset to predict **Round 3 Games 2–7** outcomes in the NBA Playoffs. It uses rolling averages of Round 3 performance from earlier games in the same series, plus the current series score.

##### Overview of Steps:

1. **Import Data**: Read `team_statistics_playoff_games.csv` and `playoffs_games.csv` files.
2. **Filter for Round 3 Games 2–7**: Keep only Round 3 matchups where `gameNumber > 1` (seasons 2015–2025).
3. **Collect Prior Series Games**: For each target game, identify earlier Round 3 games between the same teams in the same season.
4. **Compute Rolling Averages**: From prior games, calculate mean Four Factors for each team:
   `eFG%`, `TOV%`, `FT/FGA`, `ORB%`, `DRB%`  
   → produce `*_home_r3_roll` and `*_away_r3_roll`.
5. **Attach Series Status**: From prior games, count each team's series wins so far (`homeWins`, `awayWins`), crediting road wins to the visiting team.
6. **Assemble Final Dataset**: Keep identifiers and labels (`gameId`, `season`, `round`, `gameNumber`, `homeTeam`, `awayTeam`, `homeCourt`, `homeWin`) plus rolling features and series status, then save to `modeling_round3_games2to7.csv`.
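Step 5 counts series wins by team, not by venue, so a road win belongs to the visitor. A small helper makes that rule explicit. `series_score` is a hypothetical name. The sketch assumes `homeWin` is stored as 1/0 or True/False, as in `playoffs_games.csv`.

```python
import pandas as pd


def series_score(prior_games: pd.DataFrame, home: str, away: str):
    """Return (home wins, away wins) in earlier games between these two teams."""
    # Keep only games between these two teams (other series in the same round are dropped)
    in_series = prior_games[
        prior_games["homeTeam"].isin([home, away]) & prior_games["awayTeam"].isin([home, away])
    ]
    home_won = in_series["homeWin"].astype(bool)
    # The winner is the host when homeWin is true, otherwise the visitor
    winners = in_series["homeTeam"].where(home_won, in_series["awayTeam"])
    return int((winners == home).sum()), int((winners == away).sum())


# Toy Round 3 history before Game 4 of X vs Y (made-up results); Z vs W is the other series
prior = pd.DataFrame({
    "gameNumber": [1, 2, 3, 1],
    "homeTeam":   ["X", "X", "Y", "Z"],
    "awayTeam":   ["Y", "Y", "X", "W"],
    "homeWin":    [1, 0, 1, 1],
})
print(series_score(prior, home="Y", away="X"))  # Game 4 is hosted by Y -> (2, 1)
```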


In [21]:
# Import libraries
import pandas as pd

# Load Playoff Games Statistics and Playoff Games Team Statistics.
playoff_games = pd.read_csv("../data/processed/playoffs_games.csv")
playoff_stats = pd.read_csv("../data/processed/team_statistics_playoff_games.csv")

# Filter Round 3 Games 2 to 7
playoff_games = playoff_games[(playoff_games["season"] >= 2015) & (playoff_games["season"] <= 2025)]
playoff_stats = playoff_stats[(playoff_stats["season"] >= 2015) & (playoff_stats["season"] <= 2025)]

round3 = playoff_games[(playoff_games["round"] == 3) & (playoff_games["gameNumber"] > 1)].copy()

# Build rolling feature dataset
final_rows = []

for _, row in round3.iterrows():
    gid = row["gameId"]
    season = row["season"]
    round_num = row["round"]
    gnum = row["gameNumber"]
    date = row["gameDate"]
    home = row["homeTeam"]
    away = row["awayTeam"]
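
    # Earlier games in this Round 3 series: same season, same two teams, lower game number
    prior_games = playoff_games[(playoff_games["season"] == season) &
                                (playoff_games["round"] == round_num) &
                                (playoff_games["gameNumber"] < gnum) &
                                (playoff_games["homeTeam"].isin([home, away])) &
                                (playoff_games["awayTeam"].isin([home, away]))].copy()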

    # Filter Playoffs Games Team Statistics for prior games
    prior_game_ids = prior_games["gameId"].unique()
    prior_stats = playoff_stats[playoff_stats["gameId"].isin(prior_game_ids)].copy()

    # Filter to each team's stats
    home_stats = prior_stats[prior_stats["teamName"] == home]
    away_stats = prior_stats[prior_stats["teamName"] == away]

    # Skip if missing prior stats
    if home_stats.empty or away_stats.empty:
        continue

    home_avg = home_stats[["eFG%", "TOV%", "FT/FGA", "ORB%", "DRB%"]].mean()
    away_avg = away_stats[["eFG%", "TOV%", "FT/FGA", "ORB%", "DRB%"]].mean()

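    # Series score before this game: each team's wins, whether earned at home or on the road
    home_won = prior_games["homeWin"].astype(bool)
    prior_home_wins = int((((prior_games["homeTeam"] == home) & home_won) |
                           ((prior_games["homeTeam"] == away) & ~home_won)).sum())
    prior_away_wins = int((((prior_games["homeTeam"] == away) & home_won) |
                           ((prior_games["homeTeam"] == home) & ~home_won)).sum())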

    row_data = {
        "gameId": gid,
        "season": season,
        "gameNumber": gnum,
        "homeTeam": home,
        "awayTeam": away,
        "homeCourt": row["homeCourt"],
        "homeWin": row["homeWin"],
        "homeWins": prior_home_wins,
        "awayWins": prior_away_wins,
        "round": round_num
    }

    for col in ["eFG%", "TOV%", "FT/FGA", "ORB%", "DRB%"]:
        row_data[f"{col}_home_r3_roll"] = home_avg[col]
        row_data[f"{col}_away_r3_roll"] = away_avg[col]

    final_rows.append(row_data)

# Export the final dataset
final_df = pd.DataFrame(final_rows)
final_df.to_csv("../data/processed/modeling_round3_games2to7.csv", index=False)
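The loop above refilters `playoff_stats` once per target game. A grouped running sum should give the same prior-game means for every team-game in a single pass. The result would then be joined to each target game once for the home team and once for the away team. This is a sketch only, and `prior_game_means` is a hypothetical helper. It assumes one stats row per (`gameId`, `teamName`) carrying a `season` column, which the notebook's own groupby on `playoff_stats` implies. Because a team plays only one series per round, grouping by (season, round, team) is the same as grouping by series.

```python
import pandas as pd


def prior_game_means(stats: pd.DataFrame, games: pd.DataFrame, cols: list) -> pd.DataFrame:
    """Mean of `cols` over a team's earlier games in the same season and round."""
    df = stats.merge(games[["gameId", "round", "gameNumber"]], on="gameId")
    df = df.sort_values(["season", "round", "teamName", "gameNumber"])
    keys = ["season", "round", "teamName"]

    # Running sum minus the current game leaves only earlier games (no leakage)
    prior_sum = df.groupby(keys)[cols].cumsum() - df[cols]
    prior_n = df.groupby(keys).cumcount()  # number of earlier games in the group
    roll = prior_sum.div(prior_n, axis=0).add_suffix("_roll")  # 0/0 gives NaN for Game 1
    return df.join(roll)


# Toy data (made up): team A's Round 2 finale, then Games 1-3 of Round 3
games = pd.DataFrame({
    "gameId":     [1, 10, 11, 12],
    "round":      [2, 3, 3, 3],
    "gameNumber": [6, 1, 2, 3],
})
stats = pd.DataFrame({
    "gameId":   [1, 10, 11, 12],
    "season":   [2020] * 4,
    "teamName": ["A"] * 4,
    "eFG%":     [0.90, 0.50, 0.75, 0.25],
})
out = prior_game_means(stats, games, ["eFG%"])
print(out[["gameId", "round", "gameNumber", "eFG%", "eFG%_roll"]])
```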

#### Build Modeling Dataset – Round 4 Game 1

This section builds the modeling dataset to predict **Round 4 Game 1** outcomes in the NBA Playoffs (NBA Finals). It uses cumulative playoff statistics from Rounds 1–3, regular season performance, rest advantage, Round 3 series length, and home-court information.

##### Overview of Steps:

1. **Import Data** – Read `team_statistics_regular_season.csv`, `team_statistics_playoff_games.csv`, and `playoffs_games.csv` files.
2. **Filter for Round 4 Game 1** – Keep only NBA Finals Game 1 matchups.
3. **Aggregate Cumulative Playoff Stats** – For each team, average the Four Factors (`eFG%`, `TOV%`, `FT/FGA`, `ORB%`, `DRB%`) across all prior playoff games (Rounds 1–3).
4. **Attach Regular Season Stats** – Merge Four Factor statistics from the regular season for both teams.
5. **Include Home-Court and Series Length** – Keep the `homeCourt` indicator and count the games each team played in Round 3 (`homeSeriesLength`, `awaySeriesLength`).
6. **Calculate Rest Advantage** – Determine days of rest for each team since their last playoff game before the Finals.
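Step 4 describes a merge, while the cell below looks stats up row by row. A pair of left merges, one per side, is an equivalent way to do it. The sketch keeps the notebook's `{stat}_{side}_season` column pattern. `attach_team_stats` and the toy values are hypothetical. There is one difference in behavior: a left merge keeps games with missing stats as NaN, whereas the loop skips them. `validate="many_to_one"` is a standard `DataFrame.merge` option, and here it guards against duplicate (season, team) rows.

```python
import pandas as pd

FOUR_FACTORS = ["eFG%", "TOV%", "FT/FGA", "ORB%", "DRB%"]


def attach_team_stats(games: pd.DataFrame, team_stats: pd.DataFrame, suffix: str) -> pd.DataFrame:
    """Left-merge per-(season, teamName) stats onto both sides of each game."""
    out = games
    for side in ("home", "away"):
        renames = {"teamName": f"{side}Team"}
        renames.update({c: f"{c}_{side}_{suffix}" for c in FOUR_FACTORS})
        side_stats = team_stats[["season", "teamName"] + FOUR_FACTORS].rename(columns=renames)
        # validate= raises if a team has more than one stats row in a season
        out = out.merge(side_stats, on=["season", f"{side}Team"], how="left",
                        validate="many_to_one")
    return out


# Toy regular-season table and one Finals Game 1 (values are made up)
season_stats = pd.DataFrame({
    "season":   [2020, 2020],
    "teamName": ["X", "Y"],
    "eFG%":     [0.55, 0.52],
    "TOV%":     [12.0, 13.5],
    "FT/FGA":   [0.20, 0.22],
    "ORB%":     [25.0, 23.0],
    "DRB%":     [77.0, 79.0],
})
finals_g1 = pd.DataFrame({"gameId": [1], "season": [2020], "homeTeam": ["X"], "awayTeam": ["Y"]})
wide = attach_team_stats(finals_g1, season_stats, suffix="season")
print(wide.filter(like="eFG%"))
```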

In [24]:
# Import libraries
import pandas as pd

# Load Regular Season Games Team Statistics, Playoff Games Statistics, and Playoff Games Team Statistics.
playoff_games = pd.read_csv("../data/processed/playoffs_games.csv")
playoff_stats = pd.read_csv("../data/processed/team_statistics_playoff_games.csv")
regular_season_stats = pd.read_csv("../data/processed/team_statistics_regular_season.csv")

# Filter Round 4, Game 1
playoff_games = playoff_games[(playoff_games["season"] >= 2015) & (playoff_games["season"] <= 2025)]
playoff_stats = playoff_stats[(playoff_stats["season"] >= 2015) & (playoff_stats["season"] <= 2025)]
regular_season_stats = regular_season_stats[(regular_season_stats["season"] >= 2015) & (regular_season_stats["season"] <= 2025)]

round4_g1 = playoff_games[(playoff_games["round"] == 4) & (playoff_games["gameNumber"] == 1)].copy()

# Build feature dataset
final_rows = []

for _, row in round4_g1.iterrows():
    gid = row["gameId"]
    season = row["season"]
    date = row["gameDate"]
    home = row["homeTeam"]
    away = row["awayTeam"]
    round_num = row["round"]
    game_num = row["gameNumber"]


    # All games prior to Round 4
    prior_games = playoff_games[(playoff_games["season"] == season) & (playoff_games["round"] < 4)].copy()
    prior_game_ids = prior_games["gameId"].unique()
    prior_stats = playoff_stats[playoff_stats["gameId"].isin(prior_game_ids)].copy()

    home_stats = prior_stats[prior_stats["teamName"] == home]
    away_stats = prior_stats[prior_stats["teamName"] == away]

    # Skip if missing prior playoff stats
    if home_stats.empty or away_stats.empty:
        continue

    home_avg = home_stats[["eFG%", "TOV%", "FT/FGA", "ORB%", "DRB%"]].mean()
    away_avg = away_stats[["eFG%", "TOV%", "FT/FGA", "ORB%", "DRB%"]].mean()

    # Regular season stats
    home_reg = regular_season_stats[(regular_season_stats["season"] == season) & (regular_season_stats["teamName"] == home)]
    away_reg = regular_season_stats[(regular_season_stats["season"] == season) & (regular_season_stats["teamName"] == away)]

    # Skip if missing regular season
    if home_reg.empty or away_reg.empty:
        continue

    home_reg_vals = home_reg[["eFG%", "TOV%", "FT/FGA", "ORB%", "DRB%"]].values[0]
    away_reg_vals = away_reg[["eFG%", "TOV%", "FT/FGA", "ORB%", "DRB%"]].values[0]

    # Rest advantage (home - away): days since each team's previous game this postseason
    def days_since_last_game(team):
        team_games = playoff_games[(playoff_games["season"] == season) &
                                   ((playoff_games["homeTeam"] == team) | (playoff_games["awayTeam"] == team)) &
                                   (playoff_games["gameDate"] < date)]
        return (pd.to_datetime(date) - pd.to_datetime(team_games["gameDate"].max())).days

    home_rest = days_since_last_game(home)
    away_rest = days_since_last_game(away)
    rest_adv = home_rest - away_rest

    # Compute Round 3 series lengths (number of games each team played in Round 3)
    round3_games = playoff_games[
        (playoff_games["season"] == season) &
        (playoff_games["round"] == 3) &
        (
            (playoff_games["homeTeam"] == home) | (playoff_games["awayTeam"] == home) |
            (playoff_games["homeTeam"] == away) | (playoff_games["awayTeam"] == away)
        )
    ]

    home_series_len = round3_games[
        (round3_games["homeTeam"] == home) | (round3_games["awayTeam"] == home)
    ]["gameId"].nunique()

    away_series_len = round3_games[
        (round3_games["homeTeam"] == away) | (round3_games["awayTeam"] == away)
    ]["gameId"].nunique()


    row_data = {
        "gameId": gid,
        "season": season,
        "gameNumber": game_num,
        "homeTeam": home,
        "awayTeam": away,
        "homeCourt": row["homeCourt"],
        "homeWin": row["homeWin"],
        "restAdvantage": rest_adv,
        "homeSeriesLength": home_series_len,
        "awaySeriesLength": away_series_len,
        "round": round_num,
    }

    for i, col in enumerate(["eFG%", "TOV%", "FT/FGA", "ORB%", "DRB%"]):
        row_data[f"{col}_home_r1r2r3_avg"] = home_avg[col]
        row_data[f"{col}_away_r1r2r3_avg"] = away_avg[col]
        row_data[f"{col}_home_season"] = home_reg_vals[i]
        row_data[f"{col}_away_season"] = away_reg_vals[i]

    final_rows.append(row_data)

# Export the final dataset
final_df = pd.DataFrame(final_rows)
final_df.to_csv("../data/processed/modeling_round4_game1.csv", index=False)
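The Round 3 Game 1 cell computes its playoff averages once, with a single groupby before the loop. This cell instead refilters `playoff_stats` inside the loop. A helper parameterized by the round being predicted could serve both cells: rounds < 3 for Round 3 Game 1 and rounds < 4 here. This is a sketch, and `prior_round_means` is a hypothetical name. It assumes `gameId` is unique across seasons and that `playoff_stats` carries `season` and `teamName`, as the notebook's existing groupby implies.

```python
import pandas as pd


def prior_round_means(games: pd.DataFrame, stats: pd.DataFrame, before_round: int, cols: list) -> pd.DataFrame:
    """Per-(season, teamName) means of `cols` over all playoff games in rounds < before_round."""
    prior_ids = games.loc[games["round"] < before_round, "gameId"]
    prior = stats[stats["gameId"].isin(prior_ids)]  # excludes the round being predicted
    return prior.groupby(["season", "teamName"])[cols].mean().reset_index()


# Toy data (made up): one team's games in Rounds 1-4
games = pd.DataFrame({"gameId": [1, 2, 3, 4], "round": [1, 2, 3, 4]})
stats = pd.DataFrame({
    "gameId":   [1, 2, 3, 4],
    "season":   [2020] * 4,
    "teamName": ["X"] * 4,
    "eFG%":     [0.50, 0.25, 0.75, 1.00],
})
print(prior_round_means(games, stats, before_round=4, cols=["eFG%"]))
```

The result can then be merged onto the Finals Game 1 rows for each side, with columns renamed to the `_r1r2r3_avg` pattern used above.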

#### Build Modeling Dataset - Round 4 Games 2–7

This section builds the modeling dataset to predict **Round 4 Games 2–7** outcomes (NBA Finals).  
It uses rolling Round 4 playoff performance and current series score context.

##### Overview of Steps:

1. **Import Data**: Read `team_statistics_playoff_games.csv` and `playoffs_games.csv`.
2. **Filter for Round 4 Games 2–7**: Keep seasons 2015–2025 and subset to Round 4 with `gameNumber > 1`.
3. **Generate Rolling Round 4 Stats**: For each matchup, compute team rolling averages from **prior games in the same Round 4 series**  
   (e.g., `eFG%`, `TOV%`, `FT/FGA`, `ORB%`, `DRB%`). Suffix features as `_home_r4_roll` and `_away_r4_roll`.
4. **Track Series Status**: Add series context before the current game: `homeWins` and `awayWins` (wins to-date for each side).
5. **Finalize Columns & Export**: Keep identifiers (`gameId`, `season`, `homeTeam`, `awayTeam`, `homeCourt`, `homeWin`, `round`, `gameNumber`),  
   series status, and rolling features. Save to `modeling_round4_games2to7.csv`.
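Series status can also be derived for every game in one pass. The sketch turns each game into two team rows and keeps a running win count per team within a round. It then subtracts the current result so that only earlier games count. It assumes `homeWin` is 1/0 or True/False and that `gameNumber` orders games within a series. Because a team plays one series per round, grouping by (season, round, team) identifies the series. `pregame_series_score` is a hypothetical name, and the results below are made up.

```python
import pandas as pd


def pregame_series_score(games: pd.DataFrame) -> pd.DataFrame:
    """Add homeWins/awayWins: each team's series wins before every game."""
    g = games.copy()
    home_won = g["homeWin"].astype(bool)
    g["winner"] = g["homeTeam"].where(home_won, g["awayTeam"])

    # One row per (game, participating team)
    keep = ["gameId", "season", "round", "gameNumber", "winner"]
    long = pd.concat(
        [
            g[keep + ["homeTeam"]].rename(columns={"homeTeam": "team"}),
            g[keep + ["awayTeam"]].rename(columns={"awayTeam": "team"}),
        ],
        ignore_index=True,
    ).sort_values(["season", "round", "team", "gameNumber"])
    long["win"] = (long["team"] == long["winner"]).astype(int)

    # Cumulative wins through this game, minus this game's result = wins before tip-off
    long["winsBefore"] = long.groupby(["season", "round", "team"])["win"].cumsum() - long["win"]

    wins = long[["gameId", "team", "winsBefore"]]
    for side in ("home", "away"):
        g = g.merge(
            wins.rename(columns={"team": f"{side}Team", "winsBefore": f"{side}Wins"}),
            on=["gameId", f"{side}Team"],
            how="left",
        )
    return g.drop(columns="winner")


# Toy Finals, first five games of a 2-2-1-1-1 format (hosts X, X, Y, Y, X)
finals = pd.DataFrame({
    "gameId":     [401, 402, 403, 404, 405],
    "season":     [2020] * 5,
    "round":      [4] * 5,
    "gameNumber": [1, 2, 3, 4, 5],
    "homeTeam":   ["X", "X", "Y", "Y", "X"],
    "awayTeam":   ["Y", "Y", "X", "X", "Y"],
    "homeWin":    [1, 0, 1, 0, 1],
})
print(pregame_series_score(finals)[["gameNumber", "homeTeam", "homeWins", "awayWins"]])
```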


In [27]:
# Import libraries
import pandas as pd

# Load Playoff Games Statistics and Playoff Games Team Statistics.
playoff_games = pd.read_csv("../data/processed/playoffs_games.csv")
playoff_stats = pd.read_csv("../data/processed/team_statistics_playoff_games.csv")

# Filter for Round 4 Games 2 to 7
playoff_games = playoff_games[(playoff_games["season"] >= 2015) & (playoff_games["season"] <= 2025)]
playoff_stats = playoff_stats[(playoff_stats["season"] >= 2015) & (playoff_stats["season"] <= 2025)]
round4_games = playoff_games[(playoff_games["round"] == 4) & (playoff_games["gameNumber"] > 1)].copy()

# Build rolling feature dataset
final_rows = []

for _, row in round4_games.iterrows():
    gid = row["gameId"]
    season = row["season"]
    date = row["gameDate"]
    round_num = row["round"]
    game_num = row["gameNumber"]
    home = row["homeTeam"]
    away = row["awayTeam"]

    # Get all prior games in Round 4 before this game
    prior_games = playoff_games[(playoff_games["season"] == season) & (playoff_games["round"] == round_num) & (playoff_games["gameNumber"] < game_num)]
    prior_game_ids = prior_games["gameId"].tolist()

    prior_ff = playoff_stats[playoff_stats["gameId"].isin(prior_game_ids)]

    home_ff = prior_ff[prior_ff["teamName"] == home]
    away_ff = prior_ff[prior_ff["teamName"] == away]

    if home_ff.empty or away_ff.empty:
        continue

    home_avg = home_ff[["eFG%", "TOV%", "FT/FGA", "ORB%", "DRB%"]].mean()
    away_avg = away_ff[["eFG%", "TOV%", "FT/FGA", "ORB%", "DRB%"]].mean()

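    # Series score before this game: each team's wins, whether earned at home or on the road
    home_won = prior_games["homeWin"].astype(bool)
    prior_home_wins = int((((prior_games["homeTeam"] == home) & home_won) |
                           ((prior_games["homeTeam"] == away) & ~home_won)).sum())
    prior_away_wins = int((((prior_games["homeTeam"] == away) & home_won) |
                           ((prior_games["homeTeam"] == home) & ~home_won)).sum())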

    row_data = {
        "gameId": gid,
        "season": season,
        "gameNumber": game_num,
        "homeTeam": home,
        "awayTeam": away,
        "homeCourt": row["homeCourt"],
        "homeWin": row["homeWin"],
        "homeWins": prior_home_wins,
        "awayWins": prior_away_wins,
        "round": round_num
    }

    for col in ["eFG%", "TOV%", "FT/FGA", "ORB%", "DRB%"]:
        row_data[f"{col}_home_r4_roll"] = home_avg[col]
        row_data[f"{col}_away_r4_roll"] = away_avg[col]

    final_rows.append(row_data)

# Export the final dataset
final_df = pd.DataFrame(final_rows)
final_df.to_csv("../data/processed/modeling_round4_games2to7.csv", index=False)
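Every best-of-seven series starts 0–0 and has no ties. So in a Games 2–7 dataset, `homeWins + awayWins` should equal `gameNumber - 1` on every row, and neither side can already have four wins. A quick check on the exported frame, for example `final_df` from either Games 2–7 cell, would catch mis-attributed wins or games missing from `playoffs_games.csv`. `inconsistent_series_rows` is a hypothetical helper, and the sample rows are made up.

```python
import pandas as pd


def inconsistent_series_rows(df: pd.DataFrame) -> pd.DataFrame:
    """Rows whose pre-game series score cannot be right for a best-of-seven series."""
    played = df["homeWins"] + df["awayWins"]
    too_many = df[["homeWins", "awayWins"]].max(axis=1) > 3  # a 4th win would have ended the series
    return df[(played != df["gameNumber"] - 1) | too_many]


# Made-up rows: the second one accounts for only three prior games before Game 5
sample = pd.DataFrame({
    "gameId":     [1, 2, 3],
    "gameNumber": [2, 5, 7],
    "homeWins":   [1, 3, 3],
    "awayWins":   [0, 0, 3],
})
print(inconsistent_series_rows(sample))
```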