## 🔧 Feature Engineering for NBA Awards Prediction

This notebook focuses on transforming raw and advanced player statistics
into interpretable, season-aware features suitable for award prediction.

Given that NBA awards are comparative by nature, most features are:
- normalized within each season,
- expressed as ranks or relative scores,
- designed to capture both volume and impact.
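
As a minimal sketch of what "normalized within each season" means in practice, the toy example below computes a per-season z-score and rank for points. The column names `season` and `PTS` mirror the ones used later in this notebook; the `toy` DataFrame, the `player` column, and all values are invented for illustration only.

```python
import pandas as pd

# Toy data: hypothetical players and values, not taken from the real dataset.
toy = pd.DataFrame({
    "season": [2020, 2020, 2020, 2021, 2021, 2021],
    "player": ["A", "B", "C", "D", "E", "F"],
    "PTS":    [10.0, 20.0, 30.0, 20.0, 25.0, 30.0],
})

grouped = toy.groupby("season")["PTS"]

# Z-score within each season (pandas' default sample std, ddof=1).
toy["PTS_z"] = (toy["PTS"] - grouped.transform("mean")) / grouped.transform("std")

# Rank within each season: 1 = highest value in that season.
toy["PTS_rank"] = grouped.rank(ascending=False, method="min")

print(toy)
```

Players B and D have the same raw `PTS`, yet B sits at the 2020 season average while D is below the 2021 average. That difference in relative standing is what season-aware features are meant to expose.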

No model is trained in this notebook.


In [6]:
import pandas as pd
import numpy as np

from pathlib import Path

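# Find the project root (nba-awards-predictor) by walking up until a "data" directory appears
PROJECT_ROOT = Path().resolve()
while not (PROJECT_ROOT / "data").exists():
    # Stop at the filesystem root instead of looping forever
    if PROJECT_ROOT == PROJECT_ROOT.parent:
        raise FileNotFoundError("No 'data' directory found in any parent folder")
    PROJECT_ROOT = PROJECT_ROOT.parent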

DATA_PATH = PROJECT_ROOT / "data/processed/players/final/all_years_enriched.parquet"
assert DATA_PATH.exists(), f"File not found: {DATA_PATH}"

df = pd.read_parquet(DATA_PATH)

In [None]:
def zscore_by_season(df, col, season_col="season"):
    """Standardize `col` within each season (sample std, ddof=1)."""
    return df.groupby(season_col)[col].transform(
        lambda x: (x - x.mean()) / x.std()
    )

for col in ["PTS", "adv_BPM", "adv_VORP", "adv_WS"]:
    df[f"{col}_z"] = zscore_by_season(df, col)
    
print("Z-scores computed for PTS, adv_BPM, adv_VORP, adv_WS.")


Z-scores computed for PTS, adv_BPM, adv_VORP, adv_WS.


0   -0.122232
1    0.394422
2   -0.155565
3    0.377756
4   -0.155565
Name: PTS_z, dtype: float64

In [8]:
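def rank_by_season(df, col, asc=False):
    # Rank within each season; with asc=False, rank 1 is the season's highest value
    return df.groupby("season")[col].rank(ascending=asc, method="min")

df["PTS_rank"]      = rank_by_season(df, "PTS", asc=False)
df["BPM_rank"]      = rank_by_season(df, "adv_BPM", asc=False)
df["VORP_rank"]     = rank_by_season(df, "adv_VORP", asc=False)
df["WS_rank"]       = rank_by_season(df, "adv_WS", asc=False)
df["DWS_rank"]      = rank_by_season(df, "adv_DWS", asc=False)
# Needed by DPOY_score below; assumes raw "BLK" and "STL" columns exist in df
df["BLK_rank"]      = rank_by_season(df, "BLK", asc=False)
df["STL_rank"]      = rank_by_season(df, "STL", asc=False)


In [9]:
# Negated rank sum: a higher MVP_score means better combined ranks
df["MVP_score"] = (
    -df["BPM_rank"]
    -df["VORP_rank"]
    -df["PTS_rank"]
    -df["WS_rank"]
)
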

In [None]:
df["DPOY_score"] = (
    -df["DWS_rank"]
    -df["BLK_rank"]
    -df["STL_rank"]
)


In [None]:
# Treat zero minutes as missing so the ratios below yield NaN instead of inf
mp = df["MP"].replace(0, np.nan)
df["PTS_per_36"] = df["PTS"] / mp * 36
df["Impact_per_min"] = df["adv_BPM"] / mp
df["Usage_x_Minutes"] = df["adv_USG%"] * df["MP"]


In [None]:
df["rookie_volume"] = df["PTS"] * df["is_rookie"]
df["rookie_impact"] = df["adv_BPM"] * df["is_rookie"]


In [None]:
df["is_mvp_winner"]  = (df["mvp_rank"] == 1).astype(int)
df["is_dpoy_winner"] = (df["dpoy_rank"] == 1).astype(int)
df["is_roy_winner"]  = (df["roy_rank"] == 1).astype(int)
