# CS 171 Final Project — Awards Data Pre-processing Notebook

This notebook uses only per-game player statistics from the 2025–26 NBA
season to build a modeling dataset for:

- MVP / DPOY / Sixth Man style awards
- All-NBA teams (1st, 2nd, 3rd)
- All-Defensive teams
- All-Star selections


Input:

- `per_game_player_25_26_with_teams.csv` (one row per player per team)

Output:

- `players_25_26_awards_clean.csv` (rotation players, ready for modeling)


In [1]:
import pandas as pd
import numpy as np

pd.set_option("display.max_columns", 100)

DATA_DIR = "."
PLAYER_PATH = f"{DATA_DIR}/per_game_player_25_26_with_teams.csv"

players_raw = pd.read_csv(PLAYER_PATH)

print("Raw shape:", players_raw.shape)
players_raw.head()


Raw shape: (487, 32)


Unnamed: 0,Rk,Player,Age,Pos,G,GS,MP,FG,FGA,FG%,3P,3PA,3P%,2P,2PA,2P%,eFG%,FT,FTA,FT%,ORB,DRB,TRB,AST,STL,BLK,TOV,PF,PTS,Awards,Player-additional,Tm
0,1.0,Jalen Johnson,24.0,SF,20,20,34.9,8.4,15.5,0.539,1.7,4.0,0.413,6.7,11.5,0.583,0.592,4.9,6.2,0.789,1.3,8.8,10.0,7.3,1.6,0.4,3.3,2.1,23.2,,johnsja05,ATL
1,2.0,Dyson Daniels,22.0,SG,22,22,33.8,4.6,9.5,0.488,0.2,1.4,0.161,4.4,8.1,0.545,0.5,0.8,1.4,0.548,2.2,4.3,6.5,6.0,2.3,0.4,2.3,2.5,10.3,,daniedy01,ATL
2,3.0,Nickeil Alexander-Walker,27.0,SG,20,17,32.8,7.0,14.9,0.466,2.8,7.2,0.392,4.2,7.8,0.535,0.56,3.7,4.3,0.859,0.7,2.7,3.4,3.6,1.0,0.9,2.3,2.1,20.4,,alexani01,ATL
3,4.0,Onyeka Okongwu,25.0,C,21,10,30.5,6.2,12.0,0.514,1.9,5.1,0.37,4.3,6.9,0.621,0.593,2.1,2.8,0.759,1.6,5.7,7.3,2.9,0.9,1.1,1.9,3.6,16.4,,okongon01,ATL
4,5.0,Trae Young,27.0,PG,5,5,27.8,5.2,14.0,0.371,1.0,5.2,0.192,4.2,8.8,0.477,0.407,6.4,7.8,0.821,0.0,2.0,2.0,7.8,0.8,0.2,2.0,1.8,17.8,,youngtr01,ATL


## Remove Non-player Rows and Missing Teams

The per-game file also contains:

- "Team Totals" summary rows
- Some rows without a team (`Tm` is NaN)

Neither kind of row represents a single player on a known team, so we drop both.


In [8]:
players = players_raw.copy()

# Drop the "Team Totals" summary rows.
players = players[players["Player"] != "Team Totals"]

# Drop rows with no team assigned.
players = players[players["Tm"].notna()].copy()

print("After removing non-players / NaN teams:", players.shape)
players.head()


After removing non-players / NaN teams: (447, 32)


Unnamed: 0,Rk,Player,Age,Pos,G,GS,MP,FG,FGA,FG%,3P,3PA,3P%,2P,2PA,2P%,eFG%,FT,FTA,FT%,ORB,DRB,TRB,AST,STL,BLK,TOV,PF,PTS,Awards,Player-additional,Tm
0,1.0,Jalen Johnson,24.0,SF,20,20,34.9,8.4,15.5,0.539,1.7,4.0,0.413,6.7,11.5,0.583,0.592,4.9,6.2,0.789,1.3,8.8,10.0,7.3,1.6,0.4,3.3,2.1,23.2,,johnsja05,ATL
1,2.0,Dyson Daniels,22.0,SG,22,22,33.8,4.6,9.5,0.488,0.2,1.4,0.161,4.4,8.1,0.545,0.5,0.8,1.4,0.548,2.2,4.3,6.5,6.0,2.3,0.4,2.3,2.5,10.3,,daniedy01,ATL
2,3.0,Nickeil Alexander-Walker,27.0,SG,20,17,32.8,7.0,14.9,0.466,2.8,7.2,0.392,4.2,7.8,0.535,0.56,3.7,4.3,0.859,0.7,2.7,3.4,3.6,1.0,0.9,2.3,2.1,20.4,,alexani01,ATL
3,4.0,Onyeka Okongwu,25.0,C,21,10,30.5,6.2,12.0,0.514,1.9,5.1,0.37,4.3,6.9,0.621,0.593,2.1,2.8,0.759,1.6,5.7,7.3,2.9,0.9,1.1,1.9,3.6,16.4,,okongon01,ATL
4,5.0,Trae Young,27.0,PG,5,5,27.8,5.2,14.0,0.371,1.0,5.2,0.192,4.2,8.8,0.477,0.407,6.4,7.8,0.821,0.0,2.0,2.0,7.8,0.8,0.2,2.0,1.8,17.8,,youngtr01,ATL
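
Because the input has one row per player per team, a player with more than one team stint could still appear on several rows after this step. Whether the real file contains such cases is not shown here. The sketch below uses a made-up toy frame (all names and values are hypothetical) to illustrate one way to list those rows by the `Player-additional` ID column:

```python
import pandas as pd

# Toy stand-in for `players`; every row and value here is hypothetical.
toy = pd.DataFrame({
    "Player": ["A Guard", "A Guard", "B Forward", "C Center"],
    "Player-additional": ["guarda01", "guarda01", "forwab01", "centec01"],
    "Tm": ["ATL", "BOS", "ATL", "BOS"],
    "G": [10, 8, 20, 21],
})

# keep=False marks every row of a repeated ID, not only the later ones.
dup_mask = toy.duplicated(subset="Player-additional", keep=False)
multi_team = toy[dup_mask]

print(multi_team[["Player", "Tm", "G"]])
```

If rows like these exist in the real data, the games filter in the next section applies to each stint separately, which may or may not be the intended behavior.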


## Filter to Rotation Players Only

For awards and All-NBA / All-Star teams, we want regular rotation
players, not players who logged only a few garbage-time minutes.

We keep players who satisfy:

- `G >= 10` games played
- `MP >= 15` minutes per game


In [3]:
MIN_GAMES = 10
MIN_MINUTES = 15.0

rotation_players = players[
    (players["G"] >= MIN_GAMES) &
    (players["MP"] >= MIN_MINUTES)
].copy()

print("Rotation players only:", rotation_players.shape)
rotation_players[["Player","Tm","Pos","G","MP","PTS"]].head()


Rotation players only: (255, 32)


Unnamed: 0,Player,Tm,Pos,G,MP,PTS
0,Jalen Johnson,ATL,SF,20,34.9,23.2
1,Dyson Daniels,ATL,SG,22,33.8,10.3
2,Nickeil Alexander-Walker,ATL,SG,20,32.8,20.4
3,Onyeka Okongwu,ATL,C,21,30.5,16.4
6,Zaccharie Risacher,ATL,SF,19,24.7,11.4
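
Both thresholds are inclusive, and they also drop well-known players with few games: the Trae Young row shown earlier has `G = 5`, so it does not pass `G >= 10`. As a rough way to see how sensitive the row count is to these cut-offs, here is a minimal sketch on a hypothetical toy frame (the helper name `count_rotation` and the alternative thresholds are assumptions for illustration):

```python
import pandas as pd

# Toy data; the values are made up to sit on and around the thresholds.
toy = pd.DataFrame({
    "Player": ["P1", "P2", "P3", "P4", "P5"],
    "G": [20, 5, 12, 10, 22],
    "MP": [34.9, 27.8, 14.0, 15.0, 8.3],
})

def count_rotation(df, min_games, min_minutes):
    """Return how many rows pass both inclusive thresholds."""
    mask = (df["G"] >= min_games) & (df["MP"] >= min_minutes)
    return int(mask.sum())

# Compare a looser, the notebook's, and a stricter pair of thresholds.
for min_games, min_minutes in [(5, 10.0), (10, 15.0), (15, 20.0)]:
    kept = count_rotation(toy, min_games, min_minutes)
    print(f"G >= {min_games}, MP >= {min_minutes}: {kept} rows kept")
```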


## Simplify Positions for All-NBA / All-Star Teams

The raw `Pos` column holds labels such as:

- PG, SG, SF, PF, C

For team construction we use three buckets:

- **G**: point guards and shooting guards (`PG`, `SG`)
- **F**: small forwards and power forwards (`SF`, `PF`)
- **C**: centers (`C`)

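We add a column `Pos_simple` with values in {`G`, `F`, `C`}. The function
checks for `G` first, then `F`, then `C`, and falls back to `G` when `Pos`
is missing or unrecognized.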


In [4]:
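def simplify_pos(pos):
    """Map a raw position label to one of G, F, or C."""
    if isinstance(pos, str):
        # Checks run in G, F, C order, so the first match wins.
        if "G" in pos:
            return "G"
        if "F" in pos:
            return "F"
        if "C" in pos:
            return "C"
    return "G"  # fall-back for missing or unrecognized labels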

rotation_players["Pos_simple"] = rotation_players["Pos"].apply(simplify_pos)

rotation_players["Pos_simple"].value_counts()


Pos_simple
G    110
F     99
C     46
Name: count, dtype: int64
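
Since `simplify_pos` tests for `"G"` first and defaults to `"G"`, a hybrid label or a missing value would be counted as a guard without any warning. The sketch below is a stricter alternative, not the notebook's method: the name `simplify_pos_strict` is hypothetical, and the hyphenated hybrid format (`"SF-SG"`) is an assumption about how such labels might look.

```python
# Explicit lookup from the five standard labels to the three buckets.
POS_BUCKETS = {"PG": "G", "SG": "G", "SF": "F", "PF": "F", "C": "C"}

def simplify_pos_strict(pos):
    """Bucket by the first listed position; return None if unrecognized."""
    if not isinstance(pos, str):
        return None  # missing value (e.g. NaN): leave it for inspection
    # Assumption: hybrid labels are hyphenated, primary position first.
    primary = pos.split("-")[0].strip().upper()
    return POS_BUCKETS.get(primary)

for label in ["PG", "SF", "C", "SF-SG", None]:
    print(label, "->", simplify_pos_strict(label))
```

Applied with `rotation_players["Pos"].apply(simplify_pos_strict)`, any `None` results could then be reviewed by hand instead of being folded into the guard count.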

## Create Per-36-Minute Stats

Per-game totals favor players who log heavy minutes. To compare players
on a more equal footing, we create **per-36-minute** versions of key box-score stats:

- `PTS36`, `TRB36`, `AST36`, `STL36`, `BLK36`, `TOV36`

Formula: `stat_per_game * 36 / MP`

We compute these only for rotation players, whose `MP` is at least 15, so the division never hits zero.


In [5]:
stats_for_36 = ["PTS", "TRB", "AST", "STL", "BLK", "TOV"]

for stat in stats_for_36:
    new_col = stat + "36"
    rotation_players[new_col] = rotation_players[stat] * 36.0 / rotation_players["MP"]

rotation_players[[ "Player", "MP", "PTS", "PTS36", "TRB36", "AST36" ]].head()


Unnamed: 0,Player,MP,PTS,PTS36,TRB36,AST36
0,Jalen Johnson,34.9,23.2,23.931232,10.315186,7.530086
1,Dyson Daniels,33.8,10.3,10.970414,6.923077,6.390533
2,Nickeil Alexander-Walker,32.8,20.4,22.390244,3.731707,3.95122
3,Onyeka Okongwu,30.5,16.4,19.357377,8.616393,3.422951
6,Zaccharie Risacher,24.7,11.4,16.615385,3.789474,2.186235
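
As a quick check of the formula `stat_per_game * 36 / MP`, the sketch below recomputes the first row by hand (23.2 points in 34.9 minutes). The helper name `per_36` is hypothetical and the guard against non-positive minutes is an added assumption; the result should agree with the `PTS36` value shown in the table.

```python
def per_36(stat_per_game, minutes_per_game):
    """Scale a per-game stat to a 36-minute basis."""
    if minutes_per_game <= 0:
        raise ValueError("minutes_per_game must be positive")
    return stat_per_game * 36.0 / minutes_per_game

# Jalen Johnson's row from the table: PTS = 23.2, TRB = 10.0, MP = 34.9
pts36 = per_36(23.2, 34.9)
trb36 = per_36(10.0, 34.9)

print(round(pts36, 6), round(trb36, 6))
```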


## Define Offensive and Defensive Feature Groups

We will later build award scores and use clustering based on two feature
groups:

**Offensive features:**

- `PTS`, `PTS36`, `AST`, `AST36`
- `3P`, `3PA`, `3P%`
- `FT`, `FTA`, `FT%`
- `MP` (minutes per game; heavier minutes generally signal a larger role)

**Defensive features:**

- `TRB`, `TRB36`, `STL`, `STL36`, `BLK`, `BLK36`
- `DRB`, `ORB` (if present)
- `PF` (personal fouls; we may use this inversely later)

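Here we only define the lists and drop any names that are missing from
`rotation_players`. The lists are not written to disk, so the model
notebook has to define them again to reuse them.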


In [9]:
offense_features = [
    "PTS", "PTS36",
    "AST", "AST36",
    "3P", "3PA", "3P%",
    "FT", "FTA", "FT%",
    "MP"
]

defense_features = [
    "TRB", "TRB36",
    "STL", "STL36",
    "BLK", "BLK36",
    "DRB", "ORB",
    "PF"
]

offense_features = [c for c in offense_features if c in rotation_players.columns]
defense_features = [c for c in defense_features if c in rotation_players.columns]

print("Offensive features:", offense_features)
print("Defensive features:", defense_features)


Offensive features: ['PTS', 'PTS36', 'AST', 'AST36', '3P', '3PA', '3P%', 'FT', 'FTA', 'FT%', 'MP']
Defensive features: ['TRB', 'TRB36', 'STL', 'STL36', 'BLK', 'BLK36', 'DRB', 'ORB', 'PF']
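
One way to let another notebook reuse these lists without retyping them is to write them to a small JSON file next to the CSV. This is only a sketch: the file name `awards_feature_groups.json` and the helper names are hypothetical, and a temporary directory stands in for `DATA_DIR` so the example leaves no files behind.

```python
import json
import tempfile
from pathlib import Path

offense_features = ["PTS", "PTS36", "AST", "AST36",
                    "3P", "3PA", "3P%", "FT", "FTA", "FT%", "MP"]
defense_features = ["TRB", "TRB36", "STL", "STL36",
                    "BLK", "BLK36", "DRB", "ORB", "PF"]

def save_feature_groups(path, groups):
    """Write a dict of feature-name lists to a JSON file."""
    Path(path).write_text(json.dumps(groups, indent=2), encoding="utf-8")

def load_feature_groups(path):
    """Read the dict of feature-name lists back from JSON."""
    return json.loads(Path(path).read_text(encoding="utf-8"))

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical file name; in the notebook this would sit in DATA_DIR.
    path = Path(tmp_dir) / "awards_feature_groups.json"
    save_feature_groups(
        path, {"offense": offense_features, "defense": defense_features}
    )
    reloaded = load_feature_groups(path)

print(reloaded["offense"])
print(reloaded["defense"])
```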


## Save Cleaned Dataset for Awards Modeling

We now save a clean dataset of rotation players with:

- original key columns (name, team, age, position, games, minutes, per-game box score)
- `Pos_simple`
- per-36 stats

This file will be the input to the **Model Construction / Awards** notebook.


In [10]:
output_cols = [
    "Player", "Age", "Pos", "Pos_simple", "Tm",
    "G", "GS", "MP",
    "FG", "FGA", "FG%", "3P", "3PA", "3P%",
    "2P", "2PA", "2P%", "eFG%",
    "FT", "FTA", "FT%",
    "ORB", "DRB", "TRB",
    "AST", "STL", "BLK", "TOV", "PF", "PTS",
    "Player-additional", "Awards",
] + [c for c in rotation_players.columns if c.endswith("36")]


output_cols = [c for c in output_cols if c in rotation_players.columns]

clean_players = rotation_players[output_cols].copy()

OUTPUT_PATH = f"{DATA_DIR}/players_25_26_awards_clean.csv"
clean_players.to_csv(OUTPUT_PATH, index=False)

print("Saved cleaned awards dataset to:", OUTPUT_PATH)
print("Final shape:", clean_players.shape)
clean_players.head()


Saved cleaned awards dataset to: ./players_25_26_awards_clean.csv
Final shape: (255, 38)


Unnamed: 0,Player,Age,Pos,Pos_simple,Tm,G,GS,MP,FG,FGA,FG%,3P,3PA,3P%,2P,2PA,2P%,eFG%,FT,FTA,FT%,ORB,DRB,TRB,AST,STL,BLK,TOV,PF,PTS,Player-additional,Awards,PTS36,TRB36,AST36,STL36,BLK36,TOV36
0,Jalen Johnson,24.0,SF,F,ATL,20,20,34.9,8.4,15.5,0.539,1.7,4.0,0.413,6.7,11.5,0.583,0.592,4.9,6.2,0.789,1.3,8.8,10.0,7.3,1.6,0.4,3.3,2.1,23.2,johnsja05,,23.931232,10.315186,7.530086,1.65043,0.412607,3.404011
1,Dyson Daniels,22.0,SG,G,ATL,22,22,33.8,4.6,9.5,0.488,0.2,1.4,0.161,4.4,8.1,0.545,0.5,0.8,1.4,0.548,2.2,4.3,6.5,6.0,2.3,0.4,2.3,2.5,10.3,daniedy01,,10.970414,6.923077,6.390533,2.449704,0.426036,2.449704
2,Nickeil Alexander-Walker,27.0,SG,G,ATL,20,17,32.8,7.0,14.9,0.466,2.8,7.2,0.392,4.2,7.8,0.535,0.56,3.7,4.3,0.859,0.7,2.7,3.4,3.6,1.0,0.9,2.3,2.1,20.4,alexani01,,22.390244,3.731707,3.95122,1.097561,0.987805,2.52439
3,Onyeka Okongwu,25.0,C,C,ATL,21,10,30.5,6.2,12.0,0.514,1.9,5.1,0.37,4.3,6.9,0.621,0.593,2.1,2.8,0.759,1.6,5.7,7.3,2.9,0.9,1.1,1.9,3.6,16.4,okongon01,,19.357377,8.616393,3.422951,1.062295,1.298361,2.242623
6,Zaccharie Risacher,20.0,SF,F,ATL,19,19,24.7,4.5,10.0,0.453,1.5,4.8,0.308,3.1,5.2,0.586,0.526,0.9,1.4,0.63,0.6,2.0,2.6,1.5,1.1,0.7,0.8,2.1,11.4,risacza01,,16.615385,3.789474,2.186235,1.603239,1.020243,1.165992
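
Before handing the file to the modeling notebook, it can help to confirm that it reads back with the same shape and column order. The sketch below does this on a hypothetical two-row frame and an in-memory buffer instead of the real CSV. It also shows one thing to expect on reload: a column that is entirely empty, as `Awards` is in the rows displayed above, comes back as all-NaN.

```python
import io
import pandas as pd

# Toy stand-in for `clean_players`; the values are illustrative only.
toy_clean = pd.DataFrame({
    "Player": ["P1", "P2"],
    "Pos_simple": ["F", "G"],
    "MP": [34.9, 33.8],
    "Awards": [None, None],
})

# Write without the index, as the notebook does, then read it back.
buffer = io.StringIO()
toy_clean.to_csv(buffer, index=False)
buffer.seek(0)
reloaded = pd.read_csv(buffer)

same_columns = list(reloaded.columns) == list(toy_clean.columns)
same_shape = reloaded.shape == toy_clean.shape
print("Columns match:", same_columns)
print("Shape matches:", same_shape)
print("Awards all missing after reload:", bool(reloaded["Awards"].isna().all()))
```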
