# 01 – Data Setup & Description

**Dataset:** 2024–2025 NBA Player Stats (Season Totals)

- Each row = one player's season line (traded players have one row per team plus a combined multi-team row).
- Source: Basketball-Reference-style dataset (stats through the 2024–25 regular season).
- Goal: Understand how NBA players score: threes vs. free throws vs. overall volume.

In [1]:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt 

In [3]:
path = "Basketball Data 2024-2025 Season - Sheet1.csv"
df = pd.read_csv(path)
df.head()

Unnamed: 0,Rk,Player,Age,Team,Pos,G,GS,MP,FG,FGA,...,DRB,TRB,AST,STL,BLK,TOV,PF,PTS,Trp-Dbl,Awards
0,1.0,Shai Gilgeous-Alexander,26.0,OKC,PG,76.0,76.0,2598.0,860.0,1656.0,...,312.0,379.0,486.0,131.0,77.0,183.0,164.0,2484.0,0.0,"MVP-1,DPOY-10,CPOY-8,AS,NBA1"
1,2.0,Anthony Edwards,23.0,MIN,SG,79.0,79.0,2871.0,721.0,1612.0,...,389.0,450.0,359.0,91.0,51.0,249.0,150.0,2177.0,0.0,"MVP-7,CPOY-3,AS,NBA2"
2,3.0,Nikola Jokić,29.0,DEN,C,70.0,70.0,2571.0,786.0,1364.0,...,692.0,892.0,716.0,127.0,45.0,230.0,160.0,2071.0,34.0,"MVP-2,CPOY-2,AS,NBA1"
3,4.0,Giannis Antetokounmpo,30.0,MIL,PF,67.0,67.0,2289.0,793.0,1319.0,...,651.0,798.0,433.0,58.0,78.0,206.0,155.0,2036.0,11.0,"MVP-3,DPOY-8,AS,NBA1"
4,5.0,Jayson Tatum,26.0,BOS,PF,72.0,72.0,2624.0,662.0,1465.0,...,575.0,623.0,431.0,76.0,38.0,209.0,157.0,1932.0,2.0,"MVP-4,CPOY-10,AS,NBA1"


## Info about columns and missing values

In [4]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 736 entries, 0 to 735
Data columns (total 32 columns):
 #   Column   Non-Null Count  Dtype  
---  ------   --------------  -----  
 0   Rk       735 non-null    float64
 1   Player   736 non-null    object 
 2   Age      735 non-null    float64
 3   Team     735 non-null    object 
 4   Pos      735 non-null    object 
 5   G        735 non-null    float64
 6   GS       735 non-null    float64
 7   MP       735 non-null    float64
 8   FG       735 non-null    float64
 9   FGA      735 non-null    float64
 10  FG%      732 non-null    float64
 11  3P       735 non-null    float64
 12  3PA      735 non-null    float64
 13  3P%      691 non-null    float64
 14  2P       735 non-null    float64
 15  2PA      735 non-null    float64
 16  2P%      725 non-null    float64
 17  eFG%     732 non-null    float64
 18  FT       735 non-null    float64
 19  FTA      735 non-null    float64
 20  FT%      694 non-null    float64
 21  ORB      735 non
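
The non-null counts above are lower for the percentage columns (for example `3P%` and `FT%`) than for the raw totals. A plausible reason, which the notebook does not verify, is that a percentage is undefined for a player with zero attempts. The sketch below shows one way to check this on a small toy frame; `toy` and its values are made up for illustration and are not rows from the dataset.

```python
import numpy as np
import pandas as pd

# Toy stand-in for df; the values are illustrative, not taken from the dataset.
toy = pd.DataFrame({
    "Player": ["A", "B", "C"],
    "3P": [10.0, 0.0, 0.0],
    "3PA": [30.0, 0.0, 4.0],
    "3P%": [10 / 30, np.nan, 0.0],
})

# Missing values per column, largest count first
missing = toy.isna().sum().sort_values(ascending=False)
print(missing)

# Rows where the percentage is missing AND no attempts were taken
no_attempts = toy["3P%"].isna() & (toy["3PA"] == 0)
print(int(no_attempts.sum()), "of", int(toy["3P%"].isna().sum()),
      "missing 3P% values have zero attempts")
```

Running the same two steps on `df` would show whether every missing percentage lines up with zero attempts or whether some rows need a closer look.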

## Quick numeric summary

In [5]:
df.describe()

Unnamed: 0,Rk,Age,G,GS,MP,FG,FGA,FG%,3P,3PA,...,ORB,DRB,TRB,AST,STL,BLK,TOV,PF,PTS,Trp-Dbl
count,735.0,735.0,735.0,735.0,735.0,735.0,735.0,732.0,735.0,735.0,...,735.0,735.0,735.0,735.0,735.0,735.0,735.0,735.0,735.0,735.0
mean,290.903401,26.027211,40.612245,18.287075,904.712925,155.133333,332.439456,0.45204,50.096599,139.280272,...,41.552381,123.511565,165.063946,100.096599,30.813605,18.112925,50.887075,70.171429,423.738776,0.216327
std,159.594096,4.112759,24.595709,24.137674,773.659072,161.509454,337.795899,0.117306,59.403003,157.189884,...,47.674671,124.567129,166.347483,119.802165,28.802661,23.033846,54.044028,57.795237,446.853498,1.570593
min,1.0,19.0,1.0,0.0,2.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
25%,151.5,23.0,19.0,0.0,226.5,27.0,65.5,0.40675,4.0,17.0,...,10.0,27.0,40.5,17.0,7.0,3.0,11.0,19.0,74.0,0.0
50%,299.0,25.0,39.0,6.0,691.0,101.0,223.0,0.4495,25.0,74.0,...,26.0,84.0,116.0,60.0,23.0,11.0,34.0,57.0,279.0,0.0
75%,430.0,28.0,63.5,30.0,1469.5,237.5,508.5,0.5,81.0,222.5,...,55.0,182.0,241.0,134.5,47.0,23.0,74.5,111.0,634.5,0.0
max,569.0,40.0,82.0,82.0,3036.0,860.0,1656.0,1.0,320.0,811.0,...,300.0,710.0,1010.0,880.0,229.0,176.0,355.0,257.0,2484.0,34.0


## Remove duplicate rows for players who played on multiple teams

In [6]:
# Mark multi-team summary rows
multi_team_codes = ["2TM", "3TM", "4TM"]
multi_team_rows = df[df["Team"].isin(multi_team_codes)]

# Players that have multi-team summary rows
multi_team_players = set(multi_team_rows["Player"])

# Single-team rows for players without multi-team summary
single_team_rows = df[~df["Player"].isin(multi_team_players)]

# Combine to form cleaned dataset: one row per player
df_clean = pd.concat([single_team_rows, multi_team_rows], ignore_index=True)

df_clean.shape, df.shape  # check reduction in row count

((570, 32), (736, 32))
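
To confirm that this filtering leaves one row per player, the same steps can be run on a tiny frame and checked with `value_counts()`. This is a minimal sketch: `toy`, the player labels `X` and `Y`, and the team codes `AAA`/`BBB`/`CCC` are hypothetical, and the check assumes player names are unique, which may not hold if two different players share a name.

```python
import pandas as pd

# Hypothetical data: player X was traded (one 2TM row plus two team rows),
# player Y stayed with one team.
toy = pd.DataFrame({
    "Player": ["X", "X", "X", "Y"],
    "Team": ["2TM", "AAA", "BBB", "CCC"],
    "PTS": [300.0, 100.0, 200.0, 150.0],
})

multi_team_codes = ["2TM", "3TM", "4TM"]
multi = toy[toy["Team"].isin(multi_team_codes)]
single = toy[~toy["Player"].isin(set(multi["Player"]))]
toy_clean = pd.concat([single, multi], ignore_index=True)

# Each player should now appear exactly once
rows_per_player = toy_clean["Player"].value_counts()
print(rows_per_player)
print("one row per player:", bool(rows_per_player.max() == 1))
```

The same `value_counts()` check applied to `df_clean["Player"]` would flag any name that still appears more than once.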

## Per-Game Stats

In [7]:
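# Keep rows with at least one game played: this avoids division by zero
# and also drops any row where G is missing (NaN > 0 evaluates to False)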
df_clean = df_clean[df_clean["G"] > 0].copy()

df_clean["PTS_per_game"] = df_clean["PTS"] / df_clean["G"]
df_clean["threePA_per_game"] = df_clean["3PA"] / df_clean["G"]
df_clean["FTA_per_game"] = df_clean["FTA"] / df_clean["G"]
df_clean["MP_per_game"] = df_clean["MP"] / df_clean["G"]

df_clean[["Player", "Team", "Pos", "PTS_per_game", "threePA_per_game", "FTA_per_game", "MP_per_game"]].head()

Unnamed: 0,Player,Team,Pos,PTS_per_game,threePA_per_game,FTA_per_game,MP_per_game
0,Shai Gilgeous-Alexander,OKC,PG,32.684211,5.723684,8.802632,34.184211
1,Anthony Edwards,MIN,SG,27.556962,10.265823,6.278481,36.341772
2,Nikola Jokić,DEN,C,29.585714,4.728571,6.442857,36.728571
3,Giannis Antetokounmpo,MIL,PF,30.38806,0.940299,10.552239,34.164179
4,Jayson Tatum,BOS,PF,26.833333,10.111111,6.111111,36.444444
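
The four per-game columns all follow the same pattern (season total divided by `G`), so the step could be wrapped in a small helper. The function `add_per_game` below is a hypothetical name, not part of the notebook, and it names new columns `<col>_per_game`, so `3PA` would become `3PA_per_game` rather than `threePA_per_game`. The example row uses the Shai Gilgeous-Alexander totals shown in the `df.head()` output (76 games, 2484 points, 2598 minutes).

```python
import pandas as pd

def add_per_game(frame, cols, games_col="G"):
    """Return a copy of frame with a <col>_per_game column for each col in cols.

    Rows with zero or missing games are dropped first to avoid dividing by zero.
    """
    out = frame[frame[games_col] > 0].copy()
    for col in cols:
        out[f"{col}_per_game"] = out[col] / out[games_col]
    return out

toy = pd.DataFrame({
    "Player": ["Shai Gilgeous-Alexander"],
    "G": [76.0],
    "PTS": [2484.0],
    "MP": [2598.0],
})

result = add_per_game(toy, ["PTS", "MP"])
print(result[["Player", "PTS_per_game", "MP_per_game"]])
```

The printed values should agree with the first row of the table above (about 32.68 points and 34.18 minutes per game).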


## Column Descriptions

| Column           | Type   | Description                                              |
|------------------|--------|----------------------------------------------------------|
| Player           | string | Player name                                              |
| Age              | int    | Player age in the 2024–25 season                         |
| Team             | string | Team abbreviation (or 2TM/3TM/4TM for multi-team totals) |
| Pos              | string | Position (PG, SG, SF, PF, C)                             |
| G                | int    | Games played                                             |
| MP               | int    | Minutes played (season total)                            |
| FGA              | int    | Field goal attempts (season total)                       |
| 3PA              | int    | 3-point attempts (season total)                          |
| FTA              | int    | Free throw attempts (season total)                       |
| PTS              | int    | Points scored (season total)                             |
| PTS_per_game     | float  | Points per game (PTS / G)                                |
| threePA_per_game | float  | 3-point attempts per game (3PA / G)                      |
| FTA_per_game     | float  | Free throw attempts per game (FTA / G)                   |
| MP_per_game      | float  | Minutes per game (MP / G)                                |
| ...              | ...    | ... complete for the remaining columns ...               |
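
The Type column above lists the count columns as `int`, while `df.info()` reports them as `float64`; pandas falls back to floats when an integer column contains missing values. If integer dtypes are wanted, one option is the nullable `Int64` dtype, sketched below on a toy frame. The frame `toy` and the list `count_cols` are illustrative assumptions, not part of the notebook.

```python
import numpy as np
import pandas as pd

# Toy stand-in: whole-number counts stored as floats, with one missing row.
toy = pd.DataFrame({
    "Player": ["A", "B", "C"],
    "G": [76.0, 12.0, np.nan],
    "PTS": [2484.0, 40.0, np.nan],
})

count_cols = ["G", "PTS"]  # hypothetical list of count columns to convert

# Nullable Int64 keeps missing entries as <NA> instead of forcing floats
converted = toy.copy()
converted[count_cols] = converted[count_cols].astype("Int64")
print(converted.dtypes)
```

Ratios such as `PTS_per_game` would stay as floats, since they are not whole numbers.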
