<div class="alert alert-danger">
    <h4 style="font-weight: bold; font-size: 28px;">Feature Engineering - Player Stats Per Game 2019-2024</h4>
    <p style="font-size: 20px;">NBA API Data (2019-2024)</p>
</div>

<a name="Feature-Engineering"></a>

# Table of Contents

[Setup](#Setup)

[Explanation](#Explanation)


[Step by Step To Pull Player Stats](#Step-by-Step-To-Pull-Player-Stats)


[Unused Player API Functions](#Unused-Player-API-Functions)

# Setup

[Return to top](#Feature-Engineering)

In [1]:
import sys
from pathlib import Path
# get current working directory
cwd = %pwd
# add shared_code directory to Python sys.path
sys.path.append(str(Path(cwd).parent / "shared_code"))
# import all libraries in shared_code directory 'imports.py' file
from imports import *
%matplotlib inline



In [216]:
# import other libraries
import numpy as np
import time
from nba_api.stats.endpoints import playergamelog
from nba_api.stats.endpoints import playergamelogs

from nba_api.stats.endpoints import CommonAllPlayers

from nba_api.stats.endpoints import CommonPlayerInfo
from nba_api.stats.endpoints import CommonTeamRoster



## Explanation

- I tried several nba_api endpoints, but most of them do not work for pulling per-game player stats
- For instance, boxscoreplayertrackv2 is deprecated for the 2021-24 seasons and cannot be pulled
- CommonAllPlayers is deprecated and returns only 100-120 players for seasons before the current one
- playergamelog does not include a team ID
- CommonTeamRoster only shows each team's end-of-season roster, but players change teams throughout the season
- playercareerstats only has season-level stats, not per-game stats
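The roster limitation is easier to see with game-log rows in hand. The snippet below is a minimal sketch that counts how many teams each player appears with in a season. The column names match the playergamelogs output shown later in this notebook, but the rows themselves are made-up toy data, not real results.

```python
import pandas as pd

# Toy rows using the playergamelogs column names (values are invented)
toy_logs = pd.DataFrame({
    "SEASON_YEAR": ["2022-23"] * 5,
    "PLAYER_ID": [1, 1, 1, 2, 2],
    "TEAM_ID": [1610612737, 1610612737, 1610612738, 1610612752, 1610612752],
})

# Count distinct teams per player per season
teams_per_player = (
    toy_logs.groupby(["SEASON_YEAR", "PLAYER_ID"])["TEAM_ID"]
    .nunique()
    .rename("N_TEAMS")
    .reset_index()
)

# A player with more than one team changed teams during the season,
# so an end-of-season roster alone would miss games for the earlier team
moved = teams_per_player[teams_per_player["N_TEAMS"] > 1]
print(moved)
```

Because each game-log row carries its own TEAM_ID, a traded player's games stay attached to the team he played for at the time.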


- To get player stats per game for each season, we need to:
1. Loop through each team for each season and pull rosters from CommonTeamRoster
2. Collect the unique player_ids across all rosters for each season
3. Loop through those player_ids with 'playergamelogs' (a different endpoint from playergamelog) and concatenate the results
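The three steps can be sketched as one function. This is only an outline under stated assumptions: `collect_player_logs`, `fetch_roster`, and `fetch_logs` are hypothetical names introduced here, and the two fetchers are passed in as arguments so the flow can be tried offline with stand-in functions before any API call is made.

```python
import pandas as pd

def collect_player_logs(seasons, team_ids, fetch_roster, fetch_logs):
    """Rosters -> unique player ids per season -> per-game logs."""
    log_frames = []
    for season in seasons:
        # Step 1: one roster per team for this season
        rosters = pd.concat(
            [fetch_roster(team_id, season) for team_id in team_ids],
            ignore_index=True,
        )
        # Step 2: unique player ids across all rosters of the season
        player_ids = rosters["PLAYER_ID"].unique()
        # Step 3: per-game logs for each player, collected in a list
        for player_id in player_ids:
            log_frames.append(fetch_logs(player_id, season))
    # Concatenate once at the end
    return pd.concat(log_frames, ignore_index=True)

# Offline stand-ins (hypothetical): player 7 appears on both rosters
def fake_roster(team_id, season):
    return pd.DataFrame({"PLAYER_ID": [team_id * 10, 7]})

def fake_logs(player_id, season):
    return pd.DataFrame({"SEASON_YEAR": [season], "PLAYER_ID": [player_id]})

demo = collect_player_logs(["2022-23"], [1, 2], fake_roster, fake_logs)
print(demo)
```

In this notebook, `fetch_roster` would wrap `CommonTeamRoster(...).get_data_frames()[0]` and `fetch_logs` would wrap `playergamelogs.PlayerGameLogs(...).get_data_frames()[0]`.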

## Step by Step To Pull Player Stats

[Return to top](#Feature-Engineering)

In [122]:
# We use nba game box scores to check later work
# Get total box score df
team_bs_df = pd.read_csv('../../data/original/nba_games_box_scores_2022_2024.csv')

# Get season and game IDs into list
# we check unique games later to make sure pull is legitimate
id_df = team_bs_df[['SEASON_ID', 'GAME_ID', 'TEAM_ID']]
id_list = id_df.values.tolist()

In [134]:
team_id_list = id_df['TEAM_ID'].unique()
len(team_id_list)
print(team_id_list)

[1610612737 1610612738 1610612751 1610612766 1610612741 1610612739
 1610612742 1610612743 1610612765 1610612744 1610612745 1610612754
 1610612746 1610612747 1610612763 1610612748 1610612749 1610612750
 1610612740 1610612752 1610612760 1610612753 1610612755 1610612756
 1610612757 1610612758 1610612759 1610612761 1610612762 1610612764]


In [140]:
# Example Player Roster    
player_roster = CommonTeamRoster(
    team_id= '1610612737',
    season = '2019-20', # change year(s) if needed
    league_id_nullable= '00' # nba 00, g_league 20, wnba 10
)

df_player_roster = player_roster.get_data_frames()[0]
df_player_roster

Unnamed: 0,TeamID,SEASON,LeagueID,PLAYER,NICKNAME,PLAYER_SLUG,NUM,POSITION,HEIGHT,WEIGHT,BIRTH_DATE,AGE,EXP,SCHOOL,PLAYER_ID,HOW_ACQUIRED
0,1610612737,2019,0,Jeff Teague,Jeff,jeff-teague,0,G,6-3,195,"JUN 10, 1988",32.0,10,Wake Forest,201952,
1,1610612737,2019,0,Brandon Goodwin,Brandon,brandon-goodwin,0,G,6-0,180,"OCT 02, 1995",24.0,1,Florida Gulf Coast,1629164,
2,1610612737,2019,0,Treveon Graham,Treveon,treveon-graham,2,G-F,6-5,219,"OCT 28, 1993",26.0,3,Va Commonwealth,1626203,
3,1610612737,2019,0,Kevin Huerter,Kevin,kevin-huerter,3,G-F,6-7,190,"AUG 27, 1998",21.0,1,Maryland,1628989,
4,1610612737,2019,0,Charlie Brown Jr.,Charlie,charlie-brown-jr,4,G,6-6,199,"FEB 02, 1997",23.0,R,St. Joseph's (PA),1629718,
5,1610612737,2019,0,Skal Labissiere,Skal,skal-labissiere,7,F-C,6-10,235,"MAR 18, 1996",24.0,3,Kentucky,1627746,
6,1610612737,2019,0,Trae Young,Trae,trae-young,11,G,6-1,180,"SEP 19, 1998",21.0,1,Oklahoma,1629027,Draft Rights Traded from DAL on 06/21/18
7,1610612737,2019,0,De'Andre Hunter,De'Andre,deandre-hunter,12,F-G,6-7,225,"DEC 02, 1997",22.0,R,Virginia,1629631,Draft Rights Traded from NOP on 07/07/19
8,1610612737,2019,0,Dewayne Dedmon,Dewayne,dewayne-dedmon,14,C,7-0,245,"AUG 12, 1989",30.0,6,USC,203473,
9,1610612737,2019,0,Vince Carter,Vince,vince-carter,15,G-F,6-6,220,"JAN 26, 1977",43.0,21,North Carolina,1713,


In [227]:
# specify seasons to pull
seasons = ['2019-20', '2020-21', '2021-22', '2022-23', '2023-24']

# collect one roster DataFrame per (season, team), then concatenate once
roster_frames = []

# Loop through each roster for each season
for season in seasons:
    for team_id in team_id_list:
        player_roster = CommonTeamRoster(
            team_id=team_id,
            season=season, # change year(s) if needed
            league_id_nullable='00' # nba 00, g_league 20, wnba 10
        )

        # get values into df
        df_player_roster = player_roster.get_data_frames()[0]

        # add column for season_year
        df_player_roster['SEASON_YEAR'] = season

        roster_frames.append(df_player_roster)

# concat all rosters into a single df
all_player_roster_df = pd.concat(roster_frames, ignore_index=True)

In [288]:
all_player_roster_df.head()

Unnamed: 0,TeamID,SEASON,LeagueID,PLAYER,NICKNAME,PLAYER_SLUG,NUM,POSITION,HEIGHT,WEIGHT,BIRTH_DATE,AGE,EXP,SCHOOL,PLAYER_ID,HOW_ACQUIRED,SEASON_YEAR
0,1610612737,2019,0,Jeff Teague,Jeff,jeff-teague,0,G,6-3,195,"JUN 10, 1988",32.0,10,Wake Forest,201952,,2019-20
1,1610612737,2019,0,Brandon Goodwin,Brandon,brandon-goodwin,0,G,6-0,180,"OCT 02, 1995",24.0,1,Florida Gulf Coast,1629164,,2019-20
2,1610612737,2019,0,Treveon Graham,Treveon,treveon-graham,2,G-F,6-5,219,"OCT 28, 1993",26.0,3,Va Commonwealth,1626203,,2019-20
3,1610612737,2019,0,Kevin Huerter,Kevin,kevin-huerter,3,G-F,6-7,190,"AUG 27, 1998",21.0,1,Maryland,1628989,,2019-20
4,1610612737,2019,0,Charlie Brown Jr.,Charlie,charlie-brown-jr,4,G,6-6,199,"FEB 02, 1997",23.0,R,St. Joseph's (PA),1629718,,2019-20


In [287]:
# Check that playergamelogs pulls correctly for a single player first,
# because looping through all players and seasons takes 30-40 minutes

# Initialize an empty DataFrame to store all game logs
check_all_seasons_logs_df = pd.DataFrame()

# List of seasons to loop through (update this list as needed)
# '2019-20', '2020-21', '2021-22', '2022-23', '2023-24'
seasons = ['2019-20', '2020-21', '2021-22', '2022-23', '2023-24']

# Fetch game logs for each season and add a 'SEASON' column
for season in seasons:
    player_logs = playergamelogs.PlayerGameLogs(player_id_nullable='203500', season_nullable= season, season_type_nullable= "Regular Season")
    season_logs_df = player_logs.get_data_frames()[0]
    check_all_seasons_logs_df = pd.concat([check_all_seasons_logs_df, season_logs_df], ignore_index=True)

check_all_seasons_logs_df.head()

Unnamed: 0,SEASON_YEAR,PLAYER_ID,PLAYER_NAME,NICKNAME,TEAM_ID,TEAM_ABBREVIATION,TEAM_NAME,GAME_ID,GAME_DATE,MATCHUP,WL,MIN,FGM,FGA,FG_PCT,FG3M,FG3A,FG3_PCT,FTM,FTA,FT_PCT,OREB,DREB,REB,AST,...,FGA_RANK,FG_PCT_RANK,FG3M_RANK,FG3A_RANK,FG3_PCT_RANK,FTM_RANK,FTA_RANK,FT_PCT_RANK,OREB_RANK,DREB_RANK,REB_RANK,AST_RANK,TOV_RANK,STL_RANK,BLK_RANK,BLKA_RANK,PF_RANK,PFD_RANK,PTS_RANK,PLUS_MINUS_RANK,NBA_FANTASY_PTS_RANK,DD2_RANK,TD3_RANK,WNBA_FANTASY_PTS_RANK,AVAILABLE_FLAG
0,2019-20,203500,Steven Adams,Steven,1610612760,OKC,Oklahoma City Thunder,21901317,2020-08-14T00:00:00,OKC @ LAC,L,6.367,0,2,0.0,0,0,0.0,0,0,0.0,2,2,4,0,...,59,61,2,4,2,43,52,43,40,57,60,57,24,31,37,1,1,61,61,29,62,23,1,62,1
1,2019-20,203500,Steven Adams,Steven,1610612760,OKC,Oklahoma City Thunder,21901306,2020-08-12T00:00:00,OKC vs. MIA,W,19.95,4,7,0.571,0,0,0.0,0,2,0.0,2,6,8,0,...,31,36,2,4,2,43,34,43,40,26,38,57,55,14,37,1,1,39,42,55,50,23,1,50,1
2,2019-20,203500,Steven Adams,Steven,1610612760,OKC,Oklahoma City Thunder,21901265,2020-08-05T00:00:00,OKC @ LAL,W,28.25,7,10,0.7,0,0,0.0,4,7,0.571,1,6,7,2,...,8,18,2,4,2,7,4,22,51,26,45,28,24,31,37,41,29,3,8,10,34,23,1,30,1
3,2019-20,203500,Steven Adams,Steven,1610612760,OKC,Oklahoma City Thunder,21901251,2020-08-03T00:00:00,OKC vs. DEN,L,34.03,3,6,0.5,0,0,0.0,3,4,0.75,3,7,10,1,...,43,39,2,4,2,16,16,13,31,17,23,46,12,31,37,41,63,5,37,62,51,23,1,46,1
4,2019-20,203500,Steven Adams,Steven,1610612760,OKC,Oklahoma City Thunder,21901240,2020-08-01T00:00:00,OKC vs. UTA,W,27.51,7,10,0.7,0,0,0.0,2,5,0.4,3,8,11,2,...,8,18,2,4,2,21,14,38,31,10,13,28,2,31,8,1,46,18,11,5,21,1,1,16,1


In [275]:
##### Warning - this cell will take 30-40 minutes to run
# make sure code is running properly before running this cell

# Loop through all seasons for all players

# List of seasons to loop through (update this list as needed)
seasons = ['2019-20', '2020-21', '2021-22', '2022-23', '2023-24']

# Collect one DataFrame per player-season, then concatenate once at the end
season_logs_frames = []

for season in seasons:
    # Get unique ids in specific season
    unique_player_ids = all_player_roster_df[all_player_roster_df['SEASON_YEAR'] == season]['PLAYER_ID'].unique()

    # Loop through unique ids
    for unique_id in unique_player_ids:
        # Get player logs
        player_logs = playergamelogs.PlayerGameLogs(player_id_nullable=unique_id,
                                                    season_nullable=season,
                                                    season_type_nullable="Regular Season")
        # get player game data into df and keep it for the final concat
        season_logs_frames.append(player_logs.get_data_frames()[0])

# concat into master df
all_seasons_logs_df = pd.concat(season_logs_frames, ignore_index=True)
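A run this long can fail partway through on a timeout. Below is a minimal sketch of a retry wrapper with a pause between requests. `fetch_with_retry` is a hypothetical helper, not part of nba_api, and the pause and backoff values are guesses rather than documented rate limits. The demo uses a fake function that fails twice, so nothing is sent over the network.

```python
import time

def fetch_with_retry(fetch, retries=3, pause=0.6, backoff=2.0, sleep=time.sleep):
    """Call fetch(); on failure wait and retry, growing the wait each time."""
    delay = pause
    for attempt in range(1, retries + 1):
        try:
            result = fetch()
            sleep(pause)  # short pause after a successful request
            return result
        except Exception:
            if attempt == retries:
                raise  # give up after the last attempt
            sleep(delay)
            delay *= backoff

# Offline demo: a fake request that fails twice, then succeeds
calls = {"n": 0}

def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("simulated timeout")
    return "ok"

waits = []  # record the requested waits instead of actually sleeping
result = fetch_with_retry(flaky, sleep=waits.append)
print(result, calls["n"], len(waits))
```

In the loop above, the call could be wrapped as `fetch_with_retry(lambda: playergamelogs.PlayerGameLogs(...).get_data_frames()[0])`.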

In [277]:
len(all_seasons_logs_df)

114335

In [284]:
# Check that each season has pulled the correct number of games (should be 1230)
# 2019-20 will have fewer games because of covid
len(all_seasons_logs_df[all_seasons_logs_df['SEASON_YEAR'] == '2022-23']['GAME_ID'].unique())


1230
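The same check can be run for every season at once. This is a small sketch on toy data: `games_per_season` is a hypothetical helper, the GAME_ID values are invented, and `expected` is set to 2 only so the toy example flags something. For the real pull it would be 1230.

```python
import pandas as pd

def games_per_season(logs_df):
    # Unique GAME_IDs per SEASON_YEAR (each game appears once per player row)
    return logs_df.groupby("SEASON_YEAR")["GAME_ID"].nunique()

# Toy rows with the playergamelogs column names (values are invented)
toy = pd.DataFrame({
    "SEASON_YEAR": ["2021-22", "2021-22", "2021-22", "2022-23", "2022-23"],
    "GAME_ID": ["0022100001", "0022100001", "0022100002",
                "0022200001", "0022200001"],
})

counts = games_per_season(toy)

expected = 2  # toy value; use 1230 for a full 82-game season
short = counts[counts != expected]  # seasons that do not match
print(counts)
print(short)
```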

In [293]:
all_seasons_logs_df.tail(20)

Unnamed: 0,SEASON_YEAR,PLAYER_ID,PLAYER_NAME,NICKNAME,TEAM_ID,TEAM_ABBREVIATION,TEAM_NAME,GAME_ID,GAME_DATE,MATCHUP,WL,MIN,FGM,FGA,FG_PCT,FG3M,FG3A,FG3_PCT,FTM,FTA,FT_PCT,OREB,DREB,REB,AST,...,FGA_RANK,FG_PCT_RANK,FG3M_RANK,FG3A_RANK,FG3_PCT_RANK,FTM_RANK,FTA_RANK,FT_PCT_RANK,OREB_RANK,DREB_RANK,REB_RANK,AST_RANK,TOV_RANK,STL_RANK,BLK_RANK,BLKA_RANK,PF_RANK,PFD_RANK,PTS_RANK,PLUS_MINUS_RANK,NBA_FANTASY_PTS_RANK,DD2_RANK,TD3_RANK,WNBA_FANTASY_PTS_RANK,AVAILABLE_FLAG
114315,2023-24,1630647,Eugene Omoruyi,Eugene,1610612764,WAS,Washington Wizards,22300465,2024-01-03T00:00:00,WAS @ CLE,L,6.917,1,3,0.333,0,1,0.0,0,0,0.0,1,0,1,0,...,14,17,7,5,7,15,16,15,8,17,14,15,11,13,2,23,17,18,19,20,23,2,1,21,1
114316,2023-24,1630647,Eugene Omoruyi,Eugene,1610612764,WAS,Washington Wizards,22300416,2023-12-27T00:00:00,WAS vs. TOR,L,6.267,3,4,0.75,0,1,0.0,1,2,0.5,0,1,1,2,...,11,4,7,5,7,8,4,10,13,9,14,3,5,4,2,1,17,9,7,20,6,2,1,7,1
114317,2023-24,1630647,Eugene Omoruyi,Eugene,1610612764,WAS,Washington Wizards,22300358,2023-12-18T00:00:00,WAS @ SAC,L,4.683,1,3,0.333,0,1,0.0,0,0,0.0,0,2,2,0,...,14,17,7,5,7,15,16,15,13,7,10,15,11,13,2,1,9,18,19,4,20,2,1,20,1
114318,2023-24,1630647,Eugene Omoruyi,Eugene,1610612764,WAS,Washington Wizards,22300347,2023-12-17T00:00:00,WAS @ PHX,L,0.338,0,0,0.0,0,0,0.0,0,0,0.0,0,0,0,0,...,27,21,7,22,7,15,16,15,13,17,22,15,11,13,2,1,1,18,23,10,27,2,1,27,1
114319,2023-24,1630647,Eugene Omoruyi,Eugene,1610612764,WAS,Washington Wizards,22300295,2023-12-11T00:00:00,WAS @ PHI,L,2.065,0,1,0.0,0,0,0.0,0,0,0.0,0,0,0,0,...,22,21,7,22,7,15,16,15,13,17,22,15,11,13,2,1,9,18,23,20,27,2,1,27,1
114320,2023-24,1630647,Eugene Omoruyi,Eugene,1610612764,WAS,Washington Wizards,22301219,2023-12-08T00:00:00,WAS @ BKN,L,3.933,1,2,0.5,0,1,0.0,1,1,1.0,0,1,1,0,...,18,10,7,5,7,8,13,1,13,9,14,15,2,5,2,1,9,9,16,13,18,2,1,17,1
114321,2023-24,1630647,Eugene Omoruyi,Eugene,1610612764,WAS,Washington Wizards,22300259,2023-11-29T00:00:00,WAS @ ORL,L,9.667,3,5,0.6,1,2,0.5,3,4,0.75,1,0,1,1,...,8,7,1,2,4,1,1,7,8,17,14,8,11,5,2,23,25,4,3,13,7,2,1,5,1
114322,2023-24,1630647,Eugene Omoruyi,Eugene,1610612764,WAS,Washington Wizards,22300253,2023-11-27T00:00:00,WAS @ DET,W,1.833,1,3,0.333,1,1,1.0,0,0,0.0,2,0,2,0,...,14,17,1,5,1,15,16,15,4,17,10,15,11,13,2,1,1,18,16,10,17,2,1,17,1
114323,2023-24,1630647,Eugene Omoruyi,Eugene,1610612764,WAS,Washington Wizards,22300241,2023-11-25T00:00:00,WAS vs. ATL,L,12.0,4,6,0.667,0,1,0.0,2,2,1.0,2,1,3,1,...,4,5,7,5,7,4,4,1,4,9,6,8,11,5,2,1,9,2,3,7,4,2,1,4,1
114324,2023-24,1630647,Eugene Omoruyi,Eugene,1610612764,WAS,Washington Wizards,22300219,2023-11-20T00:00:00,WAS vs. MIL,L,1.25,0,1,0.0,0,1,0.0,2,2,1.0,0,0,0,0,...,22,21,7,5,7,4,4,1,13,17,22,15,11,13,2,1,1,9,19,6,25,2,1,24,1


In [292]:
# Save file
all_seasons_logs_df.to_csv('../../data/original/nba_players_statistics_per_game_2019_2024.csv', index=False)

In [268]:
# Check nba_games_box_scores_2022-2024 for unique SEASON_IDs
id_df['SEASON_ID'].unique()

array([12021, 22021, 52021, 42021, 22022, 12022, 52022, 42022, 22023,
       12023, 62023], dtype=int64)

In [274]:
# check games with unique SEASON_IDs
len(id_df[id_df['SEASON_ID'] == 22022])

2636

In [286]:
len(id_df['GAME_ID'].unique())

3767
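To compare the box score file against the player logs, the game IDs first need to be in the same form. The sketch below rests on two assumptions that should be checked against the real data: that GAME_ID values read from CSV lose their leading zeros and can be restored by zero-padding to 10 characters, and that a SEASON_ID starting with 2 marks the regular season. `normalize_game_id` is a hypothetical helper and the rows are toy data.

```python
import pandas as pd

def normalize_game_id(series):
    # Assumption: game ids are 10 characters with leading zeros
    return series.astype(str).str.zfill(10)

# Toy box score rows (one row per team per game) and toy player-log rows
box = pd.DataFrame({
    "SEASON_ID": [22022, 22022, 22022, 42022],
    "GAME_ID": [22200021, 22200021, 22200022, 42200101],
})
logs = pd.DataFrame({"GAME_ID": ["0022200021", "0022200021"]})

# Assumption: SEASON_ID prefix 2 means regular season
regular = box[box["SEASON_ID"].astype(str).str.startswith("2")]

box_ids = set(normalize_game_id(regular["GAME_ID"]))
log_ids = set(normalize_game_id(logs["GAME_ID"]))

# Regular-season games in the box scores with no player-log rows
missing = sorted(box_ids - log_ids)
print(missing)
```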

## Unused Player API Functions

In [155]:
## Doesn't work because CommonAllPlayers only pulls 100-120 players for the 2019-2023 seasons

# specify seasons to pull
seasons = ['2019-20', '2020-21', '2021-22', '2022-23', '2023-24']

# loop through seasons
player_list = []
for season in seasons:
    common_all_players = CommonAllPlayers(
        is_only_current_season = 1, # 1 active, 0 not active
        league_id = '00', # nba 00, g_league 20, wnba 10
        season = season # change year(s) if needed
    )

    df_common_players = common_all_players.get_data_frames()[0]
    player_list.append(df_common_players.values.tolist())

In [156]:
len(pd.DataFrame(player_list[3]))

126

In [289]:
# Show example of player tracking box scores for a given game id
from nba_api.stats.endpoints import boxscoreplayertrackv2
player_boxscores = boxscoreplayertrackv2.BoxScorePlayerTrackV2(game_id = '0022200021')
df_player_boxscores = player_boxscores.get_data_frames()[0]
df_player_boxscores.head()


Unnamed: 0,GAME_ID,TEAM_ID,TEAM_ABBREVIATION,TEAM_CITY,PLAYER_ID,PLAYER_NAME,START_POSITION,COMMENT,MIN,SPD,DIST,ORBC,DRBC,RBC,TCHS,SAST,FTAST,PASS,AST,CFGM,CFGA,CFG_PCT,UFGM,UFGA,UFG_PCT,FG_PCT,DFGM,DFGA,DFG_PCT
0,22200021,1610612761,TOR,Toronto,1628384,OG Anunoby,F,,37:15,4.11,2.73,3,8,11,48,1,0,33,1,1,4,0.25,2,5,0.4,0.333,1,2,0.5
1,22200021,1610612761,TOR,Toronto,1630567,Scottie Barnes,F,,35:53,4.19,2.7,4,9,11,61,0,1,43,3,4,7,0.571,3,7,0.429,0.5,1,3,0.333
2,22200021,1610612761,TOR,Toronto,1627783,Pascal Siakam,C,,36:48,4.11,2.7,6,12,15,90,0,0,60,11,10,11,0.909,5,10,0.5,0.713,3,3,1.0
3,22200021,1610612761,TOR,Toronto,1629018,Gary Trent Jr.,G,,35:58,4.05,2.62,0,1,1,33,0,0,15,0,4,9,0.444,2,7,0.286,0.375,1,1,1.0
4,22200021,1610612761,TOR,Toronto,1627832,Fred VanVleet,G,,37:56,4.14,2.81,1,9,10,82,1,0,65,9,1,2,0.5,6,9,0.667,0.636,3,3,1.0


In [15]:
from nba_api.stats.endpoints import playercareerstats
# Fetching career statistics for Player of Choice using his player ID
player_career = playercareerstats.PlayerCareerStats(player_id='203500')
player_career_df = player_career.get_data_frames()[0]

# Extracting the seasons of player of choice
seasons_played = player_career_df['SEASON_ID'].unique()
print(seasons_played.tolist())

['2013-14', '2014-15', '2015-16', '2016-17', '2017-18', '2018-19', '2019-20', '2020-21', '2021-22', '2022-23']


In [291]:
# Show example of a single player's game log for a given season (note: no team ID column)
player_boxscores = playergamelog.PlayerGameLog(player_id= '203925', season=2023)
df_player_boxscores = player_boxscores.get_data_frames()[0]
df_player_boxscores.head()

Unnamed: 0,SEASON_ID,Player_ID,Game_ID,GAME_DATE,MATCHUP,WL,MIN,FGM,FGA,FG_PCT,FG3M,FG3A,FG3_PCT,FTM,FTA,FT_PCT,OREB,DREB,REB,AST,STL,BLK,TOV,PF,PTS,PLUS_MINUS,VIDEO_AVAILABLE
0,22023,203925,22300732,"FEB 07, 2024",DET @ SAC,W,12,1,3,0.333,0,1,0.0,0,0,0.0,0,0,0,0,0,0,0,1,2,-5,1
1,22023,203925,22300558,"JAN 15, 2024",DET @ WAS,W,5,0,0,0.0,0,0,0.0,0,0,0.0,0,0,0,1,0,0,1,0,0,-10,1
2,22023,203925,22300519,"JAN 10, 2024",DET vs. SAS,L,6,0,0,0.0,0,0,0.0,0,0,0.0,0,1,1,0,0,0,0,1,0,3,1
3,22023,203925,22300512,"JAN 09, 2024",DET vs. SAC,L,9,3,5,0.6,3,5,0.6,0,0,0.0,0,1,1,0,0,0,0,2,9,2,1
4,22023,203925,22300394,"DEC 23, 2023",DET @ BKN,L,7,0,0,0.0,0,0,0.0,0,0,0.0,0,1,1,0,0,0,0,0,0,-7,1


In [290]:
from nba_api.stats.endpoints import CommonTeamRoster

common_team_roster = CommonTeamRoster(
    team_id = '1610612752', # input team id
    league_id_nullable = '00', # nba 00, g_league 20, wnba 10
    season='2023-24')
df_common_team_roster = common_team_roster.get_data_frames()[0]
df_common_team_roster.head()

Unnamed: 0,TeamID,SEASON,LeagueID,PLAYER,NICKNAME,PLAYER_SLUG,NUM,POSITION,HEIGHT,WEIGHT,BIRTH_DATE,AGE,EXP,SCHOOL,PLAYER_ID,HOW_ACQUIRED
0,1610612752,2023,0,Donte DiVincenzo,Donte,donte-divincenzo,0,G,6-4,203,"JAN 31, 1997",27.0,5,Villanova,1628978,Signed on 07/08/23
1,1610612752,2023,0,Jacob Toppin,Jacob,jacob-toppin,0,F,6-8,200,"MAY 08, 2000",23.0,R,Kentucky,1631210,Signed on 07/06/23
2,1610612752,2023,0,Duane Washington Jr.,Duane,duane-washington-jr,1,G,6-2,197,"MAR 24, 2000",23.0,2,Ohio State,1630613,Signed on 02/28/23
3,1610612752,2023,0,Miles McBride,Miles,miles-mcbride,2,G,6-1,195,"SEP 08, 2000",23.0,2,West Virginia,1630540,Draft Rights Traded from OKC on 07/30/21
4,1610612752,2023,0,Josh Hart,Josh,josh-hart,3,G,6-4,215,"MAR 06, 1995",29.0,6,Villanova,1628404,Traded from POR on 02/09/23
