# NBA API Data Exploration

Basic data analysis and exploration of the available NBA data. I'm trying to wrap my head around what data is available and glean some interesting insights beyond the run-of-the-mill analysis often discussed on ESPN broadcasts.

To limit the scope, this notebook focuses on how Draft Position relates to a player's individual career stats and success, and to their impact on team success.

Step 1 is, of course, to import the API wrapper and start looking at the available endpoints and their data. The endpoints I've identified as relevant to this topic are:

 - drafthistory
 - playercareerstats
 - playergamelog
 - playerawards
 - leaguestandings

In [1]:
import pandas as pd
import nba_api.stats.endpoints as nba_api
import time

# Set pandas options to show more columns. NBA has a large number of statistical categories.
pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 500)
pd.set_option('display.width', 1000)

draft_pos = nba_api.drafthistory.DraftHistory().get_data_frames()[0]
league_standings = nba_api.leaguestandings.LeagueStandings().get_data_frames()[0]
game_log = nba_api.leaguegamelog.LeagueGameLog().get_data_frames()[0]
player_stats = nba_api.leaguedashplayerstats.LeagueDashPlayerStats().get_data_frames()[0]

def lowerHeaders(df):
    """Lowercase the column names of df in place."""
    df.columns = [x.lower() for x in df.columns]

# Make headers lowercase because I will probably forget to use caps.
lowerHeaders(draft_pos)
lowerHeaders(league_standings)
lowerHeaders(game_log)
lowerHeaders(player_stats)

In [2]:
# Get all the individual player stats for each year we have draft positions. 

all_player_data = player_stats.copy()
all_player_data["nba_season"] = "2019-20"
nba_seasons = draft_pos.loc[(draft_pos["season"] != "2019"), "season"].unique()

# Turn each draft year into a season label, e.g. "1999" -> "1999-00".
nba_seasons = [f"{yr}-{(int(yr) + 1) % 100:02d}" for yr in nba_seasons]

for x in nba_seasons:
    temp_player_data = nba_api.leaguedashplayerstats.LeagueDashPlayerStats(season=x).get_data_frames()[0]
    temp_player_data["nba_season"] = x
    lowerHeaders(temp_player_data)
    # DataFrame.append was removed in pandas 2.0; pd.concat does the same job.
    all_player_data = pd.concat([all_player_data, temp_player_data])
    time.sleep(0.5) # Throttle API calls; running at full speed got me kicked out.
    
print(all_player_data.groupby("nba_season").size())

nba_season
1996-97    446
1997-98    442
1998-99    440
1999-00    439
2000-01    441
2001-02    440
2002-03    428
2003-04    442
2004-05    464
2005-06    458
2006-07    458
2007-08    451
2008-09    445
2009-10    442
2010-11    452
2011-12    478
2012-13    469
2013-14    482
2014-15    492
2015-16    476
2016-17    486
2017-18    540
2018-19    530
2019-20    529
dtype: int64


# Issue #1
The league player stats API returns empty dataframes for all seasons before 1996-97. Because those datasets are unavailable, this analysis only includes players drafted from the 1996-97 season onwards. Unfortunately, this means many of the greatest players in NBA history are excluded, including Michael Jordan, Kareem Abdul-Jabbar, Larry Bird and Magic Johnson.

While we're on the topic of excluded players, undrafted players who found their way onto an NBA team are also excluded, since they fall outside the scope of a Draft Position analysis.
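
One way to handle the empty responses described above is to skip empty frames while collecting seasons. The following is a minimal sketch, not the notebook's actual loop: `fetch_season` is a hypothetical stand-in for the real `LeagueDashPlayerStats` call so the example runs offline, and the 1996 cutoff is an assumption taken from the season counts printed earlier.

```python
import pandas as pd

FIRST_SEASON_WITH_DATA = 1996  # assumption: based on the season counts printed above


def fetch_season(season):
    """Hypothetical stand-in for the API call; returns an empty frame before 1996-97."""
    if int(season[:4]) < FIRST_SEASON_WITH_DATA:
        return pd.DataFrame(columns=["player_id", "pts"])
    return pd.DataFrame({"player_id": [1, 2], "pts": [100, 200]})


def collect_seasons(seasons, fetch=fetch_season):
    """Fetch each season, tag it with its label, and keep only non-empty results."""
    frames = []
    for season in seasons:
        df = fetch(season)
        if df.empty:
            continue  # nothing was returned for this season
        frames.append(df.assign(nba_season=season))
    return pd.concat(frames, ignore_index=True)


sample = collect_seasons(["1994-95", "1995-96", "1996-97", "1997-98"])
print(sample.groupby("nba_season").size())
```

Collecting the frames in a list and concatenating once also avoids growing a dataframe inside the loop.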

In [3]:
# Create custom career stats dataframe from all seasons player data.
aggregates = {
    "gp": "sum",
    "w": "sum",
    "l": "sum",
    "fg_pct": "mean",
    "fg3_pct": "mean",
    "ft_pct": "mean",
    "pts": "sum",
    "reb": "sum",
    "ast": "sum",
    "tov": "sum",
    "stl": "sum",
    "blk": "sum"
}
career_stats = all_player_data.groupby(["player_id", "player_name"]).agg(aggregates)

career_stats["win_pct"] = career_stats["w"] / career_stats["gp"]
career_stats["pts_pg"] = career_stats["pts"] / career_stats["gp"]
career_stats["reb_pg"] = career_stats["reb"] / career_stats["gp"]
career_stats["ast_pg"] = career_stats["ast"] / career_stats["gp"]
career_stats["tov_pg"] = career_stats["tov"] / career_stats["gp"]
career_stats["stl_pg"] = career_stats["stl"] / career_stats["gp"]
career_stats["blk_pg"] = career_stats["blk"] / career_stats["gp"]

career_summary = draft_pos.loc[pd.to_numeric(draft_pos["season"]) >= 1996].merge(career_stats, left_on="person_id", right_on="player_id")
career_summary.describe()

statistic,person_id,round_number,round_pick,overall_pick,team_id,gp,w,l,fg_pct,fg3_pct,ft_pct,pts,reb,ast,tov,stl,blk,win_pct,pts_pg,reb_pg,ast_pg,tov_pg,stl_pg,blk_pg
count,1222.0,1222.0,1222.0,1222.0,1222.0,1222.0,1222.0,1222.0,1222.0,1222.0,1222.0,1222.0,1222.0,1222.0,1222.0,1222.0,1222.0,1222.0,1222.0,1222.0,1222.0,1222.0,1222.0,1222.0
mean,433209.9,1.42635,14.508183,27.10311,1610613000.0,329.36743,163.618658,165.748773,0.42676,0.229386,0.673686,3396.194763,1411.049918,717.771686,465.690671,250.047463,165.371522,0.466864,7.246936,3.20015,1.514986,1.051943,0.563088,0.380212
std,631991.3,0.494748,8.44064,16.181126,8.713241,312.838433,171.372951,149.781407,0.091942,0.144849,0.177989,4704.115463,1888.029119,1216.162987,625.693198,333.394515,284.535605,0.1411,5.03985,2.102022,1.478426,0.670114,0.365478,0.38938
min,947.0,1.0,1.0,1.0,1610613000.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
25%,2407.25,1.0,7.0,13.0,1610613000.0,67.0,29.0,37.25,0.390562,0.103313,0.606597,241.0,123.0,42.0,43.0,22.25,11.0,0.38785,3.312806,1.706241,0.538071,0.579944,0.3,0.126804
50%,201580.5,1.0,14.0,26.0,1610613000.0,220.5,101.0,116.0,0.428808,0.278673,0.719238,1437.0,635.5,237.0,207.5,111.0,57.0,0.470588,6.000897,2.778695,1.016682,0.91044,0.5,0.25501
75%,203905.2,2.0,22.0,40.0,1610613000.0,541.75,260.75,272.75,0.469094,0.341108,0.784475,4813.75,1955.0,811.25,642.75,351.0,193.75,0.555056,9.825109,4.185221,1.997312,1.390301,0.76031,0.492824
max,1629714.0,2.0,30.0,60.0,1610613000.0,1541.0,1001.0,756.0,1.0,1.0,1.0,34241.0,15091.0,10335.0,4424.0,2233.0,3020.0,1.0,27.067984,13.836394,9.463725,4.234043,2.189216,2.42439
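
The merge in the cell above uses pandas' default inner join, which is what drops undrafted players (and drafted players with no stats). To see how many rows fall out on each side, a left merge with `indicator=True` can flag them. This is a sketch on made-up rows, assuming only the `person_id` / `player_id` keys used above:

```python
import pandas as pd

# Made-up rows for illustration only; not real NBA data.
draft = pd.DataFrame({"person_id": [1, 2], "overall_pick": [1, 45]})
careers = pd.DataFrame({"player_id": [1, 2, 3], "gp": [900, 120, 300]})

# A left merge from the stats side keeps every player; the _merge column
# records whether a matching draft row was found.
checked = careers.merge(
    draft, left_on="player_id", right_on="person_id", how="left", indicator=True
)
undrafted = checked.loc[checked["_merge"] == "left_only", "player_id"].tolist()
print(undrafted)
```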


In [4]:
draft_pos_summary = career_summary.groupby("overall_pick").agg({
    
})
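
The aggregation for `draft_pos_summary` is not filled in above. One possible shape for it is sketched below on made-up rows. The choice of columns and statistics, and the `_demo` names, are assumptions for illustration rather than the notebook's actual summary.

```python
import pandas as pd

# Made-up rows for illustration only; not real NBA data.
career_summary_demo = pd.DataFrame({
    "overall_pick": [1, 1, 2, 2],
    "gp": [1000, 800, 400, 200],
    "pts_pg": [25.0, 15.0, 10.0, 6.0],
    "win_pct": [0.6, 0.5, 0.45, 0.4],
})

# Named aggregation: one output column per (input column, statistic) pair.
draft_pos_summary_demo = career_summary_demo.groupby("overall_pick").agg(
    players=("gp", "count"),
    avg_gp=("gp", "mean"),
    median_pts_pg=("pts_pg", "median"),
    avg_win_pct=("win_pct", "mean"),
)
print(draft_pos_summary_demo)
```

Including a player count per pick makes it easier to judge how much weight each average deserves.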