This notebook scrapes and organizes NBA box score data from https://www.nba.com/stats/players/boxscores. The data contains one row per player per game: a player who has appeared in 10 games this season has 10 rows. The columns cover basic stats such as points, rebounds, assists, and turnovers.
Once scraped, the data is loaded into a pandas DataFrame and exported to a .csv file for EDA and app development.
The source data is updated every night as new NBA games are played, so rerun this notebook regularly to keep the .csv file current.
A guide to scraping this data is available at https://towardsdatascience.com/how-scraping-nba-stats-is-cooler-than-michael-jordan-49d7562ce3ef.
Future work: add an easy way to control which dates are pulled.
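One possible way to control the date range is to fill in the `DateFrom` and `DateTo` parameters that the request already sends as empty strings. The sketch below is not part of the original notebook: `build_params` and `DATE_FORMAT` are hypothetical names, and the MM/DD/YYYY format is an assumption about what the endpoint accepts that should be checked against a live request.

```python
from datetime import date

# Assumption: the endpoint expects dates formatted as MM/DD/YYYY.
DATE_FORMAT = '%m/%d/%Y'


def build_params(date_from=None, date_to=None, season='2022-23'):
    """Return leaguegamelog query parameters, optionally limited to a date range.

    Hypothetical helper: leaving a date as None sends an empty string,
    which matches what the notebook currently does.
    """
    if date_from and date_to and date_from > date_to:
        raise ValueError('date_from must not be after date_to')
    return {
        'Counter': '1000',
        'DateFrom': date_from.strftime(DATE_FORMAT) if date_from else '',
        'DateTo': date_to.strftime(DATE_FORMAT) if date_to else '',
        'Direction': 'DESC',
        'LeagueID': '00',
        'PlayerOrTeam': 'P',
        'Season': season,
        'SeasonType': 'Regular Season',
        'Sorter': 'DATE',
    }


# Example: restrict the pull to the last week of October 2022.
params = build_params(date(2022, 10, 24), date(2022, 10, 31))
print(params['DateFrom'], params['DateTo'])
```

Calling `build_params()` with no arguments reproduces the full-season parameters used in the request cell.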

In [2]:
import requests as r
import pandas as pd

In [4]:
url = 'https://stats.nba.com/stats/leaguegamelog?'

header = {
    'Accept': '*/*',
    'Accept-Encoding': 'gzip, deflate, br',
    'Accept-Language': 'en-US,en;q=0.9,es;q=0.8',
    'Connection': 'keep-alive',
    'Host': 'stats.nba.com',
    'If-Modified-Since': 'Thu, 03 Nov 2022 16:07:11 GMT',
    'Origin': 'https://www.nba.com',
    'Referer': 'https://www.nba.com/',
    'sec-ch-ua': '"Google Chrome";v="107", "Chromium";v="107", "Not=A?Brand";v="24"',
    'sec-ch-ua-mobile': '?0',
    'sec-ch-ua-platform': '"Windows"',
    'Sec-Fetch-Dest': 'empty',
    'Sec-Fetch-Mode': 'cors',
    'Sec-Fetch-Site': 'same-site',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/107.0.0.0 Safari/537.36'
}

params = {
    'Counter': '1000',
    'DateFrom': '',
    'DateTo': '',
    'Direction': 'DESC',
    'LeagueID': '00',
    'PlayerOrTeam': 'P',
    'Season': '2022-23',
    'SeasonType': 'Regular Season',
    'Sorter': 'DATE'
}

# A timeout keeps the cell from hanging if stats.nba.com stops responding
request = r.get(url, headers=header, params=params, timeout=30)

print(request)

<Response [200]>
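Because this notebook is meant to be rerun regularly, it may help to retry the request when it fails transiently. The sketch below shows one generic way to do that with exponential backoff. `fetch_with_retry` is a hypothetical helper, not part of the original notebook, and it catches every `Exception` for simplicity. In real use it would be safer to catch only `requests.exceptions.RequestException`.

```python
import time


def fetch_with_retry(fetch, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fetch() until it succeeds or the attempts run out.

    Waits base_delay * 2**n seconds after the n-th failure (n starts at 0)
    and re-raises the last error once every attempt has failed.
    """
    if attempts < 1:
        raise ValueError('attempts must be at least 1')
    for attempt in range(attempts):
        try:
            return fetch()
        except Exception:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)


# Demonstration with a stand-in function that fails twice, then succeeds.
calls = {'count': 0}


def flaky():
    calls['count'] += 1
    if calls['count'] < 3:
        raise ConnectionError('simulated failure')
    return 'ok'


# A no-op sleep keeps the demonstration instant.
result = fetch_with_retry(flaky, sleep=lambda seconds: None)
print(result, calls['count'])
```

With the names from the cell above, a call might look like `fetch_with_retry(lambda: r.get(url, headers=header, params=params, timeout=30))`. Only exceptions trigger a retry, so the lambda would also need to call `raise_for_status()` on the response if HTTP error codes should count as failures.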


In [5]:
json_data = request.json()



In [6]:
columns = json_data['resultSets'][0]['headers']
columns

['SEASON_ID',
 'PLAYER_ID',
 'PLAYER_NAME',
 'TEAM_ID',
 'TEAM_ABBREVIATION',
 'TEAM_NAME',
 'GAME_ID',
 'GAME_DATE',
 'MATCHUP',
 'WL',
 'MIN',
 'FGM',
 'FGA',
 'FG_PCT',
 'FG3M',
 'FG3A',
 'FG3_PCT',
 'FTM',
 'FTA',
 'FT_PCT',
 'OREB',
 'DREB',
 'REB',
 'AST',
 'STL',
 'BLK',
 'TOV',
 'PF',
 'PTS',
 'PLUS_MINUS',
 'FANTASY_PTS',
 'VIDEO_AVAILABLE']

In [7]:
length = len(json_data['resultSets'][0]['rowSet'])
print(f'Current number of rows in data: {length}')

Current number of rows in data: 2578
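The response stores the column names once in `headers` and each game line as a plain list in `rowSet`, so a row only becomes meaningful once it is paired with the headers. The sketch below shows that pairing in plain Python. `result_set_to_records` is a hypothetical helper, and the sample payload is hand-made to mimic the shape of the response, not real API output.

```python
def result_set_to_records(payload, index=0):
    """Pair every row in a result set with the header names.

    Returns a list of dicts, one per row, and raises ValueError if a row
    does not have exactly one value per header.
    """
    result_set = payload['resultSets'][index]
    headers = result_set['headers']
    records = []
    for row in result_set['rowSet']:
        if len(row) != len(headers):
            raise ValueError(
                f'expected {len(headers)} values per row, got {len(row)}'
            )
        records.append(dict(zip(headers, row)))
    return records


# Hand-made payload in the same shape as the response (illustrative values).
sample_payload = {
    'resultSets': [{
        'headers': ['PLAYER_NAME', 'GAME_DATE', 'PTS'],
        'rowSet': [
            ['Player A', '2022-10-19', 16],
            ['Player B', '2022-10-20', 6],
        ],
    }]
}

records = result_set_to_records(sample_payload)
print(records[0])
```

The length check is there so that a change in the endpoint's column layout fails loudly instead of silently shifting values into the wrong columns.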


In [8]:
# Build the DataFrame and name its columns in one step
data = pd.DataFrame(json_data['resultSets'][0]['rowSet'], columns=columns)

In [9]:
# Drop a few columns for easier use
data = data.drop(columns = ['SEASON_ID', 'VIDEO_AVAILABLE', 'FANTASY_PTS'])

In [10]:
data.sample(10)

      PLAYER_ID        PLAYER_NAME     TEAM_ID  TEAM_ABBREVIATION            TEAM_NAME   GAME_ID   GAME_DATE      MATCHUP  WL  MIN  ...  OREB  DREB  REB  AST  STL  BLK  TOV  PF  PTS  PLUS_MINUS
1608    1629001  De'Anthony Melton  1610612755                PHI   Philadelphia 76ers  22200044  2022-10-24  PHI vs. IND   W   17  ...     0     5    5    1    1    0    0   1   11          11
 302    1630181       R.J. Hampton  1610612753                ORL        Orlando Magic  22200105  2022-11-01    ORL @ OKC   L   21  ...     0     2    2    1    0    1    1   3   10          -8
2359     200752           Rudy Gay  1610612762                UTA            Utah Jazz  22200012  2022-10-19  UTA vs. DEN   W   26  ...     0     0    0    2    1    0    1   2   16          24
2173    1628988      Aaron Holiday  1610612737                ATL        Atlanta Hawks  22200020  2022-10-21  ATL vs. ORL   W   14  ...     0     2    2    1    0    0    0   2    4           0
2305    1630558    Davion Mitchell  1610612758                SAC     Sacramento Kings  22200014  2022-10-19  SAC vs. POR   L   26  ...     0     3    3    4    0    0    0   1    2          -7
2245     201950       Jrue Holiday  1610612749                MIL      Milwaukee Bucks  22200015  2022-10-20    MIL @ PHI   W   36  ...     1     3    4    8    2    0    1   1    6          -6
2121     202709        Cory Joseph  1610612765                DET      Detroit Pistons  22200023  2022-10-21    DET @ NYK   L   12  ...     1     1    2    3    0    0    1   1    0         -26
 747     200782        P.J. Tucker  1610612755                PHI   Philadelphia 76ers  22200084  2022-10-29    PHI @ CHI   W   23  ...     0     6    6    0    0    2    4   4    8           1
2295    1628370         Malik Monk  1610612758                SAC     Sacramento Kings  22200014  2022-10-19  SAC vs. POR   L   16  ...     1     1    2    3    0    1    2   1    6         -18
1328    1628378   Donovan Mitchell  1610612739                CLE  Cleveland Cavaliers  22200056  2022-10-26  CLE vs. ORL   W   37  ...     2     2    4    8    2    0    1   3   14           9


In [11]:
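# index=False avoids writing the row index, which would reload as an 'Unnamed: 0' column
data.to_csv('boxScores.csv', index=False)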
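When the notebook is rerun to refresh the data, one option is to merge the new pull into the existing file instead of overwriting it. The sketch below assumes that a player appears at most once per game, so (`PLAYER_ID`, `GAME_ID`) identifies a row. `merge_box_scores` and `KEY_COLUMNS` are hypothetical names, and the two small frames are made-up stand-ins for the saved file and a fresh pull.

```python
import pandas as pd

# Assumption: one row per player per game, so these two columns form a key.
KEY_COLUMNS = ['PLAYER_ID', 'GAME_ID']


def merge_box_scores(existing, fresh):
    """Combine two box score frames, keeping the newest row for each key."""
    combined = pd.concat([existing, fresh], ignore_index=True)
    # keep='last' prefers rows from `fresh`, which come later in `combined`
    deduped = combined.drop_duplicates(subset=KEY_COLUMNS, keep='last')
    return deduped.reset_index(drop=True)


# Made-up stand-ins for the saved file and a fresh pull.
old = pd.DataFrame({'PLAYER_ID': [1, 2], 'GAME_ID': [10, 10], 'PTS': [5, 7]})
new = pd.DataFrame({'PLAYER_ID': [2, 3], 'GAME_ID': [10, 11], 'PTS': [8, 9]})

merged = merge_box_scores(old, new)
print(len(merged))
```

The key columns need matching dtypes in both frames for duplicates to be detected. If `GAME_ID` arrives from the API as a string but reloads from the .csv as an integer, reading the file with `pd.read_csv('boxScores.csv', dtype={'GAME_ID': str})` is one way to keep the two consistent.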