This notebook scrapes and organizes the NBA box score data shown at https://www.nba.com/stats/players/boxscores. The data contains one row per player per game: a player who has appeared in 10 games this season has 10 rows. The columns are basic box score stats such as points, rebounds, assists, and turnovers.
Once scraped, the data is loaded into a pandas DataFrame and exported to a .csv file for EDA and app development.
The online data is updated every night as new games are played, so rerun this notebook regularly to keep the export current.
A guide for scraping this data is found at https://towardsdatascience.com/how-scraping-nba-stats-is-cooler-than-michael-jordan-49d7562ce3ef.
For the future: Add a way to easily control which dates are pulled.

For the future: Pull season totals in addition to the per-game rows?
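
One possible way to approach the date-control note above is to build the query parameters from a small helper. This is only a sketch: `build_params` is a hypothetical name, and the `MM/DD/YYYY` date format is an assumption about what the endpoint accepts, not something this notebook has verified. The remaining keys mirror the `params` dictionary used in the request cell.

```python
from datetime import date


def build_params(date_from=None, date_to=None, season='2022-23'):
    """Return leaguegamelog query parameters for an optional date window.

    Hypothetical helper; the MM/DD/YYYY format is an assumption.
    """
    def fmt(d):
        # An empty string leaves that side of the window open
        return d.strftime('%m/%d/%Y') if d else ''

    return {
        'Counter': '1000',
        'DateFrom': fmt(date_from),
        'DateTo': fmt(date_to),
        'Direction': 'DESC',
        'LeagueID': '00',
        'PlayerOrTeam': 'P',
        'Season': season,
        'SeasonType': 'Regular Season',
        'Sorter': 'DATE',
    }


# Example: restrict the pull to the first week of the 2022-23 season
example_params = build_params(date(2022, 10, 18), date(2022, 10, 24))
print(example_params['DateFrom'], example_params['DateTo'])
```

Calling `build_params()` with no arguments reproduces the unfiltered parameters, so the helper could be dropped into the request cell without changing the default behavior.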

In [2]:
import requests as r
import pandas as pd

In [3]:
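# Endpoint that serves the player game logs; requests builds the query string from `params`
url = 'https://stats.nba.com/stats/leaguegamelog'

# Browser-like request headers, following the guide linked above
header = {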
    'Accept': '*/*',
    'Accept-Encoding': 'gzip, deflate, br',
    'Accept-Language': 'en-US,en;q=0.9,es;q=0.8',
    'Connection': 'keep-alive',
    'Host': 'stats.nba.com',
    'If-Modified-Since': 'Thu, 03 Nov 2022 16:07:11 GMT',
    'Origin': 'https://www.nba.com',
    'Referer': 'https://www.nba.com/',
    'sec-ch-ua': '"Google Chrome";v="107", "Chromium";v="107", "Not=A?Brand";v="24"',
    'sec-ch-ua-mobile': '?0',
    'sec-ch-ua-platform': "Windows",
    'Sec-Fetch-Dest': 'empty',
    'Sec-Fetch-Mode': 'cors',
    'Sec-Fetch-Site': 'same-site',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/107.0.0.0 Safari/537.36'
}

# Player-level ('P') game logs for the 2022-23 regular season, newest first; dates are left empty
params = {
    'Counter': '1000',
    'DateFrom': '',
    'DateTo': '',
    'Direction': 'DESC',
    'LeagueID': '00',
    'PlayerOrTeam': 'P',
    'Season': '2022-23',
    'SeasonType': 'Regular Season',
    'Sorter': 'DATE'
}

# Send the GET request; the timeout keeps the cell from hanging if the server stalls
request = r.get(url, headers=header, params=params, timeout=30)

print(request)

<Response [200]>


In [4]:
json_data = request.json()

In [5]:
columns = json_data['resultSets'][0]['headers']
columns

['SEASON_ID',
 'PLAYER_ID',
 'PLAYER_NAME',
 'TEAM_ID',
 'TEAM_ABBREVIATION',
 'TEAM_NAME',
 'GAME_ID',
 'GAME_DATE',
 'MATCHUP',
 'WL',
 'MIN',
 'FGM',
 'FGA',
 'FG_PCT',
 'FG3M',
 'FG3A',
 'FG3_PCT',
 'FTM',
 'FTA',
 'FT_PCT',
 'OREB',
 'DREB',
 'REB',
 'AST',
 'STL',
 'BLK',
 'TOV',
 'PF',
 'PTS',
 'PLUS_MINUS',
 'FANTASY_PTS',
 'VIDEO_AVAILABLE']

In [6]:
length = len(json_data['resultSets'][0]['rowSet'])
print(f'Current number of rows in data: {length}')

Current number of rows in data: 3454


In [7]:
# Build the dataframe and label it with the header names from the response
data = pd.DataFrame(json_data['resultSets'][0]['rowSet'], columns=columns)
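
The conversion from the JSON payload to a dataframe can also be wrapped in a small function. The sketch below uses a hypothetical helper name, `result_set_to_frame`, and a made-up two-row payload that only imitates the `resultSets` / `headers` / `rowSet` layout seen above. It is not real NBA data.

```python
import pandas as pd


def result_set_to_frame(payload, index=0):
    """Turn one entry of payload['resultSets'] into a DataFrame.

    Hypothetical helper; assumes the headers/rowSet layout shown above.
    """
    result = payload['resultSets'][index]
    # Passing columns= here also works when rowSet is empty
    return pd.DataFrame(result['rowSet'], columns=result['headers'])


# Made-up payload with the same shape as the real response
toy_payload = {
    'resultSets': [
        {
            'headers': ['PLAYER_NAME', 'PTS'],
            'rowSet': [['Player A', 10], ['Player B', 7]],
        }
    ]
}

toy_frame = result_set_to_frame(toy_payload)
print(toy_frame)
```

Supplying the column names to the constructor means an empty `rowSet` (for example, a date window with no games) should still produce an empty frame with the expected columns.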

In [8]:
# Drop a few columns for easier use
data = data.drop(columns = ['SEASON_ID', 'VIDEO_AVAILABLE', 'FANTASY_PTS'])

In [9]:
# Compute a custom fantasy score for each player-game row
data['FANTASY'] = ((2 * data['FGM']) - data['FGA'] + data['FTM'] - data['FTA'] + data['FG3M'] + data['REB']
                   + (2 * data['AST']) + (4 * data['STL']) + (4 * data['BLK']) - (2 * data['TOV']) + data['PTS'])
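
To make the scoring rule easier to read and adjust, the same weights can be kept in a dictionary. This is an illustrative sketch rather than a replacement for the cell above: `WEIGHTS` and `fantasy_points` are hypothetical names, and the single-row frame is made-up data describing a player whose only contribution is one made three-pointer.

```python
import pandas as pd

# Same coefficients as the FANTASY formula above, one entry per stat
WEIGHTS = {
    'FGM': 2, 'FGA': -1, 'FTM': 1, 'FTA': -1, 'FG3M': 1, 'REB': 1,
    'AST': 2, 'STL': 4, 'BLK': 4, 'TOV': -2, 'PTS': 1,
}


def fantasy_points(frame, weights=WEIGHTS):
    """Weighted sum of box score columns (hypothetical helper)."""
    return sum(frame[col] * weight for col, weight in weights.items())


# Made-up row: one made three-pointer on one attempt, nothing else
toy_row = pd.DataFrame([{
    'FGM': 1, 'FGA': 1, 'FTM': 0, 'FTA': 0, 'FG3M': 1, 'REB': 0,
    'AST': 0, 'STL': 0, 'BLK': 0, 'TOV': 0, 'PTS': 3,
}])

# 2*1 - 1 + 0 - 0 + 1 + 0 + 0 + 0 + 0 - 0 + 3 = 5
print(fantasy_points(toy_row).tolist())
```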

In [10]:
data.sample(10)

Unnamed: 0,PLAYER_ID,PLAYER_NAME,TEAM_ID,TEAM_ABBREVIATION,TEAM_NAME,GAME_ID,GAME_DATE,MATCHUP,WL,MIN,...,DREB,REB,AST,STL,BLK,TOV,PF,PTS,PLUS_MINUS,FANTASY
2476,1631106,Tari Eason,1610612745,HOU,Houston Rockets,22200048,2022-10-24,HOU vs. UTA,W,19,...,4,9,0,3,0,1,3,11,7,25
2408,1631207,Dalen Terry,1610612741,CHI,Chicago Bulls,22200047,2022-10-24,CHI vs. BOS,W,2,...,0,0,0,0,0,0,0,0,1,-1
1689,1629655,Daniel Gafford,1610612764,WAS,Washington Wizards,22200074,2022-10-28,WAS vs. IND,L,15,...,2,4,0,0,0,1,2,10,1,13
2645,1628989,Kevin Huerter,1610612758,SAC,Sacramento Kings,22200042,2022-10-23,SAC @ GSW,L,31,...,5,5,4,0,0,2,3,9,5,11
1846,1630534,Ochai Agbaji,1610612762,UTA,Utah Jazz,22200078,2022-10-28,UTA @ DEN,L,19,...,1,1,0,0,0,0,2,9,-6,10
2479,1629630,Ja Morant,1610612763,MEM,Memphis Grizzlies,22200049,2022-10-24,MEM vs. BKN,W,34,...,6,8,7,2,0,2,1,38,20,69
2908,203500,Steven Adams,1610612763,MEM,Memphis Grizzlies,22200024,2022-10-21,MEM @ HOU,W,33,...,7,9,2,0,2,0,5,6,23,27
3194,1630560,Cam Thomas,1610612751,BKN,Brooklyn Nets,22200006,2022-10-19,BKN vs. NOP,L,13,...,0,0,1,0,0,0,0,2,3,2
610,201942,DeMar DeRozan,1610612741,CHI,Chicago Bulls,22200124,2022-11-04,CHI @ BOS,L,36,...,3,3,5,2,1,2,3,46,10,68
2689,1631309,Trevor Hudgins,1610612745,HOU,Houston Rockets,22200033,2022-10-22,HOU @ MIL,L,4,...,0,0,0,0,0,0,0,3,-2,5
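
The per-game rows can be rolled up into per-player totals with a pandas `groupby`, which is one way the "totals" note near the top could be handled without a second request. The frame below is made-up toy data with a subset of the real column names. The choice of stats to sum and the `GP` (games played) column are assumptions for illustration.

```python
import pandas as pd

# Made-up per-game rows using a few of the real column names
toy = pd.DataFrame({
    'PLAYER_ID': [1, 1, 2],
    'PLAYER_NAME': ['Player A', 'Player A', 'Player B'],
    'GAME_ID': ['g1', 'g2', 'g1'],
    'PTS': [10, 20, 7],
    'REB': [3, 5, 2],
    'AST': [4, 1, 0],
})

# One row per player: games played plus summed counting stats
totals = toy.groupby(['PLAYER_ID', 'PLAYER_NAME'], as_index=False).agg(
    GP=('GAME_ID', 'count'),
    PTS=('PTS', 'sum'),
    REB=('REB', 'sum'),
    AST=('AST', 'sum'),
)
print(totals)
```

Percentage columns such as `FG_PCT` should not be summed this way. They would need to be recomputed from the summed makes and attempts.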


In [11]:
data.to_csv('boxScores.csv', index = False)
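
Because the notebook is meant to be rerun regularly, it may be useful to merge a fresh pull into a previously saved export instead of overwriting it. The sketch below assumes that `PLAYER_ID` plus `GAME_ID` uniquely identifies a row. `merge_box_scores` is a hypothetical helper, and both frames are made-up data.

```python
import pandas as pd


def merge_box_scores(existing, fresh):
    """Append fresh rows and keep the newest copy of each player-game.

    Hypothetical helper; assumes (PLAYER_ID, GAME_ID) is a unique key.
    """
    combined = pd.concat([existing, fresh], ignore_index=True)
    deduped = combined.drop_duplicates(subset=['PLAYER_ID', 'GAME_ID'], keep='last')
    return deduped.reset_index(drop=True)


# Made-up frames: the fresh pull corrects one row and adds one new game
old = pd.DataFrame({
    'PLAYER_ID': [1, 2],
    'GAME_ID': ['0022200001', '0022200001'],
    'PTS': [10, 7],
})
new = pd.DataFrame({
    'PLAYER_ID': [2, 1],
    'GAME_ID': ['0022200001', '0022200002'],
    'PTS': [8, 15],
})

merged = merge_box_scores(old, new)
print(merged)
```

If the existing rows are read back from the .csv file, the key columns need the same dtypes in both frames for the duplicate check to match. For example, reading with `pd.read_csv('boxScores.csv', dtype={'GAME_ID': str})` keeps `GAME_ID` as text.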