This notebook scrapes and organizes NBA box score data from https://www.nba.com/stats/players/boxscores. The data contains one row per player per game: a player who has appeared in 10 games this NBA season has 10 rows. The columns are basic stats such as points, rebounds, assists, and turnovers.
Once scraped, the data is loaded into a pandas DataFrame and exported to a .csv file for use in EDA and app development.
The source is updated every night as new NBA games are played, so rerun this notebook regularly to keep the export current.
A guide to scraping this data is available at https://towardsdatascience.com/how-scraping-nba-stats-is-cooler-than-michael-jordan-49d7562ce3ef.
For the future:
- Add a way to easily control which dates are pulled.
- Join in ADP (average draft position) and player position.
- Add a ranking variable to the totals data?
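
The first item in the list above could be approached by building the query parameters from Python dates. The sketch below is one possible shape and is not part of the notebook: `build_params` is a hypothetical helper, and the `MM/DD/YYYY` format for `DateFrom` and `DateTo` is an assumption about what the endpoint accepts, so it should be checked against a live request.

```python
from datetime import date
from typing import Optional


def build_params(date_from: Optional[date] = None,
                 date_to: Optional[date] = None,
                 season: str = '2022-23') -> dict:
    """Return leaguegamelog query parameters for an optional date window."""

    def fmt(d: Optional[date]) -> str:
        # Assumption: the endpoint expects MM/DD/YYYY; '' means "no bound".
        return d.strftime('%m/%d/%Y') if d else ''

    return {
        'Counter': '1000',
        'DateFrom': fmt(date_from),
        'DateTo': fmt(date_to),
        'Direction': 'DESC',
        'LeagueID': '00',
        'PlayerOrTeam': 'P',
        'Season': season,
        'SeasonType': 'Regular Season',
        'Sorter': 'DATE',
    }


# Example: restrict the pull to the first week of December 2022.
window_params = build_params(date(2022, 12, 1), date(2022, 12, 7))
print(window_params['DateFrom'], window_params['DateTo'])
```

Calling `build_params()` with no arguments reproduces the empty `DateFrom` and `DateTo` values used in this notebook, so the helper could replace the hard-coded dictionary without changing the default behavior.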



In [21]:
import requests as r
import pandas as pd
from datetime import date

In [22]:
print('Last time data was updated:')
print(date.today())

Last time data was updated:
2022-12-07


In [23]:
# requests appends the query string itself, so no trailing '?' is needed
url = 'https://stats.nba.com/stats/leaguegamelog'

header = {
    'Accept': '*/*',
    'Accept-Encoding': 'gzip, deflate, br',
    'Accept-Language': 'en-US,en;q=0.9,es;q=0.8',
    'Connection': 'keep-alive',
    'Host': 'stats.nba.com',
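    # Copied from a browser session. A conditional header can produce an empty
    # 304 response, so drop this line if request.json() ever fails.
    'If-Modified-Since': 'Thu, 03 Nov 2022 16:07:11 GMT',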
    'Origin': 'https://www.nba.com',
    'Referer': 'https://www.nba.com/',
    'sec-ch-ua': '"Google Chrome";v="107", "Chromium";v="107", "Not=A?Brand";v="24"',
    'sec-ch-ua-mobile': '?0',
    'sec-ch-ua-platform': '"Windows"',
    'Sec-Fetch-Dest': 'empty',
    'Sec-Fetch-Mode': 'cors',
    'Sec-Fetch-Site': 'same-site',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/107.0.0.0 Safari/537.36'
}

params = {
    'Counter': '1000',
    'DateFrom': '',
    'DateTo': '',
    'Direction': 'DESC',
    'LeagueID': '00',
    'PlayerOrTeam': 'P',
    'Season': '2022-23',
    'SeasonType': 'Regular Season',
    'Sorter': 'DATE'
}

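# Note: `request` holds the Response object. The timeout keeps the cell
# from hanging indefinitely if the server does not answer.
request = r.get(url, headers=header, params=params, timeout=30)

print(request)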

<Response [200]>


In [24]:
json_data = request.json()

In [25]:
columns = json_data['resultSets'][0]['headers']
columns

['SEASON_ID',
 'PLAYER_ID',
 'PLAYER_NAME',
 'TEAM_ID',
 'TEAM_ABBREVIATION',
 'TEAM_NAME',
 'GAME_ID',
 'GAME_DATE',
 'MATCHUP',
 'WL',
 'MIN',
 'FGM',
 'FGA',
 'FG_PCT',
 'FG3M',
 'FG3A',
 'FG3_PCT',
 'FTM',
 'FTA',
 'FT_PCT',
 'OREB',
 'DREB',
 'REB',
 'AST',
 'STL',
 'BLK',
 'TOV',
 'PF',
 'PTS',
 'PLUS_MINUS',
 'FANTASY_PTS',
 'VIDEO_AVAILABLE']
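
The `resultSets` structure pairs one `headers` list with many positional rows in `rowSet`. As a small illustration of that layout, the sketch below zips the two into dictionaries. `to_records` is a hypothetical helper, and `sample_payload` is made-up data that only mimics the nesting of the response, using three of the columns listed above.

```python
def to_records(payload: dict, index: int = 0) -> list:
    """Turn one result set into a list of {header: value} dictionaries."""
    result_set = payload['resultSets'][index]
    headers = result_set['headers']
    records = []
    for row in result_set['rowSet']:
        # Rows are positional, so each must have one value per header.
        if len(row) != len(headers):
            raise ValueError(f'expected {len(headers)} values, got {len(row)}')
        records.append(dict(zip(headers, row)))
    return records


# Made-up payload with the same nesting as the real response.
sample_payload = {
    'resultSets': [{
        'headers': ['PLAYER_NAME', 'GAME_DATE', 'PTS'],
        'rowSet': [
            ['Player A', '2022-12-05', 17],
            ['Player B', '2022-12-05', 9],
        ],
    }]
}

records = to_records(sample_payload)
print(records[0])
```

The length check is a design choice: a row that does not line up with the headers would otherwise be silently truncated by `zip`.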

In [26]:
length = len(json_data['resultSets'][0]['rowSet'])
print(f'Current number of rows in data: {length}')

Current number of rows in data: 7984


In [27]:
# Build the DataFrame and label its columns in one step
data = pd.DataFrame(json_data['resultSets'][0]['rowSet'], columns=columns)

In [28]:
# Drop columns that are not needed downstream; FANTASY_PTS is replaced
# by a custom score in the next cell
data = data.drop(columns=['SEASON_ID', 'VIDEO_AVAILABLE', 'FANTASY_PTS'])

In [29]:
# Compute a custom fantasy-point score for each player-game row
data['FANTASY'] = (
    2 * data['FGM'] - data['FGA'] + data['FTM'] - data['FTA'] + data['FG3M']
    + data['REB'] + 2 * data['AST'] + 4 * data['STL'] + 4 * data['BLK']
    - 2 * data['TOV'] + data['PTS']
)
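
To make the weights in the formula easier to check, the same arithmetic can be written out for a single stat line. This is only a sketch: `fantasy_points` and `WEIGHTS` are hypothetical names, and the stat line is invented for illustration rather than taken from the data.

```python
# Weights mirror the notebook's FANTASY formula, one entry per stat.
WEIGHTS = {
    'FGM': 2, 'FGA': -1, 'FTM': 1, 'FTA': -1, 'FG3M': 1,
    'REB': 1, 'AST': 2, 'STL': 4, 'BLK': 4, 'TOV': -2, 'PTS': 1,
}


def fantasy_points(stats: dict) -> int:
    """Weighted sum of one game's box score stats."""
    return sum(weight * stats[name] for name, weight in WEIGHTS.items())


# Invented stat line for illustration.
line = {'FGM': 8, 'FGA': 15, 'FTM': 4, 'FTA': 5, 'FG3M': 2,
        'REB': 7, 'AST': 5, 'STL': 1, 'BLK': 1, 'TOV': 3, 'PTS': 22}

# 16 - 15 + 4 - 5 + 2 + 7 + 10 + 4 + 4 - 6 + 22 = 43
print(fantasy_points(line))
```

Keeping the weights in one dictionary makes it straightforward to try a different scoring system without touching the arithmetic.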

In [30]:
data.sample(10)

index,PLAYER_ID,PLAYER_NAME,TEAM_ID,TEAM_ABBREVIATION,TEAM_NAME,GAME_ID,GAME_DATE,MATCHUP,WL,MIN,...,DREB,REB,AST,STL,BLK,TOV,PF,PTS,PLUS_MINUS,FANTASY
3525,1630167,Obi Toppin,1610612752,NYK,New York Knicks,22200208,2022-11-15,NYK @ UTA,W,18,...,5,6,4,0,0,0,1,9,23,28
6500,1626174,Christian Wood,1610612742,DAL,Dallas Mavericks,22200066,2022-10-27,DAL @ BKN,W,24,...,6,6,0,0,1,3,5,11,2,17
7489,1631094,Paolo Banchero,1610612753,ORL,Orlando Magic,22200020,2022-10-21,ORL @ ATL,L,34,...,10,12,2,1,3,4,1,20,-10,38
4025,1628983,Shai Gilgeous-Alexander,1610612760,OKC,Oklahoma City Thunder,22200179,2022-11-11,OKC vs. TOR,W,28,...,3,3,4,3,1,3,2,20,13,43
6616,1628378,Donovan Mitchell,1610612739,CLE,Cleveland Cavaliers,22200056,2022-10-26,CLE vs. ORL,W,37,...,2,4,8,2,0,1,3,14,9,31
412,1627741,Buddy Hield,1610612754,IND,Indiana Pacers,22200359,2022-12-05,IND @ GSW,W,40,...,9,9,5,1,0,1,0,17,3,34
7091,203994,Jusuf Nurkic,1610612757,POR,Portland Trail Blazers,22200037,2022-10-23,POR @ LAL,W,30,...,10,13,4,0,1,2,3,6,3,26
2021,203914,Gary Harris,1610612753,ORL,Orlando Magic,22200273,2022-11-25,ORL vs. PHI,L,27,...,2,3,3,1,0,3,2,7,3,13
6016,202331,Paul George,1610612746,LAC,LA Clippers,22200088,2022-10-30,LAC vs. NOP,L,28,...,2,3,3,1,0,4,3,14,-10,12
6654,1626156,D'Angelo Russell,1610612750,MIN,Minnesota Timberwolves,22200062,2022-10-26,MIN vs. SAS,W,29,...,2,2,9,0,1,4,1,12,0,26


In [31]:
data.to_csv('boxScores.csv', index = False)