This notebook scrapes and organizes NBA box score data found at https://www.nba.com/stats/players/boxscores. The data contains one row per player per game: a player who has appeared in 10 games this NBA season has 10 rows. The columns are basic stats such as points, rebounds, assists, and turnovers.
Once scraped, the data is organized into a pandas DataFrame and exported to a .csv file for use in EDA and app development.
The source data is updated every night as new NBA games are played, so rerun this notebook regularly to keep the export current.
A guide for scraping this data is found at https://towardsdatascience.com/how-scraping-nba-stats-is-cooler-than-michael-jordan-49d7562ce3ef.
For the future:
- Add a way to easily control which dates are pulled.
- Join in ADP and player position.
- Add a ranking variable to the totals data?
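
The first to-do item could be handled by building the query parameters from a function. The following is only a sketch: `build_params` is a hypothetical helper that does not exist in this notebook, and the `MM/DD/YYYY` date format is an assumption about what the endpoint accepts for `DateFrom` and `DateTo`.

```python
from datetime import date

def build_params(season, date_from=None, date_to=None):
    """Build the leaguegamelog query, optionally limited to a date range."""
    def fmt(day):
        # Assumption: dates are sent as MM/DD/YYYY; '' leaves that bound open.
        return day.strftime('%m/%d/%Y') if day else ''

    return {
        'Counter': '1000',
        'DateFrom': fmt(date_from),
        'DateTo': fmt(date_to),
        'Direction': 'DESC',
        'LeagueID': '00',
        'PlayerOrTeam': 'P',
        'Season': season,
        'SeasonType': 'Regular Season',
        'Sorter': 'DATE',
    }

# Example: only games from the first week of November 2022.
params = build_params('2022-23', date(2022, 11, 1), date(2022, 11, 7))
print(params['DateFrom'], params['DateTo'])
```

Calling `build_params('2022-23')` with no dates leaves both bounds empty, which matches the parameters used in the request cell.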



In [1]:
import requests as r
import pandas as pd

In [2]:
url = 'https://stats.nba.com/stats/leaguegamelog?'

header = {
    'Accept': '*/*',
    'Accept-Encoding': 'gzip, deflate, br',
    'Accept-Language': 'en-US,en;q=0.9,es;q=0.8',
    'Connection': 'keep-alive',
    'Host': 'stats.nba.com',
    'If-Modified-Since': 'Thu, 03 Nov 2022 16:07:11 GMT',
    'Origin': 'https://www.nba.com',
    'Referer': 'https://www.nba.com/',
    'sec-ch-ua': '"Google Chrome";v="107", "Chromium";v="107", "Not=A?Brand";v="24"',
    'sec-ch-ua-mobile': '?0',
    'sec-ch-ua-platform': '"Windows"',
    'Sec-Fetch-Dest': 'empty',
    'Sec-Fetch-Mode': 'cors',
    'Sec-Fetch-Site': 'same-site',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/107.0.0.0 Safari/537.36'
}

params = {
    'Counter': '1000',
    'DateFrom': '',
    'DateTo': '',
    'Direction': 'DESC',
    'LeagueID': '00',
    'PlayerOrTeam': 'P',
    'Season': '2022-23',
    'SeasonType': 'Regular Season',
    'Sorter': 'DATE'
}

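# Set a timeout so a stalled connection cannot hang the notebook.
request = r.get(url, headers = header, params = params, timeout = 30)
request.raise_for_status()  # stop here on a 4xx/5xx response

print(request)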

<Response [200]>


In [3]:
json_data = request.json()

In [4]:
columns = json_data['resultSets'][0]['headers']
columns

['SEASON_ID',
 'PLAYER_ID',
 'PLAYER_NAME',
 'TEAM_ID',
 'TEAM_ABBREVIATION',
 'TEAM_NAME',
 'GAME_ID',
 'GAME_DATE',
 'MATCHUP',
 'WL',
 'MIN',
 'FGM',
 'FGA',
 'FG_PCT',
 'FG3M',
 'FG3A',
 'FG3_PCT',
 'FTM',
 'FTA',
 'FT_PCT',
 'OREB',
 'DREB',
 'REB',
 'AST',
 'STL',
 'BLK',
 'TOV',
 'PF',
 'PTS',
 'PLUS_MINUS',
 'FANTASY_PTS',
 'VIDEO_AVAILABLE']

In [5]:
length = len(json_data['resultSets'][0]['rowSet'])
print(f'Current number of rows in data: {length}')

Current number of rows in data: 5045


In [6]:
data = pd.DataFrame(json_data['resultSets'][0]['rowSet'])
data.columns = columns
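
The two cells above can be folded into one small function, which also makes the expected response shape explicit. This is a sketch rather than part of the original notebook: `result_set_to_frame` and `fake_payload` are hypothetical names, and the payload values are made up. Passing `columns=` to the constructor means an empty `rowSet` still yields a labeled (empty) frame, whereas assigning `data.columns` afterwards would raise a length mismatch.

```python
import pandas as pd

def result_set_to_frame(payload, index=0):
    """Turn one entry of payload['resultSets'] into a labeled DataFrame."""
    result_set = payload['resultSets'][index]
    headers, rows = result_set['headers'], result_set['rowSet']
    # Fail early if the response shape is not what the notebook expects.
    if any(len(row) != len(headers) for row in rows):
        raise ValueError('row length does not match header length')
    return pd.DataFrame(rows, columns=headers)

# Tiny hand-made payload (made-up values) shaped like the real response.
fake_payload = {'resultSets': [{
    'headers': ['PLAYER_NAME', 'PTS', 'REB'],
    'rowSet': [['Player A', 10, 4], ['Player B', 21, 7]],
}]}
frame = result_set_to_frame(fake_payload)
print(frame)
```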

In [7]:
# Drop a few columns for easier use
data = data.drop(columns = ['SEASON_ID', 'VIDEO_AVAILABLE', 'FANTASY_PTS'])

In [8]:
# Create a new variable that is fantasy points
data['FANTASY'] = (
    (2 * data['FGM']) - data['FGA'] + data['FTM'] - data['FTA'] + data['FG3M'] + data['REB']
    + (2 * data['AST']) + (4 * data['STL']) + (4 * data['BLK']) - (2 * data['TOV']) + data['PTS']
)
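
The scoring expression is easier to audit when the weights are written as a table. The sketch below mirrors the expression above in plain Python and checks it against one stat line by hand. `FANTASY_WEIGHTS`, `fantasy_points`, and the `example` line are illustrative names and made-up values, not part of the original notebook.

```python
# Weight applied to each box score column; mirrors the expression above.
FANTASY_WEIGHTS = {
    'FGM': 2, 'FGA': -1, 'FTM': 1, 'FTA': -1, 'FG3M': 1,
    'REB': 1, 'AST': 2, 'STL': 4, 'BLK': 4, 'TOV': -2, 'PTS': 1,
}

def fantasy_points(line):
    """Score one box score line given as a dict of stat -> value."""
    return sum(weight * line[stat] for stat, weight in FANTASY_WEIGHTS.items())

# Made-up stat line, used only to check the arithmetic by hand:
# 16 - 15 + 4 - 5 + 2 + 7 + 10 + 4 + 4 - 6 + 22 = 43
example = {'FGM': 8, 'FGA': 15, 'FTM': 4, 'FTA': 5, 'FG3M': 2,
           'REB': 7, 'AST': 5, 'STL': 1, 'BLK': 1, 'TOV': 3, 'PTS': 22}
print(fantasy_points(example))  # 43
```

The same dictionary could drive the DataFrame version, for example `sum(w * data[col] for col, w in FANTASY_WEIGHTS.items())`, so a change in scoring rules only touches one place.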

In [9]:
data.sample(10)

Unnamed: 0,PLAYER_ID,PLAYER_NAME,TEAM_ID,TEAM_ABBREVIATION,TEAM_NAME,GAME_ID,GAME_DATE,MATCHUP,WL,MIN,...,DREB,REB,AST,STL,BLK,TOV,PF,PTS,PLUS_MINUS,FANTASY
4139,1630581,Josh Giddey,1610612760,OKC,Oklahoma City Thunder,22200041,2022-10-23,OKC vs. MIN,L,22,...,2,2,5,0,0,4,2,10,-15,13
651,1631093,Jaden Ivey,1610612765,DET,Detroit Pistons,22200199,2022-11-14,DET vs. TOR,L,34,...,4,4,8,1,0,4,6,21,3,32
1059,1630227,Daishen Nix,1610612745,HOU,Houston Rockets,22200191,2022-11-12,HOU @ NOP,L,11,...,3,3,3,1,0,2,0,0,13,9
4170,1629611,Terance Mann,1610612746,LAC,LA Clippers,22200043,2022-10-23,LAC vs. PHX,L,9,...,1,2,0,0,0,0,2,9,-8,15
4370,203484,Kentavious Caldwell-Pope,1610612743,DEN,Denver Nuggets,22200035,2022-10-22,DEN vs. OKC,W,32,...,5,5,2,0,0,2,5,21,16,33
2559,1629642,Nassir Little,1610612757,POR,Portland Trail Blazers,22200116,2022-11-02,POR vs. MEM,L,12,...,1,1,1,0,0,1,0,3,-8,2
4517,1628976,Wendell Carter Jr.,1610612753,ORL,Orlando Magic,22200020,2022-10-21,ORL @ ATL,L,33,...,7,8,2,0,0,2,4,14,-12,16
205,1628373,Frank Ntilikina,1610612742,DAL,Dallas Mavericks,22200230,2022-11-18,DAL vs. DEN,W,8,...,0,0,2,0,0,1,1,2,-1,5
1153,1629646,Charles Bassey,1610612759,SAS,San Antonio Spurs,22200180,2022-11-11,SAS vs. MIL,W,18,...,11,14,4,0,4,3,4,5,10,39
1206,1629111,Jock Landale,1610612756,PHX,Phoenix Suns,22200177,2022-11-11,PHX @ ORL,L,13,...,4,6,0,0,0,0,0,7,5,14
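
One possible approach to the "ranking variable" to-do item is to sum each player's games and rank the totals. This is a sketch under assumptions: the `games` frame below holds made-up rows standing in for `data`, and `totals` and `FANTASY_RANK` are hypothetical names.

```python
import pandas as pd

# Made-up rows standing in for the per-game `data` frame.
games = pd.DataFrame({
    'PLAYER_ID': [1, 1, 2, 2, 3],
    'PLAYER_NAME': ['Player A', 'Player A', 'Player B', 'Player B', 'Player C'],
    'PTS': [10, 20, 30, 5, 18],
    'FANTASY': [25, 40, 50, 10, 33],
})

# One row per player with season totals.
totals = (games.groupby(['PLAYER_ID', 'PLAYER_NAME'], as_index=False)
               [['PTS', 'FANTASY']].sum())

# Rank 1 = highest fantasy total; tied players share the lowest rank number.
totals['FANTASY_RANK'] = (totals['FANTASY']
                          .rank(ascending=False, method='min')
                          .astype(int))
print(totals.sort_values('FANTASY_RANK'))
```

Grouping on `PLAYER_ID` as well as `PLAYER_NAME` keeps two players with the same name from being merged.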


In [10]:
data.to_csv('boxScores.csv', index = False)
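
Because the notebook is meant to be rerun regularly, it may be useful to merge new rows into the saved file rather than overwrite it, particularly once pulls are limited to a date range. The following is a minimal sketch, not the notebook's current behavior: `update_box_scores` is a hypothetical helper, the demo rows are made up, and it assumes `PLAYER_ID` plus `GAME_ID` identifies a row. The keys are handled as text on the assumption that IDs may carry leading zeros.

```python
import os
import tempfile
import pandas as pd

KEYS = ['PLAYER_ID', 'GAME_ID']

def update_box_scores(new_rows, path='boxScores.csv'):
    """Merge freshly pulled rows into the saved file without duplicating games."""
    new_rows = new_rows.astype({key: str for key in KEYS})
    if os.path.exists(path):
        # Read keys as text so IDs with leading zeros still match.
        old_rows = pd.read_csv(path, dtype={key: str for key in KEYS})
        combined = pd.concat([old_rows, new_rows], ignore_index=True)
    else:
        combined = new_rows
    # One row per player per game; keep the most recently pulled copy.
    combined = combined.drop_duplicates(subset=KEYS, keep='last')
    combined.to_csv(path, index=False)
    return combined

# Demo in a temporary folder with made-up rows; the two pulls overlap by one row.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'boxScores.csv')
    first = pd.DataFrame({'PLAYER_ID': [1, 2],
                          'GAME_ID': ['0022200001', '0022200001'],
                          'PTS': [10, 12]})
    second = pd.DataFrame({'PLAYER_ID': [2, 3],
                           'GAME_ID': ['0022200001', '0022200002'],
                           'PTS': [12, 30]})
    update_box_scores(first, path)
    result = update_box_scores(second, path)
    print(len(result))  # 3
```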