<a href="https://colab.research.google.com/github/gui98araujo/NBAPlayers/blob/main/WebScrapingNBA.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Author: Guilherme Araujo

This NBA data-scraping API, developed by Guilherme Marinho de Araujo, aims to provide comprehensive statistical information about basketball players and teams. The data supports in-depth analyses for several audiences, including coaches, team managers, sports journalists, and data analysis enthusiasts.

This API focuses on general player statistics, including points, assists, rebounds, steals, blocks, and other key performance indicators. By collecting these statistics over multiple seasons and season types (such as regular season and playoffs), data analysts can perform a range of important analyses, including:

Player Performance Evaluation: The data allows for the assessment of individual player performance over time, identifying trends and improvements in their statistics.

Team Analysis: Statistics can be used to evaluate team performance by comparing player statistics and identifying areas of strength and weakness.

Game Strategy: Coaches can use statistics to adjust their game strategies based on the past performance of players and opposing teams.

Managerial Decision-Making: Team managers can use statistics to make decisions about player recruitment, trades, and contracts.

Predictions and Modeling: Historical data can be used to create prediction models for future game and season outcomes.

Not all NBA APIs are open to public access, which makes this web scraping API a useful tool for collecting data for analysis.

# 0. Importing Libraries



In [None]:
import time
import numpy as np
import pandas as pd
import requests
pd.set_option('display.max_columns', None) # So we can see all columns in a wide DataFrame

# 1. Request

In [None]:
test_url = 'https://stats.nba.com/stats/leagueLeaders?LeagueID=00&PerMode=PerGame&Scope=S&Season=2012-13&SeasonType=Regular%20Season&StatCategory=PTS'
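
The URL above packs six query parameters into one string. As a minimal sketch, the same URL can be assembled from a dictionary with the standard library, which makes it easier to vary the season or season type later. The helper name `build_leaders_url` and the `BASE_URL` constant are illustrative and not part of the original notebook.

```python
from urllib.parse import urlencode, quote

# Endpoint taken from the test URL above
BASE_URL = 'https://stats.nba.com/stats/leagueLeaders'

def build_leaders_url(season, season_type, stat_category='PTS', per_mode='PerGame'):
    """Hypothetical helper: build a leagueLeaders URL from its query parameters."""
    params = {
        'LeagueID': '00',
        'PerMode': per_mode,
        'Scope': 'S',
        'Season': season,
        'SeasonType': season_type,
        'StatCategory': stat_category,
    }
    # quote_via=quote encodes spaces as %20 (the default would use '+')
    return BASE_URL + '?' + urlencode(params, quote_via=quote)

print(build_leaders_url('2012-13', 'Regular Season'))
```

With these arguments the helper reproduces the `test_url` string used in this section, so the season type can be written with a plain space and encoded only when the URL is built.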

In [None]:
r = requests.get(url=test_url).json()

In [None]:
r

{'resource': 'leagueleaders',
 'parameters': {'LeagueID': '00',
  'PerMode': 'PerGame',
  'StatCategory': 'PTS',
  'Season': '2012-13',
  'SeasonType': 'Regular Season',
  'Scope': 'S',
  'ActiveFlag': None},
 'resultSet': {'name': 'LeagueLeaders',
  'headers': ['PLAYER_ID',
   'RANK',
   'PLAYER',
   'TEAM_ID',
   'TEAM',
   'GP',
   'MIN',
   'FGM',
   'FGA',
   'FG_PCT',
   'FG3M',
   'FG3A',
   'FG3_PCT',
   'FTM',
   'FTA',
   'FT_PCT',
   'OREB',
   'DREB',
   'REB',
   'AST',
   'STL',
   'BLK',
   'TOV',
   'PTS',
   'EFF'],
  'rowSet': [[2546,
    1,
    'Carmelo Anthony',
    1610612752,
    'NYK',
    67,
    37.0,
    10.0,
    22.2,
    0.449,
    2.3,
    6.2,
    0.379,
    6.3,
    7.6,
    0.83,
    2.0,
    4.9,
    6.9,
    2.6,
    0.8,
    0.5,
    2.6,
    28.7,
    23.2],
   [201142,
    2,
    'Kevin Durant',
    1610612760,
    'OKC',
    81,
    38.5,
    9.0,
    17.7,
    0.51,
    1.7,
    4.1,
    0.416,
    8.4,
    9.3,
    0.905,
    0.6,
    7.3,
    7

In [None]:
# Columns view
table_headers =  r['resultSet']['headers']

In [None]:
table_headers

['PLAYER_ID',
 'RANK',
 'PLAYER',
 'TEAM_ID',
 'TEAM',
 'GP',
 'MIN',
 'FGM',
 'FGA',
 'FG_PCT',
 'FG3M',
 'FG3A',
 'FG3_PCT',
 'FTM',
 'FTA',
 'FT_PCT',
 'OREB',
 'DREB',
 'REB',
 'AST',
 'STL',
 'BLK',
 'TOV',
 'PTS',
 'EFF']

In [None]:
r['resultSet']['rowSet'][0]

[2546,
 1,
 'Carmelo Anthony',
 1610612752,
 'NYK',
 67,
 37.0,
 10.0,
 22.2,
 0.449,
 2.3,
 6.2,
 0.379,
 6.3,
 7.6,
 0.83,
 2.0,
 4.9,
 6.9,
 2.6,
 0.8,
 0.5,
 2.6,
 28.7,
 23.2]

In [None]:
pd.DataFrame(r['resultSet']['rowSet'], columns= table_headers)

Unnamed: 0,PLAYER_ID,RANK,PLAYER,TEAM_ID,TEAM,GP,MIN,FGM,FGA,FG_PCT,FG3M,FG3A,FG3_PCT,FTM,FTA,FT_PCT,OREB,DREB,REB,AST,STL,BLK,TOV,PTS,EFF
0,2546,1,Carmelo Anthony,1610612752,NYK,67,37.0,10.0,22.2,0.449,2.3,6.2,0.379,6.3,7.6,0.830,2.0,4.9,6.9,2.6,0.8,0.5,2.6,28.7,23.2
1,201142,2,Kevin Durant,1610612760,OKC,81,38.5,9.0,17.7,0.510,1.7,4.1,0.416,8.4,9.3,0.905,0.6,7.3,7.9,4.6,1.4,1.3,3.5,28.1,30.4
2,977,3,Kobe Bryant,1610612747,LAL,78,38.6,9.5,20.4,0.463,1.7,5.2,0.324,6.7,8.0,0.839,0.8,4.7,5.6,6.0,1.4,0.3,3.7,27.3,24.6
3,2544,4,LeBron James,1610612748,MIA,76,37.9,10.1,17.8,0.565,1.4,3.3,0.406,5.3,7.0,0.753,1.3,6.8,8.0,7.3,1.7,0.9,3.0,26.8,32.2
4,201935,5,James Harden,1610612745,HOU,78,38.3,7.5,17.1,0.438,2.3,6.2,0.368,8.6,10.2,0.851,0.8,4.1,4.9,5.8,1.8,0.5,3.8,25.9,24.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
178,203143,179,Pablo Prigioni,1610612752,NYK,78,16.2,1.3,2.8,0.455,0.7,1.7,0.396,0.3,0.3,0.880,0.5,1.3,1.8,3.0,0.9,0.0,1.1,3.5,6.6
179,2563,180,Dahntay Jones,1610612737,ATL,78,13.0,1.1,2.8,0.369,0.1,0.6,0.224,1.1,1.4,0.770,0.3,1.0,1.3,0.6,0.3,0.1,0.5,3.4,3.0
180,203110,181,Draymond Green,1610612744,GSW,79,13.4,1.1,3.3,0.327,0.2,0.8,0.209,0.6,0.7,0.818,0.7,2.6,3.3,0.7,0.5,0.3,0.6,2.9,4.7
181,101236,182,Chuck Hayes,1610612758,SAC,74,16.3,1.1,2.6,0.442,0.0,0.0,0.000,0.4,0.6,0.625,1.5,2.5,4.0,1.5,0.4,0.2,0.6,2.7,6.6
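
The response keeps column names in `resultSet.headers` and the values in `resultSet.rowSet`, and `pd.DataFrame(..., columns=table_headers)` pairs them by position. The sketch below spells out that pairing with plain Python on an abbreviated copy of the response (three columns, two rows). The function name `result_set_to_records` is illustrative, and the length check is an added assumption about how a malformed row should be handled.

```python
# Abbreviated copy of the response shown above: three headers, two rows
payload = {
    'resultSet': {
        'headers': ['PLAYER_ID', 'RANK', 'PLAYER'],
        'rowSet': [
            [2546, 1, 'Carmelo Anthony'],
            [201142, 2, 'Kevin Durant'],
        ],
    }
}

def result_set_to_records(payload):
    """Hypothetical helper: pair each row's values with the header names by position."""
    result = payload['resultSet']
    headers = result['headers']
    records = []
    for row in result['rowSet']:
        # zip() would silently drop extra values, so check the lengths first
        if len(row) != len(headers):
            raise ValueError(f'expected {len(headers)} values, got {len(row)}')
        records.append(dict(zip(headers, row)))
    return records

records = result_set_to_records(payload)
print(records[0])
```

A list of dictionaries like `records` can also be passed directly to `pd.DataFrame`, which yields the same table as the `columns=` form used in the notebook.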


In [None]:
temp_df1= pd.DataFrame(r['resultSet']['rowSet'], columns = table_headers)

In [None]:
temp_df2 = pd.DataFrame({'Year':['2012-2013' for i in range(len(temp_df1))],
                         'Season_type':['Regular%20Season' for i in range(len(temp_df1))]})

In [None]:
temp_df2

Unnamed: 0,Year,Season_type
0,2012-2013,Regular%20Season
1,2012-2013,Regular%20Season
2,2012-2013,Regular%20Season
3,2012-2013,Regular%20Season
4,2012-2013,Regular%20Season
...,...,...
178,2012-2013,Regular%20Season
179,2012-2013,Regular%20Season
180,2012-2013,Regular%20Season
181,2012-2013,Regular%20Season
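
The `Season_type` column above stores the URL-encoded form `Regular%20Season`. If a readable label is preferred in the final table, one option is to decode the value before storing it. This is a minimal sketch using the standard library. The function name `readable_season_type` is illustrative and not part of the original notebook.

```python
from urllib.parse import unquote

def readable_season_type(url_value):
    """Hypothetical helper: turn a URL-encoded season type into a plain label."""
    return unquote(url_value)

labels = [readable_season_type(s) for s in ['Regular%20Season', 'Playoffs']]
print(labels)
```

The encoded string is still needed when building the request URL, so the decoding would apply only to the label written into the DataFrame.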


In [None]:
temp_df3 = pd.concat ((temp_df2,temp_df1), axis = 1)

In [None]:
temp_df3

Unnamed: 0,Year,Season_type,PLAYER_ID,RANK,PLAYER,TEAM_ID,TEAM,GP,MIN,FGM,FGA,FG_PCT,FG3M,FG3A,FG3_PCT,FTM,FTA,FT_PCT,OREB,DREB,REB,AST,STL,BLK,TOV,PTS,EFF
0,2012-2013,Regular%20Season,2546,1,Carmelo Anthony,1610612752,NYK,67,37.0,10.0,22.2,0.449,2.3,6.2,0.379,6.3,7.6,0.830,2.0,4.9,6.9,2.6,0.8,0.5,2.6,28.7,23.2
1,2012-2013,Regular%20Season,201142,2,Kevin Durant,1610612760,OKC,81,38.5,9.0,17.7,0.510,1.7,4.1,0.416,8.4,9.3,0.905,0.6,7.3,7.9,4.6,1.4,1.3,3.5,28.1,30.4
2,2012-2013,Regular%20Season,977,3,Kobe Bryant,1610612747,LAL,78,38.6,9.5,20.4,0.463,1.7,5.2,0.324,6.7,8.0,0.839,0.8,4.7,5.6,6.0,1.4,0.3,3.7,27.3,24.6
3,2012-2013,Regular%20Season,2544,4,LeBron James,1610612748,MIA,76,37.9,10.1,17.8,0.565,1.4,3.3,0.406,5.3,7.0,0.753,1.3,6.8,8.0,7.3,1.7,0.9,3.0,26.8,32.2
4,2012-2013,Regular%20Season,201935,5,James Harden,1610612745,HOU,78,38.3,7.5,17.1,0.438,2.3,6.2,0.368,8.6,10.2,0.851,0.8,4.1,4.9,5.8,1.8,0.5,3.8,25.9,24.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
178,2012-2013,Regular%20Season,203143,179,Pablo Prigioni,1610612752,NYK,78,16.2,1.3,2.8,0.455,0.7,1.7,0.396,0.3,0.3,0.880,0.5,1.3,1.8,3.0,0.9,0.0,1.1,3.5,6.6
179,2012-2013,Regular%20Season,2563,180,Dahntay Jones,1610612737,ATL,78,13.0,1.1,2.8,0.369,0.1,0.6,0.224,1.1,1.4,0.770,0.3,1.0,1.3,0.6,0.3,0.1,0.5,3.4,3.0
180,2012-2013,Regular%20Season,203110,181,Draymond Green,1610612744,GSW,79,13.4,1.1,3.3,0.327,0.2,0.8,0.209,0.6,0.7,0.818,0.7,2.6,3.3,0.7,0.5,0.3,0.6,2.9,4.7
181,2012-2013,Regular%20Season,101236,182,Chuck Hayes,1610612758,SAC,74,16.3,1.1,2.6,0.442,0.0,0.0,0.000,0.4,0.6,0.625,1.5,2.5,4.0,1.5,0.4,0.2,0.6,2.7,6.6


In [None]:
del temp_df1, temp_df2, temp_df3

# 2. Dataframe Creation

Building the dataset from scratch, including more years and the playoffs in the table.
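
Covering more years and both season types means one request per (season, season type) pair. As a sketch, the pairs can be enumerated with `itertools.product`, and the season labels can be generated rather than typed by hand. The helper `season_label` is illustrative, and the range 2012-13 through 2022-23 is assumed to match the seasons this notebook collects.

```python
from itertools import product

def season_label(start_year):
    """Hypothetical helper: format a season like the Season parameter, e.g. 2012 -> '2012-13'."""
    return f'{start_year}-{(start_year + 1) % 100:02d}'

# Seasons 2012-13 through 2022-23 (assumed range)
seasons = [season_label(y) for y in range(2012, 2023)]
season_types = ['Regular Season', 'Playoffs']

# One (season, season type) pair per request
jobs = list(product(seasons, season_types))
print(len(jobs), jobs[0], jobs[-1])
```

Eleven seasons and two season types give 22 pairs, which is the number of requests the scraping loop needs to make.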

In [None]:
df_cols = ['Year', 'Season_type'] + table_headers

In [None]:
#Creating the columns of the DataFrame df
df = pd.DataFrame(columns = df_cols)
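
An empty DataFrame with fixed columns, as created above, can be grown by concatenating one batch at a time. An alternative sketch collects the per-request frames in a list and concatenates once at the end, which avoids copying the accumulated table on every iteration. The function name `combine_frames` and the placeholder rows are illustrative only and are not real statistics.

```python
import pandas as pd

def combine_frames(frames, columns):
    """Hypothetical helper: concatenate per-request frames in a single call."""
    if not frames:
        # Nothing collected: return an empty table with the expected columns
        return pd.DataFrame(columns=columns)
    return pd.concat(frames, axis=0, ignore_index=True)

cols = ['Year', 'Season_type', 'PLAYER', 'PTS']
# Placeholder rows standing in for two scraped batches
frames = [
    pd.DataFrame([['2012-13', 'Regular Season', 'Player A', 1.0]], columns=cols),
    pd.DataFrame([['2012-13', 'Playoffs', 'Player B', 2.0]], columns=cols),
]
combined = combine_frames(frames, cols)
print(combined.shape)
```

`ignore_index=True` gives the combined table a continuous 0..n-1 index. That is a design choice of this sketch, since concatenating without it keeps each batch's own index.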

In [None]:
headers = {
    'Accept':'*/*',
    'Accept-Encoding':'gzip, deflate, br',
    'Accept-Language':'pt-BR,pt;q=0.9',
    'Connection':'keep-alive',
    'Host':'stats.nba.com',
    'Origin':'https://www.nba.com',
    'Referer':'https://www.nba.com/',
    'Sec-Ch-Ua':' "Chromium";v="116", "Not)A;Brand";v="24", "Google Chrome";v="116" ',
    'Sec-Ch-Ua-Mobile':'?0',
    'Sec-Ch-Ua-Platform':"Windows",
    'Sec-Fetch-Dest':'empty',
    'Sec-Fetch-Mode':'cors',
    'Sec-Fetch-Site':'same-site',
    'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36'
}
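
A dictionary like the one above reaches the server only if it is passed through the `headers` argument of the request. The sketch below builds a prepared request without sending anything, so the attached headers can be inspected offline. The `example_headers` dictionary is a shortened stand-in that reuses three of the values listed above.

```python
import requests

url = ('https://stats.nba.com/stats/leagueLeaders?LeagueID=00&PerMode=PerGame'
       '&Scope=S&Season=2012-13&SeasonType=Regular%20Season&StatCategory=PTS')

# Shortened stand-in for the full headers dictionary
example_headers = {
    'Accept': '*/*',
    'Origin': 'https://www.nba.com',
    'Referer': 'https://www.nba.com/',
}

# prepare() builds the request object but does not contact the server
prepared = requests.Request('GET', url, headers=example_headers).prepare()
print(prepared.method, prepared.headers['Referer'])
```

To actually send the request, the same dictionary would go to `requests.get(url, headers=...)`. Adding a `timeout` argument there is a common precaution against a stalled connection.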

In [None]:
# Including more years and the season types

season_types = ['Regular%20Season', 'Playoffs']
years = ['2012-13','2013-14','2014-15','2015-16','2016-17','2017-18','2018-19','2019-20','2020-21','2021-22','2022-23']

begin_loop = time.time()

for y in years:
  for s in season_types:
    api_url = 'https://stats.nba.com/stats/leagueLeaders?LeagueID=00&PerMode=PerGame&Scope=S&Season='+y+'&SeasonType='+s+'&StatCategory=PTS'
    # Send the browser-like headers defined above with each request
    r = requests.get(url=api_url, headers=headers).json()
    temp_df1 = pd.DataFrame(r['resultSet']['rowSet'], columns=table_headers)
    # Label every row with the season and season type it came from
    temp_df2 = pd.DataFrame({'Year': [y for i in range(len(temp_df1))],
                             'Season_type': [s for i in range(len(temp_df1))]})
    temp_df3 = pd.concat((temp_df2, temp_df1), axis=1)
    df = pd.concat([df, temp_df3], axis=0)
    print(f'Finished Scraping Data for the{y} {s}.')
    # Wait a random 5 to 40 seconds so requests are not sent back to back
    lag = np.random.uniform(low=5, high=40)
    print(f' waiting {round(lag,1)} seconds')
    time.sleep(lag)

# Total run time is reported in minutes
print(f'Process completed: Total run time: {round((time.time()- begin_loop)/60,2)}')
df.to_excel('nba_player_data.xlsx', index=False)

Finished Scraping Data for the2012-13 Regular%20Season.
 waiting 19.5 seconds
Finished Scraping Data for the2012-13 Playoffs.
 waiting 20.5 seconds
Finished Scraping Data for the2013-14 Regular%20Season.
 waiting 17.1 seconds
Finished Scraping Data for the2013-14 Playoffs.
 waiting 30.8 seconds
Finished Scraping Data for the2014-15 Regular%20Season.
 waiting 9.7 seconds
Finished Scraping Data for the2014-15 Playoffs.
 waiting 19.1 seconds
Finished Scraping Data for the2015-16 Regular%20Season.
 waiting 32.8 seconds
Finished Scraping Data for the2015-16 Playoffs.
 waiting 16.1 seconds
Finished Scraping Data for the2016-17 Regular%20Season.
 waiting 38.3 seconds
Finished Scraping Data for the2016-17 Playoffs.
 waiting 13.3 seconds
Finished Scraping Data for the2017-18 Regular%20Season.
 waiting 24.0 seconds
Finished Scraping Data for the2017-18 Playoffs.
 waiting 34.0 seconds
Finished Scraping Data for the2018-19 Regular%20Season.
 waiting 13.7 seconds
Finished Scraping Data for the2018-