### NBL Shot Data for the Entire 2016/2017 Season

**Eric Nesi**

**All Code in Python 3**

This notebook scrapes shot data from fibalivestats.com for each game of the 2016/2017 NBL season. Unfortunately, some games are missing data. I will try to find data elsewhere to fill these gaps, but the season may remain incomplete.

In [6]:
import requests
import json
import pandas as pd
pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 500)

In [1]:
nbl_urls = []

#regular season scrape; these game IDs are skipped
skip_ids = {314700, 314702, 314773}

for gm_num in range(314685, 314797):
    if gm_num in skip_ids:
        continue
    nbl_urls.append('http://www.fibalivestats.com/data/' + str(gm_num) + '/data.json')
nbl_urls[0:10]

['http://www.fibalivestats.com/data/314685/data.json',
 'http://www.fibalivestats.com/data/314686/data.json',
 'http://www.fibalivestats.com/data/314687/data.json',
 'http://www.fibalivestats.com/data/314688/data.json',
 'http://www.fibalivestats.com/data/314689/data.json',
 'http://www.fibalivestats.com/data/314690/data.json',
 'http://www.fibalivestats.com/data/314691/data.json',
 'http://www.fibalivestats.com/data/314692/data.json',
 'http://www.fibalivestats.com/data/314693/data.json',
 'http://www.fibalivestats.com/data/314694/data.json']

In [8]:
#playoffs Round 1 scrape; game 574999 is skipped
for gm_num in range(574998, 575003):
    if gm_num == 574999:
        continue
    nbl_urls.append('http://www.fibalivestats.com/data/' + str(gm_num) + '/data.json')

nbl_urls[104:111]

['http://www.fibalivestats.com/data/314792/data.json',
 'http://www.fibalivestats.com/data/314793/data.json',
 'http://www.fibalivestats.com/data/314794/data.json',
 'http://www.fibalivestats.com/data/314795/data.json',
 'http://www.fibalivestats.com/data/314796/data.json',
 'http://www.fibalivestats.com/data/574998/data.json',
 'http://www.fibalivestats.com/data/575000/data.json']

In [9]:
#playoffs Grand Final scrape
for gm_num in range(579821, 579824): 
    nbl_urls.append('http://www.fibalivestats.com/data/' + str(gm_num) + '/data.json')
nbl_urls[109:118]

['http://www.fibalivestats.com/data/574998/data.json',
 'http://www.fibalivestats.com/data/575000/data.json',
 'http://www.fibalivestats.com/data/575001/data.json',
 'http://www.fibalivestats.com/data/575002/data.json',
 'http://www.fibalivestats.com/data/579821/data.json',
 'http://www.fibalivestats.com/data/579822/data.json',
 'http://www.fibalivestats.com/data/579823/data.json']

In [10]:
#check to see if compiled correctly
nbl_urls[110:120]

['http://www.fibalivestats.com/data/575000/data.json',
 'http://www.fibalivestats.com/data/575001/data.json',
 'http://www.fibalivestats.com/data/575002/data.json',
 'http://www.fibalivestats.com/data/579821/data.json',
 'http://www.fibalivestats.com/data/579822/data.json',
 'http://www.fibalivestats.com/data/579823/data.json']
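
Because the cells above extend `nbl_urls` in place, re-running any of them appends duplicate URLs. As a rough sketch of a sanity check, the list can be rebuilt in one pass and its length compared with the expected number of games. The helper `build_urls` and the names `BASE`, `regular`, `round1` and `final` are my own for illustration and are not part of the original notebook; the ID ranges and skipped IDs are the ones used above.

```python
BASE = 'http://www.fibalivestats.com/data/{}/data.json'

def build_urls(start, stop, skip=()):
    """Return data URLs for game IDs in [start, stop), leaving out skipped IDs."""
    return [BASE.format(gm_num) for gm_num in range(start, stop) if gm_num not in skip]

# same ranges and skipped IDs as the cells above
regular = build_urls(314685, 314797, skip={314700, 314702, 314773})
round1 = build_urls(574998, 575003, skip={574999})
final = build_urls(579821, 579824)

all_urls = regular + round1 + final

# 109 regular-season games + 4 Round 1 games + 3 Grand Final games
assert len(all_urls) == 116
# no URL should appear twice
assert len(set(all_urls)) == len(all_urls)
```

If `len(nbl_urls)` in the notebook differs from `len(all_urls)`, a cell was probably run more than once.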

In [11]:
#Used Greg Reda's post as a guide, but wrapped the request in a function so the entire season can be read in at once
#http://www.gregreda.com/2015/02/15/web-scraping-finding-the-api/
#nbl_urls holds the shot-chart data URL for every game in the 2016/2017 season
total_shots = []

def nbl_shot_data(shots_url):
    # request the URL and parse the JSON once
    response = requests.get(shots_url)
    response.raise_for_status()  # raise an exception on an invalid response
    teams = response.json()['tm']
    # append the shots for both teams to the season-wide list
    total_shots.extend(teams['1']['shot'])
    total_shots.extend(teams['2']['shot'])
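
The function above raises a `KeyError` if a game's JSON lacks the `tm` or `shot` keys, and it writes into a global list. A minimal sketch of a more defensive variant follows. It assumes only the JSON layout used above (`tm` -> `'1'`/`'2'` -> `shot`); the function name `extract_shots` and the `sample_game` dictionary are made up for illustration.

```python
def extract_shots(game_json):
    """Return the shots for both teams from one parsed game, or [] if keys are missing."""
    shots = []
    teams = game_json.get('tm', {})
    for team_no in ('1', '2'):
        # a team entry without a 'shot' list contributes nothing
        shots.extend(teams.get(team_no, {}).get('shot', []))
    return shots

# hand-built stand-in for response.json(); not real game data
sample_game = {
    'tm': {
        '1': {'shot': [{'actionType': '2pt', 'player': 'J. McKay', 'r': 0}]},
        '2': {},  # team entry with no 'shot' key
    }
}

sample_shots = extract_shots(sample_game)
print(len(sample_shots))
```

In the notebook this would be called as `total_shots.extend(extract_shots(response.json()))`, so a game with incomplete data adds no rows instead of stopping the loop.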

In [12]:
#run the function for every game URL
for url in nbl_urls:
    nbl_shot_data(url)

In [13]:
#check to see results
total_shots[0:1]

[{'actionType': '2pt',
  'p': 5,
  'per': 1,
  'perType': 'REGULAR',
  'player': 'J. McKay',
  'pno': 5,
  'r': 0,
  'shirtNumber': '5',
  'subType': 'jumpshot',
  'tno': 1,
  'x': 10.018,
  'y': 50.68}]

In [15]:
#create Dataframe
nbl_df = pd.DataFrame.from_records(total_shots)

In [16]:
nbl_df.shape

(15547, 12)

In [17]:
nbl_df = nbl_df[0:15547]

In [18]:
#check it out
nbl_df.tail(5)

Unnamed: 0,actionType,p,per,perType,player,pno,r,shirtNumber,subType,tno,x,y
15542,3pt,7,4,REGULAR,R. Martin,7.0,1,13,,2.0,32.787,47.619
15543,2pt,8,4,REGULAR,R. Clarke,8.0,0,15,jumpshot,2.0,6.74,74.15
15544,2pt,7,4,REGULAR,R. Martin,7.0,0,13,jumpshot,2.0,7.286,50.68
15545,3pt,8,4,REGULAR,R. Clarke,8.0,1,15,,2.0,31.876,78.571
15546,2pt,10,4,REGULAR,M. Harris,10.0,1,23,layup,2.0,4.554,60.204


In [20]:
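#immediately notice that some rows have "number, Name" in the player column and no pno
#create a new column that keeps just the number, using a regular expression
#regex=True is required on pandas 2.0+, where str.replace treats the pattern literally by default
nbl_df['Num'] = nbl_df['player'].str.replace(r'\D+', '', regex=True)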

In [21]:
#fill null values in player number column with new column values
nbl_df.shirtNumber = nbl_df.shirtNumber.fillna(value=nbl_df.Num)

In [22]:
#clean player data to make it easier to parse, replace , and spaces with nothing
nbl_df['player'] = nbl_df['player'].str.replace(',', '')
nbl_df['player'] = nbl_df['player'].str.replace(' ', '')

In [23]:
#replace all digits with nothing, then replace . with .space to get the format First Initial. Last Name
nbl_df['player'] = nbl_df['player'].str.replace(r'\d', '', regex=True)
nbl_df['player'] = nbl_df['player'].str.replace('.', '. ', regex=False)

In [25]:
#drop the helper column, which is no longer needed
nbl_df = nbl_df.drop(columns='Num')
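
To make the cleaning steps above easier to follow, here is a small self-contained sketch that applies the same idea to two toy rows. The rows are invented, and the exact raw format (`'13, R. Martin'`) is an assumption based on the "number, Name" description above. The digit, comma and space removals are combined into a single regular expression here, which is a simplification and not the notebook's exact sequence.

```python
import pandas as pd

# toy rows: one already clean, one in the assumed "number, Name" format with no shirt number
toy = pd.DataFrame({
    'player': ['J. McKay', '13, R. Martin'],
    'shirtNumber': ['5', None],
})

# keep only the digits from the player column ('' when there are none)
num = toy['player'].str.replace(r'\D+', '', regex=True)

# fill missing shirt numbers from the extracted digits
toy['shirtNumber'] = toy['shirtNumber'].fillna(num)

# strip digits, commas and whitespace, then restore the space after the initial
cleaned = toy.copy()
cleaned['player'] = (
    toy['player']
    .str.replace(r'[\d,\s]', '', regex=True)
    .str.replace('.', '. ', regex=False)
)

print(cleaned)
```

Both rows end up in the `First Initial. Last Name` format, and the second row picks up shirt number `13` from the text that preceded the name.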

In [26]:
nbl_df.head()

Unnamed: 0,actionType,p,per,perType,player,pno,r,shirtNumber,subType,tno,x,y
0,2pt,5,1,REGULAR,J. McKay,5.0,0,5,jumpshot,1.0,10.018,50.68
1,2pt,12,1,REGULAR,C. Prather,12.0,0,23,jumpshot,1.0,9.836,56.803
2,2pt,5,1,REGULAR,J. McKay,5.0,1,5,jumpshot,1.0,7.104,54.082
3,2pt,17,1,REGULAR,D. Martin,17.0,1,53,jumpshot,1.0,10.383,36.395
4,2pt,5,1,REGULAR,J. McKay,5.0,1,5,jumpshot,1.0,8.015,56.463


In [27]:
#export to csv
nbl_df.to_csv('/Users/ericnesi/Desktop/capstone_eric/datasets/All_Shots.csv')
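
By default `to_csv` also writes the DataFrame index, which comes back as an `Unnamed: 0` column when the file is read again with `read_csv`. If that extra column is not wanted, passing `index=False` avoids it. The sketch below shows the round trip on an in-memory buffer with one invented row; `sample`, `buffer` and `reloaded` are illustrative names only.

```python
import io
import pandas as pd

# one made-up row standing in for nbl_df
sample = pd.DataFrame({'player': ['J. McKay'], 'x': [10.018], 'y': [50.68]})

# write without the index, then read the same text back
buffer = io.StringIO()
sample.to_csv(buffer, index=False)
buffer.seek(0)
reloaded = pd.read_csv(buffer)

print(list(reloaded.columns))
```

The same keyword works with a file path in place of the buffer.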

#### UPDATE
Found data for two of the missing games on the SpatialJam.com GitHub. I am still missing two games from the season.

### END