# NHL 2009-2018 Draft dataset

This notebook describes the process of retrieving NHL Entry Draft data from the NHL APIs and saving it to .csv files.

# NHL API

The NHL provides two main APIs: the Stats API and the Records API. Some of their endpoints are described below (descriptions taken from [Philip Bulsink](https://gitlab.com/dword4/nhlapi)).

In [1]:
import requests
import pandas as pd
from time import time
import os

## NHL Stats API endpoints

Only some endpoints are presented here; the full description can be found [here](https://gitlab.com/dword4/nhlapi/blob/master/stats-api.md).

* Teams  

**GET** https://statsapi.web.nhl.com/api/v1/teams  
Returns data about all teams, including their id, venue details, division, conference and franchise information.  

**GET** https://statsapi.web.nhl.com/api/v1/teams/ID/roster  
Returns the entire roster for a team, including id value, name, jersey number and position details.

* People

**GET** https://statsapi.web.nhl.com/api/v1/people/ID  
Gets details for a player; the id value must be specified in order to return data.

**GET** https://statsapi.web.nhl.com/api/v1/people/ID/stats  
Complex endpoint with many modifiers that can be appended to select the kind of stats to obtain.

_Modifiers_  
?stats=statsSingleSeason&season=19801981   
Obtains single season statistics for a player  

?stats=homeAndAway&season=20162017  
Provides a split between home and away games.

?stats=winLoss&season=20162017  
Very similar to the previous modifier, except it provides the W/L/OT split instead of home and away.

?stats=byMonth&season=20162017  
Monthly split of stats

?stats=byDayOfWeek&season=20162017  
Split done by day of the week

?stats=vsDivision&season=20162017  
Division stats split

?stats=vsConference&season=20162017  
Conference stats split

?stats=vsTeam&season=20162017  
Stats split by opposing team

?stats=gameLog&season=20162017  
Provides a game log showing stats for each game of a season

?stats=regularSeasonStatRankings&season=20162017   
Returns where a player stands against the rest of the league in regular-season stat rankings.

?stats=goalsByGameSituation&season=20162017  
Shows when a player's goals were scored, for example how many in the shootout, how many in each period, etc.

?stats=onPaceRegularSeason&season=20172018  
This only works with the current in-progress season and shows projected totals based on the player's current pace.
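
The modifiers above are ordinary query-string parameters, so a request URL can be assembled programmatically. The following is a minimal sketch, not part of the original notebook: the helper name `player_stats_url` is hypothetical, and the player id is a placeholder used only for illustration. No request is sent; the code only builds the URL.

```python
from urllib.parse import urlencode

# Base URL copied from the endpoint list above.
STATS_API = "https://statsapi.web.nhl.com/api/v1"


def player_stats_url(player_id, stats, season):
    """Build a /people/ID/stats URL with a stats modifier and a season.

    Hypothetical helper for illustration; it only formats the URL.
    """
    query = urlencode({"stats": stats, "season": season})
    return f"{STATS_API}/people/{player_id}/stats?{query}"


# 8471214 is a placeholder id; substitute the id of the player of interest.
url = player_stats_url(8471214, "gameLog", "20162017")
print(url)
```

The resulting string can be passed to `requests.get` in the same way as the other URLs in this notebook.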

* Draft

**GET** https://statsapi.web.nhl.com/api/v1/draft  
Gets round-by-round data for the current year's NHL Entry Draft.

**GET** https://statsapi.web.nhl.com/api/v1/draft/YEAR  
Takes a year in YYYY format and returns draft data for that year.
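
Since this dataset covers the 2009-2018 drafts, the per-year endpoint could be queried once per year. The sketch below only builds the ten URLs; the helper name `draft_urls` is hypothetical and no request is sent.

```python
# Base URL copied from the endpoint list above.
STATS_API = "https://statsapi.web.nhl.com/api/v1"


def draft_urls(first_year, last_year):
    """Map each draft year (inclusive range) to its /draft/YEAR URL.

    Hypothetical helper for illustration; it only formats the URLs.
    """
    return {
        year: f"{STATS_API}/draft/{year}"
        for year in range(first_year, last_year + 1)
    }


urls = draft_urls(2009, 2018)
for year, url in urls.items():
    print(year, url)
```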

* Prospects

**GET** https://statsapi.web.nhl.com/api/v1/draft/prospects  
Get all NHL Entry Draft prospects.

**GET** https://statsapi.web.nhl.com/api/v1/draft/prospects/ID  
Get an NHL Entry Draft prospect.

## NHL Records API endpoints

Only some endpoints are presented here; the full description can be found [here](https://gitlab.com/dword4/nhlapi/blob/master/records-api.md).

All queries are prefixed with https://records.nhl.com/site/api and are GET
requests unless otherwise noted.

**Filtering**

Filtering works slightly differently than in the Stats API; see the following example:
https://records.nhl.com/site/api/draft?cayenneExp=draftYear=2017%20and%20draftedByTeamId=15

The %20 value is a URL-encoded space. Two or more conditions in the cayenneExp expression
are separated by `and` with a space on each side, and removing those spaces
breaks the query.

Often you can filter by any field returned in an unfiltered query. In
the draft example, appending `and roundNumber=4` to the cayenneExp restricts the results to 4th
round selections.
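
The encoding rule described above can be sketched with the standard library. The helper name `cayenne_url` is hypothetical; the field names (`draftYear`, `draftedByTeamId`, `roundNumber`) are the ones mentioned in this section. The code only builds the URL and sends no request.

```python
from urllib.parse import quote

# Prefix shared by all Records API queries, as stated above.
RECORDS_API = "https://records.nhl.com/site/api"


def cayenne_url(endpoint, **conditions):
    """Build a Records API URL with a cayenneExp filter.

    Conditions are joined with " and "; quote() then encodes each
    space as %20 while leaving "=" untouched (safe="=").
    Hypothetical helper for illustration.
    """
    expr = " and ".join(f"{field}={value}" for field, value in conditions.items())
    return f"{RECORDS_API}/{endpoint}?cayenneExp={quote(expr, safe='=')}"


# Reproduces the example URL from the text.
url = cayenne_url("draft", draftYear=2017, draftedByTeamId=15)
print(url)

# Adding a third condition narrows the result to 4th round selections.
url_round4 = cayenne_url("draft", draftYear=2017, draftedByTeamId=15, roundNumber=4)
print(url_round4)
```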

* Draft

**GET** https://records.nhl.com/site/api/draft  
Returns a large amount of draft data, apparently every pick ever made.

_Filtering_  
?cayenneExp=draftYear=2017  
This filters by a single year.

draftedByTeamId=ID  
Drills down to a specific team's draft picks.


In [3]:
os.chdir('Documents/repos/nhl_draft')
os.listdir()

['.git',
 '.gitattributes',
 '.gitignore',
 '.idea',
 'data',
 'design',
 'main.py',
 'models',
 'notebooks',
 'README.md',
 'requirements.txt']

In [14]:
url = 'https://records.nhl.com/site/api/draft'
save_path = 'data/nhl_api/nhl_draft_all.csv'

t = time()
r = requests.get(url)
json_data = r.json()
elapsed = time() - t
print("JSON received from NHL API."
      "\ntook {0:.2f} seconds."
      .format(elapsed))
t = time()
df_raw = pd.DataFrame(json_data['data'])
elapsed = time() - t
print("----- DataFrame with NHL Draft Data loaded"
      "\nin {0:.2f} seconds".format(elapsed) +
      "\nwith {0:,} rows\nand {1:,} columns"
      .format(df_raw.shape[0], df_raw.shape[1]) +
      "\n-- Column names:\n", df_raw.columns)
df_raw.to_csv(save_path, index=False)
print("saved to file:\n", save_path)

JSON received from NHL API.
took 0.94 seconds.
----- DataFrame with NHL Draft Data loaded
in 0.08 seconds
with 11,587 rows
and 25 columns
-- Column names:
 Index(['amateurClubName', 'amateurLeague', 'birthDate', 'birthPlace',
       'countryCode', 'csPlayerId', 'draftYear', 'draftedByTeamId',
       'firstName', 'height', 'id', 'lastName', 'overallPickNumber',
       'pickInRound', 'playerId', 'playerName', 'position', 'removedOutright',
       'removedOutrightWhy', 'roundNumber', 'shootsCatches',
       'supplementalDraft', 'teamPickHistory', 'triCode', 'weight'],
      dtype='object')
saved to file:
 data/nhl_api/nhl_draft_all.csv
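
The unfiltered response contains every draft year, while this dataset targets 2009-2018. One possible way to narrow the data is to filter on the `draftYear` column, as sketched below. The sample DataFrame is made up for illustration (only the column names come from the output above), and the helper name `select_draft_years` is hypothetical.

```python
import pandas as pd

# Made-up rows standing in for df_raw; only the column names are real.
sample = pd.DataFrame({
    "draftYear": [2008, 2009, 2015, 2018, 2019],
    "overallPickNumber": [1, 1, 1, 1, 1],
    "playerName": ["Player A", "Player B", "Player C", "Player D", "Player E"],
})


def select_draft_years(df, first_year=2009, last_year=2018):
    """Keep only picks whose draftYear lies in the inclusive range."""
    mask = df["draftYear"].between(first_year, last_year)
    return df.loc[mask].reset_index(drop=True)


subset = select_draft_years(sample)
print(subset)
```

Applied to the real data, the call would be `select_draft_years(df_raw)`. This assumes `draftYear` is numeric in the loaded DataFrame.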


## Teams
### Get team info from the NHL Stats API Teams endpoint

In [13]:
url = 'https://statsapi.web.nhl.com/api/v1/teams'
save_path = 'data/nhl_api/teams.csv'

t = time()
r = requests.get(url)
json_data = r.json()
elapsed = time() - t
print("JSON received from NHL API."
      "\ntook {0:.2f} seconds."
      .format(elapsed))
json_data['copyright']

JSON received from NHL API.
took 0.15 seconds.


'NHL and the NHL Shield are registered trademarks of the National Hockey League. NHL and NHL team marks are the property of the NHL and its teams. © NHL 2019. All Rights Reserved.'

### Save results to a .csv file

In [11]:
t = time()
df_raw = pd.DataFrame(json_data['teams'])
elapsed = time() - t
print("----- DataFrame with NHL Teams Data loaded"
      "\nin {0:.2f} seconds".format(elapsed) +
      "\nwith {0:,} rows\nand {1:,} columns"
      .format(df_raw.shape[0], df_raw.shape[1]) +
      "\n-- Column names:\n", df_raw.columns)
df_raw.to_csv(save_path, index=False)
print("saved to file:\n", save_path)

----- DataFrame with NHL Teams Data loaded
in 0.01 seconds
with 31 rows
and 15 columns
-- Column names:
 Index(['abbreviation', 'active', 'conference', 'division', 'firstYearOfPlay',
       'franchise', 'franchiseId', 'id', 'link', 'locationName', 'name',
       'officialSiteUrl', 'shortName', 'teamName', 'venue'],
      dtype='object')
saved to file:
 data/nhl_api/teams.csv
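
Columns such as `venue`, `division`, `conference` and `franchise` hold nested information, which a plain `pd.DataFrame(...)` keeps as one object per cell. If those fields are nested dictionaries (an assumption about the response shape), `pd.json_normalize` can flatten them into separate columns before saving. The record below is made up for illustration.

```python
import pandas as pd

# Made-up record imitating the assumed nested shape of one team entry.
teams = [
    {
        "id": 1,
        "name": "Team A",
        "venue": {"name": "Arena A", "city": "City A"},
        "division": {"id": 18, "name": "Division A"},
    }
]

# Nested keys become prefixed columns such as "venue_city".
flat = pd.json_normalize(teams, sep="_")
print(sorted(flat.columns))
```

With the real response, the equivalent call would be `pd.json_normalize(json_data['teams'], sep="_")`.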


## Prospects
### Get NHL Draft Prospects info from the NHL Stats API Prospects endpoint

In [12]:
url = 'https://statsapi.web.nhl.com/api/v1/draft/prospects'
save_path = 'data/nhl_api/prospects.csv'

t = time()
r = requests.get(url)
json_data = r.json()
elapsed = time() - t
print("JSON received from NHL API."
      "\ntook {0:.2f} seconds."
      .format(elapsed))

JSON received from NHL API.
took 0.63 seconds.


### Save results to a .csv file

In [None]:
t = time()
# Assumes the prospects endpoint returns its records under the 'prospects' key.
df_raw = pd.DataFrame(json_data['prospects'])
elapsed = time() - t
print("----- DataFrame with NHL Draft Prospects Data loaded"
      "\nin {0:.2f} seconds".format(elapsed) + 
      "\nwith {0:,} rows\nand {1:,} columns"
      .format(df_raw.shape[0], df_raw.shape[1]) + 
      "\n-- Column names:\n", df_raw.columns)
df_raw.to_csv(save_path, index=False)
print("saved to file:\n", save_path)
