# NHL 2009-2018 Draft dataset

This notebook describes how NHL draft data was retrieved from the NHL APIs and saved to .csv files.

# NHL API

The NHL provides two main APIs: the Stats API and the Records API. Some of their endpoints are described below (taken from [Philip Bulsink](https://gitlab.com/dword4/nhlapi)).

In [1]:
import requests
import pandas as pd
from time import time
import os

## NHL Stats API endpoints

Only some endpoints are presented here; the full description can be found [here](https://gitlab.com/dword4/nhlapi/blob/master/stats-api.md).

* Teams  

**GET** https://statsapi.web.nhl.com/api/v1/teams  
Returns a list of data about all teams including their id, venue details, division, conference and franchise information.  

**GET** https://statsapi.web.nhl.com/api/v1/teams/ID/roster  
Returns entire roster for a team including id value, name, jersey number and position details.

* People

**GET** https://statsapi.web.nhl.com/api/v1/people/ID  
Gets details for a player; the id value must be specified in order to return data.

**GET** https://statsapi.web.nhl.com/api/v1/people/ID/stats  
Complex endpoint with many modifiers that select which kind of stats to obtain.

_Modifiers_  
?stats=statsSingleSeason&season=19801981   
Obtains single season statistics for a player  

?stats=homeAndAway&season=20162017  
Provides a split between home and away games.

?stats=winLoss&season=20162017  
Very similar to the previous modifier except it provides the W/L/OT split instead of Home and Away

?stats=byMonth&season=20162017  
Monthly split of stats

?stats=byDayOfWeek&season=20162017  
Split done by day of the week

?stats=vsDivision&season=20162017  
Division stats split

?stats=vsConference&season=20162017  
Conference stats split

?stats=vsTeam&season=20162017  
Stats split by opposing team

?stats=gameLog&season=20162017  
Provides a game log showing stats for each game of a season

?stats=regularSeasonStatRankings&season=20162017   
Returns where a player stands versus the rest of the league in the regular-season stat rankings

?stats=goalsByGameSituation&season=20162017  
Shows when a player's goals were scored, e.g. how many in the shootout, how many in each period, etc.

?stats=onPaceRegularSeason&season=20172018  
This only works with the current in-progress season and shows projected totals based on the player's current pace

* Draft

**GET** https://statsapi.web.nhl.com/api/v1/draft  
Get round-by-round data for current year's NHL Entry Draft.

**GET** https://statsapi.web.nhl.com/api/v1/draft/YEAR  
Takes a YYYY format year and returns draft data

* Prospects

**GET** https://statsapi.web.nhl.com/api/v1/draft/prospects  
Get all NHL Entry Draft prospects.

**GET** https://statsapi.web.nhl.com/api/v1/draft/prospects/ID  
Get an NHL Entry Draft prospect.
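
The People stats modifiers and the Draft year endpoint listed above can be assembled programmatically. The following is a minimal sketch: the helper names `people_stats_url` and `draft_url` are hypothetical, and the player id is a placeholder rather than a real lookup.

```python
from urllib.parse import urlencode

STATS_API = "https://statsapi.web.nhl.com/api/v1"


def people_stats_url(player_id, stats, season):
    """Build a People stats URL for one player, stats modifier and season."""
    query = urlencode({"stats": stats, "season": season})
    return f"{STATS_API}/people/{player_id}/stats?{query}"


def draft_url(year):
    """Build the Draft URL for a YYYY-format year."""
    return f"{STATS_API}/draft/{year}"


# 1234567 is a placeholder id used only for illustration.
print(people_stats_url(1234567, "gameLog", "20162017"))

# One URL per draft year covered by this dataset (2009 to 2018 inclusive).
draft_urls = [draft_url(year) for year in range(2009, 2019)]
print(draft_urls[0], draft_urls[-1])
```

Building the query string with `urlencode` keeps the modifier names in one place, which may help when several splits are requested for the same player.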

## NHL Records API endpoints

Only some endpoints are presented here; the full description can be found [here](https://gitlab.com/dword4/nhlapi/blob/master/records-api.md).

All queries are prefixed with https://records.nhl.com/site/api and are GET
requests unless otherwise noted.

**Filtering**

Filtering works slightly differently from the normal NHL API; see the following example:
https://records.nhl.com/site/api/draft?cayenneExp=draftYear=2017%20and%20draftedByTeamId=15

The %20 value translates to a space. This needs to be taken into account, because removing the spaces
will break the query: two or more conditions after cayenneExp must be separated by
" and ", with the spaces encoded as %20.

Often you can filter by any field returned in an unfiltered query. In
the draft example, you can append roundNumber=4 to the cayenneExp to look only at 4th
round selections.

* Draft

**GET** https://records.nhl.com/site/api/draft  
Returns a lot of draft data, apparently every pick ever made

_Filtering_  
?cayenneExp=draftYear=2017  
This filters by a single year.

draftedByTeamId=ID  
Drills down to a specific team's draft picks
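
The cayenneExp filtering described above can be sketched as a small URL builder. The function name `records_url` is hypothetical; the only technique shown is joining conditions with " and " and percent-encoding the spaces.

```python
from urllib.parse import quote

RECORDS_API = "https://records.nhl.com/site/api"


def records_url(endpoint, **conditions):
    """Join conditions with ' and ' and percent-encode the spaces as %20."""
    expr = " and ".join(f"{key}={value}" for key, value in conditions.items())
    if not expr:
        return f"{RECORDS_API}/{endpoint}"
    # safe="=" keeps '=' literal; spaces are encoded as %20.
    return f"{RECORDS_API}/{endpoint}?cayenneExp={quote(expr, safe='=')}"


print(records_url("draft", draftYear=2017, draftedByTeamId=15))
```

Adding `roundNumber=4` as a third keyword argument would append a further ` and roundNumber=4` condition in the same way.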


In [3]:
os.chdir('Documents/repos/nhl_draft')
os.listdir()

['.git',
 '.gitattributes',
 '.gitignore',
 '.idea',
 'data',
 'design',
 'main.py',
 'models',
 'notebooks',
 'README.md',
 'requirements.txt']

In [14]:
url = 'https://records.nhl.com/site/api/draft'
save_path = 'data/nhl_api/nhl_draft_all.csv'

t = time()
r = requests.get(url)
json_data = r.json()
elapsed = time() - t
print("JSON received from NHL API."
      "\ntook {0:.2f} seconds."
      .format(elapsed))
t = time()
df_raw = pd.DataFrame(json_data['data'])
elapsed = time() - t
print("----- DataFrame with NHL Draft Data loaded"
      "\nin {0:.2f} seconds".format(elapsed) + 
      "\nwith {0:,} rows\nand {1:,} columns"
      .format(df_raw.shape[0], df_raw.shape[1]) + 
      "\n-- Column names:\n", df_raw.columns)
df_raw.to_csv(save_path, index=False)
print("saved to file:\n", save_path)

JSON received from NHL API.
took 0.94 seconds.
----- DataFrame with NHL Draft Data loaded
in 0.08 seconds
with 11,587 rows
and 25 columns
-- Column names:
 Index(['amateurClubName', 'amateurLeague', 'birthDate', 'birthPlace',
       'countryCode', 'csPlayerId', 'draftYear', 'draftedByTeamId',
       'firstName', 'height', 'id', 'lastName', 'overallPickNumber',
       'pickInRound', 'playerId', 'playerName', 'position', 'removedOutright',
       'removedOutrightWhy', 'roundNumber', 'shootsCatches',
       'supplementalDraft', 'teamPickHistory', 'triCode', 'weight'],
      dtype='object')
saved to file:
 data/nhl_api/nhl_draft_all.csv
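
Since the dataset targets the 2009-2018 drafts, the full draft table can be narrowed by `draftYear`. This is a minimal sketch on toy rows with made-up values (only the column names come from the output above), and it assumes `draftYear` is stored as a number.

```python
import pandas as pd

# Toy rows with made-up values; only the column names come from the API output.
df_raw = pd.DataFrame({
    "draftYear": [2008, 2009, 2013, 2018, 2019],
    "overallPickNumber": [1, 1, 1, 1, 1],
    "playerName": ["Player A", "Player B", "Player C", "Player D", "Player E"],
})

# Series.between is inclusive on both ends by default.
mask = df_raw["draftYear"].between(2009, 2018)
df_draft = df_raw.loc[mask].reset_index(drop=True)
print(df_draft["draftYear"].tolist())
```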


## Teams
### Get team info from the NHL Stats API Teams endpoint

In [13]:
url = 'https://statsapi.web.nhl.com/api/v1/teams'
save_path = 'data/nhl_api/teams.csv'

t = time()
r = requests.get(url)
json_data = r.json()
elapsed = time() - t
print("JSON received from NHL API."
      "\ntook {0:.2f} seconds."
      .format(elapsed))
json_data['copyright']

JSON received from NHL API.
took 0.15 seconds.


'NHL and the NHL Shield are registered trademarks of the National Hockey League. NHL and NHL team marks are the property of the NHL and its teams. © NHL 2019. All Rights Reserved.'

### Save results to a .csv file

In [11]:
t = time()
df_raw = pd.DataFrame(json_data['teams'])
elapsed = time() - t
print("----- DataFrame with NHL Draft Data loaded"
      "\nin {0:.2f} seconds".format(elapsed) + 
      "\nwith {0:,} rows\nand {1:,} columns"
      .format(df_raw.shape[0], df_raw.shape[1]) + 
      "\n-- Column names:\n", df_raw.columns)
df_raw.to_csv(save_path, index=False)
print("saved to file:\n", save_path)

----- DataFrame with NHL Draft Data loaded
in 0.01 seconds
with 31 rows
and 15 columns
-- Column names:
 Index(['abbreviation', 'active', 'conference', 'division', 'firstYearOfPlay',
       'franchise', 'franchiseId', 'id', 'link', 'locationName', 'name',
       'officialSiteUrl', 'shortName', 'teamName', 'venue'],
      dtype='object')
saved to file:
 data/nhl_api/teams.csv
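
Columns such as `venue`, `division`, `conference` and `franchise` probably hold nested objects rather than scalar values (an assumption based on the endpoint description), and such objects would be written to the .csv file as stringified dictionaries. A possible way to flatten them is `pd.json_normalize`, sketched here on a made-up record:

```python
import pandas as pd

# Made-up record shaped like one entry of json_data['teams'];
# the nested layout is an assumption about the API response.
teams = [
    {
        "id": 1,
        "name": "Example Team",
        "abbreviation": "EXT",
        "venue": {"name": "Example Arena", "city": "Example City"},
        "division": {"id": 10, "name": "Example Division"},
    }
]

# json_normalize expands nested dicts into dotted column names.
df_teams = pd.json_normalize(teams)
print(sorted(df_teams.columns))
```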


## Prospects
### Get NHL Draft Prospects info from the NHL Stats API Prospects endpoint

In [18]:
url = 'https://statsapi.web.nhl.com/api/v1/draft/prospects'
save_path = 'data/nhl_api/prospects.csv'

t = time()
r = requests.get(url)
json_data = r.json()
elapsed = time() - t
print("JSON received from NHL API."
      "\ntook {0:.2f} seconds."
      .format(elapsed))
json_data['copyright']

JSON received from NHL API.
took 0.63 seconds.


'NHL and the NHL Shield are registered trademarks of the National Hockey League. NHL and NHL team marks are the property of the NHL and its teams. © NHL 2019. All Rights Reserved.'

### Save results to a .csv file

In [21]:
t = time()
df_raw = pd.DataFrame(json_data['prospects'])
elapsed = time() - t
print("----- DataFrame with NHL Draft Data loaded"
      "\nin {0:.2f} seconds".format(elapsed) + 
      "\nwith {0:,} rows\nand {1:,} columns"
      .format(df_raw.shape[0], df_raw.shape[1]) + 
      "\n-- Column names:\n", df_raw.columns)
df_raw.to_csv(save_path, index=False)
print("saved to file:\n", save_path)

----- DataFrame with NHL Draft Data loaded
in 0.07 seconds
with 9,632 rows
and 20 columns
-- Column names:
 Index(['amateurLeague', 'amateurTeam', 'birthCity', 'birthCountry',
       'birthDate', 'birthStateProvince', 'draftStatus', 'firstName',
       'fullName', 'height', 'id', 'lastName', 'link', 'nationality',
       'nhlPlayerId', 'primaryPosition', 'prospectCategory', 'ranks',
       'shootsCatches', 'weight'],
      dtype='object')
saved to file:
 data/nhl_api/prospects.csv


## Attendance
### Get arena attendance info from the NHL Records API Attendance endpoint
This endpoint seems to return only limited information; the call to the API may need to be modified.

In [26]:
url = 'https://records.nhl.com/site/api/attendance'
save_path = 'data/nhl_api/attendance.csv'

t = time()
r = requests.get(url)
json_data = r.json()
elapsed = time() - t
print("JSON received from NHL API."
      "\ntook {0:.2f} seconds."
      .format(elapsed))

JSON received from NHL API.
took 0.20 seconds.


### Save results to a .csv file

In [27]:
t = time()
df_raw = pd.DataFrame(json_data['data'])
elapsed = time() - t
print("----- DataFrame with NHL Draft Data loaded"
      "\nin {0:.2f} seconds".format(elapsed) + 
      "\nwith {0:,} rows\nand {1:,} columns"
      .format(df_raw.shape[0], df_raw.shape[1]) + 
      "\n-- Column names:\n", df_raw.columns)
df_raw.to_csv(save_path, index=False)
print("saved to file:\n", save_path)

----- DataFrame with NHL Draft Data loaded
in 0.01 seconds
with 44 rows
and 5 columns
-- Column names:
 Index(['id', 'playoffAttendance', 'regularAttendance', 'seasonId',
       'totalAttendance'],
      dtype='object')
saved to file:
 data/nhl_api/attendance.csv
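
The request, load and save steps repeated in the cells above could be wrapped in one helper. This is only a sketch: `fetch_to_csv` is a hypothetical name, the 30-second timeout is an arbitrary choice, and the `get` parameter exists so the function can be exercised with a stub instead of a live request.

```python
import os

import pandas as pd


def fetch_to_csv(url, key, save_path, get=None):
    """Request JSON from url, load json_data[key] into a DataFrame, save it as CSV."""
    if get is None:
        # Imported lazily so a stub can be passed in without the dependency.
        import requests
        get = requests.get
    response = get(url, timeout=30)
    response.raise_for_status()  # fail early on HTTP errors
    df = pd.DataFrame(response.json()[key])
    folder = os.path.dirname(save_path)
    if folder:
        # Create the target folder if it does not exist yet.
        os.makedirs(folder, exist_ok=True)
    df.to_csv(save_path, index=False)
    return df
```

With this helper, the attendance cell could be reduced to `fetch_to_csv('https://records.nhl.com/site/api/attendance', 'data', 'data/nhl_api/attendance.csv')`.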


## Detailed records
### Get detailed records from the NHL Records API Records endpoint
#### Main records directory with restURLs for each record category

In [35]:
url = 'https://records.nhl.com/site/api/record-detail'
save_path = 'data/nhl_api/records/records_main.csv'

# get data from NHL API
t = time()
r = requests.get(url)
json_data = r.json()
elapsed = time() - t
print("JSON received from NHL API."
      "\ntook {0:.2f} seconds."
      .format(elapsed))

# save results to a .csv file
t = time()
df_records = pd.DataFrame(json_data['data'])
elapsed = time() - t
print("----- DataFrame with NHL Draft Data loaded"
      "\nin {0:.2f} seconds".format(elapsed) + 
      "\nwith {0:,} rows\nand {1:,} columns"
      .format(df_records.shape[0], df_records.shape[1]) + 
      "\n-- Column names:\n", df_records.columns)
df_records.to_csv(save_path, index=False)
print("saved to file:\n", save_path)

JSON received from NHL API.
took 0.25 seconds.
----- DataFrame with NHL Draft Data loaded
in 0.01 seconds
with 657 rows
and 6 columns
-- Column names:
 Index(['description', 'descriptionKey', 'id', 'restUrl', 'sequence',
       'videoId'],
      dtype='object')
saved to file:
 data/nhl_api/records/records_main.csv


In [36]:
cols = ['description', 'descriptionKey', 'restUrl']
df_records[cols]

Unnamed: 0,description,descriptionKey,restUrl
0,"Most Goals, Rookie, Season",most-goals-rookie-one-season,/site/api/skater-regular-season-scoring?cayenn...
1,"Most Goals, Rookie, Game",most-goals-rookie-first-season-one-game,/site/api/skater-first-season-game-scoring?cay...
2,"Most Goals, Rookie, First NHL Game",most-goals-rookie-first-game,/site/api/skater-first-game-scoring?cayenneExp...
3,"Most Assists, Rookie, Season",most-assists-rookie-one-season,/site/api/skater-regular-season-scoring?cayenn...
4,"Most Assists, Rookie, Game",most-assists-rookie-first-season-one-game,/site/api/skater-first-season-game-scoring?cay...
5,"Most Assists, Rookie, First NHL Game",most-assists-rookie-first-game,/site/api/skater-first-game-scoring?cayenneExp...
6,"Most Points, Rookie, Season",most-points-rookie-one-season,/site/api/skater-regular-season-scoring?cayenn...
7,"Most Points, Rookie, Game",most-points-rookie-first-season-one-game,/site/api/skater-first-season-game-scoring?cay...
8,"Most Points, Rookie, First NHL Game",most-points-rookie-first-game,/site/api/skater-first-game-scoring?cayenneExp...
9,"Most Goals, Rookie Defenseman, Season",most-goals-rookie-defenseman-one-season,/site/api/skater-regular-season-scoring?cayenn...


#### Get detailed records using restUrls obtained above

In [37]:
url_records = 'https://records.nhl.com'
line = 0
print(df_records.loc[line, 'description'] + '\n')
url = url_records + df_records.loc[line, 'restUrl']
save_path = 'data/nhl_api/records/' + \
            df_records.loc[line, 'descriptionKey'] + \
            '.csv'

# request data from NHL API
t = time()
r = requests.get(url)
json_data = r.json()
elapsed = time() - t
print("JSON received from NHL API."
      "\ntook {0:.2f} seconds."
      .format(elapsed))

# record results to file
t = time()
df_raw = pd.DataFrame(json_data['data'])
elapsed = time() - t
print("----- DataFrame with NHL Draft Data loaded"
      "\nin {0:.2f} seconds".format(elapsed) + 
      "\nwith {0:,} rows\nand {1:,} columns"
      .format(df_raw.shape[0], df_raw.shape[1]) + 
      "\n-- Column names:\n", df_raw.columns)
df_raw.to_csv(save_path, index=False)
print("saved to file:\n", save_path)

Most Goals, Rookie, Season

JSON received from NHL API.
took 0.49 seconds.
----- DataFrame with NHL Draft Data loaded
in 0.03 seconds
with 3,377 rows
and 35 columns
-- Column names:
 Index(['activePlayer', 'assists', 'assistsPerGpMin20', 'firstGoals',
       'firstName', 'fiveGoalGames', 'fourGoalGames', 'gameWinningGoals',
       'gamesInSchedule', 'gamesPlayed', 'goals', 'goalsPerGpMin20',
       'goalsPerGpMin50', 'id', 'lastName', 'overtimeAssists', 'overtimeGoals',
       'overtimePoints', 'penalties', 'penaltyMinutes', 'playerId', 'points',
       'pointsPerGpMin50', 'positionCode', 'powerPlayGoals', 'rookieFlag',
       'seasonId', 'sevenGoalGames', 'shorthandedGoals', 'shots',
       'sixGoalGames', 'teamAbbrevs', 'teamNames', 'threeGoalGames',
       'threeOrMoreGoalGames'],
      dtype='object')
saved to file:
 data/nhl_api/records/most-goals-rookie-one-season.csv
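
The cell above handles a single row of the records directory (`line = 0`). To cover every category, the URL and save path can be derived for each row in turn. The sketch below uses a hypothetical helper `record_targets` and a toy directory whose `restUrl` value is a placeholder, because the real values are truncated in the table above.

```python
import pandas as pd

URL_RECORDS = "https://records.nhl.com"


def record_targets(df_records, out_dir="data/nhl_api/records"):
    """Yield (description, url, save_path) for each row of the records directory."""
    for row in df_records.itertuples(index=False):
        url = URL_RECORDS + row.restUrl
        save_path = f"{out_dir}/{row.descriptionKey}.csv"
        yield row.description, url, save_path


# Toy directory: the restUrl value is a placeholder, not a real query string.
df_records = pd.DataFrame({
    "description": ["Most Goals, Rookie, Season"],
    "descriptionKey": ["most-goals-rookie-one-season"],
    "restUrl": ["/site/api/example-endpoint?cayenneExp=placeholder"],
})

for description, url, save_path in record_targets(df_records):
    print(description, url, save_path, sep="\n")
```

Each yielded URL can then be requested and saved exactly as in the single-row cell; pausing briefly between requests is a reasonable courtesy when looping over several hundred categories.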


In [39]:
df_raw['seasonId'].value_counts()

20052006    83
20162017    82
19811982    78
19841985    76
19801981    73
19921993    71
20172018    70
19821983    68
20182019    67
20102011    66
20152016    65
20022003    64
19871988    63
20072008    63
20002001    63
19741975    62
19951996    61
20132014    61
20112012    59
19961997    59
20082009    58
19992000    58
19911992    56
19671968    56
19931994    55
19831984    54
19891990    54
19851986    53
19721973    52
20062007    52
            ..
19301931    13
19401941    12
19581959    12
19461947    12
19601961    11
19341935    11
19271928    11
19491950    11
19621963    11
19591960    11
19281929    10
19321933    10
19391940    10
19331934    10
19631964     9
19541955     8
19451946     8
19191920     8
19661967     8
19651966     7
19311932     7
19371938     6
19571958     6
19231924     5
19361937     4
19381939     4
19351936     3
19211922     3
19181919     3
19221923     2
Name: seasonId, Length: 100, dtype: int64

In [42]:
mask = df_raw['seasonId'] == 19181919
cols = ['firstName', 'lastName', 'positionCode', 'seasonId', 'goals', 'assists']
df_raw[mask]

Unnamed: 0,activePlayer,assists,assistsPerGpMin20,firstGoals,firstName,fiveGoalGames,fourGoalGames,gameWinningGoals,gamesInSchedule,gamesPlayed,...,rookieFlag,seasonId,sevenGoalGames,shorthandedGoals,shots,sixGoalGames,teamAbbrevs,teamNames,threeGoalGames,threeOrMoreGoalGames
260,False,6,,2,Odie,0.0,1.0,1,18,18,...,True,19181919,0.0,,,0.0,MTL,Montréal Canadiens,0.0,1.0
1587,False,8,,1,Sprague,,,2,18,18,...,True,19181919,,,,,SEN,Ottawa Senators (1917),,
2401,False,3,,0,Punch,,,1,18,8,...,True,19181919,,,,,SEN,Ottawa Senators (1917),,
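
The `cols` list defined in the cell above is never applied, so all 35 columns are displayed. Passing both the mask and a column list to `.loc` gives a narrower view. The sketch below rebuilds the three rows from columns visible in the output, so its column list differs from the one in the cell.

```python
import pandas as pd

# Values copied from the visible columns of the output above.
df_raw = pd.DataFrame(
    {
        "firstName": ["Odie", "Sprague", "Punch"],
        "seasonId": [19181919, 19181919, 19181919],
        "gamesPlayed": [18, 18, 8],
        "assists": [6, 8, 3],
    },
    index=[260, 1587, 2401],
)

mask = df_raw["seasonId"] == 19181919
cols = ["firstName", "seasonId", "gamesPlayed", "assists"]

# .loc applies the row mask and the column list in one step.
subset = df_raw.loc[mask, cols]
print(subset.sort_values("assists", ascending=False))
```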


In [None]:
df_raw
