# NHL dataset
# Step 1. Obtain data (OSEMN methodology)
# Collecting data from the NHL API

This notebook presents the process of collecting data from the NHL API.

For a detailed description of the OSEMN methodology, the NHL APIs, and their endpoints, refer to the **methodology** folder.

In [1]:
import requests
import pandas as pd
from time import time
import os

In [8]:
data_path = '../../data/nhl_api/'
os.listdir(data_path)

[]

## NHL Draft data for all seasons
Obtained from the NHL Records API Draft endpoint.

In [9]:
url = 'https://records.nhl.com/site/api/draft'
save_path = data_path + 'nhl_draft_all.csv'

t = time()
r = requests.get(url)
json_data = r.json()
elapsed = time() - t
print("JSON received from NHL API."
      "\ntook {0:.2f} seconds."
      .format(elapsed))

# load results into a DataFrame and save to a .csv file
t = time()
df_raw = pd.DataFrame(json_data['data'])
elapsed = time() - t
print("----- DataFrame with NHL Draft Data loaded"
      "\nin {0:.2f} seconds".format(elapsed) + 
      "\nwith {0:,} rows\nand {1:,} columns"
      .format(df_raw.shape[0], df_raw.shape[1]) + 
      "\n-- Column names:\n", df_raw.columns)
df_raw.to_csv(save_path, index=False)
print("saved to file:\n", save_path)

JSON received from NHL API.
took 0.98 seconds.
----- DataFrame with NHL Draft Data loaded
in 0.06 seconds
with 11,588 rows
and 25 columns
-- Column names:
 Index(['amateurClubName', 'amateurLeague', 'birthDate', 'birthPlace',
       'countryCode', 'csPlayerId', 'draftYear', 'draftedByTeamId',
       'firstName', 'height', 'id', 'lastName', 'overallPickNumber',
       'pickInRound', 'playerId', 'playerName', 'position', 'removedOutright',
       'removedOutrightWhy', 'roundNumber', 'shootsCatches',
       'supplementalDraft', 'teamPickHistory', 'triCode', 'weight'],
      dtype='object')
saved to file:
 ../../data/nhl_api/nhl_draft_all.csv
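
The request, load, and save steps in the cell above recur in every section of this notebook, with only the URL, the JSON key, and the file name changing. As a minimal sketch, that pattern could be wrapped in two small helpers. The names `fetch_json` and `payload_to_csv` are hypothetical and not part of the original notebook, and the sample payload is made up so the example runs offline.

```python
import os
import tempfile

import pandas as pd


def fetch_json(url, get, timeout=30):
    """Request `url` and return the decoded JSON body.

    `get` is any callable with the signature of `requests.get`;
    passing it in keeps this helper testable without network access.
    """
    response = get(url, timeout=timeout)
    response.raise_for_status()  # fail loudly on HTTP 4xx/5xx responses
    return response.json()


def payload_to_csv(payload, key, save_path):
    """Load the list stored under `key` into a DataFrame and save it as CSV."""
    df = pd.DataFrame(payload[key])
    df.to_csv(save_path, index=False)
    return df


# Offline demonstration with a made-up payload (not real NHL data).
sample_payload = {"data": [{"id": 1, "draftYear": 2019},
                           {"id": 2, "draftYear": 2019}]}
demo_path = os.path.join(tempfile.mkdtemp(), "demo.csv")
demo_df = payload_to_csv(sample_payload, "data", demo_path)
print(demo_df.shape)
```

In the notebook itself, a call might look like `payload_to_csv(fetch_json(url, requests.get), 'data', save_path)`. The timeout and the `raise_for_status()` check are additions that the original cells do not have.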


## NHL Teams
Obtained from the NHL Stats API Teams endpoint.

In [14]:
url = 'https://statsapi.web.nhl.com/api/v1/teams'
save_path = data_path + 'teams.csv'

t = time()
r = requests.get(url)
json_data = r.json()
elapsed = time() - t
print("JSON received from NHL API."
      "\ntook {0:.2f} seconds."
      .format(elapsed))
print("\nCopyright:\n", json_data['copyright'])

t = time()
df_raw = pd.DataFrame(json_data['teams'])
elapsed = time() - t
print("\n----- DataFrame with NHL Teams Data loaded"
      "\nin {0:.2f} seconds".format(elapsed) + 
      "\nwith {0:,} rows\nand {1:,} columns"
      .format(df_raw.shape[0], df_raw.shape[1]) + 
      "\n-- Column names:\n", df_raw.columns)
df_raw.to_csv(save_path, index=False)
print("saved to file:\n", save_path)

JSON received from NHL API.
took 0.31 seconds.

Copyright:
 NHL and the NHL Shield are registered trademarks of the National Hockey League. NHL and NHL team marks are the property of the NHL and its teams. © NHL 2019. All Rights Reserved.

----- DataFrame with NHL Teams Data loaded
in 0.03 seconds
with 31 rows
and 15 columns
-- Column names:
 Index(['abbreviation', 'active', 'conference', 'division', 'firstYearOfPlay',
       'franchise', 'franchiseId', 'id', 'link', 'locationName', 'name',
       'officialSiteUrl', 'shortName', 'teamName', 'venue'],
      dtype='object')
saved to file:
 ../../data/nhl_api/teams.csv
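
Column names such as `venue`, `division`, `conference`, and `franchise` suggest that each team record carries nested objects, which `pd.DataFrame` would keep as dictionaries inside single cells. That is an assumption about the payload, not something the output above confirms. Under that assumption, `pd.json_normalize` (available as a top-level function since pandas 1.0) is one way to flatten the nesting before saving. The records below are made up for illustration.

```python
import pandas as pd

# Made-up records shaped like the assumed teams payload (not real API output).
sample_teams = [
    {"id": 1, "name": "Team A", "venue": {"name": "Arena A", "city": "City A"}},
    {"id": 2, "name": "Team B", "venue": {"name": "Arena B", "city": "City B"}},
]

nested = pd.DataFrame(sample_teams)     # 'venue' stays a column of dicts
flat = pd.json_normalize(sample_teams)  # nested keys become dotted column names

print(list(nested.columns))
print(list(flat.columns))
```

Flat columns such as `venue.name` survive a round trip through CSV as plain values, whereas a dictionary cell is written out as text.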


## NHL Draft Prospects
Obtained from the NHL Stats API Prospects endpoint.

In [15]:
url = 'https://statsapi.web.nhl.com/api/v1/draft/prospects'
save_path = data_path + 'prospects.csv'

t = time()
r = requests.get(url)
json_data = r.json()
elapsed = time() - t
print("JSON received from NHL API."
      "\ntook {0:.2f} seconds."
      .format(elapsed))
print("\nCopyright:\n", json_data['copyright'])

t = time()
df_raw = pd.DataFrame(json_data['prospects'])
elapsed = time() - t
print("\n----- DataFrame with NHL Prospects data loaded"
      "\nin {0:.2f} seconds".format(elapsed) + 
      "\nwith {0:,} rows\nand {1:,} columns"
      .format(df_raw.shape[0], df_raw.shape[1]) + 
      "\n-- Column names:\n", df_raw.columns)
df_raw.to_csv(save_path, index=False)
print("saved to file:\n", save_path)

JSON received from NHL API.
took 0.81 seconds.

Copyright:
 NHL and the NHL Shield are registered trademarks of the National Hockey League. NHL and NHL team marks are the property of the NHL and its teams. © NHL 2019. All Rights Reserved.

----- DataFrame with NHL Prospects data loaded
in 0.04 seconds
with 9,632 rows
and 20 columns
-- Column names:
 Index(['amateurLeague', 'amateurTeam', 'birthCity', 'birthCountry',
       'birthDate', 'birthStateProvince', 'draftStatus', 'firstName',
       'fullName', 'height', 'id', 'lastName', 'link', 'nationality',
       'nhlPlayerId', 'primaryPosition', 'prospectCategory', 'ranks',
       'shootsCatches', 'weight'],
      dtype='object')
saved to file:
 ../../data/nhl_api/prospects.csv
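
If a column such as `ranks` holds a dictionary per prospect (an assumption; the output above shows only column names), `to_csv` writes each dictionary as its string form, and `read_csv` returns plain strings. The sketch below shows one way to restore such cells after reading the file back. The rows and the keys `midterm` and `finalRank` are made up for illustration.

```python
import ast
import io

import pandas as pd

# Made-up rows; the dict-valued 'ranks' column is an assumption.
df = pd.DataFrame({
    "fullName": ["Player A", "Player B"],
    "ranks": [{"midterm": 3, "finalRank": 2}, None],
})

buffer = io.StringIO()
df.to_csv(buffer, index=False)  # dicts are written as their string form
buffer.seek(0)
restored = pd.read_csv(buffer)  # the 'ranks' cells come back as strings


def parse_cell(value):
    """Turn a stringified dict back into a dict; leave missing values alone."""
    return ast.literal_eval(value) if isinstance(value, str) else value


restored["ranks"] = restored["ranks"].apply(parse_cell)
print(restored.loc[0, "ranks"])
```

`ast.literal_eval` only evaluates Python literals, so it is a safer choice than `eval` for text read from a file.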


## Detailed records
### Get the main records directory with a restUrl for each record category

In [16]:
url = 'https://records.nhl.com/site/api/record-detail'
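save_path = data_path + 'records/records_main.csv'
# to_csv does not create missing folders, so make sure 'records/' exists
os.makedirs(data_path + 'records', exist_ok=True)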

# get data from NHL API
t = time()
r = requests.get(url)
json_data = r.json()
elapsed = time() - t
print("JSON received from NHL API."
      "\ntook {0:.2f} seconds."
      .format(elapsed))

# load results into a DataFrame and save to a .csv file
t = time()
df_records = pd.DataFrame(json_data['data'])
elapsed = time() - t
print("----- DataFrame with the main records directory loaded"
      "\nin {0:.2f} seconds".format(elapsed) + 
      "\nwith {0:,} rows\nand {1:,} columns"
      .format(df_records.shape[0], df_records.shape[1]) + 
      "\n-- Column names:\n", df_records.columns)
df_records.to_csv(save_path, index=False)
print("saved to file:\n", save_path)

JSON received from NHL API.
took 0.38 seconds.
----- DataFrame with the main records directory loaded
in 0.00 seconds
with 657 rows
and 6 columns
-- Column names:
 Index(['description', 'descriptionKey', 'id', 'restUrl', 'sequence',
       'videoId'],
      dtype='object')
saved to file:
 ../../data/nhl_api/records/records_main.csv


### Select relevant columns

In [17]:
cols = ['description', 'descriptionKey', 'restUrl']
df_records[cols].head(10)

| | description | descriptionKey | restUrl |
|---|---|---|---|
| 0 | Most Goals, Rookie, Season | most-goals-rookie-one-season | /site/api/skater-regular-season-scoring?cayenn... |
| 1 | Most Goals, Rookie, Game | most-goals-rookie-first-season-one-game | /site/api/skater-first-season-game-scoring?cay... |
| 2 | Most Goals, Rookie, First NHL Game | most-goals-rookie-first-game | /site/api/skater-first-game-scoring?cayenneExp... |
| 3 | Most Assists, Rookie, Season | most-assists-rookie-one-season | /site/api/skater-regular-season-scoring?cayenn... |
| 4 | Most Assists, Rookie, Game | most-assists-rookie-first-season-one-game | /site/api/skater-first-season-game-scoring?cay... |
| 5 | Most Assists, Rookie, First NHL Game | most-assists-rookie-first-game | /site/api/skater-first-game-scoring?cayenneExp... |
| 6 | Most Points, Rookie, Season | most-points-rookie-one-season | /site/api/skater-regular-season-scoring?cayenn... |
| 7 | Most Points, Rookie, Game | most-points-rookie-first-season-one-game | /site/api/skater-first-season-game-scoring?cay... |
| 8 | Most Points, Rookie, First NHL Game | most-points-rookie-first-game | /site/api/skater-first-game-scoring?cayenneExp... |
| 9 | Most Goals, Rookie Defenseman, Season | most-goals-rookie-defenseman-one-season | /site/api/skater-regular-season-scoring?cayenn... |
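
Several rows in the listing above share the same endpoint path and appear to differ only in the query string. As a rough sketch, the path can be separated from the query with `urlsplit` to count how many records each endpoint serves. The helper name `endpoint_of` is hypothetical, and because the query strings are truncated in the listing, `<filter>` stands in for them here.

```python
from collections import Counter
from urllib.parse import urlsplit

# Placeholder restUrl values: the paths appear in the listing above, but the
# query strings are truncated there, so "<filter>" stands in for them.
sample_rest_urls = [
    "/site/api/skater-regular-season-scoring?cayenneExp=<filter>",
    "/site/api/skater-first-season-game-scoring?cayenneExp=<filter>",
    "/site/api/skater-first-game-scoring?cayenneExp=<filter>",
    "/site/api/skater-regular-season-scoring?cayenneExp=<filter>",
]


def endpoint_of(rest_url):
    """Return the path part of a restUrl, dropping the query string."""
    return urlsplit(rest_url).path


endpoint_counts = Counter(endpoint_of(u) for u in sample_rest_urls)
for endpoint, count in endpoint_counts.most_common():
    print(count, endpoint)
```

With the real data, the equivalent call would be `df_records['restUrl'].map(endpoint_of).value_counts()`.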


### Get detailed records using restUrls obtained above

In [19]:
url_records = 'https://records.nhl.com'
line = 0
print(df_records.loc[line, 'description'] + '\n')
url = url_records + df_records.loc[line, 'restUrl']
save_path = data_path + 'records/' + \
            df_records.loc[line, 'descriptionKey'] + \
            '.csv'

# request data from NHL API
t = time()
r = requests.get(url)
json_data = r.json()
elapsed = time() - t
print("JSON received from NHL API."
      "\ntook {0:.2f} seconds."
      .format(elapsed))

# record results to file
t = time()
df_raw = pd.DataFrame(json_data['data'])
elapsed = time() - t
print("----- DataFrame with {0}".format(df_records.loc[line, 'description']) + ' loaded'
      "\nin {0:.2f} seconds".format(elapsed) + 
      "\nwith {0:,} rows\nand {1:,} columns"
      .format(df_raw.shape[0], df_raw.shape[1]) + 
      "\n-- Column names:\n", df_raw.columns)
df_raw.to_csv(save_path, index=False)
print("saved to file:\n", save_path)

Most Goals, Rookie, Season

JSON received from NHL API.
took 0.60 seconds.
----- DataFrame with Most Goals, Rookie, Season loaded
in 0.02 seconds
with 3,377 rows
and 35 columns
-- Column names:
 Index(['activePlayer', 'assists', 'assistsPerGpMin20', 'firstGoals',
       'firstName', 'fiveGoalGames', 'fourGoalGames', 'gameWinningGoals',
       'gamesInSchedule', 'gamesPlayed', 'goals', 'goalsPerGpMin20',
       'goalsPerGpMin50', 'id', 'lastName', 'overtimeAssists', 'overtimeGoals',
       'overtimePoints', 'penalties', 'penaltyMinutes', 'playerId', 'points',
       'pointsPerGpMin50', 'positionCode', 'powerPlayGoals', 'rookieFlag',
       'seasonId', 'sevenGoalGames', 'shorthandedGoals', 'shots',
       'sixGoalGames', 'teamAbbrevs', 'teamNames', 'threeGoalGames',
       'threeOrMoreGoalGames'],
      dtype='object')
saved to file:
 ../../data/nhl_api/records/most-goals-rookie-one-season.csv
