# Task 4 - Tidy Data

For this task we want to tidy the raw JSON data we fetched from the NHL API. All we need is an instance of the `NHLDataParser` class, which turns the raw data into a clean DataFrame ready for analysis.

In [1]:
from ift6758.data.nhl_data_parser import NHLDataParser

The `NHLDataParser` class holds its own instance of the `NHLDataFetcher` class. If a game isn't found locally on the machine, the parser fetches it and then tidies it. Here is an example using a specific game ID.
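The "use the local copy, otherwise fetch" behaviour described above can be sketched roughly as follows. This is a minimal illustration under assumptions, not the actual `NHLDataParser` or `NHLDataFetcher` code: `load_or_fetch_game`, `fake_fetch`, and the one-JSON-file-per-game layout are all hypothetical, and the fetcher is a stand-in that makes no network call.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable


def load_or_fetch_game(game_id: str, cache_dir: Path,
                       fetch: Callable[[str], dict]) -> dict:
    """Return raw game JSON, reading the local copy when one exists.

    Hypothetical helper: the real parser's caching layout may differ.
    """
    path = cache_dir / f"{game_id}.json"
    if path.exists():
        # Game already on disk: no fetch needed.
        return json.loads(path.read_text())
    # Not found locally: fetch it, then store it for next time.
    data = fetch(game_id)
    cache_dir.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(data))
    return data


# Stand-in fetcher (hypothetical) that records how often it is called.
calls = []


def fake_fetch(game_id: str) -> dict:
    calls.append(game_id)
    return {"id": game_id, "plays": []}


cache = Path(tempfile.mkdtemp())
first = load_or_fetch_game("2016020033", cache, fake_fetch)   # fetched, then cached
second = load_or_fetch_game("2016020033", cache, fake_fetch)  # read from disk
```

Passing the fetcher in as an argument keeps the sketch testable offline; after the two calls, `calls` holds a single entry because the second lookup is served from the cache.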

In [2]:
data_parser = NHLDataParser()
df = data_parser.get_shot_and_goal_pbp_df('2016020033')
df.sample(10) # displays 10 random shot/goal events

| | gameId | timeRemaining | periodNumber | timeInPeriod | isGoal | shotType | emptyNet | xCoord | yCoord | zoneCode | ... | shootingPlayer | goalieInNet | previousEvent | timeDiff | previousEventX | previousEventY | rebound | distanceDiff | shotAngleDiff | speed |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 293 | 2016020033 | 96 | 3 | 18:24 | 1 | wrist | 1 | 4.0 | 7.0 | N | ... | Mats Zuccarello | | hit | 6.0 | -66.0 | -25.0 | 0 | 76.967526 | 0.0 | 12.827921 |
| 214 | 2016020033 | 1077 | 3 | 02:03 | 0 | wrist | 0 | 38.0 | -19.0 | O | ... | David Schlemko | Antti Raanta | faceoff | 33.0 | -68.0 | -21.0 | 0 | 106.018866 | 0.0 | 3.212693 |
| 212 | 2016020033 | 1111 | 3 | 01:29 | 0 | wrist | 0 | -40.0 | -34.0 | O | ... | Kevin Hayes | Martin Jones | stoppage | -1.0 | | | 0 | | 0.0 | 6.105804 |
| 95 | 2016020033 | 1109 | 2 | 01:31 | 0 | slap | 0 | -46.0 | 0.0 | O | ... | Matt Nieto | Antti Raanta | hit | 8.0 | 98.0 | -28.0 | 0 | 146.696967 | 0.0 | 18.337121 |
| 67 | 2016020033 | 198 | 1 | 16:42 | 0 | slap | 0 | 41.0 | -35.0 | O | ... | Brent Burns | Antti Raanta | hit | 34.0 | 95.0 | 34.0 | 0 | 87.618491 | 0.0 | 2.577014 |
| 105 | 2016020033 | 1038 | 2 | 02:42 | 0 | backhand | 0 | 81.0 | -6.0 | O | ... | Derek Stepan | Martin Jones | giveaway | 2.0 | 96.0 | -3.0 | 0 | 15.297059 | 0.0 | 7.648529 |
| 19 | 2016020033 | 904 | 1 | 04:56 | 0 | wrist | 0 | -85.0 | -18.0 | O | ... | Ryan McDonagh | Martin Jones | blocked-shot | 45.0 | 69.0 | 11.0 | 0 | 156.706732 | 0.0 | 3.482372 |
| 275 | 2016020033 | 280 | 3 | 15:20 | 0 | snap | 0 | 80.0 | -5.0 | O | ... | Logan Couture | Antti Raanta | missed-shot | 7.0 | 71.0 | 16.0 | 0 | 22.847319 | 0.0 | 3.263903 |
| 42 | 2016020033 | 520 | 1 | 11:20 | 0 | wrist | 0 | -65.0 | -20.0 | O | ... | Chris Kreider | Martin Jones | hit | 46.0 | -4.0 | 32.0 | 0 | 80.156098 | 0.0 | 1.742524 |
| 15 | 2016020033 | 968 | 1 | 03:52 | 0 | wrist | 0 | 53.0 | 33.0 | O | ... | Logan Couture | Antti Raanta | shot-on-goal | 28.0 | -79.0 | 4.0 | 1 | 135.148067 | 20.709038 | 4.826717 |
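Two of the derived columns in this sample can be checked by hand. For rows that have previous-event coordinates, `distanceDiff` matches the straight-line distance between the shot and the previous event, and `speed` matches `distanceDiff / timeDiff`. The sketch below reproduces that arithmetic. It is inferred from the displayed values, not taken from the parser's source, and `distance_and_speed` is a hypothetical name. Returning NaN for a non-positive `timeDiff` is a choice made for this sketch only; the sample does not show how the parser handles such rows.

```python
import math


def distance_and_speed(x, y, prev_x, prev_y, time_diff):
    """Distance from the previous event and distance covered per second.

    Hypothetical helper inferred from the sample rows; not the parser's code.
    """
    distance = math.hypot(x - prev_x, y - prev_y)
    # Assumption for this sketch: no meaningful speed when time_diff <= 0.
    speed = distance / time_diff if time_diff > 0 else float("nan")
    return distance, speed


# Mats Zuccarello row: shot at (4, 7), previous event at (-66, -25), 6 s earlier.
d1, s1 = distance_and_speed(4.0, 7.0, -66.0, -25.0, 6.0)

# David Schlemko row: shot at (38, -19), previous event at (-68, -21), 33 s earlier.
d2, s2 = distance_and_speed(38.0, -19.0, -68.0, -21.0, 33.0)

print(round(d1, 6), round(s1, 6))
print(round(d2, 6), round(s2, 6))
```

The rounded results agree with the `distanceDiff` and `speed` values shown for those two rows.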


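We can also use the `NHLDataParser` to get tidied shot and goal play-by-play data for an entire season. The run shown here was interrupted (`KeyboardInterrupt`) partway through, so only the progress log appears.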

In [3]:
df = data_parser.get_shot_and_goal_pbp_df_for_season(2016)
df.sample(10)

Fetching regular season games for 2016...
Processing game_id: 2016020001
Processing game_id: 2016020002
Processing game_id: 2016020003
Processing game_id: 2016020004
Processing game_id: 2016020005
Processing game_id: 2016020006
Processing game_id: 2016020007
Processing game_id: 2016020008
Processing game_id: 2016020009
Processing game_id: 2016020010
Processing game_id: 2016020011
Processing game_id: 2016020012
Processing game_id: 2016020013
Processing game_id: 2016020014
Processing game_id: 2016020015
Processing game_id: 2016020016
Processing game_id: 2016020017
Processing game_id: 2016020018
Processing game_id: 2016020019
Processing game_id: 2016020020
Processing game_id: 2016020021
Processing game_id: 2016020022
Processing game_id: 2016020023
Processing game_id: 2016020024
Processing game_id: 2016020025
Processing game_id: 2016020026
Processing game_id: 2016020027
Processing game_id: 2016020028
Processing game_id: 2016020029
Processing game_id: 2016020030
Processing game_id: 20160200

KeyboardInterrupt: 
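The game IDs in the progress log can be read as a 10-digit code: the first four digits give the season's starting year, the next two give the game type, and the last four give the game number. The sketch below splits an ID along those lines. This reading follows the commonly documented NHL game ID convention and is not stated in this notebook; `parse_game_id`, `GameId`, and `GAME_TYPES` are hypothetical names, not part of `ift6758`.

```python
from typing import NamedTuple

# Game-type codes as commonly documented for NHL game IDs (assumption).
GAME_TYPES = {
    "01": "preseason",
    "02": "regular season",
    "03": "playoffs",
    "04": "all-star",
}


class GameId(NamedTuple):
    season: int      # starting year of the season, e.g. 2016 for 2016-17
    game_type: str   # decoded from digits 5-6
    number: int      # game number within the season and type


def parse_game_id(game_id: str) -> GameId:
    """Split a 10-digit game ID into season, game type, and game number."""
    if len(game_id) != 10 or not game_id.isdigit():
        raise ValueError(f"expected a 10-digit game ID, got {game_id!r}")
    return GameId(
        season=int(game_id[:4]),
        game_type=GAME_TYPES.get(game_id[4:6], "unknown"),
        number=int(game_id[6:]),
    )


parsed = parse_game_id("2016020033")
print(parsed)
```

A truncated ID such as the last line of the log (`20160200`) fails the length check and raises `ValueError` instead of being silently misread.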

The same works for a range of seasons:

In [None]:
df = data_parser.get_shot_and_goal_pbp_df_for_seasons(2016, 2018)
df.sample(10)

| | gameId | timeRemaining | periodNumber | timeInPeriod | isGoal | shotType | xCoord | yCoord | shootingTeam | shotDistance | shootingTeamSide | shootingPlayer | goalieInNet |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 250727 | 2018030183 | 06:03 | 3 | 13:57 | 1 | backhand | -83.0 | 0.0 | Golden Knights | 6.0 | 1 | Mark Stone | Martin Jones |
| 252522 | 2018030311 | 17:15 | 1 | 02:45 | 0 | wrist | -31.0 | 5.0 | Hurricanes | 58.215118 | 1 | Justin Faulk | Tuukka Rask |
| 143988 | 2017020987 | 05:53 | 1 | 14:07 | 0 | wrist | -84.0 | -8.0 | Predators | 9.433981 | 1 | Kyle Turris | Cam Talbot |
| 140263 | 2017020930 | 08:13 | 2 | 11:47 | 0 | snap | 64.0 | -19.0 | Maple Leafs | 31.400637 | 0 | Connor Carrick | Jaroslav Halak |
| 190010 | 2018020353 | 04:07 | 2 | 15:53 | 0 | wrist | -81.0 | 5.0 | Penguins | 9.433981 | 1 | Dominik Simon | Joonas Korpisalo |
| 87643 | 2017020112 | 02:38 | 1 | 17:22 | 0 | wrist | -67.0 | -3.0 | Sabres | 22.203603 | 1 | Ryan O'Reilly | Anton Khudobin |
| 24955 | 2016020411 | 04:28 | 1 | 15:32 | 0 | slap | -31.0 | -15.0 | Flyers | 59.908263 | 1 | Shayne Gostisbehere | Kari Lehtonen |
| 194441 | 2018020424 | 17:11 | 3 | 02:49 | 1 | wrist | -82.0 | 6.0 | Golden Knights | 9.219544 | 1 | Cody Eakin | Braden Holtby |
| 22814 | 2016020375 | 03:39 | 3 | 16:21 | 0 | backhand | -75.0 | 3.0 | Ducks | 14.317821 | 1 | Corey Perry | Chad Johnson |
| 236733 | 2018021095 | 18:06 | 1 | 01:54 | 0 | snap | -84.0 | 3.0 | Stars | 5.830952 | 1 | Alexander Radulov | Marc-Andre Fleury |
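The `shotDistance` values in this sample are consistent with the straight-line distance from the shot coordinates to a net centred at `x = -89` or `x = 89`, `y = 0`. The sketch below reproduces that calculation for three of the rows. The net position is inferred from the displayed numbers, not confirmed from the parser's source, and `shot_distance` and `NET_X` are hypothetical names.

```python
import math

# Assumed distance of the goal line from centre ice, in the same units as xCoord.
NET_X = 89.0


def shot_distance(x: float, y: float, net_x: float) -> float:
    """Straight-line distance from a shot at (x, y) to a net at (net_x, 0)."""
    return math.hypot(net_x - x, y)


# Mark Stone row: shooting at the net on the negative-x side.
stone = shot_distance(-83.0, 0.0, -NET_X)

# Justin Faulk row: same side, shot from farther out.
faulk = shot_distance(-31.0, 5.0, -NET_X)

# Connor Carrick row: shooting at the net on the positive-x side.
carrick = shot_distance(64.0, -19.0, NET_X)

print(round(stone, 6), round(faulk, 6), round(carrick, 6))
```

The target net is passed explicitly here because the sample alone does not pin down how `shootingTeamSide` encodes the attacking direction. In these ten rows, side `1` coincides with negative `xCoord` and side `0` with positive `xCoord`.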
