# Task 4 - Tidy Data

For this task, we tidy the raw JSON data fetched from the NHL API into a clean DataFrame ready for analysis. All this requires is an instance of the `NHLDataParser` class.

In [1]:
from ift6758.data.nhl_data_parser import NHLDataParser

The `NHLDataParser` class holds its own instance of the `NHLDataFetcher` class. If a game isn't found locally on the machine, the parser fetches it and then tidies it. Here's an example using a specific game ID.

In [2]:
data_parser = NHLDataParser()
df = data_parser.get_shot_and_goal_pbp_df('2016020033')
df.sample(10) # displays 10 random shot/goal events

| Unnamed: 0 | gameId | timeRemaining | periodNumber | timeInPeriod | isGoal | shotType | xCoord | yCoord | shootingTeam | shotDistance | shootingTeamSide | shootingPlayer | goalieInNet |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 212 | 2016020033 | 18:31 | 3 | 01:29 | 0 | wrist | -40 | -34 | Rangers | 59.64059 | 1 | Kevin Hayes | Martin Jones |
| 191 | 2016020033 | 01:08 | 2 | 18:52 | 0 | wrist | -62 | 29 | Sharks | 39.623226 | 1 | David Schlemko | Antti Raanta |
| 24 | 2016020033 | 14:17 | 1 | 05:43 | 0 | slap | 36 | 31 | Sharks | 61.400326 | 0 | Marc-Edouard Vlasic | Antti Raanta |
| 61 | 2016020033 | 04:42 | 1 | 15:18 | 0 | wrist | -54 | -38 | Rangers | 51.662365 | 1 | Chris Kreider | Martin Jones |
| 280 | 2016020033 | 04:29 | 3 | 15:31 | 1 | tip-in | 78 | -5 | Sharks | 12.083046 | 0 | Joe Pavelski | Antti Raanta |
| 93 | 2016020033 | 18:41 | 2 | 01:19 | 0 | wrist | 83 | -10 | Rangers | 11.661904 | 0 | Mika Zibanejad | Martin Jones |
| 13 | 2016020033 | 16:40 | 1 | 03:20 | 0 | slap | -44 | 10 | Rangers | 46.097722 | 1 | Mika Zibanejad | Martin Jones |
| 305 | 2016020033 | 00:07 | 3 | 19:53 | 1 | wrist | -69 | -7 | Rangers | 21.18962 | 1 | Michael Grabner | |
| 177 | 2016020033 | 03:53 | 2 | 16:07 | 0 | wrist | -43 | 34 | Sharks | 57.201399 | 1 | Joe Pavelski | Antti Raanta |
| 275 | 2016020033 | 04:40 | 3 | 15:20 | 0 | snap | 80 | -5 | Sharks | 10.29563 | 0 | Logan Couture | Antti Raanta |
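
The "use the local copy if present, otherwise fetch" behavior described above can be illustrated with a minimal sketch. This is not the actual `NHLDataParser` or `NHLDataFetcher` code: the function `load_game_json`, the one-file-per-game cache layout, and the stand-in `fake_fetch` are all hypothetical names chosen for illustration.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable


def load_game_json(game_id: str, cache_dir: Path, fetch: Callable[[str], dict]) -> dict:
    """Return raw JSON for a game, calling `fetch` only if no local copy exists.

    Hypothetical helper: the real parser may organize its cache differently.
    """
    cache_file = cache_dir / f"{game_id}.json"
    if cache_file.exists():
        # Cache hit: read the local copy and skip the fetch
        return json.loads(cache_file.read_text(encoding="utf-8"))
    # Cache miss: fetch the game, then store it for later calls
    data = fetch(game_id)
    cache_dir.mkdir(parents=True, exist_ok=True)
    cache_file.write_text(json.dumps(data), encoding="utf-8")
    return data


# Stand-in fetcher that records calls instead of contacting the NHL API
calls = []


def fake_fetch(game_id: str) -> dict:
    calls.append(game_id)
    return {"id": game_id, "plays": []}


with tempfile.TemporaryDirectory() as tmp:
    first = load_game_json("2016020033", Path(tmp), fake_fetch)
    second = load_game_json("2016020033", Path(tmp), fake_fetch)

fetch_count = len(calls)  # the second call is served from the cache
```

Passing the fetcher in as a callable keeps the caching logic testable without network access.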


We can also use the `NHLDataParser` to get tidied shot and goal play-by-play data for an entire season.

In [3]:
df = data_parser.get_shot_and_goal_pbp_df_for_season(2016)
df.sample(10)

| Unnamed: 0 | gameId | timeRemaining | periodNumber | timeInPeriod | isGoal | shotType | xCoord | yCoord | shootingTeam | shotDistance | shootingTeamSide | shootingPlayer | goalieInNet |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 25802 | 2016020425 | 01:45 | 2 | 18:15 | 0 | backhand | -70.0 | -19.0 | Avalanche | 26.870058 | 1 | Carl Soderberg | Antoine Bibeau |
| 74797 | 2016021227 | 05:00 | 5 | 00:00 | 1 | snap | 59.0 | -1.0 | Hurricanes | 30.016662 | 0 | Bryan Bickell | Anthony Stolarz |
| 56397 | 2016020927 | 13:58 | 2 | 06:02 | 0 | slap | -53.0 | 40.0 | Hurricanes | 53.814496 | 1 | Elias Lindholm | Roberto Luongo |
| 55470 | 2016020912 | 11:49 | 3 | 08:11 | 0 | snap | 27.0 | 4.0 | Bruins | 62.128898 | 0 | Brandon Carlo | Kari Lehtonen |
| 10148 | 2016020168 | 13:26 | 1 | 06:34 | 0 | wrist | 40.0 | 24.0 | Panthers | 54.561891 | 0 | Mike Matheson | Braden Holtby |
| 14557 | 2016020241 | 04:26 | 3 | 15:34 | 0 | wrist | 57.0 | -1.0 | Canucks | 32.015621 | 0 | Bo Horvat | Henrik Lundqvist |
| 42936 | 2016020708 | 10:02 | 3 | 09:58 | 0 | tip-in | 74.0 | -14.0 | Kings | 20.518285 | 0 | Dustin Brown | Henrik Lundqvist |
| 17388 | 2016020287 | 05:00 | 5 | 00:00 | 0 | slap | -66.0 | -6.0 | Islanders | 23.769729 | 1 | Johnny Boychuk | Jonathan Bernier |
| 57617 | 2016020947 | 02:41 | 2 | 17:19 | 0 | slap | 35.0 | -2.0 | Hurricanes | 54.037024 | 0 | Justin Faulk | Louis Domingue |
| 47864 | 2016020789 | 19:09 | 3 | 00:51 | 0 | slap | -14.0 | 30.0 | Flyers | 80.777472 | 1 | Radko Gudas | Carter Hutton |
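
As a quick sanity check on the tidied columns, the `shotDistance` values in the sample rows appear consistent with the straight-line distance from (`xCoord`, `yCoord`) to a net on the goal line at x = ±89, with `shootingTeamSide` selecting which net. That reading is inferred from the sample output, not taken from the parser's source, so the constant `NET_X` and the helper `shot_distance` below are assumptions.

```python
import math

# Assumption: the nets sit at x = +89 and x = -89 in the API's coordinate system
NET_X = 89.0


def shot_distance(x: float, y: float, shooting_team_side: int) -> float:
    """Euclidean distance from the shot location to the assumed target net.

    Assumption inferred from the sample rows: side 1 shoots at the net at
    x = -89 and side 0 shoots at the net at x = +89.
    """
    net_x = -NET_X if shooting_team_side == 1 else NET_X
    return math.hypot(net_x - x, y)


# (xCoord, yCoord, shootingTeamSide, shotDistance) copied from the sample output
sample_rows = [
    (59.0, -1.0, 0, 30.016662),
    (-66.0, -6.0, 1, 23.769729),
    (-70.0, -19.0, 1, 26.870058),
]

# Largest gap between the recomputed and the reported distance
max_error = max(abs(shot_distance(x, y, side) - dist) for x, y, side, dist in sample_rows)
```

A small `max_error` suggests the column follows this formula for these rows; it does not prove the parser computes it this way for every event.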


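The parser can also cover a range of seasons. The sample below includes games from 2016, 2017, and 2018, so both endpoints of the range appear to be inclusive: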

In [4]:
df = data_parser.get_shot_and_goal_pbp_df_for_seasons(2016, 2018)
df.sample(10)

| Unnamed: 0 | gameId | timeRemaining | periodNumber | timeInPeriod | isGoal | shotType | xCoord | yCoord | shootingTeam | shotDistance | shootingTeamSide | shootingPlayer | goalieInNet |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 250727 | 2018030183 | 06:03 | 3 | 13:57 | 1 | backhand | -83.0 | 0.0 | Golden Knights | 6.0 | 1 | Mark Stone | Martin Jones |
| 252522 | 2018030311 | 17:15 | 1 | 02:45 | 0 | wrist | -31.0 | 5.0 | Hurricanes | 58.215118 | 1 | Justin Faulk | Tuukka Rask |
| 143988 | 2017020987 | 05:53 | 1 | 14:07 | 0 | wrist | -84.0 | -8.0 | Predators | 9.433981 | 1 | Kyle Turris | Cam Talbot |
| 140263 | 2017020930 | 08:13 | 2 | 11:47 | 0 | snap | 64.0 | -19.0 | Maple Leafs | 31.400637 | 0 | Connor Carrick | Jaroslav Halak |
| 190010 | 2018020353 | 04:07 | 2 | 15:53 | 0 | wrist | -81.0 | 5.0 | Penguins | 9.433981 | 1 | Dominik Simon | Joonas Korpisalo |
| 87643 | 2017020112 | 02:38 | 1 | 17:22 | 0 | wrist | -67.0 | -3.0 | Sabres | 22.203603 | 1 | Ryan O'Reilly | Anton Khudobin |
| 24955 | 2016020411 | 04:28 | 1 | 15:32 | 0 | slap | -31.0 | -15.0 | Flyers | 59.908263 | 1 | Shayne Gostisbehere | Kari Lehtonen |
| 194441 | 2018020424 | 17:11 | 3 | 02:49 | 1 | wrist | -82.0 | 6.0 | Golden Knights | 9.219544 | 1 | Cody Eakin | Braden Holtby |
| 22814 | 2016020375 | 03:39 | 3 | 16:21 | 0 | backhand | -75.0 | 3.0 | Ducks | 14.317821 | 1 | Corey Perry | Chad Johnson |
| 236733 | 2018021095 | 18:06 | 1 | 01:54 | 0 | snap | -84.0 | 3.0 | Stars | 5.830952 | 1 | Alexander Radulov | Marc-Andre Fleury |
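
When working with a multi-season DataFrame, it can help to recover the season and game type from `gameId`. The sketch below assumes the commonly described NHL game ID layout (four digits for the season's starting year, two digits for the game type, four digits for the game number); the helper `split_game_id` and the `GAME_TYPES` mapping are illustrative and not part of `NHLDataParser`.

```python
# Assumed game-type codes in positions 5-6 of the game ID
GAME_TYPES = {
    "01": "preseason",
    "02": "regular season",
    "03": "playoffs",
    "04": "all-star",
}


def split_game_id(game_id) -> tuple:
    """Split a 10-digit game ID into (season start year, game type, game number)."""
    text = str(game_id)
    if len(text) != 10 or not text.isdigit():
        raise ValueError(f"expected a 10-digit game ID, got {game_id!r}")
    season = int(text[:4])
    game_type = GAME_TYPES.get(text[4:6], "unknown")
    game_number = text[6:]
    return season, game_type, game_number


# Two IDs taken from the sample output
playoff_game = split_game_id(2018030183)
regular_game = split_game_id("2016020411")
```

Under this assumption, IDs such as `2018030183` in the sample are playoff games, while those with `02` in the fifth and sixth positions are regular-season games.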
