# Data acquisition

This notebook acquires the data from the API and builds a pandas DataFrame for each dataset. The datasets are then saved to disk for further processing.

## Get raw data from API

The following cell downloads the data from the API and stores the raw dump in the `ift6758/data/storage/dump` directory.

In [None]:
from ift6758.data import fetch_all_seasons_games_data

# Fetching every season can take anywhere from a few minutes to a few hours
fetch_all_seasons_games_data()

Fetching also stores every individual API response in the `ift6758/data/storage/cache` directory.
Once the raw data are stored in the `ift6758/data/storage/dump` directory, you can clear the cache.
Run the following cell to clear the cache.

In [None]:
from ift6758.data import clear_cache

clear_cache()
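
The cache implementation is not shown in this notebook. As a minimal sketch of the idea, assuming each response is JSON and is stored as one file per URL, a disk cache and its cleanup could look like the following. `cached_get`, `clear_cache_dir` and the injected `fetch` callable are hypothetical names for illustration, not part of `ift6758.data`.

```python
import hashlib
import json
import shutil
from pathlib import Path
from typing import Any, Callable


def cached_get(url: str, cache_dir: Path, fetch: Callable[[str], Any]) -> Any:
    """Return the JSON for `url`, calling `fetch` only on a cache miss (hypothetical helper)."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    # One file per URL, named by a hash so any URL maps to a safe filename.
    path = cache_dir / (hashlib.sha256(url.encode("utf-8")).hexdigest() + ".json")
    if path.exists():
        # Cache hit: read the stored response instead of calling the API again.
        return json.loads(path.read_text(encoding="utf-8"))
    data = fetch(url)
    path.write_text(json.dumps(data), encoding="utf-8")
    return data


def clear_cache_dir(cache_dir: Path) -> None:
    """Delete every cached response; other directories (e.g. the dump) are untouched."""
    shutil.rmtree(cache_dir, ignore_errors=True)
```

Passing `fetch` as a parameter keeps the sketch testable without network access; in practice it might be something like `lambda url: requests.get(url).json()`.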

## Load raw data

Now that all the season data are stored in the `ift6758/data/storage/dump` directory, we can load them into Python objects.

In [None]:
from ift6758.data import load_raw_games_data

# Pass a season's first year (e.g. 2020 for the 2020-21 season) to load only that season
data = load_raw_games_data()
print(len(data))
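
The dump's file layout is not shown here. As a minimal sketch, assuming the data come from the NHL play-by-play API and the dump holds one JSON file per game named after its game ID (whose first four digits are the season's first year, e.g. `2020020001`), loading everything or a single season could look like this. `load_dump` and the file layout are hypothetical, not the `ift6758.data` API.

```python
import json
from pathlib import Path
from typing import Optional


def load_dump(dump_dir: Path, season: Optional[int] = None) -> list[dict]:
    """Load every `<game_id>.json` file in `dump_dir`, optionally for one season.

    `season` is the season's first year (e.g. 2020 for 2020-21); it is matched
    against the first four digits of the game ID in each filename.
    """
    games = []
    for path in sorted(dump_dir.glob("*.json")):
        # Skip games whose ID does not start with the requested season year.
        if season is not None and not path.stem.startswith(str(season)):
            continue
        games.append(json.loads(path.read_text(encoding="utf-8")))
    return games
```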

## Load dataframe

Extract event features from the raw data and convert them into a pandas DataFrame.
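
The column names printed below look like snake_case versions of camelCase API fields (`type_desc_key`, `sort_order`, `situation_code`). As a minimal sketch of that kind of flattening, assuming each raw game is a dict with `id`, `season` and a `plays` list whose entries nest event-specific fields under `details`, one flat row per event could be built like this. `camel_to_snake`, `flatten_game_events` and the assumed schema are hypothetical; `load_events_dataframe` may select and rename fields differently (e.g. to produce `period_number`).

```python
import re
from typing import Any


def camel_to_snake(name: str) -> str:
    """Convert camelCase to snake_case, e.g. 'typeDescKey' -> 'type_desc_key'."""
    # Insert '_' before every uppercase letter that is not the first character.
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()


def flatten_game_events(game: dict[str, Any]) -> list[dict[str, Any]]:
    """Return one flat, snake_case row per play, tagged with the game's ID and season."""
    rows = []
    for play in game.get("plays", []):
        row = {"game_id": game["id"], "season": game["season"]}
        for key, value in play.items():
            if key == "details":
                # Event-specific fields (coordinates, shot type, players, ...)
                # are merged into the row without a prefix.
                for detail_key, detail_value in value.items():
                    row[camel_to_snake(detail_key)] = detail_value
            elif isinstance(value, dict):
                # Other nested objects keep their parent name as a prefix.
                for sub_key, sub_value in value.items():
                    row[f"{camel_to_snake(key)}_{camel_to_snake(sub_key)}"] = sub_value
            else:
                row[camel_to_snake(key)] = value
        rows.append(row)
    return rows
```

The rows of all games can then be passed to `pandas.DataFrame` as a list of dicts; fields missing from some events (e.g. `shot_type` on a faceoff) become `NaN`.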

In [2]:
from ift6758.data import load_events_dataframe

# Pass a season's first year (e.g. 2020 for the 2020-21 season) to load only that season
data = load_events_dataframe(2020)
print(data.columns)
print(data.head(10))

Found 647679 events
Index(['game_id', 'season', 'game_type', 'game_date', 'venue',
       'venue_location', 'away_team_id', 'away_team_abbrev', 'away_team_name',
       'home_team_id', 'home_team_abbrev', 'home_team_name', 'event_id',
       'period_number', 'period_type', 'time_in_period', 'time_remaining',
       'situation_code', 'type_code', 'type_desc_key', 'sort_order', 'x_coord',
       'y_coord', 'zone_code', 'shot_type', 'event_owner_team_id',
       'goalie_in_net_id', 'goalie_in_net_name', 'goalie_in_net_team_id',
       'goalie_in_net_position_code', 'shooting_player_id',
       'shooting_player_name', 'shooting_player_team_id',
       'shooting_player_position_code', 'scoring_player_id',
       'scoring_player_name', 'scoring_player_team_id',
       'scoring_player_position_code', 'assist1_player_id',
       'assist1_player_name', 'assist1_player_team_id',
       'assist1_player_position_code', 'assist2_player_id',
       'assist2_player_name', 'assist2_player_team_id',
  