# Data acquisition

This notebook acquires the data from the API and creates a pandas DataFrame for each dataset. The datasets are then saved to disk for further processing.

## Get raw data from API

The cell below fetches the data from the API and stores the dump in the `ift6758/data/storage/dump` directory.


In [None]:
from ift6758.data import fetch_all_seasons_games_data

# This process can take anywhere from a few minutes to a few hours
fetch_all_seasons_games_data()

Fetching also stores every individual API response in the `ift6758/data/storage/cache` directory.
Once the raw data are stored in the `ift6758/data/storage/dump` directory, you can clear the cache.
Run the following cell to clear it.

In [None]:
from ift6758.data import clear_cache

clear_cache()
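
To illustrate why a per-response cache helps during a long download, here is a minimal sketch of a disk cache keyed by URL. It is not the `ift6758.data` implementation: `cached_fetch`, the `fetch` callable, and the hashed file names are all assumptions made for illustration.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def cached_fetch(url, fetch, cache_dir):
    """Return the JSON payload for `url`, reusing a cached copy when present.

    `fetch` is any callable that takes a URL and returns a JSON-serializable
    object (hypothetical; the real package may call the API differently).
    """
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Hash the URL so the cache file name is always filesystem-safe.
    key = hashlib.sha256(url.encode("utf-8")).hexdigest()
    path = cache_dir / f"{key}.json"
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    payload = fetch(url)
    path.write_text(json.dumps(payload), encoding="utf-8")
    return payload


# Demo with a fake fetcher so the sketch runs without network access.
calls = []


def fake_fetch(url):
    calls.append(url)
    return {"url": url, "events": []}


with tempfile.TemporaryDirectory() as tmp:
    first = cached_fetch("https://example.invalid/game/1", fake_fetch, tmp)
    second = cached_fetch("https://example.invalid/game/1", fake_fetch, tmp)

# The second call is served from disk, so the fetcher ran only once.
print(len(calls))
```

With a cache like this, an interrupted download can resume without repeating requests that already succeeded, which is why the cache is only safe to delete once the dump is complete.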


## Load raw data

Now that all the season data are stored in the `ift6758/data/storage/dump` directory, we can load them into Python objects.

In [None]:
from ift6758.data import load_raw_games_data

# You can pass a season number (first year) as argument to load only one season
data = load_raw_games_data() 
print(len(data))
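
As a rough picture of what loading with an optional season filter can look like, the sketch below reads JSON files from a directory. The `load_dump` helper and the one-file-per-season layout (`<first_year>.json`) are assumptions for illustration. The actual dump layout used by `load_raw_games_data` may differ.

```python
import json
import tempfile
from pathlib import Path


def load_dump(dump_dir, season=None):
    """Load games from `dump_dir`; restrict to one season when `season` is set.

    Assumes one `<first_year>.json` file per season, each holding a list of
    games. This layout is hypothetical.
    """
    dump_dir = Path(dump_dir)
    pattern = f"{season}.json" if season is not None else "*.json"
    games = []
    for path in sorted(dump_dir.glob(pattern)):
        games.extend(json.loads(path.read_text(encoding="utf-8")))
    return games


# Build a tiny fake dump so the sketch is runnable on its own.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "2019.json").write_text(json.dumps([{"id": 1}, {"id": 2}]))
    Path(tmp, "2020.json").write_text(json.dumps([{"id": 3}]))
    all_games = load_dump(tmp)
    one_season = load_dump(tmp, 2020)

print(len(all_games), len(one_season))
```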

## Load flattened data

Extract features from the raw data set and convert them into records.


In [None]:
from ift6758.data import load_events_records

# You can pass a season number (first year) as argument to load only one season
data = load_events_records(2020)
print(data[0])
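
To make "flattening" concrete, the following sketch turns one nested game object into a list of flat, one-level records. The `flatten_game` helper and every field name (`events`, `coordinates`, `type`, and so on) are hypothetical. The records returned by `load_events_records` may carry different features.

```python
def flatten_game(game):
    """Turn one nested game dict into a list of flat event records."""
    records = []
    for event in game.get("events", []):
        # Some events carry no coordinates; fall back to an empty dict.
        coords = event.get("coordinates", {})
        records.append({
            "game_id": game["id"],
            "season": game["season"],
            "event_type": event["type"],
            "period": event["period"],
            "x": coords.get("x"),
            "y": coords.get("y"),
        })
    return records


# Hypothetical raw game, shaped only for this illustration.
raw_game = {
    "id": 2020020001,
    "season": 2020,
    "events": [
        {"type": "SHOT", "period": 1, "coordinates": {"x": 61, "y": -12}},
        {"type": "GOAL", "period": 2},
    ],
}

records = flatten_game(raw_game)
print(records[0])
```

Each record repeats the game-level fields, which costs some redundancy but lets every event stand alone as one row.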


## Load dataframe

Extract features from the raw data set and convert them into a pandas DataFrame.

In [None]:
from ift6758.data import load_events_dataframe

# You can pass a season number (first year) as argument to load only one season
data = load_events_dataframe(2020)
print(data.columns)
print(data.head(10))
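
The step from flat records to a DataFrame saved on disk can be sketched as follows. The sample records, column names, and CSV format are assumptions for illustration. The storage format actually used by the `ift6758.data` package is not specified here.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical flat event records.
records = [
    {"game_id": 1, "event_type": "SHOT", "x": 10.0, "y": -5.0},
    {"game_id": 1, "event_type": "GOAL", "x": 80.0, "y": 2.0},
]

# Each dict becomes one row; the keys become the column names.
df = pd.DataFrame.from_records(records)

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "events.csv"
    # index=False keeps the row index out of the file.
    df.to_csv(path, index=False)
    reloaded = pd.read_csv(path)

print(reloaded.columns.tolist())
print(len(reloaded))
```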