## Create a tome for the footsteps data

The general process for making a tome is:

1. Choose a "header tome" as an index.
1. Load the data from a match.
1. Transform that data.
1. Finalize results into a single dataframe.
1. Concatenate that dataframe with the ones from previous matches.
1. Repeat from step 2 until no more matches remain.
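
As a rough illustration of steps 2 through 6, here is a minimal pandas-only sketch. It does not use the tome curator: the `matches` dictionary and the `transform` function are hypothetical stand-ins for the real match data and the real processing functions.

```python
import pandas as pd

# Hypothetical stand-in for per-match data; real data comes from the tome curator.
matches = {
    "match-a": pd.DataFrame({"player_id_fixed": [1, 1, 2]}),
    "match-b": pd.DataFrame({"player_id_fixed": [3, 3, 3, 4]}),
}


def transform(df_match, key):
    """Count rows per player and tag the result with the match key."""
    df = df_match.groupby("player_id_fixed").size().reset_index(name="footsteps")
    df["match_key"] = key
    return df


pages = []
for key, df_match in matches.items():  # steps 2 and 6: load each match in turn
    pages.append(transform(df_match, key))  # steps 3 and 4: transform and finalize

# Step 5: concatenate the per-match dataframes into one.
df_tome = pd.concat(pages, ignore_index=True)
```

In the real workflow the tome maker takes care of the bookkeeping in this loop, so only the transformation has to be written by hand.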


To make a tome, we use the `make_tome` function of the tome curator. This function handles reading the data science files, concatenating dataframes, deciding when to write tome pages and writing them, and keeping track of which matches are included. We only need to provide the name of the tome we are making and the name of the header or subheader tome to serve as the index of matches.

An important (but optional) parameter is `ds_reading_instructions`, which restricts reading to specific channels and columns for each match. This generally provides a drastic speedup because some channels are very large, particularly `player_vector`, `player_status`, and `tick`. Leave those channels out of the instructions whenever you do not need them.
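
The sketch below shows one way to assemble such a list in plain Python. The `build_reading_instructions` helper is hypothetical and not part of the SDK; it only produces dictionaries with the `channel` and `columns` keys that this notebook passes to `make_tome`. A channel mapped to `None` gets no `columns` key, mirroring the `header` entry used in this notebook.

```python
# Channels this tutorial names as especially large.
HEAVY_CHANNELS = {"player_vector", "player_status", "tick"}


def build_reading_instructions(columns_by_channel):
    """Hypothetical helper: build a ds_reading_instructions-style list of dicts."""
    instructions = []
    for channel, columns in columns_by_channel.items():
        entry = {"channel": channel}
        if columns is not None:
            # Only list columns when a subset was requested.
            entry["columns"] = list(columns)
        instructions.append(entry)
    return instructions


instructions = build_reading_instructions(
    {
        "player_footstep": ["player_id_fixed"],
        "player_info": ["player_id_fixed", "commends_friendly", "wins", "rank"],
        "header": None,
    }
)

# Collect any large channels that slipped into the request.
heavy_requested = [i["channel"] for i in instructions if i["channel"] in HEAVY_CHANNELS]
```

Checking `heavy_requested` before starting a long run is a cheap way to notice that a large channel was requested by accident.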

Sometimes we don't want to include data from a match. In that case, pass `None` to the tome maker: the match is acknowledged as included, but the tome's data is left unchanged.
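
A small sketch of how that decision could be isolated in a function follows. The `keep_or_skip` name and the `min_rows` rule are hypothetical; any criterion that returns either the processed data or `None` would do.

```python
def keep_or_skip(rows, min_rows=1):
    """Return rows unchanged, or None when the match should be left out.

    The min_rows threshold is a hypothetical example criterion.
    Works with any object that supports len(), including a dataframe.
    """
    if rows is None or len(rows) < min_rows:
        return None
    return rows


result_full = keep_or_skip([{"player_id_fixed": 1, "footsteps": 20}])
result_empty = keep_or_skip([])
```

Inside the match loop, the return value would then be handed to the tome maker, for example `tomer.concat(keep_or_skip(df_final))`, assuming `concat` is the call that accepts `None`.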

_**Run this notebook as-is.**_

In [None]:
from pureskillgg_makenew_pyskill.notebook import setup_notebook

In [None]:
setup_notebook(silent=True)

In [None]:
# %load ../usual_suspects.py
# pylint: disable=unused-import
import time
import os

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

from pureskillgg_dsdk.tome import create_tome_curator

pd.set_option("display.max_columns", 150)
pd.set_option("display.max_rows", 150)
pd.set_option("display.min_rows", 150)
# pd.set_option('display.float_format', '{:.4f}'.format)

curator = create_tome_curator()


In [None]:
# Import the functions from step 4
from pureskillgg_makenew_pyskill.tutorial import (
    aggregate_footsteps, 
    simplify_player_info,
    get_map_name, 
    assemble_final_df
)

In [None]:
# Initialize our "footsteps_by_rank" tome
footsteps_tome_name = 'footsteps_by_rank.2022-05-15,2022-05-15'
tomer = curator.make_tome(
    footsteps_tome_name,
    ds_reading_instructions=[
        {
            "channel": "player_footstep",
            "columns": ["player_id_fixed"],
        },
        {
            "channel": "player_info",
            "columns": ["player_id_fixed", "commends_friendly", "wins", "rank"],
        },
        {
            "channel": "header",
        },
    ],
)

In [None]:
# Loop through each match and add our processed dataframe to the tome
for data, key in tomer.iterate():
    df_footsteps_total = aggregate_footsteps(data['player_footstep'])
    df_pi_simple = simplify_player_info(data['player_info'])
    map_name = get_map_name(data['header'])
    df_final = assemble_final_df(df_footsteps_total, df_pi_simple, map_name)
    df_final['match_key'] = key
    tomer.concat(df_final)

In [None]:
df = curator.get_dataframe(footsteps_tome_name)

In [None]:
len(df)

In [None]:
df.head()

Advance to the [next notebook](6%20-%20Train%20data%20science%20models.ipynb).