## Explore a tome and train models

A tome consists of four things:

- **The keysets** - A simple list of keys that are included in this tome.
- **The dataframes** - The data.
- **The manifest** - A listing of the available keyset/dataframe files, plus other metadata about the tome.
- **The header** - The target list of keys, fixed when the tome is started. It is generally used only by internal processes, and it is what makes a tome "immutable" once started or finished.
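
To make the four parts concrete, here is a toy, in-memory model of how they relate. Everything in it (`ToyHeader`, `ToyTome`, the `add` and `is_finished` methods, the manifest fields) is hypothetical and is not the `pureskillgg_dsdk` API. It only illustrates the idea that the header fixes the target keys up front, while the keyset, the dataframe rows and the manifest fill in as the tome is built.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class ToyHeader:
    # Hypothetical: the target list of keys, fixed when the tome is started.
    # frozen=True mimics the "immutable once started" behavior.
    target_keys: tuple


@dataclass
class ToyTome:
    # Hypothetical stand-in for a tome's four parts; NOT the pureskillgg_dsdk API.
    header: ToyHeader
    keyset: list = field(default_factory=list)    # keys actually included so far
    rows: list = field(default_factory=list)      # dataframe rows, one dict per row
    manifest: dict = field(default_factory=dict)  # file listing and other metadata

    def add(self, key, key_rows):
        # Only keys named in the header may be added, and each key only once.
        if key not in self.header.target_keys:
            raise ValueError(f"{key!r} is not in the header's target keys")
        if key in self.keyset:
            raise ValueError(f"{key!r} was already added")
        self.keyset.append(key)
        self.rows.extend(key_rows)
        self.manifest["keyset_size"] = len(self.keyset)
        self.manifest["row_count"] = len(self.rows)

    def is_finished(self):
        # Finished once every target key from the header is in the keyset.
        return set(self.keyset) == set(self.header.target_keys)


# Usage with made-up match keys
tome = ToyTome(header=ToyHeader(target_keys=("match-a", "match-b")))
tome.add("match-a", [{"player_id_fixed": 1, "steps": 900}])
print(tome.is_finished())  # False
```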

Because it can be large, the combined dataframe data is saved across separate files called "pages". When making a tome, you can set a maximum (in-memory) size for each page.
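
The page-size limit can be pictured as a greedy split: keep appending rows to the current page until the next row would push it over the budget, then start a new page. The sketch below assumes rows are plain dicts and uses the length of each row's JSON encoding as a rough stand-in for memory size. `paginate` and `max_page_bytes` are hypothetical names, and the real tome writer may measure and split pages differently.

```python
import json


def paginate(rows, max_page_bytes):
    """Greedily split rows into pages whose estimated size stays within max_page_bytes.

    Hypothetical helper: size is estimated from each row's JSON encoding,
    which is only a rough proxy for in-memory dataframe size.
    """
    pages, current, current_bytes = [], [], 0
    for row in rows:
        row_bytes = len(json.dumps(row).encode("utf-8"))
        # Close the current page if this row would exceed the budget
        # (a single oversized row still gets a page of its own).
        if current and current_bytes + row_bytes > max_page_bytes:
            pages.append(current)
            current, current_bytes = [], 0
        current.append(row)
        current_bytes += row_bytes
    if current:
        pages.append(current)
    return pages


rows = [{"steps": s} for s in (900, 1100, 1030, 870, 1200)]
pages = paginate(rows, max_page_bytes=40)
print([len(p) for p in pages])  # [2, 2, 1]
```

A greedy split keeps rows in their original order and never needs to look ahead, at the cost of sometimes leaving a page well under the limit.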

Let's read in that tome and train a model that predicts rank from the number of footsteps, obviously.

_**Run this notebook as-is.**_

In [None]:
from pureskillgg_makenew_pyskill.notebook import setup_notebook

In [None]:
setup_notebook(silent=True)

In [None]:
# %load ../usual_suspects.py
# pylint: disable=unused-import
import time
import os

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

from pureskillgg_dsdk.tome import create_tome_curator

pd.set_option("display.max_columns", 150)
pd.set_option("display.max_rows", 150)
pd.set_option("display.min_rows", 150)
# pd.set_option('display.float_format', '{:.4f}'.format)

curator = create_tome_curator()


In [None]:
footsteps_tome_name = 'footsteps_by_rank.2022-05-15,2022-05-15'
df = curator.get_dataframe(footsteps_tome_name)
keyset = curator.get_keyset(footsteps_tome_name)

In [None]:
print(len(df),len(keyset))

In [None]:
df.head(25)

## Explore the tome

Let's look at footsteps as a function of rank.

In [None]:
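# Remove players with unknown rank (rank 0); .copy() avoids pandas' SettingWithCopyWarning below
df = df[df['rank'] != 0].copy()

# Coarser rank bucket: half the rank, rounded (not used in the plot below)
df['rank_fixed'] = (df['rank'] / 2).round().astype(int)

# Average number of steps for each rank
gb = df[['rank', 'steps']].groupby('rank', as_index=False).mean()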

gb
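
One detail in the cell above: halving the rank and then rounding uses round-half-to-even, both for Python's built-in `round()` and for pandas' `Series.round()`, so odd ranks do not split evenly into buckets. The stdlib sketch below assumes ranks run from 1 to 18 (an assumption; the notebook does not state the range) and compares that rule with a hypothetical `math.ceil` alternative that pairs ranks two at a time.

```python
import math

# Assumption: ranks are integers 1..18; rank 0 ("unknown") was already filtered out.
ranks = range(1, 19)

# Same rule as the cell above: halve, then round ties to the nearest even integer.
halved_round = {r: round(r / 2) for r in ranks}

# Hypothetical alternative that pairs ranks evenly: (1, 2) -> 1, (3, 4) -> 2, ...
halved_ceil = {r: math.ceil(r / 2) for r in ranks}


def bucket_members(mapping):
    # Invert {rank: bucket} into {bucket: [ranks in that bucket]}.
    out = {}
    for rank, bucket in mapping.items():
        out.setdefault(bucket, []).append(rank)
    return out


print(bucket_members(halved_round))
print(bucket_members(halved_ceil))
```

Under that assumed range, the rounding rule gives buckets 2, 4, 6 and 8 three ranks each while every other bucket gets one, which is worth knowing before plotting anything against `rank_fixed`.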

In [None]:
plt.scatter(gb['rank'],gb['steps'])
plt.xlabel('Rank')
plt.ylabel('Average Number of Steps')
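
The plot shows averages, not a trained model yet. A minimal sketch of the simplest possible model, assuming a straight line is good enough, is an ordinary least-squares fit of rank against average steps. `fit_line` is a hypothetical pure-Python helper; in the notebook you could pass it `gb['steps'].tolist()` and `gb['rank'].tolist()`. Fitting on per-rank means ignores the spread between individual players, so treat the result as a rough trend only.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x (pure Python)."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squared deviations and cross-deviations around the means
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values are all identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Toy points lying exactly on y = 2 + 3x (illustration only, not tome data)
intercept, slope = fit_line([0, 1, 2, 3], [2, 5, 8, 11])
print(intercept, slope)  # 2.0 3.0
```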

## Coaching time

Obviously, any player who takes more than 1025 steps in a match is a pro and deserves a 10/10 score. Anyone else is a noob and gets a 1/10 score. As a toy example, we will apply this rule to one player in one match, which is (mostly) how the Coach for PureSkill.gg works.
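
Stripped of the data plumbing, the coaching rule is a single comparison. The sketch below uses hypothetical names `footstep_grade` and `as_ten_point_score`, and assumes the grade is the 0.1/1.0 value returned in this notebook and that multiplying it by ten gives the 1/10 and 10/10 scores described above. Note the boundary: exactly 1025 steps is not *more than* 1025, so it earns the low grade.

```python
FOOTSTEP_THRESHOLD = 1025  # same value as model_parameters['footstep_threshold'] in this notebook


def footstep_grade(step_count, threshold=FOOTSTEP_THRESHOLD):
    # Strictly more than the threshold -> 1.0 (10/10); anything else -> 0.1 (1/10).
    return 1.0 if step_count > threshold else 0.1


def as_ten_point_score(grade):
    # Hypothetical mapping of the 0.1-1.0 grade onto the 1-10 scale used in the text.
    return round(grade * 10)


for steps in (1026, 1025, 400):
    print(steps, as_ten_point_score(footstep_grade(steps)))
```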

In [None]:
# Import the functions from step 4
from pureskillgg_makenew_pyskill.tutorial import (
    aggregate_footsteps, 
    simplify_player_info,
    get_map_name, 
    assemble_final_df
)
model_parameters={
    'footstep_threshold':1025
}

def grade_footsteps(data, player_id_fixed):
    """Return 1.0 if the player's total footsteps exceed the threshold, else 0.1."""
    # Build the per-player footstep table for this match
    df_footsteps_total = aggregate_footsteps(data['player_footstep'])
    df_pi_simple = simplify_player_info(data['player_info'])
    map_name = get_map_name(data['header'])
    df_final = assemble_final_df(df_footsteps_total, df_pi_simple, map_name)
    # Look up this player's total step count
    player_footstep_count = df_final.loc[df_final['player_id_fixed'] == player_id_fixed, 'steps'].iat[0]
    footstep_threshold = model_parameters['footstep_threshold']
    # Strictly more than the threshold earns the top grade (10/10)
    if player_footstep_count > footstep_threshold:
        return 1.0
    return 0.1

# Grab a test match to analyze.
data = curator.get_random_match().get_channels()

In [None]:
# Grade the first player listed in the randomly chosen match
random_player = data['player_info']['player_id_fixed'].iloc[0]
grade = grade_footsteps(data, random_player)
if grade > 0.5:
    print('congrats! you got a good grade.')
else:
    print('whoopsie, get gud kid.')