## Time to do some data science

Before creating a tome, we must decide how to transform the data from each match prior to concatenating it. To do that, we will first explore the data for a single match.

We will investigate the number of footsteps players make as a function of rank, wins, and friendly commends.

Once we had developed the data-processing code, we moved it into functions in `pureskillgg_makenew_pyskill\tutorial_datascience\footsteps_example.py` so that we can import them in the next notebook. This avoids code duplication and will let the PureSkill.gg Coach import these functions in the future!

_**Run this notebook as-is.**_

In [None]:
from pureskillgg_makenew_pyskill.notebook import setup_notebook

In [None]:
setup_notebook(silent=True)

In [None]:
# %load ../usual_suspects.py
# pylint: disable=unused-import
import time
import os

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

from pureskillgg_dsdk.tome import create_tome_curator

pd.set_option("display.max_columns", 150)
pd.set_option("display.max_rows", 150)
pd.set_option("display.min_rows", 150)
# pd.set_option('display.float_format', '{:.4f}'.format)

curator = create_tome_curator()

In [None]:
from pureskillgg_dsdk import DsReaderFs, GameDsLoader

In [None]:
ds_name = os.environ.get('PURESKILLGG_TOME_DS_TYPE')
header_name = os.environ.get('PURESKILLGG_TOME_DEFAULT_HEADER_NAME')
df_header = curator.get_dataframe(header_name)

In [None]:
df_header.head(2)

In [None]:
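# Just grab the first match :)
# Use .iat[0] for positional access so this works even if the index does not start at 0.
full_path = df_header['ds_path'].iat[0]
key = df_header['key'].iat[0]
# Everything before the match key is the root path of the data store.
root_path = full_path.split(key)[0]
manifest_key = os.sep.join([key, ds_name])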
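
To make the path handling above concrete, here is a small sketch on made-up values. The path, key, and dataset name below are hypothetical placeholders, not the real layout of the header tome; the point is only to show what `split(key)[0]` and `os.sep.join` produce.

```python
import os

# Hypothetical values standing in for df_header['ds_path'], df_header['key'],
# and the PURESKILLGG_TOME_DS_TYPE environment variable.
toy_key = os.sep.join(["2021", "match-abc"])
toy_full_path = os.sep.join(["data", "store", toy_key])
toy_ds_name = "csds"

# Splitting on the key keeps everything that comes before its first occurrence.
toy_root_path = toy_full_path.split(toy_key)[0]

# The manifest key is the match key with the dataset name appended.
toy_manifest_key = os.sep.join([toy_key, toy_ds_name])

print(toy_root_path)
print(toy_manifest_key)
```

Note that `str.split` cuts at the first occurrence of the key, so this approach assumes the key does not also appear earlier in the path.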


In [None]:
csds_reader = DsReaderFs(
    root_path=root_path,
    manifest_key=manifest_key,
)

csds_loader = GameDsLoader(reader=csds_reader)

In [None]:
manifest = csds_loader.manifest

In [None]:
for channel in manifest['channels']:
    print(channel['channel'], '-', len(channel['columns']), 'columns')

In [None]:
# No reading instructions are given, so this reads in all channels.
data = csds_loader.get_channels()

In [None]:
data['player_footstep'].head()

In [None]:
df_footsteps_total = (
    data['player_footstep']
    .groupby('player_id_fixed', as_index=False)
    .size()
    .rename(columns={'size':'steps'})
)
df_footsteps_total

In [None]:
pi = data['player_info']
pi.head()

In [None]:
pi_simple = (
    pi[['player_id_fixed', 'commends_friendly', 'wins', 'rank']]
    .groupby('player_id_fixed', as_index=False)
    .max()
)
pi_simple
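
The `groupby(...).max()` step above collapses the table to one row per player. A minimal sketch on toy data, assuming `player_info` can contain more than one row for the same player (the values below are invented for illustration):

```python
import pandas as pd

# Toy stand-in for data['player_info']: player 1 appears twice.
toy_pi = pd.DataFrame(
    {
        "player_id_fixed": [1, 1, 2],
        "commends_friendly": [3, 4, 0],
        "wins": [10, 12, 20],
        "rank": [5, 5, 7],
    }
)

# One row per player, keeping the largest value seen in each column.
toy_simple = toy_pi.groupby("player_id_fixed", as_index=False).max()

# Each player now appears exactly once.
assert toy_simple["player_id_fixed"].is_unique
print(toy_simple)
```

The maximum is taken for each column independently, so a resulting row may combine values that came from different source rows.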

In [None]:
map_name = data['header']['map_name'].iat[0]
print(map_name)

In [None]:
df_final = pd.merge(df_footsteps_total, pi_simple, how='left', on='player_id_fixed')
df_final['map_name'] = map_name
df_final
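
As mentioned at the top of this notebook, the steps above were moved into functions so they can be reused. The sketch below shows one way that refactoring could look. The function names are hypothetical and are not necessarily those used in `footsteps_example.py`; the toy channels at the end only mimic the columns used in this notebook.

```python
import pandas as pd


def count_footsteps(df_footstep):
    """Count footstep events per player (hypothetical helper)."""
    return (
        df_footstep
        .groupby("player_id_fixed", as_index=False)
        .size()
        .rename(columns={"size": "steps"})
    )


def summarize_player_info(df_player_info):
    """Reduce player_info to one row per player (hypothetical helper)."""
    cols = ["player_id_fixed", "commends_friendly", "wins", "rank"]
    return df_player_info[cols].groupby("player_id_fixed", as_index=False).max()


def build_footsteps_frame(channels):
    """Combine step counts, player info, and map name for one match."""
    df = pd.merge(
        count_footsteps(channels["player_footstep"]),
        summarize_player_info(channels["player_info"]),
        how="left",
        on="player_id_fixed",
    )
    df["map_name"] = channels["header"]["map_name"].iat[0]
    return df


# Toy channels with invented values, shaped like the ones loaded above.
toy_channels = {
    "player_footstep": pd.DataFrame({"player_id_fixed": [1, 1, 2]}),
    "player_info": pd.DataFrame(
        {
            "player_id_fixed": [1, 2],
            "commends_friendly": [3, 0],
            "wins": [10, 20],
            "rank": [5, 7],
        }
    ),
    "header": pd.DataFrame({"map_name": ["de_dust2"]}),
}

df_toy = build_footsteps_frame(toy_channels)
print(df_toy)
```

Keeping each step in its own function means the same transformation can be applied to every match before concatenating the results.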