## Time to do some data science

Before creating a tome, we must decide how to transform each match's data prior to concatenation. To do that, we will explore the data for a single match.

We will investigate the number of footsteps players make as a function of rank, wins, and friendly commends.

After developing the data-processing code, we moved it into functions in `pureskillgg_makenew_pyskill\tutorial_datascience\footsteps_example.py` so that we can import them in the next notebook. This avoids code duplication and will let the PureSkill.gg Coach import these functions in the future!

_**Run this notebook as-is.**_

In [None]:
from pureskillgg_makenew_pyskill.notebook import setup_notebook

In [None]:
setup_notebook(silent=True)

In [None]:
# %load ../usual_suspects.py
# pylint: disable=unused-import
import time
import os

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

from pureskillgg_dsdk.tome import create_tome_curator

pd.set_option("display.max_columns", 150)
pd.set_option("display.max_rows", 150)
pd.set_option("display.min_rows", 150)
# pd.set_option('display.float_format', '{:.4f}'.format)

curator = create_tome_curator()

## Read in one match's worth of data

The tome curator also provides a convenient way to grab a single match for exploration. The `get_match_by_index` method returns the DS Loader for that particular match.


In [None]:
# Just grab the first match
match_loader = curator.get_match_by_index(0)

# Get the manifest for these data.
manifest = match_loader.manifest

# Read in all channels (you can read in a subset if you pass in reading_instructions).
data = match_loader.get_channels()

## Explore the CSDS

The CSDS files are rich in data. Feel free to explore them in depth. Here we use the manifest file to see the available channels and how many columns they contain.

In [None]:
for channel in manifest['channels']:
    print(channel['channel'], '-', len(channel['columns']), 'columns')
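
To make the channel listing easier to sort or filter, the same information could be collected into a dataframe. This is a minimal sketch that assumes only the structure used in the loop above (a `channels` list whose entries have `channel` and `columns` keys). The `example_manifest` dict, its column names, and the `summarize_channels` helper are made up for illustration and are not part of the DSDK.

```python
import pandas as pd

# Hypothetical stand-in for match_loader.manifest (names are illustrative only).
example_manifest = {
    "channels": [
        {"channel": "header", "columns": ["map_name"]},
        {"channel": "player_footstep", "columns": ["tick", "player_id_fixed"]},
    ]
}


def summarize_channels(manifest):
    """Return one row per channel with its column count, widest channel first."""
    rows = [
        {"channel": entry["channel"], "n_columns": len(entry["columns"])}
        for entry in manifest["channels"]
    ]
    return pd.DataFrame(rows).sort_values(
        "n_columns", ascending=False, ignore_index=True
    )


channel_summary = summarize_channels(example_manifest)
print(channel_summary)
```

In the notebook, passing `manifest` instead of `example_manifest` should work in the same way, provided the manifest has the shape assumed here.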

## Explore the relevant data and develop the engineering

In [None]:
# Inspect player_footstep dataframe
data['player_footstep'].head()

In [None]:
# Count up footsteps per player
df_footsteps_total = (
    data['player_footstep']
    .groupby('player_id_fixed', as_index=False)
    .size()
    .rename(columns={'size':'steps'})
)
df_footsteps_total
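
To see what the counting step does, here is the same `groupby(...).size()` pattern on a tiny made-up dataframe. The `toy_footsteps` values and the `tick` column are invented for illustration. The sketch assumes pandas 1.1 or newer, where `size()` combined with `as_index=False` returns a dataframe with a `size` column.

```python
import pandas as pd

# Made-up footstep events: one row per footstep.
toy_footsteps = pd.DataFrame(
    {
        "player_id_fixed": [1, 1, 2, 1, 3, 2],
        "tick": [10, 20, 20, 30, 40, 50],
    }
)

# Count rows per player, then rename the generated 'size' column to 'steps'.
toy_counts = (
    toy_footsteps
    .groupby("player_id_fixed", as_index=False)
    .size()
    .rename(columns={"size": "steps"})
)
print(toy_counts)
```

Each footstep event is one row, so the row count per `player_id_fixed` is that player's number of footsteps.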

In [None]:
# Inspect player_info dataframe
pi = data['player_info']
pi.head()

In [None]:
# Reduce player_info to one row per player, keeping the max of each column
pi_simple = (
    pi[['player_id_fixed', 'commends_friendly', 'wins', 'rank']]
    .groupby('player_id_fixed', as_index=False)
    .max()
)
pi_simple

In [None]:
# Get the map name
map_name = data['header']['map_name'].iat[0]
print(map_name)

In [None]:
# Combine the data into a final dataframe
df_final = pd.merge(df_footsteps_total, pi_simple, how='left', on='player_id_fixed')
df_final['map_name'] = map_name
df_final
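
The steps above can be gathered into a single function, in the spirit of the refactoring described at the top of this notebook. This is only a sketch: `build_footsteps_frame` and the toy data are hypothetical, and the actual functions in `footsteps_example.py` may be organized differently.

```python
import pandas as pd


def build_footsteps_frame(data):
    """Hypothetical helper: per-player footsteps joined with player info and map name."""
    # Count footstep events per player.
    steps = (
        data["player_footstep"]
        .groupby("player_id_fixed", as_index=False)
        .size()
        .rename(columns={"size": "steps"})
    )
    # Reduce player_info to one row per player.
    info = (
        data["player_info"][["player_id_fixed", "commends_friendly", "wins", "rank"]]
        .groupby("player_id_fixed", as_index=False)
        .max()
    )
    # A left merge keeps every player that has footsteps.
    result = pd.merge(steps, info, how="left", on="player_id_fixed")
    result["map_name"] = data["header"]["map_name"].iat[0]
    return result


# Made-up channels that mimic the columns used in this notebook.
toy_data = {
    "header": pd.DataFrame({"map_name": ["de_dust2"]}),
    "player_footstep": pd.DataFrame({"player_id_fixed": [1, 1, 2]}),
    "player_info": pd.DataFrame(
        {
            "player_id_fixed": [1, 1, 2],
            "commends_friendly": [3, 4, 0],
            "wins": [100, 101, 50],
            "rank": [10, 10, 7],
        }
    ),
}

toy_final = build_footsteps_frame(toy_data)
print(toy_final)
```

Because the function takes the `data` dict of channels as its only input, it could be applied to each match in turn before concatenating the results.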

Advance to the [next notebook](5%20-%20Create%20tome.ipynb).