## Introduction to tomes

Suppose you want to cluster smoke grenades on dust2. A single match has roughly 100 smokes, which is far too few for clustering. A large enough dataset needs hundreds of thousands or even millions of smokes, so you would have to loop through thousands of matches and read thousands of files. No matter how small the files are, reading that many of them is slow and cumbersome.

The solution is to combine the data from many matches into a *tome*. Once created, a tome lets you read in the data from thousands of matches without having to read in thousands of files.

## Make header tome

To start making tomes, we must first make a *header tome*. This tome contains the paths to all matches that will be considered tome members. The header tome maker uses glob to find your files, reads the header channel from each one, and stitches the results together, so each row corresponds to one match. CSDS files that are not in the main header tome are invisible in subsequent steps. Right now, the file search assumes your files are all nested to the same depth. A more robust CSDS finder is possible; feel free to open a PR.
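
To make the "nested to the same level" assumption concrete, here is a minimal sketch of a fixed-depth glob search using only the standard library. It is not the DSDK's implementation: the helper `find_match_files`, the `.csds` extension, and the reading of `path_depth` as "number of directory levels between the root and the files" are all assumptions made for illustration.

```python
import glob
import os
import tempfile


def find_match_files(root, path_depth, extension="csds"):
    """Hypothetical helper: list files exactly `path_depth` directories below `root`."""
    # One '*' wildcard per directory level, followed by the file name pattern.
    pattern = os.path.join(root, *["*"] * path_depth, f"*.{extension}")
    return sorted(glob.glob(pattern))


# Build a throwaway directory tree with files at two different depths.
with tempfile.TemporaryDirectory() as root:
    for rel in ["2021/01/a.csds", "2021/02/b.csds", "2021/c.csds"]:
        path = os.path.join(root, *rel.split("/"))
        os.makedirs(os.path.dirname(path), exist_ok=True)
        open(path, "w").close()

    # Only the files nested exactly two levels deep are matched.
    found = [
        os.path.relpath(p, root).replace(os.sep, "/")
        for p in find_match_files(root, path_depth=2)
    ]

print(found)
```

In this sketch the file `2021/c.csds` sits one level too shallow, so a depth-2 search never sees it. A search built this way silently skips files stored at a different depth, which is why consistent nesting matters.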

You can also make subheader tomes that filter out some of the header tome rows. An example of a subheader tome that selects only matches on dust2 appears near the end of this notebook. Subheader tomes are useful when exploring certain maps, skill ranges, or any other match information that can be found in the header.

_**Run this notebook as-is.**_

In [None]:
from pureskillgg_makenew_pyskill.notebook import setup_notebook

In [None]:
setup_notebook()

In [None]:
import os
from pureskillgg_dsdk.tome import create_tome_curator 

In [None]:
# The curator is our interface to the tomes
curator = create_tome_curator()

In [None]:
header_loader = curator.get_header_loader()

In [None]:
if not header_loader.exists:
    header_loader = curator.create_header_tome(path_depth=7)

In [None]:
df = header_loader.get_dataframe()
keys = header_loader.get_keyset()
if df is None:
    raise RuntimeError('Something went wrong when making the header.')
print(f'There are {len(df)} matches in the header.')

## Make subheaders too

You might want to analyze players on a specific map, rank, or platform. You can create "subheaders" that are filtered views of the main header. The `create_subheader_tome` method creates a subheader by applying the specified filter to the header tome.
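
To see what such a filter does in isolation, the sketch below applies a selector to a small toy dataframe with plain pandas. It assumes the selector's return value is used as a boolean row mask, which matches the shape of the lambda in this notebook but is not confirmed here. The `rank` column and the `all_of` helper are hypothetical, added only to show how several conditions could be combined.

```python
import pandas as pd

# Toy stand-in for a header dataframe. Only `map_name` appears in this
# notebook; the `rank` column is a hypothetical example.
header = pd.DataFrame({
    "map_name": ["de_dust2", "de_mirage", "de_dust2", "de_inferno"],
    "rank": [5, 12, 15, 9],
})


def map_name_selector(map_name):
    # Return a function that maps a dataframe to a boolean mask.
    return lambda df: df["map_name"] == map_name


def all_of(*selectors):
    """Hypothetical helper: keep a row only if every selector accepts it."""
    def combined(df):
        mask = pd.Series(True, index=df.index)
        for selector in selectors:
            mask &= selector(df)
        return mask
    return combined


# Apply one selector, then a combination of two.
mask = map_name_selector("de_dust2")(header)
dust2 = header[mask]

high_rank_selector = all_of(
    map_name_selector("de_dust2"),
    lambda df: df["rank"] >= 10,
)
high_rank_dust2 = header[high_rank_selector(header)]

print(dust2)
print(high_rank_dust2)
```

Because each selector is just a function from a dataframe to a mask, it can be tested on a small dataframe like this before it is handed to the curator.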

In [None]:
def map_name_selector(map_name):
    # Build a filter that compares each header row's map name to map_name
    return lambda df: df['map_name'] == map_name

subheader_loader = curator.create_subheader_tome('subheader_dust2', map_name_selector('de_dust2'))

In [None]:
df = subheader_loader.get_dataframe()

In [None]:
df.head()