## What are these CSDS files?

In this notebook, you will download the sample CSDS data for 12 matches. CSDS stands for Counter-Strike Data Science. We extract data from each demo file into outputs called channels. Each channel is one of four types:

- Telemetry: Information as a function of time
- Single event: Information for a single event type
- Multi event: Information for several events of a similar type grouped together (for example, bomb events)
- Header: Metadata about the match, such as `map_name` or `tick_rate`

There are 33 channels, and you can see more information about each in the manifest file. All channels are compressed Parquet files; the manifest itself is gzipped JSON. When we read a CSDS file, we generally don't need every channel, so our reader accepts "instructions" that describe which channels and columns to read. An example set of instructions is in step 5.
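Because the manifest is gzipped JSON, it can be inspected with the Python standard library alone. The following is a minimal sketch, not the reader mentioned above: `read_gzipped_json` is a hypothetical helper, the path in the usage comment is a placeholder, and nothing is assumed about the manifest's internal structure.

```python
import gzip
import json


def read_gzipped_json(path):
    """Load a gzip-compressed JSON file and return the parsed object."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return json.load(handle)


# Usage (placeholder path; point it at a manifest in your downloaded data):
# manifest = read_gzipped_json("path/to/manifest.json.gz")
# print(type(manifest))
```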

Each demo goes through our parser and then through a post-parser processor (PPP) that fixes certain issues and extends some channels with information from other channels. For instance, most events have a `player_id` attached to them, but the same person can end up with two different IDs if they disconnect and reconnect to the server. We therefore added `player_id_fixed`, which is tied to the player's Steam ID and does not change when a player disconnects and reconnects.
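To make the idea behind `player_id_fixed` concrete, here is a small sketch in plain Python. It is only an illustration of keying a stable ID on the Steam ID, not the PPP's actual implementation: the `steam_id` key, the event dictionaries, and the choice to reuse the first `player_id` seen are all assumptions.

```python
def add_player_id_fixed(events):
    """Return copies of events with a stable player_id_fixed per steam_id.

    Assumption: each event is a dict with the hypothetical keys
    "player_id" and "steam_id". The first player_id seen for a given
    steam_id is reused as that player's fixed ID.
    """
    fixed_by_steam_id = {}
    result = []
    for event in events:
        fixed = fixed_by_steam_id.setdefault(event["steam_id"], event["player_id"])
        result.append({**event, "player_id_fixed": fixed})
    return result


# Illustrative data: player "A" reconnects and receives a new player_id.
events = [
    {"tick": 100, "player_id": 3, "steam_id": "A"},
    {"tick": 200, "player_id": 4, "steam_id": "B"},
    {"tick": 900, "player_id": 11, "steam_id": "A"},  # after reconnecting
]
fixed_events = add_player_id_fixed(events)
print([e["player_id_fixed"] for e in fixed_events])  # prints [3, 4, 3]
```

Grouping or joining on `player_id_fixed` rather than `player_id` then treats the two sessions of player "A" as one person.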

## Downloading the data

We built in a simple downloader for the sample data. The properties of the sample data are:

- 12 matches
- ~400 files
- ~300 MB
- ~20 minutes to download with the simple downloader

For a larger dataset, you should copy the dataset to your own S3 bucket and download it from there.

Get your `dataset_id` from AWS by following these steps:

- Log in to AWS.
- Subscribe to the [Sample dataset](https://aws.amazon.com/marketplace/pp/prodview-42sep6wtmcfvg) if you have not already.
- Navigate to the AWS Data Exchange.
- Under "My Subscriptions" click "Entitled data".
- Find `PureSkill.gg Competitive CS:GO Sample Data` in the list.
- Expand the dropdown to find `sample-csds-micro` and click on it.
- Expand the "Data set overview" box near the top to reveal the Data set ID.

_**Replace the `dataset_id` parameter below with your particular ID then run the notebook.**_

Note: you can safely stop and restart this downloader.

In [None]:
dataset_id = '27cb2d5cd702007de08d8d08417c9c40'

In [None]:
from pureskillgg_makenew_pyskill.notebook import setup_notebook

In [None]:
# Set environment variables
setup_notebook()

In [None]:
import os
from pureskillgg_dsdk.exchange import download_dataexchange_dataset_revision

In [None]:
# Save to our Data Science collection path
output_path = os.environ.get('PURESKILLGG_TOME_DS_COLLECTION_PATH')
if output_path is None:
    raise RuntimeError('PURESKILLGG_TOME_DS_COLLECTION_PATH is not set')

# Download the data
download_dataexchange_dataset_revision(output_path, dataset_id)
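
Once the download finishes, a quick sanity check is to compare what landed on disk with the sample data properties listed earlier (roughly 400 files and 300 MB). The sketch below uses only the standard library; `summarize_directory` is a hypothetical helper, not part of `pureskillgg_dsdk`, and it assumes the downloaded files live under `output_path` with nothing else stored there.

```python
import os


def summarize_directory(root):
    """Return (file_count, total_bytes) for all files under root."""
    file_count = 0
    total_bytes = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            file_count += 1
            total_bytes += os.path.getsize(os.path.join(dirpath, name))
    return file_count, total_bytes


# Usage (after the download cell has completed):
# count, size = summarize_directory(output_path)
# print(f"{count} files, {size / 1e6:.0f} MB")
```

If the counts come up short, rerunning the download cell is a reasonable first step, since the downloader can be stopped and restarted safely.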