## What are these CSDS files?

In this notebook, you will download the sample CSDS data. CSDS stands for Counter Strike Data Science. We extract data from each demo file into outputs called channels, and each channel is one of four types:

- Telemetry: Information as a function of time
- Single event: Information for a single event type
- Multi event: Information for several events of a similar type grouped together (for example, bomb events)
- Header: Metadata about the match, such as `map_name` or `tick_rate`

There are 33 channels, and the manifest file describes each of them. Every channel is stored as a compressed Parquet file; the manifest itself is gzipped JSON. When we read a CSDS file we rarely need every channel, so our reader accepts "instructions" that describe which channels and columns to read. An example of such instructions appears in step 5.
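As a rough illustration of reading only what you need, here is a minimal sketch using the standard library and pandas. It assumes the manifest is a single gzip-compressed JSON file and that each channel is a standalone Parquet file. The helper names, the file paths you pass in, and the shape of the `instructions` dictionary are hypothetical; they are not the format used by the reader in step 5.

```python
import gzip
import json
from pathlib import Path
from typing import Dict, List, Optional


def load_manifest(manifest_path: Path) -> dict:
    """Read a gzip-compressed JSON manifest into a Python object."""
    with gzip.open(manifest_path, "rt", encoding="utf-8") as f:
        return json.load(f)


def read_channels(
    channel_paths: Dict[str, Path],
    instructions: Dict[str, Optional[List[str]]],
) -> dict:
    """Read only the requested channels, and only the requested columns.

    `instructions` maps a channel name to the list of columns to load
    (None means all columns). This format is hypothetical, for illustration.
    Requires pandas plus a Parquet engine such as pyarrow.
    """
    import pandas as pd  # deferred so load_manifest works without pandas

    return {
        name: pd.read_parquet(channel_paths[name], columns=columns)
        for name, columns in instructions.items()
    }
```

For example, `{"some_channel": ["player_id_fixed"]}` would load one column from one channel (`some_channel` is a placeholder, not a real channel name). Because Parquet is a columnar format, selecting columns at read time lets the engine skip the data for every other column.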

For full details, see the [CSDS specification](https://docs.pureskill.gg/datascience/adx/csgo/csds/spec).

Each demo goes through our parser and then through a post-parser processor (PPP) that fixes certain issues and extends some channels with information from other channels. For instance, most events have a `player_id` attached, but a player who disconnects and reconnects to the server can end up with two different `player_id` values. We therefore added `player_id_fixed`, which is tied to the player's Steam ID and stays the same across disconnects and reconnects.
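To make the idea concrete, here is a minimal sketch of attaching a stable ID to events. It is not the PPP's actual implementation: it assumes you already have a mapping from each per-connection `player_id` to that player's Steam ID, and both the helper name `add_player_id_fixed` and the choice of small sequential integers for the fixed ID are hypothetical.

```python
from typing import Dict, List


def add_player_id_fixed(
    events: List[dict], steam_id_by_player_id: Dict[int, int]
) -> List[dict]:
    """Return copies of `events` with a `player_id_fixed` key added.

    Every `player_id` belonging to the same Steam ID receives the same
    fixed ID, numbered 0, 1, 2, ... in order of first appearance.
    """
    fixed_by_steam_id: Dict[int, int] = {}
    result = []
    for event in events:
        steam_id = steam_id_by_player_id[event["player_id"]]
        # First sighting of this Steam ID gets the next unused fixed ID
        fixed_id = fixed_by_steam_id.setdefault(steam_id, len(fixed_by_steam_id))
        result.append({**event, "player_id_fixed": fixed_id})
    return result
```

A player who reconnects and receives a new `player_id` still maps to the same `player_id_fixed`, because the lookup goes through the Steam ID rather than the connection.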

## Downloading the data

We built a simple downloader for the sample data, which has the following properties:

- 9 matches
- ~300 files
- ~200 MB
- ~45 seconds to download
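Once the download has finished, you can compare what landed on disk with the figures above. The helper below (a hypothetical name, not part of `pureskillgg_makenew_pyskill`) walks a directory and returns its file count and total size in bytes. The listed figures may describe the compressed download rather than the extracted files, and the collection path also holds files such as `LICENSE.pdf` and `README.md`, so expect the numbers to be in the same ballpark rather than exact.

```python
import os
from typing import Tuple


def summarize_directory(root: str) -> Tuple[int, int]:
    """Return (number of files, total size in bytes) under root, recursively."""
    file_count = 0
    total_bytes = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            file_count += 1
            total_bytes += os.path.getsize(os.path.join(dirpath, name))
    return file_count, total_bytes
```

For example, `summarize_directory(os.environ["PURESKILLGG_TOME_DS_COLLECTION_PATH"])` returns a `(count, bytes)` pair; divide the byte count by `1e6` to get megabytes.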

For the main dataset of 60k+ matches, you need to subscribe to the [product on the AWS Data Exchange](https://aws.amazon.com/marketplace/pp/prodview-v3o7zrt6okwmo).

The use license, `LICENSE.pdf`, will be in your `PURESKILLGG_TOME_DS_COLLECTION_PATH` directory. You must agree to its terms to use the data. See `README.md` for more information.

Note: you can safely stop and restart this downloader. On restart it downloads every archive again and overwrites any files that were already extracted.

In [None]:
from pureskillgg_makenew_pyskill.notebook import setup_notebook

In [None]:
# Set environment variables
setup_notebook()

In [None]:
import os
import io
import zipfile
import requests

In [None]:
# Sample CSDS archives (zip files) to download
dataset_sample_urls = [
    "https://d1ewbp317vsrbd.cloudfront.net/a1b80cdb-d15a-4828-b955-f0f42c45109c.zip",
    "https://d1ewbp317vsrbd.cloudfront.net/d39ba8f3-a61f-4bad-8550-42ec3fcf0e67.zip",
    "https://d1ewbp317vsrbd.cloudfront.net/dfc0bf8a-2cd1-4073-9725-c70018f2c2fd.zip",
    "https://d1ewbp317vsrbd.cloudfront.net/f7030153-7383-481a-87eb-450fa5cc408e.zip",
    "https://d1ewbp317vsrbd.cloudfront.net/87303b35-34eb-4590-a058-f4d734987291.zip",
    "https://d1ewbp317vsrbd.cloudfront.net/24321cdc-d788-46b4-8c2d-a7d385cdcd75.zip",
    "https://d1ewbp317vsrbd.cloudfront.net/85e30395-da9a-4f40-b3eb-e173ab42df6d.zip"
]

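# Save into the Data Science collection path
ds_collection_path = os.environ.get('PURESKILLGG_TOME_DS_COLLECTION_PATH')
if not ds_collection_path:
    raise RuntimeError(
        "PURESKILLGG_TOME_DS_COLLECTION_PATH is not set; run setup_notebook() first"
    )

# Create the target directory if it does not exist yet
os.makedirs(ds_collection_path, exist_ok=True)

for url in dataset_sample_urls:
    print("working on", url)
    # Download the whole archive into memory and fail loudly on HTTP errors
    response = requests.get(url, timeout=300)
    response.raise_for_status()
    # Extract the archive into the collection path, closing the zip when done
    with zipfile.ZipFile(io.BytesIO(response.content)) as zip_ref:
        zip_ref.extractall(ds_collection_path)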
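The loop above downloads every archive again after a restart. If you want restarts to skip archives that were already extracted, one option is to write a marker file after each successful extraction. The sketch below assumes marker files named after each archive (a hypothetical convention, not something the sample data or `pureskillgg_makenew_pyskill` provides), and it takes a `fetch` function so the logic can be exercised without the network. The function names are likewise hypothetical.

```python
import io
import os
import zipfile
from typing import Callable, Iterable, List
from urllib.parse import urlparse


def fetch_with_requests(url: str) -> bytes:
    """Download a URL and return its body, raising on HTTP errors."""
    import requests  # only needed for real downloads

    response = requests.get(url, timeout=300)
    response.raise_for_status()
    return response.content


def download_missing(
    urls: Iterable[str],
    dest: str,
    fetch: Callable[[str], bytes] = fetch_with_requests,
) -> List[str]:
    """Download and extract each archive unless its marker file exists.

    Returns the URLs that were actually downloaded on this run.
    """
    os.makedirs(dest, exist_ok=True)
    downloaded = []
    for url in urls:
        archive_name = os.path.basename(urlparse(url).path)
        # Hypothetical marker convention: a hidden ".<archive>.done" file
        marker = os.path.join(dest, f".{archive_name}.done")
        if os.path.exists(marker):
            continue  # already extracted on a previous run
        with zipfile.ZipFile(io.BytesIO(fetch(url))) as archive:
            archive.extractall(dest)
        # Write the marker only after extraction succeeds, so an
        # interrupted run retries this archive next time
        with open(marker, "w"):
            pass
        downloaded.append(url)
    return downloaded
```

Calling `download_missing(dataset_sample_urls, ds_collection_path)` would then fetch only the archives without a marker. The markers sit alongside the data in the collection path; delete them to force a fresh download.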

Advance to the [next notebook](3%20-%20Make%20header%20tome.ipynb).