------------

**FPS Critic Inc., makers of PureSkill.gg, is not liable for any AWS costs you incur. Run this notebook only if you understand and accept the AWS billing implications.**

------------

## Getting data from the ADX

This notebook helps you download data from the main data set on the AWS Data Exchange.

Here are definitions of some terms used for data on the AWS Data Exchange (ADX):
- Data Exchange: AWS service that hosts the data.
- Data Product: Our listing on the ADX, through which we publish the csds data.
- Data Set: A container for one type of data.
- Revision: For the csds data, each revision contains one day of data, so you can choose exactly which date range (and therefore how much data) to export.

This notebook shows how to transfer a set of revisions from the ADX to a bucket on S3, and how to enable automatic export of new revisions to S3.

## You will incur costs!

While the data set subscription itself is free, **exporting and downloading the data set will cost real money on your AWS bill, according to AWS pricing**. Even if you are on the AWS free tier, the volume of data in this data set may exceed the free tier limits.

Things that will most likely cost money:
- Transferring from ADX to your S3 bucket
- Storing the data in S3
- Transferring from your S3 bucket to your local hard drive


For example, the April 2022 data consists of:

- 275 GB
- 286,671 files
- 8,717 matches
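The cost calculator later in this notebook relies on per-match averages. As a rough check, dividing the April totals above by the April match count reproduces them; this sketch uses only those three numbers, and real matches vary around the average.

```python
# April 2022 totals from the list above
april_gb = 275
april_files = 286_671
april_matches = 8_717

# Per-match averages (approximate; individual matches vary in size)
gb_per_match = april_gb / april_matches
files_per_match = april_files / april_matches

print(f"{gb_per_match:.4f} GB per match")        # 0.0315 GB per match
print(f"{files_per_match:.1f} files per match")  # 32.9 files per match
```

These match the `GB_per_match = 0.03154` and `files_per_match = 33` values used in the cost estimate cells.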


You can use [the AWS Pricing Calculator](https://calculator.aws) to estimate costs tailored to your account and region. For full S3 pricing, refer to [the Amazon S3 pricing page](https://aws.amazon.com/s3/pricing/).

We have included a calculator below to help you estimate your costs. However, we do not guarantee its accuracy, and **you** are responsible for the final calculation of your costs and for paying your AWS bill.

## Data Volume

For example, here is the number of matches in each of the first five months:

- December 2021: 17,952
- January 2022: 15,315
- February 2022: 9,892
- March 2022: 16,855
- April 2022: 8,717
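Because each revision holds one day of data, the average number of matches per revision in a month is that month's total divided by its number of days. This sketch assumes one revision per calendar day, as described above, and uses only the monthly totals listed.

```python
import calendar

# Monthly match counts from the list above: (year, month) -> matches
monthly_matches = {
    (2021, 12): 17_952,
    (2022, 1): 15_315,
    (2022, 2): 9_892,
    (2022, 3): 16_855,
    (2022, 4): 8_717,
}

# Average matches per daily revision, assuming one revision per calendar day
matches_per_day = {}
for (year, month), matches in monthly_matches.items():
    days_in_month = calendar.monthrange(year, month)[1]  # (weekday, ndays)
    matches_per_day[(year, month)] = matches / days_in_month
    print(f"{year}-{month:02d}: {matches_per_day[(year, month)]:.1f} matches/day")
```

April's average (8,717 / 30, about 290.6 matches per day) is the `matches_per_revision` value the cost calculator below uses by default.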

For the first batch of revisions we uploaded, we made a [documentation page](https://docs.pureskill.gg/datascience/adx/csgo/matches_per_day) showing the approximate number of matches for each day. Some days vary widely from the average because of promotions or other campaigns.

In [None]:
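dataset_id = "f49be2ef387af522a7b6f000158113e0"  # ADX data set ID of the csds data
bucket = "my-bucket"  # your S3 bucket to export into
prefix = None  # optional S3 key prefix inside the bucket
start_date = "2022-04-01"  # start of the date range
end_date = "2022-05-01"  # end of the date range
days_to_store_on_s3 = 30  # used only by the cost estimate below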

In [None]:
from pureskillgg_makenew_pyskill.notebook import setup_notebook

In [None]:
setup_notebook()

In [None]:
from pureskillgg_dsdk import (
    enable_auto_exporting_adx_dataset_revisions_to_s3,
    disable_auto_exporting_adx_dataset_revisions_to_s3,
    download_adx_dataset_revision,
    get_adx_dataset_revisions,
    export_single_adx_dataset_revision_to_s3,
    export_multiple_adx_dataset_revisions_to_s3,
)


In [None]:
revs = get_adx_dataset_revisions(dataset_id, start_date=start_date, end_date=end_date)
number_of_revisions = len(revs)
print(f"There are {number_of_revisions} revisions in that date range.")

## Specific Cost Estimate

We provide the estimates below for convenience, but we do not guarantee their accuracy or applicability to your AWS account. Perform your own calculations using the cost calculator provided by AWS.

In [None]:
# Prices from AWS (USD) for US East (N. Virginia); check current pricing for your region
cost_per_get = 0.0000004  # per GET request
cost_per_put = 0.000005  # per PUT request
cost_per_gb_per_month = 0.023  # S3 Standard storage, per GB-month
cost_per_gb_transfer_from_adx = 0.00  # per GB exported from ADX to your bucket
cost_per_gb_transfer_out = 0.09  # per GB transferred out of S3 to your machine

## Estimate for example EU region
# cost_per_get = 0.00000042 # for EU (Paris)
# cost_per_put = 0.0000053 # for EU (Paris)
# cost_per_gb_per_month = 0.024 # for EU (Paris)
# cost_per_gb_transfer_from_adx = 0.02 # for EU (Paris)
# cost_per_gb_transfer_out = 0.09 # for EU (Paris)

## Estimate for example Asia Pacific region
# cost_per_get = 0.00000035 # for Asia Pacific (Seoul)
# cost_per_put = 0.0000045 # for Asia Pacific (Seoul)
# cost_per_gb_per_month = 0.025 # for Asia Pacific (Seoul)
# cost_per_gb_transfer_from_adx = 0.08 # for Asia Pacific (Seoul)
# cost_per_gb_transfer_out = 0.11 # for Asia Pacific (Seoul)
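The S3 pricing page lists request prices per 1,000 requests, while the constants above are per single request. This small sketch shows the conversion, using the US East (N. Virginia) S3 Standard request prices that correspond to the values above; check the pricing page for your own region before relying on them. The helper `per_request` is hypothetical, added for illustration.

```python
import math


def per_request(price_per_1000_requests):
    """Convert an AWS price per 1,000 requests to a price per single request."""
    return price_per_1000_requests / 1000


# US East (N. Virginia) S3 Standard request prices, USD per 1,000 requests
put_per_1000 = 0.005
get_per_1000 = 0.0004

# Should agree with cost_per_put and cost_per_get in the cell above
print(math.isclose(per_request(put_per_1000), 0.000005))
print(math.isclose(per_request(get_per_1000), 0.0000004))
```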

In [None]:
matches_per_revision = 8717/30 # April average
# matches_per_revision = 9892/28 # Feb average
GB_per_match = 0.03154
files_per_match = 33

In [None]:
days_per_month = 30

number_of_matches = number_of_revisions * matches_per_revision
number_of_files = number_of_matches * files_per_match
storage_volume_GB = number_of_matches * GB_per_match

# Storage is billed per GB-month; prorate it over the days you keep the data
storage_cost = storage_volume_GB * cost_per_gb_per_month * days_to_store_on_s3 / days_per_month

# One PUT per file into your bucket, one GET per file when you download it
transfer_put_cost = cost_per_put * number_of_files
transfer_get_cost = cost_per_get * number_of_files

transfer_volume_cost = storage_volume_GB * cost_per_gb_transfer_from_adx
download_cost = storage_volume_GB * cost_per_gb_transfer_out

to_s3_cost = transfer_put_cost + transfer_volume_cost
to_local_cost = transfer_get_cost + download_cost
total_cost = to_s3_cost + storage_cost + to_local_cost

print(f"{number_of_revisions} revisions will be {storage_volume_GB:,.2f} GB, "
      f"about {number_of_matches:,.0f} matches, and {number_of_files:,.0f} files.")
print(f"Cost to transfer to your S3 bucket: ${to_s3_cost:,.3f}")
print(f"Cost to store data in S3 for {days_to_store_on_s3} days: ${storage_cost:,.3f}")
print(f"Cost to transfer from S3 to local: ${to_local_cost:,.3f}")
print(f"Total cost: ${total_cost:,.3f}")
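To compare regions or date ranges without editing the cells above each time, the same arithmetic can be wrapped in a function. `estimate_costs` is a hypothetical helper written for this sketch; it follows the notebook's own cost model (one PUT and one GET per file, storage prorated per GB-month), and the rates are the example values from the pricing cell, which you should verify for your account.

```python
def estimate_costs(
    number_of_revisions,
    matches_per_revision,
    gb_per_match,
    files_per_match,
    days_to_store_on_s3,
    cost_per_get,
    cost_per_put,
    cost_per_gb_per_month,
    cost_per_gb_transfer_from_adx,
    cost_per_gb_transfer_out,
    days_per_month=30,
):
    """Return the notebook's cost breakdown (USD) as a dict."""
    number_of_matches = number_of_revisions * matches_per_revision
    number_of_files = number_of_matches * files_per_match
    storage_volume_gb = number_of_matches * gb_per_match

    to_s3 = cost_per_put * number_of_files + cost_per_gb_transfer_from_adx * storage_volume_gb
    storage = storage_volume_gb * cost_per_gb_per_month * days_to_store_on_s3 / days_per_month
    to_local = cost_per_get * number_of_files + cost_per_gb_transfer_out * storage_volume_gb
    return {
        "storage_volume_gb": storage_volume_gb,
        "transfer_to_s3": to_s3,
        "storage": storage,
        "transfer_to_local": to_local,
        "total": to_s3 + storage + to_local,
    }


# Example rates copied from the pricing cell above
us_east = dict(cost_per_get=0.0000004, cost_per_put=0.000005, cost_per_gb_per_month=0.023,
               cost_per_gb_transfer_from_adx=0.00, cost_per_gb_transfer_out=0.09)
paris = dict(cost_per_get=0.00000042, cost_per_put=0.0000053, cost_per_gb_per_month=0.024,
             cost_per_gb_transfer_from_adx=0.02, cost_per_gb_transfer_out=0.09)

# One month (30 daily revisions) at the April average, stored for 30 days
for name, rates in [("US East (N. Virginia)", us_east), ("EU (Paris)", paris)]:
    costs = estimate_costs(30, 8717 / 30, 0.03154, 33, 30, **rates)
    print(f"{name}: ${costs['total']:,.2f}")
```

With the April averages, this prints roughly $32.62 for US East and $38.49 for EU (Paris); in both, the transfer out to your machine is the largest single component.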

## Transferring revisions from Data Exchange to your S3 bucket

By uncommenting the code below, you agree to pay any AWS costs you incur by running this notebook.

**⚠️💵⚠️ Uncommenting the code below in this notebook will cause you to incur AWS usage fees! ⚠️💵⚠️**

### Transfer latest revision from ADX to S3

In [None]:
# export_single_adx_dataset_revision_to_s3(
#     bucket, dataset_id, prefix=prefix
# )

In [None]:
## Alternatively, export one specific revision (here, the first one in revs)
# export_single_adx_dataset_revision_to_s3(
#     bucket, dataset_id, revision_id=revs[0], prefix=prefix
# )

### Transfer a specific day from ADX to S3

In [None]:
# export_multiple_adx_dataset_revisions_to_s3(
#     bucket,
#     dataset_id,
#     prefix=prefix,
#     start_date="2022-04-08",  # Set both dates to the day you want
#     end_date="2022-04-08",
# )
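To keep costs down while still covering a whole month, one option is to export a sample of individual days, for example one day per week, by repeating the single-day call above. `days_between` is a hypothetical helper for this sketch, and the loop assumes (as the cell above suggests) that passing the same `start_date` and `end_date` selects a single day. The export loop is left commented out because running it incurs AWS costs.

```python
from datetime import date, timedelta


def days_between(start, end):
    """Return ISO date strings from start to end, both inclusive."""
    current = date.fromisoformat(start)
    last = date.fromisoformat(end)
    days = []
    while current <= last:
        days.append(current.isoformat())
        current += timedelta(days=1)
    return days


# Every 7th day of April 2022, starting on the 1st
sample_days = days_between("2022-04-01", "2022-04-30")[::7]
print(sample_days)  # ['2022-04-01', '2022-04-08', '2022-04-15', '2022-04-22', '2022-04-29']

# Uncommenting this loop exports each sampled day and WILL incur AWS costs.
# for day in sample_days:
#     export_multiple_adx_dataset_revisions_to_s3(
#         bucket, dataset_id, prefix=prefix, start_date=day, end_date=day
#     )
```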

### Transfer everything from ADX to S3

In [None]:
# export_multiple_adx_dataset_revisions_to_s3(
#     bucket,
#     dataset_id,
#     prefix=prefix
# )

## Transferring data from S3 to local

There are many ways to do this, so we won't list them all here. We generally sync one month at a time with the AWS CLI, like this:

```
aws s3 sync s3://my-bucket/csds/2022/04/ .
```
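If you prefer to stay in Python, a similar month-at-a-time download can be sketched with a boto3 S3 client, using the `list_objects_v2` paginator and `download_file`. `download_prefix` is a hypothetical helper, and the `csds/2022/04/` key layout is assumed from the CLI example above. Unlike `aws s3 sync`, this sketch does not skip files you already have; it downloads every object under the prefix each time, and each download is billed as S3 requests plus data transfer out.

```python
import os


def download_prefix(s3_client, bucket, prefix, dest_dir):
    """Download every object under `prefix` in `bucket` into `dest_dir`.

    `s3_client` is expected to be a boto3 S3 client, e.g. boto3.client("s3").
    Returns the list of local file paths written.
    """
    written = []
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            key = obj["Key"]
            relative = key[len(prefix):].lstrip("/")
            if not relative or key.endswith("/"):
                continue  # skip the prefix itself and "folder" placeholder objects
            # Mirror the S3 key structure below dest_dir
            local_path = os.path.join(dest_dir, *relative.split("/"))
            os.makedirs(os.path.dirname(local_path), exist_ok=True)
            s3_client.download_file(bucket, key, local_path)
            written.append(local_path)
    return written


# Usage (incurs S3 request and data transfer costs):
# import boto3
# download_prefix(boto3.client("s3"), "my-bucket", "csds/2022/04/", "./2022/04")
```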

## Transferring a single revision from ADX to local (not recommended)

This is not recommended because it takes a **VERY VERY** long time to download. Download from your S3 bucket instead.

In [None]:
import os

# Local directory to download into, read from an environment variable
ds_collection_path = os.environ.get("PURESKILLGG_TOME_DS_COLLECTION_PATH")
# download_adx_dataset_revision(ds_collection_path, dataset_id, prefix=prefix)

## Automatically export new revisions to S3

In [None]:
# enable_auto_exporting_adx_dataset_revisions_to_s3(bucket, dataset_id, prefix=prefix)

## Disable automatic export of revisions to S3

You can cancel your automatic exporting of revisions by running the code below. 

You can also disable this automatic job through the AWS console:

1. Navigate to the ADX console.
1. Click "Entitled data" under "My Subscriptions".
1. Click "PureSkill.gg Competitive CS:GO Gameplay".
1. Click the data set, which should be named "pureskillgg-csgo-production-dataexchange-csds-0".
1. Scroll down to "Jobs", where you can cancel any outstanding jobs.

In [None]:
# disable_auto_exporting_adx_dataset_revisions_to_s3(dataset_id)
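The console steps above (open the data set's "Jobs" list and cancel outstanding jobs) can also be sketched with a boto3 Data Exchange client, using `list_jobs` and `cancel_job`. This is not necessarily what `disable_auto_exporting_adx_dataset_revisions_to_s3` does internally, which this notebook does not document. The helper `cancel_waiting_jobs` is hypothetical; it only cancels jobs in the `WAITING` state, because ADX only allows cancelling jobs that have not started yet. If new revisions keep being exported afterwards, the automatic export may be configured separately from these jobs, so check it in the console.

```python
def cancel_waiting_jobs(dx_client, dataset_id):
    """Cancel ADX jobs for `dataset_id` that are still in the WAITING state.

    `dx_client` is expected to be a boto3 Data Exchange client,
    e.g. boto3.client("dataexchange"). Returns the IDs of cancelled jobs.
    """
    cancelled = []
    kwargs = {"DataSetId": dataset_id}
    while True:
        response = dx_client.list_jobs(**kwargs)
        for job in response.get("Jobs", []):
            # Jobs that are running or finished cannot be cancelled
            if job.get("State") == "WAITING":
                dx_client.cancel_job(JobId=job["Id"])
                cancelled.append(job["Id"])
        next_token = response.get("NextToken")
        if not next_token:
            return cancelled
        kwargs["NextToken"] = next_token  # fetch the next page of jobs


# Usage:
# import boto3
# cancel_waiting_jobs(boto3.client("dataexchange"), dataset_id)
```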