# Summary

This notebook acquires and pre-processes the card data for use in later analysis.

# Grab MTGJSON Card Data

Here I will download and clean the data for MTG cards.

First, we download the data from [MTGJSON](https://mtgjson.com/downloads/all-files/). The `AllPrintings` card data comes in several formats, including JSON, SQL, CSV, and Parquet.

I will use the [Parquet format](https://parquet.apache.org/), since it is well suited to analytical workloads. It is a compressed, columnar format that loads quickly, and query engines can read only the columns they need straight from disk. This keeps both disk space and memory usage low.

Data URLs:
- __Linux__: https://mtgjson.com/api/v5/AllPrintingsParquetFiles.tar.gz
- __Windows__: https://mtgjson.com/api/v5/AllPrintingsParquetFiles.zip

The following code downloads and decompresses the data.

In [19]:
# Setup Notebook
import os
if os.path.basename(os.getcwd()) != 'mtg-modeling':
    get_ipython().run_line_magic("run", '-i "../../scripts/notebook_header.py"') # type: ignore

In [20]:
from src.data.mtgjson_fetcher import MtgJsonFetcher


In [21]:
card_fetcher = MtgJsonFetcher(dataset='AllPrintingsParquetFiles', save_root='data/raw/mtgjson')
card_fetcher.fetch()

Downloading AllPrintingsParquetFiles Data
Starting datetime: 2024-08-24 20:29:54.591346


--2024-08-24 20:29:54--  https://mtgjson.com/api/v5/AllPrintingsParquetFiles.tar.gz
Resolving mtgjson.com (mtgjson.com)... 104.21.64.186, 172.67.154.80
Connecting to mtgjson.com (mtgjson.com)|104.21.64.186|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 120607387 (115M) [application/octet-stream]
Saving to: ‘data/raw/mtgjson/AllPrintingsParquetFiles.tar.gz’

     0K ........ ........ ........ ........ 27% 2.92M 28s
 32768K ........ ........ ........ ........ 55% 2.80M 18s
 65536K ........ ........ ........ ........ 83% 2.97M 7s
 98304K ........ ........ ...              100% 2.95M=40s

2024-08-24 20:30:34 (2.90 MB/s) - ‘data/raw/mtgjson/AllPrintingsParquetFiles.tar.gz’ saved [120607387/120607387]



1.1G	data/raw/mtgjson
Downloaded AllPrintingsParquetFiles Data
Final size: 0
Final path: data/raw/mtgjson/AllPrintingsParquetFiles
Finished datetime: 2024-08-24 20:30:35.446599
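
The internals of `MtgJsonFetcher` are not shown in this notebook. As a rough illustration of the download-and-decompress step, here is a minimal standard-library sketch. The helper names `download_file` and `extract_archive` are hypothetical and are not part of the project code; the real fetcher appears to shell out to `wget` instead.

```python
import shutil
import tarfile
import urllib.request
import zipfile
from pathlib import Path


def download_file(url: str, dest: Path) -> Path:
    """Stream a URL to disk without holding the whole file in memory."""
    dest.parent.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest


def extract_archive(archive: Path, target_dir: Path) -> Path:
    """Unpack a .tar.gz or .zip archive into target_dir."""
    if archive.name.endswith(".tar.gz"):
        target_dir.mkdir(parents=True, exist_ok=True)
        with tarfile.open(archive, "r:gz") as tar:
            tar.extractall(target_dir)
    elif archive.suffix == ".zip":
        target_dir.mkdir(parents=True, exist_ok=True)
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(target_dir)
    else:
        raise ValueError(f"Unsupported archive type: {archive.name}")
    return target_dir


# Example call (commented out because it downloads roughly 115 MB):
# archive = download_file(
#     "https://mtgjson.com/api/v5/AllPrintingsParquetFiles.tar.gz",
#     Path("data/raw/mtgjson/AllPrintingsParquetFiles.tar.gz"),
# )
# extract_archive(archive, Path("data/raw/mtgjson/AllPrintingsParquetFiles"))
```

Handling both `.tar.gz` and `.zip` in one helper mirrors the two download URLs listed above, so the same sketch would work on either platform.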


# Grab MTGJSON All Price Data

Next, we download the price data from [MTGJSON](https://mtgjson.com/downloads/all-files/). The `AllPrices` data is only available in JSON format, so we will have to convert it to Parquet for ease of use in later analysis.

Note that this data only covers the previous 90 days.

Since the file is very large, I will use [polars](https://docs.pola.rs) instead of [pandas](https://pandas.pydata.org/docs/).
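
Before the price data can be stored in a tabular format, the nested JSON has to be flattened into rows. The sketch below shows one way to do that with plain Python. It assumes the price mapping is nested as card UUID, then game format, then provider, then list type (`retail` or `buylist`), then finish, then date, with a `currency` entry alongside the list types; check this against the MTGJSON documentation before relying on it. The function name `flatten_prices` and the sample values are made up for illustration.

```python
from typing import Any, Dict, Iterator


def flatten_prices(data: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
    """Yield one flat row per (card, provider, list type, finish, date).

    `data` is assumed to be the per-card price mapping, not the whole file.
    """
    for uuid, formats in data.items():
        for game_format, providers in formats.items():
            for provider, price_list in providers.items():
                currency = price_list.get("currency")
                for list_type, finishes in price_list.items():
                    if list_type == "currency":
                        continue  # metadata, not a price table
                    for finish, by_date in finishes.items():
                        for date, price in by_date.items():
                            yield {
                                "uuid": uuid,
                                "game_format": game_format,
                                "provider": provider,
                                "currency": currency,
                                "list_type": list_type,
                                "finish": finish,
                                "date": date,
                                "price": price,
                            }


# Tiny hand-written sample in the assumed shape (values are not real prices).
sample = {
    "uuid-1": {
        "paper": {
            "tcgplayer": {
                "currency": "USD",
                "retail": {"normal": {"2024-08-23": 1.5, "2024-08-24": 1.6}},
                "buylist": {"foil": {"2024-08-24": 0.9}},
            }
        }
    }
}

rows = list(flatten_prices(sample))
for row in rows:
    print(row)
```

Because the function is a generator, rows can be consumed in batches rather than built up as one giant list, which matters for a file of this size. Rows in this shape can then be handed to a DataFrame library such as polars and written out as Parquet.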

Data URLs:
- __Linux__: https://mtgjson.com/api/v5/AllPrices.json.gz
- __Windows__: https://mtgjson.com/api/v5/AllPrices.json.zip

The following code downloads and decompresses the data.

In [22]:
price_fetcher = MtgJsonFetcher(dataset='AllPrices.json', save_root='data/raw/mtgjson')
price_fetcher.fetch()

Downloading AllPrices.json Data
Starting datetime: 2024-08-24 20:30:35.455416


--2024-08-24 20:30:35--  https://mtgjson.com/api/v5/AllPrices.json.gz
Resolving mtgjson.com (mtgjson.com)... 104.21.64.186, 172.67.154.80
Connecting to mtgjson.com (mtgjson.com)|104.21.64.186|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 106343489 (101M) [application/octet-stream]
Saving to: ‘data/raw/mtgjson/AllPrices.json.gz’

     0K ........ ........ ........ ........ 31% 3.00M 23s
 32768K ........ ........ ........ ........ 63% 3.01M 12s
 65536K ........ ........ ........ ........ 94% 2.96M 2s
 98304K .....                              100% 3.17M=34s

2024-08-24 20:31:09 (3.00 MB/s) - ‘data/raw/mtgjson/AllPrices.json.gz’ saved [106343489/106343489]



1.1G	data/raw/mtgjson
Downloaded AllPrices.json Data
Final size: 0
Final path: data/raw/mtgjson/AllPrices
Finished datetime: 2024-08-24 20:31:12.933750
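
Unlike the card data, the price download is a single gzip-compressed file rather than an archive. How `MtgJsonFetcher` decompresses it is not shown here; the following is a minimal sketch of streaming decompression with the standard library, so that the large JSON file never has to fit in memory at once. The helper name `gunzip` and the output file name in the example call are hypothetical.

```python
import gzip
import shutil
from pathlib import Path


def gunzip(src: Path, dest: Path, chunk_size: int = 1024 * 1024) -> Path:
    """Decompress a .gz file to dest, copying chunk_size bytes at a time."""
    dest.parent.mkdir(parents=True, exist_ok=True)
    with gzip.open(src, "rb") as compressed, open(dest, "wb") as out:
        shutil.copyfileobj(compressed, out, length=chunk_size)
    return dest


# Example call (paths are illustrative):
# gunzip(
#     Path("data/raw/mtgjson/AllPrices.json.gz"),
#     Path("data/raw/mtgjson/AllPrices/AllPrices.json"),
# )
```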
