# Summary

This notebook acquires and pre-processes the card data used in later analysis.

# Grab MTGJSON Card Data

Here I will download and clean the data for MTG cards.

First we will download the data from [MTGJSON](https://mtgjson.com/downloads/all-files/). The `AllPrintings` card data comes in several formats, including JSON, SQL, CSV, and parquet.

I will use the [parquet format](https://parquet.apache.org/), since it is well suited to data analysis. It is a compressed, columnar format that loads quickly, and libraries such as polars can query it directly on disk without reading the whole file into memory. This reduces both disk space and memory usage.

Data URLs:
- __Linux__: https://mtgjson.com/api/v5/AllPrintingsParquetFiles.tar.gz
- __Windows__: https://mtgjson.com/api/v5/AllPrintingsParquetFiles.zip

The following code downloads and decompresses the data.

In [1]:
# Setup Notebook
import os
if os.path.basename(os.getcwd()) != 'mtg-modeling':
    %run -i "../../scripts/notebook_header.py"

Changed working directory to: /root/mtg-modeling


In [2]:
from src.data.mtgjson_fetcher import MtgJsonFetcher

DOWNLOAD_MTGJSON = True

In [3]:
card_fetcher = MtgJsonFetcher(dataset='AllPrintingsParquetFiles', save_root='data/raw/mtgjson')
if DOWNLOAD_MTGJSON:
    card_fetcher.fetch()

Downloading AllPrintingsParquetFiles Data
Starting datetime: 2024-08-24 20:01:14.845559

--2024-08-24 20:01:14--  https://mtgjson.com/api/v5/AllPrintingsParquetFiles.tar.gz
Resolving mtgjson.com (mtgjson.com)... 104.21.64.186, 172.67.154.80
Connecting to mtgjson.com (mtgjson.com)|104.21.64.186|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 120607387 (115M) [application/octet-stream]
Saving to: ‘data/raw/mtgjson/AllPrintingsParquetFiles.tar.gz’

     0K ........ ........ ........ ........ 27% 2.80M 30s
 32768K ........ ........ ........ ........ 55% 2.75M 18s
 65536K ........ ........ ........ ........ 83% 2.79M 7s
 98304K ........ ..
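The download-and-decompress step can also be sketched with the standard library alone. This is not the implementation of `MtgJsonFetcher`, whose source is not shown here; `download` and `extract` are hypothetical helpers, and only the URL comes from the list above.

```python
import tarfile
import urllib.request
from pathlib import Path

# URL taken from the "Data URLs" list above.
MTGJSON_URL = "https://mtgjson.com/api/v5/AllPrintingsParquetFiles.tar.gz"


def download(url: str, save_root: Path) -> Path:
    """Download `url` into `save_root` and return the local archive path."""
    save_root.mkdir(parents=True, exist_ok=True)
    archive_path = save_root / url.rsplit("/", 1)[-1]
    urllib.request.urlretrieve(url, archive_path)
    return archive_path


def extract(archive_path: Path, dest: Path) -> list[str]:
    """Unpack a .tar.gz archive into `dest` and return the member names."""
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tar:
        names = tar.getnames()
        if hasattr(tarfile, "data_filter"):
            # Newer Python versions: refuse unsafe members (e.g. path traversal).
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
    return names
```

Calling `extract(download(MTGJSON_URL, Path("data/raw/mtgjson")), Path("data/raw/mtgjson"))` would mirror the cell above, assuming the same `save_root`.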

# Grab MTGJSON All Price Data

Next we will download the price data from [MTGJSON](https://mtgjson.com/downloads/all-files/). The `AllPrices` data only comes in JSON format, so we will have to convert it to parquet for ease of use in future analysis.

Note this data only covers the previous 90 days.

Since the file is very large, I will use [polars](https://docs.pola.rs) instead of [pandas](https://pandas.pydata.org/docs/).

Data URLs:
- __Linux__: https://mtgjson.com/api/v5/AllPrices.json.gz
- __Windows__: https://mtgjson.com/api/v5/AllPrices.json.zip

The following code downloads and decompresses the data.

In [None]:
price_fetcher = MtgJsonFetcher(dataset='AllPrices.json', save_root='data/raw/mtgjson')
if DOWNLOAD_MTGJSON:
    price_fetcher.fetch()

Downloading AllPrices.json Data
Starting datetime: 2024-08-24 09:55:39.885955
Downloaded AllPrices.json Data
Final size: 255
Final path: data\raw\mtgjson3\AllPrices
Finished datetime: 2024-08-24 09:57:35.668784
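Before the price data can be written to parquet, the nested JSON has to be flattened into rows. The sketch below shows one way to do that, assuming the `data` section is nested as card UUID, platform, vendor, list type, finish, and date, with a scalar `currency` entry per vendor. That layout and the helper names `iter_price_rows` and `load_price_rows` are assumptions for illustration, not the code used in this project.

```python
import gzip
import json
from pathlib import Path
from typing import Iterator


def iter_price_rows(data: dict) -> Iterator[dict]:
    """Yield one flat row per (card, platform, vendor, list type, finish, date)."""
    for uuid, platforms in data.items():
        for platform, vendors in platforms.items():
            for vendor, listings in vendors.items():
                currency = listings.get("currency")
                for list_type, finishes in listings.items():
                    # Skip scalar entries such as the vendor's currency code.
                    if not isinstance(finishes, dict):
                        continue
                    for finish, prices_by_date in finishes.items():
                        for date, price in prices_by_date.items():
                            yield {
                                "uuid": uuid,
                                "platform": platform,
                                "vendor": vendor,
                                "currency": currency,
                                "list_type": list_type,
                                "finish": finish,
                                "date": date,
                                "price": price,
                            }


def load_price_rows(path: Path) -> list[dict]:
    """Read a gzipped AllPrices-style file and flatten its "data" section."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        payload = json.load(fh)  # note: loads the whole file into memory
    return list(iter_price_rows(payload["data"]))
```

The resulting rows could then be handed to something like `polars.DataFrame(rows).write_parquet(...)`. For a file this large, a streaming JSON parser may be a better fit than `json.load`.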
