# Scryfall Data Fetcher

This program extracts Magic: The Gathering (MTG) card information and prices from the Scryfall REST API.

The idea is to download the bulk data so that the information for all cards can be stored (more than the API returns in a normal query). If you need help understanding the data structure, or you would like to query only for specific info, see the Jupyter Notebook called "mtg_cards_query.ipynb", which contains an example of how to do that.

In [1]:
#Import libraries
import requests
import pandas as pd
from datetime import datetime

import sqlite3

### Setup

In [2]:
DEBUG = False

#Current date
now = datetime.now()
date_time = str(now.strftime("%Y%m%d"))

#Change display limit for big data frames
#pd.set_option('display.max_colwidth', None)

### Getting Bulk Data

Download the full card JSON file, which is regenerated every day.
More info about it at: https://scryfall.com/docs/api/bulk-data

In [3]:
api_response = requests.get('https://api.scryfall.com/bulk-data/default-cards')
print(api_response.status_code) #should be 200=ok

200


In [4]:
#Decode the response object's content from bytes into a dictionary object so
#it can be manipulated easily
api_json = api_response.json() #converts from bytes to dictionary

#[Debug]
if DEBUG:
    print(api_json['download_uri'])

In [5]:
#Download the cards information from the bulk-data URL returned in the response.
bulk_url_json = requests.get(api_json['download_uri'], allow_redirects=True)

##Save data as json file in disk
#file_size = open('default_cards.json', 'wb').write(bulk_url_json.content)
#[Debug]
#if DEBUG:
    #print("Json downloaded correctly. File size: " + str(file_size))

##Save data to pandas
#Load the JSON data into a pandas DataFrame.
url_json = bulk_url_json.json() #converts from bytes to a list of dictionaries (one per card)
cards_bulk_df = pd.DataFrame(url_json)

#[Debug]
if DEBUG:
    cards_bulk_df.to_csv('initial_bulk.csv', sep=";", index=False) 

#[Debug]
if DEBUG:
    #All columns types and info
    cards_bulk_df.info(verbose=True)
    #Summary of the first 3 rows
    print(cards_bulk_df.head(3))
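
The two requests above (metadata first, then the file behind `download_uri`) can be wrapped so that an HTTP error stops the run instead of only printing the status code. This is a minimal sketch, not the notebook's own code: the function name `fetch_bulk_cards`, the `http_get` parameter and the timeout values are assumptions made for illustration.

```python
from typing import Any, Callable

# URL taken from the cell above; every other name here is hypothetical.
BULK_METADATA_URL = "https://api.scryfall.com/bulk-data/default-cards"


def fetch_bulk_cards(http_get: Callable[..., Any],
                     metadata_url: str = BULK_METADATA_URL) -> list:
    """Return the decoded bulk card list, raising on any HTTP error."""
    # Step 1: the metadata object says where the current bulk file lives.
    metadata = http_get(metadata_url, timeout=30)
    metadata.raise_for_status()  # raises on 4xx/5xx instead of printing the code
    download_uri = metadata.json()["download_uri"]

    # Step 2: download the bulk file itself (large, so a longer timeout is assumed).
    bulk = http_get(download_uri, timeout=300)
    bulk.raise_for_status()
    return bulk.json()
```

In the notebook this would be called as `fetch_bulk_cards(requests.get)`, and the returned list passed to `pd.DataFrame`. Taking the HTTP function as a parameter keeps the sketch testable without network access.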

## Save Data to local DB

| id | name | mana_cost | cmc | power | toughness | reserved | foil | nonfoil | set_id | set_name | rarity | frame | date_time | small | normal | usd | usd_foil | usd_etched | eur | eur_foil |
| :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: |
| 6c900f34-1bd2-43c7-be33-cf5cc02a62ea | Replenish | {3}{W} | 4.0 |  |  | True | False | True | 632741a4-411d-4110-b507-5a5cfdd52ef2 | World Championship Decks 2000 | rare | 1997 | 20220417 | https://c1.scryfall.com/file/scryfall-cards/small/front/6/c/6c900f34-1bd2-43c7-be33-cf5cc02a62ea.jpg?1562767042 | https://c1.scryfall.com/file/scryfall-cards/normal/front/6/c/6c900f34-1bd2-43c7-be33-cf5cc02a62ea.jpg?1562767042 | 40.15 |  |  | 25.00 | | 
| 7fd2fe13-bbc0-42b7-bc42-3b51910ce118 | Replenish | {3}{W} | 4.0 |  |  | True | True | True | 44f17b37-dcf8-4239-baab-1efc00cd3480 | Urza's Destiny | rare | 1997 | 20220417 | https://c1.scryfall.com/file/scryfall-cards/small/front/7/f/7fd2fe13-bbc0-42b7-bc42-3b51910ce118.jpg?1562444251 | https://c1.scryfall.com/file/scryfall-cards/normal/front/7/f/7fd2fe13-bbc0-42b7-bc42-3b51910ce118.jpg?1562444251 | 114.42 | 925.99 |  | 74.24 | 571.50 |

### DataFrame Cleaning and Preparation

First we filter out the data that we don't want and keep the rest.

In [6]:
columns_list = ['id', 'name', 'image_uris', 'mana_cost', 'cmc', 'power',
                'toughness', 'reserved', 'foil', 'nonfoil', 'set_id',
                'set_name', 'rarity', 'frame', 'prices', 'date_time']

#General Cleaning
#---------------------------------------------------
#Discard any card that's not in English
cards_df_flt = cards_bulk_df.loc[cards_bulk_df["lang"] == "en"]
#Discard the cards without an image
cards_df_flt = cards_df_flt.dropna(subset = ['image_uris'])
#Add the download date (date_time) to the DataFrame
cards_df_flt['date_time'] = date_time
#Keep only the necessary columns
cards_df_flt = cards_df_flt[columns_list]

#[Debug]
if DEBUG:
    #All columns types and info
    cards_df_flt.info(verbose=True)

Now we split the columns that hold dictionary-like objects to obtain a "plain table" and store it in the DB.

In [7]:
#Split the dictionary column for card images
cards_df_flt = cards_df_flt.join(cards_df_flt.image_uris.apply(pd.Series), how='left')

#Split the dictionary column for card prices
cards_df_flt = cards_df_flt.join(cards_df_flt.prices.apply(pd.Series), how='left')

#Drop unnecessary columns after the split
cards_df_flt.drop(["image_uris", "prices", "large", "png", "art_crop", "border_crop", "tix"], axis=1, inplace=True)
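
What the split does to a single record can be shown without pandas. The sketch below is only an illustration: the helper name `flatten_card` is hypothetical, and the record is a trimmed one whose image URLs are shortened placeholders.

```python
def flatten_card(card: dict, nested_keys=("image_uris", "prices")) -> dict:
    """Return a copy of `card` with the nested dicts merged into the top level."""
    flat = {k: v for k, v in card.items() if k not in nested_keys}
    for key in nested_keys:
        # A missing or null nested dict simply contributes no columns.
        flat.update(card.get(key) or {})
    return flat


# Trimmed record: id, name and prices come from the sample table above;
# the image URLs are shortened placeholders.
card = {
    "id": "7fd2fe13-bbc0-42b7-bc42-3b51910ce118",
    "name": "Replenish",
    "image_uris": {"small": "small.jpg", "normal": "normal.jpg"},
    "prices": {"usd": "114.42", "usd_foil": "925.99", "eur": "74.24"},
}

flat = flatten_card(card)
print(sorted(flat))
# ['eur', 'id', 'name', 'normal', 'small', 'usd', 'usd_foil']
```

Each key of the nested dictionaries becomes its own column, which is why `small`, `normal`, `usd` and the other price fields appear as plain columns in the final table.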

### Divide DataFrame into two different tables

The DB is divided into two tables:

- scryfall_cards: holds the cards' static info (name, set, images, rarity, etc.)
- scryfall_daily_prices: holds the columns that vary on a daily basis (regular card price, foil price, etc.)
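
The relationship between the two tables (one static row per card, one price row per card per day) can be sketched with an in-memory SQLite database. The column types and the reduced column set are assumptions for illustration; in the notebook the tables are created by pandas, which infers its own types. The values are taken from the query results shown later in this document.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway database for the illustration

# Column types are assumptions; pandas infers its own when it creates the tables.
conn.executescript("""
    CREATE TABLE scryfall_cards (card_id TEXT, name TEXT, set_name TEXT, last_update TEXT);
    CREATE TABLE scryfall_daily_prices (card_id TEXT, usd REAL, usd_foil REAL, date_time TEXT);
""")

card_id = "7fd2fe13-bbc0-42b7-bc42-3b51910ce118"
# One row of static info per card...
conn.execute("INSERT INTO scryfall_cards VALUES (?, ?, ?, ?)",
             (card_id, "Replenish", "Urza's Destiny", "20220421"))
# ...and one price row per card per day.
conn.executemany("INSERT INTO scryfall_daily_prices VALUES (?, ?, ?, ?)", [
    (card_id, 114.42, 925.99, "20220419"),
    (card_id, 115.18, 590.0, "20220420"),
    (card_id, 115.55, 590.0, "20220421"),
])

n_cards = conn.execute("SELECT COUNT(*) FROM scryfall_cards").fetchone()[0]
n_prices = conn.execute("SELECT COUNT(*) FROM scryfall_daily_prices").fetchone()[0]
print(n_cards, n_prices)  # 1 3
conn.close()
```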

In [8]:
prices_df = cards_df_flt.copy(deep=True)
prices_df = prices_df[['id', 'usd', 'usd_foil', 'usd_etched', 'eur', 'eur_foil', 'date_time']]
prices_df.rename(columns = {'id':'card_id'}, inplace = True)
#[Debug]
if DEBUG:
    print("----------------- scryfall_daily_prices -----------------")
    print(prices_df.head(3))

cards_static_info_df = cards_df_flt.copy(deep=True)
cards_static_info_df = cards_static_info_df[['id', 'name', 'mana_cost', 'cmc', 'power', 'toughness', 
                                             'reserved', 'foil', 'nonfoil', 'set_id', 'set_name', 
                                             'rarity', 'frame', 'small', 'normal', 'date_time']]
cards_static_info_df.rename(columns = {'id':'card_id', 'date_time':'last_update'}, inplace = True)
#[Debug]
if DEBUG:
    print("--------------------- scryfall_cards ---------------------")
    print(cards_static_info_df.head(3))

#### Connect to the DB

In [9]:
#Opens the DB connection (the database file is created if it doesn't exist).
conn = sqlite3.connect("sqlite_db/mtg_cards.db")

#### Load DataFrame to table

In [10]:
#Load the prices DataFrame into the scryfall_daily_prices table. Append, because new prices are added each day.
prices_df.to_sql(name="scryfall_daily_prices", con=conn, if_exists="append", index=False)

#Load the static info DataFrame into the scryfall_cards table. Replace the existing table, because it holds static info about the cards.
cards_static_info_df.to_sql(name="scryfall_cards", con=conn, if_exists="replace", index=False)

64248
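
Because the prices table is appended to on every run, running the notebook twice on the same day would store the same day's prices twice. One possible guard, shown here as a sketch rather than as part of the notebook, is a composite primary key on `(card_id, date_time)` combined with `INSERT OR IGNORE`. The schema and the helper `append_prices` are hypothetical, and the insert uses plain `sqlite3` instead of `to_sql`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: the composite primary key is an assumption, not part of the notebook.
conn.execute("""
    CREATE TABLE scryfall_daily_prices (
        card_id TEXT, usd REAL, date_time TEXT,
        PRIMARY KEY (card_id, date_time)
    )
""")


def append_prices(conn, rows):
    """Insert price rows, skipping (card_id, date_time) pairs already stored."""
    conn.executemany(
        "INSERT OR IGNORE INTO scryfall_daily_prices (card_id, usd, date_time) "
        "VALUES (?, ?, ?)",
        rows,
    )
    conn.commit()


rows = [("7fd2fe13-bbc0-42b7-bc42-3b51910ce118", 114.42, "20220417")]
append_prices(conn, rows)
append_prices(conn, rows)  # same-day rerun: no duplicate is stored

stored = conn.execute("SELECT COUNT(*) FROM scryfall_daily_prices").fetchone()[0]
print(stored)  # 1
```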

### Testing result

In [11]:
#SQL query for testing
sql_result = conn.execute(
    """ SELECT
            *
        FROM scryfall_cards
        WHERE
            name = 'Replenish'
;""")

#Gets the table column names, because an sqlite query returns only the data.
column_names = [column[0] for column in sql_result.description]
#Creates a pandas DataFrame with the query data and the column names.
sql_df = pd.DataFrame.from_records(data = sql_result.fetchall(), columns = column_names)

sql_df.head(5)

| | card_id | name | mana_cost | cmc | power | toughness | reserved | foil | nonfoil | set_id | set_name | rarity | frame | small | normal | last_update |
| :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: |
| 0 | 6c900f34-1bd2-43c7-be33-cf5cc02a62ea | Replenish | {3}{W} | 4.0 |  |  | 1 | 0 | 1 | 632741a4-411d-4110-b507-5a5cfdd52ef2 | World Championship Decks 2000 | rare | 1997 | https://c1.scryfall.com/file/scryfall-cards/sm... | https://c1.scryfall.com/file/scryfall-cards/no... | 20220421 |
| 1 | 7fd2fe13-bbc0-42b7-bc42-3b51910ce118 | Replenish | {3}{W} | 4.0 |  |  | 1 | 1 | 1 | 44f17b37-dcf8-4239-baab-1efc00cd3480 | Urza's Destiny | rare | 1997 | https://c1.scryfall.com/file/scryfall-cards/sm... | https://c1.scryfall.com/file/scryfall-cards/no... | 20220421 |


In [12]:
#SQL query for testing
sql_result = conn.execute(
    """ SELECT
            *
        FROM scryfall_daily_prices
        WHERE
            card_id = '6c900f34-1bd2-43c7-be33-cf5cc02a62ea'
            OR
            card_id = '7fd2fe13-bbc0-42b7-bc42-3b51910ce118'
;""")

#Gets the table column names, because an sqlite query returns only the data.
column_names = [column[0] for column in sql_result.description]
#Creates a pandas DataFrame with the query data and the column names.
sql_df = pd.DataFrame.from_records(data = sql_result.fetchall(), columns = column_names)

sql_df.head(5)

| | card_id | usd | usd_foil | usd_etched | eur | eur_foil | date_time |
| :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: |
| 0 | 6c900f34-1bd2-43c7-be33-cf5cc02a62ea | 40.15 |  |  | 25.0 |  | 20220417 |
| 1 | 7fd2fe13-bbc0-42b7-bc42-3b51910ce118 | 114.42 | 925.99 |  | 74.24 | 571.5 | 20220417 |
| 2 | 6c900f34-1bd2-43c7-be33-cf5cc02a62ea | 40.15 |  |  | 25.0 |  | 20220418 |
| 3 | 7fd2fe13-bbc0-42b7-bc42-3b51910ce118 | 114.42 | 925.99 |  | 74.24 | 571.5 | 20220418 |
| 4 | 6c900f34-1bd2-43c7-be33-cf5cc02a62ea | 40.15 |  |  | 25.0 |  | 20220419 |


In [19]:
#SQL query for testing
sql_result = conn.execute(
    """ SELECT
            scryfall_cards.card_id,
            scryfall_cards.name,
            scryfall_cards.set_name,
            scryfall_cards.reserved,
            scryfall_cards.rarity,
            scryfall_cards.normal,
            scryfall_daily_prices.usd,
            scryfall_daily_prices.usd_foil,
            scryfall_daily_prices.date_time
        FROM 
            scryfall_cards
        LEFT JOIN 
            scryfall_daily_prices
            ON 
                scryfall_cards.card_id = scryfall_daily_prices.card_id
        WHERE
            scryfall_cards.name = 'Replenish'
            AND
            scryfall_cards.set_name = 'Urza''s Destiny'
;""")


#Gets the table column names, because an sqlite query returns only the data.
column_names = [column[0] for column in sql_result.description]
#Creates a pandas DataFrame with the query data and the column names.
sql_df = pd.DataFrame.from_records(data = sql_result.fetchall(), columns = column_names)

sql_df.head(10)

| | card_id | name | set_name | reserved | rarity | normal | usd | usd_foil | date_time |
| :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: |
| 0 | 7fd2fe13-bbc0-42b7-bc42-3b51910ce118 | Replenish | Urza's Destiny | 1 | rare | https://c1.scryfall.com/file/scryfall-cards/no... | 114.42 | 925.99 | 20220417 |
| 1 | 7fd2fe13-bbc0-42b7-bc42-3b51910ce118 | Replenish | Urza's Destiny | 1 | rare | https://c1.scryfall.com/file/scryfall-cards/no... | 114.42 | 925.99 | 20220418 |
| 2 | 7fd2fe13-bbc0-42b7-bc42-3b51910ce118 | Replenish | Urza's Destiny | 1 | rare | https://c1.scryfall.com/file/scryfall-cards/no... | 114.42 | 925.99 | 20220419 |
| 3 | 7fd2fe13-bbc0-42b7-bc42-3b51910ce118 | Replenish | Urza's Destiny | 1 | rare | https://c1.scryfall.com/file/scryfall-cards/no... | 115.18 | 590.0 | 20220420 |
| 4 | 7fd2fe13-bbc0-42b7-bc42-3b51910ce118 | Replenish | Urza's Destiny | 1 | rare | https://c1.scryfall.com/file/scryfall-cards/no... | 115.55 | 590.0 | 20220421 |
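
The doubled quote in `'Urza''s Destiny'` is SQL escaping for the apostrophe. As an alternative sketch, `sqlite3` placeholders (`?`) pass the values separately from the SQL text, so no escaping is needed. The in-memory table below is a reduced stand-in for `scryfall_cards`, built only for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Reduced stand-in for the scryfall_cards table (only three columns).
conn.execute("CREATE TABLE scryfall_cards (card_id TEXT, name TEXT, set_name TEXT)")
conn.executemany("INSERT INTO scryfall_cards VALUES (?, ?, ?)", [
    ("6c900f34-1bd2-43c7-be33-cf5cc02a62ea", "Replenish", "World Championship Decks 2000"),
    ("7fd2fe13-bbc0-42b7-bc42-3b51910ce118", "Replenish", "Urza's Destiny"),
])

# The apostrophe needs no doubling: the values travel separately from the SQL.
cursor = conn.execute(
    "SELECT card_id, set_name FROM scryfall_cards WHERE name = ? AND set_name = ?",
    ("Replenish", "Urza's Destiny"),
)
column_names = [column[0] for column in cursor.description]
matches = [dict(zip(column_names, row)) for row in cursor.fetchall()]
print(matches)
conn.close()
```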


In [20]:
#Export the joined cards and prices tables to a CSV file.
#This cell needs the open connection, so it runs before the connection is closed.
sql_result = conn.execute(
    """ SELECT
            scryfall_cards.card_id,
            scryfall_cards.name,
            scryfall_cards.set_name,
            scryfall_cards.reserved,
            scryfall_cards.rarity,
            scryfall_cards.normal,
            scryfall_daily_prices.usd,
            scryfall_daily_prices.usd_foil,
            scryfall_daily_prices.date_time
        FROM 
            scryfall_cards
        LEFT JOIN 
            scryfall_daily_prices
            ON 
                scryfall_cards.card_id = scryfall_daily_prices.card_id
;""")


#Gets the table column names, because an sqlite query returns only the data.
column_names = [column[0] for column in sql_result.description]
#Creates a pandas DataFrame with the query data and the column names.
sql_df = pd.DataFrame.from_records(data = sql_result.fetchall(), columns = column_names)

sql_df.to_csv('scryfall_' + date_time + '.csv', sep=";", index=False)

#### Close connection to DB

In [22]:
#Close connection
conn.close()