# Preface
The goal is to build a card recommendation system for Magic: The Gathering (MTG) players.

~~~
import requests
response = requests.get('https://api.scryfall.com/bulk-data')  # lists all bulk-data API endpoints
~~~
~~~
import json

def jprint(obj):
    # Pretty-print a JSON-serializable object with sorted keys
    text = json.dumps(obj, sort_keys=True, indent=4)
    print(text)

jprint(response.json())
~~~
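The bulk-data listing printed above contains one entry per dataset. A minimal sketch of picking the download link for the oracle dataset out of that listing follows. It assumes each entry carries `type` and `download_uri` fields and that the oracle dataset is typed `oracle_cards`, which matches my reading of the Scryfall bulk-data documentation but should be checked against the live response. `pick_download_uri` and the sample payload are hypothetical names and values used only for illustration.

```python
from typing import Optional


def pick_download_uri(listing: dict, bulk_type: str = "oracle_cards") -> Optional[str]:
    """Return the download_uri of the first bulk entry whose type matches."""
    for entry in listing.get("data", []):
        if entry.get("type") == bulk_type:
            return entry.get("download_uri")
    return None


# Hypothetical, trimmed-down payload standing in for response.json()
sample_listing = {
    "object": "list",
    "data": [
        {"type": "oracle_cards", "download_uri": "https://example.invalid/oracle-cards.json"},
        {"type": "default_cards", "download_uri": "https://example.invalid/default-cards.json"},
    ],
}

oracle_uri = pick_download_uri(sample_listing)
print(oracle_uri)
```

With a real response, `pick_download_uri(response.json())` would give a URL that can be passed to `wrangle` instead of a hard-coded, dated file name.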

# Observations
* Brought in both the oracle and card datasets. On inspection, the oracle dataset is the better choice: fewer repeated cards, easier to work with, and unique identifiers (`oracle_id`).

Percentage of missing values = (number of missing values / total number of observations) * 100
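As a rough illustration of that formula, the sketch below computes the percentage per column with pandas and lists the columns that exceed the 35% threshold used by `drop_cols`. The helper name `missing_percentage` and the toy frame are assumptions made for the example, not part of the notebook.

```python
import numpy as np
import pandas as pd


def missing_percentage(frame: pd.DataFrame) -> pd.Series:
    """Percentage of missing values per column, largest first."""
    # isna() gives booleans; their mean is missing / total observations
    return (frame.isna().mean() * 100).sort_values(ascending=False)


# Toy frame with made-up gaps, standing in for the card data
toy = pd.DataFrame({
    "name": ["Static Orb", "Storm Crow", "Road of Return", "Walking Sponge"],
    "power": [np.nan, 1.0, np.nan, 1.0],
    "flavor_text": [np.nan, np.nan, np.nan, "placeholder"],
})

pct = missing_percentage(toy)
print(pct)

# Columns above the 35% threshold
to_drop = pct[pct > 35].index.tolist()
print(to_drop)
```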


* `mtgo_foil_id` is not needed.
* `flavor_text` is not needed. We can add a note about where to find flavor text, or import it into a database later for search recommendations.
* `security_stamp` is not needed.
* `preview` is not needed for modeling purposes.
* `arena_id` could be useful for an online recommendation system.
* `watermark` is not needed.
* `produced_mana` can be removed. If it is kept, its NaN values need to be replaced with "Not Applicable".
* `all_parts` could be viewed as a target for a combo model.
* `object` is not needed. Every row is a card-type object.
* `lang` contains only 10 Japanese cards out of 26000. Removed column.
* `type_line` can help fix the `colors` column. We can use the type to tell whether a card is an artifact, then create a colorless condition.
* `mana_cost` could be fixed. A few of the affected cards are duplicates; those were dropped. The leftovers are double-faced cards, where each face has its own cost. We could fill these with `cmc` instead, or look at the cards individually. For now, dropped column.
* Fixed:
    * power 
    * toughness
    * edhrec_rank
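Two of the ideas above can be sketched on a toy frame: filling `produced_mana` gaps with "Not Applicable", and using the card type to derive a colorless label for artifacts with an empty `colors` list. This is only one possible reading of the "colorless condition". The `color_label` column and the `"C"` label are assumptions introduced here, and the rows are hand-written stand-ins for the real data.

```python
import numpy as np
import pandas as pd

# Hand-written rows that mimic the shape of the oracle data
toy = pd.DataFrame({
    "name": ["Static Orb", "Storm Crow", "Llanowar Elves"],
    "type_line": ["Artifact", "Creature — Bird", "Creature — Elf Druid"],
    "colors": [[], ["U"], ["G"]],
    "produced_mana": [np.nan, np.nan, ["G"]],
})

# Replace missing produced_mana with an explicit placeholder
toy["produced_mana"] = toy["produced_mana"].fillna("Not Applicable")

# Join each color list into a single string label ("" when the list is empty)
toy["color_label"] = toy["colors"].apply(
    lambda c: "".join(c) if isinstance(c, list) else ""
)

# Colorless condition: an artifact whose colors list is empty gets the label "C"
is_artifact = toy["type_line"].str.contains("Artifact", na=False)
has_no_color = toy["colors"].apply(lambda c: isinstance(c, list) and len(c) == 0)
toy.loc[is_artifact & has_no_color, "color_label"] = "C"

print(toy[["name", "color_label", "produced_mana"]])
```

Colored artifacts keep their own colors under this rule, because the condition only fires when the `colors` list is empty.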

In [1]:
import requests
import pandas as pd

import warnings
warnings.filterwarnings("ignore")

In [9]:
class Data_Handling(object):
    @staticmethod
    def wrangle(filepath):
        """
        Load the file at `filepath` into a DataFrame, reading it as JSON when the
        path ends in ".json" and as CSV otherwise.

        NA values are handled as follows:

        edhrec_rank: NA values are replaced with an incremental counter that starts
            one past the last ranked value and grows by one per missing row.
        power and toughness: NA values mark non-creature cards and are replaced with zero.
        """
        if filepath.endswith(".json"):
            df = pd.read_json(filepath)
        else:
            df = pd.read_csv(filepath)
        
        # Fill missing edhrec_rank values with ranks that continue past the maximum
        if 'edhrec_rank' in df.columns:
            missing = df['edhrec_rank'].isna()
            counter = 22665  # Max rank + 1

            # Assign through df.loc so the original frame is modified, not a copy
            df.loc[missing, 'edhrec_rank'] = list(range(counter, counter + int(missing.sum())))
            df['edhrec_rank'] = df['edhrec_rank'].astype(int)

        # Fix power column: NA means the card is not a creature
        if 'power' in df.columns:
            df['power'] = df['power'].fillna(0)

        # Fix toughness column: NA means the card is not a creature
        if 'toughness' in df.columns:
            df['toughness'] = df['toughness'].fillna(0)

        # Fix cmc column: override row 17411 (label before re-indexing), then cast to int
        if 'cmc' in df.columns:
            if 17411 in df.index:  # avoid creating a new row on smaller files
                df.loc[17411, 'cmc'] = 1.0
            df['cmc'] = df['cmc'].astype(int)

        if 'oracle_id' in df.columns:
            df.set_index('oracle_id', inplace=True)

        # Replace missing colors with a 0 placeholder
        if 'colors' in df.columns:
            df['colors'] = df['colors'].fillna(0)

        return df

    @staticmethod
    def drop_cols(df):
        # Drops all columns with more than 35% NA values.
        # Also drops the mtgo_id column, which has a high number of NA values as well.
        na_pct = df.isna().mean() * 100
        drop_cols = na_pct[na_pct > 35].index.tolist()
        if 'mtgo_id' in df.columns and 'mtgo_id' not in drop_cols:
            drop_cols.append('mtgo_id')
        df.drop(columns=drop_cols, inplace=True)

        return df
    
    @staticmethod
    def modeling_prep_mtg_oracle(df):
        # Drop columns that are not needed for modeling
        drop_cols = ['id', 'multiverse_ids', 'tcgplayer_id', 'cardmarket_id', 'lang', 'object',
                     'released_at', 'uri', 'scryfall_uri', 'layout', 'highres_image', 'image_status',
                     'image_uris', 'games', 'frame', 'full_art', 'textless', 'booster', 'story_spotlight', 'prices',
                     'legalities', 'reserved', 'foil', 'nonfoil', 'card_back_id', 'artist', 'artist_ids', 'illustration_id',
                     'border_color', 'oversized', 'finishes', 'scryfall_set_uri', 'rulings_uri', 'promo', 'set_uri', 'set_search_uri',
                     'reprint', 'variation', 'set_id', 'prints_search_uri', 'collector_number', 'digital']
        df.drop(columns=drop_cols, inplace=True)

        return df
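
To make the NA handling described in the `wrangle` docstring concrete, here is a small sketch on a toy frame. It is a variation, not the notebook's exact code: the starting rank is derived from the data instead of using the hard-coded 22665, and the card names and values are invented for the example.

```python
import numpy as np
import pandas as pd

# Invented rows: two cards lack an edhrec_rank, two lack power
toy = pd.DataFrame({
    "name": ["card_a", "card_b", "card_c", "card_d"],
    "edhrec_rank": [2.0, np.nan, 5.0, np.nan],
    "power": [1.0, np.nan, np.nan, 3.0],
})

# Continue the ranking one past the highest existing rank
missing = toy["edhrec_rank"].isna()
start = int(toy["edhrec_rank"].max()) + 1
toy.loc[missing, "edhrec_rank"] = np.arange(start, start + missing.sum())
toy["edhrec_rank"] = toy["edhrec_rank"].astype(int)

# Missing power is treated as a non-creature card and set to zero
toy["power"] = toy["power"].fillna(0)

print(toy)
```

Assigning through `toy.loc[...]` writes to the frame itself; assigning to a filtered copy of the frame can leave the original unchanged.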

In [10]:
df = Data_Handling.wrangle('https://c2.scryfall.com/file/scryfall-bulk/oracle-cards/oracle-cards-20220412211242.json')
df = Data_Handling.modeling_prep_mtg_oracle(df)
df = Data_Handling.drop_cols(df)
df.head()

| oracle_id | name | mana_cost | cmc | type_line | oracle_text | colors | color_identity | keywords | set | set_name | set_type | rarity | edhrec_rank | related_uris | power | toughness |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0004ebd0-dfd6-4276-b4a6-de0003e94237 | Static Orb | {3} | 3 | Artifact | As long as Static Orb is untapped, players can... | [] | [] | [] | 7ed | Seventh Edition | core | rare | 2631 | {'gatherer': 'https://gatherer.wizards.com/Pag... | 0 | 0 |
| 0006faf6-7a61-426c-9034-579f2cfcfa83 | Sensory Deprivation | {U} | 1 | Enchantment — Aura | Enchant creature\nEnchanted creature gets -3/-0. | [U] | [U] | [Enchant] | m14 | Magic 2014 | core | common | 21574 | {'gatherer': 'https://gatherer.wizards.com/Pag... | 0 | 0 |
| 0007c283-5b7a-4c00-9ca1-b455c8dff8c3 | Road of Return | {G}{G} | 2 | Sorcery | Choose one —\n• Return target permanent card f... | [G] | [G] | [Entwine] | c19 | Commander 2019 | commander | rare | 4080 | {'gatherer': 'https://gatherer.wizards.com/Pag... | 0 | 0 |
| 000d5588-5a4c-434e-988d-396632ade42c | Storm Crow | {1}{U} | 2 | Creature — Bird | Flying (This creature can't be blocked except ... | [U] | [U] | [Flying] | 9ed | Ninth Edition | core | common | 12416 | {'gatherer': 'https://gatherer.wizards.com/Pag... | 1 | 2 |
| 000e5d65-96c3-498b-bd01-72b1a1991850 | Walking Sponge | {1}{U} | 2 | Creature — Sponge | {T}: Target creature loses your choice of flyi... | [U] | [U] | [] | ulg | Urza's Legacy | expansion | uncommon | 18864 | {'gatherer': 'https://gatherer.wizards.com/Pag... | 1 | 1 |


### More EDA/Cleaning: For after modeling

* Relabel the color values and unpack them from their lists. Blue is coded `U`; `B` already means black, so a plain `U` to `B` replacement would merge the two colors.
* Fix mana values for all cards.
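
One way to approach the color cleanup is to map the single-letter codes to full names instead of renaming letters. The sketch below assumes the standard MTG codes (W, U, B, R, G) and treats any non-list value, such as the `0` placeholder that `wrangle` writes for missing colors, as unknown. `COLOR_NAMES` and `label_colors` are hypothetical names for this example.

```python
# Standard single-letter MTG color codes mapped to readable names
COLOR_NAMES = {"W": "White", "U": "Blue", "B": "Black", "R": "Red", "G": "Green"}


def label_colors(codes):
    """Turn a list of color codes into a readable label."""
    if not isinstance(codes, list):
        return "Unknown"  # e.g. the 0 placeholder used for missing colors
    if not codes:
        return "Colorless"  # an empty list means the card has no color
    # Unrecognized codes are passed through unchanged
    return "/".join(COLOR_NAMES.get(code, code) for code in codes)


print(label_colors(["U"]))
print(label_colors(["W", "U"]))
print(label_colors([]))
print(label_colors(0))
```

On the DataFrame this could be applied as `df['colors'].apply(label_colors)`, which keeps multi-color cards intact instead of keeping only the first code.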

In [3]:
# df['colors'] = df['colors'].str[0]
# df['colors'] = df['colors'].str.replace('U', 'B')

In [2]:
# mana = df[df['mana_cost'].isna() == True]
# mana_keep = mana[mana['mtgo_id'].isna() == False]
# mana_drop = mana[mana['mtgo_id'].isna() == True]

# mana_keep[mana_keep['mana_cost'].isna() == True]