# Load M19 dataset
This notebook describes how to load the standardized M19 dataset.

In [1]:
import os
import pickle

import pandas as pd

import draftsimtools as ds

List the files in the standardized dataset folder

In [2]:
data_folder = "./standardized_m19/"
display(os.listdir(data_folder))

['drafts_test.pkl',
 'drafts_tensor_test.pkl',
 'drafts_train.pkl',
 'drafts_tensor_train.pkl',
 'standardized_m19_rating.tsv']

### 1. Load draft ratings

In [3]:
cur_set = pd.read_csv(data_folder + 'standardized_m19_rating.tsv', delimiter="\t")
display(cur_set)

| Unnamed: 0 | Name | Casting Cost 1 | Casting Cost 2 | Card Type | Rarity | Rating | Color Vector |
|---|---|---|---|---|---|---|---|
| 0 | Abnormal_Endurance | 1B | none | Instant | C | 2.2 | [0, 0, 1, 0, 0] |
| 1 | Act_of_Treason | 2R | none | Spell | C | 2.0 | [0, 0, 0, 1, 0] |
| 2 | Aegis_of_the_Heavens | 1W | none | Instant | U | 1.9 | [1, 0, 0, 0, 0] |
| 3 | Aerial_Engineer | 2UW | none | Creature | U | 3.0 | [1, 1, 0, 0, 0] |
| 4 | Aether_Tunnel | 1U | none | Spell | U | 2.0 | [0, 1, 0, 0, 0] |
| 5 | Aethershield_Artificer | 3W | none | Creature | U | 2.4 | [1, 0, 0, 0, 0] |
| 6 | Ajani's_Last_Stand | 2WW | none | Spell | R | 3.1 | [2, 0, 0, 0, 0] |
| 7 | Ajani's_Pridemate | 1W | none | Creature | U | 3.0 | [1, 0, 0, 0, 0] |
| 8 | Ajani's_Welcome | W | none | Spell | U | 1.7 | [1, 0, 0, 0, 0] |
| 9 | Ajani_Adversary_of_Tyrants | 2WW | none | Planeswalker | M | 4.4 | [2, 0, 0, 0, 0] |
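The Color Vector column appears to count the colored mana symbols in Casting Cost 1, in W, U, B, R, G order (for example 1B gives [0, 0, 1, 0, 0] and 2UW gives [1, 1, 0, 0, 0]). That ordering is inferred from the rows above, not documented here. Because `read_csv` loads the column as a string, the sketch below parses it with `ast.literal_eval` and recomputes it from the cost; `color_vector` is a hypothetical helper that ignores hybrid or other multi-character mana symbols.

```python
import ast

# Assumed color order, inferred from the sample rows above (not documented).
COLORS = "WUBRG"


def parse_color_vector(text):
    """Turn the stored string, e.g. "[0, 0, 1, 0, 0]", into a list of ints."""
    return ast.literal_eval(text)


def color_vector(cost):
    """Hypothetical helper: count W/U/B/R/G symbols in a cost such as "2UW".

    Generic mana digits are ignored; hybrid symbols are not handled.
    """
    return [cost.count(color) for color in COLORS]


# With the DataFrame loaded above, the column could be parsed in place:
# cur_set["Color Vector"] = cur_set["Color Vector"].apply(parse_color_vector)

# Check the assumed ordering against a few rows of the table.
samples = {
    "1B": "[0, 0, 1, 0, 0]",
    "2UW": "[1, 1, 0, 0, 0]",
    "2WW": "[2, 0, 0, 0, 0]",
}
for cost, stored in samples.items():
    assert color_vector(cost) == parse_color_vector(stored)
```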


### 2. Load drafts with cardnames

In [4]:
def load_data(path):
    """
    Load a pickle file from disk. 
    """
    with open(path, "rb") as f:
        return pickle.load(f)

In [5]:
drafts_train = load_data(data_folder + 'drafts_train.pkl')
drafts_test = load_data(data_folder + 'drafts_test.pkl')

Show the sizes of the train and test splits (about 80% / 20% of the 107,949 drafts)

In [6]:
print(len(drafts_train), len(drafts_test))

86359 21590


Show the first 2 picks of the first draft (each pick lists the cards in the pack at that point)

In [7]:
print(drafts_train[0][:2])

[['Volcanic_Dragon', 'Lena_Selfless_Champion', 'Exclusion_Mage', 'Gallant_Cavalry', 'Dragon_Egg', 'Manalith', 'Goblin_Instigator', 'Salvager_of_Secrets', 'Gearsmith_Guardian', 'Bogstomper', 'Disperse', 'Trumpet_Blast', 'Duress', 'Walking_Corpse', 'Island'], ['Departed_Deckhand', 'Star-Crowned_Stag', 'Meteor_Golem', 'Aviation_Pioneer', 'Oreskos_Swiftclaw', 'Abnormal_Endurance', 'Gearsmith_Guardian', 'Divination', 'Centaur_Courser', 'Mind_Rot', 'Blanchwood_Armor', 'Talons_of_Wildwood', 'Root_Snare', 'Mountain']]
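Each draft is a list of picks, and each pick is the list of cards in the pack at that point, so the pack shrinks by one card per pick (15 cards, then 14). The card the user took appears to be listed first; this is inferred from the outputs in this notebook rather than documented. A short sketch over the two picks printed above:

```python
# The first two picks of drafts_train[0], copied from the output above.
draft = [
    ["Volcanic_Dragon", "Lena_Selfless_Champion", "Exclusion_Mage",
     "Gallant_Cavalry", "Dragon_Egg", "Manalith", "Goblin_Instigator",
     "Salvager_of_Secrets", "Gearsmith_Guardian", "Bogstomper", "Disperse",
     "Trumpet_Blast", "Duress", "Walking_Corpse", "Island"],
    ["Departed_Deckhand", "Star-Crowned_Stag", "Meteor_Golem",
     "Aviation_Pioneer", "Oreskos_Swiftclaw", "Abnormal_Endurance",
     "Gearsmith_Guardian", "Divination", "Centaur_Courser", "Mind_Rot",
     "Blanchwood_Armor", "Talons_of_Wildwood", "Root_Snare", "Mountain"],
]

for pick, pack in enumerate(draft):
    # Assumption: the card taken at this pick is the first entry of the pack list.
    print(f"pick {pick}: took {pack[0]} from a pack of {len(pack)} cards")

# Under that assumption, the user's collection is the first entry of every pick.
picked_cards = [pack[0] for pack in draft]
print(picked_cards)
```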


### 3. Load drafts with indices 
The drafts_tensor format replaces card names with integer indices.

This format was previously called the "Intermediate Draft Representation".
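Conceptually, each pick becomes a fixed-width row of 15 indices. In the tensor rows shown in this section, the 14-card second pack ends with a 0, which suggests shorter packs are right-padded with 0; since 0 is also the index of Abnormal_Endurance, padding can only be told apart from a real card by knowing the pack size. The sketch below uses hypothetical helpers `encode_pack` and `encode_draft` and a toy vocabulary indexed by sorted name, like `le.classes_` in section 4.

```python
PACK_SIZE = 15  # width of each row in drafts_tensor
PAD = 0  # assumed padding value (seen at the end of the 14-card pack)


def encode_pack(pack, name_to_index, width=PACK_SIZE, pad=PAD):
    """Hypothetical helper: map card names to indices and right-pad to `width`."""
    row = [name_to_index[name] for name in pack]
    if len(row) > width:
        raise ValueError(f"pack has {len(row)} cards, expected at most {width}")
    return row + [pad] * (width - len(row))


def encode_draft(draft, name_to_index):
    """Encode every pick of one draft as a list of fixed-width rows."""
    return [encode_pack(pack, name_to_index) for pack in draft]


# Toy vocabulary: indices follow sorted card names.
names = sorted(["Island", "Duress", "Disperse", "Abnormal_Endurance"])
name_to_index = {name: i for i, name in enumerate(names)}

print(encode_pack(["Duress", "Island"], name_to_index, width=4))  # [2, 3, 0, 0]
```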

In [8]:
drafts_tensor_train = load_data(data_folder + 'drafts_tensor_train.pkl')
drafts_tensor_test = load_data(data_folder + 'drafts_tensor_test.pkl')

Show the shapes of the draft tensors

In [9]:
print(drafts_tensor_train.shape, drafts_tensor_test.shape)

(86359, 45, 15) (21590, 45, 15)
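The axes read as (drafts, picks per draft, slots per pack): 45 picks correspond to 3 packs of 15 cards, and the first axis matches the draft counts printed in section 2. Assuming a standard booster draft where each pick removes one card, pick p (0-based) sees 15 - p % 15 cards, which matches the 15- and 14-card packs shown earlier, so only that many entries of a row are real cards. The helpers `pack_size` and `decode_row` are hypothetical:

```python
PICKS_PER_PACK = 15  # picks per pack, and also the number of slots per row
N_PICKS = 45  # 3 packs x 15 picks


def pack_size(pick):
    """Cards left in the pack at 0-based pick `pick` (assumes one card leaves per pick)."""
    return PICKS_PER_PACK - pick % PICKS_PER_PACK


def decode_row(row, pick, classes):
    """Hypothetical helper: drop trailing padding, then map indices back to names."""
    return [classes[i] for i in row[: pack_size(pick)]]


sizes = [pack_size(p) for p in range(N_PICKS)]
print(sizes[:3], sizes[14:17])  # [15, 14, 13] [1, 15, 14]
print(sum(sizes), "of", N_PICKS * PICKS_PER_PACK, "slots hold real cards")  # 360 of 675 slots hold real cards

# At pick 13 only two cards remain, so the 0 in slot 1 is a real card
# (index 0 = Abnormal_Endurance) while the zeros after it are padding.
toy_classes = ["Abnormal_Endurance", "Act_of_Treason", "Aegis_of_the_Heavens"]
print(decode_row([2, 0] + [0] * 13, 13, toy_classes))  # ['Aegis_of_the_Heavens', 'Abnormal_Endurance']
```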


Show the first 2 picks of the first draft

In [10]:
drafts_tensor_train[0, :2, :]

array([[258, 128,  74,  86,  61, 141,  96, 197,  88,  23,  54, 245,  65,
        260, 121],
       [ 48, 219, 146,  18, 166,   0,  88,  55,  31, 150,  20, 232, 192,
        153,   0]], dtype=int16)

### 4. Create one-hot encoded datasets

The one-hot encoded representation is a convenient input for most ML models, because every sample becomes a fixed-length numeric vector.

Currently, this representation is generated on the fly from the intermediate representation.

In the future, the data may be serialized in this format to improve training performance.

First, create a cardname -> index mapping. Its classes are the card names in sorted order:

In [11]:
le = ds.create_le(cur_set["Name"].values)
print(le.classes_[:5])

['Abnormal_Endurance' 'Act_of_Treason' 'Aegis_of_the_Heavens'
 'Aerial_Engineer' 'Aether_Tunnel']
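A card's index is its position in `le.classes_`. The `classes_` attribute suggests that `ds.create_le` returns a scikit-learn `LabelEncoder` (which also provides `transform` and `inverse_transform`), but that is an assumption. The dependency-free stand-in below, `SimpleLabelEncoder`, is hypothetical and only mirrors that behavior:

```python
class SimpleLabelEncoder:
    """Hypothetical stand-in for a label encoder: sorted unique names <-> indices."""

    def __init__(self, names):
        self.classes_ = sorted(set(names))
        self._index = {name: i for i, name in enumerate(self.classes_)}

    def transform(self, names):
        """Card names -> integer indices (raises KeyError for unknown names)."""
        return [self._index[name] for name in names]

    def inverse_transform(self, indices):
        """Integer indices -> card names."""
        return [self.classes_[i] for i in indices]


enc = SimpleLabelEncoder(["Aether_Tunnel", "Abnormal_Endurance", "Act_of_Treason"])
print(enc.classes_)  # ['Abnormal_Endurance', 'Act_of_Treason', 'Aether_Tunnel']
print(enc.transform(["Act_of_Treason"]))  # [1]
print(enc.inverse_transform([2, 0]))  # ['Aether_Tunnel', 'Abnormal_Endurance']
```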


Then, define the train and test datasets, which generate one-hot samples on the fly from the draft tensors

In [12]:
train_dataset = ds.DraftDataset(drafts_tensor_train, le)
test_dataset = ds.DraftDataset(drafts_tensor_test, le)
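Generating samples on the fly means the dataset stores only the compact draft tensor and builds each sample when it is indexed, using the `__len__`/`__getitem__` protocol that `torch.utils.data.Dataset` subclasses implement. The sketch below assumes one sample per pick and draft-major ordering (sample i is draft i // 45, pick i % 45), which may differ from how `ds.DraftDataset` actually indexes; `LazyPickDataset` and its `encode` callable are hypothetical.

```python
N_PICKS = 45  # picks per draft (3 packs x 15 picks)


class LazyPickDataset:
    """Hypothetical lazy dataset: keeps the drafts and encodes one pick per lookup."""

    def __init__(self, drafts, encode):
        self.drafts = drafts  # e.g. a (num_drafts, 45, 15) array or nested lists
        self.encode = encode  # callable(draft, pick) -> sample

    def __len__(self):
        # Assumption: one sample per pick of every draft.
        return len(self.drafts) * N_PICKS

    def __getitem__(self, idx):
        if not 0 <= idx < len(self):
            raise IndexError(idx)
        # Assumption: draft-major ordering, i.e. idx -> (idx // 45, idx % 45).
        draft_idx, pick = divmod(idx, N_PICKS)
        # Nothing is precomputed; the sample is built only when it is requested.
        return self.encode(self.drafts[draft_idx], pick)


# Usage with a stand-in encoder that just returns the pick number.
demo = LazyPickDataset([[None] * N_PICKS, [None] * N_PICKS], lambda draft, pick: pick)
print(len(demo), demo[10], demo[46])  # 90 10 1
```

Subclassing `torch.utils.data.Dataset` with the same two methods would let a `DataLoader` batch these samples.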

### 5. Use the one-hot encoded dataset

In [13]:
x, y = train_dataset[10]

#### Input Representation  
x is a vector of length 2n, where n is the number of cards in the set (265 here, the length of y shown below).  

x[:n] holds the number of copies of each card the user has already picked.  
x[n:] marks the cards in the current pack (1 if the card is present in the pack, 0 otherwise).
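A minimal pure-Python sketch of building x for pick p of one draft in the tensor format. It assumes the card taken at each pick is the first entry of that pick's row and that only the first 15 - p % 15 entries of a row are real cards (the rest being 0-padding); both are inferences from the outputs in this notebook. `encode_input` is a hypothetical helper, not the `ds.DraftDataset` implementation.

```python
PICKS_PER_PACK = 15


def pack_size(pick):
    """Cards in the pack at 0-based pick `pick` (one card removed per pick)."""
    return PICKS_PER_PACK - pick % PICKS_PER_PACK


def encode_input(draft, pick, n):
    """Hypothetical helper: build x (length 2n) for pick `pick` of `draft`.

    draft: one row of card indices per pick, each padded to 15 entries.
    """
    x = [0.0] * (2 * n)
    # x[:n]: copies of each card the user took before this pick.
    for earlier in range(pick):
        x[draft[earlier][0]] += 1.0  # assumption: row[0] is the card taken
    # x[n:]: 1 for every card in the current pack; padding slots are skipped.
    for card in draft[pick][: pack_size(pick)]:
        x[n + card] = 1.0
    return x


# Toy draft with n = 20 cards: pick 0 sees cards 0..14 and takes card 0,
# pick 1 sees 14 cards (5..18) followed by one padding 0.
toy_draft = [list(range(15)), list(range(5, 19)) + [0]]
x = encode_input(toy_draft, 1, n=20)
print(sum(x[:20]), sum(x[20:]))  # 1.0 14.0
```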

In [14]:
print(x)

tensor([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1.,
        0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 1., 0., 0., 0., 0.,
        0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0.,
        0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0.,
        0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0.,
        0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0.,
        0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 2., 0., 0., 0., 0.,
        0., 0., 0., 0., 0., 0., 0., 0., 

#### Output Representation
y is a one-hot vector of length n:  
y[i] = 1, where i is the index of the card picked by the user, and all other entries are 0.
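Building y and recovering the class index from it are both short. The sketch keeps the same assumption that the card taken is the first entry of the current row; `encode_target` and `picked_index` are hypothetical helpers. Losses such as `torch.nn.CrossEntropyLoss` accept the class index as the target, which for a torch tensor y is `y.argmax()`.

```python
def encode_target(draft, pick, n):
    """Hypothetical helper: one-hot y of length n for the card taken at `pick`."""
    y = [0] * n
    y[draft[pick][0]] = 1  # assumption: row[0] is the card taken
    return y


def picked_index(y):
    """Position of the single 1 in a one-hot vector (like y.argmax() in torch)."""
    return max(range(len(y)), key=y.__getitem__)


# Toy rows; only the first entry of each row matters for the target.
toy_draft = [[7, 3, 5], [2, 9]]
y = encode_target(toy_draft, 1, n=10)
print(sum(y), picked_index(y))  # 1 2
```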

In [15]:
print(y)

tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0])


#### Numpy Conversion
By default, x and y are torch tensors. They can be converted to NumPy arrays with x.numpy() and y.numpy().