In [1]:
import pandas as pd
import numpy as np
import seaborn as sns

In [2]:
df = pd.read_csv('./hearthstone_collectible_df.csv')
df.head().T

Unnamed: 0,0,1,2,3,4
artist,,,,,
attack,,,,,
cardId,HERO_09,HERO_01,HERO_07,HERO_08,HERO_06
cardSet,Basic,Basic,Basic,Basic,Basic
collectible,True,True,True,True,True
cost,,,,,
dbfId,813,7,893,637,274
durability,,,,,
faction,Neutral,Neutral,Neutral,Neutral,Neutral
flavor,,,,,


In [3]:
df.shape

(2012, 27)

## Cleaning

Things to clean up:
 - The elite column seems to be True if the card is Legendary and empty otherwise. Check this; if it holds, the column is redundant and we can drop it.
 - Not every card has a race, so we can change NaNs to 'General', as documented here:
  https://hearthstone.gamepedia.com/Minion
 - Change NaN in 'text' to an empty string (if the card has no text).
 - Faction is either 'Horde', 'Alliance', or 'Neutral'; change all NaNs to 'Neutral'.

In [4]:
df[(df['elite'].isna()) & (df['rarity'] == 'Legendary')]

Unnamed: 0,artist,attack,cardId,cardSet,collectible,cost,dbfId,durability,faction,flavor,...,name,playerClass,race,rarity,text,type,elite,classes,multiClassGroup,armor
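
The query above only covers one direction: Legendary cards with no elite value. A minimal sketch of a two-way check follows, using a small made-up frame rather than the real card data; the names `toy`, `is_legendary`, `is_elite`, and `mismatches` are illustrative and not part of the notebook.

```python
import pandas as pd

# Made-up stand-in for the card table (the real df has 2012 rows).
toy = pd.DataFrame({
    "rarity": ["Legendary", "Common", "Legendary", "Rare"],
    "elite": [True, None, True, None],
})

is_legendary = toy["rarity"].eq("Legendary")
is_elite = toy["elite"].notna()

# Rows where the two flags disagree, in either direction.
mismatches = toy[is_legendary != is_elite]

# If no rows disagree, elite adds nothing beyond rarity.
elite_is_redundant = mismatches.empty
print(elite_is_redundant)
```

If `mismatches` came back non-empty on the real data, the elite column would carry information beyond rarity and dropping it would lose something.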


In [5]:
# The query returns no rows: every Legendary card has elite set, so we treat elite as redundant with rarity and drop it.

df.drop(columns = ['elite'], inplace = True)

Now let's change the NaNs to 'General' in the race column:

In [6]:
df['race'].value_counts()

Beast        161
Mech         104
Elemental     77
Demon         47
Dragon        44
Murloc        33
Pirate        27
Totem          6
All            1
Name: race, dtype: int64
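
`value_counts()` skips NaN by default, so the counts above cover only cards that have a race. A small sketch on made-up data (the series `toy_race` is hypothetical) shows how `dropna=False` brings the missing values into the tally:

```python
import numpy as np
import pandas as pd

# Made-up race column with two missing entries.
toy_race = pd.Series(["Beast", "Mech", np.nan, np.nan, "Beast"])

# Default behaviour: NaN is left out of the counts.
counts_without_nan = toy_race.value_counts()

# dropna=False adds a NaN row, so the counts sum to the full length.
counts_with_nan = toy_race.value_counts(dropna=False)

missing = int(toy_race.isna().sum())
print(counts_with_nan)
print(missing)
```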

In [7]:
df['race'] = df['race'].fillna('General')

In [8]:
df['race'].value_counts().sum()

2012

And now the NaNs in text, which become empty strings:

In [9]:
df['text'] = df['text'].fillna('')

In [10]:
df.text.isnull().sum()

0

Finally the NaNs in faction:

In [11]:
df['faction'] = df['faction'].fillna('Neutral')

In [12]:
df.faction.value_counts()

Neutral     1935
Alliance      51
Horde         26
Name: faction, dtype: int64
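
The three fills above could also be written as one call, since `DataFrame.fillna` accepts a dict mapping column names to fill values. This is a sketch on a made-up frame; `toy`, `fill_values`, and `cleaned` are illustrative names, and the fill values mirror the ones chosen in the cleaning list.

```python
import numpy as np
import pandas as pd

# Made-up stand-in with the three columns being cleaned.
toy = pd.DataFrame({
    "race": ["Beast", np.nan, "Mech"],
    "text": [np.nan, "Taunt", np.nan],
    "faction": [np.nan, "Horde", np.nan],
})

# One fill value per column; columns not listed are left untouched.
fill_values = {"race": "General", "text": "", "faction": "Neutral"}
cleaned = toy.fillna(fill_values)

# Confirm nothing is left missing in these columns.
remaining_nans = int(cleaned.isna().sum().sum())
print(remaining_nans)
```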

And we can drop the redundant 'collectible' column:

In [13]:
df.drop(columns = ['collectible'], inplace = True)

In [14]:
df.shape

(2012, 25)

In [15]:
df[df['durability'].notna()].T

Unnamed: 0,29,73,110,113,114,225,239,264,273,317,...,1831,1834,1851,1904,1908,1925,1942,1958,1959,1981
artist,Glenn Rane,Lucas Graciano,Ryan Sook,Stefan Kopinski,Brian Huang,Nate Bowden,Cyril Van Der Haegen,Daren Bader,Efrem Palacios,John Polidora,...,J. Axer,M. Alvares & M. Azevedo,Jason Kang,Vladimir Kafanov,Vlad Botos,Jakub Kasper,Jim Nelson,Jason Kang,Akkapoj T.,L. Lullabi & K. Turovec
attack,1,3,4,5,3,2,3,2,1,2,...,0,3,4,0,2,0,2,4,3,0
cardId,CS2_091,CS2_106,CS2_097,CS2_112,CS2_080,EX1_247,EX1_536,EX1_133,EX1_366,EX1_567,...,TRL_317,TRL_304,TRL_325,DAL_568,DAL_571,DAL_177,DAL_563,DAL_720,DAL_063,DAL_378
cardSet,Basic,Basic,Basic,Basic,Basic,Classic,Classic,Classic,Classic,Classic,...,Rastakhan's Rumble,Rastakhan's Rumble,Rastakhan's Rumble,Rise of Shadows,Rise of Shadows,Rise of Shadows,Rise of Shadows,Rise of Shadows,Rise of Shadows,Rise of Shadows
cost,1,3,4,5,5,2,3,3,3,5,...,5,5,6,2,2,3,4,4,4,6
dbfId,383,401,847,304,421,960,1662,391,643,352,...,50086,50014,50056,52490,52496,51971,52482,52617,51738,52089
durability,4,2,2,2,4,3,2,2,5,8,...,0,3,4,0,2,0,0,2,2,0
faction,Neutral,Neutral,Neutral,Neutral,Neutral,Neutral,Neutral,Neutral,Neutral,Neutral,...,Neutral,Neutral,Neutral,Neutral,Neutral,Neutral,Neutral,Neutral,Neutral,Neutral
flavor,Prince Malchezaar was a collector of rare weap...,"During times of tranquility and harmony, this ...","It Slices, it Dices. You can cut a tin can wit...",No… actually you should fear the Reaper.,Guaranteed to have been owned by a real assass...,"Yo, that's a nice axe.",First Lesson: Put the pointy end in the other ...,Perdition's Blade is Ragnaros's back-up weapon...,I dub you Sir Loin of Beef!,Orgrim Doomhammer gave this legendary weapon t...,...,"If you’re burning and you know it, wave your h...",Only two things in life are certain: death and...,"“Griftah here with de Sul’chop. One chop, you ...",Some might call this a lightforgery.,It knows every secret you have left.,If only you’d let it go to voicemail…,"Eager to please, even if it kills him.",Kobolds informally refer to its effect as a de...,Whoso pulleth out this wrench from the toolbox...,The Kirin Tor have always been lax in enforcin...
health,,,,,,,,,,,...,0,,,0,0,,2,,,


In [16]:
df.columns

Index(['artist', 'attack', 'cardId', 'cardSet', 'cost', 'dbfId', 'durability',
       'faction', 'flavor', 'health', 'howToGet', 'howToGetGold', 'img',
       'imgGold', 'locale', 'mechanics', 'name', 'playerClass', 'race',
       'rarity', 'text', 'type', 'classes', 'multiClassGroup', 'armor'],
      dtype='object')

In [17]:
df['cardSet'].value_counts()

Classic                         237
Basic                           142
Journey to Un'Goro              135
Rise of Shadows                 135
The Boomsday Project            135
Rastakhan's Rumble              135
Kobolds & Catacombs             135
Knights of the Frozen Throne    135
Whispers of the Old Gods        134
The Grand Tournament            132
Mean Streets of Gadgetzan       132
The Witchwood                   129
Goblins vs Gnomes               123
One Night in Karazhan            45
The League of Explorers          45
Blackrock Mountain               31
Naxxramas                        30
Hall of Fame                     22
Name: cardSet, dtype: int64

In [18]:
# Columns whose value distributions we want to inspect
value_count_list = ['attack', 'cardSet', 'cost',
                    'durability', 'faction', 'health',
                    'playerClass', 'race',
                    'rarity', 'type', 'classes', 'armor']

for column in value_count_list:
    print(f'The {column} values are: ')
    print(df[column].value_counts())
    print('')
    

The attack values are: 
2.0     335
3.0     308
4.0     208
1.0     195
5.0     167
6.0      73
0.0      69
7.0      52
8.0      33
9.0      14
10.0      4
12.0      3
20.0      1
Name: attack, dtype: int64

The cardSet values are: 
Classic                         237
Basic                           142
Journey to Un'Goro              135
Rise of Shadows                 135
The Boomsday Project            135
Rastakhan's Rumble              135
Kobolds & Catacombs             135
Knights of the Frozen Throne    135
Whispers of the Old Gods        134
The Grand Tournament            132
Mean Streets of Gadgetzan       132
The Witchwood                   129
Goblins vs Gnomes               123
One Night in Karazhan            45
The League of Explorers          45
Blackrock Mountain               31
Naxxramas                        30
Hall of Fame                     22
Name: cardSet, dtype: int64

The cost values are: 
3.0     365
2.0     361
4.0     316
5.0     248
1.0     239
6.0     

In [31]:
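def f(x):
    # Flag whether a card has an attack value: 1 if x equals a whole number from 0 to 20
    # (the observed attack range), 0 otherwise. NaN never matches, so it maps to 0.
    if x in range(21):
        return 1
    else:
        return 0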

In [32]:
df['attack_true'] = df['attack'].map(f)

In [33]:
df['attack_true'].value_counts()

1    1462
0     550
Name: attack_true, dtype: int64
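
Because every observed attack value is a whole number between 0 and 20, the flag above marks exactly the cards whose attack is not NaN. A vectorised equivalent is sketched below on made-up data (`toy_attack`, `via_map`, and `via_notna` are illustrative names). One caveat: the two versions differ for values outside 0 to 20 or for fractional values, which `f` maps to 0 but `notna()` maps to 1.

```python
import numpy as np
import pandas as pd

def f(x):
    # Same function as in the notebook: 1 for whole numbers 0..20, else 0.
    if x in range(21):
        return 1
    else:
        return 0

# Made-up attack column with one missing value.
toy_attack = pd.Series([2.0, np.nan, 0.0, 20.0])

via_map = toy_attack.map(f)
via_notna = toy_attack.notna().astype(int)

# Both approaches agree on this data.
print(via_map.tolist())
print(via_notna.tolist())
```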

In [None]:
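# 'intl_plan' and 'vmail_plan' are not columns of df (see df.columns above), so that call would fail.
# 'faction' and 'rarity' are substituted here as an assumption about which columns to encode.
pd.get_dummies(df, columns=['faction', 'rarity'], drop_first=True)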

 - Add dummy (indicator) columns for each feature that has NaNs (attack, armor, health, etc.)
 - Dummy-encode all the attack/health/durability/armor values
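
A rough sketch of both ideas on a made-up frame, not the real card data. The `_true` suffix follows the `attack_true` column created earlier; `toy`, `stat_columns`, and `encoded` are illustrative names. Using `dummy_na=True` is one possible choice, giving NaN its own dummy column for each stat.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the four stat columns.
toy = pd.DataFrame({
    "attack": [1.0, np.nan, 3.0],
    "health": [np.nan, np.nan, 2.0],
    "durability": [4.0, np.nan, np.nan],
    "armor": [np.nan, 5.0, np.nan],
})

stat_columns = ["attack", "health", "durability", "armor"]

# Indicator columns: 1 where the stat is present, 0 where it is NaN.
for column in stat_columns:
    toy[f"{column}_true"] = toy[column].notna().astype(int)

# One-hot encode the stat values themselves; dummy_na=True adds a NaN column per stat.
encoded = pd.get_dummies(toy, columns=stat_columns, dummy_na=True)
print(encoded.columns.tolist())
```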


Ideas for next steps:
 - From the given cards, find the best deck
  gym.ai
 - Monte Carlo simulations on decks (random plays)
 - Network analysis (clustering) *** this is the way to go!