# Introduction
This work is inspired by [this paper](https://www.elie.net/publication/i-am-a-legend) by Elie and Celine Bursztein and tries to reproduce their findings while applying some different ideas.

# Cards data

## Load the JSON data

Let's start by loading the game card data from [Hearthstone JSON](http://hearthstonejson.com/) into Python using the json library.

In [296]:
import json
import numpy
import os.path
import pandas
import re

all_sets_filename = os.path.join('data', 'AllSets.json')

# Uncomment the following lines to update the data file
#import urllib
#urllib.urlretrieve ('http://hearthstonejson.com/json/AllSets.json', all_sets_filename)

with open(all_sets_filename) as fp:
    all_card_sets = json.load(fp, encoding='utf-8')

## Collectible cards

Let's keep only the collectible cards from all the available card sets.

In [297]:
all_cards = sum((v for k, v in all_card_sets.items()
                 if k not in ('Debug', 'Credits', 'Missions', 'System')), list())
# Select only collectible cards
all_collectible_cards = [card for card in all_cards
                         if u'collectible' in card and card['collectible']]
# Remove heroes
all_collectible_cards = [card for card in all_collectible_cards
                         if 'type' in card and card['type'] != 'Hero']
len(all_collectible_cards)

535

## Card tags

In [298]:
tags = set()
for card in all_collectible_cards:
    tags.update(set(card.keys()))
tags

{u'artist',
 u'attack',
 u'collectible',
 u'cost',
 u'durability',
 u'elite',
 u'faction',
 u'flavor',
 u'health',
 u'howToGet',
 u'howToGetGold',
 u'id',
 u'inPlayText',
 u'mechanics',
 u'name',
 u'playerClass',
 u'race',
 u'rarity',
 u'text',
 u'type'}

In [299]:
# Only these tags are of interest for pricing purposes
interest_tags = {u'attack', u'cost', u'durability', u'health', u'id', u'mechanics', u'name',
                 u'playerClass', u'text', u'type'}
all_collectible_cards = [{k: v for k, v in card.items() if k in interest_tags}
                         for card in all_collectible_cards]

## Card mechanics

In [300]:
mechanics = set()
for card in all_collectible_cards:
    if 'mechanics' in card:
        mechanics.update(card['mechanics'])
mechanics

{u'AdjacentBuff',
 u'AffectedBySpellPower',
 u'Aura',
 u'Battlecry',
 u'Charge',
 u'Combo',
 u'Deathrattle',
 u'Divine Shield',
 u'Enrage',
 u'Freeze',
 u'HealTarget',
 u'ImmuneToSpellpower',
 u'Poisonous',
 u'Secret',
 u'Silence',
 u'Spellpower',
 u'Stealth',
 u'Taunt',
 u'Windfury'}

## Card types

In [301]:
types = set()
for card in all_collectible_cards:
    if 'type' in card:
        types.add(card['type'])
types

{u'Minion', u'Spell', u'Weapon'}

# The model
Card analysis will be done based on the following base model equation:
$$cost + intrinsic = attack * attack\_cost + health * health\_cost$$
The *intrinsic* value represents the cost of having *that* card in your deck and can also be viewed as the *slot_cost*.

## Modelling the cards
With the previous model, let's create a matrix with all the information to work with.

In [302]:
all_collectible_cards_df = pandas.DataFrame(all_collectible_cards)
all_collectible_cards_df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 535 entries, 0 to 534
Data columns (total 10 columns):
attack         357 non-null float64
cost           535 non-null int64
durability     18 non-null float64
health         339 non-null float64
id             535 non-null object
mechanics      279 non-null object
name           535 non-null object
playerClass    306 non-null object
text           517 non-null object
type           535 non-null object
dtypes: float64(3), int64(1), object(6)
memory usage: 33.4+ KB


Let's add the intrinsic value to the matrix. Since it sits on the left side of the equation, its column is set to -1 so that the term moves to the right side for the later calculation.

In [303]:
all_collectible_cards_df['intrinsic'] = -1
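
Written out, the constant -1 column turns the model into an ordinary linear system. The rearrangement below is a sketch that reuses the symbols of the model equation; the per-card index $i$ and the matrix layout are notation added here for illustration.

$$cost_i = attack_i * attack\_cost + health_i * health\_cost + (-1) * intrinsic$$

$$\begin{pmatrix} attack_1 & health_1 & -1 \\ \vdots & \vdots & \vdots \\ attack_n & health_n & -1 \end{pmatrix} \begin{pmatrix} attack\_cost \\ health\_cost \\ intrinsic \end{pmatrix} \approx \begin{pmatrix} cost_1 \\ \vdots \\ cost_n \end{pmatrix}$$

With more cards than unknowns the system generally has no exact solution, so it is solved in the least-squares sense, and the third fitted coefficient is the *intrinsic* value itself.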

## Vanilla minions modelling
To test the model, let's extract the coefficients for *attack* and *health* using only the minions with no text (vanilla minions). Only neutral minions are considered, because class cards are usually better than average in order to make a difference, and we want to model the core of the game.

In [304]:
vanilla_minions_df = pandas.DataFrame(
    all_collectible_cards_df[(all_collectible_cards_df['type'] == 'Minion') &
                             (all_collectible_cards_df['playerClass'].isnull()) &
                             (all_collectible_cards_df['text'].isnull())])
vanilla_minions_df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 14 entries, 238 to 517
Data columns (total 11 columns):
attack         14 non-null float64
cost           14 non-null int64
durability     0 non-null float64
health         14 non-null float64
id             14 non-null object
mechanics      0 non-null object
name           14 non-null object
playerClass    0 non-null object
text           0 non-null object
type           14 non-null object
intrinsic      14 non-null int64
dtypes: float64(3), int64(2), object(6)
memory usage: 1008.0+ bytes


In [305]:
def cards_pricing(df, columns, coeffs=None, print_coeffs=False, price_column=None):
    a = df.as_matrix(columns)
    if coeffs is None:
        b = df.as_matrix(['cost'])
        coeffs = numpy.linalg.lstsq(a, b)[0]
        if print_coeffs:
            print(pandas.DataFrame(coeffs.T, columns=columns))
    df[price_column or 'price'] = numpy.dot(a, coeffs).T[0]
    return coeffs

vanilla_columns = ['attack', 'health', 'intrinsic']
vanilla_coeffs = cards_pricing(vanilla_minions_df, vanilla_columns, print_coeffs=True)

     attack    health  intrinsic
0  0.583559  0.492024   0.750517
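
`DataFrame.as_matrix` was removed in pandas 1.0, so the cell above only runs on older versions. As a rough sketch, an equivalent helper for newer environments could look like the following, assuming pandas 0.24 or later (for `to_numpy`) and NumPy 1.14 or later (for `rcond=None`). The name `price_cards` and the toy data are hypothetical and are not part of the original notebook.

```python
import numpy
import pandas


def price_cards(df, columns, coeffs=None, price_column='price'):
    """Fit (if needed) and apply linear card prices; hypothetical helper."""
    a = df[columns].to_numpy(dtype=float)
    if coeffs is None:
        b = df[['cost']].to_numpy(dtype=float)
        # Least-squares solution of a . coeffs ~= b
        coeffs = numpy.linalg.lstsq(a, b, rcond=None)[0]
    # Price every row with the (fitted or supplied) coefficients
    df[price_column] = numpy.dot(a, coeffs)[:, 0]
    return coeffs


# Toy minions (not real cards) built so that cost = 0.5*attack + 0.5*health - 1
toy_df = pandas.DataFrame({'attack': [1, 2, 3, 4], 'health': [1, 3, 2, 5]})
toy_df['cost'] = 0.5 * toy_df['attack'] + 0.5 * toy_df['health'] - 1
toy_df['intrinsic'] = -1
toy_coeffs = price_cards(toy_df, ['attack', 'health', 'intrinsic'])
print(toy_coeffs.ravel())
```

Because the toy costs follow the model exactly, the fit should recover the three coefficients used to build them, which makes it a convenient sanity check before running on real cards.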


In [306]:
intrinsic = vanilla_coeffs[2][0]
vanilla_minions_df['ratio'] = (vanilla_minions_df['price'] -  vanilla_minions_df['cost']) / \
                              (vanilla_minions_df['cost'] + intrinsic)
vanilla_minions_df[['name', 'cost', 'price', 'ratio']].sort('ratio', ascending=False)

| | name | cost | price | ratio |
|---|---|---|---|---|
| 238 | Wisp | 0 | 0.325066 | 0.433122 |
| 502 | Salty Dog | 5 | 5.30249 | 0.052602 |
| 293 | Boulderfist Ogre | 6 | 6.195004 | 0.028887 |
| 474 | Lost Tallstrider | 4 | 4.135373 | 0.028496 |
| 295 | Chillwind Yeti | 4 | 4.043838 | 0.009228 |
| 299 | Core Hound | 7 | 6.961632 | -0.00495 |
| 289 | Bloodfen Raptor | 2 | 1.984207 | -0.005742 |
| 497 | Puddlestomper | 2 | 1.984207 | -0.005742 |
| 517 | Spider Tank | 3 | 2.968255 | -0.008464 |
| 400 | War Golem | 7 | 6.778562 | -0.028571 |


These results will look wrong to anyone with some experience, because *Wisp* is ranked first by a wide margin over the second card and *River Crocolisk* is on the low end. But this was only an example of how the model works. More complex examples follow below.
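
To see where the *Wisp* figures come from, the printed coefficients can be applied by hand. The snippet below is only a check of the arithmetic: it uses the rounded coefficients from the output above and Wisp's stats (0 cost, 1 attack, 1 health), so the last digit can differ slightly from the table.

```python
# Coefficients printed by the vanilla fit (rounded to six decimals)
attack_cost, health_cost, intrinsic = 0.583559, 0.492024, 0.750517

# Wisp: a 0-cost minion with 1 attack and 1 health
wisp_cost, wisp_attack, wisp_health = 0, 1, 1

# price = attack * attack_cost + health * health_cost - intrinsic
wisp_price = wisp_attack * attack_cost + wisp_health * health_cost - intrinsic
# ratio = (price - cost) / (cost + intrinsic), as in the cell above
wisp_ratio = (wisp_price - wisp_cost) / (wisp_cost + intrinsic)
print(round(wisp_price, 6), round(wisp_ratio, 6))
```

The large ratio is mostly an effect of the denominator: with a cost of 0, the small price surplus is divided by the intrinsic value alone.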

## Adding simple mechanics
To enrich the model, let's add simple mechanics to a minion-only matrix.

### Breaking the cards text
Card text must be broken into pieces so that only the cards with simple mechanics can be selected.

In [307]:
simple_mechanics = set(['Charge', 'Stealth', 'Windfury', 'Taunt', 'Divine Shield'])

html_tag_pat = re.compile(r'</?[^>]+>')
def text_mechanics(text):
    # Remove presentation characters
    raw_text = html_tag_pat.sub('', text).replace('\n', ' ')
    raw_text = re.sub(r'[ ]+', ' ', raw_text)
    # Removal of notes (between '(' and ')') is currently disabled
    replaced_text = raw_text  # re.sub(r'\([^\)]+\)', '', raw_text)
    # Remove the simple mechanics from the text (they will appear in the card text as well)
    for simple_mechanic in simple_mechanics:
        replaced_text = replaced_text.replace(simple_mechanic, '')
    replaced_text = replaced_text.replace('."', '"')
    replaced_text = replaced_text.strip(' .,')
    # if replaced_text:
    #     mechanics = map(lambda x: x.strip(), replaced_text.split('.'))
    return replaced_text
    # return set(map(lambda x: x.strip(), (x for x in replaced_text.split('.') if x)))

all_collectible_cards_df['text_mechanics'] = all_collectible_cards_df.apply(
    lambda row: (text_mechanics(row['text']) or None)
                if isinstance(row['text'], basestring)
                else None, axis=1)

mechanics_minions_df = pandas.DataFrame(
    all_collectible_cards_df[(all_collectible_cards_df['type'] == 'Minion') &
                             (all_collectible_cards_df['playerClass'].isnull()) &
                             (all_collectible_cards_df['text_mechanics'].isnull())])
mechanics_minions_df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 47 entries, 18 to 520
Data columns (total 12 columns):
attack            47 non-null float64
cost              47 non-null int64
durability        0 non-null float64
health            47 non-null float64
id                47 non-null object
mechanics         33 non-null object
name              47 non-null object
playerClass       0 non-null object
text              33 non-null object
type              47 non-null object
intrinsic         47 non-null int64
text_mechanics    0 non-null object
dtypes: float64(3), int64(2), object(7)
memory usage: 3.5+ KB
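
The filtering idea can be tried on its own: a card counts as "simple" when nothing is left of its text after the tags and the simple mechanic keywords are removed. The sketch below restates that logic in a standalone form; `leftover_text` is a hypothetical name, and the sample strings are illustrative inputs written in the style of the JSON `text` field.

```python
import re

SIMPLE_MECHANICS = ('Charge', 'Stealth', 'Windfury', 'Taunt', 'Divine Shield')
HTML_TAG = re.compile(r'</?[^>]+>')


def leftover_text(text):
    """Return what remains of a card text once simple mechanics are removed."""
    # Drop HTML tags and normalise whitespace
    plain = HTML_TAG.sub('', text).replace('\n', ' ')
    plain = re.sub(r'[ ]+', ' ', plain)
    # Remove the simple mechanic keywords themselves
    for mechanic in SIMPLE_MECHANICS:
        plain = plain.replace(mechanic, '')
    plain = plain.replace('."', '"')
    # Trim leftover separators
    return plain.strip(' .,')


# Only simple mechanics: nothing is left, so the card would be kept
print(repr(leftover_text('<b>Charge</b>. <b>Taunt</b>')))
# Any other effect leaves text behind, so the card would be filtered out
print(repr(leftover_text('<b>Battlecry:</b> Deal 1 damage.')))
```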


### Charge
The Charge mechanic is modelled as proportional to the *attack* value.

In [308]:
mechanics_minions_df['charge'] = mechanics_minions_df.apply(
    lambda row: row['attack']
    if (isinstance(row['mechanics'], list) and 'Charge' in row['mechanics'])
    else 0, axis=1)

### Taunt
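The Taunt mechanic should be proportional to the *health* value because the minion must be killed first. In the code below, however, it is encoded as a flat indicator (1 if the minion has Taunt, 0 otherwise).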

In [309]:
mechanics_minions_df['taunt'] = mechanics_minions_df.apply(
    lambda row: 1
    if (isinstance(row['mechanics'], list) and 'Taunt' in row['mechanics'])
    else 0, axis=1)

### Stealth
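The Stealth mechanic should be proportional to the *attack* value because in almost every case the minion will be able to attack. It is also inversely proportional to *health* (the less health, the more important it is). In the code below this weighting is commented out and a flat indicator of 1 is used instead.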

In [310]:
mechanics_minions_df['stealth'] = mechanics_minions_df.apply(
    lambda row: 1 # row['attack'] / row['health']
    if (isinstance(row['mechanics'], list) and 'Stealth' in row['mechanics'])
    else 0, axis=1)

### Windfury
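The Windfury mechanic should be proportional to the *attack* value because it allows the minion to attack twice. In the code below this weighting is commented out and a flat indicator of 1 is used instead.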

In [311]:
mechanics_minions_df['windfury'] = mechanics_minions_df.apply(
    lambda row: 1 # row['attack']
    if (isinstance(row['mechanics'], list) and 'Windfury' in row['mechanics'])
    else 0, axis=1)

### Divine Shield
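The Divine Shield mechanic should be proportional to the *attack* value because in almost every case the minion will be able to attack twice. It is also inversely proportional to *health* (the less health, the more important it is). In the code below this weighting is commented out and a flat indicator of 1 is used instead.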

In [312]:
mechanics_minions_df['divine shield'] = mechanics_minions_df.apply(
    lambda row: 1 # row['attack'] / row['health']
    if (isinstance(row['mechanics'], list) and 'Divine Shield' in row['mechanics'])
    else 0, axis=1)
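
The cells above all repeat the same pattern: a column that is 0 unless the card lists a given mechanic, and otherwise either a flat 1 or one of the card's stats. A possible way to factor this out is sketched below; `mechanic_feature` is a hypothetical helper, and the toy rows are made up for illustration.

```python
import pandas


def mechanic_feature(df, mechanic, weight_column=None):
    """Return weight (or 1) where the card has the mechanic, else 0."""
    # Cards without mechanics hold a non-list value (e.g. NaN or None)
    has_mechanic = df['mechanics'].apply(
        lambda value: isinstance(value, list) and mechanic in value)
    weight = df[weight_column] if weight_column else 1
    return has_mechanic * weight


# Toy rows (not real cards)
toy = pandas.DataFrame({
    'name': ['A', 'B', 'C'],
    'attack': [3.0, 1.0, 2.0],
    'mechanics': [['Charge'], None, ['Taunt', 'Charge']],
})
toy['charge'] = mechanic_feature(toy, 'Charge', 'attack')  # weighted by attack
toy['taunt'] = mechanic_feature(toy, 'Taunt')              # flat indicator
print(toy[['name', 'charge', 'taunt']])
```

Switching a mechanic between the flat and the weighted encoding then only means adding or removing the `weight_column` argument.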

Let's price this set of minions.

In [313]:
mechanics_columns = ['attack', 'health', 'intrinsic'] + map(lambda x: x.lower(), simple_mechanics)

# Price under the "vanilla coeffs"
cards_pricing(mechanics_minions_df, vanilla_columns, coeffs=vanilla_coeffs,
              price_column='vanilla_price')
# Compute and price under the "mechanics coeffs"
mechanics_coeffs = cards_pricing(mechanics_minions_df, mechanics_columns, print_coeffs=True,
                                 price_column='mechanics_price')

     attack    health  intrinsic     taunt    charge   stealth  windfury  \
0  0.635724  0.527588   1.065303  0.589275  0.577848  0.434125  1.329194   

   divine shield  
0       1.180883  


In [314]:
intrinsic = mechanics_coeffs[2][0]
mechanics_minions_df['ratio'] = (mechanics_minions_df['mechanics_price'] - 
                                 mechanics_minions_df['cost']) #/ \
                                #(mechanics_minions_df['cost'] + intrinsic)
mechanics_minions_df[['name', 'cost', 'vanilla_price', 'mechanics_price', 'ratio']].sort(
    'ratio', ascending=False)

| | name | cost | vanilla_price | mechanics_price | ratio |
|---|---|---|---|---|---|
| 195 | Shieldbearer | 1 | 1.217579 | 1.634323 | 0.634323 |
| 520 | Target Dummy | 0 | 0.233531 | 0.579147 | 0.579147 |
| 502 | Salty Dog | 5 | 5.30249 | 5.495119 | 0.495119 |
| 293 | Boulderfist Ogre | 6 | 6.195004 | 6.442158 | 0.442158 |
| 241 | Young Dragonhawk | 1 | 0.325066 | 1.427203 | 0.427203 |
| 414 | Annoy-o-Tron | 2 | 0.81709 | 2.395754 | 0.395754 |
| 299 | Core Hound | 7 | 6.961632 | 7.294155 | 0.294155 |
| 20 | Argent Squire | 1 | 0.325066 | 1.278892 | 0.278892 |
| 450 | Force-Tank MAX | 8 | 6.778562 | 8.258765 | 0.258765 |
| 474 | Lost Tallstrider | 4 | 4.135373 | 4.22367 | 0.22367 |
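
As a hand check of the two price columns, *Shieldbearer* (1 cost, 0 attack, 4 health, Taunt) can be priced directly from the printed coefficients. This is only a sketch of the arithmetic using the rounded values shown above, so the last digit may differ from the table.

```python
# Coefficients printed by the two fits (rounded to six decimals)
vanilla_fit = {'attack': 0.583559, 'health': 0.492024, 'intrinsic': 0.750517}
mechanics_fit = {'attack': 0.635724, 'health': 0.527588,
                 'intrinsic': 1.065303, 'taunt': 0.589275}

# Shieldbearer: 0 attack, 4 health, Taunt encoded as a flat indicator of 1
attack, health, taunt = 0, 4, 1

# The vanilla model ignores Taunt entirely
vanilla_price = (attack * vanilla_fit['attack']
                 + health * vanilla_fit['health']
                 - vanilla_fit['intrinsic'])
# The mechanics model adds a flat Taunt term on top of the stats
mechanics_price = (attack * mechanics_fit['attack']
                   + health * mechanics_fit['health']
                   - mechanics_fit['intrinsic']
                   + taunt * mechanics_fit['taunt'])
print(round(vanilla_price, 6), round(mechanics_price, 6))
```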


# Some statistics

## Breaking the text

In [315]:
html_tag_pat = re.compile(r'</?[^>]+>')
def text_breaker(text):
    # Remove presentation characters
    raw_text = html_tag_pat.sub('', text).replace('\n', ' ').replace('  ', ' ')
    # Remove comments
    uncommented_text = re.sub(r'\([^\)]+\)', '', raw_text)
    # replaced_text = re.sub(r'\.', '_', uncommented_text)
    replaced_text = uncommented_text.replace('.', '_')
    return set(map(lambda x: x.strip(), replaced_text.split('_')))

all_collectible_cards_df['text_mechanics'] = all_collectible_cards_df.apply(
    lambda row: text_breaker(row['text']) if isinstance(row['text'], basestring) else [] , axis=1)

#all_collectible_cards_df[~all_collectible_cards_df['text'].isnull() & all_collectible_cards_df['text'].str.contains('"')]['text'].values
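
The splitting step can be tried in isolation. The sketch below is a variant of `text_breaker`, not the function itself: it splits on `.` directly and, as an added assumption, drops empty fragments so that entries such as `''` do not end up in the result. The name `text_fragments` and the sample string are hypothetical.

```python
import re

HTML_TAG = re.compile(r'</?[^>]+>')
NOTE = re.compile(r'\([^)]*\)')


def text_fragments(text):
    """Split a card text into a set of sentence-like fragments."""
    # Remove presentation tags and line breaks
    plain = HTML_TAG.sub('', text).replace('\n', ' ')
    # Remove parenthesised notes
    plain = NOTE.sub('', plain)
    # Split on full stops and discard empty pieces
    return {part.strip() for part in plain.split('.') if part.strip()}


# Illustrative string in the style of the JSON "text" field
print(text_fragments('<b>Taunt</b>. Draw a card. (Example note.)'))
```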

## Short text mechanics

In [316]:
text_mechanics = set()
for card in all_collectible_cards_df['text_mechanics']:
    text_mechanics.update(card)
[x for x in text_mechanics if len(x) < 14]

[u'',
 u'Draw a card',
 u'Draw 4 cards',
 u'Charge',
 u"Can't Attack",
 u'Stealth',
 u'Draw 3 cards',
 u'Gain 5 Armor',
 u'Windfury',
 u'Horribly',
 u'Divine Shield',
 u'Charge Taunt',
 u'Draw 2 cards',
 u'"',
 u'Taunt',
 u'Then, it dies',
 u'Overload:']

## Vanilla test

In [317]:
known_mechanics = set(['Charge', 'Stealth', 'Windfury', 'Taunt', 'Divine Shield'])

vanilla_minions_df = pandas.DataFrame(
    all_collectible_cards_df[(all_collectible_cards_df['type'] == 'Minion')])

vanilla_minions_df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 339 entries, 0 to 534
Data columns (total 12 columns):
attack            339 non-null float64
cost              339 non-null int64
durability        0 non-null float64
health            339 non-null float64
id                339 non-null object
mechanics         233 non-null object
name              339 non-null object
playerClass       110 non-null object
text              325 non-null object
type              339 non-null object
intrinsic         339 non-null int64
text_mechanics    339 non-null object
dtypes: float64(3), int64(2), object(7)
memory usage: 25.2+ KB


In [318]:
# Set/modify the pricing attributes based on mechanics
vanilla_minions_df['charge'] = vanilla_minions_df.apply(
    lambda row: row['attack'] if (isinstance(row['mechanics'], list) and 'Charge' in row['mechanics']) else 0, axis=1)
vanilla_minions_df['stealth'] = vanilla_minions_df.apply(
    lambda row: row['attack'] if (isinstance(row['mechanics'], list) and 'Stealth' in row['mechanics']) else 0, axis=1)
vanilla_minions_df['windfury'] = vanilla_minions_df.apply(
    lambda row: row['attack'] if (isinstance(row['mechanics'], list) and 'Windfury' in row['mechanics']) else 0, axis=1)
vanilla_minions_df['taunt'] = vanilla_minions_df.apply(
    lambda row: row['health'] if (isinstance(row['mechanics'], list) and 'Taunt' in row['mechanics']) else 0, axis=1)
vanilla_minions_df['divine_shield'] = vanilla_minions_df.apply(
    lambda row: row['attack'] if (isinstance(row['mechanics'], list) and 'Divine Shield' in row['mechanics']) else 0, axis=1)

In [319]:
vanilla_minions_df[['name', 'cost', 'attack', 'health', 'charge', 'stealth', 'windfury', 'taunt', 'divine_shield']]

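| | name | cost | attack | health | charge | stealth | windfury | taunt | divine_shield |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Abomination | 5 | 4 | 4 | 0 | 0 | 0 | 4 | 0 |
| 1 | Abusive Sergeant | 1 | 2 | 1 | 0 | 0 | 0 | 0 | 0 |
| 2 | Acolyte of Pain | 3 | 1 | 3 | 0 | 0 | 0 | 0 | 0 |
| 3 | Al'Akir the Windlord | 8 | 3 | 5 | 3 | 0 | 3 | 5 | 3 |
| 4 | Alarm-o-Bot | 3 | 0 | 3 | 0 | 0 | 0 | 0 | 0 |
| 5 | Aldor Peacekeeper | 3 | 3 | 3 | 0 | 0 | 0 | 0 | 0 |
| 6 | Alexstrasza | 9 | 8 | 8 | 0 | 0 | 0 | 0 | 0 |
| 7 | Amani Berserker | 2 | 2 | 3 | 0 | 0 | 0 | 0 | 0 |
| 9 | Ancient Brewmaster | 4 | 5 | 4 | 0 | 0 | 0 | 0 | 0 |
| 10 | Ancient Mage | 4 | 2 | 5 | 0 | 0 | 0 | 0 | 0 |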


In [320]:
a = vanilla_minions_df.as_matrix(['attack', 'health', 'intrinsic', 'charge', 'stealth', 'windfury', 'taunt', 'divine_shield'])
b = vanilla_minions_df.as_matrix(['cost'])
cost_per_point = numpy.linalg.lstsq(a, b)[0]
cost_per_point

array([[ 0.51013597],
       [ 0.53345198],
       [-0.09774491],
       [ 0.15715708],
       [-0.0054474 ],
       [ 0.0519493 ],
       [-0.06106898],
       [ 0.30533601]])

In [321]:
card_value = cost_per_point[2][0]
vanilla_minions_df['value'] = numpy.dot(a, cost_per_point).T[0]
vanilla_minions_df['boost'] = (vanilla_minions_df['value'] + card_value) / \
    (vanilla_minions_df['cost'] + card_value)
vanilla_minions_df[['name', 'cost', 'value', 'boost']].sort('boost', ascending=False)

| | name | cost | value | boost |
|---|---|---|---|---|
| 274 | Zombie Chow | 1 | 2.718373 | 2.904531 |
| 90 | Flame Imp | 1 | 2.695057 | 2.878689 |
| 11 | Ancient Watcher | 2 | 4.805549 | 2.474854 |
| 71 | Dust Devil | 1 | 2.317453 | 2.460178 |
| 532 | Warbot | 1 | 2.208237 | 2.339130 |
| 359 | Northshire Cleric | 1 | 2.208237 | 2.339130 |
| 143 | Mana Wyrm | 1 | 2.208237 | 2.339130 |
| 148 | Millhouse Manastorm | 2 | 4.272097 | 2.194423 |
| 398 | Voidwalker | 1 | 2.025030 | 2.136075 |
| 195 | Shieldbearer | 1 | 1.987277 | 2.094233 |


## 2-cost minions

In [322]:
two_cost_minions_df = all_collectible_cards_df[(all_collectible_cards_df['type'] == 'Minion') &
                                               (all_collectible_cards_df['cost'] == 2)]
print('2-cost minion mean attack: {:.2f}'.format(two_cost_minions_df.attack.mean()))
print('2-cost minion mean health: {:.2f}'.format(two_cost_minions_df.health.mean()))

2-cost minion mean attack: 1.88
2-cost minion mean health: 2.45
