# **Win rate prediction**

In this notebook we use a dataset in addition to the one proposed in the problem description: the one available at https://www.kaggle.com/datasets/gaborfodor/hearthstone-decks. It reports the win rate of decks that have been played in at least 1000 games, which makes it a useful basis for predicting win rate.

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
import sklearn
import warnings
import pickle

from sklearn.preprocessing import LabelEncoder
from sklearn.impute import KNNImputer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import f1_score
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import OneHotEncoder

## Win rate prediction

In this section we fit a random forest to predict the win rate of a deck, which we need in order to compare the quality of the recommended decks.

### Win rate dataset
Next we load the dataset used to estimate the win rate. The `wr` column is multiplied by 100 so that it is expressed as a percentage.

In [2]:
data = pd.read_csv("data/top_hearthstone_decks_20200221.csv")
data["wr"] = data["wr"]*100
data.describe()

Unnamed: 0,dust,wr,games,duration,wins
count,736.0,736.0,736.0,736.0,736.0
mean,10952.608696,54.141848,5912.092391,7.989402,3285.07894
std,5012.298316,3.547108,17250.58778,1.649571,9865.974255
min,1300.0,37.1,1000.0,4.9,459.8
25%,6695.0,52.9,1400.0,6.7,739.375
50%,10660.0,55.0,2400.0,8.05,1297.75
75%,14935.0,56.6,4625.0,9.0,2527.65
max,21280.0,60.2,350000.0,13.5,201250.0


In [3]:
data.head()

Unnamed: 0,type,dust,wr,games,duration,card_0,card_1,card_2,card_3,card_4,...,card_22,card_23,card_24,card_25,card_26,card_27,card_28,card_29,wins,hero
0,Resurrect Priest,9840,60.2,1100,11.0,Forbidden Words,Penance,Bad Luck Albatross,Breath of the Infinite,Grave Rune,...,,,,,,,,,662.2,Priest
1,Dragon Hunter,5280,59.8,43000,5.6,Blazing Battlemage,Dwarven Sharpshooter,Tracking,Corrosive Breath,Faerie Dragon,...,,,,,,,,,25714.0,Hunter
2,Highlander Hunter,15600,59.3,1800,7.2,Blazing Battlemage,Crystallizer,Dwarven Sharpshooter,Springpaw,Tracking,...,Dragonmaw Poacher,Houndmaster Shaw,Faceless Corruptor,Zilliax,Veranus,Dinotamer Brann,Siamat,Dragonqueen Alexstrasza,1067.4,Hunter
3,Dragon Hunter,5340,59.1,3000,5.8,Blazing Battlemage,Dwarven Sharpshooter,Tracking,Corrosive Breath,Faerie Dragon,...,,,,,,,,,1773.0,Hunter
4,Mech Paladin,6240,58.9,1100,5.5,Blessing of Wisdom,Crystology,Glow-Tron,Hot Air Balloon,Mecharoo,...,,,,,,,,,647.9,Paladin


### Original project dataset

In [3]:
url = 'data/data.csv'
data_for_rec = pd.read_csv(url)
data_for_rec.head()

Unnamed: 0,craft_cost,date,deck_archetype,deck_class,deck_format,deck_id,deck_set,deck_type,rating,title,...,card_20,card_21,card_22,card_23,card_24,card_25,card_26,card_27,card_28,card_29
0,9740,2016-02-19,Unknown,Priest,W,433004,Explorers,Tavern Brawl,1,Reno Priest,...,374,2280,2511,2555,2566,2582,2683,2736,2568,2883
1,9840,2016-02-19,Unknown,Warrior,W,433003,Explorers,Ranked Deck,1,RoosterWarrior,...,1781,1781,2021,2021,2064,2064,2078,2510,2729,2736
2,2600,2016-02-19,Unknown,Mage,W,433002,Explorers,Theorycraft,1,Annoying,...,1793,1801,1801,2037,2037,2064,2064,2078,38710,38710
3,15600,2016-02-19,Unknown,Warrior,W,433001,Explorers,,0,Standart pay to win warrior,...,1657,1721,2018,2296,2262,336,2729,2729,2736,2760
4,7700,2016-02-19,Unknown,Paladin,W,432997,Explorers,Ranked Deck,1,Palamix,...,2027,2029,2029,2064,2078,374,2717,2717,2889,2889


In [5]:
data_for_rec[data_for_rec["rating"]>=10]

Unnamed: 0,craft_cost,date,deck_archetype,deck_class,deck_format,deck_id,deck_set,deck_type,rating,title,...,card_20,card_21,card_22,card_23,card_24,card_25,card_26,card_27,card_28,card_29
121,4320,2016-02-19,Unknown,Druid,W,432795,Explorers,Ranked Deck,20,Cereza's Midrange Druid - European Winter Prel...,...,1784,1914,2064,2064,2078,2262,2792,2792,38319,38319
137,8120,2016-02-19,Miracle Rogue,Rogue,S,432773,Explorers,Ranked Deck,303,Standard Miracle Rogue,...,1117,1158,1158,268,268,1651,2884,2884,38403,38578
145,2640,2016-02-19,Unknown,Druid,W,432750,Explorers,Ranked Deck,42,F2P Mech Druid To LEGEND w/ gameplay & guide,...,2053,2053,2064,2064,2070,2070,2782,2782,2792,2792
261,4200,2016-02-18,Unknown,Warlock,W,432518,Explorers,Ranked Deck,118,[S23+24 Legend EU] Cycle Zoo,...,2013,2078,2093,2093,2288,2288,2724,2895,2895,2949
262,1100,2016-02-18,Unknown,Hunter,W,432517,Explorers,Ranked Deck,71,69.6% winrate hybrid hunter,...,1783,1783,2011,2011,2064,2064,2260,2260,2490,2641
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
346138,6060,2015-04-01,Unknown,Mage,W,218088,Undertaker Nerf,Ranked Deck,35,Echo of Medivh HEALER mage INSANE FUN AND EFFE...,...,1941,2037,2037,2042,2044,2044,2057,2057,2078,2262
346143,13680,2014-05-03,Unknown,Druid,W,50395,Live Patch 5170,,82,Team Five,...,2279,2500,2509,2511,2521,2683,2910,2925,38526,38669
346147,8240,2014-05-03,Unknown,Mage,S,50427,Live Patch 5170,Ranked Deck,20,The new Mage !,...,790,790,825,825,906,1004,1004,1080,1087,374
346203,6640,2016-12-07,Pirate Rogue,Rogue,S,697067,Gadgetzan,Ranked Deck,35,Rufeng's #1 China Legend Pirate Rogue (27-1),...,2490,2490,2715,2715,2767,39698,39698,40465,40608,40608


In [4]:
import json
url = 'data/refs.json'
with open(url, 'r', encoding='utf-8') as file:
    data_dict = json.load(file)
# number of card entries
print(len(data_dict))
data_dict[5]

3117


{'cardClass': 'NEUTRAL',
 'cost': 2,
 'dbfId': 19292,
 'id': 'LOEA15_2',
 'name': 'Unstable Portal',
 'playerClass': 'NEUTRAL',
 'set': 'LOE',
 'text': '<b>Hero Power</b>\nAdd a random minion to your hand. It costs (3) less.',
 'type': 'HERO_POWER'}

We now combine the card names available in both datasets so that each card can be mapped to an integer index:

In [6]:
# Distinct card names in the reference file, skipping placeholder entries
data_dict_total_cards = {card['name'] for card in data_dict if card['id'] != 'PlaceholderCard'}
print(len(data_dict_total_cards))

2460


In [7]:
# Distinct values found across the 30 card slots of the win-rate dataset
total_cards2 = set()
for i in range(30):
  total_cards2 |= set(data[f'card_{i}'].unique())
print('totalcards2 = ', len(total_cards2))

totalcards2 =  672


In [8]:
# Note: a set has no stable order, so the resulting indices can change between sessions
total_cards = list(data_dict_total_cards.union(total_cards2))
print(len(total_cards))

2952


In [9]:
# Replace each card name with its integer index; empty slots become -1
name_to_position = {name: idx for idx, name in enumerate(total_cards)}
for i in range(30):
  data[f'card_{i}'] = data[f'card_{i}'].map(name_to_position)
  data[f'card_{i}'] = data[f'card_{i}'].fillna(-1).astype(int)
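
Because `total_cards` is built from a Python set of strings, the index assigned to each card can differ from one interpreter session to the next. This is consistent with the same decks showing different indices in the two `print(final_df)` outputs of this notebook. The following is a minimal sketch of a reproducible alternative. It assumes that missing cards appear as non-string values such as NaN; the helper name `build_name_to_position` and the toy inputs are hypothetical.

```python
import json


def build_name_to_position(*name_collections):
    """Map each card name to a stable index by sorting the names first."""
    names = set()
    for collection in name_collections:
        for name in collection:
            # Keep real card names only; skip NaN or None placeholders.
            if isinstance(name, str):
                names.add(name)
    return {name: idx for idx, name in enumerate(sorted(names))}


# Toy inputs standing in for data_dict_total_cards and total_cards2.
reference_names = {"Zilliax", "Tracking", "Penance"}
deck_names = ["Tracking", "Siamat", float("nan")]

name_to_position = build_name_to_position(reference_names, deck_names)
print(name_to_position)

# Persisting the mapping lets a saved model be reused in a later session.
serialized = json.dumps(name_to_position)
restored = json.loads(serialized)
```

Storing the mapping next to the pickled model would keep the card indices seen at prediction time identical to those used for training.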

In [None]:
# Features: dust cost plus the 30 encoded card slots; target: win rate in percent
X_num = data.drop(columns=["wr","games","duration","type","wins","hero"])
X_cat = pd.DataFrame(data["hero"])
Y = data["wr"]

# One-hot encode the hero class and append it to the numeric features
encoder = OneHotEncoder(sparse_output=False, drop=None)
encoded_categorical = encoder.fit_transform(X_cat)
encoded_df = pd.DataFrame(encoded_categorical, columns=encoder.get_feature_names_out(X_cat.columns))
final_df = pd.concat([X_num, encoded_df], axis=1)
print(final_df)

# Hold out 30% of the decks for testing and fit the random forest on the rest
X_train, X_test, Y_train, Y_test = train_test_split(final_df, Y, test_size = 0.3, random_state = 42)
rf = RandomForestRegressor(n_estimators = 150, max_features = 'sqrt', max_depth = 40, random_state = 18).fit(X_train, Y_train)

      dust  card_0  card_1  card_2  card_3  card_4  card_5  card_6  card_7  \
0     9840    2422    1885    1555    1199       3     562    1998     799   
1     5280    1715      82     447     944    1941    2384     279    1164   
2    15600    1715    2907      82    1399     447    1941     279    2439   
3     5340    1715      82     447     944    1941    2384     279    1164   
4     6240    2288    2307    1470    1171    1487    2609    1834    2147   
..     ...     ...     ...     ...     ...     ...     ...     ...     ...   
731   9800    1122    2608     209    1427    2008     245    1199    2688   
732  17300    1485    1527     908    2917    2874    1300    2343    1229   
733  15200    2943    1836    1601    2818    2428    1271    1765    1035   
734   6440    1485    1527     908    2917     331    1682    2343    1229   
735   9640    2874    2410    1639    1229      48    2781    1982    1743   

     card_8  ...  card_29  hero_Druid  hero_Hunter  hero_Mage  
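
With only 736 decks, the error measured on a single 70/30 split can vary noticeably with the choice of `random_state`. The following is a minimal sketch of k-fold cross-validation with `cross_val_score`, which is imported at the top of the notebook but not used. It runs on synthetic data (an assumption made so the block is self-contained) and reuses the hyperparameters of the cell above; in the notebook, `final_df` and `Y` would take the place of `X_demo` and `y_demo`.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
# Synthetic stand-in for final_df / Y: 200 rows, 40 features (assumption).
X_demo = rng.normal(size=(200, 40))
y_demo = 54 + 2 * X_demo[:, 0] + rng.normal(scale=1.0, size=200)

rf_demo = RandomForestRegressor(n_estimators=150, max_features='sqrt',
                                max_depth=40, random_state=18)

# scikit-learn reports negated RMSE so that larger is better; flip the sign.
scores = cross_val_score(rf_demo, X_demo, y_demo, cv=5,
                         scoring='neg_root_mean_squared_error')
rmse_per_fold = -scores
print(rmse_per_fold.mean(), rmse_per_fold.std())
```

The spread across folds gives a rough idea of how much the single-split RMSE could move with a different split.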

In [12]:
with open("encoder", "wb") as f:
    pickle.dump(encoder, f)

In [13]:
X_train.shape

(515, 40)

In [14]:
prediction = rf.predict(X_test)

In [15]:
from sklearn.metrics import mean_squared_error, r2_score
mse = mean_squared_error(Y_test, prediction)
rmse = mse**.5
print(mse)
print(rmse)

6.717061173613324
2.5917293789308564
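
An RMSE of about 2.59 percentage points is easier to interpret next to a naive baseline. The `describe()` output above reports a standard deviation of about 3.55 for `wr` over all decks, so a model that always predicts the mean win rate would be expected to have an RMSE of roughly that size. The following is a minimal sketch of computing that baseline on a given split. The helper name `baseline_rmse` and the toy values are hypothetical; in the notebook, `Y_train` and `Y_test` would be passed instead.

```python
import numpy as np


def baseline_rmse(y_train, y_test):
    """RMSE obtained by always predicting the training-set mean."""
    y_train = np.asarray(y_train, dtype=float)
    y_test = np.asarray(y_test, dtype=float)
    constant_prediction = y_train.mean()
    return float(np.sqrt(np.mean((y_test - constant_prediction) ** 2)))


# Toy win rates in percent, standing in for Y_train and Y_test.
y_train_demo = [52.0, 54.0, 56.0]
y_test_demo = [51.0, 57.0]
print(baseline_rmse(y_train_demo, y_test_demo))  # prints 3.0
```

If the model's RMSE is close to this baseline, the features add little beyond the average win rate.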


In [16]:
import pickle

# Save the model
with open("win_rate_RF.pkl", "wb") as f:
    pickle.dump(rf, f)

### Requesting a prediction for an arbitrary deck

In [None]:
# Predict the win rate of a single deck (row 125 of the project dataset)
drop_cols = ['date', 'deck_archetype', 'deck_class', 'deck_format',
             'deck_id', 'deck_set', 'deck_type', 'rating', 'title', 'user']
mazo_gen_num = data_for_rec.iloc[[125]].drop(columns=drop_cols).reset_index(drop=True)
# Name the class column "hero", as it was called when the encoder was fitted
mazo_gen_cat = data_for_rec.iloc[[125]][["deck_class"]].rename(columns={"deck_class": "hero"})
encoded_categorical = encoder.transform(mazo_gen_cat)
encoded_df = pd.DataFrame(encoded_categorical, columns=encoder.get_feature_names_out())
final_df = pd.concat([mazo_gen_num, encoded_df], axis=1).rename(columns={"craft_cost": "dust"})
rf.predict(final_df)



array([51.81033333])
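
The card columns of `data_for_rec` hold integers such as 374 or 38710, whereas the model was trained on positions taken from `name_to_position`, so the two encodings probably do not refer to the same cards. The following is a minimal sketch of one way to align them. It assumes those integers are the `dbfId` values found in `refs.json`, which the source does not confirm; the helper name `dbf_to_position` and the toy data are hypothetical.

```python
def dbf_to_position(card_refs, name_to_position, missing=-1):
    """Translate dbfId values into the indices used to train the model."""
    dbf_to_name = {card["dbfId"]: card["name"]
                   for card in card_refs
                   if "dbfId" in card and "name" in card}
    return {dbf_id: name_to_position.get(name, missing)
            for dbf_id, name in dbf_to_name.items()}


# Toy stand-ins for data_dict and name_to_position.
card_refs_demo = [
    {"dbfId": 19292, "name": "Unstable Portal"},
    {"dbfId": 374, "name": "Hypothetical Card"},  # made-up name for illustration
]
name_to_position_demo = {"Unstable Portal": 0}

lookup = dbf_to_position(card_refs_demo, name_to_position_demo)
deck_dbf_ids = [19292, 374, 99999]
# Ids that are unknown to the lookup fall back to -1, like empty card slots.
deck_positions = [lookup.get(dbf_id, -1) for dbf_id in deck_dbf_ids]
print(deck_positions)  # prints [0, -1, -1]
```

With pandas, the same lookup could be applied per column, for example `data_for_rec[f'card_{i}'].map(lookup).fillna(-1).astype(int)`, mirroring the `-1` fill used for the win-rate dataset.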

In [None]:
# Predict the win rate of every deck in the project dataset
mazo_gen_num = data_for_rec.drop(columns=['date', 'deck_archetype', 'deck_class', 'deck_format',
       'deck_id', 'deck_set', 'deck_type', 'rating', 'title', 'user']).reset_index()
mazo_gen_cat = pd.DataFrame(data_for_rec["deck_class"]).rename(columns={"deck_class":"hero"})
encoded_categorical = encoder.transform(mazo_gen_cat)
encoded_df = pd.DataFrame(encoded_categorical, columns=encoder.get_feature_names_out())
final_df = pd.concat([mazo_gen_num, encoded_df], axis=1).drop(columns=["index"]).rename(columns={"craft_cost":"dust"})
results = rf.predict(final_df)

In [19]:
print(np.mean(results),np.max(results),np.min(results))

51.850054636174754 56.049999999999976 47.37666666666662


## Neural network (NN)

In [10]:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# Rebuild the feature matrix: dust cost, 30 encoded card slots, one-hot hero class
X_num = data.drop(columns=["wr","games","duration","type","wins","hero"])
X_cat = pd.DataFrame(data["hero"])
Y = data["wr"]

encoder = OneHotEncoder(sparse_output=False, drop=None)
encoded_categorical = encoder.fit_transform(X_cat)
encoded_df = pd.DataFrame(encoded_categorical, columns=encoder.get_feature_names_out(X_cat.columns))
final_df = pd.concat([X_num, encoded_df], axis=1)
print(final_df)

X_train, X_test, y_train, y_test = train_test_split(final_df, Y, test_size=0.2, random_state=42)
# Fully connected regressor with Gaussian dropout after the first two hidden layers
model = keras.Sequential([
    layers.Input(shape=(X_train.shape[1],)),  # one input per feature column
    layers.Dense(32, activation='relu'),
    layers.GaussianDropout(0.6),
    layers.Dense(50, activation='sigmoid'),
    layers.GaussianDropout(0.6),
    layers.Dense(32, activation='relu'),
    layers.Dense(1)
])
model.compile(optimizer='adam', loss='mean_squared_error')
# The test split is also passed as validation data to monitor the loss during training
model.fit(X_train, y_train, epochs=100, batch_size=32, validation_data=(X_test, y_test))
test_loss = model.evaluate(X_test, y_test)
print(f"Test Loss: {test_loss:.4f}")

      dust  card_0  card_1  card_2  card_3  card_4  card_5  card_6  card_7  \
0     9840     820     322    1638    2395     720     401    2770     701   
1     5280    2032    1898     916    2539    1145     607    2376    1640   
2    15600    2032    2079    1898    1304     916    1145    2376    2073   
3     5340    2032    1898     916    2539    1145     607    2376    1640   
4     6240     597    2175     291     238    1730     142     631     519   
..     ...     ...     ...     ...     ...     ...     ...     ...     ...   
731   9800     594    2614    1954    1413    2506    1274    2395    1864   
732  17300    1291     601    1559    1193    1182     850    2921    1325   
733  15200    1717    1296    1087     954    2578    2627    1406    2779   
734   6440    1291     601    1559    1193    1551    1199    2921    1325   
735   9640    1182    2160    1151    1325    1417    1496    1825    1350   

     card_8  ...  card_29  hero_Druid  hero_Hunter  hero_Mage  
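
The network above receives raw feature values: `dust` ranges from 1300 to 21280 and the card indices run up to roughly 2950, while the one-hot hero columns are 0 or 1. `StandardScaler` is imported at the top of the notebook but never applied. The following is a minimal sketch of standardizing the features before training, shown on synthetic data (an assumption made so the block is self-contained). Whether this improves the model here is untested, and scaling only changes the magnitude of the card indices; they remain arbitrary labels.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Synthetic stand-in: column 0 mimics `dust`, column 1 mimics a card index.
X_train_demo = np.column_stack([rng.uniform(1300, 21280, 100),
                                rng.integers(0, 2952, 100)]).astype(float)
X_test_demo = np.column_stack([rng.uniform(1300, 21280, 20),
                               rng.integers(0, 2952, 20)]).astype(float)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train_demo)  # learn mean/std on train only
X_test_scaled = scaler.transform(X_test_demo)        # reuse the train statistics
print(X_train_scaled.mean(axis=0), X_train_scaled.std(axis=0))
```

The fitted `scaler` would need to be saved alongside the model and applied to any deck before calling `model.predict`.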

In [None]:
# Predict the win rate of every deck in the project dataset with the neural network
mazo_gen_num = data_for_rec.drop(columns=['date', 'deck_archetype', 'deck_class', 'deck_format',
       'deck_id', 'deck_set', 'deck_type', 'rating', 'title', 'user']).reset_index()
mazo_gen_cat = pd.DataFrame(data_for_rec["deck_class"]).rename(columns={"deck_class":"hero"})
encoded_categorical = encoder.transform(mazo_gen_cat)
encoded_df = pd.DataFrame(encoded_categorical, columns=encoder.get_feature_names_out())
final_df = pd.concat([mazo_gen_num, encoded_df], axis=1).drop(columns=["index"]).rename(columns={"craft_cost":"dust"})
results = model.predict(final_df)

[1m10820/10820[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m12s[0m 1ms/step


In [22]:
print(np.mean(results),np.max(results),np.min(results))

54.19741 60.23579 45.583202


In [None]:
model.save("win_rate_NN.keras")