### Predicting Pokemon stats with ML

In this notebook we train an XGBoost model to predict one of the Pokemon stats given the other stats, the Pokemon type, and whether the Pokemon is Legendary.

In [1]:
import pandas as pd
import numpy as np
from xgboost import XGBRegressor
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.metrics import mean_squared_error

In [2]:
data = pd.read_csv('../../data/poke_data/Pokemon.csv')
data.head()

Unnamed: 0,#,Name,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary
0,1,Bulbasaur,Grass,Poison,318,45,49,49,65,65,45,1,False
1,2,Ivysaur,Grass,Poison,405,60,62,63,80,80,60,1,False
2,3,Venusaur,Grass,Poison,525,80,82,83,100,100,80,1,False
3,3,VenusaurMega Venusaur,Grass,Poison,625,80,100,123,122,120,80,1,False
4,4,Charmander,Fire,,309,39,52,43,60,50,65,1,False


In [3]:
# First drop the number, Name, Total, and Generation columns
data = data.drop(columns=['#', 'Name', 'Total', 'Generation'])
data.head()

Unnamed: 0,Type 1,Type 2,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Legendary
0,Grass,Poison,45,49,49,65,65,45,False
1,Grass,Poison,60,62,63,80,80,60,False
2,Grass,Poison,80,82,83,100,100,80,False
3,Grass,Poison,80,100,123,122,120,80,False
4,Fire,,39,52,43,60,50,65,False


In [4]:
# Map Legendary to 1 and Non-Legendary to 0
data['Legendary'] = data['Legendary'].map({False: 0, True: 1})

# One-hot encode the Type 1 and Type 2 columns
# Fill NaN values in the Type columns only (single-type Pokemon have no Type 2)
data[['Type 1', 'Type 2']] = data[['Type 1', 'Type 2']].fillna('None')
data = pd.get_dummies(data, columns=['Type 1', 'Type 2'], drop_first=False)
data.head()

Unnamed: 0,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Legendary,Type 1_Bug,Type 1_Dark,Type 1_Dragon,...,Type 2_Grass,Type 2_Ground,Type 2_Ice,Type 2_None,Type 2_Normal,Type 2_Poison,Type 2_Psychic,Type 2_Rock,Type 2_Steel,Type 2_Water
0,45,49,49,65,65,45,0,False,False,False,...,False,False,False,False,False,True,False,False,False,False
1,60,62,63,80,80,60,0,False,False,False,...,False,False,False,False,False,True,False,False,False,False
2,80,82,83,100,100,80,0,False,False,False,...,False,False,False,False,False,True,False,False,False,False
3,80,100,123,122,120,80,0,False,False,False,...,False,False,False,False,False,True,False,False,False,False
4,39,52,43,60,50,65,0,False,False,False,...,False,False,False,True,False,False,False,False,False,False


To start, we predict the Attack stat.

In [11]:
# Split the data into features and target variable
X = data.drop(columns=['Attack'])
y = data['Attack']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

In [12]:
# Define and train a test model
model = XGBRegressor(n_estimators=1000, learning_rate=0.01, random_state=42)
model.fit(X_train, y_train)

# Get the RMSE
y_pred = model.predict(X_test)
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
print(f'RMSE: {rmse:.2f}')
# An RMSE of about 24 is not that bad, given that Attack varies widely, from about 5 to 190

RMSE: 24.08
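
One way to put an RMSE like this into context is to compare it with a trivial baseline that always predicts the mean of the training targets. The sketch below uses a hypothetical helper, `baseline_rmse`, which is not part of the original notebook, and runs it on made-up numbers rather than the Pokemon data.

```python
import numpy as np


def baseline_rmse(y_train, y_test):
    """RMSE of a constant model that always predicts the training mean."""
    y_train = np.asarray(y_train, dtype=float)
    y_test = np.asarray(y_test, dtype=float)
    constant_prediction = y_train.mean()
    return float(np.sqrt(np.mean((y_test - constant_prediction) ** 2)))


# Made-up values for illustration only (not taken from the dataset)
toy_train = [40, 60, 80, 100]
toy_test = [50, 90]
print(f'Baseline RMSE: {baseline_rmse(toy_train, toy_test):.2f}')
```

In the notebook this would be called as `baseline_rmse(y_train, y_test)`. A model RMSE well below the baseline would suggest that the other stats carry real information about Attack.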


In [13]:
# Do a grid search to find the best hyperparameters
param_grid = {
    'n_estimators': [100, 500, 1000, 2000],
    'learning_rate': [0.005, 0.01, 0.05, 0.1]
}

model = XGBRegressor(random_state=42)
grid_search = GridSearchCV(estimator=model, param_grid=param_grid, scoring='neg_mean_squared_error', cv=5, verbose=1)
grid_search.fit(X_train, y_train)
# Get the best parameters and the best score
best_params = grid_search.best_params_
best_score = np.sqrt(-grid_search.best_score_)
print(f'Best parameters: {best_params}')
print(f'Best RMSE: {best_score:.2f}')

Fitting 5 folds for each of 16 candidates, totalling 80 fits
Best parameters: {'learning_rate': 0.1, 'n_estimators': 100}
Best RMSE: 22.63
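
Note that `np.sqrt(-grid_search.best_score_)` is the square root of the MSE averaged over the folds, which is not the same as the average of the per-fold RMSEs; the former is never the smaller of the two. The per-fold MSE values below are made up purely to illustrate the difference.

```python
import numpy as np

# Hypothetical per-fold MSE values (not from the notebook's grid search)
fold_mse = np.array([400.0, 484.0, 576.0, 529.0, 625.0])

rmse_of_mean_mse = np.sqrt(fold_mse.mean())   # what sqrt(-best_score_) computes
mean_of_fold_rmse = np.sqrt(fold_mse).mean()  # average of the per-fold RMSEs

print(f'sqrt(mean MSE):  {rmse_of_mean_mse:.2f}')
print(f'mean(fold RMSE): {mean_of_fold_rmse:.2f}')
```

scikit-learn also accepts `scoring='neg_root_mean_squared_error'`, which averages the per-fold RMSEs directly. The two choices can rank candidates slightly differently, so this is a design choice rather than a correction.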


In [14]:
best_model = grid_search.best_estimator_
# Evaluate the best model on the test set
y_pred_best = best_model.predict(X_test)
rmse_best = np.sqrt(mean_squared_error(y_test, y_pred_best))
print(f'Best model RMSE on test set: {rmse_best:.2f}')

Best model RMSE on test set: 24.12


The best parameters from the grid search give a slightly worse test-set error than the initial model (24.12 versus 24.08), but the difference is so small that we just keep those parameters. An RMSE of about 23 to 24 is fairly decent, I think, although how good it is depends on the stat: for a Pokemon with a low Attack stat (like 40) it is a relatively large error, while for a high stat (like 120) it is not.

Let's check whether this is an average over errors of different sizes, with smaller errors for Pokemon with low stats and larger errors for Pokemon with high stats.

In [15]:
# Determine median Attack
median_attack = data['Attack'].median()
print(f'Median Attack: {median_attack}')

# Select data from the test sets with Attack below and above that
y_test_below_median = y_test[y_test < median_attack]
y_test_above_median = y_test[y_test >= median_attack]
X_test_below_median = X_test.loc[y_test < median_attack]
X_test_above_median = X_test.loc[y_test >= median_attack]

# Apply the model to both datasets
y_pred_below_median = best_model.predict(X_test_below_median)
y_pred_above_median = best_model.predict(X_test_above_median)
rmse_below_median = np.sqrt(mean_squared_error(y_test_below_median, y_pred_below_median))
rmse_above_median = np.sqrt(mean_squared_error(y_test_above_median, y_pred_above_median))
print(f'RMSE for Attack below median: {rmse_below_median:.2f}')
print(f'RMSE for Attack above median: {rmse_above_median:.2f}')

Median Attack: 75.0
RMSE for Attack below median: 19.99
RMSE for Attack above median: 27.51


So the error does indeed seem to grow with the Attack stat itself. Taking the overall error of about 23 to 24 against the median Attack of 75, the error is about one third of the stat. This is larger than I'd like, so improvements would be nice; we attempt this in another notebook.
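
To make the "error relative to the stat" comparison explicit, the RMSE of each group can be divided by the mean true value in that group. The helper `relative_rmse` below is a hypothetical addition, and the arrays are made-up values rather than notebook output.

```python
import numpy as np


def relative_rmse(y_true, y_pred):
    """RMSE divided by the mean of the true values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
    return float(rmse / y_true.mean())


# Made-up low-stat and high-stat groups for illustration only
low_true, low_pred = [30, 50], [40, 40]
high_true, high_pred = [100, 140], [120, 120]

print(f'Relative RMSE (low):  {relative_rmse(low_true, low_pred):.2f}')
print(f'Relative RMSE (high): {relative_rmse(high_true, high_pred):.2f}')
```

In the notebook this would be called as `relative_rmse(y_test_below_median, y_pred_below_median)`, and likewise for the above-median group.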

For now let's do the same for the other stats as well.

In [9]:
stats = ['HP', 'Attack', 'Defense', 'Sp. Atk', 'Sp. Def', 'Speed']
for stat in stats:
    X = data.drop(columns=[stat])
    y = data[stat]
    
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)
    
    model = XGBRegressor(random_state=42)
    grid_search = GridSearchCV(estimator=model, param_grid=param_grid, scoring='neg_mean_squared_error', cv=5, verbose=1)
    grid_search.fit(X_train, y_train)
    
    best_score = np.sqrt(-grid_search.best_score_)
    print(f'Best RMSE for {stat}: {best_score:.2f}')

Fitting 5 folds for each of 16 candidates, totalling 80 fits
Best RMSE for HP: 22.66
Fitting 5 folds for each of 16 candidates, totalling 80 fits
Best RMSE for Attack: 22.63
Fitting 5 folds for each of 16 candidates, totalling 80 fits
Best RMSE for Defense: 23.12
Fitting 5 folds for each of 16 candidates, totalling 80 fits
Best RMSE for Sp. Atk: 22.35
Fitting 5 folds for each of 16 candidates, totalling 80 fits
Best RMSE for Sp. Def: 20.83
Fitting 5 folds for each of 16 candidates, totalling 80 fits
Best RMSE for Speed: 23.22


All errors are fairly close to each other, ranging from about 21 to 23. So the same conclusions apply to the other stats as to the Attack stat.