# Introduction
In this notebook I'm going to try and predict the type(s) of Pokemon based on other features, such as their ability, weight, height and stats. Originally I'd wanted to do a multi-class, multi-label model, but I was having trouble getting that to work. Instead, I decided to settle for building the Type 1 & Type 2 predictions separately.

I found a dataset on Kaggle to start off with, but soon realised it was missing useful features like Egg group(s), and contained errors due to how the data scraping was done.

I found a much more complete set of data online, which I used to enhance my model:  https://github.com/veekun/pokedex/tree/master/pokedex/data/csv

Some of the problems in the original data were related to Pokemon with multiple forms, like Alolan forms. Since multiple heights & weights were listed, they'd been extracted as NAN, so needed correcting. These Pokemon also had problems related to their typings, with normal & Alolan types mixed up. For simplicity, I reverted everything to its normal form.

Additionally, Genderless Pokemon had NAN values for their male percentage, since it wasn't listed in terms of a male/female ratio. To handle this, I changed the values to 0, and made a new Genderless feature.

Finally, the original dataset did not cover the handful of new Pokemon from Ultra Sun / Ultra Moon, beyond Magearna. For simplicity, I deleted these entries from veekun's data, rather than trying to fill in the missing entries. I might lose a tiny bit of predictive power, but there are still 801 other Pokemon left.

Throughout, I used an XGBoost model to make my predictions. I started by looking at predicting Type 1 on a .9/.1 train/test split, before fitting to the entire Pokedex. Then I tried predicting each generation based on the other 6. Finally, I repeated this for Type 2. I had originally expected that stat distributions would be important features, but instead I found some surprising results.

I hope you all find this interesting! I'm still learning Python, so apologies for instances of bad code practice (I'm sure there are many!), and would be happy to know better or more elegant solutions.

# Preparations
Firstly, I loaded a bunch of packages, in case I wanted to run a variety of different models. Since I just settled on using XGBoost, some of them never got used.

In [1]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load in 

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the "../input/" directory.
# For example, running this (by clicking run or pressing Shift+Enter) will list the files in the input directory

import os
import gc
import time
import warnings

import matplotlib.pyplot as plt
import seaborn as sns
import xgboost as xgb
from xgboost import plot_importance
from sklearn import metrics, tree
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import RidgeClassifierCV,RidgeClassifier,LogisticRegression,LogisticRegressionCV
from sklearn.metrics import accuracy_score
# sklearn.cross_validation and sklearn.grid_search were removed in scikit-learn 0.20;
# train_test_split and GridSearchCV now live in sklearn.model_selection.
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeClassifier, ExtraTreeClassifier

print(os.listdir("../input"))
warnings.filterwarnings("ignore")

# Any results you write to the current directory are saved as output.


# Loading and Modifying Data

I started by loading the data from veekun, and the data that I found on Kaggle. The former was much more thorough, and already contained multiple categorical features encoded as numbers, such as abilities or shape. Both sets required some degree of checking, cleaning and reorganizing. With veekun's data this mostly meant pulling out the relevant information I wanted, and reformatting it. For example, Ability data was split over multiple rows if the Pokemon had more than one ability.

The data I found on Kaggle mostly needed cleaning, because of how they'd scraped information from Serebii (a famous Pokemon website). For example, the way Alolan Pokemon were handled led to errors.

To avoid working with too many features, I decided to leave all categorical data that I used in numerical form, rather than using one-hot encoding. 3 Ability slots with the potential for hundreds of abilities would have made those features too unwieldy.
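To give a feel for why one-hot encoding gets unwieldy, here is a minimal sketch on a made-up four-row table. The column names match the ones used in this notebook, but the ability IDs are illustrative values only.

```python
import pandas as pd

# Made-up ability IDs for illustration; the real data has hundreds of distinct IDs.
toy = pd.DataFrame({
    "Ability1": [65, 66, 67, 65],
    "Ability2": [0, 0, 34, 0],
    "Ability3": [34, 94, 44, 34],
})

# One-hot encoding creates one column per distinct value in each slot.
one_hot = pd.get_dummies(toy, columns=["Ability1", "Ability2", "Ability3"])

print("numeric columns:", toy.shape[1])
print("one-hot columns:", one_hot.shape[1])
```

Even with only a handful of distinct values per slot the column count grows quickly, and it would grow much further on the full Pokedex.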

From veekun's data I decided that Egg groups, Abilities, and miscellaneous information like Colour ID would be useful to include. Egg group for example should help finding Water and Flying Pokemon, whilst many abilities are directly tied to a Pokemon's type. Since veekun's data included Pokemon above Magearna, and in some cases alternative forms or Mega Evolutions, I had to trim off the ends of the data in order to fit with my original set. Where necessary, I also renamed columns to make merging and understanding easier.

veekun's species data had a lot of overlap with the Kaggle user's set, but with some useful extra columns. Namely, color_id, shape_id and habitat_id. For example, lots of Fire Pokemon are Red, so that might be useful in the models. I extracted these and added them into my dataset.

Finally, veekun's data had the abilities encoded numerically, and labelled based on their slot. As a note, all Pokemon can have up to 3 abilities, with the 3rd being a special Hidden ability, usually only available under certain conditions / promotions. As with the egg data, I converted the duplicate rows to additional columns to get Ability 1-3.
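The row-to-column reshaping described above can also be written as a single pivot. The following is a minimal sketch on a tiny hand-made table: the column names (`pokemon_id`, `ability_id`, `slot`) are the ones used from veekun's abilities file in this notebook, while the sample rows and the helper name `abilities_to_columns` are purely illustrative.

```python
import pandas as pd

def abilities_to_columns(abilities: pd.DataFrame) -> pd.DataFrame:
    """Turn one-row-per-ability data into one row per Pokemon with Ability1-3 columns."""
    wide = abilities.pivot(index="pokemon_id", columns="slot", values="ability_id")
    # Make sure all three slots exist, even if the sample never uses one of them.
    wide = wide.reindex(columns=[1, 2, 3])
    wide.columns = ["Ability1", "Ability2", "Ability3"]
    # Empty slots become 0, matching the convention used in this notebook.
    return wide.fillna(0).astype(int).reset_index()

# Illustrative sample rows only.
sample = pd.DataFrame({
    "pokemon_id": [1, 1, 2, 3, 3, 3],
    "ability_id": [65, 34, 65, 10, 20, 30],
    "slot":       [1, 3, 1, 1, 2, 3],
})
wide_abilities = abilities_to_columns(sample)
print(wide_abilities)
```

This assumes each (`pokemon_id`, `slot`) pair appears only once; `pivot` raises an error on duplicates, which doubles as a sanity check on the input.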

Notable problems in the original data are mostly due to certain entries having odd formatting. For example, I had to correct Minior's capture rate, due to two listings. There were similar problems with Alolan Pokemon, or others with multiple forms. If more than one height / weight was listed, the scraping returned NAN, and if the Alolan form had extra types compared to the original versions, this was incorrectly pulled from the pages. I corrected all of these to make the data more useable.

Next, the original data contained lots of features directly indicative of a Pokemon's type, such as their effectiveness against other types, so those had to be dropped.

Finally, I opted to just use all of the numerical data, in the hope it would be enough to make reasonable models.

In [None]:
#Read data
path = '../input/'
egg_df=pd.read_csv(path+"egg-group-data-for-pokemon/pokemon_egg_groups.csv")
species_df=pd.read_csv(path+"pokemon-species/pokemon_species.csv")
abilities_df=pd.read_csv(path+"abilities/pokemon_abilities.csv")

#Split duplicates off & combine back
egg2_df=egg_df.loc[egg_df['species_id'].duplicated(), :].copy()
egg_df.drop_duplicates('species_id',inplace=True)
merged = egg_df.merge(egg2_df,on="species_id",how='outer')
merged.fillna(0,inplace=True)

#Rename columns to simpler form.
merged.rename(index=str,columns={"egg_group_id_x":"egg_group_1"},inplace=True)
merged.rename(index=str,columns={"egg_group_id_y":"egg_group_2"},inplace=True)

#Drop last 6 rows
merged.drop(merged.tail(6).index,inplace=True)

#Rename
merged.rename(index=str,columns={"species_id":"pokedex_number"},inplace=True)

#Make a new smaller dataframe
species_trim_df=pd.DataFrame()
species_trim_df["pokedex_number"]=species_df['id']
species_trim_df["color_id"]=species_df['color_id']
species_trim_df["shape_id"]=species_df['shape_id']
species_trim_df["habitat_id"]=species_df['habitat_id']
species_trim_df.drop(species_trim_df.tail(6).index,inplace=True)

#Trim everything after Magearna (pokemon_id 801) off
abilities_df = abilities_df[abilities_df.pokemon_id < 802]

#Make 3 new columns
abilities_df["Ability1"]=0
abilities_df["Ability2"]=0
abilities_df["Ability3"]=0

#Assign values to the 3 columns based on the ability slot (1-3)
abilities_df["Ability1"] = abilities_df.ability_id.where(abilities_df.slot == 1,0)
abilities_df["Ability2"] = abilities_df.ability_id.where(abilities_df.slot == 2,0)
abilities_df["Ability3"] = abilities_df.ability_id.where(abilities_df.slot == 3,0)

#Split duplicates off into new dataframes 
#3 abilities on some means it needs to be split twice
#I'm sure there's an easier way to do this
#.copy() gives independent frames, so the in-place edits below don't act on slices
abilities_df2=abilities_df.loc[abilities_df['pokemon_id'].duplicated(), :].copy()
abilities_df.drop_duplicates('pokemon_id',inplace=True)
abilities_df3=abilities_df2.loc[abilities_df2['pokemon_id'].duplicated(), :].copy()
abilities_df2.drop_duplicates('pokemon_id',inplace=True)

#Drop extra columns
abilities_df.drop(['ability_id','is_hidden','slot'],axis=1,inplace=True)
abilities_df2.drop(['ability_id','is_hidden','slot'],axis=1,inplace=True)
abilities_df3.drop(['ability_id','is_hidden','slot'],axis=1,inplace=True)

#Combine everything back
abilities_df=abilities_df.set_index('pokemon_id').add(abilities_df2.set_index('pokemon_id'),fill_value=0).reset_index()
abilities_df=abilities_df.set_index('pokemon_id').add(abilities_df3.set_index('pokemon_id'),fill_value=0).reset_index()

#Rename pokemon_id to pokedex number to allow for merging.
abilities_df.rename(index=str,columns={"pokemon_id":"pokedex_number"},inplace=True)

#Read Kaggle data
path = '../input/'
pokemon_df=pd.read_csv(path+"pokemon/pokemon.csv")

Name_df=pd.DataFrame()
Name_df["name"]=pokemon_df["name"].copy()

#Fix Minior's capture rate
#Use a single .loc call rather than chained assignment (index is still 0..n-1 here)
pokemon_df.loc[773,"capture_rate"]=30

#Change the type
pokemon_df['capture_rate']=pokemon_df['capture_rate'].astype(str).astype(int)

#Merge all my data.
pokemon_df=pokemon_df.merge(merged,on="pokedex_number",how='outer')
pokemon_df=pokemon_df.merge(species_trim_df,on="pokedex_number",how='outer')
pokemon_df=pokemon_df.merge(abilities_df,on="pokedex_number",how='outer')

#Remove against columns
pokemon_df.drop(list(pokemon_df.filter(regex = 'against')), axis = 1, inplace = True)
#Correct the spelling error
pokemon_df.rename(index=str,columns={"classfication":"classification"},inplace=True)

#Change nan to 'none'
pokemon_df["type2"]=pokemon_df["type2"].fillna('none')

#Drop Pokedex number for now
pokemon_df.drop("pokedex_number",axis=1,inplace=True)
pokemon_df.drop("generation",axis=1,inplace=True)

#First find the NAs.
index_height = pokemon_df['height_m'].index[pokemon_df['height_m'].apply(np.isnan)]
index_weight = pokemon_df['weight_kg'].index[pokemon_df['weight_kg'].apply(np.isnan)]
index_male   = pokemon_df['percentage_male'].index[pokemon_df['percentage_male'].apply(np.isnan)]

#Manually replace the missing heights & weights using the Kanto version etc
#Row position -> (height_m, weight_kg)
size_fixes = {
    18: (0.3, 3.5),    19: (0.7, 18.5),  25: (0.8, 30.0), 26: (0.6, 12.0),
    27: (1.0, 29.5),   36: (0.6, 9.9),   37: (1.1, 19.9), 49: (0.2, 0.8),
    50: (0.7, 33.3),   51: (0.4, 4.2),   52: (1.0, 32.0), 73: (0.4, 20.0),
    74: (1.0, 105.0),  75: (1.4, 300.0), 87: (0.9, 30.0), 88: (1.2, 30.0),
    102: (2.0, 120.0), 104: (1.0, 45.0), 719: (0.5, 9.0), 744: (0.8, 25.0),
}
#Positional assignment in one step avoids chained assignment (df.col.iloc[i]=...)
height_col = pokemon_df.columns.get_loc("height_m")
weight_col = pokemon_df.columns.get_loc("weight_kg")
for row, (height, weight) in size_fixes.items():
    pokemon_df.iloc[row, height_col] = height
    pokemon_df.iloc[row, weight_col] = weight

#Create a Genderless column to separate them from the all-female cases.
pokemon_df["Genderless"]=0
pokemon_df.loc[index_male,"Genderless"]=1

#Replace all the NaNs with zeros in the % male
pokemon_df["percentage_male"]=pokemon_df["percentage_male"].fillna(0)

#Check the typings of the pokemon with Alolan forms & fix
#I'm sure this can be done much more elegantly
#Row positions of the Pokemon whose second type came from the Alolan form
alolan_rows = [18, 19, 25, 26, 27, 36, 37, 49, 50, 51, 52, 87, 88, 104]
pokemon_df.iloc[alolan_rows, pokemon_df.columns.get_loc("type2")] = 'none'

#Lets start with just the numerical data for now.
num_features=pokemon_df.select_dtypes(include=np.number)
num_features=num_features.columns

print("The Type models will be built using the following features")
print(list(num_features))

These include all 6 of a Pokemon's stats and their combined total; how many steps it takes for that Pokemon's egg to hatch; how happy the Pokemon is when you catch it; how hard the Pokemon is to catch; how much experience it needs to reach level 100; its height and weight; its male ratio and whether it is genderless; whether it is legendary or not; its egg groups and abilities; its primary color; its body shape; and where it lives.

# Features & Targets

With my numerical features now decided, I created new dataframes that would contain just the features and the targets, in the latter case Type 1 and Type 2 of the Pokemon.

In [None]:
features=pd.DataFrame()
targets=pd.DataFrame()
targets2=pd.DataFrame()
features[num_features]=pokemon_df[num_features]
targets["type1"]=pokemon_df["type1"]
targets=np.ravel(targets)
targets2["type2"]=pokemon_df["type2"]
targets2=np.ravel(targets2)

# XGBoost for Type 1 with a 0.9/0.1 train/test split

For my very first model, I did a 0.9/0.1 training / test split on the full Pokedex, and measured the accuracy of an XGBoost model at guessing Type 1 of the Pokemon in the Test set.

Before going further, it's worth setting some benchmarks for prediction. There are 18 types, so random guessing would have about a 5% accuracy. The most common type is Water, at about 14%, so another strategy would be to just guess Water all the time. Anything better than that is an improvement.
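These two benchmarks are easy to compute directly. With 18 equally likely classes, random guessing gives 1/18, or roughly 5.6%. The sketch below works out both baselines for any label column; the helper name `baseline_accuracies` and the toy labels are made up for illustration, and in this notebook the input would be `pokemon_df["type1"]`.

```python
import pandas as pd

def baseline_accuracies(labels: pd.Series) -> dict:
    """Accuracy of uniform random guessing and of always guessing the most common class."""
    counts = labels.value_counts()  # sorted from most to least common
    return {
        "n_classes": len(counts),
        "random_guess": 1.0 / len(counts),
        "most_common": counts.index[0],
        "majority_guess": counts.iloc[0] / len(labels),
    }

# Made-up labels for illustration only.
toy_types = pd.Series(["water"] * 5 + ["fire"] * 3 + ["grass"] * 2)
baselines = baseline_accuracies(toy_types)
print(baselines)
```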

In [None]:
#Train Test Split
train_features,test_features,train_targets,test_targets = train_test_split(features,targets,test_size=0.1,random_state=1)

I started by fitting a model based on all the features, and trying to optimise the XGBoost parameters. For my dataset I found that most of these had no effect, or a detrimental effect, on my results. The most significant improvement came from setting the maximum depth to 5, so I continued to use this value throughout.

It soon became clear that it was relatively easy to reach 100% accuracy on the training set with only a couple of features. However, such a model would only get about 40% accuracy on the test set, which points to overfitting: the model memorises the training Pokemon without generalising to new ones. At the other end, using all the features would also overfit to the training data.

To find a balance, I used forwards and backwards searches for feature selection, looking for the subset with the best test accuracy.
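A greedy forward search of this kind might look like the sketch below. It is not the exact procedure used in this notebook: the function name `forward_select` and the toy scorer are hypothetical. In practice `score` would fit the XGBoost model on the chosen columns and return accuracy on held-out data.

```python
from typing import Callable, List, Sequence, Tuple

def forward_select(candidates: Sequence[str],
                   score: Callable[[List[str]], float]) -> Tuple[List[str], float]:
    """Greedily add whichever feature improves the score most; stop when none helps."""
    selected: List[str] = []
    best = float("-inf")
    remaining = list(candidates)
    while remaining:
        # Try adding each remaining feature to the current selection.
        trials = [(score(selected + [f]), f) for f in remaining]
        top_score, top_feat = max(trials)
        if top_score <= best:
            break  # no candidate improves on the current subset
        selected.append(top_feat)
        remaining.remove(top_feat)
        best = top_score
    return selected, best

# Hypothetical scorer for demonstration: "a" and "b" help, "c" hurts.
def toy_score(feats: List[str]) -> float:
    value = {"a": 0.4, "b": 0.3, "c": -0.1}
    return sum(value[f] for f in feats)

chosen, chosen_score = forward_select(["a", "b", "c"], toy_score)
print(chosen, round(chosen_score, 2))
```

A backwards search is the mirror image: start from every feature and repeatedly drop the one whose removal helps the score most. Neither search is exhaustive, so the result is a good subset, not a guaranteed optimum.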

In [None]:
use_feat=list(['egg_group_1','Ability1','egg_group_2','base_egg_steps','color_id',
              'Ability3','base_happiness'])
#82.72 % accuracy
#Best sub-set of features on the training set.

In [None]:
#XGB parameters
model_xgb=xgb.XGBClassifier(learning_rate =0.1,
 n_estimators=1000,
 max_depth=5,
 min_child_weight=1,
 gamma=0,
 subsample=0.8,
 colsample_bytree=0.8,
 #objective= 'binary:logistic',
 nthread=4,
 scale_pos_weight=1,
 seed=27,
 reg_alpha=0,
 reg_lambda=1,
)

In [None]:
model_xgb.fit(train_features[use_feat], train_targets)
train_pred=model_xgb.predict(train_features[use_feat])
test_pred = model_xgb.predict(test_features[use_feat])

# evaluate predictions
train_accuracy = accuracy_score(train_targets, train_pred)
print("Train Accuracy: %.2f%%" % (train_accuracy * 100.0))
test_accuracy = accuracy_score(test_targets, test_pred)
print("Test Accuracy: %.2f%%" % (test_accuracy * 100.0))
sns.set(font_scale=0.8)
xgb.plot_importance(model_xgb)

After non-exhaustive searching, my best result was 82.72% test accuracy (with 99.03 % train accuracy), using a model with 7 features, which will all likely need an explanation for non-experts.

The features are: Egg group 1, Egg Group 2, Ability 1,  Ability 3, base egg steps, color id and base happiness.

It's obvious how some of these features are related to a Pokemon's types, but not others.

Egg Groups 1 & 2 indicate which other Pokemon a specific species is capable of breeding with. For example, there are several different Water egg groups, which primarily include Water Pokemon. Similarly for groups like Bug, Fairy & Flying. Others like Field group are less obvious, but could be used to exclude non terrestrial Pokemon.

Ability 1 & 3 are the special abilities a Pokemon can have, with 1 being their normal ability, and 3 being their Hidden ability, usually obtained under special conditions. It's no surprise that these work well, because fire Pokemon often have abilities that power up fire moves and so on. So even though the algorithm doesn't know that an ability improves the attack power of fire Pokemon, it can see that it's always on fire Pokemon. Some other abilities are found on a variety of different Pokemon, meaning extra information is needed to improve the predictions.

Color ID is associated with the main colour of the Pokemon. It doesn't cover every single colour (for example I don't think there's one for orange), but will give a slight indication of the Pokemon's appearance. Many times, a Pokemon's colour scheme is fairly simplistic, with fire normally red, grass green and water blue. Some Pokemon will however completely throw this off, like Scizor, who is a Bug/Steel that is bright red. This could be useful in combination with other features.

Base egg steps is a measure of how many steps it would take to hatch an egg of that type. Some Pokemon are very quick, others very slow. At first, you might wonder how on earth this is related to a Pokemon's type, since the rate at which eggs hatch will usually be all over the place for any given type. However, there are certain types, like Dragons, which tend to have slower hatching eggs, so might help separate them out from other types.

The oddest inclusion is the base happiness, which I didn't even realise varied before I started this project. In Pokemon, happiness is a measure of how much the Pokemon likes you, and can power up certain moves, or trigger certain evolutions. Base happiness is how happy they are when you catch them. The default value is 70, but certain types can be higher (Fairies) or lower (Dark), and some special Pokemon can even have 0.

In [None]:
# Output a plot of the confusion matrix.
labels =list(set(test_targets))
cm = metrics.confusion_matrix(test_targets, test_pred, labels=labels)
# Normalize the confusion matrix by row (i.e by the number of samples
# in each class)
cm_normalized = cm.astype("float") / cm.sum(axis=1)[:, np.newaxis]
sns.set(font_scale=4)
plt.figure(figsize=(20,20))
ax = sns.heatmap(cm_normalized, cmap="bone_r")
ax.set_aspect(1)
ax.set_xticklabels(labels)
ax.set_yticklabels(labels)
plt.xticks(rotation=90)
plt.yticks(rotation=0)
plt.title("Confusion matrix")
plt.ylabel("True label")
plt.xlabel("Predicted label")
plt.show()

Another way to check how the model performs is to look at the confusion matrix for the test data. It's clear that most types get a perfect match, but others are less successful. The worst is Flying, which always gets misclassified as Dragon. This is potentially because most Flying Pokemon have Flying as their 2nd type, so Flying as a 1st type is relatively rare.
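For reference, the row normalisation can also be done by scikit-learn itself. This is a minimal sketch on made-up labels, assuming scikit-learn 0.22 or newer, where `confusion_matrix` accepts a `normalize` argument. Each row then shows how the Pokemon of one true type were distributed across the predicted types.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Made-up labels for illustration only.
y_true = ["water", "water", "fire", "ice", "ice", "fire"]
y_pred = ["water", "ice", "fire", "water", "ice", "fire"]

# Sorting gives a stable label order; list(set(...)) can change between runs.
labels = sorted(set(y_true))

# normalize="true" divides each row by the number of true samples in that class.
cm_normalized = confusion_matrix(y_true, y_pred, labels=labels, normalize="true")
print(labels)
print(np.round(cm_normalized, 2))
```

With a small test set, a single Pokemon can make up a whole row, so a 100% or 0% cell may rest on only one or two examples.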

Other notable problems are seen for Ground and Psychic, which are misclassified as several other types.

A few of the mistakes are not surprising, given the similarities between the types, and the fact that they are usually associated with each other. For example, the ice/water and steel/rock confusion.

# XGBoost on the Full Pokedex

Before going any further, I decided it would be a good idea to explore fitting to the entire Pokedex, without worrying about making any new predictions.  As mentioned above it became clear that a good model for the training data could be built using very few features.

In [None]:
#Best 3 feature: for the final feature Defense /  HP / Sp Defense / Speed @ 100%
#use_feat = list(['Ability1','weight_kg','defense'])

#What about if I exclude abilities, since I feel these are too closely linked to Type?
use_feat = list(['weight_kg','base_total','defense'])
# Special attack also works

In [None]:
sns.set(font_scale=0.8)
model_xgb.fit(features[use_feat], targets)
y_pred = model_xgb.predict(features[use_feat])
# evaluate predictions
accuracy = accuracy_score(targets, y_pred)
print("Accuracy: %.2f%%" % (accuracy * 100.0))
xgb.plot_importance(model_xgb)

I managed to find two subsets of features, which could both reach 100% accuracy with only 3 features. 

In the first case, Ability 1, Weight and then one out of either Defense, HP, Special Defense or Speed was all you needed. In fact, a 2 feature model could already reach over 99% accuracy, with the choice of several stats helping with the last 1 or 2 classifications.

In the second case, Weight, total base stats, and one out of Defense or Special attack could also get 100% accuracy.

As mentioned above, it's not surprising that Ability 1 is a good predictor, and it makes intuitive sense that their stats would be important in some way. For example, most electric Pokemon are faster than most rock or steel Pokemon.

What surprised me most was how useful the weight proved to be. I knew that, for example, Ghosts are usually lighter than Rock or Steel types, but did not think there was a significant difference between other types, like Fire and Water.

# XGBoost for each of the 7 Generations of Pokemon

My next step was to make Type 1 predictions for all of the Pokemon within a certain generation, based on a model trained on the other 6 generations.
I wanted to see which generations were easier or harder to predict compared to the others, and thus make a statement on which generations were more normal or unusual.

It quickly became apparent that the subset of features that gave the highest accuracy could vary from generation to generation, meaning that it is difficult to select a subset that will perform well for all 7 generations.

If I used all of the features, I was able to get accuracies of 61.59, 62.00, 48.90, 52.34, 48.08, 38.89 and 42.50% on generations 1 to 7 respectively. From this alone, it's clear that generations 1 & 2 can be modelled more accurately, and that later generations are harder to model, possibly due to more diverse typings and designs.

With these as the baseline, I then performed feature selection for all 7 generations, to find the best accuracy. Here I report the best feature selections I was able to find, but cannot guarantee they are the absolute best. Because my XGBoost parameters include some degree of subsampling, the order of the features can sometimes matter, which I had not realised at the time. I have not properly explored this effect in this data, due to the large time commitment it would involve.
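The "train on six generations, test on the seventh" scheme is a leave-one-group-out split. The sketch below shows the idea on toy data with scikit-learn's `LeaveOneGroupOut`. The `generation` column, the single feature `x` and the decision tree are stand-ins chosen to keep the example self-contained; the models in this notebook use XGBoost and slice the data by Pokedex position instead.

```python
import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.model_selection import LeaveOneGroupOut
from sklearn.tree import DecisionTreeClassifier

# Toy data: 'generation' is the group label, 'x' a single made-up feature.
toy = pd.DataFrame({
    "generation": [1, 1, 2, 2, 3, 3],
    "x":          [0.1, 0.9, 0.2, 0.8, 0.15, 0.85],
    "type1":      ["water", "fire", "water", "fire", "water", "fire"],
})
X, y, groups = toy[["x"]], toy["type1"], toy["generation"]

accuracy_by_generation = {}
for train_idx, test_idx in LeaveOneGroupOut().split(X, y, groups):
    # Stand-in classifier; swap in the XGBoost model for the real data.
    model = DecisionTreeClassifier(random_state=0)
    model.fit(X.iloc[train_idx], y.iloc[train_idx])
    held_out = int(groups.iloc[test_idx].iloc[0])
    accuracy_by_generation[held_out] = accuracy_score(
        y.iloc[test_idx], model.predict(X.iloc[test_idx])
    )
print(accuracy_by_generation)
```

Written this way, the seven training sets do not need to be concatenated by hand, and the features and targets cannot get out of step with each other.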

In [None]:
#Split features & targets into each generation.
Gen1_features=features[0:151]
Gen2_features=features[151:251]
Gen3_features=features[251:386]
Gen4_features=features[386:493]
Gen5_features=features[493:649]
Gen6_features=features[649:721]
Gen7_features=features[721:801]
Gen1_targets=targets[0:151]
Gen2_targets=targets[151:251]
Gen3_targets=targets[251:386]
Gen4_targets=targets[386:493]
Gen5_targets=targets[493:649]
Gen6_targets=targets[649:721]
Gen7_targets=targets[721:801]
Gen1_targets=np.ravel(Gen1_targets)
Gen2_targets=np.ravel(Gen2_targets)
Gen3_targets=np.ravel(Gen3_targets)
Gen4_targets=np.ravel(Gen4_targets)
Gen5_targets=np.ravel(Gen5_targets)
Gen6_targets=np.ravel(Gen6_targets)
Gen7_targets=np.ravel(Gen7_targets)

In [None]:
#Recombine 6 of them, in 7 different ways, to make my different training sets
#Ordering of the features & targets should be the same!
#But doesn't have to be necessarily in numerical order
Gens_not1_features=pd.concat([Gen2_features,Gen3_features,Gen4_features,Gen5_features,Gen6_features,Gen7_features],axis=0)
Gens_not2_features=pd.concat([Gen1_features,Gen3_features,Gen4_features,Gen5_features,Gen6_features,Gen7_features],axis=0)
Gens_not3_features=pd.concat([Gen2_features,Gen1_features,Gen4_features,Gen5_features,Gen6_features,Gen7_features],axis=0)
Gens_not4_features=pd.concat([Gen2_features,Gen3_features,Gen1_features,Gen5_features,Gen6_features,Gen7_features],axis=0)
Gens_not5_features=pd.concat([Gen2_features,Gen3_features,Gen4_features,Gen1_features,Gen6_features,Gen7_features],axis=0)
Gens_not6_features=pd.concat([Gen2_features,Gen3_features,Gen4_features,Gen5_features,Gen1_features,Gen7_features],axis=0)
Gens_not7_features=pd.concat([Gen2_features,Gen3_features,Gen4_features,Gen5_features,Gen6_features,Gen1_features],axis=0)
Gens_not1_targets=np.concatenate((Gen2_targets,Gen3_targets,Gen4_targets,Gen5_targets,Gen6_targets,Gen7_targets),axis=0)
Gens_not2_targets=np.concatenate((Gen1_targets,Gen3_targets,Gen4_targets,Gen5_targets,Gen6_targets,Gen7_targets),axis=0)
Gens_not3_targets=np.concatenate((Gen2_targets,Gen1_targets,Gen4_targets,Gen5_targets,Gen6_targets,Gen7_targets),axis=0)
Gens_not4_targets=np.concatenate((Gen2_targets,Gen3_targets,Gen1_targets,Gen5_targets,Gen6_targets,Gen7_targets),axis=0)
Gens_not5_targets=np.concatenate((Gen2_targets,Gen3_targets,Gen4_targets,Gen1_targets,Gen6_targets,Gen7_targets),axis=0)
Gens_not6_targets=np.concatenate((Gen2_targets,Gen3_targets,Gen4_targets,Gen5_targets,Gen1_targets,Gen7_targets),axis=0)
Gens_not7_targets=np.concatenate((Gen2_targets,Gen3_targets,Gen4_targets,Gen5_targets,Gen6_targets,Gen1_targets),axis=0)

In [None]:
#Iterate forwards
#use_feat=list(['attack', 'base_egg_steps', 'base_happiness', 'base_total',
#       'capture_rate', 'defense', 'experience_growth', 'height_m', 'hp',
#       'percentage_male', 'sp_attack', 'sp_defense', 'speed', 'weight_kg',
#       'is_legendary', 'egg_group_1', 'egg_group_2', 'color_id', 'shape_id',
#      'habitat_id', 'Ability1', 'Ability2', 'Ability3', 'Genderless'])
#use_feat=list(['attack',
#       'capture_rate', 'experience_growth',
#         'speed', 'weight_kg',
#         'shape_id','sp_attack',
#      'habitat_id', 'Ability1',  'Ability3'])
#for i in use_feat:
#    feats=['height_m', 'percentage_male','Genderless', 'hp','defense','base_total','base_egg_steps','is_legendary','egg_group_1','egg_group_2','sp_defense',
#           'base_happiness', 'Ability2','color_id']
#    feats.insert(0,i),
#    print("Adding")
#    print(i)
#    model_xgb.fit(Gens_not5_features[feats], Gens_not5_targets)
#    test_pred = model_xgb.predict(Gen5_features[feats])
#    test_accuracy = accuracy_score(Gen5_targets, test_pred)
#    print("Test Accuracy: %.2f%%" % (test_accuracy * 100.0))

In [None]:
#Iterate backwards
#use_feat=list(['attack', 'base_egg_steps', 'base_happiness', 'base_total',
#       'capture_rate', 'defense', 'experience_growth', 'height_m', 'hp',
#       'percentage_male', 'sp_attack', 'sp_defense', 'speed', 'weight_kg',
#       'is_legendary', 'egg_group_1', 'egg_group_2', 'color_id', 'shape_id',
#      'habitat_id', 'Ability1', 'Ability2', 'Ability3', 'Genderless'])
#
#use_feat=list([ 'base_egg_steps',
#       'capture_rate', 
#         
#       'egg_group_1',
#      ])
#
#for i in use_feat:
#    feats=use_feat.copy()
#    print ("Remove")
#    print(i)
#    feats.remove(i)
#    model_xgb.fit(Gens_not7_features[feats], Gens_not7_targets)
#    test_pred = model_xgb.predict(Gen7_features[feats])
#    test_accuracy = accuracy_score(Gen7_targets, test_pred)
#    print("Test Accuracy: %.2f%%" % (test_accuracy * 100.0))

# Generation 1
The best accuracy I could get for Generation 1 was 70.20 %, which I got by dropping the male percentage and speed from the full set of features. This results in a fairly complicated model. By comparison, I could get ~66% with only 5 features.

In [None]:
#Generation 1 model
use_feat=list(['attack', 'base_egg_steps', 'base_happiness', 'base_total',
       'capture_rate', 'defense', 'experience_growth', 'height_m', 'hp',
        'sp_attack', 'sp_defense',  'weight_kg',
       'is_legendary', 'egg_group_1', 'egg_group_2', 'color_id', 'shape_id',
      'habitat_id', 'Ability1', 'Ability2', 'Ability3', 'Genderless'])
sns.set(font_scale=0.8)
model_xgb.fit(Gens_not1_features[use_feat], Gens_not1_targets)
Gen1_T1_pred = model_xgb.predict(Gen1_features[use_feat])

# evaluate predictions
test_accuracy = accuracy_score(Gen1_targets, Gen1_T1_pred)
print("Test Accuracy: %.2f%%" % (test_accuracy * 100.0))
xgb.plot_importance(model_xgb)
# Output a plot of the confusion matrix.
labels =list(set(Gen1_targets))
cm = metrics.confusion_matrix(Gen1_targets, Gen1_T1_pred, labels=labels)
# Normalize the confusion matrix by row (i.e by the number of samples
# in each class)
cm_normalized = cm.astype("float") / cm.sum(axis=1)[:, np.newaxis]
sns.set(font_scale=4)
plt.figure(figsize=(20,20))
ax = sns.heatmap(cm_normalized, cmap="bone_r")
ax.set_aspect(1)
ax.set_xticklabels(labels)
ax.set_yticklabels(labels)
plt.xticks(rotation=90)
plt.yticks(rotation=0)
plt.title("Confusion matrix")
plt.ylabel("True label")
plt.xlabel("Predicted label")
plt.show()

You can see that Grass and Bug are predicted with 100% accuracy, and Normal and Water are also fairly accurate.

The worst performing by far are Dragons and Ghost, which get misclassified as Water and Psychic. This is possibly not surprising, because there are only 3 of each in Generation 1.

Other types appear to be commonly misclassified as Normal, Water and Psychic.

Some of the mistakes sort of make sense, like the water/ice, fairy/normal and fighting / ground problems, since the types are sort of similar. Others like fire/electricity make no sense to me.

Something that will be interesting to see later is whether some of these misclassifications were actually the Pokemon's 2nd type.

# Generation 2
For Generation 2, I was able to get a test accuracy of 74% by reducing the features to 10:
base_happiness, capture_rate, defense, experience_growth, weight_kg, egg_group_1, egg_group_2, color_id, Ability2, Ability3

In [None]:
#Generation 2 model
use_feat=list(['base_happiness',
       'capture_rate', 'defense', 'experience_growth',
        'weight_kg',
        'egg_group_1', 'egg_group_2', 'color_id',
        'Ability2', 'Ability3'])
sns.set(font_scale=0.8)
model_xgb.fit(Gens_not2_features[use_feat], Gens_not2_targets)
Gen2_T1_pred = model_xgb.predict(Gen2_features[use_feat])

# evaluate predictions
test_accuracy = accuracy_score(Gen2_targets, Gen2_T1_pred)
print("Test Accuracy: %.2f%%" % (test_accuracy * 100.0))
xgb.plot_importance(model_xgb)
# Output a plot of the confusion matrix.
labels = sorted(set(Gen2_targets))  # sorted so the axis order is reproducible
cm = metrics.confusion_matrix(Gen2_targets, Gen2_T1_pred, labels=labels)
# Normalize the confusion matrix by row (i.e by the number of samples
# in each class)
cm_normalized = cm.astype("float") / cm.sum(axis=1)[:, np.newaxis]
sns.set(font_scale=4)
plt.figure(figsize=(20,20))
ax = sns.heatmap(cm_normalized, cmap="bone_r")
ax.set_aspect(1)
ax.set_xticklabels(labels)
ax.set_yticklabels(labels)
plt.xticks(rotation=90)
plt.yticks(rotation=0)
plt.title("Confusion matrix")
plt.ylabel("True label")
plt.xlabel("Predicted label")
plt.show()

One clear improvement over Generation 1 is that for all types, there are at least some correct predictions.

In this case, there are perfect matches for Poison, Grass, Water, Ghost and Bug, with a good match for Normal. So far, it looks like Water, Grass, Bug and Normal types are the easiest to predict.

Once again, problems with Water / Ice and Fire / Electric appear, along with a new Rock / Steel problem. The latter is understandable, since both types tend to be heavy with high defenses, and weight and defense are both included in the model.

Some types like Psychic and Ground are a bit of a mess, but at least nothing is completely misclassified this time.

Despite the fairly good overall accuracy, there are a few worrying mistakes, like mistaking a Fairy for Dark, and Fire for Water.

# Generation 3

For Generation 3, I was able to get ~63% accuracy by using 14 features:
 base_egg_steps, base_total, capture_rate, experience_growth, height_m, hp, percentage_male, sp_attack, egg_group_1, egg_group_2, color_id, habitat_id,  Ability3, and Genderless.

In [None]:
#Generation 3 model
use_feat=list([ 'base_egg_steps',  'base_total','capture_rate', 'experience_growth',
               'height_m', 'hp','percentage_male', 'sp_attack','egg_group_1',
               'egg_group_2', 'color_id', 'habitat_id',  'Ability3', 'Genderless'])
sns.set(font_scale=0.8)
model_xgb.fit(Gens_not3_features[use_feat], Gens_not3_targets)
Gen3_T1_pred = model_xgb.predict(Gen3_features[use_feat])

# evaluate predictions
test_accuracy = accuracy_score(Gen3_targets, Gen3_T1_pred)
print("Test Accuracy: %.2f%%" % (test_accuracy * 100.0))
xgb.plot_importance(model_xgb)
# Output a plot of the confusion matrix.
labels = sorted(set(Gen3_targets))  # sorted so the axis order is reproducible
cm = metrics.confusion_matrix(Gen3_targets, Gen3_T1_pred, labels=labels)
# Normalize the confusion matrix by row (i.e by the number of samples
# in each class)
cm_normalized = cm.astype("float") / cm.sum(axis=1)[:, np.newaxis]
sns.set(font_scale=4)
plt.figure(figsize=(20,20))
ax = sns.heatmap(cm_normalized, cmap="bone_r")
ax.set_aspect(1)
ax.set_xticklabels(labels)
ax.set_yticklabels(labels)
plt.xticks(rotation=90)
plt.yticks(rotation=0)
plt.title("Confusion matrix")
plt.ylabel("True label")
plt.xlabel("Predicted label")
plt.show()

As the prediction accuracy gets lower, the confusion matrix gets messier. As with the other cases, there are a few types with good or perfect accuracy, but we're getting an increasing number of types with predictions all over the place, or that fail to get any predictions correct at all.

As with Generation 2, Ghost type is predicted with 100% accuracy, in contrast to the complete failure in Generation 1. I suspect this might be because the Ghosts in Generation 1 were dual type, whilst those in Generations 2 and 3 are pure Ghost.

Unlike previous generations, there is no confusion between Fire and Electric this time, with both getting 100% accuracy, although some other Pokemon are incorrectly predicted as these types. Bug, Water and Fighting also do well, although they all have some misclassifications. Looking at the mistakes, like Water or Rock for a Bug type, I do wonder if it's related to their second type again.

Unexpectedly, Grass and Normal both have issues this time, often getting confused for each other.

The Ice / Water and Rock / Steel confusions still persist, and in both cases the share of correct classifications seems to be getting worse.

Dark and Ground predictions fail completely, with no correct predictions, and others like Dragon or Poison are regularly mistaken for many different types.

# Generation 4

For Generation 4, I was able to build a model with ~67% accuracy, which used 11 features:
base_egg_steps, hp, sp_attack, sp_defense, is_legendary, egg_group_1, egg_group_2, shape_id, Ability1, Ability3,  and speed.

In [None]:
#Generation 4 model
use_feat=list(['base_egg_steps','hp','sp_attack','sp_defense','is_legendary'
               ,'egg_group_1', 'egg_group_2', 'shape_id','Ability1',  'Ability3','speed'])
sns.set(font_scale=0.8)
model_xgb.fit(Gens_not4_features[use_feat], Gens_not4_targets)
Gen4_T1_pred = model_xgb.predict(Gen4_features[use_feat])

# evaluate predictions
test_accuracy = accuracy_score(Gen4_targets, Gen4_T1_pred)
print("Test Accuracy: %.2f%%" % (test_accuracy * 100.0))
xgb.plot_importance(model_xgb)
# Output a plot of the confusion matrix.
labels = sorted(set(Gen4_targets))  # sorted so the axis order is reproducible
cm = metrics.confusion_matrix(Gen4_targets, Gen4_T1_pred, labels=labels)
# Normalize the confusion matrix by row (i.e by the number of samples
# in each class)
cm_normalized = cm.astype("float") / cm.sum(axis=1)[:, np.newaxis]
sns.set(font_scale=4)
plt.figure(figsize=(20,20))
ax = sns.heatmap(cm_normalized, cmap="bone_r")
ax.set_aspect(1)
ax.set_xticklabels(labels)
ax.set_yticklabels(labels)
plt.xticks(rotation=90)
plt.yticks(rotation=0)
plt.title("Confusion matrix")
plt.ylabel("True label")
plt.xlabel("Predicted label")
plt.show()

As usual, the algorithm performs better on some types than others.

Bug, Fighting, Dragon, Fairy, and Rock are all predicted with 100% accuracy, with Water, Psychic and possibly Fire also performing well.

Every type has at least one correct prediction, although Electric and Poison appear to perform particularly badly. Others like Normal have reasonable accuracy, but show a range of misclassifications.

Unlike previous Generations, the Ice / Water and Rock / Steel confusions do not appear, suggesting something different about those Pokemon in this generation. The Electric / Fire problem is still present though.

Some of the misclassifications here are particularly worrying, because they get the complete opposite types, for example Fire mistaken for Water, or Grass and Ice for Fire.

# Generation 5
For Generation 5, I was able to build a model with ~56% accuracy, which used 10 features:

Genderless, hp, base_egg_steps, is_legendary, egg_group_1, egg_group_2, sp_defense, base_happiness, Ability2, and color_id.

Unfortunately, this appears to be the limits of the accuracy with the current features.

In [None]:
#Generation 5 model
#10 feat for 56.41
use_feat=list(['Genderless', 'hp','base_egg_steps','is_legendary','egg_group_1',
               'egg_group_2','sp_defense','base_happiness', 'Ability2','color_id'])
sns.set(font_scale=0.8)
model_xgb.fit(Gens_not5_features[use_feat], Gens_not5_targets)
Gen5_T1_pred = model_xgb.predict(Gen5_features[use_feat])

# evaluate predictions
test_accuracy = accuracy_score(Gen5_targets, Gen5_T1_pred)
print("Test Accuracy: %.2f%%" % (test_accuracy * 100.0))
xgb.plot_importance(model_xgb)
# Output a plot of the confusion matrix.
labels = sorted(set(Gen5_targets))  # sorted so the axis order is reproducible
cm = metrics.confusion_matrix(Gen5_targets, Gen5_T1_pred, labels=labels)
# Normalize the confusion matrix by row (i.e by the number of samples
# in each class)
cm_normalized = cm.astype("float") / cm.sum(axis=1)[:, np.newaxis]
sns.set(font_scale=4)
plt.figure(figsize=(20,20))
ax = sns.heatmap(cm_normalized, cmap="bone_r")
ax.set_aspect(1)
ax.set_xticklabels(labels)
ax.set_yticklabels(labels)
plt.xticks(rotation=90)
plt.yticks(rotation=0)
plt.title("Confusion matrix")
plt.ylabel("True label")
plt.xlabel("Predicted label")
plt.show()

As the prediction accuracy edges ever closer to 50%, I'm expecting to see some serious problems crop up in the confusion matrix. This is especially clear here.

Whilst Bug and Grass are still consistent performers, along with good results for Normal, Fire, Fighting and Rock, most of the others are a complete mess.

Poison, Flying and Electric all get 0% accuracy, with flying always misclassified as Dragon. Most other types are all over the place, most surprising of which is Water, which usually does well. Dark and Ground for example are misclassified as nearly half the other types. 

One possible explanation for the poor performance for Generation 5 is that it was meant as somewhat of a reboot of the series. Unlike most other Pokemon games, it featured none of the old Pokemon in the main game, with the region initially entirely populated by new Pokemon.

# Generation 6

For Generation 6, I was barely able to get the accuracy above 50%, with my best result at ~53%, using 15 features:

 base_egg_steps, base_happiness, base_total, capture_rate, height_m, sp_defense, speed, weight_kg, egg_group_1, color_id, shape_id, habitat_id,  Ability2, Ability3, and Genderless.
 
By comparison,  a much simpler model, only using 6 features was able to get 51.39% accuracy. Namely:

capture_rate, egg_group_1,  color_id, shape_id, Ability2, Ability3

Given that all 6 are contained in the more complicated model, I question the usefulness of adding 9 extra features to gain about 1.4 percentage points of accuracy.
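To put that gap in perspective, the two accuracies can be converted back into counts of correctly predicted Pokemon. This is a small worked calculation, assuming Generation 6 contains 72 Pokemon in this dataset (the size of the 649:721 index range used for the Generation 6 slice elsewhere in this notebook).

```python
# Assumption: Generation 6 has 72 rows, matching the 649:721 slice.
GEN6_SIZE = 72

def correct_count(accuracy_percent, n):
    """Convert a reported accuracy back into a whole number of correct predictions."""
    return round(accuracy_percent / 100 * n)

full_model = correct_count(52.78, GEN6_SIZE)   # 15-feature model
small_model = correct_count(51.39, GEN6_SIZE)  # 6-feature model
print(full_model, small_model, full_model - small_model)  # prints: 38 37 1
```

Under that assumption, the 9 extra features buy exactly one additional correct prediction.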

In [None]:
#Generation 6 model
#52.78 at the moment.
use_feat=list([ 'base_egg_steps', 'base_happiness', 'base_total',
       'capture_rate',   'height_m', 
        'sp_defense', 'speed', 'weight_kg',
        'egg_group_1',  'color_id', 'shape_id',
      'habitat_id',  'Ability2', 'Ability3', 'Genderless'])
#51.39
#use_feat=list(['capture_rate', 'egg_group_1',  'color_id', 'shape_id','Ability2', 'Ability3'])
sns.set(font_scale=0.8)
model_xgb.fit(Gens_not6_features[use_feat], Gens_not6_targets)
Gen6_T1_pred = model_xgb.predict(Gen6_features[use_feat])

# evaluate predictions
test_accuracy = accuracy_score(Gen6_targets, Gen6_T1_pred)
print("Test Accuracy: %.2f%%" % (test_accuracy * 100.0))
xgb.plot_importance(model_xgb)
# Output a plot of the confusion matrix.
labels = sorted(set(Gen6_targets))  # sorted so the axis order is reproducible
cm = metrics.confusion_matrix(Gen6_targets, Gen6_T1_pred, labels=labels)
# Normalize the confusion matrix by row (i.e by the number of samples
# in each class)
cm_normalized = cm.astype("float") / cm.sum(axis=1)[:, np.newaxis]
sns.set(font_scale=4)
plt.figure(figsize=(20,20))
ax = sns.heatmap(cm_normalized, cmap="bone_r")
ax.set_aspect(1)
ax.set_xticklabels(labels)
ax.set_yticklabels(labels)
plt.xticks(rotation=90)
plt.yticks(rotation=0)
plt.title("Confusion matrix")
plt.ylabel("True label")
plt.xlabel("Predicted label")
plt.show()

Even though the accuracy is slightly worse than Generation 5, the confusion matrix looks a bit less messy. The algorithm is apparently a bit more consistent about what it gets wrong.

Surprisingly, 6 types get 100% accuracy: Bug, Dragon, Ice, Grass, Water and Electric.

However, 5 also get 0% accuracy: Steel, Dark, Fairy, Poison and Flying.

Others like Psychic, Fire, Normal and Ghost are more of a mixed bag.

Many of the incorrect predictions appear to be due to assigning Water, Grass or Normal to other types, possibly suggesting the model is overfitting to those types in the training data.

# Generation 7
For Generation 7, I found that a simple model just using 2 features, egg_group_1 and experience_growth, gave the best accuracy, of ~59%. Any additional features only reduced this accuracy.

In [None]:
#Generation 7 model
#58.75 best so far
use_feat=list(['egg_group_1','experience_growth'])
sns.set(font_scale=0.8)
model_xgb.fit(Gens_not7_features[use_feat], Gens_not7_targets)
Gen7_T1_pred = model_xgb.predict(Gen7_features[use_feat])

# evaluate predictions
test_accuracy = accuracy_score(Gen7_targets, Gen7_T1_pred)
print("Test Accuracy: %.2f%%" % (test_accuracy * 100.0))
xgb.plot_importance(model_xgb)
# Output a plot of the confusion matrix.
labels = sorted(set(Gen7_targets))  # sorted so the axis order is reproducible
cm = metrics.confusion_matrix(Gen7_targets, Gen7_T1_pred, labels=labels)
# Normalize the confusion matrix by row (i.e by the number of samples
# in each class)
cm_normalized = cm.astype("float") / cm.sum(axis=1)[:, np.newaxis]
sns.set(font_scale=4)
plt.figure(figsize=(20,20))
ax = sns.heatmap(cm_normalized, cmap="bone_r")
ax.set_aspect(1)
ax.set_xticklabels(labels)
ax.set_yticklabels(labels)
plt.xticks(rotation=90)
plt.yticks(rotation=0)
plt.title("Confusion matrix")
plt.ylabel("True label")
plt.xlabel("Predicted label")
plt.show()

Since this is a fairly barebones model, I would expect that it makes a fairly narrow range of predictions, and sacrifices accuracy on certain types, to improve it on others. The confusion matrix shows that this is mostly true.

7 whole types are excluded from the model, with no predictions made for any of Dark, Ground, Fairy, Electric, Fighting, Poison or Steel. I assume that Egg group and Experience growth are simply too narrow to identify any of these. Instead, the model has mostly focused on 3 or 4 types: Normal, Psychic, Water and Rock. I'm not surprised that Normal and Water are picked out like this, since Water for example has very obvious egg groups. How the model got to Psychic and Rock seems less clear, but I think lots of Psychic Pokemon tend to fall into the 'Humanoid' egg group.

Possibly due to the rarity of the type, the model gets 100% accuracy for Dragon, along with good accuracy for Normal, Water and Psychic. The latter ones are to be expected, given the over-abundance of predictions for those types. At least some of them are bound to be correct!

For all the remaining types, the narrow model means it never made incorrect assignments of types like Grass, Fire and Bug, even if it wasn't always able to correctly identify all examples of that type in the first place. 
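Both observations, types that are never predicted and types that are never predicted wrongly, can be read off the predictions directly. The sketch below assumes two aligned lists of labels and uses invented toy data; `prediction_summary` is a hypothetical helper name. The second quantity it returns is the per-type precision: of all the times a type was predicted, how often that prediction was right.

```python
from collections import Counter

def prediction_summary(y_true, y_pred):
    """Return (never_predicted, precision) for a set of predictions.

    never_predicted: true labels the model never output.
    precision: for each predicted label, the share of those predictions that were correct.
    """
    predicted_counts = Counter(y_pred)
    correct_counts = Counter(p for t, p in zip(y_true, y_pred) if t == p)
    never_predicted = sorted(set(y_true) - set(y_pred))
    precision = {
        label: correct_counts[label] / n for label, n in predicted_counts.items()
    }
    return never_predicted, precision

# Toy labels for illustration only.
toy_true = ["grass", "grass", "steel", "water", "normal", "poison"]
toy_pred = ["grass", "normal", "normal", "water", "normal", "water"]

never_predicted, precision = prediction_summary(toy_true, toy_pred)
print(never_predicted)
print(precision)
```

In the toy data, Grass is only found half the time but is never assigned to the wrong Pokemon, which is the same pattern described above for the two-feature model.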

# Type 1 Summary

With models developed for all 7 generations, are there any general trends amongst the results? Are certain types harder to predict than others? Are certain Generations more complicated than others? Do certain features appear a lot in the models, or barely at all?

If we just rank the Generations by overall accuracy, the earlier Generations are clearly easier to predict, with the best result for Generation 2 (74%). Everything before Generation 5 manages over 60% accuracy (1: 70.20%, 2: 74%, 3: 63%, 4: 67.29%), whilst those from Generation 5 onwards fall below it (5: 56.41%, 6: 52.78%, 7: 58.75%), with Generation 6 being the worst.

The good performance on Generations 1 & 2 is likely because the series was just getting started, and some of the designs were slightly less outlandish than they are now. They also established general ideas for future generations (starters, early Normal / Flying Pokemon, Legendaries), which get repeated throughout. As mentioned earlier, it is likely that Generations 5 and onwards are harder to predict, due to the soft-reboot of Generation 5, and the changes to the design team (new members etc).

4 types stand out as generally having high accuracy across all the models, although they do sometimes dip in certain Generations. These are Normal, Water, Bug and Grass. For the latter 3, this is likely because they all have their own Egg groups (or several for Water), which would help a lot with the classifications. I assume Normal is a baseline type, which appears fairly average compared to everything else.

A few models do well on other types, like Ghost, Dragon, Fire and Electric, although other times the former two are missed completely, or the latter two are mistaken for each other. Dragon and Ghost are rare enough that sometimes only single families exist per Generation, so it's likely the models would either get everything right, or everything wrong. I'm still not sure why Fire and Electric get confused for each other so often.

It's a bit more difficult to say which types are hard to predict, because they usually have a mixed bag across generations. Poison seems to stand out, with 0% accuracy for many generations. I assume this is due to the relative rarity of Type 1 poison Pokemon, with it usually appearing instead as Type 2. Ice also seems particularly difficult, often misclassified as Water.

The Ice / Water problem, and others like Rock / Steel are understandable given their similarities. Many Pokemon are mixed Water / Ice, and likely share similar traits. Rock and Steel both tend to be heavy, with strong defenses, with rock sometimes becoming Steel as the Pokemon evolves. As mentioned several times earlier, there are also cases of complete opposite misclassifications, like Water / Fire or Fire / Ice, so I'd be interested to dig into why those happened, because I'd have thought they were far enough separated as to not happen.

Egg groups seem to be the most powerful predictors for type, with Egg group 1 appearing in all the models, and Egg group 2 in most of them. As mentioned several times, this is not surprising, because several types basically have Egg groups assigned to them (Water, Bug, Grass). Problems can arise when the Pokemon has this as its second type, or when it belongs to the Egg group but doesn't have the type at all. An example of the latter would be Inkay, who is Dark / Psychic, but belongs to 2 Water groups.
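To get a feel for how much of the signal a single egg group carries, one could compare the XGBoost models against a plain lookup table that predicts the most common Type 1 for each egg group. The sketch below is only an illustration of that idea, not something used in this notebook: the function names, group names and rows are all hypothetical, and the egg groups in the real features may be encoded differently.

```python
from collections import Counter, defaultdict

def fit_egg_group_lookup(egg_groups, types):
    """Map each egg group to the most common Type 1 seen with it."""
    by_group = defaultdict(Counter)
    for group, type_1 in zip(egg_groups, types):
        by_group[group][type_1] += 1
    return {g: c.most_common(1)[0][0] for g, c in by_group.items()}

def predict_from_lookup(lookup, egg_groups, default):
    """Predict a type per egg group, falling back to `default` for unseen groups."""
    return [lookup.get(g, default) for g in egg_groups]

# Hypothetical training rows: the last one mimics an Inkay-style exception,
# a non-Water Pokemon sitting in a Water egg group.
train_groups = ["bug", "bug", "plant", "water1", "water1", "water1"]
train_types = ["bug", "bug", "grass", "water", "water", "dark"]

lookup = fit_egg_group_lookup(train_groups, train_types)
toy_predictions = predict_from_lookup(
    lookup, ["water1", "plant", "fairy"], default="normal"
)
print(toy_predictions)
```

Any Pokemon like the last training row is misclassified by construction, which is the failure mode described above.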

Abilities are also fairly common features, with Ability 3, the Hidden ability, appearing a surprising amount. Once again, this is likely due to the fact that some types have exclusive abilities, which would be an even stronger effect for the rare hidden abilities. Confusion would come from others, like Pressure, which is fairly generic across types.

Colour and Shape also appear a lot, which intuitively makes sense. Fire Pokemon tend to be Red, Water tend to be Blue, and Flying Pokemon tend to be Bird shaped.

Before doing this, I'd expected the actual main stats (HP, Attack, Special Attack, Defense, Special Defense and Speed) to be much more useful at making predictions than they actually were. Several of them do appear, but only in about half of the models.

Other less intuitive features like base happiness, capture rates, experience growth, or egg steps appear in a few models, which surprised me, because I didn't know too much about the range of some of them before this project. These can often be indicators of rare types and Pokemon, whose values for these features appear outside the norm.

Features related to Gender and Legendary status are relatively rare. This is likely because Legendaries span most types now, and that when there are gender ratio differences, it's hard to confine them to certain types.

Physical attributes like height and weight rarely appear in the models, likely because these are probably afterthoughts to the design process.

# Predictions of Type 2

With my preliminary analysis for Type 1 now complete, I turned my attention to Type 2, and repeated the process. Before doing any work, I expected that Type 2 was slightly harder to predict, because some secondary types can be slightly more esoteric, and I had to introduce a new 'None' category. It also turned out that the None category introduces some serious imbalance issues to the dataset, and drastically shifts the baseline predictions.

# Full Pokedex

In [None]:
use_feat=list(['speed', 'weight_kg','Ability1','Ability3']) 
#-> 100%
#use_feat = list(['Ability1','weight_kg','defense']) 
#-> 99.88%

#use_feat = list(['weight_kg','base_total','defense']) 
#-> 99.88%


In [None]:
sns.set(font_scale=0.8)
plt.figure(figsize=(40,40))
model_xgb.fit(features[use_feat], targets2)
y_pred = model_xgb.predict(features[use_feat])
# evaluate predictions
accuracy = accuracy_score(targets2, y_pred)
print("Accuracy: %.2f%%" % (accuracy * 100.0))
xgb.plot_importance(model_xgb)

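Fitting to the full Pokedex was relatively easy, with 100% accuracy possible with just 4 features. The 3 feature models I developed for Type 1 only perform marginally worse, at 99.88% accuracy, meaning only 1 of the 801 Pokemon is misclassified. Note that this is accuracy on the same data the model was fitted to, so it shows how well the features separate the types rather than how well the model generalises. At first glance, this suggests that Type 2 is marginally more difficult to predict than Type 1.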

# Predicting Across Generations
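The generation boundaries used in this section can also be written down once and reused to build each "hold one generation out" split. The following is a sketch under that assumption, with hypothetical helper names and a list of row numbers standing in for the real targets; it keeps the rows in Pokedex order, so the same function would have to be applied to both the features and the targets.

```python
# End index (exclusive) of each generation, taken from the slices in this notebook.
GEN_ENDS = [151, 251, 386, 493, 649, 721, 801]

def generation_slices(ends):
    """Return {generation_number: slice} for consecutive index ranges."""
    starts = [0] + ends[:-1]
    return {
        gen: slice(a, b) for gen, (a, b) in enumerate(zip(starts, ends), start=1)
    }

def leave_one_generation_out(values, held_out, ends=GEN_ENDS):
    """Split a sequence into (train, test), keeping the original row order."""
    held = generation_slices(ends)[held_out]
    test = list(values[held])
    train = list(values[:held.start]) + list(values[held.stop:])
    return train, test

rows = list(range(801))  # stand-in for the 801 Pokedex rows
train_rows, test_rows = leave_one_generation_out(rows, held_out=6)
print(len(train_rows), len(test_rows))
```

This works on lists and 1-D NumPy arrays; a pandas DataFrame would need `.iloc` slicing instead.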

In [None]:
Gen1_targets2=targets2[0:151]
Gen2_targets2=targets2[151:251]
Gen3_targets2=targets2[251:386]
Gen4_targets2=targets2[386:493]
Gen5_targets2=targets2[493:649]
Gen6_targets2=targets2[649:721]
Gen7_targets2=targets2[721:801]
Gen1_targets2=np.ravel(Gen1_targets2)
Gen2_targets2=np.ravel(Gen2_targets2)
Gen3_targets2=np.ravel(Gen3_targets2)
Gen4_targets2=np.ravel(Gen4_targets2)
Gen5_targets2=np.ravel(Gen5_targets2)
Gen6_targets2=np.ravel(Gen6_targets2)
Gen7_targets2=np.ravel(Gen7_targets2)
# Note: from Gens_not3 onwards, Gen1 is swapped into the held-out slot, so the rows
# are not in Pokedex order. This is only safe if the matching Gens_notN_features
# were concatenated in the same order.
Gens_not1_targets2=np.concatenate((Gen2_targets2,Gen3_targets2,Gen4_targets2,Gen5_targets2,Gen6_targets2,Gen7_targets2),axis=0)
Gens_not2_targets2=np.concatenate((Gen1_targets2,Gen3_targets2,Gen4_targets2,Gen5_targets2,Gen6_targets2,Gen7_targets2),axis=0)
Gens_not3_targets2=np.concatenate((Gen2_targets2,Gen1_targets2,Gen4_targets2,Gen5_targets2,Gen6_targets2,Gen7_targets2),axis=0)
Gens_not4_targets2=np.concatenate((Gen2_targets2,Gen3_targets2,Gen1_targets2,Gen5_targets2,Gen6_targets2,Gen7_targets2),axis=0)
Gens_not5_targets2=np.concatenate((Gen2_targets2,Gen3_targets2,Gen4_targets2,Gen1_targets2,Gen6_targets2,Gen7_targets2),axis=0)
Gens_not6_targets2=np.concatenate((Gen2_targets2,Gen3_targets2,Gen4_targets2,Gen5_targets2,Gen1_targets2,Gen7_targets2),axis=0)
Gens_not7_targets2=np.concatenate((Gen2_targets2,Gen3_targets2,Gen4_targets2,Gen5_targets2,Gen6_targets2,Gen1_targets2),axis=0)

When I first started looking at Type 2, I hadn't realised quite how many Pokemon were lacking a second type. This leads to a fairly imbalanced type distribution compared to Type 1, and means that I need to use a different baseline to compare predictions against. Namely, what would the accuracy be if I simply guessed 'none' for every single Pokemon?

In the full Pokedex, none appears nearly 400 times as a 2nd type, flying nearly 100 times, then the rest usually about 30 or less, with Normal being the rarest. That's quite a serious imbalance for none and flying! One possible solution to this problem is to add weightings to the model, but I did not find this particularly helped. Tests with Generation 2 did not improve the overall accuracy, but did take fewer features to reach that accuracy. As such, I have modelled without weights for now.

In [None]:
#Count the number of times each type appears
from collections import Counter
Counter(targets2)

In [None]:
# Share of Pokemon with no second type, for the full Pokedex and each generation.
none_groups = [("Full Pokedex", targets2), ("Generation 1", Gen1_targets2),
               ("Generation 2", Gen2_targets2), ("Generation 3", Gen3_targets2),
               ("Generation 4", Gen4_targets2), ("Generation 5", Gen5_targets2),
               ("Generation 6", Gen6_targets2), ("Generation 7", Gen7_targets2)]
for name, group in none_groups:
    print(name + " none percentage")
    print(Counter(group)['none'] / len(group) * 100)

Most generations have nearly a 50% none rate, with the later ones dropping down slightly, to ~36% by Generation 7. This means that in order for my model to make any meaningful improvements, it really has to beat these baselines, otherwise a model that simply guessed 'None' every time would beat it. 
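The comparison against the "always guess none" model can be made explicit with a majority-class baseline. The sketch below assumes a flat sequence of Type 2 labels and uses invented toy labels; `majority_baseline` and `beats_baseline` are hypothetical helper names.

```python
from collections import Counter

def majority_baseline(targets):
    """Return (label, accuracy) for always guessing the most common label."""
    label, count = Counter(targets).most_common(1)[0]
    return label, count / len(targets)

def beats_baseline(model_accuracy, targets):
    """True if a model accuracy (as a fraction) is above the majority baseline."""
    return model_accuracy > majority_baseline(targets)[1]

# Toy labels for illustration only.
toy_targets2 = ["none"] * 5 + ["flying"] * 3 + ["poison", "steel"]

baseline_label, baseline_accuracy = majority_baseline(toy_targets2)
print(baseline_label, baseline_accuracy)
print(beats_baseline(0.60, toy_targets2), beats_baseline(0.45, toy_targets2))
```

For a held-out generation, `targets` would be that generation's Type 2 labels, so the bar each model has to clear changes from generation to generation.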

An alternative might be to look at how well the model does on predicting the types which do exist, since this is likely a separate problem from predicting the nones. In fact, this turned out to be a good method for feature selection, because improving the type accuracy tended to improve the full model accuracy in the long run, even with a few dips to the none accuracy here and there.

To start with, I simply made predictions using all the features, which managed accuracies of 55.63 % / 60.00 % / 47.41 % / 52.34 % / 55.77 % / 34.72 % / 43.75 % for Generations 1-7. Most made no meaningful improvements over the 'all none' model, or actually made a worse model! At the very least, we need to have models better than just guessing None all the time.

Initial testing showed that it's relatively easy to get correct predictions for Flying Pokemon, by using for example shape_id, since most Flying Pokemon have it as their secondary type. Adding more features often doesn't improve the overall accuracy though. This was because new correct type guesses were often counter-balanced with incorrect guesses which should have been none.

As before, I tried to find the best feature selections, but the process is non-exhaustive, so I may have missed some minor improvements. Additionally, just for time reasons, I limited the extent of some of my searches.
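The manual "add one feature and see what happens" loop can be described as greedy forward selection. Here is a generic sketch of that idea, assuming a `score` callable that takes a list of feature names and returns a number to maximise. The toy scorer here is entirely made up; in this notebook the scorer would presumably refit the model and return the held-out accuracy.

```python
def greedy_forward_selection(candidates, score, start=()):
    """Add one feature at a time, keeping the addition that most improves
    score(features); stop as soon as no addition improves the score."""
    selected = list(start)
    best = score(selected) if selected else float("-inf")
    remaining = [f for f in candidates if f not in selected]
    while remaining:
        trial_scores = {f: score(selected + [f]) for f in remaining}
        winner = max(trial_scores, key=trial_scores.get)
        if trial_scores[winner] <= best:
            break
        selected.append(winner)
        remaining.remove(winner)
        best = trial_scores[winner]
    return selected, best

# Hypothetical scorer: pretends three features help and everything else hurts.
USEFUL = {"shape_id": 0.3, "egg_group_2": 0.2, "Ability3": 0.1}

def toy_score(features):
    return sum(USEFUL.get(f, -0.05) for f in features)

chosen, chosen_score = greedy_forward_selection(
    ["attack", "Ability3", "shape_id", "egg_group_2"], toy_score
)
print(chosen, chosen_score)
```

Because each step only looks one feature ahead, this can miss combinations that only help together, which is one reason the search is non-exhaustive.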

In [None]:
#Getting weights for generations other than 2.
# Weight each sample by how rare its type is relative to 'none'.
type_counts = Counter(Gens_not2_targets2)  # count once, rather than once per row
weights = np.array([type_counts['none'] / type_counts[t] for t in Gens_not2_targets2])
#weights

In [None]:
#none_list=[i for i, j in enumerate(Gen7_targets2) if j == 'none']
#type_list=[i for i, j in enumerate(Gen7_targets2) if j != 'none']
#Iterate forwards
#use_feat=list(['attack', 'base_egg_steps', 'base_happiness', 'base_total',
#       'capture_rate', 'defense', 'experience_growth', 'height_m', 'hp',
#       'percentage_male', 'sp_attack', 'sp_defense', 'speed', 'weight_kg',
#       'is_legendary', 'egg_group_1', 'egg_group_2', 'color_id', 'shape_id',
#      'habitat_id', 'Ability1', 'Ability2', 'Ability3', 'Genderless'])
#use_feat=list(['attack',  'base_total',
#        'defense', 'experience_growth', 'height_m', 'hp',
#        'sp_attack', 'sp_defense', 'speed',
#        'color_id',
#     'Ability1'])
#for i in use_feat:
#    feats=['weight_kg','capture_rate','base_happiness', 'Genderless','base_egg_steps','Ability3', 'shape_id','egg_group_2', 'egg_group_1',
#           'habitat_id','percentage_male', 'Ability2' ,'is_legendary']
#    feats.insert(13,i)
#    print("Adding")
#    print(i)
#    model_xgb.fit(Gens_not7_features[feats], Gens_not7_targets2)
#    test_pred = model_xgb.predict(Gen7_features[feats])
#    test_accuracy = accuracy_score(Gen7_targets2, test_pred)
#    print("Test Accuracy: %.2f%%" % (test_accuracy * 100.0))
#    test_type_accuracy = accuracy_score(Gen7_targets2[type_list], test_pred[type_list])
#    print("Test Type Accuracy: %.2f%%" % (test_type_accuracy * 100.0))
#    test_none_accuracy = accuracy_score(Gen7_targets2[none_list], test_pred[none_list])
#    print("Test None Accuracy: %.2f%%" % (test_none_accuracy * 100.0))

In [None]:
#Iterate backwards
#use_feat=list(['attack', 'base_egg_steps', 'base_happiness', 'base_total',
#       'capture_rate', 'defense', 'experience_growth', 'height_m', 'hp',
#       'percentage_male', 'sp_attack', 'sp_defense', 'speed', 'weight_kg',
#       'is_legendary', 'egg_group_1', 'egg_group_2', 'color_id', 'shape_id',
#      'habitat_id', 'Ability1', 'Ability2', 'Ability3', 'Genderless'])
#
#use_feat=list(['shape_id','sp_attack','habitat_id', 'egg_group_2'])
#
#for i in use_feat:
#    feats=use_feat.copy()
#    print ("Removing")
#    print(i)
#    feats.remove(i)
#    model_xgb.fit(Gens_not6_features[feats], Gens_not6_targets2)
#    test_pred = model_xgb.predict(Gen6_features[feats])
#    test_accuracy = accuracy_score(Gen6_targets2, test_pred)
#    print("Test Accuracy: %.2f%%" % (test_accuracy * 100.0))
#    test_type_accuracy = accuracy_score(Gen6_targets2[type_list], test_pred[type_list])
#    print("Test Type Accuracy: %.2f%%" % (test_type_accuracy * 100.0))
#    test_none_accuracy = accuracy_score(Gen6_targets2[none_list], test_pred[none_list])
#    print("Test None Accuracy: %.2f%%" % (test_none_accuracy * 100.0))

# Generation 1

I was able to reach ~69% total accuracy, and ~43% accuracy on Pokemon with a 2nd type, using 9 features:

Ability3, shape_id, egg_group_2, Ability1, is_legendary, Ability2, experience_growth, base_egg_steps, and percentage_male

It appears that features related to Abilities, and breeding in some form are important.

In [None]:
none_list=[i for i, j in enumerate(Gen1_targets2) if j == 'none']
type_list=[i for i, j in enumerate(Gen1_targets2) if j != 'none']
sns.set(font_scale=0.8)
#43.28/68.87
use_feat=list(['Ability3', 'shape_id','egg_group_2','Ability1','is_legendary','Ability2', 'experience_growth', 'base_egg_steps','percentage_male'])
#41.79 / 68.21 (save for later)
#use_feat=list(['egg_group_2','Ability3', 'shape_id','Ability1','Ability2', 'percentage_male','experience_growth'])
model_xgb.fit(Gens_not1_features[use_feat], Gens_not1_targets2)
train_pred=model_xgb.predict(Gens_not1_features[use_feat])
Gen1_T2_pred = model_xgb.predict(Gen1_features[use_feat])
# evaluate predictions
train_accuracy = accuracy_score(Gens_not1_targets2, train_pred)
print("Train Accuracy: %.2f%%" % (train_accuracy * 100.0))
test_accuracy = accuracy_score(Gen1_targets2, Gen1_T2_pred)
print("Test Accuracy: %.2f%%" % (test_accuracy * 100.0))
test_type_accuracy = accuracy_score(Gen1_targets2[type_list], Gen1_T2_pred[type_list])
print("Test Type Accuracy: %.2f%%" % (test_type_accuracy * 100.0))
test_none_accuracy = accuracy_score(Gen1_targets2[none_list], Gen1_T2_pred[none_list])
print("Test None Accuracy: %.2f%%" % (test_none_accuracy * 100.0))
xgb.plot_importance(model_xgb)
# Output a plot of the confusion matrix.
labels = sorted(set(Gen1_targets2))  # sorted so the axis order is reproducible
cm = metrics.confusion_matrix(Gen1_targets2, Gen1_T2_pred, labels=labels)
# Normalize the confusion matrix by row (i.e by the number of samples
# in each class)
cm_normalized = cm.astype("float") / cm.sum(axis=1)[:, np.newaxis]
sns.set(font_scale=4)
plt.figure(figsize=(20,20))
ax = sns.heatmap(cm_normalized, cmap="bone_r")
ax.set_aspect(1)
ax.set_xticklabels(labels)
ax.set_yticklabels(labels)
plt.xticks(rotation=90)
plt.yticks(rotation=0)
plt.title("Confusion matrix")
plt.ylabel("True label")
plt.xlabel("Predicted label")
plt.show()

As expected, the current model vastly overestimates the number of Pokemon which lack a 2nd type, assigning none for many types which should exist. Notable examples are Poison and Fighting, which are practically never identified as any type. In a few cases, it incorrectly assigns other types where there should be none, such as steel, grass or fighting.

Steel, Rock and Fairy all have 100% accuracy, with reasonable performance for Flying, middling for Psychic, and bad for Ground.

Some types, like Ice, Grass, Fighting, Water, and Poison are never correctly predicted by the model.

Two misclassifications particularly stand out to me, which I think are explained by their Type 1 pairings. Water is always misclassified as Rock, which might have something to do with the fact that the Fossil Pokemon are all Rock / Water, so the algorithm is getting the order wrong. A closer look at the predictions shows that this is the case. Since Poison is often paired with Grass as the 2nd type, it doesn't surprise me that Grass is always misclassified as Poison here.
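If getting the order wrong is the main issue, one way to measure it would be to score the combined Type 1 / Type 2 prediction as an unordered pair. The sketch below assumes each Pokemon is represented as a `(type_1, type_2)` tuple with `'none'` for a missing second type. The helper names are hypothetical and the predicted pairs are invented.

```python
def type_set(type_1, type_2):
    """Return the types as an unordered set, ignoring a 'none' second slot."""
    return frozenset(t for t in (type_1, type_2) if t != "none")

def order_insensitive_accuracy(true_pairs, pred_pairs):
    """Share of Pokemon whose predicted types match the true types in any order."""
    hits = sum(
        type_set(*t) == type_set(*p) for t, p in zip(true_pairs, pred_pairs)
    )
    return hits / len(true_pairs)

# Toy pairs for illustration only: the first row mimics the Rock / Water swap.
toy_true = [("rock", "water"), ("grass", "poison"), ("fire", "none")]
toy_pred = [("water", "rock"), ("grass", "poison"), ("fire", "fighting")]

toy_accuracy = order_insensitive_accuracy(toy_true, toy_pred)
print(toy_accuracy)
```

A swapped Rock / Water pair counts as correct under this measure, while an extra invented second type still counts as a miss.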

Something interesting to note when looking closer at the predictions is that the Charmander family is predicted to all be part Fighting, likely due to the abundance of Fire / Fighting starters.

In [None]:
# Collect the Generation 1 predictions for both types in one table.
Simpler_XGB_predictions_df = pd.DataFrame({"Type1": Gen1_T1_pred, "Type2": Gen1_T2_pred})
Simpler_XGB_predictions_df.to_csv("Simpler_XGB_Predictions.csv",index=False)

# Generation 2
I found that a 9 feature model was able to get 75% overall accuracy, and 53% accuracy on those Pokemon with 2nd types. These features included:

shape_id, Ability1, Ability2, experience_growth, Ability3, percentage_male, egg_group_2, base_happiness and color_id.

Interestingly, 7 of these are shared with the Generation 1 features, suggesting these models are slightly more transferable than those for Type 1.

I found that by adding weightings to the model, I was able to get the same accuracy with only 7 features, dropping base_happiness and egg_group_2. However, I have not yet found a model that improves on this accuracy.
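
The exact weighting scheme is not shown here, so the following is only a minimal sketch of one common option: "balanced" per-sample weights, where each sample is weighted by n_samples / (n_classes * class_count). The function name and the toy labels are made up for illustration.

```python
from collections import Counter

def balanced_sample_weights(labels):
    """Weight each sample by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return [n_samples / (n_classes * counts[label]) for label in labels]

# Toy second-type labels dominated by 'none' (made-up data, not from the notebook).
toy_targets = ["none", "none", "none", "flying"]
toy_weights = balanced_sample_weights(toy_targets)
print(toy_weights)
```

A list like this could be passed as the `sample_weight` argument of `fit`, so that rare second types count for more than the abundant None class during training.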

In [None]:
none_list=[i for i, j in enumerate(Gen2_targets2) if j == 'none']
type_list=[i for i, j in enumerate(Gen2_targets2) if j != 'none']
sns.set(font_scale=0.8)
use_feat=list(['shape_id','Ability1','Ability2','experience_growth','Ability3','percentage_male','egg_group_2','base_happiness','color_id'])
#75% total accuracy, with 53% on types

# Gen 2 with weights
#feats=['shape_id','Ability1','Ability2','Ability3','percentage_male','color_id','experience_growth']
#Also 75%
model_xgb.fit(Gens_not2_features[use_feat], Gens_not2_targets2)
train_pred=model_xgb.predict(Gens_not2_features[use_feat])
Gen2_T2_pred = model_xgb.predict(Gen2_features[use_feat])
# evaluate predictions
train_accuracy = accuracy_score(Gens_not2_targets2, train_pred)
print("Train Accuracy: %.2f%%" % (train_accuracy * 100.0))
test_accuracy = accuracy_score(Gen2_targets2, Gen2_T2_pred)
print("Test Accuracy: %.2f%%" % (test_accuracy * 100.0))
test_type_accuracy = accuracy_score(Gen2_targets2[type_list], Gen2_T2_pred[type_list])
print("Test Type Accuracy: %.2f%%" % (test_type_accuracy * 100.0))
test_none_accuracy = accuracy_score(Gen2_targets2[none_list], Gen2_T2_pred[none_list])
print("Test None Accuracy: %.2f%%" % (test_none_accuracy * 100.0))
xgb.plot_importance(model_xgb)
# Output a plot of the confusion matrix.
labels =list(set(Gen2_targets2))
cm = metrics.confusion_matrix(Gen2_targets2, Gen2_T2_pred, labels=labels)
# Normalize the confusion matrix by row (i.e by the number of samples
# in each class)
cm_normalized = cm.astype("float") / cm.sum(axis=1)[:, np.newaxis]
sns.set(font_scale=4)
plt.figure(figsize=(20,20))
ax = sns.heatmap(cm_normalized, cmap="bone_r")
ax.set_aspect(1)
ax.set_xticklabels(labels)
ax.set_yticklabels(labels)
plt.xticks(rotation=90)
plt.yticks(rotation=0)
plt.title("Confusion matrix")
plt.ylabel("True label")
plt.xlabel("Predicted label")
plt.show()

As before, None is massively over-predicted, with many types purely predicted as None. This includes Poison, Fire, Fighting, Dragon, Dark, and Electric. However, when predictions are made for Pokemon with 2nd types, they are actually fairly accurate.
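
Because None dominates the targets, it helps to compare the overall accuracy against a baseline that always predicts the most common label. The sketch below shows the idea on made-up labels; `majority_baseline` is a hypothetical helper, not something defined elsewhere in this notebook.

```python
from collections import Counter

def majority_baseline(targets):
    """Return the most common label and the accuracy of always predicting it."""
    label, count = Counter(targets).most_common(1)[0]
    return label, count / len(targets)

# Made-up second-type labels for ten Pokemon.
toy_targets = ["none"] * 6 + ["flying", "flying", "steel", "rock"]
baseline_label, baseline_acc = majority_baseline(toy_targets)
print("Always predicting %r scores %.2f%%" % (baseline_label, baseline_acc * 100.0))
```

Running the same function on something like `list(Gen2_targets2)` would show how much of the overall accuracy comes for free from the None class.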

Fairy, Ice and Grass are predicted with 100% accuracy, with good performance for Flying, and reasonable performance for Psychic and Ground. In most cases, the incorrect predictions for these types are simply due to being assigned None. This just leaves Rock and Steel, which get misclassified as Flying for some reason. As far as I can tell, it thinks Scizor and Shuckle have their 2nd type as Flying, possibly due to this being a common pairing with Bug type.

Thankfully, this time the model doesn't think Cyndaquil and co are part Fighting!

# Generation 3

The best model I've found so far included just 3 features, and got an overall accuracy of ~59%, with ~29% Type accuracy. More complicated models, with up to 12 features were only able to reproduce these values. I did not look any further. The 3 features are:

experience_growth, shape_id, and Ability2.

These are all common to the previous 2 models.

Since this is a very simple model, I expect it will do well on 2 or 3 types, and basically ignore the rest.

In [None]:
none_list=[i for i, j in enumerate(Gen3_targets2) if j == 'none']
type_list=[i for i, j in enumerate(Gen3_targets2) if j != 'none']
sns.set(font_scale=0.8)
#58.52/28.79
use_feat=list(['experience_growth','shape_id','Ability2'])
model_xgb.fit(Gens_not3_features[use_feat], Gens_not3_targets2)
train_pred=model_xgb.predict(Gens_not3_features[use_feat])
Gen3_T2_pred = model_xgb.predict(Gen3_features[use_feat])
# evaluate predictions
train_accuracy = accuracy_score(Gens_not3_targets2, train_pred)
print("Train Accuracy: %.2f%%" % (train_accuracy * 100.0))
test_accuracy = accuracy_score(Gen3_targets2, Gen3_T2_pred)
print("Test Accuracy: %.2f%%" % (test_accuracy * 100.0))
test_type_accuracy = accuracy_score(Gen3_targets2[type_list], Gen3_T2_pred[type_list])
print("Test Type Accuracy: %.2f%%" % (test_type_accuracy * 100.0))
test_none_accuracy = accuracy_score(Gen3_targets2[none_list], Gen3_T2_pred[none_list])
print("Test None Accuracy: %.2f%%" % (test_none_accuracy * 100.0))
xgb.plot_importance(model_xgb)
# Output a plot of the confusion matrix.
labels =list(set(Gen3_targets2))
cm = metrics.confusion_matrix(Gen3_targets2, Gen3_T2_pred, labels=labels)
# Normalize the confusion matrix by row (i.e by the number of samples
# in each class)
cm_normalized = cm.astype("float") / cm.sum(axis=1)[:, np.newaxis]
sns.set(font_scale=4)
plt.figure(figsize=(20,20))
ax = sns.heatmap(cm_normalized, cmap="bone_r")
ax.set_aspect(1)
ax.set_xticklabels(labels)
ax.set_yticklabels(labels)
plt.xticks(rotation=90)
plt.yticks(rotation=0)
plt.title("Confusion matrix")
plt.ylabel("True label")
plt.xlabel("Predicted label")
plt.show()

As expected, this model ignores most of the types, with 8 types completely empty, and vastly overpredicts a lack of a second type. However, when it does make predictions about other types, they tend to be good.
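
A quick way to list the types the model never predicts is a set difference between the true labels and the predicted ones. This is only a sketch on made-up labels; `never_predicted` is a hypothetical helper name.

```python
def never_predicted(true_labels, predicted_labels):
    """Return the sorted classes that occur in the truth but never in the predictions."""
    return sorted(set(true_labels) - set(predicted_labels))

# Made-up labels for illustration only.
toy_true = ["none", "flying", "steel", "dark", "none", "ground"]
toy_pred = ["none", "flying", "none", "none", "none", "flying"]
missing = never_predicted(toy_true, toy_pred)
print(missing)
```

Applied to the real arrays (for example `Gen3_targets2` and `Gen3_T2_pred`), the length of this list would give the number of empty columns in the confusion matrix.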

Fairy and Rock have a 100% accuracy, whilst Flying, Bug and Poison are decent.

The model also predicts too many Flying types, wrongly assigning Flying to Pokemon whose 2nd type is actually Dragon or Poison.

This is generally a pretty poor model, but I couldn't find anything better.

# Generation 4

My best model so far for Generation 4 used 11 features, the majority of which are common to other models.

defense, Ability1, Ability2, is_legendary, Genderless, base_happiness, percentage_male, shape_id,  base_egg_steps, Ability3 and egg_group_1.

This has an overall accuracy of ~64% and a Type accuracy of ~37%.

In [None]:
none_list=[i for i, j in enumerate(Gen4_targets2) if j == 'none']
type_list=[i for i, j in enumerate(Gen4_targets2) if j != 'none']
sns.set(font_scale=0.8)
# Full 63.55 / Type 37.04
use_feat=list(['defense','Ability1','Ability2','is_legendary', 'Genderless','base_happiness','percentage_male',
           'shape_id', 'base_egg_steps','Ability3','egg_group_1'])

model_xgb.fit(Gens_not4_features[use_feat], Gens_not4_targets2)
train_pred=model_xgb.predict(Gens_not4_features[use_feat])
Gen4_T2_pred = model_xgb.predict(Gen4_features[use_feat])
# evaluate predictions
train_accuracy = accuracy_score(Gens_not4_targets2, train_pred)
print("Train Accuracy: %.2f%%" % (train_accuracy * 100.0))
test_accuracy = accuracy_score(Gen4_targets2, Gen4_T2_pred)
print("Test Accuracy: %.2f%%" % (test_accuracy * 100.0))
test_type_accuracy = accuracy_score(Gen4_targets2[type_list], Gen4_T2_pred[type_list])
print("Test Type Accuracy: %.2f%%" % (test_type_accuracy * 100.0))
test_none_accuracy = accuracy_score(Gen4_targets2[none_list], Gen4_T2_pred[none_list])
print("Test None Accuracy: %.2f%%" % (test_none_accuracy * 100.0))
xgb.plot_importance(model_xgb)
# Output a plot of the confusion matrix.
labels =list(set(Gen4_targets2))
cm = metrics.confusion_matrix(Gen4_targets2, Gen4_T2_pred, labels=labels)
# Normalize the confusion matrix by row (i.e by the number of samples
# in each class)
cm_normalized = cm.astype("float") / cm.sum(axis=1)[:, np.newaxis]
sns.set(font_scale=4)
plt.figure(figsize=(20,20))
ax = sns.heatmap(cm_normalized, cmap="bone_r")
ax.set_aspect(1)
ax.set_xticklabels(labels)
ax.set_yticklabels(labels)
plt.xticks(rotation=90)
plt.yticks(rotation=0)
plt.title("Confusion matrix")
plt.ylabel("True label")
plt.xlabel("Predicted label")
plt.show()

This model performs well on a few types, but leaves most of the types blank, with many wrongly classified as None. In nearly all cases where it predicts a type, it gets the prediction correct, with the one exception of misclassifying Dark as Flying. As with the other models, the main problem is that it simply defaults to None in too many cases.

Poison, Rock, Fairy, and Psychic all perform well, with 100% accuracy for each. Flying is also predicted correctly with reasonable accuracy. Steel, Ground and Grass are less successful, but as mentioned above, this is due to the fact that the model can't gain enough information to assign any label other than None to most of the Pokemon.

The Bug entry can be ignored, because I think there's a divide by zero issue going on in the code somewhere.
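
Whether a zero row really is the cause has not been confirmed, but if a row of the confusion matrix ever sums to zero, the row normalisation produces NaN values. A minimal sketch of a guarded version, using NumPy and a made-up matrix, could look like this (`normalize_rows` is a hypothetical helper):

```python
import numpy as np

def normalize_rows(cm):
    """Divide each row by its sum, leaving all-zero rows as zeros instead of NaN."""
    cm = np.asarray(cm, dtype=float)
    row_sums = cm.sum(axis=1, keepdims=True)
    # Only divide where the row sum is non-zero; other entries keep the 0.0 from `out`.
    return np.divide(cm, row_sums, out=np.zeros_like(cm), where=row_sums != 0)

# Made-up 2x2 confusion matrix whose second class has no true samples.
toy_cm = [[3, 1],
          [0, 0]]
toy_normalized = normalize_rows(toy_cm)
print(toy_normalized)
```

This could stand in for the `cm.astype("float") / cm.sum(axis=1)[:, np.newaxis]` line if empty rows turn out to be the problem.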

# Generation 5

The best model I found for Generation 5 included 14 features, and got an overall accuracy of 61.54%, and a type accuracy of 30.67%. The features included:

sp_attack, capture_rate, percentage_male, egg_group_1, is_legendary, shape_id, Ability1, base_happiness, hp, egg_group_2, sp_defense, Ability3, speed, and base_total


It's also notable that 60.26% overall accuracy could be achieved with only a 2 feature model (shape_id and experience_growth), and 29.33% type accuracy with a 3 feature model (shape_id, Ability1, hp), so it's questionable whether the small gains are really worth the huge added complexity.
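
To put these gains in perspective, the percentages can be converted back into a number of Pokemon. The sketch below assumes the Generation 5 test set contains all 156 Generation 5 Pokemon, which is the standard Pokedex count but is not checked against the data here.

```python
def correct_count(accuracy_percent, n_samples):
    """Convert an accuracy percentage back into a number of correct predictions."""
    return round(accuracy_percent / 100.0 * n_samples)

GEN5_SIZE = 156  # assumption: every Generation 5 Pokemon is in the test set

two_feature = correct_count(60.26, GEN5_SIZE)       # 2 feature model
fourteen_feature = correct_count(61.54, GEN5_SIZE)  # 14 feature model
extra_correct = fourteen_feature - two_feature
print(two_feature, fourteen_feature, extra_correct)
```

Under that assumption, the 12 extra features buy only a couple of additional correct predictions overall.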

In [None]:
none_list=[i for i, j in enumerate(Gen5_targets2) if j == 'none']
type_list=[i for i, j in enumerate(Gen5_targets2) if j != 'none']
sns.set(font_scale=0.8)
#2 feat model
# 21.33 / 60.26 %
#use_feat=list(['shape_id','experience_growth'])
# 3 feat model
#29.33/58.33
#use_feat=list(['shape_id','Ability1','hp'])
#14 feat model
#30.67/61.54
use_feat=list(['sp_attack', 'capture_rate', 'percentage_male','egg_group_1','is_legendary','shape_id', 'Ability1', 'base_happiness','hp','egg_group_2',
           'sp_defense', 'Ability3', 'speed', 'base_total'])

model_xgb.fit(Gens_not5_features[use_feat], Gens_not5_targets2)
train_pred=model_xgb.predict(Gens_not5_features[use_feat])
Gen5_T2_pred = model_xgb.predict(Gen5_features[use_feat])
# evaluate predictions
train_accuracy = accuracy_score(Gens_not5_targets2, train_pred)
print("Train Accuracy: %.2f%%" % (train_accuracy * 100.0))
test_accuracy = accuracy_score(Gen5_targets2, Gen5_T2_pred)
print("Test Accuracy: %.2f%%" % (test_accuracy * 100.0))
test_type_accuracy = accuracy_score(Gen5_targets2[type_list], Gen5_T2_pred[type_list])
print("Test Type Accuracy: %.2f%%" % (test_type_accuracy * 100.0))
test_none_accuracy = accuracy_score(Gen5_targets2[none_list], Gen5_T2_pred[none_list])
print("Test None Accuracy: %.2f%%" % (test_none_accuracy * 100.0))
xgb.plot_importance(model_xgb)
# Output a plot of the confusion matrix.
labels =list(set(Gen5_targets2))
cm = metrics.confusion_matrix(Gen5_targets2, Gen5_T2_pred, labels=labels)
# Normalize the confusion matrix by row (i.e by the number of samples
# in each class)
cm_normalized = cm.astype("float") / cm.sum(axis=1)[:, np.newaxis]
sns.set(font_scale=4)
plt.figure(figsize=(20,20))
ax = sns.heatmap(cm_normalized, cmap="bone_r")
ax.set_aspect(1)
ax.set_xticklabels(labels)
ax.set_yticklabels(labels)
plt.xticks(rotation=90)
plt.yticks(rotation=0)
plt.title("Confusion matrix")
plt.ylabel("True label")
plt.xlabel("Predicted label")
plt.show()

As always the None predictions dominate this model, with huge numbers of types misidentified as None.

Poison, Flying and Psychic all have good accuracy, with Psychic the only 100% in this set. It's also worth noting that Poison and Flying are over-predicted in this model, with several other Pokemon wrongly assigned these types.

Several other types, such as Rock, Electric, Fighting and Steel have a few correct guesses. The rest are either guessed wrong, or simply assigned None. A notable mistake in this model is all the Ground Pokemon predicted as Grass.

# Generation 6

The best model I was able to find achieved an overall accuracy of 51.39%, and a Type accuracy of 21.95%, using 10 features:

experience_growth, base_happiness, Ability2, egg_group_1, shape_id, sp_attack, habitat_id, egg_group_2, is_legendary, and height_m.

It's clear that this is still a fairly bad model, since it barely exceeds 50% overall accuracy, and for Pokemon that actually have a 2nd type it can't even get it right a quarter of the time.

It's also questionable whether this complexity is worth it, because a model with just the shape_id was able to get a 50% overall accuracy.

In [None]:
none_list=[i for i, j in enumerate(Gen6_targets2) if j == 'none']
type_list=[i for i, j in enumerate(Gen6_targets2) if j != 'none']
sns.set(font_scale=0.8)
# Best
#21.95/51.39
use_feat=list(['experience_growth','base_happiness','Ability2', 'egg_group_1','shape_id','sp_attack','habitat_id', 'egg_group_2','is_legendary'
          , 'height_m'])
model_xgb.fit(Gens_not6_features[use_feat], Gens_not6_targets2)
train_pred=model_xgb.predict(Gens_not6_features[use_feat])
Gen6_T2_pred = model_xgb.predict(Gen6_features[use_feat])
# evaluate predictions
train_accuracy = accuracy_score(Gens_not6_targets2, train_pred)
print("Train Accuracy: %.2f%%" % (train_accuracy * 100.0))
test_accuracy = accuracy_score(Gen6_targets2, Gen6_T2_pred)
print("Test Accuracy: %.2f%%" % (test_accuracy * 100.0))
test_type_accuracy = accuracy_score(Gen6_targets2[type_list], Gen6_T2_pred[type_list])
print("Test Type Accuracy: %.2f%%" % (test_type_accuracy * 100.0))
test_none_accuracy = accuracy_score(Gen6_targets2[none_list], Gen6_T2_pred[none_list])
print("Test None Accuracy: %.2f%%" % (test_none_accuracy * 100.0))
xgb.plot_importance(model_xgb)
# Output a plot of the confusion matrix.
labels =list(set(Gen6_targets2))
cm = metrics.confusion_matrix(Gen6_targets2, Gen6_T2_pred, labels=labels)
# Normalize the confusion matrix by row (i.e by the number of samples
# in each class)
cm_normalized = cm.astype("float") / cm.sum(axis=1)[:, np.newaxis]
sns.set(font_scale=4)
plt.figure(figsize=(20,20))
ax = sns.heatmap(cm_normalized, cmap="bone_r")
ax.set_aspect(1)
ax.set_xticklabels(labels)
ax.set_yticklabels(labels)
plt.xticks(rotation=90)
plt.yticks(rotation=0)
plt.title("Confusion matrix")
plt.ylabel("True label")
plt.xlabel("Predicted label")
plt.show()

As always this model has too many predictions of None, and too few predictions for most other types. It appears to perform well with relatively rare second types, like Fairy, Dragon and Ghost, in addition to 100% accuracy for Fighting. As in previous models, it can get predictions of these types correct, but still assigns some of them to None.

Most other types lack any predictions at all.

There also seems to be some confusion over Dragon and Flying types, with both making incorrect predictions of the other type.

# Generation 7

The best model I found only achieved a 50% overall accuracy, and 29.41 % Type accuracy, using 6 features:

base_egg_steps, Ability3, shape_id, egg_group_2, egg_group_1, and habitat_id.

More complicated models just tended to get a lower accuracy for both measures, up to 14 features, where I stopped looking.
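
The way feature subsets were searched is not shown in this notebook. One simple option is greedy forward selection, sketched below under the assumption that some `score_fn` returns a quality score for a list of features. The toy scoring function and its gain values are made up; in practice `score_fn` would fit the model on those columns and return an accuracy, ideally on a validation split rather than on the held-out generation.

```python
def forward_select(candidates, score_fn, max_features):
    """Greedy forward selection: keep adding the feature that most improves score_fn."""
    selected = []
    best_score = float("-inf")
    remaining = list(candidates)
    while remaining and len(selected) < max_features:
        # Score every remaining feature when added to the current selection.
        scored = [(score_fn(selected + [feat]), feat) for feat in remaining]
        top_score, top_feat = max(scored, key=lambda pair: pair[0])
        if top_score <= best_score:
            break  # no remaining feature improves the score, so stop early
        selected.append(top_feat)
        remaining.remove(top_feat)
        best_score = top_score
    return selected, best_score

# Hypothetical per-feature gains with a small penalty per feature (illustration only).
toy_gain = {"shape_id": 0.30, "egg_group_1": 0.15, "habitat_id": 0.05, "weight_kg": 0.0}

def toy_score(features):
    return sum(toy_gain[f] for f in features) - 0.02 * len(features)

chosen, chosen_score = forward_select(list(toy_gain), toy_score, max_features=14)
print(chosen, round(chosen_score, 2))
```

The search stops as soon as adding another feature no longer helps, which mirrors the observation that larger models tended to score lower.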

In [None]:
none_list=[i for i, j in enumerate(Gen7_targets2) if j == 'none']
type_list=[i for i, j in enumerate(Gen7_targets2) if j != 'none']
sns.set(font_scale=0.8)
#50%/29.41 %
use_feat=list(['base_egg_steps','Ability3', 'shape_id','egg_group_2', 'egg_group_1','habitat_id'])

model_xgb.fit(Gens_not7_features[use_feat], Gens_not7_targets2)
train_pred=model_xgb.predict(Gens_not7_features[use_feat])
Gen7_T2_pred = model_xgb.predict(Gen7_features[use_feat])
# evaluate predictions
train_accuracy = accuracy_score(Gens_not7_targets2, train_pred)
print("Train Accuracy: %.2f%%" % (train_accuracy * 100.0))
test_accuracy = accuracy_score(Gen7_targets2, Gen7_T2_pred)
print("Test Accuracy: %.2f%%" % (test_accuracy * 100.0))
test_type_accuracy = accuracy_score(Gen7_targets2[type_list], Gen7_T2_pred[type_list])
print("Test Type Accuracy: %.2f%%" % (test_type_accuracy * 100.0))
test_none_accuracy = accuracy_score(Gen7_targets2[none_list], Gen7_T2_pred[none_list])
print("Test None Accuracy: %.2f%%" % (test_none_accuracy * 100.0))
xgb.plot_importance(model_xgb)
# Output a plot of the confusion matrix.
labels =list(set(Gen7_targets2))
cm = metrics.confusion_matrix(Gen7_targets2, Gen7_T2_pred, labels=labels)
# Normalize the confusion matrix by row (i.e by the number of samples
# in each class)
cm_normalized = cm.astype("float") / cm.sum(axis=1)[:, np.newaxis]
sns.set(font_scale=4)
plt.figure(figsize=(20,20))
ax = sns.heatmap(cm_normalized, cmap="bone_r")
ax.set_aspect(1)
ax.set_xticklabels(labels)
ax.set_yticklabels(labels)
plt.xticks(rotation=90)
plt.yticks(rotation=0)
plt.title("Confusion matrix")
plt.ylabel("True label")
plt.xlabel("Predicted label")
plt.show()

The model appears to be predicting Ground quite a lot, because it not only gets all the Ground Pokemon correct, but also incorrectly assigns all Fire and Bug to Ground as well.

Outside of this, Fairy, Flying and Dragon all fare decently.

Everything else is pretty much empty, or completely wrong. The most notable cases are Ice and Poison, which are each always assigned to a single other type.

Also a few types are incorrectly assigned to Pokemon that don't have them.

# Type 2 Overall

The overall accuracy of the Type 2 models is not significantly different from that of the Type 1 models, but the large number of None entries makes this a far less impressive achievement. The Generations can be ordered: 2 (75%) > 1 (69%) > 4 (64%) > 5 (62%) > 3 (59%) > 6 (51%) > 7 (50%). It's also notable that Generation 6 has the worst predictions for those Pokemon with 2nd types, at 22%.

None aside, most types are badly predicted by these models. Some notable exceptions are Rock and Fairy, which usually show high accuracies, with Psychic and Flying also performing well in general. Steel and Poison are often correctly predicted, but not as frequently as they appear in that Generation.
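
The difference between "correct when predicted" and "predicted as often as it appears" is the difference between precision and recall. A minimal sketch on made-up labels is below; `per_class_precision_recall` is a hypothetical helper, and `None` is returned where a ratio is undefined.

```python
from collections import Counter

def per_class_precision_recall(true_labels, predicted_labels):
    """Map each label to (precision, recall); None marks an undefined ratio."""
    true_counts = Counter(true_labels)
    pred_counts = Counter(predicted_labels)
    hits = Counter(t for t, p in zip(true_labels, predicted_labels) if t == p)
    report = {}
    for label in sorted(set(true_labels) | set(predicted_labels)):
        precision = hits[label] / pred_counts[label] if pred_counts[label] else None
        recall = hits[label] / true_counts[label] if true_counts[label] else None
        report[label] = (precision, recall)
    return report

# Made-up labels: Steel is always right when predicted, but rarely predicted.
toy_true = ["steel", "steel", "steel", "none", "none", "rock"]
toy_pred = ["steel", "none", "none", "none", "none", "rock"]
toy_report = per_class_precision_recall(toy_true, toy_pred)
for label, (precision, recall) in toy_report.items():
    print(label, precision, recall)
```

In this toy case Steel has perfect precision but low recall, which is the pattern described above for Steel and Poison.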

Most of the good features also appeared for Type 1, and many are shared between the 7 models. Shape, Abilities, and Egg groups are in nearly all the models, with Experience Growth, percentage male and base happiness also appearing in several. Direct stats, like HP, only appear rarely, and some like weight, not at all.

# Type 1 & Type 2 combined

As a final example, I combined my predictions for all 7 Generations back together to create a new Machine Learning based Pokedex.

As well as checking the overall accuracy for each type, I also explored whether a type was guessed correctly, but in the wrong slot.

In [None]:
#Combine Gens 1-7 predictions
Type1_pred=np.concatenate((Gen1_T1_pred,Gen2_T1_pred,Gen3_T1_pred,Gen4_T1_pred,Gen5_T1_pred,Gen6_T1_pred,Gen7_T1_pred),axis=0)
Type2_pred=np.concatenate((Gen1_T2_pred,Gen2_T2_pred,Gen3_T2_pred,Gen4_T2_pred,Gen5_T2_pred,Gen6_T2_pred,Gen7_T2_pred),axis=0)

In [None]:
#Get accuracies:
print("Overall Accuracy")
Type1_accuracy = accuracy_score(targets, Type1_pred)
print("Type 1 Accuracy: %.2f%%" % (Type1_accuracy * 100.0))
Type2_accuracy = accuracy_score(targets2, Type2_pred)
print("Type 2 Accuracy: %.2f%%" % (Type2_accuracy * 100.0))

In [None]:
labels =list(set(targets))
cm = metrics.confusion_matrix(targets, Type1_pred, labels=labels)
# Normalize the confusion matrix by row (i.e by the number of samples
# in each class)
cm_normalized = cm.astype("float") / cm.sum(axis=1)[:, np.newaxis]
sns.set(font_scale=4)
plt.figure(figsize=(20,20))
ax = sns.heatmap(cm_normalized, cmap="bone_r")
ax.set_aspect(1)
ax.set_xticklabels(labels)
ax.set_yticklabels(labels)
plt.xticks(rotation=90)
plt.yticks(rotation=0)
plt.title("Type 1 Confusion matrix")
plt.ylabel("True label")
plt.xlabel("Predicted label")
plt.show()

For Type 1, nearly every single type has the greatest density of predictions for the correct type, with a few exceptions.

Water, Bug and Grass are the types predicted with the highest accuracy, whilst Flying is the worst, with a 0% accuracy. The success of the former 3 is likely related to the predictive power of the Egg groups. The poor performance for Flying is likely because it usually exists as a sub-type.

The only other type with a significant proportion of wrong predictions is Fairy, which is predicted as Grass more often than as Fairy. At first glance, I'm not sure why this is, but I guess the two types might share some similarities?

Fire and Electric are commonly confused for each other, which I still don't quite understand. Similarly, Ghost and Psychic are often confused for each other, probably due to both being special focused types that often mirror each other. The Ice / Water and Rock / Steel confusions are also visible, likely due to the similarities between the types.
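
Rather than reading confused pairs off the heatmap, they can also be counted directly. The sketch below tallies (true, predicted) pairs where the two differ, using made-up labels; `top_confusions` is a hypothetical helper name.

```python
from collections import Counter

def top_confusions(true_labels, predicted_labels, n=3):
    """Return the n most frequent (true, predicted) pairs among the wrong predictions."""
    errors = Counter(
        (t, p) for t, p in zip(true_labels, predicted_labels) if t != p
    )
    return errors.most_common(n)

# Made-up labels for illustration only.
toy_true = ["fire", "fire", "electric", "ghost", "water"]
toy_pred = ["electric", "electric", "fire", "psychic", "water"]
worst = top_confusions(toy_true, toy_pred, n=1)
print(worst)
```

Calling it with `targets` and `Type1_pred` would rank the real confusions, such as Fire / Electric, by how often they occur.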

In [None]:
labels =list(set(targets2))
cm = metrics.confusion_matrix(targets2, Type2_pred, labels=labels)
# Normalize the confusion matrix by row (i.e by the number of samples
# in each class)
cm_normalized = cm.astype("float") / cm.sum(axis=1)[:, np.newaxis]
sns.set(font_scale=4)
plt.figure(figsize=(20,20))
ax = sns.heatmap(cm_normalized, cmap="bone_r")
ax.set_aspect(1)
ax.set_xticklabels(labels)
ax.set_yticklabels(labels)
plt.xticks(rotation=90)
plt.yticks(rotation=0)
plt.title("Type 2 Confusion matrix")
plt.ylabel("True label")
plt.xlabel("Predicted label")
plt.show()

None clearly dominates the Type 2 predictions, so it's hard to say too much about these results. The fact that Flying is the second most abundant sub-type is also clear.

Dark, Bug, Fire, Water and Normal are the worst performing sub-types, with no correct predictions for any of them.

Flying, Fairy and Rock are mostly predicted correctly, with the wrong predictions often as None.

The remaining types all have some correct predictions, but are dominated by incorrect assignments to None.

I noticed for Generation 1 earlier, that there's the possibility some of the types have been assigned correctly, but in the wrong order, so I'm curious how often this has happened.

In [None]:
#Any mismatch?
print("Right type, but wrong order?")
Type1_accuracy = accuracy_score(targets2, Type1_pred)
print("Type 1 predictions which match Type 2: %.2f%%" % (Type1_accuracy * 100.0))
Type2_accuracy = accuracy_score(targets, Type2_pred)
print("Type 2 predictions which match Type 1: %.2f%%" % (Type2_accuracy * 100.0))

It's a fairly rare occurrence for both Types 1 and 2, at < 5%, but it does happen occasionally.

To look at the Machine Learning Pokedex in more detail, I want to know how many Pokemon it got completely correct, how many half correct, and how many it got correct, but in the wrong order.
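
The four outcomes can be sketched for a single Pokemon as a plain function. This is only an illustration of the categories, with hypothetical names and illustrative rows rather than actual notebook output.

```python
from collections import Counter

def classify_prediction(type1, type2, pred1, pred2):
    """Place one Pokemon's prediction into one of four outcome categories."""
    if (type1, type2) == (pred1, pred2):
        return "both correct"
    if (type1, type2) == (pred2, pred1):
        return "wrong order"
    # Any single match, in the right slot or the other slot, counts as half right.
    if type1 in (pred1, pred2) or type2 in (pred1, pred2):
        return "half right"
    return "all wrong"

# Illustrative rows: (true Type 1, true Type 2, predicted Type 1, predicted Type 2).
toy_rows = [
    ("electric", "none", "electric", "none"),
    ("rock", "water", "water", "rock"),
    ("fire", "none", "fire", "fighting"),
    ("normal", "none", "water", "flying"),
]
toy_outcomes = [classify_prediction(*row) for row in toy_rows]
print(Counter(toy_outcomes))
```

Because every Pokemon lands in exactly one category, the four counts always add up to the size of the Pokedex.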

In [None]:
ML_dex_df=Name_df.copy()
ML_dex_df["Type 1"]=targets
ML_dex_df["Type 2"]=targets2
ML_dex_df["Predicted Type 1"]=Type1_pred
ML_dex_df["Predicted Type 2"]=Type2_pred

In [None]:
ML_dex_df["Type 1 Correct"] = np.where(ML_dex_df["Type 1"]==ML_dex_df["Predicted Type 1"], 1, 0)
ML_dex_df["Type 2 Correct"] = np.where(ML_dex_df["Type 2"]==ML_dex_df["Predicted Type 2"], 1, 0)
ML_dex_df["Type 1 Mismatch"]= np.where(ML_dex_df["Type 1"]==ML_dex_df["Predicted Type 2"], 1, 0)
ML_dex_df["Type 2 Mismatch"]= np.where(ML_dex_df["Type 2"]==ML_dex_df["Predicted Type 1"], 1, 0)
ML_dex_df["Both Correct"]   = np.where((ML_dex_df["Type 1 Correct"]==1) & (ML_dex_df["Type 2 Correct"]==1), 1, 0)
ML_dex_df["Wrong Order"]   = np.where((ML_dex_df["Type 1 Mismatch"]==1) & (ML_dex_df["Type 2 Mismatch"]==1), 1,0)
ML_dex_df["All Wrong"]   = np.where((ML_dex_df["Type 1 Mismatch"]==0) & (ML_dex_df["Type 2 Mismatch"]==0)
                                    &(ML_dex_df["Type 1 Correct"]==0) & (ML_dex_df["Type 2 Correct"]==0), 1,0)

In [None]:
print("The number of Pokemon predicted correctly is:")
print(ML_dex_df["Both Correct"].sum())
print("The number of Pokemon predicted correctly, but in the wrong order is:")
print(ML_dex_df["Wrong Order"].sum())
print("The number of Pokemon predicted completely incorrectly is:")
print(ML_dex_df["All Wrong"].sum())
print("The number of Pokemon predicted half right is therefore:")
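# Everything that is not fully correct, swapped, or fully wrong is half right
print(len(ML_dex_df) - ML_dex_df["Both Correct"].sum() - ML_dex_df["Wrong Order"].sum() - ML_dex_df["All Wrong"].sum())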

I'm surprised the model only got 87 Pokemon completely wrong, given the accuracy for each type was only ~60%. I would guess this means that lots of errors for one type lined up with correct predictions for the other.

The Pokemon with types in the wrong order are all Rock / Water fossil Pokemon:

In [None]:
Mix_up = ML_dex_df[(ML_dex_df['Wrong Order'] == 1)].index.tolist()
Wrong = ML_dex_df[(ML_dex_df['All Wrong'] == 1)].index.tolist()

In [None]:
# Mix_up holds index labels, so select by label with .loc rather than by position
Mixed_up= ML_dex_df.name.loc[Mix_up]
print(list(Mixed_up))

The Pokemon which the model made no correct predictions for are:

In [None]:
# Wrong holds index labels, so select by label with .loc rather than by position
Failed = ML_dex_df.name.loc[Wrong]
print(list(Failed))

Lots of the incorrect predictions apply to the more unique Pokemon, such as Legendaries (mostly from later generations), Ultra Beasts, or rare combinations like Sableye.

Some of the mistakes are really odd, like Water / Flying for Ditto.

It's no surprise that Inkay and Malamar confuse the model, since they're both Dark / Psychic, but in 2 Water Egg groups, leading to the Water / Rock predictions.

# Future Prospects

To extend this work further, I plan to revisit Generation 1, and try to improve the overall accuracy for both Type slots. This will likely be done by adding more features, such as evolutionary information, or by doing feature engineering.