# Predicting Pokemon Types Based on Their Stats

### Introduction

***Pokemon Overview***

Pokemon is a franchise spanning trading card games, video games, movies, and TV shows. In the video games, each pokemon has one or two types, as well as 6 base stats. There are 18 possible types (see list below): every pokemon has a primary type and may also have a secondary type. The 6 stats are HP (health), Attack, Defense, Special Attack, Special Defense, and Speed.

***This Project***

The purpose of this project is to see whether simple machine learning techniques can predict a pokemon's type from its stats. If a pokemon's type can be predicted confidently, the models have found a learnable pattern between types and stats. If not, that suggests the game of Pokemon is fairly well-balanced, with no typing having consistently dominant or underpowered stats.

***PokeAPI***

[PokeAPI](https://pokeapi.co/) was used to get the stats and typings of all 898 pokemon. The documentation for this free API can be found [here](https://pokeapi.co/docs/v2#stats). The first code cell in this notebook runs `pip install` to install `pokepy`, a Python wrapper around PokeAPI that is used to download and organize the data.
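
For readers who would rather skip the `pokepy` wrapper, here is a minimal sketch of reading the same fields straight from the REST API. It assumes the JSON layout returned by `https://pokeapi.co/api/v2/pokemon/{id}`: a `stats` list whose entries carry `base_stat` and `stat.name`, and a `types` list whose entries carry `slot` and `type.name`. The helper names `parse_pokemon` and `fetch_pokemon` are hypothetical, not part of any library. Looking stats up by name, and the primary type by slot, avoids relying on list order.

```python
import json
import urllib.request

# Stat names as they appear in PokeAPI responses (assumed layout)
STAT_NAMES = ("hp", "attack", "defense", "special-attack", "special-defense", "speed")


def parse_pokemon(payload):
    """Return (stats tuple, primary type, secondary type or None) from a /pokemon JSON dict."""
    by_name = {entry["stat"]["name"]: entry["base_stat"] for entry in payload["stats"]}
    stats = tuple(by_name[name] for name in STAT_NAMES)
    # Sort by slot so slot 1 (the primary type) comes first regardless of list order
    slots = sorted(payload["types"], key=lambda t: t["slot"])
    primary = slots[0]["type"]["name"]
    secondary = slots[1]["type"]["name"] if len(slots) > 1 else None
    return stats, primary, secondary


def fetch_pokemon(number):
    """Download one pokemon's JSON record from PokeAPI (requires network access)."""
    url = f"https://pokeapi.co/api/v2/pokemon/{number}"
    # A descriptive User-Agent is sent as a courtesy; the value is arbitrary
    request = urllib.request.Request(url, headers={"User-Agent": "pokemon-types-notebook"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Offline example: a trimmed-down record shaped like Bulbasaur's
sample = {
    "stats": [
        {"base_stat": 45, "stat": {"name": "hp"}},
        {"base_stat": 49, "stat": {"name": "attack"}},
        {"base_stat": 49, "stat": {"name": "defense"}},
        {"base_stat": 65, "stat": {"name": "special-attack"}},
        {"base_stat": 65, "stat": {"name": "special-defense"}},
        {"base_stat": 45, "stat": {"name": "speed"}},
    ],
    "types": [
        {"slot": 2, "type": {"name": "poison"}},
        {"slot": 1, "type": {"name": "grass"}},
    ],
}
print(parse_pokemon(sample))  # ((45, 49, 49, 65, 65, 45), 'grass', 'poison')
```

With network access, `parse_pokemon(fetch_pokemon(1))` should return Bulbasaur's record in the same form.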

***Typings List***

*   Normal
*   Fire
*   Water
*   Grass
*   Electric
*   Ice
*   Fighting
*   Poison
*   Ground
*   Flying
*   Psychic
*   Bug
*   Rock
*   Ghost
*   Dark
*   Dragon
*   Steel
*   Fairy


### Data Pre-Processing

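When installing on Colab for the first time, `pip` might report errors about conflicting package versions (pokepy pins older versions of packages such as `requests`). No runtime restart is needed, as long as the `import pokepy` command in the next cell works.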

```python
# Install the pokepy wrapper for PokeAPI
! pip install pokepy
```


Use the API to download the stats and primary type of all 898 pokemon

```python
import pokepy
import numpy as np
from tqdm import tqdm

# Create a client for PokeAPI v2
client = pokepy.V2Client()

# Pre-allocate the stats matrix (one row per pokemon, one column per stat)
number_of_pokemon = 898
database          = np.zeros((number_of_pokemon, 6), dtype=int)
types             = []

# Iterate over pokemon 1..898
for i in tqdm(range(1, number_of_pokemon + 1)):
  current_pokemon = client.get_pokemon(i)
  # Base stats, in API order: HP, Attack, Defense, Sp. Atk, Sp. Def, Speed
  for j in range(6):
    database[i-1, j] = current_pokemon.stats[j].base_stat
  # Primary type (first entry of the types list)
  current_type = current_pokemon.types[0].type.name
  types.append(current_type)

# Convert the list of type names into a NumPy array
types = np.array(types)
```



Encode the type labels as integers and then as one-hot vectors, and randomly split the data into training and test sets

```python
import keras
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split

# Encode type names as integers 0..17 (alphabetical order)
types_float = LabelEncoder().fit_transform(types)

# Randomly permute the data and split it (default: 75% train, 25% test)
X_train, X_test, y_train, y_test = train_test_split(database, types_float)

# Keep the integer labels for later use
y_train_float = y_train
y_test_float  = y_test

# Convert integer labels to one-hot vectors
num_classes = 18    # total number of types
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test  = keras.utils.to_categorical(y_test, num_classes)
```

### Models

```python
# Imports for all models
from sklearn import metrics
```

Decision Tree Model

```python
from sklearn import tree

# Train (one-hot labels make this a multi-output tree: one binary output per type)
model_tree = tree.DecisionTreeClassifier(max_depth=10)
model_tree.fit(X_train, y_train)

# Predict: predict_proba returns a list of 18 arrays of shape (n_test, 2);
# keep column 1, P(type is present), and transpose to shape (n_test, 18)
y_pred_tree = model_tree.predict_proba(X_test)
y_pred_tree = np.array(y_pred_tree)[:,:,1].T

# Evaluate (roc_auc_score macro-averages the per-type AUROC for one-hot labels)
auroc_tree    = metrics.roc_auc_score(y_test, y_pred_tree)
accuracy_tree = metrics.accuracy_score(np.argmax(y_test, axis=1), np.argmax(y_pred_tree, axis=1))

print('AUROC:\t ', auroc_tree)
print('ACCURACY:', accuracy_tree)
```

```
AUROC:	  0.5533291852824997
ACCURACY: 0.12
```


Random Forest Model

```python
from sklearn.ensemble import RandomForestClassifier

# Train (multi-output forest: one binary output per type)
model_forest = RandomForestClassifier(n_estimators=50, max_depth=10)
model_forest.fit(X_train, y_train)

# Predict: keep P(type is present) for each of the 18 outputs, shape (n_test, 18)
y_pred_forest = model_forest.predict_proba(X_test)
y_pred_forest = np.array(y_pred_forest)[:,:,1].T

# Evaluate
auroc_forest    = metrics.roc_auc_score(y_test, y_pred_forest)
accuracy_forest = metrics.accuracy_score(np.argmax(y_test, axis=1), np.argmax(y_pred_forest, axis=1))

print('AUROC: ', auroc_forest)
print('ACCURACY:', accuracy_forest)
```

```
AUROC:  0.6560672191027561
ACCURACY: 0.19555555555555557
```


K-Nearest Neighbors (KNN) Model

```python
from sklearn.neighbors import KNeighborsClassifier

k = 26   # roughly sqrt(number of training examples)

# Train (multi-output KNN: one binary output per type)
model_knn = KNeighborsClassifier(n_neighbors=k)
model_knn.fit(X_train, y_train)

# Predict: keep P(type is present) for each of the 18 outputs, shape (n_test, 18)
y_pred_knn = model_knn.predict_proba(X_test)
y_pred_knn = np.array(y_pred_knn)[:,:,1].T

# Evaluate
auroc_knn    = metrics.roc_auc_score(y_test, y_pred_knn)
accuracy_knn = metrics.accuracy_score(np.argmax(y_test, axis=1), np.argmax(y_pred_knn, axis=1))

print('AUROC: ', auroc_knn)
print('ACCURACY:', accuracy_knn)
```

```
AUROC:  0.6740974342930934
ACCURACY: 0.21333333333333335
```


K-Means Model (Unsupervised Learning)

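```python
from sklearn.cluster import KMeans

k = 18   # one cluster per type

# Train (unsupervised: the labels are not used)
model_kmeans = KMeans(n_clusters=k)
model_kmeans.fit(X_train)

# Predict cluster IDs for the test set
y_pred_kmeans = model_kmeans.predict(X_test)

# Evaluate. Note: cluster IDs are arbitrary, so this compares them directly
# to label IDs without matching clusters to types
accuracy_kmeans = metrics.accuracy_score(np.argmax(y_test, axis=1), y_pred_kmeans)

print('ACCURACY:', accuracy_kmeans)
```

```
ACCURACY: 0.06222222222222222
```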


Neural Network Sequential Model

```python
from keras.models import Sequential
from keras.layers import Dense

# Build model: three ReLU hidden layers (6, 12, 18 units) and a softmax output
model_NN = Sequential()
model_NN.add(Dense(6, activation='relu'))
model_NN.add(Dense(12, activation='relu'))
model_NN.add(Dense(18, activation='relu'))
model_NN.add(Dense(num_classes, activation='softmax'))

model_NN.compile(loss='categorical_crossentropy',
                 optimizer='SGD',
                 # 'AUC' is Keras's default AUC metric, which flattens all
                 # 18 outputs into one binary problem before computing AUROC
                 metrics=['accuracy', 'AUC'])

# Train model (the test set is also passed as validation data here)
model_NN.fit(X_train, y_train,
             batch_size=50,
             epochs=20,
             verbose=1,
             validation_data=(X_test, y_test))

# Evaluate: returns [loss, accuracy, AUC]
evaluation_NN = model_NN.evaluate(X_test, y_test, verbose=0)

print()
print('Test loss:\t', evaluation_NN[0])
print('Test accuracy:\t', evaluation_NN[1])
print('Test AUC:\t', evaluation_NN[2])
```


```
Test loss:	 2.719932794570923
Test accuracy:	 0.1599999964237213
Test AUC:	 0.705143392086029
```


### Write-Up

***Data Format***

As we are trying to predict the primary type of a pokemon from its stats, the input (X) is an integer array of the pokemon's 6 stat values, and the label (Y) is its primary type. Since the 18 types are strings, they were first converted to integers (each type is assigned a unique number). These integer labels were then one-hot encoded, so that no model could treat the arbitrary numeric distance between two type codes (e.g., 3 vs. 4 vs. 17) as meaningful.
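
The label pipeline described above can be sketched with NumPy alone. `np.unique(..., return_inverse=True)` sorts the type names and returns each example's index into that sorted list, which matches the integer codes `LabelEncoder` assigns; indexing an identity matrix with those codes then yields one-hot rows like those from `keras.utils.to_categorical`. The toy labels are illustrative only.

```python
import numpy as np

# Toy primary types (illustrative, not the real dataset)
types = np.array(["grass", "fire", "water", "grass", "bug"])

# Integer codes: position of each name in the sorted list of unique names
classes, codes = np.unique(types, return_inverse=True)

# One-hot rows: row i of the identity matrix for code i
one_hot = np.eye(len(classes), dtype=int)[codes]

print(classes)     # ['bug' 'fire' 'grass' 'water']
print(codes)       # [2 1 3 2 0]
print(one_hot[0])  # [0 0 1 0]
```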


***Train/Test Split***

scikit-learn's `train_test_split` function was used to separate the database into two disjoint sets: one for training the models and one for evaluating their performance. With its default 25% test fraction, this gives 673 training examples and 225 test examples. The test set was used to evaluate each model after it had been trained, but no further adjustments (i.e., hyperparameter tuning) were made after this point, to keep the test set clean.
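
As a sketch of how these split sizes arise: when neither `test_size` nor `train_size` is given, `train_test_split` holds out 25% of the rows and rounds the test count up, so 898 rows become 673 training and 225 test examples. Passing `random_state` makes the permutation reproducible, and `stratify` keeps each type's share roughly equal in both sets; both arguments are suggestions, not settings the original notebook used. The random arrays stand in for the real stats and labels.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Stand-in data: 898 rows of 6 random "stats" and 18 roughly balanced labels
X = rng.integers(1, 256, size=(898, 6))
y = np.arange(898) % 18

X_train, X_test, y_train, y_test = train_test_split(
    X, y,
    random_state=42,   # reproducible permutation (arbitrary assumed value)
    stratify=y,        # preserve per-type proportions in both sets
)

print(len(X_train), len(X_test))  # 673 225
```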

***Evaluation Metrics***

For this project, there are two main metrics: accuracy and AUROC. Since we are predicting a single primary type per pokemon, accuracy (the fraction of test pokemon whose type is predicted correctly) is the natural metric for basic performance. With 18 types, uniformly random guessing is correct with probability 1/18, an accuracy of about 5.6%.
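
The random-guessing figure is simply 1/18. A second, often stricter, baseline is to always predict the most common type: its accuracy equals that type's share of the test set, which can exceed 1/18 when the types are imbalanced. The sketch below computes both from an array of integer labels; the toy labels are made up, and the real majority-class number would have to be computed from the notebook's `y_test_float`.

```python
import numpy as np


def baselines(labels, num_classes=18):
    """Return (uniform random-guess accuracy, majority-class accuracy)."""
    labels = np.asarray(labels)
    uniform = 1.0 / num_classes
    # Share of the most frequent label = accuracy of always predicting it
    majority = np.bincount(labels, minlength=num_classes).max() / len(labels)
    return uniform, float(majority)


toy_labels = [0, 0, 0, 1, 2, 3, 4, 5, 6, 7]   # illustrative only
print(baselines(toy_labels))  # (0.05555555555555555, 0.3)
```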

However, accuracy only measures hard predictions. All of the models used except K-Means (which is unsupervised) can also produce probabilistic predictions: a probability for each of the 18 types. To evaluate these, the AUROC (area under the ROC curve) metric is used. For the scikit-learn models, `roc_auc_score` computes one AUROC per type (that type versus all others) and averages them. An AUROC value of 0.5 corresponds to random guessing, and 1.0 indicates a perfect model.
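
With one-hot (indicator) labels, `roc_auc_score` defaults to `average='macro'`: it scores each type against all the others and takes the unweighted mean. The sketch below checks that equivalence on random toy scores. It requires every type to appear at least once as a positive and once as a negative in the evaluated set; otherwise that type's AUROC is undefined.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
num_examples, num_classes = 60, 3

# Toy one-hot labels: cycle through the classes so each appears as positive and negative
y_true = np.eye(num_classes, dtype=int)[np.arange(num_examples) % num_classes]
y_score = rng.random((num_examples, num_classes))   # arbitrary scores

# Per-type ("one-vs-rest") AUROC, then the unweighted mean
per_type = [roc_auc_score(y_true[:, c], y_score[:, c]) for c in range(num_classes)]
macro_manual = float(np.mean(per_type))

macro_sklearn = roc_auc_score(y_true, y_score)   # average='macro' by default
print(np.isclose(macro_manual, macro_sklearn))   # True
```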

***Model Selection***

* **Decision Tree:** One of the simpler machine learning models, the decision tree model was used as somewhat of a baseline. If there is an obvious, learnable pattern between typing and stats, a decision tree should be able to pick up on it.

* **Random Forest:** A combination of many decision trees, the random forest is one of the simpler ensemble models, and it often performs well on tabular data, including in Kaggle competitions. This model was included to see whether it could significantly improve on a single decision tree.

* **K-Nearest Neighbors:** KNN places each example as a point in 6-dimensional space (one dimension per stat) and assigns a new pokemon the type that is most common among its k closest training examples. This model was expected to perform significantly better than a single decision tree.

* **K-Means:** Classification problems with known labels usually call for supervised learning, but this unsupervised clustering model was included to demonstrate the difference in performance between supervised and unsupervised learning on a problem like this. If types were clearly separated by stats, this model could perform well; if the separation between clusters is less distinct, good performance would not be expected.

* **Sequential Neural Network:** Neural networks can be hard to train on a dataset this small; in particular, the loss can be difficult to optimize. However, the network used here is quite shallow and was expected to perform in the same range as a well-tuned ensemble or K-Nearest Neighbors model on a problem like this.
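
To make the K-Nearest Neighbors description concrete, here is a from-scratch sketch of the idea in NumPy: Euclidean distances in the 6-dimensional stat space, the k closest training pokemon, and a majority vote among their types. It is a simplified illustration (ties go to the smallest label index), not a reimplementation of scikit-learn's `KNeighborsClassifier`, and the tiny dataset is invented.

```python
import numpy as np


def knn_predict(X_train, y_train, X_query, k):
    """Predict integer labels for X_query by majority vote of the k nearest training rows."""
    X_train = np.asarray(X_train, dtype=float)
    X_query = np.asarray(X_query, dtype=float)
    y_train = np.asarray(y_train)
    # Pairwise Euclidean distances, shape (n_query, n_train)
    dists = np.linalg.norm(X_query[:, None, :] - X_train[None, :, :], axis=2)
    # Indices of the k smallest distances in each row
    nearest = np.argsort(dists, axis=1)[:, :k]
    # Majority vote; argmax over the counts breaks ties toward the smaller label
    return np.array([np.bincount(y_train[row]).argmax() for row in nearest])


# Invented 6-stat examples: label 0 = bulky, label 1 = fast and frail
X_demo = [[100, 80, 110, 60, 100, 30],
          [ 95, 85, 105, 65,  95, 35],
          [ 50, 60,  40, 90,  50, 120],
          [ 45, 65,  45, 95,  45, 115]]
y_demo = [0, 0, 1, 1]
queries = [[98, 82, 108, 62, 98, 32], [48, 62, 42, 92, 48, 118]]
print(knn_predict(X_demo, y_demo, queries, k=2))  # [0 1]
```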

***Hyperparameter Tuning***

Since this dataset is fairly small, most of these models took less than a second to train and evaluate. Because of this, it was relatively easy to perform most of the hyperparameter tuning by hand. Most models had minimal hyperparameters to begin with, which further simplified this process. The specific tunings are stated below:

* **Decision Tree / Random Forest:** The maximum depth of both the decision tree and the random forest was set to 10, as leaving it unbounded seemed to cause slight overfitting. The random forest used 50 trees, since adding more did not noticeably improve performance.

* **K-Nearest Neighbors:** A common rule of thumb sets k to roughly the square root of the number of training examples. The training set contained 673 examples, so k was set to 26 (`sqrt(673) ≈ 25.9`). Decreasing k consistently lowered both performance metrics, and increasing it had inconsistent effects, so I kept 26 as the final value.

* **K-Means:** Since there are 18 unique labels for this data, it wouldn't make sense to use any other k value for the K-Means model. Ideally, each cluster would represent a single type.

* **Sequential Neural Network:** This model consists of three dense hidden layers whose widths grow until they match the number of output labels (6, then 12, then 18 units), followed by a dense softmax output layer, which is a standard choice for multi-class classification. The hidden layers use ReLU activation, which I found worked best for this problem. The model was trained with categorical cross-entropy loss, since each pokemon has one of 18 possible labels. Stochastic gradient descent (SGD) was the best-performing optimizer, although it still struggled to reduce the loss much. Like the rest of the models, this network's layers and hyperparameters were tuned by hand.
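
With the 673-example training set, the square-root rule gives k = round(sqrt(673)) = 26. As an alternative to hand-tuning, the sketch below picks k by 5-fold cross-validation on the training data alone, which leaves the test set untouched. The synthetic data, the candidate list, and `cv=5` are assumptions for illustration; with the notebook's data one would pass `X_train` and `y_train_float` instead.

```python
import math

import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Rule of thumb: k is about the square root of the training-set size
k_rule = round(math.sqrt(673))
print(k_rule)  # 26

# Synthetic stand-in for the training data (NOT the real pokemon stats)
rng = np.random.default_rng(0)
X = rng.integers(1, 256, size=(300, 6))
y = rng.integers(0, 3, size=300)

# Mean 5-fold cross-validated accuracy for each candidate k
candidates = [5, 10, 15, 20, 26]
scores = {
    k: cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5).mean()
    for k in candidates
}
best_k = max(scores, key=scores.get)
print(best_k, scores[best_k])
```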

***Performance***

* **Random Guessing:** Below are the expected AUROC and accuracy from randomly guessing each pokemon's type. This is used as a baseline for the models trained in this project.
 * 0.500 AUROC
 * 0.056 Accuracy (1/18)

* **Decision Tree:** While its accuracy is roughly double that of random guessing, it is still very low. Its AUROC shows that this model performed only slightly better than random guessing, which suggests there is no obvious, learnable pattern between stats and typing.
 * 0.553 AUROC
 * 0.120 Accuracy

* **Random Forest:** Combining many decision trees into an ensemble improved both metrics substantially, but this model still left a lot to be desired. The AUROC suggests its probabilistic predictions capture a slight pattern, but the accuracy shows that major improvement would be needed before its hard predictions could be trusted.
 * 0.656 AUROC
 * 0.195 Accuracy

* **K-Nearest Neighbors:** This model had the best accuracy and the second-best AUROC, which makes sense, as this problem resembles small tabular classification datasets such as Iris, where KNN models typically do quite well. Unfortunately, it performed only slightly better than the random forest ensemble.
 * 0.674 AUROC
 * 0.213 Accuracy

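* **K-Means:** As expected, this model's accuracy was on par with random guessing. If there were an obvious pattern between stats and typings, this model would probably be able to cluster them better; the decision tree's performance suggests there is not, so this unsupervised model was expected to perform poorly. Note also that K-Means numbers its clusters arbitrarily, and the accuracy here compares cluster numbers directly to type numbers without matching clusters to types, so this figure says little on its own about how well the clusters align with types.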
 * 0.062 Accuracy

* **Neural Network:** The neural network had the highest AUROC. Note, however, that this value comes from Keras's `AUC` metric, which by default flattens all 18 outputs into a single binary problem, whereas the other models' AUROC values are per-type averages from scikit-learn, so the numbers are not strictly comparable. Its probabilistic predictions also did not translate into higher accuracy: accuracy dropped noticeably compared to the K-Nearest Neighbors model. The loss offers one explanation: a model that assigns equal probability to all 18 types has a cross-entropy of ln(18) ≈ 2.89, so the final loss of 2.72 is only slightly better than that. Minimizing the categorical cross-entropy loss across epochs was harder than expected, which could indicate many local minima, and the small size of the dataset may also have contributed. Overall, this was a fairly lackluster performance from a neural network on a classification problem.
 * 0.705 AUROC
 * 0.160 Accuracy
 * 2.720 Categorical Cross-Entropy Loss
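
Because K-Means numbers its clusters arbitrarily, comparing cluster IDs directly with label IDs (as the K-Means cell above does) can misrepresent how well the clusters line up with types. One common fix, sketched below, maps each cluster to the most frequent true label among its members before scoring. This is a suggested alternative evaluation, not what the notebook ran; in practice the mapping would be learned on the training set (e.g., from `model_kmeans.labels_` and `y_train_float`) and then applied to the test-set cluster predictions.

```python
import numpy as np


def fit_cluster_mapping(cluster_ids, true_labels, num_clusters):
    """Map each cluster ID to the most common true label among its members."""
    cluster_ids = np.asarray(cluster_ids)
    true_labels = np.asarray(true_labels)
    mapping = np.zeros(num_clusters, dtype=int)
    for c in range(num_clusters):
        members = true_labels[cluster_ids == c]
        # Empty clusters fall back to label 0 (an arbitrary choice)
        mapping[c] = np.bincount(members).argmax() if members.size else 0
    return mapping


# Toy example: clusters separate the labels perfectly, but with permuted IDs
train_clusters = np.array([2, 2, 0, 0, 1, 1])
train_labels   = np.array([0, 0, 1, 1, 2, 2])
mapping = fit_cluster_mapping(train_clusters, train_labels, num_clusters=3)

test_clusters = np.array([0, 1, 2])
test_labels   = np.array([1, 2, 0])
raw_accuracy    = np.mean(test_clusters == test_labels)            # IDs compared directly
mapped_accuracy = np.mean(mapping[test_clusters] == test_labels)   # after mapping
print(mapping, raw_accuracy, mapped_accuracy)
```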

***Overall Results and Conclusion***

While the best models beat random guessing (about 0.20 accuracy versus about 0.056 for random guessing, and about 0.70 AUROC versus 0.50), the predictions did not improve beyond this. These results indicate a slight learnable pattern between a pokemon's stats and its primary typing, but not enough to confidently classify pokemon with a simple machine learning model. It may still be possible to predict typing more accurately from stats alone, but I believe this would require further feature engineering and a more complex model (see ***Possible Improvements***).

While these results might be underwhelming, they suggest that the Pokemon video games are fairly well-balanced when it comes to typings: few typings appear to have consistently higher or lower stats than others.

***Possible Improvements***

There are a few important things to note about improving this notebook:

1. **Use a validation set:** I split the dataset into a train and test set, and evaluated all of the models on the same test set. While this would typically lead to overfitting towards the test set, I was not using these evaluations to tune hyperparameters (I re-permuted the data each time I tuned hyperparameters to avoid bias) or to pick a single best model. Once it became apparent that no obvious pattern was going to emerge from this dataset, I wanted to demonstrate how the models performed relative to one another, instead of trying to make confident predictions. To select a specific model and make more confident predictions, a validation set is needed: split the dataset into train, validation, and test sets, train all the models on the training set, evaluate them on the validation set, pick the best model, and perform the final evaluation on the held-out test set.

2. **Restructure the Problem:** Since predicting a pokemon's type from its stats proved less effective than anticipated, it might be interesting to use the stats to predict typing *given two or three choices*. For instance, if you know a pokemon is a fire type, randomly choose another type and see whether the model can correctly choose between the two. This could also be done by taking all the fire-type and water-type examples and training a model to separate just those two classes.

3. **Utilize the secondary typing of some pokemon:** This project only looked at a pokemon's primary typing, even though some pokemon also have a secondary type, and that data is available in the API. I tried to include the secondary typings in two different ways, but the results of both experiments were on par with or worse than classifying only the primary typing, so I did not include them in the source code above. See below for the two secondary-typing experiments.
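
A minimal sketch of the train/validation/test split described in the first item: two calls to `train_test_split`, first holding out a test set and then splitting the remainder. The 60/20/20 proportions and the `random_state` value are assumptions, and the random arrays stand in for `database` and `types_float`.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.integers(1, 256, size=(898, 6))   # stand-in for `database`
y = rng.integers(0, 18, size=898)         # stand-in for `types_float`

# Step 1: hold out 20% as the final test set, untouched until the very end
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Step 2: split the remaining 80% into train (60% overall) and validation (20% overall)
X_train, X_val, y_train, y_val = train_test_split(X_rest, y_rest, test_size=0.25, random_state=0)

print(len(X_train), len(X_val), len(X_test))
```

Each candidate model would then be fit on `X_train`, compared on `X_val`, and only the chosen model evaluated once on `X_test`.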

***Counting Pokemon with Secondary Typings Twice***

For this experiment, when I was pre-processing the data, I duplicated a pokemon's entry if it had multiple types, with each entry corresponding to each type. For example, Bulbasaur (the first pokemon in the list) has the stats 45/49/49/65/65/45, and types 'grass' and 'poison'. Thus, the two entries in the database were as follows:

`45/49/49/65/65/45, 'grass'`

`45/49/49/65/65/45, 'poison'`

Again, the results of this experiment were worse than only using the primary typing (lower AUROC and accuracy), but pre-processing code to accomplish it is below:

```python
# EXAMPLE CODE FOR DUPLICATING POKEMON ENTRIES WITH DUAL TYPING
# =============================================================
# (Reuses `client`, `np`, and `tqdm` from the data-download cell above.)

number_of_pokemon = 898

database = []
types = []

# Iterate over pokemon
for i in tqdm(range(1, number_of_pokemon + 1)):
  current_pokemon = client.get_pokemon(i)

  # Get current pokemon's stats
  current_stats = [current_pokemon.stats[j].base_stat for j in range(6)]

  # Add one entry per type: always the primary type, plus the secondary type if present
  for current_type in current_pokemon.types:
    database.append(list(current_stats))   # copy so each row is independent
    types.append(current_type.type.name)

database = np.array(database)
types = np.array(types)
```
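
The same duplicated dataset can be built without a second pass over the API, by repeating each row of the stats matrix once per type with `np.repeat`. One possible pitfall worth checking (an observation, not something tested in the original experiment): a plain `train_test_split` can put one copy of a dual-type pokemon in training and the other in testing, with identical stats but different labels. The sketch keeps the copies together with scikit-learn's `GroupShuffleSplit`, using each pokemon's row index as its group. The tiny arrays are made up, and `types_per_pokemon` is a hypothetical list of each pokemon's one or two type names.

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

# Made-up stats for three pokemon and their type lists (hypothetical data)
stats = np.array([[45, 49, 49, 65, 65, 45],
                  [39, 52, 43, 60, 50, 65],
                  [44, 48, 65, 50, 64, 43]])
types_per_pokemon = [("grass", "poison"), ("fire",), ("water",)]

# How many rows each pokemon contributes (1 or 2)
counts = np.array([len(t) for t in types_per_pokemon])

# Repeat each stats row once per type; flatten the type lists in the same order
database = np.repeat(stats, counts, axis=0)
types = np.array([name for t in types_per_pokemon for name in t])
groups = np.repeat(np.arange(len(stats)), counts)   # pokemon index for every row

# Split by pokemon (test_size=1 means one whole group), so both copies stay together
splitter = GroupShuffleSplit(n_splits=1, test_size=1, random_state=0)
train_idx, test_idx = next(splitter.split(database, types, groups))
print(database.shape, types.tolist())
```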

***One-Hot-Encoding both Primary and Secondary Typings***

For this experiment, I one-hot-encoded both the primary and secondary typings of all pokemon that had multiple types. For example, if a pokemon were a Bug/Dark type, its label vector would be

`[1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]`

instead of

`[1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]`

(in this example, the types are encoded alphabetically, so 'bug' and 'dark' are the first two types; none of the 898 pokemon used here actually has this typing combination)

The results of this experiment were noticeably worse than using only the primary typing (lower AUROC and accuracy) when using a neural network or K-Means. The encoding was done manually here, since only some of the pokemon have a secondary type (single-type pokemon get a placeholder second type that is dropped afterwards). The pre-processing code to accomplish it is below:

```python
# EXAMPLE CODE FOR ONE-HOT-ENCODING BOTH TYPINGS FOR POKEMON
# ==========================================================
# (Reuses `client`, `np`, and `tqdm` from the data-download cell above.)

number_of_pokemon = 898

database = np.zeros((number_of_pokemon, 6), dtype=int)
types = []

# Iterate over pokemon
for i in tqdm(range(1, number_of_pokemon + 1)):
  current_pokemon = client.get_pokemon(i)
  # Get current pokemon's stats
  for j in range(6):
    database[i-1, j] = current_pokemon.stats[j].base_stat
  # Get current pokemon's types; single-type pokemon get a placeholder second type
  type1 = current_pokemon.types[0].type.name
  type2 = 'DELETE_THIS'
  if len(current_pokemon.types) == 2:
    type2 = current_pokemon.types[1].type.name
  types.append((type1, type2))

# Shape (898, 2): column 0 = primary type, column 1 = secondary type or placeholder
types = np.array(types)
```

```python
# EXAMPLE CODE FOR ONE-HOT-ENCODING BOTH TYPINGS FOR POKEMON
# ==========================================================

# Define pokemon types; the final 'DELETE_THIS' column absorbs the placeholder
pokemon_types = ('bug', 'dark', 'dragon', 'electric', 'fairy', 'fire', 'fighting', 'flying', 'grass', 'ghost',
                 'ground', 'ice', 'normal', 'water', 'poison', 'psychic', 'rock', 'steel', 'DELETE_THIS')

# Initialize label vectors (one column per entry in pokemon_types)
types_one_hot = np.zeros((len(types), len(pokemon_types)))

# Iterate over each example
for i in range(len(types)):
  # Set a 1 in the column of the primary type, then of the secondary type
  for type_name in types[i]:
    types_one_hot[i, pokemon_types.index(type_name)] = 1

# Drop the last column ('DELETE_THIS' placeholder), leaving the 18 real types
types_one_hot = types_one_hot[:, :len(pokemon_types)-1]

# Randomly permute the Train/Test split
X_train, X_test, y_train, y_test = train_test_split(database, types_one_hot)
```
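
For reference, scikit-learn's `MultiLabelBinarizer` builds the same multi-hot label matrix from variable-length tuples, so single-type pokemon need no placeholder. This is a sketch that assumes the 18 type names are passed explicitly through `classes` to fix the column order (alphabetical here); the three example type tuples are made up.

```python
from sklearn.preprocessing import MultiLabelBinarizer

# Fixed, alphabetical column order for the 18 types
POKEMON_TYPES = sorted([
    "bug", "dark", "dragon", "electric", "fairy", "fighting", "fire", "flying", "ghost",
    "grass", "ground", "ice", "normal", "poison", "psychic", "rock", "steel", "water",
])

# One tuple per pokemon, holding one or two types (illustrative examples)
type_tuples = [("grass", "poison"), ("fire",), ("bug", "flying")]

mlb = MultiLabelBinarizer(classes=POKEMON_TYPES)
types_multi_hot = mlb.fit_transform(type_tuples)

print(types_multi_hot.shape)      # (3, 18)
print(types_multi_hot[0].sum())   # 2
```

With the notebook's `types` array, the tuples could be built first by dropping the placeholder, e.g. `[tuple(t for t in row if t != 'DELETE_THIS') for row in types]`.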