# Introduction 

I am most interested in the work of the connectionists and neural networks. In this report, I will attempt to answer the following question:

Given an image of a Pokemon, can a convolutional neural network model be used to classify the Pokemon by type?

The idea came from a dataset I recently found on Kaggle:

https://www.kaggle.com/vishalsubbiah/pokemon-images-and-types

## Background 

I refer the reader to the following Wikipedia article for more information on Pokemon:

https://en.wikipedia.org/wiki/Pok%C3%A9mon_(video_game_series)

For context, Pokemon is short for Pocket Monsters, a Japanese video game series developed for Nintendo gaming systems. In the games, the player goes on an adventure in which they assemble a team of 6 creatures, train them to become strong, and compete to be recognized as the most powerful trainer in the game world. Players compete by battling their Pokemon against each other. As of this writing, there are 890 unique Pokemon. The Kaggle dataset mentioned above contains only 809 Pokemon and was not updated for the additional 81 Pokemon introduced in Pokemon Sword and Shield in November 2019.

Each Pokemon has a primary type and possibly a secondary type. Not all Pokemon have a secondary type. For simplicity, we will use only the primary type of a Pokemon as its class label. There are 18 unique types of Pokemon, with each type having its own strengths and weaknesses in battle with respect to other types. Examples of types include Fire, Water, Grass, Ground, and Electric. In this report, we investigate the ability of a CNN to distinguish Pokemon by type based on their appearance in images.

# Experiment 1- Original Dataset

First we load in the data:

In [1]:
import numpy as np
import cv2
import pandas as pd
import matplotlib.pyplot as plt
import tensorflow as tf
import scipy
import os
from sklearn.preprocessing import LabelEncoder
from sklearn.preprocessing import OneHotEncoder
from sklearn.model_selection import train_test_split

In [2]:
os.chdir('C:/Users/mkell/Dropbox/Spring 2020/Artificial Intelligence/pokemon-type-classifier/pokemon-classifier')

In [3]:
pokemon=pd.read_csv('pokemon.csv')
print(np.unique(pokemon['Type1'], return_counts=True))
pokemon=pokemon.sort_values('Name')
pokemon=pokemon.reset_index(drop=True)
pokemon

(array(['Bug', 'Dark', 'Dragon', 'Electric', 'Fairy', 'Fighting', 'Fire',
       'Flying', 'Ghost', 'Grass', 'Ground', 'Ice', 'Normal', 'Poison',
       'Psychic', 'Rock', 'Steel', 'Water'], dtype=object), array([ 72,  29,  27,  40,  18,  29,  53,   3,  27,  78,  32,  23, 105,
        34,  53,  46,  26, 114], dtype=int64))


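| | Name | Type1 | Type2 |
|---|---|---|---|
| 0 | abomasnow | Grass | Ice |
| 1 | abra | Psychic | |
| 2 | absol | Dark | |
| 3 | accelgor | Bug | |
| 4 | aegislash-blade | Steel | Ghost |
| ... | ... | ... | ... |
| 804 | zoroark | Dark | |
| 805 | zorua | Dark | |
| 806 | zubat | Poison | Flying |
| 807 | zweilous | Dark | Dragon |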


This is the original dataset of 809 Pokemon. We will use the column 'Type1' for labeling. Next we load in the images:

In [4]:
#load images in name order so that images[i] lines up with pokemon['Name'][i] after sort_values('Name')
image_dir='images/original_images'
files=sorted(os.listdir(image_dir), key=lambda f: os.path.splitext(f)[0])
images=np.empty((len(files), 120, 120, 3))

for count, file in enumerate(files):
    path = os.path.join(image_dir, file)
    img=cv2.imread(path)    #cv2 returns a (120, 120, 3) array in BGR channel order
    images[count] = img

In [5]:
images.shape

(809, 120, 120, 3)
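Because the labels come from the csv file and the images come from a folder, row i of the label array only describes image i if both are in the same order. The following is a minimal sketch of a check for this, assuming each image file is named after the Pokemon in the 'Name' column. The helper `check_alignment` and the example lists are hypothetical and are not read from the dataset.

```python
import os

def check_alignment(filenames, names):
    """Return (position, file stem, name) for every position where the
    image file (in load order) does not match the label row."""
    if len(filenames) != len(names):
        raise ValueError("number of images and number of label rows differ")
    stems = [os.path.splitext(f)[0] for f in filenames]
    return [(i, s, n) for i, (s, n) in enumerate(zip(stems, names)) if s != n]

# Hypothetical example values, listed in the order the files would be loaded
print(check_alignment(["abomasnow.png", "abra.png", "absol.png"],
                      ["abomasnow", "abra", "absol"]))
# prints []

# Sorting full file names (with extension) can differ from sorting names
print(check_alignment(["porygon-z.png", "porygon.png"],
                      ["porygon", "porygon-z"]))
# prints [(0, 'porygon-z', 'porygon'), (1, 'porygon', 'porygon-z')]
```

An empty list means every image lines up with its label row.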

Some Pokemon types are intuitive and some are not. For example, the first and third Pokemon in the original dataset are Ice-type Pokemon, which is supported by their white, snowy appearance. However, the second Pokemon is Psychic-type, which is not immediately evident from its appearance. This inconsistency makes the task difficult even for humans, so we want to see how difficult it is for a neural network.

Note that the original dataset contains only 809 images in 18 classes. This is hardly enough data with which to train a model. Nevertheless, we will try to train a model on this small dataset to see what happens.

Next we preprocess the data:

In [6]:
#create labels
image_labels=np.array(pokemon['Type1'])

In [7]:
#normalize data
images/=255
images=images.astype('float32')

In [8]:
# integer encode
label_encoder = LabelEncoder()
image_labels = label_encoder.fit_transform(image_labels)
# one hot encode (sparse=False returns a dense array; scikit-learn 1.2 and later name this argument sparse_output)
onehot_encoder = OneHotEncoder(sparse=False)
image_labels = image_labels.reshape(len(image_labels), 1)
image_labels = onehot_encoder.fit_transform(image_labels)
image_labels = np.asarray(image_labels)
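To make the two encoding steps concrete, here is a small pure-Python sketch of what the cell above computes: each type name is mapped to an integer in sorted order, and each integer becomes a row with a single 1. The function `one_hot_encode` is a hypothetical illustration, not part of scikit-learn, and the labels are toy values.

```python
def one_hot_encode(labels):
    """Integer-encode labels in sorted order, then expand to one-hot rows."""
    classes = sorted(set(labels))
    index = {c: i for i, c in enumerate(classes)}
    rows = []
    for label in labels:
        row = [0.0] * len(classes)
        row[index[label]] = 1.0   # a single 1 in the column of this class
        rows.append(row)
    return classes, rows

# Toy labels for illustration only
classes, rows = one_hot_encode(["Grass", "Psychic", "Dark", "Bug", "Grass"])
print(classes)   # ['Bug', 'Dark', 'Grass', 'Psychic']
print(rows[0])   # [0.0, 0.0, 1.0, 0.0]
```

With the real data there are 18 classes, so each label row has 18 entries, matching the 18-unit softmax output layer.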

In [9]:
#split data into train/test sets
train_data, test_data, train_labels, test_labels=train_test_split(images, image_labels, test_size=0.3, shuffle=True)
train_data, val_data, train_labels, val_labels=train_test_split(train_data, train_labels, test_size=0.1, shuffle=True)

Next we define the model. Our input images have shape (120, 120, 3), as they are 120x120 RGB images. We use a Conv-Pool-Conv-Pool format for the network, doubling the number of filters in each convolutional layer. Once the convolutional output has been reduced to 512 feature maps of size 1x1, we flatten it and pass it through 3 Dense layers with 256, 128, and 64 nodes before passing the output to our final softmax layer of 18 classes. All convolutional and Dense layers except the final output layer use a Rectified Linear Unit (ReLU) activation function.
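The reduction to 1x1 feature maps can be checked by hand. The sketch below traces the spatial width through the five Conv-Pool pairs, assuming unpadded convolutions with stride 1 and 2x2 pooling with stride 2 (the Keras defaults used in the model definition). The helper names are hypothetical.

```python
def conv_out(size, kernel):
    """Output width of an unpadded convolution with stride 1."""
    return size - kernel + 1

def pool_out(size, pool=2, stride=2):
    """Output width of unpadded max pooling."""
    return (size - pool) // stride + 1

def trace_sizes(size, kernels):
    """Apply Conv then Pool for each kernel size and record every width."""
    sizes = [size]
    for k in kernels:
        size = conv_out(size, k)
        sizes.append(size)
        size = pool_out(size)
        sizes.append(size)
    return sizes

# Kernel sizes of the five convolutional layers, in order
print(trace_sizes(120, [5, 3, 5, 3, 3]))
# [120, 116, 58, 56, 28, 24, 12, 10, 5, 3, 1]
```

The final width of 1, together with the 512 filters of the last convolutional layer, gives a flattened vector of length 512.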

These choices for model architecture were based on past convolutional neural network designs in the field. For an optimizer, we use Adam (adaptive moment estimation), which combines momentum with per-parameter adaptive learning rates. It is a widely used optimizer for convolutional neural networks in the literature. We use a learning rate of 0.0001, which was determined through trial and error while training the network. We train the network on 63% of the 809 Pokemon, validate it on 7%, and test it on 30%.
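The percentages follow from the two successive splits: 30% is held out for testing, and 10% of the remaining 70% is held out for validation. A minimal sketch of the arithmetic, assuming the held-out share is rounded up to a whole number of samples (an assumption that reproduces the sample counts in the training log); `split_counts` is a hypothetical helper.

```python
import math

def split_counts(n_samples, test_size):
    """Return (n_kept, n_held_out), rounding the held-out share up."""
    n_held_out = math.ceil(n_samples * test_size)
    return n_samples - n_held_out, n_held_out

n_trainval, n_test = split_counts(809, 0.3)     # first split: test set
n_train, n_val = split_counts(n_trainval, 0.1)  # second split: validation set
print(n_train, n_val, n_test)                   # 509 57 243
print(round(n_train / 809, 2), round(n_val / 809, 2), round(n_test / 809, 2))
# 0.63 0.07 0.3
```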

In [10]:
model=tf.keras.models.Sequential()
model.add(tf.keras.layers.Conv2D(filters=32, kernel_size=(5, 5), activation='relu', input_shape=(120, 120, 3)))
model.add(tf.keras.layers.MaxPooling2D(pool_size=(2, 2), strides=2))
model.add(tf.keras.layers.Conv2D(filters=64, kernel_size=(3, 3), activation='relu'))
model.add(tf.keras.layers.MaxPooling2D(pool_size=(2, 2), strides=2))
model.add(tf.keras.layers.Conv2D(filters=128, kernel_size=(5, 5), activation='relu'))
model.add(tf.keras.layers.MaxPooling2D(pool_size=(2, 2), strides=2))
model.add(tf.keras.layers.Conv2D(filters=256, kernel_size=(3, 3), activation='relu'))
model.add(tf.keras.layers.MaxPooling2D(pool_size=(2, 2), strides=2))
model.add(tf.keras.layers.Conv2D(filters=512, kernel_size=(3, 3), activation='relu'))
model.add(tf.keras.layers.MaxPooling2D(pool_size=(2, 2), strides=2))
model.add(tf.keras.layers.Flatten())
model.add(tf.keras.layers.Dense(256, activation='relu'))
model.add(tf.keras.layers.Dense(128, activation='relu'))
model.add(tf.keras.layers.Dense(64, activation='relu'))
model.add(tf.keras.layers.Dense(18, activation='softmax'))

adam=tf.keras.optimizers.Adam(learning_rate=1e-4)

model.compile(optimizer=adam, loss='categorical_crossentropy', metrics=['accuracy'])
model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d (Conv2D)              (None, 116, 116, 32)      2432      
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 58, 58, 32)        0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 56, 56, 64)        18496     
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 28, 28, 64)        0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 24, 24, 128)       204928    
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 12, 12, 128)       0         
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 10, 10, 256)       2
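The parameter counts of the first three convolutional layers in the summary can be verified by hand: each filter has one weight per kernel position per input channel, plus one bias. A short sketch, with `conv2d_params` as a hypothetical helper:

```python
def conv2d_params(kernel, in_channels, filters):
    """Weights plus one bias per filter for a square-kernel Conv2D layer."""
    return kernel * kernel * in_channels * filters + filters

print(conv2d_params(5, 3, 32))     # conv2d:   2432
print(conv2d_params(3, 32, 64))    # conv2d_1: 18496
print(conv2d_params(5, 64, 128))   # conv2d_2: 204928
```

The pooling layers have no trainable parameters, which is why they show 0 in the summary.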

In [11]:
mc=tf.keras.callbacks.ModelCheckpoint('best_pokemon_model_original.hdf5', monitor='val_loss', save_best_only=True)

hist=model.fit(train_data, train_labels, batch_size=1, epochs=50, verbose=1, callbacks=[mc], 
               validation_data=(val_data, val_labels))

Train on 509 samples, validate on 57 samples
Epoch 1/50
...
Epoch 50/50


In [12]:
model=tf.keras.models.load_model('best_pokemon_model_original.hdf5')
test_results=model.evaluate(test_data, test_labels, verbose=0)
test_results

[2.724155082624145, 0.12757201]

Given that we have 18 classes of Pokemon type, a network guessing uniformly at random would achieve an accuracy of 1/18≈0.056. The test accuracy of about 12.8% reported above thus means the network is doing better than randomly guessing, though its accuracy is still low. This is consistent with there being noticeable, yet inconsistent, patterns in Pokemon appearance that signify type.
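Two simple reference points help put a test accuracy in context: guessing uniformly among the 18 types, and always predicting the most common type. The sketch below computes both from the class counts printed by np.unique earlier in this report. Note that the majority-class figure is computed on the full dataset, so the value on a particular random test split may differ slightly.

```python
# Class counts copied from the np.unique output earlier in this report
counts = {
    "Bug": 72, "Dark": 29, "Dragon": 27, "Electric": 40, "Fairy": 18,
    "Fighting": 29, "Fire": 53, "Flying": 3, "Ghost": 27, "Grass": 78,
    "Ground": 32, "Ice": 23, "Normal": 105, "Poison": 34, "Psychic": 53,
    "Rock": 46, "Steel": 26, "Water": 114,
}

total = sum(counts.values())
uniform_baseline = 1 / len(counts)                  # guess each type equally often
majority_class = max(counts, key=counts.get)        # most common primary type
majority_baseline = counts[majority_class] / total  # always predict that type

print(total)                                        # 809
print(round(uniform_baseline, 4))                   # 0.0556
print(majority_class, round(majority_baseline, 4))  # Water 0.1409
```

Because the classes are imbalanced, the majority-class baseline is a stricter reference than uniform guessing.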

A dataset of 809 instances is likely too small to properly train a model. We also notice that the 18 classes in this dataset are imbalanced. The most common class is 'Water' with 114 instances, and the least common class is 'Flying' with 3 instances. We can attempt to resolve this through oversampling, making copies of existing samples. We do this in the next section.
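As an illustration of oversampling, the sketch below balances a list of labels by repeating indices of the smaller classes until every class is as large as the largest one. This is a generic index-based variant under stated assumptions, not the file-copying procedure used in this report; `oversample_indices` and the toy labels are hypothetical. In practice, it is usually safer to oversample only the training split so that copies of one image do not land in both training and test data.

```python
import random
from collections import defaultdict

def oversample_indices(labels, seed=0):
    """Return sample indices in which every class appears as often as the largest class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    target = max(len(idx) for idx in by_class.values())
    balanced = []
    for idx in by_class.values():
        balanced.extend(idx)                                    # keep every original once
        balanced.extend(rng.choices(idx, k=target - len(idx)))  # pad with random copies
    return balanced

# Toy labels for illustration only
labels = ["Water"] * 5 + ["Flying"] * 1 + ["Bug"] * 3
balanced = oversample_indices(labels)
print(len(balanced))   # 15
```

The returned indices could then be used to select rows from the image and label arrays.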

# Experiment 2- Oversampling 

First we must balance the dataset. To do this, we first sort each image into a folder named after its primary type, taken from the Pokemon csv file, below:

In [5]:
#os.mkdir('images/labeled_images')

In [6]:
#labels=np.unique(pokemon['Type1'])

#for label in labels:
    #os.mkdir('images/labeled_images/'+label)

In [6]:
#os.chdir('images/labeled_images')

#labels=np.unique(pokemon['Type1'])

#for label in labels:
    #for i in range(len(pokemon)):
        #if pokemon['Type1'][i]==label:
            #cv2.imwrite(label+'/'+pokemon['Name'][i]+'.png', images[i])            

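Then we balance the dataset by writing numbered copies of the images in a type folder until the folder holds 1000 files, shown here for the Bug type: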

In [4]:
bug_dir='images/labeled_images/Bug'

#size the array by the number of files in the Bug folder, not by the number of type folders
filenames=sorted(os.listdir(bug_dir))
labeled_images=np.empty((len(filenames), 120, 120, 3), dtype=np.uint8)

for count, file in enumerate(filenames):
    labeled_images[count] = cv2.imread(os.path.join(bug_dir, file))

#write numbered copies (name_2.png, name_3.png, ...) until the folder holds 1000 images
i=2
while len(os.listdir(bug_dir))<1000:    
    for j in range(len(labeled_images)):
        name=os.path.splitext(filenames[j])[0]
        cv2.imwrite(os.path.join(bug_dir, name+'_'+str(i)+'.png'), labeled_images[j])
        #stop as soon as the target is reached
        if len(os.listdir(bug_dir))>=1000:
            break
    i=i+1
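The copy loop above can also be expressed as a plan that is computed before anything is written to disk, which makes it easy to inspect and to reuse for each of the 18 type folders. This is a minimal sketch; `plan_copies` is a hypothetical helper, and the file names are toy values.

```python
import os

def plan_copies(filenames, target):
    """Return (source, copy_name) pairs that bring a folder up to `target` files."""
    plan = []
    if not filenames:
        return plan
    suffix = 2
    while len(filenames) + len(plan) < target:
        for name in filenames:
            if len(filenames) + len(plan) >= target:
                break   # target reached partway through a pass
            stem, ext = os.path.splitext(name)
            plan.append((name, stem + "_" + str(suffix) + ext))
        suffix += 1
    return plan

# Toy file names for illustration only
plan = plan_copies(["a.png", "b.png", "c.png"], target=8)
print(plan)
# [('a.png', 'a_2.png'), ('b.png', 'b_2.png'), ('c.png', 'c_2.png'),
#  ('a.png', 'a_3.png'), ('b.png', 'b_3.png')]
```

Each pair could then be carried out with a file copy such as `shutil.copy`, one type folder at a time.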

# References 

https://www.kaggle.com/vishalsubbiah/pokemon-images-and-types
    
https://en.wikipedia.org/wiki/Pok%C3%A9mon_(video_game_series)

https://towardsdatascience.com/a-guide-to-an-efficient-way-to-build-neural-network-architectures-part-ii-hyper-parameter-42efca01e5d7

https://www.youtube.com/watch?v=g2vlqhefADk&t=273s

https://www.pyimagesearch.com/2017/03/20/imagenet-vggnet-resnet-inception-xception-keras/

https://towardsdatascience.com/deep-learning-unbalanced-training-data-solve-it-like-this-6c528e9efea6