### Image Classification - Pokemon Edition
To a human pokemon player, it's usually easy to guess a pokemon's (primary) typing just by looking at it: anything that has wings is generally flying type; anything blue or fish-like is usually water; scary-looking pokemon tend to be dark or poison type (or both); bugs are bugs; if you obtained it from a fossil then it's rock; if it has a plant in its name, it's a grass type; and if it's yellow it's ~~pikachu~~ an electric type.  However, many of these guesses rely on a human's background knowledge of nature or vocabulary, which is why players can usually guess a pokemon's typing with high accuracy.  But what if we strip away the name puns?  And what if we give the images to a computer that knows nothing about references to nature and mythology?  **I want to know if a computer can guess pokemon types from their images.**

#### Dataset:
Google overrepresents pikachu when you search "pokemon", so to build a fairer dataset, I'm instead using sprites from this site: https://pokemondb.net/sprites

I'm using the Generation 8 sprites (the clearest and most detailed art) of the Generation 1 pokemon (which span only 15 pokemon types instead of the 18 in Generation 8).  If there are gender differences in the pokemon sprites, only the female version will be downloaded.

#### Method:
Using the methods from HW 5.1, I will classify pokemon.  However, an ImageNet-pretrained model alone is not the right base for this task: ImageNet consists of photographs, while the data here are clipart-style sprites.  This is why the ImageNet model will be fine-tuned with a sample of the pokemon sprites.

I will manually label the types for Generation 1, sample a subset, and use it to train a model; the remaining Generation 1 pokemon form the held-out ("untrained") data set that will be used to test the model.

Note:  dual-type pokemon will be classified by their primary (first-listed) typing.
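
As a minimal sketch of this labelling rule (the labels in this project were assigned by hand; the `POKEMON_TYPES` mapping and the `primary_type` helper are hypothetical names used only for illustration):

```python
# Hypothetical mapping from pokemon name to its listed types, in order.
POKEMON_TYPES = {
    "bulbasaur": ["grass", "poison"],
    "charizard": ["fire", "flying"],
    "pidgey": ["normal", "flying"],
    "pikachu": ["electric"],
}


def primary_type(types):
    """Return the first-listed type, which is used as the class label."""
    return types[0]


# Build the class label for every pokemon in the mapping.
labels = {name: primary_type(types) for name, types in POKEMON_TYPES.items()}
print(labels)
```

Under this rule Charizard and Pidgey are labelled fire and normal; their secondary flying type is dropped.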

#### Hypothesis:
I believe that the model will have a difficult time classifying water and normal type pokemon due to the wide variety of sprite shapes and colors in these categories.  And because both of these categories are so large and varied, neither can be considered the "default" category.

While the flying type category is the most distinctive, Generation 1 has *no* pure-flying types, and furthermore all pokemon that are part flying have it as their second typing.  The good news is that this eliminates an entire classification category; the bad news is that it was the most visually distinctive one.  Spreading the winged pokemon across the other types will likely decrease the accuracy of the model, as overall pokemon form now contributes less.

I do foresee too much noise in the training set: with only 151 pokemon split across 15 types, there's not much training data in any one classification category.  Despite this, I think the Generation 1 pokemon sprites are simple and consistent enough to yield a reasonably accurate classification model.

In [41]:
import os
#import socket
#socket.setdefaulttimeout(10)
from shutil import copyfile
from tqdm.notebook import tqdm

import tensorflow as tf
import tensorflow.keras as keras
from tensorflow.keras.preprocessing import image_dataset_from_directory
from tensorflow.keras.applications.resnet import preprocess_input
import tensorflow.keras.layers as layers

import numpy as np 
from sklearn.metrics import classification_report   # pretty stats
from sklearn.metrics import confusion_matrix

import pandas as pd

#### Step 1: Create 'train',  'test', and 'validation' subsets from the 14 (remaining) classes

In [75]:
# helper function to split the images into train/test/validate sets
def custom_split_train_test_val(directory, sample_size=0.75):    # function from manual
    subsets = ['train', 'test', 'val']

    upper_dir = os.path.dirname(os.path.dirname(directory))
    for subset in subsets:
        if not os.path.exists(os.path.join(upper_dir, subset)):
            os.mkdir(os.path.join(upper_dir, subset))
    for label in os.listdir(directory):
        path = os.path.join(directory, label)
        if not os.path.isdir(path):
            continue
            
        # keep only image files, matched by file extension
        images = [image for image in sorted(os.listdir(path)) if image.lower().endswith(('.jpg', '.jpeg', '.gif', '.png'))]
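        # NOTE: as written, the training share is (1 - sample_size), so
        # sample_size=0.75 sends only ~25% of each class to 'train';
        # the remainder is split roughly evenly between 'test' and 'val'
        size_train = int((1 - sample_size) * len(images))
        size_test = int((len(images) - size_train) / 2)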
        
        for subset in subsets:
            assert not os.path.exists(os.path.join(upper_dir, subset, label)), "Path already exists, delete it first"
            os.mkdir(os.path.join(upper_dir, subset, label))
        for train_image in images[:size_train]:
            copyfile(os.path.join(path, train_image), 
                    os.path.join(upper_dir, 'train', label, train_image))
        for test_image in images[size_train:size_train + size_test]:
            copyfile(os.path.join(path, test_image), 
                    os.path.join(upper_dir, 'test', label, test_image))
        for val_image in images[size_train + size_test:]:
            copyfile(os.path.join(path, val_image), 
                    os.path.join(upper_dir, 'val', label, val_image))

IMAGE_DIR = 'pokemon/types/'     # directory where original classifications are stored
custom_split_train_test_val(IMAGE_DIR, sample_size=0.75)  # with so few pokemon per category, sample as many per category as possible


# custom split information
IMAGE_SIZE = (128,128)   # this is why I used the dataset I did:  consistent image size
path = os.path.join('pokemon/')  # this is where the original images are stored

train_ds = image_dataset_from_directory(    # used to fine-tune the existing Imagenet model
    path + '/train/'  ,   
    shuffle=True,
    image_size= IMAGE_SIZE,
    label_mode='categorical',
    batch_size=64,  
    )

test_ds = image_dataset_from_directory(
    path + '/test/' ,
    shuffle=False,
    image_size=IMAGE_SIZE,
    label_mode='categorical',
    batch_size=16
)

val_ds = image_dataset_from_directory(
    path + '/val/' ,
    shuffle=False,
    image_size=IMAGE_SIZE,
    label_mode='categorical',
    batch_size=16
)


# improve speed by prefetching batches in the background with tensorflow AUTOTUNE (optional)
AUTOTUNE = tf.data.experimental.AUTOTUNE

train_ds = train_ds.prefetch(buffer_size=AUTOTUNE)
val_ds = val_ds.prefetch(buffer_size=AUTOTUNE)
test_ds = test_ds.prefetch(buffer_size=AUTOTUNE)

Found 34 files belonging to 14 classes.
Found 55 files belonging to 14 classes.
Found 62 files belonging to 14 classes.


#### Step 2:  Train the classification model

In [76]:
# steps from manual
IMG_SHAPE = IMAGE_SIZE + (3,)
base_model = tf.keras.applications.ResNet50(input_shape=IMG_SHAPE,
                                               include_top=False,
                                               weights='imagenet')
base_model.trainable = False  # freeze ResNet: use it only as a feature extractor, its weights are not updated


# using resnet to make a model
model = tf.keras.Sequential()
model.add(tf.keras.layers.Lambda(preprocess_input, name='preprocessing', input_shape=IMG_SHAPE))
model.add(base_model)
model.add(tf.keras.layers.Flatten())
model.add(tf.keras.layers.Dense(1024, activation='relu'))
model.add(tf.keras.layers.Dropout(0.3))
model.add(tf.keras.layers.Dense(4, activation='softmax'))  # NOTE: 4 output units, but this dataset has 14 classes; the layer size should match the class count

model.summary()


# compile the model
callbacks = [
        tf.keras.callbacks.EarlyStopping(monitor='val_loss', 
                                         min_delta=0, patience=2, verbose=0, 
                          mode='min', baseline=None, 
                      restore_best_weights=True)]

optimizer = tf.keras.optimizers.Adam(learning_rate=5e-5)
loss = tf.keras.losses.CategoricalCrossentropy()

model.compile(optimizer=optimizer, loss=loss, metrics=['accuracy'])

Model: "sequential_7"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
preprocessing (Lambda)       (None, 128, 128, 3)       0         
_________________________________________________________________
resnet50 (Functional)        (None, 4, 4, 2048)        23587712  
_________________________________________________________________
flatten_7 (Flatten)          (None, 32768)             0         
_________________________________________________________________
dense_14 (Dense)             (None, 1024)              33555456  
_________________________________________________________________
dropout_7 (Dropout)          (None, 1024)              0         
_________________________________________________________________
dense_15 (Dense)             (None, 4)                 4100      
Total params: 57,147,268
Trainable params: 33,559,556
Non-trainable params: 23,587,712
_________________________________
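
A minimal sketch of how the head could be sized from the class count and then trained, reusing the layers and hyperparameters from the cell above.  The `build_classifier` helper and `epochs=20` are illustrative assumptions, not part of the original notebook:

```python
import tensorflow as tf
from tensorflow.keras.applications.resnet import preprocess_input


def build_classifier(num_classes, image_size=(128, 128), weights="imagenet"):
    """Frozen ResNet50 base with a small dense head sized to the class count."""
    img_shape = image_size + (3,)
    base_model = tf.keras.applications.ResNet50(
        input_shape=img_shape, include_top=False, weights=weights)
    base_model.trainable = False  # keep the pretrained weights fixed

    model = tf.keras.Sequential([
        tf.keras.Input(shape=img_shape),
        tf.keras.layers.Lambda(preprocess_input, name="preprocessing"),
        base_model,
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(1024, activation="relu"),
        tf.keras.layers.Dropout(0.3),
        # one output unit per class, so every class can be predicted
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=5e-5),
        loss=tf.keras.losses.CategoricalCrossentropy(),
        metrics=["accuracy"],
    )
    return model


# Assumed usage with the datasets and callbacks defined in this notebook
# (epochs=20 is an arbitrary illustrative value):
# model = build_classifier(num_classes=14)
# model.fit(train_ds, validation_data=val_ds, epochs=20, callbacks=callbacks)
```

Only the dense head is trainable here; the `EarlyStopping` callback would stop training once `val_loss` stops improving.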

#### Step 3:  Assess the model

In [77]:
predictions = model.predict(test_ds, verbose=1)
y_pred = np.argmax(predictions, axis=1)
y_true = np.argmax(np.concatenate([labels.numpy() for images, labels in test_ds.take(-1)]), axis=1)

print(confusion_matrix(y_true, y_pred))  # this is a lot easier to read than the classification report

[[3 0 1 0 0 0 0 0 0 0 0 0 0 0]
 [1 0 0 0 0 0 0 0 0 0 0 0 0 0]
 [1 0 2 0 0 0 0 0 0 0 0 0 0 0]
 [0 0 3 0 0 0 0 0 0 0 0 0 0 0]
 [1 0 3 0 0 0 0 0 0 0 0 0 0 0]
 [0 0 1 0 0 0 0 0 0 0 0 0 0 0]
 [2 0 2 0 0 0 0 0 0 0 0 0 0 0]
 [2 0 1 0 0 0 0 0 0 0 0 0 0 0]
 [0 1 0 0 0 0 0 0 0 0 0 0 0 0]
 [3 0 7 0 0 0 0 0 0 0 0 0 0 0]
 [2 0 4 0 0 0 0 0 0 0 0 0 0 0]
 [1 1 1 0 0 0 0 0 0 0 0 0 0 0]
 [0 0 3 0 0 0 0 0 0 0 0 0 0 0]
 [0 3 6 0 0 0 0 0 0 0 0 0 0 0]]
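
The matrix can be condensed into an overall accuracy and a count of predictions per class.  A minimal NumPy sketch, with the values copied from the output above (rows are true classes, columns are predicted classes; only the first three columns are non-zero):

```python
import numpy as np

# First three columns of the 14x14 confusion matrix printed above;
# all remaining columns are zero.
first_cols = np.array([
    [3, 0, 1], [1, 0, 0], [1, 0, 2], [0, 0, 3], [1, 0, 3],
    [0, 0, 1], [2, 0, 2], [2, 0, 1], [0, 1, 0], [3, 0, 7],
    [2, 0, 4], [1, 1, 1], [0, 0, 3], [0, 3, 6],
])
cm = np.zeros((14, 14), dtype=int)
cm[:, :3] = first_cols

correct = int(np.trace(cm))           # diagonal = correctly classified images
total = int(cm.sum())                 # all test images
accuracy = correct / total
predicted_per_class = cm.sum(axis=0)  # column sums = how often each class was predicted

print(f"accuracy: {correct}/{total} = {accuracy:.3f}")
print("predictions per class:", predicted_per_class)
```

The column sums show that every prediction falls into the first three classes.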


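As you can see, the performance is absolutely **abysmal**: only 5 of the 55 test images are classified correctly.  No matter how many times I rerun the model, I only ever get predictions in 2 or 3 categories.  Note that the output layer has only 4 units, so at most 4 of the 14 classes can ever be predicted, and the model is compiled but never fitted (`model.fit` is not called), so the dense layers still have their random initial weights.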

I have a feeling that there are too many categories and not enough data in each category.  Let's take a look at what the pokemon type distribution actually is:

In [78]:
# collect per-type image counts for the original folders and for each split
directory = "pokemon/types/"
type_list = sorted(os.listdir(directory))   # one sub-folder per pokemon type

def count_files(folder):
    """Number of files stored directly in `folder`."""
    return len(os.listdir(folder))

pokemon_count = [count_files(os.path.join(directory, t)) for t in type_list]
train_count = [count_files(os.path.join("pokemon/train", t)) for t in type_list]
test_count = [count_files(os.path.join("pokemon/test", t)) for t in type_list]
val_count = [count_files(os.path.join("pokemon/val", t)) for t in type_list]


# create dataframe from the lists
poke_stats_df = pd.DataFrame({
    'type': type_list,
    'total_pokemon': pokemon_count,
    'in_training_set': train_count,
    'in_test_set': test_count,
    'in_validation_set': val_count
})
poke_stats_df

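|   | type | total_pokemon | in_training_set | in_test_set | in_validation_set |
|---|------|---------------|-----------------|-------------|-------------------|
| 0 | bug | 12 | 3 | 4 | 5 |
| 1 | dragon | 3 | 0 | 1 | 2 |
| 2 | electric | 9 | 2 | 3 | 4 |
| 3 | fighting | 8 | 2 | 3 | 3 |
| 4 | fire | 12 | 3 | 4 | 5 |
| 5 | ghost | 3 | 0 | 1 | 2 |
| 6 | grass | 12 | 3 | 4 | 5 |
| 7 | ground | 8 | 2 | 3 | 3 |
| 8 | ice | 2 | 0 | 1 | 1 |
| 9 | normal | 26 | 6 | 10 | 10 |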


Now the problem is obvious: most pokemon types have too few images, and therefore too few in the training set.  Of the types listed, dragon, ghost, and ice have no training images at all, and no type has more than 6.  With such a small training set, any major differences between images within a class will make it difficult to train and validate a model.
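
The per-type counts follow directly from the size arithmetic in `custom_split_train_test_val`.  A small sketch that reproduces it for a few class sizes from the table (`split_sizes` is a helper introduced here only for illustration):

```python
def split_sizes(n_images, sample_size=0.75):
    """Reproduce the size arithmetic of custom_split_train_test_val."""
    size_train = int((1 - sample_size) * n_images)
    size_test = int((n_images - size_train) / 2)
    size_val = n_images - size_train - size_test
    return size_train, size_test, size_val


# Class sizes taken from the table above; prints (train, test, val) per type.
for poke_type, total in [("bug", 12), ("dragon", 3), ("ice", 2), ("normal", 26)]:
    print(poke_type, total, split_sizes(total))
```

With `sample_size=0.75` only about a quarter of each class ends up in the training set, which is why classes with 2 or 3 images get no training examples.  For comparison, `split_sizes(12, sample_size=0.25)` would give `(9, 1, 2)`.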

## Step 4:  Define a more reasonable research question
To give the model a better chance at discerning type differences, we will:
1. decrease the number of categories to choose from
2. only choose categories with a significant number of pokemon

#### New question:
Can a computer distinguish between fire, poison, and normal type pokemon? (3 of the top categories)

#### Preprocessing:
These 3 types have been copied into a new directory called 'reduced_types'

#### Processing:

In [100]:
# New split on the reduced dataset:
IMAGE_DIR = 'pokemon/reduced_types/'     # directory where original classifications are stored
custom_split_train_test_val(IMAGE_DIR, sample_size=0.75)  # same split settings as before; the old train/test/val folders must be deleted first

IMAGE_SIZE = (128,128)   # this is why I used the dataset I did:  consistent image size
path = os.path.join('pokemon/')  # this is where the original images are stored

train_ds = image_dataset_from_directory(
    path + '/train/'  ,   
    shuffle=True,
    image_size= IMAGE_SIZE,
    label_mode='categorical',
    batch_size=64,  
    )

test_ds = image_dataset_from_directory(
    path + '/test/' ,
    shuffle=False,
    image_size=IMAGE_SIZE,
    label_mode='categorical',
    batch_size=16
)

val_ds = image_dataset_from_directory(
    path + '/val/' ,
    shuffle=False,
    image_size=IMAGE_SIZE,
    label_mode='categorical',
    batch_size=16
)


# improve speed by prefetching batches in the background with tensorflow AUTOTUNE (optional)
AUTOTUNE = tf.data.experimental.AUTOTUNE

train_ds = train_ds.prefetch(buffer_size=AUTOTUNE)
val_ds = val_ds.prefetch(buffer_size=AUTOTUNE)
test_ds = test_ds.prefetch(buffer_size=AUTOTUNE)

Found 12 files belonging to 3 classes.
Found 20 files belonging to 3 classes.
Found 21 files belonging to 3 classes.


In [101]:
# CREATE A NEW MODEL: 
IMG_SHAPE = IMAGE_SIZE + (3,)
base_model = tf.keras.applications.ResNet50(input_shape=IMG_SHAPE,
                                               include_top=False,
                                               weights='imagenet')
base_model.trainable = False  # freeze ResNet: use it only as a feature extractor, its weights are not updated

# using resnet to make a model
model2 = tf.keras.Sequential()
model2.add(tf.keras.layers.Lambda(preprocess_input, name='preprocessing', input_shape=IMG_SHAPE))
model2.add(base_model)
model2.add(tf.keras.layers.Flatten())
model2.add(tf.keras.layers.Dense(1024, activation='relu'))
model2.add(tf.keras.layers.Dropout(0.3))
model2.add(tf.keras.layers.Dense(4, activation='softmax'))  # NOTE: 4 output units, but the reduced dataset has 3 classes; the layer size should match the class count
model2.summary()

# compile the model
callbacks = [
        tf.keras.callbacks.EarlyStopping(monitor='val_loss', 
                                         min_delta=0, patience=2, verbose=0, 
                          mode='min', baseline=None, 
                      restore_best_weights=True)]

optimizer = tf.keras.optimizers.Adam(learning_rate=5e-5)
loss = tf.keras.losses.CategoricalCrossentropy()

model2.compile(optimizer=optimizer, loss=loss, metrics=['accuracy'])

Model: "sequential_15"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
preprocessing (Lambda)       (None, 128, 128, 3)       0         
_________________________________________________________________
resnet50 (Functional)        (None, 4, 4, 2048)        23587712  
_________________________________________________________________
flatten_15 (Flatten)         (None, 32768)             0         
_________________________________________________________________
dense_30 (Dense)             (None, 1024)              33555456  
_________________________________________________________________
dropout_15 (Dropout)         (None, 1024)              0         
_________________________________________________________________
dense_31 (Dense)             (None, 4)                 4100      
Total params: 57,147,268
Trainable params: 33,559,556
Non-trainable params: 23,587,712
________________________________

In [102]:
# ASSESS THE MODEL
predictions2 = model2.predict(test_ds, verbose=1)
y_pred = np.argmax(predictions2, axis=1)
y_true = np.argmax(np.concatenate([labels.numpy() for images, labels in test_ds.take(-1)]), axis=1)

print(confusion_matrix(y_true, y_pred))

[[ 3  0  1]
 [10  0  0]
 [ 5  1  0]]
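
To put this matrix in context, it can be compared against a trivial baseline that always predicts the largest class.  A minimal NumPy sketch using the values printed above, assuming the rows follow the alphabetical order of the class folders (fire, normal, poison):

```python
import numpy as np

# Confusion matrix copied from the output above
# (rows: true class, columns: predicted class).
cm = np.array([
    [3, 0, 1],
    [10, 0, 0],
    [5, 1, 0],
])

correct = int(np.trace(cm))                  # correctly classified test images
accuracy = correct / cm.sum()                # overall accuracy
support = cm.sum(axis=1)                     # number of test images per true class
majority_baseline = support.max() / cm.sum() # accuracy of always guessing the largest class
recall = np.diag(cm) / support               # share of each class that was recognised

print(f"accuracy: {accuracy:.2f}")
print(f"majority-class baseline: {majority_baseline:.2f}")
print("recall per class:", recall)
```

A model that scores below the majority-class baseline has not learned anything useful about the classes.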


### Conclusion:
Even when reducing the sample data to 3 clearly defined categories, the model's accuracy is very low: only 3 of the 20 test images are classified correctly.

This suggests that a pokemon's type may not be recognizable from its appearance alone, although with so little training data the result is far from conclusive.

The final dataset used for this 3-type classification task was 53 pokemon out of the available 151.  With 898 pokemon currently in the Generation 8 Pokedex, there is the potential for a lot more sample data.  If I were to continue this project, I would enlist the help of a couple of friends to classify the remaining 700-ish pokemon, and retry the 14-type, 3-type, and full 18-type model creations.  I anticipate that vastly increasing the sample size will yield better results, as it will decrease the overall influence of noise and outliers.

