# **Homework 5**

In your project, you will pick an image dataset to solve a classification task. Provide a link to your dataset.

Pokemon Image Dataset: https://www.kaggle.com/vishalsubbiah/pokemon-images-and-types?select=images

## **Task 1**

### **Pre-Setup**

I'd like to predict each Pokemon's primary type (fire, water, grass, etc.) from its image.

In [253]:
# Imports
import pandas as pd
import numpy as np
import os
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import matplotlib.gridspec as gridspec
import tensorflow as tf

In [254]:
from tensorflow import keras as ks
from keras.preprocessing.image import ImageDataGenerator
from sklearn.model_selection import train_test_split
from keras.models import Sequential
# MaxPooling2D is the name used when the models are built below
from keras.layers import Dense, Dropout, Conv2D, MaxPooling2D, Flatten, Activation, BatchNormalization

In [274]:
# Get CSV Data
df = pd.read_csv("../input/pokemon-images-and-types/pokemon.csv")
df.head(5)

In [275]:
# Keep only three types because the dataset is small
selected = ["Water", "Fire", "Grass"]

# .copy() avoids pandas' SettingWithCopyWarning when the Image column is added later
df = df[df['Type1'].isin(selected)].copy()
df['Type1'].value_counts()

In [276]:
img = os.listdir('../input/pokemon-images-and-types/images/images/')
for i in img:
  name = i.split('.')[0]
  df.loc[df['Name'] == name, ["Image"]] = i
df.head(5)

In [277]:
import shutil
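# Remove folders left over from a previous run; ignore_errors avoids a crash on the first run
shutil.rmtree('train/', ignore_errors=True)
shutil.rmtree('test/', ignore_errors=True)
shutil.rmtree('valid/', ignore_errors=True)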

os.mkdir('train/')
os.mkdir('test/')
os.mkdir('valid/')

for i in df['Type1'].unique():
  os.mkdir('train/'+str(i)+'/')
  os.mkdir('test/'+str(i)+'/')
  os.mkdir('valid/'+str(i)+'/')

In [278]:
x_train, x_rem, y_train, y_rem = train_test_split(df, df["Type1"], train_size=0.7)
x_valid, x_test, y_valid, y_test = train_test_split(x_rem, y_rem, test_size=0.5)

In [280]:
from shutil import copyfile, copy2
for image, types in zip("../input/pokemon-images-and-types/images/images/" + x_train['Image'], y_train):
  copy2(image, 'train/' + types)
for image, types in zip("../input/pokemon-images-and-types/images/images/" + x_test['Image'], y_test):
  copy2(image, 'test/' + types)
for image, types in zip("../input/pokemon-images-and-types/images/images/" + x_valid['Image'], y_valid):
  copy2(image, 'valid/' + types)

In [281]:
train = ImageDataGenerator().flow_from_directory('train/')
test = ImageDataGenerator().flow_from_directory('test/')
val = ImageDataGenerator().flow_from_directory('valid/')

**Part 1 (20 points):** This step involves downloading, preparing, and visualizing your dataset. Create a convolutional base using a common pattern: a stack of Conv and MaxPooling layers. Depending on the problem and the dataset you must decide what pattern you want to use (i.e., how many Conv layers and how many pooling layers). Please describe why you chose a particular pattern. Add the final dense layer(s). Compile and train the model. Report the final evaluation and describe the metrics.

---

For the first Conv2D layer I chose 16 filters of size 3x3; every Conv2D layer uses the ReLU activation. Each convolution is followed by a MaxPooling2D layer with a 3x3 pool to reduce the spatial dimensions of the output volume. The second and third convolutional layers increase the filter count gradually (32, then 64), and the third block adds batch normalization and a 30% dropout to prevent overfitting. Finally, the feature maps are flattened into one dimension and passed to a 256-unit dense layer (with batch normalization and 50% dropout), followed by a 3-unit softmax output layer.
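To see how quickly this pattern shrinks the feature maps, the spatial sizes can be traced by hand. The sketch below assumes 256x256 inputs (the default `target_size` of `flow_from_directory`), unpadded 3x3 convolutions with stride 1, and pooling windows whose stride equals their size (the Keras defaults). The helper names are hypothetical and only serve this illustration.

```python
def conv_out(size, kernel=3):
    """Output width of an unpadded, stride-1 convolution."""
    return size - kernel + 1

def pool_out(size, pool=3):
    """Output width of unpadded max pooling with stride equal to the pool size."""
    return (size - pool) // pool + 1

def trace_shapes(size, filters=(16, 32, 64)):
    """Return (filters, width after conv, width after pool) for each block."""
    shapes = []
    for f in filters:
        after_conv = conv_out(size)
        size = pool_out(after_conv)
        shapes.append((f, after_conv, size))
    return shapes

shapes = trace_shapes(256)  # assumed input width and height
for f, after_conv, after_pool in shapes:
    print(f"{f:>2} filters: conv -> {after_conv}x{after_conv}, pool -> {after_pool}x{after_pool}")

# Number of values handed to the first Dense layer after Flatten.
last_filters, _, last_size = shapes[-1]
flat_features = last_size * last_size * last_filters
print("Flatten output:", flat_features)
```

Under these assumptions the three blocks reduce a 256x256 image to 8x8x64, so the Flatten layer produces 4,096 features.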

In [282]:
model = Sequential()

model.add(Conv2D(16, (3, 3), activation = 'relu'))
model.add(MaxPooling2D(pool_size = (3, 3)))

model.add(Conv2D(32, (3, 3), activation = 'relu'))
model.add(MaxPooling2D(pool_size = (3, 3)))

model.add(Conv2D(64, (3, 3), activation = 'relu'))
model.add(BatchNormalization())
model.add(MaxPooling2D(pool_size = (3, 3)))
model.add(Dropout(0.3))

model.add(Flatten())
model.add(Dense(256, activation = 'relu'))
model.add(BatchNormalization())
model.add(Dropout(0.5))

model.add(Dense(3, activation = 'softmax'))
model.compile(loss = 'mean_squared_error', optimizer = "adam", metrics = ['accuracy'])

In [284]:
history = model.fit(train, epochs = 20, validation_data = val)

In [286]:
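from sklearn.metrics import classification_report

# flow_from_directory shuffles by default, so rebuild the test generator with
# shuffle=False to keep predictions in the same order as the true labels.
test_eval = ImageDataGenerator().flow_from_directory('test/', shuffle=False)
predict = model.predict(test_eval)
y_pred = np.argmax(predict, axis=-1)

# class_indices maps folder name -> index (alphabetical: Fire, Grass, Water)
index_to_label = {v: k for k, v in test_eval.class_indices.items()}
y_true = [index_to_label[i] for i in test_eval.classes]
y_pred_conv = [index_to_label[i] for i in y_pred]

print(classification_report(y_true, y_pred_conv))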

The main metric is precision, but recall and F1 score are also printed for each Pokemon type. Water-type Pokemon have much higher precision than Fire and Grass types. One likely reason is that the Water class has the most images, so the model learned it better than the other two.
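As a reminder of what these three numbers measure, here is a minimal sketch that computes them per class from two label lists. The function name and the demo labels are made up for illustration; they are not results from this notebook, and `classification_report` remains the tool actually used above.

```python
def per_class_scores(y_true, y_pred):
    """Return {label: (precision, recall, f1)} computed from two label lists."""
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)  # correct hits
        fp = sum(t != label and p == label for t, p in pairs)  # false alarms
        fn = sum(t == label and p != label for t, p in pairs)  # misses
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        scores[label] = (precision, recall, f1)
    return scores

# Made-up labels, purely for illustration.
y_true_demo = ["Water", "Water", "Water", "Fire", "Grass", "Grass"]
y_pred_demo = ["Water", "Water", "Fire", "Fire", "Water", "Grass"]

for label, (p, r, f) in per_class_scores(y_true_demo, y_pred_demo).items():
    print(f"{label:<6} precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

Precision answers "of the images predicted as this type, how many really are that type", while recall answers "of the images of this type, how many were found". A class that is never predicted gets a precision of 0 in this sketch.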

**Part 2 (25 points):** The following models are widely used for transfer learning because of their performance and architectural innovations:

1. VGG (e.g., VGG16 or VGG19).
2. GoogLeNet (e.g., InceptionV3).
3. Residual Network (e.g., ResNet50).
4. MobileNet (e.g., MobileNetV2)

Choose any **one** of the above models to perform the classification task you did in Part 1. Evaluate the results using the same metrics as in Part 1. Are there any differences? Why or why not?

---

In [304]:
from keras.applications.vgg16 import VGG16
from keras.models import Model

model = VGG16(include_top=False, input_shape=(256, 256, 3))
flat1 = Flatten()(model.layers[-1].output)
class1 = Dense(64, activation='relu')(flat1)
output = Dense(3, activation='softmax')(class1)
model = Model(inputs=model.inputs, outputs=output)
model.summary()

In [305]:
model.compile(loss = 'mean_squared_error', optimizer = "adam", metrics = ['accuracy'])
history = model.fit(train, epochs = 5, validation_data = val)

In [307]:
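# flow_from_directory shuffles by default, so rebuild the test generator with
# shuffle=False to keep predictions in the same order as the true labels.
test_eval = ImageDataGenerator().flow_from_directory('test/', shuffle=False)
predict = model.predict(test_eval)
y_pred = np.argmax(predict, axis=-1)

# class_indices maps folder name -> index (alphabetical: Fire, Grass, Water)
index_to_label = {v: k for k, v in test_eval.class_indices.items()}
y_true = [index_to_label[i] for i in test_eval.classes]
y_pred_conv = [index_to_label[i] for i in y_pred]

print(classification_report(y_true, y_pred_conv))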

The result is worse than the previous one: precision for Water-type Pokemon is slightly lower, and none of the test images for Fire- and Grass-type Pokemon are classified correctly. I see a few possible reasons for this:

1. The dataset is too small. There are only ~230 Pokemon images, so accuracy may not improve much.
2. The test set is also too small. After splitting the dataset, only 37 images ended up in the test set.
3. Overfitting may be one of the causes.
4. Pokemon of each type have no obvious distinguishing features other than color (red, green, blue), and some Pokemon's colors do not even match their actual type.
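Point 2 can be made concrete with a rough back-of-the-envelope estimate. Assuming each test prediction is an independent trial (a simplification), the standard error of an accuracy p measured on n images is sqrt(p(1-p)/n). The accuracy of 0.5 below is a placeholder chosen for illustration, not a result from this notebook, and the function name is hypothetical.

```python
import math

def accuracy_standard_error(p, n):
    """Binomial standard error of an accuracy p measured on n samples."""
    return math.sqrt(p * (1 - p) / n)

n_test = 37          # test-set size reported above
p_placeholder = 0.5  # hypothetical accuracy, chosen only for illustration

se = accuracy_standard_error(p_placeholder, n_test)
half_width = 1.96 * se  # approximate 95% interval (normal approximation)
print(f"standard error = {se:.3f}, 95% half-width = {half_width:.3f}")
```

With only 37 test images, an accuracy near 0.5 carries an uncertainty of roughly ±0.16, so a difference of a few points between two models may be within noise.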

**Part 3 (25 points):** Use data augmentation to increase the diversity of your dataset by applying random transformations such as image rotation (you can use any other technique as well). Repeat the process from part 1 with this augmented data. Did you observe any difference in results?

---

In [312]:
from tensorflow.keras import layers
resize_rescale_flip_rotation = tf.keras.Sequential([
  layers.Resizing(128, 128),
  layers.Rescaling(1./255),
  layers.RandomFlip("horizontal_and_vertical"),
  layers.RandomRotation(0.2),
])

In [313]:
model = tf.keras.Sequential([
  # Preprocessing and augmentation layers defined above.
  resize_rescale_flip_rotation,

  # Same convolutional pattern as Part 1, using the tf.keras `layers` namespace throughout.
  layers.Conv2D(16, (3, 3), activation = 'relu'),
  layers.MaxPooling2D(pool_size = (3, 3)),

  layers.Conv2D(32, (3, 3), activation = 'relu'),
  layers.MaxPooling2D(pool_size = (3, 3)),

  layers.Conv2D(64, (3, 3), activation = 'relu'),
  layers.BatchNormalization(),
  layers.MaxPooling2D(pool_size = (3, 3)),
  layers.Dropout(0.3),

  layers.Flatten(),
  layers.Dense(256, activation = 'relu'),
  layers.BatchNormalization(),
  layers.Dropout(0.5),
  layers.Dense(3, activation = 'softmax')
])

model.compile(loss = 'mean_squared_error', optimizer = "adam", metrics = ['accuracy'])

In [314]:
history = model.fit(train, epochs = 20, validation_data = val)

In [315]:
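# flow_from_directory shuffles by default, so rebuild the test generator with
# shuffle=False to keep predictions in the same order as the true labels.
test_eval = ImageDataGenerator().flow_from_directory('test/', shuffle=False)
predict = model.predict(test_eval)
y_pred = np.argmax(predict, axis=-1)

# class_indices maps folder name -> index (alphabetical: Fire, Grass, Water)
index_to_label = {v: k for k, v in test_eval.class_indices.items()}
y_true = [index_to_label[i] for i in test_eval.classes]
y_pred_conv = [index_to_label[i] for i in y_pred]

print(classification_report(y_true, y_pred_conv))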

Compared to the result in Part 1, the result here is much better. I reduced the image size by half (from 256 to 128 pixels per side) and randomly flipped and rotated the images. All of the test images for Water-type Pokemon are classified correctly, and both Fire- and Grass-type Pokemon have substantially higher precision.
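For intuition about the two augmentations, here is a toy sketch in plain Python on a 2x3 grid of numbers. It only mimics what `RandomFlip` does to real image tensors; the helper names are hypothetical. It also converts the `RandomRotation` factor to degrees, relying on the Keras convention that the factor is a fraction of a full turn (2π), so a factor of 0.2 allows rotations of up to about ±72°.

```python
def flip_horizontal(image):
    """Mirror each row left to right."""
    return [row[::-1] for row in image]

def flip_vertical(image):
    """Reverse the order of the rows (top to bottom)."""
    return image[::-1]

def max_rotation_degrees(factor):
    """Largest rotation angle for a factor given as a fraction of a full turn."""
    return factor * 360

# A tiny stand-in for an image: 2 rows, 3 columns.
toy = [[1, 2, 3],
       [4, 5, 6]]

print(flip_horizontal(toy))  # [[3, 2, 1], [6, 5, 4]]
print(flip_vertical(toy))    # [[4, 5, 6], [1, 2, 3]]
print(max_rotation_degrees(0.2))
```

Keras applies the random flip and rotation only during training; at prediction time those layers pass images through unchanged, while the resizing and rescaling layers still run.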

## **Task 2**

**Part 1 (15 points): Variational Autoencoder (VAE):** Here is a complete implementation of a VAE in TensorFlow: https://www.tensorflow.org/tutorials/generative/cvae

Following these steps try generating images using the same encoder-decoder architecture using a different Image dataset (other than MNIST).

---

**Part 2 (15 points): Generative Adversarial Networks (GANs):** Repeat part 1 (use same dataset) and implement a GAN model to generate high quality synthetic images. You may follow steps outlined here: https://www.tensorflow.org/tutorials/generative/dcgan

---