In [None]:
# ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
#  Copyright (c) 2021. Mohamed Reda Bouadjenek, Deakin University              +
#           Email:  reda.bouadjenek@deakin.edu.au                              +
#                                                                              +
#  Licensed under the Apache License, Version 2.0 (the "License");             +
#   you may not use this file except in compliance with the License.           +
#    You may obtain a copy of the License at:                                  +
#                                                                              +
#                 http://www.apache.org/licenses/LICENSE-2.0                   +
#                                                                              +
#    Unless required by applicable law or agreed to in writing, software       +
#    distributed under the License is distributed on an "AS IS" BASIS,         +
#    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.  +
#    See the License for the specific language governing permissions and       +
#    limitations under the License.                                            +
#                                                                              +
# ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++


**Notebook author:** [Mohamed Reda Bouadjenek](https://rbouadjenek.github.io/), Lecturer of Applied Artificial Intelligence

**Institution:** Deakin University, School of Information Technology, Faculty of Sci Eng & Built Env

**Address:** Locked Bag 20000, Geelong, VIC 3220

**Phone:** +61 3 522 78380

**Email:** reda.bouadjenek@deakin.edu.au

<img style="float: left;" src="https://github.com/rbouadjenek/deakin-simpsons-challenge2020/blob/main/images/deakin2.png?raw=1" width="200">

# Introduction

**Welcome to the Notebook for the Deakin Simpsons Challenge 2021!**

![](https://github.com/rbouadjenek/deakin-simpsons-challenge2020/blob/main/images/Simpsons_cast.png?raw=1)



This Notebook allows you to build a classification model for The Deakin Simpsons challenge 2021.

The **Deakin Simpsons challenge 2021** is a computer vision competition for which the goal is to recognize Simpsons characters individually in images using machine learning/deep learning. The challenge is designed to provide students with the opportunity to work as team members, to compete with each other, and to enhance the student learning experience by improving their AI modeling, problem-solving, and team-working skills.
 


As participants, your goal is to build a machine learning/deep learning model to automatically recognize the following Simpsons characters:

 
1. [Abraham grampa simpson](https://en.wikipedia.org/wiki/Grampa_Simpson)
2. [Apu nahasapeemapetilon](https://en.wikipedia.org/wiki/Apu_Nahasapeemapetilon)
3. [Bart simpson](https://en.wikipedia.org/wiki/Bart_Simpson)
4. [Charles montgomery burns](https://en.wikipedia.org/wiki/Mr._Burns)
5. [Chief wiggum](https://en.wikipedia.org/wiki/Chief_Wiggum)
6. [Comic book guy](https://en.wikipedia.org/wiki/Comic_Book_Guy)
7. [Edna krabappel](https://en.wikipedia.org/wiki/Edna_Krabappel)
8. [Homer simpson](https://en.wikipedia.org/wiki/Homer_Simpson)
9. [Kent brockman](https://en.wikipedia.org/wiki/Kent_Brockman)
10. [Krusty the clown](https://en.wikipedia.org/wiki/Krusty_the_Clown)
11. [Lenny leonard](https://simpsons.fandom.com/wiki/Lenny_Leonard)
12. [Lisa simpson](https://en.wikipedia.org/wiki/Lisa_Simpson)
13. [Marge simpson](https://en.wikipedia.org/wiki/Marge_Simpson)
14. [Mayor quimby](https://en.wikipedia.org/wiki/Mayor_Quimby)
15. [Milhouse van houten](https://en.wikipedia.org/wiki/Milhouse_Van_Houten)
16. [Moe szyslak](https://en.wikipedia.org/wiki/Moe_Szyslak)
17. [Ned flanders](https://en.wikipedia.org/wiki/Ned_Flanders)
18. [Nelson muntz](https://en.wikipedia.org/wiki/Nelson_Muntz)
19. [Principal skinner](https://en.wikipedia.org/wiki/Principal_Skinner)
20. [Sideshow bob](https://en.wikipedia.org/wiki/Sideshow_Bob)


To achieve this task, you will be given a dataset of 19,548 images to train your model and tune your hyperparameters. Feel free to extend it by collecting new images or by using data augmentation techniques.

Once you have built your model, you will have to submit it on the [CodaLab](https://competitions.codalab.org/competitions/27191?secret_key=f0a7cc3e-7f78-4bb1-8564-95bc2fadafa5) platform to be evaluated.
We evaluate the performance of your model using [accuracy](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.accuracy_score.html) on a private test set that we collected and labeled directly from TV show episodes.
Once the evaluation is completed, your entry will appear on the leaderboard so that you can compare your performance against other competitors.
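
As a reminder of what the metric measures, accuracy is the fraction of images whose predicted label matches the true label. The snippet below is a minimal sketch in plain Python; the function name `accuracy` and the example labels are illustrative assumptions, not part of the challenge's evaluation code.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that exactly match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on an empty label list")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Hypothetical labels, for illustration only.
y_true = ["bart_simpson", "homer_simpson", "lisa_simpson", "marge_simpson"]
y_pred = ["bart_simpson", "homer_simpson", "bart_simpson", "marge_simpson"]

# Three of the four predictions are correct.
print(accuracy(y_true, y_pred))  # 0.75
```

Because every image counts equally, classes with many test images weigh more in this score than rare classes.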


In the following, we will take you through a 7-step process to build a simple model for this task:

1. `Setup the environment:` The first step consists of setting up the environment and downloading the data.
2. `Preprocessing:` The second step is a preprocessing step that consists of resizing, splitting, and piping the input data.
3. `Exploring the data:` The third step consists of a simple data exploration step where you will see samples of the data and some statistics to help you understand the data.
4. `Designing the model:` The fourth step consists of designing an architecture for the task.
5. `Training:` The fifth step consists of starting the training process.
6. `Monitoring:` The sixth step consists of monitoring the training process to investigate possible overfitting.
7. `Submission:` The seventh and last step will take you through the submission process.


**References:**

- [The Simpsons characters recognition and detection using Keras](https://medium.com/alex-attia-blog/the-simpsons-character-recognition-using-keras-d8e1796eae36)


# Setup the environment


First, note that in order to submit your model to the leaderboard, you need to generate and save it using <span style="color:red;font-weight: bold;">TensorFlow 2.2.0</span> and not <span style="color:red;font-weight: bold;text-decoration: line-through;">TensorFlow 2.3.0</span>. Therefore, please first run the following cell to install the appropriate <span style="color:red;font-weight: bold;">TensorFlow version (2.2.0)</span>. You may need to restart your kernel.

**The following code section is from the template provided by the AI Challenge organiser.**

In [None]:
# Run this to install the appropriate tensorflow package
!pip install tensorflow==2.2.0


Collecting tensorflow==2.2.0
  Downloading https://files.pythonhosted.org/packages/4c/1a/0d79814736cfecc825ab8094b39648cc9c46af7af1bae839928acb73b4dd/tensorflow-2.2.0-cp37-cp37m-manylinux2010_x86_64.whl (516.2MB)
Collecting tensorboard<2.3.0,>=2.2.0
  Downloading https://files.pythonhosted.org/packages/1d/74/0a6fcb206dcc72a6da9a62dd81784bfdbff5fedb099982861dc2219014fb/tensorboard-2.2.2-py3-none-any.whl (3.0MB)
Collecting tensorflow-estimator<2.3.0,>=2.2.0
  Downloading https://files.pythonhosted.org/packages/a4/f5/926ae53d6a226ec0fda5208e0e581cffed895ccc89e36ba76a8e60895b78/tensorflow_estimator-2.2.0-py2.py3-none-any.whl (454kB)
Installing collected packages: tensorboard, tensorflow-estimator, tensorflow
  Found existing installation: tensorboard 2.4.1
    Uninstalling tensorboard-2.4.1:
      

**The following code section is from the template provided by the AI Challenge organiser.**

Once the appropriate TensorFlow version is installed, you need to load all the packages required for this Notebook.

In [None]:
import tensorflow as tf
from tensorflow import keras
import numpy as np
import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix, balanced_accuracy_score, accuracy_score, classification_report
from tensorflow.keras import models, layers, optimizers
from tensorflow.python.keras.saving import hdf5_format
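# Import from tensorflow.keras (not standalone keras) so all layers and
# utilities come from the same TensorFlow 2.2.0 installation.
from tensorflow.keras.preprocessing.image import ImageDataGenerator, DirectoryIterator
import h5py, itertools, collections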

##################
# Verifications:
#################
print('GPU is used.' if len(tf.config.list_physical_devices('GPU')) > 0 else 'GPU is NOT used.')
print("Tensorflow version: " + tf.__version__)


GPU is used.
Tensorflow version: 2.2.0


Now, please run the following cell to download the dataset that you will use to build your model.

**The following code section is from the template provided by the AI Challenge organiser.**

In [None]:
# Download dataset:
!wget http://206.12.93.90:8080/simpson_dataset/simpsons_train.tar.gz 
# Unzip the dataset:
!tar -xzvf simpsons_train.tar.gz > /dev/null



--2021-05-13 22:28:38--  http://206.12.93.90:8080/simpson_dataset/simpsons_train.tar.gz
Connecting to 206.12.93.90:8080... connected.
HTTP request sent, awaiting response... 200 
Length: 488194922 (466M) [application/x-gzip]
Saving to: ‘simpsons_train.tar.gz’


2021-05-13 22:29:03 (18.6 MB/s) - ‘simpsons_train.tar.gz’ saved [488194922/488194922]



# Preprocessing


We use the Simpsons characters dataset available on [Kaggle](https://www.kaggle.com/alexattia/the-simpsons-characters-dataset).

This dataset is composed of 20 folders (one for each character) with 400-2000 images in each folder. The total number of images is 19,548.

To read these images, we use `DirectoryIterator` from `tf.keras.preprocessing.image`, an iterator that reads images from a directory on disk and infers their labels from the folder names. We also use `ImageDataGenerator` to split this dataset into a training set and a validation set; the latter is used to tune the hyperparameters of our model.


**The following code section is from the template provided by the AI Challenge organiser.**

**Changes were made to:**

**- image size**

**- batch size**

**An `ImageDataGenerator` with augmentation settings was added to feed augmented images into the model during training.**

In [None]:
'''
    Split train and validation.
'''
# We define the size of input images to 224x224 pixels.
image_size = (224, 224)
# We define the batch size
batch_size = 8
# Create an image generator with a fraction of images reserved for validation:
image_generator = ImageDataGenerator(
        rotation_range=10,
        width_shift_range=0.1,
        height_shift_range=0.1,
        shear_range=0.1, 
        zoom_range=0.2,
        validation_split=0.1)

val_image_generator = ImageDataGenerator(validation_split=0.1)
# Now, we create a training data iterator that yields batches of augmented images,
# each resized to the image_size defined above (224x224 pixels).
train_ds =  DirectoryIterator(
    "dataset/simpsons_train/",
    image_generator,
    class_mode='categorical',
    seed=1337,
    target_size=image_size,
    batch_size=batch_size,
    subset = 'training',
)

# Similarly, we create a validation data iterator that yields batches of images
# (without augmentation), each resized to the same 224x224 pixel format.
val_ds = DirectoryIterator(
    "dataset/simpsons_train/",
    val_image_generator,
    class_mode='categorical',
    seed=1337,
    target_size=image_size,
    batch_size=batch_size,
    subset = 'validation',
    shuffle=False
)

# We save the list of classes (labels).
class_names = list(train_ds.class_indices.keys())

# We also save the number of labels.
num_classes = train_ds.num_classes


Found 17603 images belonging to 20 classes.
Found 1945 images belonging to 20 classes.
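
The two counts above add up to the 19,548 images of the dataset, with roughly 10% held out for validation. The snippet below is a minimal sketch of a per-class fractional split in plain Python. It assumes that the validation images are taken from the start of each class's sorted file list and that the count is truncated to an integer; the helper `split_per_class` and the file names are hypothetical, and this is not the Keras implementation itself.

```python
def split_per_class(files_by_class, validation_fraction):
    """Split each class's files into (train, validation) by a fixed fraction."""
    train, val = {}, {}
    for label, files in files_by_class.items():
        ordered = sorted(files)
        # Assumption: truncate, so small classes may contribute fewer images.
        n_val = int(validation_fraction * len(ordered))
        val[label] = ordered[:n_val]
        train[label] = ordered[n_val:]
    return train, val


# Hypothetical file lists, for illustration only.
files_by_class = {
    "bart_simpson": ["pic_%04d.jpg" % i for i in range(25)],
    "lisa_simpson": ["pic_%04d.jpg" % i for i in range(12)],
}

train, val = split_per_class(files_by_class, 0.1)
for label in files_by_class:
    print(label, len(train[label]), len(val[label]))
```

Splitting within each class, rather than over the whole dataset at once, keeps every character represented in the validation set in proportion to its number of images.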


# Exploring the data

Next, we explore the data by looking at sample images, their labels, and some summary statistics that help in understanding the dataset.
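
One simple statistic is the number of images per character, which shows how imbalanced the classes are. The following is a minimal sketch using only the standard library; it assumes the dataset was extracted to `dataset/simpsons_train/` with one sub-folder per character, and the helper `count_images_per_class` and the extension list are illustrative assumptions.

```python
from collections import Counter
from pathlib import Path

# Assumed set of image file extensions.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp"}


def count_images_per_class(root):
    """Count image files in each class sub-folder of `root`."""
    counts = Counter()
    root = Path(root)
    if not root.is_dir():
        # Nothing to count if the dataset has not been downloaded yet.
        return counts
    for class_dir in sorted(root.iterdir()):
        if class_dir.is_dir():
            counts[class_dir.name] = sum(
                1
                for f in class_dir.iterdir()
                if f.is_file() and f.suffix.lower() in IMAGE_EXTENSIONS
            )
    return counts


counts = count_images_per_class("dataset/simpsons_train/")
# Print classes from most to least frequent.
for name, n in counts.most_common():
    print(f"{name:30s} {n:5d}")
```

Classes with few images are good candidates for extra data collection or stronger augmentation.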

# Designing the model


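We now design the architecture for the task. The architecture below consists of:
1. `Rescaling layer:` whose role is to normalize the input data to values between 0 and 1. This helps speed up the training process.
2. `Pretrained convolutional base:` a DenseNet201 network with ImageNet weights and without its classification top, used to extract image features.
3. `Global average pooling layer:` whose role is to reduce the 3D feature volume to a single feature vector.
4. `Dense layers`: one dense layer with dropout, followed by a classification layer with a softmax activation function.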

Please note that you will have to design your own model if you want to beat the baseline and be at the top of the leaderboard!
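
The softmax activation in the classification layer turns the raw scores of the last dense layer into probabilities that sum to 1, one per character. The snippet below is a minimal sketch in plain Python, not the Keras implementation; the three input scores are hypothetical.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    # Subtracting the maximum keeps exp() from overflowing on large scores.
    shift = max(logits)
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for three classes.
probs = softmax([2.0, 1.0, 0.1])
print([round(p, 3) for p in probs])

# The predicted class is the index with the highest probability.
print(max(range(len(probs)), key=lambda i: probs[i]))
```

The model in this Notebook does the same thing over 20 scores, one per Simpsons character.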

In [None]:
# CONVOLUTIONAL PRETRAINED BASE

conv_base = tf.keras.applications.DenseNet201(weights='imagenet', include_top=False)


Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/densenet/densenet201_weights_tf_dim_ordering_tf_kernels_notop.h5


In [None]:
# Defining your model here:

from tensorflow.keras.layers import Dense, GlobalAveragePooling2D, Activation

inputs = keras.Input(shape=image_size + (3,))
x = inputs
# Rescale pixel values from [0, 255] to [0, 1]; the result must be assigned back to x.
x = layers.experimental.preprocessing.Rescaling(1./255)(x)


x = conv_base(x)
x = GlobalAveragePooling2D()(x)

#x = layers.Flatten()(x)
#x = layers.Dropout(0.7)(x)


x = Dense(1024, activation='relu')(x)
x = layers.Dropout(0.7)(x)
x = Dense(20)(x)
outputs = Activation(activation='softmax')(x)
model = keras.Model(inputs, outputs)


model.compile(optimizer=optimizers.RMSprop(lr=0.00001),
              loss='CategoricalCrossentropy',
              metrics=['accuracy'])

model.summary()

Model: "model"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_2 (InputLayer)         [(None, 224, 224, 3)]     0         
_________________________________________________________________
densenet201 (Model)          multiple                  18321984  
_________________________________________________________________
global_average_pooling2d (Gl (None, 1920)              0         
_________________________________________________________________
dense (Dense)                (None, 1024)              1967104   
_________________________________________________________________
dropout (Dropout)            (None, 1024)              0         
_________________________________________________________________
dense_1 (Dense)              (None, 20)                20500     
_________________________________________________________________
activation (Activation)      (None, 20)                0     
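
The parameter counts of the two dense layers in the summary above can be checked by hand: a dense layer has one weight per input-unit pair plus one bias per unit. The snippet below is a small worked check; the helper name `dense_params` is illustrative.

```python
def dense_params(n_inputs, n_units):
    """Number of trainable parameters in a dense layer: weights plus biases."""
    return n_inputs * n_units + n_units


# Pooled DenseNet201 features (1920) feeding the 1024-unit dense layer.
hidden = dense_params(1920, 1024)
# 1024-unit dense layer feeding the 20-class output layer.
output = dense_params(1024, 20)

print(hidden)  # 1967104
print(output)  # 20500

# Sum of the per-layer counts listed in the summary rows above.
print(18321984 + hidden + output)  # 20309588
```

Almost all parameters sit in the pretrained base, which is why fine-tuning it requires a small learning rate.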

In [None]:
conv_base.trainable = True


In [None]:
from google.colab import drive
drive.mount('/content/gdrive')


Mounted at /content/gdrive


In [None]:
model.compile(optimizer=optimizers.RMSprop(lr=0.00001),
              loss='CategoricalCrossentropy',
              metrics=['accuracy'])

In [None]:
# Create a callback that saves the model whenever validation accuracy improves.
model_name = "densenet201FMaug3"
model_save_name = "model_best_accuracy_" + model_name

highest_acc = 0

class high_acc_Callback(tf.keras.callbacks.Callback):
    def on_epoch_end(self, epoch, logs=None):
        global highest_acc
        current_acc = (logs or {}).get('val_accuracy')
        # Skip epochs with no validation accuracy or with no improvement.
        if current_acc is None or current_acc <= highest_acc:
            return
        print("Highest accuracy so far: ", current_acc)
        highest_acc = current_acc
        # Save the model to Google Drive, with the class names and image size
        # stored as attributes of the HDF5 file.
        path = '/content/gdrive/My Drive/' + "dnsnetaugSubt_" + "{:.5f}".format(current_acc) + '_model.h5'
        with h5py.File(path, mode='w') as f:
            hdf5_format.save_model_to_hdf5(self.model, f)
            f.attrs['class_names'] = class_names
            f.attrs['image_size'] = image_size

callbacks = high_acc_Callback()

history = model.fit(
  train_ds,
  epochs=50,
  validation_data=val_ds,
  callbacks=[callbacks]
  )



Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50
Epoch 16/50
Epoch 17/50
Epoch 18/50
Epoch 19/50
Epoch 20/50
Epoch 21/50
Epoch 22/50
Epoch 23/50
Epoch 24/50
Epoch 25/50
Epoch 26/50
Epoch 27/50
Epoch 28/50
Epoch 29/50
Epoch 30/50
Epoch 31/50
Epoch 32/50
Epoch 33/50
Epoch 34/50
Epoch 35/50
Epoch 36/50
Epoch 37/50
Epoch 38/50
Epoch 39/50
Epoch 40/50
Epoch 41/50
Epoch 42/50
Epoch 43/50
Epoch 44/50
Epoch 45/50
Epoch 46/50
Epoch 47/50
Epoch 48/50
Epoch 49/50
Epoch 50/50
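
To monitor training for overfitting, one can compare training and validation accuracy per epoch: a widening gap, or a validation accuracy that peaks and then declines, suggests the model is memorizing the training set. The snippet below is a minimal sketch; it assumes a dictionary shaped like `history.history` with `accuracy` and `val_accuracy` keys, and the numbers are hypothetical, not results from this Notebook.

```python
def summarize_history(history):
    """Report the best validation epoch and the final train/validation gap."""
    train_acc = history["accuracy"]
    val_acc = history["val_accuracy"]
    best_index = max(range(len(val_acc)), key=lambda i: val_acc[i])
    return {
        "best_epoch": best_index + 1,  # epochs are reported starting at 1
        "best_val_accuracy": val_acc[best_index],
        "final_gap": train_acc[-1] - val_acc[-1],
    }


# Hypothetical values, for illustration only.
example_history = {
    "accuracy":     [0.60, 0.80, 0.90, 0.96, 0.99],
    "val_accuracy": [0.58, 0.76, 0.84, 0.85, 0.83],
}

result = summarize_history(example_history)
print(result)
```

In this made-up example, validation accuracy peaks at epoch 4 while training accuracy keeps rising, which is the pattern to look for. With the real run, `summarize_history(history.history)` would give the corresponding values.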


# Acknowledgment


**Author:** [Mohamed Reda Bouadjenek](https://rbouadjenek.github.io/), Lecturer of Applied Artificial Intelligence

**Institution:** Deakin University, School of Information Technology, Faculty of Sci Eng & Built Env

**Address:** Locked Bag 20000, Geelong, VIC 3220

**Phone:** +61 3 522 78380

**Email:** reda.bouadjenek@deakin.edu.au

**www.deakin.edu.au**

<div>
<img style="float: left;" src="https://github.com/rbouadjenek/deakin-simpsons-challenge2020/blob/main/images/deakin2.png?raw=1" width="200" >
</div>
<br>
<br>
<br>
<br>

<div>  <a href="https://twitter.com/DeakinAI2021" > <img style="float: left;" src="https://irisconnect.com/uk/wp-content/uploads/sites/3/2020/12/twitter-Follow-us-button.png" width="200" > </a>
</div>