# CNN Exercise 2: The Kaggle Cats and Dogs challenge

## Introduction

In this exercise we will try to solve an actual challenge, namely the Kaggle Cats and Dogs challenge https://www.kaggle.com/c/dogs-vs-cats. The challenge is to determine whether it is a cat or a dog that is in the image. As we know now, this is not straightforward using "classical" machine learning approaches. However, convolutional neural networks provide improved performance coupled with spatial invariance, which solves major previous obstacles. Several of the submitted solutions can be investigated here: https://www.kaggle.com/c/dogs-vs-cats/kernels. For example, there is one with a good explanation of CNNs, if you need a recap, here: https://www.kaggle.com/ruchibahl18/cats-vs-dogs-basic-cnn-tutorial.

One word of advice for this exercise is to be patient. Some of the models will take quite a while to train, unless you are using GPU acceleration. It is suggested that you implement the code while using a very limited number of epochs. Once all the code runs, set the epochs to, e.g., 100, and let your computer train overnight. This is similar to how working with deep learning often is. Do note, however, that the training procedure can be quite noisy, so even if you let it run for 5 epochs, you might not see much improvement. Try letting it train for 20-30 epochs and see if you witness improvements.

If you have an NVIDIA GPU, you can significantly reduce the training time by installing TensorFlow with GPU support. It can be a hassle though, and if you choose to do so, be absolutely sure that the version of TensorFlow works with the versions of CUDA and cuDNN you install (the libraries required to support GPU acceleration). An alternative which might work is Google Colab (https://colab.research.google.com/notebooks/welcome.ipynb). I have not yet tried it out, so I do not know how much of a help it is, but you can take a look. They offer GPU acceleration for free.

You also need to install the split-folders package (https://pypi.org/project/split-folders/), which is used below to split the images into training and test directories.


In [None]:
# Scientific and vector computation for python
import numpy as np
np.random.seed(42)  # Set the global random seed to make experiments reproducible (scikit-learn also uses this)

# Used for manipulating directory paths
import os

# Library to handle images
from PIL import Image, ImageOps

# Used to delete directories
import shutil

# Unzip files
import zipfile

# Split a directory of images into two directories containing train and test images
import split_folders

# Deep learning framework
from keras.models import Sequential  # Create models sequentially
from keras.layers import Dense, Dropout, BatchNormalization, Flatten, Conv2D, MaxPooling2D  # Relevant layers
from keras.optimizers import Adam  # Optimizer for gradient descent
from keras.backend import clear_session  # Delete previous models
from keras.preprocessing.image import ImageDataGenerator  # To feed the model with images during training

# Set the global random seed for TensorFlow to make reproducible experiments
from tensorflow import set_random_seed
set_random_seed(42)

# Ignore warning for corrupt EXIF data in the images
import warnings
warnings.filterwarnings("ignore", "(Possibly )?corrupt EXIF data", UserWarning)

# Plotting library
import matplotlib.pyplot as plt  
%matplotlib inline

## 1 Get and preprocess dataset

The dataset can be downloaded here: https://www.microsoft.com/en-us/download/details.aspx?id=54765. Put the downloaded zip file in the same directory as the Jupyter notebook, then execute the code below.

In [None]:
# Check if data has already been processed
if not os.path.exists(os.path.join('Data', 'kagglecatsanddogs', 'processed')):
    print("Processing data.")
    
    # Delete any previous 'kagglecatsanddogs' datasets, if they should be present
    shutil.rmtree(os.path.join('Data', 'kagglecatsanddogs'), ignore_errors=True)

    # Unzip images
    with zipfile.ZipFile("kagglecatsanddogs_3367a.zip","r") as zip_ref:
        zip_ref.extractall(os.path.join('Data', 'kagglecatsanddogs', 'raw'))

    # Remove two corrupt images
    os.remove(os.path.join('Data', 'kagglecatsanddogs', 'raw', 'PetImages', 'Cat', '666.jpg'))
    os.remove(os.path.join('Data', 'kagglecatsanddogs', 'raw', 'PetImages', 'Dog', '11702.jpg'))

    # Split dataset into train and test set
    split_folders.ratio(os.path.join('Data', 'kagglecatsanddogs', 'raw', 'PetImages'), 
                        output=os.path.join('Data', 'kagglecatsanddogs', 'processed'), 
                        seed=42, 
                        ratio=(.7, 0, .3))
else:
    print("It seems like the data has already been processed.")
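
Once the split has run, it can be worth checking how many files ended up in each class directory. The following is only a minimal sketch: `count_images` is a hypothetical helper (not part of split-folders), and it assumes the `train` and `test` subdirectories each contain one folder per class, as the generators further down expect. It counts every file it finds, so any stray non-image files would be included.

```python
import os


def count_images(split_dir):
    """Return {class_name: number_of_files} for one split directory (hypothetical helper)."""
    counts = {}
    for class_name in sorted(os.listdir(split_dir)):
        class_dir = os.path.join(split_dir, class_name)
        if os.path.isdir(class_dir):
            counts[class_name] = len(os.listdir(class_dir))
    return counts


processed_dir = os.path.join('Data', 'kagglecatsanddogs', 'processed')
for split in ('train', 'test'):
    split_dir = os.path.join(processed_dir, split)
    if os.path.isdir(split_dir):  # Only report splits that actually exist
        print(split, count_images(split_dir))
```

If the totals differ from the `train_samples` and `test_samples` values used later, the step calculations for training should be adjusted accordingly.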

## 2 Create your own CNN in Keras to classify cats/dogs

First, let us set some hyperparameters. One important note regarding epochs: in this exercise, an epoch only goes through 1/50 of the available data. This differs from the usual definition, in which one epoch is a single pass through the entire training dataset. The reason for the difference is that we want Keras to evaluate on the test samples and record the accuracy and loss more often than once per "actual" epoch.

In [None]:
# Dataset details
train_samples = 17498
test_samples = 7500

# Hyperparameters
img_width, img_height = 128, 128  # Size you want to rescale images to
batch_size = 32
epochs = 30  # In this exercise, 50 epochs correspond to training on the entire dataset once
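
To make the shortened epochs concrete, the arithmetic can be worked through by hand. This is a sketch using the numbers above; rounding the step counts up with `math.ceil` is an assumption made here for illustration, since the training cells pass the unrounded quotient directly.

```python
import math

train_samples = 17498
test_samples = 7500
batch_size = 32
fraction = 50  # Each shortened epoch covers roughly 1/50 of the data

# Number of batches needed to see every training image once
batches_per_full_pass = train_samples / batch_size

# Batches per shortened epoch, rounded up to a whole number (assumption)
train_steps = math.ceil(batches_per_full_pass / fraction)
test_steps = math.ceil(test_samples / batch_size / fraction)

print("Training batches per shortened epoch:", train_steps)
print("Test batches per shortened epoch:", test_steps)
print("Training images seen per shortened epoch:", train_steps * batch_size)
```

With these values, each shortened epoch trains on a few hundred images, so about 50 of them are needed for one full pass over the training set.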

In this exercise, we are not loading the data into the computer's RAM, which we have done in all exercises until now. This is to demonstrate how we could facilitate training of deep learning models on hundreds of gigabytes of data which have to be stored on the SSD. We will use the ImageDataGenerator class, which implements a method to fetch images from directories on the SSD. If you are not familiar with generators in Python, then do not worry, as they are not very relevant for this exercise. Just think of them as functions which provide you with a new batch of images every time you call them.
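
For those curious, the idea of a generator can be illustrated with a toy example. This is not how ImageDataGenerator is implemented; `batch_generator` is a hypothetical function that simply hands out consecutive slices of an array and starts over when it reaches the end.

```python
import numpy as np


def batch_generator(data, batch_size):
    """Yield consecutive batches forever, wrapping around at the end of the data."""
    start = 0
    while True:
        yield data[start:start + batch_size]
        start += batch_size
        if start >= len(data):
            start = 0  # Begin a new pass over the data


samples = np.arange(10)  # Stand-in for ten images
gen = batch_generator(samples, batch_size=4)

first = next(gen)   # Samples 0-3
second = next(gen)  # Samples 4-7
third = next(gen)   # Samples 8-9 (the last batch is smaller)
fourth = next(gen)  # Samples 0-3 again
print(first, second, third, fourth)
```

Only one batch exists in memory at a time, which is what makes this pattern suitable for datasets that do not fit in RAM.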

The ImageDataGenerator class also has built-in functionality for performing real-time data augmentation. Data augmentation is the concept of applying transformations to your training data, to make them "appear" as new training data. For instance, if you flip an image of a dog horizontally, the neural network will see it as a new dog, and not recognize that it is simply a flipped version of the initial image. So much for neural networks being highly sophisticated AI machines. Jokes aside, data augmentation is a powerful ally when training deep neural networks, especially if you do not have massive amounts of training data. In this exercise, however, we will not investigate it further.   
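
As a small illustration of the flipping example, the sketch below mirrors an image array with NumPy. The random array is only a stand-in for a real photo. In Keras, the corresponding option would be `ImageDataGenerator(horizontal_flip=True)`, which is not used in this exercise.

```python
import numpy as np

rng = np.random.RandomState(42)
image = rng.rand(128, 128, 3)  # Stand-in for an RGB image: (height, width, channels)

# Reverse the width axis to mirror the image horizontally
flipped = image[:, ::-1, :]

print("Same shape:", flipped.shape == image.shape)
print("Identical pixels in identical positions:", np.array_equal(flipped, image))
print("Flipping twice restores the original:", np.array_equal(flipped[:, ::-1, :], image))
```

The flipped array contains exactly the same pixel values, only in different positions, which is why the network treats it as a new sample.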

What you should note is that in the entirety of this exercise, cats are assigned to class 0 and dogs are assigned to class 1. So if the classifier outputs 0, it means that it is completely confident that there is a cat in the image. Likewise, if it outputs 1, it is completely confident that there is a dog in the image.
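
The class convention can be summarized in a few lines. `score_to_label` is a hypothetical helper written for illustration; it uses the same 0.5 threshold as the prediction code in Section 4, and the "confidence" it reports is simply the score itself for dogs and one minus the score for cats.

```python
def score_to_label(score, threshold=0.5):
    """Map a sigmoid output in [0, 1] to a class name and a confidence value."""
    label = "Dog" if score > threshold else "Cat"
    confidence = score if label == "Dog" else 1.0 - score
    return label, confidence


for score in (0.02, 0.45, 0.9):
    label, confidence = score_to_label(score)
    print("score", score, "->", label, "with confidence", round(confidence, 2))
```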

In [None]:
# Only rescaling is applied here; no data augmentation is used in this exercise
datagen = ImageDataGenerator(rescale=1./255)  # Re-scale images to pixel values between 0 and 1


train_generator = datagen.flow_from_directory('Data/kagglecatsanddogs/processed/train',
                                              target_size=(img_width, img_height),
                                              batch_size=batch_size,
                                              class_mode='binary')

test_generator = datagen.flow_from_directory('Data/kagglecatsanddogs/processed/test',
                                             target_size=(img_width, img_height),
                                             batch_size=batch_size,
                                             class_mode='binary')

# Plot some examples from the training dataset
plt.figure(figsize=(15, 15))
for X_batch, Y_batch in train_generator:
    for i in range(0, 20):
        plt.subplot(5, 4, i+1)
        image = X_batch[i]
        plt.title("Belongs to class " + str(int(Y_batch[i])))
        plt.axis('off')
        plt.imshow(image)
    break  # Break the for loop after a single batch
plt.tight_layout()
plt.show()

## 3 Define models

### 3.1 Your own CNN from scratch

Try defining a reasonable CNN architecture for the task at hand. Remember that this challenge is to build a binary classifier, so the last layer should be a dense layer with 1 neuron, employing the sigmoid activation function.
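
To see why a single sigmoid neuron fits a binary problem, here is a NumPy sketch of the sigmoid function together with the binary cross-entropy loss that is used when compiling the model. The input values are made up for illustration, and the clipping constant `eps` is an assumption to avoid taking the logarithm of zero; this is not Keras's own implementation.

```python
import numpy as np


def sigmoid(z):
    """Squash any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy between labels (0 or 1) and predicted scores."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)  # Avoid log(0)
    losses = -(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
    return float(np.mean(losses))


logits = np.array([-2.0, 0.0, 3.0])  # Made-up outputs of the last neuron before the sigmoid
labels = np.array([0.0, 1.0, 1.0])   # 0 = cat, 1 = dog

scores = sigmoid(logits)
loss = binary_crossentropy(labels, scores)
print("Scores:", scores.round(3))
print("Loss:", round(loss, 3))
```

The loss shrinks towards zero as the scores move towards the correct labels, and grows quickly when the model is confidently wrong.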

In [None]:
clear_session()  # Delete any existing models

# Define model
# ====================== YOUR CODE HERE =======================


# =============================================================

model.compile(optimizer=Adam(lr=1e-5),
              loss='binary_crossentropy',
              metrics=['acc'])  # Named 'acc' so the history keys match the plotting code below


model.summary()

Now try training the classifier you just made.

In [None]:
history = model.fit_generator(train_generator,
                              steps_per_epoch=train_samples/batch_size/50,
                              epochs=epochs,
                              validation_data=test_generator,
                              validation_steps=test_samples/batch_size/50)

model.save("model_Custom.h5")

Let us plot the training history.

In [None]:
# Plot training & validation accuracy values
plt.plot(history.history['acc'])
plt.plot(history.history['val_acc'])
plt.title('Model accuracy')
plt.ylabel('Accuracy')
plt.xlabel('Epoch')
plt.legend(['Train', 'Test'], loc='upper left')
plt.show()

# Plot training & validation loss values
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('Model loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend(['Train', 'Test'], loc='upper left')
plt.show()

As you probably realized, this training procedure can take a long time! Once you feel ready to let your computer train for longer, go back to the beginning of Section 2, where we set our hyperparameters, and increase the number of epochs you want to train for. You might need to train for hundreds of epochs before the model converges to a good solution.
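
Because each shortened epoch only evaluates on a small part of the data, the curves can look noisy. One optional way to make trends easier to see is to smooth them with a moving average before plotting. This is a sketch with made-up accuracy values; `moving_average` is a hypothetical helper, and the window size is an arbitrary choice.

```python
import numpy as np


def moving_average(values, window=5):
    """Average each run of `window` consecutive values; short inputs are returned unchanged."""
    values = np.asarray(values, dtype=float)
    if len(values) < window:
        return values
    kernel = np.ones(window) / window
    return np.convolve(values, kernel, mode='valid')


# Made-up validation accuracies, standing in for history.history['val_acc']
noisy = [0.50, 0.62, 0.55, 0.70, 0.64, 0.75, 0.69, 0.80]
smooth = moving_average(noisy, window=3)
print(smooth.round(3))
```

Note that the smoothed curve is `window - 1` points shorter than the original, so the x-axis no longer lines up exactly with the epoch numbers.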

### 3.2 Use VGG16 pre-trained feature extractor 

Remember that the convolutional layers in a CNN act as a feature extractor, which transforms the input image into an improved data representation for the classifier to function optimally. So instead of training our neural network from scratch, it makes sense to take a pre-trained feature extractor from another architecture, and then add a neural network classifier on top of it. In the code below, a trained version of the feature extractor from the VGG16 architecture (https://arxiv.org/pdf/1409.1556.pdf) is loaded. It is made trainable, so once you train your entire model, the feature extractor will also be fine-tuned. Your task is now to add a neural network classifier on top of the feature extractor, with two hidden layers, each with 256 neurons, using ReLU activation functions. Remember that the last layer should be a dense layer with 1 neuron employing the sigmoid activation function.

In [None]:
from keras.applications.vgg16 import VGG16
clear_session()  # Delete any existing models

model = Sequential()

feature_extractor = VGG16(weights='imagenet',  # Use weights trained on ImageNet
                          include_top=False,  # Do not include the classifier from the model
                          input_shape=(img_width, img_height, 3))

feature_extractor.trainable=True  # Make it trainable, such that it will fine-tune as you train

model.add(feature_extractor) 
model.add(Flatten())
# ====================== YOUR CODE HERE =======================


# =============================================================

model.compile(loss='binary_crossentropy', 
              optimizer=Adam(lr=1e-5), # The model will not converge if a larger learning rate is chosen
              metrics=['acc'])

model.summary()
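
The numbers reported by the summary for the classifier part can be sanity-checked by hand. The sketch below assumes the classifier is built exactly as described (two hidden dense layers with 256 neurons and one output neuron, with no extra layers that carry parameters) and relies on the fact that VGG16 halves the image size five times and ends with 512 feature maps.

```python
def dense_params(n_inputs, n_units):
    """Weights plus biases of a fully connected layer."""
    return n_inputs * n_units + n_units


img_side = 128
side_after_pooling = img_side // 2 ** 5  # Five 2x2 max-pooling layers in VGG16
flattened_features = side_after_pooling * side_after_pooling * 512

# (inputs, units) for each dense layer of the assumed classifier
classifier_layers = [(flattened_features, 256), (256, 256), (256, 1)]
classifier_total = sum(dense_params(n_in, n_out) for n_in, n_out in classifier_layers)

print("Flattened features:", flattened_features)
print("Classifier parameters:", classifier_total)
```

Almost all of the classifier's parameters sit in the first dense layer, because it connects every flattened feature to every one of the 256 neurons.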

Now try training the classifier you just made.

In [None]:
history_vgg16 = model.fit_generator(
                            train_generator,
                            steps_per_epoch=train_samples/batch_size/50,
                            epochs=epochs,
                            validation_data=test_generator,
                            validation_steps=test_samples/batch_size/50)

model.save("model_VGG16.h5")

Let us plot the training history.

In [None]:
# Plot training & validation accuracy values
plt.plot(history_vgg16.history['acc'])
plt.plot(history_vgg16.history['val_acc'])
plt.title('Model accuracy')
plt.ylabel('Accuracy')
plt.xlabel('Epoch')
plt.legend(['Train', 'Test'], loc='upper left')
plt.show()

# Plot training & validation loss values
plt.plot(history_vgg16.history['loss'])
plt.plot(history_vgg16.history['val_loss'])
plt.title('Model loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend(['Train', 'Test'], loc='upper left')
plt.show()

Hopefully, you will see that this approach trains faster. If you would like, you can take a look at the VGG16 feature extractor by running feature_extractor.summary():

In [None]:
feature_extractor.summary()

### 3.3 Use pre-trained MobileNet feature extractor

Similar to before, we will now add a classifier on top of the MobileNet architecture. This architecture is a tremendously fast classifier, made for mobile phones. If you have plenty of time, you can take a look at [TensorFlow Lite](https://www.tensorflow.org/lite), and see how MobileNet can be used to continuously [classify what it sees on the back camera of your phone](https://github.com/tensorflow/examples/blob/master/lite/examples/image_classification/android/README.md).

Your task is, like before, to add a neural network classifier on top of the feature extractor, with two hidden layers, each with 256 neurons, using ReLU activation functions. The last layer should be a dense layer with 1 neuron employing the sigmoid activation function.

In [None]:
from keras.applications.mobilenet import MobileNet
clear_session()  # Delete any existing models

model = Sequential()

feature_extractor = MobileNet(weights='imagenet',  # Use weights which have been pre-trained on ImageNet
                              include_top=False,  # Do not include the classification part of the model
                              input_shape=(img_width, img_height, 3))

feature_extractor.trainable=True 
model.add(feature_extractor) 
model.add(Flatten())

# ====================== YOUR CODE HERE =======================


# =============================================================

model.compile(loss='binary_crossentropy', 
              optimizer=Adam(lr=1e-5),  # Adam has a quite large default learning rate... 
              metrics=['acc'])

model.summary()

Now try training the classifier you just made.

In [None]:
history_mobilenet = model.fit_generator(train_generator,
                              steps_per_epoch=train_samples/batch_size/50,
                              epochs=epochs,
                              validation_data=test_generator,
                              validation_steps=test_samples/batch_size/50)

model.save("model_MobileNet.h5")

Let us plot the training history.

In [None]:
# Plot training & validation accuracy values
plt.plot(history_mobilenet.history['acc'])
plt.plot(history_mobilenet.history['val_acc'])
plt.title('Model accuracy')
plt.ylabel('Accuracy')
plt.xlabel('Epoch')
plt.legend(['Train', 'Test'], loc='upper left')
plt.show()

# Plot training & validation loss values
plt.plot(history_mobilenet.history['loss'])
plt.plot(history_mobilenet.history['val_loss'])
plt.title('Model loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend(['Train', 'Test'], loc='upper left')
plt.show()

## 4 Investigate trained models

Try running the code below, using one of your three newly trained models ("Custom", "VGG16", or "MobileNet"). You should now also be able to predict images downloaded from the internet. Try downloading some .jpg images of cats and dogs, and put them in the 'Data/downloaded_samples' directory, and see if your model can recognize whether they are cats or dogs. Remember that a score of 0 corresponds to absolute certainty that it is a cat, and a score of 1 corresponds to absolute certainty that it is a dog.

In [None]:
from keras.models import load_model
clear_session()  # Delete any existing models

# ====================== YOUR CODE HERE =======================


# =============================================================

# Load the saved model
print("Loading the " + model_name + " model.")
model = load_model("model_" + model_name + ".h5")

# Load a batch of images using the test_generator and predict them using your model
print("Predicting images using the test generator:")
plt.figure(figsize=(15, 15))
for X_batch, Y_batch in test_generator:
    # Predict cats and dogs in the batch
    predictions = model.predict(X_batch)
    
    # Plot the images and the predictions
    for i in range(0, 20):
        plt.subplot(5, 4, i+1)
        image = X_batch[i]
        plt.axis('off')
        plt.imshow(image)
        if predictions[i] > 0.5:
            prediction = "Dog"
        else:
            prediction = "Cat"
        plt.title("Prediction: " + prediction + " with score " + str(predictions[i][0].round(2)))
    break  # Break the for loop after a single batch
plt.tight_layout()
plt.show()

# Predict downloaded images
filenames = os.listdir(os.path.join('Data', 'downloaded_samples'))  # Find filenames of downloaded images
print("Predicting downloaded images:")
for filename in filenames:    
    # Predict and show image
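    # Convert to RGB so grayscale or transparent images also get exactly 3 channels
    image = Image.open(os.path.join('Data', 'downloaded_samples', filename)).convert('RGB')
    
    # Crop and resize to the input size of the network
    image = ImageOps.fit(image, (img_width, img_height), Image.LANCZOS)
    
    # Scale to values between 0 and 1, and add a batch dimension before predicting
    image_np = np.array(image)/255
    prediction_score = model.predict(np.expand_dims(image_np, axis=0))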
    
    # Plot image and prediction
    plt.figure()
    plt.imshow(image)
    plt.axis('off')
    plt.tight_layout()
    
    if prediction_score > 0.5:
        prediction = "Dog"
    else:
        prediction = "Cat"
    plt.title("Prediction: " + prediction + " with score " + str(prediction_score[0][0].round(2)))
    
    plt.show()

This concludes CNNs. We have skipped one important factor, however, which is the validation dataset. Once you want to do hyperparameter optimization, where you optimize over architecture choices, learning rate, dropout rate etc., you should split your data into training, validation, and test data. The validation data should be used to tune your choices for the model, which is how we used the test data in this exercise. The test dataset should only be seen by the model at the very end, after hyperparameter optimization, and then be used for evaluation. Several datasets have not made the test data publicly available, and if you compete in a Kaggle competition, you will not get access to the test data. Instead, you submit your model, and they evaluate it on data which it has never seen before.
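
A three-way split can be sketched in plain Python as follows. The 70/15/15 ratios are arbitrary values chosen for illustration, and `three_way_split` is a hypothetical helper. In this notebook, the same effect could presumably be obtained by giving split_folders a non-zero middle entry in its `ratio` argument, which is set to `(.7, 0, .3)` in Section 1.

```python
import random


def three_way_split(items, ratios=(0.7, 0.15, 0.15), seed=42):
    """Shuffle the items and split them into training, validation, and test lists."""
    items = list(items)
    random.Random(seed).shuffle(items)  # Seeded, so the split is reproducible
    n_train = round(ratios[0] * len(items))
    n_val = round(ratios[1] * len(items))
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]  # Whatever remains is held out for the final evaluation
    return train, val, test


train, val, test = three_way_split(range(100))
print(len(train), len(val), len(test))
```

The important property is that the three lists never overlap, so the test items stay untouched until the very end.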