# Part 2: Dogs-vs-Cats training, validation, predictions and saving

We prepared the dataset in the [Part 1 notebook](Part1_dataset_prep.ipynb) so that it is compatible with the `.flow_from_directory()` method.

Now we are ready to start training. We begin by importing the Keras modules that are required...

+ `import Adam` - we will be using the Adam (Adaptive Moment Estimation) optimizer.
+ `import binary_crossentropy` - there are only two classes, so we can use binary cross-entropy rather than softmax cross-entropy.
+ `import TensorBoard, EarlyStopping` - these callbacks will be used to gather TensorBoard data and to stop training if the validation accuracy no longer increases over a set number of epochs.
+ `import ImageDataGenerator` - the ImageDataGenerator will produce batches of augmented data.

In [None]:
import keras
from keras.optimizers import Adam
from keras.losses import binary_crossentropy
from keras.callbacks import TensorBoard, EarlyStopping
from keras.preprocessing.image import ImageDataGenerator

import numpy as np
import pandas as pd
import os
import shutil
import sys

from customCNN import customCNN


# Hide TensorFlow INFO messages (warnings and errors are still shown)
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '1'

Next we set up the folders for storing the trained Keras model, the TensorBoard logs and the augmented images, and delete any previous results.

In [None]:
SCRIPT_DIR = os.getcwd()
print('This script is located in: ', SCRIPT_DIR)

TRAIN_DIR = os.path.join(SCRIPT_DIR, 'dataset/train')
VALID_DIR = os.path.join(SCRIPT_DIR, 'dataset/valid')
TEST_DIR = os.path.join(SCRIPT_DIR, 'dataset/test')

# Augmented images folder
AUG_IMG_DIR = os.path.join(SCRIPT_DIR,'aug_img')

# Keras model folder
KERAS_MODEL_DIR = os.path.join(SCRIPT_DIR, 'keras_model')

# TensorBoard folder
TB_LOG_DIR = os.path.join(SCRIPT_DIR, 'tb_logs')

# remove previous results and recreate folders
dir_list = [KERAS_MODEL_DIR, TB_LOG_DIR, AUG_IMG_DIR]
 
# 'folder' is used as the loop variable to avoid shadowing the built-in dir()
for folder in dir_list:
    if os.path.exists(folder):
        shutil.rmtree(folder)
    os.makedirs(folder)
    print("Directory", folder, "created")

if (os.path.exists('results.csv')):
    os.remove('results.csv')

The training parameters are set here. Note that we are very unlikely to reach 100 epochs because of the early stopping callback.

In [None]:
EPOCHS = 100

# batchsizes for training & validation, batchsize for prediction is 1
TRAIN_BATCHSIZE = 64
VAL_BATCHSIZE = 64

# optimizer learning rate & decay rate
# (note: model.compile() below passes decay=0.0, so DECAY_RATE is not applied)
LEARN_RATE = 0.0001
DECAY_RATE = LEARN_RATE/10.0
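
As a rough illustration of what a decay setting does: to the best of our knowledge, the legacy Keras 2.x optimizers shrink the learning rate once per weight update (i.e. once per batch) using `lr / (1 + decay * iterations)`. The sketch below reproduces that formula in plain Python. The helper name `decayed_lr` and the update counts are illustrative assumptions and are not part of this notebook.

```python
LEARN_RATE = 0.0001
DECAY_RATE = LEARN_RATE / 10.0


def decayed_lr(base_lr, decay, iterations):
    """Time-based decay: the rate falls as the number of batch updates grows."""
    return base_lr / (1.0 + decay * iterations)


# Hypothetical update counts, chosen only to show the trend
for step in (0, 10_000, 100_000):
    print(step, decayed_lr(LEARN_RATE, DECAY_RATE, step))
```

With `decay=0.0` the formula returns the base learning rate unchanged at every step.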

The real-time image augmentation includes a resizing of the images. We set the image size to be 200 x 200 pixels. All images will be resized before being used in training, validation and prediction. Note that the original images in the Kaggle dataset are of differing sizes and are not usually square, so resizing them in this way will lead to some distortion.

In [None]:
IMAGE_HEIGHT = 200
IMAGE_WIDTH = 200
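
To make the distortion concrete, the sketch below computes the horizontal and vertical scale factors that a plain resize to 200 x 200 applies. The 500 x 375 source size is a hypothetical example and the helper `scale_factors` is not part of the notebook; unequal factors mean the aspect ratio is not preserved.

```python
IMAGE_HEIGHT = 200
IMAGE_WIDTH = 200


def scale_factors(orig_width, orig_height):
    """Return the (horizontal, vertical) scale applied by a plain resize."""
    return IMAGE_WIDTH / orig_width, IMAGE_HEIGHT / orig_height


# Hypothetical 500 x 375 landscape photo
sx, sy = scale_factors(500, 375)
print("horizontal scale:", sx)
print("vertical scale  :", round(sy, 3))
print("aspect ratio preserved:", sx == sy)
```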

The description of our CNN is contained in the customCNN.py script and uses the Keras Functional API. The CNN is *fully-convolutional* - the dense or fully-connected layers have been replaced with convolutional layers that have their kernel sizes, number of filters and stride lengths set such that they create output shapes that mimic the output shapes of dense/FC layers.

There are no pooling layers - these have also been replaced with convolutional layers that have their kernel size and strides set to the same value which is > 1.
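
The effect of such layers on the feature-map size can be checked with the standard convolution output-size arithmetic. This is a minimal sketch: the kernel sizes, strides and padding modes shown are assumptions for illustration, since the actual values live in `customCNN.py`.

```python
import math


def conv_output_size(size, kernel, stride, padding="valid"):
    """Spatial output size of a convolution along one dimension."""
    if padding == "same":
        return math.ceil(size / stride)
    return (size - kernel) // stride + 1


# Hypothetical downsampling layer: kernel size and stride both 2,
# which halves a 200-pixel dimension just as 2x2 pooling would
print(conv_output_size(200, kernel=2, stride=2))
print(conv_output_size(200, kernel=2, stride=2, padding="same"))

# Hypothetical 'dense-like' layer: a kernel as large as its input
# collapses the spatial dimension to a single value
print(conv_output_size(25, kernel=25, stride=1))
```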

The output activation layer is a sigmoid function as we only have two classes: if the output of the sigmoid is > 0.5, the predicted class is 'dog'; otherwise the prediction is 'cat'.

The CNN has deliberately been kept simple (it only has 8 convolutional layers) so the expected prediction accuracy will not be higher than approximately 90%. To reduce overfitting, batch normalization layers have been used and also L2 kernel regularization.

In [None]:
model = customCNN(input_shape=(IMAGE_HEIGHT, IMAGE_WIDTH, 3))

# print a summary of the model
# (summary() prints directly and returns None, so it is not wrapped in print())
model.summary()
print("Model Inputs: {ips}".format(ips=(model.inputs)))
print("Model Outputs: {ops}".format(ops=(model.outputs)))

Next we declare two instances of the ImageDataGenerator class. The first, `datagen_tv`, will perform image augmentation for training and validation. The second, `datagen_p`, will be used for prediction.

The image augmentation for training and validation is performed on-the-fly and is composed of:

+ normalization of the 8-bit pixel data from the range 0:255 to the range 0:1.0
+ a random rotation of up to 5°
+ random horizontal flipping, i.e. flipping about the vertical axis to produce a mirror image
+ random horizontal and vertical shifts of the image by up to 10% of the image size (200 x 200 in this case)

We only use pixel normalization for the prediction augmentation.

In [None]:
datagen_tv = ImageDataGenerator(rescale=1/255,
                                rotation_range=5,
                                horizontal_flip=True,
                                height_shift_range=0.1,
                                width_shift_range=0.1
                                )

# data generation for prediction - only rescaling
datagen_p = ImageDataGenerator(rescale=1/255)
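
The rescaling and shift settings used above can be turned into concrete numbers with a little arithmetic. This is only a sketch of the ranges involved, not of how Keras implements the transforms; the constant names `RESCALE` and `SHIFT_FRACTION` are introduced here for illustration.

```python
IMAGE_HEIGHT = 200
IMAGE_WIDTH = 200

RESCALE = 1 / 255        # same factor as rescale=1/255
SHIFT_FRACTION = 0.1     # same value as the width/height shift ranges

# 8-bit pixel extremes after rescaling
print("pixel   0 ->", 0 * RESCALE)
print("pixel 255 ->", 255 * RESCALE)

# Largest shift, in pixels, implied by a fractional shift range
max_dx = SHIFT_FRACTION * IMAGE_WIDTH
max_dy = SHIFT_FRACTION * IMAGE_HEIGHT
print("max horizontal shift:", max_dx, "pixels")
print("max vertical shift  :", max_dy, "pixels")
```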

In [None]:
# train generator takes images from the specified directory, applies
# a resize to 200x200 with bilinear interpolation.
train_generator = datagen_tv.flow_from_directory(TRAIN_DIR,
                                                 target_size=(IMAGE_HEIGHT, IMAGE_WIDTH),
                                                 interpolation='bilinear',
                                                 batch_size=TRAIN_BATCHSIZE,
                                                 class_mode='binary',
                                                 shuffle=True,
                                                 seed=42
                                                 )
'''
uncomment save_to_dir=AUG_IMG_DIR to save the augmented images
note that this will take up considerable disk space
'''
validation_generator = datagen_tv.flow_from_directory(VALID_DIR,
                                                      target_size=(IMAGE_HEIGHT, IMAGE_WIDTH),
                                                      interpolation='bilinear',
                                                      batch_size=VAL_BATCHSIZE,
                                                      class_mode='binary',
                                                      shuffle=True,
                                                    # save_to_dir=AUG_IMG_DIR
                                                      )


prediction_generator = datagen_p.flow_from_directory(TEST_DIR,
                                                     target_size=(IMAGE_HEIGHT, IMAGE_WIDTH),
                                                     interpolation='bilinear',
                                                     batch_size=1,
                                                     class_mode='binary',
                                                     shuffle=False)



##############################################
# Compile model
##############################################
# Adam optimizer to change weights & biases
# Loss function is binary crossentropy
model.compile(optimizer=Adam(lr=LEARN_RATE, decay=0.0),
              loss='binary_crossentropy',
              metrics=['binary_accuracy'])

Here we set up two Callbacks. The first is for collecting TensorBoard data. The second defines a means for halting the training if the validation accuracy stops improving for 5 epochs. Once training stops, the model parameters from the epoch that gave the best results in terms of validation accuracy are restored.

In [None]:
# create Tensorboard callback
tb_call = TensorBoard(log_dir=TB_LOG_DIR,
                      batch_size=TRAIN_BATCHSIZE)

earlystop_call = EarlyStopping(monitor='val_binary_accuracy',
                               mode='max',
                               min_delta=0.0001,
                               patience=5,
                               restore_best_weights=True)


callbacks_list = [tb_call, earlystop_call]
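
The patience logic configured above can be sketched without any framework. This is a simplified illustration of how we understand the callback to behave with `mode='max'`, not the Keras source; the function name `early_stopping_epoch` and the accuracy values are invented for the example.

```python
def early_stopping_epoch(val_acc_history, patience=5, min_delta=0.0001):
    """Return (stop_epoch, best_epoch) for a list of per-epoch accuracies.

    Epochs are numbered from 0. stop_epoch is None if training never
    stops early.
    """
    best = float("-inf")
    best_epoch = None
    wait = 0
    for epoch, acc in enumerate(val_acc_history):
        if acc - min_delta > best:
            # improvement larger than min_delta: remember it, reset the counter
            best, best_epoch, wait = acc, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return None, best_epoch


# Hypothetical validation accuracies, one per epoch
history = [0.70, 0.78, 0.83, 0.85, 0.849, 0.85, 0.848, 0.85, 0.847, 0.846]
stop_epoch, best_epoch = early_stopping_epoch(history)
print("stopped at epoch:", stop_epoch)
print("weights restored from epoch:", best_epoch)
```

Note that matching the previous best (0.85 again) does not count as an improvement, because the gain must exceed `min_delta`.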

Now we run the training.

In [None]:
# calculate number of steps in one training epoch
train_steps = train_generator.n//train_generator.batch_size

# calculate number of steps in one validation epoch
val_steps = validation_generator.n//validation_generator.batch_size

# run training
train_history=model.fit_generator(generator=train_generator,
                                  epochs=EPOCHS,
                                  steps_per_epoch=train_steps,
                                  validation_data=validation_generator,
                                  validation_steps=val_steps,
                                  callbacks=callbacks_list,
                                  shuffle=True)


print("\nTo open TensorBoard: tensorboard --logdir={dir} --host localhost --port 6006".format(dir=TB_LOG_DIR))
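
Because the step counts above use floor division, only full batches are counted in each epoch. The sketch below shows the arithmetic, together with a ceiling-based alternative that would also count a final partial batch. The sample count is hypothetical; the real value comes from `train_generator.n`.

```python
import math

# Hypothetical sample count, for illustration only
n_samples = 20000
batch_size = 64

floor_steps = n_samples // batch_size
ceil_steps = math.ceil(n_samples / batch_size)

print("steps (floor):", floor_steps, "-> images covered:", floor_steps * batch_size)
print("steps (ceil) :", ceil_steps)
print("images outside the floor count:", n_samples - floor_steps * batch_size)
```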

After training has finished, we can run a final evaluation using the validation set. The data used comes from the validation generator and we run the complete validation set for 1 epoch. `evaluate_generator` returns the evaluation loss and accuracy.

In [None]:
scores = model.evaluate_generator(generator=validation_generator,
                                  max_queue_size=10,
                                  steps=val_steps,
                                  verbose=1)

print ('Evaluation Loss    : ', scores[0])
print ('Evaluation Accuracy: ', scores[1])

As an extra, optional step we can make predictions using the trained model. `predict_generator` returns a NumPy array of predictions, one for each image supplied by `prediction_generator` (the test dataset in this case).

In [None]:
# reset the generator before using it for predictions
prediction_generator.reset()

# calculate number of steps for prediction
predict_steps = prediction_generator.n

# predict_generator returns a NumPy array holding one sigmoid output per image
pred = model.predict_generator(generator=prediction_generator,
                               steps=predict_steps,
                               verbose=1)

We don't have a list of labels that match the data sent to the prediction generator, so we need to extract the 'ground truth' labels from the filenames in the test dataset. We do this by first creating a list of the filenames that were used during prediction - it is important to understand that this list will be in the order in which the image files were applied to the prediction generator.

In [None]:
# get a list of image filenames used in prediction
filenames = prediction_generator.filenames

Another attribute of the prediction generator, `.class_indices`, gives us a dictionary in which the class names are the keys and the class indices are the values: `{'cat': 0, 'dog': 1}`.

We then swap the keys and values so that we can use the predictions to index the dictionary and get back the associated class.

In [None]:
# the .class_indices attribute returns a dictionary with keys = classes
labels = (prediction_generator.class_indices)
print(labels)

# make a new dictionary with keys & values swapped 
labels = dict((v,k) for k,v in labels.items())
print(labels)
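
The same key/value swap can be tried in isolation with a dictionary comprehension. This is a small sketch that uses a literal dictionary in place of `prediction_generator.class_indices`; the name `index_to_class` is introduced only for this example.

```python
class_indices = {'cat': 0, 'dog': 1}

# Swap keys and values so that a class index looks up its class name
index_to_class = {index: name for name, index in class_indices.items()}

print(index_to_class)
print(index_to_class[1])
```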

Now we can run through the list of predictions and decide if they are 'cat' or 'dog'. The values in the list of predictions come from the sigmoid activation function, so they are floating-point values between 0 and 1. Any value above 0.5 is class 1; anything else is class 0.

In [None]:
# use the 'labels' dictionary to create a list of predictions as strings
# pred holds the sigmoid outputs
# if sigmoid output > 0.5, CNN predicted class 1
# otherwise, CNN predicted class 0
predictions = list()
for i in range(len(pred)):
    if pred[i] > 0.5:
        predictions.append(labels[1])
    else:   
        predictions.append(labels[0])

Now that we have a list of predictions that are strings (i.e. either 'cat' or 'dog') we can compare this list to the ground truth labels (extracted from the filenames) to calculate accuracy.

In [None]:
# iterate over the list of predictions and compare to ground truth labels
# ground truth labels are derived from prediction filenames.
correct_predictions = 0
wrong_predictions = 0

for i in range (len(predictions)):

    # ground truth is first part of filename (i.e. the class folder)
    # split on the OS path separator so that this also works on Windows
    ground_truth, _ = filenames[i].split(os.sep, 1)

    # compare prediction to ground truth
    if predictions[i] == ground_truth:
        correct_predictions += 1
    else:
        wrong_predictions += 1

# calculate accuracy
acc = (correct_predictions/len(predictions)) * 100

print('Correct Predictions: ',correct_predictions)
print('Wrong Predictions  : ',wrong_predictions)
print('Prediction Accuracy: ',acc,'%')
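
A single accuracy figure hides which class is being confused with which. As a possible extension, the sketch below counts each (ground truth, prediction) pair using only the standard library. The filenames and predictions are hypothetical stand-ins for `prediction_generator.filenames` and `predictions`.

```python
import os
from collections import Counter

# Hypothetical stand-ins for the real filenames and predictions
filenames = [os.path.join('cat', 'cat.1.jpg'),
             os.path.join('cat', 'cat.2.jpg'),
             os.path.join('dog', 'dog.1.jpg'),
             os.path.join('dog', 'dog.2.jpg')]
predictions = ['cat', 'dog', 'dog', 'dog']

# The class folder is the first path component on any operating system
ground_truths = [os.path.normpath(name).split(os.sep)[0] for name in filenames]

# Count each (ground truth, prediction) pair
confusion = Counter(zip(ground_truths, predictions))
for (truth, predicted), count in sorted(confusion.items()):
    print("truth={} predicted={} count={}".format(truth, predicted, count))

matches = sum(t == p for t, p in zip(ground_truths, predictions))
accuracy = 100 * matches / len(predictions)
print("accuracy: {:.1f}%".format(accuracy))
```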

The prediction results can also be captured in a .csv file:

In [None]:
# write filenames and associated predictions to .csv file
results = pd.DataFrame({"Filename":filenames,
                        "Predictions":predictions})
results.to_csv('results.csv',index=False)

print('\nFilenames and predictions saved to results.csv')

The trained model is saved in the directory pointed to by the variable `KERAS_MODEL_DIR`. The model weights and biases are stored in an HDF5 format file called 'k_model_weights.h5'. The architecture (without weights) is stored in a JSON file called 'k_model_architecture.json'.

In [None]:
# save just the weights (no architecture) to an HDF5 format file
model.save_weights(os.path.join(KERAS_MODEL_DIR,'k_model_weights.h5'))

# save just the architecture (no weights) to a JSON file
with open(os.path.join(KERAS_MODEL_DIR,'k_model_architecture.json'), 'w') as f:
    f.write(model.to_json())

print('\nTrained model saved to {dir}'.format(dir=KERAS_MODEL_DIR))
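
For completeness, here is a minimal sketch of how the two saved files might be loaded back later. It assumes that `customCNN` uses only built-in Keras layers (custom layers would need to be passed via `custom_objects`), and the helper name `load_saved_model` is hypothetical. The restored model is not compiled, so `compile()` would have to be called again before further training or evaluation.

```python
import os


def load_saved_model(model_dir):
    """Rebuild a Keras model from the JSON architecture and HDF5 weights."""
    # Imported here so that the helper can be defined before Keras is loaded
    from keras.models import model_from_json

    with open(os.path.join(model_dir, 'k_model_architecture.json')) as f:
        restored = model_from_json(f.read())
    restored.load_weights(os.path.join(model_dir, 'k_model_weights.h5'))
    return restored


# Example (assumes training has been run and KERAS_MODEL_DIR is defined):
# restored_model = load_saved_model(KERAS_MODEL_DIR)
```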