<h1 style="color:blue;">CIFAR-10 example using Keras</h1>

<b>This notebook shows how to create, train and evaluate a small convolutional network on the CIFAR-10 dataset.

The first step is the usual housekeeping: import the required Python libraries and set up directories. No machine-learning-specific code is needed here, just plain Python (I use Python 3).</b>

In [None]:
import shutil
import os
from pathlib import Path
import tensorflow as tf
import numpy as np


TB_LOG_DIR = Path('tb_logs')
CHKPT_DIR = Path('chkpts')

# create a directory for the TensorBoard logs
if (os.path.exists(TB_LOG_DIR)):
    shutil.rmtree(TB_LOG_DIR)
os.makedirs(TB_LOG_DIR)
print("Directory", TB_LOG_DIR, "created")

# create a directory for the checkpoints
if (os.path.exists(CHKPT_DIR)):
    shutil.rmtree(CHKPT_DIR)
os.makedirs(CHKPT_DIR)
print("Directory", CHKPT_DIR, "created")
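
<b>The two directory blocks above follow the same delete-then-recreate pattern, so they could be wrapped in a small helper. The sketch below is only an illustration: the name fresh_dir is hypothetical and not part of the notebook, and the demo runs inside a temporary directory so that nothing real gets deleted.</b>

```python
import shutil
import tempfile
from pathlib import Path


def fresh_dir(path):
    """Delete `path` if it exists, then recreate it empty.

    Hypothetical helper for illustration; it is not part of the notebook.
    """
    path = Path(path)
    if path.exists():
        shutil.rmtree(path)
    path.mkdir(parents=True)
    return path


# Demonstrate inside a throwaway temporary directory so nothing real is deleted.
with tempfile.TemporaryDirectory() as scratch:
    demo = fresh_dir(Path(scratch) / 'tb_logs')
    (demo / 'old_run.txt').write_text('stale log')
    demo = fresh_dir(demo)  # the second call wipes the stale file
    leftover = list(demo.iterdir())
    print("Files left after reset:", leftover)
```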

<h2 style="color:blue;">Data wrangling</h2>

<b>Next, we download the dataset. Keras conveniently provides the CIFAR-10 dataset and functions for loading it.

The dataset is already split into training and test data - 50k images & labels for training, 10k images & labels for test.</b>

In [None]:
(X_train, Y_train), (X_test, Y_test) = tf.keras.datasets.cifar10.load_data()

<b>The 'images' are actually NumPy arrays: each one has a shape of (32, 32, 3) and datatype uint8, so every pixel value lies in the range 0 to 255. Let's scale the values to the range 0 to 1.0. Note that dividing by 255.0 converts the array elements from integer to float.</b>

In [None]:
X_train = X_train / 255.0
X_test = X_test / 255.0
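
<b>To see the integer-to-float conversion in isolation, here is a minimal sketch on a tiny stand-in array. The pixel values are random and made up for illustration; they are not CIFAR-10 data, and the name fake_images is an assumption of this example.</b>

```python
import numpy as np

# Stand-in for a tiny batch: 2 images of 32x32 pixels with 3 colour channels.
# Random values for illustration only; this is not CIFAR-10 data.
fake_images = np.random.randint(0, 256, size=(2, 32, 32, 3), dtype=np.uint8)

# Dividing a uint8 array by a Python float yields a floating-point array.
scaled = fake_images / 255.0

print(fake_images.dtype, fake_images.min(), fake_images.max())
print(scaled.dtype, scaled.min(), scaled.max())
```

<b>The result is a float64 array; if memory is a concern, casting with .astype('float32') after the division is one option.</b>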

<b>For convenience, we'll create a list of labels for the 10 categories of image in the CIFAR-10 dataset. We'll use it later when making predictions.</b>

In [None]:
class_names = ['airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']

<b>Now for the training parameters. We'll use a batch size of 128, and a learning rate of 0.0001 with a decay rate of 1e-6 for the Adam (Adaptive Moment Estimation) optimizer.

The maximum number of epochs for a full run is 250, but we are unlikely to reach this limit because of the early-stopping callback, which we will see later. The cell below sets EPOCHS to 3 for a quick trial run; swap in the commented-out line for the full run.

You are encouraged to modify these parameters to see what effect they have on the final accuracy.</b>

In [None]:
BATCHSIZE = 128
LEARN_RATE = 0.0001
DECAY_RATE = 1e-6

EPOCHS = 3
#EPOCHS = 250

<h2 style="color:blue;">Define the sequential model</h2>

<b>
This next section creates our CNN. It is a Keras Sequential model, built up layer by layer.

Note that we only need to define the input shape for the first layer; the shapes of the other layers are calculated automatically.
</b>

In [None]:
model = tf.keras.models.Sequential()

model.add(tf.keras.layers.Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=(32, 32, 3)))
model.add(tf.keras.layers.Conv2D(64, kernel_size=(3, 3), activation='relu'))
model.add(tf.keras.layers.MaxPooling2D(pool_size=(2, 2)))
model.add(tf.keras.layers.Conv2D(128, kernel_size=(3, 3), activation='relu'))
model.add(tf.keras.layers.MaxPooling2D(pool_size=(2, 2)))
model.add(tf.keras.layers.Flatten())
model.add(tf.keras.layers.Dense(1024, activation='relu'))
model.add(tf.keras.layers.Dense(10, activation='softmax'))
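
<b>The shapes that Keras works out for the later layers can be checked by hand. The sketch below redoes the arithmetic in plain Python, assuming the Keras defaults used above (unpadded 'valid' convolutions with stride 1, and non-overlapping 2x2 pooling). The helper names are made up for this example.</b>

```python
def conv_out(size, kernel=3):
    """Output width/height of an unpadded ('valid'), stride-1 convolution."""
    return size - kernel + 1


def pool_out(size, pool=2):
    """Output width/height of non-overlapping max pooling."""
    return size // pool


def conv_params(kernel, in_ch, out_ch):
    """Kernel weights plus one bias per output channel."""
    return kernel * kernel * in_ch * out_ch + out_ch


def dense_params(n_in, n_out):
    """Weight matrix plus one bias per output unit."""
    return n_in * n_out + n_out


size = 32
size = conv_out(size)     # Conv2D(32)   -> 30 x 30
size = conv_out(size)     # Conv2D(64)   -> 28 x 28
size = pool_out(size)     # MaxPooling2D -> 14 x 14
size = conv_out(size)     # Conv2D(128)  -> 12 x 12
size = pool_out(size)     # MaxPooling2D -> 6 x 6
flat = size * size * 128  # Flatten      -> 4608 features

total = (conv_params(3, 3, 32) + conv_params(3, 32, 64) + conv_params(3, 64, 128)
         + dense_params(flat, 1024) + dense_params(1024, 10))

print("flattened features:", flat)
print("trainable parameters:", total)
```

<b>Almost all of the parameters sit in the first Dense layer, because it connects every one of the 4608 flattened features to each of its 1024 units.</b>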

<b>Keras makes it easy to print out a summary of our network model...</b>

In [None]:
model.summary()  # prints the summary itself; wrapping it in print() would also print "None"
print("Model Inputs: {ips}".format(ips=(model.inputs)))
print("Model Outputs: {ops}".format(ops=(model.outputs)))

<h2 style="color:blue;">Callbacks</h2>

<b>Now for the callbacks, which are used during training.
The first callback sets up the TensorBoard logging.
The second one limits the training: it stops the run if the monitored loss (the validation loss, by default) fails to improve by at least min_delta (0.001 in this case) for 3 consecutive epochs.
The third callback defines where the checkpoint will be saved; with save_best_only=True, only the best model so far is kept.</b>

In [None]:
# create TensorBoard callback
# (batch_size is omitted: it is not needed in TensorFlow 2, and newer Keras releases reject it)
tb_call = tf.keras.callbacks.TensorBoard(log_dir=str(TB_LOG_DIR),
                                         histogram_freq=10,
                                         write_graph=True)

# Early stop callback
earlystop_call = tf.keras.callbacks.EarlyStopping(min_delta=0.001,
                                                patience=3)

# checkpoint save callback
chk_call = tf.keras.callbacks.ModelCheckpoint(filepath=(os.path.join(CHKPT_DIR,'checkpoint.h5')), save_best_only=True)
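
<b>The interplay of min_delta and patience is easier to follow with numbers. The sketch below is a simplified plain-Python model of the early-stopping idea, not the Keras implementation itself; the function name and the loss values are made up for illustration.</b>

```python
def early_stop_epoch(val_losses, min_delta=0.001, patience=3):
    """Return the 1-based epoch at which training would stop, or None if it never does.

    Simplified model: an epoch only counts as an improvement if the loss
    beats the best value seen so far by more than min_delta.
    """
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best - min_delta:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None


# Made-up validation losses: big gains for 3 epochs, then a plateau.
made_up_losses = [1.50, 1.20, 1.10, 1.0995, 1.0999, 1.1010, 1.05]
stop_at = early_stop_epoch(made_up_losses)
print("training would stop after epoch", stop_at)
```

<b>Note that epoch 4 is slightly better than epoch 3, but by less than min_delta, so it still counts towards the patience limit.</b>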

<h2 style="color:blue;">Training, evaluation, prediction</h2>

<b>The .compile method defines the learning process: the loss function to minimize, the optimizer (Adam in this case) with its parameters such as learning rate and decay rate, and the metrics to report during training (accuracy here).</b>

In [None]:
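# set up the sequential model learning process
# `learning_rate` replaces the deprecated `lr` argument.
# `decay` is accepted by older tf.keras releases; newer ones expect a learning-rate schedule instead.
model.compile(loss='categorical_crossentropy',
              optimizer=tf.keras.optimizers.Adam(learning_rate=LEARN_RATE,
                                                 decay=DECAY_RATE),
              metrics=['accuracy'])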
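
<b>To get a feel for what a decay rate of 1e-6 does, the sketch below computes the learning rate by hand. It assumes the inverse-time decay that, as far as I know, the older Keras optimizers applied once per batch update, lr / (1 + decay * step); treat that formula as an assumption rather than a statement about every Keras version.</b>

```python
import math

LEARN_RATE = 0.0001
DECAY_RATE = 1e-6
BATCHSIZE = 128
TRAIN_IMAGES = 50000  # size of the CIFAR-10 training set


def decayed_lr(step, lr=LEARN_RATE, decay=DECAY_RATE):
    """Inverse-time decay (assumed): lr / (1 + decay * step), step = batch updates so far."""
    return lr / (1.0 + decay * step)


steps_per_epoch = math.ceil(TRAIN_IMAGES / BATCHSIZE)  # batch updates per epoch

for epoch in (1, 10, 250):
    step = epoch * steps_per_epoch
    print("after epoch {:3d}: lr = {:.3e}".format(epoch, decayed_lr(step)))
```

<b>With these settings the decay is gentle: even after the full 250 epochs the learning rate has dropped by less than 10%.</b>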

<b>The .fit method trains the model for a given number of epochs.
The validation data (here, the test set) is used to evaluate the model metrics at the end of each epoch.
For both the training and test labels, we use the Keras to_categorical() utility function to convert the integer class labels to one-hot encoded vectors.
Note that the callbacks we set up earlier are used here.</b>

In [None]:
model.fit(x=X_train,
          y=tf.keras.utils.to_categorical(Y_train),
          batch_size=BATCHSIZE,
          shuffle=True,
          epochs=EPOCHS,
          validation_data=(X_test, tf.keras.utils.to_categorical(Y_test)),
          callbacks=[earlystop_call,tb_call,chk_call])
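
<b>For readers who have not met one-hot encoding before, the sketch below shows the conversion on three made-up labels using plain NumPy. The one_hot function is a hypothetical stand-in written for this example; it mimics the effect of to_categorical but is not the Keras code.</b>

```python
import numpy as np


def one_hot(labels, num_classes=10):
    """Turn integer labels of shape (n, 1) or (n,) into one-hot rows of shape (n, num_classes)."""
    labels = np.asarray(labels).reshape(-1)
    encoded = np.zeros((labels.size, num_classes), dtype="float32")
    encoded[np.arange(labels.size), labels] = 1.0  # set a single 1 per row
    return encoded


# Made-up labels in the same (n, 1) layout as Y_train: cat, ship, airplane.
sample_labels = np.array([[3], [8], [0]])
encoded = one_hot(sample_labels)
print(encoded)
```

<b>Each row contains a single 1 at the index of its class, which is the format the categorical_crossentropy loss expects.</b>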

<b>The .evaluate method will use the supplied dataset to evaluate the trained model.</b>

In [None]:
scores = model.evaluate(x=X_test,
                        y=tf.keras.utils.to_categorical(Y_test))

print('Loss: %.3f' % scores[0])
print('Accuracy: %.3f' % scores[1])

<b>Finally, the .predict method uses the trained model to make some predictions. This would best be done using previously unseen data, but here I'm just running it on the test dataset and printing the results for the first 10 images.</b>

In [None]:
print("\nLet's make some predictions with the trained model..\n")
predictions = model.predict(X_test)

# each prediction is an array of 10 values
# the max of the 10 values is the model's 
# highest "confidence" classification
# use numpy argmax function to get highest of the set of 10

for i in range(10):
    pred = class_names[np.argmax(predictions[i])]
    actual = class_names[Y_test[i][0]]
    print("Sample {index} in the test set is predicted to be: {pred}".format(index=i, pred=pred))
    print("Sample {index} in the test set actually is: {actual}".format(index=i, actual=actual))
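
<b>The argmax step can also be applied to a whole batch at once, which makes it easy to compute an accuracy by hand. The sketch below uses made-up softmax outputs for three samples; the probabilities and labels are invented for illustration and are not real model output.</b>

```python
import numpy as np

class_names = ['airplane', 'automobile', 'bird', 'cat', 'deer',
               'dog', 'frog', 'horse', 'ship', 'truck']

# Made-up softmax outputs for 3 samples (each row sums to 1); not real model output.
fake_predictions = np.array([
    [0.05, 0.05, 0.05, 0.55, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05],
    [0.02, 0.02, 0.02, 0.02, 0.02, 0.02, 0.02, 0.02, 0.82, 0.02],
    [0.10, 0.10, 0.10, 0.10, 0.10, 0.30, 0.05, 0.05, 0.05, 0.05],
])
fake_labels = np.array([[3], [8], [0]])  # same (n, 1) layout as Y_test

# axis=1 picks the most confident class for every row in one call.
predicted_ids = np.argmax(fake_predictions, axis=1)
predicted_names = [class_names[i] for i in predicted_ids]

# Fraction of samples whose predicted class matches the label.
accuracy = np.mean(predicted_ids == fake_labels.reshape(-1))

print(predicted_names)
print("accuracy: {:.3f}".format(accuracy))
```

<b>Here the third sample is labelled 'airplane' but predicted as 'dog', so two of the three predictions are correct.</b>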