<a href="https://colab.research.google.com/github/nyp-sit/iti107/blob/main/session-1/first_cnn_for_image_classification.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# First Convolutional Neural Network for Image Classification

In this exercise, you will learn to build your first simple Convolutional Neural Network and use it to classify images. 

You will learn: 
- how to construct a Convolutional Neural Network
- how to adjust the different hyper-parameters of the network (e.g. number of filters, number of layers, etc.) and observe the effects
- how to visualize the activations of the hidden layers 


## Fashion MNIST Dataset

We will be using the [Fashion MNIST](https://github.com/zalandoresearch/fashion-mnist) dataset, a toy dataset which contains 70,000 grayscale images (60,000 for training and 10,000 for testing) in 10 categories.

![fashion-mnist](https://github.com/nyp-sit/sdaai-iti107/blob/main/session-1/images/fashion-mnist.png?raw=1)

The images are 28x28 NumPy arrays, with pixel values ranging from 0 to 255. The *labels* are an array of integers, ranging from 0 to 9. These correspond to the *class* of clothing the image represents:

|Label|Class|
|---|---|
|0|T-shirt/top|
|1|Trouser|
|2|Pullover|
|3|Dress|
|4|Coat|
|5|Sandal|
|6|Shirt|
|7|Sneaker|
|8|Bag|
|9|Ankle boot|       
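The dataset only stores the integer labels, so it is handy to keep the class names from the table above in a lookup list, for example to title plots. A minimal sketch; `CLASS_NAMES` and `label_to_name` are hypothetical helper names, not part of `keras.datasets`.

```python
# Class names in label order, copied from the table above (list index = label)
CLASS_NAMES = [
    "T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
    "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot",
]

def label_to_name(label):
    """Return the class name for an integer label in the range 0-9."""
    if not 0 <= label < len(CLASS_NAMES):
        raise ValueError(f"label must be between 0 and 9, got {label}")
    return CLASS_NAMES[label]

print(label_to_name(9))  # Ankle boot
print(label_to_name(1))  # Trouser
```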

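Fashion MNIST is one of the datasets bundled with Keras, so we can load it with `keras.datasets`. `load_data()` returns the 60,000 training images and the 10,000 test images; in this notebook we use the test split as our validation set.
For a list of the datasets available in Keras, see https://www.tensorflow.org/api_docs/python/tf/keras/datasets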



In [None]:
%load_ext tensorboard

In [None]:
import tensorflow as tf
from tensorflow import keras
import numpy as np

mnist = keras.datasets.fashion_mnist
(training_images, training_labels), (validation_images, validation_labels) = mnist.load_data()
print('Shape of training_images = {}'.format(training_images.shape))
print('Shape of validation_images = {}'.format(validation_images.shape))

Note that the data are NumPy arrays, not tensors.

In [None]:
print(type(training_images))

## Preprocess the images

You need to preprocess the images before using them as input to the CNN.
A CNN expects its input to have the shape (batch, height, width, channels), but the loaded images have the shape (batch, 28, 28), without a channel axis. Furthermore, the pixel values of the original images are in the range [0, 255]. Neural networks learn better when the input values are normalized to the range [0.0, 1.0].

In [None]:
# add a channel axis to get 4-D arrays of shape (batch, 28, 28, 1);
# there is only 1 channel since these are grayscale images
training_images = np.expand_dims(training_images, axis=3)
validation_images = np.expand_dims(validation_images, axis=3)

# scale the pixel values from [0, 255] to [0.0, 1.0]
training_images = training_images / 255.0
validation_images = validation_images / 255.0
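To check what the preprocessing does without loading the real data, here is a small sanity check on a random dummy batch (the dummy array is an assumption, standing in for Fashion MNIST). One detail worth knowing: dividing a `uint8` array by `255.0`, as the cell above does, produces `float64`. Casting to `float32` first, the dtype Keras layers compute in by default, halves the memory used.

```python
import numpy as np

# A dummy batch of 2 "images" with the same shape and dtype as Fashion MNIST
dummy = np.random.randint(0, 256, size=(2, 28, 28), dtype=np.uint8)

# Add the channel axis; for a 3-D array, axis=-1 is the same as axis=3
batch = np.expand_dims(dummy, axis=-1)

# Cast to float32 before scaling so the result stays float32
batch = batch.astype("float32") / 255.0

print(batch.shape)                             # (2, 28, 28, 1)
print(batch.dtype)                             # float32
print(batch.min() >= 0.0, batch.max() <= 1.0)  # True True
```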

## Build your first CNN

A typical CNN consists of one or more blocks, each made of a Conv2D layer followed by a MaxPooling2D layer. The output of the last convolutional block, a 3-D stack of feature maps (height x width x channels) for each image, is then flattened into a 1-D vector before being fed into Dense (fully connected) layers for classification. The last layer uses `softmax` to output the probability of each of the 10 classes, so it must have as many output units as there are classes (10 in our case). For a binary problem, `make_model` below uses a single `sigmoid` output unit instead.
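To make the role of the final `softmax` layer concrete, here is a minimal NumPy sketch of softmax applied to one image's raw scores (logits). The logit values are made up for illustration. The outputs are positive and sum to 1, so they can be read as class probabilities; the predicted class is the index with the highest probability, and `sparse_categorical_crossentropy` compares this probability vector against the integer label.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - np.max(logits, axis=-1, keepdims=True)  # avoid overflow in exp
    exps = np.exp(shifted)
    return exps / np.sum(exps, axis=-1, keepdims=True)

# Hypothetical logits from the last Dense layer for one image (10 classes)
logits = np.array([0.5, 2.0, 0.1, -1.0, 0.3, 0.0, 1.2, -0.5, 0.2, 3.5])
probs = softmax(logits)

print(round(float(probs.sum()), 6))  # 1.0
print(int(np.argmax(probs)))         # 9, the index of the largest logit
```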

In [None]:
def make_model(input_shape, num_classes):

    # define the input layer with appropriate shape
    inputs = keras.layers.Input(shape=input_shape, name='input')
    x = keras.layers.Conv2D(32, 3, activation='relu', name='conv1')(inputs)
    x = keras.layers.MaxPooling2D(2, name='pool1')(x)
    x = keras.layers.Conv2D(64, 3, activation='relu', name='conv2')(x)
    x = keras.layers.MaxPooling2D(2, name='pool2')(x)
    x = keras.layers.Flatten()(x)
    x = keras.layers.Dense(128, activation='relu', name='dense1')(x)

    if num_classes > 2: 
        activation = 'softmax'
        units = num_classes
    else: 
        activation = 'sigmoid'
        units = 1

    outputs = keras.layers.Dense(units, activation=activation, name='dense2')(x)
    
    return keras.Model(inputs, outputs)
        

# call make_model with appropriate argument values (shape and num_classes)
model = make_model((28,28,1), 10)

# compile your model
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy', 
              metrics=['accuracy'])

Look at the model summary carefully, and make sure you understand why each layer's output shape is what it is and how its number of parameters is calculated.

In [None]:
model.summary()
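You can reproduce the output shapes and parameter counts in the summary by hand. The pure-Python sketch below walks through `make_model((28, 28, 1), 10)` layer by layer, assuming the Keras defaults that `make_model` relies on: 'valid' (no) padding and stride 1 for Conv2D, and a pooling stride equal to the pool size for MaxPooling2D. The helper functions are hypothetical; the total should match the "Total params" line printed by `model.summary()`.

```python
def conv_output_size(size, kernel, stride=1):
    """Output size of a 'valid' (no padding) convolution along one dimension."""
    return (size - kernel) // stride + 1

def pool_output_size(size, pool):
    """Output size of 'valid' max pooling whose stride equals the pool size."""
    return size // pool

def conv_params(kernel, in_channels, filters):
    """Each filter has kernel * kernel * in_channels weights plus 1 bias."""
    return (kernel * kernel * in_channels + 1) * filters

def dense_params(inputs, units):
    """One weight per input per unit, plus 1 bias per unit."""
    return inputs * units + units

h = conv_output_size(28, 3)          # conv1: 26x26x32
p_conv1 = conv_params(3, 1, 32)      # 320
h = pool_output_size(h, 2)           # pool1: 13x13x32
h = conv_output_size(h, 3)           # conv2: 11x11x64
p_conv2 = conv_params(3, 32, 64)     # 18,496
h = pool_output_size(h, 2)           # pool2: 5x5x64 (the leftover row/column is dropped)
flat = h * h * 64                    # flatten: 1,600
p_dense1 = dense_params(flat, 128)   # 204,928
p_dense2 = dense_params(128, 10)     # 1,290

total = p_conv1 + p_conv2 + p_dense1 + p_dense2
print(h, flat, total)                # 5 1600 225034
```

Note that the two Dense layers hold over 90% of the parameters, even though the convolutional layers do most of the feature extraction.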

## Train the model

Let's first define a convenience function that creates a TensorBoard callback to log the training events. We will also create a ModelCheckpoint callback that saves the weights from the epoch with the highest validation accuracy.

In [None]:
import os
import time

def create_tb_callback():
    """Return a TensorBoard callback that logs to a new, timestamped directory under ./tb_logs."""
    root_logdir = os.path.join(os.curdir, "tb_logs")

    # use a new directory for each run so that runs can be compared in TensorBoard
    run_id = time.strftime("run_%Y_%m_%d-%H_%M_%S")
    run_logdir = os.path.join(root_logdir, run_id)

    return tf.keras.callbacks.TensorBoard(run_logdir)

model_checkpoint_callback = tf.keras.callbacks.ModelCheckpoint(
    # Keras 3 requires weights-only checkpoints to end in '.weights.h5';
    # older tf.keras versions also accept this name (saved in HDF5 format)
    filepath="best_checkpoint.weights.h5",
    save_weights_only=True,
    monitor='val_accuracy',
    mode='max',
    save_best_only=True)

In [None]:
model.fit(training_images, 
          training_labels, 
          batch_size=256, 
          epochs=30,
          validation_data=(validation_images, validation_labels),
          callbacks=[create_tb_callback(), model_checkpoint_callback])


In [None]:
# restore the weights from the epoch with the best validation accuracy
model.load_weights('best_checkpoint.weights.h5')
model.evaluate(validation_images, validation_labels)

## Visualize the training and validation loss

In [None]:
%tensorboard --logdir tb_logs

We can see that the model reaches a training accuracy of about 98%, but the validation accuracy stagnates at about 92%, so there is some overfitting here. You can try to improve the model by adding regularization, such as Dropout layers.
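A Keras `Dropout(rate)` layer, which you could place for example right after `dense1` with `x = keras.layers.Dropout(0.5)(x)` (the rate of 0.5 is only a common starting point), randomly zeroes a fraction `rate` of its inputs during training and scales the surviving values by `1 / (1 - rate)`, so the expected sum of activations is unchanged. At inference time, such as in `model.evaluate` or `model.predict`, it passes its input through unchanged. The NumPy sketch below only illustrates this training-time mechanism; `inverted_dropout` is a hypothetical helper, not the Keras implementation.

```python
import numpy as np

def inverted_dropout(x, rate, rng):
    """Training-time dropout: zero a fraction `rate` of units, scale the rest by 1/(1-rate)."""
    if rate == 0.0:
        return x
    keep = rng.random(x.shape) >= rate  # keep each unit with probability 1 - rate
    return np.where(keep, x / (1.0 - rate), 0.0)

rng = np.random.default_rng(0)
activations = np.ones((1, 8))  # hypothetical output of dense1 for one image
dropped = inverted_dropout(activations, rate=0.5, rng=rng)
print(dropped)  # each entry is either 0.0 or 2.0
```

Because a different random subset of units is dropped at every training step, the network cannot rely on any single unit, which tends to reduce overfitting.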


## Visualizing the Convolutions and Pooling

It is often said that deep learning networks are black boxes. However, this is much less true for Convnets: the representations learnt by Convnets are highly interpretable, because they are representations of visual concepts.

The following code lets us visualize the feature maps learnt by the Convnet. By looking at the outputs (activations) of these feature maps for different kinds of images, we can get a sense of how a specific image is being classified.


Let's first print out the first 10 labels of the validation set (the Fashion MNIST test split).

In [None]:
print(validation_labels[:10])

Let us look at two different images: image 0 with label 9 (ankle boot) and image 2 with label 1 (trouser).

In [None]:
import matplotlib.pyplot as plt 

ANKLE_BOOT_IDX = 0
TROUSER_IDX = 2

f, (ax1, ax2) = plt.subplots(1, 2, sharey=True)
ax1.imshow(validation_images[ANKLE_BOOT_IDX].reshape(28,28))
ax2.imshow(validation_images[TROUSER_IDX].reshape(28,28))

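Let's create an activation model for each of the Conv2D and MaxPooling2D layers. Each activation model reuses the trained layers of `model`, but returns the output of one intermediate layer instead of the final prediction.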

In [None]:
import matplotlib.pyplot as plt
import pprint

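# extract the outputs of the Conv2D and MaxPooling2D layers, selected by the names
# given in make_model, so we do not depend on the position of each layer in model.layers
layer_outputs = [model.get_layer(name).output for name in ['conv1', 'pool1', 'conv2', 'pool2']]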
pprint.pprint(layer_outputs)

# create activation models that will return these outputs given the model input
activation_model_conv1 = keras.Model(inputs=model.input, outputs=layer_outputs[0])
activation_model_pool1 = keras.Model(inputs=model.input, outputs=layer_outputs[1])
activation_model_conv2 = keras.Model(inputs=model.input, outputs=layer_outputs[2])
activation_model_pool2 = keras.Model(inputs=model.input, outputs=layer_outputs[3])

Let's look at the activations from the 1st Conv2D layer for both images. The 1st Conv layer produces 32 feature maps, but we are going to look at only the first 10.

In [None]:
fig, axarr = plt.subplots(2, 10, figsize=(20, 4))
ankle_boot_activations_conv1 = activation_model_conv1.predict(validation_images[ANKLE_BOOT_IDX].reshape(1, 28, 28, 1))
trouser_activations_conv1 = activation_model_conv1.predict(validation_images[TROUSER_IDX].reshape(1, 28, 28, 1))

for filter_idx in range(0, 10):
    axarr[0, filter_idx].imshow(ankle_boot_activations_conv1[0,:,:, filter_idx])
    axarr[1, filter_idx].imshow(trouser_activations_conv1[0,:,:,filter_idx])

plt.show()

From the plots, we can see that the 1st Conv layer seems to act as a detector of lines and edges: some filters act more like vertical line detectors, whereas others detect the edges of the shape.

Your filter outputs may not be the same as the ones described here: the weights are randomly initialized, so the specific filters learnt by the Conv layer differ from run to run.
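Each feature map plotted above is produced by sliding one small filter over the image, summing the element-wise products at every position, and applying ReLU. The NumPy sketch below does this for a toy 6x6 image containing a single vertical edge, using a hand-made 3x3 vertical-edge kernel. The image and kernel are assumptions for illustration; a learnt filter would only roughly resemble this one. Like Keras `Conv2D`, the sketch computes a cross-correlation with 'valid' padding, but it handles only one channel and omits the bias.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Single-channel 'valid' 2-D cross-correlation (no padding, stride 1, no bias)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Toy 6x6 image: dark left half, bright right half, so one vertical edge
image = np.zeros((6, 6))
image[:, 3:] = 1.0

# Hand-made vertical-edge kernel: responds where brightness rises from left to right
kernel = np.array([[-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0]])

response = np.maximum(conv2d_valid(image, kernel), 0.0)  # ReLU, as in conv1
print(response)  # every row is [0. 3. 3. 0.]: non-zero only around the edge
```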

Now let's examine the activations from the 2nd Convolutional layer. Again we will only display the output from the first 10 filters.

You will observe that the outputs seem more abstract and appear to detect higher-level constructs, such as the presence of a certain part of the object (e.g. the collar part of the boot).


In [None]:
fig, axarr = plt.subplots(2,10, figsize=(20,4))

ankle_boot_activations_conv2 = activation_model_conv2.predict(validation_images[ANKLE_BOOT_IDX].reshape(1, 28, 28, 1))
trouser_activations_conv2 = activation_model_conv2.predict(validation_images[TROUSER_IDX].reshape(1, 28, 28, 1))

for filter_idx in range(0, 10):
    axarr[0, filter_idx].imshow(ankle_boot_activations_conv2[0,:,:, filter_idx])
    axarr[1, filter_idx].imshow(trouser_activations_conv2[0,:,:,filter_idx])

plt.show()

Now, let's examine the activations from the last max-pooling layer for both images. We will just display the first 10.  What do you observe?

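The MaxPooling2D layer keeps only the strongest activation in each 2x2 window, so it emphasizes more sharply the abstract parts detected by the Conv layer, while halving the size of each feature map (from 11x11 to 5x5).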
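The NumPy sketch below shows what 2x2 max pooling does to a small 4x4 feature map, and why an 11x11 map becomes 5x5: with the Keras `MaxPooling2D` defaults (stride equal to the pool size, 'valid' padding), the leftover row and column are dropped. `max_pool2d` is a hypothetical helper written for illustration, not the Keras implementation.

```python
import numpy as np

def max_pool2d(feature_map, pool=2):
    """Max pooling with stride equal to the pool size and 'valid' padding."""
    h = feature_map.shape[0] // pool  # leftover rows/columns are dropped, e.g. 11 -> 5
    w = feature_map.shape[1] // pool
    trimmed = feature_map[:h * pool, :w * pool]
    # Group the map into (h, pool, w, pool) blocks and take the max of each block
    return trimmed.reshape(h, pool, w, pool).max(axis=(1, 3))

fm = np.array([[1, 3, 0, 2],
               [4, 2, 1, 0],
               [0, 1, 5, 6],
               [2, 0, 7, 1]])
print(max_pool2d(fm))
# [[4 2]
#  [2 7]]

print(max_pool2d(np.zeros((11, 11))).shape)  # (5, 5), like pool2 in our model
```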

In [None]:
fig, axarr = plt.subplots(2,10, figsize=(20,4))

ankle_boot_activations_pool2 = activation_model_pool2.predict(validation_images[ANKLE_BOOT_IDX].reshape(1, 28, 28, 1))
trouser_activations_pool2 = activation_model_pool2.predict(validation_images[TROUSER_IDX].reshape(1, 28, 28, 1))

for filter_idx in range(0, 10):
    axarr[0, filter_idx].imshow(ankle_boot_activations_pool2[0,:,:, filter_idx])
    axarr[1, filter_idx].imshow(trouser_activations_pool2[0,:,:,filter_idx])

plt.show()