The aim of this exercise is to build a simple MLP and a simple CNN to solve image classification problems. You can then compare the performance of the networks and interpret their behaviour using what was presented in the lectures.

In [None]:
import tensorflow
from tensorflow import keras
from keras.datasets import mnist, cifar10, cifar100
import numpy as np
import matplotlib.pyplot as plot

print('Tensorflow version:',tensorflow.__version__)

---

You can use the function below to load some of the simple datasets available directly from `keras`. There are three options for the `dataset_name` argument:
1. `mnist`: a dataset of handwritten digits from 0 to 9. After loading with the function below, these greyscale images have shape (28,28,1)
2. `cifar10`: small colour images with shape (32,32,3) from ten different classes (plane, car, bird, cat, deer, dog, frog, horse, ship, truck)
3. `cifar100`: as above but with 100 classes! This is not feasible with just a CPU, since it needs a fairly complex network with many parameters. I have included it in case you want to play with it on a GPU one day

In [None]:
def load_dataset(dataset_name='mnist'):
  # MNIST, CIFAR10 and CIFAR100 are standard datasets we can load straight
  # from keras. The data are split between train and test sets automatically
  # - x_train is a numpy array that stores the training images
  # - y_train is a numpy array that stores the true class of the training images
  # - x_test is a numpy array that stores the testing images
  # - y_test is a numpy array that stores the true class of the testing images
  if dataset_name.lower() == 'cifar10':
    (x_train, y_train), (x_test, y_test) = cifar10.load_data()
    n_classes = 10
  elif dataset_name.lower() == 'cifar100':
    (x_train, y_train), (x_test, y_test) = cifar100.load_data()
    n_classes = 100
  elif dataset_name.lower() == 'mnist':
    (x_train, y_train), (x_test, y_test) = mnist.load_data()
    # MNIST is greyscale so we have to do a trick to add a depth dimension
    x_train = np.expand_dims(x_train, axis=-1)
    x_test = np.expand_dims(x_test, axis=-1)
    n_classes = 10
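  else:
    # Raise an error rather than returning None, since the caller unpacks
    # three return values and would otherwise fail with a confusing TypeError
    raise ValueError('Requested dataset does not exist. Please choose from mnist, cifar10 or cifar100')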

  # Let's check the shape of the images for convenience
  print("Shape of x_train =",x_train.shape)
  print("Shape of x_test =",x_test.shape)

  # The y_train and y_test values we loaded also need to be modified.
  # These values store the true class of each image as a single integer
  # (0-9 when there are ten classes). We need to convert each integer into
  # an array of length n_classes with a 1 in the position of the true class
  # and 0 everywhere else. For n_classes = 10:
  # y = 2 becomes y = [0,0,1,0,0,0,0,0,0,0]
  # y = 8 becomes y = [0,0,0,0,0,0,0,0,1,0]
  y_train = keras.utils.to_categorical(y_train, n_classes)
  y_test = keras.utils.to_categorical(y_test, n_classes)

  print("Shape of y_train =", y_train.shape)
  print("Shape of y_test =", y_test.shape)

  # Let's take a look at a few example images from the training set
  n_plots=5
  fig, ax = plot.subplots(1, n_plots)
  for plot_number in range(n_plots):
    ax[plot_number].imshow(x_train[plot_number])

  return (x_train, y_train), (x_test, y_test), n_classes
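
The one-hot conversion that `keras.utils.to_categorical` performs inside `load_dataset` can be reproduced with plain NumPy. The sketch below is only an illustration: the helper name `one_hot` is hypothetical and is not part of the notebook. It also shows that `np.argmax` recovers the original integer label from the one-hot vector.

```python
import numpy as np

def one_hot(labels, n_classes):
    # Row i of the identity matrix is the one-hot vector for class i.
    # reshape(-1) also copes with labels stored as a column of shape (N, 1).
    return np.eye(n_classes, dtype="float32")[np.asarray(labels).reshape(-1)]

labels = np.array([2, 8])
encoded = one_hot(labels, 10)
print(encoded[0])                  # one-hot vector for class 2
print(np.argmax(encoded, axis=1))  # recovers the integer labels
```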

---

Here we use the `load_dataset` function defined above to load our dataset. In the first instance we will use `mnist`, since it is the simplest dataset and a very simple CNN is enough for it.

In [None]:
# Load the input data.
# x_train is the training data, and y_train the corresponding true labels
# x_test is the testing data, and y_test the corresponding true labels
# We don't have a separate validation sample in these keras datasets
# num_classes is the number of true classes
(x_train, y_train), (x_test, y_test), num_classes = load_dataset('mnist')

Before starting our CNN, let's make a simple MLP to see how well it does. MLPs consist of a number of fully connected (or dense) layers. We need to make sure that we flatten the input in this case since we have images. We'll make a network with three dense layers (256, 128 and 64 neurons) interspersed with dropout layers (fraction 0.25), and then the final dense layer for classification.

* Flatten layer: `keras.layers.Flatten()`

* Dense layer: `keras.layers.Dense(num_nodes, activation='relu')` where `num_nodes` is the number of neurons in the layer. The final layer of the model needs to have `num_nodes = num_classes`, and should use the `softmax` activation

* Dropout layer: `keras.layers.Dropout(fraction)`
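
To get a feel for the size of the network described above, the number of trainable parameters can be worked out by hand: a dense layer with `n_in` inputs and `n_out` neurons has `n_in * n_out` weights plus `n_out` biases, while flatten and dropout layers have none. A minimal sketch for MNIST-sized inputs (the helper name `dense_params` is just for illustration):

```python
def dense_params(n_in, n_out):
    # One weight per input-output pair, plus one bias per output neuron
    return n_in * n_out + n_out

n_inputs = 28 * 28 * 1          # size of a flattened MNIST image
layer_sizes = [256, 128, 64, 10]

total = 0
for n_out in layer_sizes:
    n_params = dense_params(n_inputs, n_out)
    print(f"Dense({n_out}): {n_params:,} parameters")
    total += n_params
    n_inputs = n_out            # this layer's output feeds the next layer
print(f"Total: {total:,} parameters")
```

These numbers can be compared with the `Param #` column of the model summary.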

Printing a summary of the network should give you the following:
```
Model: "functional"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┓
┃ Layer (type)                         ┃ Output Shape                ┃         Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━┩
│ input_layer_1 (InputLayer)           │ (None, 28, 28, 1)           │               0 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ flatten_1 (Flatten)                  │ (None, 784)                 │               0 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ dense_2 (Dense)                      │ (None, 256)                 │         200,960 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ dropout_1 (Dropout)                  │ (None, 256)                 │               0 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ dense_3 (Dense)                      │ (None, 128)                 │          32,896 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ dropout_2 (Dropout)                  │ (None, 128)                 │               0 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ dense_4 (Dense)                      │ (None, 64)                  │           8,256 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ dropout_3 (Dropout)                  │ (None, 64)                  │               0 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ dense_5 (Dense)                      │ (None, 10)                  │             650 │
└──────────────────────────────────────┴─────────────────────────────┴─────────────────┘
 Total params: 242,762 (948.29 KB)
 Trainable params: 242,762 (948.29 KB)
 Non-trainable params: 0 (0.00 B)
```


In [None]:
# Define our MLP: replace "None" with the corresponding layers as described
input_layer = keras.layers.Input(x_train[0].shape)
# First layer: flatten the 2D input into 1D for the dense layers
x = None(...)(input_layer)
# Second layer: dense layer with 256 neurons and a relu activation
x = None(...)(x)
# Third layer: dropout with 25% of neurons disabled
x = None(...)(x)
# Fourth layer: dense layer with 128 neurons and a relu activation
x = None(...)(x)
# Fifth layer: dropout with 25% of neurons disabled
x = None(...)(x)
# Sixth layer: dense layer with 64 neurons and a relu activation
x = None(...)(x)
# Seventh layer: dropout with 25% of neurons disabled
x = None(...)(x)
# Eighth layer: dense layer for classification into the number of classes
x = None(...)(x)
# Define the model from the input and final layers
mlp_model = keras.Model(input_layer, x)
# Print the model summary
mlp_model.summary()

Now we need to define the loss function and optimiser that we will use to perform the gradient descent optimisation.

* `keras.losses.categorical_crossentropy` is the loss function for multi-category classification tasks
* `keras.optimizers.Adam(learning_rate=<learning_rate>)` is a choice of optimiser that can be used here. We need to give the learning rate as an argument

The next step is to compile the model, telling it which loss function and optimiser to use and which metrics to display whilst training.

* `model.compile(loss=<loss_function>, optimizer=<optimiser>, metrics=['accuracy'])` compiles the model so that the accuracy is reported during the training process.
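
To see what the categorical cross-entropy measures, it can be computed by hand for a single image as the negative sum of `y_true[i] * log(y_pred[i])` over the classes. The sketch below is a plain-Python illustration, not the Keras implementation; the small `eps` that guards against `log(0)` is an illustrative choice, and Keras additionally averages the loss over the images in a batch.

```python
import math

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    # Loss for one image: -sum_i y_true[i] * log(y_pred[i])
    # eps avoids log(0); its value here is an assumption for illustration
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_pred))

y_true = [0.0, 0.0, 1.0]          # one-hot truth: the true class is 2
confident = [0.05, 0.05, 0.90]    # most probability on the true class
wrong = [0.70, 0.20, 0.10]        # most probability on a wrong class

print(categorical_crossentropy(y_true, confident))  # small loss
print(categorical_crossentropy(y_true, wrong))      # larger loss
```

Only the score assigned to the true class contributes, so the loss falls towards zero as that score approaches 1.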

In [None]:
# The batch size controls the number of images that are processed simultaneously
batch_size = 128
# The number of epochs that we want to train the network for
epochs = 5
# The learning rate (step size in gradient descent)
learning_rate = 0.001
# Define the loss function and optimiser and then compile the model, replacing
# "None" as required
# Categorical crossentropy loss function
loss_function = None
# Adam optimiser using the learning rate defined above
optimiser = None
# Compile the model with the loss function and optimisers defined above
None

Now we are ready to train our MLP and run it on the MNIST dataset. We do this using the `fit` function of the model. It has many arguments, of which I list those we will need below:

* `model.fit(x=<x>, y=<y>, batch_size=<batch_size>, epochs=<epochs>, validation_data = (<x_test>, <y_test>), verbose=1)`
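
It can help to know how `batch_size` and `epochs` translate into gradient descent steps. A rough sketch, assuming the 60,000-image MNIST training set and the hyperparameter values used in this notebook:

```python
import math

n_train = 60000      # number of MNIST training images
batch_size = 128
epochs = 5

# One weight update per batch; the last batch of each epoch is smaller
steps_per_epoch = math.ceil(n_train / batch_size)
total_updates = steps_per_epoch * epochs
print(steps_per_epoch, total_updates)
```

The steps-per-epoch value is the number that the progress bar counts up to during each epoch when `verbose=1`.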

In [None]:
# Train the model using the training images and targets, and use the test
# images as the validation sample. Replace "None" as appropriate
None

Let's define a couple of functions to look at some of the images that the network classified incorrectly.

In [None]:
# Make a list of incorrect classifications
def FindIncorrectClassifications(network_model, images, targets):
    incorrect_indices = []
    # Use the network to predict the classification of the images.
    raw_predictions = network_model.predict(images)
    for i in range(0, len(raw_predictions)):
        # Remember the raw output from the network gives us an array of scores. We want
        # to select the highest one as our prediction. We need to do the same thing
        # for the truth too since we converted our numbers to a categorical
        # representation earlier. We use the np.argmax() function for this
        prediction = np.argmax(raw_predictions[i])
        truth = np.argmax(targets[i])
        if prediction != truth:
            incorrect_indices.append([i,prediction,truth])
    print('Number of images that were incorrectly classified =',len(incorrect_indices))
    return incorrect_indices

def DrawFailure(images, incorrect_indices, index_to_show=0):
    image_to_plot = images[incorrect_indices[index_to_show][0]]
    fig, ax = plot.subplots(1, 1)
    print('Incorrect classification for image',incorrect_indices[index_to_show][0],
          ': predicted =',incorrect_indices[index_to_show][1],
          'with true =',incorrect_indices[index_to_show][2])
    ax.imshow(image_to_plot)
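
The loop in `FindIncorrectClassifications` can also be written without an explicit Python loop by applying `np.argmax` along the class axis. The sketch below uses a small made-up array of scores in place of `network_model.predict(images)`, and the function name `find_incorrect_vectorised` is hypothetical.

```python
import numpy as np

def find_incorrect_vectorised(raw_predictions, targets):
    # The highest score in each row is the predicted class; the same
    # operation on the one-hot targets recovers the true class
    predictions = np.argmax(raw_predictions, axis=1)
    truths = np.argmax(targets, axis=1)
    wrong = np.flatnonzero(predictions != truths)
    # Same [index, prediction, truth] layout as the loop version
    return [[int(i), int(predictions[i]), int(truths[i])] for i in wrong]

# Made-up scores for three images and three classes
scores = np.array([[0.1, 0.8, 0.1],
                   [0.6, 0.3, 0.1],
                   [0.2, 0.2, 0.6]])
truth = np.array([[0, 1, 0],
                  [0, 0, 1],
                  [0, 0, 1]])
print(find_incorrect_vectorised(scores, truth))
```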

And now let's look at the images.

In [None]:
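# Find the test images that the MLP classified incorrectly: replace "None"
# with a call to FindIncorrectClassifications
mlp_failures = None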

In [None]:
DrawFailure(x_test, mlp_failures, 0)

---

Now we want to define a CNN. The basic building blocks you will need are:


*   Convolutional layers: `keras.layers.Conv2D(num_filters, (k,k), activation='relu')`. Typical values of `k` are 3, 5, or 7
*   Pooling layers: `keras.layers.MaxPooling2D((2,2))` will perform a factor of 2 downsampling in the two dimensions of the image
*   Dropout: `keras.layers.Dropout(fraction)` where `fraction` is the fraction of neuron outputs that are randomly disabled during training. Typical values can be 0.25 or 0.5
*   Dense layers: `keras.layers.Dense(num_nodes, activation='relu')` where `num_nodes` is the number of neurons in the layer. The final layer of the CNN needs to have `num_nodes = num_classes` and, as for the MLP, should use the `softmax` activation
*   Flatten layer: `keras.layers.Flatten()` just converts an n-dimensional tensor into a vector. In this case we use it to present a dense output layer with a vector input


In the functional style that we use to write our network, each layer is written in the form:

`layer_output = keras.layers.LayerNameHere(arguments_go_here)(layer_input)`
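
The two sets of brackets in this form do different jobs: the first configures the layer and the second applies it to an input. The toy classes below (`Scale` and `Shift`, both made up for illustration and not Keras code) mimic the same pattern on plain Python lists. Real Keras layers operate on tensors and, in this style, build up the model graph rather than computing numbers straight away.

```python
class Scale:
    """Toy stand-in for a layer: configured once, then called on an input."""
    def __init__(self, factor):
        self.factor = factor                       # plays the role of the arguments

    def __call__(self, values):
        return [self.factor * v for v in values]   # acts on the layer input

class Shift:
    """Second toy layer that adds a constant offset."""
    def __init__(self, offset):
        self.offset = offset

    def __call__(self, values):
        return [v + self.offset for v in values]

inputs = [1.0, 2.0, 3.0]
x = Scale(2.0)(inputs)   # first brackets build the layer, second apply it
x = Shift(1.0)(x)        # each output becomes the next layer's input
print(x)
```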

For the first CNN we are building, you should see output similar to the following from the `model.summary()` command (the exact formatting depends on your Keras version):

```
Model: "model"
_________________________________________________________________
 Layer (type)                 Output Shape              Param #   
=================================================================
 input_1 (InputLayer)         [(None, 28, 28, 1)]       0         
                                                                 
 conv2d (Conv2D)              (None, 26, 26, 32)        320       
                                                                 
 max_pooling2d (MaxPooling2D) (None, 13, 13, 32)        0         
                                                              
 dropout (Dropout)            (None, 13, 13, 32)        0         
                                                                 
 flatten (Flatten)            (None, 5408)              0         
                                                                 
 dense (Dense)                (None, 10)                54090     
                                                                 
=================================================================
Total params: 54,410
Trainable params: 54,410
Non-trainable params: 0
_________________________________________________________________
```
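
The shapes and parameter counts in this summary can be reproduced by hand. The sketch below assumes the Keras defaults for `Conv2D` (no padding and a stride of 1); the helper names are only illustrative.

```python
def conv2d_output_size(size, kernel):
    # No padding and stride 1: the kernel fits in (size - kernel + 1) positions
    return size - kernel + 1

def conv2d_params(kernel, in_channels, filters):
    # Each filter has kernel*kernel*in_channels weights plus one bias
    return kernel * kernel * in_channels * filters + filters

height, width, channels = 28, 28, 1   # MNIST input
filters, kernel, n_classes = 32, 3, 10

conv_size = conv2d_output_size(height, kernel)     # spatial size after Conv2D
pooled_size = conv_size // 2                       # after (2,2) max pooling
flat_size = pooled_size * pooled_size * filters    # length of the flattened vector
n_conv = conv2d_params(kernel, channels, filters)  # Conv2D parameters
n_dense = flat_size * n_classes + n_classes        # weights plus biases of Dense

print(conv_size, pooled_size, flat_size)
print(n_conv, n_dense, n_conv + n_dense)
```

Note that almost all of the parameters sit in the final dense layer rather than in the convolution itself.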

In [None]:
# Define our simple CNN model: replace "None" with the corresponding layers as described
input_layer = keras.layers.Input(x_train[0].shape)
# First layer: 2D convolution with 32 filters of size (3,3) and relu activation
x = None(...)(input_layer)
# Second layer: 2D max pooling layer to downsample by a factor of 2 in both dimensions
x = None(...)(x)
# Third layer: dropout with 25% of neurons disabled
x = None(...)(x)
# Fourth layer: flatten the output into 1D for input to a dense layer
x = None(...)(x)
# Fifth layer: dense layer for classification into the number of classes
x = None(...)(x)
# Define the model from the input and final layers
cnn_model = keras.Model(input_layer, x)
# Print the model summary
cnn_model.summary()

In [None]:
# Set up the model to train with the same hyperparameters as the MLP
cnn_loss_function = keras.losses.categorical_crossentropy
cnn_optimiser = keras.optimizers.Adam(learning_rate=learning_rate)
cnn_model.compile(loss=cnn_loss_function, optimizer=cnn_optimiser, metrics=['accuracy'])

Now we can run our network on whichever data sample we requested. Initially on `mnist` we'll hopefully see that we can reach a very high accuracy.

In [None]:
# Train the model using the training images and targets, and use the test
# images as the validation sample.
cnn_model.fit(x_train, y_train,
          batch_size = batch_size,
          epochs = epochs,
          verbose = 1,
          validation_data = (x_test, y_test))


Now let's look at some failures again

In [None]:
cnn_failures = FindIncorrectClassifications(cnn_model, x_test, y_test)

In [None]:
DrawFailure(x_test, cnn_failures, 0)

---

Now we will make a CNN using the depthwise-separable convolution layer.
* `keras.layers.SeparableConv2D(num_filters, kernel_size, activation)` noting that there are many other arguments. The number of filters corresponds to the point-wise convolution that sets the number of output channels. The kernel size corresponds to the size of the kernel in the initial depthwise convolution.

In this case, we want to keep the output data size from the SeparableConv2D layer the same as our previous Conv2D layer, so we need to have the same number of filters and the same kernel size.

The network summary should look as follows:


```
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┓
┃ Layer (type)                         ┃ Output Shape                ┃         Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━┩
│ input_layer_2 (InputLayer)           │ (None, 28, 28, 1)           │               0 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ separable_conv2d (SeparableConv2D)   │ (None, 26, 26, 32)          │              73 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ max_pooling2d_1 (MaxPooling2D)       │ (None, 13, 13, 32)          │               0 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ dropout_4 (Dropout)                  │ (None, 13, 13, 32)          │               0 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ flatten_2 (Flatten)                  │ (None, 5408)                │               0 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ dense_5 (Dense)                      │ (None, 10)                  │          54,090 │
└──────────────────────────────────────┴─────────────────────────────┴─────────────────┘
 Total params: 54,163 (211.57 KB)
 Trainable params: 54,163 (211.57 KB)
 Non-trainable params: 0 (0.00 B)
```
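
The 73 parameters of the `SeparableConv2D` layer can be checked by hand and compared with a standard `Conv2D` layer. The sketch below assumes one depthwise kernel per input channel (a depth multiplier of 1) and a single bias per output channel, which reproduces the number in the summary; the helper names are illustrative.

```python
def conv2d_params(kernel, in_channels, filters):
    # Standard convolution: every filter spans all input channels
    return kernel * kernel * in_channels * filters + filters

def separable_conv2d_params(kernel, in_channels, filters):
    depthwise = kernel * kernel * in_channels   # one kernel per input channel
    pointwise = in_channels * filters           # 1x1 convolution mixing channels
    bias = filters                              # one bias per output channel
    return depthwise + pointwise + bias

for name, in_channels in [("mnist", 1), ("cifar10", 3)]:
    standard = conv2d_params(3, in_channels, 32)
    separable = separable_conv2d_params(3, in_channels, 32)
    print(f"{name}: Conv2D = {standard}, SeparableConv2D = {separable}")
```

The saving grows with the number of input channels, so it is larger for the three-channel CIFAR images than for greyscale MNIST.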



In [None]:
# Define our mobile CNN model: replace "None" with the corresponding layers as described
input_layer = keras.layers.Input(x_train[0].shape)
# First layer: 2D depthwise separable convolution with 32 point-wise filters and (3,3) kernels and relu activation
x = None(...)(input_layer)
# Second layer: 2D max pooling layer to downsample by a factor of 2 in both dimensions
x = None(...)(x)
# Third layer: dropout with 25% of neurons disabled
x = None(...)(x)
# Fourth layer: flatten the output into 1D for input to a dense layer
x = None(...)(x)
# Fifth layer: dense layer for classification into the number of classes
x = None(...)(x)
# Define the model from the input and final layers
mobile_model = keras.Model(input_layer, x)
# Print the model summary
mobile_model.summary()

In [None]:
# Set up the model to train with the same hyperparameters as the MLP
mobile_loss_function = keras.losses.categorical_crossentropy
mobile_optimiser = keras.optimizers.Adam(learning_rate=learning_rate)
mobile_model.compile(loss=mobile_loss_function, optimizer=mobile_optimiser, metrics=['accuracy'])

In [None]:
# Train the model using the training images and targets, and use the test
# images as the validation sample.
mobile_model.fit(x_train, y_train,
          batch_size = batch_size,
          epochs = epochs,
          verbose = 1,
          validation_data = (x_test, y_test))

Feel free to look at errors again if you want to

In [None]:
mobile_failures = FindIncorrectClassifications(mobile_model, x_test, y_test)

In [None]:
DrawFailure(x_test, mobile_failures, 0)