# COMM7370 AI Theories and Applications
# Tutorial: Convolutional Neural Network by Keras
## The Problem: MNIST digit classification
In this tutorial, we'll tackle a classic machine learning problem, MNIST handwritten digit classification: given an image of a handwritten digit, predict which digit it shows. We will solve it with a convolutional neural network (CNN).

<table class="image">
<tr><td><img src="mnist-examples.webp" alt="drawing" width="450"/></td></tr>
<caption align="center">Sample images from the MNIST dataset</caption>
</table>

Each image in the MNIST dataset is 28x28 and contains a centered, grayscale digit. Our CNN will take an image and output one of 10 possible classes (one for each digit).\
MNIST contains 70,000 images of handwritten digits: 60,000 for training and 10,000 for testing.
## 1. Setup

In [None]:
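# install the required packages in the current Jupyter kernel
# (os and sys are part of the Python standard library and need no install)
import sys
!{sys.executable} -m pip install keras
!{sys.executable} -m pip install tensorflow
!{sys.executable} -m pip install numpy
!{sys.executable} -m pip install matplotlib
!{sys.executable} -m pip install seaborn
!{sys.executable} -m pip install scikit-learn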

In [None]:
# import packages
import numpy as np
import os
import matplotlib.pyplot as plt
import seaborn as sns
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Dense, Flatten
from keras.utils import to_categorical

# Workaround for the duplicate OpenMP library error seen on some macOS setups.
# If you are not on macOS, you can comment out the following line.
os.environ['KMP_DUPLICATE_LIB_OK']='True'

## 2. Preparing the Data
Before we begin, we’ll normalize the image pixel values from [0, 255] to [-0.5, 0.5] to make our network easier to train (using smaller, centered values usually leads to better results). We’ll also reshape each image from (28, 28) to (28, 28, 1) because Keras `Conv2D` expects a trailing channel dimension, and grayscale images have exactly one channel.

`np.expand_dims` inserts a new axis of length 1 at the given position in an array's shape; see the [NumPy documentation](https://docs.scipy.org/doc/numpy/reference/generated/numpy.expand_dims.html) for details.

In [None]:
def load_data(path):
    '''
    load mnist dataset from specified path
    '''
    with np.load(path) as f:
        X_train, y_train = f['x_train'], f['y_train']
        X_test, y_test = f['x_test'], f['y_test']
        return (X_train, y_train), (X_test, y_test)


(X_train, y_train), (X_test, y_test) = load_data('./data/mnist.npz')

# Normalize the images.
X_train = (X_train / 255) - 0.5
X_test = (X_test / 255) - 0.5

# Reshape the images.
X_train = np.expand_dims(X_train, axis=3)
X_test = np.expand_dims(X_test, axis=3)
print(X_train.shape)
print(X_test.shape)
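
To see what these two steps do on something small enough to read, here is a minimal sketch in plain Python. Nested lists stand in for NumPy arrays, and `tiny_image`, `normalize_pixel`, and `add_channel_axis` are illustrative names that are not part of the tutorial's code.

```python
def normalize_pixel(value):
    """Map a pixel intensity in [0, 255] to the range [-0.5, 0.5]."""
    return value / 255 - 0.5


def add_channel_axis(image):
    """Turn an H x W nested list into H x W x 1 by wrapping every pixel."""
    return [[[pixel] for pixel in row] for row in image]


# A made-up 2x2 "image" with the darkest and brightest possible pixels.
tiny_image = [[0, 255], [51, 204]]

normalized = [[normalize_pixel(p) for p in row] for row in tiny_image]
with_channel = add_channel_axis(normalized)

print(normalized[0])  # [-0.5, 0.5]
print(len(with_channel), len(with_channel[0]), len(with_channel[0][0]))  # 2 2 1
```

The same idea applies to the real data: every value ends up in [-0.5, 0.5], and the shape gains a trailing axis of length 1.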

## 3. Building the Model
We’ll be using the `Sequential` model, since our CNN will be a linear stack of layers.

The `Sequential` constructor takes a list of Keras layers. We’ll use 3 types of layers for our CNN: **Convolutional**, **Max Pooling**, and **Softmax**.
<img src="cnn-dims-3.svg" alt="drawing" width="550"/>
Here, we stack one convolution layer, one max pooling layer and one fully connected layer.

In [None]:
num_filters = 8
filter_size = 3
pool_size = 2

# Build the model.
model = Sequential([
  Conv2D(num_filters, filter_size, input_shape=(28, 28, 1)),
  MaxPooling2D(pool_size=pool_size),
  Flatten(),
  Dense(10, activation='softmax'),
])

- `num_filters`, `filter_size`, and `pool_size` are self-explanatory variables that set the hyperparameters for our CNN.
- The first layer in a `Sequential` model should specify `input_shape` so that the model can be built right away; we do so on `Conv2D`. Once this input shape is specified, Keras automatically infers the input shapes of the later layers.
- [Conv2D](https://keras.io/layers/convolutional/): This layer creates a convolution kernel that is convolved with the layer input to produce a tensor of outputs.
- [MaxPooling2D](https://keras.io/layers/pooling/): Max pooling operation for spatial data.
    - pool_size: integer or tuple of 2 integers, factors by which to downscale (vertical, horizontal). (2, 2) will halve the input in both spatial dimensions. If only one integer is specified, the same window length will be used for both dimensions.
    - strides: integer, tuple of 2 integers, or None. Stride values. If None, it will default to pool_size.
- The output layer is a `Dense` layer with Softmax activation and 10 nodes, one for each class.
- [Flatten](https://keras.io/layers/core/): Flattens the input.
<img src="flatten.png" alt="drawing" width="400"/>
<img src="flatten-fig.png" alt="drawing" width="550"/>
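
The shapes flowing through this stack can be worked out by hand. The sketch below does the arithmetic in plain Python, assuming the Keras defaults used above (`'valid'` padding and stride 1 for `Conv2D`, pooling strides equal to `pool_size`); the helper names are illustrative and not Keras functions.

```python
def conv2d_output_side(side, kernel, stride=1):
    """Output height/width of a Conv2D layer with 'valid' padding."""
    return (side - kernel) // stride + 1


def max_pool_output_side(side, pool):
    """Output height/width of MaxPooling2D when strides default to pool_size."""
    return (side - pool) // pool + 1


num_filters, filter_size, pool_size = 8, 3, 2

conv_side = conv2d_output_side(28, filter_size)          # 28 -> 26
pool_side = max_pool_output_side(conv_side, pool_size)   # 26 -> 13
flat_units = pool_side * pool_side * num_filters         # 13 * 13 * 8

print("Conv2D output:      ", (conv_side, conv_side, num_filters))  # (26, 26, 8)
print("MaxPooling2D output:", (pool_side, pool_side, num_filters))  # (13, 13, 8)
print("Flatten output:     ", (flat_units,))                        # (1352,)
print("Dense output:       ", (10,))
```

Calling `model.summary()` on the model built above is a quick way to check these numbers against what Keras reports.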

## 4. Compiling the Model
Before we can begin training, we need to configure the training process. We decide 3 key factors during the compilation step:
- The **optimizer**. We’ll stick with a pretty good default: the Adam gradient-based optimizer (Adam - A Method for Stochastic Optimization). Keras has many [other optimizers](https://keras.io/optimizers/) you can look into as well.
- The **loss function**. Since we’re using a Softmax output layer, we’ll use the Cross-Entropy loss. Keras distinguishes between binary_crossentropy (2 classes) and categorical_crossentropy (>2 classes), so we’ll use the latter. [See all Keras losses](https://keras.io/losses/).
- A list of **metrics**. Since this is a classification problem, we’ll just have Keras report on the accuracy metric.

Here’s what that compilation looks like:

In [None]:
# Compile the model.
model.compile(
  'adam',
  loss='categorical_crossentropy',
  metrics=['accuracy'],
)
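
To make the loss less of a black box, here is a minimal plain-Python sketch of categorical cross-entropy for a single sample. The function name and the small `eps` clamp (used to avoid `log(0)`) are assumptions made for illustration; Keras computes the loss over whole batches with its own implementation.

```python
import math


def categorical_crossentropy(one_hot_target, predicted_probs, eps=1e-7):
    """Cross-entropy between a one-hot target and a predicted distribution."""
    return -sum(
        t * math.log(max(p, eps))
        for t, p in zip(one_hot_target, predicted_probs)
    )


# True class is digit 2.
target = [0, 0, 1, 0, 0, 0, 0, 0, 0, 0]

# A confident, correct prediction versus a completely unsure one.
confident = [0.01, 0.01, 0.91, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01]
uniform = [0.1] * 10

loss_confident = categorical_crossentropy(target, confident)
loss_uniform = categorical_crossentropy(target, uniform)

print(f"confident: {loss_confident:.4f}")
print(f"uniform:   {loss_uniform:.4f}")
```

Only the probability assigned to the true class contributes, so the loss shrinks toward 0 as that probability approaches 1. The uniform guess gives a loss of ln(10), which is a useful reference point when reading the first training log lines.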

## 5. Training the Model
Training a model in Keras literally consists only of calling `fit()` and specifying some parameters. There are a lot of possible parameters, but we’ll only manually supply a few:
- The **training data** (images and labels), commonly known as X and Y, respectively.
- The **number of epochs** (iterations over the entire dataset) to train for.
- The **batch size** (number of samples per gradient update) to use when training.

Keras expects the training targets to be 10-dimensional vectors, since there are 10 nodes in our Softmax output layer, but we’re instead supplying a single integer representing the class for each image. Conveniently, Keras provides the `to_categorical` utility function, which fixes this exact issue. It turns our array of class integers into an array of <span style="color:orange">one-hot vectors</span> instead. For example, digit 2 would become `[0, 0, 1, 0, 0, 0, 0, 0, 0, 0]`.
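
As a rough illustration of what one-hot encoding does, here is a plain-Python sketch. The `one_hot` helper is hypothetical and only mimics the idea; `to_categorical` itself returns a NumPy array of floats rather than lists of integers.

```python
def one_hot(labels, num_classes=10):
    """Encode each integer label as a list with a single 1 at its index."""
    return [
        [1 if index == label else 0 for index in range(num_classes)]
        for label in labels
    ]


encoded = one_hot([2, 0, 9])
for row in encoded:
    print(row)
# [0, 0, 1, 0, 0, 0, 0, 0, 0, 0]
# [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
# [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
```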

In [None]:
# Train the model.
history = model.fit(
  X_train,
  to_categorical(y_train),
  epochs=5,
  validation_data=(X_test, to_categorical(y_test)),
)
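
The call above does not pass `batch_size`, so Keras falls back to its default of 32. The short sketch below works out how many gradient updates that implies; the variable names are illustrative.

```python
import math

num_train = 60000   # MNIST training images
batch_size = 32     # Keras default when fit() is called without batch_size
epochs = 5

# One gradient update per batch; the last batch may be smaller than the rest.
steps_per_epoch = math.ceil(num_train / batch_size)
total_updates = steps_per_epoch * epochs

print("steps per epoch:", steps_per_epoch)  # 1875
print("total updates:  ", total_updates)    # 9375

# A larger batch size means fewer, less noisy updates per epoch.
print("steps per epoch at batch_size=128:", math.ceil(num_train / 128))  # 469
```

The steps-per-epoch figure is the number shown on the progress bar while each epoch trains.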

In [None]:
# plotting the metrics
fig = plt.figure()
plt.subplot(2,1,1)
plt.plot(history.history['accuracy'])
plt.plot(history.history['val_accuracy'])
plt.title('model accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train', 'val'], loc='lower right')

plt.subplot(2,1,2)
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'val'], loc='upper right')

plt.tight_layout()

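We achieve roughly **97.2%** test accuracy with this simple CNN! The exact figure varies slightly between runs because the weights are randomly initialized.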

### Visualization
- Displaying an original image

In [None]:
plt.imshow(X_test[51][:,:,0]);

- Confusion matrix

A confusion matrix is a table that is often used to describe the performance of a classification model (or “classifier”) on a set of test data for which the true values are known.

In [None]:
from sklearn.metrics import confusion_matrix
Y_prediction = model.predict(X_test)
# Convert the predicted class probabilities to class indices
Y_pred_classes = np.argmax(Y_prediction, axis=1)
# y_test already holds integer class labels, so no conversion is needed
Y_true = y_test
# compute the confusion matrix (rows: true labels, columns: predicted labels)
confusion_mtx = confusion_matrix(Y_true, Y_pred_classes)

plt.figure(figsize=(10,8))
sns.heatmap(confusion_mtx, annot=True, fmt="d");
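
To show how such a table is filled in and read, here is a small plain-Python sketch on made-up labels for 3 classes. The helper `confusion_matrix_counts` is illustrative; it follows the same convention as scikit-learn (rows are true labels, columns are predicted labels).

```python
def confusion_matrix_counts(y_true, y_pred, num_classes):
    """Count how often each true class (row) was predicted as each class (column)."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for true_label, predicted_label in zip(y_true, y_pred):
        matrix[true_label][predicted_label] += 1
    return matrix


# Made-up labels: one sample of class 1 is misclassified as class 2.
y_true = [0, 1, 2, 2, 1, 0]
y_pred = [0, 2, 2, 2, 1, 0]

matrix = confusion_matrix_counts(y_true, y_pred, num_classes=3)
for row in matrix:
    print(row)
# [2, 0, 0]
# [0, 1, 1]
# [0, 0, 2]

# Correct predictions sit on the diagonal.
correct = sum(matrix[i][i] for i in range(3))
accuracy = correct / len(y_true)

# Fraction of each true class that was predicted correctly.
per_class_recall = [matrix[i][i] / sum(matrix[i]) for i in range(3)]

print("accuracy:", round(accuracy, 3))
print("per-class recall:", per_class_recall)  # [1.0, 0.5, 1.0]
```

In the MNIST heatmap, large off-diagonal cells point to the digit pairs the model confuses most often.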

## 6. Using the Model
Now that we have a working, trained model, let’s put it to use. The first thing we’ll do is save its weights to disk so we can load them back up anytime:

In [None]:
# Keras 3 requires weight file names to end in `.weights.h5`;
# older versions accept this name as well.
model.save_weights('cnn.weights.h5')

We can now reload the trained model whenever we want by rebuilding it and loading in the saved weights:

In [None]:
num_filters = 8
filter_size = 3
pool_size = 2

# Build the model.
model = Sequential([
  Conv2D(num_filters, filter_size, input_shape=(28, 28, 1)),
  MaxPooling2D(pool_size=pool_size),
  Flatten(),
  Dense(10, activation='softmax'),
])

# Load the model from disk later using:
model.load_weights('cnn.weights.h5')

Using the trained model to make predictions is easy: we pass an array of inputs to `predict()` and it returns an array of outputs. Keep in mind that the output of our network is 10 probabilities (because of softmax), so we’ll use `np.argmax()` to turn those into actual digits.

In [None]:
# Predict on the first 5 test images.
predictions = model.predict(X_test[:5])

# Print our model's predictions.
print(np.argmax(predictions, axis=1)) # [7, 2, 1, 0, 4]

# Check our predictions against the ground truths.
print(y_test[:5]) # [7, 2, 1, 0, 4]
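
The step from softmax probabilities to digits is just "take the index of the largest value". A minimal plain-Python sketch on two made-up probability rows (not real model output) looks like this:

```python
def argmax(values):
    """Index of the largest value in a list."""
    return max(range(len(values)), key=lambda index: values[index])


# Hypothetical softmax outputs for two images; each row sums to 1.
fake_predictions = [
    [0.01, 0.01, 0.02, 0.01, 0.01, 0.01, 0.01, 0.90, 0.01, 0.01],
    [0.05, 0.05, 0.60, 0.05, 0.05, 0.05, 0.05, 0.04, 0.03, 0.03],
]

digits = [argmax(row) for row in fake_predictions]
confidences = [max(row) for row in fake_predictions]

print(digits)       # [7, 2]
print(confidences)  # [0.9, 0.6]
```

The largest probability can be read as the model's confidence in its answer, which is handy for spotting borderline predictions.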

## 7. Extensions
There’s much more we can do to experiment with and improve our network - in this [official Keras MNIST CNN example](https://keras.io/examples/mnist_cnn/), they achieve 99.25% test accuracy after 12 epochs. Some examples of modifications you could make to our CNN include:

### Network Depth
What happens if we add or remove Convolutional layers? How does that affect training and/or the model’s final performance?

In [None]:
model = Sequential([
  Conv2D(num_filters, filter_size, input_shape=(28, 28, 1)),
  Conv2D(num_filters, filter_size),
  MaxPooling2D(pool_size=pool_size),
  Flatten(),
  Dense(10, activation='softmax'),
])

# Compile the model.
model.compile(
  'adam',
  loss='categorical_crossentropy',
  metrics=['accuracy'],
)

# Train the model.
model.fit(
  X_train,
  to_categorical(y_train),
  epochs=5,
  validation_data=(X_test, to_categorical(y_test)),
)

### Dropout
What if we tried adding [Dropout](https://keras.io/layers/core/#dropout) layers, which are commonly used to prevent overfitting?

Dropout is a technique where randomly selected neurons are ignored during training. They are “dropped-out” randomly. This means that their contribution to the activation of downstream neurons is temporarily removed on the forward pass and any weight updates are not applied to the neuron on the backward pass.

The `Dropout` layer randomly sets a fraction `rate` of its input units to 0 at each update during training; `rate` is therefore the fraction of input units to drop. At prediction time, no units are dropped.
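
The mechanism can be sketched in a few lines of plain Python. This is a simplified illustration, not the Keras implementation: the `dropout` helper and its `rng` parameter are hypothetical, and it mirrors the documented Keras behaviour of scaling the kept units by `1 / (1 - rate)` during training so the expected sum of the inputs is unchanged.

```python
import random


def dropout(values, rate, training=True, rng=None):
    """Zero each value with probability `rate`; scale survivors by 1 / (1 - rate)."""
    if not 0 <= rate < 1:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0:
        # At prediction time the layer passes its input through unchanged.
        return list(values)
    rng = rng or random.Random()
    scale = 1.0 / (1.0 - rate)
    return [0.0 if rng.random() < rate else value * scale for value in values]


activations = [1.0] * 10

train_out = dropout(activations, rate=0.5, rng=random.Random(0))
test_out = dropout(activations, rate=0.5, training=False)

print("training:  ", train_out)  # a mix of 0.0 and 2.0
print("prediction:", test_out)   # unchanged
```

Because a different random subset is dropped at every update, the network cannot rely on any single unit, which is what discourages overfitting.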

In [None]:
from keras.layers import Dropout

model = Sequential([
  Conv2D(num_filters, filter_size, input_shape=(28, 28, 1)),
  MaxPooling2D(pool_size=pool_size),
  Dropout(0.5),
  Flatten(),
  Dense(10, activation='softmax'),
])

# Compile the model.
model.compile(
  'adam',
  loss='categorical_crossentropy',
  metrics=['accuracy'],
)

# Train the model.
model.fit(
  X_train,
  to_categorical(y_train),
  epochs=5,
  validation_data=(X_test, to_categorical(y_test)),
)

### Fully-connected Layers
What if we add fully-connected layers between the Convolutional outputs and the final Softmax layer? This is something commonly done in CNNs used for Computer Vision.

In [None]:
model = Sequential([
  Conv2D(num_filters, filter_size, input_shape=(28, 28, 1)),
  MaxPooling2D(pool_size=pool_size),
  Flatten(),
  Dense(64, activation='relu'),
  Dense(10, activation='softmax'),
])

# Compile the model.
model.compile(
  'adam',
  loss='categorical_crossentropy',
  metrics=['accuracy'],
)

# Train the model.
model.fit(
  X_train,
  to_categorical(y_train),
  epochs=5,
  validation_data=(X_test, to_categorical(y_test)),
)
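
One thing worth checking before adding dense layers is how many trainable parameters they introduce. The sketch below counts them by hand for the baseline and for the model with the extra `Dense(64)` layer, assuming the flattened feature map has 13 x 13 x 8 = 1352 units (28 - 3 + 1 = 26 after the convolution, then 13 after 2x2 pooling). The helper name is illustrative.

```python
def dense_params(inputs, units):
    """Weights plus one bias per output unit of a fully-connected layer."""
    return inputs * units + units


flat_units = 13 * 13 * 8           # flattened output of the pooling layer
conv_params = 3 * 3 * 1 * 8 + 8    # 3x3 kernels, 1 input channel, 8 filters + biases

baseline_total = conv_params + dense_params(flat_units, 10)
extended_total = (
    conv_params
    + dense_params(flat_units, 64)
    + dense_params(64, 10)
)

print("baseline parameters:      ", baseline_total)  # 13610
print("with Dense(64) parameters:", extended_total)  # 87322
```

Almost all of the extra capacity sits in the first dense layer, which is why fully-connected layers tend to dominate the size of small CNNs like this one.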

### Convolution Parameters
What if we play with the [Conv2D](https://keras.io/layers/convolutional/#conv2d) parameters?

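`padding='same'`: pads the input with zeros so that, with a stride of 1, the output has the same height and width as the input. With `strides=2`, as in the code that follows, the output is half the input size in each spatial dimension (14x14 for a 28x28 input).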

In [None]:
# These can be changed, too!
num_filters = 8
filter_size = 3

model = Sequential([
  # See https://keras.io/layers/convolutional/#conv2d for more info.
  Conv2D(
    num_filters,
    filter_size,
    input_shape=(28, 28, 1),
    strides=2,
    padding='same',
    activation='relu',
  ),
  MaxPooling2D(pool_size=pool_size),
  Flatten(),
  Dense(10, activation='softmax'),
])

# Compile the model.
model.compile(
  'adam',
  loss='categorical_crossentropy',
  metrics=['accuracy'],
)

# Train the model.
model.fit(
  X_train,
  to_categorical(y_train),
  epochs=5,
  validation_data=(X_test, to_categorical(y_test)),
)
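
The effect of `strides` and `padding` on the feature-map size can be worked out by hand. The following sketch assumes square inputs and kernels and uses the standard output-size formulas for `'valid'` and `'same'` padding; `conv2d_output_side` is an illustrative helper, not a Keras function.

```python
import math


def conv2d_output_side(side, kernel, stride=1, padding="valid"):
    """Height/width of a Conv2D output for a square input and square kernel."""
    if padding == "same":
        # Zero-padding is added, so only the stride shrinks the output.
        return math.ceil(side / stride)
    if padding == "valid":
        # No padding: the kernel must fit entirely inside the input.
        return (side - kernel) // stride + 1
    raise ValueError(f"unknown padding: {padding!r}")


sizes = {}
for padding in ("valid", "same"):
    for stride in (1, 2):
        side = conv2d_output_side(28, 3, stride, padding)
        sizes[(padding, stride)] = side
        print(f"padding={padding!r:8} strides={stride}: {side} x {side}")

# The model above uses padding='same' with strides=2, then 2x2 max pooling.
pooled_side = sizes[("same", 2)] // 2
flat_units = pooled_side * pooled_side * 8
print("Flattened units:", flat_units)  # 392
```

With a stride of 2 the flattened vector is much shorter than in the baseline model, so the final dense layer has far fewer weights to learn.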

### The full code

In [None]:
import numpy as np
import os
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Dense, Flatten
from keras.utils import to_categorical

# Workaround for the duplicate OpenMP library error seen on some macOS setups.
# If you are not on macOS, you can comment out the following line.
os.environ['KMP_DUPLICATE_LIB_OK']='True'

def load_data(path):
    '''
    load mnist dataset from specified path
    '''
    with np.load(path) as f:
        X_train, y_train = f['x_train'], f['y_train']
        X_test, y_test = f['x_test'], f['y_test']
        return (X_train, y_train), (X_test, y_test)


(X_train, y_train), (X_test, y_test) = load_data('./data/mnist.npz')

# Normalize the images.
X_train = (X_train / 255) - 0.5
X_test = (X_test / 255) - 0.5

# Reshape the images.
X_train = np.expand_dims(X_train, axis=3)
X_test = np.expand_dims(X_test, axis=3)

num_filters = 8
filter_size = 3
pool_size = 2

# Build the model.
model = Sequential([
  Conv2D(num_filters, filter_size, input_shape=(28, 28, 1)),
  MaxPooling2D(pool_size=pool_size),
  Flatten(),
  Dense(10, activation='softmax'),
])

# Compile the model.
model.compile(
  'adam',
  loss='categorical_crossentropy',
  metrics=['accuracy'],
)

# Train the model.
model.fit(
  X_train,
  to_categorical(y_train),
  epochs=5,
  validation_data=(X_test, to_categorical(y_test)),
)

# Save the model weights to disk.
# Keras 3 requires weight file names to end in `.weights.h5`.
model.save_weights('cnn.weights.h5')

# Load the weights from disk later using:
# model.load_weights('cnn.weights.h5')

# Predict on the first 5 test images.
predictions = model.predict(X_test[:5])

# Print our model's predictions.
print(np.argmax(predictions, axis=1)) # [7, 2, 1, 0, 4]

# Check our predictions against the ground truths.
print(y_test[:5]) # [7, 2, 1, 0, 4]


### Reference
- https://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html

- The code in this notebook is modified from various sources. All code is for educational purposes only and released under the CC1.0.