# A deep-learning neural network for image recognition
We present here a `Python Keras` implementation of a deep learning neural network for image recognition.

This is a detailed version of the previous implementation, in which every code block is thoroughly described and explained.

The publicly available `MNIST` dataset is used.

`Keras` is a popular `Python` library for deep learning models:
- wrapper for `TensorFlow`
- minimalistic
- modular
- easy to implement

The `MNIST` database (Modified National Institute of Standards and Technology database) is a large database of hand-written digits (details and data [here](http://yann.lecun.com/exdb/mnist/)):

![mnist](https://drive.google.com/uc?id=1KNK3-8qahQixvL-StpDAs6GoOUAHKSDy)

Deep learning consists of neural networks with multiple hidden layers that learn increasingly abstract and complex representations of the input data.
For instance, if we train a deep learning model to recognize hand-written digits (images):

- the first hidden layers might only learn local edge patterns;
- subsequent layers learn more complex representations of the data;
- the last layer will classify the image as one of ten digits.

For image recognition we use a specific deep learning architecture, **convolutional neural networks** (*CNNs*), which assume that the input data are images. This lets them reuse the same small set of filter weights at every position of the image, greatly reducing the number of model parameters to be tuned (more on *CNNs* later in the course).
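
To make the parameter saving concrete, here is a small back-of-the-envelope calculation. It is a sketch only, and the layer sizes are chosen purely for illustration. It compares a fully connected layer with 32 units applied to a flattened 28 x 28 image against a convolutional layer with 32 filters of size 3 x 3 applied to the same image. The helper names `dense_params` and `conv2d_params` are hypothetical.

```python
# Trainable parameters of a fully connected (dense) layer:
# one weight per (input, unit) pair plus one bias per unit.
def dense_params(n_inputs: int, n_units: int) -> int:
    return n_inputs * n_units + n_units

# Trainable parameters of a 2-D convolutional layer:
# each filter has kernel_h * kernel_w * in_channels weights plus one bias,
# and the same filter weights are reused at every position of the image.
def conv2d_params(kernel_h: int, kernel_w: int, in_channels: int, n_filters: int) -> int:
    return kernel_h * kernel_w * in_channels * n_filters + n_filters

dense = dense_params(28 * 28, 32)   # 784 * 32 + 32 = 25120
conv = conv2d_params(3, 3, 1, 32)   # 3 * 3 * 1 * 32 + 32 = 320
print(f"dense: {dense}, conv: {conv}, ratio: {dense / conv:.1f}")
```

The convolutional count does not depend on the image size at all, because the filter weights are shared across all positions.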


## 1. SET UP

- **Yesterday**: we loaded an external `python` script (`support_code.py`) to import libraries and set the seed 'behind the scenes'
- **Today**: we import libraries and set the seed <u>manually</u>, "in front of the public"

#### Importing libraries

**Now that you learnt the basic syntax for Python and Keras you should be able to recognise (some of) this!**

We import the necessary libraries to build a deep-learning neural network (DL NN) for image recognition:

- import the Sequential model type from Keras: linear stack of neural network layers, to be used to build a feed-forward CNN
- import the 'core' layers from Keras: layers that are used in almost any neural network
- import the CNN layers from Keras: convolutional layers to train the model on image data
- load the MNIST dataset

```python
import tensorflow as tf
import tensorflow.keras.utils
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten
from keras.layers import Conv2D, MaxPooling2D
from keras import backend as K # needed for image_data_format()
from keras.datasets import mnist
from tensorflow.keras.utils import to_categorical
```

We also import the following libraries:

- `numpy`
- `matplotlib`
- `sklearn`

```python
#libraries
import numpy as np

#general random seed
from numpy.random import seed

## matplotlib
from matplotlib import pyplot as plt

## scikit-learn
import sklearn.metrics
```

#### Setting the seed(s)

We set the seeds of the libraries that rely on random operations (e.g. weight initialisation, shuffling of the training data into batches, dropout), so that results are reproducible.

```python
## numpy seed
np.random.seed(123)

## tensorflow-specific seed
tf.keras.utils.set_random_seed(10) ## python, numpy and tensorflow random seeds (all in one function call; this also re-seeds numpy)
tf.config.experimental.enable_op_determinism() ## make TensorFlow ops deterministic (at some cost in speed)
```
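
Here is a quick way to see what a seed does. Re-creating a random number generator with the same seed reproduces exactly the same sequence of "random" numbers. This sketch uses local `numpy` generators (`np.random.default_rng`), so running it does not disturb the global seeds set above. The seed values 123 and 456 are arbitrary.

```python
import numpy as np

# Two generators created with the same seed produce identical sequences...
rng_a = np.random.default_rng(123)
rng_b = np.random.default_rng(123)
# ...while a different seed gives a different sequence
rng_c = np.random.default_rng(456)

draw_a = rng_a.random(3)   # three pseudo-random numbers in [0, 1)
draw_b = rng_b.random(3)
draw_c = rng_c.random(3)

print(np.array_equal(draw_a, draw_b))  # True
print(np.array_equal(draw_a, draw_c))  # False
```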

## 2. LOAD THE DATA

- **Yesterday**: we loaded data using the `load_data()` function
- **Today**: we load the MNIST data and assign part to training and part to testing, manually

We load the data from the MNIST dataset, and assign them to the training and testing sets.

Image data is generally harder to work with than flat relational data. The MNIST dataset is a beginner-friendly introduction to working with image data: it contains $70\,000$ labeled images of handwritten digits. These are grey-scale images, 28 x 28 pixels.

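The MNIST dataset comprises $60\,000$ training observations and $10\,000$ test observations: the function `load_data()` automatically assigns these to the training and testing sets. To keep training time short, we then keep only the first $20\,000$ training images and the first $5\,000$ test images.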

```python
# the data, split between train and test sets
(X_train, y_train), (X_test, y_test) = mnist.load_data()

# keep only a subset of the data to reduce training time
ntrain = 20000
ntest = 5000

X_train = X_train[0:ntrain,]
y_train = y_train[0:ntrain]
X_test = X_test[0:ntest,]
y_test = y_test[0:ntest]
```

A little sanity check:

```python
print("Size of the training set")
print(X_train.shape)
print("Size of the test set")
print(X_test.shape)
```

Data have been split into a **training** and a **testing set**, and within these into a **three-dimensional array** $X$ of **features** (samples x pixels x pixels) and a vector $y$ of labels (0-9 digits).

Each record in the 3-D array $X$ is a 28 x 28 matrix of grayscale intensities. Each intensity is stored in 1 byte (8 bits), i.e. as an integer value from 0 to 255. Grayscale (black-and-white) images use only one colour channel. Colour images use three channels (e.g. RGB), so each image (record) is a 3-D array (pixels x pixels x 3).
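
The shapes described above can be reproduced with a few dummy `numpy` arrays. These arrays are all zeros and contain no real images. The batch size of 5 is arbitrary. The `uint8` dtype is chosen to mirror the raw MNIST arrays.

```python
import numpy as np

n_images = 5
# grayscale batch: samples x rows x columns, one 8-bit intensity per pixel
gray_batch = np.zeros((n_images, 28, 28), dtype=np.uint8)
# colour (e.g. RGB) batch: one extra axis holding the 3 channel intensities
rgb_batch = np.zeros((n_images, 28, 28, 3), dtype=np.uint8)

print(gray_batch.shape)  # (5, 28, 28)
print(rgb_batch.shape)   # (5, 28, 28, 3)
# an 8-bit unsigned integer can store 2**8 = 256 values: 0 ... 255
print(np.iinfo(np.uint8).min, np.iinfo(np.uint8).max)  # 0 255
# values stored per image: 28 * 28 = 784 (grayscale), 28 * 28 * 3 = 2352 (colour)
print(gray_batch[0].size, rgb_batch[0].size)  # 784 2352
```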

## 3. CONFIGURE THE PARAMETERS

- **Yesterday**: we set parameters 'behind the scenes' with the `set_parameters()` function
- **Today**: we specify the parameters manually

Define model parameters:

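- input shape: the size of each input image (`img_rows` x `img_cols` pixels)
- n. of classes: the number of classes to predict (10 digits, in the MNIST problem)
- batch size: DL models typically do not process the entire dataset at once, but rather break it into **batches** of samples and update the weights after each batch
- n. of epochs: the number of complete passes (**iterations**) over the entire training set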

```python
img_rows = 28 #pixels
img_cols = 28 #pixels
num_classes = 10
batch_size = 64
num_epochs = 20
```
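
These values, together with the 20 000 training images kept in Section 2, determine how many weight updates training performs. The quick calculation below is a sketch. It assumes, as Keras does by default, that the last incomplete batch of each epoch is still used. The variable names are new and were chosen so that the notebook's own variables are not overwritten.

```python
import math

n_samples = 20000   # training images kept in Section 2
bsize = 64          # same value as batch_size
n_epochs = 20       # same value as num_epochs

# one gradient update per batch; the last, smaller batch still counts
steps_per_epoch = math.ceil(n_samples / bsize)
last_batch_size = n_samples - (steps_per_epoch - 1) * bsize
total_updates = steps_per_epoch * n_epochs

print(steps_per_epoch)   # 313
print(last_batch_size)   # 32
print(total_updates)     # 6260
```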

**Your turn! QUESTION: do you remember why we specify 10 classes? (`num_classes`)**

## 4. DATA PREPROCESSING

- **Yesterday**: we processed data 'behind the scenes' with the `preprocess()` function
- **Today**: we preprocess data <u>manually</u> (shape, range, labels)

#### Get the data size right

First, we need to explicitly declare the depth of the image representation array: in the case of grayscale images there is only one channel, and this dimension is 1.

We use the utility function [image_data_format()](https://keras.io/api/utils/backend_utils#imagedataformat-function) from keras [backend utilities](https://keras.io/api/utils/backend_utils/) to discover the convention ('channels_first' or 'channels_last') of our current system.

Depending on the backend configuration, the depth (channel) dimension is declared either first ('channels_first', used e.g. by the old Theano backend) or last ('channels_last', the TensorFlow default):

```python
if K.image_data_format() == 'channels_first':
    X_train = X_train.reshape(X_train.shape[0], 1, img_rows, img_cols)
    X_test = X_test.reshape(X_test.shape[0], 1, img_rows, img_cols)
    input_shape = (1, img_rows, img_cols)
else:
    X_train = X_train.reshape(X_train.shape[0], img_rows, img_cols, 1)
    X_test = X_test.reshape(X_test.shape[0], img_rows, img_cols, 1)
    input_shape = (img_rows, img_cols, 1)

print("Modified array dimensions:")
print(X_train.shape)
print(input_shape)
```
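
The two conventions store exactly the same pixels and differ only in where the channel axis sits. The minimal `numpy` sketch below uses a dummy batch and is about shapes only. It converts a channels-last batch to channels-first and back with `np.moveaxis`.

```python
import numpy as np

# dummy channels_last batch: samples x rows x cols x channels
batch_last = np.arange(2 * 28 * 28, dtype=np.float32).reshape(2, 28, 28, 1)

# move the channel axis from the last position (-1) to position 1
batch_first = np.moveaxis(batch_last, -1, 1)
print(batch_first.shape)  # (2, 1, 28, 28)

# moving it back recovers the original array exactly
restored = np.moveaxis(batch_first, 1, -1)
print(restored.shape)                        # (2, 28, 28, 1)
print(np.array_equal(restored, batch_last))  # True
```

With a single channel, a plain `reshape` (as in the cell above) gives the same result as the axis move. With three colour channels, a reshape would scramble the pixel values, so an axis move is needed.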


#### Normalization

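We then convert the input data type to `float32` and normalize the data values to the range $[0, 1]$ by dividing by 255, the maximum pixel intensity.
Neural networks compute in floating-point arithmetic. Inputs on a small, common scale also help gradient-based training converge faster and more stably.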

```python
# convert to float first: dividing the original uint8 arrays in place would raise an error
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
X_train /= 255 #max value of pixel intensity
X_test /= 255 #max value of pixel intensity
print('X_train shape:', X_train.shape)
print(X_train.shape[0], 'train samples')
print(X_test.shape[0], 'test samples')
```
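
The small sketch below shows the effect of the conversion and scaling on a few hand-picked `uint8` pixel values, which are arbitrary. It also shows why `astype('float32')` comes first: `numpy` refuses an in-place division that would have to store fractions back into an integer array.

```python
import numpy as np

pixels = np.array([0, 64, 128, 255], dtype=np.uint8)  # raw 8-bit intensities

scaled = pixels.astype('float32') / 255  # convert first, then rescale to [0, 1]
print(scaled.dtype)          # float32
print(np.round(scaled, 3))   # rounded values: 0, 0.251, 0.502, 1

# dividing the integer array in place is rejected by numpy's casting rules
raw = pixels.copy()
try:
    raw /= 255
    inplace_failed = False
except TypeError:
    inplace_failed = True
print("in-place division on uint8 failed:", inplace_failed)  # True
```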

Let's have a look at the normalised data:

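```python
i = 5  # index of the training image to inspect
print(y_train[i])  # its digit label
# drop the channel dimension so the image can be shown as a 2-D array
temp = X_train.reshape(X_train.shape[0], img_rows, img_cols)
plt.imshow(temp[i, :, :], cmap='gray')
plt.show()
# print a patch of the normalised pixel values (rows 3-17, columns 0-11), rounded to 3 decimals
print(np.round(temp[i, 3:18, 0:12], 3))
```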

#### Labels format

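Finally, the label vectors are converted to binary class matrices (one-hot encoding). Each digit becomes a row of ten binary values, one per class, with a 1 in the column of the true digit and 0 elsewhere. This matches the ten probabilities produced by the output layer and is the label format expected by the `categorical_crossentropy` loss used when compiling the model.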

```python
# convert class vectors to binary class matrices (also known as OHE - One Hot Encoding)
y_train = tf.keras.utils.to_categorical(y_train, num_classes)
y_test = tf.keras.utils.to_categorical(y_test, num_classes)
```

Sanity check:

```python
print(y_test[0:6,])
```
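
One-hot encoding is simple enough to write by hand. The sketch below uses plain `numpy` and a few example labels. It builds the same kind of binary matrix by picking rows of an identity matrix, and `argmax` then recovers the original digits. Section 8 uses the same `argmax` trick to turn predictions back into class labels.

```python
import numpy as np

labels = np.array([7, 2, 1, 0])   # example digit labels
n_classes = 10

# row k of the 10 x 10 identity matrix is the one-hot vector for class k
one_hot = np.eye(n_classes, dtype=np.float32)[labels]
print(one_hot.shape)   # (4, 10)
print(one_hot[0])      # 1 in column 7, 0 elsewhere

# argmax returns the column holding the 1, i.e. the original label
print(one_hot.argmax(axis=1))  # [7 2 1 0]
```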

## 5. BUILD THE MODEL

- **Yesterday**: we built the model 'behind the scenes' with the function `build_model()`
- **Today**: we build the model <u>manually</u>, step-by-step, using what we learnt on **Keras/Tensorflow**

We now define our deep-learning **neural network architecture**, and start building our model for image recognition.

First, we declare a [sequential model](https://keras.io/guides/sequential_model/), that is a sequence of layers each with one input tensor and one output tensor.
Then we add a first convolutional layer ([Conv2D](https://keras.io/api/layers/convolution_layers/convolution2d/)) to our model. It takes several additional arguments:

- number of filters (output channels; here 32)
- size of the kernel (filter: much more on this later!)
- type of activation function
- shape of the input array


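```python
model = Sequential()
model.add(
    Conv2D(32, kernel_size=(3, 3),  # 32 filters of size 3 x 3
           activation='relu',
           input_shape=input_shape))

# shapes below assume the 'channels_last' convention
print(model.input_shape)   # (None, 28, 28, 1): batch size x rows x cols x channels
print(model.output_shape)  # (None, 26, 26, 32): no padding, so 28 - 3 + 1 = 26; one channel per filter
```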

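With the 'channels_last' convention, the input shape is (None, 28, 28, 1): 28 x 28 pixels with 1 channel (grayscale). The first dimension, `None`, is the batch size. It is left unspecified when the model is built and is only fixed when data are passed through the model.
The output shape of the convolutional layer is (None, 26, 26, 32):

- None: the (still unspecified) batch size
- 26 x 26: without padding, a 3 x 3 kernel fits in only 28 - 3 + 1 = 26 positions along each axis
- 32: one feature map per filter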
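
The output size of a convolutional layer follows a general rule that also applies to pooling windows. Take an input of size $n$, a kernel of size $k$, padding $p$ on each side and stride $s$. The output then has $\lfloor (n + 2p - k)/s \rfloor + 1$ positions along that axis. The small helper below makes this explicit; its name, `conv_output_size`, is hypothetical.

```python
def conv_output_size(n: int, k: int, padding: int = 0, stride: int = 1) -> int:
    """Number of positions a k-wide window can take along an axis of length n."""
    return (n + 2 * padding - k) // stride + 1

print(conv_output_size(28, 3))             # 26: 3 x 3 kernel, no padding (Keras default 'valid')
print(conv_output_size(28, 3, padding=1))  # 28: one pixel of zero padding keeps the size
print(conv_output_size(28, 2, stride=2))   # 14: a 2 x 2 window with stride 2 (as in max pooling) halves it
```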

**Your turn! QUESTION: do you remember what `ReLU` is?**

Then we can add more layers to the deep-learning model:

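```python
model.add(Conv2D(64, (3, 3), activation='relu'))     # second convolutional layer: 64 filters
model.add(MaxPooling2D(pool_size=(2, 2)))            # downsample each feature map by a factor of 2
model.add(Dropout(0.25))                             # randomly zero 25% of the inputs during training
model.add(Flatten())                                 # feature maps -> one long vector
model.add(Dense(128, activation='relu'))             # fully connected hidden layer
model.add(Dropout(0.5))
model.add(Dense(num_classes, activation='softmax'))  # one probability per class
```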
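
As a cross-check on the architecture, the sketch below traces by hand the output shape and the number of trainable parameters of each layer. It assumes the 'channels_last' convention and Keras defaults: no padding, and a pooling stride equal to the pool size. The helpers `conv2d`, `max_pool` and `dense` are hypothetical and were written only for this sketch. In a notebook, `model.summary()` should report the same totals.

```python
# Hand-computed shapes (channels_last) and trainable parameters of the model above.
# conv2d, max_pool and dense are hypothetical helpers written for this sketch.

def conv2d(shape, filters, k=3):
    """'valid' k x k convolution: each spatial size shrinks by k - 1."""
    h, w, c_in = shape
    params = k * k * c_in * filters + filters  # weights + one bias per filter
    return (h - k + 1, w - k + 1, filters), params

def max_pool(shape, pool=2):
    """Pooling with stride equal to the pool size: spatial sizes are divided by pool."""
    h, w, c = shape
    return (h // pool, w // pool, c), 0  # pooling has no trainable parameters

def dense(n_in, units):
    """Fully connected layer: one weight per (input, unit) pair + one bias per unit."""
    return units, n_in * units + units

shape = (28, 28, 1)                       # MNIST input
shape, p1 = conv2d(shape, 32)             # (26, 26, 32), 320 parameters
shape, p2 = conv2d(shape, 64)             # (24, 24, 64), 18496 parameters
shape, p_pool = max_pool(shape)           # (12, 12, 64), 0 parameters
n_flat = shape[0] * shape[1] * shape[2]   # Flatten: 9216 values (Dropout changes no shapes)
_, p3 = dense(n_flat, 128)                # 1179776 parameters
_, p4 = dense(128, 10)                    # 1290 parameters

total = p1 + p2 + p_pool + p3 + p4
print(shape, n_flat)                             # (12, 12, 64) 9216
print(f"total trainable parameters: {total:,}")  # 1,199,882
```

Almost all of the parameters sit in the first `Dense` layer, which connects every one of the 9216 flattened values to each of its 128 units.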

You'll encounter these new layers later on in the course: do not worry if you don't understand everything for the moment!

A couple of things, though, you **already know**:
- [Dense](https://keras.io/api/layers/core_layers/dense/) layers (see image below)
- the [softmax](https://keras.io/api/layers/activations/#softmax-function) activation function (the multiclass analog of the logistic function), which returns a probability for each class, e.g. a 10% chance of the sample belonging to class 1, 15% for class 2 and so forth. The probabilities of all classes sum to 100%.

![dnn](https://drive.google.com/uc?id=1XD6xqrN3xQdaCSyQhWOiMlbxKqualeKo)
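
The softmax function turns a vector of raw scores $z_1, \dots, z_K$ into probabilities $\mathrm{softmax}(z)_j = e^{z_j} / \sum_{k=1}^{K} e^{z_k}$. Below is a minimal `numpy` sketch; the scores are made up. Subtracting the maximum score first is a standard trick to avoid overflow, and it does not change the result.

```python
import numpy as np

def softmax(z):
    """Map raw scores to probabilities that are positive and sum to 1."""
    z = np.asarray(z, dtype=np.float64)
    exp_z = np.exp(z - z.max())   # shifting by the max avoids overflow
    return exp_z / exp_z.sum()

scores = np.array([2.0, 1.0, 0.1])   # made-up outputs for 3 classes
probs = softmax(scores)
print(np.round(probs, 3))   # approx. 0.659, 0.242, 0.099
print(probs.sum())          # 1.0, up to floating-point rounding
print(probs.argmax())       # 0: the highest score gets the highest probability
```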

## 6. COMPILE THE MODEL

- **Yesterday**: we compiled the model 'behind the scenes' using the function `compile_model()`
- **Today**: we compile the model <u>manually</u>, looking at the specific 'compilation ingredients'

When compiling the model we specify the **loss function** (here: [categorical_crossentropy](https://keras.io/api/losses/probabilistic_losses/#categoricalcrossentropy-class)), the **optimizer** (here: [Adadelta](https://keras.io/api/optimizers/adadelta/)) and the **metrics** to monitor (here: accuracy).

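```python
model.compile(loss=tf.keras.losses.categorical_crossentropy,
              # note: the default learning_rate here is 0.001; Adadelta often trains
              # faster with a higher value (1.0 matches the original paper)
              optimizer=tf.keras.optimizers.Adadelta(),
              metrics=['accuracy'])
```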
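
Take a single sample with one-hot label $y$ and predicted probabilities $\hat{y}$. Its categorical cross-entropy is $-\sum_{j} y_j \log \hat{y}_j$, which reduces to minus the log of the probability assigned to the true class. Below is a small `numpy` sketch with made-up numbers. Like Keras, it averages the loss over the samples in a batch. The clipping constant `eps` is an assumption added to avoid $\log 0$.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean categorical cross-entropy over a batch of one-hot labels."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)   # avoid log(0)
    per_sample = -np.sum(y_true * np.log(y_pred), axis=1)
    return per_sample.mean()

y_true = np.array([[0, 0, 1],
                   [0, 1, 0]], dtype=float)    # true classes: 2 and 1
confident = np.array([[0.1, 0.2, 0.7],
                      [0.1, 0.8, 0.1]])
unsure = np.array([[0.4, 0.3, 0.3],
                   [0.4, 0.3, 0.3]])

print(categorical_crossentropy(y_true, confident))  # ~0.290 = (-log 0.7 - log 0.8) / 2
print(categorical_crossentropy(y_true, unsure))     # ~1.204 = -log 0.3
```

Confident, correct predictions give a lower loss, which is what training drives towards.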

## 7. TRAIN THE MODEL

- **Yesterday**: we trained the model 'behind the scenes', using the function `train_model()`
- **Today**: we train the model <u>manually</u>, looking at the specific 'training ingredients'

We then fit the model on the training data, specifying:

- the batch size
- the number of epochs to train the model

```python
model.fit(X_train, y_train,
          batch_size=batch_size,
          epochs=num_epochs,
          verbose=1)
```
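
During training, each epoch visits every training sample once, in batches. By default `model.fit()` shuffles the training data before each epoch. The sketch below mimics that idea with `numpy` on a toy set of 10 sample indices and a batch size of 4, both chosen only for illustration. It is not Keras's actual implementation. The helper name `epoch_batches` is hypothetical.

```python
import numpy as np

def epoch_batches(n_samples, batch_size, rng):
    """Split a freshly shuffled list of sample indices into consecutive batches."""
    order = rng.permutation(n_samples)   # new random order every epoch
    return [order[i:i + batch_size] for i in range(0, n_samples, batch_size)]

rng = np.random.default_rng(0)   # local generator, independent of the global seeds
for epoch in range(2):
    batches = epoch_batches(10, 4, rng)
    print(f"epoch {epoch}: {[b.tolist() for b in batches]}")  # batch sizes 4, 4, 2
```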

## 8. TEST THE MODEL

- **Yesterday**: we tested the model 'behind the scenes', using the function `evaluate_model()`
- **Today**: we test the model <u>manually</u>, by calculating predictions and using metrics to measure model performance

#### Accuracy

We can now measure the performance (in terms of prediction accuracy) of the trained deep-learning model for image recognition.
To do so, we apply the trained model to independent test data that were not used during training.

```python
score = model.evaluate(X_test, y_test, verbose=0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])
```
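
Accuracy is simply the fraction of test images whose predicted class matches the true class. The predicted class is the one with the highest softmax probability. Below is a toy `numpy` sketch with made-up probabilities for four samples and three classes.

```python
import numpy as np

toy_probs = np.array([[0.7, 0.2, 0.1],    # predicted class 0
                      [0.1, 0.6, 0.3],    # predicted class 1
                      [0.3, 0.3, 0.4],    # predicted class 2
                      [0.5, 0.4, 0.1]])   # predicted class 0
toy_labels = np.array([0, 1, 1, 0])

predicted = toy_probs.argmax(axis=1)
accuracy = np.mean(predicted == toy_labels)
print(predicted)   # [0 1 2 0]
print(accuracy)    # 0.75: three of the four predictions are correct
```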

#### Confusion matrix

A [confusion matrix](https://en.wikipedia.org/wiki/Confusion_matrix) is another way to express the accuracy of your predictions. It is a square matrix with as many rows (and columns) as there are classes. Rows represent *true values* and columns represent *predicted values*. The main diagonal therefore holds the correct predictions, while off-diagonal elements represent errors.

We'll use the [confusion_matrix()](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.confusion_matrix.html) function from the [scikit-learn library](https://scikit-learn.org/stable/).

```python
#asking our model to return its predictions for the test set
predictions = model.predict(X_test)

#confusion_matrix function requires actual classes labels (expressed as int)
#and not probabilities as we handled so far
predicted_classes = predictions.argmax(axis=1)
true_classes = y_test.argmax(axis=1)

#rows are true values, columns are predicted values, numbering starts from zero
sklearn.metrics.confusion_matrix(true_classes, predicted_classes)
```
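
To see how the matrix is filled, here is a hand-rolled version in plain `numpy`. It is a sketch for illustration only: the notebook itself uses `sklearn.metrics.confusion_matrix`, and the helper name `confusion_matrix_np` is hypothetical. Each (true, predicted) pair adds one to the cell in row *true*, column *predicted*. The diagonal therefore counts the correct predictions, and the diagonal sum divided by the total is the accuracy.

```python
import numpy as np

def confusion_matrix_np(true_labels, predicted_labels, n_classes):
    """Rows = true classes, columns = predicted classes (same layout as scikit-learn)."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (true_labels, predicted_labels), 1)   # one count per sample
    return cm

toy_true = np.array([0, 1, 1, 2, 2, 2])
toy_pred = np.array([0, 1, 2, 2, 2, 0])

cm = confusion_matrix_np(toy_true, toy_pred, n_classes=3)
print(cm)
# [[1 0 0]
#  [0 1 1]
#  [1 0 2]]
print(np.trace(cm) / cm.sum())   # accuracy: 4 correct out of 6
```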