# Convolutional Neural Networks

In this notebook we will implement a convolutional neural network. Rather than doing everything from scratch we will make use of [TensorFlow 2](https://www.tensorflow.org/) and the [Keras](https://keras.io) high level interface.

## Installing TensorFlow and Keras

TensorFlow and Keras are not included with the base Anaconda install, but can be easily installed by running the following commands on the Anaconda Command Prompt/terminal window:
```
conda install notebook jupyterlab nb_conda_kernels
conda create -n tf tensorflow ipykernel mkl
```
Once this has been done, you should be able to select the `Python [conda env:tf]` kernel from the Kernel->Change Kernel menu item at the top of this notebook. Then we import the TensorFlow package:

In [1]:
import tensorflow as tf

## Creating a simple network with TensorFlow

We will start by creating a very simple fully connected feedforward network using TensorFlow/Keras. The network will mimic the one we implemented previously, but TensorFlow/Keras will take care of most of the details for us.

### MNIST Dataset

First, let us load the MNIST digits dataset that we will be using to train our network. This is available directly within Keras:

In [2]:
(x_train, y_train),(x_test, y_test) = tf.keras.datasets.mnist.load_data()

The data comes as a set of integers in the range [0,255] representing the shade of gray of a given pixel. Let's first rescale them to be in the range [0,1]:

In [3]:
x_train, x_test = x_train / 255.0, x_test / 255.0
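As a quick sanity check of the rescaling step, the following sketch applies the same division to a small synthetic array rather than the real dataset. The array `fake_images` and the random generator are assumptions made for illustration only; they stand in for a batch of MNIST images.

```python
import numpy as np

# Hypothetical stand-in for a batch of MNIST images:
# 4 images of 28x28 pixels stored as unsigned 8-bit integers.
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(4, 28, 28), dtype=np.uint8)

# True division promotes the uint8 data to float64, so no precision is lost.
scaled = fake_images / 255.0

print(fake_images.dtype, "->", scaled.dtype)
print("min:", scaled.min(), "max:", scaled.max())
```

Dividing by `255.0` changes the data type as well as the range, which is why the rescaled arrays take more memory than the original integer data.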

Now we can build a neural network model using Keras. This uses a very simple high-level modular structure where we only have to specify the layers in our model and the properties of each layer. The layers we will have are as follows:
1. Input layer: This will be a 28x28 matrix of numbers.
2. `Flatten` layer: Convert our 28x28 pixel image into an array of size 784.
3. `Dense` layer: a fully-connected layer of the type we have been using up to now. We will use 30 neurons and the sigmoid activation function.
4. `Dense` layer: a fully-connected output layer with 10 neurons (one per digit class) and the softmax activation function.

In [17]:
model = tf.keras.models.Sequential([
  tf.keras.layers.Flatten(input_shape=(28, 28)),
  tf.keras.layers.Dense(30, activation='sigmoid'),
  tf.keras.layers.Dense(10, activation='softmax')
])
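The layer sizes above determine how many trainable parameters the model has. The sketch below counts them by hand; the helper name `dense_params` is hypothetical and not part of Keras. The total should agree with what `model.summary()` reports for this model.

```python
def dense_params(n_inputs, n_units):
    """Weights plus one bias per unit for a fully connected layer (hypothetical helper)."""
    return n_inputs * n_units + n_units

hidden = dense_params(28 * 28, 30)  # 784 flattened pixels -> 30 neurons
output = dense_params(30, 10)       # 30 hidden neurons -> 10 classes
total = hidden + output

print(hidden, output, total)  # 23550 310 23860
```

The `Flatten` layer contributes no parameters; it only reshapes its input.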

Next we compile this model, specifying the optimization algorithm (ADAM) and loss function (cross-entropy) to be used.

In [18]:
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
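To make the loss concrete, here is a minimal NumPy sketch of what sparse categorical cross-entropy computes: the mean of the negative log-probability that the network assigns to the true class. The function below is an illustration written for this notebook, not the Keras implementation (which, for example, also clips probabilities to avoid taking the log of zero), and the probabilities are made up.

```python
import numpy as np

def sparse_categorical_crossentropy(labels, probs):
    """Mean of -log(probability assigned to the true class). Illustrative only."""
    labels = np.asarray(labels)
    probs = np.asarray(probs, dtype=float)
    # Pick, for each sample, the predicted probability of its true class.
    picked = probs[np.arange(len(labels)), labels]
    return float(np.mean(-np.log(picked)))

# Two made-up samples with three classes each; rows sum to 1 like softmax output.
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1]])
labels = [0, 1]  # integer class labels, as in y_train

loss = sparse_categorical_crossentropy(labels, probs)
print(loss)
```

The "sparse" variant is used here because `y_train` holds integer labels rather than one-hot vectors.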

We now train the model with our training data. We will run for 5 epochs.

In [19]:
model.fit(x_train, y_train, epochs=5)

Train on 60000 samples
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<tensorflow.python.keras.callbacks.History at 0x7f92784a4210>

Finally, we check the accuracy of our model against the test data:

In [22]:
model.evaluate(x_test, y_test, verbose=False)

[0.15718528725653888, 0.9548]

The first number is the loss and the second is the accuracy. The model has 95.5% accuracy on the test data, consistent with what was found during training.
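The accuracy metric is simply the fraction of samples for which the most probable class matches the label. A minimal NumPy sketch, using made-up probabilities and a hypothetical helper named `accuracy`:

```python
import numpy as np

def accuracy(probs, labels):
    """Fraction of samples whose highest-probability class equals the label."""
    predicted = np.argmax(probs, axis=1)
    return float(np.mean(predicted == np.asarray(labels)))

# Four made-up predictions over three classes.
probs = np.array([[0.90, 0.05, 0.05],
                  [0.20, 0.50, 0.30],
                  [0.60, 0.30, 0.10],
                  [0.10, 0.10, 0.80]])
labels = [0, 1, 1, 2]

acc = accuracy(probs, labels)
print(acc)  # 0.75: the third sample is predicted as class 0 but labelled 1
```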

#### Exercises
Experiment with this network:
1. Change the number of neurons in the hidden layer.
2. Add more hidden layers.
3. Change the activation function in the hidden layer to `relu` (for examples see the list of [Keras Layer Activation Functions](https://keras.io/api/layers/activations/)).
4. Change the activation in the output layer to something other than `softmax`.
5. Change the loss function (for examples see the list of [Keras Loss Functions](https://keras.io/api/losses/)).
How does the performance of your network change with these modifications?

#### Task
Implement the neural network in "[Gradient-based learning applied to document recognition](http://yann.lecun.com/exdb/publis/pdf/lecun-98.pdf)", by Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner. The [Keras Layer documentation](https://keras.io/api/layers/) includes information about the layers supported. In particular, [`Conv2D`](https://keras.io/api/layers/convolution_layers/convolution2d) and [`MaxPooling2D`](https://keras.io/api/layers/pooling_layers/max_pooling2d) layers may be useful.

##### Solution

We first need to reshape the input data to make the images 28 x 28 x 1 rather than 28 x 28. This is because convolution layers expect an explicit channel dimension: more generally we might have 28 x 28 x 3 to account for the three colour channels (red, green, blue) in an image, but here we have only one grayscale channel.

In [8]:
import numpy as np

In [24]:
X_train = x_train[..., np.newaxis]
X_test = x_test[..., np.newaxis]
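The effect of indexing with `np.newaxis` is easiest to see on the array shapes. This sketch uses a small array of zeros (an assumption, standing in for the image data) and also shows the equivalent `np.expand_dims` call:

```python
import numpy as np

# Hypothetical batch of 5 grayscale images, 28x28 pixels each.
batch = np.zeros((5, 28, 28))

# Append a trailing axis of length 1 to act as the channel dimension.
with_channel = batch[..., np.newaxis]
same_thing = np.expand_dims(batch, axis=-1)

print(batch.shape, "->", with_channel.shape)  # (5, 28, 28) -> (5, 28, 28, 1)
```

No data is copied or changed; only the shape differs.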

We also convert the integer labels to one-hot (categorical) vectors of length 10:

In [None]:
Y_train = tf.keras.utils.to_categorical(y_train, 10)
Y_test = tf.keras.utils.to_categorical(y_test, 10)
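For intuition, the same one-hot encoding can be sketched in plain NumPy by indexing the rows of an identity matrix. This is an illustration of the result, not how `to_categorical` is implemented, and the example labels are made up.

```python
import numpy as np

labels = np.array([3, 0, 9])  # made-up digit labels

# Row k of the 10x10 identity matrix is the one-hot vector for class k.
one_hot = np.eye(10)[labels]

print(one_hot.shape)  # (3, 10)
print(one_hot[0])     # 1.0 at index 3, zeros elsewhere
```

Taking `np.argmax(one_hot, axis=1)` recovers the original integer labels.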

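Now we construct our network with three convolution layers, two pooling layers and fully-connected layers at the end. Note that this version departs slightly from the paper: it uses ReLU activations and max pooling, and applies `same` padding to the 28 x 28 input in the first layer instead of using 32 x 32 input images.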

In [25]:
model = tf.keras.models.Sequential([
  tf.keras.layers.Conv2D(6, (5, 5), activation='relu', padding='same', input_shape=(28, 28, 1)),
  tf.keras.layers.MaxPooling2D((2, 2)),
  tf.keras.layers.Conv2D(16, (5, 5), activation='relu'),
  tf.keras.layers.MaxPooling2D((2, 2)),
  tf.keras.layers.Conv2D(120, (5, 5), activation='relu'),
  tf.keras.layers.Flatten(),
  tf.keras.layers.Dense(84, activation='relu'),
  tf.keras.layers.Dense(10, activation='softmax')])
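It can help to trace how the spatial size of the image shrinks through this network, and to count the trainable parameters. The sketch below does both by hand, assuming stride-1 convolutions and non-overlapping 2 x 2 pooling (the Keras defaults used above). The helper names are hypothetical.

```python
def conv_out(size, kernel, padding):
    """Spatial size after a stride-1 convolution."""
    return size if padding == "same" else size - kernel + 1

def conv_params(kernel, in_channels, filters):
    """One kernel x kernel x in_channels weight block plus a bias per filter."""
    return kernel * kernel * in_channels * filters + filters

def dense_params(n_inputs, n_units):
    return n_inputs * n_units + n_units

size = 28
size = conv_out(size, 5, "same")   # 28 (padding keeps the size)
size = size // 2                   # 14 after 2x2 max pooling
size = conv_out(size, 5, "valid")  # 10
size = size // 2                   # 5 after 2x2 max pooling
size = conv_out(size, 5, "valid")  # 1
flat = size * size * 120           # 120 values enter the Dense layers

total = (conv_params(5, 1, 6) + conv_params(5, 6, 16)
         + conv_params(5, 16, 120)
         + dense_params(flat, 84) + dense_params(84, 10))

print(size, flat, total)  # 1 120 61706
```

The third convolution reduces each feature map to a single value, so it behaves like a fully-connected layer acting on the 5 x 5 x 16 output of the second pooling layer.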

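Next, we compile the model, specifying cross-entropy loss and ADAM optimisation. Because the labels are now one-hot vectors, we use `categorical_crossentropy` rather than `sparse_categorical_crossentropy`.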

In [26]:
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
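The two cross-entropy variants differ only in the label format they expect. The following NumPy sketch, with made-up probabilities, checks that the one-hot form and the integer-label form give the same value. It illustrates the formulas and is not the Keras code.

```python
import numpy as np

# Made-up softmax outputs for two samples and three classes.
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1]])
int_labels = np.array([0, 1])
one_hot = np.eye(3)[int_labels]

# Categorical form: sum over classes of -y * log(p), averaged over samples.
categorical = np.mean(-np.sum(one_hot * np.log(probs), axis=1))

# Sparse form: -log(p) of the true class only, averaged over samples.
sparse = np.mean(-np.log(probs[np.arange(2), int_labels]))

print(categorical, sparse)
```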

Now we train the model for 20 epochs, using a batch size of 128:

In [27]:
model.fit(X_train, Y_train, batch_size=128, epochs=20)

Train on 60000 samples
Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


<tensorflow.python.keras.callbacks.History at 0x7f920876c710>
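The `batch_size` argument sets how many weight updates happen in each epoch. As a rough sketch, with 60000 training samples the `fit` call above performs the number of steps computed below; the second figure assumes the Keras default batch size of 32, which applies when `batch_size` is not given.

```python
import math

n_train = 60000

steps_batch_128 = math.ceil(n_train / 128)  # batch size used for the CNN
steps_default = math.ceil(n_train / 32)     # Keras default batch size

print(steps_batch_128, steps_default)  # 469 1875
```

Larger batches mean fewer but less noisy updates per epoch.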

We have achieved 99.6% accuracy on the training data after training for 20 epochs. Let's check this against the test data:

In [30]:
model.evaluate(X_test, Y_test, verbose=False)

[0.053945613601373774, 0.9853]

The test accuracy is 98.5%, about one percentage point below the training accuracy, so the model may have slightly overfitted the training data, but it is still highly accurate.
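To put the two test accuracies in perspective, they can be converted into numbers of misclassified images. This sketch assumes the standard MNIST test set of 10000 images and uses the accuracies reported by `model.evaluate` above.

```python
n_test = 10000  # size of the standard MNIST test set

errors_dense = round((1 - 0.9548) * n_test)  # fully connected network
errors_cnn = round((1 - 0.9853) * n_test)    # convolutional network

print(errors_dense, errors_cnn)  # 452 147
```

The convolutional network makes roughly a third as many mistakes on the test set as the fully connected one.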