### Convolutional Neural Networks

Convolutional neural networks generally perform better than regular (fully connected) neural networks on image data, so we can add convolution and pooling layers to the original neural network we built and trained on the CIFAR-10 dataset. Remember that the regular neural network achieved x% accuracy.

## Import the libraries and preprocess the data (as before)

In [1]:
import tensorflow as tf
from tensorflow import keras

# Helper libraries
import numpy as np
import matplotlib.pyplot as plt

# loading dataset
cifar10 = keras.datasets.cifar10

# loading train and test set
(train_images, train_labels), (test_images, test_labels) = cifar10.load_data()

# flatten labels from shape (N, 1) to shape (N,)
train_labels = train_labels.reshape(train_labels.shape[0])
test_labels = test_labels.reshape(test_labels.shape[0])

# declare class names
class_names = ['airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']

# scale pixel values from the range [0, 255] to [0, 1]
train_images = train_images / 255.0
test_images = test_images / 255.0

## Build the CNN model

Building the convolutional neural network requires configuring the layers of the model and then compiling it, just like the first simple neural network we built.

### Set up the layers

These are the layers of the first model:



In [2]:
# model = keras.Sequential([
#         keras.layers.Flatten(input_shape=(32, 32, 3)),
#         keras.layers.Dense(128, activation='relu'),
#         keras.layers.Dense(10, activation='softmax')
#         ])

We will add convolution and pooling layers in front of this simple architecture to turn it into a convolutional neural network.

Following the VGG16 architecture (an overview can be found here: https://neurohive.io/en/popular-networks/vgg16/), we will build a VGG-style network consisting of 3 blocks.

Each of the 3 blocks will consist of 2 convolutional layers and a max pooling layer.

Remember:

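* *Convolution*: a small matrix of weights, the filter or kernel, slides over the image; at each position the output is the dot product of the kernel and the patch of pixels beneath it.
* *Max Pooling*: pooling downsamples the outputs of a convolution to reduce their size and therefore speed up computation. Max pooling does this by keeping only the maximum value in each window (e.g. 2x2) of the feature map.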
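
To make the two operations above concrete, here is a minimal NumPy sketch for a single-channel image and a single 3x3 kernel, using stride 1 and no padding. Like Keras' `Conv2D`, it does not flip the kernel (strictly speaking a cross-correlation). The helpers `conv2d_single` and `max_pool2d` are hypothetical names for illustration only; the real Keras layers also handle multiple channels, many filters, biases and padding. The models below use `padding='same'`, which adds a border of zeros so that the output keeps the input's height and width.

```python
import numpy as np

def conv2d_single(image, kernel):
    """Convolve one 2D channel with one kernel: stride 1, no padding ('valid').

    Like Keras' Conv2D, the kernel is not flipped, so strictly speaking this is
    a cross-correlation.
    """
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i:i + kh, j:j + kw]
            # dot product of the kernel and the patch of pixels beneath it
            out[i, j] = np.sum(patch * kernel)
    return out

def max_pool2d(feature_map, size=2):
    """Non-overlapping size x size max pooling; leftover rows/columns are dropped."""
    h = feature_map.shape[0] // size
    w = feature_map.shape[1] // size
    trimmed = feature_map[:h * size, :w * size]
    # split into size x size windows and keep the maximum of each window
    return trimmed.reshape(h, size, w, size).max(axis=(1, 3))

image = np.arange(36, dtype=float).reshape(6, 6)  # toy 6x6 single-channel "image"
kernel = np.array([[1.0, 0.0, -1.0],
                   [1.0, 0.0, -1.0],
                   [1.0, 0.0, -1.0]])             # simple vertical-edge filter

features = conv2d_single(image, kernel)
print(features.shape)   # (4, 4): without padding a 3x3 kernel loses one pixel on each side
pooled = max_pool2d(np.arange(16, dtype=float).reshape(4, 4))
print(pooled.shape)     # (2, 2): the maxima of the four 2x2 windows are 5, 7, 13 and 15
```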

## CNN Model 1

In [None]:
model = keras.Sequential([
    # block 1
    keras.layers.Conv2D(32, (3, 3), activation='relu', padding='same', input_shape=(32, 32, 3)),
    keras.layers.Conv2D(32, (3, 3), activation='relu', padding='same'),
    keras.layers.MaxPooling2D((2, 2)),
    
    # block 2
    keras.layers.Conv2D(64, (3, 3), activation='relu', padding='same'),
    keras.layers.Conv2D(64, (3, 3), activation='relu', padding='same'),
    keras.layers.MaxPooling2D((2, 2)),
    
    # block 3
    keras.layers.Conv2D(128, (3, 3), activation='relu', padding='same'),
    keras.layers.Conv2D(128, (3, 3), activation='relu', padding='same'),
    keras.layers.MaxPooling2D((2, 2)),
    
    # fully connected layers from the original model
    # (Flatten needs no input_shape here: it receives the output of block 3)
    keras.layers.Flatten(),
    keras.layers.Dense(128, activation='relu'),
    keras.layers.Dense(10, activation='softmax')
])

In [None]:
# compile the model
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

In [None]:
# model summary
model.summary()
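
`model.summary()` lists each layer's output shape and number of parameters. As a sanity check, these numbers can be reproduced by hand: a `Conv2D` layer with `k x k` kernels has `(k*k*in_channels + 1) * filters` parameters (the `+1` is the bias), `padding='same'` keeps the height and width, each 2x2 `MaxPooling2D` halves them and has no parameters, and a `Dense` layer has `(inputs + 1) * units` parameters. The plain-Python sketch below applies these rules to CNN Model 1; `summarise_vgg` is a hypothetical helper, not part of Keras, and its totals are what the formulas give, to be compared with your own `model.summary()` output.

```python
def summarise_vgg(input_hw=32, in_channels=3, block_filters=(32, 64, 128),
                  convs_per_block=2, kernel=3, dense_units=(128, 10)):
    """Print output shapes and parameter counts of a VGG-style Sequential model.

    Assumes every convolution uses padding='same' and a kernel x kernel filter,
    each block ends with a 2x2 max pooling layer, and the blocks are followed
    by Flatten and Dense layers, as in CNN Model 1.
    """
    hw, channels, total = input_hw, in_channels, 0
    for filters in block_filters:
        for _ in range(convs_per_block):
            # one kernel x kernel x channels weight tensor per filter, plus one bias per filter
            params = (kernel * kernel * channels + 1) * filters
            channels = filters
            total += params
            print(f"Conv2D        ({hw}, {hw}, {channels})  params={params}")
        hw //= 2  # 2x2 max pooling halves the height and width, no parameters
        print(f"MaxPooling2D  ({hw}, {hw}, {channels})  params=0")
    units_in = hw * hw * channels
    print(f"Flatten       ({units_in},)  params=0")
    for units in dense_units:
        params = (units_in + 1) * units  # weights plus one bias per unit
        total += params
        print(f"Dense         ({units},)  params={params}")
        units_in = units
    print(f"Total params: {total}")
    return total

summarise_vgg()  # last line printed: Total params: 550570
```

Dropout layers have no parameters, so adding them does not change these counts. Passing e.g. `block_filters=(32, 64, 128, 256)` shows how the parameter count changes when a fourth block is added.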

In [None]:
# train the model
model.fit(train_images, train_labels, epochs=10)

In [None]:
# evaluate the model on the test dataset
test_loss, test_acc = model.evaluate(test_images, test_labels, verbose=2)

print('\nTest accuracy:', test_acc)

Both the train and test accuracy are good. However, as with the regular neural network, there is a large gap of ~15% between the train accuracy and the test accuracy. This gap indicates that the model has overfit the training set and does not generalise well to new data.

## Avoid overfitting with dropout regularisation

Since there is a clear indication of overfitting, let's try a technique that helps to avoid it. Dropout is a regularisation method where, during training, a randomly chosen fraction of a layer's outputs is ignored, i.e. dropped out. This forces the nodes within a layer to probabilistically take on more or less responsibility for the inputs, so the network cannot rely too heavily on any single node.

More can be read on dropout here: http://jmlr.org/papers/volume15/srivastava14a/srivastava14a.pdf
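
Here is a minimal NumPy sketch of what a dropout layer does. Keras' `Dropout` layer uses so-called inverted dropout: during training each input is set to zero with probability `rate`, and the surviving values are scaled by `1 / (1 - rate)` so that the expected sum stays the same; at inference time the layer passes its input through unchanged. The function `dropout` below is a hypothetical illustration of that behaviour, not Keras' actual implementation.

```python
import numpy as np

def dropout(x, rate, training, rng):
    """Inverted dropout: zero each element with probability `rate` while training."""
    if not training or rate == 0.0:
        return x  # at inference time the input passes through unchanged
    keep_mask = rng.random(x.shape) >= rate  # True with probability 1 - rate
    # scale the kept values so that each output's expected value equals its input
    return np.where(keep_mask, x / (1.0 - rate), 0.0)

rng = np.random.default_rng(0)
activations = np.ones(10_000)

dropped = dropout(activations, rate=0.25, training=True, rng=rng)
print((dropped == 0).mean())  # fraction of outputs zeroed, close to 0.25
print(dropped.mean())         # close to 1.0 thanks to the 1 / (1 - rate) scaling
print(np.array_equal(dropout(activations, 0.25, training=False, rng=rng), activations))  # True
```

Because Keras switches this behaviour off automatically in `model.evaluate` and `model.predict`, the dropout layers below only affect training.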

## CNN Model 2

In [None]:
model = keras.Sequential([
    # block 1
    keras.layers.Conv2D(32, (3, 3), activation='relu', padding='same', input_shape=(32, 32, 3)),
    keras.layers.Conv2D(32, (3, 3), activation='relu', padding='same'),
    keras.layers.MaxPooling2D((2, 2)),
    
    # dropout: during training, each output is set to zero with probability 0.25
    keras.layers.Dropout(0.25),
    
    # block 2
    keras.layers.Conv2D(64, (3, 3), activation='relu', padding='same'),
    keras.layers.Conv2D(64, (3, 3), activation='relu', padding='same'),
    keras.layers.MaxPooling2D((2, 2)),
    
    # dropout: during training, each output is set to zero with probability 0.25
    keras.layers.Dropout(0.25),
    
    # block 3
    keras.layers.Conv2D(128, (3, 3), activation='relu', padding='same'),
    keras.layers.Conv2D(128, (3, 3), activation='relu', padding='same'),
    keras.layers.MaxPooling2D((2, 2)),
    
    # dropout: during training, each output is set to zero with probability 0.4
    keras.layers.Dropout(0.4),
    
    # fully connected layers from the original model
    # (Flatten needs no input_shape here: it receives the output of block 3)
    keras.layers.Flatten(),
    keras.layers.Dense(128, activation='relu'),
    keras.layers.Dense(10, activation='softmax')
])

In [None]:
# compile the model
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

In [None]:
# model summary
model.summary()

In [None]:
# train model
model.fit(train_images, train_labels, epochs=10)

In [None]:
# evaluate the model on the test dataset
test_loss, test_acc = model.evaluate(test_images, test_labels, verbose=2)

print('\nTest accuracy:', test_acc)

Although the train accuracy of the model with dropout layers is lower than before, its test accuracy is higher, at 78%. This means that the model with dropout layers generalises better to unseen data. Now it's time to build a model yourself.


## Exercise

## Part 1 

Add more CNN blocks to Model 1 and see what impact this has on the model's performance. Feel free to experiment with the number of training epochs too.
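
Before adding blocks, it helps to check how much spatial resolution is left. With 32x32 inputs, `padding='same'` convolutions and a 2x2 max pooling at the end of every block, each block halves the height and width (32 to 16 to 8 to 4 after the three existing blocks). The plain-Python sketch below (`spatial_sizes` is a hypothetical helper) shows that only two more blocks of this form fit before the feature maps shrink to 1x1; after that, a 2x2 pooling window no longer fits, so a sixth block would need a different design (for example, no pooling layer).

```python
def spatial_sizes(input_size=32, num_blocks=5, pool=2):
    """Height/width of the feature maps after each VGG-style block.

    Assumes 'same'-padded convolutions (size unchanged) followed by one
    pool x pool max pooling layer with stride `pool` and 'valid' padding.
    """
    sizes = []
    size = input_size
    for block in range(1, num_blocks + 1):
        if size < pool:
            raise ValueError(
                f"block {block}: a {pool}x{pool} pooling window does not fit "
                f"a {size}x{size} feature map"
            )
        size //= pool  # max pooling divides height and width by the pool size
        sizes.append(size)
    return sizes

print(spatial_sizes(num_blocks=3))  # [16, 8, 4]  (the current model)
print(spatial_sizes(num_blocks=5))  # [16, 8, 4, 2, 1]
```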

In [None]:
model = keras.Sequential([
    # block 1
    keras.layers.Conv2D(32, (3, 3), activation='relu', padding='same', input_shape=(32, 32, 3)),
    keras.layers.Conv2D(32, (3, 3), activation='relu', padding='same'),
    keras.layers.MaxPooling2D((2, 2)),
    
    # block 2
    keras.layers.Conv2D(64, (3, 3), activation='relu', padding='same'),
    keras.layers.Conv2D(64, (3, 3), activation='relu', padding='same'),
    keras.layers.MaxPooling2D((2, 2)),
    
    # block 3
    keras.layers.Conv2D(128, (3, 3), activation='relu', padding='same'),
    keras.layers.Conv2D(128, (3, 3), activation='relu', padding='same'),
    keras.layers.MaxPooling2D((2, 2)),
    
    ###
    # add more blocks here
    ###
    
    # fully connected layers from the original model
    # (Flatten needs no input_shape here: it receives the output of the last block)
    keras.layers.Flatten(),
    keras.layers.Dense(128, activation='relu'),
    keras.layers.Dense(10, activation='softmax')
])

## Part 2

Take the model you built in Part 1 and add some dropout layers to see whether this has an impact on overfitting and accuracy. Feel free to experiment with different dropout rates.

## Part 3

If time permits, add some of the data augmentation techniques you learned in the Keras notebook and see whether they improve the performance of the model.
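
One common augmentation scheme for CIFAR-10 is random horizontal flips plus small random shifts (pad with zeros, then crop back to 32x32). The NumPy sketch below shows the idea for a batch of images shaped `(N, 32, 32, 3)`; `augment_batch` and the 4-pixel shift are hypothetical choices for illustration and are not taken from the Keras notebook. In practice you would draw fresh random augmentations every epoch, for example with Keras' own preprocessing layers such as `keras.layers.RandomFlip` (availability depends on your TensorFlow version) or `ImageDataGenerator`.

```python
import numpy as np

def augment_batch(images, rng, max_shift=4):
    """Randomly flip each image left-right and shift it by up to `max_shift` pixels.

    `images` has shape (N, H, W, C); the output has the same shape.
    """
    n, h, w, _ = images.shape
    # pad height and width with zeros so that shifted crops stay in bounds
    padded = np.pad(images, ((0, 0), (max_shift, max_shift), (max_shift, max_shift), (0, 0)))
    out = np.empty_like(images)
    for i in range(n):
        # a random crop of the padded image is a random translation of the original
        top = rng.integers(0, 2 * max_shift + 1)
        left = rng.integers(0, 2 * max_shift + 1)
        crop = padded[i, top:top + h, left:left + w]
        # flip horizontally (along the width axis) with probability 0.5
        if rng.random() < 0.5:
            crop = crop[:, ::-1]
        out[i] = crop
    return out

rng = np.random.default_rng(42)
batch = rng.random((8, 32, 32, 3))  # stand-in for a batch of scaled CIFAR-10 images
augmented = augment_batch(batch, rng)
print(augmented.shape)              # (8, 32, 32, 3)
```

With the data from this notebook, you could call e.g. `augment_batch(train_images, rng)` once per epoch and train on the result, keeping `train_labels` unchanged since flips and shifts do not change the class.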