# Hyperparameter Optimisation using Grid Search

In this repo, we use the `hyperas` package to perform the hyperparameter search. We will use it to try to find good hyperparameters for an FFNN-based model and a CNN-based model that classify MNIST digits.

We will use the following packages:
- Keras
- Numpy
- hyperas (a Keras wrapper around hyperopt, which we also import from directly)

### Import Packages

First we will install and import the packages.

In [1]:
!pip install hyperas

from keras.layers import Conv2D, Dense, Flatten, MaxPooling2D, Dropout
from keras.datasets import mnist
from keras.models import Sequential
import keras
from keras.callbacks import EarlyStopping, ModelCheckpoint

import numpy as np
from hyperopt import Trials, STATUS_OK, tpe
from hyperas import optim
from hyperas.distributions import choice, uniform

import os



Using TensorFlow backend.


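This is boilerplate code to help hyperas run on Google Colab. Hyperas reads the notebook source by name (see `notebook_name` in the training cells below), so we download a copy of `Grid_Search.ipynb` from Google Drive into the working directory.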

In [0]:
!pip install -U -q PyDrive
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials

# Authenticate and create the PyDrive client.
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)

fid = drive.ListFile({'q':"title='Grid_Search.ipynb'"}).GetList()[0]['id']
f = drive.CreateFile({'id': fid})
f.GetContentFile('Grid_Search.ipynb')

## Load Data

#### For FFNN
Here we define a function that loads our data, preprocesses it and returns it in the form our model expects.

We perform the following preprocessing steps:
- We convert the images to float32
- We normalise them to the range [0, 1] by dividing by 255
- We reshape each image to a vector of shape (784,) to feed it into a Dense layer
- We one-hot encode the labels
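
The four steps above can be tried on a tiny synthetic batch without downloading MNIST. This is a minimal sketch using only NumPy: the random `images` and the `labels` array are made-up stand-ins for the real data, and `np.eye(10)[labels]` is used here in place of `keras.utils.to_categorical`.

```python
import numpy as np

# Two fake 28x28 "images" with uint8 pixel values, plus their labels.
# These stand in for mnist.load_data() so the sketch needs no download.
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(2, 28, 28), dtype=np.uint8)
labels = np.array([3, 7])

# Steps 1 and 2: convert to float32 and scale the pixels to [0, 1].
x = images.astype('float32') / 255

# Step 3: flatten each 28x28 image into a vector of 784 values.
x = x.reshape(x.shape[0], 784)

# Step 4: one-hot encode. Row i of the identity matrix encodes class i.
y = np.eye(10, dtype='float32')[labels]

print(x.shape, x.dtype)   # shape and dtype of the flattened images
print(y[0])               # one-hot vector for the label 3
```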

In [0]:
def data_ffnn():
    '''
    Function to preprocess mnist data so that it can be fed into a FFNN
    Returns:
        x_train: Training Images
        y_train: Training Labels
        x_test: Testing Images
        y_test: Testing Labels
    '''
    (x_train, y_train), (x_test, y_test) = mnist.load_data()    #Loading the data

    #Converting it to Float32
    x_train = x_train.astype('float32')
    x_test = x_test.astype('float32')
    
    #Normalising it by dividing with 255
    x_train /= 255
    x_test /= 255
    
    # One-Hot Encode the Labels
    y_train = keras.utils.to_categorical(y_train, 10)
    y_test = keras.utils.to_categorical(y_test, 10)
    
    #Reshaping the Images
    x_train = x_train.reshape(x_train.shape[0], 784)
    x_test = x_test.reshape(x_test.shape[0], 784)
    
    return x_train, y_train, x_test, y_test


#### For CNN
Here we perform similar preprocessing to train the CNN.
The only difference is that we reshape each image to shape (28, 28, 1), which is the channels-last format that TensorFlow uses.
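
As a quick check of what the reshape does, the sketch below (NumPy only, with a made-up `batch` array instead of real MNIST images) shows that the channels-last reshape only adds a trailing axis of length 1 and leaves the pixel values and their order untouched.

```python
import numpy as np

# A made-up batch of two 28x28 "images" with distinct values per pixel.
batch = np.arange(2 * 28 * 28, dtype='float32').reshape(2, 28, 28)

# Channels-last layout for the CNN: (samples, height, width, channels).
x_cnn = batch.reshape(batch.shape[0], 28, 28, 1)

# Flat layout for the FFNN: (samples, 784).
x_ffnn = batch.reshape(batch.shape[0], 784)

print(x_cnn.shape)
# Flattening the CNN input gives exactly the FFNN input.
same_pixels = np.array_equal(x_cnn.reshape(2, 784), x_ffnn)
print(same_pixels)
```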

In [0]:
def data_cnn():
    '''
    Function to preprocess mnist data so that it can be fed into a CNN
    Returns:
        x_train: Training Images
        y_train: Training Labels
        x_test: Testing Images
        y_test: Testing Labels
    '''
    (x_train, y_train), (x_test, y_test) = mnist.load_data()  #Loading the data
    
    #Converting it to Float32
    x_train = x_train.astype('float32')
    x_test = x_test.astype('float32')
    
    #Normalising it by dividing it with 255
    x_train /= 255
    x_test /= 255
    
    #One-Hot Encode the labels
    y_train = keras.utils.to_categorical(y_train, 10)
    y_test = keras.utils.to_categorical(y_test, 10)
    
    #Reshaping the data to feed it to the CNN
    x_train = x_train.reshape(x_train.shape[0], 28, 28, 1)
    x_test = x_test.reshape(x_test.shape[0], 28, 28, 1)
    
    return x_train, y_train, x_test, y_test


## Creating the Models

#### Feed Forward Neural Network
In Hyperas, the backend searches for the `{{}}` tags and replaces what is inside them with a different value in each run.
When using `choice`, it picks one of the given options.
When using `uniform`, it samples a continuous value uniformly from the given range.

We keep the first layer parameters as a choice between 256, 512 and 1024 neurons. The `for` loop is used to change the number of hidden layers (choice of 2, 4 and 6). Inside the loop, we also change the number of neurons in each layer, whether dropout is used or not, and if used, the dropout rate. Note that Hyperas treats each `{{}}` tag as a single hyperparameter, so the values chosen inside the loop should be the same for every iteration within one run. Finally, we have an output layer.
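
To make the difference between `choice` and `uniform` concrete, here is a minimal sketch that draws a few configurations with Python's standard `random` module. It is not how Hyperas works internally (the search in this notebook is driven by `tpe.suggest`); `sample_choice`, `sample_uniform` and the dictionary keys are hypothetical names used only for illustration.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable

def sample_choice(options):
    """Pick one value from a discrete list of options."""
    return rng.choice(options)

def sample_uniform(low, high):
    """Draw a continuous value uniformly from [low, high]."""
    return rng.uniform(low, high)

configs = []
for run in range(3):
    config = {
        'first_layer_units': sample_choice([256, 512, 1024]),
        'hidden_layers': sample_choice([2, 4, 6]),
        'use_dropout': sample_choice([0, 1]),
        'dropout_rate': sample_uniform(0.1, 0.5),
    }
    configs.append(config)
    print(run, config)
```

Each run gets one value per hyperparameter: discrete ones come from the listed options, while the dropout rate can be any real number in the range.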

In [0]:
def create_ffnn_model(x_train, y_train, x_test, y_test):
    '''
    Function to create and compile the FFNN
    Args:
        x_train: Training Images
        y_train: Training Labels
        x_test: Testing Images
        y_test: Testing Labels
    Returns:
        Dictionary with loss, status and trained model
    '''
    model = Sequential()
    
    #Using choice to choose one of 256, 512 and 1024 neurons for each run.
    model.add(Dense({{choice([256, 512, 1024])}}, activation='relu', input_shape=(784,), name='input'))
    #Choosing the number of hidden layers.
    for _ in range({{choice([2, 4, 6])}}):
        model.add(Dense({{choice([32, 64, 128])}}, activation='relu'))
        #Choosing to apply or not apply dropout
        if {{choice([0, 1])}}  == 1:
            model.add(Dropout({{uniform(0.1, 0.5)}}))
    model.add(Dense(10, activation='softmax', name='output'))
    
    # We use categorical crossentropy as our loss and Adam as our optimiser
    model.compile(loss=keras.losses.categorical_crossentropy,
              optimizer=keras.optimizers.Adam(),
              metrics=['accuracy'])
    
    #Creating the path to save the models to
    if not os.path.exists(f'models/ffnn'):
        os.makedirs(f'models/ffnn')

    # We use EarlyStopping to stop training if the validation loss does not improve
    # for two consecutive epochs. ModelCheckpoint saves the best model seen so far.
    # Both callbacks monitor the validation loss.
    callbacks = [EarlyStopping(monitor='val_loss', patience=2),
             ModelCheckpoint(filepath=f'models/ffnn/best.hdf5', monitor='val_loss', save_best_only=True)]

    # Train the model for a large number of epochs, so that training stops due to the callbacks
    # and not because it ran out of epochs. We use a validation split of 20% of the training data.
    # We also give a choice of 16, 32 or 64 for the batch size.
    model.fit(x_train, y_train, batch_size={{choice([16, 32, 64])}}, epochs=100, verbose=0, callbacks=callbacks, validation_split=0.2)
    
    # Evaluate the model on the test dataset.
    score = model.evaluate(x_test, y_test, verbose=0)
    
    #Save this run's model, named after its test accuracy
    model.save(f'models/ffnn/{score[1]}.hdf5')
    
    #Boilerplate Hyperas code to give it info on the model performance
    return {'loss': -score[1], 'status': STATUS_OK, 'model': model}
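
The early-stopping rule used in the function above (monitor the validation loss, `patience=2`) can be illustrated without Keras. The following is a simplified sketch of the patience idea, not Keras's actual implementation; `epochs_before_stop` is a hypothetical helper and the loss values are made up.

```python
def epochs_before_stop(val_losses, patience=2):
    """Return (epochs run, best loss) under a simple patience rule.

    Training stops once the validation loss has failed to improve on
    the best value seen so far for `patience` epochs in a row.
    """
    best = float('inf')
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss      # new best: reset the counter
            wait = 0
        else:
            wait += 1        # no improvement this epoch
            if wait >= patience:
                return epoch, best
    return len(val_losses), best

# Made-up validation losses: the best value is reached in epoch 2,
# then two epochs pass without improvement, so training stops at epoch 4.
stopped = epochs_before_stop([0.50, 0.40, 0.41, 0.42, 0.30])
print(stopped)  # (4, 0.4)
```

Note that the last value (0.30) is never reached, which is the trade-off of a small patience: training is cheap, but a late improvement can be missed.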

#### CNN Model

We give the model the following hyperparameter search space:
- A first convolutional layer followed by 2 or 4 further convolutional layers
- 8, 12 or 16 filters in the first layer and 2 or 4 filters in the subsequent layers
- Kernel size of 3, 5 or 7 in the first layer and 3 or 5 in the subsequent layers
- Whether to use Max Pooling after the subsequent conv layers or not
- Whether to use dropout after the subsequent conv layers or not and, if we do, a rate between 0.1 and 0.5
- 2, 4 or 6 FC layers
- 128, 512 or 1024 neurons in each FC layer
- Whether to use dropout after the FC layers or not and, if we do, a rate between 0.1 and 0.5
- Batch size of 16, 32 or 64
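
Not every combination in this space necessarily produces a valid network, because unpadded convolutions and pooling shrink the 28x28 image. The sketch below works through the arithmetic in plain Python. It assumes stride-1 convolutions without padding and 2x2 pooling (the Keras defaults for the layers used here), and it assumes one kernel size and one pooling decision shared by all subsequent conv layers in a run; `final_feature_map_size` is a hypothetical helper, not part of Keras or Hyperas.

```python
from itertools import product

def conv_output_size(size, kernel):
    """Width after a stride-1 convolution without padding."""
    return size - kernel + 1

def final_feature_map_size(first_kernel, n_layers, kernel, use_pool):
    """Width of the feature map before Flatten, or None if it runs out."""
    size = conv_output_size(28, first_kernel)
    for _ in range(n_layers):
        if size < kernel:
            return None          # kernel no longer fits in the feature map
        size = conv_output_size(size, kernel)
        if use_pool:
            if size < 2:
                return None      # nothing left for a 2x2 pooling window
            size //= 2           # 2x2 pooling halves the width (rounded down)
    return size

print(final_feature_map_size(3, 2, 3, False))  # 28 -> 26 -> 24 -> 22
print(final_feature_map_size(7, 4, 5, True))   # map becomes too small: None

# Enumerate the conv part of the search space and collect the bad cases.
invalid = [cfg for cfg in product([3, 5, 7], [2, 4], [3, 5], [False, True])
           if final_feature_map_size(*cfg) is None]
print(len(invalid), "of 24 combinations run out of pixels")
```

Under these assumptions, the combinations that fail are the ones with 4 subsequent conv layers and pooling enabled, so a run that samples one of them would be expected to fail when the model is built.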

In [0]:
def create_cnn_model(x_train, y_train, x_test, y_test):
    '''
    Function to create and compile the CNN
    Args:
        x_train: Training Images
        y_train: Training Labels
        x_test: Testing Images
        y_test: Testing Labels
    Returns:
        Dictionary with loss, status and trained model
    '''
    model = Sequential()
    
    #Choice of 8, 12, 16 for conv kernels and 3, 5, 7 for kernel size
    model.add(Conv2D({{choice([8, 12, 16])}}, kernel_size={{choice([(3,3), (5,5), (7,7)])}}, activation='relu', input_shape=(28, 28, 1), name='input'))

    #Choice of number of conv layers and whether to apply or not apply pooling and dropout
    for _ in range({{choice([2, 4])}}):
        model.add(Conv2D({{choice([2, 4])}}, kernel_size={{choice([(3,3), (5,5)])}}, activation='relu'))
        if {{choice([0, 1])}}  == 1:
            model.add(MaxPooling2D(pool_size=(2, 2)))
        if {{choice([0, 1])}}  == 1:
            model.add(Dropout({{uniform(0.1, 0.5)}}))
    model.add(Flatten())
    
    #Choice of number of Dense layers and whether to use Dropout or not
    for _ in range({{choice([2, 4, 6])}}):
        model.add(Dense({{choice([512, 1024, 128])}}, activation='relu'))
        if {{choice([0, 1])}}  == 1:
            model.add(Dropout({{uniform(0.1, 0.5)}}))
    
    model.add(Dense(10, activation='softmax', name='output'))
    
    model.compile(loss=keras.losses.categorical_crossentropy,
              optimizer=keras.optimizers.Adam(),
              metrics=['accuracy'])
    
    if not os.path.exists(f'models/cnn'):
        os.makedirs(f'models/cnn')
    
    # We use EarlyStopping to stop training if the validation loss does not improve
    # for two consecutive epochs. ModelCheckpoint saves the best model seen so far.
    # Both callbacks monitor the validation loss.
    callbacks = [EarlyStopping(monitor='val_loss', patience=2),
             ModelCheckpoint(filepath=f'models/cnn/best.hdf5', monitor='val_loss', save_best_only=True)]

    model.fit(x_train, y_train, batch_size={{choice([16, 32, 64])}}, epochs=100, verbose=0, callbacks=callbacks, validation_split=0.2)
    
    # Evaluate the model on the test dataset.
    score = model.evaluate(x_test, y_test, verbose=0)
    
    #Save this run's model, named after its test accuracy, in the CNN folder
    model.save(f'models/cnn/{score[1]}.hdf5')
    
    return {'loss': -score[1], 'status': STATUS_OK, 'model': model}

## Starting the Training and Parameter Search

#### FFNN Training
The `optim.minimize` function tries to find the model with the lowest loss. Since our model functions return the negative test accuracy as the loss, this means the model with the highest accuracy.
We feed it the model function, the data loading function and the number of evals to run. Running more evals takes more time, but gives the search more chances to find a better model.

Then we evaluate the best model on the test dataset and see the result!
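
The model functions return `-score[1]` (the negated test accuracy) as the loss, which is why minimising picks the most accurate model. A minimal sketch of that idea in plain Python follows; the run names and accuracies are made-up illustrative values, not results from this notebook.

```python
# Made-up test accuracies for three hypothetical runs.
accuracies = {'run_a': 0.91, 'run_b': 0.97, 'run_c': 0.94}

# Negate each accuracy to turn "higher is better" into "lower is better".
losses = {name: -acc for name, acc in accuracies.items()}

# The run with the smallest loss is the run with the highest accuracy.
best = min(losses, key=losses.get)
print(best, accuracies[best])  # run_b 0.97
```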

In [0]:
#Run the hyperparameter search
best_run, best_model = optim.minimize(model=create_ffnn_model,
                                      data=data_ffnn,
                                      algo=tpe.suggest,
                                      max_evals=5,
                                      trials=Trials(),
                                      notebook_name='Grid_Search')

#Test the best model, loading the data only once
_, _, x_test, y_test = data_ffnn()
best_model.evaluate(x_test, y_test)

>>> Imports:
#coding=utf-8

try:
    from keras.layers import Conv2D, Dense, Flatten, MaxPooling2D, Dropout
except:
    pass

try:
    from keras.datasets import mnist
except:
    pass

try:
    from keras.models import Sequential
except:
    pass

try:
    import keras
except:
    pass

try:
    from keras.callbacks import EarlyStopping, ModelCheckpoint
except:
    pass

try:
    import numpy as np
except:
    pass

try:
    from hyperopt import Trials, STATUS_OK, tpe
except:
    pass

try:
    from hyperas import optim
except:
    pass

try:
    from hyperas.distributions import choice, uniform
except:
    pass

try:
    import os
except:
    pass

try:
    from pydrive.auth import GoogleAuth
except:
    pass

try:
    from pydrive.drive import GoogleDrive
except:
    pass

try:
    from google.colab import auth
except:
    pass

try:
    from oauth2client.client import GoogleCredentials
except:
    pass

>>> Hyperas search space:

def get_space():
    return {
        'Dense': hp.ch

[0.09332765673322065, 0.9799]

As you can see from the output above (test loss first, then test accuracy), the best model reaches about 98% accuracy.

#### CNN Training and Search
Similar to the FFNN training, we run the search for the CNN. In this case, we feed in the CNN data and model functions.

In [15]:
best_run, best_model = optim.minimize(model=create_cnn_model,
                                      data=data_cnn,
                                      algo=tpe.suggest,
                                      max_evals=5,
                                      trials=Trials(),
                                      notebook_name='Grid_Search')

#Test the best model, loading the data only once
_, _, x_test, y_test = data_cnn()
best_model.evaluate(x_test, y_test)

>>> Hyperas search space:

def get_space():
    return {
        'Conv2D': hp.c

[0.1096816274739273, 0.978]

The best model achieves an accuracy of 97.8%.

## Result
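This was a short, untuned attempt at hyperparameter search, with only 5 evaluations per model. Note that the runs above pass `algo=tpe.suggest`, hyperopt's Tree-structured Parzen Estimator, rather than an exhaustive grid. With more evaluations and a more carefully tuned Bayesian Optimisation setup, we can hope to achieve better results!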