# Bayesian Optimisation Using Gaussian Process

In this repo, we use the `bayes_opt` package to perform a Bayesian Optimisation hyperparameter search. We use it to look for good hyperparameters for an FFNN-based model and a CNN-based model that classify MNIST digits.

We will use the following packages:
- Keras
- Numpy
- bayes_opt

### Import Packages

First we will import the packages.

In [1]:
from keras.layers import Conv2D, Dense, Flatten, MaxPooling2D, Dropout
from keras.datasets import mnist
from keras.models import Sequential
import keras
from keras.callbacks import EarlyStopping, ModelCheckpoint

import numpy as np
from bayes_opt import BayesianOptimization

import os

Using TensorFlow backend.


### Load the data

We can use `mnist.load_data()` to get the data.

We then perform the following preprocessing:
- We convert the images to float32 representation
- We then normalise the images to the range [0, 1] by dividing by 255
- We one-hot encode the labels

In [0]:
# Fetching the data
(x_train, y_train), (x_test, y_test) = mnist.load_data()

# Convert to Float32
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')

# Normalise the data
x_train /= 255
x_test /= 255

# Convert labels to one-hot encoded representation
y_train = keras.utils.to_categorical(y_train, 10)
y_test = keras.utils.to_categorical(y_test, 10)

data=(x_train, y_train, x_test, y_test)
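
The preprocessing steps above can be checked without downloading MNIST. The following is a minimal sketch using NumPy only; the `preprocess` helper and the synthetic images are assumptions made for illustration and are not part of the notebook.

```python
import numpy as np


def preprocess(images, labels, num_classes=10):
    """Scale uint8 images to [0, 1] as float32 and one-hot encode integer labels."""
    images = images.astype('float32') / 255
    one_hot = np.eye(num_classes, dtype='float32')[labels]
    return images, one_hot


# Synthetic stand-in for MNIST: 4 random 28x28 images with integer labels
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(4, 28, 28), dtype=np.uint8)
fake_labels = np.array([3, 0, 9, 3])

x, y = preprocess(fake_images, fake_labels)
print(x.dtype, x.min() >= 0.0, x.max() <= 1.0)
print(y.shape, y.argmax(axis=1))
```

Taking the `argmax` of each one-hot row recovers the original integer label, which is a quick way to confirm the encoding.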

### Defining the Models

#### Feed Forward Neural Network
First, let's try to optimise a feed-forward (FF) model.

This function defines the model. When using `bayes_opt`, we replace the hyperparameters with variables that `bayes_opt` fills in for us during execution. Since there can be many such hyperparameters, we pass them using `**kwargs`, which is a Python dictionary where each key is an argument keyword and each value is the corresponding argument value.

We keep the first layer fixed and vary the layers that follow it.
The `for` loop sets the number of hidden layers. Inside the loop, we also set the number of neurons in each layer, whether dropout is used and, if it is, the dropout percentage. Finally, we add an output layer.

In [0]:
def make_ffnn_model(image_shape, **kwargs):
    '''
    Function to make the Feed-Forward Neural Network Model
    Inputs:
        image_shape: The shape of the input
        **kwargs: The different parameters to optimise
    Returns:
        The model
    '''
    model = Sequential()

    model.add(Dense(500, activation='relu', input_shape=image_shape, name='input'))
    
    #For Loop to change the number of layers
    for _ in range(kwargs['layers']):
        model.add(Dense(kwargs['neurons'], activation='relu')) #Changing the number of neurons
        if kwargs['dropout']  == 1:                            #Whether dropout is applied or not
            model.add(Dropout(kwargs['dropout_perc']))         #If dropout is applied then the percentage
    model.add(Dense(10, activation='softmax', name='output'))  #Output layer
    
    return model
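
The `**kwargs` mechanism used by `make_ffnn_model` can be illustrated without Keras. The sketch below uses a hypothetical `describe_ffnn` helper (not part of the notebook) that only lists the layers the loop would add; the hyperparameter values are made up for the example.

```python
def describe_ffnn(**kwargs):
    """List the layers that a loop like the one in make_ffnn_model would add.

    Hypothetical helper for illustration only; it builds no Keras model.
    """
    layers = ['Dense(500)']                       # fixed first layer
    for _ in range(kwargs['layers']):             # variable number of hidden layers
        layers.append(f"Dense({kwargs['neurons']})")
        if kwargs['dropout'] == 1:                # dropout flag: 1 means "on"
            layers.append(f"Dropout({kwargs['dropout_perc']})")
    layers.append('Dense(10, softmax)')           # output layer
    return layers


# A dictionary of hyperparameters is unpacked into keyword arguments with **
params = {'layers': 2, 'neurons': 64, 'dropout': 1, 'dropout_perc': 0.25}
print(describe_ffnn(**params))
```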

#### Convolutional Neural Network

Similar to before, we define a CNN. The parameters optimised are:
- The number of kernels in each Conv layer
- The size of the (square) kernel
- Whether max pooling is applied. If yes, then we apply a (2,2) kernel with stride (2,2)
- Whether dropout is applied after each Conv layer and, if it is, the percentage
- The number of Dense layers
- The number of neurons in each Dense layer
- Whether dropout is applied after each Dense layer and, if it is, the percentage

In [0]:
def make_cnn_model(image_shape, **kwargs):
    '''
    Function to make the Convolutional Neural Network Model
    Inputs:
        image_shape: The shape of the input
        **kwargs: The different parameters to optimise
    Returns:
        The model
    '''
    model = Sequential()
    
    model.add(Conv2D(kwargs['kernels'], kernel_size=(kwargs['kernel_size'],kwargs['kernel_size']), activation='relu', input_shape=image_shape))
    
    for _ in range(kwargs['conv_layers']):
        model.add(Conv2D(kwargs['kernels'], kernel_size=(kwargs['kernel_size'],kwargs['kernel_size']), activation='relu'))
        if kwargs['maxpooling'] == 1:
            model.add(MaxPooling2D(pool_size=(2, 2)))
        if kwargs['dropout_cnn']  == 1:
            model.add(Dropout(kwargs['dropout_perc_cnn']))
    
    model.add(Flatten())
    
    for _ in range(kwargs['layers']):
        model.add(Dense(kwargs['neurons'], activation='relu'))
        if kwargs['dropout']  == 1:
            model.add(Dropout(kwargs['dropout_perc']))
    
    model.add(Dense(10, activation='softmax'))

    return model
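
Some combinations of kernel size, number of Conv layers and pooling shrink the 28x28 feature map very quickly. The sketch below is a rough check of the resulting spatial size. It assumes the Keras defaults for `Conv2D` (stride 1, `'valid'` padding) and a (2,2) max pool with stride (2,2); `conv_output_size` is a hypothetical helper, not part of the notebook.

```python
def conv_output_size(input_size, kernel_size, conv_layers, maxpooling):
    """Side length of the square feature map after the conv stack, or None if it vanishes."""
    size = input_size - kernel_size + 1          # first Conv2D layer ('valid' padding, stride 1)
    if size < 1:
        return None
    for _ in range(conv_layers):
        size = size - kernel_size + 1            # Conv2D inside the loop
        if size < 1:
            return None
        if maxpooling:
            size //= 2                           # (2, 2) pooling with stride (2, 2)
            if size < 1:
                return None
    return size


print(conv_output_size(28, 3, 2, True))    # 28 -> 26 -> 24 -> 12 -> 10 -> 5
print(conv_output_size(28, 9, 3, False))   # 28 -> 20 -> 12 -> 4 -> too small for a 9x9 kernel
```

Under these assumptions, a configuration for which the helper returns `None` would be expected to fail when the model is built, so it may be worth checking the bounds chosen for `kernel_size` and `conv_layers` against it.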

### Setup the Model for Training

Before we can begin training, we need to specify how the model is trained. The function below builds the model for the given hyperparameters, compiles and trains it, and returns its test accuracy, which is the value that `bayes_opt` tries to maximise.

In [0]:
def fit_params(image_shape, data, **kwargs):
    '''
    Function to prepare the model for training
    Input:
        image_shape: The shape of the input to the model
        data: A tuple of the form (Training_Features, Training_Labels, Testing_Features, Testing_Labels)
        kwargs: The different parameters to optimise
    Returns:
        The Accuracy of the trained model given the parameters
    '''
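    # Most of our parameters are integers (no. of layers, kernel size etc.), but the
    # optimiser samples continuous values, so we truncate everything to int,
    # except the dropout percentages.
    # Note: truncation means the 0/1 flags (e.g. 'dropout') become 1 only when
    # the sampled value is exactly 1.0.
    for k in kwargs:
        if 'perc' in k:
            continue
        kwargs[k] = int(kwargs[k])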
    
    x_train, y_train, x_test, y_test = data
    
    # Reshape the input images to feed to the model.
    # If image_shape is (784,), then we are training an FFNN and flatten the data accordingly.
    # Otherwise we reshape it to (28, 28, 1) (channels-last format) as required by the
    # TensorFlow backend
    if image_shape[0]==784:
        x_train = x_train.reshape(x_train.shape[0], 784)
        x_test = x_test.reshape(x_test.shape[0], 784)
        model = make_ffnn_model(image_shape, **kwargs)
    else:   
        x_train = x_train.reshape(x_train.shape[0], 28, 28, 1)
        x_test = x_test.reshape(x_test.shape[0], 28, 28, 1)
        model = make_cnn_model(image_shape, **kwargs)
        
    # We use categorical crossentropy as our loss and Adam as our optimiser
    model.compile(loss=keras.losses.categorical_crossentropy,
              optimizer=keras.optimizers.Adam(),
              metrics=['accuracy'])

    # Creating the path to save the trained models
    os.makedirs(f'models/{image_shape[0]}', exist_ok=True)
    
    # We use EarlyStopping to stop training if the validation loss does not improve
    # for two consecutive epochs. ModelCheckpoint saves the best model seen during training.
    # Both callbacks monitor the validation loss.
    callbacks = [EarlyStopping(monitor='val_loss', patience=2),
             ModelCheckpoint(filepath=f'models/{image_shape[0]}/best.hdf5', monitor='val_loss', save_best_only=True)]
    
    # Train the model for a large number of epochs, so that training stops due to the callbacks
    # and not because it ran out of epochs. We use a validation split of 20% of the training data
    model.fit(x_train, y_train, batch_size=kwargs['batch_size'], epochs=100, verbose=0, callbacks=callbacks, validation_split=0.2)
    
    # Evaluate the model with the test dataset.
    score = model.evaluate(x_test, y_test, verbose=0)
    
    # Save the final model, named after its test accuracy
    model.save(f'models/{image_shape[0]}/{score[1]}.hdf5')

    # Return the accuracy.
    return score[1]



from functools import partial

# Set the input shape for training the FFNN model (flattened 28x28 images)
image_shape=(784, )

fit_with_partial = partial(fit_params, image_shape, data)
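
`partial` is used here because the optimiser calls the target function with the hyperparameters only, so `image_shape` and `data` have to be fixed beforehand. The following is a minimal sketch of that idea; `objective` is a hypothetical stand-in that trains nothing and simply echoes what it receives.

```python
from functools import partial


def objective(image_shape, data, **kwargs):
    """Stand-in for a training function: returns what it was given instead of training."""
    return image_shape, len(data), sorted(kwargs)


# Fix the first two positional arguments; only the keyword arguments remain open
objective_partial = partial(objective, (784,), ('x_train', 'y_train', 'x_test', 'y_test'))

# An optimiser would then call the function with hyperparameters only, as keywords
print(objective_partial(layers=3.0, neurons=100.0))
```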

### Training the Models

#### FFNN
First we train the FFNN model.

We set the following parameter space:
- Number of layers between 1 and 10
- Number of neurons per layer between 50 and 500
- Batch size between 16 and 64
- Dropout flag between 0 and 1 (used as off/on)
- Dropout percentage between 0.1 and 0.5

We can then use the `fit_with_partial` function to train our model using the parameter space we have set.

We then run the optimiser with 3 random initial points (`init_points=3`) followed by 4 optimisation iterations (`n_iter=4`). Of course, running it for longer lets it search more of the space and might improve accuracy.

We print the hyperparameters for each iteration and also the best iteration.

In [6]:
from bayes_opt import BayesianOptimization

# Bounded region of parameter space
pbounds = {'layers':(1,10), 'neurons':(50, 500), 'batch_size':(16, 64), 'dropout': (0,1), 'dropout_perc': (0.1,0.5)}

# Setting up the optimiser
optimizer = BayesianOptimization(
    f=fit_with_partial,
    pbounds=pbounds,
    verbose=2,  # verbose = 1 prints only when a maximum is observed, verbose = 0 is silent
    random_state=1,
)

# Start the parameter search
optimizer.maximize(init_points=3, n_iter=4)

# Print the results for each iteration
for i, res in enumerate(optimizer.res):
    print("Iteration {}: \n\t{}".format(i, res))

# Print the result of the best model
print(optimizer.max)

|   iter    |  target   | batch_... |  dropout  | dropou... |  layers   |  neurons  |
-------------------------------------------------------------------------------------
| 1         | 0.965     | 36.02     | 0.7203    | 0.1       | 3.721     | 116.0     |
| 2         | 0.9735    | 20.43     | 0.1863    | 0.2382    | 4.571     | 292.5     |
| 3         | 0.9779    | 36.12     | 0.6852    | 0.1818    | 8.903     | 62.32     |
| 4         | 0.9749    | 64.0      | 1.0       | 0.1       | 10.0      | 500.0     |
| 5         | 0.9684    | 16.09     | 0.5771    | 0.4721    | 2.037     | 498.3     |

#### Results
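Of the iterations shown above, the best is iteration 3. After the integer cast in `fit_params`, it corresponds to:
- a batch size of 36
- no dropout (the sampled flag 0.6852 is cast to 0)
- 8 Dense layers after the fixed first layer
- 62 neurons in each of those layers

We achieve a testing accuracy of 97.79%.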
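
To see how the continuous values sampled by the optimiser map onto this configuration, the sketch below mirrors the integer cast in `fit_params`. The `cast_params` helper is hypothetical, and the input values are copied from iteration 3 of the output above (as rounded in that table).

```python
def cast_params(params):
    """Truncate every value to int, except keys containing 'perc' (dropout percentages)."""
    return {k: (v if 'perc' in k else int(v)) for k, v in params.items()}


# Values copied from iteration 3 of the FFNN output table
best_row = {'batch_size': 36.12, 'dropout': 0.6852, 'dropout_perc': 0.1818,
            'layers': 8.903, 'neurons': 62.32}
print(cast_params(best_row))
```

Because the cast truncates rather than rounds, a sampled dropout flag of 0.6852 becomes 0, which is why this run uses no dropout.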

#### CNN Model
Next we train the CNN model.

We set the following parameter space:
- Number of convolutional layers between 1 and 5
- Number of kernels per Conv layer between 16 and 64
- Kernel size between 3 and 9
- Whether to use max pooling or not
- Whether to use dropout after the Conv layers or not and, if so, a percentage between 0.1 and 0.5
- Number of FC layers between 1 and 10
- Number of neurons in each FC layer between 50 and 500
- Whether to use dropout after the FC layers or not and, if so, a percentage between 0.1 and 0.5
- Batch size between 16 and 64

In [7]:
image_shape=(28, 28, 1)


fit_with_partial = partial(fit_params, image_shape, data)

# Bounded region of parameter space
pbounds = {'layers':(1,10), 'neurons':(50, 500), 'batch_size':(16, 64), 'dropout': (0,1), 'dropout_perc': (0.1,0.5), 
          'dropout_perc_cnn':(0.1, 0.5), 'dropout_cnn':(0,1), 'maxpooling': (0,1), 'conv_layers':(1,5), 
           'kernel_size':(3, 9), 'kernels':(16, 64)}

# Setting up the optimiser
optimizer = BayesianOptimization(
    f=fit_with_partial,
    pbounds=pbounds,
    verbose=2,  # verbose = 1 prints only when a maximum is observed, verbose = 0 is silent
    random_state=1,
)

# Start the parameter search
optimizer.maximize(init_points=3, n_iter=4)

# Print the results for each iteration
for i, res in enumerate(optimizer.res):
    print("Iteration {}: \n\t{}".format(i, res))

# Print the result of the best model
print(optimizer.max)

|   iter    |  target   | batch_... | conv_l... |  dropout  | dropou... | dropou... | dropou... | kernel... |  kernels  |  layers   | maxpoo... |  neurons  |
-------------------------------------------------------------------------------------------------------------------------------------------------------------
| 1         | 0.9883    | 36.02     | 3.881     | 0.000114  | 0.3023    | 0.1587    | 0.1369    | 4.118     | 32.59     | 4.571     | 0.5388    | 238.6     |
| 2         | 0.9883    | 48.89     | 1.818     | 0.8781    | 0.02739   | 0.3682    | 0.2669    | 6.352     | 22.74     | 2.783     | 0.8007    | 485.7     |
| 3         | 0.9894    | 31.04     | 3.769     | 0.8764    | 0.8946    | 0.134     | 0.115

#### Result

The best model reaches an accuracy of 99.02%, which is higher than the 97.79% of the FFNN. You can see the final hyperparameters above.

## Conclusion
This notebook shows how to use Bayesian Optimisation and the `bayes_opt` package to do a hyperparameter search. Better results may be obtained by changing the hyperparameter bounds and increasing the number of iterations the optimiser runs for.