### Bayesian Optimization to find the optimal neural network architecture
### Fashion MNIST Dataset


**Importing necessary libraries**

In [None]:
# __future__ imports must come first in the cell
from __future__ import absolute_import, division, print_function, unicode_literals

import pandas as pd
import numpy as np

# Display the value of every expression in a cell, not just the last one
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

import matplotlib.pyplot as plt
from sklearn.model_selection import KFold, train_test_split

# Note: the name gp is rebound by `import gp` in the next cell
import sklearn.gaussian_process as gp

# TensorFlow and tf.keras
import tensorflow as tf
from tensorflow import keras

print(tf.__version__)

1.15.0


**Importing gp.py**

In [None]:
from google.colab import files
src = list(files.upload().values())[0]
open('gp.py','wb').write(src)
import gp

%load gp.py
%run gp.py

Saving gp.py to gp (1).py


6563

**Reading the Fashion-Mnist Data**

In [None]:
fashion_mnist = keras.datasets.fashion_mnist

**Loading the predefined train/test split**

In [None]:
(train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data()

### Approach


    Number of models: 2

    First model: one hidden layer
    Second model: two hidden layers

    For both models we first pick the number of nodes in each hidden layer arbitrarily (128) and check the accuracy on the train and test sets.

    The goal is then to find the best number of nodes per layer for each model.
    We use Bayesian optimization to search for it, refit the model with the selected number of nodes, and check the train and test accuracy again.

    Steps:
    1. Fit model 1 with an arbitrarily chosen number of nodes
    2. Evaluate the model on the train and test sets
    3. Define the objective for Bayesian optimization (here, the cross-validation accuracy) and fix the bounds for the parameters
    4. Run Bayesian optimization to find the best number of nodes
    5. Fit another model using that number of nodes
    6. Evaluate the model on the train and test sets

    We follow the same steps for model 2 (two hidden layers).

**Defining the first model**

    Here,
    Number of hidden layers = 1
    Activation function in hidden layer: ReLU
    Number of nodes in hidden layer = 128

    Number of nodes in output layer = 10 (one per class)
    Activation function in output layer: softmax

    Optimizer: Adam
    Loss function: sparse categorical cross-entropy
    Metric: accuracy


In [None]:
#one parameter, h1. It defines the no of nodes in the hidden layer.
def create_model(h1):
  model = keras.Sequential([
    keras.layers.Flatten(input_shape=(28, 28)),
    keras.layers.Dense(h1, activation='relu'),
    keras.layers.Dense(10, activation='softmax')
  ])
  model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

  return model
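
To get a feel for how `h1` drives model size, the number of trainable parameters of such a fully connected network can be counted by hand. This is a minimal sketch; `dense_param_count` is a hypothetical helper introduced only for illustration and is not part of the notebook or of Keras.

```python
def dense_param_count(layer_sizes):
    """Count weights and biases in a stack of fully connected layers.

    layer_sizes lists the width of every layer, input first, e.g.
    [784, 128, 10] for a flattened 28x28 image, one hidden layer, 10 classes.
    """
    total = 0
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        total += n_in * n_out + n_out  # weight matrix plus one bias per unit
    return total


# Parameter counts at the lower bound, the default, and the upper bound of h1
for h1 in (60, 128, 300):
    print(h1, dense_param_count([28 * 28, h1, 10]))
```

The count grows linearly in `h1`, so the widest model in the search range has roughly five times as many parameters as the narrowest.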

**Fitting the model**

    verbose is set to 0, so no per-epoch information is printed. With verbose=1 Keras shows a progress bar, and with verbose=2 it prints one line per epoch.

In [None]:
model=create_model(128)
model.fit(train_images, train_labels, batch_size=512, epochs=30, verbose=0)


<tensorflow.python.keras.callbacks.History at 0x7f1d2a602eb8>

**Evaluating the model**

In [None]:
train_loss, train_acc = model.evaluate(train_images,  train_labels, verbose=0)
test_loss, test_acc = model.evaluate(test_images,  test_labels, verbose=0)
print("Train Accuracy",train_acc)
print("Test Accuracy",test_acc)

Train Accuracy 0.88306665
Test Accuracy 0.8478


**Applying Bayesian Optimization**

    Target: find the best number of hidden nodes for the model fitted above.
    So far we have only checked the accuracy of the model with one hidden layer of 128 nodes.
    We do not know in advance how many nodes the hidden layer should have, so we use Bayesian optimization to choose the number of nodes.
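
The contents of `gp.py` are not shown in this notebook, so the following is only a rough sketch of the general idea behind Bayesian optimization, not the implementation of `bayesian_optimisation`. It assumes a Gaussian-process surrogate with a squared-exponential kernel and an expected-improvement acquisition function, and it replaces the expensive cross-validation with a made-up `toy_score`. All function names, the length scale, and the evenly spaced initial design are assumptions chosen to keep the example short and deterministic.

```python
import math
import numpy as np


def rbf_kernel(a, b, length_scale=40.0):
    """Squared-exponential kernel between two 1-D arrays of points."""
    diff = a[:, None] - b[None, :]
    return np.exp(-0.5 * (diff / length_scale) ** 2)


def gp_posterior(x_obs, y_obs, x_new, noise=1e-6):
    """Posterior mean and standard deviation of a zero-mean, unit-variance GP."""
    k_oo = rbf_kernel(x_obs, x_obs) + noise * np.eye(len(x_obs))
    k_on = rbf_kernel(x_obs, x_new)
    solve = np.linalg.solve(k_oo, k_on)          # K^-1 k(x_obs, x_new)
    mu = solve.T @ y_obs
    var = 1.0 - np.sum(k_on * solve, axis=0)     # prior variance is 1
    return mu, np.sqrt(np.clip(var, 1e-12, None))


def expected_improvement(mu, sigma, best):
    """Expected improvement over the best score seen so far (maximisation)."""
    z = (mu - best) / sigma
    cdf = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))
    pdf = np.exp(-0.5 * z ** 2) / math.sqrt(2.0 * math.pi)
    return (mu - best) * cdf + sigma * pdf


def toy_score(h):
    """Hypothetical stand-in for a cross-validated accuracy; peaks at h = 200."""
    return 0.9 - ((h - 200.0) / 400.0) ** 2


low, high = 60.0, 300.0
x_obs = np.linspace(low, high, 4)        # initial design: 60, 140, 220, 300
y_obs = toy_score(x_obs)
grid = np.linspace(low, high, 241)       # candidate node counts 60..300

for _ in range(10):
    # Standardise the scores so they match the unit-variance GP prior
    y_std = (y_obs - y_obs.mean()) / y_obs.std()
    mu, sigma = gp_posterior(x_obs, y_std, grid)
    ei = expected_improvement(mu, sigma, y_std.max())
    x_next = grid[np.argmax(ei)]         # most promising candidate
    x_obs = np.append(x_obs, x_next)
    y_obs = np.append(y_obs, toy_score(x_next))

best_h = x_obs[np.argmax(y_obs)]
print(best_h, y_obs.max())
```

Each iteration spends one evaluation of the objective where the surrogate expects the largest gain, which is why the approach suits objectives as costly as a 4-fold cross-validation of a neural network.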

**Defining Sample Loss Function**

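    Despite its name, the function returns a score, not a loss: the mean validation accuracy over a 4-fold cross-validation of the training set.
    The best sample is later taken as the one with the highest score.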

In [None]:
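def sample_loss_NN(params):
  # Objective for the optimizer: mean 4-fold cross-validation accuracy
  # of the one-hidden-layer model with h1 nodes (higher is better).
  h1 = int(params[0])  # the optimizer proposes floats; truncate to an integer node count
  n_split=4
  cv_scores=[]

  # KFold without shuffle=True splits the training set into consecutive blocks
  for train_index,test_index in KFold(n_split).split(train_images):
    x_train,x_test=train_images[train_index],train_images[test_index]
    y_train,y_test=train_labels[train_index],train_labels[test_index]
  
    # Train a fresh model on three folds, score it on the held-out fold
    model_cv=create_model(h1)
    model_cv.fit(x_train, y_train,batch_size=512,epochs=20,verbose=0)
  
    # evaluate() returns [loss, accuracy]; keep the accuracy
    cv_scores.append(model_cv.evaluate(x_test,y_test,verbose=0)[1])
  return np.array(cv_scores).mean()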
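
The split used inside the objective can be illustrated without any training. The sketch below hand-rolls an unshuffled k-fold split (consecutive index blocks, which is how `KFold` behaves when `shuffle` is left at its default) and applies it to the 60,000 Fashion-MNIST training images. `contiguous_folds` is a hypothetical helper written for this illustration, not scikit-learn code.

```python
def contiguous_folds(n_samples, n_splits):
    """Yield (train_indices, val_indices) for unshuffled k-fold splitting.

    Folds are consecutive index blocks; the first n_samples % n_splits
    folds receive one extra sample.
    """
    base, extra = divmod(n_samples, n_splits)
    start = 0
    for fold in range(n_splits):
        size = base + (1 if fold < extra else 0)
        stop = start + size
        val = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, n_samples))
        yield train, val
        start = stop


# 60,000 training images, 4 folds
fold_sizes = [(len(tr), len(va)) for tr, va in contiguous_folds(60000, 4)]
print(fold_sizes)
```

Every call to the objective therefore trains four models on 45,000 images each and validates on the remaining 15,000, which makes each sample in the optimization loop fairly expensive.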

    Bounds for the parameters

In [None]:
bounds = np.array([[60,300]])
print(bounds)

xp, yp = bayesian_optimisation(n_iters=10, sample_loss=sample_loss_NN, 
                               bounds=bounds,
                               n_pre_samples=10)


[[ 60 300]]




**Optimal number of nodes**

In [None]:
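#print(xp)
#print(yp)

# xp holds the sampled points (one row per sample), yp the corresponding scores.
# The best sample is the row with the highest cross-validation accuracy.
xp_hat = np.round(xp[np.array(yp).argmax(), :])

print(xp_hat)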

[297.]
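
A selected value of 297 lies close to the upper bound of 300, which can be a hint that the search range is constraining the result. The sketch below shows one way to pick the best sample and flag parameters that land near a bound. `best_sample` and its `edge_fraction` threshold are hypothetical, and the demo arrays are made up for illustration; they are not the `xp` and `yp` produced above.

```python
import numpy as np


def best_sample(xp, yp, bounds, edge_fraction=0.05):
    """Return the best-scoring point as ints, plus a per-parameter edge flag.

    edge_fraction is an arbitrary illustrative threshold: a parameter is
    flagged when it lies within that fraction of its range from either bound.
    """
    xp = np.asarray(xp, dtype=float)
    yp = np.asarray(yp, dtype=float)
    bounds = np.asarray(bounds, dtype=float)
    best = np.round(xp[yp.argmax()]).astype(int)
    margin = edge_fraction * (bounds[:, 1] - bounds[:, 0])
    near_edge = (best - bounds[:, 0] <= margin) | (bounds[:, 1] - best <= margin)
    return best, near_edge


# Made-up samples for illustration only
xp_demo = np.array([[75.2], [180.9], [296.6]])
yp_demo = np.array([0.84, 0.86, 0.87])
best, near_edge = best_sample(xp_demo, yp_demo, bounds=[[60, 300]])
print(best, near_edge)
```

When a parameter is flagged, widening the bounds and rerunning the search is one possible follow-up.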


**Fitting the model with the optimal no of nodes obtained**

In [None]:
# xp_hat is a one-element float array; pass the node count as a plain int
model_1_gp=create_model(int(xp_hat[0]))
model_1_gp.fit(train_images, train_labels, batch_size=512, epochs=30, verbose=0)

<tensorflow.python.keras.callbacks.History at 0x7f1cd38e6128>

**Evaluating the model fitted using optimal no of nodes**

In [None]:
train_loss, train_acc = model_1_gp.evaluate(train_images,  train_labels, verbose=0)
test_loss, test_acc = model_1_gp.evaluate(test_images,  test_labels, verbose=0)
print("Train Accuracy",train_acc)
print("Test Accuracy",test_acc)

Train Accuracy 0.896
Test Accuracy 0.8386


    We now look for the best architecture with 2 hidden layers.
    Here we try to find the best number of nodes for each of the two layers.
    The batch size could be tuned in the same way, but a fixed batch size of 512 is used here.
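
If the batch size were added to the search, the optimizer would propose one more continuous value per sample, and the objective would have to decode it. The following is a minimal sketch under assumed conventions: the parameter layout, the power-of-two encoding, and the names `decode_params` and `bounds_with_batch` are all hypothetical and do not appear elsewhere in this notebook.

```python
def decode_params(params):
    """Map a continuous sample to integer layer widths and a batch size.

    Assumed layout (hypothetical): params = [h1, h2, log2_batch_size].
    """
    h1 = int(round(params[0]))
    h2 = int(round(params[1]))
    batch_size = 2 ** int(round(params[2]))  # search over powers of two
    return h1, h2, batch_size


# Hypothetical bounds: 60 to 300 nodes per layer, batch sizes 2**6 = 64 to 2**10 = 1024
bounds_with_batch = [[60, 300], [60, 300], [6, 10]]

print(decode_params([128.4, 96.7, 9.2]))  # → (128, 97, 512)
```

Searching the exponent rather than the batch size itself keeps the candidates evenly spread across orders of magnitude.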

**Defining 2nd model**

    The number of hidden layers is 2. All other settings are the same.



In [None]:
#two parameters, h1 and h2. They define the number of nodes in the first and second hidden layers.
def create_model_2(h1,h2):
  model = keras.Sequential([
    keras.layers.Flatten(input_shape=(28, 28)),
    keras.layers.Dense(h1, activation='relu'),
    keras.layers.Dense(h2, activation='relu'),                 ### added layer
    keras.layers.Dense(10, activation='softmax')
  ])
  model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

  return model

**Fitting model 2 with randomly chosen number of nodes**

    Here the number of nodes in each hidden layer is set to 128.
    Activation function in both hidden layers: ReLU
    The rest of the architecture is the same as in the first model.

In [None]:
model_2=create_model_2(128,128)
model_2.fit(train_images, train_labels, batch_size=512, epochs=30, verbose=0)

<tensorflow.python.keras.callbacks.History at 0x7f1cd37f7668>

**Evaluating Model 2**

In [None]:
train_loss, train_acc = model_2.evaluate(train_images,  train_labels, verbose=0)
test_loss, test_acc = model_2.evaluate(test_images,  test_labels, verbose=0)
print("Train Accuracy",train_acc)
print("Test Accuracy",test_acc)

Train Accuracy 0.8933167
Test Accuracy 0.8364


**Applying Bayes optimization**

**Defining sample loss function**

    Same as for the first model: the mean 4-fold cross-validation accuracy is used as the score.

In [None]:
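def sample_loss_2(params):
  # Objective for the optimizer: mean 4-fold cross-validation accuracy
  # of the two-hidden-layer model with h1 and h2 nodes (higher is better).
  h1 = int(params[0])  # the optimizer proposes floats; truncate to integer node counts
  h2 = int(params[1])

  n_split=4
  cv_scores=[]

  # KFold without shuffle=True splits the training set into consecutive blocks
  for train_index,test_index in KFold(n_split).split(train_images):
    x_train,x_test=train_images[train_index],train_images[test_index]
    y_train,y_test=train_labels[train_index],train_labels[test_index]
  
    # Train a fresh model on three folds, score it on the held-out fold
    model_cv=create_model_2(h1,h2)
    model_cv.fit(x_train, y_train,batch_size=512,epochs=20,verbose=0)
  
    # evaluate() returns [loss, accuracy]; keep the accuracy
    cv_scores.append(model_cv.evaluate(x_test,y_test,verbose=0)[1])
  return np.array(cv_scores).mean()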

In [None]:
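bounds = np.array([[60,300],[60,300]])
#print(bounds)

# sample_loss_2 scores the two-layer model, so both h1 and h2 are searched;
# sample_loss_NN would build the one-layer model and ignore the second parameter.
xp, yp = bayesian_optimisation(n_iters=10, sample_loss=sample_loss_2, 
                               bounds=bounds,
                               n_pre_samples=10)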




**Optimal no of nodes**

In [None]:
# Row of xp with the highest cross-validation accuracy, rounded to whole nodes
xp_hat = np.round(xp[np.array(yp).argmax(), :])

print(xp_hat)

[297. 279.]


**Fitting and evaluating model 2 with the selected architecture**

In [None]:
# Pass the node counts as plain ints rather than NumPy floats
model_2_gp=create_model_2(int(xp_hat[0]),int(xp_hat[1]))
model_2_gp.fit(train_images, train_labels, batch_size=512, epochs=30, verbose=0)

<tensorflow.python.keras.callbacks.History at 0x7f1cd0047978>

In [None]:
train_loss, train_acc = model_2_gp.evaluate(train_images,  train_labels, verbose=0)
test_loss, test_acc = model_2_gp.evaluate(test_images,  test_labels, verbose=0)
print("Train Accuracy",train_acc)
print("Test Accuracy",test_acc)

Train Accuracy 0.9209167
Test Accuracy 0.8632


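    For both models (model 1 with one hidden layer, model 2 with two hidden layers), Bayesian optimization was used to search for the number of nodes per layer within the bounds 60 to 300.

    Model 1: train accuracy rose from 0.8831 to 0.8960, but test accuracy fell from 0.8478 to 0.8386.
    Model 2: train accuracy rose from 0.8933 to 0.9209 and test accuracy rose from 0.8364 to 0.8632.

    These are single runs with 10 optimization iterations each, so small differences may reflect run-to-run variation rather than a real effect of the architecture.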
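
The recorded accuracies can be compared directly. The sketch below copies the numbers from the evaluation outputs in this notebook and computes the change for each model. The dictionary keys and the helper `accuracy_change` are illustrative names only.

```python
# Accuracies copied from the evaluation outputs recorded in this notebook
results = {
    "model 1, h1=128": {"train": 0.88306665, "test": 0.8478},
    "model 1, h1=297": {"train": 0.896, "test": 0.8386},
    "model 2, 128/128": {"train": 0.8933167, "test": 0.8364},
    "model 2, 297/279": {"train": 0.9209167, "test": 0.8632},
}


def accuracy_change(before, after):
    """Difference in train and test accuracy between two recorded runs."""
    return {
        split: round(results[after][split] - results[before][split], 4)
        for split in ("train", "test")
    }


change_1 = accuracy_change("model 1, h1=128", "model 1, h1=297")
change_2 = accuracy_change("model 2, 128/128", "model 2, 297/279")
print("model 1:", change_1)
print("model 2:", change_2)
```

A positive train change combined with a negative test change, as for model 1, is worth checking against repeated runs before drawing conclusions about the architecture.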