# Sklearn-based Grid Search for NN models built using Keras and TensorFlow

`sklearn` is one of the most popular libraries for machine learning. In this notebook, we will explore:
1. How to evaluate model performance using K-Fold cross-validation
2. How to tune model hyperparameters with grid search using sklearn's `GridSearchCV` class

Before we go ahead, if you are new to Keras and TensorFlow and want to build a model from scratch, [this notebook](https://www.kaggle.com/sumitkant/simple-neural-network-with-keras-and-tensorflow) has a step-by-step breakdown of building a simple multilayer perceptron model.

**Let's K-Fold and Gridsearch...**

## Load and Describe
We'll use `pandas` to load the dataset and compute some basic statistics:
* the shape of the dataset, using `df.shape`
* the event rate = ones/(ones + zeros)
* the distribution of the variables, using `df.describe()`

In [None]:
import pandas as pd
import numpy as np

df = pd.read_csv('../input/pima-indians-diabetes-database/diabetes.csv')
df.head()

In [None]:
print ('Number of Rows :', df.shape[0])
print ('Number of Columns :', df.shape[1])
print ('Number of Patients with outcome 1 :', df.Outcome.sum())
print ('Event Rate :', round(df.Outcome.mean()*100,2) ,'%')

In [None]:
df.describe()

## Preprocess
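Neural networks usually train better when the inputs are on a comparable scale, so let's rescale the variables with `normalize` from the `sklearn.preprocessing` module. By default, `normalize` rescales each row (sample) to unit L2 norm; it does not standardize each column (feature).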

In [None]:
from sklearn.preprocessing import normalize

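# The first 8 columns are the features; column index 8 is the label
X = df.to_numpy()[:, 0:8]
Y = df.to_numpy()[:, 8]

# normalize() rescales each row (sample) to unit L2 norm by default
X_norm = normalize(X)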
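
The difference between row-wise and column-wise scaling is easy to miss, so here is a minimal sketch on a made-up toy matrix (the values and the names `X_toy`, `X_rows` and `X_cols` are illustrative assumptions, not part of the diabetes data). It contrasts `normalize`, used above, with `StandardScaler`, which is shown only as a point of comparison and is not what this notebook uses.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler, normalize

# Toy matrix (made-up values): 3 samples (rows) x 2 features (columns)
X_toy = np.array([[3.0, 4.0],
                  [6.0, 8.0],
                  [1.0, 0.0]])

# normalize() works row by row: each sample is divided by its own L2 norm
X_rows = normalize(X_toy)
row_norms = np.linalg.norm(X_rows, axis=1)

# StandardScaler works column by column: zero mean, unit variance per feature
X_cols = StandardScaler().fit_transform(X_toy)
col_means = X_cols.mean(axis=0)

print(X_rows)
print(row_norms)
print(col_means)
```

Note that the first two toy rows end up identical after `normalize`, because they point in the same direction and only differ in magnitude. Whether that behavior is desirable depends on the data.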

## Define model

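The Keras library provides the `KerasClassifier` wrapper, which lets a deep learning model be used as a classifier in scikit-learn. The wrapper takes a `build_fn` argument: a function that builds and returns the compiled model. Below we define a `Sequential` model inside the function `create_model`. Note that `tensorflow.keras.wrappers.scikit_learn` is deprecated in later TensorFlow releases in favor of the SciKeras package, so the import below needs a TensorFlow version that still ships it.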


In [None]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.wrappers.scikit_learn import KerasClassifier

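import tensorflow as tf

SEED = 42
np.random.seed(SEED)      # seeds NumPy
tf.random.set_seed(SEED)  # seeds TensorFlow's global random generator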

def create_model(optimizer = 'adam'):
    # 8 input features -> 12 -> 8 -> 1 sigmoid unit for binary classification
    model = Sequential()
    model.add(Dense(12, activation = 'relu', input_dim = 8))
    model.add(Dense( 8, activation = 'relu'))
    model.add(Dense( 1, activation = 'sigmoid'))

    # `optimizer` is an argument so that the grid search can vary it
    model.compile(loss = 'binary_crossentropy', optimizer = optimizer, metrics = ['accuracy'])

    return model

model = KerasClassifier(build_fn = create_model, epochs = 150, batch_size = 8, verbose = 0)
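
To get a feel for how small this network is, the number of trainable parameters can be worked out by hand. The sketch below assumes only the layer widths used in `create_model` (8 inputs, then 12, 8 and 1 units); the helper `dense_param_count` is a hypothetical name introduced for illustration.

```python
# Layer widths of the network in create_model: 8 inputs -> 12 -> 8 -> 1
layer_sizes = [8, 12, 8, 1]

def dense_param_count(n_in, n_out):
    """Weights (n_in * n_out) plus one bias per output unit."""
    return n_in * n_out + n_out

# One entry per Dense layer, pairing each layer's input width with its output width
per_layer = [dense_param_count(a, b) for a, b in zip(layer_sizes[:-1], layer_sizes[1:])]
total_params = sum(per_layer)

print(per_layer)     # [108, 104, 9]
print(total_params)  # 221
```

The total should agree with the parameter count that `create_model().summary()` reports.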

## K-Fold
We will use `StratifiedKFold` to generate the folds, setting the number of folds with the `n_splits` argument. We use 8 folds here because 768/8 = 96, so every fold holds 96 observations. `cross_val_score` then trains the model once per fold and returns one score per held-out fold (accuracy here), which we average.

In [None]:
from sklearn.model_selection import StratifiedKFold, cross_val_score
kfold = StratifiedKFold(n_splits = 8, shuffle = True, random_state = SEED)

%time results = cross_val_score(model, X_norm, Y, cv=kfold)
print ('Accuracy',round(results.mean()*100,2), '%')
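
To see what stratification buys us, the following sketch splits a synthetic label vector and inspects each held-out fold. The labels are made up (768 samples with 256 positives, chosen so the numbers divide evenly) and are not the diabetes data; `y_toy`, `X_toy` and `toy_kfold` are illustrative names.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Synthetic labels (not the diabetes data): 768 samples, 256 of them positive
y_toy = np.array([1] * 256 + [0] * 512)
X_toy = np.zeros((len(y_toy), 1))  # feature values do not affect the split

toy_kfold = StratifiedKFold(n_splits=8, shuffle=True, random_state=42)

fold_sizes = []
fold_positives = []
for train_idx, test_idx in toy_kfold.split(X_toy, y_toy):
    fold_sizes.append(len(test_idx))                   # held-out samples in this fold
    fold_positives.append(int(y_toy[test_idx].sum()))  # positives among them

print(fold_sizes)
print(fold_positives)
```

Every held-out fold has the same size and the same share of positives as the full label vector, which is the property that keeps the per-fold accuracies comparable.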

## Grid Search

Below we define the parameter grid, which holds the candidate values for parameters such as `optimizer`, `epochs` and `batch_size`. We pass `model`, `param_grid` and the number of folds (`cv = 3`) to sklearn's `GridSearchCV`, so every parameter combination is evaluated with 3-fold cross-validation. This step is computationally intensive: 3 x 3 x 3 = 27 parameter combinations x 3 folds = 81 model fits, plus one final refit of the best combination on the full data (`refit = True` is the default).

In [None]:
param_grid = {
    'optimizer'  : ['rmsprop','adam','sgd'],
    'epochs'     : [100, 150, 200],
    'batch_size' : [8, 16, 32],
}
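
The count of 81 fits quoted above can be checked by enumerating the grid by hand. This is a rough sketch using only the standard library; it mimics what a grid search iterates over but is not how `GridSearchCV` is implemented, and the order of the combinations may differ from sklearn's. The names `combinations`, `n_folds` and `n_fits` are illustrative.

```python
from itertools import product

param_grid = {
    'optimizer'  : ['rmsprop', 'adam', 'sgd'],
    'epochs'     : [100, 150, 200],
    'batch_size' : [8, 16, 32],
}
n_folds = 3  # matches cv = 3 in the grid search

# Cartesian product of all value lists: one dict per candidate setting
names = list(param_grid)
combinations = [dict(zip(names, values)) for values in product(*param_grid.values())]

# Each candidate is trained once per fold
n_fits = len(combinations) * n_folds

print(len(combinations))  # 27
print(n_fits)             # 81
print(combinations[0])
```

Adding a fourth parameter with three values would triple the number of fits, so it pays to keep the grid small when each fit trains a network for 100 or more epochs.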

In [None]:
%%time 
from sklearn.model_selection import GridSearchCV

model = KerasClassifier(build_fn = create_model, verbose = 0)
grid = GridSearchCV(estimator = model, param_grid = param_grid, cv = 3)
grid_result = grid.fit(X_norm, Y)

print (f'With {grid_result.best_params_} got {round(grid_result.best_score_*100,2)} as best score!!')

The `GridSearchCV` results are stored in the `cv_results_` attribute of the `grid_result` object, a dictionary that can be used to inspect the score of every parameter combination.

In [None]:
print ('---- GRID SEARCH RESULTS ----')
for p,s in zip(grid_result.cv_results_['params'],grid_result.cv_results_['mean_test_score']):
    print (f' Accuracy : {round(s*100,2)} % | Param : {p}')
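
With 27 rows, the listing is easier to read when sorted from best to worst. The sketch below shows one way to do that with the `rank_test_score` and `std_test_score` entries of `cv_results_`. To keep it fast and self-contained it uses a stand-in problem, `LogisticRegression` on synthetic data, instead of the Keras model; that substitution and the names `toy_grid`, `toy_result` and `res` are assumptions for illustration. The same loop should work on `grid_result.cv_results_`.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Stand-in problem and estimator (assumptions, chosen because they fit in seconds)
X_toy, y_toy = make_classification(n_samples=200, n_features=8, random_state=42)
toy_grid = GridSearchCV(
    estimator=LogisticRegression(max_iter=1000),
    param_grid={'C': [0.01, 0.1, 1.0, 10.0]},
    cv=3,
)
toy_result = toy_grid.fit(X_toy, y_toy)

res = toy_result.cv_results_
order = np.argsort(res['rank_test_score'])  # rank 1 is the best setting

print('---- GRID SEARCH RESULTS (best first) ----')
for i in order:
    mean, std = res['mean_test_score'][i], res['std_test_score'][i]
    print(f"rank {res['rank_test_score'][i]} | {mean*100:.2f} % (+/- {std*100:.2f}) | {res['params'][i]}")
```

Printing the standard deviation next to the mean helps judge whether the gap between two settings is larger than the fold-to-fold noise.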