# Exercise 10

Hyperparameter tuning/optimization: Random and Grid Search

The Pima Indians Diabetes dataset originally comes from the National Institute of Diabetes and Digestive and Kidney Diseases. The objective of the dataset is to diagnostically predict whether or not a patient has diabetes, based on certain diagnostic measurements included in the dataset. Several constraints were placed on the selection of these instances from a larger database. In particular, all patients here are females at least 21 years old of Pima Indian heritage.

The dataset consists of several medical predictor variables and one target variable, _Outcome_. The predictor variables include the number of pregnancies the patient has had, their BMI, insulin level, age, and so on.

[Pima Indians Diabetes dataset on Kaggle](https://www.kaggle.com/uciml/pima-indians-diabetes-database)

The dataset can be found in Moodle -> Dataset.

NOTE: We use a small NN, so the results will not necessarily be very good. This exercise only demonstrates the grid search and random search methods.

## Grid Search

Many code examples for different hyperparameters can be found in [this tutorial](https://machinelearningmastery.com/grid-search-hyperparameters-deep-learning-models-python-keras/).

### Use scikit-learn to grid search the batch size, epochs, dropout_rate and learning_rate

In [1]:
# Import libraries

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from sklearn.preprocessing import MinMaxScaler
# Use the tensorflow.keras namespace consistently for model, layers, wrapper and optimizer
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.wrappers.scikit_learn import KerasClassifier
from tensorflow.keras.optimizers import Adam
from scipy.stats import uniform as sp_randFloat
from scipy.stats import randint as sp_randInt

In [2]:
# read data
df = pd.read_csv('data/diabetes.csv')  # forward slash avoids the invalid '\d' escape sequence
df

,Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,DiabetesPedigreeFunction,Age,Outcome
0,6,148,72,35,0,33.6,0.627,50,1
1,1,85,66,29,0,26.6,0.351,31,0
2,8,183,64,0,0,23.3,0.672,32,1
3,1,89,66,23,94,28.1,0.167,21,0
4,0,137,40,35,168,43.1,2.288,33,1
...,...,...,...,...,...,...,...,...,...
763,10,101,76,48,180,32.9,0.171,63,0
764,2,122,70,27,0,36.8,0.340,27,0
765,5,121,72,23,112,26.2,0.245,30,0
766,1,126,60,0,0,30.1,0.349,47,1


In [3]:
df['Outcome'].value_counts()

0    500
1    268
Name: Outcome, dtype: int64
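
The classes are imbalanced, so accuracy should be read against a trivial baseline. The sketch below only does the arithmetic on the two counts printed above; the variable names are illustrative. Note that this ratio is for the whole dataset, and individual cross-validation folds may differ slightly.

```python
# Class counts copied from the value_counts() output above.
counts = {0: 500, 1: 268}

total = sum(counts.values())
# Accuracy of a model that always predicts the majority class (0).
majority_baseline = max(counts.values()) / total

print(f"samples: {total}")
print(f"majority-class accuracy: {majority_baseline:.3f}")
```

Any cross-validated accuracy close to this value means the model is doing little better than always predicting "no diabetes".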

In [4]:
df.isnull().sum()

Pregnancies                 0
Glucose                     0
BloodPressure               0
SkinThickness               0
Insulin                     0
BMI                         0
DiabetesPedigreeFunction    0
Age                         0
Outcome                     0
dtype: int64

In [5]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 768 entries, 0 to 767
Data columns (total 9 columns):
 #   Column                    Non-Null Count  Dtype  
---  ------                    --------------  -----  
 0   Pregnancies               768 non-null    int64  
 1   Glucose                   768 non-null    int64  
 2   BloodPressure             768 non-null    int64  
 3   SkinThickness             768 non-null    int64  
 4   Insulin                   768 non-null    int64  
 5   BMI                       768 non-null    float64
 6   DiabetesPedigreeFunction  768 non-null    float64
 7   Age                       768 non-null    int64  
 8   Outcome                   768 non-null    int64  
dtypes: float64(2), int64(7)
memory usage: 54.1 KB


In [6]:
df.describe()

,Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,DiabetesPedigreeFunction,Age,Outcome
count,768.0,768.0,768.0,768.0,768.0,768.0,768.0,768.0,768.0
mean,3.845052,120.894531,69.105469,20.536458,79.799479,31.992578,0.471876,33.240885,0.348958
std,3.369578,31.972618,19.355807,15.952218,115.244002,7.88416,0.331329,11.760232,0.476951
min,0.0,0.0,0.0,0.0,0.0,0.0,0.078,21.0,0.0
25%,1.0,99.0,62.0,0.0,0.0,27.3,0.24375,24.0,0.0
50%,3.0,117.0,72.0,23.0,30.5,32.0,0.3725,29.0,0.0
75%,6.0,140.25,80.0,32.0,127.25,36.6,0.62625,41.0,1.0
max,17.0,199.0,122.0,99.0,846.0,67.1,2.42,81.0,1.0
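
The `min` row shows 0.0 for Glucose, BloodPressure, SkinThickness, Insulin and BMI, even though `isnull()` reports no missing values. Such zeros are probably unrecorded measurements rather than real readings; that is an assumption, not something the notebook states. A minimal sketch of how one might count and mark them, using only the five rows printed earlier (the names `sample` and `cleaned` are illustrative):

```python
import numpy as np
import pandas as pd

# First five rows of the dataset as printed earlier in the notebook.
sample = pd.DataFrame({
    "Glucose": [148, 85, 183, 89, 137],
    "BloodPressure": [72, 66, 64, 66, 40],
    "SkinThickness": [35, 29, 0, 23, 35],
    "Insulin": [0, 0, 0, 94, 168],
    "BMI": [33.6, 26.6, 23.3, 28.1, 43.1],
})

# Assumption: a zero in these columns means "not recorded".
zero_counts = (sample == 0).sum()
print(zero_counts)

# One possible treatment: mark zeros as missing so they can be imputed later.
cleaned = sample.replace(0, np.nan)
print(cleaned.isna().sum())
```

Whether to impute, drop, or keep these values is a modelling choice; the rest of this exercise uses the data as-is.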


In [7]:
# Dataset splitting

X = df.drop('Outcome',axis=1).values
y = df['Outcome'].values

In [8]:
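scaler = MinMaxScaler()
# NOTE: fit_transform returns a new array and the result is not assigned here,
# so X stays unscaled (see the X_train output below). To actually scale the
# features, assign the result: X = scaler.fit_transform(X)
scaler.fit_transform(X)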

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

In [9]:
X_train

array([[  2.   ,  84.   ,   0.   , ...,   0.   ,   0.304,  21.   ],
       [  9.   , 112.   ,  82.   , ...,  28.2  ,   1.282,  50.   ],
       [  1.   , 139.   ,  46.   , ...,  28.7  ,   0.654,  22.   ],
       ...,
       [ 10.   , 101.   ,  86.   , ...,  45.6  ,   1.136,  38.   ],
       [  0.   , 141.   ,   0.   , ...,  42.4  ,   0.205,  29.   ],
       [  0.   , 125.   ,  96.   , ...,  22.5  ,   0.262,  21.   ]])
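
The `X_train` values above are still in their original units, because the scaler output was never stored. A common pattern, sketched here on synthetic stand-in data (an assumption, not the real dataset), is to split first, fit the scaler on the training rows only, and reuse it on the test rows so that no test-set statistics leak into training. The `_demo` names are illustrative.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for the 8 feature columns (assumption, not the real data).
rng = np.random.default_rng(42)
X_demo = rng.uniform(0, 200, size=(100, 8))
y_demo = rng.integers(0, 2, size=100)

X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.2, random_state=42)

scaler_demo = MinMaxScaler()
X_tr_scaled = scaler_demo.fit_transform(X_tr)  # learn min/max from training rows only
X_te_scaled = scaler_demo.transform(X_te)      # reuse those statistics on the test rows

print(X_tr_scaled.min(), X_tr_scaled.max())
```

Scaled test values can fall slightly outside [0, 1], which is expected when the test rows contain values beyond the training range.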

In [10]:
# Define the grid search parameters as follows:
# learning_rate:[0.001, 0.01, 0.1]
# batch_size: [10, 40 , 80]
# number of epochs: [10, 50, 100]
# dropout_rate:  [0.0, 0.3, 0.5]

learning_rate = [0.001, 0.01, 0.1]
batch_size = [10, 40, 80]
epochs = [10, 50, 100]
dropout_rate =  [0.0, 0.3, 0.5]

# NOTE: If you don't have enough memory in your computer, drop some parameter values

In [11]:
#Create model
    #Input layer = 8
    #Two hidden layers = 12 Dense nodes each, activation function = relu
    #Dropout layer
    #Output layer: 1 node, activation = sigmoid

def create_model(dropout_rate, learning_rate):
    model = Sequential()
    model.add(Dense(12, input_shape = (8,), activation = 'relu'))
    model.add(Dense(12, activation = 'relu'))
    model.add(Dropout(dropout_rate))
    model.add(Dense(1, activation = 'sigmoid'))
    
    opt = Adam(learning_rate = learning_rate)
    model.compile(loss = 'binary_crossentropy', optimizer = opt, metrics =['accuracy'])
    return model

# fix random seed for reproducibility
#seed = 7
#np.random.seed(seed)

#Define KerasClassifier
model = KerasClassifier(build_fn = create_model, verbose = 0)


In [12]:
#Set parameter dictionary
param_grid = dict(batch_size=batch_size, epochs=epochs, dropout_rate = dropout_rate, learning_rate = learning_rate)

#Define grid = GridSearchCV(...)
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1, cv=3,verbose = 1)


In [13]:
#Results
grid_result = grid.fit(X_train, y_train)

Fitting 3 folds for each of 81 candidates, totalling 243 fits
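
The numbers in this message follow directly from the grid: four hyperparameters with three values each give 3^4 = 81 combinations, and each is trained once per fold. A small sketch using scikit-learn's `ParameterGrid` to count them (the `_demo` name is illustrative; the values are the ones defined above):

```python
from sklearn.model_selection import ParameterGrid

param_grid_demo = {
    "learning_rate": [0.001, 0.01, 0.1],
    "batch_size": [10, 40, 80],
    "epochs": [10, 50, 100],
    "dropout_rate": [0.0, 0.3, 0.5],
}
cv_folds = 3

# ParameterGrid enumerates every combination of the listed values.
n_candidates = len(ParameterGrid(param_grid_demo))
n_fits = n_candidates * cv_folds
print(n_candidates, n_fits)  # 81 243
```

Adding one more value to any single hyperparameter would raise the count to 108 candidates, which is why grid search gets expensive quickly.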


In [14]:
# Summarize results
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print("%f (%f) with: %r" % (mean, stdev, param))



Best: 0.713383 using {'batch_size': 40, 'dropout_rate': 0.0, 'epochs': 50, 'learning_rate': 0.01}
0.649872 (0.022117) with: {'batch_size': 10, 'dropout_rate': 0.0, 'epochs': 10, 'learning_rate': 0.001}
0.659692 (0.036127) with: {'batch_size': 10, 'dropout_rate': 0.0, 'epochs': 10, 'learning_rate': 0.01}
0.648199 (0.014580) with: {'batch_size': 10, 'dropout_rate': 0.0, 'epochs': 10, 'learning_rate': 0.1}
0.695449 (0.005582) with: {'batch_size': 10, 'dropout_rate': 0.0, 'epochs': 50, 'learning_rate': 0.001}
0.706807 (0.018803) with: {'batch_size': 10, 'dropout_rate': 0.0, 'epochs': 50, 'learning_rate': 0.01}
0.653077 (0.021359) with: {'batch_size': 10, 'dropout_rate': 0.0, 'epochs': 50, 'learning_rate': 0.1}
0.688945 (0.044631) with: {'batch_size': 10, 'dropout_rate': 0.0, 'epochs': 100, 'learning_rate': 0.001}
0.701857 (0.048509) with: {'batch_size': 10, 'dropout_rate': 0.0, 'epochs': 100, 'learning_rate': 0.01}
0.653077 (0.021359) with: {'batch_size': 10, 'dropout_rate': 0.0, 'epochs':
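
Printing all 81 rows is hard to scan. As an alternative, `cv_results_` can be loaded into a DataFrame and sorted by rank. The sketch below uses a scikit-learn `LogisticRegression` on synthetic data as a stand-in (both are assumptions, chosen so that it runs without Keras); the same two DataFrame lines should apply to `grid_result.cv_results_`.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Stand-in data and estimator (assumptions) so the sketch runs without Keras.
X_demo, y_demo = make_classification(n_samples=200, n_features=8, random_state=0)
search = GridSearchCV(LogisticRegression(max_iter=1000), {"C": [0.01, 0.1, 1.0, 10.0]}, cv=3)
search.fit(X_demo, y_demo)

# One row per candidate; rank_test_score == 1 marks the best mean score.
results = pd.DataFrame(search.cv_results_)
top = results.sort_values("rank_test_score")[["params", "mean_test_score", "std_test_score"]]
print(top.head(3))
```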

## Random Search

In [15]:
def create_model_random2(dropout_rate, learning_rate):
    model = Sequential()
    model.add(Dense(8, input_dim = 8, activation = 'relu'))
    model.add(Dense(12, activation = 'relu'))  # input size is inferred from the previous layer
    model.add(Dropout(dropout_rate))
    model.add(Dense(1, activation = 'sigmoid'))
    
    opt = Adam(learning_rate = learning_rate)
    model.compile(loss = 'binary_crossentropy', optimizer = opt, metrics =['accuracy'])
    return model


#Define KerasClassifier
model_random2 = KerasClassifier(build_fn = create_model_random2, verbose = 0)

In [16]:
parameters_dict = dict(
    learning_rate = learning_rate,  # sp_randFloat(0.001, 0.1), # RuntimeError if any of these distributions is used instead
    batch_size = batch_size,        # sp_randInt(10, 80),
    epochs = epochs,                # sp_randInt(10, 100),
    dropout_rate = dropout_rate     # sp_randFloat(0.0, 0.5)
)

#Define random_search = RandomizedSearchCV(...)

n_iter_search = 16 # Number of parameter settings that are sampled.
random_search2 = RandomizedSearchCV(estimator = model_random2, 
                                   param_distributions = parameters_dict,
                                   n_iter = n_iter_search,
                                   n_jobs = 1, # Number of jobs to run in parallel. Default none, meaning in most cases 1.
                                   cv = 3 , # default 5-fold cross validation generator
                                   verbose = 10)

In [17]:
random_search2.fit(X_train, y_train)

Fitting 3 folds for each of 16 candidates, totalling 48 fits
[CV 1/3; 1/16] START batch_size=40, dropout_rate=0.5, epochs=50, learning_rate=0.01
[CV 1/3; 1/16] END batch_size=40, dropout_rate=0.5, epochs=50, learning_rate=0.01;, score=0.698 total time=   0.9s
[CV 2/3; 1/16] START batch_size=40, dropout_rate=0.5, epochs=50, learning_rate=0.01
[CV 2/3; 1/16] END batch_size=40, dropout_rate=0.5, epochs=50, learning_rate=0.01;, score=0.634 total time=   0.8s
[CV 3/3; 1/16] START batch_size=40, dropout_rate=0.5, epochs=50, learning_rate=0.01
[CV 3/3; 1/16] END batch_size=40, dropout_rate=0.5, epochs=50, learning_rate=0.01;, score=0.647 total time=   0.8s
[CV 1/3; 2/16] START batch_size=40, dropout_rate=0.3, epochs=100, learning_rate=0.001
[CV 1/3; 2/16] END batch_size=40, dropout_rate=0.3, epochs=100, learning_rate=0.001;, score=0.683 total time=   1.4s
[CV 2/3; 2/16] START batch_size=40, dropout_rate=0.3, epochs=100, learning_rate=0.001
[CV 2/3; 2/16] END batch_size=40, dropout_rate=0.3, e

[CV 1/3; 11/16] END batch_size=10, dropout_rate=0.0, epochs=50, learning_rate=0.1;, score=0.683 total time=   1.6s
[CV 2/3; 11/16] START batch_size=10, dropout_rate=0.0, epochs=50, learning_rate=0.1
[CV 2/3; 11/16] END batch_size=10, dropout_rate=0.0, epochs=50, learning_rate=0.1;, score=0.634 total time=   1.7s
[CV 3/3; 11/16] START batch_size=10, dropout_rate=0.0, epochs=50, learning_rate=0.1
[CV 3/3; 11/16] END batch_size=10, dropout_rate=0.0, epochs=50, learning_rate=0.1;, score=0.642 total time=   1.6s
[CV 1/3; 12/16] START batch_size=40, dropout_rate=0.3, epochs=100, learning_rate=0.01
[CV 1/3; 12/16] END batch_size=40, dropout_rate=0.3, epochs=100, learning_rate=0.01;, score=0.678 total time=   1.1s
[CV 2/3; 12/16] START batch_size=40, dropout_rate=0.3, epochs=100, learning_rate=0.01
[CV 2/3; 12/16] END batch_size=40, dropout_rate=0.3, epochs=100, learning_rate=0.01;, score=0.634 total time=   1.1s
[CV 3/3; 12/16] START batch_size=40, dropout_rate=0.3, epochs=100, learning_rate=

RandomizedSearchCV(cv=3,
                   estimator=<tensorflow.python.keras.wrappers.scikit_learn.KerasClassifier object at 0x000002761F4353D0>,
                   n_iter=16, n_jobs=1,
                   param_distributions={'batch_size': [10, 40, 80],
                                        'dropout_rate': [0.0, 0.3, 0.5],
                                        'epochs': [10, 50, 100],
                                        'learning_rate': [0.001, 0.01, 0.1]},
                   verbose=10)

In [18]:
#Summarize results
print("Best: %f using %s" % (random_search2.best_score_, random_search2.best_params_))
means = random_search2.cv_results_['mean_test_score']
stds = random_search2.cv_results_['std_test_score']
params = random_search2.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print("%f (%f) with: %r" % (mean, stdev, param))

Best: 0.716659 using {'learning_rate': 0.01, 'epochs': 100, 'dropout_rate': 0.3, 'batch_size': 10}
0.659589 (0.027363) with: {'learning_rate': 0.01, 'epochs': 50, 'dropout_rate': 0.5, 'batch_size': 40}
0.664491 (0.014073) with: {'learning_rate': 0.001, 'epochs': 100, 'dropout_rate': 0.3, 'batch_size': 40}
0.654719 (0.041866) with: {'learning_rate': 0.01, 'epochs': 50, 'dropout_rate': 0.0, 'batch_size': 80}
0.671035 (0.013401) with: {'learning_rate': 0.001, 'epochs': 100, 'dropout_rate': 0.0, 'batch_size': 40}
0.716659 (0.041102) with: {'learning_rate': 0.01, 'epochs': 100, 'dropout_rate': 0.3, 'batch_size': 10}
0.633549 (0.015954) with: {'learning_rate': 0.001, 'epochs': 10, 'dropout_rate': 0.3, 'batch_size': 10}
0.659613 (0.019973) with: {'learning_rate': 0.1, 'epochs': 50, 'dropout_rate': 0.0, 'batch_size': 80}
0.653077 (0.021359) with: {'learning_rate': 0.1, 'epochs': 100, 'dropout_rate': 0.0, 'batch_size': 40}
0.653077 (0.021359) with: {'learning_rate': 0.1, 'epochs': 10, 'dropout_

# Conclusion

GridSearchCV:
Best: 0.713383 using {'batch_size': 40, 'dropout_rate': 0.0, 'epochs': 50, 'learning_rate': 0.01}
RandomizedSearchCV:
Best: 0.716659 using {'learning_rate': 0.01, 'epochs': 100, 'dropout_rate': 0.3, 'batch_size': 10}

If I try to sample the parameters from random distributions in RandomizedSearchCV, the fit ends with a "cannot clone object" RuntimeError, which seems to be a known issue in scikit-learn versions after 0.21.2. So I ended up using fixed lists of values, and I am very interested in getting help with applying real randomness to the random search :)
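
A commonly reported cause of this error (an assumption here, not verified against this environment) is that the old `KerasClassifier` wrapper copies its parameters, and scikit-learn's clone check then rejects the NumPy scalar values that scipy distributions produce. One possible workaround is to draw the random values up front and pass them as plain Python lists, which is the form that already works in this notebook. The sketch below shows only the sampling step; `n_values` and `parameters_presampled` are hypothetical names.

```python
from scipy.stats import randint, uniform

n_values = 20  # hypothetical number of candidate values per hyperparameter

# uniform(loc, scale) covers [loc, loc + scale]; randint(low, high) excludes high.
lr_values = uniform(0.001, 0.099).rvs(size=n_values, random_state=0).tolist()
dropout_values = uniform(0.0, 0.5).rvs(size=n_values, random_state=1).tolist()
batch_values = randint(10, 81).rvs(size=n_values, random_state=2).tolist()
epoch_values = randint(10, 101).rvs(size=n_values, random_state=3).tolist()

# .tolist() converts NumPy scalars to built-in float/int objects.
parameters_presampled = dict(
    learning_rate=lr_values,
    batch_size=batch_values,
    epochs=epoch_values,
    dropout_rate=dropout_values,
)

print(type(lr_values[0]), type(batch_values[0]))
```

Passing `parameters_presampled` as `param_distributions` would then let RandomizedSearchCV pick combinations from randomly generated values instead of the three hand-picked ones. Note also that `sp_randFloat(0.001, 0.1)` as written samples from [0.001, 0.101], because the second argument of `uniform` is the width of the interval, not its upper bound.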

This application of random search seemed to be a bit faster than grid search, which is expected because it ran 48 fits instead of 243. Both are a lot handier for hyperparameter tuning than guesswork.
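
Two things are missing from the comparison above: the random search samples from fixed lists rather than continuous distributions, and the held-out `X_test` split is never scored. The sketch below shows both steps with a scikit-learn `LogisticRegression` on synthetic data as a stand-in (both are assumptions, chosen so that it runs without Keras); native scikit-learn estimators accept scipy distributions directly.

```python
from scipy.stats import loguniform
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV, train_test_split

# Stand-in data and estimator (assumptions) so the sketch runs without Keras.
X_demo, y_demo = make_classification(n_samples=300, n_features=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.2, random_state=42)

search = RandomizedSearchCV(
    LogisticRegression(max_iter=1000),
    param_distributions={"C": loguniform(1e-3, 1e2)},  # sampled anew for each candidate
    n_iter=16,
    cv=3,
    random_state=0,
)
search.fit(X_tr, y_tr)

cv_accuracy = search.best_score_          # mean accuracy over validation folds
test_accuracy = search.score(X_te, y_te)  # accuracy of the refit best model on unseen rows
print(f"CV: {cv_accuracy:.3f}  test: {test_accuracy:.3f}")
```

The cross-validated score is the one used to choose hyperparameters, so the test-set score is the less optimistic estimate of how the chosen model generalizes.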



## Random Search - earlier attempt

In [19]:
# #Create model
# #Same as in Grid search example 

# def create_model_random(dropout_rate, learning_rate):
#     model = Sequential()
#     model.add(Dense(8, input_shape = (8,), activation = 'relu'))
#     model.add(Dense(12, activation = 'relu'))
#     model.add(Dropout(dropout_rate))
#     model.add(Dense(1, activation = 'sigmoid'))
    
#     opt = Adam(learning_rate = learning_rate)
#     model.compile(loss = 'binary_crossentropy', optimizer = opt, metrics =['accuracy'])
#     return model


# #Define KerasClassifier
# model_random = KerasClassifier(build_fn = create_model_random, verbose = 0)


In [20]:
# # define the Random search parameters
# # randomly from same range as the Grid search

# parameters = {
#     'learning_rate': sp_randFloat(0.001, 0.1), # uniform
#     'batch_size': sp_randInt(10, 80), # randint
#     'epochs': sp_randInt(10, 100), # randint
#     'dropout_rate': sp_randFloat(0.0, 0.5) # uniform
# }


# #Define random_search = RandomizedSearchCV(...)

# n_iter_search = 16 # Number of parameter settings that are sampled.
# random_search = RandomizedSearchCV(estimator = model_random, 
#                                    param_distributions = parameters,
#                                    n_iter = n_iter_search,
#                                    #n_jobs = 1, # Number of jobs to run in parallel. Default none, meaning in most cases 1.
#                                    #cv = 2 , # default 5-fold cross validation generator
#                                    verbose = 0)

In [21]:
# #Results
# random_search.fit(X_train, y_train)

In [22]:
# #Summarize results
# print("Best: %f using %s" % (random_search.best_score_, random_search.best_params_))
# means = random_search.cv_results_['mean_test_score']
# stds = random_search.cv_results_['std_test_score']
# params = random_search.cv_results_['params']
# for mean, stdev, param in zip(means, stds, params):
#     print("%f (%f) with: %r" % (mean, stdev, param))

