## Pima Indians Onset of Diabetes Dataset - Keras + ScikitLearn

### Evaluate Deep learning models with Cross validation

10 fold cross validation on deeper networks can add to higher computational requirements so it is important to make a conscious decision about using it.

In this example StratifiedKFold function from scikit-learn is used. <br>
scikit-learn funciton cross_val_score() has been used to evaluate the model using cross validation scheme and print results.


In [2]:
# MLP for Pima Indians Dataset with 10-fold cross validation via sklearn
from keras.models import Sequential
from keras.layers import Dense
from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.cross_validation import StratifiedKFold
from sklearn.cross_validation import cross_val_score
import numpy
import pandas


In [3]:
# Function to create model, required for KerasClassifier
def create_model():
    # create model
    model = Sequential()
    model.add(Dense(12, input_dim=8, kernel_initializer='uniform', activation='relu')) 
    model.add(Dense(8, kernel_initializer='uniform', activation='relu')) 
    model.add(Dense(1, kernel_initializer='uniform', activation='sigmoid'))
      # Compile model
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy']) 
    return model


In [4]:
# fix random seed for reproducibility
seed = 7
numpy.random.seed(seed)


In [6]:
# load pima indians dataset
path = '/Users/Deepthi/Documents/Self Study/DeepLearning/deep_learning_with_python/deep_learning_with_python_code/data/'
dataset = numpy.loadtxt(path+"pima-indians-diabetes.csv", delimiter=",")

In [7]:
# split into input (X) and output (Y) variables
X = dataset[:,0:8]
Y = dataset[:,8]


In [8]:
# create model
model = KerasClassifier(build_fn=create_model, nb_epoch=150, batch_size=10)


In [9]:
# evaluate using 10-fold cross validation
kfold = StratifiedKFold(y=Y, n_folds=10, shuffle=True, random_state=seed)
results = cross_val_score(model, X, Y, cv=kfold)
print(results.mean())

Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
0.651059473549


### Grid Search Deep Learning Model Parameters

Using GridSearchCV, the network can be optimized by searching for:

1. Optimizers for searching different weight values.
2. Initializers for preparing the network weights using different schemes.
3. Number of epochs for training the model for different number of exposures to the training dataset.
4. Batches for varying the number of samples before weight updates.

In [11]:
# MLP for Pima Indians Dataset with grid search via sklearn
from keras.models import Sequential
from keras.layers import Dense
from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import GridSearchCV
import numpy
import pandas


In [12]:
# Function to create model, required for KerasClassifier
def create_model(optimizer='rmsprop', init='glorot_uniform'):
    # create model
    model = Sequential()
    model.add(Dense(12, input_dim=8, kernel_initializer=init, activation='relu')) 
    model.add(Dense(8, kernel_initializer=init, activation='relu')) 
    model.add(Dense(1, kernel_initializer=init, activation='sigmoid'))
      # Compile model
    model.compile(loss='binary_crossentropy', optimizer=optimizer, metrics=['accuracy']) 
    return model


In [13]:
# fix random seed for reproducibility
seed = 7
numpy.random.seed(seed)


In [15]:
# load pima indians dataset
path = '/Users/Deepthi/Documents/Self Study/DeepLearning/deep_learning_with_python/deep_learning_with_python_code/data/'
dataset = numpy.loadtxt(path+"pima-indians-diabetes.csv", delimiter=",")


In [16]:
# split into input (X) and output (Y) variables
X = dataset[:,0:8]
Y = dataset[:,8]


In [17]:
# create model
model = KerasClassifier(build_fn=create_model)


In [19]:
# grid search epochs, batch size and optimizer
optimizers = ['rmsprop', 'adam']
init = ['glorot_uniform', 'normal', 'uniform']
epochs = numpy.array([50, 100, 150])
batches = numpy.array([5, 10, 20])
param_grid = dict(optimizer=optimizers, nb_epoch=epochs, batch_size=batches, init=init) 
grid = GridSearchCV(estimator=model, param_grid=param_grid)
grid_result = grid.fit(X, Y)


Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1


Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1


Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1


Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1


Epoch 1/1
Epoch 1/1
Epoch 1/1
Epoch 1/1


In [59]:
# summarize results
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))

params = grid_result.cv_results_.get('params')
mean_test_score = grid_result.cv_results_.get('mean_test_score')
std_test_score = grid_result.cv_results_.get('std_test_score')

for i in range(len(params)):
    print mean_test_score[i], std_test_score[i], params[i]

Best: 0.653646 using {'init': 'normal', 'optimizer': 'adam', 'nb_epoch': 100, 'batch_size': 5}
0.61067709676 0.0504630788379 {'init': 'glorot_uniform', 'optimizer': 'rmsprop', 'nb_epoch': 50, 'batch_size': 5}
0.634114596915 0.029634913092 {'init': 'glorot_uniform', 'optimizer': 'adam', 'nb_epoch': 50, 'batch_size': 5}
0.585937512767 0.0469833809249 {'init': 'glorot_uniform', 'optimizer': 'rmsprop', 'nb_epoch': 100, 'batch_size': 5}
0.632812512592 0.0261066897314 {'init': 'glorot_uniform', 'optimizer': 'adam', 'nb_epoch': 100, 'batch_size': 5}
0.35156250813 0.0240797424855 {'init': 'glorot_uniform', 'optimizer': 'rmsprop', 'nb_epoch': 150, 'batch_size': 5}
0.54947917891 0.0478770227214 {'init': 'glorot_uniform', 'optimizer': 'adam', 'nb_epoch': 150, 'batch_size': 5}
0.651041679434 0.0247738254628 {'init': 'normal', 'optimizer': 'rmsprop', 'nb_epoch': 50, 'batch_size': 5}
0.651041679434 0.0247738254628 {'init': 'normal', 'optimizer': 'adam', 'nb_epoch': 50, 'batch_size': 5}
0.65234376268

In [None]:
# [mydict[x] for x in mykeys]