# Keras DL Exploration

I am exploring building a deep learning model using the Keras framework in order to solve the Pima Indians Diabetes Database Challenge (https://www.kaggle.com/datasets/uciml/pima-indians-diabetes-database). 

This dataset was taken directly from the Kaggle website, which has taken the data from the National Insititute of Diabetes and Digestive and Kidney Diseases. The data was collected for a collection of female patients born of Pima Indian hertiage, aged 21 years or older, in order to predict whether certain physical health measures could act as predictors for whether or not the patient has diabetes.

This notebook is part 2 of 2. In this part (2) I explore model tuning and optimization of the base model which I constructed in part 1 using a 5-fold cross-validation grid search.

In [1]:
#start by importing the necessary packages
from numpy import loadtxt
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.callbacks import TensorBoard
import tensorflow.random
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split, GridSearchCV
# !pip install scikeras
from scikeras.wrappers import KerasClassifier
from tensorflow.keras.constraints import MaxNorm

## Import and pre-process the data

Following the same steps as was done in notebook 1, import the data to a pandas dataframe, scale the features and split into training and testing sets.

In [2]:
#read in the data
df = pd.read_csv('diabetes.csv')

#identify the feature matrix and target variable
X = df.drop('Outcome', axis=1)
y = df.Outcome

#standardize the features
scaler = StandardScaler()
X = pd.DataFrame(scaler.fit_transform(X), columns=X.columns)

#split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

## Define the Keras model- model optimization/tuning

### Epochs and Batch Size

Tuning the number of training epochs and the batch size- let's use accuracy as the scoring metric.

Note: in order to use a Keras model in scikit-learn (gridsearchCV) I have to use a SciKeras wrapper (ie KerasClassifier() in the context of this problem).

In [9]:
#starting with the basic network from the previous notebook- remember to use scikeras

def create_model():
    model = Sequential()
    model.add(Dense(12, input_shape=(8,), activation='relu'))
    model.add(Dense(8, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model

model = KerasClassifier(model = create_model, verbose = 0)

#use a random seed for reproducibility
seed = 10
tensorflow.random.set_seed(seed)

#now define the parameter grid (epochs and batch sizes to explore)
batch_sizes = [5, 10, 20 ,40, 60, 80, 100]
epochs = [50, 100, 150]
param_grid = dict(batch_size = batch_sizes,
                 epochs = epochs)

#define and fit the grid-search object
grid = GridSearchCV(estimator = model, 
                   param_grid = param_grid,
                   n_jobs = -1, 
                   cv = 5,
                   scoring = 'accuracy')
grid_result = grid.fit(X_train, y_train)

#summarize results
print(grid_result.best_score_, grid_result.best_params_)

0.7606024256963881 {'batch_size': 40, 'epochs': 50}


### Number of Neurons in hidden layers


In [17]:
def create_model(neurons):
    model = Sequential()
    model.add(Dense(neurons, input_shape=(8,), activation='relu'))
    model.add(Dense(neurons, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model

model = KerasClassifier(model = create_model, batch_size=40, epochs= 50, verbose = 0)

neurons = [2,6,8,10,16,20, 24,30]
param_grid=dict(model__neurons=neurons)

grid = GridSearchCV(estimator = model, 
                   param_grid = param_grid, 
                   n_jobs = -1, 
                   cv = 5, 
                   scoring = 'accuracy')
grid_results=grid.fit(X_train, y_train)

#summarize results
print(grid_results.best_score_, grid_results.best_params_)

0.742702918832467 {'model__neurons': 16}


### Model optimization algorithm

Tune for the best optimizer out of adam, adamax, adagrad, rmsprop, and sgd. This time, alter the create_model function so that it does not yet compile the model- since we are tuning the optimization algorithm and this is done in the model compilation step, we will leave this to the KerasClassifier wrapper which will compile the model and set the optimizer.

In [18]:
def create_model():
    model = Sequential()
    model.add(Dense(16, input_shape=(8,), activation='relu'))
    model.add(Dense(16, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    return model

model = KerasClassifier(model = create_model,
                        batch_size = 40, #make sure to set the optimized batchsize/epochs
                        epochs = 50, 
                        loss = 'binary_crossentropy',
                        verbose = 0)

#use a random seed for reproducibility
seed = 10
tensorflow.random.set_seed(seed)

#now define the parameter grid (optimizers to explore)
optimizers = ['adam', 'adamax', 'adagrad', 'rmsprop', 'sgd']
param_grid = dict(optimizer = optimizers)

#define and fit the grid-search object
grid = GridSearchCV(estimator = model, 
                   param_grid = param_grid,
                   n_jobs = -1, 
                   cv = 5,
                   scoring = 'accuracy')
grid_result = grid.fit(X_train, y_train)

#summarize results
print(grid_result.best_score_, grid_result.best_params_)

0.7557377049180328 {'optimizer': 'adam'}


### Network Weight Initialization

Tune for the initialization method of weights in the network. Try out uniform, normal, zero, glorot normal and glorot uniform initializers.

In [20]:
def create_model(init_mode = 'uniform'): #have to pass the wrapper a default init mode
    model = Sequential()
    model.add(Dense(16, input_shape=(8,), kernel_initializer = init_mode, activation='relu'))
    model.add(Dense(16, kernel_initializer = init_mode, activation='relu'))
    model.add(Dense(1, kernel_initializer = init_mode, activation='sigmoid'))
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model


model = KerasClassifier(model = create_model,
                        batch_size = 40, #make sure to set the optimized batchsize/epochs
                        epochs = 50, 
                        verbose = 0)

#use a random seed for reproducibility
seed = 10
tensorflow.random.set_seed(seed)

#define the parameter grid of initialization methods
init_modes = ['uniform', 'zero', 'normal', 'glorot_uniform', 'glorot_normal']
param_grid = dict(model__init_mode=init_modes) #ask the wrapper to route the parameter to the creat_model function

#define and fit the gridsearch object
grid = GridSearchCV(estimator = model, 
                   param_grid = param_grid,
                   n_jobs = -1,
                   cv = 5,
                    scoring = 'accuracy')
grid_result = grid.fit(X_train, y_train)

#summarize results
print(grid_result.best_score_, grid_result.best_params_)


0.7508463281354125 {'model__init_mode': 'uniform'}


### Tune the activation function in the hidden layers

Explore relu, softmax, tanh, sigmoid, hard_sigmoid, and linear activation functions in the hidden layers.



In [21]:
def create_model(activation = 'relu'): #pass the wrapper relu to start
    model = Sequential()
    model.add(Dense(16, input_shape=(8,), kernel_initializer = 'uniform', activation=activation))
    model.add(Dense(16, kernel_initializer = 'uniform', activation=activation))
    model.add(Dense(1, kernel_initializer = 'uniform', activation=activation))
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model


model = KerasClassifier(model = create_model,
                        batch_size = 40, #make sure to set the optimized batchsize/epochs
                        epochs = 50, 
                        verbose = 0)

#use a random seed for reproducibility
seed = 10
tensorflow.random.set_seed(seed)

activations=['relu', 'softmax', 'tanh', 'sigmoid', 'hard_sigmoid', 'linear']
param_grid = dict(model__activation=activations)

#define and fit the gridsearch object
grid = GridSearchCV(estimator = model, 
                   param_grid = param_grid, 
                   n_jobs = -1, 
                   cv = 5,
                   scoring = 'accuracy')
grid_result = grid.fit(X_train, y_train)

#summarize results
print(grid_result.best_score_, grid_result.best_params_)


0.765480474476876 {'model__activation': 'relu'}


### Dropout Regularization

Let's explore the effects of adding regularization to the model (specifically, dropout regularization).

In [3]:
def create_model(dropout_rate, weight_constraint): #pass the wrapper relu to start
    model = Sequential()
    model.add(Dense(16, input_shape=(8,), kernel_initializer = 'uniform', activation='relu', kernel_constraint=MaxNorm(weight_constraint)))
    model.add(Dropout(dropout_rate))
    model.add(Dense(16, kernel_initializer = 'uniform', activation='relu', kernel_constraint=MaxNorm(weight_constraint)))
#     model.add(Dropout(dropout_rate))
    model.add(Dense(1, kernel_initializer = 'uniform', activation='sigmoid'))
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model


model = KerasClassifier(model = create_model,
                        batch_size = 40, #make sure to set the optimized batchsize/epochs
                        epochs = 50, 
                        verbose = 0)

weight_constraints = [1.0, 2.0, 3.0, 4.0, 5.0]
dropout_rates = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]

param_grid = dict(model__dropout_rate=dropout_rates, model__weight_constraint=weight_constraints)

#define and fit the gridsearch object
grid = GridSearchCV(estimator = model, 
                   param_grid = param_grid, 
                   n_jobs = -1, 
                   cv = 5,
                   scoring = 'accuracy')
grid_results=grid.fit(X_train, y_train)

#summarize results
print(grid_results.best_score_, grid_results.best_params_)

0.7801279488204719 {'model__dropout_rate': 0.7, 'model__weight_constraint': 3.0}


### The 'best' model

Now let's try and fit the model with all these optimized hyperparameters and find out whether it improves the performance of the original model (which had achieved a testing accuracy of about 76%).

In [22]:
model = Sequential()
model.add(Dense(16, input_shape=(8,), kernel_initializer = 'uniform', activation='relu', kernel_constraint=MaxNorm(3.0)))
# model.add(Dropout(0.7))
model.add(Dense(16, kernel_initializer = 'uniform', activation='relu', kernel_constraint=MaxNorm(3.0)))
model.add(Dense(1, kernel_initializer = 'uniform', activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

In [23]:
#create a TensorBoard logger object to log training of the model
logger = TensorBoard(log_dir='logs',
                    write_graph = True, 
                    histogram_freq = 5)

In [24]:
#train the model
model.fit(X_train, y_train, epochs=150, batch_size=30, callbacks=[logger])

Epoch 1/150
Epoch 2/150
Epoch 3/150
Epoch 4/150
Epoch 5/150
Epoch 6/150
Epoch 7/150
Epoch 8/150
Epoch 9/150
Epoch 10/150
Epoch 11/150
Epoch 12/150
Epoch 13/150
Epoch 14/150
Epoch 15/150
Epoch 16/150
Epoch 17/150
Epoch 18/150
Epoch 19/150
Epoch 20/150
Epoch 21/150
Epoch 22/150
Epoch 23/150
Epoch 24/150
Epoch 25/150
Epoch 26/150
Epoch 27/150
Epoch 28/150
Epoch 29/150
Epoch 30/150
Epoch 31/150
Epoch 32/150
Epoch 33/150
Epoch 34/150
Epoch 35/150
Epoch 36/150
Epoch 37/150
Epoch 38/150
Epoch 39/150
Epoch 40/150
Epoch 41/150
Epoch 42/150
Epoch 43/150
Epoch 44/150
Epoch 45/150
Epoch 46/150
Epoch 47/150
Epoch 48/150
Epoch 49/150
Epoch 50/150
Epoch 51/150
Epoch 52/150
Epoch 53/150
Epoch 54/150
Epoch 55/150
Epoch 56/150
Epoch 57/150
Epoch 58/150
Epoch 59/150
Epoch 60/150
Epoch 61/150
Epoch 62/150
Epoch 63/150
Epoch 64/150
Epoch 65/150
Epoch 66/150
Epoch 67/150
Epoch 68/150
Epoch 69/150
Epoch 70/150
Epoch 71/150
Epoch 72/150
Epoch 73/150
Epoch 74/150
Epoch 75/150
Epoch 76/150
Epoch 77/150
Epoch 78

Epoch 84/150
Epoch 85/150
Epoch 86/150
Epoch 87/150
Epoch 88/150
Epoch 89/150
Epoch 90/150
Epoch 91/150
Epoch 92/150
Epoch 93/150
Epoch 94/150
Epoch 95/150
Epoch 96/150
Epoch 97/150
Epoch 98/150
Epoch 99/150
Epoch 100/150
Epoch 101/150
Epoch 102/150
Epoch 103/150
Epoch 104/150
Epoch 105/150
Epoch 106/150
Epoch 107/150
Epoch 108/150
Epoch 109/150
Epoch 110/150
Epoch 111/150
Epoch 112/150
Epoch 113/150
Epoch 114/150
Epoch 115/150
Epoch 116/150
Epoch 117/150
Epoch 118/150
Epoch 119/150
Epoch 120/150
Epoch 121/150
Epoch 122/150
Epoch 123/150
Epoch 124/150
Epoch 125/150
Epoch 126/150
Epoch 127/150
Epoch 128/150
Epoch 129/150
Epoch 130/150
Epoch 131/150
Epoch 132/150
Epoch 133/150
Epoch 134/150
Epoch 135/150
Epoch 136/150
Epoch 137/150
Epoch 138/150
Epoch 139/150
Epoch 140/150
Epoch 141/150
Epoch 142/150
Epoch 143/150
Epoch 144/150
Epoch 145/150
Epoch 146/150
Epoch 147/150
Epoch 148/150
Epoch 149/150
Epoch 150/150


<keras.callbacks.History at 0x1780aec40>

In [25]:
#now evaluate the model performance on the test set!!
loss, accuracy = model.evaluate(X_test, y_test)



## Conclusions:
I was able to improve the model performance by a small amount (~1.3%) by using the hyperparameter values I found in the above cross-validation gridsearches. Though this is a small amount, I was able to experiement with model optimization and tuning using the cross-valudation grid-search framework from sklearn, which was ultimately the goal of this exercise, after all. Perhaps next steps in further improving the model performance would be to refine the grids I used to tune the hyper parameters, or to explore the effects of further simplification of the model (also, perhaps exploring the number of hidden layers in the network, as well as the size of each) may prove successful in improving the model performance significantly.