# Hyperparameter optimization using Keras and the scikit-learn API

When using Keras, we can run a grid search over hyperparameters with scikit-learn's `GridSearchCV` by wrapping the model in the scikit-learn-compatible `KerasRegressor` class.

We will demonstrate this using the original regression example, linear regression on the Boston housing data.

#### Important hyperparameters for training

- optimization algorithm
- learning rate
- dropout
- regularization
- batch size
- number of training epochs

As examples of grid search, we will explore varying optimizers, number of epochs, learning rate, and regularization.
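Conceptually, a grid search evaluates every combination of the candidate values and keeps the best one. The following is a minimal pure-Python sketch of that loop, not scikit-learn's implementation. `toy_loss` is a hypothetical stand-in for "train a model and return its validation loss", and we assume lower is better, as it is for a loss.

```python
from itertools import product

def grid_search(score_fn, param_grid):
    """Evaluate score_fn on every combination in param_grid; keep the lowest score.

    Assumes score_fn returns a loss, so lower is better."""
    keys = sorted(param_grid)
    results = []
    for values in product(*(param_grid[k] for k in keys)):
        params = dict(zip(keys, values))
        results.append((score_fn(**params), params))
    best_score, best_params = min(results, key=lambda r: r[0])
    return best_params, best_score, results

# Hypothetical stand-in for training a model and measuring its validation loss
def toy_loss(learning_rate, epochs):
    return (learning_rate - 0.1) ** 2 + 1.0 / epochs

best_params, best_score, results = grid_search(
    toy_loss, {"learning_rate": [0.001, 0.01, 0.1], "epochs": [50, 100, 150]}
)
print(best_params)
```

With two lists of three values each, the loop trains and scores nine candidates; the cost of a grid search grows multiplicatively with every hyperparameter added.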

In [1]:
num_epochs = 50
learning_rate = 0.01

In [2]:
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.datasets import load_boston  # removed in scikit-learn 1.2; this notebook needs an older release
from sklearn.preprocessing import StandardScaler
from sklearn import model_selection
from sklearn.linear_model import LinearRegression

from keras.optimizers import Adam
from keras.models import Sequential
from keras.layers import Dense
from keras.wrappers.scikit_learn import KerasRegressor  # removed from recent Keras releases; the SciKeras package provides a replacement
from keras import regularizers

Using TensorFlow backend.


In [3]:
boston = load_boston()
X = boston.data
y = boston.target
y = y.reshape(-1,1)

In [4]:
X_train, X_test, y_train, y_test = model_selection.train_test_split(X, y, test_size=0.2, random_state=0)

In [5]:
scaler = StandardScaler().fit(X_train)
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)
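The scaler is fitted on the training set only and then applied unchanged to the test set, so no information from the test data leaks into training. A minimal sketch of what this standardization computes for a single column, assuming (as `StandardScaler` does) the population standard deviation, and assuming a non-constant column; the helper names are illustrative only:

```python
from statistics import fmean, pstdev

def fit_scaler(column):
    # Mean and population standard deviation (ddof=0), as StandardScaler uses
    return fmean(column), pstdev(column)

def transform(column, mean, std):
    # z = (x - mean) / std, using statistics learned from the training data
    return [(x - mean) / std for x in column]

train_col = [2.0, 4.0, 6.0, 8.0]
test_col = [5.0, 10.0]
mean, std = fit_scaler(train_col)              # fitted on training data only
train_scaled = transform(train_col, mean, std)
test_scaled = transform(test_col, mean, std)   # test data reuses the training statistics
print(train_scaled, test_scaled)
```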

In [6]:
# some housekeeping
input_dim = X_train.shape[1]
output_dim = 1 # for regression

#### First, let's do a grid search over optimization algorithms.

Some optimizers have parameters that can or should be tuned; we will explore that later by example. For several others, such as RMSprop and Adagrad, the Keras documentation recommends keeping the default values (apart from the learning rate, which can be tuned freely).

For now, we use the defaults; for example:
- keras.optimizers.SGD(lr=0.01, momentum=0.0, decay=0.0, nesterov=False)
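For reference, here is a plain-Python sketch of what those SGD parameters do for a list of weights. We assume it mirrors the usual momentum and Nesterov updates and Keras's time-based learning-rate decay; the helper names `sgd_step` and `decayed_lr` are hypothetical. With the defaults (momentum=0.0, decay=0.0, nesterov=False) it reduces to w ← w − lr · grad.

```python
def sgd_step(w, grad, velocity, lr=0.01, momentum=0.0, nesterov=False):
    """One SGD update for a list of weights; returns (new_w, new_velocity)."""
    # Velocity accumulates a decaying sum of past (scaled) gradients
    new_velocity = [momentum * v - lr * g for v, g in zip(velocity, grad)]
    if nesterov:
        # Nesterov look-ahead: apply momentum to the already-updated velocity
        new_w = [p + momentum * v - lr * g for p, v, g in zip(w, new_velocity, grad)]
    else:
        new_w = [p + v for p, v in zip(w, new_velocity)]
    return new_w, new_velocity

def decayed_lr(lr, decay, iterations):
    # Time-based decay: the effective rate shrinks as updates accumulate
    return lr / (1.0 + decay * iterations)

w, v = [1.0, -1.0], [0.0, 0.0]
w, v = sgd_step(w, [2.0, -4.0], v)   # defaults: plain SGD, w <- w - 0.01 * grad
print(w)
```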

In [7]:
# build_fn for keras.wrappers.scikit_learn.KerasRegressor(build_fn=None, **sk_params)
def create_model(optimizer = "Adam"):
    
    model = Sequential()
    model.add(Dense(output_dim, input_dim=input_dim, kernel_initializer='normal'))  # activation = None for regression
    model.compile(loss='mean_squared_error', optimizer=optimizer)
    return model
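Because the model is a single `Dense` unit with no activation, it is ordinary linear regression fitted by gradient descent: each prediction is a dot product plus a bias, and the training loss is the mean squared error. A plain-Python sketch of that forward pass and loss (the function names are illustrative only):

```python
def predict(X, w, b):
    # One Dense unit with no activation: x . w + b for each row
    return [sum(xi * wi for xi, wi in zip(row, w)) + b for row in X]

def mean_squared_error(y_true, y_pred):
    # Average of squared residuals: the 'mean_squared_error' loss
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

X = [[1.0, 2.0], [3.0, 4.0]]
w, b = [0.5, -1.0], 2.0
y_pred = predict(X, w, b)   # [0.5 - 2.0 + 2.0, 1.5 - 4.0 + 2.0]
loss = mean_squared_error([1.0, 0.0], y_pred)
print(y_pred, loss)
```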

In [8]:
model = KerasRegressor(build_fn=create_model, epochs = num_epochs, verbose=0)

In [9]:
# define the grid search parameters
optimizers = ['RMSprop', 'Adam', 'SGD']
grid = GridSearchCV(estimator=model, cv=10, param_grid=dict(optimizer = optimizers))

# do the grid search
fit = grid.fit(X_train, y_train)
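With an integer `cv` and a regressor, `GridSearchCV` splits the training data with unshuffled K-fold cross-validation, so each of the 10 folds serves once as the validation set. The sketch below reproduces that splitting rule in plain Python (`kfold_indices` is a hypothetical helper, not scikit-learn's `KFold`). For the 404 training rows left after the 80/20 split of the 506 Boston samples, each fold holds 41 or 40 samples.

```python
def kfold_indices(n_samples, n_splits=10):
    """Yield (train_idx, test_idx) pairs for unshuffled K-fold CV.

    The first n_samples % n_splits folds get one extra sample."""
    fold_sizes = [n_samples // n_splits] * n_splits
    for i in range(n_samples % n_splits):
        fold_sizes[i] += 1
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size

folds = list(kfold_indices(404, n_splits=10))
print([len(test) for _, test in folds])
```

Each candidate setting is therefore trained 10 times, so the optimizer grid above costs 30 training runs.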

In [10]:
def report_cv_results(fit):
    means = fit.cv_results_['mean_test_score']
    sds = fit.cv_results_['std_test_score']
    params = fit.cv_results_['params']
    for mean, sd, param in zip(means, sds, params):
        print("Mean score: {:.2f}    Std. dev.: {:.2f}    Param: {}".format(mean, sd, param))
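The mean and standard deviation that `report_cv_results` prints are aggregated over the 10 fold scores. A sketch of that aggregation, assuming (as in recent scikit-learn versions) an unweighted mean and a population standard deviation; the fold scores below are made up for illustration:

```python
from statistics import fmean, pstdev

def summarize_fold_scores(fold_scores):
    # Unweighted mean and population (ddof=0) standard deviation across folds
    return fmean(fold_scores), pstdev(fold_scores)

# Hypothetical per-fold scores for one parameter setting
mean, sd = summarize_fold_scores([20.0, 22.0, 24.0, 18.0])
print("Mean score: {:.2f}    Std. dev.: {:.2f}".format(mean, sd))
```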

In [11]:
report_cv_results(fit)

Mean score: 539.61    Std. dev.: 60.88    Param: {'optimizer': 'RMSprop'}
Mean score: 541.15    Std. dev.: 60.89    Param: {'optimizer': 'Adam'}
Mean score: 21.42    Std. dev.: 7.57    Param: {'optimizer': 'SGD'}


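#### Interestingly, with 50 epochs of training, SGD is _much_ better than the other algorithms! Let's check whether Adam and RMSprop catch up with more epochs.

Note that the score reported here is the mean squared error on the held-out fold, so lower is better. `GridSearchCV` assumes that higher scores are better, so `fit.best_params_` would pick the worst setting in this case; passing `scoring='neg_mean_squared_error'` to `GridSearchCV` avoids this.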

In [12]:
def create_model(optimizer="SGD"):
    # epochs need not be an argument here: KerasRegressor passes it on to model.fit()
    model = Sequential()
    model.add(Dense(output_dim, input_dim=input_dim, kernel_initializer='normal'))  # activation = None for regression
    model.compile(loss='mean_squared_error', optimizer=optimizer)
    return model

In [13]:
model = KerasRegressor(build_fn=create_model, epochs = num_epochs, verbose=0)

In [14]:
# define the grid search parameters
optimizers = ['SGD', 'RMSprop', 'Adam']
epochs = [50,100,150]
grid = GridSearchCV(estimator=model, cv=10, param_grid=dict(optimizer = optimizers, epochs = epochs))

# do the grid search
fit = grid.fit(X_train, y_train)
report_cv_results(fit)

Mean score: 21.46    Std. dev.: 7.70    Param: {'epochs': 50, 'optimizer': 'SGD'}
Mean score: 540.05    Std. dev.: 60.74    Param: {'epochs': 50, 'optimizer': 'RMSprop'}
Mean score: 540.57    Std. dev.: 60.78    Param: {'epochs': 50, 'optimizer': 'Adam'}
Mean score: 21.40    Std. dev.: 7.28    Param: {'epochs': 100, 'optimizer': 'SGD'}
Mean score: 504.16    Std. dev.: 56.96    Param: {'epochs': 100, 'optimizer': 'RMSprop'}
Mean score: 505.19    Std. dev.: 56.78    Param: {'epochs': 100, 'optimizer': 'Adam'}
Mean score: 21.39    Std. dev.: 7.37    Param: {'epochs': 150, 'optimizer': 'SGD'}
Mean score: 474.65    Std. dev.: 52.96    Param: {'epochs': 150, 'optimizer': 'RMSprop'}
Mean score: 476.88    Std. dev.: 54.19    Param: {'epochs': 150, 'optimizer': 'Adam'}
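The nine rows above follow scikit-learn's parameter-grid order: parameter names are sorted alphabetically and the last name varies fastest, which is why `optimizer` cycles within each value of `epochs`. A plain-Python sketch of that enumeration (`parameter_grid` is a hypothetical helper, not scikit-learn's `ParameterGrid` class):

```python
from itertools import product

def parameter_grid(param_grid):
    # Sort the names, then take the Cartesian product; the last name varies fastest
    keys = sorted(param_grid)
    return [dict(zip(keys, values)) for values in product(*(param_grid[k] for k in keys))]

grid_order = parameter_grid({"optimizer": ["SGD", "RMSprop", "Adam"], "epochs": [50, 100, 150]})
for params in grid_order[:4]:
    print(params)
```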


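#### More epochs don't help much. How about varying the learning rate for Adam (default is 0.001)?

Adam and RMSprop do improve as training lengthens (mean score from about 540 at 50 epochs to about 475 at 150), but they remain far behind SGD. Both default to a learning rate of 0.001, ten times smaller than SGD's default of 0.01.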

In [15]:
def create_model(learn_rate = learning_rate):
    
    model = Sequential()
    model.add(Dense(output_dim, input_dim=input_dim, kernel_initializer='normal'))  # activation = None for regression
    optimizer = Adam(lr=learn_rate)  # newer Keras versions name this argument learning_rate
    # no 'accuracy' metric: it is meaningless for a continuous regression target
    model.compile(loss='mean_squared_error', optimizer=optimizer)
    return model

In [16]:
model = KerasRegressor(build_fn=create_model, epochs = num_epochs, verbose=0)

In [17]:
learning_rates = [0.001, 0.01, 0.1, 0.3, 0.5]
grid = GridSearchCV(estimator=model, cv=10, param_grid=dict(learn_rate = learning_rates))

# do the grid search
fit = grid.fit(X_train, y_train)
report_cv_results(fit)

Mean score: 540.94    Std. dev.: 60.97    Param: {'learn_rate': 0.001}
Mean score: 323.14    Std. dev.: 35.94    Param: {'learn_rate': 0.01}
Mean score: 21.45    Std. dev.: 7.44    Param: {'learn_rate': 0.1}
Mean score: 21.90    Std. dev.: 7.58    Param: {'learn_rate': 0.3}
Mean score: 22.64    Std. dev.: 7.94    Param: {'learn_rate': 0.5}
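A toy illustration of why the learning rate matters so much with a fixed budget of updates: plain gradient descent on the one-dimensional loss f(w) = (w − 3)², run for 50 steps from w = 0. This is not the Boston model or Adam, only a sketch of the general trade-off: too small a rate barely moves in the allotted steps, a moderate rate converges, and too large a rate (here 1.1) diverges.

```python
def gd_final_loss(lr, steps=50, w0=0.0, target=3.0):
    """Plain gradient descent on f(w) = (w - target)**2; returns the final loss."""
    w = w0
    for _ in range(steps):
        grad = 2.0 * (w - target)   # derivative of (w - target)**2
        w -= lr * grad
    return (w - target) ** 2

# The starting loss is (0 - 3)**2 = 9
for lr in [0.001, 0.01, 0.1, 0.5, 1.1]:
    print(lr, gd_final_loss(lr))
```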


#### Finally, let's see an example of grid search for different types and degrees of regularization.

In [18]:
def create_model(regularizer = regularizers.l2(0.)):
    
    model = Sequential()
    model.add(Dense(output_dim, input_dim=input_dim, kernel_initializer='normal',
                    kernel_regularizer=regularizer))
    model.compile(loss='mean_squared_error', optimizer="SGD")
    return model
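The regularizers add a penalty on the kernel weights to the training loss. As a sketch, assuming Keras's definitions l1(a) = a · Σ|w| and l2(a) = a · Σw² (with no factor of 1/2), and with hypothetical helper names:

```python
def l1_penalty(weights, strength):
    # l1(strength): strength * sum of absolute weights
    return strength * sum(abs(w) for w in weights)

def l2_penalty(weights, strength):
    # l2(strength): strength * sum of squared weights (no 1/2 factor)
    return strength * sum(w * w for w in weights)

def regularized_loss(mse, weights, l1=0.0, l2=0.0):
    # Training objective = data loss + penalty on the kernel weights
    return mse + l1_penalty(weights, l1) + l2_penalty(weights, l2)

weights = [0.5, -2.0, 1.0]
print(regularized_loss(20.0, weights, l1=0.1))   # 20.0 + 0.1 * 3.5
print(regularized_loss(20.0, weights, l2=0.1))   # 20.0 + 0.1 * 5.25
```

Since Keras includes such penalties in the loss that `evaluate()` reports, part of the higher scores for strong regularization may come from the penalty term itself rather than from worse predictions.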

In [19]:
model = KerasRegressor(build_fn=create_model, epochs = num_epochs, verbose=0)

In [20]:
regularizer_list = [regularizers.l1(0.001), regularizers.l1(0.01), regularizers.l1(0.1),
                    regularizers.l2(0.001), regularizers.l2(0.01), regularizers.l2(0.1)]
grid = GridSearchCV(estimator=model, cv=10, param_grid=dict(regularizer = regularizer_list))

# do the grid search
fit = grid.fit(X_train, y_train)

In [21]:
report_cv_results(fit)

Mean score: 21.56    Std. dev.: 7.63    Param: {'regularizer': <keras.regularizers.L1L2 object at 0x7f6a5974dcf8>}
Mean score: 21.78    Std. dev.: 7.55    Param: {'regularizer': <keras.regularizers.L1L2 object at 0x7f6a5974dd30>}
Mean score: 23.41    Std. dev.: 7.64    Param: {'regularizer': <keras.regularizers.L1L2 object at 0x7f6a5974dd68>}
Mean score: 21.48    Std. dev.: 7.48    Param: {'regularizer': <keras.regularizers.L1L2 object at 0x7f6a5974dda0>}
Mean score: 21.81    Std. dev.: 7.56    Param: {'regularizer': <keras.regularizers.L1L2 object at 0x7f6a5974ddd8>}
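Mean score: 24.64    Std. dev.: 8.09    Param: {'regularizer': <keras.regularizers.L1L2 object at 0x7f6a5974de10>}

The regularizers print as object addresses, but the rows follow the order of `regularizer_list`: L1 with strengths 0.001, 0.01, and 0.1, then L2 with the same three strengths. Light regularization (0.001) scores about the same as the unregularized SGD model above (about 21.4), and the strongest penalty (0.1) gives the highest, i.e. worst, mean squared error for both L1 and L2.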
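One way to get readable parameters in the report is to grid over plain values (a kind and a strength) and build the regularizer inside the build function. The sketch below is hypothetical: the names `reg_kind`, `reg_strength`, and `describe` are ours, and the commented Keras build function is untested. The runnable part only shows the labels and the order in which `GridSearchCV` would visit the settings.

```python
from itertools import product

# Hypothetical Keras build function (not run here):
# def create_model(reg_kind="l2", reg_strength=0.0):
#     regularizer = getattr(regularizers, reg_kind)(reg_strength)  # e.g. regularizers.l1(0.01)
#     model = Sequential()
#     model.add(Dense(output_dim, input_dim=input_dim, kernel_initializer='normal',
#                     kernel_regularizer=regularizer))
#     model.compile(loss='mean_squared_error', optimizer="SGD")
#     return model

param_grid = dict(reg_kind=["l1", "l2"], reg_strength=[0.001, 0.01, 0.1])

def describe(params):
    # Human-readable label such as "l1(0.001)"
    return "{}({})".format(params["reg_kind"], params["reg_strength"])

# Sorted names, last name varies fastest: the order the grid search reports them in
keys = sorted(param_grid)
settings = [dict(zip(keys, values)) for values in product(*(param_grid[k] for k in keys))]
labels = [describe(p) for p in settings]
print(labels)
```

With this grid, each report line would show a parameter such as {'reg_kind': 'l1', 'reg_strength': 0.001} instead of an object address.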
