# Keras Tuner for hyperparameter optimization

Several automated tuners can optimize a model, and by tuning I mean the architecture as well as the training settings: the number of layers, dropout rates, learning rates and so on. We could of course write endless nested for loops, but a more elegant option is always welcome.

Keras Tuner is a hyperparameter tuning library built for Keras models, with random search, Hyperband and Bayesian optimization tuners included. Here is some quick code for using the tuner. Hope you find it useful.

In [None]:
import pandas as pd
import numpy as np
import datatable as dt
np.random.seed(42)
import tensorflow as tf
tf.random.set_seed(42)


# Data ===============================================================================================================================
# datatable loads the data much faster
datatable_df = dt.fread('/kaggle/input/jane-street-market-prediction/train.csv')
train = datatable_df.to_pandas()
del datatable_df

features = [c for c in train.columns if 'feature' in c]
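# Drop zero-weight rows; copy so the assignments below act on a real frame, not a view
train = train[train['weight'] > 0].copy()
# Binary target: 1 when the response is positive, 0 otherwise
train['action'] = (train['resp'] > 0).astype('int')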

for col in features:
    mean = np.mean(train[col])
    train[col] = train[col].fillna(mean)

# Leaving 50 days between training and test to avoid data leakage.
# Idea taken from this notebook https://www.kaggle.com/gogo827jz/jane-street-xgboost-grouptimesplitkfold?scriptVersionId=48297430
    
X_train = train[train['date']<=350][features].values.astype('float32')
X_test = train[train['date']>400][features].values.astype('float32')

y_train = train[train['date']<=350]['action'].values.astype('int')
y_test = train[train['date']>400]['action'].values.astype('int')

The code below is where we build the architecture and pass the choices for each hyperparameter. This block can take a while to run, depending on the size of the search space. You can add more parameters or more choices. Keras Tuner is slightly limited compared to Hyperopt, but I found it more intuitive and easier to understand.
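
To make the idea of a search space concrete, here is a minimal pure-Python sketch of what a single random-search trial draws from the ranges used in `build_model`. It is not Keras Tuner's own implementation: `sample_config` and `count_unit_combinations` are hypothetical helpers, and only the ranges are taken from this notebook.

```python
import math
import random

def sample_config(rng):
    """Draw one configuration from the same ranges build_model uses (illustrative only)."""
    num_layers = rng.randint(2, 5)            # mirrors hp.Int('num_layers', 2, 5), both ends inclusive
    config = {'num_layers': num_layers}
    for i in range(num_layers):
        config[f'units_{i}'] = 32 * rng.randint(1, 64)   # 32, 64, ..., 2048
        config[f'dropout_{i}'] = rng.uniform(0.0, 0.5)
    # Log-uniform draw: uniform in the exponent, so each decade is equally likely
    log_lr = rng.uniform(math.log10(1e-4), math.log10(1e-2))
    config['learning_rate'] = 10 ** log_lr
    return config

def count_unit_combinations(min_layers=2, max_layers=5, unit_choices=64):
    """Number of distinct layer-width combinations, ignoring dropout and learning rate."""
    return sum(unit_choices ** n for n in range(min_layers, max_layers + 1))

rng = random.Random(1)
example = sample_config(rng)
print(example)
print(count_unit_combinations())
```

The layer widths alone give more than a billion combinations, which is why random search or Hyperband is used here instead of an exhaustive grid, and why `max_trials=1` is only good for a smoke test.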

In [None]:
import tensorflow as tf
# Newer releases ship the package as `keras_tuner`; `kerastuner` is the older import name
from kerastuner.tuners import RandomSearch, Hyperband
from tensorflow.keras.callbacks import Callback, ReduceLROnPlateau, ModelCheckpoint, EarlyStopping

# Build Model ============================================================================================================================
def build_model(hp):
    model = tf.keras.Sequential()
    for i in range(hp.Int('num_layers', 2, 5)):
        model.add(tf.keras.layers.BatchNormalization())
        model.add(tf.keras.layers.Dense(units=hp.Int('units_' + str(i),
                                            min_value=32,
                                            max_value=2048,
                                            step=32),
                               activation='relu'
                                       ))
        model.add(tf.keras.layers.Dropout(rate=hp.Float('dropout_' + str(i),
                                                 min_value=0.0,
                                                 max_value=0.5,
                                    )))
    model.add(tf.keras.layers.Dense(1, activation='sigmoid'))
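    model.compile(
        optimizer=tf.keras.optimizers.Adam(
            hp.Float('learning_rate',
                     min_value=1e-4,
                     max_value=1e-2,
                     sampling='log',  # sample the learning rate on a log scale
                     default=1e-3)),
        loss=tf.keras.losses.BinaryCrossentropy(label_smoothing=1e-2),
        # Naming the metric 'acc' makes Keras log 'val_acc', which the tuner objective refers to
        metrics=['acc'])
    return model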

# You can use RandomSearch, Hyperband or BayesianOptimization here
tuner = RandomSearch(
    build_model,
    seed=1,
    objective='val_acc',
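    max_trials=1,  # number of configurations to try; raise this for a real search
    executions_per_trial=1,   # models trained per configuration; raise this to average out noise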
    directory='JS_NN_randomsearch',
    project_name='JS_keras_tuner')

# tuner = Hyperband(
#     build_model,
#     seed=1,
#     objective='val_acc',
#     max_epochs = 40,
#     executions_per_trial=2,
#     directory='JS_NN_hyperband',
#     project_name='JS_keras_tuner'
# )

# Stop a trial early once val_acc stops improving, so the tuner doesn't waste time on a poor model
es = EarlyStopping(monitor = 'val_acc', min_delta = 1e-4, patience = 5)

# Run the search; the later, held-out days serve as validation data
tuner.search(X_train, y_train,
             epochs=10, # small number for quick run
             callbacks = [es], #,rlr,ckp],
             batch_size = 4096,
             validation_data=(X_test, y_test))

The tuner object stores the results of every trial. You can display a summary or pull out the top-scoring models.

In [None]:
# Tuner summary
tuner.results_summary()

The tuner summary output is ugly and I haven't yet found a way to display it better, but you can extract the best model. The extracted model has not been built yet (the Sequential model was defined without an input shape), so it needs to be either trained again or at least run through one prediction. After that you can print it just like a regular NN architecture.
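
One way to get a more readable view is to flatten each trial into a row and hand the rows to pandas. The sketch below assumes that the trial objects returned by `tuner.oracle.get_best_trials(...)` expose `trial_id`, `score` and `hyperparameters.values`; `trials_to_rows` is a hypothetical helper, not part of Keras Tuner, and the demo uses stand-in objects instead of a real tuner.

```python
from types import SimpleNamespace

def trials_to_rows(trials):
    """Flatten trial objects into plain dicts, best score first (assumes higher is better)."""
    rows = []
    for trial in trials:
        row = {'trial_id': trial.trial_id, 'score': trial.score}
        row.update(trial.hyperparameters.values)  # one column per hyperparameter
        rows.append(row)
    # Trials without a score (for example failed runs) are pushed to the end
    rows.sort(key=lambda r: (r['score'] is None, -(r['score'] or 0.0)))
    return rows

# Stand-in trials with made-up values, only to show the shape of the output
fake_trials = [
    SimpleNamespace(trial_id='a', score=0.51,
                    hyperparameters=SimpleNamespace(values={'num_layers': 2})),
    SimpleNamespace(trial_id='b', score=None,
                    hyperparameters=SimpleNamespace(values={'num_layers': 3})),
    SimpleNamespace(trial_id='c', score=0.53,
                    hyperparameters=SimpleNamespace(values={'num_layers': 4})),
]
rows = trials_to_rows(fake_trials)
for row in rows:
    print(row)
```

In the notebook this would be called as `trials_to_rows(tuner.oracle.get_best_trials(num_trials=10))`, and `pd.DataFrame(rows)` then renders the result as a table.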

In [None]:
best_model = tuner.get_best_models(num_models=1)[0]
# Run one sample through the model so it gets built before calling summary()
best_model.predict(X_test[0:1,])
best_model.summary()

What we get is the best architecture found, not necessarily the best model (weights), so we need to train it again with k-fold. There is an amazing notebook (which inspired this one), https://www.kaggle.com/gogo827jz/jane-street-neural-network-starter, that covers NN training and submission.
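
For the retraining step, one option is a time-ordered split that keeps a gap between training and validation days, the same idea as the 50-day gap used earlier. This is a minimal pure-Python sketch under that assumption, not the splitter from the linked notebooks; `time_folds_with_gap` and its parameters are hypothetical names.

```python
def time_folds_with_gap(days, n_splits=3, gap=50):
    """Yield (train_days, valid_days) pairs where validation always follows training.

    `gap` days between the two blocks are dropped to limit leakage.
    """
    days = sorted(set(days))
    fold_size = len(days) // (n_splits + 1)
    if fold_size == 0:
        raise ValueError('not enough distinct days for the requested number of splits')
    for k in range(1, n_splits + 1):
        valid_start = k * fold_size
        valid_end = valid_start + fold_size if k < n_splits else len(days)
        valid_days = days[valid_start:valid_end]
        # Keep only training days that leave `gap` unused days before validation starts
        train_days = [d for d in days[:valid_start] if d <= valid_days[0] - gap - 1]
        if train_days:
            yield train_days, valid_days

folds = list(time_folds_with_gap(range(500), n_splits=3, gap=50))
for train_days, valid_days in folds:
    print(f'train: {train_days[0]}-{train_days[-1]}  valid: {valid_days[0]}-{valid_days[-1]}')
```

Inside each fold you would select rows with `train['date'].isin(train_days)`, build a fresh network with `tuner.hypermodel.build(tuner.get_best_hyperparameters(1)[0])` and fit it as usual.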

Please upvote if you find it useful.
Good luck!