# Keras + KerasTuner best practices

This notebook shows how to use KerasTuner to find a high-performing model in just a few lines of code.

First, let's install the latest version of KerasTuner:

In [None]:
!pip install git+https://github.com/keras-team/keras-tuner.git -q

## Load the data

We use Pandas to load the data into NumPy arrays.

Our inputs are uint8 arrays of shape `(num_samples, 28, 28, 1)` and our targets are integer arrays of shape `(num_samples,)`.

In [None]:
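import pandas as pd

def load_data(path):
    data = pd.read_csv(path)
    # test.csv has no "label" column, so only extract targets when it is present.
    y = data.pop("label").values if "label" in data.columns else None
    # Each row holds 784 pixel values in [0, 255]; reshape them into 28x28x1 images.
    x = data.values.astype("uint8").reshape(-1, 28, 28, 1)
    return x, y

x_train, y_train = load_data("../input/digit-recognizer/train.csv")
x_test, _ = load_data("../input/digit-recognizer/test.csv")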
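To sanity-check the shape and dtype claims without the Kaggle files, the snippet below parses a hypothetical two-row CSV in the digit-recognizer layout: a `label` column followed by 784 pixel columns (the names `pixel0` to `pixel783` are an assumption here). Note that `pd.read_csv` infers `int64` for the pixel columns, so an explicit `astype("uint8")` is what yields uint8 arrays.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical CSV in the digit-recognizer layout: label + 784 pixel columns.
header = "label," + ",".join(f"pixel{i}" for i in range(784))
rows = ["3," + ",".join(["0"] * 784),    # an all-black "3"
        "7," + ",".join(["255"] * 784)]  # an all-white "7"
csv_text = "\n".join([header] + rows)

data = pd.read_csv(io.StringIO(csv_text))
y = data.pop("label").values                          # integer targets, shape (2,)
x = data.values.astype("uint8").reshape(-1, 28, 28, 1)  # images, shape (2, 28, 28, 1)
print(x.shape, x.dtype, y.shape)  # (2, 28, 28, 1) uint8 (2,)
```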

## Define a tunable model

We define a function `make_model(hp)` that builds a compiled Keras model,
parameterized by hyperparameters obtained from the `hp` argument.

Our model includes a stage that does random image data augmentation, via the `augment_images`
function. Our image augmentation is itself tunable: we'll find the best augmentation
configuration during the hyperparameter search.
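Conceptually, random search treats the augmentation stage as two on/off switches, each with a factor that only matters when its switch is on. The pure-Python sketch below (a hypothetical `sample_augmentation_config` helper, not KerasTuner's internal sampling code) draws configurations that way, using the same ranges as `augment_images`.

```python
import random

def sample_augmentation_config(rng):
    """Draw one augmentation configuration: every choice sampled uniformly."""
    config = {
        "use_rotation": rng.random() < 0.5,
        "use_zoom": rng.random() < 0.5,
    }
    # The factors are only relevant when their switch is on, mirroring the
    # `if use_rotation:` / `if use_zoom:` branches in `augment_images`.
    if config["use_rotation"]:
        config["rotation_factor"] = rng.uniform(0.05, 0.2)
    if config["use_zoom"]:
        config["zoom_factor"] = rng.uniform(0.05, 0.2)
    return config

rng = random.Random(0)
configs = [sample_augmentation_config(rng) for _ in range(1000)]
share_rotation = sum(c["use_rotation"] for c in configs) / len(configs)
print(f"{share_rotation:.2f} of sampled configurations use rotation")
```

Roughly half of the trials skip each augmentation, so the search compares augmented and non-augmented models rather than assuming augmentation helps.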

In [None]:
from tensorflow import keras
from tensorflow.keras import layers

def augment_images(x, hp):
    use_rotation = hp.Boolean('use_rotation')
    if use_rotation:
        x = layers.experimental.preprocessing.RandomRotation(
            hp.Float('rotation_factor', min_value=0.05, max_value=0.2)
        )(x)
    use_zoom = hp.Boolean('use_zoom')
    if use_zoom:
        x = layers.experimental.preprocessing.RandomZoom(
            hp.Float('zoom_factor', min_value=0.05, max_value=0.2)
        )(x)
    return x

def make_model(hp):
    inputs = keras.Input(shape=(28, 28, 1))
    x = layers.experimental.preprocessing.Rescaling(1. / 255)(inputs)
    x = layers.experimental.preprocessing.Resizing(64, 64)(x)
    x = augment_images(x, hp)
    
    num_block = hp.Int('num_block', min_value=2, max_value=5, step=1)
    num_filters = hp.Int('num_filters', min_value=32, max_value=128, step=32)
    for i in range(num_block):
        x = layers.Conv2D(
            num_filters,
            kernel_size=3,
            activation='relu',
            padding='same'
        )(x)
        x = layers.Conv2D(
            num_filters,
            kernel_size=3,
            activation='relu',
            padding='same'
        )(x)
        x = layers.MaxPooling2D(2)(x)
    
    reduction_type = hp.Choice('reduction_type', ['flatten', 'avg'])
    if reduction_type == 'flatten':
        x = layers.Flatten()(x)
    else:
        x = layers.GlobalAveragePooling2D()(x)

    x = layers.Dense(
        units=hp.Int('num_dense_units', min_value=32, max_value=512, step=32),
        activation='relu'
    )(x)
    x = layers.Dropout(
        hp.Float('dense_dropout', min_value=0., max_value=0.7)
    )(x)
    outputs = layers.Dense(10)(x)
    model = keras.Model(inputs, outputs)
    
    # Sample the learning rate on a log scale, since its effect is multiplicative.
    learning_rate = hp.Float('learning_rate', min_value=3e-4, max_value=3e-3, sampling='log')
    optimizer = keras.optimizers.Adam(learning_rate=learning_rate)
    model.compile(loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
                  optimizer=optimizer,
                  metrics=[keras.metrics.SparseCategoricalAccuracy(name='acc')])
    model.summary()
    return model
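The `reduction_type` choice matters because the feature-map size depends on `num_block`: the input is resized to 64x64, each `Conv2D` with `padding='same'` keeps the spatial size, and each `MaxPooling2D(2)` halves it. The short calculation below (a hypothetical `feature_map_side` helper) tabulates how many features reach the dense layer under `Flatten` versus `GlobalAveragePooling2D`.

```python
def feature_map_side(num_block, image_size=64):
    """Spatial side length after `num_block` conv blocks of make_model."""
    side = image_size
    for _ in range(num_block):
        # Conv2D (stride 1, padding='same') keeps the side unchanged;
        # MaxPooling2D(2) (stride 2, 'valid' padding) halves it, rounding down.
        side //= 2
    return side

for num_block in range(2, 6):
    side = feature_map_side(num_block)
    for num_filters in (32, 128):
        print(f"num_block={num_block}, num_filters={num_filters}: "
              f"{side}x{side} maps -> Flatten gives {side * side * num_filters} "
              f"features, GlobalAveragePooling2D gives {num_filters}")
```

At one extreme, `Flatten` after two blocks of 128 filters feeds 32,768 features into the dense layer (about 16.8 million weights at 512 units), against 128 features with global average pooling; making `reduction_type` tunable lets the search weigh that trade-off.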

## Run hyperparameter search

Now, we launch the search. For simplicity, we'll use `RandomSearch`. We'll limit the search to 100 different model configurations.

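Note that we configure the calls to `model.fit()` to use the `EarlyStopping` callback.
We set a budget of 100 epochs, but each model is likely to start overfitting much earlier than that.
In general, always set a large number of epochs and let the `EarlyStopping` callback decide when to stop.
The `baseline=0.9` argument also cuts a trial short if its validation accuracy never exceeds 90% within `patience` (here 3) epochs.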
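The patience rule can be sketched in plain Python for the settings used below (`mode='max'`, `patience=3`, `baseline=0.9`). This is a simplified approximation that assumes the running best starts at the baseline; `epochs_until_stop` is a hypothetical helper, and Keras's exact bookkeeping differs slightly between versions.

```python
def epochs_until_stop(val_accs, patience=3, baseline=0.9):
    """Return the 1-based epoch at which training stops (or len(val_accs))."""
    best = baseline  # the first improvement must beat the baseline
    wait = 0         # epochs since the last improvement
    for epoch, acc in enumerate(val_accs, start=1):
        if acc > best:  # mode='max' with min_delta=0: any increase counts
            best = acc
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return len(val_accs)

# A run that plateaus after epoch 3 stops at epoch 6.
print(epochs_until_stop([0.95, 0.97, 0.98, 0.979, 0.978, 0.977, 0.99]))  # 6
# A configuration that never beats the 0.9 baseline is cut after 3 epochs.
print(epochs_until_stop([0.5, 0.6, 0.7, 0.8, 0.95]))  # 3
```

The second case is what makes `baseline` useful during a search: poor configurations are discarded after a few epochs instead of consuming a full training budget.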

Our search is guided by validation accuracy, which is computed on a fixed 20% hold-out set of the training data.

In [None]:
import keras_tuner as kt

tuner = kt.tuners.RandomSearch(
    make_model,
    objective='val_acc',
    max_trials=100,
    overwrite=True)

callbacks = [keras.callbacks.EarlyStopping(
    monitor='val_acc', mode='max', patience=3, baseline=0.9)]
tuner.search(x_train, y_train, validation_split=0.2, callbacks=callbacks, verbose=1, epochs=100)
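Keras documents that `validation_split` takes the validation data from the *last* samples of `x` and `y`, before any shuffling, which is why the hold-out set stays fixed across trials. The NumPy sketch below reproduces that split by hand; `manual_validation_split` is a hypothetical helper, and the exact rounding of the split index is an assumption.

```python
import math

import numpy as np

def manual_validation_split(x, y, validation_split=0.2):
    # Validation data comes from the end of the arrays, taken before shuffling.
    split_at = int(math.floor(len(x) * (1.0 - validation_split)))
    return (x[:split_at], y[:split_at]), (x[split_at:], y[split_at:])

x = np.arange(10).reshape(10, 1)
y = np.arange(10)
(x_tr, y_tr), (x_val, y_val) = manual_validation_split(x, y)
print(len(x_tr), len(x_val), y_val)  # 8 2 [8 9]
```

One consequence: if the rows of a dataset were ordered (for example sorted by label), this hold-out set would be unrepresentative, so such data should be shuffled before calling `fit`.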

On the free Kaggle GPU, trying out 100 models takes 4 hours.
At the end of the search, our best validation accuracy is 99.33%.

## Find the best epoch value

Now we can retrieve the best hyperparameters, use them to build the best model,
and train it for 50 epochs to find the epoch at which validation accuracy peaks.

In [None]:
best_hp = tuner.get_best_hyperparameters()[0]
model = make_model(best_hp)
history = model.fit(x_train, y_train, validation_split=0.2, epochs=50)

## Train the production model

Finally, we train the best model configuration from scratch for the optimal number of epochs.

This time, we train on the entire training set, with no validation split: our hyperparameters, including the number of epochs, have already been validated.

In [None]:
val_acc_per_epoch = history.history['val_acc']
best_epoch = val_acc_per_epoch.index(max(val_acc_per_epoch)) + 1
model = make_model(best_hp)
model.fit(x_train, y_train, epochs=best_epoch)
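The `+ 1` in the epoch-selection cell converts a 0-based list index into Keras's 1-based epoch count. The sketch below runs the same expression on a hypothetical list of validation accuracies standing in for `history.history['val_acc']`.

```python
import numpy as np

# Hypothetical per-epoch validation accuracies (not results from this notebook).
val_acc_per_epoch = [0.981, 0.986, 0.990, 0.993, 0.992, 0.993, 0.991]

# list.index returns the first occurrence of the maximum, so a tie resolves
# to the earlier (cheaper, less overfit) epoch.
best_epoch = val_acc_per_epoch.index(max(val_acc_per_epoch)) + 1
print(best_epoch)  # 4

# np.argmax also returns the first occurrence, so both approaches agree.
assert int(np.argmax(val_acc_per_epoch)) + 1 == best_epoch
```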

## Make a submission

In [None]:
import numpy as np

predictions = model.predict(x_test)
submission = pd.DataFrame({"ImageId": list(range(1, len(predictions) + 1)),
                           "Label": np.argmax(predictions, axis=-1)})
submission.to_csv("submission.csv", index=False)
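Because the final `Dense(10)` layer has no softmax (the loss uses `from_logits=True`), `model.predict` returns logits. Taking `argmax` over logits is still correct, since softmax is monotonic. The sketch below checks this on hypothetical logits for three images and builds a submission frame with the same 1-based `ImageId` and `Label` columns used above.

```python
import numpy as np
import pandas as pd

# Hypothetical logits for 3 test images over 10 classes.
logits = np.array([[0.1, 2.0, -1.0] + [0.0] * 7,
                   [5.0] + [0.0] * 9,
                   [0.0] * 9 + [3.0]])

# Softmax preserves the ordering within each row, so the argmax is unchanged.
probs = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)
assert (np.argmax(logits, axis=-1) == np.argmax(probs, axis=-1)).all()

submission = pd.DataFrame({"ImageId": np.arange(1, len(logits) + 1),
                           "Label": np.argmax(logits, axis=-1)})
print(submission.to_csv(index=False))
```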