## NN Starter + Keras Tuner Ensemble

This notebook will show you step by step how to:

- Use a TF-Keras neural network for tabular data (regression)
- Use `KerasTuner` to find high-performing model configurations
- Ensemble a few of the top models to generate final predictions

References:

- https://www.kaggle.com/fchollet/moa-keras-kerastuner-best-practices
- https://github.com/keras-team/keras-tuner

Note: This notebook is aimed at newcomers and intermediate practitioners in deep learning, in line with the purpose of this Playground competition. 
Experienced Kagglers probably won't learn anything new. 

# Import libraries

In [None]:
!pip install git+https://github.com/keras-team/keras-tuner.git -q

In [None]:
import os, sys, gc
import time, random
import numpy as np
import pandas as pd
import logging
import typing as tp
from pathlib import Path
from contextlib import contextmanager

from matplotlib import pyplot as plt
import seaborn as sns
%matplotlib inline
from sklearn.model_selection import KFold
from sklearn.metrics import mean_squared_log_error, mean_squared_error

import tensorflow as tf
from tensorflow.keras.layers import *
from tensorflow.keras.callbacks import *

print('TF version:', tf.__version__)
print('GPU devices:', tf.config.list_physical_devices('GPU'))
print("GPU available:", len(tf.config.list_physical_devices('GPU')) > 0)  # tf.test.is_gpu_available() is deprecated

# Load data

In [None]:
TARGET = "target"
# IDVAR = 'id'
N_OUTS = 1
BS = 128
EPOCHS = 10

DIR = "../input/tabular-playground-series-jan-2021/"
WORK = "./"


num_cols = [f'cont{i}' for i in range(1,15)]
all_cols = num_cols  # + string_cols + categorical_cols

In [None]:
train = pd.read_csv(DIR+"train.csv")
train['id'] = train.index
# train_labels = train[TARGET].values

test = pd.read_csv(DIR+"test.csv")
test_index = test['id']

sub = pd.read_csv(DIR+"sample_submission.csv")

print('Raw data loaded!')
print("Train: {}, Test: {}, Sample sub: {}".format(train.shape, test.shape, sub.shape))


# split to train/valid sets
print('Split to train/valid sets:\n')
val_df = train[all_cols].sample(frac=0.2, random_state=2020)
train_df = train[all_cols].drop(val_df.index)
print('Train shape:', train_df.shape)    
print('Valid shape:', val_df.shape)  

In [None]:
display(train.sample(5))

In [None]:
# train[[TARGET]].plot(figsize=(16, 8));

# Prepare Dataset

1) Encode our features

First we need to encode our input variables before passing them to the NN. 
Since we have only numerical features, we create one `Normalization` layer per feature and encode each feature separately.
Then, we concatenate the entire feature space into a single vector. 

We wrap the per-feature normalization step in the following Python function: `encode_numerical_feature` 

In [None]:
from tensorflow.keras.layers.experimental.preprocessing import Normalization

def encode_numerical_feature(feature, name, dataset):
    normalizer = Normalization()                    # Create a Normalization layer for each feature
    feature_ds = dataset.map(lambda x, y: x[name])  # Prepare a TF-Dataset that only yields our feature
    normalizer.adapt(feature_ds)                    # Learn the statistics of the data
    encoded_feature = normalizer(feature)           # Normalize the input feature
    return encoded_feature
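
To make the `adapt` step more concrete, here is a minimal pure-Python sketch of per-feature standardization: subtract the mean learned from the data, then divide by the standard deviation. The helper names and sample values are hypothetical, and the epsilon guard is an assumption for illustration, not necessarily what Keras does internally.

```python
import math

def fit_normalizer(values):
    """Learn the mean and (population) standard deviation of one feature."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, math.sqrt(var)

def normalize(values, mean, std, eps=1e-7):
    # eps is an assumed guard against division by zero for constant features
    return [(v - mean) / max(std, eps) for v in values]

cont1 = [0.2, 0.4, 0.6, 0.8]           # hypothetical values of one feature
mean, std = fit_normalizer(cont1)      # statistics come from the training data only
scaled = normalize(cont1, mean, std)   # zero mean, unit variance on the training data
print(scaled)
```

The same learned statistics are then reused for validation and test data, which is why the layer is adapted on `train_ds` only.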


Next, let's turn our dataframes into `tf.data.Dataset` objects, which we will use to train our Keras models in the next step.
The following function, `dataframe_to_dataset`, does exactly that. 

In [None]:
# code partly copied from: https://www.kaggle.com/nicapotato/keras-nn-tabular-regression-problem

def dataframe_to_dataset(dataframe, labels, role, BATCHSIZE):
    dataframe = dataframe.copy()
    # Every role is built the same way; the test set is passed dummy labels
    ds = tf.data.Dataset.from_tensor_slices((dict(dataframe), labels))
    if role == "train":
        # Shuffle only the training data
        ds = ds.shuffle(buffer_size=len(dataframe))
    ds = ds.batch(BATCHSIZE)
    return ds

In [None]:
train_ds = dataframe_to_dataset(train_df, train.loc[train_df.index, TARGET], "train", BS)
val_ds = dataframe_to_dataset(val_df, train.loc[val_df.index, TARGET], "val", BS)
test_ds = dataframe_to_dataset(test[all_cols], np.zeros((test.shape[0], N_OUTS)), "test", BS)

# full dataset
full_train_ds = dataframe_to_dataset(train[all_cols], train[TARGET], "train", BS)

print('Training ds steps:', int(train_ds.cardinality()))
print('Validation ds steps:', int(val_ds.cardinality()))
print('Test ds steps:', int(test_ds.cardinality()))
print()
print('Full Training ds steps:', int(full_train_ds.cardinality()))

# train_ds = train_ds.shuffle(1024).batch(BS).prefetch(8)
# val_ds = val_ds.batch(BS).prefetch(8)
# test_ds = test_ds.batch(BS).prefetch(8)
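
As a rough check on the step counts printed above: a batched dataset that keeps its last partial batch yields `ceil(n_rows / batch_size)` steps per epoch. The sketch below uses hypothetical helper names and a made-up row count, not the actual competition data.

```python
import math

def steps_per_epoch(n_rows, batch_size):
    # The last batch may be smaller than batch_size, hence the ceiling
    return math.ceil(n_rows / batch_size)

def batch_sizes(n_rows, batch_size):
    # Sizes of the individual batches, including the final partial one
    full, rest = divmod(n_rows, batch_size)
    return [batch_size] * full + ([rest] if rest else [])

n_rows = 1000  # hypothetical number of rows
print(steps_per_epoch(n_rows, 128))   # 8
print(batch_sizes(n_rows, 128))       # seven batches of 128 and one of 104
```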

In [None]:
# sanity check 

# import pprint as pp

# print('Look at Data')
# for x, y in val_ds.take(1):
#     pp.pprint(x)
#     pp.pprint(y)

# Training a baseline model

In [None]:
# We use the TF Functional API to create our NN model
# For more info see here: 


def base_model():

    num_inputs = [Input(shape=(1,), name=x) for x in num_cols]
    encoded_nums = [encode_numerical_feature(var_input, var_name, train_ds)
                   for var_input, var_name in zip(num_inputs, num_cols)]
    
    all_feats = Concatenate()(encoded_nums)

    x = Dropout(0.2)(all_feats)
    x = Dense(64, activation='relu')(x)
    x = Dropout(0.3)(x)
    x = Dense(32, activation="relu")(x)
    x = Dropout(0.2)(x)
    out = Dense(1, activation='linear')(x)
    base_model = tf.keras.Model(num_inputs, out)
    
    # compile model 
    base_model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=2e-3),  # `lr` is a deprecated alias
        loss="mse",
        metrics=[tf.keras.metrics.RootMeanSquaredError(name='rmse')])

    return base_model

In [None]:
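# Note: this rebinds the name `base_model` from the builder function to the model
# instance, so re-run the cell above before re-running this one
base_model = base_model()
base_model.summary()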

In [None]:
# set callbacks 
es = EarlyStopping(monitor='val_loss', min_delta=0.0001, patience=5, verbose=1, mode='min', restore_best_weights=True)
rlr = ReduceLROnPlateau(monitor='val_loss', factor=0.7, patience=5, min_lr=0.00001, verbose=0)
# ckp = callbacks.ModelCheckpoint('model.h5', monitor='val_loss', save_best_only=True)

hist = base_model.fit(train_ds,  # train_ds is already batched, so batch_size is not passed
                      epochs=EPOCHS,
                      validation_data=val_ds, 
                      verbose=1, 
                      callbacks=[es, rlr])
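
To see what `ReduceLROnPlateau` does to the learning rate, here is a small sketch of the update rule: each time the monitored metric stalls for `patience` epochs, the rate is multiplied by `factor` and clipped at `min_lr`. The function name is hypothetical; the numbers are the ones used in the cell above.

```python
def lr_after_reductions(initial_lr, factor, min_lr, n_reductions):
    """Learning rate after a given number of plateau-triggered reductions."""
    lr = initial_lr
    for _ in range(n_reductions):
        lr = max(lr * factor, min_lr)  # never drop below min_lr
    return lr

for k in range(4):
    print(k, lr_after_reductions(2e-3, 0.7, 1e-5, k))
```

Since both callbacks here use `patience=5`, early stopping may fire at about the same time as the first reduction, so a smaller patience for `ReduceLROnPlateau` could be worth trying.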

In [None]:
# from nb: https://www.kaggle.com/nicapotato/keras-nn-tabular-regression-problem/

plot_metrics = ['loss', 'rmse']

f, ax = plt.subplots(1,2,figsize = [12,4])
for p_i,metric in enumerate(plot_metrics):
    ax[p_i].plot(hist.history[metric], label='Train ' + metric, )
    ax[p_i].plot(hist.history['val_' + metric], label='Val ' + metric)
    ax[p_i].set_title("Loss Curve - {}".format(metric))
    ax[p_i].set_ylabel(metric.title())
    ax[p_i].legend()
plt.show()

# Optimize with Keras Tuner

Here we use KerasTuner to run a hyperparameter search over our NN configuration. 

For this demo, we tune the following hyperparameters: 

- `num_layers` (`Int`): The number of hidden layers in our NN (shallow or deep NN)

- `units_i` (`Int`): The number of dense units in layer i 

- `dp_i` (`Float`): The dropout rate for layer i 

- `final_dp` (`Float`): The dropout rate applied just before the output layer

- `learning_rate` (`Float`): The learning rate for our optimizer
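
To get a feel for the size of this search space, the sketch below draws one configuration with plain uniform random sampling. It is only an illustration: the function name is hypothetical, the ranges mirror those used in the `make_model` cell, and random sampling stands in for the Bayesian optimizer that the tuner actually uses.

```python
import random

def sample_config(rng):
    """Draw one configuration (uniform random sampling, not Bayesian optimization)."""
    num_layers = rng.randint(2, 5)                                  # 2..5 inclusive
    config = {"num_layers": num_layers}
    for i in range(num_layers):
        config[f"units_{i}"] = rng.choice(range(128, 513, 64))      # 128, 192, ..., 512
        config[f"dp_{i}"] = rng.uniform(0.0, 0.5)
    config["final_dp"] = rng.uniform(0.05, 0.5)
    config["learning_rate"] = rng.uniform(1e-4, 5e-2)
    return config

rng = random.Random(0)
config = sample_config(rng)
for name, value in config.items():
    print(name, value)
```

Note that `units_i` and `dp_i` only exist for `i < num_layers`, so the number of tuned values depends on the sampled depth.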

In [None]:
import keras_tuner as kt  # current package name; older releases used `import kerastuner as kt`

def make_model(hp):
    
    num_inputs = [Input(shape=(1,), name=x) for x in num_cols]
    encoded_nums = [encode_numerical_feature(var_input, var_name, train_ds)
                   for var_input, var_name in zip(num_inputs, num_cols)]
    all_feats = Concatenate()(encoded_nums)
    x = all_feats
    
    num_layers = hp.Int('num_layers', min_value=2, max_value=5, step=1)
    for i in range(num_layers):
        units = hp.Int(f'units_{i}', min_value=128, max_value=512, step=64)
        dp = hp.Float(f'dp_{i}', min_value=0., max_value=0.5)
        x = Dropout(dp)(x)
        x = Dense(units, activation='relu')(x)
    
    dp = hp.Float('final_dp', min_value=0.05, max_value=0.5)
    x = Dropout(dp)(x)
    outputs = Dense(1, activation='linear')(x)
    model = tf.keras.Model(num_inputs, outputs)

    lr = hp.Float('learning_rate', min_value=1e-4, max_value=5e-2)
    optimizer = tf.keras.optimizers.Adam(learning_rate=lr) # 1e-3
    model.compile(loss='mse',
                  optimizer=optimizer,
                  metrics=[tf.keras.metrics.RootMeanSquaredError(name='rmse')])
    #     model.summary()
    return model



In [None]:
# set up KerasTuner

MAX_TRIALS = 10  # 5  
# Set to 5 for a quick run, but need 100+ for good results


tuner = kt.tuners.BayesianOptimization(
    make_model,
    objective=kt.Objective('val_rmse', direction="min"),  # 'val_loss',
    max_trials=MAX_TRIALS,          
    overwrite=True)

tuner.search(train_ds, 
             validation_data=val_ds, 
             callbacks=[EarlyStopping(monitor='val_loss', mode='min', patience=5)], 
             epochs=60)

## Reinstantiate the top N models and train them on the full dataset

In [None]:
def get_trained_model(hp):
    model = make_model(hp)
    # First, find the best number of epochs to train for
    callbacks=[EarlyStopping(monitor='val_rmse', mode='min', patience=5, restore_best_weights=True)]
    hist = model.fit(train_ds, validation_data=val_ds, epochs=50, callbacks=callbacks)
    val_rmse_per_epoch = hist.history['val_rmse']
    best_epoch = val_rmse_per_epoch.index(min(val_rmse_per_epoch)) + 1
    print('Best epoch: %d' % (best_epoch,))
    # Increase epochs by 20% when training on the full dataset
    model = make_model(hp)
    model.fit(full_train_ds, epochs=int(best_epoch * 1.2), verbose=0)
    return model
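
The epoch arithmetic in `get_trained_model` can be checked in isolation. The sketch below uses a made-up validation history and a hypothetical helper name; it only reproduces the "best epoch, then 20% more" rule from the cell above.

```python
import math

def full_data_epochs(val_metric_per_epoch, scale=1.2):
    """Return the best (1-based) epoch and the scaled epoch count for the full-data run."""
    best_epoch = val_metric_per_epoch.index(min(val_metric_per_epoch)) + 1
    return best_epoch, int(best_epoch * scale)

history = [0.750, 0.720, 0.710, 0.705, 0.707, 0.709]  # hypothetical val_rmse values
best_epoch, n_epochs = full_data_epochs(history)
print(best_epoch, n_epochs)            # 4 4
print(math.ceil(best_epoch * 1.2))     # 5
```

Because `int()` truncates, the 20% increase disappears when the best epoch is small. Rounding up with `math.ceil` is one possible alternative if you want the extra epoch.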

In [None]:
# select top-N models

n = 3     # e.g. n=10 for top ten models
best_hps = tuner.get_best_hyperparameters(n)

all_preds = []
for hp in best_hps:
    model = get_trained_model(hp)
    preds = model.predict(test_ds)
    all_preds.append(preds)

In [None]:
for i in range(n):
    sns.histplot(all_preds[i].ravel(), kde=True, stat="density")  # distplot is deprecated in recent seaborn
plt.show()

# Ensemble predictions from top-N models

In [None]:
preds = np.zeros(shape=(len(test), 1))
for p in all_preds:
    preds += p
preds /= len(all_preds)
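
The loop above is a plain, equally weighted average of the models' predictions. Here is the same idea as a tiny pure-Python sketch with made-up numbers. The function name is hypothetical; with NumPy arrays, `np.mean(all_preds, axis=0)` gives the same result.

```python
def average_predictions(all_preds):
    """Element-wise mean of several equally long prediction lists."""
    n_models = len(all_preds)
    return [sum(values) / n_models for values in zip(*all_preds)]

# hypothetical predictions from three models for three test rows
model_a = [7.0, 8.0, 9.0]
model_b = [8.0, 8.0, 6.0]
model_c = [9.0, 8.0, 9.0]

blend = average_predictions([model_a, model_b, model_c])
print(blend)  # [8.0, 8.0, 8.0]
```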

# Submit

In [None]:
sub['target'] = preds.ravel()  # flatten the (n, 1) prediction array to 1-D
sub.to_csv('nn_model.csv', index=False)

print('Submit!')
sub.head(10)

In [None]:
sub['target'].plot(figsize=(16,4));

In [None]:
sub['target'].iloc[:1000].plot(figsize=(16,4));

In [None]:
sns.histplot(sub['target'], kde=True, stat="density")  # distplot is deprecated in recent seaborn

### WiP - The notebook will be updated regularly with new tasks if there is interest. 

### Feel free to ask any questions you might have in the comments below 

Next steps: 

- Try different state-of-the-art optimizers

- Try different LR schedulers 

- Tune different sets of hyperparams 

