# Neural Network Experimentation

__Note: since there is no predefined RMSE loss function in Keras, this notebook optimizes the MSE only.__
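
Because the square root is monotonically increasing, the predictions that minimize the MSE over a dataset also minimize the RMSE, so optimizing the MSE gives the same optimum. The RMSE can be reported afterwards as the square root of the MSE. Gradient magnitudes differ between the two losses, so training dynamics are not identical. The NumPy check below illustrates that both metrics rank models the same way; the toy targets and candidate predictions are made up for illustration.

```python
import numpy as np


def mse(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Mean squared error."""
    return float(np.mean((y_true - y_pred) ** 2))


def rmse(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Root mean squared error, i.e. the square root of the MSE."""
    return float(np.sqrt(mse(y_true, y_pred)))


# Toy targets and predictions from three hypothetical candidate models.
y_true = np.array([0.0, 1.0, 2.0, 3.0])
candidates = {
    "a": np.array([0.1, 0.9, 2.2, 2.7]),
    "b": np.array([0.5, 1.5, 1.5, 3.5]),
    "c": np.array([0.0, 1.0, 2.0, 3.1]),
}

# sqrt is monotonic, so sorting by MSE and by RMSE yields the same order.
by_mse = sorted(candidates, key=lambda k: mse(y_true, candidates[k]))
by_rmse = sorted(candidates, key=lambda k: rmse(y_true, candidates[k]))
print(by_mse, by_rmse)  # → ['c', 'a', 'b'] ['c', 'a', 'b']
```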

In [None]:
import sys
sys.path.append("..")

import numpy as np
import pandas as pd

from nn_predictors import MultiLayerPerceptron
import keras

x_train = pd.read_pickle("../data/x_train.pkl") 
y_train = pd.read_pickle("../data/y_train.pkl") 
x_test = pd.read_pickle("../data/x_test.pkl")

# fill NaNs
x_train.fillna(0, inplace=True)
y_train.fillna(0, inplace=True)
x_test.fillna(0, inplace=True)

# Save the dataset ids
id_x_train = x_train['fullVisitorId']
id_y_train = y_train['fullVisitorId']
id_x_test = x_test['fullVisitorId']

# Drop fullVisitorId from the features (it may be worth keeping it as a one-hot-encoded feature instead)
del x_train['fullVisitorId']
del y_train['fullVisitorId']
del x_test['fullVisitorId']

### Data Preparation
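Some final preparations:
- drop the `category__other category` and `category__0` columns
- log-transform the revenue features and the target revenue, using log(x + 1) so that zero revenue maps to 0
- scale the other features by their maximum on the training set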
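
The three steps can be sketched on a toy pandas frame. The column names below are hypothetical stand-ins; only the rule of detecting revenue columns by the substring `"transactionRevenue"` comes from this notebook. The important detail is that the scaling statistic (the maximum) is computed on the training data only and then applied unchanged to the test data.

```python
import numpy as np
import pandas as pd

# Toy frames; column names are hypothetical stand-ins for the real features.
x_train = pd.DataFrame({
    "totals.transactionRevenue": [0.0, 99.0, 0.0],
    "pageviews": [1.0, 4.0, 8.0],
    "category__0": [1, 0, 0],
})
x_test = pd.DataFrame({
    "totals.transactionRevenue": [9.0, 0.0],
    "pageviews": [2.0, 16.0],
    "category__0": [0, 1],
})

# 1) Drop columns that should not be used as features.
x_train = x_train.drop(columns=["category__0"])
x_test = x_test.drop(columns=["category__0"])

for col in x_train.columns:
    if "transactionRevenue" in col:
        # 2) log(x + 1) compresses heavy-tailed revenue and maps 0 to 0.
        x_train[col] = np.log1p(x_train[col])
        x_test[col] = np.log1p(x_test[col])
    else:
        # 3) Scale by the *training* maximum so both sets share one scale.
        col_max = x_train[col].max() + 1e-6
        x_train[col] = x_train[col] / col_max
        x_test[col] = x_test[col] / col_max

print(x_train)
print(x_test)
```

Since the maximum comes from the training set, scaled test values can exceed 1 (here the test `pageviews` value of 16 becomes roughly 2).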

In [None]:
n_buyers = int((y_train.values > 0).sum())
n_total = len(y_train)
print("People who buy: {} out of {} (proportion: {:.4f})".format(n_buyers, n_total, n_buyers / n_total))

to_drop = ['category__other category', 'category__0']
x_train.drop(labels=to_drop, axis=1, inplace=True)
x_test.drop(labels=to_drop, axis=1, inplace=True)

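for col in x_train.columns:
    if "transactionRevenue" in col:
        # log(x + 1) transform for the heavy-tailed revenue features
        x_train[col] = np.log1p(x_train[col])
        x_test[col] = np.log1p(x_test[col])
    else:
        # scale by the training-set maximum (computed once, before x_train is modified)
        col_max = x_train[col].max() + 1e-6
        x_train[col] = x_train[col] / col_max
        x_test[col] = x_test[col] / col_max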

y_train = np.log(y_train + 1)
assert (x_train.columns == x_test.columns).all()

### Simple train and evaluate

The `MultiLayerPerceptron` class can fit a single model and then predict with it or evaluate it. Note that the hyperparameter values are passed as a dictionary.

In [None]:
mlp = MultiLayerPerceptron(x_train, y_train, loss='mean_squared_error', nfolds=3, prediction_type="regression")

params = {'learning_rate': 0.0001,
          'lr_decay': 0.0,
          'dropout_rate': 0.1,
          'neurons': 32,
          'hidden_layers': 4,
          'batch_size': 64,
          'epochs': 10,
          'activation': 1,
          'optimizer': 1
          }

mlp.fit(params, stratified_batches=True)
# Note: this evaluates on the training data, so the result is an in-sample (training) loss.
loss = mlp.evaluate(x_train, y_train)
print("Training-set loss: {}".format(loss))

### Tune with HORD

The class has built-in methods to tune hyperparameters with the method from ["Efficient Hyperparameter Optimization of Deep Learning Algorithms Using Deterministic RBF Surrogates"](https://ilija139.github.io/pub/aaai-17.pdf) (AAAI-17) by Ilija Ilievski, Taimoor Akhtar, Jiashi Feng, and Christine Annette Shoemaker.

[arXiv](https://arxiv.org/abs/1607.08316) -- [PDF](https://ilija139.github.io/pub/aaai-17.pdf) -- [Supplement](https://ilija139.github.io/pub/aaai-17-sup.pdf) -- [Poster](https://ilija139.github.io/pub/aaai-17_poster.pdf)

The implementation is also inspired by the authors' [GitHub repository](https://github.com/ilija139/HORD). Hyperparameter Optimization via RBF and Dynamic coordinate search (HORD) is designed specifically for tuning deep learning models with many hyperparameters, and the authors report that it performs well on models with up to 19 hyperparameters.
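
The core loop of HORD-style search can be sketched in NumPy. It fits a radial basis function (RBF) surrogate to the hyperparameter settings evaluated so far, then generates candidates by perturbing a random subset of coordinates of the best point found. This is the "dynamic coordinate search", where the chance of perturbing each coordinate shrinks as evaluations accumulate. Finally, it evaluates the candidate that best trades off a low surrogate prediction against distance from already-evaluated points. This is a simplified illustration under assumptions of our own: a cubic RBF with a linear tail, a unit hypercube search space, a fixed perturbation size, a fixed cycle of trade-off weights, and a cheap toy objective standing in for a validation loss. It is not the HORD reference implementation, nor the notebook's `tune_with_HORD`; the helper names `fit_cubic_rbf` and `hord_like_search` are hypothetical.

```python
import numpy as np


def fit_cubic_rbf(X, f):
    """Fit s(x) = sum_i lam_i * ||x - X_i||^3 + c0 + c^T x to the points (X, f)."""
    n, d = X.shape
    phi = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1) ** 3
    P = np.hstack([np.ones((n, 1)), X])  # linear polynomial tail
    A = np.block([[phi, P], [P.T, np.zeros((d + 1, d + 1))]])
    rhs = np.concatenate([f, np.zeros(d + 1)])
    coef = np.linalg.lstsq(A, rhs, rcond=None)[0]
    lam, c = coef[:n], coef[n:]

    def surrogate(Y):
        r = np.linalg.norm(Y[:, None, :] - X[None, :, :], axis=-1)
        return r ** 3 @ lam + c[0] + Y @ c[1:]

    return surrogate


def hord_like_search(objective, dim, n_init=8, max_evals=60, n_cand=200, seed=0):
    """Minimize `objective` over [0, 1]^dim with an RBF surrogate and DYCORS-like candidates."""
    rng = np.random.default_rng(seed)
    X = rng.random((n_init, dim))  # initial design (plain random sampling here)
    f = np.array([objective(x) for x in X])
    sigma = 0.2  # fixed perturbation size (a simplification)
    while len(f) < max_evals:
        surrogate = fit_cubic_rbf(X, f)
        best = X[np.argmin(f)]
        # Dynamic coordinate search: perturb fewer coordinates as evaluations grow.
        k = len(f) - n_init
        p = max(min(20.0 / dim, 1.0) * (1.0 - np.log(k + 1) / np.log(max_evals)), 1.0 / dim)
        mask = rng.random((n_cand, dim)) < p
        mask[np.arange(n_cand), rng.integers(dim, size=n_cand)] = True  # perturb at least one coordinate
        cand = np.clip(best + mask * rng.normal(0.0, sigma, (n_cand, dim)), 0.0, 1.0)
        # Score: normalized surrogate value vs. normalized closeness to evaluated points.
        s = surrogate(cand)
        dist = np.min(np.linalg.norm(cand[:, None, :] - X[None, :, :], axis=-1), axis=1)
        s_n = (s - s.min()) / (np.ptp(s) + 1e-12)
        d_n = (dist.max() - dist) / (np.ptp(dist) + 1e-12)
        w = [0.3, 0.5, 0.8, 0.95][k % 4]  # cycle from exploration to exploitation
        x_new = cand[np.argmin(w * s_n + (1 - w) * d_n)]
        X = np.vstack([X, x_new])
        f = np.append(f, objective(x_new))
    i = int(np.argmin(f))
    return X[i], f[i]


# Usage on a toy objective whose minimum is at (0.3, 0.3).
best_x, best_f = hord_like_search(lambda x: float(np.sum((x - 0.3) ** 2)), dim=2)
print(best_x, best_f)
```

In a real tuner, each coordinate of `x` would have to be mapped to a hyperparameter (for example rescaled to a learning-rate range, or rounded to an integer number of `neurons`), and `objective` would train a model and return its validation loss; that mapping is not shown here.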

The class implements this functionality with a simple method `.tune_with_HORD(max_evaluations=...)`.
Try it below.

In [None]:
mlp = MultiLayerPerceptron(x_train, y_train, loss='mean_squared_error', nfolds=5, prediction_type="regression",
                           stratify_labels=np.array(y_train > 0, dtype=int))

mlp.tune_with_HORD(max_evaluations=100, log=True, log_path="../results/")
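
Here `stratify_labels=np.array(y_train > 0, dtype=int)` marks buyers versus non-buyers. With only a small proportion of buyers, plain random folds could end up with noticeably different buyer rates. A minimal NumPy sketch of stratified k-fold assignment follows; the helper `stratified_folds` is hypothetical, and `MultiLayerPerceptron` may split the data differently.

```python
import numpy as np


def stratified_folds(labels, nfolds, seed=0):
    """Return a fold index in [0, nfolds) for every sample, balancing each label across folds."""
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    folds = np.empty(len(labels), dtype=int)
    for label in np.unique(labels):
        idx = np.flatnonzero(labels == label)
        rng.shuffle(idx)
        # Deal this label's samples round-robin so every fold gets a near-equal share.
        folds[idx] = np.arange(len(idx)) % nfolds
    return folds


# Toy example: 3% "buyers", mimicking an imbalanced revenue target.
labels = np.array([1] * 30 + [0] * 970)
folds = stratified_folds(labels, nfolds=5)
for k in range(5):
    in_fold = folds == k
    print(k, in_fold.sum(), labels[in_fold].mean())
```

Each of the 5 folds receives 6 of the 30 buyers and 194 of the 970 non-buyers. Fold `k` then serves as the validation set while a model is trained on `folds != k`.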