# Purpose

This notebook works through an example workflow using [hyperopt](http://hyperopt.github.io/hyperopt/) to tune a Keras model, while tracking the results with MLflow.

# Data

The California housing data will be used for this example, which is a simple regression task. It will be loaded with the `sklearn` data loader.  I'll split off 20% into a test set, then 20% of the remainder (16% of the full data) into a validation set.  Finally, I'll standardize the features using `StandardScaler` ahead of modeling.

In [1]:
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
import pandas as pd

housing = fetch_california_housing()

X_train_full, X_test, y_train_full, y_test = train_test_split(
    housing.data, housing.target, test_size=0.2
)
X_train, X_valid, y_train, y_valid = train_test_split(
    X_train_full, y_train_full, test_size=0.2
)

scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_valid = scaler.transform(X_valid)
X_test = scaler.transform(X_test)

pd.DataFrame(X_train, columns=housing.feature_names).describe()

| | MedInc | HouseAge | AveRooms | AveBedrms | Population | AveOccup | Latitude | Longitude |
|---|---|---|---|---|---|---|---|---|
| count | 13209.0 | 13209.0 | 13209.0 | 13209.0 | 13209.0 | 13209.0 | 13209.0 | 13209.0 |
| mean | 1.951585e-15 | -1.221086e-16 | -1.939751e-15 | -1.531414e-14 | 5.648193e-17 | -8.122640e-17 | 2.069821e-14 | 6.596014e-14 |
| std | 1.000038 | 1.000038 | 1.000038 | 1.000038 | 1.000038 | 1.000038 | 1.000038 | 1.000038 |
| min | -1.780359 | -2.195702 | -1.868116 | -1.783648 | -1.259356 | -0.1968599 | -1.438172 | -2.398818 |
| 25% | -0.6873717 | -0.8438056 | -0.4103394 | -0.2085559 | -0.5651305 | -0.05433696 | -0.7922602 | -1.106712 |
| 50% | -0.1752336 | 0.03095082 | -0.07775169 | -0.1075282 | -0.2264624 | -0.02285411 | -0.6424836 | 0.5409723 |
| 75% | 0.4673025 | 0.6671373 | 0.2605153 | 0.009533632 | 0.2686508 | 0.01508449 | 0.9769765 | 0.7763558 |
| max | 5.877788 | 1.859987 | 56.19934 | 57.47692 | 30.45367 | 101.7102 | 2.961517 | 2.509179 |
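
Because `train_test_split` is applied twice with `test_size=0.2`, the second split takes 20% of what remains rather than 20% of the full dataset. A quick sketch of the resulting fractions, using exact arithmetic (the variable names here are only for illustration):

```python
from fractions import Fraction

test_frac = Fraction(1, 5)           # first split: test_size=0.2 of all rows
valid_frac_of_rest = Fraction(1, 5)  # second split: test_size=0.2 of the remainder

# Express the validation and training sets as fractions of the full dataset
valid_frac = (1 - test_frac) * valid_frac_of_rest        # 4/25
train_frac = (1 - test_frac) * (1 - valid_frac_of_rest)  # 16/25

for name, frac in [("train", train_frac), ("valid", valid_frac), ("test", test_frac)]:
    print(f"{name}: {frac} of the data ({float(frac):.0%})")
```

With the 20,640 rows of the California housing data, 64% is roughly 13,210 rows, which is consistent with the count of 13,209 in the summary above.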


In [4]:
train_data = (X_train, {"main_output": y_train, "aux_output": y_train})
val_data = (X_valid, {"main_output": y_valid, "aux_output": y_valid})

# Model

The model used for this example will be a wide and deep network with the following characteristics:
- a deep path with `n_hidden` hidden layers with `n_neurons` at each layer
- a wide path connecting all inputs to the output
- all layers are fully connected
- two outputs:
    - one from the deep path alone, fit to the target
    - one from the concatenated wide and deep paths, fit to the target

This type of multi-output architecture is usually used as a regularization technique, but I'm simply employing it here so my example has more than one loss to minimize simultaneously.  This model is very similar to the regression example I used in my [intro to Keras](https://github.com/mcnewcp/book-geron-ml-sklearn-keras-tensorflow/blob/main/10-intro-ann-keras/10-intro-ann-keras.ipynb) notebook and to the one in Chapter 10 of [Hands on ML](https://www.oreilly.com/library/view/hands-on-machine-learning/9781492032632/), so I won't explain the Keras code below.

The model build code should be wrapped in a function so that the hyperparameters become arguments to the build and compile steps, ready for integration into hyperparameter tuning.  I'm pulling out the following hyperparameters for tuning:
- `n_hidden`: number of hidden layers
- `n_neurons`: number of neurons per layer
- `activation`: activation function used in hidden layers

*Note*: I'm not tuning learning rate here.  In general I think it's best practice to choose a sufficiently low learning rate, high number of epochs, and use early stopping.  The goal of this stage of hyperparameter tuning is to simply identify promising model candidates.  Once promising candidates have been identified, the learning rate will be fine tuned.
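
The early stopping rule mentioned in the note can be sketched without any library. In Keras this is normally handled by the `keras.callbacks.EarlyStopping` callback; the function below, `early_stopping_epoch`, is a hypothetical helper that only mimics the patience logic, and the validation losses are made up for illustration.

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return (stop_epoch, best_epoch) for a patience-based early stopping rule.

    Training stops once `patience` consecutive epochs have passed
    without the validation loss improving on its best value so far.
    """
    best_loss = float("inf")
    best_epoch = -1
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch  # new best, reset the wait
        elif epoch - best_epoch >= patience:
            return epoch, best_epoch  # waited long enough, stop here
    return len(val_losses) - 1, best_epoch  # ran out of epochs first


# Hypothetical validation losses, one per epoch
example_val_losses = [1.0, 0.8, 0.7, 0.72, 0.71, 0.73, 0.74]
stop_epoch, best_epoch = early_stopping_epoch(example_val_losses, patience=3)
print(f"stopped at epoch {stop_epoch}, best epoch was {best_epoch}")
```

This is why a high epoch count is safe: the budget is an upper limit, and training ends as soon as the validation loss stops improving.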

In [2]:
import tensorflow as tf
from tensorflow import keras

print("tf version:", tf.__version__, ", keras version:", keras.__version__)



2023-02-15 16:01:54.831442: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.


tf version: 2.11.0 , keras version: 2.11.0


In [3]:
def build_model(n_hidden=1, n_neurons=20, activation="relu"):
    inp = keras.layers.Input(shape=[8], name="input")  # input layer
    hl = inp  # the deep path starts from the inputs
    for _ in range(n_hidden):  # sequentially add hidden layers
        hl = keras.layers.Dense(n_neurons, activation=activation)(hl)
    concat = keras.layers.Concatenate()([hl, inp])  # concat deep and wide paths
    main_output = keras.layers.Dense(1, name="main_output")(concat)  # combined output
    aux_output = keras.layers.Dense(1, name="aux_output")(hl)  # deep output
    model = keras.Model(inputs=[inp], outputs=[main_output, aux_output])
    model.compile(
        loss={"main_output": "mse", "aux_output": "mse"},
        loss_weights={
            "main_output": 0.9,
            "aux_output": 0.1,
        },  # weighting heavily towards main output
        optimizer=keras.optimizers.SGD(learning_rate=1e-3),
    )
    return model
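
With `loss_weights` set, Keras minimizes the weighted sum of the per-output losses. A library-free sketch of that arithmetic follows; the per-output loss values are made up purely for illustration, and `weighted_total_loss` is a hypothetical helper, not a Keras function.

```python
# Weights copied from the compile step above
loss_weights = {"main_output": 0.9, "aux_output": 0.1}

# Hypothetical per-output MSE values, for illustration only
example_losses = {"main_output": 0.40, "aux_output": 0.60}


def weighted_total_loss(losses, weights):
    """Combine per-output losses into the single scalar that training minimizes."""
    return sum(weights[name] * loss for name, loss in losses.items())


total = weighted_total_loss(example_losses, loss_weights)
print(f"total loss: {total:.2f}")
```

Since the main output carries 90% of the weight, improvements there move the total far more than the same improvement on the auxiliary output.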

# Hyperopt with Validation Split

[Hyperopt](http://hyperopt.github.io/hyperopt/) is an optimization library commonly used to tune hyperparameters during model experimentation, though it is an entirely general package.  It will search through an arbitrarily complex search space to minimize an objective function.  The steps of using hyperopt include:
1. define objective function
2. define search space
3. run the minimization

## Objective Function

Hyperopt works by changing values in your hyperparameter space and evaluating the objective function to receive a score (`loss`).  It then investigates promising areas of the search space more thoroughly.  The objective function should take in the chosen values of hyperparameters and output a loss for minimization and a status.  It can also output anything else you'd like to log in the trials object, but I won't include anything else here.

In [12]:
from utils import fit_eval_log
from hyperopt import STATUS_OK

def objective(hyper_params):
    model = build_model(**hyper_params)
    run_name = "test-hp"
    mean_val_loss = fit_eval_log(
        run_name=run_name,
        train_data=train_data,
        val_data=val_data,
        model=model,
        hyper_params=hyper_params,
    )
    return {"loss": mean_val_loss, "status": STATUS_OK}


## Search Space

The hyperparameter search space needs to be set up in a way that tells hyperopt not only the bounds of the hyperparameters but also how to choose values between those bounds.  This is done by using the most relevant parameter expression from `hyperopt.hp`.  The [documentation](http://hyperopt.github.io/hyperopt/getting-started/search_spaces/) lists all options, but the most relevant in my experience are:
- `hp.choice()` chooses an option from a supplied list
- `hp.uniform()` samples a continuous value between a lower and upper bound
- `hp.quniform()` samples a value between a lower and upper bound, rounded to a multiple of the step `q`

*Note*: `hp.quniform()` returns floats (e.g. `3.0` rather than `3`), which breaks code that expects integers, such as `range(n_hidden)`.  This is resolved by wrapping it in `scope.int()` from `hyperopt.pyll`.

Below, I'll set up the search space for my example which includes number of neurons per layer, number of hidden layers, and activation function.

In [8]:
from hyperopt import hp
from hyperopt.pyll import scope

hyper_params = {
    "n_hidden": scope.int(hp.quniform("n_hidden", 1, 10, 1)),
    "n_neurons": scope.int(hp.quniform("n_neurons", 3, 50, 1)),
    "activation": hp.choice("activation", ["relu", "sigmoid", "tanh"]),
}


## Minimize Objective

Now to run the optimization, I'll use `fmin()` with `tpe.suggest`, hyperopt's Tree-structured Parzen Estimator (TPE) search algorithm.  You simply supply the objective function along with the search space and tell hyperopt how many trials you want to run.  It returns the best hyperparameter values found, and the history of all trials is recorded in the `Trials()` object passed in.

In [14]:
from hyperopt import Trials, fmin, tpe
import mlflow

mlflow.set_experiment(experiment_name='test-hp')
mlflow.tensorflow.autolog(silent=True)

trials = Trials()
best = fmin(
    fn=objective,
    space=hyper_params,
    algo=tpe.suggest,
    max_evals=10,
    trials=trials,
    verbose=1,
)



  0%|          | 0/10 [00:17<?, ?trial/s, best loss=?]
 10%|█         | 1/10 [00:45<04:35, 30.60s/trial, best loss: 0.548612654209137]
 20%|██        | 2/10 [01:17<03:41, 27.73s/trial, best loss: 0.5344837307929993]
 30%|███       | 3/10 [01:42<03:23, 29.13s/trial, best loss: 0.4567939043045044]
 40%|████      | 4/10 [02:04<02:42, 27.06s/trial, best loss: 0.4567939043045044]
 50%|█████     | 5/10 [02:32<02:06, 25.29s/trial, best loss: 0.4567939043045044]
 60%|██████    | 6/10 [02:59<01:46, 26.74s/trial, best loss: 0.4567939043045044]
 70%|███████   | 7/10 [03:24<01:19, 26.62s/trial, best loss: 0.4567939043045044]
 80%|████████  | 8/10 [03:46<00:51, 25.58s/trial, best loss: 0.4567939043045044]
 90%|█████████ | 9/10 [04:15<00:24, 24.77s/trial, best loss: 0.4567939043045044]
100%|██████████| 10/10 [04:25<00:00, 26.53s/trial, best loss: 0.4567939043045044]


## Evaluate Results

The object returned by the optimization above contains some info on