# Purpose

This notebook works through an example workflow of tracking Keras experiments using [MLFlow](https://www.mlflow.org).  

# Data

The California housing data will be used for this example, which is a simple regression task. It will be loaded from the `sklearn` data loader.  I'll split off 20% into a test set and an additional 20% of the remainder into a validation set.  Finally, I'll standardize the data using `StandardScaler` ahead of modeling.
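
To make the proportions concrete: the second 20% is taken from the remaining 80%, so the final split is roughly 64/16/20.  Here is a minimal sketch of that arithmetic, assuming the dataset has 20,640 rows and that `train_test_split` rounds the held-out count up (my understanding of scikit-learn's behavior).  The helper name `split_sizes` is hypothetical.

```python
import math


def split_sizes(n_samples: int, test_size: float) -> tuple[int, int]:
    """Return (n_kept, n_held_out), assuming the held-out count is rounded up."""
    n_held_out = math.ceil(n_samples * test_size)
    return n_samples - n_held_out, n_held_out


n_total = 20640  # assumed row count of the California housing dataset
n_train_full, n_test = split_sizes(n_total, 0.2)
n_train, n_valid = split_sizes(n_train_full, 0.2)

print(n_train, n_valid, n_test)
# fraction of the full dataset in each split
print([round(n / n_total, 2) for n in (n_train, n_valid, n_test)])
```

The training count this produces can be checked against the `count` row of the summary table printed by the next cell.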

In [1]:
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
import pandas as pd

housing = fetch_california_housing()

X_train_full, X_test, y_train_full, y_test = train_test_split(
    housing.data, housing.target, test_size=0.2
)
X_train, X_valid, y_train, y_valid = train_test_split(
    X_train_full, y_train_full, test_size=0.2
)

scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_valid = scaler.transform(X_valid)
X_test = scaler.transform(X_test)

pd.DataFrame(X_train, columns=housing.feature_names).describe()


|  | MedInc | HouseAge | AveRooms | AveBedrms | Population | AveOccup | Latitude | Longitude |
|---|---|---|---|---|---|---|---|---|
| count | 13209.0 | 13209.0 | 13209.0 | 13209.0 | 13209.0 | 13209.0 | 13209.0 | 13209.0 |
| mean | -5.310378e-15 | 1.308498e-16 | 2.609196e-15 | 5.782674e-17 | 6.078532e-17 | 1.288057e-15 | 3.642117e-14 | -3.56791e-13 |
| std | 1.000038 | 1.000038 | 1.000038 | 1.000038 | 1.000038 | 1.000038 | 1.000038 | 1.000038 |
| min | -1.772628 | -2.193979 | -1.712899 | -1.465095 | -1.273094 | -0.1836009 | -1.449869 | -2.392963 |
| 25% | -0.6883496 | -0.8470288 | -0.371557 | -0.1767937 | -0.5741628 | -0.05380296 | -0.7985255 | -1.11098 |
| 50% | -0.1788321 | 0.02452731 | -0.07644289 | -0.09453273 | -0.2278374 | -0.02349515 | -0.648576 | 0.5415767 |
| 75% | 0.4498437 | 0.6583863 | 0.2287333 | 0.002691922 | 0.2656315 | 0.01151715 | 0.9774388 | 0.7819486 |
| max | 5.832215 | 1.846872 | 50.92312 | 63.20694 | 24.35409 | 95.86278 | 2.959584 | 2.554691 |
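
The `std` row reads 1.000038 rather than exactly 1.  A likely explanation, assuming `StandardScaler` divides by the population standard deviation (ddof=0) while `DataFrame.describe()` reports the sample standard deviation (ddof=1), is a factor of sqrt(n / (n - 1)).  A quick check of that assumption:

```python
import math

n = 13209  # number of training rows, taken from the count row above
# sample std (ddof=1) of a column whose population std (ddof=0) is exactly 1
ratio = math.sqrt(n / (n - 1))
print(round(ratio, 6))
```

If the assumption holds, the printed value should agree with the `std` row, so the small deviation from 1 is a reporting convention rather than a scaling problem.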


# Model

The model used for this example will be a wide and deep network with the following characteristics:
- a deep path with `n_hidden` hidden layers with `n_neurons` at each layer
- a wide path connecting all inputs to the output
- all layers are fully connected
- two outputs:
    - one from the deep path alone, fit to the target
    - one from the concatenated wide and deep paths, fit to the target
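
To get a feel for how `n_hidden` and `n_neurons` drive model size, the trainable parameter count of this architecture can be worked out by hand.  The sketch below assumes 8 input features (the housing data's feature count) and fully connected layers with bias terms, which I believe is the Keras default; `count_params` is a hypothetical helper, not part of the notebook.

```python
def count_params(n_hidden: int, n_neurons: int, n_inputs: int = 8) -> int:
    """Trainable parameters of the wide and deep network described above."""
    total = 0
    fan_in = n_inputs
    for _ in range(n_hidden):
        total += fan_in * n_neurons + n_neurons  # weights + biases of one hidden layer
        fan_in = n_neurons
    total += (fan_in + n_inputs) + 1  # output on the concatenated deep + wide paths
    total += fan_in + 1  # output on the deep path alone
    return total


print(count_params(n_hidden=1, n_neurons=20))
print(count_params(n_hidden=3, n_neurons=30))
```

Even the largest configuration considered here stays in the low thousands of parameters, which is why a grid search over these values is cheap.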

This type of multi-output architecture is usually used as a regularization technique, but I'm simply employing it here so my example has more than one loss to simultaneously minimize.  This model is very similar to the regression example I used in my [intro to Keras](https://github.com/mcnewcp/book-geron-ml-sklearn-keras-tensorflow/blob/main/10-intro-ann-keras/10-intro-ann-keras.ipynb) notebook and from Chapter 10 of [Hands on ML](https://www.oreilly.com/library/view/hands-on-machine-learning/9781492032632/), so I won't explain the Keras code below.

The model build code should be functionalized so that the hyperparameters are generalized in the build and compile steps for integration into hyperparameter tuning.  I'm pulling out the following hyperparameters for tuning:
- `n_hidden`: number of hidden layers
- `n_neurons`: number of neurons per layer
- `activation`: activation function used in hidden layers

*Note*: I'm not tuning learning rate here.  In general, I think it's best practice to choose a sufficiently low learning rate and a high number of epochs, and to use early stopping.  The goal of this stage of hyperparameter tuning is simply to identify promising model candidates.  Once promising candidates have been identified, the learning rate will be fine-tuned.
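
The interplay between a generous epoch budget and early stopping can be sketched in plain Python.  This is a simplified illustration of patience-based stopping, not Keras's actual implementation; the function name and the loss values are made up.

```python
def simulate_early_stopping(val_losses, patience=5):
    """Return (stop_epoch, best_epoch) for a sequence of per-epoch validation losses."""
    best_loss = float("inf")
    best_epoch = None
    wait = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                # training halts here; the weights from best_epoch would be restored
                return epoch, best_epoch
    return len(val_losses) - 1, best_epoch  # epoch budget ran out first


losses = [1.00, 0.80, 0.70, 0.75, 0.72, 0.71, 0.74, 0.73, 0.90]  # made-up curve
stop_epoch, best_epoch = simulate_early_stopping(losses, patience=5)
print(stop_epoch, best_epoch)
```

With a large epoch budget the stopping point is decided by the validation curve rather than by the budget, which is what makes the epoch count safe to leave untuned.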


In [2]:
import tensorflow as tf
from tensorflow import keras

print("tf version:", tf.__version__, ", keras version:", keras.__version__)



2023-02-15 13:14:35.634366: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.


tf version: 2.11.0 , keras version: 2.11.0


In [3]:
def build_model(n_hidden=1, n_neurons=20, activation="relu"):
    inp = keras.layers.Input(shape=[8], name="input")  # input layer
    hl = inp  # the deep path starts at the input
    for _ in range(n_hidden):  # sequentially add hidden layers
        hl = keras.layers.Dense(n_neurons, activation=activation)(hl)
    concat = keras.layers.Concatenate()([hl, inp])  # concat deep and wide paths
    main_output = keras.layers.Dense(1, name="main_output")(concat)  # combined output
    aux_output = keras.layers.Dense(1, name="aux_output")(hl)  # deep output
    model = keras.Model(inputs=[inp], outputs=[main_output, aux_output])
    model.compile(
        loss={"main_output": "mse", "aux_output": "mse"},
        loss_weights={
            "main_output": 0.9,
            "aux_output": 0.1,
        },  # weighting heavily towards main output
        optimizer=keras.optimizers.SGD(learning_rate=1e-3),
    )
    return model


# MLFlow Experiment Tracking

[MLFlow](https://www.mlflow.org) is a full-featured, end-to-end ML lifecycle management platform, but all I'll be using it for in this example is experiment tracking.  The [documentation on tracking](https://www.mlflow.org/docs/latest/tracking.html) is quite good, so I'm working primarily from that.  There is even an automatic logging submodule for Keras and TensorFlow, `mlflow.tensorflow.autolog()`, which I will try out first.  In addition, nearly anything can be logged manually, including metrics, parameters, tags, and artifacts.  Artifacts can be nearly anything, including plots or the models themselves.

## Auto-Logging

First, I'll give the auto logging a shot and see what it logs.  I've had issues with the auto logging submodule for scikit-learn because it simply logged too many parameters to be useful, and once the experiment count reached a certain threshold, performance in the dashboard tool suffered.

**Note**: with the MLFlow version I used, the auto-logging submodule only works for TensorFlow versions 2.3.0 - 2.11.0, which I had to specify manually with `pip` because the `conda` installer chose a version outside of that range.

By default, on first execution, MLFlow creates the directory `./mlruns` and stores all experiment-related information as individual files within it.  Other options are to store the information in a SQLite database or to integrate with Databricks.

In [4]:
import mlflow

mlflow.set_experiment(experiment_name="auto-log")  # this will create ./mlruns
mlflow.tensorflow.autolog()  # turn on auto logging

model = build_model(n_hidden=2)

with mlflow.start_run(run_name="auto-log-1") as run:
    history = model.fit(
        X_train,
        {"main_output": y_train, "aux_output": y_train},
        epochs=250,
        validation_data=(X_valid, {"main_output": y_valid, "aux_output": y_valid}),
        callbacks=[
            keras.callbacks.EarlyStopping(patience=5, restore_best_weights=True)
        ],
        verbose=0,
    )
    total_loss, main_loss, aux_loss = model.evaluate(
        X_test, {"main_output": y_test, "aux_output": y_test}
    )



2023/02/15 13:14:41 INFO mlflow.tracking.fluent: Experiment with name 'auto-log' does not exist. Creating a new experiment.

INFO:tensorflow:Assets written to: /var/folders/m8/0_prp1tj41s9n5xm0bfqp6wm0000gn/T/tmpmrosk6g6/model/data/model/assets




## Local Server

To serve the exploration tool locally, you simply run the following in the command line:

```zsh
mlflow ui
```

By default, this will look for locally logged files in `./mlruns` and launch on port 5000.  If you need to change the location, use `--backend-store-uri`; if you need to specify the port, use `-p`.

The UI provides a simple table comparison of all runs within an experiment so you can quickly check the parameters used for each run and corresponding metrics to choose promising model candidates.  

It looks like the auto logger logs a lot of useful information, including many parameters inferred from the model, early stopping results, and learning curves for each loss (see below).  Unsurprisingly, though, it doesn't log anything about model architecture.  That will be important in my case since most of my tuning will involve architecture changes, so these will need to be logged manually.

![MLFLow UI Screenshot](images/mlflow_ui_sn.png)

## Manual Logging

It's quite easy to log additional hyperparameters via `mlflow.log_params()`.  It accepts a dictionary of parameters and logs them to the corresponding run, as long as it's called under `with mlflow.start_run():`.  This means I'll need to define my hyperparameters in a dictionary at the start of my run, which will also aid in integrating optimization later.  I'll modify the above workflow to include both of these changes below.
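
Because the same dictionary is both logged and unpacked into the build function, a misspelled key would otherwise only surface as a `TypeError` at build time.  One way to catch that earlier is sketched below.  Both `build_model_stub` (a stand-in with the same signature as the notebook's build function) and `check_hyper_params` are hypothetical names used only for illustration.

```python
import inspect


def build_model_stub(n_hidden=1, n_neurons=20, activation="relu"):
    """Stand-in with the same signature as the notebook's build function."""
    return {"n_hidden": n_hidden, "n_neurons": n_neurons, "activation": activation}


def check_hyper_params(build_fn, hyper_params: dict) -> None:
    """Raise early if a key would not be accepted by build_fn(**hyper_params)."""
    accepted = set(inspect.signature(build_fn).parameters)
    unknown = set(hyper_params) - accepted
    if unknown:
        raise ValueError(f"unexpected hyperparameters: {sorted(unknown)}")


# passes silently: every key matches a parameter name
check_hyper_params(
    build_model_stub, {"n_hidden": 1, "n_neurons": 20, "activation": "relu"}
)

# a misspelled key is reported before any run is started
try:
    check_hyper_params(build_model_stub, {"n_hidden": 1, "neurons": 20})
except ValueError as err:
    print(err)
```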

In [5]:
hyper_params = {"n_hidden": 1, "n_neurons": 20, "activation": "relu"}

mlflow.set_experiment(experiment_name="auto-and-manual")
mlflow.tensorflow.autolog()  # turn on auto logging

model = build_model(**hyper_params)  # names must match

with mlflow.start_run(run_name="run-1") as run:
    history = model.fit(
        X_train,
        {"main_output": y_train, "aux_output": y_train},
        epochs=250,
        validation_data=(X_valid, {"main_output": y_valid, "aux_output": y_valid}),
        callbacks=[
            keras.callbacks.EarlyStopping(patience=5, restore_best_weights=True)
        ],
        verbose=0,
    )
    mlflow.log_params(hyper_params)  # log all hyperparams



2023/02/15 13:16:39 INFO mlflow.tracking.fluent: Experiment with name 'auto-and-manual' does not exist. Creating a new experiment.






INFO:tensorflow:Assets written to: /var/folders/m8/0_prp1tj41s9n5xm0bfqp6wm0000gn/T/tmpn_s62a7w/model/data/model/assets




# Train/Val Workflow

Now to use the above workflow in hyperparameter tuning experiments, I'm going to functionalize it below.  The inputs include the following: 
- run_name
- training data
- validation data
- model
- hyperparameters (for logging)

The function should output a measure of loss to pass to optimization.
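
For a multi-output model, the measure of loss used here is the total loss.  As I understand Keras, that is the first value returned by `model.evaluate()` and equals the sum of the per-output losses weighted by `loss_weights`.  A small sketch of that arithmetic, using the weights from the compile step and made-up per-output values (`weighted_total_loss` is a hypothetical helper):

```python
def weighted_total_loss(losses: dict, loss_weights: dict) -> float:
    """Weighted sum of per-output losses."""
    return sum(loss_weights[name] * value for name, value in losses.items())


loss_weights = {"main_output": 0.9, "aux_output": 0.1}  # as in the compile step
losses = {"main_output": 0.48, "aux_output": 0.60}  # made-up per-output MSE values
total = weighted_total_loss(losses, loss_weights)
print(round(total, 4))
```

Since the main output carries 90% of the weight, optimizing on the total loss is close to, but not the same as, optimizing on the main output's MSE alone.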

In [6]:
def run_log_exp(
    run_name: str,
    train_data: tuple,
    val_data: tuple,
    model: keras.Model,
    hyper_params: dict,
):
    with mlflow.start_run(run_name=run_name):
        history = model.fit(
            train_data[0],
            train_data[1],
            epochs=25,
            validation_data=val_data,
            callbacks=[
                keras.callbacks.EarlyStopping(patience=5, restore_best_weights=True)
            ],
            verbose=0,
        )
        mlflow.log_params(hyper_params)
        # customization needed here depending on output shape
        total_loss, __, __ = model.evaluate(
            val_data[0], val_data[1], verbose=0
        )  # get validation loss for optimization
        return total_loss



In [7]:
mlflow.set_experiment(experiment_name="exp-2")

train_data = (X_train, {"main_output": y_train, "aux_output": y_train})
val_data = (X_valid, {"main_output": y_valid, "aux_output": y_valid})
hyper_params = {"n_hidden": 1, "n_neurons": 10, "activation": "relu"}
model = build_model(**hyper_params)
val_loss = run_log_exp(
    run_name="run-1",
    train_data=train_data,
    val_data=val_data,
    model=model,
    hyper_params=hyper_params,
)



2023/02/15 13:19:15 INFO mlflow.tracking.fluent: Experiment with name 'exp-2' does not exist. Creating a new experiment.






INFO:tensorflow:Assets written to: /var/folders/m8/0_prp1tj41s9n5xm0bfqp6wm0000gn/T/tmp5s9i7zhn/model/data/model/assets




## Running Experiments

Now I can run any number of experiments by calling the above function inside a looped grid of hyperparameters or inside an optimization function.  Below, I'll execute a small grid as an example.
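
As the grid grows, nested `for` loops become unwieldy.  One possible alternative, sketched here with the standard library only, is to describe the grid as a dictionary and expand it with `itertools.product`.  The `grid` dictionary and the `iter_grid` helper are hypothetical names; the values and the run naming scheme mirror the small grid used in this notebook.

```python
from itertools import product

grid = {
    "n_hidden": [1, 2, 3],
    "n_neurons": [10, 20, 30],
    "activation": ["sigmoid", "relu"],
}


def iter_grid(grid: dict):
    """Yield one hyperparameter dict per combination, last key varying fastest."""
    keys = list(grid)
    for values in product(*(grid[key] for key in keys)):
        yield dict(zip(keys, values))


combos = list(iter_grid(grid))
run_names = ["run-" + "-".join(str(v) for v in combo.values()) for combo in combos]
print(len(combos), run_names[0], run_names[-1])
```

Each dictionary yielded this way can be unpacked into the build function and logged as-is, so adding a hyperparameter only requires a new key in `grid`.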

In [8]:
for n_h in [1, 2, 3]:
    for n_n in [10, 20, 30]:
        for act in ["sigmoid", "relu"]:
            hyper_params = {"n_hidden": n_h, "n_neurons": n_n, "activation": act}
            run_name = f"run-{n_h}-{n_n}-{act}"
            model = build_model(**hyper_params)
            val_loss = run_log_exp(
                run_name=run_name,
                train_data=train_data,
                val_data=val_data,
                model=model,
                hyper_params=hyper_params,
            )
            print(f"validation loss for {run_name}: {round(val_loss, 4)}")






INFO:tensorflow:Assets written to: /var/folders/m8/0_prp1tj41s9n5xm0bfqp6wm0000gn/T/tmpmoq_1pwk/model/data/model/assets

validation loss for run-1-10-sigmoid: 0.5662

INFO:tensorflow:Assets written to: /var/folders/m8/0_prp1tj41s9n5xm0bfqp6wm0000gn/T/tmp7xu9_s6g/model/data/model/assets

validation loss for run-1-10-relu: 0.4987

INFO:tensorflow:Assets written to: /var/folders/m8/0_prp1tj41s9n5xm0bfqp6wm0000gn/T/tmpkkx3bbu7/model/data/model/assets

validation loss for run-1-20-sigmoid: 0.5558

INFO:tensorflow:Assets written to: /var/folders/m8/0_prp1tj41s9n5xm0bfqp6wm0000gn/T/tmp7wqc59x5/model/data/model/assets

validation loss for run-1-20-relu: 0.4879

INFO:tensorflow:Assets written to: /var/folders/m8/0_prp1tj41s9n5xm0bfqp6wm0000gn/T/tmpiyx9czbp/model/data/model/assets

validation loss for run-1-30-sigmoid: 0.554

INFO:tensorflow:Assets written to: /var/folders/m8/0_prp1tj41s9n5xm0bfqp6wm0000gn/T/tmphi0v0rua/model/data/model/assets

validation loss for run-1-30-relu: 0.4885

INFO:tensorflow:Assets written to: /var/folders/m8/0_prp1tj41s9n5xm0bfqp6wm0000gn/T/tmprc2m22g0/model/data/model/assets

validation loss for run-2-10-sigmoid: 0.6079

INFO:tensorflow:Assets written to: /var/folders/m8/0_prp1tj41s9n5xm0bfqp6wm0000gn/T/tmpjmtdy9v7/model/data/model/assets

validation loss for run-2-10-relu: 0.4813

INFO:tensorflow:Assets written to: /var/folders/m8/0_prp1tj41s9n5xm0bfqp6wm0000gn/T/tmp9hluv7j0/model/data/model/assets

validation loss for run-2-20-sigmoid: 0.5913

INFO:tensorflow:Assets written to: /var/folders/m8/0_prp1tj41s9n5xm0bfqp6wm0000gn/T/tmpf43co8do/model/data/model/assets

validation loss for run-2-20-relu: 0.4839

INFO:tensorflow:Assets written to: /var/folders/m8/0_prp1tj41s9n5xm0bfqp6wm0000gn/T/tmpdf8q83k4/model/data/model/assets

validation loss for run-2-30-sigmoid: 0.5963

INFO:tensorflow:Assets written to: /var/folders/m8/0_prp1tj41s9n5xm0bfqp6wm0000gn/T/tmp2sk9vcu5/model/data/model/assets

validation loss for run-2-30-relu: 0.4526

INFO:tensorflow:Assets written to: /var/folders/m8/0_prp1tj41s9n5xm0bfqp6wm0000gn/T/tmpugb9fvbx/model/data/model/assets

validation loss for run-3-10-sigmoid: 0.6019

INFO:tensorflow:Assets written to: /var/folders/m8/0_prp1tj41s9n5xm0bfqp6wm0000gn/T/tmpoa15vqcd/model/data/model/assets

validation loss for run-3-10-relu: 0.44

INFO:tensorflow:Assets written to: /var/folders/m8/0_prp1tj41s9n5xm0bfqp6wm0000gn/T/tmpxwpjd5ac/model/data/model/assets

validation loss for run-3-20-sigmoid: 0.6073

INFO:tensorflow:Assets written to: /var/folders/m8/0_prp1tj41s9n5xm0bfqp6wm0000gn/T/tmp9z6m_ikn/model/data/model/assets

validation loss for run-3-20-relu: 0.5018

INFO:tensorflow:Assets written to: /var/folders/m8/0_prp1tj41s9n5xm0bfqp6wm0000gn/T/tmpxri07gr8/model/data/model/assets

validation loss for run-3-30-sigmoid: 0.6047

INFO:tensorflow:Assets written to: /var/folders/m8/0_prp1tj41s9n5xm0bfqp6wm0000gn/T/tmpimys2243/model/data/model/assets

validation loss for run-3-30-relu: 0.4519


# Cross Fold Validation

Cross fold validation is the gold standard for model candidate evaluation; however, it's usually not implemented for DL models due to model complexity and computational constraints.  If the models and/or data are small enough, though, it can still be done.  Here, I'm going to lay out a workflow for applying CV and logging within MLFlow.

## Modify Fit and Validate

First, I'll need to modify the keras fit and validation strategy above, before adding the MLFlow logging.  I'll use `sklearn.model_selection.KFold()` to make the splits.  Since I no longer need a static validation set, I'll start from `X_train_full` and `y_train_full` from above.
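
To see what each fold will contain, the fold sizes can be computed by hand.  The sketch below assumes five folds over the 16,512-row full training set and that `KFold` gives the first `n_samples % n_splits` folds one extra sample, which is my reading of the scikit-learn documentation.  `fold_sizes` is a hypothetical helper.

```python
def fold_sizes(n_samples: int, n_splits: int) -> list[int]:
    """Validation fold sizes, with the remainder spread over the first folds."""
    base, extra = divmod(n_samples, n_splits)
    return [base + 1 if i < extra else base for i in range(n_splits)]


n_samples = 16512  # rows in the full training set
val_sizes = fold_sizes(n_samples, 5)
train_sizes = [n_samples - size for size in val_sizes]
print(val_sizes)
print(train_sizes)
```

Each model is therefore fit on roughly 80% of the full training set, about the same amount of data as in the static train/validation split used earlier.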

### Reload Full Training Set

In [9]:
X_train_full, X_test, y_train_full, y_test = train_test_split(
    housing.data, housing.target, test_size=0.2
)

scaler = StandardScaler()
X_train = scaler.fit_transform(X_train_full)
X_test = scaler.transform(X_test)
y_train = y_train_full  # for consistency in naming

pd.DataFrame(X_train, columns=housing.feature_names).describe()


|  | MedInc | HouseAge | AveRooms | AveBedrms | Population | AveOccup | Latitude | Longitude |
|---|---|---|---|---|---|---|---|---|
| count | 16512.0 | 16512.0 | 16512.0 | 16512.0 | 16512.0 | 16512.0 | 16512.0 | 16512.0 |
| mean | -6.175938e-15 | 1.09301e-16 | -1.218556e-15 | 2.580408e-15 | -7.552098e-17 | -9.596114e-17 | 1.009528e-14 | -1.715914e-13 |
| std | 1.00003 | 1.00003 | 1.00003 | 1.00003 | 1.00003 | 1.00003 | 1.00003 | 1.00003 |
| min | -1.773772 | -2.200435 | -1.930454 | -1.78321 | -1.268526 | -0.2254516 | -1.44025 | -2.38777 |
| 25% | -0.6894856 | -0.8489049 | -0.4139994 | -0.2079826 | -0.5697874 | -0.05943509 | -0.7940214 | -1.110494 |
| 50% | -0.1773854 | 0.02561467 | -0.08182015 | -0.1076106 | -0.2293532 | -0.0221274 | -0.6441712 | 0.5359956 |
| 75% | 0.4677667 | 0.6616289 | 0.2655375 | 0.01071542 | 0.2683419 | 0.02234489 | 0.9760834 | 0.7804744 |
| max | 5.848827 | 1.854156 | 57.57216 | 57.5139 | 30.61166 | 118.4893 | 2.961598 | 2.621549 |


### Manual K-Fold CV

As I loop through each fold I'll log train and validation CV scores so I have access to the individual values as well as the aggregate.

In [10]:
from sklearn.model_selection import KFold
import numpy as np

mlflow.tensorflow.autolog(disable=True)  # turn off logging

hyper_params = {"n_hidden": 2, "n_neurons": 15, "activation": "relu"}

kf = KFold(
    n_splits=5, shuffle=True, random_state=629
)  # fix the random seed for reproducible folds
cv_train_losses = []
cv_val_losses = []
for fit, val in kf.split(X_train, y_train):
    # define train and validation set per fold
    train_data = (
        X_train[fit],
        {"main_output": y_train[fit], "aux_output": y_train[fit]},
    )
    val_data = (X_train[val], {"main_output": y_train[val], "aux_output": y_train[val]})
    model = build_model(**hyper_params)
    history = model.fit(
        train_data[0],
        train_data[1],
        epochs=20,
        validation_data=val_data,
        callbacks=[
            keras.callbacks.EarlyStopping(patience=5, restore_best_weights=True)
        ],
        verbose=0,
    )
    # update per fold loss
    train_loss, __, __ = model.evaluate(train_data[0], train_data[1], verbose=0)
    val_loss, __, __ = model.evaluate(val_data[0], val_data[1], verbose=0)
    cv_train_losses.append(train_loss)
    cv_val_losses.append(val_loss)

print(f"mean train mse: {round(np.mean(cv_train_losses), 4)}")
print(f"mean validation mse: {round(np.mean(cv_val_losses), 4)}")


mean train mse: 0.5014
mean validation mse: 0.5014


## Nested Logging Folds

Now to modify the MLFlow logging piece, I'll add nested runs for each fold.  In this way, the autologger will log all relevant information for each model trained on each fold, then in the parent run I'll manually log the aggregated loss values along with the chosen hyperparameters.  This will let me identify promising hyperparameter combinations by sorting through the aggregated metrics, but also still allow me to deep dive into any one experiment and view metrics and learning curves of each fold.  The nesting is handled by `nested=True` within `mlflow.start_run()`.
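
The aggregation logged in the parent run can be prototyped without MLFlow.  The sketch below builds the kind of dictionary that `mlflow.log_metrics()` accepts.  The fold losses are made-up numbers, `aggregate_cv_losses` is a hypothetical helper, and the `val_std_cv_loss` entry is an optional extra, assuming the fold-to-fold spread is also of interest.

```python
import statistics


def aggregate_cv_losses(train_losses, val_losses) -> dict:
    """Summarize per-fold losses into parent-run metrics."""
    return {
        "train_mean_cv_loss": statistics.mean(train_losses),
        "val_mean_cv_loss": statistics.mean(val_losses),
        "val_std_cv_loss": statistics.stdev(val_losses),  # hypothetical extra metric
    }


fold_train = [0.47, 0.49, 0.48, 0.50, 0.46]  # made-up per-fold training losses
fold_val = [0.50, 0.48, 0.52, 0.49, 0.51]  # made-up per-fold validation losses
metrics = aggregate_cv_losses(fold_train, fold_val)
print({name: round(value, 4) for name, value in metrics.items()})
```

A large spread relative to the mean would suggest that differences between hyperparameter combinations are within fold-to-fold noise.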

In [11]:
mlflow.tensorflow.autolog()  # turn on logging
mlflow.set_experiment(experiment_name="cv-test")

run_name = "test-cv"
with mlflow.start_run(run_name=run_name):
    cv_train_losses = []
    cv_val_losses = []
    k_fold = 1  # keep track of fold number
    for fit, val in kf.split(X_train, y_train):
        with mlflow.start_run(run_name=f"f{k_fold}-{run_name}", nested=True):
            # define train and validation set per fold
            train_data = (
                X_train[fit],
                {"main_output": y_train[fit], "aux_output": y_train[fit]},
            )
            val_data = (
                X_train[val],
                {"main_output": y_train[val], "aux_output": y_train[val]},
            )
            model = build_model(**hyper_params)
            history = model.fit(
                train_data[0],
                train_data[1],
                epochs=20,
                validation_data=val_data,
                callbacks=[
                    keras.callbacks.EarlyStopping(patience=5, restore_best_weights=True)
                ],
                verbose=0,
            )
            mlflow.log_params(hyper_params)  # log chosen params in child
            # update per fold loss
            train_loss, __, __ = model.evaluate(train_data[0], train_data[1], verbose=0)
            val_loss, __, __ = model.evaluate(val_data[0], val_data[1], verbose=0)
            cv_train_losses.append(train_loss)
            cv_val_losses.append(val_loss)
            k_fold += 1  # update fold number
    mlflow.log_params(hyper_params)  # log chosen params in parent
    # log aggregated metrics in parent
    mlflow.log_metrics(
        {
            "train_mean_cv_loss": np.mean(cv_train_losses),
            "val_mean_cv_loss": np.mean(cv_val_losses),
        }
    )



2023/02/15 13:28:03 INFO mlflow.tracking.fluent: Experiment with name 'cv-test' does not exist. Creating a new experiment.






INFO:tensorflow:Assets written to: /var/folders/m8/0_prp1tj41s9n5xm0bfqp6wm0000gn/T/tmplou5go_w/model/data/model/assets

INFO:tensorflow:Assets written to: /var/folders/m8/0_prp1tj41s9n5xm0bfqp6wm0000gn/T/tmpus4xqg5n/model/data/model/assets

INFO:tensorflow:Assets written to: /var/folders/m8/0_prp1tj41s9n5xm0bfqp6wm0000gn/T/tmpkj6kibeb/model/data/model/assets

INFO:tensorflow:Assets written to: /var/folders/m8/0_prp1tj41s9n5xm0bfqp6wm0000gn/T/tmpk_6xrenw/model/data/model/assets

INFO:tensorflow:Assets written to: /var/folders/m8/0_prp1tj41s9n5xm0bfqp6wm0000gn/T/tmppgvnmmbw/model/data/model/assets


## Functionalize Workflow

Now all I need to do is functionalize the above workflow so that it can be used with any number of hyperparameter combinations and/or optimization.  Compared with the previous workflow function, this one takes the full training arrays instead of pre-split train and validation data, plus one additional argument: the `KFold()` object to use for splitting.

In [12]:
def run_log_exp_cv(
    run_name: str,
    X_train: np.ndarray,
    y_train: np.ndarray,
    model: keras.Model,
    hyper_params: dict,
    kf: KFold,
):
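    # snapshot the untrained weights so every fold starts from the same state
    init_weights = model.get_weights()
    with mlflow.start_run(run_name=run_name):
        cv_train_losses = []
        cv_val_losses = []
        k_fold = 1  # keep track of fold number
        for fit, val in kf.split(X_train, y_train):
            with mlflow.start_run(run_name=f"f{k_fold}-{run_name}", nested=True):
                # reset weights so training does not carry over between folds
                model.set_weights(init_weights)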
                # define train and validation set per fold
                # customize depending on output shape
                train_data = (
                    X_train[fit],
                    {"main_output": y_train[fit], "aux_output": y_train[fit]},
                )
                val_data = (
                    X_train[val],
                    {"main_output": y_train[val], "aux_output": y_train[val]},
                )
                history = model.fit(
                    train_data[0],
                    train_data[1],
                    epochs=25,
                    validation_data=val_data,
                    callbacks=[
                        keras.callbacks.EarlyStopping(
                            patience=5, restore_best_weights=True
                        )
                    ],
                    verbose=0,
                )
                mlflow.log_params(hyper_params)  # log chosen params in child
                # update per fold loss
                train_loss, __, __ = model.evaluate(
                    train_data[0], train_data[1], verbose=0
                )
                val_loss, __, __ = model.evaluate(val_data[0], val_data[1], verbose=0)
                cv_train_losses.append(train_loss)
                cv_val_losses.append(val_loss)
                k_fold += 1  # update fold number
        mlflow.log_params(hyper_params)  # log chosen params in parent
        # log aggregated metrics in parent
        mlflow.log_metrics(
            {
                "train_mean_cv_loss": np.mean(cv_train_losses),
                "val_mean_cv_loss": np.mean(cv_val_losses),
            }
        )
        return np.mean(cv_val_losses) # return aggregated loss for optimization



## Running Experiments

Now I'll run a few experiments on a small grid to ensure it's working as intended.

In [13]:
mlflow.set_experiment(experiment_name="cv-test-2")

for n_h in [2, 3]:
    for n_n in [20, 30]:
        for act in ["relu"]:
            hyper_params = {"n_hidden": n_h, "n_neurons": n_n, "activation": act}
            run_name = f"run-{n_h}-{n_n}-{act}"
            model = build_model(**hyper_params)
            val_loss = run_log_exp_cv(
                run_name=run_name,
                X_train=X_train,
                y_train=y_train,
                model=model,
                hyper_params=hyper_params,
                kf=kf,
            )


2023/02/15 13:29:58 INFO mlflow.tracking.fluent: Experiment with name 'cv-test-2' does not exist. Creating a new experiment.






INFO:tensorflow:Assets written to: /var/folders/m8/0_prp1tj41s9n5xm0bfqp6wm0000gn/T/tmprckce76q/model/data/model/assets

INFO:tensorflow:Assets written to: /var/folders/m8/0_prp1tj41s9n5xm0bfqp6wm0000gn/T/tmpgead45xb/model/data/model/assets

INFO:tensorflow:Assets written to: /var/folders/m8/0_prp1tj41s9n5xm0bfqp6wm0000gn/T/tmp8ml0p_qv/model/data/model/assets

INFO:tensorflow:Assets written to: /var/folders/m8/0_prp1tj41s9n5xm0bfqp6wm0000gn/T/tmpj655xaqt/model/data/model/assets

INFO:tensorflow:Assets written to: /var/folders/m8/0_prp1tj41s9n5xm0bfqp6wm0000gn/T/tmp5vrxwivx/model/data/model/assets

INFO:tensorflow:Assets written to: /var/folders/m8/0_prp1tj41s9n5xm0bfqp6wm0000gn/T/tmpff73r9pl/model/data/model/assets

INFO:tensorflow:Assets written to: /var/folders/m8/0_prp1tj41s9n5xm0bfqp6wm0000gn/T/tmpb46y2gpq/model/data/model/assets

INFO:tensorflow:Assets written to: /var/folders/m8/0_prp1tj41s9n5xm0bfqp6wm0000gn/T/tmpyv6dv7xq/model/data/model/assets

INFO:tensorflow:Assets written to: /var/folders/m8/0_prp1tj41s9n5xm0bfqp6wm0000gn/T/tmpq_lbj8t4/model/data/model/assets

INFO:tensorflow:Assets written to: /var/folders/m8/0_prp1tj41s9n5xm0bfqp6wm0000gn/T/tmpi8y59lkf/model/data/model/assets

INFO:tensorflow:Assets written to: /var/folders/m8/0_prp1tj41s9n5xm0bfqp6wm0000gn/T/tmplmwaeldb/model/data/model/assets

INFO:tensorflow:Assets written to: /var/folders/m8/0_prp1tj41s9n5xm0bfqp6wm0000gn/T/tmpclnruel9/model/data/model/assets

INFO:tensorflow:Assets written to: /var/folders/m8/0_prp1tj41s9n5xm0bfqp6wm0000gn/T/tmp8m0peswm/model/data/model/assets

INFO:tensorflow:Assets written to: /var/folders/m8/0_prp1tj41s9n5xm0bfqp6wm0000gn/T/tmp53d9kmol/model/data/model/assets

INFO:tensorflow:Assets written to: /var/folders/m8/0_prp1tj41s9n5xm0bfqp6wm0000gn/T/tmpcxuvakv8/model/data/model/assets

INFO:tensorflow:Assets written to: /var/folders/m8/0_prp1tj41s9n5xm0bfqp6wm0000gn/T/tmpxtygs07e/model/data/model/assets

INFO:tensorflow:Assets written to: /var/folders/m8/0_prp1tj41s9n5xm0bfqp6wm0000gn/T/tmp5mwh0cn6/model/data/model/assets

INFO:tensorflow:Assets written to: /var/folders/m8/0_prp1tj41s9n5xm0bfqp6wm0000gn/T/tmpa4okv2k5/model/data/model/assets

INFO:tensorflow:Assets written to: /var/folders/m8/0_prp1tj41s9n5xm0bfqp6wm0000gn/T/tmpsdfht20w/model/data/model/assets

INFO:tensorflow:Assets written to: /var/folders/m8/0_prp1tj41s9n5xm0bfqp6wm0000gn/T/tmpz083bbf2/model/data/model/assets
