# Purpose

This notebook works through an example workflow for tracking Keras experiments with [MLflow](https://www.mlflow.org).

# Data

This example uses the California housing data, which is a simple regression task. It will be loaded from the `sklearn` data loader.  I'll split off 20% into a test set, then hold out 20% of the remainder (about 16% of the full data) as a validation set.  Finally, I'll standardize the features using `StandardScaler`, fit on the training set only, ahead of modeling.

In [1]:
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
import pandas as pd

housing = fetch_california_housing()

X_train_full, X_test, y_train_full, y_test = train_test_split(
    housing.data, housing.target, test_size=0.2
)
X_train, X_valid, y_train, y_valid = train_test_split(
    X_train_full, y_train_full, test_size=0.2
)

scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_valid = scaler.transform(X_valid)
X_test = scaler.transform(X_test)

pd.DataFrame(X_train, columns=housing.feature_names).describe()


| | MedInc | HouseAge | AveRooms | AveBedrms | Population | AveOccup | Latitude | Longitude |
|---|---|---|---|---|---|---|---|---|
| count | 13209.0 | 13209.0 | 13209.0 | 13209.0 | 13209.0 | 13209.0 | 13209.0 | 13209.0 |
| mean | -4.20387e-15 | -2.366862e-17 | -6.491791e-15 | -7.838617e-15 | 6.697144e-17 | 6.724040e-17 | 5.434665e-14 | -3.987625e-15 |
| std | 1.000038 | 1.000038 | 1.000038 | 1.000038 | 1.000038 | 1.000038 | 1.000038 | 1.000038 |
| min | -1.762384 | -2.211806 | -1.726104 | -1.146007 | -1.218236 | -0.1615666 | -1.447234 | -2.352574 |
| 25% | -0.681843 | -0.8539298 | -0.374943 | -0.1777574 | -0.5506318 | -0.05476924 | -0.7973662 | -1.1121 |
| 50% | -0.1803598 | 0.02469603 | -0.07949644 | -0.09572724 | -0.2241327 | -0.02382653 | -0.6477564 | 0.5319027 |
| 75% | 0.4522506 | 0.6636966 | 0.2282902 | 0.002179204 | 0.2492909 | 0.01196224 | 0.9652246 | 0.780994 |
| max | 5.843102 | 1.861823 | 51.30163 | 63.03746 | 29.43487 | 96.82297 | 2.947555 | 2.624269 |
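
Because the validation split is taken from what remains after the test split, the final proportions are roughly 64/16/20 rather than 60/20/20. The sketch below checks the arithmetic. It assumes the dataset has 20,640 rows and that the held-out count is rounded up at each split; both are assumptions on my part, though they agree with the `count` of 13,209 shown above. The helper `holdout_split` is a hypothetical name, not part of `sklearn`.

```python
import math
from fractions import Fraction


def holdout_split(n_samples, test_size):
    """Return (n_kept, n_held_out), rounding the held-out count up."""
    n_held_out = math.ceil(n_samples * test_size)
    return n_samples - n_held_out, n_held_out


n_total = 20640  # rows in the California housing data (assumed here)
test_size = Fraction(1, 5)  # exact 20%, avoids float rounding in this sketch

n_train_full, n_test = holdout_split(n_total, test_size)
n_train, n_valid = holdout_split(n_train_full, test_size)

for name, n in [("train", n_train), ("valid", n_valid), ("test", n_test)]:
    print(f"{name}: {n} rows ({n / n_total:.1%})")
```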


# Model

The model used for this example will be a wide and deep network with the following characteristics:
- a deep path with `n_hidden` hidden layers with `n_neurons` at each layer
- a wide path connecting all inputs to the output
- all layers are fully connected
- two outputs:
    - one from the deep path alone, fit to the target
    - one from the concatenated wide and deep paths, fit to the target

This type of multi-output architecture is usually used as a regularization technique, but I'm simply employing it here so my example has more than one loss to minimize simultaneously.  The model is very similar to the regression example in my [intro to Keras](https://github.com/mcnewcp/book-geron-ml-sklearn-keras-tensorflow/blob/main/10-intro-ann-keras/10-intro-ann-keras.ipynb) notebook, which follows Chapter 10 of [Hands on ML](https://www.oreilly.com/library/view/hands-on-machine-learning/9781492032632/), so I won't explain the Keras code below.

The model build code should be wrapped in a function so that the hyperparameters become arguments to the build and compile steps, which makes it easy to plug into hyperparameter tuning.  I'm pulling out the following hyperparameters for tuning:
- `n_hidden`: number of hidden layers
- `n_neurons`: number of neurons per layer
- `activation`: activation function used in hidden layers
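
As a rough sketch of how these three hyperparameters might be enumerated for a tuning loop, the snippet below builds one dictionary of keyword arguments per combination. The candidate values in `param_grid` are hypothetical and chosen only for illustration; they are not the values tuned in this notebook.

```python
from itertools import product

# Hypothetical candidate values, for illustration only
param_grid = {
    "n_hidden": [1, 2, 3],
    "n_neurons": [10, 20, 40],
    "activation": ["relu", "tanh"],
}

# One dict of keyword arguments per combination of values
candidates = [
    dict(zip(param_grid, values)) for values in product(*param_grid.values())
]

print(len(candidates))  # 3 * 3 * 2 combinations
print(candidates[0])
```

Each dictionary can then be unpacked with `**` into a build function that accepts these names as keyword arguments.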

*Note*: I'm not tuning learning rate here.  In general I think it's best practice to choose a sufficiently low learning rate and a high number of epochs, and to use early stopping.  The goal of this stage of hyperparameter tuning is simply to identify promising model candidates.  Once promising candidates have been identified, the learning rate will be fine-tuned.
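
To make the early stopping idea concrete, here is a simplified sketch of patience-based stopping in plain Python. It is not the Keras implementation, only the basic rule: stop once the validation loss has failed to improve for `patience` consecutive epochs, and remember which epoch was best. The function name `early_stopping_epoch` and the loss values are made up for illustration.

```python
def early_stopping_epoch(val_losses, patience):
    """Return (stop_epoch, best_epoch) for a sequence of validation losses.

    stop_epoch is None if training would run through every epoch.
    """
    best_loss = float("inf")
    best_epoch = None
    wait = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return None, best_epoch


# Made-up validation losses, for illustration only
losses = [1.00, 0.80, 0.70, 0.75, 0.72, 0.71, 0.69]
print(early_stopping_epoch(losses, patience=2))  # stops before reaching 0.69
print(early_stopping_epoch(losses, patience=5))  # runs through every epoch
```

With a small `patience` the run halts before the late improvement to 0.69, which is why a generous epoch budget only pays off when `patience` is not too tight.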


In [2]:
import tensorflow as tf
from tensorflow import keras
print('tf version:', tf.__version__, ", keras version:", keras.__version__)

2023-02-14 15:11:41.805855: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.


tf version: 2.11.0 , keras version: 2.11.0


In [3]:
def build_model(n_hidden=1, n_neurons=20, activation="relu"):
    inp = keras.layers.Input(shape=[8], name="input")  # input layer
    hl = inp  # deep path starts at the input, so n_hidden=0 no longer fails
    for _ in range(n_hidden):  # sequentially add hidden layers
        hl = keras.layers.Dense(n_neurons, activation=activation)(hl)
    concat = keras.layers.Concatenate()([hl, inp])  # concat deep and wide paths
    main_output = keras.layers.Dense(1, name="main_output")(concat)  # combined output
    aux_output = keras.layers.Dense(1, name="aux_output")(hl)  # deep output
    model = keras.Model(inputs=[inp], outputs=[main_output, aux_output])
    model.compile(
        loss=["mse", "mse"],  # one loss per output, in output order
        loss_weights=[0.9, 0.1],  # weighting heavily towards main output
        optimizer=keras.optimizers.SGD(learning_rate=5e-3),
    )
    return model
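
To get a feel for how model size changes with `n_hidden` and `n_neurons`, the trainable parameter count of this architecture can be worked out by hand. The sketch below assumes 8 input features and dense layers with biases, as in `build_model`; `count_params` here is a hypothetical standalone helper, not a Keras call.

```python
def count_params(n_hidden=1, n_neurons=20, n_inputs=8):
    """Trainable parameters (weights plus biases) of the wide and deep network."""
    total = 0
    width = n_inputs  # size of the tensor feeding the next layer
    for _ in range(n_hidden):
        total += width * n_neurons + n_neurons  # dense hidden layer
        width = n_neurons
    total += (width + n_inputs) + 1  # main_output: concatenated deep and wide paths
    total += width + 1  # aux_output: deep path only
    return total


print(count_params(n_hidden=1))  # 180 + 29 + 21 = 230
print(count_params(n_hidden=2))  # 180 + 420 + 29 + 21 = 650
```

These numbers should agree with what `model.summary()` reports for a model built with the same arguments.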


# MLFlow Experiment Tracking

[MLflow](https://www.mlflow.org) is a full-featured, end-to-end ML lifecycle management platform, but in this example I'll only be using it for experiment tracking.  The [documentation on tracking](https://www.mlflow.org/docs/latest/tracking.html) is quite good, so I'm working primarily from that.  There is even an automatic logging submodule for Keras and TensorFlow, `mlflow.tensorflow.autolog()`, which I will try out first. In addition, nearly anything can be logged manually: metrics, parameters, tags, and artifacts.  Artifacts can be nearly anything, including plots or the models themselves.

## Auto-Logging

First I'll give auto-logging a shot and see what it logs.  I've had issues with the auto-logging submodule for scikit-learn: it logged too many parameters to be useful, and once the experiment count reached a certain threshold, performance in the dashboard tool suffered.

**Note**: with the MLflow version I used, the auto-logging submodule only works for TensorFlow versions 2.3.0 - 2.11.0. I had to pin the version manually with `pip`, as the `conda` installer chose a version outside of that range.

By default, on first execution, MLflow creates the directory `./mlruns` and stores all experiment-related information as individual files within it.  Other options are to store the information in a SQLite database or to integrate with Databricks.

In [6]:
import mlflow

mlflow.set_experiment(experiment_name="auto-log")
mlflow.tensorflow.autolog()  # turn on auto logging

model = build_model(n_hidden=2)

with mlflow.start_run(run_name="auto-log-1") as run:
    history = model.fit(
        X_train,
        [y_train, y_train],
        epochs=250,
        validation_data=(X_valid, [y_valid, y_valid]),
        callbacks=[
            keras.callbacks.EarlyStopping(patience=5, restore_best_weights=True)
        ],
        verbose=0,
    )
    total_loss, main_loss, aux_loss = model.evaluate([X_test], [y_test, y_test])
    mlflow.log_metrics(
        {  # manually log test losses
            "test_total": total_loss,
            "test_main": main_loss,
            "test_aux": aux_loss,
        }
    )









INFO:tensorflow:Assets written to: /var/folders/m8/0_prp1tj41s9n5xm0bfqp6wm0000gn/T/tmpzkj3b5iy/model/data/model/assets




## Local Server

To serve the exploration tool locally, you simply run the following in the command line:

```zsh
mlflow ui
```

By default, this looks for locally logged files in `./mlruns` and launches on port 5000.  To change the location, use `--backend-store-uri`; to change the port, use `-p`.

The UI provides a simple table comparison of all runs within an experiment, so you can quickly check the parameters used for each run and the corresponding metrics to choose promising model candidates.  The auto logger appears to log everything I need, including other helpful information like early stopping results and learning curves for each loss (see below).  The best option looks like the one investigated above: use the auto logger, plus a couple of manually logged metrics and/or tags where necessary.

![MLflow UI Screenshot](images/mlflow_ui_sn.png)