# Experiment Tracking with MLFlow

In this demo we will see how to use MLFlow for tracking experiments, using a toy data set. In the attached lab (below), you will download a larger dataset and attempt to train the best model that you can.

We should first install mlflow, and add it to the requirements.txt file.

`pip install mlflow` or `python3 -m pip install mlflow` or 


In [86]:
import mlflow
import pandas as pd
import os
import json
from google.cloud import storage
from google.oauth2 import service_account

from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_wine
from sklearn.metrics import accuracy_score

After loading the libraries, we can first check the mlflow version you have. And, just for fun, let's look at the mlflow UI by running `mlflow ui`. After this, we should do two things:
- set the tracking uri
- create or set the experiment

Setting the tracking uri tells mlflow where to save the results of our experiments. We will save these locally in a sqlite instance. If you want, you can also experiment with different ways of recording your runs, such as on a remote server (perhaps in GCP or AWS), but for this course it is not required.

If you've already created an experiment previously that you'd like to use, you can tell mlflow by setting the experiment. You can also use `set_experiment` even if the experiment has not yet been created - mlflow will first check if the experiment exists, and if not, it will create it for you. 

In [30]:
mlflow.__version__

'2.2.2'

In [34]:
def fetch_gcs_bucket():
    credentials = service_account.Credentials.from_service_account_file('../mlopspoc-project-904c861ab479.json')
    client = storage.Client(credentials=credentials, project=credentials.project_id)
    bucket = client.get_bucket("chemical_property_data")
    return bucket

def fetch_blob_to_df(bucket, blob_name):
    blob = bucket.blob(blob_name)
    temp_file = '/tmp/' + blob_name.split('/')[-1]
    blob.download_to_filename(temp_file)
    df = pd.read_csv(temp_file)
    return df

In [35]:
df = fetch_blob_to_df(fetch_gcs_bucket(), 'datasets/classification/ames.tab.csv')

In [36]:
df.to_csv("../data/classification/ames.tab.csv", index=False)

In [62]:
mlflow.set_tracking_uri('sqlite:///mlflow.db')
mlflow.set_experiment('demo-chem-ames-classification')

<Experiment: artifact_location='/Users/gurug/USF/MLOPs MSDS 626/MLops_POC/notebooks/mlruns/1', creation_time=1680123440916, experiment_id='1', last_update_time=1680123440916, lifecycle_stage='active', name='demo-chem-ames-classification', tags={}>

From here, we can take a look at the data. Then let's play around with some models, without using mlflow for now, to get a sense of why mlflow might come in handy.

In [41]:
df.head()

Unnamed: 0,ABC,ABCGG,nAcid,nBase,SpAbs_A,SpMax_A,SpDiam_A,SpAD_A,SpMAD_A,LogEE_A,...,MW,AMW,WPath,WPol,Zagreb1,Zagreb2,mZagreb1,mZagreb2,property,Drug
0,21.165481,15.890359,0.0,0.0,34.747536,2.633113,5.266226,34.747536,1.336444,4.235114,...,342.064057,9.501779,1331.0,55.0,152.0,192.0,7.833333,5.5,1,O=[N+]([O-])c1ccc2ccc3ccc([N+](=O)[O-])c4c5ccc...
1,18.906262,13.884794,0.0,0.0,31.449179,2.607123,5.214247,31.449179,1.367356,4.123629,...,301.110279,7.923955,970.0,48.0,136.0,172.0,5.861111,4.861111,1,O=[N+]([O-])c1c2c(c3ccc4cccc5ccc1c3c45)CCCC2
2,41.909982,27.436902,0.0,0.0,68.986805,2.703846,5.287332,68.986805,1.379736,4.91462,...,646.116486,9.501713,8149.0,118.0,312.0,407.0,13.388889,10.305556,0,O=c1c2ccccc2c(=O)c2c1ccc1c2[nH]c2c3c(=O)c4cccc...
3,7.289847,7.483711,0.0,0.0,12.806544,2.074313,4.148627,12.806544,1.164231,3.219608,...,157.059974,8.725554,188.0,10.0,42.0,42.0,5.472222,2.833333,1,[N-]=[N+]=CC(=O)NCC(=O)NN
4,7.249407,6.976306,0.0,0.0,11.945822,2.267184,4.534368,11.945822,1.194582,3.197666,...,138.017775,11.501481,116.0,12.0,46.0,51.0,4.333333,2.361111,1,[N-]=[N+]=C1C=NC(=O)NC1=O


In [39]:
df.describe()

Unnamed: 0,ABC,ABCGG,nAcid,nBase,SpAbs_A,SpMax_A,SpDiam_A,SpAD_A,SpMAD_A,LogEE_A,...,TSRW10,MW,AMW,WPath,WPol,Zagreb1,Zagreb2,mZagreb1,mZagreb2,property
count,7278.0,7278.0,7278.0,7278.0,7278.0,7278.0,7278.0,7278.0,7278.0,7278.0,...,7278.0,7278.0,7278.0,7278.0,7278.0,7278.0,7278.0,7278.0,7278.0,7278.0
mean,12.799813,10.890535,0.131217,0.149217,21.16982,2.354348,4.658381,21.16982,1.244268,3.623935,...,53.538026,242.263256,8.544627,782.987084,25.486123,85.647156,100.528854,6.135634,3.755806,0.546029
std,6.288677,4.488191,0.431622,0.48019,10.219996,0.219055,0.418718,10.219996,0.097152,0.51935,...,14.800335,107.899443,3.27234,1269.27399,16.989088,45.631956,58.24869,2.781058,1.631686,0.497911
min,0.0,0.0,0.0,0.0,2.0,1.0,2.0,2.0,0.8,1.407606,...,7.493061,41.026549,4.148446,1.0,0.0,2.0,1.0,0.75,0.75,0.0
25%,8.113471,7.76336,0.0,0.0,13.571944,2.247466,4.472136,13.571944,1.193872,3.302156,...,42.132146,165.039006,7.004143,156.0,13.0,52.0,56.0,4.277778,2.5625,0.0
50%,12.239595,10.597207,0.0,0.0,20.345386,2.375406,4.706009,20.345386,1.25482,3.699881,...,52.8975,230.071154,7.948769,442.0,22.0,82.0,94.0,5.611111,3.583333,1.0
75%,16.647311,13.440677,0.0,0.0,27.475554,2.51134,4.958928,27.475554,1.311538,3.998897,...,63.974559,296.060407,9.051875,898.0,36.0,114.0,137.75,7.305556,4.611111,1.0
max,44.256184,36.144607,4.0,7.0,70.836678,3.203127,6.108831,70.836678,1.465206,4.960596,...,115.108035,795.175432,78.744249,15949.0,126.0,320.0,417.0,25.388889,12.583333,1.0


In [55]:
df_clean = df.fillna(0)

In [56]:
import xgboost as xgb
y = df_clean.property
X = df_clean.drop(["property","Drug"], axis=1)
xgb_model = xgb.XGBClassifier()
xgb_model.fit(X, y)

In [57]:
xgb_model.score(X, y)

1.0

## Train a Model Using MLFLow

In this section, let's train a simple decision tree model, where we will now adjust the maximum depth (`max_depth`) of the tree, and save the results of each run of the experiment using mlflow. To do so, we need to tell mlflow to start recording. We do this with `start_run`. 

The things we might want to record in this simple case are:
- the value of `max_depth`
- the corresponding accuracy of the model

We can also tag each run to make it easier to identify them later.

After running the below code, be sure to check the mlflow UI by running the following in the terminal from the same directory as where you saved this notebook:

`mlflow ui` note that just running this you will not see any of your experiments. You must specify the uri (the place where all of your results are being stored)

`mlflow ui --backend-store-uri sqlite:///mlflow.db`

In [63]:
with mlflow.start_run():
    # log parameters and log metrics
    # parameters: hyperparameters
    # metrics: model performance metrics

    mlflow.set_tags({"Model":"decision-tree", "Train Data": "all-data"})

    tree_depth = 5
    dt = DecisionTreeClassifier(max_depth=tree_depth)
    dt.fit(X, y)
    acc = accuracy_score(y, dt.predict(X))

    mlflow.log_param("max_depth", tree_depth)
    mlflow.log_metric("accuracy", acc)

mlflow.end_run()

Let's do it again, but this time we'll use a random forest, which has some other hyperparameters we can tune, which makes keeping track of things a little more complex without a tool like mlflow.

In [61]:
from sklearn.ensemble import RandomForestClassifier

with mlflow.start_run():
    mlflow.set_tags({"Model":"random-forest", "Train Data": "all-data"})

    ntree = 1000
    mtry = 4

    mlflow.log_params({'n_estimators':ntree, 'max_features':mtry})

    rf = RandomForestClassifier(n_estimators = ntree, max_features = mtry, oob_score = True)
    rf.fit(X,y)
    acc = rf.oob_score_
    #acc = accuracy_score(y, rf.predict(X))
    mlflow.log_metric('accuracy', acc)

mlflow.end_run()

Typically, in a real-world scenario, you wouldn't change your parameter values manually and re-run your code, you would either use a loop to loop through different parameter values, or you'd use a built-in method for doing cross-validation, of which there are a few. First, let's use a simple loop to run the experiment multiple times, and save the results of each run.

In [64]:
ntrees = [20,40,60,80,100]
mtrys = [3,4,5]
for i in ntrees:
    for j in mtrys:
        with mlflow.start_run():
            mlflow.set_tags({"Model":"random-forest", "Train Data": "all-data"})

            mlflow.log_params({'n_estimators':i, 'max_features':j})

            rf = RandomForestClassifier(n_estimators = i, max_features = j, oob_score = True)
            rf.fit(X,y)
            acc = rf.oob_score_
            #acc = accuracy_score(y, rf.predict(X))
            mlflow.log_metric('accuracy', acc)
        mlflow.end_run()

  warn(


## Training a Model with mlflow and hyperopt

One way of tuning your model is to use the `hyperopt` library. `hyperopt` is a library that does hyperparameter tuning, and does so in a way that makes it easy for mlflow to keep track of the results. 

First, install the libraries you don't have, and then load them below.

For this exercise, we'll split the data into training and validation, and then we'll train decision trees and random forests and use `hyperopt` to do the hyperparameter tuning and find the best model for us.

In [65]:
from hyperopt import fmin, tpe, hp, STATUS_OK, Trials
from sklearn.model_selection import cross_val_score, train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=True)

From the above we will use `cross_val_score` for our metric, `fmin` is used by `hyperopt` to do the tuning, `tpe` (Tree of Parzen Estimators) is the algorithm used to search the hyperparameter space,  `hp` has methods we need to use for defining our search space, `STATUS_OK` is a status message that each run completed, and `Trials` keeps track of each run.

In [66]:
def objective(params):
    with mlflow.start_run():
        classifier_type = params['type']
        del params['type']
        if classifier_type == 'dt':
            clf = DecisionTreeClassifier(**params)
        elif classifier_type == 'rf':
            clf = RandomForestClassifier(**params)        
        else:
            return 0
        acc = cross_val_score(clf, X, y).mean()

        mlflow.set_tag("Model", classifier_type)
        mlflow.log_params(params)
        mlflow.log_metric("accuracy", acc)

        return {'loss': -acc, 'status': STATUS_OK}

search_space = hp.choice('classifier_type', [
    {
        'type': 'dt',
        'criterion': hp.choice('dtree_criterion', ['gini', 'entropy']),
        'max_depth': hp.choice('dtree_max_depth', [None, hp.randint('dtree_max_depth_int', 1,10)]),
        'min_samples_split': hp.randint('dtree_min_samples_split', 2,10)
    },
    {
        'type': 'rf',
        'n_estimators': hp.randint('rf_n_estimators', 20, 500),
        'max_features': hp.randint('rf_max_features', 2,9),
        'criterion': hp.choice('criterion', ['gini', 'entropy'])
    },
])

algo = tpe.suggest
trials = Trials()

In [67]:
best_result = fmin(
        fn=objective, 
        space=search_space,
        algo=algo,
        max_evals=32,
        trials=trials)

100%|██████████| 32/32 [10:50<00:00, 20.34s/trial, best loss: -0.8318263094294022]


In [68]:
best_result

{'classifier_type': 1,
 'criterion': 1,
 'rf_max_features': 6,
 'rf_n_estimators': 253}

### Using Autologging

Rather than manually logging parameters and metrics, mlflow has an autolog feature, which is compatible with a subset of python libraries, such as sklearn. Autologging makes it easy to log all of the important stuff, without having to manually write lines of code to log the parameters. However, sometimes you will want to have finer control over what gets logged, and should instead skip autologging.

In [69]:

with mlflow.start_run():
    mlflow.sklearn.autolog()
    tree_depth = 5
    dt = DecisionTreeClassifier(max_depth=tree_depth)
    dt.fit(X_train, y_train)
    mlflow.sklearn.autolog(disable=True)




# Artifact Tracking and Model Registry

In this section we will save some artifacts from our model as we go through the model development process. There are a few things that might be worth saving, such as datasets, plots, and the final model itself that might go into production later.

## Data

First, let's see how we can store our important datasets, in a compressed format, for use for later, for example, in case we get a new request about our model and need to run some analyses (such as "what is the distribution of this feature, but only for this specific subset of data?" or "how did the model do on these particular observations from your validation set?").

In [70]:
import os 

os.makedirs('../data/save_data', exist_ok = True)

X_train.to_parquet('../data/save_data/x_train.parquet')


In [71]:
X_test.to_parquet('../data/save_data/x_test.parquet')

mlflow.log_artifacts('../data/save_data/')

## Images

As part of the model dev process you may end up creating visualizations that can be useful for analysis, or for reporting. You can use mlflow to log the important ones and ignore the rest. After creating the below figure, save into a folder called images, and then you can log whatever is in the `images` folder as an artifact.

In [73]:
os.makedirs('../data/images', exist_ok = True)
X_train.iloc[:,2:5].plot.density(subplots = True, figsize = (20,10), layout = (4,4), sharey = False, sharex = False)

array([[<Axes: ylabel='Density'>, <Axes: ylabel='Density'>,
        <Axes: ylabel='Density'>, <Axes: ylabel='Density'>],
       [<Axes: ylabel='Density'>, <Axes: ylabel='Density'>,
        <Axes: ylabel='Density'>, <Axes: ylabel='Density'>],
       [<Axes: ylabel='Density'>, <Axes: ylabel='Density'>,
        <Axes: ylabel='Density'>, <Axes: ylabel='Density'>],
       [<Axes: ylabel='Density'>, <Axes: ylabel='Density'>,
        <Axes: ylabel='Density'>, <Axes: ylabel='Density'>]], dtype=object)

In [74]:
mlflow.log_artifacts('../data/images')

In [75]:
mlflow.end_run()

## Model Management and Model Registry

As you are developing your models you may want to save certain versions of the model, or maybe even all of them, so that you don't have to go back and retrain them later. We can do this in mlflow by logging the models, not as artifacts, but as models, using `log_model`. 

In this section we'll log a couple of models to see how mlflow handles model management. Above, we used `hyperopt` to train a bunch of models at once. Let's do this again, and log some of the models that we train.

### Logging as an Artifact

First we can try logging a model as an artifact. To do this, we must first save the model itself, which we can do by using the `pickle` library. We then log the model as an artifact like we did with data and images. 

In [79]:
import pickle

os.makedirs('models', exist_ok = True)

dt = DecisionTreeClassifier()
dt.fit(X_train, y_train)

with open('./models/model.pkl','wb') as f:
    pickle.dump(dt,f)

# First we'll log the model as an artifact
mlflow.log_artifact('./models/model.pkl', artifact_path='my_models')

### Logging as a Model

Logging the model as an artifact only logs the pickle file (the serialized version of the model). It's not really very useful, especially since models contain so much metadata that might be critical to know for deploying the model later. mlflow has a built-in way of logging models specifically, so let's see how to use this, and how it's different from logging models as an artifact.

In [78]:
# Let's do it again, but this time we will log the model using log_model
mlflow.sklearn.log_model(dt, artifact_path = 'better_models')

<mlflow.models.model.ModelInfo at 0x7f7c8ed47a30>

Ok, so if you go to the mlflow UI at this point you can see the difference in `log_artifact`, which simply logs the pickle file, and `log_model`, which also gives you information about the environment, required packages, and model flavor.

Let's do this one more time, but this time let's use `hyperopt` and log all of the trained models separately.

In [80]:
mlflow.end_run()
mlflow.set_experiment('demo-chem-artifacts')
def objective(params):
    with mlflow.start_run():
        classifier_type = params['type']
        del params['type']
        if classifier_type == 'dt':
            clf = DecisionTreeClassifier(**params)
        elif classifier_type == 'rf':
            clf = RandomForestClassifier(**params)        
        else:
            return 0
        acc = cross_val_score(clf, X, y).mean()

        mlflow.set_tag("Model", classifier_type)
        mlflow.log_params(params)
        mlflow.log_metric("accuracy", acc)
        mlflow.sklearn.log_model(clf, artifact_path = 'better_models')

        return {'loss': -acc, 'status': STATUS_OK}
search_space = hp.choice('classifier_type', [
    {
        'type': 'dt',
        'criterion': hp.choice('dtree_criterion', ['gini', 'entropy']),
        'max_depth': hp.choice('dtree_max_depth', [None, hp.randint('dtree_max_depth_int', 1,10)]),
        'min_samples_split': hp.randint('dtree_min_samples_split', 2,10)
    },
    {
        'type': 'rf',
        'n_estimators': hp.randint('rf_n_estimators', 20, 500),
        'max_features': hp.randint('rf_max_features', 2,9),
        'criterion': hp.choice('criterion', ['gini', 'entropy'])
    },
])

algo = tpe.suggest
trials = Trials()
best_result = fmin(
        fn=objective, 
        space=search_space,
        algo=algo,
        max_evals=10,
        trials=trials)

2023/03/29 14:34:47 INFO mlflow.tracking.fluent: Experiment with name 'demo-chem-artifacts' does not exist. Creating a new experiment.


100%|██████████| 10/10 [1:05:37<00:00, 393.80s/trial, best loss: -0.834987632642272]


### Loading Models

Now that models have been logged, you can load specific models back into python for predicting and further analysis. There are two main ways to do this. The mlflow UI actually gives you some instructions, with code that you copy and paste.

In [81]:
logged_model = 'runs:/35a6ed3c32164bd0a77cc6e0969fac3c/better_models'

# Load model as a PyFuncModel.
loaded_model = mlflow.pyfunc.load_model(logged_model)
loaded_model

mlflow.pyfunc.loaded_model:
  artifact_path: better_models
  flavor: mlflow.sklearn
  run_id: 35a6ed3c32164bd0a77cc6e0969fac3c

In [82]:
sklearn_model = mlflow.sklearn.load_model(logged_model)
sklearn_model

In [83]:
sklearn_model.fit(X_train, y_train)
preds = sklearn_model.predict(X_test)
preds[:5]

array([0, 1, 0, 0, 0])

### Model Registry

Typically, you will **register** your *chosen* model, the model you plan to put into production. But, sometimes, after you've chosen and registered a model, you may need to replace that model with a new version. For example, the model may have gone into production and started to degrade in performance, and so the model needed to be retrained. Or, you go to deploy your model and notice an error or bug, and now have to go back and retrain it.

In this section let's see how we take our logged models and register them in the model registry, which then can get picked up by the production process, or engineer, for deployment. First, I'll demonstrate how this is done within the UI, but then below I'll show how we can use the python API to do the same thing.

In [84]:
runid = '35a6ed3c32164bd0a77cc6e0969fac3c'
mod_path = f'runs:/{runid}/artifacts/better_models'
mlflow.register_model(model_uri = mod_path, name = 'ames_classification_model_from_nb')

2023/03/29 16:08:41 INFO mlflow.store.db.utils: Creating initial MLflow database tables...
2023/03/29 16:08:41 INFO mlflow.store.db.utils: Updating database tables
INFO  [alembic.runtime.migration] Context impl SQLiteImpl.
INFO  [alembic.runtime.migration] Will assume non-transactional DDL.
Successfully registered model 'ames_classification_model_from_nb'.
2023/03/29 16:08:41 INFO mlflow.tracking._model_registry.client: Waiting up to 300 seconds for model version to finish creation.                     Model name: ames_classification_model_from_nb, version 1
Created version '1' of model 'ames_classification_model_from_nb'.


<ModelVersion: creation_timestamp=1680131321526, current_stage='None', description=None, last_updated_timestamp=1680131321526, name='ames_classification_model_from_nb', run_id='35a6ed3c32164bd0a77cc6e0969fac3c', run_link=None, source=('/Users/gurug/USF/MLOPs MSDS '
 '626/MLops_POC/notebooks/mlruns/2/35a6ed3c32164bd0a77cc6e0969fac3c/artifacts/artifacts/better_models'), status='READY', status_message=None, tags={}, user_id=None, version=1>

# Experiment Tracking and Model Registry Lab

## Overview

In this lab you will each download a new dataset and attempt to train a good model, and use mlflow to keep track of all of your experiments, log your metrics, artifacts and models, and then register a final set of models for "deployment", though we won't actually deploy them anywhere. 

## Goal

Your goal is **not** to become a master at MLFlow - this is not a course on learning all of the ins and outs of MLFlow. Instead, your goal is to understand when and why it is important to track your model development process (tracking experiments, artifacts and models) and to get into the habit of doing so, and then learn at least the basics of how MLFlow helps you do this so that you can then compare with other tools that are available.

## Data

You can choose your own dataset to use here, but keep in mind that whatever dataset you choose, you should continue to use the same dataset throughout the rest of the course to make your life easy. Also, it will be helpful to choose a dataset that is already fairly clean and easy to work with. You can even use a dataset that you've used in a previous course if it is interesting enough. Doing all of this will make it easier for you to complete all of the stages of the project. 

There are tons of places where you can find open public datasets. Choose something that interests you.

[Kaggle Datasets](https://www.kaggle.com/datasets)  
[UCI](https://archive.ics.uci.edu/ml/datasets.php)  
[Open Data on AWS](https://registry.opendata.aws/)  
[Yelp](https://www.yelp.com/dataset)  
[MovieLens](https://grouplens.org/datasets/movielens/)  
And so many more...

## Instructions

Once you have selected a set of data, create a brand new experiment in MLFlow and begin exploring your data. Do some EDA, clean up, and learn about your data. You do not need to begin tracking anything yet, but you can if you want to (e.g. you can log different versions of your data as you clean it up and do any feature engineering). Do not spend a ton of time on this part. Your goal isn't really to build a great model, so don't spend hours on feature engineering and missing data imputation and things like that.

Once your data is clean, begin training models and tracking your experiments. **NOTE:** you will be referring back to your final model in the coming weeks, so please keep that in mind as you go through this. When you engineer new features, be sure to save the code that does this, as you will need this in the future. If your final model has 1000 complex features, you might have a difficult time deploying it later on. If your final model takes 15 minutes to train, or takes a long time to score a new batch of data, you may want to think about training a less complex model.

At a minimum, you should:

1. Try at least 3 different algorithms and engineer at least 2 new features.
2. Do hyperparameter tuning for each model.
3. Do some feature selection, and repeat the above steps with these reduced sets of features.
4. Remember to log your results so that you can compare models and choose your favorite.
5. Choose the top 3 best models and retrain them on the training + validation set and register these models.
6. Choose the final model you want to deploy and stage it (in MLFlow) and run it on the test set to get a final measure of performance.
7. Log the exact training, validation, and testing datasets for the 3 best models, as well as hyperparameter values, and the values of your metrics.

### Final Project

For the final project, after you've completed the above steps, repeat those steps using at least one other tool. Some possible options are Weights and Biases, DVC, Dagshub, or Metaflow. Check [here](https://mymlops.com/builder) for a list of other options. 

Once you've worked with other tools, make a comparison. List out pros and cons of each. Think about user-friendliness, costs, integrations with other tools, etc.. You **do not** need to decide which tool you'd pick for your stack yet, because that may depend on the other tools you choose for the rest of the pipeline. Just be sure to write up a good comparison, with evidence, so that you can include it in your final presentation.