# AI Clinique #13 : Model Lifecycle Management with MLflow

- Date : 29-10-2021
- Presenters : A. Massiot and N. Clavel
- Dataset : For this hands-on, we will be using the [Power Plant dataset](https://archive.ics.uci.edu/ml/datasets/Combined+Cycle+Power+Plant) where the goal is to predict the net hourly electrical energy output (PE) of a plant.
- Packages : requirements.txt

### Table of contents
- 1. Case introduction
- 2. Train 1st regression model
- 3. Track experiments with MLflow Tracking
- 4. Serve predictions with MLflow model
- 5. Visualize experiments with MLflow tracking UI
- 6. Search in experiments
- 7. Load model from experiment
- 8. Backend & artifact Stores (to go further)

#### Imports

In [None]:
from datetime import datetime

import mlflow
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, max_error
import pickle
import lightgbm as lgb

pd.set_option('display.max_columns', None)

## 1. Case introduction

#### Load the Power Plant dataset

In [None]:
# Power Plant dataset
df = pd.read_csv('../input_data/power_plants.csv')
df.head()

In [None]:
df.describe()

The features are hourly measurements of:
- Ambient Temperature (AT)
- Ambient Pressure (AP)
- Relative Humidity (RH)
- Exhaust Steam Vacuum (V)

...to predict the net hourly electrical energy output (PE) of the plant.  

A combined cycle power plant (CCPP) is composed of gas turbines (GT), steam turbines (ST) and heat recovery steam generators. In a CCPP, the electricity is generated by gas and steam turbines, which are combined in one cycle, and is transferred from one turbine to another. While the Vacuum is collected from and affects the Steam Turbine, the other three ambient variables affect the GT performance.

In [None]:
# features & target
target = 'PE'
features = ['AT', 'V', 'AP', 'RH']

# Split data
X = df[features]
y = df[target]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

## 2. Train 1st regression model

In [None]:
# Fit model
max_depth = 6
model = RandomForestRegressor(max_depth=max_depth)
model.fit(X_train, y_train)

# Evaluate the model
y_pred = model.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
me = max_error(y_test, y_pred)
print(f'Test mse = {mse}, Test max error = {me}, Random forest max depth = {max_depth}')
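As a reminder of what the two metrics measure, here is a minimal pure-Python sketch on three made-up values (the numbers are illustrative, not taken from the dataset): MSE averages the squared errors, while max error reports the single worst absolute error.

```python
# Illustrative values only (not taken from the Power Plant dataset)
y_true = [480.0, 450.0, 465.0]
y_hat = [478.0, 455.0, 465.0]

# Residuals between true and predicted values
errors = [t - p for t, p in zip(y_true, y_hat)]

# Mean squared error: average of the squared residuals
mse_manual = sum(e ** 2 for e in errors) / len(errors)

# Max error: largest absolute residual
max_error_manual = max(abs(e) for e in errors)

print(mse_manual, max_error_manual)
```

A low MSE with a high max error would indicate a model that is good on average but occasionally far off, which is why both are tracked in this hands-on.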

In [None]:
# save model as a pickle file
model_filename = '../models/29-10-2021-rf-model-v3.pkl'
# use a context manager so the file is closed once the model is written
with open(model_filename, 'wb') as f:
    pickle.dump(model, f)

In [None]:
# load model and test it
with open(model_filename, 'rb') as f:
    model = pickle.load(f)

# check results on test
y_pred = model.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
me = max_error(y_test, y_pred)
print(f'Test mse = {mse}, Test max error = {me}, Random forest max depth = {max_depth}')

#### 29-10-2021-rf-model-v3.pkl is an artifact :  
In ML terminology, an artifact is any output file generated by the training process. It can be a model (pickle or joblib format), a model checkpoint, an image...

#### Let's say we want to modify the hyperparameters and change the features

In [None]:
features = ['V', 'AP']

# Split data
X = df[features]
y = df[target]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Fit model
max_depth = 7
model = RandomForestRegressor(max_depth=max_depth)
model.fit(X_train, y_train)

# Evaluate the model
y_pred = model.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
me = max_error(y_test, y_pred)
print(f'Test mse = {mse}, Test max error = {me}, Random forest max depth = {max_depth}')

In [None]:
# save the new model as another pickle file
model_filename = '../models/29-10-2021-rf-model-v5.pkl'
with open(model_filename, 'wb') as f:
    pickle.dump(model, f)

In [None]:
!ls ../models/

#### If another data scientist, ML engineer or DevOps engineer gets the code, they may have no clue about the hyperparameters, the metrics obtained, or how this model was trained. They may want to know how the model performed, which features were used...

#### ...It is hard to keep track of experiment configurations, params, metrics, model artifacts, features used...

## 3. MLflow Tracking

#### Vocabulary:
- **run**: a single execution of the model training code. Each run can record different information (model parameters, metrics, tags, artifacts, etc.).
- **experiment**: the primary unit of organization and access control for MLflow runs; all MLflow runs belong to an experiment. Experiments let you visualize, search for, and compare runs, as well as download run artifacts and metadata for analysis in other tools.

#### 3.1. Random Forest experimentation

In [None]:
!ls

In [None]:
experiment_name = 'ep_prediction_with_random_forest'
mlflow.set_experiment(experiment_name)

In [None]:
!ls

In [None]:
!ls mlruns/

In [None]:
!ls mlruns/1/

In [None]:
!cat mlruns/1/meta.yaml 

#### Log metrics & params, model & tag
1. Log max_depth as param
2. Log Tag
3. Log features as param
4. Log metrics
5. Log features as artifact
6. Log model as artifact
7. Log model with log_model (MLflow Model)

In [None]:
with mlflow.start_run():
    
    features = ['AT', 'V', 'AP']
    max_depth = 10
    tag = {'artifact': 'without artifacts log'}
    
    # Split data
    X = df[features]
    y = df[target]
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    # Fit model
    model = RandomForestRegressor(max_depth=max_depth)
    model.fit(X_train, y_train)
    
    ##--- Log tags
    mlflow.set_tags(tag)

    ##--- Log params
    mlflow.log_param('max_depth', max_depth)
    #mlflow.log_param('features', features)
    
    
    ##--- Log artifacts
    
    #-Log features artifact
    features_filename = '../features.txt'
    with open(features_filename, 'w') as f:
        f.write(str(features))
    mlflow.log_artifact(features_filename, artifact_path='features')
    
    #-Log model with log_model
    mlflow.sklearn.log_model(model, 'rf_model')
    
    #-Log model artifact
    #model_filename = '../models/29-10-2021-rf-model.pkl'
    #pickle.dump(model, open(model_filename, 'wb'))
    #mlflow.log_artifact(model_filename)
    
    # Get artifact URI
    #artifact_uri = mlflow.get_artifact_uri()
    #print(f'Artifact uri: {artifact_uri}', '\n')
    #features_artifact_uri = mlflow.get_artifact_uri(artifact_path='features/features.txt')
    #print(f'Features artifact uri: {features_artifact_uri}', '\n')
    #model_artifact_uri = mlflow.get_artifact_uri(artifact_path='29-10-2021-rf-model-v4.pkl')
    #print(f'Model artifact uri: {model_artifact_uri}', '\n')
    
    # Evaluate the model
    y_pred = model.predict(X_test)
    mse = mean_squared_error(y_test, y_pred)
    me = max_error(y_test, y_pred)

    ##--- Log metrics
    mlflow.log_metrics({'mse': mse, 'me': me}) # or mlflow.log_metric('mse', mse) & mlflow.log_metric('me', me)
    print(f'Test mse = {mse}, Test max error  = {me}, Random forest max depth = {max_depth}')
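The features artifact logged above stores `str(features)`, that is, the Python representation of the list. A minimal sketch of one way to turn that text back into a list, assuming the file content is unchanged (`logged_text` below is a stand-in for the content of *features.txt*):

```python
import ast

# Stand-in for the content of features.txt as written by f.write(str(features))
logged_text = str(['AT', 'V', 'AP'])

# literal_eval only parses Python literals, so it is safer than eval()
restored_features = ast.literal_eval(logged_text)
print(restored_features)
```

Writing the list as JSON instead would make the artifact readable from other languages as well; the text format is kept here because it is what the cell above produces.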

#### 3.2. Lightgbm experimentation
1. Create a new experiment
2. Log tag, hyperparams as params, features and model as artifacts with mlflow.lightgbm.log_model (MLflow Model)

In [None]:
experiment_name = 'ep_prediction_with_lightgbm'
mlflow.set_experiment(experiment_name)

In [None]:
!cat  mlruns/2/meta.yaml

In [None]:
with mlflow.start_run():
    
    features = ['AT', 'V', 'AP', 'RH']
    early_stopping_rounds = 10
    parameters = {
        'objective': 'regression',
        'metric': 'rmse',
        'num_leaves': 40,
        'learning_rate': 0.05,
        'verbose': 0
    }
    
    # Split data
    X = df[features]
    y = df[target]
    
    X_train_val, X_test, y_train_val, y_test = train_test_split(X, y, test_size=0.2, random_state=1)
    X_train, X_val, y_train, y_val = train_test_split(X_train_val, y_train_val, test_size=0.25, random_state=1) # 0.25 x 0.8 = 0.2
    
    train_data = lgb.Dataset(X_train, label=y_train)
    val_data = lgb.Dataset(X_val, label=y_val)

    # Fit model (early stopping is passed as a callback, which also works with LightGBM >= 4.0)
    model = lgb.train(parameters,
                      train_data,
                      valid_sets=[val_data],
                      callbacks=[lgb.early_stopping(stopping_rounds=early_stopping_rounds)])

    ##--- Log params : model hyperparameters & features
    mlflow.log_params(parameters)
    mlflow.log_param('early_stopping_rounds', early_stopping_rounds)
    mlflow.log_param('features', features)
    mlflow.lightgbm.log_model(model, 'lgbm_model')
    
    ##- Get model artifact URI
    model_artifact_uri = mlflow.get_artifact_uri(artifact_path='lgbm_model')
    print(f'Model artifact uri: {model_artifact_uri}', '\n')

    # Evaluate the model
    y_pred = model.predict(X_test)
    mse = mean_squared_error(y_test, y_pred)
    me = max_error(y_test, y_pred)

    ##--- Log metrics
    mlflow.log_metrics({'mse': mse, 'me': me})
    print(f'Test mse = {mse}, Test max error  = {me}')
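The comment `0.25 x 0.8 = 0.2` in the cell above can be checked with a few lines of arithmetic: the first split keeps 80% of the data for train + validation, and the second one takes 25% of that share for validation, which gives a 60/20/20 split. The variable names below are illustrative.

```python
test_fraction = 0.2        # first split: share held out for test
val_size_within = 0.25     # second split: share of the remaining data used for validation

# Share of the full dataset left after removing the test set
train_val_fraction = 1 - test_fraction

# Validation and train shares expressed as fractions of the full dataset
val_fraction = val_size_within * train_val_fraction
train_fraction = train_val_fraction - val_fraction

print(f'train={train_fraction:.2f}, val={val_fraction:.2f}, test={test_fraction:.2f}')
```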

#### 3.3. Autolog

- Automatic logging allows you to log metrics, parameters, and models without the need for explicit log statements
- Be careful : autolog does not log test metrics, so you need to log them with log_metrics()

In [None]:
experiment_name = 'ep_prediction_with_lightgbm_autolog'
mlflow.set_experiment(experiment_name)

In [None]:
with mlflow.start_run():
    features = ["AT", "V", "AP", "RH"]
    max_depth = 6
    
    mlflow.autolog()

    # Split data
    X = df[features]
    y = df[target]
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    # Fit model
    model = RandomForestRegressor(max_depth=max_depth)
    model.fit(X_train, y_train)

    # Evaluate the model
    y_pred = model.predict(X_test)
    mse = mean_squared_error(y_test, y_pred)
    me = max_error(y_test, y_pred)

    ##--- mlflow: log metrics
    mlflow.log_metrics({'test_mse': mse, 'test_me': me})
    print(f'Test mse = {mse}, Test max error  = {me}')

#### 3.4. Conclusion :
MLflow Tracking makes it possible to track experiment configurations.
- Log parameters (hyperparameters, features, and others...) : log_param() or log_params()
- Log metrics : log_metric() or log_metrics()
- Log artifacts (models, and any files) : log_artifact() or log_artifacts()
- Set tags : set_tag() or set_tags()
- Log model : a common abstraction to log models from any supported library (sklearn, lightgbm, tensorflow, pytorch...) : mlflow.<library>.log_model()
- autolog() : logs parameters, training metrics and models automatically; test metrics still have to be logged explicitly

## 4. Serve predictions with MLflow model

In [None]:
# run in shell
#!mlflow models serve -m file:///C:/Users/nicolas.clavel/Documents/projets/Engie/mlflow_hands_on/notebooks/mlruns/2/27ead034ce7a4a02891697a226e910dc/artifacts/lgbm_model -p 1234 --no-conda

In [None]:
!curl -X POST -H "Content-Type:application/json; format=pandas-split" --data "{\"columns\":[\"AT\", \"V\", \"AP\", \"RH\"],\"data\":[[12.8, 0.029, 0.48, 0.98]]}" http://127.0.0.1:1234/invocations
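The same request can be sent from Python. The sketch below uses only the standard library and mirrors the curl call above (same URL, header and pandas-split payload); `build_payload` and `predict` are hypothetical helper names, and the call only succeeds while the `mlflow models serve` process is running.

```python
import json
import urllib.request

URL = 'http://127.0.0.1:1234/invocations'  # same endpoint as the curl call

def build_payload(columns, rows):
    """Serialize rows in the pandas 'split' orientation used by the curl example."""
    return json.dumps({'columns': columns, 'data': rows}).encode('utf-8')

def predict(columns, rows, url=URL):
    """POST the payload to the model server and return the decoded JSON response."""
    request = urllib.request.Request(
        url,
        data=build_payload(columns, rows),
        headers={'Content-Type': 'application/json; format=pandas-split'},
        method='POST',
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode('utf-8'))

payload = build_payload(['AT', 'V', 'AP', 'RH'], [[12.8, 0.029, 0.48, 0.98]])
print(payload.decode('utf-8'))
# predictions = predict(['AT', 'V', 'AP', 'RH'], [[12.8, 0.029, 0.48, 0.98]])  # needs the running server
```

The expected payload shape depends on the MLflow version used to serve the model, so check the deployment documentation of your version if the server rejects the request.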

## 5. Visualize experiments with MLflow tracking UI

To run the [MLflow Tracking UI](https://www.mlflow.org/docs/latest/tracking.html#tracking-ui), you can run the command ```mlflow ui``` (needs to be executed from the *notebooks* folder)

In [None]:
# run in command line
#!mlflow ui

## 6. Search in experiments
- [In the UI directly](https://www.mlflow.org/docs/latest/search-syntax.html#search)
- [Programmatically with search_runs](https://www.mlflow.org/docs/latest/search-syntax.html#programmatically-searching-runs)

- Get the id of the experiment where we want to search runs

In [None]:
experiment_name = 'ep_prediction_with_random_forest'
mlflow.get_experiment_by_name(experiment_name)

In [None]:
experiment_id = mlflow.get_experiment_by_name(experiment_name).experiment_id
experiment_id

- Get all runs for the experiment

In [None]:
mlflow.search_runs(experiment_id)

- Filter runs by max error (me) and order them by ascending max error

In [None]:
mlflow.search_runs(
    [experiment_id],
    filter_string="metrics.me <= 30",
    order_by=['metrics.me asc']
)
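Filters can also combine several conditions with `and`. Below is a minimal sketch of how such a filter string could be built, assuming the MLflow search syntax where parameter values are stored as strings and are therefore compared against quoted values; `build_filter` is a hypothetical helper, not part of MLflow.

```python
def build_filter(max_depth, mse_threshold):
    """Return an MLflow search filter combining a param and a metric condition."""
    # Params are stored as strings, hence the quotes around the value
    return f"params.max_depth = '{max_depth}' and metrics.mse <= {mse_threshold}"

filter_string = build_filter(max_depth=10, mse_threshold=20)
print(filter_string)
# Possible usage:
# mlflow.search_runs([experiment_id], filter_string=filter_string, order_by=['metrics.mse asc'])
```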

## 7. Load model from experiment

- [More information on the other formats of model_uri](https://www.mlflow.org/docs/latest/python_api/mlflow.sklearn.html#mlflow.sklearn.load_model)

#### 7.1. With the results of search_runs()

In [None]:
experiment_id = '2'
run = mlflow.search_runs(
    experiment_id,
    order_by=['metrics.mse asc']
).iloc[0]
run

In [None]:
run.artifact_uri

In [None]:
model = mlflow.lightgbm.load_model(model_uri=f'{run.artifact_uri}/lgbm_model')
model

In [None]:
model.predict(df[:5][['AT', 'V', 'AP', 'RH']])

#### 7.2. With the run id

In [None]:
!ls mlruns/1/

In [None]:
!ls mlruns/1/f4d1ec170de24fe69e2d2d4774e956e7/artifacts

In [None]:
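# The run id and artifact path are specific to one tracking folder: replace them with your own
# (section 3.1 logs the model under the artifact path 'rf_model')
run_id = '8bb8efe5f46e45b3920ec0f83f33cf58'
model_uri = f'runs:/{run_id}/rf_model_v3'

model = mlflow.sklearn.load_model(model_uri=model_uri)
model

# The columns must match the features used to train the loaded model
model.predict(df[:5][['AT', 'V', 'AP', 'RH']])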
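Both loading approaches in this section rely on a model URI. Here is a small sketch of how the two forms are assembled; `runs_uri` and `artifact_uri_for` are hypothetical helper names, and the run id and artifact location are placeholders to replace with your own values.

```python
def runs_uri(run_id, artifact_path):
    """URI relative to a run, resolved by MLflow through the tracking store."""
    return f'runs:/{run_id}/{artifact_path}'

def artifact_uri_for(run_artifact_uri, artifact_path):
    """Direct location of a logged model under a run's artifact root."""
    return f"{run_artifact_uri.rstrip('/')}/{artifact_path}"

print(runs_uri('<run_id>', 'rf_model'))
print(artifact_uri_for('file:///path/to/mlruns/1/<run_id>/artifacts', 'rf_model'))
```

The `runs:/` form is convenient when only the run id is known, while the artifact URI form is what `search_runs()` results expose directly through the `artifact_uri` column.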

## 8. Backend & artifact Stores (to go further)

#### Where MLflow saves the data :
- in local filesystem : mlruns/
- in backend & artifact stores (local or remote)

#### Some vocabulary:
- **Backend store**: for MLflow entities (parameters, metrics, tags, metadata, etc.), stored as local files or in a SQL database (SQLite, PostgreSQL, etc.)
- **Artifact store**: for artifacts (files, models, images, etc.)
- For more information, [check the official documentation](https://www.mlflow.org/docs/latest/tracking.html#where-runs-are-recorded)

#### 8.1. Local file system mlruns if no prior config
- When no prior configuration is set, MLflow creates an *mlruns* folder where the data will be saved

In [None]:
!ls

- MLflow created a new folder *mlruns* where it stores the information about the different runs

In [None]:
!tree mlruns
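The tree above can also be explored from Python. The sketch below assumes the local file store layout `mlruns/<experiment_id>/<run_id>/params/<name>`, with one file per parameter holding its value; this layout is an implementation detail that may differ between MLflow versions, and `read_run_params` is a hypothetical helper.

```python
from pathlib import Path

def read_run_params(mlruns_dir, experiment_id, run_id):
    """Read the logged params of one run from a local mlruns folder."""
    params_dir = Path(mlruns_dir) / experiment_id / run_id / 'params'
    if not params_dir.is_dir():
        return {}
    # One file per parameter: file name = param name, file content = value
    return {
        p.name: p.read_text().strip()
        for p in sorted(params_dir.iterdir())
        if p.is_file()
    }

# Possible usage (ids are placeholders): read_run_params('mlruns', '1', '<run_id>')
```

For anything beyond a quick inspection, `mlflow.search_runs()` remains the supported way to read runs, since it works the same with a local folder or a remote backend store.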

#### 8.2. Backend (sqlite) & Artifact stores locally (to go further)

- Set the **Backend store** to a SQLite database located in */tmp/mlruns.db* and the **Artifact store** to a folder located in */tmp/mlruns*. For more information on the other options available (S3, Azure Blob Storage, etc.), check [the official documentation](https://www.mlflow.org/docs/latest/tracking.html#where-runs-are-recorded).
- To run the MLflow server, you need to execute the following command in your terminal
```mlflow server --backend-store-uri sqlite:////tmp/mlruns.db --default-artifact-root /tmp/mlruns```
- Set the tracking uri in the notebook ```mlflow.set_tracking_uri('http://127.0.0.1:5000')```
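The four slashes in `sqlite:////tmp/mlruns.db` are not a typo: the SQLAlchemy-style URI is `sqlite:///` followed by the database path, and an absolute path contributes its own leading slash. A quick sketch (`sqlite_uri` is a hypothetical helper):

```python
def sqlite_uri(db_path):
    """Build a SQLAlchemy-style SQLite URI from a file path."""
    return f'sqlite:///{db_path}'

print(sqlite_uri('/tmp/mlruns.db'))  # absolute path: four slashes in total
print(sqlite_uri('mlruns.db'))       # relative path: three slashes
```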

#### 8.3. Backend & Artifact stores remotely (to go further)
- Documentation : https://www.mlflow.org/docs/latest/tracking.html#where-runs-are-recorded