# Experiment Tracking and Model Management with MLFlow

There are many ways to use the MLFlow Tracking API. For simple local uses, the best is to leave the data management to MLFlow and let it store runs, metrics, models and artifacts locally. For more advanced usage, all of this information can be stored in databases. You can find the detailed on MLFlow's documentation [here](https://mlflow.org/docs/latest/tracking.html#scenario-1-mlflow-on-localhost).

## Exploring MLFlow

MLflow setup:
* Tracking server: no
* Backend store: local filesystem
* Artifacts store: local filesystem

The experiments can be explored locally by launching the MLflow UI.

Let's print the tracking server URI, where the experiments and runs are going to be logged. We observe it refers to a local path.

In [2]:
import mlflow

mlflow.set_tracking_uri("http://localhost:5002")

print(f"tracking URI: '{mlflow.get_tracking_uri()}'")

tracking URI: 'http://localhost:5002'


After this initialization, we can connect create a client to connect to the API and see what experiments are present.

By refering to mlflow's [documentation](https://mlflow.org/docs/latest/python_api/mlflow.client.html), create a client and display a list of the available experiments using the search_experiments function. This function could prove useful later to programatically explore experiments (rather than in the UI)

In [3]:
from mlflow import MlflowClient

client = MlflowClient()
experiments = client.search_experiments()
for experiment in experiments:
    print(f"Experiment ID: {experiment.experiment_id}, Name: {experiment.name}")
### STRIP_START ###
### STRIP_END ###

Experiment ID: 588343512068939149, Name: taxi-trip-duration
Experiment ID: 605513876365684597, Name: iris-experiment-1
Experiment ID: 0, Name: Default


We see that there is a default experiment for which the runs are stored locally in the mlruns folder.

### Creating an experiment and logging a new run

An experiment is a logical entity regrouping the logs of multiple attempts at solving a same problem, called runs. \
We will now work with the classic sklearn dataset iris. Our goal here is to manage to classify the different iris species. To track our models performance, we will log every attempt as a "run" and create a new experiment "iris-experiment-1" to regroup them.

Lookup the mlflow.run and mlflow.start_run functions [here](https://mlflow.org/docs/latest/python_api/mlflow.html?highlight=start_run#mlflow.start_run) to find out how to manage runs.
Explore [this part](https://mlflow.org/docs/latest/python_api/mlflow.html) to learn more about the log_params, log_metrics and log_artifact functions. Find out how to log sklearn models [here](https://mlflow.org/docs/latest/python_api/mlflow.sklearn.html])

Complete the following in order to log the parameters, interesting metrics and the model.

In [4]:
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score

mlflow.set_experiment("iris-experiment-1")

with mlflow.start_run() as run:
    run_id = run.info.run_id
    
    # Set tags for the run
    mlflow.set_tag("model", "Logistic Regression")
    mlflow.set_tag("dataset", "Iris")
    
    X, y = load_iris(return_X_y=True)

    params = {"C": 0.1, "random_state": 42}
    model = LogisticRegression(**params).fit(X, y)
    y_pred = model.predict(X)
    
    ### STRIP_START ###
    accuracy = accuracy_score(y, y_pred)
    mlflow.log_metric("accuracy", accuracy)
    mlflow.log_params(params)
    mlflow.sklearn.log_model(model, "model") 
    ### STRIP_END ###   
    
    # Register your model in mlflow model registry
    result = mlflow.register_model(f"runs:/{run_id}/model", "iris_lr_model")

    print(f"default artifacts URI: '{mlflow.get_artifact_uri()}'")

Registered model 'iris_lr_model' already exists. Creating a new version of this model...
2025/02/05 16:54:39 INFO mlflow.store.model_registry.abstract_store: Waiting up to 300 seconds for model version to finish creation. Model name: iris_lr_model, version 3


default artifacts URI: 'mlflow-artifacts:/605513876365684597/991ff26161f9418587c9d05088e59dba/artifacts'
🏃 View run bouncy-wolf-956 at: http://localhost:5002/#/experiments/605513876365684597/runs/991ff26161f9418587c9d05088e59dba
🧪 View experiment at: http://localhost:5002/#/experiments/605513876365684597


Created version '3' of model 'iris_lr_model'.


Try running the training script with various parameters to have runs to compare.
You can now explore your run(s) using the ui: \
(Paste "mlflow ui --host 0.0.0.0 --port 5002" in your terminal, or run the cell below)

**N.B.** Make sure you are in the lecture folder and not the repo root!

In [5]:
!mlflow ui --host 0.0.0.0 --port 5004

[2025-02-05 16:54:41 +0100] [23225] [INFO] Starting gunicorn 23.0.0
[2025-02-05 16:54:41 +0100] [23225] [INFO] Listening at: http://0.0.0.0:5004 (23225)
[2025-02-05 16:54:41 +0100] [23225] [INFO] Using worker: sync
[2025-02-05 16:54:41 +0100] [23226] [INFO] Booting worker with pid: 23226
[2025-02-05 16:54:41 +0100] [23227] [INFO] Booting worker with pid: 23227
[2025-02-05 16:54:41 +0100] [23228] [INFO] Booting worker with pid: 23228
[2025-02-05 16:54:41 +0100] [23229] [INFO] Booting worker with pid: 23229
[2025-02-05 16:55:30 +0100] [23225] [CRITICAL] WORKER TIMEOUT (pid:23228)
[2025-02-05 16:55:30 +0100] [23228] [ERROR] Error handling request (no URI read)
Traceback (most recent call last):
  File "/home/lordbenzo/.pyenv/versions/3.10.12/envs/Artefact/lib/python3.10/site-packages/gunicorn/workers/sync.py", line 133, in handle
    req = next(parser)
  File "/home/lordbenzo/.pyenv/versions/3.10.12/envs/Artefact/lib/python3.10/site-packages/gunicorn/http/parser.py", line 41, in __next__


You will have to kill the cell to continue experimenting

### Interacting with the model registry

If you are satisfied with the last run's model, you can transform the logged model into a registered model. It will be logged in the Model Registry, which makes it easier to use in production and manage versions.

In [6]:
# We already have our run id from above. Let's use it to register the model

result = mlflow.register_model(f"runs:/{run_id}/models", "iris_lr_model")

Registered model 'iris_lr_model' already exists. Creating a new version of this model...
2025/02/05 16:56:16 INFO mlflow.store.model_registry.abstract_store: Waiting up to 300 seconds for model version to finish creation. Model name: iris_lr_model, version 4
Created version '4' of model 'iris_lr_model'.


In [None]:
# Use Case

Now we will get back to our taxi rides use case: 

In [7]:
import pandas as pd
import seaborn as sns
import numpy as np

from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LinearRegression

from sklearn.metrics import root_mean_squared_error

from typing import List
from scipy.sparse import csr_matrix

## 0 - Download Data

In [8]:
!pip install gdown --quiet
import gdown
import os

DATA_FOLDER = "../../data"
train_path = f"{DATA_FOLDER}/yellow_tripdata_2021-01.parquet"
test_path = f"{DATA_FOLDER}/yellow_tripdata_2021-02.parquet"
predict_path = f"{DATA_FOLDER}/yellow_tripdata_2021-03.parquet"


if not os.path.exists(DATA_FOLDER):
    os.makedirs(DATA_FOLDER)
    print(f"New directory {DATA_FOLDER} created!")

    gdown.download(
        "https://d37ci6vzurychx.cloudfront.net/trip-data/yellow_tripdata_2021-01.parquet",
        train_path,
        quiet=False,
    )
    gdown.download(
        "https://d37ci6vzurychx.cloudfront.net/trip-data/yellow_tripdata_2021-02.parquet",
        test_path,
        quiet=False,
    )
    gdown.download(
        "https://d37ci6vzurychx.cloudfront.net/trip-data/yellow_tripdata_2021-03.parquet",
        predict_path,
        quiet=False,
    )

## 1 - Load data

In [9]:
def load_data(path: str):
    return pd.read_parquet(path)


train_df = load_data(train_path)
train_df.head()

Unnamed: 0,VendorID,tpep_pickup_datetime,tpep_dropoff_datetime,passenger_count,trip_distance,RatecodeID,store_and_fwd_flag,PULocationID,DOLocationID,payment_type,fare_amount,extra,mta_tax,tip_amount,tolls_amount,improvement_surcharge,total_amount,congestion_surcharge,airport_fee
0,1,2021-01-01 00:30:10,2021-01-01 00:36:12,1.0,2.1,1.0,N,142,43,2,8.0,3.0,0.5,0.0,0.0,0.3,11.8,2.5,
1,1,2021-01-01 00:51:20,2021-01-01 00:52:19,1.0,0.2,1.0,N,238,151,2,3.0,0.5,0.5,0.0,0.0,0.3,4.3,0.0,
2,1,2021-01-01 00:43:30,2021-01-01 01:11:06,1.0,14.7,1.0,N,132,165,1,42.0,0.5,0.5,8.65,0.0,0.3,51.95,0.0,
3,1,2021-01-01 00:15:48,2021-01-01 00:31:01,0.0,10.6,1.0,N,138,132,1,29.0,0.5,0.5,6.05,0.0,0.3,36.35,0.0,
4,2,2021-01-01 00:31:49,2021-01-01 00:48:21,1.0,4.94,1.0,N,68,33,1,16.5,0.5,0.5,4.06,0.0,0.3,24.36,2.5,


## 2 - Prepare the data

Let's prepare the data to make it Machine Learning ready. \
For this, we need to clean it, compute the target (what we want to predict), and compute some features to help the model understand the data better.

### 2-1 Compute the target

We want to predict a taxi trip duration in minutes. Let's compute it as a difference between the drop-off time and the pick-up time for each trip.

In [10]:
def compute_target(
    df: pd.DataFrame,
    pickup_column: str = "tpep_pickup_datetime",
    dropoff_column: str = "tpep_dropoff_datetime",
) -> pd.DataFrame:
    df["duration"] = df[dropoff_column] - df[pickup_column]
    df["duration"] = df["duration"].dt.total_seconds() / 60
    return df


train_df = compute_target(train_df)

In [11]:
train_df["duration"].describe()

count    1.369769e+06
mean     1.391168e+01
std      1.312006e+02
min     -1.350846e+05
25%      5.566667e+00
50%      9.066667e+00
75%      1.461667e+01
max      2.881770e+04
Name: duration, dtype: float64

Let's remove outliers and reduce the scope to trips between 1 minute and 1 hour

In [12]:
MIN_DURATION = 1
MAX_DURATION = 60


def filter_outliers(df: pd.DataFrame, min_duration: int = 1, max_duration: int = 60) -> pd.DataFrame:
    return df[df["duration"].between(min_duration, max_duration)]


train_df = filter_outliers(train_df)

### 2-2 Prepare features

#### 2-2-1 Categorical features

Most machine learning models don't work with categorical features. Because of this, they must be transformed so that the ML model can consume them.

In [13]:
CATEGORICAL_COLS = ["PULocationID", "DOLocationID"]


def encode_categorical_cols(df: pd.DataFrame, categorical_cols: List[str] = None) -> pd.DataFrame:
    if categorical_cols is None:
        categorical_cols = ["PULocationID", "DOLocationID", "passenger_count"]
    df[categorical_cols] = df[categorical_cols].fillna(-1).astype("int")
    df[categorical_cols] = df[categorical_cols].astype("str")
    return df


train_df = encode_categorical_cols(train_df)

In [14]:
def extract_x_y(
    df: pd.DataFrame,
    categorical_cols: List[str] = None,
    dv: DictVectorizer = None,
    with_target: bool = True,
) -> dict:

    if categorical_cols is None:
        categorical_cols = ["PULocationID", "DOLocationID", "passenger_count"]
    dicts = df[categorical_cols].to_dict(orient="records")

    y = None
    if with_target:
        if dv is None:
            dv = DictVectorizer()
            dv.fit(dicts)
        y = df["duration"].values

    x = dv.transform(dicts)
    return x, y, dv


X_train, y_train, dv = extract_x_y(train_df)

## 3 - Train model

We train a basic linear regression model to have a baseline performance

In [15]:
def train_model(x_train: csr_matrix, y_train: np.ndarray):
    lr = LinearRegression()
    lr.fit(x_train, y_train)
    return lr


model = train_model(X_train, y_train)

## 4 - Evaluate model

We evaluate the model on train and test data

### 4-1 On train data

In [16]:
def predict_duration(input_data: csr_matrix, model: LinearRegression):
    return model.predict(input_data)


def evaluate_model(y_true: np.ndarray, y_pred: np.ndarray):
    return root_mean_squared_error(y_true, y_pred)


prediction = predict_duration(X_train, model)
train_me = evaluate_model(y_train, prediction)
train_me

6.782411841111456

### 4-2 On test data

In [17]:
test_df = load_data(test_path)

In [18]:
test_df = compute_target(test_df)
test_df = encode_categorical_cols(test_df)
X_test, y_test, _ = extract_x_y(test_df, dv=dv)

In [19]:
y_pred_test = predict_duration(X_test, model)
test_me = evaluate_model(y_test, y_pred_test)
test_me

58.375056461631154

## 5 - Log Model Parameters to MlFlow

Now that all our development functions are built and tested, let's create a training pipeline and log the training parameters, logs and model to MlFlow.

Create a training flow, log all the important parameters, metrics and model. Try to find what could be important and needs to be logged.

In [20]:
# Set the experiment name
mlflow.set_experiment("taxi-trip-duration")

# Start a run
with mlflow.start_run() as run:
    run_id = run.info.run_id

    # Set tags for the run
    mlflow.set_tag("model", "Linear Regression")
    mlflow.set_tag("dataset", "Taxi Trip Data")

    # Load data
    train_df = load_data(train_path)
    test_df = load_data(test_path)

    # Compute target
    train_df = compute_target(train_df)
    test_df = compute_target(test_df)

    # Filter outliers
    train_df = filter_outliers(train_df)
    test_df = filter_outliers(test_df)
    
    # Print columns to debug
    print("Train DataFrame columns:", train_df.columns)
    print("Test DataFrame columns:", test_df.columns)
    
    # Encode categorical columns
    train_df = encode_categorical_cols(train_df, CATEGORICAL_COLS)
    test_df = encode_categorical_cols(test_df, CATEGORICAL_COLS)

    # Extract X and y
    X_train, y_train, dv = extract_x_y(train_df, categorical_cols=CATEGORICAL_COLS)
    X_test, y_test, _ = extract_x_y(test_df, categorical_cols=CATEGORICAL_COLS, dv=dv)

    # Train model
    model = train_model(X_train, y_train)

    # Evaluate model
    train_pred = predict_duration(X_train, model)
    train_rmse = evaluate_model(y_train, train_pred)
    mlflow.log_metric("train_rmse", train_rmse)

    # Evaluate model on test set
    test_pred = predict_duration(X_test, model)
    test_rmse = evaluate_model(y_test, test_pred)
    mlflow.log_metric("test_rmse", test_rmse)

    # Log your model
    mlflow.sklearn.log_model(model, "model")

    # Register your model in mlflow model registry
    result = mlflow.register_model(f"runs:/{run_id}/model", "taxi_trip_lr_model")

Train DataFrame columns: Index(['VendorID', 'tpep_pickup_datetime', 'tpep_dropoff_datetime',
       'passenger_count', 'trip_distance', 'RatecodeID', 'store_and_fwd_flag',
       'PULocationID', 'DOLocationID', 'payment_type', 'fare_amount', 'extra',
       'mta_tax', 'tip_amount', 'tolls_amount', 'improvement_surcharge',
       'total_amount', 'congestion_surcharge', 'airport_fee', 'duration'],
      dtype='object')
Test DataFrame columns: Index(['VendorID', 'tpep_pickup_datetime', 'tpep_dropoff_datetime',
       'passenger_count', 'trip_distance', 'RatecodeID', 'store_and_fwd_flag',
       'PULocationID', 'DOLocationID', 'payment_type', 'fare_amount', 'extra',
       'mta_tax', 'tip_amount', 'tolls_amount', 'improvement_surcharge',
       'total_amount', 'congestion_surcharge', 'airport_fee', 'duration'],
      dtype='object')


Registered model 'taxi_trip_lr_model' already exists. Creating a new version of this model...
2025/02/05 16:57:21 INFO mlflow.store.model_registry.abstract_store: Waiting up to 300 seconds for model version to finish creation. Model name: taxi_trip_lr_model, version 2


🏃 View run adorable-shark-585 at: http://localhost:5002/#/experiments/588343512068939149/runs/a83f8d8db58d4b95aafea7bf7fe0679d
🧪 View experiment at: http://localhost:5002/#/experiments/588343512068939149


Created version '2' of model 'taxi_trip_lr_model'.


If the model is satisfactory, we stage it as production using the appropriate version. This will help us retreiving it for predictions.

Create a mlflow client and use the [mlflow documentation](https://mlflow.org/docs/latest/python_api/mlflow.client.html?highlight=transition_model_version_stage#mlflow.client.MlflowClient.transition_model_version_stage) to stage the appropriate model as being in "production".

In [21]:
client = MlflowClient()
### STRIP_START ###
#TODO:
client.transition_model_version_stage(
    name="taxi_trip_lr_model",
    version=result.version,
    stage="Production"
)
### STRIP_END ###

  client.transition_model_version_stage(


<ModelVersion: aliases=[], creation_timestamp=1738771041041, current_stage='Production', description='', last_updated_timestamp=1738771041075, name='taxi_trip_lr_model', run_id='a83f8d8db58d4b95aafea7bf7fe0679d', run_link='', source='mlflow-artifacts:/588343512068939149/a83f8d8db58d4b95aafea7bf7fe0679d/artifacts/model', status='READY', status_message=None, tags={}, user_id='', version='2'>

## 6 - Predict

We can now use our model to predict on fresh unseen data and forecast what is going to be the duration of a tawi trip depending on trip characteristics.

In [22]:
import mlflow

In [27]:
# List all registered models
client = mlflow.tracking.MlflowClient()
for rm in client.search_registered_models():
    print(f"Name: {rm.name}")
    for version in rm.latest_versions:
        print(f"  Version: {version.version}")
        print(f"  Stage: {version.current_stage}")
        print(f"  URI: {version.source}")

# Set the mlflow_experiment_path to the name of the registered model
mlflow_experiment_path = "taxi_trip_lr_model"

Name: iris_lr_model
  Version: 4
  Stage: None
  URI: mlflow-artifacts:/605513876365684597/991ff26161f9418587c9d05088e59dba/artifacts/models
Name: taxi_trip_lr_model
  Version: 2
  Stage: Production
  URI: mlflow-artifacts:/588343512068939149/a83f8d8db58d4b95aafea7bf7fe0679d/artifacts/model


In [28]:
# Load prediction data
predict_df = load_data(predict_path)

# Apply feature engineering
predict_df = encode_categorical_cols(predict_df)
X_pred, _, _ = extract_x_y(predict_df, dv=dv, with_target=False)

# Load production model in mlflow in a variable `model`
model = None
### STRIP_START ###
model_uri = f"models:/{mlflow_experiment_path}/production"
model = mlflow.sklearn.load_model(model_uri) 
### STRIP_END ###

# Make predictions
y_pred = predict_duration(X_pred, model)
y_pred

  latest = client.get_latest_versions(name, None if stage is None else [stage])
  from .autonotebook import tqdm as notebook_tqdm
Downloading artifacts: 100%|██████████| 5/5 [00:00<00:00, 16.28it/s]


array([11.36649613, 11.92660868, 11.92660868, ...,  8.94074986,
       16.73054341, 18.67003372], shape=(1925152,))

## 7 - To go further

If you managed to go this far, you can try solving the use case using an other regression model like [XGBoost](https://xgboost.readthedocs.io/en/stable/) for instance.