## Example of using MLflow for experiment tracking and as a model registry

MLflow setup (Scenario 1):
* Tracking server: no
* Backend store: local filesystem
* Artifacts store: local filesystem, i.e. `models/mlruns` under the project root

The experiments can be explored locally by launching the MLflow UI.
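
As a rough sketch of what launching the UI involves for this file-based setup, the snippet below assembles the `mlflow ui` command for a given store directory. The helper name `build_ui_command` and the default port are illustrative assumptions; `--backend-store-uri` and `--port` are options of the `mlflow ui` CLI.

```python
from pathlib import Path


def build_ui_command(store_dir, port=5000):
    """Return the argv list that starts the MLflow UI on a local file store.

    `build_ui_command` is a hypothetical helper, not part of MLflow.
    """
    # Resolve to an absolute path so the command works from any directory.
    store_path = str(Path(store_dir).resolve())
    return ["mlflow", "ui", "--backend-store-uri", store_path, "--port", str(port)]


# Assumes the notebook runs from notebooks/, so the store sits one level up.
cmd = build_ui_command(Path("..") / "models" / "mlruns")
print(" ".join(cmd))
# To actually start the UI (blocks until interrupted):
# import subprocess; subprocess.run(cmd, check=True)
```

Pasting the printed command into a terminal has the same effect as the commented-out `subprocess.run` call.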

In [28]:
import os
import sys

import mlflow

# Make the project package importable from the notebooks/ directory
sys.path.append('..')

# Import our custom preprocessing helpers
from taxi_ride.data.preprocess_data import get_project_paths

# Get project paths from pyproject.toml - this should point to project root
paths = get_project_paths()
RAW_DATA_DIR = paths["RAW_DATA_DIR"]
PROCESSED_DATA_DIR = paths["PROCESSED_DATA_DIR"]


# Let's manually verify and set the correct paths
# Since we're in notebooks/, we need to go up one level to project root
NOTEBOOK_DIR = os.getcwd()
PROJECT_ROOT = os.path.dirname(NOTEBOOK_DIR)  # Go up one level

print(f"\nCurrent notebook directory: {NOTEBOOK_DIR}")
print(f"Project root: {PROJECT_ROOT}")

# Construct correct paths manually
MODELS_ARTIFACTS = os.path.join(PROJECT_ROOT, 'models')
MLFLOW_TRACKING_URI = os.path.join(PROJECT_ROOT, 'models', 'mlruns')

# Create the directory if it doesn't exist
os.makedirs(MLFLOW_TRACKING_URI, exist_ok=True)

# Point MLflow at the mlruns directory inside the project's models directory
mlflow.set_tracking_uri(f"file://{MLFLOW_TRACKING_URI}")

print("\nCorrected paths:")
print(f"Models artifacts directory: {MODELS_ARTIFACTS}")



Current notebook directory: /home/lisanab/fujitsu_laptop_files/MLOps/cookiecutter/cookiecutter-ml-course/notebooks
Project root: /home/lisanab/fujitsu_laptop_files/MLOps/cookiecutter/cookiecutter-ml-course

Corrected paths:
Models artifacts directory: /home/lisanab/fujitsu_laptop_files/MLOps/cookiecutter/cookiecutter-ml-course/models


In [12]:
print(f"tracking URI: '{mlflow.get_tracking_uri()}'")

tracking URI: 'file:///home/lisanab/fujitsu_laptop_files/MLOps/cookiecutter/cookiecutter-ml-course/models/mlruns'
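
The `file://` URI above is assembled by string formatting, which works for POSIX paths like the one shown. A somewhat more portable way to build it, sketched here with only the standard library, is `pathlib.Path.as_uri()`, which also percent-encodes characters such as spaces. The helper name `to_file_uri` is an illustrative assumption.

```python
from pathlib import Path


def to_file_uri(directory):
    """Convert a local directory path into an absolute file:// URI."""
    # as_uri() requires an absolute path, so resolve relative paths first.
    return Path(directory).resolve().as_uri()


# Assumes the current working directory is notebooks/, as in the cell above.
tracking_uri = to_file_uri(Path.cwd().parent / "models" / "mlruns")
print(tracking_uri)
# The result could then be passed to mlflow.set_tracking_uri(tracking_uri).
```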


In [15]:
mlflow.search_experiments()

[<Experiment: artifact_location='file:///home/lisanab/fujitsu_laptop_files/MLOps/cookiecutter/cookiecutter-ml-course/models/mlruns/840068150784448506', creation_time=1763558260303, experiment_id='840068150784448506', last_update_time=1763558260303, lifecycle_stage='active', name='my-experiment-1', tags={}>]

### Creating an experiment and logging a new run

In [None]:
import mlflow
import mlflow.data
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score

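# This cell logs to a local tracking server rather than the file store set above;
# the server must already be running at this address (see Scenario 2).
mlflow.set_tracking_uri("http://127.0.0.1:5000")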

# 1. Load the Iris dataset (load it once and reuse the returned Bunch)
iris = load_iris()
X, y = iris.data, iris.target
df = pd.DataFrame(X, columns=iris.feature_names)
df["target"] = y

print(df.head(10))

# 2. Create a Dataset object for MLflow
dataset = mlflow.data.from_pandas(
    df,
    source="scikit-learn:load_iris",  # any descriptive string works here
    name="iris_dataset",
    targets="target",
)

# 3. Start an MLflow run and log dataset, model, etc.
mlflow.set_experiment("iris-experiment-with-dataset")


with mlflow.start_run():
    # Log the dataset as input
    mlflow.log_input(dataset, context="training")

    # Log parameters
    params = {"C": 0.1, "random_state": 42}
    mlflow.log_params(params)

    # Train model
    lr = LogisticRegression(**params, max_iter=200)  # max_iter increased for safety
    lr.fit(X, y)

    # Predict and log metric
    # Input: 4 numeric features (flower measurements)
    # Output: a class label (0, 1, or 2) for the iris species, i.e.
    # Iris setosa, Iris versicolor, or Iris virginica.
    # Note: predictions are made on the training data, so this is training accuracy.
    y_pred = lr.predict(X)
    acc = accuracy_score(y, y_pred)
    mlflow.log_metric("accuracy", acc)

    # Log the model (sklearn) with a small input_example for inference
    mlflow.sklearn.log_model(
        lr,
        name="model",
        input_example=X[:5]
    )

    print(f"Run ID: {mlflow.active_run().info.run_id}")


   sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)  \
0                5.1               3.5                1.4               0.2   
1                4.9               3.0                1.4               0.2   
2                4.7               3.2                1.3               0.2   
3                4.6               3.1                1.5               0.2   
4                5.0               3.6                1.4               0.2   
5                5.4               3.9                1.7               0.4   
6                4.6               3.4                1.4               0.3   
7                5.0               3.4                1.5               0.2   
8                4.4               2.9                1.4               0.2   
9                4.9               3.1                1.5               0.1   

   target  
0       0  
1       0  
2       0  
3       0  
4       0  
5       0  
6       0  
7       0  
8       0  
9       0 



Run ID: a912c4072b244a79a69ad3164e621f97
🏃 View run sedate-mule-532 at: http://127.0.0.1:5000/#/experiments/268956126154477783/runs/a912c4072b244a79a69ad3164e621f97
🧪 View experiment at: http://127.0.0.1:5000/#/experiments/268956126154477783


In [26]:
mlflow.search_experiments()

[<Experiment: artifact_location='mlflow-artifacts:/723872913022227632', creation_time=1763558435991, experiment_id='723872913022227632', last_update_time=1763558435991, lifecycle_stage='active', name='my-experiment-1', tags={'mlflow.experimentKind': 'custom_model_development'}>,
 <Experiment: artifact_location='mlflow-artifacts:/0', creation_time=1763558317564, experiment_id='0', last_update_time=1763558317564, lifecycle_stage='active', name='Default', tags={'mlflow.experimentKind': 'custom_model_development'}>]

### Interacting with the model registry

In [16]:
from mlflow.tracking import MlflowClient


client = MlflowClient()

In [17]:
from mlflow.exceptions import MlflowException

try:
    client.search_registered_models()
except MlflowException:
    print("It's not possible to access the model registry :(")
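
When the tracking URI points at a server with a database-backed store, such as the local server used for the run logged earlier, a logged model can be added to the registry through its run URI. The following is a minimal sketch under those assumptions: the helper names and the registered model name `iris-classifier` are hypothetical, and it assumes the model was logged under the name `model`, as in the run above. Newer MLflow versions may prefer other URI forms for referring to a logged model.

```python
def model_uri_for_run(run_id, model_name="model"):
    """Build the runs:/ URI that refers to a model logged in a given run."""
    return f"runs:/{run_id}/{model_name}"


def register_run_model(run_id, registered_name):
    """Register the model logged in `run_id` under `registered_name`.

    Hypothetical helper; requires a tracking server whose backend store
    supports the model registry.
    """
    import mlflow  # imported lazily so the URI helper works without MLflow

    return mlflow.register_model(
        model_uri=model_uri_for_run(run_id),
        name=registered_name,
    )


print(model_uri_for_run("a912c4072b244a79a69ad3164e621f97"))
# Example call (needs a running server, so it is left commented out):
# register_run_model("a912c4072b244a79a69ad3164e621f97", "iris-classifier")
```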

## Scenario 2: A cross-functional team with one data scientist working on an ML model


MLflow setup:
* Tracking server: yes, local server
* Backend store: SQLite database
* Artifacts store: local filesystem

The experiments can be explored locally by accessing the local tracking server.

To run this example, launch the MLflow server locally by running the following command in your terminal:

`mlflow server --backend-store-uri sqlite:///backend.db`

In [19]:
import mlflow


mlflow.set_tracking_uri("http://127.0.0.1:5000")

In [20]:
print(f"tracking URI: '{mlflow.get_tracking_uri()}'")

tracking URI: 'http://127.0.0.1:5000'
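
Setting the tracking URI does not by itself contact the server, so a typo or a stopped server only shows up at the first logging call. A small standard-library check like the one below can catch that earlier. It is a sketch that assumes the server exposes a `/health` endpoint at the base URL; the function name `server_is_up` is illustrative.

```python
import urllib.error
import urllib.request


def server_is_up(base_url, timeout=2.0):
    """Return True if the tracking server answers its health endpoint."""
    try:
        # Assumption: the server responds with HTTP 200 on <base_url>/health.
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or an HTTP error status.
        return False


print(server_is_up("http://127.0.0.1:5000"))
```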


## Scenario 3: Multiple data scientists working on multiple ML models

MLflow setup:
* Tracking server: yes, remote server (EC2).
* Backend store: PostgreSQL database.
* Artifacts store: S3 bucket.

The experiments can be explored by accessing the remote server, which is started on the remote host with a command such as:

```bash
mlflow server \
  --backend-store-uri "postgresql://<USER>:<PASSWORD>@<HOST>:<PORT>/<DB_NAME>" \
  --artifacts-destination "s3://<YOUR_BUCKET_NAME>/mlflow-artifacts" \
  --host 0.0.0.0 \
  --port 5000
```
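
One practical detail of the `--backend-store-uri` value: if the database password contains characters such as `@` or `/`, they need to be percent-encoded or the URI will be parsed incorrectly. The sketch below shows one way to assemble the URI with the standard library; the function name and all sample values are placeholders, not real credentials.

```python
from urllib.parse import quote_plus


def postgres_store_uri(user, password, host, port, db_name):
    """Assemble a PostgreSQL URI, percent-encoding the user and password."""
    return (
        f"postgresql://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{db_name}"
    )


# All values below are placeholders used only for illustration.
print(postgres_store_uri("mlflow", "p@ss/word", "db.example.com", 5432, "mlflowdb"))
```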

In [None]:
import mlflow

# Set the environment variables that allow 'mlflow_user' to track experiments using MLflow
import os
import getpass

# IMPORTANT CONSTANTS TO DEFINE

# Remote MLflow server; the URI must include the scheme,
# e.g. http://mlflow.your-domain.com:5000
MLFLOW_REMOTE_SERVER = "http://your-mlflow-server.com:5000"
# Point the client at the remote server, which owns the backend and artifact stores
mlflow.set_tracking_uri(MLFLOW_REMOTE_SERVER)

# For direct API calls via HTTP we need to inject credentials
MLFLOW_TRACKING_USERNAME = 'lisana.berberi@kit.edu'
MLFLOW_TRACKING_PASSWORD = getpass.getpass()  # type the password interactively
# The MLflow client reads the credentials from these environment variables
os.environ['MLFLOW_TRACKING_USERNAME'] = MLFLOW_TRACKING_USERNAME
os.environ['MLFLOW_TRACKING_PASSWORD'] = MLFLOW_TRACKING_PASSWORD
# os.environ["LOGNAME"] = MLFLOW_TRACKING_USERNAME  # user who logs the experiment; defaults to your local username

# Name of the experiment (e.g. the name of the code repository)
mlflow.set_experiment("green-taxi-duration-x")
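
When the notebook is rerun often, or executed non-interactively, it can be convenient to prompt for the password only if it is not already present in the environment. A minimal sketch of that pattern follows; the helper name `resolve_password` is hypothetical and not part of MLflow.

```python
import getpass
import os


def resolve_password(env_var="MLFLOW_TRACKING_PASSWORD"):
    """Return the password from the environment, prompting only if it is unset."""
    password = os.environ.get(env_var)
    if not password:
        # Fall back to an interactive prompt and cache the value for later calls.
        password = getpass.getpass(f"{env_var}: ")
        os.environ[env_var] = password
    return password


# Example use (prompts only when the variable is missing):
# resolve_password()
```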