
<div style="text-align: center; line-height: 0; padding-top: 9px;">
  <img src="https://databricks.com/wp-content/uploads/2018/03/db-academy-rgb-1200px.png" alt="Databricks Learning">
</div>


# Tracking and Evaluating Experiments with MLflow

In this demo, we will learn how to use the MLflow integration in Databricks to track and evaluate experiments.

**Learning Objectives**

*By the end of this demo, you will be able to:*

* Create an MLflow experiment to track development

* Log runs to capture individual changes

* Calculate evaluation metrics using the `mlflow.evaluate` API

## Setup Demo Environment

### Install dependencies and configure environment

In [0]:
%pip install mlflow databricks_genai_inference==0.2.3 evaluate torch transformers textstat
dbutils.library.restartPython()

### Reset MLflow Experiment

In [0]:
import mlflow

# In Databricks, a notebook's default MLflow experiment is named after the notebook's full workspace path
notebook_name = dbutils.notebook.entry_point.getDbutils().notebook().getContext().notebookPath().get()
experiment = mlflow.get_experiment_by_name(notebook_name)
print(experiment)

from mlflow.utils.databricks_utils import get_databricks_host_creds
from mlflow.utils.request_utils import augmented_raise_for_status
from mlflow.utils.rest_utils import http_request
import time

def delete_all_runs(experiment_id: str) -> int:
    """
    Bulk delete all runs in an experiment.
    
    :param experiment_id: The ID of the experiment containing the runs to delete.
    :return: The number of runs deleted.
    """
    # Current time in milliseconds
    max_timestamp_millis = int(time.time() * 1000)
    
    json_body = {
        "experiment_id": experiment_id, 
        "max_timestamp_millis": max_timestamp_millis,
        "max_runs": 10000  # Maximum allowed value
    }
    
    response = http_request(
        host_creds=get_databricks_host_creds(),
        endpoint="/api/2.0/mlflow/databricks/runs/delete-runs",
        method="POST",
        json=json_body,
    )
    
    augmented_raise_for_status(response)
    return response.json()["runs_deleted"]

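if experiment is not None:
    deleted_runs_count = delete_all_runs(experiment.experiment_id)
    print(f"Deleted {deleted_runs_count} runs.")
else:
    # The notebook experiment is created on the first logged run, so there may be nothing to reset yet
    print("No experiment found for this notebook yet; nothing to delete.")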

## Create an experiment to compare multiple models
Here we will use MLflow to run an experiment to compare multiple models.

### Create Evaluation Dataset
To do that, we first define an evaluation dataset: a set of input questions, each paired with a ground-truth answer that every model's output is compared against.

In [0]:
import pandas as pd

eval_data = pd.DataFrame(
    {
        "inputs": [
            "What is MLflow?",
            "What is Spark?",
        ],
        "ground_truth": [
            "MLflow is an open-source platform for managing the end-to-end machine learning (ML) "
            "lifecycle. It was developed by Databricks, a company that specializes in big data and "
            "machine learning solutions. MLflow is designed to address the challenges that data "
            "scientists and machine learning engineers face when developing, training, and deploying "
            "machine learning models.",
            "Apache Spark is an open-source, distributed computing system designed for big data "
            "processing and analytics. It was developed in response to limitations of the Hadoop "
            "MapReduce computing model, offering improvements in speed and ease of use. Spark "
            "provides libraries for various tasks such as data ingestion, processing, and analysis "
            "through its components like Spark SQL for structured data, Spark Streaming for "
            "real-time data processing, and MLlib for machine learning tasks",
        ],
    }
)
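As we understand it, `mlflow.evaluate` treats the column named by `targets` (here `ground_truth`) as the reference answer and passes the remaining column(s) to the model as input. A minimal sanity check under that assumption can catch a missing column or a blank answer before any serving endpoint is called. `check_eval_data` is a hypothetical helper, not part of MLflow.

```python
import pandas as pd


def check_eval_data(df: pd.DataFrame, input_col: str = "inputs", target_col: str = "ground_truth") -> None:
    """Hypothetical check: required columns exist and no question or answer is blank."""
    missing = {input_col, target_col} - set(df.columns)
    if missing:
        raise ValueError(f"Missing required columns: {sorted(missing)}")
    for col in (input_col, target_col):
        # Treat None/NaN and whitespace-only strings as empty
        empty = df[col].isna() | (df[col].astype(str).str.strip() == "")
        if empty.any():
            raise ValueError(f"Column {col!r} has empty values at rows {list(df.index[empty])}")
```

Calling `check_eval_data(eval_data)` on the dataset above should pass silently.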

### Define Experiment Configuration and Variables
Next, we define the models we want to compare and their configuration. Holding the generation parameters constant across all models lets us attribute differences in the results to the models themselves.

In [0]:
MODELS = [
    "databricks-dbrx-instruct",
    "databricks-llama-2-70b-chat",
    "databricks-mixtral-8x7b-instruct",
]
max_tokens = 128
temperature = 0.0
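To make the "only the model changes" design explicit, the sketch below builds one parameter dictionary per run and reports which parameters actually vary across runs. Both helpers (`build_run_configs`, `varying_params`) are hypothetical; the notebook itself simply logs `model`, `max_tokens`, and `temperature` inside each run.

```python
from typing import Dict, List, Set


def build_run_configs(models: List[str], max_tokens: int, temperature: float) -> List[Dict]:
    """One parameter dict per run; every key except "model" is shared."""
    shared = {"max_tokens": max_tokens, "temperature": temperature}
    return [{"model": model, **shared} for model in models]


def varying_params(configs: List[Dict]) -> Set[str]:
    """Return the parameter names whose values differ between runs."""
    keys = set().union(*configs)
    return {key for key in keys if len({config.get(key) for config in configs}) > 1}
```

With `build_run_configs(MODELS, max_tokens, temperature)`, `varying_params` should report only `{"model"}`, confirming that the comparison isolates the model choice.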

### Create runs using MLflow
Now we use MLflow to automate the experiment: for each model, MLflow starts a run, logs the model's parameters, and evaluates the model against our evaluation dataset.

In [0]:
import mlflow
import mlflow.deployments
import pandas as pd
import mlflow.metrics.genai

mlflow.deployments.set_deployments_target("databricks")

# Loop through our models
for model in MODELS:
  # Create an MLflow run for each model, named after the model so runs are easy to tell apart in the UI
  with mlflow.start_run(run_name=model) as run:
    mlflow.log_params({
      "model": model,
      "max_tokens": max_tokens,
      "temperature": temperature,
    })

    # Calculating evaluation metrics for each run
    results = mlflow.evaluate(
      model=f"endpoints:/{model}",
      data=eval_data,
      targets="ground_truth",
      inference_params={
        "max_tokens":   max_tokens,
        "temperature":  temperature,
      },
      model_type="question-answering",
      extra_metrics=[mlflow.metrics.token_count(), mlflow.metrics.latency()]
    )
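With `model_type="question-answering"`, `mlflow.evaluate` computes a set of default metrics that, as we understand it, includes an exact-match score alongside toxicity and readability metrics (which is why `evaluate`, `torch`, `transformers`, and `textstat` are installed). The sketch below illustrates the idea behind exact match in plain Python; it is an illustration, not MLflow's implementation. It also suggests why long, free-form ground truths like ours will rarely produce a nonzero exact-match score.

```python
from typing import Sequence


def exact_match_rate(predictions: Sequence[str], targets: Sequence[str]) -> float:
    """Fraction of predictions that are character-for-character equal to their target."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not predictions:
        return 0.0
    # Any difference in wording, spacing, or punctuation counts as a miss
    hits = sum(pred == target for pred, target in zip(predictions, targets))
    return hits / len(predictions)


targets = ["MLflow is an open-source platform.", "Apache Spark is a distributed engine."]
predictions = ["MLflow is an open-source platform.", "Spark is a distributed engine."]
print(exact_match_rate(predictions, targets))  # 0.5
```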

Next, review the MLflow tracking artifacts for a run:
* Select the **beaker** icon in the right navigation bar
  * Select a run to open its details in the MLflow run UI
* Review the following tabs:
  * **Overview**
  * **System Metrics**
  * **Artifacts**

### Compare models using MLflow
Now that we have run our experiment against the three models and calculated the metrics, we can compare the results using MLflow.

* Select the **beaker** icon in the right navigation bar
  * Select **Experiment UI** to open the MLflow experiments UI
* Select the **Chart** icon to compare the runs in a graphical view

## Conclusion
In this demo, we learned how the MLflow integration in Databricks makes it easy to track and log data science and machine learning experiments, and how it provides tools to store and compare their results.


&copy; 2024 Databricks, Inc. All rights reserved.<br/>
Apache, Apache Spark, Spark and the Spark logo are trademarks of the 
<a href="https://www.apache.org/">Apache Software Foundation</a>.<br/>
<br/><a href="https://databricks.com/privacy-policy">Privacy Policy</a> | 
<a href="https://databricks.com/terms-of-use">Terms of Use</a> | 
<a href="https://help.databricks.com/">Support</a>