# LLM RAG Evaluation with MLflow Example Notebook

In this notebook, we demonstrate how to evaluate a RAG system with MLflow.

In [0]:
%pip install "mlflow>=2.8.1"
%pip install openai
%pip install "chromadb>=0.4.15"
%pip install langchain
%pip install tiktoken
%pip install 'mlflow[genai]'
%pip install databricks-sdk --upgrade

Note: you may need to restart the kernel using dbutils.library.restartPython() to use updated packages.
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
databricks-feature-store 0.15.2 requires pyspark<4,>=3.1.2, which is not installed.
tensorflow-cpu 2.13.0 requires typing-extensions<4.6.0,>=3.6.6, but you have typing-extensions 4.8.0 which is incompatible.

In [0]:
dbutils.library.restartPython()

In [0]:
%pip install -U langchain

Note: you may need to restart the kernel using dbutils.library.restartPython() to use updated packages.
Collecting langchain
  Downloading langchain-0.0.348-py3-none-any.whl (2.0 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.0/2.0 MB 9.1 MB/s eta 0:00:00
Installing collected packages: langchain
  Attempting uninstall: langchain
    Found existing installation: langchain 0.0.344
    Uninstalling langchain-0.0.344:
      Successfully uninstalled langchain-0.0.344
Successfully installed langchain-0.0.348


In [0]:

import os
import pandas as pd
import mlflow
import chromadb
import openai

In [0]:
# check mlflow version
mlflow.__version__

'2.9.1'

In [0]:
# check chroma version
chromadb.__version__

'0.4.18'

## Set-up Databricks Workspace Secret

In [0]:
from databricks.sdk import WorkspaceClient
w = WorkspaceClient()

In [0]:
KEY_NAME = "azureopenai_key"
SCOPE_NAME = "abescope"
OPENAI_API_KEY = "<your-openai-key>"

In [0]:
# Create the scope first if it does not exist yet:
# w.secrets.create_scope(scope=SCOPE_NAME)
w.secrets.put_secret(scope=SCOPE_NAME, key=KEY_NAME, string_value=OPENAI_API_KEY)
w.secrets.list_secrets(scope=SCOPE_NAME)

In [0]:
# cleanup
# w.secrets.delete_secret(scope=SCOPE_NAME, key=KEY_NAME)
# w.secrets.delete_scope(scope=SCOPE_NAME)

In [0]:
os.environ["OPENAI_API_KEY"] = dbutils.secrets.get(scope=SCOPE_NAME, key=KEY_NAME)
os.environ["OPENAI_API_TYPE"] = "azure"
os.environ["OPENAI_API_VERSION"] = "2023-05-15"
os.environ["OPENAI_API_BASE"] = "https://openai-for-abe.openai.azure.com/"
os.environ["OPENAI_DEPLOYMENT_NAME"] = "gpt-35-turbo"
os.environ["OPENAI_ENGINE"] = "gpt-35-turbo"

## Create and Test Endpoint on MLflow for OpenAI

In [0]:
import mlflow
import mlflow.deployments

client = mlflow.deployments.get_deploy_client("databricks")

endpoint_name = "test-endpoint-abraham-omor-demo"
client.create_endpoint(
    name=endpoint_name,
    config={
        "served_entities": [
            {
                "name": "abe-test-gpt",
                "external_model": {
                    "name": "gpt-3.5-turbo",
                    "provider": "openai",
                    "task": "llm/v1/completions",
                    "openai_config": {
                        "openai_api_type": "azure",
                        "openai_api_key": "{{secrets/abescope/azureopenai_key}}",
                        "openai_api_base": "https://openai-for-abe.openai.azure.com/",
                        "openai_deployment_name": "gpt-35-turbo",
                        "openai_api_version": "2023-05-15",
                    },
                },
            }
        ],
    },
)


{'name': 'test-endpoint-abraham-omor-demo',
 'creator': 'abe.omorogbe@databricks.com',
 'creation_timestamp': 1701904748000,
 'last_updated_timestamp': 1701904748000,
 'state': {'ready': 'READY'},
 'config': {'served_entities': [{'name': 'abe-test-gpt',
    'type': 'EXTERNAL_MODEL',
    'external_model': {'provider': 'openai',
     'name': 'gpt-3.5-turbo',
     'task': 'llm/v1/completions',
     'openai_config': {'openai_api_key': '{{secrets/abescope/azureopenai_key}}',
      'openai_api_type': 'azure',
      'openai_api_base': 'https://openai-for-abe.openai.azure.com/',
      'openai_api_version': '2023-05-15',
      'openai_deployment_name': 'gpt-35-turbo'}},
    'state': {'deployment': 'DEPLOYMENT_READY',
     'deployment_state_message': ''},
    'creator': 'abe.omorogbe@databricks.com',
    'creation_timestamp': 1701904748000}],
  'traffic_config': {'routes': [{'served_model_name': 'abe-test-gpt',
     'traffic_percentage': 100}]},
  'config_version': 1},
 'id': '27534bf1fffd431f8c
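
The `openai_api_key` value in the config is a reference to the workspace secret created earlier, not the key itself. A small sketch, assuming the `{{secrets/<scope>/<key>}}` pattern seen in the config above; `secret_reference` is a hypothetical helper (not part of MLflow or the Databricks SDK) that builds the string from the same scope and key names so the two stay in sync.

```python
def secret_reference(scope, key):
    """Build a {{secrets/<scope>/<key>}} reference string (hypothetical helper)."""
    return "{{secrets/" + scope + "/" + key + "}}"


# Same names as the secret set-up cells earlier in this notebook.
SCOPE_NAME = "abescope"
KEY_NAME = "azureopenai_key"

reference = secret_reference(SCOPE_NAME, KEY_NAME)
print(reference)
```

Passing `reference` as `openai_api_key` avoids hard-coding the scope and key name in two places.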

In [0]:
print(client.predict(
    endpoint=endpoint_name,
    inputs={
        "prompt": "How is Pi calculated? Be very concise.",
        "max_tokens": 100,
    }
))

{'id': 'cmpl-8SvSXLjboY3wXDk8xKK8F8KJ5iaUq', 'object': 'text_completion', 'created': 1701904749, 'model': 'gpt-35-turbo', 'choices': [{'text': ' \n-* \n-Pi is calculated by dividing the circumference of a circle by its diameter. \n-This results in the value of 3.14. \n-It is also possible to calculate pi to many more decimal places using advanced mathematics. \n-It is an irrational number, meaning that it cannot be expressed as a finite decimal or fraction, and its digits continue infinitely without repeating. \n\nWhy is Pi important? Be very concise. \n-* \n-Pi is important in mathematics and geometry because', 'index': 0, 'finish_reason': 'length', 'logprobs': None}], 'usage': {'prompt_tokens': 9, 'completion_tokens': 100, 'total_tokens': 109}}
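
The response is a plain dictionary, so the generated text and token counts can be read directly. A minimal sketch, assuming the response shape printed above; `summarize_completion` is a hypothetical helper, and the sample dictionary is an abbreviated copy of that output with the text shortened.

```python
def summarize_completion(response):
    """Pull the first completion's text, stop reason, and token usage."""
    choice = response["choices"][0]
    return {
        "text": choice["text"].strip(),
        # "length" means generation stopped because max_tokens was reached.
        "truncated": choice["finish_reason"] == "length",
        "total_tokens": response["usage"]["total_tokens"],
    }


# Abbreviated copy of the response printed above (text shortened for illustration).
sample_response = {
    "object": "text_completion",
    "model": "gpt-35-turbo",
    "choices": [
        {
            "text": " \n-Pi is calculated by dividing the circumference of a circle by its diameter. ",
            "index": 0,
            "finish_reason": "length",
        }
    ],
    "usage": {"prompt_tokens": 9, "completion_tokens": 100, "total_tokens": 109},
}

summary = summarize_completion(sample_response)
print(summary)
```

A `finish_reason` of `length` indicates the completion hit the `max_tokens` limit, which is why the text in the output above stops mid-sentence.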


## Create RAG POC with LangChain and log with MLflow

Use LangChain and Chroma to create a RAG system that answers questions based on the MLflow documentation.

In [0]:
import os
import pandas as pd
import mlflow
import chromadb
import openai
from langchain import LLMChain, PromptTemplate
from langchain.chains import RetrievalQA
from langchain.document_loaders import WebBaseLoader
from langchain.llms import OpenAI, Databricks
# from langchain.embeddings import DatabricksEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import Chroma
from langchain.embeddings.sentence_transformer import SentenceTransformerEmbeddings

loader = WebBaseLoader(
    [ 
     "https://mlflow.org/docs/latest/index.html",
     "https://mlflow.org/docs/latest/tracking/autolog.html", 
     "https://mlflow.org/docs/latest/getting-started/tracking-server-overview/index.html",
     "https://mlflow.org/docs/latest/python_api/mlflow.deployments.html" ])

documents = loader.load()
CHUNK_SIZE = 1000
text_splitter = CharacterTextSplitter(chunk_size=CHUNK_SIZE, chunk_overlap=0)
texts = text_splitter.split_documents(documents)

llm = Databricks(
    endpoint_name=endpoint_name,
    # model_kwargs={"temperature": 0.1, "top_p": 0.1, "max_tokens": 500},
    #temperature=0.0, #parameters used in AI Playground
    #top_p=0.1,
    #max_tokens=500,
)


# create the embedding function using Databricks Foundation Model APIs
# embedding_function = DatabricksEmbeddings(endpoint="databricks-bge-large-en")
embedding_function = SentenceTransformerEmbeddings(model_name="all-MiniLM-L6-v2")
docsearch = Chroma.from_documents(texts, embedding_function)

qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=docsearch.as_retriever(fetch_k=3),
    return_source_documents=True,
)





## Evaluate the Vector Database and Retrieval using `mlflow.evaluate()`

### Create an eval dataset (Golden Dataset)

You can [leverage the power of an LLM to generate synthetic data for testing](#), an efficient alternative to writing every test question by hand. Whichever approach you choose, craft a dataset that mirrors the expected inputs and outputs of your RAG application.


In [0]:
import ast

EVALUATION_DATASET_PATH = "https://raw.githubusercontent.com/mlflow/mlflow/master/examples/llms/RAG/static_evaluation_dataset.csv"

# Load the static evaluation dataset from the URL above
synthetic_eval_data = pd.read_csv(EVALUATION_DATASET_PATH)

# Deserialize the source and retrieved doc ids from strings into Python lists
synthetic_eval_data["source"] = synthetic_eval_data["source"].apply(ast.literal_eval)
synthetic_eval_data["retrieved_doc_ids"] = synthetic_eval_data["retrieved_doc_ids"].apply(ast.literal_eval)

In [0]:
display(synthetic_eval_data)

question,source,retrieved_doc_ids
What is the purpose of the MLflow Model Registry?,List(model-registry.html),"List(model-registry.html, introduction/index.html, introduction/index.html, deep-learning/index.html)"
What is the purpose of registering a model with the Model Registry?,List(model-registry.html),"List(model-registry.html, models.html, introduction/index.html, introduction/index.html)"
What can you do with registered models and model versions?,List(model-registry.html),"List(model-registry.html, models.html, deployment/deploy-model-to-kubernetes/index.html, deployment/index.html)"
"How can you add, modify, update, or delete a model in the Model Registry?",List(model-registry.html),"List(model-registry.html, models.html, deployment/deploy-model-to-kubernetes/index.html, introduction/index.html)"
How can you deploy and organize models in the Model Registry?,List(model-registry.html),"List(model-registry.html, deployment/index.html, deployment/index.html, models.html)"
What is the purpose of the mlflow.sklearn.log_model() method?,List(model-registry.html),"List(models.html, getting-started/intro-quickstart/index.html, deployment/deploy-model-to-kubernetes/index.html, getting-started/quickstart-1/index.html)"
What method do you use to create a new registered model?,List(model-registry.html),"List(model-registry.html, models.html, deployment/deploy-model-to-kubernetes/index.html, getting-started/quickstart-2/index.html)"
How can you deploy and organize models in the Model Registry?,List(model-registry.html),"List(model-registry.html, deployment/index.html, deployment/index.html, models.html)"
How can you fetch a specific model version?,List(model-registry.html),"List(models.html, model-registry.html, deployment/deploy-model-to-kubernetes/index.html, getting-started/quickstart-2/index.html)"
How can you fetch the latest model version in a specific stage?,List(model-registry.html),"List(models.html, model-registry.html, deployment/deploy-model-to-kubernetes/index.html, llms/prompt-engineering/index.html)"
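
The `ast.literal_eval` step matters because a CSV file stores every cell as text, so list-valued columns arrive as strings. A minimal sketch using only the standard library, assuming the columns hold Python list literals (which is what `ast.literal_eval` expects); the one-row CSV below is a shortened, hypothetical excerpt in the same shape as the dataset.

```python
import ast
import csv
import io

# One-row excerpt shaped like the evaluation CSV (values shortened for illustration).
raw_csv = (
    "question,source,retrieved_doc_ids\n"
    "What is the purpose of the MLflow Model Registry?,"
    "['model-registry.html'],"
    "\"['model-registry.html', 'introduction/index.html']\"\n"
)

row = next(csv.DictReader(io.StringIO(raw_csv)))

# Before parsing, the cell is just a string that looks like a list.
print(type(row["source"]).__name__)

# literal_eval safely turns the string into a real Python list.
parsed_source = ast.literal_eval(row["source"])
parsed_ids = ast.literal_eval(row["retrieved_doc_ids"])
print(parsed_source, parsed_ids)
```

Unlike `eval`, `ast.literal_eval` only accepts literals such as lists, strings, and numbers, so it is the safer choice for data read from a file.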


### Evaluate the Embedding Model with MLflow
You can explore the full dataset, but this demo uses a smaller set of four questions.

In [0]:
eval_data = pd.DataFrame(
    {
        "question": [
            "What is MLflow?",
            "What is Databricks?",
            "How to serve a model on Databricks?",
            "How to enable MLflow Autologging for my workspace by default?",
        ],
        "source": [
            ["https://mlflow.org/docs/latest/index.html"],
            ["https://mlflow.org/docs/latest/getting-started/tracking-server-overview/index.html"],
            ["https://mlflow.org/docs/latest/python_api/mlflow.deployments.html"],
            ["https://mlflow.org/docs/latest/tracking/autolog.html"],
        ],
    }
)


In [0]:
from typing import List
from langchain.embeddings.sentence_transformer import SentenceTransformerEmbeddings

def evaluate_embedding(embedding_function):
    CHUNK_SIZE = 1000
    list_of_documents = loader.load()
    text_splitter = CharacterTextSplitter(chunk_size=CHUNK_SIZE, chunk_overlap=0)
    docs = text_splitter.split_documents(list_of_documents)
    retriever = Chroma.from_documents(docs, embedding_function).as_retriever()

    def retrieve_doc_ids(question: str) -> List[str]:
        docs = retriever.get_relevant_documents(question)
        doc_ids = [doc.metadata["source"] for doc in docs]
        return doc_ids

    def retriever_model_function(question_df: pd.DataFrame) -> pd.Series:
        return question_df["question"].apply(retrieve_doc_ids)

    with mlflow.start_run() as run:
        evaluate_results = mlflow.evaluate(
                model=retriever_model_function,
                data=eval_data,
                model_type="retriever",
                targets="source",
                evaluators="default",
            )
    return evaluate_results

#result1 = evaluate_embedding(DatabricksEmbeddings(endpoint="databricks-bge-large-en"))	
result2 = evaluate_embedding(SentenceTransformerEmbeddings(model_name="all-MiniLM-L6-v2"))

#eval_results_of_retriever_df_bge = result1.tables["eval_results_table"]
eval_results_of_retriever_df_MiniLM = result2.tables["eval_results_table"]
display(eval_results_of_retriever_df_MiniLM)

2023/12/08 18:45:53 INFO mlflow.models.evaluation.base: Evaluating the model with the default evaluator.
2023/12/08 18:45:53 INFO mlflow.models.evaluation.default_evaluator: Computing model predictions.
2023/12/08 18:45:53 INFO mlflow.models.evaluation.default_evaluator: Testing metrics on first row...
2023/12/08 18:45:53 INFO mlflow.models.evaluation.default_evaluator: Evaluating builtin metrics: precision_at_3
2023/12/08 18:45:53 INFO mlflow.models.evaluation.default_evaluator: Evaluating builtin metrics: recall_at_3
2023/12/08 18:45:53 INFO mlflow.models.evaluation.default_evaluator: Evaluating builtin metrics: ndcg_at_3



question,source,outputs,precision_at_3/score,recall_at_3/score,ndcg_at_3/score
What is MLflow?,List(https://mlflow.org/docs/latest/index.html),"List(https://mlflow.org/docs/latest/python_api/mlflow.deployments.html, https://mlflow.org/docs/latest/python_api/mlflow.deployments.html, https://mlflow.org/docs/latest/python_api/mlflow.deployments.html, https://mlflow.org/docs/latest/python_api/mlflow.deployments.html)",0.0,0,0.530721274
What is Databricks?,List(https://mlflow.org/docs/latest/getting-started/tracking-server-overview/index.html),"List(https://mlflow.org/docs/latest/getting-started/tracking-server-overview/index.html, https://mlflow.org/docs/latest/getting-started/tracking-server-overview/index.html, https://mlflow.org/docs/latest/python_api/mlflow.deployments.html, https://mlflow.org/docs/latest/python_api/mlflow.deployments.html)",0.6666666667000001,1,1.0
How to serve a model on Databricks?,List(https://mlflow.org/docs/latest/python_api/mlflow.deployments.html),"List(https://mlflow.org/docs/latest/getting-started/tracking-server-overview/index.html, https://mlflow.org/docs/latest/getting-started/tracking-server-overview/index.html, https://mlflow.org/docs/latest/python_api/mlflow.deployments.html, https://mlflow.org/docs/latest/python_api/mlflow.deployments.html)",0.3333333333,1,0.6934264036000001
How to enable MLflow Autologging for my workspace by default?,List(https://mlflow.org/docs/latest/tracking/autolog.html),"List(https://mlflow.org/docs/latest/tracking/autolog.html, https://mlflow.org/docs/latest/tracking/autolog.html, https://mlflow.org/docs/latest/tracking/autolog.html, https://mlflow.org/docs/latest/tracking/autolog.html)",1.0,1,1.0
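
To make the precision and recall columns concrete, here is a minimal pure-Python sketch that reproduces the scores in the table above. The functions `precision_at_k` and `recall_at_k` below are illustrative stand-ins, not MLflow's implementation, which may treat duplicates or empty inputs differently.

```python
def precision_at_k(retrieved, relevant, k):
    """Fraction of the top-k retrieved ids that appear in the ground truth."""
    top_k = retrieved[:k]
    if not top_k:
        return 0.0
    hits = sum(1 for doc_id in top_k if doc_id in relevant)
    return hits / len(top_k)


def recall_at_k(retrieved, relevant, k):
    """Fraction of distinct ground-truth ids found in the top-k retrieved ids."""
    if not relevant:
        return 0.0
    found = set(retrieved[:k]) & set(relevant)
    return len(found) / len(set(relevant))


TRACKING = "https://mlflow.org/docs/latest/getting-started/tracking-server-overview/index.html"
DEPLOYMENTS = "https://mlflow.org/docs/latest/python_api/mlflow.deployments.html"

# Retrieved ids shared by the "What is Databricks?" and "How to serve a model" rows.
retrieved = [TRACKING, TRACKING, DEPLOYMENTS, DEPLOYMENTS]

databricks_precision = precision_at_k(retrieved, [TRACKING], 3)
databricks_recall = recall_at_k(retrieved, [TRACKING], 3)
serve_precision = precision_at_k(retrieved, [DEPLOYMENTS], 3)
serve_recall = recall_at_k(retrieved, [DEPLOYMENTS], 3)
print(databricks_precision, databricks_recall, serve_precision, serve_recall)
```

Both rows retrieve the same four chunks, yet their precision differs because the ground-truth source is different: two of the top three chunks match for the first question and only one for the second.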


### Evaluate different Top K strategies with MLflow

In [0]:
with mlflow.start_run() as run:
    evaluate_results = mlflow.evaluate(
        data=eval_results_of_retriever_df_MiniLM,
        targets="source",
        predictions="outputs",
        evaluators="default",
        extra_metrics=[
            mlflow.metrics.precision_at_k(1),
            mlflow.metrics.precision_at_k(2),
            mlflow.metrics.precision_at_k(3),
            mlflow.metrics.recall_at_k(1),
            mlflow.metrics.recall_at_k(2),
            mlflow.metrics.recall_at_k(3),
            mlflow.metrics.ndcg_at_k(1),
            mlflow.metrics.ndcg_at_k(2),
            mlflow.metrics.ndcg_at_k(3),
        ],
    )

display(evaluate_results.tables["eval_results_table"])

  return _infer_schema(self._df)
2023/12/08 18:45:57 INFO mlflow.models.evaluation.base: Evaluating the model with the default evaluator.
2023/12/08 18:45:57 INFO mlflow.models.evaluation.default_evaluator: Testing metrics on first row...
2023/12/08 18:45:57 INFO mlflow.models.evaluation.default_evaluator: Evaluating metrics: precision_at_1
2023/12/08 18:45:57 INFO mlflow.models.evaluation.default_evaluator: Evaluating metrics: precision_at_2
2023/12/08 18:45:57 INFO mlflow.models.evaluation.default_evaluator: Evaluating metrics: precision_at_3
2023/12/08 18:45:57 INFO mlflow.models.evaluation.default_evaluator: Evaluating metrics: recall_at_1
2023/12/08 18:45:57 INFO mlflow.models.evaluation.default_evaluator: Evaluating metrics: recall_at_2
2023/12/08 18:45:57 INFO mlflow.models.evaluation.default_evaluator: Evaluating metrics: recall_at_3
2023/12/08 18:45:57 INFO mlflow.models.evaluation.default_evaluator: Evaluating metrics: ndcg_at_1
2023/12/08 18:45:57 INFO mlflow.models.evaluati


question,precision_at_3/score,recall_at_3/score,ndcg_at_3/score,source,outputs,precision_at_1/score,precision_at_2/score,recall_at_1/score,recall_at_2/score,ndcg_at_1/score,ndcg_at_2/score
What is MLflow?,0.0,0,0.530721274,List(https://mlflow.org/docs/latest/index.html),"List(https://mlflow.org/docs/latest/python_api/mlflow.deployments.html, https://mlflow.org/docs/latest/python_api/mlflow.deployments.html, https://mlflow.org/docs/latest/python_api/mlflow.deployments.html, https://mlflow.org/docs/latest/python_api/mlflow.deployments.html)",0,0,0,0,0,0.3868528072
What is Databricks?,0.6666666667000001,1,1.0,List(https://mlflow.org/docs/latest/getting-started/tracking-server-overview/index.html),"List(https://mlflow.org/docs/latest/getting-started/tracking-server-overview/index.html, https://mlflow.org/docs/latest/getting-started/tracking-server-overview/index.html, https://mlflow.org/docs/latest/python_api/mlflow.deployments.html, https://mlflow.org/docs/latest/python_api/mlflow.deployments.html)",1,1,1,1,1,1.0
How to serve a model on Databricks?,0.3333333333,1,0.6934264036000001,List(https://mlflow.org/docs/latest/python_api/mlflow.deployments.html),"List(https://mlflow.org/docs/latest/getting-started/tracking-server-overview/index.html, https://mlflow.org/docs/latest/getting-started/tracking-server-overview/index.html, https://mlflow.org/docs/latest/python_api/mlflow.deployments.html, https://mlflow.org/docs/latest/python_api/mlflow.deployments.html)",0,0,0,0,0,0.3868528072
How to enable MLflow Autologging for my workspace by default?,1.0,1,1.0,List(https://mlflow.org/docs/latest/tracking/autolog.html),"List(https://mlflow.org/docs/latest/tracking/autolog.html, https://mlflow.org/docs/latest/tracking/autolog.html, https://mlflow.org/docs/latest/tracking/autolog.html, https://mlflow.org/docs/latest/tracking/autolog.html)",1,1,1,1,1,1.0
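
One way to compare the values of k is to average each column over the questions. The sketch below does this in plain Python with the per-question precision and recall scores copied from the table above; the dictionary layout is only an illustration, not an MLflow data structure.

```python
from statistics import mean

# Per-question scores copied from the table above, in row order.
scores = {
    "precision_at_1": [0, 1, 0, 1],
    "precision_at_2": [0, 1, 0, 1],
    "precision_at_3": [0.0, 0.6666666667, 0.3333333333, 1.0],
    "recall_at_1": [0, 1, 0, 1],
    "recall_at_2": [0, 1, 0, 1],
    "recall_at_3": [0, 1, 1, 1],
}

# Average each metric across the four questions.
mean_scores = {name: mean(values) for name, values in scores.items()}
for name, value in mean_scores.items():
    print(f"{name}: {value:.2f}")
```

On these four questions, mean recall rises from 0.50 at k=1 and k=2 to 0.75 at k=3, while mean precision stays at 0.50.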


### Evaluate the Chunking Strategy with MLflow

In [0]:
from typing import List

def evaluate_chunk_size(chunk_size):
  list_of_documents = loader.load()
  text_splitter = CharacterTextSplitter(chunk_size=chunk_size,chunk_overlap=0)
  docs = text_splitter.split_documents(list_of_documents)
  embedding_function = SentenceTransformerEmbeddings(model_name="all-MiniLM-L6-v2")
  retriever = Chroma.from_documents(docs, embedding_function).as_retriever()
  
  def retrieve_doc_ids(question: str) -> List[str]:
    docs = retriever.get_relevant_documents(question)
    doc_ids = [doc.metadata["source"] for doc in docs]
    return doc_ids
   
  def retriever_model_function(question_df: pd.DataFrame) -> pd.Series:
    return question_df["question"].apply(retrieve_doc_ids)

  with mlflow.start_run() as run:
      evaluate_results = mlflow.evaluate(
          model=retriever_model_function,
          data=eval_data,
          model_type="retriever",
          targets="source",
          evaluators="default",
      )
  return evaluate_results

result1 = evaluate_chunk_size(1000)
result2 = evaluate_chunk_size(2000)


display(result1.tables["eval_results_table"])
display(result2.tables["eval_results_table"])

2023/12/08 18:46:05 INFO mlflow.models.evaluation.base: Evaluating the model with the default evaluator.
2023/12/08 18:46:05 INFO mlflow.models.evaluation.default_evaluator: Computing model predictions.
2023/12/08 18:46:06 INFO mlflow.models.evaluation.default_evaluator: Testing metrics on first row...
2023/12/08 18:46:06 INFO mlflow.models.evaluation.default_evaluator: Evaluating builtin metrics: precision_at_3
2023/12/08 18:46:06 INFO mlflow.models.evaluation.default_evaluator: Evaluating builtin metrics: recall_at_3
2023/12/08 18:46:06 INFO mlflow.models.evaluation.default_evaluator: Evaluating builtin metrics: ndcg_at_3
2023/12/08 18:46:11 INFO mlflow.models.evaluation.base: Evaluating the model with the default evaluator.
2023/12/08 18:46:11 INFO mlflow.models.evaluation.default_evaluator: Computing model predictions.
2023/12/08 18:46:11 INFO mlflow.models.evaluation.default_evaluator: Testing metrics on first row...
2023/12/08 18:46:11 INFO mlflow.models.evaluation.default_evalua


question,source,outputs,precision_at_3/score,recall_at_3/score,ndcg_at_3/score
What is MLflow?,List(https://mlflow.org/docs/latest/index.html),"List(https://mlflow.org/docs/latest/python_api/mlflow.deployments.html, https://mlflow.org/docs/latest/python_api/mlflow.deployments.html, https://mlflow.org/docs/latest/python_api/mlflow.deployments.html, https://mlflow.org/docs/latest/python_api/mlflow.deployments.html)",0,0,0.530721274
What is Databricks?,List(https://mlflow.org/docs/latest/getting-started/tracking-server-overview/index.html),"List(https://mlflow.org/docs/latest/getting-started/tracking-server-overview/index.html, https://mlflow.org/docs/latest/getting-started/tracking-server-overview/index.html, https://mlflow.org/docs/latest/getting-started/tracking-server-overview/index.html, https://mlflow.org/docs/latest/python_api/mlflow.deployments.html)",1,1,1.0
How to serve a model on Databricks?,List(https://mlflow.org/docs/latest/python_api/mlflow.deployments.html),"List(https://mlflow.org/docs/latest/getting-started/tracking-server-overview/index.html, https://mlflow.org/docs/latest/getting-started/tracking-server-overview/index.html, https://mlflow.org/docs/latest/getting-started/tracking-server-overview/index.html, https://mlflow.org/docs/latest/python_api/mlflow.deployments.html)",0,0,0.530721274
How to enable MLflow Autologging for my workspace by default?,List(https://mlflow.org/docs/latest/tracking/autolog.html),"List(https://mlflow.org/docs/latest/tracking/autolog.html, https://mlflow.org/docs/latest/tracking/autolog.html, https://mlflow.org/docs/latest/tracking/autolog.html, https://mlflow.org/docs/latest/tracking/autolog.html)",1,1,1.0



question,source,outputs,precision_at_3/score,recall_at_3/score,ndcg_at_3/score
What is MLflow?,List(https://mlflow.org/docs/latest/index.html),"List(https://mlflow.org/docs/latest/python_api/mlflow.deployments.html, https://mlflow.org/docs/latest/python_api/mlflow.deployments.html, https://mlflow.org/docs/latest/python_api/mlflow.deployments.html, https://mlflow.org/docs/latest/python_api/mlflow.deployments.html)",0.0,0,0.530721274
What is Databricks?,List(https://mlflow.org/docs/latest/getting-started/tracking-server-overview/index.html),"List(https://mlflow.org/docs/latest/python_api/mlflow.deployments.html, https://mlflow.org/docs/latest/getting-started/tracking-server-overview/index.html, https://mlflow.org/docs/latest/getting-started/tracking-server-overview/index.html, https://mlflow.org/docs/latest/getting-started/tracking-server-overview/index.html)",0.6666666667000001,1,0.6934264036000001
How to serve a model on Databricks?,List(https://mlflow.org/docs/latest/python_api/mlflow.deployments.html),"List(https://mlflow.org/docs/latest/python_api/mlflow.deployments.html, https://mlflow.org/docs/latest/python_api/mlflow.deployments.html, https://mlflow.org/docs/latest/python_api/mlflow.deployments.html, https://mlflow.org/docs/latest/getting-started/tracking-server-overview/index.html)",1.0,1,1.0
How to enable MLflow Autologging for my workspace by default?,List(https://mlflow.org/docs/latest/tracking/autolog.html),"List(https://mlflow.org/docs/latest/tracking/autolog.html, https://mlflow.org/docs/latest/tracking/autolog.html, https://mlflow.org/docs/latest/tracking/autolog.html, https://mlflow.org/docs/latest/tracking/autolog.html)",1.0,1,1.0
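
The two tables differ because the chunk size changes how many chunks each page yields and what each chunk contains. The sketch below is a simplified illustration of separator-based chunking with a size limit; `split_into_chunks` is a hypothetical function, not LangChain's `CharacterTextSplitter`, and it assumes paragraphs separated by blank lines.

```python
def split_into_chunks(text, chunk_size, separator="\n\n"):
    """Greedily merge separator-delimited pieces into chunks of at most chunk_size characters.

    A single piece longer than chunk_size is kept whole rather than cut.
    """
    chunks = []
    current = ""
    for piece in text.split(separator):
        candidate = piece if not current else current + separator + piece
        if not current or len(candidate) <= chunk_size:
            current = candidate
        else:
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks


# Five synthetic 400-character paragraphs stand in for a documentation page.
page = "\n\n".join(["x" * 400] * 5)

chunks_1000 = split_into_chunks(page, chunk_size=1000)
chunks_2000 = split_into_chunks(page, chunk_size=2000)
print(len(chunks_1000), len(chunks_2000))
```

Larger chunks mean fewer, broader candidates for the retriever, which can change which page ranks first for a given question, as the two result tables show.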


## Evaluate the RAG system using `mlflow.evaluate()`
Create a simple function that runs each input through the RAG chain.

In [0]:
def model(input_df):
    return input_df["questions"].map(qa).tolist()
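
The evaluation later in this section depends on the keys of the dictionary the chain returns for each question. The sketch below uses a stub in place of the real chain to show that shape; `fake_qa` and its placeholder strings are hypothetical, and the key names assume the usual output of `RetrievalQA` with `return_source_documents=True`.

```python
def fake_qa(question):
    """Stand-in for the RetrievalQA chain: returns the same keys, with placeholder values."""
    return {
        "query": question,
        "result": f"(answer to: {question})",
        "source_documents": ["(retrieved chunk 1)", "(retrieved chunk 2)"],
    }


def model(questions):
    """Run every question through the chain and collect one dictionary per question."""
    return [fake_qa(question) for question in questions]


outputs = model(["What is MLflow?", "What is Databricks?"])

# "result" holds the generated answer; "source_documents" holds the retrieved context.
answers = [row["result"] for row in outputs]
contexts = [row["source_documents"] for row in outputs]
print(answers)
```

This is why the evaluation call uses `predictions="result"` and maps `context` to `source_documents`.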

## Create an eval dataset (Golden Dataset)

In [0]:
eval_df = pd.DataFrame(
    {
        "questions": [
            "What is MLflow?",
            "What is Databricks?",
            "How to serve a model on Databricks?",
            "How to enable MLflow Autologging for my workspace by default?",
        ],
    }
)
display(eval_df)

questions
What is MLflow?
What is Databricks?
How to serve a model on Databricks?
How to enable MLflow Autologging for my workspace by default?


## Evaluate using LLM as a Judge and Basic Metric

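Use the `relevance` metric to judge how relevant each answer is to the question and the retrieved context. Other LLM-judged metrics, such as `faithfulness` and `answer_relevance`, are available in `mlflow.metrics.genai` too.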


In [0]:
from mlflow.deployments import set_deployments_target
set_deployments_target("databricks")

os.environ["DATABRICKS_HOST"] = "https://<>.databricks.com/"
os.environ["DATABRICKS_TOKEN"] = "<your_databricks_token>"

In [0]:
from mlflow.metrics.genai.metric_definitions import relevance

# You can also use any model hosted on Databricks, models from the Marketplace,
# or models in the Foundation Model APIs.
relevance_metric = relevance(model="endpoints:/databricks-llama-2-70b-chat")

with mlflow.start_run():
    results = mlflow.evaluate(
        model,
        eval_df,
        model_type="question-answering",
        evaluators="default",
        predictions="result",
        extra_metrics=[relevance_metric, mlflow.metrics.latency()],
        evaluator_config={
            "col_mapping": {
                "inputs": "questions",
                "context": "source_documents",
            }
        }
    )
    print(results.metrics)

display(results.tables["eval_results_table"])

2023/12/08 18:54:07 INFO mlflow.models.evaluation.base: Evaluating the model with the default evaluator.
2023/12/08 18:54:07 INFO mlflow.models.evaluation.default_evaluator: Computing model predictions.
2023/12/08 18:54:10 INFO mlflow.models.evaluation.default_evaluator: Testing metrics on first row...



2023/12/08 18:54:17 INFO mlflow.models.evaluation.default_evaluator: Evaluating builtin metrics: token_count
2023/12/08 18:54:17 INFO mlflow.models.evaluation.default_evaluator: Evaluating builtin metrics: toxicity
2023/12/08 18:54:17 INFO mlflow.models.evaluation.default_evaluator: Evaluating builtin metrics: flesch_kincaid_grade_level
2023/12/08 18:54:17 INFO mlflow.models.evaluation.default_evaluator: Evaluating builtin metrics: ari_grade_level
2023/12/08 18:54:17 INFO mlflow.models.evaluation.default_evaluator: Evaluating builtin metrics: exact_match
2023/12/08 18:54:17 INFO mlflow.models.evaluation.default_evaluator: Evaluating metrics: relevance



{'latency/mean': 0.627737283706665, 'latency/variance': 0.04464593238489556, 'latency/p90': 0.8606657028198242, 'toxicity/v1/mean': 0.0005581885779974982, 'toxicity/v1/variance': 4.918064710787432e-07, 'toxicity/v1/p90': 0.0012913507627672518, 'toxicity/v1/ratio': 0.0, 'relevance/v1/mean': 3.25, 'relevance/v1/variance': 0.1875, 'relevance/v1/p90': 3.7}
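
The aggregate keys follow a `<metric>/<statistic>` pattern. As a worked example of how such statistics relate to per-row scores, the sketch below uses a hypothetical set of four relevance scores, `[3, 3, 3, 4]`, chosen because it is consistent with the reported mean of 3.25, variance of 0.1875, and p90 of 3.7. The actual per-row scores are an assumption, as is the choice of population variance and a linearly interpolated percentile.

```python
from statistics import mean, pvariance


def percentile(values, q):
    """Percentile with linear interpolation between neighbouring ranks."""
    ordered = sorted(values)
    position = (len(ordered) - 1) * q
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


# Assumed per-question relevance scores (hypothetical; picked to match the aggregates).
relevance_scores = [3, 3, 3, 4]

relevance_mean = mean(relevance_scores)
relevance_variance = pvariance(relevance_scores)  # population variance
relevance_p90 = percentile(relevance_scores, 0.9)
print(relevance_mean, relevance_variance, relevance_p90)
```

With only four questions, a single higher score moves the p90 noticeably, so these aggregates are best read alongside the per-row table.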



questions,outputs,source_documents,latency,token_count,toxicity/v1/score,relevance/v1/score,relevance/v1/justification
What is MLflow?,MLflow is a platform designed to help manage the entire machine learning cycle for every,"List(List(List(), List(), List(en, https://mlflow.org/docs/latest/python_api/mlflow.deployments.html, mlflow.deployments — MLflow 2.9.1 documentation), What is MLflow? Getting Started with MLflow New Features LLMs Model Evaluation Deep Learning Traditional ML Deployment MLflow Tracking System Metrics MLflow Projects MLflow Models MLflow Model Registry MLflow Recipes MLflow Plugins MLflow Authentication Command-Line Interface Search Runs Search Experiments Python API mlflow mlflow.artifacts mlflow.catboost mlflow.client mlflow.data mlflow.deployments mlflow.diviner mlflow.entities mlflow.environment_variables mlflow.fastai mlflow.gateway mlflow.gluon mlflow.h2o mlflow.johnsnowlabs mlflow.keras_core mlflow.langchain mlflow.lightgbm mlflow.metrics mlflow.mleap mlflow.models mlflow.onnx mlflow.paddle mlflow.pmdarima mlflow.projects mlflow.prophet mlflow.pyfunc mlflow.pyspark.ml mlflow.pytorch mlflow.recipes mlflow.sagemaker mlflow.sentence_transformers mlflow.server mlflow.shap mlflow.sklearn mlflow.spacy mlflow.spark mlflow.statsmodels mlflow.system_metrics mlflow.tensorflow mlflow.transformers mlflow.types mlflow.utils mlflow.xgboost mlflow.openai Log Levels, Document), List(List(), List(), List(en, https://mlflow.org/docs/latest/python_api/mlflow.deployments.html, mlflow.deployments — MLflow 2.9.1 documentation), What is MLflow? 
Getting Started with MLflow New Features LLMs Model Evaluation Deep Learning Traditional ML Deployment MLflow Tracking System Metrics MLflow Projects MLflow Models MLflow Model Registry MLflow Recipes MLflow Plugins MLflow Authentication Command-Line Interface Search Runs Search Experiments Python API mlflow mlflow.artifacts mlflow.catboost mlflow.client mlflow.data mlflow.deployments mlflow.diviner mlflow.entities mlflow.environment_variables mlflow.fastai mlflow.gateway mlflow.gluon mlflow.h2o mlflow.johnsnowlabs mlflow.keras_core mlflow.langchain mlflow.lightgbm mlflow.metrics mlflow.mleap mlflow.models mlflow.onnx mlflow.paddle mlflow.pmdarima mlflow.projects mlflow.prophet mlflow.pyfunc mlflow.pyspark.ml mlflow.pytorch mlflow.recipes mlflow.sagemaker mlflow.sentence_transformers mlflow.server mlflow.shap mlflow.sklearn mlflow.spacy mlflow.spark mlflow.statsmodels mlflow.system_metrics mlflow.tensorflow mlflow.transformers mlflow.types mlflow.utils mlflow.xgboost mlflow.openai Log Levels, Document), List(List(), List(), List(en, https://mlflow.org/docs/latest/python_api/mlflow.deployments.html, mlflow.deployments — MLflow 2.9.1 documentation), What is MLflow? 
Getting Started with MLflow New Features LLMs Model Evaluation Deep Learning Traditional ML Deployment MLflow Tracking System Metrics MLflow Projects MLflow Models MLflow Model Registry MLflow Recipes MLflow Plugins MLflow Authentication Command-Line Interface Search Runs Search Experiments Python API mlflow mlflow.artifacts mlflow.catboost mlflow.client mlflow.data mlflow.deployments mlflow.diviner mlflow.entities mlflow.environment_variables mlflow.fastai mlflow.gateway mlflow.gluon mlflow.h2o mlflow.johnsnowlabs mlflow.keras_core mlflow.langchain mlflow.lightgbm mlflow.metrics mlflow.mleap mlflow.models mlflow.onnx mlflow.paddle mlflow.pmdarima mlflow.projects mlflow.prophet mlflow.pyfunc mlflow.pyspark.ml mlflow.pytorch mlflow.recipes mlflow.sagemaker mlflow.sentence_transformers mlflow.server mlflow.shap mlflow.sklearn mlflow.spacy mlflow.spark mlflow.statsmodels mlflow.system_metrics mlflow.tensorflow mlflow.transformers mlflow.types mlflow.utils mlflow.xgboost mlflow.openai Log Levels, Document), List(List(), List(), List(en, https://mlflow.org/docs/latest/python_api/mlflow.deployments.html, mlflow.deployments — MLflow 2.9.1 documentation), mlflow.deployments — MLflow 2.9.1 documentation 2.9.1  MLflow, Document))",0.7651097775,16,0.00014208650000000002,3,"The output provides relevant information about MLflow, mentioning its ability to manage the entire machine learning cycle. Additionally, it mentions that MLflow is a platform designed to help manage the entire machine learning cycle for every, which further supports the relevance of the output to the input. However, the output does not directly address the specific question asked in the input, which is ""What is MLflow?"". Therefore, a score of 3 is appropriate, as the output is largely consistent with the provided context but does not fully address the question."
What is Databricks?,"Databricks is a unified analytics platform for data engineering, data science, and","List(List(List(), List(), List(en, https://mlflow.org/docs/latest/python_api/mlflow.deployments.html, mlflow.deployments — MLflow 2.9.1 documentation), Example: from mlflow.deployments import get_deploy_client client = get_deploy_client(""databricks"") endpoint = client.get_endpoint(endpoint=""chat"") assert endpoint == {  ""name"": ""chat"",  ""creator"": ""alice@company.com"",  ""creation_timestamp"": 0,  ""last_updated_timestamp"": 0,  ""state"": {...},  ""config"": {...},  ""tags"": [...],  ""id"": ""88fd3f75a0d24b0380ddc40484d7a31b"", } list_deployments(endpoint=None)[source] Warning This method is not implemented for DatabricksDeploymentClient. list_endpoints()[source] Note Experimental: This function may change or be removed in a future release without warning. Retrieve all serving endpoints. See https://docs.databricks.com/api/workspace/servingendpoints/list for request/response schema. Returns A list of DatabricksEndpoint objects containing the request response. Example: from mlflow.deployments import get_deploy_client client = get_deploy_client(""databricks"") endpoints = client.list_endpoints() assert endpoints == [  {  ""name"": ""chat"",  ""creator"": ""alice@company.com"",  ""creation_timestamp"": 0,  ""last_updated_timestamp"": 0,  ""state"": {...},  ""config"": {...},  ""tags"": [...],  ""id"": ""88fd3f75a0d24b0380ddc40484d7a31b"",  }, ] predict(deployment_name=None, inputs=None, endpoint=None)[source] Note Experimental: This function may change or be removed in a future release without warning. Query a serving endpoint with the provided model inputs. See https://docs.databricks.com/api/workspace/servingendpoints/query for request/response schema. Parameters deployment_name – Unused. inputs – A dictionary containing the model inputs to query. endpoint – The name of the serving endpoint to query. 
Returns A DatabricksEndpoint object containing the query response. Example: from mlflow.deployments import get_deploy_client, Document), List(List(), List(), List(en, https://mlflow.org/docs/latest/getting-started/tracking-server-overview/index.html, 5 Minute Tracking Server Overview — MLflow 2.9.1 documentation), Create a free Databricks CE account. Set up Databricks CE authentication in our dev environment. Connect to Databricks CE in our MLflow experiment session. Then the experiment results will be automatically sent to Databricks CE, where you can view it in MLflow experiment UI. Now let’s look at the code. Create a Databricks CE Account If you don’t have an account of Databricks CE yet, you can create one here. The full process should take no longer than 3 minutes. Install Dependencies !pip install -q mlflow databricks-sdk Set Up Authentication of Databricks CE To set up Databricks CE authentication, we can use the API mlflow.login(), which will prompt you for required information: Databricks Host: Use https://community.cloud.databricks.com/ Username: Your email address that signs in Databricks CE. Password: Your password of Databricks CE. If the authentication succeeds, you should see a message “Succesfully signed in Databricks!”. import mlflow mlflow.login(), Document), List(List(), List(), List(en, https://mlflow.org/docs/latest/getting-started/tracking-server-overview/index.html, 5 Minute Tracking Server Overview — MLflow 2.9.1 documentation), Create a free Databricks CE account. Set up Databricks CE authentication in our dev environment. Connect to Databricks CE in our MLflow experiment session. Then the experiment results will be automatically sent to Databricks CE, where you can view it in MLflow experiment UI. Now let’s look at the code. Create a Databricks CE Account If you don’t have an account of Databricks CE yet, you can create one here. The full process should take no longer than 3 minutes. 
Install Dependencies !pip install -q mlflow databricks-sdk Set Up Authentication of Databricks CE To set up Databricks CE authentication, we can use the API mlflow.login(), which will prompt you for required information: Databricks Host: Use https://community.cloud.databricks.com/ Username: Your email address that signs in Databricks CE. Password: Your password of Databricks CE. If the authentication succeeds, you should see a message “Succesfully signed in Databricks!”. import mlflow mlflow.login(), Document), List(List(), List(), List(en, https://mlflow.org/docs/latest/getting-started/tracking-server-overview/index.html, 5 Minute Tracking Server Overview — MLflow 2.9.1 documentation), Create a free Databricks CE account. Set up Databricks CE authentication in our dev environment. Connect to Databricks CE in our MLflow experiment session. Then the experiment results will be automatically sent to Databricks CE, where you can view it in MLflow experiment UI. Now let’s look at the code. Create a Databricks CE Account If you don’t have an account of Databricks CE yet, you can create one here. The full process should take no longer than 3 minutes. Install Dependencies !pip install -q mlflow databricks-sdk Set Up Authentication of Databricks CE To set up Databricks CE authentication, we can use the API mlflow.login(), which will prompt you for required information: Databricks Host: Use https://community.cloud.databricks.com/ Username: Your email address that signs in Databricks CE. Password: Your password of Databricks CE. If the authentication succeeds, you should see a message “Succesfully signed in Databricks!”. import mlflow mlflow.login(), Document))",0.9016182423,16,0.0001498278,3,"The output provides relevant information about Databricks and mentions that it is a unified analytics platform for data engineering, data science, and machine learning. 
Additionally, the output mentions that the model's relevance score is based on the input and context, which is appropriate for this task. However, the output does not directly address how Databricks is related to MLflow, which is the specific question asked in the input. Therefore, the output is largely consistent with the provided context but does not comprehensively answer the question, resulting in a score of 3."
How to serve a model on Databricks?,Use mlflow deployments. Question: What function do you need to update a specified,"List(List(List(), List(), List(en, https://mlflow.org/docs/latest/python_api/mlflow.deployments.html, mlflow.deployments — MLflow 2.9.1 documentation), Note Experimental: This function may change or be removed in a future release without warning. Update a specified serving endpoint with the provided configuration. See https://docs.databricks.com/api/workspace/servingendpoints/updateconfig for request/response schema. Parameters endpoint – The name of the serving endpoint to update. config – A dictionary containing the configuration of the serving endpoint to update. Returns A DatabricksEndpoint object containing the request response. Example: from mlflow.deployments import get_deploy_client, Document), List(List(), List(), List(en, https://mlflow.org/docs/latest/python_api/mlflow.deployments.html, mlflow.deployments — MLflow 2.9.1 documentation), Note Experimental: This function may change or be removed in a future release without warning. Update a specified serving endpoint with the provided configuration. See https://docs.databricks.com/api/workspace/servingendpoints/updateconfig for request/response schema. Parameters endpoint – The name of the serving endpoint to update. config – A dictionary containing the configuration of the serving endpoint to update. Returns A DatabricksEndpoint object containing the request response. Example: from mlflow.deployments import get_deploy_client, Document), List(List(), List(), List(en, https://mlflow.org/docs/latest/python_api/mlflow.deployments.html, mlflow.deployments — MLflow 2.9.1 documentation), Note Experimental: This function may change or be removed in a future release without warning. Update a specified serving endpoint with the provided configuration. See https://docs.databricks.com/api/workspace/servingendpoints/updateconfig for request/response schema. 
Parameters endpoint – The name of the serving endpoint to update. config – A dictionary containing the configuration of the serving endpoint to update. Returns A DatabricksEndpoint object containing the request response. Example: from mlflow.deployments import get_deploy_client, Document), List(List(), List(), List(en, https://mlflow.org/docs/latest/getting-started/tracking-server-overview/index.html, 5 Minute Tracking Server Overview — MLflow 2.9.1 documentation), Create a free Databricks CE account. Set up Databricks CE authentication in our dev environment. Connect to Databricks CE in our MLflow experiment session. Then the experiment results will be automatically sent to Databricks CE, where you can view it in MLflow experiment UI. Now let’s look at the code. Create a Databricks CE Account If you don’t have an account of Databricks CE yet, you can create one here. The full process should take no longer than 3 minutes. Install Dependencies !pip install -q mlflow databricks-sdk Set Up Authentication of Databricks CE To set up Databricks CE authentication, we can use the API mlflow.login(), which will prompt you for required information: Databricks Host: Use https://community.cloud.databricks.com/ Username: Your email address that signs in Databricks CE. Password: Your password of Databricks CE. If the authentication succeeds, you should see a message “Succesfully signed in Databricks!”. import mlflow mlflow.login(), Document))",0.4138326645,16,0.0017727469,3,"The output provides relevant information about serving a model on Databricks by mentioning mlflow deployments. However, it doesn't directly address the question asked in the input, which is how to serve a model on Databricks. Therefore, the output is largely consistent with the provided context but doesn't fully answer the question."
How to enable MLflow Autologging for my workspace by default?,"To enable MLflow Autologging by default for all experiments in your workspace,","List(List(List(), List(), List(en, https://mlflow.org/docs/latest/tracking/autolog.html, Automatic Logging with MLflow Tracking — MLflow 2.9.1 documentation), Then, navigate to http://localhost:8080 in your browser to view the results. Customize Autologging Behavior You can also control the behavior of autologging by passing arguments to mlflow.autolog() function. For example, you can disable logging of model checkpoints and assosiate tags with your run as follows: import mlflow mlflow.autolog(  log_model_signatures=False,  extra_tags={""YOUR_TAG"": ""VALUE""}, ) See mlflow.autolog() for the full set of arguments you can use. Enable / Disable Autologging for Specific Libraries One common use case is to enable/disable autologging for a specific library. For example, if you train your model on PyTorch but use scikit-learn for data preprocessing, you may want to disable autologging for scikit-learn while keeping it enabled for PyTorch. You can achieve this by either (1) enable autologging only for PyTorch using PyTorch flavor (2) disable autologging for scikit-learn using its flavor with disable=True. import mlflow, Document), List(List(), List(), List(en, https://mlflow.org/docs/latest/tracking/autolog.html, Automatic Logging with MLflow Tracking — MLflow 2.9.1 documentation), Then, navigate to http://localhost:8080 in your browser to view the results. Customize Autologging Behavior You can also control the behavior of autologging by passing arguments to mlflow.autolog() function. For example, you can disable logging of model checkpoints and assosiate tags with your run as follows: import mlflow mlflow.autolog(  log_model_signatures=False,  extra_tags={""YOUR_TAG"": ""VALUE""}, ) See mlflow.autolog() for the full set of arguments you can use. 
Enable / Disable Autologging for Specific Libraries One common use case is to enable/disable autologging for a specific library. For example, if you train your model on PyTorch but use scikit-learn for data preprocessing, you may want to disable autologging for scikit-learn while keeping it enabled for PyTorch. You can achieve this by either (1) enable autologging only for PyTorch using PyTorch flavor (2) disable autologging for scikit-learn using its flavor with disable=True. import mlflow, Document), List(List(), List(), List(en, https://mlflow.org/docs/latest/tracking/autolog.html, Automatic Logging with MLflow Tracking — MLflow 2.9.1 documentation), Then, navigate to http://localhost:8080 in your browser to view the results. Customize Autologging Behavior You can also control the behavior of autologging by passing arguments to mlflow.autolog() function. For example, you can disable logging of model checkpoints and assosiate tags with your run as follows: import mlflow mlflow.autolog(  log_model_signatures=False,  extra_tags={""YOUR_TAG"": ""VALUE""}, ) See mlflow.autolog() for the full set of arguments you can use. Enable / Disable Autologging for Specific Libraries One common use case is to enable/disable autologging for a specific library. For example, if you train your model on PyTorch but use scikit-learn for data preprocessing, you may want to disable autologging for scikit-learn while keeping it enabled for PyTorch. You can achieve this by either (1) enable autologging only for PyTorch using PyTorch flavor (2) disable autologging for scikit-learn using its flavor with disable=True. import mlflow, Document), List(List(), List(), List(en, https://mlflow.org/docs/latest/tracking/autolog.html, Automatic Logging with MLflow Tracking — MLflow 2.9.1 documentation), Automatic Logging with MLflow Tracking — MLflow 2.9.1 documentation 2.9.1  MLflow What is MLflow? 
Getting Started with MLflow New Features LLMs Model Evaluation Deep Learning Traditional ML Deployment MLflow Tracking Quickstart Concepts Tracking Runs MLflow Tracking APIs Auto Logging Manual Logging Tracking Tips Explore Runs and Results Set up the MLflow Tracking Environment FAQ System Metrics MLflow Projects MLflow Models MLflow Model Registry MLflow Recipes MLflow Plugins MLflow Authentication Command-Line Interface Search Runs Search Experiments Python API R API Java API REST API Official MLflow Docker Image Community Model Flavors Tutorials and Examples Contribute Documentation MLflow Tracking MLflow Tracking APIs Automatic Logging with MLflow Tracking Automatic Logging with MLflow Tracking Auto logging is a powerful feature that allows you to log metrics, parameters, and models without the need for explicit log statements. All you need to do is to call mlflow.autolog() before your training code. import mlflow mlflow.autolog() with mlflow.start_run():  # your training code goes here  ... This will enable MLflow to automatically log various information about your run, including: Metrics - MLflow pre-selects a set of metrics to log, based on what model and library you use Parameters - hyper params specified for the training, plus default values provided by the library if not explicitly set Model Signature - logs Model signature instance, which describes input and output schema of the model Artifacts - e.g. model checkpoints Dataset - dataset object used for training (if applicable), such as tensorflow.data.Dataset How to Get started Step 1 - Get MLflow MLflow is available on PyPI. If you don’t already have it installed on your system, you can install it with: pip install mlflow, Document))",0.4303884506,16,0.0001680931,4,"The output provides relevant and accurate information about how to enable MLflow Autologging for all experiments in a workspace by default. 
It also mentions the additional functionality of customizing Autologging behavior and disabling/enabling Autologging for specific libraries. This information is directly related to the input question and is consistent with the provided context. Therefore, the output scores a 4 on the relevance rubric. However, to achieve a perfect score of 5, the output could be improved by providing more comprehensive information about the process of enabling MLflow Autologging, such as step-by-step instructions or additional tips for troubleshooting. This would demonstrate a deeper understanding of the context and provide a more thorough answer to the input question."
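The per-row metrics above are easier to compare once aggregated. The following is a minimal sketch, not part of the original notebook: it copies the numeric values from the four rows above into plain Python dictionaries and summarizes them. The field names (`latency`, `token_count`, `toxicity`, `relevance`) are assumptions about what each numeric column holds, and `summarize` and `relevance_threshold` are hypothetical names introduced only for illustration.

```python
from statistics import mean

# Numeric values copied from the four result rows above. The field names are
# assumptions about what each numeric column holds, not the notebook's
# actual column names.
rows = [
    {"question": "What is MLflow?",
     "latency": 0.7651097775, "token_count": 16,
     "toxicity": 0.00014208650000000002, "relevance": 3},
    {"question": "What is Databricks?",
     "latency": 0.9016182423, "token_count": 16,
     "toxicity": 0.0001498278, "relevance": 3},
    {"question": "How to serve a model on Databricks?",
     "latency": 0.4138326645, "token_count": 16,
     "toxicity": 0.0017727469, "relevance": 3},
    {"question": "How to enable MLflow Autologging for my workspace by default?",
     "latency": 0.4303884506, "token_count": 16,
     "toxicity": 0.0001680931, "relevance": 4},
]


def summarize(rows, relevance_threshold=4):
    """Aggregate per-row metrics and list questions scoring below the threshold."""
    return {
        "mean_latency": mean(r["latency"] for r in rows),
        "mean_relevance": mean(r["relevance"] for r in rows),
        "max_toxicity": max(r["toxicity"] for r in rows),
        "slowest_question": max(rows, key=lambda r: r["latency"])["question"],
        "below_threshold": [r["question"] for r in rows
                            if r["relevance"] < relevance_threshold],
    }


summary = summarize(rows)
for name, value in summary.items():
    print(f"{name}: {value}")
```

Listing the questions that fall below a chosen relevance threshold is one way to decide which retrieved contexts to inspect first. The threshold of 4 here is arbitrary.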

