
Commit

Merge branch 'master' into clarifai-integration
Signed-off-by: Sai Nivedh <sainivedh123@gmail.com>
sainivedh committed Dec 12, 2023
2 parents cc1900e + 119eef6 commit e9791ab
Showing 120 changed files with 10,515 additions and 4,324 deletions.
9 changes: 4 additions & 5 deletions .github/workflows/autoformat.yml
@@ -90,6 +90,10 @@ jobs:
git remote add base https://github.com/${{ needs.check-comment.outputs.base_repo }}.git
git fetch base ${{ needs.check-comment.outputs.base_ref }}
git merge base/${{ needs.check-comment.outputs.base_ref }}
- uses: ./.github/actions/setup-python
- run: |
pip install -r requirements/lint-requirements.txt
pre-commit install
# ************************************************************************
# Prettier
# ************************************************************************
@@ -109,11 +113,6 @@ jobs:
# ************************************************************************
# python
# ************************************************************************
- if: steps.diff.outputs.python == 'true'
uses: ./.github/actions/setup-python
- if: steps.diff.outputs.python == 'true'
run: |
pip install -r requirements/lint-requirements.txt
- if: steps.diff.outputs.python == 'true'
run: |
ruff --fix .
2 changes: 1 addition & 1 deletion .github/workflows/lint.yml
@@ -47,7 +47,7 @@ jobs:
- name: Install pre-commit hooks
run: |
source .venv/bin/activate
pre-commit install -t pre-commit -t prepare-commit-msg
pre-commit install
- name: Run pre-commit
id: pre-commit
env:
1 change: 1 addition & 0 deletions .pre-commit-config.yaml
@@ -60,3 +60,4 @@ repos:
name: mlflow-typo
entry: dev/mlflow-typo.sh
language: system
stages: [commit]
12 changes: 12 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,17 @@
# CHANGELOG

## 2.9.1 (2023-12-07)

MLflow 2.9.1 is a patch release, containing a critical bug fix related to loading `pyfunc` models that were saved in previous versions of MLflow.

Bug fixes:

- [Models] Revert Changes to PythonModel that introduced loading issues for models saved in earlier versions of MLflow (#10626, @BenWilson2)

Small bug fixes and documentation updates:

#10625, @BenWilson2

## 2.9.0 (2023-12-05)

MLflow 2.9.0 includes several major features and improvements.
2 changes: 1 addition & 1 deletion dev/mlflow-typo.sh
@@ -1,6 +1,6 @@
#!/usr/bin/env bash

if grep -nP '(?<!import\s)\bM(lf|LF|lF)low\b(?!\()' "$@"; then
if grep -InP '(?<!import\s)\bM(lf|LF|lF)low\b(?!\()' "$@"; then
exit 1
else
exit 0
@@ -327,4 +327,4 @@ of this method below:
* **Cons**

* Not free.
* Need to manage a billing account.
* Need to manage a billing account.
4 changes: 2 additions & 2 deletions docs/source/index.rst
@@ -145,10 +145,10 @@ LLM Guides and Tutorials
<div class="simple-card">
<a href="llms/rag/index.html" >
<div class="header">
Question Generation for RAG
Evaluation for RAG
</div>
<p>
Learn how to leverage LLMs to generate a question dataset for use in Retrieval Augmented Generation applications.
Learn how to evaluate Retrieval Augmented Generation applications by leveraging LLMs to generate an evaluation dataset and evaluate it using the built-in metrics in the MLflow Evaluate API.
</p>
</a>
</div>
@@ -16,27 +16,19 @@
"- Learn to create a custom `pyfunc` to manage model dependencies and interface data.\n",
"- Gain insights into simplifying user interfaces in deployed environments with custom `pyfunc`.\n",
"\n",
"To learn more, expand the **details** sections throughout this tutorial.\n",
"\n",
"<details>\n",
" <div>\n",
" <h4>The Challenge with Default Implementations</h4>\n",
" <p>While MLflow's `transformers` flavor generally handles models from the HuggingFace Transformers library, some models or configurations might not align with this standard approach. In such cases, like ours, where the model cannot utilize the default `pipeline` type, we face a unique challenge of deploying these models using MLflow.</p>\n",
" </div>\n",
" <div>\n",
" <h4>The Power of Custom PyFunc</h4>\n",
" <p>To address this, MLflow's custom `pyfunc` comes to the rescue. It allows us to:</p>\n",
" <ul>\n",
" <li>Handle model loading and its dependencies efficiently.</li>\n",
" <li>Customize the inference process to suit specific model requirements.</li>\n",
" <li>Adapt interface data to create a user-friendly environment in deployed applications.</li>\n",
" </ul>\n",
" <p>Our focus will be on the practical application of a custom `pyfunc` to deploy LLMs effectively within MLflow's ecosystem.</p>\n",
" </div>\n",
" <div>\n",
" <p>By the end of this tutorial, you'll be equipped with the knowledge to tackle similar challenges in your machine learning projects, leveraging the full potential of MLflow for custom model deployments.</p>\n",
" </div>\n",
"</details>"
"#### The Challenge with Default Implementations\n",
"While MLflow's `transformers` flavor generally handles models from the HuggingFace Transformers library, some models or configurations might not align with this standard approach. In such cases, like ours, where the model cannot utilize the default `pipeline` type, we face a unique challenge of deploying these models using MLflow.\n",
"\n",
"#### The Power of Custom PyFunc\n",
"To address this, MLflow's custom `pyfunc` comes to the rescue. It allows us to:\n",
"\n",
"- Handle model loading and its dependencies efficiently.\n",
"- Customize the inference process to suit specific model requirements.\n",
"- Adapt interface data to create a user-friendly environment in deployed applications.\n",
"\n",
"Our focus will be on the practical application of a custom `pyfunc` to deploy LLMs effectively within MLflow's ecosystem.\n",
"\n",
"By the end of this tutorial, you'll be equipped with the knowledge to tackle similar challenges in your machine learning projects, leveraging the full potential of MLflow for custom model deployments."
]
},
{
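
As background for the custom ``pyfunc`` discussion in the notebook cell above, here is a minimal sketch (not part of this commit) of a ``mlflow.pyfunc.PythonModel`` subclass; the artifact key, model-loading details, and generation parameters are illustrative assumptions rather than the tutorial's actual implementation.

.. code-block:: python

    import mlflow
    import mlflow.pyfunc


    class CustomLLM(mlflow.pyfunc.PythonModel):
        """Wraps a model that the default transformers pipeline cannot serve."""

        def load_context(self, context):
            # Load heavyweight dependencies once, at serving time.
            # "snapshot" is a hypothetical artifact key registered when logging the model.
            from transformers import AutoModelForCausalLM, AutoTokenizer

            path = context.artifacts["snapshot"]
            self.tokenizer = AutoTokenizer.from_pretrained(path)
            self.model = AutoModelForCausalLM.from_pretrained(path)

        def predict(self, context, model_input):
            # Adapt the interface: accept a DataFrame with a "prompt" column and
            # return decoded strings instead of raw tensors.
            outputs = []
            for prompt in model_input["prompt"].tolist():
                tokens = self.tokenizer(prompt, return_tensors="pt")
                generated = self.model.generate(**tokens, max_new_tokens=64)
                outputs.append(self.tokenizer.decode(generated[0], skip_special_tokens=True))
            return outputs


    # Logging the wrapper; the artifact path on disk is a placeholder.
    # mlflow.pyfunc.log_model(
    #     artifact_path="custom_llm",
    #     python_model=CustomLLM(),
    #     artifacts={"snapshot": "/path/to/downloaded/model"},
    # )
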
2 changes: 1 addition & 1 deletion docs/source/llms/gateway/migration.rst
@@ -57,7 +57,7 @@ New:
Querying the server
~~~~~~~~~~~~~~~~~~~

The fluent APIs have been replaced by the :py:class:`mlflow.deployments.DatabricksDeploymentClient` APIs.
The fluent APIs have been replaced by the :py:class:`mlflow.deployments.MlflowDeploymentClient` APIs.
See the table below for the mapping between the deprecated and new APIs.

+-----------------------------------------+----------------------------------------------------+
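
For reference, a rough sketch (not part of this commit) of querying an endpoint through the deployment client APIs that this migration guide points to; the endpoint name and payload shape are assumptions.

.. code-block:: python

    from mlflow.deployments import get_deploy_client

    client = get_deploy_client("databricks")

    # Hypothetical chat endpoint and payload; adjust to the endpoint's schema.
    response = client.predict(
        endpoint="databricks-llama-2-70b-chat",
        inputs={"messages": [{"role": "user", "content": "What is MLflow?"}]},
    )
    print(response)
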
4 changes: 2 additions & 2 deletions docs/source/llms/index.rst
@@ -411,10 +411,10 @@ Note that there are additional tutorials within the `Native Integration Guides a
<div class="simple-card">
<a href="rag/index.html" >
<div class="header">
Question Generation for RAG
Evaluation for RAG
</div>
<p>
Learn how to leverage LLMs to generate a question dataset for use in Retrieval Augmented Generation applications.
Learn how to evaluate Retrieval Augmented Generation applications by leveraging LLMs to generate an evaluation dataset and evaluate it using the built-in metrics in the MLflow Evaluate API.
</p>
</a>
</div>
30 changes: 29 additions & 1 deletion docs/source/llms/llm-evaluate/index.rst
@@ -224,6 +224,34 @@ metrics:
* :py:func:`mlflow.metrics.genai.relevance`: Use this metric when you want to evaluate how relevant the model-generated output is with respect to both the input and the context. High scores mean that the model has understood the context and correctly extracted relevant information from it, while low scores mean that the output has completely ignored the question and the context and could be hallucinating.
* :py:func:`mlflow.metrics.genai.faithfulness`: Use this metric when you want to evaluate how faithful the model-generated output is to the context provided. High scores mean that the outputs contain information that is in line with the context, while low scores mean that the outputs may disagree with the context (the input is ignored).
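
As a rough illustration (not part of this commit) of how these built-in metrics plug into an evaluation run, the sketch below scores a hypothetical static dataset with ``answer_similarity``; the data, column names, and use of the default OpenAI judge are assumptions.

.. code-block:: python

    import mlflow
    import pandas as pd
    from mlflow.metrics.genai import answer_similarity

    # Hypothetical evaluation data with model outputs already computed.
    eval_data = pd.DataFrame(
        {
            "inputs": ["What is MLflow?"],
            "ground_truth": ["MLflow is an open source platform for the ML lifecycle."],
            "outputs": ["MLflow is an open source MLOps platform."],
        }
    )

    # Uses the default judge (openai:/gpt-4), so an OpenAI API key must be configured.
    results = mlflow.evaluate(
        data=eval_data,
        targets="ground_truth",
        predictions="outputs",
        extra_metrics=[answer_similarity()],
    )
    print(results.metrics)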

Selecting the LLM-as-judge Model
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

By default, LLM-as-judge metrics use ``openai:/gpt-4`` as the judge. You can change the judge by passing an override to the ``model`` argument within the metric definition, as shown below. In addition to OpenAI models, you can use any endpoint served through MLflow Deployments; call :py:func:`mlflow.deployments.set_deployments_target` to set the target deployment client.

To use an endpoint hosted by a local MLflow Deployments Server, you can use the following code.

.. code-block:: python

    from mlflow.deployments import set_deployments_target

    set_deployments_target("http://localhost:5000")

    my_answer_similarity = mlflow.metrics.genai.answer_similarity(
        model="endpoints:/my-endpoint"
    )

To use an endpoint hosted on Databricks, you can use the following code.

.. code-block:: python

    from mlflow.deployments import set_deployments_target

    set_deployments_target("databricks")

    llama2_answer_similarity = mlflow.metrics.genai.answer_similarity(
        model="endpoints:/databricks-llama-2-70b-chat"
    )

For more information about how various models perform as judges, please refer to `this blog <https://www.databricks.com/blog/LLM-auto-eval-best-practices-RAG>`_.

Creating Custom LLM-evaluation Metrics
--------------------------------------
@@ -238,7 +266,7 @@ needs the following information:
* ``definition``: describes what the metric does.
* ``grading_prompt``: describes the scoring criteria.
* ``examples``: a few input/output examples with scores; they are used as a reference for the LLM judge.
* ``model``: the identifier of LLM judge.
* ``model``: the identifier of the LLM judge, in a format such as ``"openai:/gpt-4"`` or ``"endpoints:/databricks-llama-2-70b-chat"``.
* ``parameters``: the extra parameters to send to the LLM judge, e.g., ``temperature`` for ``"openai:/gpt-3.5-turbo-16k"``.
* ``aggregations``: The list of options to aggregate the per-row scores using numpy functions.
* ``greater_is_better``: indicates if a higher score means your model is better.
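
The parameter list above maps directly onto ``mlflow.metrics.genai.make_genai_metric``. A minimal sketch follows; the metric name, prompt wording, and example values are invented for illustration.

.. code-block:: python

    from mlflow.metrics.genai import EvaluationExample, make_genai_metric

    # Hypothetical graded example used as a reference for the LLM judge.
    example = EvaluationExample(
        input="What is MLflow?",
        output="MLflow is an open source platform for managing the ML lifecycle.",
        score=4,
        justification="The answer is accurate and reasonably concise.",
    )

    conciseness = make_genai_metric(
        name="conciseness",
        definition="Measures how brief and to the point the answer is.",
        grading_prompt="Score from 1 to 5, where 5 means the answer is maximally concise.",
        examples=[example],
        model="openai:/gpt-4",
        parameters={"temperature": 0.0},
        aggregations=["mean", "variance"],
        greater_is_better=True,
    )
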
2 changes: 1 addition & 1 deletion docs/source/llms/rag/index.rst
@@ -41,7 +41,7 @@ Explore the Tutorial

.. raw:: html

<a href="notebooks/index.html" class="download-btn">View the RAG Question Generation Tutorial</a><br/>
<a href="notebooks/index.html" class="download-btn">View RAG Tutorials</a><br/>

.. toctree::
:maxdepth: 1
50 changes: 40 additions & 10 deletions docs/source/llms/rag/notebooks/index.rst
@@ -1,23 +1,28 @@
============================================
Question Generation for Retrieval Evaluation
============================================
=============
RAG Tutorials
=============

This notebook is a step-by-step tutorial on how to generate a question dataset with
LLMs for retrieval evaluation within RAG. It will guide you through getting a document dataset,
generating diverse and relevant questions through prompt engineering on LLMs, and analyzing the
question dataset. The question dataset can then be used for the subsequent task of evaluating the
retriever model, which is a part of RAG that collects and ranks relevant document chunks based on
the user's question.
You can find a list of tutorials for RAG below. These tutorials are designed to help you
get started with RAG evaluation and walk you through a concrete example of how to evaluate
a RAG application that answers questions about MLflow documentation.

.. toctree::
:maxdepth: 1
:hidden:

question-generation-retrieval-evaluation.ipynb
retriever-evaluation-tutorial.ipynb

Question Generation for RAG Notebook
Question Generation for RAG Tutorial
------------------------------------

This notebook is a step-by-step tutorial on how to generate a question dataset with
LLMs for retrieval evaluation within RAG. It will guide you through getting a document dataset,
generating relevant questions through prompt engineering on LLMs, and analyzing the
question dataset. The question dataset can then be used for the subsequent task of evaluating the
retriever model, which is a part of RAG that collects and ranks relevant document chunks based on
the user's question.
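
As a hedged sketch (not the notebook's actual code) of the question-generation step described above, one might prompt an LLM once per document chunk; the client, model name, and prompt wording are assumptions.

.. code-block:: python

    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    chunk = "MLflow Tracking lets you log parameters, metrics, and artifacts for each run."
    prompt = (
        "Generate one question that can be answered solely from the following "
        "documentation chunk:\n\n" + chunk
    )

    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    print(response.choices[0].message.content)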

If you would like a copy of this notebook to execute in your environment, download the notebook here:

.. raw:: html
@@ -29,3 +34,28 @@ To follow along and see the sections of the notebook guide, click below:
.. raw:: html

<a href="question-generation-retrieval-evaluation.html" class="download-btn">View the Notebook</a><br/>


Retriever Evaluation Tutorial
-----------------------------

This tutorial walks you through a concrete example of how to build and evaluate
a RAG application that answers questions about MLflow documentation.

In this tutorial you will learn:

- How to prepare an evaluation dataset for your RAG application.
- How to call your retriever in the MLflow evaluate API.
- How to evaluate a retriever's capacity for retrieving relevant documents based on a series of queries using MLflow evaluate.
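
As a rough sketch (not from the tutorial itself) of what calling a retriever through the MLflow evaluate API can look like, the example below assumes a toy retriever function, an invented evaluation table, and the ``retriever`` model type with a ``retriever_k`` setting.

.. code-block:: python

    import mlflow
    import pandas as pd


    def retrieve(df: pd.DataFrame) -> pd.Series:
        # Hypothetical retriever: return a list of retrieved document IDs per question.
        return df["question"].apply(lambda q: ["doc_1", "doc_2", "doc_3"])


    eval_data = pd.DataFrame(
        {
            "question": ["How do I log a model with MLflow?"],
            "ground_truth": [["doc_1", "doc_4"]],
        }
    )

    results = mlflow.evaluate(
        model=retrieve,
        data=eval_data,
        targets="ground_truth",
        model_type="retriever",
        evaluator_config={"retriever_k": 3},
    )
    print(results.metrics)  # e.g., precision_at_3, recall_at_3, ndcg_at_3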

If you would like a copy of this notebook to execute in your environment, download the notebook here:

.. raw:: html

<a href="https://raw.githubusercontent.com/mlflow/mlflow/master/docs/source/llms/rag/notebooks/retriever-evaluation-tutorial.ipynb" class="notebook-download-btn">Download the notebook</a><br/>

To follow along and see the sections of the notebook guide, click below:

.. raw:: html

<a href="retriever-evaluation-tutorial.html" class="download-btn">View the Notebook</a><br/>