
Commit

Merge branch 'master' into clarifai-integration
Signed-off-by: Sai Nivedh <sainivedh123@gmail.com>
sainivedh committed Dec 12, 2023
2 parents cc1900e + 119eef6 commit e9791ab
Showing 120 changed files with 10,515 additions and 4,324 deletions.
9 changes: 4 additions & 5 deletions .github/workflows/autoformat.yml
@@ -90,6 +90,10 @@ jobs:
git remote add base https://github.com/${{ needs.check-comment.outputs.base_repo }}.git
git fetch base ${{ needs.check-comment.outputs.base_ref }}
git merge base/${{ needs.check-comment.outputs.base_ref }}
- uses: ./.github/actions/setup-python
- run: |
pip install -r requirements/lint-requirements.txt
pre-commit install
# ************************************************************************
# Prettier
# ************************************************************************
@@ -109,11 +113,6 @@ jobs:
# ************************************************************************
# python
# ************************************************************************
- if: steps.diff.outputs.python == 'true'
uses: ./.github/actions/setup-python
- if: steps.diff.outputs.python == 'true'
run: |
pip install -r requirements/lint-requirements.txt
- if: steps.diff.outputs.python == 'true'
run: |
ruff --fix .
2 changes: 1 addition & 1 deletion .github/workflows/lint.yml
@@ -47,7 +47,7 @@ jobs:
- name: Install pre-commit hooks
run: |
source .venv/bin/activate
pre-commit install -t pre-commit -t prepare-commit-msg
pre-commit install
- name: Run pre-commit
id: pre-commit
env:
1 change: 1 addition & 0 deletions .pre-commit-config.yaml
@@ -60,3 +60,4 @@ repos:
name: mlflow-typo
entry: dev/mlflow-typo.sh
language: system
stages: [commit]
12 changes: 12 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,17 @@
# CHANGELOG

## 2.9.1 (2023-12-07)

MLflow 2.9.1 is a patch release, containing a critical bug fix related to loading `pyfunc` models that were saved in previous versions of MLflow.

Bug fixes:

- [Models] Revert Changes to PythonModel that introduced loading issues for models saved in earlier versions of MLflow (#10626, @BenWilson2)

Small bug fixes and documentation updates:

#10625, @BenWilson2

## 2.9.0 (2023-12-05)

MLflow 2.9.0 includes several major features and improvements.
2 changes: 1 addition & 1 deletion dev/mlflow-typo.sh
@@ -1,6 +1,6 @@
#!/usr/bin/env bash

if grep -nP '(?<!import\s)\bM(lf|LF|lF)low\b(?!\()' "$@"; then
if grep -InP '(?<!import\s)\bM(lf|LF|lF)low\b(?!\()' "$@"; then
exit 1
else
exit 0
@@ -327,4 +327,4 @@ of this method below:
* **Cons**

* Not free.
* Need to manage a billing account.
* Need to manage a billing account.
4 changes: 2 additions & 2 deletions docs/source/index.rst
@@ -145,10 +145,10 @@ LLM Guides and Tutorials
<div class="simple-card">
<a href="llms/rag/index.html" >
<div class="header">
Question Generation for RAG
Evaluation for RAG
</div>
<p>
Learn how to leverage LLMs to generate a question dataset for use in Retrieval Augmented Generation applications.
Learn how to evaluate Retrieval Augmented Generation applications by leveraging LLMs to generate an evaluation dataset and evaluate it using the built-in metrics in the MLflow Evaluate API.
</p>
</a>
</div>
@@ -16,27 +16,19 @@
"- Learn to create a custom `pyfunc` to manage model dependencies and interface data.\n",
"- Gain insights into simplifying user interfaces in deployed environments with custom `pyfunc`.\n",
"\n",
"To learn more, expand the **details** sections throughout this tutorial.\n",
"\n",
"<details>\n",
" <div>\n",
" <h4>The Challenge with Default Implementations</h4>\n",
" <p>While MLflow's `transformers` flavor generally handles models from the HuggingFace Transformers library, some models or configurations might not align with this standard approach. In such cases, like ours, where the model cannot utilize the default `pipeline` type, we face a unique challenge of deploying these models using MLflow.</p>\n",
" </div>\n",
" <div>\n",
" <h4>The Power of Custom PyFunc</h4>\n",
" <p>To address this, MLflow's custom `pyfunc` comes to the rescue. It allows us to:</p>\n",
" <ul>\n",
" <li>Handle model loading and its dependencies efficiently.</li>\n",
" <li>Customize the inference process to suit specific model requirements.</li>\n",
" <li>Adapt interface data to create a user-friendly environment in deployed applications.</li>\n",
" </ul>\n",
" <p>Our focus will be on the practical application of a custom `pyfunc` to deploy LLMs effectively within MLflow's ecosystem.</p>\n",
" </div>\n",
" <div>\n",
" <p>By the end of this tutorial, you'll be equipped with the knowledge to tackle similar challenges in your machine learning projects, leveraging the full potential of MLflow for custom model deployments.</p>\n",
" </div>\n",
"</details>"
"#### The Challenge with Default Implementations\n",
"While MLflow's `transformers` flavor generally handles models from the HuggingFace Transformers library, some models or configurations might not align with this standard approach. In such cases, like ours, where the model cannot utilize the default `pipeline` type, we face a unique challenge of deploying these models using MLflow.\n",
"\n",
"#### The Power of Custom PyFunc\n",
"To address this, MLflow's custom `pyfunc` comes to the rescue. It allows us to:\n",
"\n",
"- Handle model loading and its dependencies efficiently.\n",
"- Customize the inference process to suit specific model requirements.\n",
"- Adapt interface data to create a user-friendly environment in deployed applications.\n",
"\n",
"Our focus will be on the practical application of a custom `pyfunc` to deploy LLMs effectively within MLflow's ecosystem.\n",
"\n",
"By the end of this tutorial, you'll be equipped with the knowledge to tackle similar challenges in your machine learning projects, leveraging the full potential of MLflow for custom model deployments."
]
},
{
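
As background for the custom ``pyfunc`` discussion in the notebook cell above, here is a minimal sketch (not part of this commit) of a ``mlflow.pyfunc.PythonModel`` subclass; the artifact key, model-loading details, and generation parameters are illustrative assumptions rather than the tutorial's actual implementation.

.. code-block:: python

    import mlflow
    import mlflow.pyfunc


    class CustomLLM(mlflow.pyfunc.PythonModel):
        """Wraps a model that the default transformers pipeline cannot serve."""

        def load_context(self, context):
            # Load heavyweight dependencies once, at serving time.
            # "snapshot" is a hypothetical artifact key registered when logging the model.
            from transformers import AutoModelForCausalLM, AutoTokenizer

            path = context.artifacts["snapshot"]
            self.tokenizer = AutoTokenizer.from_pretrained(path)
            self.model = AutoModelForCausalLM.from_pretrained(path)

        def predict(self, context, model_input):
            # Adapt the interface: accept a DataFrame with a "prompt" column and
            # return decoded strings instead of raw tensors.
            outputs = []
            for prompt in model_input["prompt"].tolist():
                tokens = self.tokenizer(prompt, return_tensors="pt")
                generated = self.model.generate(**tokens, max_new_tokens=64)
                outputs.append(self.tokenizer.decode(generated[0], skip_special_tokens=True))
            return outputs


    # Logging the wrapper; the artifact path on disk is a placeholder.
    # mlflow.pyfunc.log_model(
    #     artifact_path="custom_llm",
    #     python_model=CustomLLM(),
    #     artifacts={"snapshot": "/path/to/downloaded/model"},
    # )
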
2 changes: 1 addition & 1 deletion docs/source/llms/gateway/migration.rst
@@ -57,7 +57,7 @@ New:
Querying the server
~~~~~~~~~~~~~~~~~~~

The fluent APIs have been replaced by the :py:class:`mlflow.deployments.DatabricksDeploymentClient` APIs.
The fluent APIs have been replaced by the :py:class:`mlflow.deployments.MlflowDeploymentClient` APIs.
See the table below for the mapping between the deprecated and new APIs.

+-----------------------------------------+----------------------------------------------------+
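
For reference, a rough sketch (not part of this commit) of querying an endpoint through the deployment client APIs that this migration guide points to; the endpoint name and payload shape are assumptions.

.. code-block:: python

    from mlflow.deployments import get_deploy_client

    client = get_deploy_client("databricks")

    # Hypothetical chat endpoint and payload; adjust to the endpoint's schema.
    response = client.predict(
        endpoint="databricks-llama-2-70b-chat",
        inputs={"messages": [{"role": "user", "content": "What is MLflow?"}]},
    )
    print(response)
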
4 changes: 2 additions & 2 deletions docs/source/llms/index.rst
@@ -411,10 +411,10 @@ Note that there are additional tutorials within the `Native Integration Guides a
<div class="simple-card">
<a href="rag/index.html" >
<div class="header">
Question Generation for RAG
Evaluation for RAG
</div>
<p>
Learn how to leverage LLMs to generate a question dataset for use in Retrieval Augmented Generation applications.
Learn how to evaluate Retrieval Augmented Generation applications by leveraging LLMs to generate an evaluation dataset and evaluate it using the built-in metrics in the MLflow Evaluate API.
</p>
</a>
</div>
30 changes: 29 additions & 1 deletion docs/source/llms/llm-evaluate/index.rst
@@ -224,6 +224,34 @@ metrics:
* :py:func:`mlflow.metrics.genai.relevance`: Use this metric when you want to evaluate how relevant the model-generated output is with respect to both the input and the context. High scores mean that the model has understood the context and correctly extracted relevant information from it, while low scores mean that the output has completely ignored the question and the context and could be hallucinating.
* :py:func:`mlflow.metrics.genai.faithfulness`: Use this metric when you want to evaluate how faithful the model-generated output is to the context provided. High scores mean that the outputs contain information that is in line with the context, while low scores mean that the outputs may disagree with the context (the input is ignored).
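
As a rough illustration (not part of this commit) of how these built-in metrics plug into an evaluation run, the sketch below scores a hypothetical static dataset with ``answer_similarity``; the data, column names, and use of the default OpenAI judge are assumptions.

.. code-block:: python

    import mlflow
    import pandas as pd
    from mlflow.metrics.genai import answer_similarity

    # Hypothetical evaluation data with model outputs already computed.
    eval_data = pd.DataFrame(
        {
            "inputs": ["What is MLflow?"],
            "ground_truth": ["MLflow is an open source platform for the ML lifecycle."],
            "outputs": ["MLflow is an open source MLOps platform."],
        }
    )

    # Uses the default judge (openai:/gpt-4), so an OpenAI API key must be configured.
    results = mlflow.evaluate(
        data=eval_data,
        targets="ground_truth",
        predictions="outputs",
        extra_metrics=[answer_similarity()],
    )
    print(results.metrics)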

Selecting the LLM-as-judge Model
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

By default, LLM-as-judge metrics use ``openai:/gpt-4`` as the judge. You can change the judge by passing an override to the ``model`` argument within the metric definition, as shown below. In addition to OpenAI models, you can use any endpoint served through MLflow Deployments; call :py:func:`mlflow.deployments.set_deployments_target` to set the target deployment client.

To use an endpoint hosted by a local MLflow Deployments Server, you can use the following code.

.. code-block:: python

    from mlflow.deployments import set_deployments_target

    set_deployments_target("http://localhost:5000")

    my_answer_similarity = mlflow.metrics.genai.answer_similarity(
        model="endpoints:/my-endpoint"
    )

To use an endpoint hosted on Databricks, you can use the following code.

.. code-block:: python

    from mlflow.deployments import set_deployments_target

    set_deployments_target("databricks")

    llama2_answer_similarity = mlflow.metrics.genai.answer_similarity(
        model="endpoints:/databricks-llama-2-70b-chat"
    )

For more information about how various models perform as judges, please refer to `this blog <https://www.databricks.com/blog/LLM-auto-eval-best-practices-RAG>`_.

Creating Custom LLM-evaluation Metrics
--------------------------------------
@@ -238,7 +266,7 @@ needs the following information:
* ``definition``: describes what the metric does.
* ``grading_prompt``: describes the scoring criteria.
* ``examples``: a few input/output examples with scores; they are used as a reference for the LLM judge.
* ``model``: the identifier of LLM judge.
* ``model``: the identifier of the LLM judge, in a format such as ``"openai:/gpt-4"`` or ``"endpoints:/databricks-llama-2-70b-chat"``.
* ``parameters``: the extra parameters to send to the LLM judge, e.g., ``temperature`` for ``"openai:/gpt-3.5-turbo-16k"``.
* ``aggregations``: The list of options to aggregate the per-row scores using numpy functions.
* ``greater_is_better``: indicates if a higher score means your model is better.
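
The parameter list above maps directly onto ``mlflow.metrics.genai.make_genai_metric``. A minimal sketch follows; the metric name, prompt wording, and example values are invented for illustration.

.. code-block:: python

    from mlflow.metrics.genai import EvaluationExample, make_genai_metric

    # Hypothetical graded example used as a reference for the LLM judge.
    example = EvaluationExample(
        input="What is MLflow?",
        output="MLflow is an open source platform for managing the ML lifecycle.",
        score=4,
        justification="The answer is accurate and reasonably concise.",
    )

    conciseness = make_genai_metric(
        name="conciseness",
        definition="Measures how brief and to the point the answer is.",
        grading_prompt="Score from 1 to 5, where 5 means the answer is maximally concise.",
        examples=[example],
        model="openai:/gpt-4",
        parameters={"temperature": 0.0},
        aggregations=["mean", "variance"],
        greater_is_better=True,
    )
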
2 changes: 1 addition & 1 deletion docs/source/llms/rag/index.rst
@@ -41,7 +41,7 @@ Explore the Tutorial

.. raw:: html

<a href="notebooks/index.html" class="download-btn">View the RAG Question Generation Tutorial</a><br/>
<a href="notebooks/index.html" class="download-btn">View RAG Tutorials</a><br/>

.. toctree::
:maxdepth: 1
50 changes: 40 additions & 10 deletions docs/source/llms/rag/notebooks/index.rst
@@ -1,23 +1,28 @@
============================================
Question Generation for Retrieval Evaluation
============================================
=============
RAG Tutorials
=============

This notebook is a step-by-step tutorial on how to generate a question dataset with
LLMs for retrieval evaluation within RAG. It will guide you through getting a document dataset,
generating diverse and relevant questions through prompt engineering on LLMs, and analyzing the
question dataset. The question dataset can then be used for the subsequent task of evaluating the
retriever model, which is a part of RAG that collects and ranks relevant document chunks based on
the user's question.
You can find a list of tutorials for RAG below. These tutorials are designed to help you
get started with RAG evaluation and walk you through a concrete example of how to evaluate
a RAG application that answers questions about MLflow documentation.

.. toctree::
:maxdepth: 1
:hidden:

question-generation-retrieval-evaluation.ipynb
retriever-evaluation-tutorial.ipynb

Question Generation for RAG Notebook
Question Generation for RAG Tutorial
------------------------------------

This notebook is a step-by-step tutorial on how to generate a question dataset with
LLMs for retrieval evaluation within RAG. It will guide you through getting a document dataset,
generating relevant questions through prompt engineering on LLMs, and analyzing the
question dataset. The question dataset can then be used for the subsequent task of evaluating the
retriever model, which is a part of RAG that collects and ranks relevant document chunks based on
the user's question.
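
As a hedged sketch (not the notebook's actual code) of the question-generation step described above, one might prompt an LLM once per document chunk; the client, model name, and prompt wording are assumptions.

.. code-block:: python

    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    chunk = "MLflow Tracking lets you log parameters, metrics, and artifacts for each run."
    prompt = (
        "Generate one question that can be answered solely from the following "
        "documentation chunk:\n\n" + chunk
    )

    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    print(response.choices[0].message.content)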

If you would like a copy of this notebook to execute in your environment, download the notebook here:

.. raw:: html
@@ -29,3 +34,28 @@ To follow along and see the sections of the notebook guide, click below:
.. raw:: html

<a href="question-generation-retrieval-evaluation.html" class="download-btn">View the Notebook</a><br/>


Retriever Evaluation Tutorial
-----------------------------

This tutorial walks you through a concrete example of how to build and evaluate
a RAG application that answers questions about MLflow documentation.

In this tutorial you will learn:

- How to prepare an evaluation dataset for your RAG application.
- How to call your retriever in the MLflow evaluate API.
- How to evaluate a retriever's capacity for retrieving relevant documents based on a series of queries using MLflow evaluate.
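
As a rough sketch (not from the tutorial itself) of what calling a retriever through the MLflow evaluate API can look like, the example below assumes a toy retriever function, an invented evaluation table, and the ``retriever`` model type with a ``retriever_k`` setting.

.. code-block:: python

    import mlflow
    import pandas as pd


    def retrieve(df: pd.DataFrame) -> pd.Series:
        # Hypothetical retriever: return a list of retrieved document IDs per question.
        return df["question"].apply(lambda q: ["doc_1", "doc_2", "doc_3"])


    eval_data = pd.DataFrame(
        {
            "question": ["How do I log a model with MLflow?"],
            "ground_truth": [["doc_1", "doc_4"]],
        }
    )

    results = mlflow.evaluate(
        model=retrieve,
        data=eval_data,
        targets="ground_truth",
        model_type="retriever",
        evaluator_config={"retriever_k": 3},
    )
    print(results.metrics)  # e.g., precision_at_3, recall_at_3, ndcg_at_3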

If you would like a copy of this notebook to execute in your environment, download the notebook here:

.. raw:: html

<a href="https://raw.githubusercontent.com/mlflow/mlflow/master/docs/source/llms/rag/notebooks/retriever-evaluation-tutorial.ipynb" class="notebook-download-btn">Download the notebook</a><br/>

To follow along and see the sections of the notebook guide, click below:

.. raw:: html

<a href="retriever-evaluation-tutorial.html" class="download-btn">View the Notebook</a><br/>