From cd8a03360eaa0e863cf0f40d50a4674571806cbf Mon Sep 17 00:00:00 2001 From: Daniel Lok Date: Mon, 22 Jan 2024 12:33:55 +0800 Subject: [PATCH] Prompt template docs (#10836) Signed-off-by: Daniel Lok --- docs/source/llms/index.rst | 2 +- docs/source/llms/transformers/guide/index.rst | 67 ++++ docs/source/llms/transformers/index.rst | 14 + .../prompt-templating/prompt-templating.ipynb | 349 ++++++++++++++++++ 4 files changed, 431 insertions(+), 1 deletion(-) create mode 100644 docs/source/llms/transformers/tutorials/prompt-templating/prompt-templating.ipynb diff --git a/docs/source/llms/index.rst b/docs/source/llms/index.rst index adc731f227b19..c4b379bb738a2 100644 --- a/docs/source/llms/index.rst +++ b/docs/source/llms/index.rst @@ -318,7 +318,7 @@ Interested in learning how to leverage MLflow for your LLM projects? Look in the tutorials and guides below to learn more about interesting use cases that could help to make your journey into leveraging LLMs a bit easier! -Note that there are additional tutorials within the `Native Integration Guides and Tutorials section above <#native-integration-guides-and-tutorials>`_, so be sure to check those out as well! +Note that there are additional tutorials within the `"Explore the Native LLM Flavors" section above <#explore-the-native-llm-flavors>`_, so be sure to check those out as well! .. toctree:: :maxdepth: 1 diff --git a/docs/source/llms/transformers/guide/index.rst b/docs/source/llms/transformers/guide/index.rst index d643890e5cf9c..d4a9af716c333 100644 --- a/docs/source/llms/transformers/guide/index.rst +++ b/docs/source/llms/transformers/guide/index.rst @@ -88,6 +88,73 @@ avoid failed inference requests. \***** If using `pyfunc` in MLflow Model Serving for realtime inference, the raw audio in bytes format must be base64 encoded prior to submitting to the endpoint. String inputs will be interpreted as uri locations. 
+ +Saving Prompt Templates with Transformer Pipelines +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +.. note:: + + This feature is only available in MLflow 2.10.0 and above. + +MLflow supports specifying prompt templates for certain pipeline types: + +- `feature-extraction `_ +- `fill-mask `_ +- `summarization `_ +- `text2text-generation `_ +- `text-generation `_ + +Prompt templates are strings that are used to format user inputs prior to ``pyfunc`` inference. To specify a prompt template, +use the ``prompt_template`` argument when calling :py:func:`mlflow.transformers.save_model()` or :py:func:`mlflow.transformers.log_model()`. +The prompt template must be a string with a single format placeholder, ``{prompt}``. + +For example: + +.. code-block:: python + + import mlflow + from transformers import pipeline + + # Initialize a pipeline. `distilgpt2` uses a "text-generation" pipeline + generator = pipeline(model="distilgpt2") + + # Define a prompt template + prompt_template = "Answer the following question: {prompt}" + + # Save the model + mlflow.transformers.save_model( + transformers_model=generator, + path="path/to/model", + prompt_template=prompt_template, + ) + +When the model is then loaded with :py:func:`mlflow.pyfunc.load_model()`, the prompt +template will be used to format user inputs before passing them into the pipeline: + +.. code-block:: python + + import mlflow + + # Load the model with pyfunc + model = mlflow.pyfunc.load_model("path/to/model") + + # The prompt template will be used to format this input, so the + # string that is passed to the text-generation pipeline will be: + # "Answer the following question: What is MLflow?" + model.predict("What is MLflow?") + +.. note:: + + ``text-generation`` pipelines with a prompt template will have the `return_full_text pipeline argument `_ + set to ``False`` by default. 
This is to prevent the template from being shown to the users, + which could potentially cause confusion as it was not part of their original input. To + override this behaviour, either set ``return_full_text`` to ``True`` via ``params``, or + include it in a ``model_config`` dict in ``log_model()``. See `this section <#using-model-config-and-model-signature-params-for-transformers-inference>`_ + for more details on how to do this. + +For a more in-depth guide, check out the `Prompt Templating notebook <../tutorials/prompt-templating/prompt-templating.ipynb>`_! + + Using model_config and model signature params for `transformers` inference ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ diff --git a/docs/source/llms/transformers/index.rst b/docs/source/llms/transformers/index.rst index 48d792accb544..9a8eff0222e96 100644 --- a/docs/source/llms/transformers/index.rst +++ b/docs/source/llms/transformers/index.rst @@ -49,6 +49,7 @@ MLflow supports the use of the Transformers package by providing: - **Fine-tuning of Foundational Models**: Users can `fine-tune transformers models `_ on custom datasets while tracking metrics and parameters. - **Experiment Tracking**: Log experiments, including all relevant details and artifacts, for easy comparison and reproducibility. - **Simplified Model Deployment**: Deploy models with `minimal configuration requirements `_. +- **Prompt Management**: `Save prompt templates `_ with transformers pipelines to optimize inference with less boilerplate. **Example Use Case:** @@ -156,6 +157,16 @@ These more advanced tutorials are designed to showcase different applications of
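Conceptually, the pre-processing that the loaded ``pyfunc`` model performs amounts to a single ``str.format`` call on the saved template. The sketch below illustrates that documented behaviour; it is not MLflow's internal implementation:

```python
# Illustrative sketch of how a saved prompt template is applied to user
# input before pipeline inference. Mirrors the documented behaviour only;
# this is NOT MLflow's internal code.

prompt_template = "Answer the following question: {prompt}"


def apply_template(template: str, user_input: str) -> str:
    """Format a raw user query with the saved prompt template."""
    return template.format(prompt=user_input)


formatted = apply_template(prompt_template, "What is MLflow?")
print(formatted)  # Answer the following question: What is MLflow?
```

The formatted string, not the raw user input, is what reaches the underlying pipeline.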

+
@@ -180,6 +191,7 @@ To download the transformers tutorial notebooks to run in your environment, clic Download the Translation Notebook
Download the Chat Conversational Notebook
Download the Fine Tuning Notebook
+ Download the Prompt Templating Notebook
Download the Custom PyFunc transformers Notebook
.. toctree:: @@ -191,6 +203,7 @@ To download the transformers tutorial notebooks to run in your environment, clic tutorials/translation/component-translation.ipynb tutorials/conversational/conversational-model.ipynb tutorials/fine-tuning/transformers-fine-tuning.ipynb + tutorials/prompt-templating/prompt-templating.ipynb Options for Logging Transformers Models - Pipelines vs. Component logging @@ -234,6 +247,7 @@ When working with the transformers flavor in MLflow, there are several important - **Input and Output Types**: The input and output types for the python_function implementation may differ from those expected from the native pipeline. Users need to ensure compatibility with their data processing workflows. - **Model Configuration**: When saving or logging models, the `model_config` can be used to set certain parameters. However, if both model_config and a `ModelSignature` with parameters are saved, the default parameters in ModelSignature will override those in `model_config`. - **Audio and Vision Models**: Audio and text-based large language models are supported for use with pyfunc, while other types like computer vision and multi-modal models are only supported for native type loading. +- **Prompt Templates**: Prompt templating is currently supported for a few pipeline types. For a full list of supported pipelines, and more information about the feature, see `this link `_. The currently supported pipeline types for Pyfunc can be seen `here `_. 
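The precedence described in the **Model Configuration** bullet above can be sketched as a plain dict merge: predict-time ``params`` override ``ModelSignature`` defaults, which in turn override ``model_config``. The function below is an illustration of that documented ordering only, not MLflow's implementation:

```python
def resolve_inference_params(model_config, signature_defaults, request_params=None):
    """Merge parameter sources in increasing order of precedence.

    Later updates win: model_config < signature defaults < predict-time params.
    Sketch of the documented ordering only; not MLflow's internal code.
    """
    resolved = dict(model_config)
    resolved.update(signature_defaults)
    resolved.update(request_params or {})
    return resolved


params = resolve_inference_params(
    model_config={"max_new_tokens": 50, "do_sample": True},
    signature_defaults={"max_new_tokens": 15},
    request_params={"max_new_tokens": 30},
)
print(params)  # {'max_new_tokens': 30, 'do_sample': True}
```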
diff --git a/docs/source/llms/transformers/tutorials/prompt-templating/prompt-templating.ipynb b/docs/source/llms/transformers/tutorials/prompt-templating/prompt-templating.ipynb new file mode 100644 index 0000000000000..8397cd44e2bf9 --- /dev/null +++ b/docs/source/llms/transformers/tutorials/prompt-templating/prompt-templating.ipynb @@ -0,0 +1,349 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Prompt Templating with MLflow and Transformers\n", + "\n", + "Welcome to our in-depth tutorial on using prompt templates to conveniently customize the behavior of Transformers pipelines using MLflow. \n", + "\n", + "### Learning Objectives\n", + "\n", + "In this tutorial, you will:\n", + "\n", + "- Set up a text generation pipeline using TinyLlama-1.1B as an example model\n", + "- Set a prompt template that will be used to format user queries at inference time\n", + "- Load the model for querying\n", + "\n", + "### What is a prompt template, and why use one?\n", + "\n", + "When dealing with large language models, the way a query is structured can significantly impact the model's performance. We often need to add some preamble, or format the query in a way that gives us the results that we want. It's not ideal to expect the end-user of our applications to know exactly what this format should be, so we typically have a pre-processing step to format the user input in a way that works best with the underlying model. In other words, we apply a prompt template to the user's input.\n", + "\n", + "MLflow provides a convenient way to set this on certain pipeline types using the `transformers` flavor. 
As of now, the only pipelines that we support are:\n", + "\n", + "- [feature-extraction](https://huggingface.co/transformers/main_classes/pipelines.html#transformers.FeatureExtractionPipeline)\n", + "- [fill-mask](https://huggingface.co/transformers/main_classes/pipelines.html#transformers.FillMaskPipeline)\n", + "- [summarization](https://huggingface.co/transformers/main_classes/pipelines.html#transformers.SummarizationPipeline)\n", + "- [text2text-generation](https://huggingface.co/transformers/main_classes/pipelines.html#transformers.Text2TextGenerationPipeline)\n", + "- [text-generation](https://huggingface.co/transformers/main_classes/pipelines.html#transformers.TextGenerationPipeline)\n", + "\n", + "\n", + "If you need a runthrough of the basics of how to use the `transformers` flavor, check out the [Introductory Guide](https://mlflow.org/docs/latest/llms/transformers/guide/index.html)!\n", + "\n", + "Now, let's dive in and see how it's done!" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "env: TOKENIZERS_PARALLELISM=false\n" + ] + } + ], + "source": [ + "# Disable tokenizers warnings when constructing pipelines\n", + "%env TOKENIZERS_PARALLELISM=false\n", + "\n", + "import warnings\n", + "\n", + "# Disable a few less-than-useful UserWarnings from setuptools and pydantic\n", + "warnings.filterwarnings(\"ignore\", category=UserWarning)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Pipeline setup and inference\n", + "\n", + "First, let's configure our Transformers pipeline. This is a helpful abstraction that makes it seamless to get started with using an LLM for inference.\n", + "\n", + "For this demonstration, let's say the user's input is the phrase \"Tell me the largest bird\". Let's experiment with a few different prompt templates, and see which one we like best." 
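Whichever template we end up choosing, MLflow will only accept it if it is a string containing exactly one named placeholder, `{prompt}`. A quick self-check with the standard library (an illustrative check, not the flavor's internal validation) could look like this:

```python
from string import Formatter


def is_valid_prompt_template(template: str) -> bool:
    """Return True if the template has exactly one named field, '{prompt}'.

    Illustrative check only; MLflow performs its own validation when the
    model is saved or logged.
    """
    fields = [name for _, name, _, _ in Formatter().parse(template) if name is not None]
    return fields == ["prompt"]


print(is_valid_prompt_template("Q: {prompt}\nA:"))       # True
print(is_valid_prompt_template("{question}"))            # False
print(is_valid_prompt_template("{prompt} and {extra}"))  # False
```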
+ ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Response to Template #0:\n", + "Tell me the largest bird you've ever seen.\n", + "I've seen a lot of birds\n", + "\n", + "Response to Template #1:\n", + "Q: Tell me the largest bird\n", + "A: The largest bird is a pigeon.\n", + "\n", + "A: The largest\n", + "\n", + "Response to Template #2:\n", + "You are an assistant that is knowledgeable about birds. If asked about the largest bird, you will reply 'Duck'.\n", + "User: Tell me the largest bird\n", + "Assistant: Duck\n", + "User: What is the largest bird?\n", + "Assistant:\n", + "\n" + ] + } + ], + "source": [ + "from transformers import pipeline\n", + "\n", + "generator = pipeline(\"text-generation\", model=\"TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T\")\n", + "\n", + "user_input = \"Tell me the largest bird\"\n", + "prompt_templates = [\n", + " # no template\n", + " \"{prompt}\",\n", + " # question-answer style template\n", + " \"Q: {prompt}\\nA:\",\n", + " # dialogue style template with a system prompt\n", + " (\n", + " \"You are an assistant that is knowledgeable about birds. \"\n", + " \"If asked about the largest bird, you will reply 'Duck'.\\n\"\n", + " \"User: {prompt}\\n\"\n", + " \"Assistant:\"\n", + " ),\n", + "]\n", + "\n", + "responses = generator(\n", + " [template.format(prompt=user_input) for template in prompt_templates], max_new_tokens=15\n", + ")\n", + "for idx, response in enumerate(responses):\n", + " print(f\"Response to Template #{idx}:\")\n", + " print(response[0][\"generated_text\"] + \"\\n\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Saving the model and template with MLflow\n", + "\n", + "Now that we've experimented with a few prompt templates, let's pick one, and save it together with our pipeline using MLflow. 
Before we do this, let's take a few minutes to learn about an important component of MLflow models—signatures!\n", + "\n", + "### Creating a model signature\n", + "\n", + "A model signature codifies a model's expected inputs, outputs, and inference params. MLflow enforces this signature at inference time, and will raise a helpful exception if the user input does not match up with the expected format.\n", + "\n", + "Creating a signature can be done simply by calling `mlflow.models.infer_signature()`, and providing a sample input and output value. We can use `mlflow.transformers.generate_signature_output()` to easily generate a sample output. If we want to pass any additional arguments to the pipeline at inference time (e.g. `max_new_tokens` above), we can do so via `params`." + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "2024/01/16 17:28:42 WARNING mlflow.transformers: params provided to the `predict` method will override the inference configuration saved with the model. 
If the params provided are not valid for the pipeline, MlflowException will be raised.\n" + ] + }, + { + "data": { + "text/plain": [ + "inputs: \n", + " [string (required)]\n", + "outputs: \n", + " [string (required)]\n", + "params: \n", + " ['max_new_tokens': long (default: 15)]" + ] + }, + "execution_count": 3, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "import mlflow\n", + "\n", + "sample_input = \"Tell me the largest bird\"\n", + "params = {\"max_new_tokens\": 15}\n", + "signature = mlflow.models.infer_signature(\n", + " sample_input,\n", + " mlflow.transformers.generate_signature_output(generator, sample_input, params=params),\n", + " params=params,\n", + ")\n", + "\n", + "# visualize the signature\n", + "signature" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Starting a new experiment\n", + "We create a new [MLflow Experiment](https://mlflow.org/docs/latest/tracking.html#experiments) so that the run we're going to log our model to does not log to the default experiment and instead has its own contextually relevant entry.\n", + "\n", + "### Logging the model with the prompt template\n", + "Logging the model using MLflow saves the model and its essential metadata so it can be efficiently tracked and versioned. We'll use `mlflow.transformers.log_model()`, which is tailored to make this process as seamless as possible. To save the prompt template, all we have to do is pass it in using the `prompt_template` keyword argument.\n", + "\n", + "Two important things to take note of:\n", + "\n", + "1. A prompt template must be a string with exactly one named placeholder `{prompt}`. MLflow will raise an error if a prompt template is provided that does not conform to this format.\n", + "\n", + "2. 
`text-generation` pipelines with a prompt template will have the [return_full_text pipeline argument](https://huggingface.co/docs/huggingface_hub/main/en/package_reference/inference_client#huggingface_hub.inference._text_generation.TextGenerationParameters.return_full_text) set to `False` by default. This is to prevent the template from being shown to the users, which could potentially cause confusion as it was not part of their original input. To override this behaviour, either set `return_full_text` to `True` via `params`, or include it in a `model_config` dict in `log_model()`." + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "2024/01/16 17:28:45 INFO mlflow.tracking.fluent: Experiment with name 'prompt-templating' does not exist. Creating a new experiment.\n", + "2024/01/16 17:28:52 INFO mlflow.transformers: text-generation pipelines saved with prompt templates have the `return_full_text` pipeline kwarg set to False by default. To override this behavior, provide a `model_config` dict with `return_full_text` set to `True` when saving the model.\n", + "2024/01/16 17:32:57 WARNING mlflow.utils.environment: Encountered an unexpected error while inferring pip requirements (model URI: /var/folders/qd/9rwd0_gd0qs65g4sdqlm51hr0000gp/T/tmpbs0poq1a/model, flavor: transformers), fall back to return ['transformers==4.34.1', 'torch==2.1.1', 'torchvision==0.16.1', 'accelerate==0.25.0']. 
Set logging level to DEBUG to see the full traceback.\n" + ] + } + ], + "source": [ + "# If you are running this tutorial in local mode, leave the next line commented out.\n", + "# Otherwise, uncomment the following line and set your tracking uri to your local or remote tracking server.\n", + "\n", + "# mlflow.set_tracking_uri(\"http://127.0.0.1:5000\")\n", + "\n", + "# Set a name for the experiment that is indicative of what the runs being created within it are in regards to\n", + "mlflow.set_experiment(\"prompt-templating\")\n", + "\n", + "prompt_template = \"Q: {prompt}\\nA:\"\n", + "with mlflow.start_run():\n", + " model_info = mlflow.transformers.log_model(\n", + " transformers_model=generator,\n", + " artifact_path=\"model\",\n", + " task=\"text-generation\",\n", + " signature=signature,\n", + " input_example=\"Tell me the largest bird\",\n", + " prompt_template=prompt_template,\n", + " )" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Loading the model for inference\n", + "\n", + "Next, we can load the model using `mlflow.pyfunc.load_model()`.\n", + "\n", + "The `pyfunc` module in MLflow serves as a generic wrapper for Python functions. It gives us a standard interface for loading and querying models as Python functions, without having to worry about the specifics of the underlying models.\n", + "\n", + "Utilizing [mlflow.pyfunc.load_model](https://www.mlflow.org/docs/latest/python_api/mlflow.pyfunc.html#mlflow.pyfunc.load_model), our previously logged text generation model is loaded using its unique model URI. This URI is a reference to the stored model artifacts. MLflow efficiently handles the model's deserialization, along with any associated dependencies, preparing it for immediate use.\n", + "\n", + "Now, when we call the `predict()` method on our loaded model, the user's input should be formatted with our chosen prompt template prior to inference!"
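Before running the real `predict()` call, it may help to see the whole flow in miniature. The sketch below simulates the documented behaviour of a `text-generation` pipeline with a prompt template: the input is formatted, the generated text comes back prefixed with the formatted prompt, and because `return_full_text` defaults to `False`, the prompt is stripped from the result. Here `fake_generate` is a hypothetical stand-in for the real pipeline, and none of this is MLflow's internal code:

```python
# Miniature simulation of pyfunc predict() with a prompt template.
# `fake_generate` stands in for a real text-generation pipeline.

PROMPT_TEMPLATE = "Q: {prompt}\nA:"


def fake_generate(text: str) -> str:
    # A text-generation pipeline normally echoes the prompt plus new text.
    return text + " The largest bird is the ostrich."


def predict(user_input: str, return_full_text: bool = False) -> str:
    formatted = PROMPT_TEMPLATE.format(prompt=user_input)
    output = fake_generate(formatted)
    if not return_full_text:
        # Strip the templated prompt so the user only sees generated text
        output = output[len(formatted):].lstrip()
    return output


print(predict("Tell me the largest bird"))
# The largest bird is the ostrich.
```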
+ ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [ + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "709dd14c0bd5433e95fcbb60755f7ed0", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "Downloading artifacts: 0%| | 0/23 [00:00