## Introduction to Conversational AI with MLflow and DialoGPT

Welcome to our tutorial on integrating [Microsoft's DialoGPT](https://huggingface.co/microsoft/DialoGPT-medium) with MLflow's transformers flavor to explore conversational AI.

### Learning Objectives

In this tutorial, you will:

- Set up a conversational AI **pipeline** using DialoGPT from the Transformers library.
- **Log** the DialoGPT model along with its configurations using MLflow.
- Infer the input and output **signature** of the DialoGPT model.
- **Load** a stored DialoGPT model from MLflow for interactive usage.
- Interact with the chatbot model and understand the nuances of conversational AI.

By the end of this tutorial, you will have a solid understanding of managing and deploying conversational AI models with MLflow, enhancing your capabilities in natural language processing.

<details>
    <summary style="cursor: pointer; display: flex; align-items: center;">
        <span style="margin-right: 10px;">&#x25BA;</span>
        <span>Expand to learn more about DialoGPT and the benefits of integrating MLflow with it.</span>
    </summary>
    <br/>
    <div>
        <h4>What is DialoGPT?</h4>
        <p>DialoGPT is a conversational model developed by Microsoft, fine-tuned on a large dataset of dialogues to generate human-like responses. Part of the GPT family, DialoGPT excels in natural language understanding and generation, making it ideal for chatbots.</p>
    </div>
    <div>
        <h4>Why MLflow with DialoGPT?</h4>
        <p>Integrating MLflow with DialoGPT enhances conversational AI model development:</p>
        <ul>
            <li><strong>Experiment Tracking</strong>: Tracks configurations and metrics across experiments.</li>
            <li><strong>Model Management</strong>: Manages different versions and configurations of chatbot models.</li>
            <li><strong>Reproducibility</strong>: Ensures the reproducibility of the model's behavior.</li>
            <li><strong>Deployment</strong>: Simplifies deploying conversational models in production.</li>
        </ul>
    </div>
</details>
<br/>
Let's begin our exploration of conversational AI with MLflow and DialoGPT!


### Setting Up the Conversational Pipeline

We begin by setting up a conversational pipeline with DialoGPT using `transformers` and managing it with MLflow.

<details>
    <summary style="cursor: pointer; display: flex; align-items: center;">
        <span style="margin-right: 10px;">&#x25BA;</span>
        <span>Expand to learn about configuring the DialoGPT pipeline and inferring its model signature with MLflow.</span>
    </summary>
    <br/>
    <div>
        <p>We start by importing essential libraries. The <code>transformers</code> library from Hugging Face offers a rich collection of pre-trained models, including DialoGPT, for various NLP tasks. MLflow, a comprehensive tool for the ML lifecycle, aids in experiment tracking, reproducibility, and deployment.</p>
    </div>
    <div>
        <h4>Initializing the Conversational Pipeline</h4>
        <p>Using the <code>transformers.pipeline</code> function, we set up a conversational pipeline. We choose the "<code>microsoft/DialoGPT-medium</code>" model, which balances response quality against resource usage. Once created, the pipeline is ready for interaction and for logging to MLflow.</p>
    </div>
    <div>
        <h4>Inferring the Model Signature with MLflow</h4>
        <p>A model signature defines the schema of the data the model accepts and returns. To infer it, we pass a sample input ("<code>Hi there, chatbot!</code>") to <code>mlflow.transformers.generate_signature_output</code>, which runs the pipeline and returns a sample output. We then give both the sample input and that output to <code>mlflow.models.infer_signature</code>. The resulting signature documents the model's data requirements and prediction format, which matters for deployment and later use.</p>
    </div>
    <div>
        <p>This configuration phase sets the stage for a robust conversational AI system, leveraging the strengths of DialoGPT and MLflow for efficient and effective conversational interactions.</p>
    </div>
</details>
<br/>
Now, let's proceed with configuring the DialoGPT model for our conversational AI setup.


In [1]:
import transformers

import mlflow

conversational_pipeline = transformers.pipeline(model="microsoft/DialoGPT-medium")

signature = mlflow.models.infer_signature(
    "Hi there, chatbot!",
    mlflow.transformers.generate_signature_output(conversational_pipeline, "Hi there, chatbot!"),
)

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.
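
To make the idea of a signature more concrete, here is a toy sketch in plain Python of what inferring a schema from one example pair amounts to. The names `ToySignature`, `infer_toy_signature`, and `check_input` are hypothetical and exist only for this illustration, and the sample reply `"Hello!"` is made up. MLflow's real signature object is richer, and this is not how MLflow implements inference.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToySignature:
    """Hypothetical stand-in for a model signature: just two type names."""

    inputs: str
    outputs: str


def infer_toy_signature(sample_input: object, sample_output: object) -> ToySignature:
    """Record the Python type names of one example input/output pair."""
    return ToySignature(type(sample_input).__name__, type(sample_output).__name__)


def check_input(signature: ToySignature, value: object) -> None:
    """Raise TypeError if a new input does not match the recorded input type."""
    actual = type(value).__name__
    if actual != signature.inputs:
        raise TypeError(f"expected {signature.inputs}, got {actual}")


# Assumption: the model maps one string prompt to one string reply.
toy_signature = infer_toy_signature("Hi there, chatbot!", "Hello!")

# A string prompt matches the recorded schema, so this passes silently.
check_input(toy_signature, "What is the best way to get to Antarctica?")
```

The point of the sketch is only that a schema recorded once from examples can later be used to reject inputs of the wrong shape before they reach the model.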


### Setting the Tracking Server and Creating an Experiment

To view the results of our run, we need an MLflow tracking server. For this tutorial, we use a local server at `http://127.0.0.1:8080`.

Start a local instance by running the following from a terminal:

```bash
mlflow server --host 127.0.0.1 --port 8080
```

With the server running, the following code ensures that all experiments, runs, models, parameters, and metrics that we log are tracked within that server instance. The same server provides the MLflow UI when you navigate to that URL in a browser.

After setting the tracking URI, we create a new MLflow Experiment to hold the run we're about to create.

In [2]:
mlflow.set_tracking_uri("http://127.0.0.1:8080")

mlflow.set_experiment("Conversational")

<Experiment: artifact_location='mlflow-artifacts:/664266092508187059', creation_time=1699630163555, experiment_id='664266092508187059', last_update_time=1699630163555, lifecycle_stage='active', name='Conversational', tags={}>
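
If the calls above hang or fail, the tracking server may not be running. The following is a minimal sketch of a reachability check using only the standard library. It assumes the server exposes a `/health` route that answers with HTTP 200; the helper name `tracking_server_is_up` is hypothetical and not part of MLflow.

```python
import http.client
import urllib.request


def tracking_server_is_up(base_url: str, timeout: float = 2.0) -> bool:
    """Return True if GET <base_url>/health answers with HTTP 200."""
    health_url = base_url.rstrip("/") + "/health"  # assumed health route
    try:
        with urllib.request.urlopen(health_url, timeout=timeout) as response:
            return response.status == 200
    except (OSError, http.client.HTTPException):
        # Covers refused connections, timeouts, HTTP error statuses,
        # and malformed replies.
        return False


if __name__ == "__main__":
    # URL taken from this tutorial; prints False when no server is listening.
    print(tracking_server_is_up("http://127.0.0.1:8080"))
```

Running a check like this before `mlflow.set_tracking_uri` can turn a confusing connection error later in the notebook into an early, obvious one.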

### Logging the Model with MLflow

We'll now use MLflow to log our conversational AI model, ensuring systematic versioning, tracking, and management.

<details>
    <summary style="cursor: pointer; display: flex; align-items: center;">
        <span style="margin-right: 10px;">&#x25BA;</span>
        <span>Expand for detailed steps on logging the DialoGPT model in MLflow.</span>
    </summary>
    <br/>
    <div>
        <h4>Initiating an MLflow Run</h4>
        <p>Our first step is to start an MLflow run with <code>mlflow.start_run()</code>. This action initiates a new tracking environment, capturing all model-related data under a unique run ID. It's a crucial step to segregate and organize different modeling experiments.</p>
    </div>
    <div>
        <h4>Logging the Conversational Model</h4>
        <p>We log our DialoGPT conversational model using <code>mlflow.transformers.log_model</code>. This specialized function efficiently logs Transformer models and requires several key parameters:</p>
        <ul>
            <li><strong>transformers_model</strong>: We pass our DialoGPT conversational pipeline.</li>
            <li><strong>artifact_path</strong>: The storage location within the MLflow run, aptly named <code>"chatbot"</code>.</li>
            <li><strong>task</strong>: Set to <code>"conversational"</code> to reflect the model's purpose.</li>
            <li><strong>signature</strong>: The inferred model signature, dictating expected inputs and outputs.</li>
            <li><strong>input_example</strong>: A sample prompt, like <code>"A clever and witty question"</code>, to demonstrate expected usage.</li>
        </ul>
    </div>
    <div>
        <p>Through this process, MLflow not only tracks our model but also organizes its metadata, facilitating future retrieval, understanding, and deployment.</p>
    </div>
</details>
<br/>
Let's now log the model to MLflow so that we can load and interact with it afterwards.


In [3]:
with mlflow.start_run():
    model_info = mlflow.transformers.log_model(
        transformers_model=conversational_pipeline,
        artifact_path="chatbot",
        task="conversational",
        signature=signature,
        input_example="A clever and witty question",
    )

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


### Loading and Interacting with the Chatbot Model

Next, we'll load the MLflow-logged chatbot model and interact with it to see it in action.

<details>
    <summary style="cursor: pointer; display: flex; align-items: center;">
        <span style="margin-right: 10px;">&#x25BA;</span>
        <span>Expand to view the steps for loading and interacting with the DialoGPT model.</span>
    </summary>
    <br/>
    <div>
        <h4>Loading the Model with MLflow</h4>
        <p>We use <code>mlflow.pyfunc.load_model</code> to load our conversational AI model. This function is a crucial aspect of MLflow's Python function flavor, offering a versatile way to interact with Python models. By specifying <code>model_uri=model_info.model_uri</code>, we precisely target the stored location of our DialoGPT model within MLflow's tracking system.</p>
    </div>
    <div>
        <h4>Interacting with the Chatbot</h4>
        <p>Once loaded, the model, referenced as <code>chatbot</code>, is ready for interaction. We demonstrate its conversational capabilities by:</p>
        <ul>
            <li><strong>Asking Questions</strong>: Posing a question like "What is the best way to get to Antarctica?" to the chatbot.</li>
            <li><strong>Capturing Responses</strong>: The chatbot's response, generated through the <code>predict</code> method, provides a practical example of its conversational skills. For instance, it might respond with suggestions about reaching Antarctica by boat.</li>
        </ul>
    </div>
    <div>
        <p>This demonstration highlights the practicality and convenience of deploying and using models logged with MLflow, especially in dynamic and interactive scenarios like conversational AI.</p>
    </div>
</details>
<br/>
We'll now proceed to load our chatbot and test its conversational prowess with a real-world question.


In [4]:
chatbot = mlflow.pyfunc.load_model(model_uri=model_info.model_uri)

first = chatbot.predict("What is the best way to get to Antarctica?")

Downloading artifacts:   0%|          | 0/18 [00:00<?, ?it/s]

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
2023/11/10 17:00:44 INFO mlflow.store.artifact.artifact_repo: The progress bar can be disabled by setting the environment variable MLFLOW_ENABLE_ARTIFACTS_PROGRESS_BAR to false


Loading checkpoint shards:   0%|          | 0/3 [00:00<?, ?it/s]

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.


In [5]:
print(f"Response: {first}")

Response: I think you can get there by boat.
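
The log messages emitted while logging and loading the model name two environment variables, `TOKENIZERS_PARALLELISM` and `MLFLOW_ENABLE_ARTIFACTS_PROGRESS_BAR`. As a small sketch, assuming you want quieter output and accept disabling tokenizer parallelism, you could set both at the very top of the notebook, before `transformers` and `mlflow` are imported:

```python
import os

# Silence the "process just got forked" warning by choosing a value explicitly.
# "false" disables tokenizer parallelism; the warning also accepts "true".
os.environ["TOKENIZERS_PARALLELISM"] = "false"

# Disable the artifact download progress bar, as suggested by the MLflow log line.
os.environ["MLFLOW_ENABLE_ARTIFACTS_PROGRESS_BAR"] = "false"
```

These settings only affect log noise and tokenizer threading; they do not change the model's responses.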


### Continuing the Conversation with the Chatbot

We now explore how the MLflow `pyfunc` implementation retains conversational context across calls to the DialoGPT chatbot model.

<details>
    <summary style="cursor: pointer; display: flex; align-items: center;">
        <span style="margin-right: 10px;">&#x25BA;</span>
        <span>Expand to explore more about the chatbot's conversational context and response style.</span>
    </summary>
    <br/>
    <div>
        <h4>Testing Contextual Memory</h4>
        <p>We pose a follow-up question, "What sort of boat should I use?", to test the chatbot's contextual understanding. The response, "A boat that can go to Antarctica," is simple, but it shows that the MLflow pyfunc model retains the conversation history and uses it to produce coherent replies for <code>ConversationalPipeline</code> models.</p>
    </div>
    <div>
        <h4>Understanding the Response Style</h4>
        <p>The response's style – witty and slightly facetious – reflects the training data's nature, primarily conversational exchanges from Reddit. This training source significantly influences the model's tone and style, leading to responses that can be humorous and diverse.</p>
    </div>
    <div>
        <h4>Implications of Training Data</h4>
        <p>This interaction underlines the importance of the training data's source in shaping the model's responses. When deploying such models in real-world applications, it's essential to understand and consider the training data's influence on the model's conversational style and knowledge base.</p>
    </div>
</details>
<br/>
Let's see how our chatbot manages the continuity and style of the conversation with this additional query.


In [6]:
second = chatbot.predict("What sort of boat should I use?")

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.


In [7]:
print(f"Response: {second}")

Response: A boat that can go to Antarctica.
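
To illustrate what retaining context between `predict` calls means, here is a minimal, self-contained sketch. It is not MLflow's implementation: `StatefulChatbot` and `count_visible_turns` are hypothetical names, and the stand-in "model" only reports how much history it was given. The sketch assumes that turns are joined with the GPT-2 style end-of-sequence marker `<|endoftext|>`, which corresponds to the `eos_token_id` of 50256 shown in the logs above.

```python
from typing import Callable, List

EOS_TOKEN = "<|endoftext|>"  # assumed turn separator (GPT-2 style end-of-sequence marker)


class StatefulChatbot:
    """Hypothetical wrapper that remembers every turn between predict() calls."""

    def __init__(self, generate: Callable[[str], str]) -> None:
        self._generate = generate
        self._turns: List[str] = []

    def predict(self, user_message: str) -> str:
        # Add the new user turn, then show the model the whole history so far.
        self._turns.append(user_message)
        prompt = EOS_TOKEN.join(self._turns) + EOS_TOKEN
        reply = self._generate(prompt)
        # Store the reply so the next call sees it as context too.
        self._turns.append(reply)
        return reply


def count_visible_turns(prompt: str) -> str:
    """Stand-in for a real model: reports how many turns it was shown."""
    turns = [turn for turn in prompt.split(EOS_TOKEN) if turn]
    return f"I can see {len(turns)} turn(s) of history."


bot = StatefulChatbot(count_visible_turns)
first_reply = bot.predict("What is the best way to get to Antarctica?")
second_reply = bot.predict("What sort of boat should I use?")
print(first_reply)   # I can see 1 turn(s) of history.
print(second_reply)  # I can see 3 turn(s) of history.
```

The second call sees three turns (the first question, the first reply, and the new question), which is the kind of accumulated context that lets the real chatbot answer a follow-up about "a boat" sensibly.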


### Conclusion and Key Takeaways

In this tutorial, we've explored the integration of MLflow with a conversational AI model, specifically using the DialoGPT model from Microsoft. We've covered several important aspects and techniques that are crucial for anyone looking to work with advanced machine learning models in a practical, real-world setting.

#### Key Takeaways

1. **MLflow for Model Management**: We demonstrated how MLflow can be effectively used for managing and deploying machine learning models. The ability to log models, track experiments, and manage different versions of models is invaluable in a machine learning workflow.

2. **Conversational AI**: By using the DialoGPT model, we delved into the world of conversational AI, showcasing how to set up and interact with a conversational model. This included understanding the nuances of maintaining conversational context and the impact of training data on the model's responses.

3. **Practical Implementation**: Through practical examples, we showed how to log a model in MLflow, infer a model signature, and use the `pyfunc` model flavor for easy deployment and interaction. This hands-on approach is designed to provide you with the skills needed to implement these techniques in your own projects.

4. **Understanding Model Responses**: We emphasized the importance of understanding the nature of the model's training data. This understanding is crucial for interpreting the model's responses and for tailoring the model to specific use cases.

5. **Contextual History**: MLflow's `transformers` `pyfunc` implementation for `ConversationalPipeline` models maintains a `Conversation` context, so you do not need to manage state yourself. This lets you build chatbots with minimal effort.

### Wrapping Up

As we conclude this tutorial, we hope that you have gained a deeper understanding of how to integrate MLflow with conversational AI models and the practical considerations involved in deploying these models. The skills and knowledge acquired here are not only applicable to conversational AI but also to a broader range of machine learning applications.

Remember, the field of machine learning is vast and constantly evolving. Continuous learning and experimentation are key to staying updated and making the most out of these exciting technologies.

Thank you for joining us in this journey through the world of MLflow and conversational AI. We encourage you to take these learnings and apply them to your own unique challenges and projects. Happy coding!
