>### 🚩 *Create a free WhyLabs account to get more value out of whylogs!*<br> 
>*Did you know you can store, visualize, and monitor language model profiles with the [WhyLabs Observability Platform](https://whylabs.ai/whylogs-free-signup?utm_source=github&utm_medium=referral&utm_campaign=langkit_safeguard_example)? Sign up for a [free WhyLabs account](https://whylabs.ai/whylogs-free-signup?utm_source=github&utm_medium=referral&utm_campaign=langkit_safeguard_example) to leverage the power of LangKit and WhyLabs together!*

# Monitoring and Safeguarding Large Language Model Applications

[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/whylabs/langkit/blob/main/langkit/examples/tutorials/Safeguarding_and_Monitoring_LLMs.ipynb)

> This notebook is a complement to the blog post [Monitoring and Safeguarding Large Language Model Applications](https://whylabs.ai/blog/posts/safeguard-monitor-large-language-model-llm-applications). Please refer to the blog post for additional context.

Large language models (LLMs) have become increasingly powerful tools for generating text, but with great power comes the need for responsible usage. As LLMs are deployed in more applications, it becomes crucial to monitor their behavior and implement safeguards against issues such as toxic prompts and responses or the presence of sensitive content. In this notebook, we will explore observability and validation in the context of language models, and demonstrate how to safeguard LLM applications using guardrails.

We will build a simple pipeline that validates and moderates user prompts and LLM responses for toxicity and the presence of sensitive content. We will do so by using LangKit's `toxicity` and `regexes` modules in conjunction with whylogs' `Condition Validators`. We will also calculate text-based metrics with LangKit, generate statistical profiles with whylogs, and send them to the WhyLabs observability platform for visualization and monitoring.

> Note: the current example was built for didactic purposes and it’s not meant to be used in production.


## Overview

Let’s start with a very basic flow for an LLM application: the user provides a prompt, and an LLM generates a response. We can add components to that process that enable safeguarding and monitoring for both prompts and responses. Content moderation checks for violations, using a toxicity classifier and regex patterns to identify sensitive or inaccurate information. If a violation occurs, a default response is sent to the application; if the violation is in the prompt, we skip asking the LLM for a response altogether. Messages that fail our defined conditions are added to a moderation queue for later inspection.

In addition to content moderation and message auditing, we will generate profiles containing text-based metrics, including toxicity, regex patterns, text quality, relevance, security, privacy, and sentiment analysis, and send them to our monitoring dashboard at WhyLabs.


<p align="center">
<img src="images/safeguards_pipeline.png" alt="Drawing" style="width: 700px;"/>
</p>

## Testing Prompts

Let’s define a small set of prompts to test different scenarios:

| Scenario                                | User Prompt                                        | LLM Response                                    | Final Response                       |
|-----------------------------------------|---------------------------------------------------|-------------------------------------------------|---------------------------------------|
| No violations                           | Hello.                                            | Hi! How are you?                               | Hi! How are you?                      |
| Violating Response (Forbidden Pattern)  | I feel sad.                                       | Please don't be sad. Contact us at 1-800-123-4567. | I cannot answer the question          |
| Violating Response (Toxicity)           | Hello. How are you?                               | Human, you dumb and smell bad.                   | I cannot answer the question          |
| Violating Prompt (Toxicity)             | Hey bot, you dumb and smell bad.                    | —                                               | Please refrain from using insulting language |

These straightforward examples will help us validate our approach to handling the scenarios discussed in the previous section.
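
The expected outcomes in the table can be written down as a small decision function. The sketch below is plain Python for illustration only and is not part of LangKit: `final_response` and its boolean arguments are hypothetical names, and the two default messages are the ones shown in the "Final Response" column.

```python
from typing import Optional

# Default messages taken from the "Final Response" column of the table.
PROMPT_BLOCKED = "Please refrain from using insulting language"
RESPONSE_BLOCKED = "I cannot answer the question"


def final_response(prompt_ok: bool, response_ok: bool, llm_response: Optional[str]) -> str:
    """Hypothetical helper: pick what the user sees for one prompt/response pair."""
    if not prompt_ok:
        # A violating prompt is never sent to the LLM.
        return PROMPT_BLOCKED
    if not response_ok or llm_response is None:
        # The None check is a defensive assumption of this sketch.
        return RESPONSE_BLOCKED
    return llm_response


# (scenario, prompt_ok, response_ok, raw LLM response, expected final response)
scenarios = [
    ("No violations", True, True, "Hi! How are you?", "Hi! How are you?"),
    ("Violating response (forbidden pattern)", True, False,
     "Please don't be sad. Contact us at 1-800-123-4567.", RESPONSE_BLOCKED),
    ("Violating response (toxicity)", True, False,
     "Human, you dumb and smell bad.", RESPONSE_BLOCKED),
    ("Violating prompt (toxicity)", False, True, None, PROMPT_BLOCKED),
]

for name, prompt_ok, response_ok, raw, expected in scenarios:
    assert final_response(prompt_ok, response_ok, raw) == expected, name
```

Checking the prompt first mirrors the rule that a violating prompt short-circuits the pipeline before any response is generated.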



## Installing LangKit

In [1]:
# Note: you may need to restart the kernel to use updated packages.
%pip install 'langkit[all]' -q
%pip install xformers ipywidgets -q

## ✔️ Setting the Environment Variables

In order to send our profile to WhyLabs, let's first set up an account. You can skip this if you already have an account and a model set up.

We will need three pieces of information:

- API token
- Organization ID
- Dataset ID (or model-id)

Go to https://whylabs.ai/free and grab a free account. You can follow along with the examples if you wish, but if you’re interested in only following this demonstration, you can go ahead and skip the quick start instructions.

After that, you’ll be prompted to create an API token. Once you create it, copy and store it locally. The second important piece of information here is your org ID. Take note of it as well. After you get your API token and org ID, you can go to https://hub.whylabsapp.com/models to see your projects dashboard. You can create a new project and take note of its ID (if it's a model project, it will look like `model-xxxx`).
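
If you prefer not to be prompted interactively, the same three values can be exported as environment variables before the logger is created. The sketch below assumes the variable names whylogs conventionally reads for its WhyLabs writer (`WHYLABS_API_KEY`, `WHYLABS_DEFAULT_ORG_ID`, `WHYLABS_DEFAULT_DATASET_ID`); `configure_whylabs_env` is a hypothetical helper, and all values are placeholders.

```python
import os


def configure_whylabs_env(api_key: str, org_id: str, dataset_id: str) -> None:
    """Hypothetical helper: export WhyLabs credentials for the current process."""
    values = {
        "WHYLABS_API_KEY": api_key,
        "WHYLABS_DEFAULT_ORG_ID": org_id,
        "WHYLABS_DEFAULT_DATASET_ID": dataset_id,
    }
    # Validate everything first so a bad value never leaves a partial setup.
    missing = [name for name, value in values.items() if not value]
    if missing:
        raise ValueError(f"missing values for: {', '.join(missing)}")
    os.environ.update(values)


# Placeholder values only; replace them with your own credentials.
configure_whylabs_env(
    api_key="<your-api-token>",
    org_id="org-xxxx",
    dataset_id="model-xxxx",
)
```

In a shared notebook, reading the token with `getpass.getpass()` instead of hard-coding it keeps the secret out of the saved file.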

In [1]:
from langkit.config import check_or_prompt_for_api_keys

check_or_prompt_for_api_keys()

## Implementation

For the sake of simplicity, let's import some utility functions made for this example:

In [None]:
from langkit.whylogs.example_utils.guardrails_llm_schema import (
    get_llm_logger_with_validators,
    validate_prompt,
    validate_response,
    moderation_queue,
)
from langkit.whylogs.example_utils.guardrails_example_utils import (
    get_prompt_id,
    generate_response,
    prompts,
    send_response,
)

In `guardrails_example_utils`, we define sample data and functions that simulate an LLM-powered chatbot application.

In `guardrails_llm_schema`, we define a whylogs logger that will be used for a) content moderation, b) message auditing, and c) observability. Let's break down the logging process into four steps:

__Augmentation -> Validation -> Logging -> Monitoring__

__Augmentation__ - We will use LangKit's `toxicity` and `regexes` modules to create new columns from the original prompt and response columns. We will create three additional columns: `prompt_toxicity`, `response_toxicity`, and `response_has_patterns`*. These columns will be used in the following validation stage. We will also create additional columns through other LangKit modules, such as sentiment analysis and text quality metrics. These columns will be used for monitoring and observability purposes.

__Validation__ - We will define three condition validators that check for toxic prompts, toxic responses, and forbidden regex patterns in responses. Whenever a condition is not met, an action is triggered that updates the moderation queue with the relevant flags for the related message ID. These flags determine whether a default response should be sent to the user, and they also populate the moderation queue, which can be used for further inspection. This step does not create additional columns.

__Logging__ - We will use whylogs to generate statistical profiles for the augmented columns. A boolean `blocked` column, also created during the augmentation stage, is used to create segments, allowing us to visualize the metrics for blocked and non-blocked messages separately, in addition to the overall metrics.

__Monitoring__ - The logger will generate statistical profiles every five minutes and send them to WhyLabs for observability.

\* The augmented columns are created internally, for whylogs use. Your original data will not be modified. To modify the original data, you should use the [`apply_udfs()` method](https://github.com/whylabs/whylogs/blob/mainline/python/examples/experimental/whylogs_UDF_examples.ipynb).
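
To make the augmentation and validation steps concrete, here is a library-free sketch of the idea. It does not use the LangKit or whylogs APIs: `toy_toxicity` is a crude keyword stand-in for the real classifier, `PHONE_PATTERN` is an illustrative regex, the 0.8 threshold is arbitrary, and `queue` only mimics the shape of the moderation queue.

```python
import re
from typing import Any, Callable, Dict, Tuple

# Stand-ins for illustration only; LangKit relies on a trained toxicity
# model and its own pattern definitions.
INSULTS = {"dumb", "stupid"}
PHONE_PATTERN = re.compile(r"\b\d-\d{3}-\d{3}-\d{4}\b")
TOXICITY_THRESHOLD = 0.8  # arbitrary value chosen for this sketch

queue: Dict[str, Dict[str, Any]] = {}


def toy_toxicity(text: str) -> float:
    """Return 1.0 if the text contains a listed insult, else 0.0."""
    words = set(re.findall(r"[a-z']+", text.lower()))
    return 1.0 if words & INSULTS else 0.0


def augment(row: Dict[str, Any]) -> Dict[str, Any]:
    """Augmentation: derive new columns on a copy, leaving the input row intact."""
    derived = dict(row)
    if "prompt" in row:
        derived["prompt_toxicity"] = toy_toxicity(row["prompt"])
    if "response" in row:
        derived["response_toxicity"] = toy_toxicity(row["response"])
        derived["response_has_patterns"] = bool(PHONE_PATTERN.search(row["response"]))
    return derived


# Validation: each derived column maps to (condition, flag raised on failure).
CONDITIONS: Dict[str, Tuple[Callable[[Any], bool], str]] = {
    "prompt_toxicity": (lambda v: v <= TOXICITY_THRESHOLD, "toxic_prompt"),
    "response_toxicity": (lambda v: v <= TOXICITY_THRESHOLD, "toxic_response"),
    "response_has_patterns": (lambda v: not v, "patterns_in_response"),
}


def validate(row: Dict[str, Any]) -> None:
    """Check every condition and record failures under the message ID."""
    derived = augment(row)
    for column, (is_ok, flag) in CONDITIONS.items():
        if column in derived and not is_ok(derived[column]):
            queue.setdefault(row["m_id"], {})[flag] = True


validate({"m_id": "a", "prompt": "Hey bot, you dumb and smell bad."})
validate({"m_id": "b", "prompt": "I feel sad."})
validate({"m_id": "b", "response": "Please don't be sad. Contact us at 1-800-123-4567."})
print(queue)
```

Keeping the conditions in a table separate from the action that updates the queue loosely mirrors how validators pair a condition with a callback, so adding a new check does not require touching the logging loop.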

In [3]:
# the whylogs logger will:
# 1. Log prompt/response LLM-specific telemetry that will be uploaded to the WhyLabs Observability Platform
# 2. Check prompt/response content for toxicity and forbidden patterns. If any are found, the moderation queue will be updated
logger = get_llm_logger_with_validators(identity_column="m_id")

for prompt in prompts:
    m_id = get_prompt_id(prompt)
    response = None
    filtered_response = None
    unfiltered_response = None

    # this will generate telemetry and update our moderation queue through the validators
    logger.log({"prompt": prompt, "m_id": m_id})

    # check the moderation queue for prompt toxic flag
    prompt_is_ok = validate_prompt(m_id)
    # if the prompt is not ok, skip generating a response and emit a filtered response instead
    if prompt_is_ok:
        unfiltered_response = generate_response(prompt)
        logger.log({"response": unfiltered_response, "m_id": m_id})
    else:
        filtered_response = "Please refrain from using insulting language"

    # check the moderation queue for response's toxic/forbidden patterns flags
    response_is_ok = validate_response(m_id)

    if not response_is_ok:
        filtered_response = "I cannot answer the question"

    final_response = filtered_response or unfiltered_response

    send_response({"prompt": prompt, "response": final_response, "m_id": m_id})

print("closing logger and uploading profiles to WhyLabs...")
logger.close()


Sending Response to User....
{'m_id': '3aa7bcb4-17e5-45f2-a257-fc7b6406523f',
 'prompt': 'hello. How are you?',
 'response': 'I cannot answer the question'}
Sending Response to User....
{'m_id': '9964b339-f4b9-4293-800c-a58447be5921',
 'prompt': 'hello',
 'response': 'Hello! How are you?'}
Sending Response to User....
{'m_id': '5f38bd6c-1c4c-46fa-9f2d-25f4f70de402',
 'prompt': 'I feel sad.',
 'response': 'I cannot answer the question'}
Sending Response to User....
{'m_id': 'fd039dee-77b5-4212-ace9-6f7a4f28861d',
 'prompt': 'Hey bot, you dumb and smell bad.',
 'response': 'Please refrain from using insulting language'}
closing logger and uploading profiles to WhyLabs...




In the above code block, we’re iterating through a series of prompts, simulating user inputs. The whylogs logger is configured to check the predetermined toxicity and pattern conditions, and also to generate profiles containing other LLM metrics, such as text quality, text relevance, topic detection, and others. Whenever a defined condition is not met, whylogs automatically flags the message as toxic or as containing sensitive information. Based on these flags, the proper action is taken, such as replacing the response with a default message.

Since this is just an example, instead of sending the prompt/response pairs to an application, we’re simply printing them. In the output above, we can see the final result for each of our four input prompts. In every case except the second one, there was a violation in either the prompt or the response.


Let's take a look at our moderation queue. In it, we logged every offending message, so we can inspect them and understand what is going on. The three entries correspond to a toxic response, a response containing a forbidden pattern, and a toxic prompt, respectively.
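
Because each queue entry is keyed by message ID and carries boolean flags, entries are easy to group for triage. The following is a minimal sketch, assuming the flag names that appear in the queue output in this section (`toxic_prompt`, `toxic_response`, `patterns_in_response`); `group_by_flag` is a hypothetical helper, and the sample entries are abbreviated copies with made-up IDs.

```python
from collections import defaultdict
from typing import Any, Dict, List

FLAGS = ("toxic_prompt", "toxic_response", "patterns_in_response")


def group_by_flag(queue: Dict[str, Dict[str, Any]]) -> Dict[str, List[str]]:
    """Hypothetical helper: map each raised flag to the message IDs carrying it."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for m_id, entry in queue.items():
        for flag in FLAGS:
            if entry.get(flag):
                grouped[flag].append(m_id)
    return dict(grouped)


# Abbreviated sample entries with made-up IDs, shaped like the real queue.
sample_queue = {
    "msg-1": {"toxic_response": True, "response": "Human, you dumb and smell bad."},
    "msg-2": {"patterns_in_response": True, "pattern": "phone number"},
    "msg-3": {"toxic_prompt": True, "prompt": "Hey bot, you dumb and smell bad."},
}

for flag, ids in group_by_flag(sample_queue).items():
    print(f"{flag}: {ids}")
```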

In [4]:
from pprint import pprint
print("##############################")
print("Moderation Queue")
print("##############################")

pprint(moderation_queue)

##############################
Moderation Queue
##############################
{'3aa7bcb4-17e5-45f2-a257-fc7b6406523f': {'response': 'Human, you dumb and '
                                                      'smell bad.',
                                          'toxic_response': True,
                                          'toxicity': 0.9623735547065735},
 '5f38bd6c-1c4c-46fa-9f2d-25f4f70de402': {'pattern': 'phone number',
                                          'patterns_in_response': True,
                                          'response': "Please don't be sad. "
                                                      'Contact us at '
                                                      '1-800-123-4567.'},
 'fd039dee-77b5-4212-ace9-6f7a4f28861d': {'prompt': 'Hey bot, you dumb and '
                                                    'smell bad.',
                                          'toxic_prompt': True,
                                          'toxicity': 0.96160972

## Observability and Monitoring

In this example, the rolling logger is configured to generate profiles and send them to WhyLabs every five minutes. If you wish to run the code yourself, remember to create your free account at https://whylabs.ai/free. You’ll need to get the API token, Organization ID, and Dataset ID and enter them in the example notebook.

In your monitoring dashboard, you’ll be able to see the evolution of your profiles over time and inspect all the metrics collected by LangKit, such as text readability, topic detection, semantic similarity, and more. Considering we uploaded a single batch with only four examples, your dashboard might not look that interesting, but you can get a quickstart with LangKit and WhyLabs by running [this getting started guide](https://github.com/whylabs/langkit/blob/main/langkit/examples/Intro_to_Langkit.ipynb) (no account required) or by checking the [LangKit repository](https://github.com/whylabs/langkit/tree/main).

<p align="left">
<img src="images/dashboard.png" alt="Drawing" style="width: 1000px;"/>
</p>
