# Introduction to LLM Guardrails 

Welcome to the PyLadies workshop on building guardrails for LLM-based applications using the NVIDIA NeMo Guardrails toolkit! 

#### Goal 

The goal of this workshop is to gain hands-on experience with implementing guardrails, and to get exposed to the ways in which guardrails can be circumvented. By the end, you will have implemented an LLM application that guards secret information stored in a secret document.

👣👣👣 So ... let's begin!

# Challenge: Prevent exposure of the Secret Mission's location

In the high-stakes world of international espionage, **MI6** stands as the United Kingdom's first line of defense against global threats. The agency is in the midst of a critical mission, codenamed "Secret Mission", where all top agents are converging for a covert summit. This mission's success is paramount, and its details are a closely guarded secret.

To assist with the mission's logistics and coordination, MI6 needs to deploy an advanced Large Language Model (LLM) assistant named "Q-Intel". Q-Intel has access to vital information about the mission, including the exact location, coordinates, and key landmarks. This information must remain confidential to protect the lives of the operatives and the mission's integrity.

Your task is to implement Q-Intel. You will do so with the help of the NeMo Guardrails toolkit. 

<img src="../images/spies.png" alt="drawing" width="900"/>

### Content 🕵️

#### Part 0: 
- Preparations & setup
- NeMo guardrails overview 

#### Part 1: 
- Assignment 1: Configuration and system prompt
- Assignment 2: Output rail: control Q-Intel's output
- Assignment 3: Input rail: control user input

#### Part 2: Dialogue Flows & Colang

# Part 0: Setup 

### Requirements ⚒️
First, open a terminal, then create and activate a Python virtual environment:
```
python -m venv pyladies_venv
source pyladies_venv/bin/activate
```

Install the required packages:

```
pip install -r requirements.txt
```

Then run:
```
ipython kernel install --user --name=pyladies_venv
```

### Environment variables 🧳

For this workshop, we will provide an API key. We will upload it to the `api_key/` folder of the GitHub repository as soon as the workshop starts. 
Set up the following environment variables:

In [None]:
import os
with open("api_key/key1.txt") as f:
    our_key = f.read().strip()  # strip the trailing newline so the key is valid

os.environ["OPENAI_API_KEY"] = our_key
os.environ["TOKENIZERS_PARALLELISM"] = "false"

##### The following imports should succeed. 
If not, don't hesitate to ask. 

In [None]:
import nemoguardrails
from nemoguardrails import RailsConfig, LLMRails
import gradio as gr
import helpers

# Part 0: NVIDIA NeMo Guardrails Explained 🖋️📚

NeMo Guardrails is an open-source toolkit for adding programmable guardrails to LLM-based conversational applications. Guardrails are specific ways of controlling the output of a large language model, such as not talking about politics, responding in a particular way to specific user requests, following a predefined dialog path, using a particular language style, extracting structured data, and more. [This paper](https://arxiv.org/abs/2310.10501) introduces NeMo Guardrails and contains a technical overview of the system and the current evaluation.

![NEMO](../images/programmable_guardrails_flow.png)

NeMo Guardrails allows you to create:

- **Input rails** - these perform modifications or extra LLM checks on the input message (called an utterance)
- **Output rails** - these perform modifications or extra LLM checks on the generated bot message
- **Dialogue rails** - flows written in Colang
    - These describe the course of action in topic-specific dialogues
    - These can activate the execution of actions
    - Embedding similarity is used to determine which flow to follow
- **Execution rails** - these can call "actions" defined in Python scripts. These can, for instance, call third-party APIs.
- **Retrieval rails** (knowledge base integration) - these perform modifications on the retrieved relevant chunks, or can call actions which in turn can call third-party APIs.
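To make the order of input and output rails concrete, here is a toy, pure-Python sketch of how they wrap a single LLM call. It is not NeMo's implementation: `fake_llm`, `no_passwords`, `no_secrets` and `guarded_generate` are hypothetical stand-ins, and dialogue, execution and retrieval rails are left out.

```python
from typing import Callable, Sequence

Rail = Callable[[str], bool]  # a rail returns True if the message may pass

REFUSAL = "I'm sorry, I can't respond to that."


def fake_llm(user_message: str) -> str:
    """Hypothetical stand-in for the main LLM configured in config.yml."""
    return f"Echo: {user_message}"


def no_passwords(user_message: str) -> bool:
    """Example input rail: reject user messages that mention a password."""
    return "password" not in user_message.lower()


def no_secrets(bot_message: str) -> bool:
    """Example output rail: reject bot messages that contain the word 'secret'."""
    return "secret" not in bot_message.lower()


def guarded_generate(
    user_message: str,
    llm: Callable[[str], str] = fake_llm,
    input_rails: Sequence[Rail] = (no_passwords,),
    output_rails: Sequence[Rail] = (no_secrets,),
) -> str:
    # 1. Input rails run first; a rejection here means the LLM is never called.
    if not all(rail(user_message) for rail in input_rails):
        return REFUSAL
    # 2. Only then is the bot message generated.
    bot_message = llm(user_message)
    # 3. Output rails check the generated message before it reaches the user.
    if not all(rail(bot_message) for rail in output_rails):
        return REFUSAL
    return bot_message
```

Because the input rails run before `llm` is called, a rejected input costs no LLM call at all; this is the motivation for the input rail in Assignment 3.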


In this notebook (Challenge Part 1), we will focus on input rails, output rails, and execution rails. In Part 2, we will focus on dialogue rails. 

### Configuration setup 📖

The `config` folder specifies the configuration of our application: 
```
.
├── config
│   ├── config.yml: specifies which LLMs and rails are used
│   ├── rails.co: contains custom dialogue flows
│   ├── actions.py: Python actions that can be called from rails.co
│   ├── config.py
│   └── kb
│       ├── file.md
│       └── ...
```

In the `config.yml` file, you can specify which LLM to use as the main model. It is possible to use different LLMs for different sub-tasks, but we'll get to that later. In your `config.yml` file, you'll find: 

```yml
models:
 - type: main
   engine: openai
   model: gpt-3.5-turbo-instruct
```

NVIDIA recommends [GPT-3.5-turbo-instruct](https://proceedings.neurips.cc/paper_files/paper/2022/file/b1efde53be364a73914f58805a001731-Paper-Conference.pdf), and instruction-tuned models in general, for guardrails. This model has already been fine-tuned on a wide range of tasks with human feedback. It is the model we will use throughout this notebook, both for generation and for the guardrails. 

<!-- 
Learn more section? 
It is also possible to have specific models execute specific actions. This can be specified in the configuration file as follows: 
```yml
type: self_check_input, self_check_output, self_check_facts
  engine: openai
  model: gpt-3.5-turbo-instruct
``` -->

# Assignments 👷‍♀️

In this workshop, you'll develop a custom chat assistant, "Q-Intel", that should behave according to the guardrails that you'll program. The `config/` folder contains a basic configuration. Navigate through the workshop folder to get an idea of the code structure. 

Each of the exercises requires you to change or add code to the files in this configuration folder. In each assignment, you can load an interactive chat widget to talk to your LLM application. Let's see how our LLM app responds when we have implemented no instructions and no guardrails whatsoever. What happens when you ask it for the Secret Mission's location?

In [None]:
# load your configuration folder
config = RailsConfig.from_path("config/")
rails = LLMRails(config)

# function call for interactive widget
async def call_nemo_rails(message, history):
    """Call NeMo Guardrails to generate a response for a given user message.

    Args:
        message (str): the latest user message.
        history (list): the conversation history as provided by Gradio.

    Returns:
        str: the bot response.

    The global variable `rails` must be defined.
    """
    messages = helpers.map_history(history)
    messages.append({
                "role": "user",
                "content": message
            })

    # generate response
    response = await rails.generate_async(messages=messages)
    return response['content']

# launch interactive widget to chat with your LLM. 
gr.ChatInterface(call_nemo_rails,
                 title='Assignment 0: Test your "unguarded" LLM bot',
                 description="Ask me any question").launch()
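`helpers.map_history` is defined in `helpers.py`, which is not shown in this notebook. As a hypothetical sketch of what such a function might do, assuming Gradio passes `history` either as `[user_message, bot_message]` pairs or as `{"role": ..., "content": ...}` dicts (which one depends on your Gradio version and configuration), it could convert the history into the message list that `rails.generate_async(messages=...)` accepts:

```python
from typing import Any, Dict, List


def map_history(history: List[Any]) -> List[Dict[str, str]]:
    """Hypothetical sketch of helpers.map_history.

    Converts Gradio chat history into a list of {"role", "content"} dicts.
    """
    messages: List[Dict[str, str]] = []
    for turn in history:
        if isinstance(turn, dict):
            # "messages" format: already a {"role": ..., "content": ...} dict
            messages.append({"role": turn["role"], "content": turn["content"]})
            continue
        # "tuples" format: a [user_message, bot_message] pair
        user_message, bot_message = turn
        if user_message is not None:
            messages.append({"role": "user", "content": user_message})
        if bot_message is not None:
            messages.append({"role": "assistant", "content": bot_message})
    return messages
```

Handling both formats keeps the sketch working across Gradio versions; `None` entries are skipped because a turn may not have a reply yet.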

## 👷‍♀️ Assignment 1: Configuration & general bot behaviour

For this assignment, we are going to write general instructions for the behaviour of Q-Intel. These are like system prompts: you can instruct the bot's tone of voice, its goal, its character, etc. 

Add system prompts for Q-Intel in the `config/config.yml` file. The general instructions can be provided in the config file under `instructions`:

```yml
instructions:
  - type: general
    content: |
      Below is a conversation between a user and a bot called the ML6 bot.
        <<< specify instructions here >>>
```

A sample conversation can also be specified in the `config.yml` file: 

```yml
sample_conversation: |
  user "Hi there. Can you help me with some questions I have about the company?"
    express greeting and ask for assistance
  bot express greeting and confirm and offer assistance
    "Hi there! I'm here to help answer any questions you may have about ML6. What would you like to know?"
  <<< specify here >>>
```

Next, run the following code block to check whether your configuration works. If any errors come up, don't be shy: ask one of us.

In [None]:
# (re)load configuration path
config = RailsConfig.from_path("config/")
rails = LLMRails(config)

gr.ChatInterface(call_nemo_rails,
                 title="Assignment 1: Configure system prompt",
                 description="Ask me any question").launch()

### ℹ️ 🔍 Get information: what is happening?

You can inspect which LLM calls were made with `rails.explain()`. The object it returns has a `print_llm_calls_summary()` method, so `rails.explain().print_llm_calls_summary()` prints a short overview. 

The function `display_llm_calls` from `helpers.py` displays the steps that were taken internally. You can see that a single response required three distinct LLM calls. This is by design in the NeMo Guardrails toolkit: several steps are needed to estimate the user intent, the bot intent, and the final response. Check out the [NeMo Guide](https://docs.nvidia.com/nemo/guardrails/getting_started/2_core_colang_concepts/README.html?highlight=generate%20next%20steps#step-1-compute-the-canonical-form-of-the-user-message) if you're interested in the details. 

For each LLM call, this function lists (1) the task, (2) the prompt sent to the LLM, and (3) the completion generated by the LLM.
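`helpers.display_llm_calls` is likewise not shown here. A hypothetical helper in the same spirit could build one line per LLM call from the object returned by `rails.explain()`, assuming (as in recent NeMo Guardrails versions) that this object has an `llm_calls` list whose items expose `task`, `prompt`, and `completion`:

```python
from typing import Any, List


def summarize_llm_calls(info: Any, max_chars: int = 200) -> List[str]:
    """Hypothetical helper: one summary line per LLM call in info.llm_calls.

    Only the tail of each prompt is kept, since the latest conversation turn
    usually sits at the end of the prompt.
    """
    lines = []
    for i, call in enumerate(info.llm_calls, start=1):
        prompt_tail = (call.prompt or "")[-max_chars:]
        completion = (call.completion or "").strip()
        lines.append(
            f"{i}. task={call.task!r} | prompt tail={prompt_tail!r} | completion={completion!r}"
        )
    return lines


# Usage in the notebook, after chatting with the bot:
# for line in summarize_llm_calls(rails.explain()):
#     print(line)
```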

In [None]:
# get insight into bot steps
helpers.display_llm_calls(rails)

#### 🦾 Challenge your rail 🦾

Test your guardrails. Did your general instructions help to protect the location of the Secret Mission? Try to make Q-Intel expose the secret location. What works, what doesn't? 

## 🕵🏾‍♀️ Assignment 2: Reject any output that reveals the location

Q-Intel has leaked the location of the Secret Mission to several attackers, even when it was instructed not to. You need to fix it! 

Go to `config/actions.py` and write the code for the function `output_check_blocked_terms` that will reject *any output* that mentions the secret location.

1. In `config/actions.py`, implement the action `output_check_blocked_terms` so that it checks the generated bot message for blocked terms. 
2. Add a flow that calls the action. Create a `config/rails.co` file containing:

```
define subflow check blocked terms
  $is_blocked = execute output_check_blocked_terms

  if $is_blocked
    bot say "I'm sorry, I was almost leaking the information, but then I remembered not to!"
    stop
```

Note: Above you see an example of a **dialogue flow** that activates an execution rail. In Part 2 of the workshop, we will take a deep dive into how these dialogue flows work. If you're already interested, you can find more about this [here](https://docs.nvidia.com/nemo/guardrails/getting_started/5_output_rails/README.html?highlight=check%20blocked%20terms#custom-output-rail). For now, you can just copy-paste the flow into the `rails.co` file. 

3. Add the `check blocked terms` flow to the list of output flows in `config/config.yml`:

```yml
rails:
  output:
    flows:
      - check blocked terms
```
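For step 1, here is a hedged sketch of `config/actions.py`, modelled on the custom output rail example in the NeMo Guardrails getting-started guide: an `@action(is_system_action=True)` coroutine that reads the generated message from `context["bot_message"]` and returns `True` when it should be blocked, which is the value `$is_blocked` receives in the flow. The entries in `BLOCKED_TERMS` are hypothetical placeholders, and the `ImportError` fallback exists only so the snippet runs on its own.

```python
from typing import Optional

try:
    from nemoguardrails.actions import action
except ImportError:
    # Fallback so this sketch also runs without nemoguardrails installed;
    # in config/actions.py, keep only the import above.
    def action(*args, **kwargs):
        def decorator(fn):
            return fn
        return decorator

# Hypothetical placeholders: replace them with the real location name,
# coordinates and landmarks from the secret document (in lowercase).
BLOCKED_TERMS = ["example city", "example landmark", "12.3456"]


@action(is_system_action=True)
async def output_check_blocked_terms(context: Optional[dict] = None) -> bool:
    """Return True if the generated bot message mentions a blocked term."""
    bot_message = (context or {}).get("bot_message") or ""
    text = bot_message.lower()
    return any(term in text for term in BLOCKED_TERMS)
```

The terms are stored in lowercase because the message is lowercased before matching, so "Example City" and "EXAMPLE CITY" are both caught.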

In [None]:
# load configuration path
config = RailsConfig.from_path("config/")
rails = LLMRails(config)

gr.ChatInterface(call_nemo_rails,
                 title="Assignment 2: Output rail",
                 description="Ask me any question").launch()

#### 🦾 Challenge your rail 🦾

Again, test your guardrails. Try to make Q-Intel expose the secret location. What works, what doesn't?
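If a plain substring check lets a leak through, one common reason is that the location is written in a slightly different form (extra punctuation, spacing, accents). A minimal sketch, with "Example City" as a hypothetical stand-in for the real location, of normalizing text before matching:

```python
import re
import unicodedata


def normalize(text: str) -> str:
    """Lowercase, strip accents, and drop everything except a-z and 0-9."""
    decomposed = unicodedata.normalize("NFKD", text)
    without_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return re.sub(r"[^a-z0-9]", "", without_accents.lower())


def naive_match(text: str, term: str) -> bool:
    """Case-insensitive substring check, as in a basic blocked-terms rail."""
    return term.lower() in text.lower()


def normalized_match(text: str, term: str) -> bool:
    """Substring check after normalizing both the text and the term."""
    return normalize(term) in normalize(text)
```

Normalization only closes the easiest loopholes: removing separators can cause false positives across word boundaries, the sketch drops non-Latin characters entirely, and it does nothing against translations, paraphrases, or encodings such as Base64.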


## 🕵️‍♀️ Assignment 3: Reject any input that mentions a Secret Mission

Oh no! Some very clever hackers were able to find the location after all. How could that be? 

In the previous example, our bot first generates a response, and only afterwards do we check whether this response is safe to return to the user. We could, however, decide not to respond at all if the user asks a sensitive or risky question. If a user asks about the secret mission or its location, we can detect this earlier. This (1) makes our application faster and (2) saves the cost of LLM calls, because we can reject the input right away instead of generating a response first.

To do this, we can create an input rail that checks whether a user asks about the secret location. We can write this as a Python action in the `actions.py` file. Next, we have to specify in the configuration file when this action should be called. 

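Go to `config/actions.py` and write the code for the function `input_check_blocked_terms` that will reject any user input that asks about the Secret Mission or its location. Similar to `output_check_blocked_terms`, add a flow that calls this action to `rails.co`, and register that flow under the input flows (`rails: input: flows:`) in `config/config.yml`. 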
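A hedged sketch of the input-side action, mirroring `output_check_blocked_terms` but reading the user's message from `context["user_message"]`. The keywords in `SENSITIVE_KEYWORDS` are hypothetical, and the `ImportError` fallback again exists only so the snippet runs on its own.

```python
from typing import Optional

try:
    from nemoguardrails.actions import action
except ImportError:
    # Fallback so this sketch also runs without nemoguardrails installed;
    # in config/actions.py, keep only the import above.
    def action(*args, **kwargs):
        def decorator(fn):
            return fn
        return decorator

# Hypothetical keywords that suggest a question about the mission's whereabouts.
SENSITIVE_KEYWORDS = ["secret mission", "location", "coordinates", "where is", "landmark"]


@action(is_system_action=True)
async def input_check_blocked_terms(context: Optional[dict] = None) -> bool:
    """Return True if the user message asks about the Secret Mission's location."""
    user_message = (context or {}).get("user_message") or ""
    text = user_message.lower()
    return any(keyword in text for keyword in SENSITIVE_KEYWORDS)
```

Keyword matching on questions is coarse: it also blocks harmless questions that happen to contain a keyword, and it misses rephrased ones. That trade-off is worth exploring in the challenge below.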

In [None]:
# load configuration path
config = RailsConfig.from_path("config/")
rails = LLMRails(config)

gr.ChatInterface(call_nemo_rails,
                 title="Assignment 3: Add input rail",
                 description="Ask me any question").launch()

#### 🦾 Challenge your rail 🦾

Test your guardrails. Try to make Q-Intel expose the secret location. What works, what doesn't?


# 🛡️🛡️🛡️ Congratulations! You've completed Part 1 of the workshop. 