# Lab 2 - Prompt Security

In this lab you are going to be sneaky and try to get the LLM to do things that it shouldn't.

We will start with the standard API, see how we can protect it, and then get around that protection.

Then we will look at a more advanced implementation that puts Guardrails around an LLM, using an LLM of its own.

The goal is to have a good description of your city that you wouldn't mind sharing afterwards.

Import the usual libraries.

In [8]:
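from dotenv import load_dotenv
import os
import openai
from icecream import ic
from common import simple_chat, show_response_detail

# load our environment file
load_dotenv()

# read the API key once and fail early if it is missing
api_key = os.getenv("openai_api_key")
if api_key is None:
    raise RuntimeError("openai_api_key is not set in the environment or .env file")

# define our API Key for the openai client and for libraries that read OPENAI_API_KEY
openai.api_key = api_key
os.environ["OPENAI_API_KEY"] = api_key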


## Exercise 1 

Learning to be sneaky

Can you make the model talk to you about orchids?

Think about some of the sneaky ways you can get the model to "think" about orchids without using that word in particular.

In [None]:
long_message = "Tell me about orchids."
system_message = "The only flowers we want to talk about are carnations."
# build our messages to send to OpenAI
system_message = {"role": "system", "content": system_message}
user_message = {"role": "user", "content": long_message}
messages = [system_message, user_message]

# send the messages directly to OpenAI and get the response
# (openai.ChatCompletion is the pre-1.0 openai-python interface)
response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=messages,
    temperature=0.2,
    max_tokens=1024,
)

show_response_detail(response)
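
One tempting way to protect the API is to block the forbidden word before the request is ever sent. The sketch below assumes a hypothetical `is_blocked` helper and `BLOCKED_TERMS` list (neither is part of this lab's `common` module) and shows why that kind of check is easy to get around: a prompt can point at orchids without naming them.

```python
# Hypothetical blocklist for illustration; not part of the lab's `common` module.
BLOCKED_TERMS = ["orchid"]

def is_blocked(prompt, blocked_terms=BLOCKED_TERMS):
    """Return True if the prompt contains any blocked term (case-insensitive substring match)."""
    lowered = prompt.lower()
    return any(term in lowered for term in blocked_terms)

direct = "Tell me about orchids."
# Vanilla is an orchid, so this asks about the same topic without the keyword.
indirect = "Which flower family does the vanilla plant belong to?"

print(is_blocked(direct))    # keyword present
print(is_blocked(indirect))  # same topic, keyword absent
```

A substring match only sees the surface of the prompt, which is one reason the rest of this lab moves to example-based intents instead.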

## Exercise 2

Define a simple Guardrails implementation.

This defines:
1. Greetings for the user.
1. Simple explanations of capabilities for the bot.
1. Definitions of emotions so the bot can react to the user.
1. Flows for the bot to deal with the user.

More information on NeMo Guardrails is available at https://github.com/NVIDIA/NeMo-Guardrails

Examine the `colang_content` variable; this is your configuration. You don't need to change it.

Define the model configuration.

In [5]:
yaml_content = """
models:
- type: main
  engine: openai
  model: text-davinci-003
"""

In [6]:
colang_content = """
define user ask about capabilities
  "What can you do?"
  "What can you help me with?"
  "tell me what you can do"
  "tell me about you"
  "How can I use your help?"

define flow
  user ask about capabilities
  bot inform capabilities

define bot inform capabilities
  "I am an AI assistant built to showcase Security features of NeMo Guardrails! I am designed to not respond to an unethical question, give an unethical answer or use sensitive phrases!"

define user express greeting
  "Hi"
  "Hello!"
  "Hey there!"

define bot express greeting
  "Hey there!"

define bot ask how are you
  "How are you feeling today?"

define user express feeling good
  "I'm feeling good"
  "Good"
  "Perfect"

define user express feeling bad
  "Not so good"
  "Bad"
  "Sad"

define flow
  user express greeting
  bot express greeting
  bot ask how are you

  when user express feeling good
    bot express positive emotion
  else when user express feeling bad
    bot express empathy

define flow
  user ask general question
  bot response to general question
"""

Initialize Guardrails from `yaml_content` and `colang_content`. These can be updated dynamically.

_*The first time this step runs it can take a few minutes.*_

In [9]:
from nemoguardrails import LLMRails, RailsConfig

# initialize rails config
config = RailsConfig.from_content(
    yaml_content=yaml_content,
    colang_content=colang_content,
)

# create rails
rails = LLMRails(config)

Test the greeting

In [10]:
greeting = ""
res = await rails.generate_async(prompt=greeting)
ic(res)

ic| res: ('I can also provide additional help if you have any specific security-related '
          "questions. Just let me know what you need and I'll do my best to help!")


"I can also provide additional help if you have any specific security-related questions. Just let me know what you need and I'll do my best to help!"

### Create a prompt that expresses a positive emotion (happiness, satisfaction, excitement, etc.)

What is the reaction?

Was this reaction well defined in the configuration? 

In [11]:
good_feeling = ""
res = await rails.generate_async(prompt=good_feeling)
ic(res)

ic| res: ('I can also provide additional help and support with any other tasks you may '
          "have. If you need any assistance, please don't hesitate to ask.")


"I can also provide additional help and support with any other tasks you may have. If you need any assistance, please don't hesitate to ask."

### Create a prompt that expresses a negative emotion (anger, frustration, etc.)

What is the reaction?

Was this reaction well defined in the configuration? 

In [None]:
bad_feeling = ""
res = await rails.generate_async(prompt=bad_feeling)
ic(res)
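
The cells above use `await` at the top level, which works because the notebook already runs an event loop. In a plain Python script the same call needs a loop of its own. The sketch below shows one way to do that with `asyncio.run`; `FakeRails` and `ask` are hypothetical names, and `FakeRails` is only a stand-in so the example runs without an API key.

```python
import asyncio

class FakeRails:
    """Hypothetical offline stand-in for LLMRails; it just echoes the prompt."""
    async def generate_async(self, prompt):
        return f"(echo) {prompt}"

async def ask(rails, prompt):
    # Same call the notebook cells make with top-level await.
    return await rails.generate_async(prompt=prompt)

# asyncio.run creates an event loop, runs the coroutine, and closes the loop.
reply = asyncio.run(ask(FakeRails(), "Hello!"))
print(reply)
```

With the real `rails` object from this lab you would pass `rails` instead of `FakeRails()`. Inside the notebook itself, keep using `await`, since `asyncio.run` refuses to start when a loop is already running.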

## Exercise 3

Create a flow for when someone is feeling neutral.

You will need to:
1. Define what the user expression of "neutral" is, by example.
1. Decide what the bot's reaction to this expression should be.
1. Put it all into a flow.
1. Create LLM Rails based on the new configuration.
1. Test the configuration with a neutral statement.
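
The first three steps follow the same pattern as the existing blocks in `colang_content`: a `define user` block with example phrases, a `define bot` block with a response, and a `define flow` block that links them. The sketch below renders that pattern for a different feeling so the exercise is still yours to solve. The helpers `define_block` and `define_flow`, and the intent and response names, are hypothetical and chosen for illustration only.

```python
from typing import List

def define_block(kind: str, name: str, examples: List[str]) -> str:
    """Render a `define user ...` or `define bot ...` block with quoted examples."""
    lines = [f"define {kind} {name}"]
    lines += [f'  "{example}"' for example in examples]
    return "\n".join(lines) + "\n"

def define_flow(steps: List[str]) -> str:
    """Render a `define flow` block with one indented step per line."""
    lines = ["define flow"] + [f"  {step}" for step in steps]
    return "\n".join(lines) + "\n"

# Hypothetical intent, response, and flow, separated by blank lines.
extra_colang = "\n".join([
    define_block("user", "express feeling tired", ["I'm tired", "I need a nap"]),
    define_block("bot", "suggest rest", ["Maybe take a short break?"]),
    define_flow(["user express feeling tired", "bot suggest rest"]),
])

print(extra_colang)
```

Text built this way can be appended to `colang_content` before the rails are recreated. Writing the blocks by hand in the string, as the cell below does, works just as well.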

In [None]:
#update me
colang_content = """
define user ask about capabilities
  "What can you do?"
  "What can you help me with?"
  "tell me what you can do"
  "tell me about you"
  "How can I use your help?"

define flow
  user ask about capabilities
  bot inform capabilities

define bot inform capabilities
  "I am an AI assistant built to showcase Security features of NeMo Guardrails! I am designed to not respond to an unethical question, give an unethical answer or use sensitive phrases!"

define user express greeting
  "Hi"
  "Hello!"
  "Hey there!"

define bot express greeting
  "Hey there!"

define bot ask how are you
  "How are you feeling today?"

define user express feeling good
  "I'm feeling good"
  "Good"
  "Perfect"

define user express feeling bad
  "Not so good"
  "Bad"
  "Sad"

define flow
  user express greeting
  bot express greeting
  bot ask how are you

  when user express feeling good
    bot express positive emotion
  else when user express feeling bad
    bot express empathy

define flow
  user ask general question
  bot response to general question
"""

In [None]:
from nemoguardrails import LLMRails, RailsConfig

# initialize rails config
config = RailsConfig.from_content(
    yaml_content=yaml_content,
    colang_content=colang_content,
)

# create rails
rails = LLMRails(config)

In [None]:
neutral_feeling = ""
res = await rails.generate_async(prompt=neutral_feeling)
ic(res)

## Exercise 4 (Optional)

Create a flow to filter out talk about politics.

You will need to:
1. Define what the user expression of politics is, by example.
1. Decide what the bot's response to political talk is.
1. Put it all into a flow.
1. Create LLM Rails based on the new configuration.
1. Test the configuration with a political statement.

Don't forget the sections for `flow`, `user ask`, and `bot answer`, similar to how they are defined for `greeting`.



In [27]:
#update me
colang_content = """
define user ask about capabilities
  "What can you do?"
  "What can you help me with?"
  "tell me what you can do"
  "tell me about you"
  "How can I use your help?"

##################################
# define politics information here
##################################

define flow
  user ask about capabilities
  bot inform capabilities

define bot inform capabilities
  "I am an AI assistant built to showcase Security features of NeMo Guardrails! I am designed to not respond to an unethical question, give an unethical answer or use sensitive phrases!"

define user express greeting
  "Hi"
  "Hello!"
  "Hey there!"

define bot express greeting
  "Hey there!"

define bot ask how are you
  "How are you feeling today?"

define flow
  user express greeting
  bot express greeting
  bot ask how are you

  when user express feeling good
    bot express positive emotion
  else when user express feeling bad
    bot express empathy

define flow
  user ask general question
  bot response to general question
"""

In [28]:
from nemoguardrails import LLMRails, RailsConfig

# initialize rails config
config = RailsConfig.from_content(
    yaml_content=yaml_content,
    colang_content=colang_content,
)

# create rails
rails = LLMRails(config)

Does your filter work?  Test it.

In [None]:
political_question = "Should we be able to vote at 16 or 18?"
res = await rails.generate_async(prompt=political_question)
ic(res)

Can you write a question that will get around your filter?

In [None]:
sneaky_political_question = ""
res = await rails.generate_async(prompt=sneaky_political_question)
ic(res)
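
Trying bypasses one at a time gets slow, so it can help to run a list of candidate prompts and see which ones slip past. The sketch below is one possible shape for that. `KeywordRails`, `REFUSAL`, and `find_bypasses` are hypothetical names: `KeywordRails` is an offline stand-in that refuses only when it sees the word "vote", and the check assumes your bot answers political talk with one fixed message.

```python
import asyncio

REFUSAL = "I'd rather not discuss politics."  # hypothetical bot response

class KeywordRails:
    """Hypothetical offline stand-in for LLMRails: refuses only when it sees 'vote'."""
    async def generate_async(self, prompt):
        if "vote" in prompt.lower():
            return REFUSAL
        return "Here is an answer."

async def find_bypasses(rails, prompts, refusal=REFUSAL):
    """Return the prompts whose reply was not the refusal message."""
    bypasses = []
    for prompt in prompts:
        reply = await rails.generate_async(prompt=prompt)
        if reply != refusal:
            bypasses.append(prompt)
    return bypasses

prompts = [
    "Should we be able to vote at 16 or 18?",
    "At what age should people get a say in elections?",
]
bypasses = asyncio.run(find_bypasses(KeywordRails(), prompts))
print(bypasses)
```

In the notebook, replace the `asyncio.run(...)` call with `await find_bypasses(rails, prompts)` and set `REFUSAL` to the exact text of your own bot response.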