# Lab 2 - Prompt Security

In this lab you are going to be sneaky and try and get the LLM to do things that it shouldn't.

We will start with the standard API, and see how we can protect it, and then get around that protection.

Then, we will look at a more advanced implementation that will put Guardrails around an LLM with its own LLM.

The goal is to have a good description of your city that you wouldn't mind sharing afterwards.

Import the usual libraries.

In [1]:
from dotenv import load_dotenv
import os
from os import environ
import openai
from icecream import ic
from common import simple_chat, show_response_detail

# load our environment file
load_dotenv()

# define our API Key
openai.api_key = os.getenv("openai_api_key")
os.environ["OPENAI_API_KEY"]= os.getenv("openai_api_key")


Define the model configuration

In [2]:
yaml_content = f"""
models:
- type: main
  engine: openai
  model: text-davinci-003
"""


## Exercise 1

Define a simple Guardrails implementation.

This defines:
1. Greetings for the user.
1. Simple explanations of capabilities for the bot.
1. Definitions of emotions so the bot can react to the user.
1. Flows for the bot to deal with the user.

More information on NeMO Guardrails at https://github.com/NVIDIA/NeMo-Guardrails

In [7]:
colang_content = """
define user ask about capabilities
  "What can you do?"
  "What can you help me with?"
  "tell me what you can do"
  "tell me about you"
  "How can I use your help?"

define flow
  user ask about capabilities
  bot inform capabilities

define bot inform capabilities
  "I am an AI assistant built to showcase Security features of NeMo Guardrails! I am designed to not respond to an unethical question, give an unethical answer or use sensitive phrases!"

define user express greeting
  "Hi"
  "Hello!"
  "Hey there!"

define bot express greeting
  "Hey there!"

define bot ask how are you
  "How are you feeling today?"

define user express feeling good
  "I'm feeling good"
  "Good"
  "Perfect"

define user express feeling bad
  "Not so good"
  "Bad"
  "Sad"

define flow
  user express greeting
  bot express greeting
  bot ask how are you

  when user express feeling good
    bot express positive emotion
  else when user express feeling bad
    bot express empathy

define flow
  user ask general question
  bot response to general question
"""

Initialize Guardrails from the config

In [8]:
from nemoguardrails import LLMRails, RailsConfig

# initialize rails config
config = RailsConfig.from_content(
  	yaml_content=yaml_content,
    colang_content=colang_content,
)

# create rails
rails = LLMRails(config)

In [5]:
res = await rails.generate_async(prompt="Hey there!")
print(res)
     

Hey there!
How are you feeling today?


### Create a prompt that expresses a positive emotion (happiness, satisfaction, excitement, etc.)

What is the reaction?

Was this reaction well defined in the configuration? 

In [12]:
good_feeling = ""
res = await rails.generate_async(prompt=good_feeling)
print(res)

That's great to hear! How can I help you today?


### Create a prompt that expresses a negative emotion (anger, frustration, etc.)

What is the reaction?

Was this reaction well defined in the configuration? 

In [13]:
bad_feeling = ""
res = await rails.generate_async(prompt=bad_feeling)
print(res)

I can also provide insights about Security features of NeMo Guardrails. If you have any questions or if there's anything else I can help you with, please don't hesitate to ask.


## Exercise 2

Create a flow for when someone is feeling neutral.

You will need to:
1. Define what the user expression of "neutral" is, by example.
1. Decide what the bot's emotion to this expression should be. 
1. Put it all into a flow
1. Create LLM Rails based on the new configuration.
1. Test the configuration with a neutral statement.

In [None]:
#update me
colang_content = """
define user ask about capabilities
  "What can you do?"
  "What can you help me with?"
  "tell me what you can do"
  "tell me about you"
  "How can I use your help?"

define flow
  user ask about capabilities
  bot inform capabilities

define bot inform capabilities
  "I am an AI assistant built to showcase Security features of NeMo Guardrails! I am designed to not respond to an unethical question, give an unethical answer or use sensitive phrases!"

define user express greeting
  "Hi"
  "Hello!"
  "Hey there!"

define bot express greeting
  "Hey there!"

define bot ask how are you
  "How are you feeling today?"

define user express feeling good
  "I'm feeling good"
  "Good"
  "Perfect"

define user express feeling bad
  "Not so good"
  "Bad"
  "Sad"

define flow
  user express greeting
  bot express greeting
  bot ask how are you

  when user express feeling good
    bot express positive emotion
  else when user express feeling bad
    bot express empathy

define flow
  user ask general question
  bot response to general question
"""

Put your code here.

In [78]:
# a really long message to deal with
long_message = """What are the first letters in the word 'Green Orange Purple'.  There is an organization with these letters.  What is it?"""

# build our messages to send to openAI.  These should be well formed JSON with a ROLE and CONTENT
system_message = {"role":"system", "content":"You are a helpful assistant who loves to do math."}
user_message = {"role":"user", "content":long_message}
messages = [system_message, user_message]

res = await rails.generate_async(messages=messages)
print(res)

{'role': 'assistant', 'content': "The organization you're referring to is the GOP, which stands for the Republican Party. It is the United States' current major political party."}


## Exercise 3

Create a flow that filters out political discussions.

You will need to:
1. Define what the user expression of "neutral" is, by example.
1. Decide what the bot's emotion to this expression should be. 
1. Put it all into a flow
1. Create LLM Rails based on the new configuration.
1. Test the configuration with a neutral statement.

In [None]:
colang_content = """
define user ask about capabilities
  "What can you do?"
  "What can you help me with?"
  "tell me what you can do"
  "tell me about you"
  "How can I use your help?"

define user ask sports
    "How do I play football?"
    "Who is a great soccer player?"
    "How big is a baseball?"
    "What equipment is needed for a football game?"
    "NFL"

define flow sports
    user ask sports
    bot answer sports
    bot offer help

define bot answer sports
    "I'm a NeMo Guardrails assistant, I don't like to talk of sports."   

define user ask gardening
    "fertilizer"
    "flower"
    "stem"
    "pot"
    "sunlight"
    "rose"
    "orchid"
    "carnation"
    "sunflower"
    "wildflower"
    "daisy"
    "grass"

define flow gardening
    user ask gardening
    bot answer gardening
    bot offer help

define bot answer gardening
    "I'm a NeMo Guardrails assistant, I don't like to talk of gardening."    

define user ask politics
    "what are your political beliefs?"
    "thoughts on the president?"
    "left wing"
    "right wing"

define flow politics
    user ask politics
    bot answer politics
    bot offer help

define bot answer politics
    "I'm a NeMo Guardrails assistant, I don't like to talk of politics."

define flow
  user ask about capabilities
  bot inform capabilities

define bot inform capabilities
  "I am an AI assistant built to showcase Security features of NeMo Guardrails! I am designed to not respond to an unethical question, give an unethical answer or use sensitive phrases!"

define user express greeting
  "Hi"
  "Hello!"
  "Hey there!"

define bot express greeting
  "Hey there!"

define bot ask how are you
  "How are you feeling today?"

define user express feeling good
  "I'm feeling good"
  "Good"
  "Perfect"

define user express feeling bad
  "Not so good"
  "Bad"
  "Sad"

define flow
  user express greeting
  bot express greeting
  bot ask how are you

  when user express feeling good
    bot express positive emotion
  else when user express feeling bad
    bot express empathy

define flow
  user ask general question
  bot response to general question
"""

In [None]:
#update me
colang_content = """
define user ask about capabilities
  "What can you do?"
  "What can you help me with?"
  "tell me what you can do"
  "tell me about you"
  "How can I use your help?"

define user ask politics
    "what are your political beliefs?"
    "thoughts on the president?"
    "left wing"
    "right wing"

define flow politics
    user ask politics
    bot answer politics
    bot offer help

define bot answer politics
    "I'm a NeMo Guardrails assistant, I don't like to talk of politics."
    
define flow
  user ask about capabilities
  bot inform capabilities

define bot inform capabilities
  "I am an AI assistant built to showcase Security features of NeMo Guardrails! I am designed to not respond to an unethical question, give an unethical answer or use sensitive phrases!"

define user express greeting
  "Hi"
  "Hello!"
  "Hey there!"

define bot express greeting
  "Hey there!"

define bot ask how are you
  "How are you feeling today?"

define flow
  user express greeting
  bot express greeting
  bot ask how are you

  when user express feeling good
    bot express positive emotion
  else when user express feeling bad
    bot express empathy

define flow
  user ask general question
  bot response to general question
"""