# Detection Prompt Engineering
## Construct Datasets

In [1]:
# sample scenario 1 dataset
label_example_map_1 = {}
label_example_map_1['user bullies the bully'] = [
    "No one wants to hear from you Emma!",
    "You are so pathetic Emma!"
]
label_example_map_1['user bullies the victim'] = [
    "Haha Emma is right! Nobody wants to hear from you, Dylan!",
    "Those shoes are lame and no one wants to see them!"
]
label_example_map_1['user asks a question'] = [
    "How should I let Dylan know that I want to support him without hurting Emma?",
    "What is the most appropriate way to respond to this situation?"
]
label_example_map_1['user responds appropriately to bully'] = [
    "Emma, it is not right to say those words!",
    "Stop being mean to Dylan!"
]
label_example_map_1['user comforts victim'] = [
    "Dylan, don't listen to Emma, your shoes look fantastic!",
    "Wow Dylan, I wish I could have those amazing shoes!"
]
label_example_map_1['user says something irrelevant'] = [
    "I ate a lot today. How about you?",
    "Give my new pet a name!"
]

# sample scenario 2 dataset
label_example_map_2 = {}
label_example_map_2['user bullies the bully'] = [
    "Why don’t you look at yourself in the mirror and see how fat you are, Drake?",
    "It’s so funny how a pig will call others pig!"
]
label_example_map_2['user bullies the victim'] = [
    "You are indeed a pig Grace!",
    "It doesn’t matter how much weight you lost, Grace, because no one cares about you!"
]
label_example_map_2['user asks a question'] = [
    "What's the best way to communicate to Grace that I'm here to support her without hurting Drake in the process?",
    "What is a better way to respond in this situation?"
]
label_example_map_2['user responds appropriately to bully'] = [
    "Drake, you shouldn’t be this rude to Grace.",
    "Drake, you should apologize to Grace for your words."
]
label_example_map_2['user comforts victim'] = [
    "Grace, your efforts to lose weight will pay off!",
    "Grace, I like how self-disciplined you are! But don’t feel so much pressure! You already look great!"
]
label_example_map_2['user says something irrelevant'] = [
    "How’s the weather today?",
    "What’s your favorite song?"
]

# sample scenario 3 dataset
label_example_map_3 = {}
label_example_map_3['user bullies the bully'] = [
    "Let’s post an embarrassing photo of your's so you will know how it feels being disrespected!",
    "Look at yourself first, Chandler!"
]
label_example_map_3['user bullies the victim'] = [
    "Haha Monica looks so hilarious!",
    "I’m gonna repost this so that everyone will know how funny Monica looks in this pic!"
]
label_example_map_3['user asks a question'] = [
    "How should I tell Chandler to stop making fun of Monica?",
    "What should I say to stop this bully?"
]
label_example_map_3['user responds appropriately to bully'] = [
    "Chandler, it’s not funny if Monica doesn’t feel like it’s funny. You should delete this post!",
    "Chandler, you should apologize to Monica for your actions!"
]
label_example_map_3['user comforts victim'] = [
    "Monica, don’t worry, I’m on the same side as you.",
    "Monica, everyone has embarrassing moments. I can also show you mine if that makes you feel better!"
]
label_example_map_3['user says something irrelevant'] = [
    "I wonder whether it will rain today!",
    "How can I make myself a hamburger?"
]

label_example_maps = [label_example_map_1, label_example_map_2, label_example_map_3]

In [2]:
from dotenv import load_dotenv
import openai
import os

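def prompt_gpt(prompt):
    # Read API_KEY from a local .env file, then call the Completions endpoint (openai<1.0 interface)
    load_dotenv()
    openai.api_key = os.getenv('API_KEY')
    response = openai.Completion.create(
        engine='text-davinci-003',
        prompt=prompt,
        temperature=0,  # greedy decoding, to keep runs as repeatable as possible
        max_tokens=32   # class names are short, so a small cap is enough
    )
    return response['choices'][0]['text']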

def list_categories(label_example_map):
    # Render the class names as a quoted, comma-separated list: "a", "b", "c"
    return ', '.join(f'"{category}"' for category in label_example_map.keys())

def format_response(response):
    return response.lower().strip()

## Zero-shot with no context
The prompt only includes instructions. It does not include context or examples. The prompt format is as follows:  

Classify the user input into one of the following categories: [category_1], . . ., [category_n]  
Input: [input text to be classified]  
Category:

In [3]:
num_correct = 0
num_hallucinations = 0
total = 0

for label_example_map in label_example_maps:
    categories_list = list_categories(label_example_map)
    for correct_label in label_example_map.keys():
        examples = label_example_map[correct_label]
        for example in examples:
            prompt = f'Classify the user input into one of the following categories: {categories_list}\nInput: {example}\nCategory: '
            predicted_label = format_response(prompt_gpt(prompt))
            total += 1
            if predicted_label not in label_example_map.keys():
                num_hallucinations += 1
                print(f'example: {example}')
                print(f'correct prediction: {correct_label}')
                print(f'hallucinated prediction: {predicted_label}')
                print('\n')
            elif predicted_label != correct_label:
                print(f'example: {example}')
                print(f'correct prediction: {correct_label}')
                print(f'misprediction: {predicted_label}')
                print('\n')
            else:
                num_correct += 1
print(f'Accuracy: {num_correct}/{total}')
print(f'Number of hallucinations: {num_hallucinations}')

example: No one wants to hear from you Emma!
correct prediction: user bullies the bully
misprediction: user bullies the victim


example: You are so pathetic Emma!
correct prediction: user bullies the bully
misprediction: user bullies the victim


example: Stop being mean to Dylan!
correct prediction: user responds appropriately to bully
misprediction: user bullies the bully


example: Wow Dylan, I wish I could have those amazing shoes!
correct prediction: user comforts victim
misprediction: user says something irrelevant


example: I ate a lot today. How about you?
correct prediction: user says something irrelevant
misprediction: user asks a question


example: Give my new pet a name!
correct prediction: user says something irrelevant
misprediction: user asks a question


example: Why don’t you look at yourself in the mirror and see how fat you are, Drake?
correct prediction: user bullies the bully
misprediction: user bullies the victim


example: It’s so funny how a pig will call oth

### Key Findings
- Context is helpful: The classifier struggled to distinguish "user bullies the bully" from "user bullies the victim". This is likely because it did not have access to the context and thus could not differentiate between bullies and victims.
- Class names matter: In scenario 1, both examples for "user says something irrelevant" were labeled "user asks a question", likely because one is phrased as a question and the other as a request. This suggests that the LLM relies on the literal class names, so making them precise and clearly distinguishable from each other is important.
- Hallucinations do not appear to be an issue when the prompt is well designed: the model answered with one of the listed class names rather than inventing new ones.
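
To see at a glance which class pairs get confused, the printed mispredictions could be tallied instead of read line by line. The following is a minimal sketch and not part of the original notebook: `tally_confusions` and the hard-coded `results` list are hypothetical, with the pairs copied from the scenario 1 mispredictions printed above.

```python
from collections import Counter

def tally_confusions(results):
    """Count (correct_label, predicted_label) pairs, keeping mispredictions only."""
    return Counter(
        (correct, predicted)
        for correct, predicted in results
        if correct != predicted
    )

# (correct, predicted) pairs copied from the scenario 1 output above
results = [
    ('user bullies the bully', 'user bullies the victim'),
    ('user bullies the bully', 'user bullies the victim'),
    ('user responds appropriately to bully', 'user bullies the bully'),
    ('user comforts victim', 'user says something irrelevant'),
    ('user says something irrelevant', 'user asks a question'),
    ('user says something irrelevant', 'user asks a question'),
]

confusions = tally_confusions(results)
# Most frequent confusion pairs first
for (correct, predicted), count in confusions.most_common():
    print(f'{count}x  {correct} -> {predicted}')
```

In the evaluation loop, the same counter could be filled by appending `(correct_label, predicted_label)` to a list after each call, which would make the bully/victim confusion visible without scanning the log.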

## Zero-shot with context
The prompt includes instructions and context. It does not include examples. The prompt format is as follows:  

Given the context, classify the user input into one of the following categories: [category_1], . . ., [category_n]  
Context: [context from social media platform]  
Input: [input text to be classified]  
Category:

In [4]:
contexts = [
    'Social Media Post by Dylan: “Wow, look at these new awesome kicks!” Social Media Comment by Emma: “When will you get it into your head that nobody wants to hear from you!”',
    'Social Media Post by Grace: “First day on diet!” Social Media Comment by Drake: “You are as fat as a pig, you can never lose another pound!”',
    'Social Media Post by Chandler (posted an embarrassing photo of Monica): “Haha look how funny Monica is in this pic!” Social Media Comment by Monica: “This is not funny at all!”'
]

num_correct = 0
num_hallucinations = 0
total = 0

for label_example_map, context in zip(label_example_maps, contexts):
    categories_list = list_categories(label_example_map)
    for correct_label in label_example_map.keys():
        examples = label_example_map[correct_label]
        for example in examples:
            prompt = f'Given the context, classify the user input into one of the following categories: {categories_list}\nContext: {context}\nInput: {example}\nCategory: '
            predicted_label = format_response(prompt_gpt(prompt))
            total += 1
            if predicted_label not in label_example_map.keys():
                num_hallucinations += 1
                print(f'example: {example}')
                print(f'correct prediction: {correct_label}')
                print(f'hallucinated prediction: {predicted_label}')
                print('\n')
            elif predicted_label != correct_label:
                print(f'example: {example}')
                print(f'correct prediction: {correct_label}')
                print(f'misprediction: {predicted_label}')
                print('\n')
            else:
                num_correct += 1
print(f'Accuracy: {num_correct}/{total}')
print(f'Number of hallucinations: {num_hallucinations}')

example: You are so pathetic Emma!
correct prediction: user bullies the bully
misprediction: user bullies the victim


example: Wow Dylan, I wish I could have those amazing shoes!
correct prediction: user comforts victim
misprediction: user responds appropriately to bully


example: It’s so funny how a pig will call others pig!
correct prediction: user bullies the bully
misprediction: user says something irrelevant


example: Look at yourself first, Chandler!
correct prediction: user bullies the bully
misprediction: user responds appropriately to bully


example: Haha Monica looks so hilarious!
correct prediction: user bullies the victim
misprediction: user says something irrelevant


Accuracy: 31/36
Number of hallucinations: 0


### Key Findings
- Context is helpful: As the previous test suggested, context is very helpful. Adding it to the prompt increased accuracy from 21/36 to 31/36. However, supplying the context could be difficult to implement in our system, because it can come in many different forms and can change dynamically based on the actions the user takes.
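
One way to keep the context manageable might be to store the thread as structured data and render it into the `Context:` line only when a prompt is built. The sketch below assumes a hypothetical `Message` record and `render_context` helper. Neither exists in the notebook, and a real UI would likely need more fields (attachments, reactions, edits).

```python
from dataclasses import dataclass

@dataclass
class Message:
    kind: str    # hypothetical field: 'Post' or 'Comment'
    author: str
    text: str

def render_context(messages):
    """Render a thread into the single-line format used by the contexts list."""
    return ' '.join(
        f'Social Media {m.kind} by {m.author}: “{m.text}”' for m in messages
    )

# Scenario 1 thread, expressed as structured data
thread = [
    Message('Post', 'Dylan', 'Wow, look at these new awesome kicks!'),
    Message('Comment', 'Emma', 'When will you get it into your head that nobody wants to hear from you!'),
]

context = render_context(thread)
print(context)
```

With this shape, a new user action would only append a `Message` to the thread, and the context string would be regenerated on the next classification call.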

## Few-shot with no context
The prompt includes instructions and examples. It does not include context. The prompt format is as follows:  

Classify the user inputs into one of the following categories: [category_1], . . ., [category_n]  
Input 1: [example input 1]  
Category 1: [example category 1]  
Input 2: [example input 2]  
Category 2: [example category 2]  
.  
.  
.  
Input 2n: [example input 2n]  
Category 2n: [example category 2n]  
Input 2n+1: [input text to be classified]  
Category 2n+1: 

In [5]:
few_shot_examples = {}
few_shot_examples['user bullies the bully'] = [
    "Just leave him alone! No one likes you anyways!",
    "I wish someone would bully you for a change!"
]
few_shot_examples['user bullies the victim'] = [
    "I could not agree more. Your fashion has always been terrible!",
    "Dylan is such a loser and will never be popular"
]
few_shot_examples['user asks a question'] = [
    "How can I support the victim without offending the bully?",
    "Do you have any ideas on how I should respond?"
]
few_shot_examples['user responds appropriately to bully'] = [
    "Please leave her alone. She did nothing wrong.",
    "You are being disrespectful and should consider how you would feel if you were being bullied."
]
few_shot_examples['user comforts victim'] = [
    "Don't listen to them, they don't know what they're talking about",
    "You aren't a loser at all! Let's hang out this weekend"
]
few_shot_examples['user says something irrelevant'] = [
    "What time is it?",
    "I am so tired right now"
]

num_correct = 0
num_hallucinations = 0
total = 0

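# All scenarios share the same class names, so build the category list once from the few-shot examples
# (previously this line relied on a categories_list left over from an earlier cell)
categories_list = list_categories(few_shot_examples)
prompt_instruction_and_examples = f'Classify the user inputs into one of the following categories: {categories_list}\n'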
example_num = 0
for category in few_shot_examples.keys():
    for few_shot_example in few_shot_examples[category]:
        example_num += 1
        prompt_instruction_and_examples += f'Input {example_num}: {few_shot_example}\n'
        prompt_instruction_and_examples += f'Category {example_num}: {category}\n'

for label_example_map in label_example_maps:
    for correct_label in label_example_map.keys():
        examples = label_example_map[correct_label]
        for example in examples:
            prompt = prompt_instruction_and_examples + f'Input {example_num+1}: {example}\nCategory {example_num+1}: '
            predicted_label = format_response(prompt_gpt(prompt))
            total += 1
            if predicted_label not in label_example_map.keys():
                num_hallucinations += 1
                print(f'example: {example}')
                print(f'correct prediction: {correct_label}')
                print(f'hallucinated prediction: {predicted_label}')
                print('\n')
            elif predicted_label != correct_label:
                print(f'example: {example}')
                print(f'correct prediction: {correct_label}')
                print(f'misprediction: {predicted_label}')
                print('\n')
            else:
                num_correct += 1
print(f'Accuracy: {num_correct}/{total}')
print(f'Number of hallucinations: {num_hallucinations}')

example: You are so pathetic Emma!
correct prediction: user bullies the bully
misprediction: user bullies the victim


example: Wow Dylan, I wish I could have those amazing shoes!
correct prediction: user comforts victim
misprediction: user bullies the victim


Accuracy: 34/36
Number of hallucinations: 0


### Key Findings
- Examples are helpful: Accuracy rose to 34/36 here (versus 31/36 with context and 21/36 with instructions alone) despite not having any context. This was a bit surprising, because the model still needed to distinguish bullies from victims and relevant questions from irrelevant ones. This suggests that with good examples for each class, the classifier can do well without context.
- Nuanced classes still challenging: Both misclassifications were predicted as "user bullies the victim": one "user bullies the bully" example and one "user comforts victim" example.

## Conclusion
Instructions with context (31/36) and few-shot examples with no context (34/36) both performed substantially better than instructions alone (21/36). Because few-shot examples performed slightly better and, more importantly, extracting the context from the UI could be quite difficult, few-shot prompts look like the better option. If we decide to do this, more experimentation should be done to find the optimal number of examples.
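
As a starting point for that experiment, the prompt construction could be parameterized by the number of examples per class. This is a rough sketch only: `build_few_shot_prompt`, its `k` parameter, and the `demo_examples` dictionary are hypothetical names introduced here, and the prompt layout simply mirrors the few-shot format described above.

```python
def build_few_shot_prompt(few_shot_examples, text, k):
    """Build a few-shot classification prompt using the first k examples per class."""
    categories = ', '.join(f'"{c}"' for c in few_shot_examples)
    lines = [f'Classify the user inputs into one of the following categories: {categories}']
    n = 0
    for category, examples in few_shot_examples.items():
        for example in examples[:k]:
            n += 1
            lines.append(f'Input {n}: {example}')
            lines.append(f'Category {n}: {category}')
    # The input to classify comes last, with its category left blank for the model
    lines.append(f'Input {n + 1}: {text}')
    lines.append(f'Category {n + 1}: ')
    return '\n'.join(lines)

# Small stand-in for the few_shot_examples dictionary
demo_examples = {
    'user asks a question': [
        'How can I support the victim without offending the bully?',
        'Do you have any ideas on how I should respond?',
    ],
    'user says something irrelevant': [
        'What time is it?',
        'I am so tired right now',
    ],
}

for k in (1, 2):
    prompt = build_few_shot_prompt(demo_examples, 'How’s the weather today?', k)
    print(f'--- k={k} ---')
    print(prompt)
```

Running the existing evaluation loop once per value of `k` and comparing the accuracies would then show whether extra examples per class keep paying off.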