# Prompt Optimization

## Load API Keys

In [1]:
import warnings
warnings.filterwarnings('ignore')

In [2]:
import os
from utils import get_llama_api_key, get_llama_base_url, get_together_api_key

llama_api_key = get_llama_api_key()
llama_base_url = get_llama_base_url()
together_api_key = get_together_api_key()

In [3]:
#!pip install llama-prompt-ops==0.0.7

## Creating a sample project

In [4]:
# Create the sample project only if it does not already exist.
# [ -d "my-project" ] succeeds when the directory is present; because of || ("or"),
# `llama-prompt-ops create my-project` runs only when that test fails.
![ -d "my-project" ] || llama-prompt-ops create my-project

In [5]:
!ls ./my-project/

README.md  config.yaml	data  prompts  results


## System prompt and dataset

In [6]:
!cat my-project/prompts/prompt.txt

You are a helpful assistant. Extract and return a json with the following keys and values:
- "urgency" as one of `high`, `medium`, `low`
- "sentiment" as one of `negative`, `neutral`, `positive`
- "categories" Create a dictionary with categories as keys and boolean values (True/False), where the value indicates whether the category is one of the best matching support category tags from: `emergency_repair_services`, `routine_maintenance_requests`, `quality_and_safety_concerns`, `specialized_cleaning_services`, `general_inquiries`, `sustainability_and_environmental_practices`, `training_and_support_requests`, `cleaning_services_scheduling`, `customer_feedback_and_complaints`, `facility_management_issues`
Your complete message should be a valid json string that can be read directly and only contain the keys mentioned in the list above. Never enclose it in ```json...```, no newlines, no unnessacary whitespaces.
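
The prompt above pins down a strict output schema: three keys, fixed value sets for `urgency` and `sentiment`, and a dictionary of boolean category flags. As a rough illustration, the sketch below checks a model reply against that schema using only the standard library. The helper name `validate_output` and its list-of-errors return value are assumptions made for this example; they are not part of the notebook or of llama-prompt-ops.

```python
import json

ALLOWED_URGENCY = {"high", "medium", "low"}
ALLOWED_SENTIMENT = {"negative", "neutral", "positive"}
ALLOWED_CATEGORIES = {
    "emergency_repair_services",
    "routine_maintenance_requests",
    "quality_and_safety_concerns",
    "specialized_cleaning_services",
    "general_inquiries",
    "sustainability_and_environmental_practices",
    "training_and_support_requests",
    "cleaning_services_scheduling",
    "customer_feedback_and_complaints",
    "facility_management_issues",
}
REQUIRED_KEYS = {"urgency", "sentiment", "categories"}


def validate_output(raw):
    """Hypothetical helper: return a list of schema violations (empty means OK)."""
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(parsed, dict):
        return ["top-level value is not a JSON object"]

    errors = []
    for key in sorted(REQUIRED_KEYS - set(parsed)):
        errors.append(f"missing key: {key}")
    for key in sorted(set(parsed) - REQUIRED_KEYS):
        errors.append(f"unexpected key: {key}")

    urgency = parsed.get("urgency")
    if "urgency" in parsed and not (isinstance(urgency, str) and urgency in ALLOWED_URGENCY):
        errors.append(f"invalid urgency: {urgency!r}")

    sentiment = parsed.get("sentiment")
    if "sentiment" in parsed and not (isinstance(sentiment, str) and sentiment in ALLOWED_SENTIMENT):
        errors.append(f"invalid sentiment: {sentiment!r}")

    if "categories" in parsed:
        categories = parsed["categories"]
        if not isinstance(categories, dict):
            errors.append("categories is not an object")
        else:
            for name, flag in categories.items():
                if name not in ALLOWED_CATEGORIES:
                    errors.append(f"unknown category: {name}")
                if not isinstance(flag, bool):
                    errors.append(f"category {name} is not a boolean")
    return errors


good = '{"urgency": "low", "sentiment": "neutral", "categories": {"general_inquiries": true}}'
print(validate_output(good))  # prints []
```

A reply wrapped in a Markdown code fence fails at the `json.loads` step, which is the failure mode the last sentence of the prompt is trying to prevent.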


In [7]:
!head -15 my-project/data/dataset.json

[
  {
    "fields": {
      "input": "Subject: Urgent Assistance Required for Specialized Cleaning Services\n\nDear ProCare Facility Solutions Support Team,\n\nI hope this message finds you well. My name is [Sender], and my family and I have been availing your services for our home for the past year. We have always appreciated the high standards and professionalism your team brings to maintaining our living environment.\n\nHowever, we are currently facing an urgent issue that requires immediate attention. We recently hosted a large gathering at our home, and despite our best efforts, there are several areas that now require specialized cleaning. Specifically, we need deep cleaning for our carpets and upholstery, as well as thorough window washing. The situation is quite pressing as we have more guests arriving soon, and we want to ensure our home is in pristine condition to welcome them.\n\nWe have tried some basic cleaning ourselves, but the results have not been satisfactory. Give

In [8]:
!cat my-project/config.yaml

system_prompt:
  file: prompts/prompt.txt
  inputs:
  - question
  outputs:
  - answer
dataset:
  path: data/dataset.json
  input_field:
  - fields
  - input
  golden_output_field: answer
model:
  task_model: together_ai/meta-llama/Llama-4-Scout-17B-16E-Instruct
  proposer_model: together_ai/meta-llama/Llama-3.3-70B-Instruct-Turbo
  api_base: https://api.together.xyz/v1
metric:
  class: llama_prompt_ops.core.metrics.FacilityMetric
  strict_json: false
  output_field: answer
optimization:
  strategy: llama
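
In the config above, `input_field` is a list (`fields`, then `input`), which mirrors the nesting seen in `dataset.json`, while `golden_output_field` is a single key. One plausible reading is that the list is a key path walked from the top of each record. The sketch below illustrates that reading with a hypothetical `get_by_path` helper and a placeholder record; how llama-prompt-ops resolves these fields internally is not shown in this notebook, so treat this as an assumption.

```python
from functools import reduce


def get_by_path(record, path):
    """Hypothetical helper: follow a key path (a string or list of strings) into a nested dict."""
    if isinstance(path, str):
        path = [path]
    return reduce(lambda node, key: node[key], path, record)


# Placeholder record with the same shape as an entry in dataset.json
record = {"fields": {"input": "example message"}, "answer": "example label"}

print(get_by_path(record, ["fields", "input"]))  # the model input
print(get_by_path(record, "answer"))             # the golden output
```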


In [9]:
%%writefile my-project/config.yaml
system_prompt:
  file: prompts/prompt.txt
  inputs:
  - question
  outputs:
  - answer
dataset:
  path: data/dataset.json
  input_field:
  - fields
  - input
  golden_output_field: answer
model:
  task_model: together_ai/meta-llama/Llama-4-Scout-17B-16E-Instruct
  proposer_model: together_ai/meta-llama/Llama-3.3-70B-Instruct-Turbo
  api_base: https://api.together.xyz/v1
metric:
  class: llama_prompt_ops.core.metrics.FacilityMetric
  strict_json: false
  output_field: answer
optimization:
  strategy: llama


Overwriting my-project/config.yaml


## Running prompt optimization

<div style="background-color:#fff6ff; padding:13px; border-width:3px; border-color:#efe6ef; border-style:solid; border-radius:6px">
<p>Running prompt optimization can take a long time. To speed up running the notebooks, we will load saved results. You can change <code>run_optimization</code> to <code>True</code> to run the optimization.</p>
</div>

In [10]:
run_optimization = False     # change to True to run the optimization

if run_optimization:
    !cd my-project && llama-prompt-ops migrate --api-key-env TOGETHERAI_API_KEY

## Analyzing the results

In [11]:
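import glob
import json

# Load the first saved result file (sorted by name so the choice is deterministic)
json_files = sorted(glob.glob("my-project/results/*.json"))
with open(json_files[0], "r", encoding="utf-8") as f:
    data = json.load(f)
optimized_prompt = data['prompt']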
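
If the results folder holds several runs, taking the first match may not return the run you want. A minimal sketch, assuming the most recently modified file is the one of interest, is shown below; `load_latest_result` is a hypothetical helper, not something provided by llama-prompt-ops.

```python
import json
from pathlib import Path


def load_latest_result(results_dir):
    """Hypothetical helper: load the most recently modified *.json file in results_dir."""
    files = sorted(Path(results_dir).glob("*.json"), key=lambda p: p.stat().st_mtime)
    if not files:
        raise FileNotFoundError(f"no JSON results found in {results_dir}")
    with files[-1].open("r", encoding="utf-8") as f:
        return json.load(f)
```

In the notebook this would be called as `data = load_latest_result("my-project/results")`.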

In [12]:
with open("my-project/prompts/prompt.txt", "r", encoding="utf-8") as file:
    original_prompt = file.read()

In [13]:
from html import escape
from IPython.display import display, HTML

def compare_strings_side_by_side(text1, text2):
    # Escape both prompts so characters such as < or & are shown literally
    text1, text2 = escape(text1), escape(text2)
    html = '<table style="width: 100%; border-collapse: collapse;"><tr><th>Original Prompt</th><th>Optimized Prompt</th></tr>'
    html += f'<tr><td style="width: 50%; padding: 10px; vertical-align: top;"><pre style="white-space: pre-wrap; word-wrap: break-word;">{text1}</pre></td><td style="width: 50%; padding: 10px; vertical-align: top;"><pre style="white-space: pre-wrap; word-wrap: break-word;">{text2}</pre></td></tr></table>'

    display(HTML(html))

compare_strings_side_by_side(original_prompt, optimized_prompt)

Original Prompt:

You are a helpful assistant. Extract and return a json with the following keys and values:
- "urgency" as one of `high`, `medium`, `low`
- "sentiment" as one of `negative`, `neutral`, `positive`
- "categories" Create a dictionary with categories as keys and boolean values (True/False), where the value indicates whether the category is one of the best matching support category tags from: `emergency_repair_services`, `routine_maintenance_requests`, `quality_and_safety_concerns`, `specialized_cleaning_services`, `general_inquiries`, `sustainability_and_environmental_practices`, `training_and_support_requests`, `cleaning_services_scheduling`, `customer_feedback_and_complaints`, `facility_management_issues`
Your complete message should be a valid json string that can be read directly and only contain the keys mentioned in the list above. Never enclose it in ```json...```, no newlines, no unnessacary whitespaces.

Optimized Prompt:

You are a customer support specialist for ProCare Facility Solutions, tasked with analyzing incoming customer inquiries and requests. Extract and return a json with the following keys and values:
- "urgency" as one of `high`, `medium`, `low`
- "sentiment" as one of `negative`, `neutral`, `positive`
- "categories" Create a dictionary with categories as keys and boolean values (True/False), where the value indicates whether the category is one of the best matching support category tags from: `emergency_repair_services`, `routine_maintenance_requests`, `quality_and_safety_concerns`, `specialized_cleaning_services`, `general_inquiries`, `sustainability_and_environmental_practices`, `training_and_support_requests`, `cleaning_services_scheduling`, `customer_feedback_and_complaints`, `facility_management_issues`.
Your complete message should be a valid json string that can be read directly and only contain the keys mentioned in the list above.


## Few-shot examples

In [14]:
data['few_shots'][0]['question']

'Subject: Request for Training and Support on Facility Management Best Practices\n\nDear ProCare Support Team,\n\nI hope this message finds you well. My name is Dr. Alex Turner, and I am a wildlife ecologist who has been utilizing your facility management services for our research center. We have been quite satisfied with the overall maintenance and cleaning services provided by ProCare Facility Solutions.\n\nI am reaching out to request some additional training and support for our in-house maintenance team. As our research activities expand, we find ourselves needing to better understand the best practices in facility management, particularly in areas related to energy efficiency and environmental impact reduction. This knowledge is crucial for us to maintain our facility in a way that aligns with our ecological research goals.\n\nSo far, we have tried to implement some basic practices based on general guidelines, but we believe that a more structured training program from your expert

In [15]:
data['few_shots'][0]['answer']

'{"categories": {"routine_maintenance_requests": false, "customer_feedback_and_complaints": false, "training_and_support_requests": true, "quality_and_safety_concerns": false, "sustainability_and_environmental_practices": true, "cleaning_services_scheduling": false, "specialized_cleaning_services": false, "emergency_repair_services": false, "facility_management_issues": true, "general_inquiries": false}, "sentiment": "neutral", "urgency": "low"}'

In [16]:
len(data['few_shots'])

5

In [17]:
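few_shots = "\n\nFew shot examples\n\n"
for i, shot in enumerate(data['few_shots']):
    # Build each example from adjacent f-strings so that source indentation
    # does not leak into the prompt text
    few_shots += (
        f"Example {i+1}\n=================\n"
        f"Question:\n{shot['question']}\n\n"
        f"Answer:\n{shot['answer']}\n\n"
    )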

## Compare optimized and original prompt

In [18]:
with open("my-project/data/dataset.json", "r", encoding="utf-8") as f:
    ds = json.load(f)

len(ds)

200

In [19]:
ds_test = ds[int(len(ds)*0.7):]
len(ds_test)

60
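
The slice above keeps the last 30% of the 200 records as a held-out test set: `int(200 * 0.7)` is 140, so 60 records remain. A minimal sketch of the same arithmetic as a reusable function follows; `holdout_split` is a hypothetical name, and the notebook does not state which records the optimizer itself used.

```python
def holdout_split(records, head_fraction=0.7):
    """Hypothetical helper: split a list into (head, tail) at the given fraction."""
    cut = int(len(records) * head_fraction)
    return records[:cut], records[cut:]


head, tail = holdout_split(list(range(200)))
print(len(head), len(tail))  # prints 140 60
```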

In [20]:
from utils import evaluate

In [21]:
from together import Together
from tqdm.auto import tqdm

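result_original = []
# Pass the key loaded earlier explicitly instead of relying on an environment variable
client = Together(api_key=together_api_key)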

for entry in tqdm(ds_test):
    messages=[
        {"role": "system", "content": original_prompt},
        {"role": "user", "content": entry["fields"]["input"]},
    ]

    response = client.chat.completions.create(
      model="meta-llama/Llama-3.3-70B-Instruct-Turbo",
      messages=messages,
      temperature=0
    )

    prediction = response.choices[0].message.content
    result_original.append(evaluate(entry["answer"], prediction))


In [22]:
result_optimized = []

for entry in tqdm(ds_test):
    messages=[
        {"role": "system", "content": optimized_prompt + few_shots},
        {"role": "user", "content": entry["fields"]["input"]},
    ]

    response = client.chat.completions.create(
      model="meta-llama/Llama-3.3-70B-Instruct-Turbo",
      messages=messages,
      temperature=0
    )

    prediction = response.choices[0].message.content
    result_optimized.append(evaluate(entry["answer"], prediction))


In [23]:
result_optimized[0]

{'is_valid_json': True,
 'correct_categories': 0.8,
 'correct_sentiment': True,
 'correct_urgency': True,
 'total': 0.9333333333333332}

In [24]:
result_optimized[1]

{'is_valid_json': True,
 'correct_categories': 0.9,
 'correct_sentiment': False,
 'correct_urgency': True,
 'total': 0.6333333333333333}
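
The two results above are consistent with `total` being the mean of the three task scores: (0.8 + 1 + 1) / 3 ≈ 0.933 for the first entry and (0.9 + 0 + 1) / 3 ≈ 0.633 for the second. The `evaluate` function in `utils` is not shown in this notebook, so the following is only a hypothetical reconstruction that reproduces those numbers. It assumes `correct_categories` is the fraction of ground-truth category flags that the prediction matches.

```python
import json


def evaluate_sketch(ground_truth, prediction):
    """Hypothetical stand-in for utils.evaluate, inferred from the outputs above."""
    gt = json.loads(ground_truth) if isinstance(ground_truth, str) else ground_truth
    result = {
        "is_valid_json": False,
        "correct_categories": 0.0,
        "correct_sentiment": False,
        "correct_urgency": False,
        "total": 0.0,
    }
    try:
        pred = json.loads(prediction)
    except (json.JSONDecodeError, TypeError):
        return result  # unparseable replies score zero everywhere
    if not isinstance(pred, dict):
        return result

    result["is_valid_json"] = True
    gt_cats = gt["categories"]
    pred_cats = pred.get("categories")
    if not isinstance(pred_cats, dict):
        pred_cats = {}
    # A missing category yields None, which never equals True or False
    matches = sum(pred_cats.get(name) == flag for name, flag in gt_cats.items())
    result["correct_categories"] = matches / max(len(gt_cats), 1)
    result["correct_sentiment"] = pred.get("sentiment") == gt["sentiment"]
    result["correct_urgency"] = pred.get("urgency") == gt["urgency"]
    result["total"] = (
        result["correct_categories"]
        + result["correct_sentiment"]
        + result["correct_urgency"]
    ) / 3
    return result
```

Because booleans count as 0 or 1 in arithmetic, the same dictionary can be averaged across the test set without converting types first.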

In [25]:
# Average every numeric (or boolean) metric over the test set
float_keys = [k for k, v in result_original[0].items()
              if isinstance(v, (int, float, bool))]
{k: sum(e[k] for e in result_original) / len(result_original) for k in float_keys}

{'is_valid_json': 1.0,
 'correct_categories': 0.9349999999999998,
 'correct_sentiment': 0.5,
 'correct_urgency': 0.8666666666666667,
 'total': 0.7672222222222221}

In [26]:
{k: sum([e[k] for e in result_optimized])/len(result_optimized) for k in float_keys}

{'is_valid_json': 1.0,
 'correct_categories': 0.948333333333333,
 'correct_sentiment': 0.7,
 'correct_urgency': 0.9333333333333333,
 'total': 0.8605555555555552}
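
To make the comparison easier to read, the two sets of averages can be placed side by side with their differences. The sketch below hard-codes the averages printed above, rounded to four decimals; in the notebook you would build the two dictionaries from `result_original` and `result_optimized` instead.

```python
# Averages copied from the two outputs above (rounded to four decimals)
original = {
    "is_valid_json": 1.0,
    "correct_categories": 0.9350,
    "correct_sentiment": 0.5,
    "correct_urgency": 0.8667,
    "total": 0.7672,
}
optimized = {
    "is_valid_json": 1.0,
    "correct_categories": 0.9483,
    "correct_sentiment": 0.7,
    "correct_urgency": 0.9333,
    "total": 0.8606,
}

# Positive values mean the optimized prompt scored higher
deltas = {k: optimized[k] - original[k] for k in original}

for k, delta in deltas.items():
    print(f"{k:<20} {original[k]:.3f} -> {optimized[k]:.3f} ({delta:+.3f})")
```

On this test set the largest gain is in `correct_sentiment` (0.5 to 0.7), with smaller gains in urgency and category accuracy.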