# Ungraded Lab: Prompt Engineering


## 1. Introduction

Welcome to the ungraded lab on Prompt Engineering! Here you will explore prompting techniques that help you adapt an LLM to your needs. In this lab you will:

1. Learn how to make an LLM generate specific outputs, such as a label for a sentence
2. Call an LLM with different parameters depending on the nature of the task
3. Make an LLM return a specific object type in its response, such as JSON


# Table of Contents
- [1 - Text Classification with LLMs](#1)
- [2 - Parameter Setting Based on Tasks](#2)
- [3 - Guiding the LLM to Output Specific Objects](#3)

### 1.1 Importing the libraries

In [1]:
from dotenv import load_dotenv
load_dotenv()

True

In [2]:
from utils import (
    generate_with_single_input, 
    generate_with_multiple_input, 
    generate_params_dict
)

<a id='1'></a>
## 1 - Text Classification with LLMs

A practical application of large language models (LLMs) is using them as text classifiers. With good instructions, you can guide an LLM to categorize text by sentiment, task type, and more. Since classifiers typically output fixed labels (such as 1 for positive and 0 for negative), you need to construct a prompt that minimizes the likelihood of the LLM producing unexpected outputs. Besides designing a robust prompt, consider adding checks in your code: a function later in the pipeline may expect 1 or 0 but receive an unexpected value like 2 or a phrase such as "positive sentence." Combining a careful prompt with output checks makes classification results more reliable while still letting you handle unforeseen outputs.
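The kind of check mentioned above could look like the following minimal sketch. The helper name `parse_label` and its normalization rules (trimming whitespace, quotes, and a trailing period, and ignoring case) are assumptions made for illustration; they are not part of the lab's `utils` module.

```python
def parse_label(raw_output, allowed_labels):
    """Map a raw LLM reply onto one of the allowed labels, or return None."""
    # Trim surrounding whitespace, then quotes and periods at either end
    cleaned = raw_output.strip().strip(".\"'").lower()
    for label in allowed_labels:
        if cleaned == label.lower():
            return label
    # Anything else is treated as an unexpected output
    return None


allowed = ["Nutritional", "Outfit"]
print(parse_label(" nutritional.\n", allowed))     # prints: Nutritional
print(parse_label("positive sentence", allowed))   # prints: None
```

Returning `None` instead of guessing lets the calling code decide what to do with an unexpected reply, for example retrying the request or falling back to a default.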

To illustrate, suppose you are developing a chatbot for a company that sells sports outfits and nutritional supplements. The idea is to have an LLM decide whether a query is related to outfits or nutrition. This can be useful for routing the query to the correct database.

When writing the classification prompt, keep the following guidelines in mind:

1. Be precise. Explain exactly what you want the LLM to do and to output.
2. Add examples. Provide example queries with their expected results.
3. Consider adding edge-case examples, i.e., examples that you know might be hard for the LLM to classify correctly.

In [3]:
def check_if_outfit_or_supplement(query):
    prompt = f"""
Determine the category of the following query as either "nutritional" or "outfit" related.
- Nutritional queries: These are related to nutrition products, such as whey protein, vitamins, supplements, dietary products, and health-related food and beverages.
- Outfit queries: These pertain to clothing and fashion, including items like shirts, dresses, shoes, accessories, and jewelry.
Examples:

1. Query: “Where can I buy high-protein snacks?” Expected answer: Nutritional
2. Query: “Best shirt styles for summer 2023” Expected answer: Outfit
3. Query: “Are there any shoes designed for running?” Expected answer: Outfit
4. Query: “What multivitamins should I take daily?” Expected answer: Nutritional
5. Query: “Best weight loss products that are stylish” Expected answer: Nutritional
6. Query: “Athletic wear that boosts performance” Expected answer: Outfit 

Query: {query}

Instructions: Respond with “Nutritional” if the query pertains to nutritional products or “Outfit” if it pertains to clothing or fashion products.
Answer only one single word.
"""
    return prompt
    

In [4]:
# Testing a simple query
query = "Give me the available vitamins supplement you have in your catalogue."
generate_with_single_input(check_if_outfit_or_supplement(query), max_tokens = 2)

{'role': 'assistant', 'content': 'Nutritional'}

Now let's test the prompt on a larger set of queries.

In [5]:
# ANSI color codes
GREEN = '\033[92m'
RED = '\033[91m'
RESET = '\033[0m'

queries = [
    {"query": "Where can I buy whey protein?", "label": "Nutritional"},
    {"query": "Recommended vitamins for winter", "label": "Nutritional"},
    {"query": "Latest fashion for women's dresses", "label": "Outfit"},
    {"query": "Comfortable sneakers for daily use", "label": "Outfit"},
    {"query": "Best energy bars for athletes", "label": "Nutritional"},
    {"query": "Trendy accessories for men", "label": "Outfit"},
    {"query": "Low-carb diet food options", "label": "Nutritional"},
    {"query": "What supplements help with muscle recovery?", "label": "Nutritional"},
    {"query": "Casual wear that supports healthy living", "label": "Outfit"}
]

for item in queries:
    query = item["query"]
    prompt = check_if_outfit_or_supplement(query)
    expected_label = item["label"]
    response = generate_with_single_input(prompt, max_tokens = 2)
    result = response['content']
    
    # Determine color based on comparison
    if result == expected_label:
        color = GREEN
    else:
        color = RED

    print(f"Query: {query}\nResult: {result}\nExpected: {color}{expected_label}{RESET}\n")
    

Query: Where can I buy whey protein?
Result: Nutritional
Expected: Nutritional

Query: Recommended vitamins for winter
Result: Nutritional
Expected: Nutritional

Query: Latest fashion for women's dresses
Result: Outfit
Expected: Outfit

Query: Comfortable sneakers for daily use
Result: Outfit
Expected: Outfit

Query: Best energy bars for athletes
Result: Nutritional
Expected: Nutritional

Query: Trendy accessories for men
Result: Outfit
Expected: Outfit

Query: Low-carb diet food options
Result: Nutritional
Expected: Nutritional

Query: What supplements help with muscle recovery?
Result: Nutritional
Expected: Nutritional

Query: Casual wear that supports healthy living
Result: Outfit
Expected: Outfit



<a id='2'></a>
## 2 - Parameter Setting Based on Tasks

In this section, you will learn how to adapt the parameters of an LLM call to the nature of the task. This involves classifying the query before asking the LLM to answer it.

In this exercise, let's develop a function to categorize a query as either technical or creative. Once categorized, you can apply different parameters suited for each task type. Technical queries generally benefit from lower randomness, whereas creative tasks may benefit from allowing higher randomness.
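To make "randomness" concrete, the sketch below applies the usual textbook definitions of temperature scaling and top-p (nucleus) filtering to a toy distribution. The logits are made up, and how the model behind this lab implements these parameters internally is an assumption. The sketch also requires `temperature > 0`, whereas many APIs treat a temperature of 0 as greedy decoding.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after dividing them by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p):
    """Return indices of the most likely tokens whose cumulative probability reaches top_p."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    return kept


logits = [2.0, 1.0, 0.5, 0.1]  # hypothetical scores for four candidate tokens
for temperature in (0.2, 1.0, 1.5):
    probs = softmax_with_temperature(logits, temperature)
    candidates = top_p_filter(probs, top_p=0.9)
    print(temperature, [round(p, 3) for p in probs], candidates)
```

With a low temperature almost all probability mass sits on the top token, so few candidates survive the top-p cut. A higher temperature flattens the distribution and leaves more tokens to sample from.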

In [6]:
def decide_if_technical_or_creative(query):
    """
    Determines whether a given query is creative or technical in nature.

    Args:
        query (str): The query string to be evaluated.

    Returns:
        str: A label indicating the query type, either 'creative' or 'technical'.

    This function constructs a prompt to classify a query based on its content. 
    Creative queries typically involve requests to generate original content, whereas 
    technical queries relate to documentation or technical information, such as procedures.
    By leveraging an LLM, it identifies the query type and returns an appropriate label.
    """
    
    PROMPT = f"""Decide if the following query is a creative query or a technical query.
    Creative queries ask you to create content, while technical queries are related to documentation or technical requests, like information about procedures.
    Answer only 'creative' or 'technical'.
    Query: {query}
    """
    result = generate_with_single_input(PROMPT)
    label = result['content']
    return label

In [7]:
queries = ["What is Pi-hole?", 
           "Suggest to me three places to visit in South America"]
for query in queries:
    label = decide_if_technical_or_creative(query)
    print(f"Query: {query}, label: {label}")

Query: What is Pi-hole?, label: technical
Query: Suggest to me three places to visit in South America, label: creative


In [8]:
def answer_query(query):
    """
    Processes a query and generates an appropriate response by categorizing the query
    as either 'technical' or 'creative', and modifies behavior based on this categorization.

    Args:
        query (str): The query string to be answered.

    Returns:
        str: A generated response from the LLM tailored to the nature of the query.

    This function first determines the nature of the query using the `decide_if_technical_or_creative` function. 
    If the query is classified as 'technical', it sets parameters suitable for precise and low-variability responses. 
    If the query is 'creative', it applies parameters allowing for more variability and creativity. 
    If the classification is inconclusive, it uses neutral parameters. 
    It then generates a response using these parameters and returns the content.
    """
    
    # Determine whether the query is 'technical' or 'creative'
    label = decide_if_technical_or_creative(query).lower()

    # Set parameters for technical queries (precise, low randomness)
    if label == 'technical':
        kwargs = generate_params_dict(query, temperature=0, top_p=0.1)
    
    # Set parameters for creative queries (variable, high randomness)
    elif label == 'creative':
        kwargs = generate_params_dict(query, temperature=1.1, top_p=0.4)

    # Use default parameters if the query type is inconclusive
    else:
        kwargs = generate_params_dict(query, temperature=0.5, top_p=0.5)
    
    # Generate a response based on the query type and parameters
    response = generate_with_single_input(**kwargs)
    
    # Extract and return the content from the response
    result = response['content']
    return result
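As a variation on the `if`/`elif` chain, the label-to-parameter mapping can be kept in a lookup table with a fallback. This is a sketch only: the names `SAMPLING_PARAMS`, `DEFAULT_PARAMS`, and `select_sampling_params` are hypothetical, and the values are copied from `answer_query` above.

```python
# Sampling settings copied from answer_query above
SAMPLING_PARAMS = {
    "technical": {"temperature": 0, "top_p": 0.1},
    "creative": {"temperature": 1.1, "top_p": 0.4},
}
# Fallback used when the classifier returns anything else
DEFAULT_PARAMS = {"temperature": 0.5, "top_p": 0.5}


def select_sampling_params(label):
    """Return a copy of the sampling parameters for a classifier label."""
    # Tolerate replies such as "Technical." or " creative\n"
    key = label.strip().strip(".'\"").lower()
    return dict(SAMPLING_PARAMS.get(key, DEFAULT_PARAMS))


print(select_sampling_params("Technical."))
print(select_sampling_params("not sure"))
```

Adding a new task type then only requires a new entry in the table, and replies with stray punctuation or casing still map to the intended settings.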

In [9]:
queries = ["What is Pi-hole?", 
           "Suggest to me three places to visit in South America"]
for query in queries:
    result = answer_query(query)
    print(f"Query: {query}\nAnswer: {result}\n\n#######\n")

Query: What is Pi-hole?
Answer: Pi-hole is a free, open-source, and self-contained ad-blocking DNS server that can be run on a Raspberry Pi or other single-board computers. It's designed to block ads, trackers, and other unwanted content on your network, providing a more private and secure browsing experience.

Here's how it works:

1. You install Pi-hole on a Raspberry Pi or other compatible device.
2. The device runs a custom operating system, such as Raspbian, and loads the Pi-hole software.
3. Pi-hole sets up a DNS server that intercepts all DNS queries on your network.
4. When a device on your network requests a website, Pi-hole checks the requested domain against its database of blocked ads and trackers.
5. If the domain is found in the blocklist, Pi-hole returns a fake IP address that is not associated with any website, effectively blocking the ad or tracker.
6. The device on your network receives the fake IP address and displays the requested website without the ad or tracker.


<a id='3'></a>
## 3 - Guiding the LLM to Output Specific Objects

In this section, you'll explore how to make the LLM generate outputs in specific formats, such as JSON, which can be used by another application. This is a crucial aspect when working with an LLM, as applications often require data in precise formats.

Let's imagine you're automating your home and want to create your personal assistant to control devices like lights and sound systems. The goal is to translate a user's request into a specific format that your home automation server can understand.

In this hypothetical scenario, the format for each action is a JSON structure containing details like this:

```json
{
  "room": "room where the action will occur",
  "object_id": "unique identifier of the targeted object",
  "object_name": "name of the object",
  "action": "action to be performed",
  "parameters": "dictionary containing action-specific parameters"
}
```

For instance, to turn on the office light and set its color to yellow, you should provide the following JSON to the software:

```json
{
  "room": "office",
  "object_id": "152",
  "object_name": "office_light",
  "action": "turn on",
  "parameters": {"color": "yellow"}
}
```


### 3.1 The Old-Fashioned Way  

Let's start with the old-fashioned way by creating a detailed prompt to pass to the LLM.

The following prompt example offers a comprehensive structure. Because the prompt is an `f-string`, the literal braces of the JSON examples are doubled (`{{` and `}}`) so that Python does not interpret them as placeholders.
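In an f-string, single braces mark a placeholder, so literal JSON braces must be written twice. The short example below, with a made-up command and a shortened JSON object, shows the effect:

```python
command = "Lock the office door."

# {{ and }} produce literal braces; {command} is substituted
prompt = f"""Example output:
{{"action": "lock", "parameters": {{}}}}
Command: {command}"""

print(prompt)
# Example output:
# {"action": "lock", "parameters": {}}
# Command: Lock the office door.
```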

**NOTE**: Creating effective prompts often involves a great deal of experimentation. It's a natural part of the creative process to assess the outputs generated by different prompts, identify potential flaws, and make necessary adjustments to refine them.

In [10]:
def generate_system_call(command):
    PROMPT = f"""
You are an assistant program that converts natural language commands into structured JSON for controlling smart home devices. The JSON should conform to a specific format describing the device, action, and parameters. Here's how you can do it:

**Available Devices and Actions:**

1. **Light**
   - Actions: "turn on", "turn off"
   - Parameters: color, intensity (percentage)

2. **Automatic Lock**
   - Actions: "lock", "unlock"
   - Parameters: None

3. **Sound System (Speaker)**
   - Actions: "play", "pause", "stop", "set volume"
   - Parameters: volume (integer), track (string), playlist_style (string)

4. **TV**
   - Actions: "turn on", "turn off", "change channel", "adjust volume"
   - Parameters: channel (string), volume (integer)

5. **Air Conditioner**
   - Actions: "turn on", "turn off", "set temperature", "adjust fan speed"
   - Parameters: temperature (integer), fan_speed (low/medium/high)

**Rooms and Devices:**
- **Office**
  - Lights: "office_light_1" (ID: 123), "office_light_2" (ID: 321)
  - Automatic Lock: "office_door_lock" (ID: 111)

- **Living Room**
  - Light: "living_room_light" (ID: 222)
  - Speaker: "living_room_speaker" (ID: 223)
  - Air Conditioner: "living_room_airconditioner" (ID: 556)

- **Kitchen**
  - Light: "kitchen_light" (ID: 333)

- **Bedroom**
  - Light: "bedroom_light" (ID: 444)
  - TV: "bedroom_tv" (ID: 445)

- **Bathroom**
  - Light: "bathroom_light" (ID: 555)

**Task:**
Convert the following natural language command into the structured JSON format based on the available devices:

**Input Examples:**

1. "Turn on the office light with ID 123 with blue color and 50% intensity."
   - JSON:
     [
     {{
       "room": "office",
       "object_id": "123",
       "object_name": "office_light_1",
       "action": "turn on",
       "parameters": {{"color": "blue", "intensity": "50%"}}
     }}
     ]

2. "Lock the office door."
   - JSON:
   [
     {{
       "room": "office",
       "object_id": "111",
       "object_name": "office_door_lock",
       "action": "lock",
       "parameters": {{}}
     }}
    ]

3. "Make my living room a cheerful place"
   - JSON:
   [
     {{
       "room": "living_room",
       "object_id": "222",
       "object_name": "living_room_light",
       "action": "turn on",
       "parameters": {{'intensity': '80%', 'color':'yellow'}}
     }},
     {{
       "room": "living_room",
       "object_id": "223",
       "object_name": "living_room_speaker",
       "action": "turn on",
       "parameters": {{'volume': '100', 'playlist_style':'party'}}
     }},
     
   ]

**Note:**
- Ensure that each JSON object correctly maps the natural command to the appropriate device and action using the listed device ID.
- Use the object ID to differentiate between devices when the room contains multiple similar items.
- You can add more than one parameter in the parameters dictionary.

Using this information, translate the following command into JSON: "{command}". Output a list with all the necessary JSONs. 
Always output a list even if there is only one command to be applied, do not output anything else but the desired structure.
"""
    kwargs = generate_params_dict(PROMPT, temperature=0.4, top_p=0.1)
    result = generate_with_single_input(**kwargs)
    return result['content']

In [11]:
print(generate_system_call("Play a chill playlist very loud"))

[
  {
    "room": "living_room",
    "object_id": "223",
    "object_name": "living_room_speaker",
    "action": "play",
    "parameters": {
      "volume": "100",
      "playlist_style": "chill"
    }
  },
  {
    "room": "living_room",
    "object_id": "223",
    "object_name": "living_room_speaker",
    "action": "adjust volume",
    "parameters": {
      "volume": "100"
    }
  }
]


In [12]:
print(generate_system_call("I'm tired today, please make my living room a very cozy ambient, it is really cold today too."))

[
  {
    "room": "living_room",
    "object_id": "222",
    "object_name": "living_room_light",
    "action": "turn on",
    "parameters": {'intensity': '80%', 'color':'yellow'}
  },
  {
    "room": "living_room",
    "object_id": "223",
    "object_name": "living_room_speaker",
    "action": "turn on",
    "parameters": {'volume': '100', 'playlist_style':'relaxing'}
  },
  {
    "room": "living_room",
    "object_id": "556",
    "object_name": "living_room_airconditioner",
    "action": "set temperature",
    "parameters": {'temperature': '18', 'fan_speed': 'low'}
  }
]
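Note that the second output above writes its `parameters` values with single quotes, mirroring the third example in the prompt, and strict JSON parsers reject single-quoted strings. Before passing the model's reply to another application, it is worth parsing and checking it. The sketch below is one way to do that with the standard library; the function name `parse_system_call` is hypothetical, and the required keys are taken from the format described at the start of this section.

```python
import json

# Keys every action object must contain, per the format described in this section
REQUIRED_KEYS = {"room", "object_id", "object_name", "action", "parameters"}


def parse_system_call(raw_output):
    """Parse the model's reply and check that it is a list of well-formed actions."""
    actions = json.loads(raw_output)  # raises json.JSONDecodeError on invalid JSON
    if not isinstance(actions, list):
        raise ValueError("expected a JSON list of actions")
    for action in actions:
        if not isinstance(action, dict):
            raise ValueError("each action must be a JSON object")
        missing = REQUIRED_KEYS - set(action)
        if missing:
            raise ValueError(f"action is missing keys: {sorted(missing)}")
        if not isinstance(action["parameters"], dict):
            raise ValueError("'parameters' must be a JSON object")
    return actions


valid = '[{"room": "office", "object_id": "111", "object_name": "office_door_lock", "action": "lock", "parameters": {}}]'
print(parse_system_call(valid)[0]["action"])  # prints: lock

single_quoted = "[{'room': 'office'}]"
try:
    parse_system_call(single_quoted)
except json.JSONDecodeError as error:
    print("Invalid JSON:", error.msg)
```

When parsing fails, the calling code can retry the request or report the error rather than forwarding a malformed command to the home automation server.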


### 3.2 Using the LLM Structured Output Parameter

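You can also constrain the LLM to output JSON that follows a given schema. Here, [Pydantic](https://docs.pydantic.dev/latest/) is used to define the data structure, and the resulting JSON schema is passed to the LLM call through the `response_format` parameter, so the output is expected to match that structure.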

Let's see an example below!

In [13]:
from pydantic import BaseModel, Field
import json

# Define the schema for the output
class VoiceNote(BaseModel):
    title: str = Field(description="A title for the voice note")
    summary: str = Field(description="A short one sentence summary of the voice note.")
    actionItems: list[str] = Field(
        description="A list of action items from the voice note"
    )


In [14]:
transcript = (
    "Good morning! It's 7:00 AM, and I'm just waking up. Today is going to be a busy day, "
    "so let's get started. First, I need to make a quick breakfast. I think I'll have some "
    "scrambled eggs and toast with a cup of coffee. While I'm cooking, I'll also check my "
    "emails to see if there's anything urgent."
)

messages = [
    {
        "role": "system",
        "content": "The following is a voice message transcript. Only answer in JSON.",
    },
    {
        "role": "user",
        "content": transcript,
    },
]

response_format = {
    "type": "json_schema",
    "schema": VoiceNote.model_json_schema(),
}

result = generate_with_multiple_input(messages, response_format = response_format)
result_json = json.loads(result['content'])
print(json.dumps(result_json, indent=2))

{
  "title": "Morning Routine",
  "summary": "Waking up at 7:00 AM, planning a quick breakfast of scrambled eggs and toast with coffee, and checking emails for any urgent messages.",
  "actionItems": [
    "Prepare breakfast",
    "Check emails for urgency"
  ]
}


Keep it up! You finished the ungraded lab on Prompt Engineering!