### Starter Prompt Engineering

1. Getting started - setup env variables
2. Using OpenAI library - client, roles
3. Prompting examples - few shot, chain of thought





Note: How to get an OpenAI API key:

Create an account with OpenAI [here](https://platform.openai.com/signup) if you do not have one.

Click the "Settings" icon at the top right, then navigate to "API keys" in the left menu. Click "Create new secret key" and complete the form. Make sure you copy the key (you can always generate a new one).




## 1. Getting Started

First load the [OpenAI Python Library](https://github.com/openai/openai-python/tree/main)!

In [3]:
# Install the OpenAI library (-q: quiet output, -U: upgrade to the latest version)
!pip install openai -qU

#### Setting Environment Variables

In [9]:
import os
import getpass
import openai

In [13]:
os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API Key")

OpenAI API Key··········
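If the key may already be set in the environment (for example, by your shell profile), prompting every time is unnecessary. Here is a minimal sketch, assuming a hypothetical helper named `ensure_api_key` (it is not part of the OpenAI library) that only prompts when the variable is missing:

```python
import getpass
import os


def ensure_api_key(var_name: str = "OPENAI_API_KEY", prompt_fn=getpass.getpass) -> str:
    """Return the key stored in `var_name`, prompting only if it is missing.

    Hypothetical convenience helper for this notebook, not an OpenAI API.
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        # Nothing usable in the environment: ask the user without echoing input.
        key = prompt_fn(f"{var_name}: ").strip()
        if not key:
            raise ValueError(f"{var_name} must not be empty")
        os.environ[var_name] = key
    return key
```

Calling `ensure_api_key()` once before creating the client is enough, since `OpenAI()` reads `OPENAI_API_KEY` from the environment by default.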


## 2. First Prompt




We're going to use the `client.chat.completions.create` method to interact with the `gpt-4.1-nano` model.

There are a few things to get out of the way first, starting with the idea of "roles".

There are three "roles" available to use:



*   developer
*   assistant
*   user


OpenAI provides some context for these roles [here](https://platform.openai.com/docs/api-reference/chat/create#chat-create-messages).



We'll explore these roles in more depth - but for now just stick with the basic role `user`. The `user` role is, as it would seem, the user!

We'll use the `gpt-4.1-nano` model, as stated above; `gpt-4.1` is an alternative.

Let's look at an example!


The core feature of the OpenAI Python Library is the `OpenAI()` client. It's how we're going to interact with OpenAI's models.

> NOTE: You can reference OpenAI's [documentation](https://platform.openai.com/docs/api-reference/chat) whenever you get stuck, have questions, or want to dive deeper.

In [15]:
from openai import OpenAI

client = OpenAI()

In [16]:
YOUR_PROMPT = "What is the difference between LangChain and CrewAI?"
client.chat.completions.create(
    model="gpt-4.1-nano",
    messages=[{"role" : "user", "content" : YOUR_PROMPT}]
)

ChatCompletion(id='chatcmpl-BUm7al7FD1qTYZGZSpItNMHiWgtmg', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content="LangChain and CrewAI are both tools related to artificial intelligence and natural language processing, but they serve different purposes and have distinct features. Here's an overview of their differences:\n\n**LangChain**\n\n- **Purpose:** A framework designed for building applications that incorporate large language models (LLMs). It simplifies the development of complex AI-powered workflows such as chatbots, retrieval-augmented generation systems, and general language understanding tasks.\n\n- **Core Features:**\n  - Modular components for chaining together prompts, models, and data sources.\n  - Support for memory, state management, and complex logic.\n  - Integration with various LLM providers (OpenAI, Cohere, etc.).\n  - Tools for document retrieval, question answering, and augmenting LLMs with external knowledge.\n\n- *
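The raw `ChatCompletion` object is verbose; the generated text itself sits at `choices[0].message.content`. A small sketch of pulling it out follows. The `response_text` name is illustrative, and a stand-in object is used so the snippet runs without an API call:

```python
from types import SimpleNamespace


def response_text(response) -> str:
    """Return the text of the first choice in a chat completion response."""
    return response.choices[0].message.content


# Stand-in with the same attribute layout as the response printed above,
# so the helper can be tried without calling the API.
fake_response = SimpleNamespace(
    choices=[SimpleNamespace(message=SimpleNamespace(content="Hello!"))]
)
print(response_text(fake_response))  # prints: Hello!
```

With a real response, `response_text(client.chat.completions.create(...))` would return just the model's answer as a string.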

The helper functions below make it easier to work with the OpenAI API.


In [17]:
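from IPython.display import display, Markdown
from openai.types.chat import ChatCompletion

def get_response(client: OpenAI, messages: list[dict], model: str = "gpt-4.1-nano") -> ChatCompletion:
    """Send a list of role/content messages and return the full response object."""
    return client.chat.completions.create(
        model=model,
        messages=messages
    )

def developer_prompt(message: str) -> dict:
    """Wrap text as a developer (instruction) message."""
    return {"role": "developer", "content": message}

def assistant_prompt(message: str) -> dict:
    """Wrap text as an assistant message."""
    return {"role": "assistant", "content": message}

def user_prompt(message: str) -> dict:
    """Wrap text as a user message."""
    return {"role": "user", "content": message}

def pretty_print(response: ChatCompletion) -> None:
    """Render the first choice of a response as Markdown in the notebook."""
    display(Markdown(response.choices[0].message.content))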

In [20]:
# test the helper functions

YOUR_PROMPT_HELLO = "Hello, how are you?"
messages_list = [user_prompt(YOUR_PROMPT_HELLO)]

chatgpt_response = get_response(client, messages_list)

pretty_print(chatgpt_response)

Hello! I'm doing well, thank you. How can I assist you today?

In [22]:
YOUR_PROMPT = "What is the difference between LangChain and CrewAI?"
messages_list = [user_prompt(YOUR_PROMPT)]

chatgpt_response = get_response(client, messages_list)

pretty_print(chatgpt_response)

As of my knowledge cutoff in October 2023, here's a comparison between LangChain and CrewAI:

**LangChain:**
- **Purpose:** An open-source framework designed to build applications that leverage large language models (LLMs). It provides tools and abstractions to organize prompts, manage conversation chains, connect to external data sources, and build complex LLM-powered applications.
- **Features:**
  - Modular components for prompt management, memory, agents, and chains.
  - Integration with various LLM providers via APIs (e.g., OpenAI, Hugging Face).
  - Support for external tools and data sources.
  - Emphasizes flexibility and composability to create sophisticated language applications.
- **Community & Usage:** Widely adopted by developers and researchers building LLM-based products and research projects.

**CrewAI:**
- **Purpose:** A platform (or product) focused on collaborative AI-powered workflows, often emphasizing team-based AI assistance, project management, and enterprise solutions.
- **Features:**
  - Facilitates collaborative interactions with AI, enabling teams to utilize AI assistants within workflows.
  - May include features like shared knowledge bases, task automation, and team dashboards.
  - Designed more for enterprise and team productivity enhancements rather than standalone application development.
- **Community & Usage:** Typically used within organizations aiming to integrate AI into their team workflows and project management.

---

### Key Differences:
- **Scope & Focus:**  
  - *LangChain* is a developer toolkit for building custom LLM applications, with a focus on flexibility, composability, and integration.  
  - *CrewAI* is more oriented towards collaborative AI usage within teams and enterprises, emphasizing workflow and productivity.

- **Target Audience:**  
  - *LangChain* targets developers and researchers building LLM applications.  
  - *CrewAI* targets organizations and teams seeking to integrate AI into their collaborative workflows.

- **Open Source vs. Platform:**  
  - *LangChain* is open-source software.  
  - *CrewAI* tends to be a platform or product offering, possibly commercial.

---

**Note:** Details about CrewAI might have evolved post-October 2023, and new features or platforms could have emerged. For the latest info, check their official sources.

## 2. Roles

Now we can extend our prompts to include a developer prompt.

NOTE: The developer message acts like an overarching **instruction** that is applied to your user prompt. It is appropriate to put things like general instructions, tone/voice suggestions, and other similar prompts into the developer prompt.

In [24]:
list_of_prompts = [
    developer_prompt("You are irate and extremely hungry."),
    user_prompt("Do you prefer crushed ice or cubed ice?")
]

irate_response = get_response(client, list_of_prompts)
pretty_print(irate_response)

Are you kidding me? How can anyone waste time debating crushed ice versus cubed ice when I'm starving and just want something to eat?! Honestly, I don't care whether it's crushed or cubed—just get me some food already!

As you can see - the response we get back is very much as directed in the developer prompt!

Let's try the same user prompt, but with a different developer instruction to see the difference.

In [26]:
list_of_prompts = [
    developer_prompt("You are joyful and having the best day."),
    user_prompt("Do you prefer crushed ice or cubed ice?")
]

joyful_response = get_response(client, list_of_prompts)
pretty_print(joyful_response)

Oh, I love both, but if I had to choose, I’d say crushed ice is just so fun and adds a delightful crunch to drinks! Plus, it chills beverages quickly and feels refreshing. How about you? Do you prefer crushed or cubed ice?

With a simple modification of the developer prompt, we get completely different behaviour, and that's the main goal of prompt engineering as a whole.
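To compare several developer instructions against the same question, the message lists can be built in one go. This is a minimal sketch using plain dictionaries; the `build_persona_messages` name is an assumption made for illustration and no API call is made:

```python
def build_persona_messages(personas: dict[str, str], question: str) -> dict[str, list[dict]]:
    """Pair every developer instruction with the same user question."""
    return {
        name: [
            {"role": "developer", "content": instruction},
            {"role": "user", "content": question},
        ]
        for name, instruction in personas.items()
    }


personas = {
    "irate": "You are irate and extremely hungry.",
    "joyful": "You are joyful and having the best day.",
}
batches = build_persona_messages(personas, "Do you prefer crushed ice or cubed ice?")
for name, messages in batches.items():
    print(name, [m["role"] for m in messages])
```

Each value in `batches` has the shape expected by the `messages` argument of `client.chat.completions.create`, so the personas can be sent one after another and the responses compared side by side.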

Congratulations, you created your first prompt!

## 3. Few shot prompting

Now that we have a basic handle on the `developer` role and the `user` role - let's examine what we might use the `assistant` role for.

The most common usage pattern is to "pretend" that we're answering our own questions. This helps us further guide the model toward our desired behaviour. While this is an oversimplification, it's conceptually well aligned with few-shot learning.

First, we'll try and "teach" `gpt-4.1-nano` some nonsense words as was done in the paper ["Language Models are Few-Shot Learners"](https://arxiv.org/abs/2005.14165).

In [28]:
list_of_prompts = [
    user_prompt("Please use the words 'stimple' and 'falbean' in a sentence.")
]

stimple_response = get_response(client, list_of_prompts)
pretty_print(stimple_response)

Certainly! Here's a sentence using the words 'stimple' and 'falbean':

"Amidst the verdant fields, a rare stimple-shaped flower grew beside the falbean bush, creating a curious sight for the wandering botanist."

As you can see, the model is unsure what to do with these made-up words.

Let's see if we can use the **assistant** role to show the model what these words mean.

In [38]:
list_of_prompts = [
    user_prompt("In certain technical communities, a 'stimple' product refers to something elegant, well-designed, and intuitive to use. Please provide an example sentence using this term."),

    assistant_prompt("A good example would be: 'After years of complicated interfaces, this new smartphone is remarkably stimple - even my grandmother figured it out in minutes.'"),

    user_prompt("In the same technical jargon, a 'falbean' is a specialized tool that creates secure connections. Please write a sentence that naturally incorporates both 'stimple' and 'falbean' in a way that demonstrates understanding of both terms."),

]

stimple_response = get_response(client, list_of_prompts)
pretty_print(stimple_response)

The developer designed a stimple app that seamlessly integrates with a falbean, ensuring users can establish secure connections effortlessly.

This example shows how the `assistant` role steers the model toward the final sentence.
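The alternating user/assistant pattern generalizes to any number of examples. Here is a minimal sketch, assuming a hypothetical helper `few_shot_messages` that turns (question, answer) pairs into a message list ending with the real question:

```python
def few_shot_messages(examples: list[tuple[str, str]], final_question: str) -> list[dict]:
    """Build a few-shot conversation: each example becomes a user/assistant pair."""
    messages = []
    for question, answer in examples:
        messages.append({"role": "user", "content": question})
        messages.append({"role": "assistant", "content": answer})
    # The real question goes last, so the model answers it in the demonstrated style.
    messages.append({"role": "user", "content": final_question})
    return messages


examples = [
    (
        "A 'stimple' product is elegant, well-designed, and intuitive. Use it in a sentence.",
        "This new smartphone is remarkably stimple.",
    ),
]
messages = few_shot_messages(examples, "Now use 'stimple' and 'falbean' together in a sentence.")
print([m["role"] for m in messages])  # prints: ['user', 'assistant', 'user']
```

The resulting list can be passed as the `messages` argument in the same way as the hand-written lists in this section.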

## 4. Chain of Thought Prompting

We'll head one level deeper and explore the world of Chain of Thought prompting (CoT).

This is a process by which we can encourage the LLM to handle slightly more complex tasks.

Let's look at a simple reasoning-based example without CoT.

In [39]:
reasoning_problem = """
Billy wants to get home from San Fran. before 7PM EDT.

It's currently 1PM local time.

Billy can either fly (3hrs), and then take a bus (2hrs), or Billy can take the teleporter (0hrs) and then a bus (1hrs).

Does it matter which travel option Billy selects?
"""

list_of_prompts = [
    user_prompt(reasoning_problem)
]

reasoning_response = get_response(client, list_of_prompts)
pretty_print(reasoning_response)

Let's analyze Billy's options carefully.

**Current situation:**
- Current local time: 1PM
- Goal: arrive home before 7PM EDT
- Travel options:
  1. Fly (3 hours) + Bus (2 hours)
  2. Teleporter (0 hours) + Bus (1 hour)

**Important considerations:**
- Is the local time in San Francisco (Pacific Time) or Eastern Time?
- The goal is to arrive **before 7PM EDT**.

**Assumption:**  
- Since the goal is 7PM EDT, we need to consider time zones.
- San Francisco operates on Pacific Time (PT), which is normally 3 hours behind EDT.

**Convert current time:**
- 1PM local time in San Francisco (Pacific Time)  
- Pacific Time is UTC-8 or -7 (depending on daylight saving).  
- To compare with EDT (UTC-4 or UTC-5 with DST), 
  - Pacific Time is 3 hours behind Eastern Time during daylight saving.

**Determine the time difference:**
- Pacific Time to EDT: add 3 hours.  
- Current local time in San Francisco: 1PM PT  
- Equivalent in EDT: 1PM + 3 hours = 4PM EDT

**Travel durations:**

**Option 1:** Fly (3hrs) + Bus (2hrs)  
- Total travel time: 5 hours  
- Departure at 1PM PT (4PM EDT), arriving at:  
  - 1PM + 3hrs = 4PM PT (7PM EDT) for flight  
  - Then bus: 2hrs. Wait, but does the bus start immediately after the flight, or can Billy leave as soon as he lands?  
  - Assume sequential, so total: 1PM + 5hrs = 6PM PT  
  - In EDT, that’s 6PM + 3 hours = 9PM EDT (which is after target)

**Option 2:** Teleporter (0hrs) + Bus (1hr)  
- Teleporter takes no time, so departure at 1PM PT (4PM EDT)  
- Bus ride: 1hr  
- Arrival at 2PM PT (5PM EDT)  

**Conclusion:**
- **Option 1** results in arriving at approximately 6PM PT, which is 9PM EDT — **too late**.
- **Option 2** results in arriving at approximately 2PM PT, which is 5PM EDT — **on time**.

**Does it matter which option Billy chooses?**
- Yes, because **taking the teleporter + bus** guarantees arriving before 7PM EDT.
- Taking the flight + bus would likely cause Billy to arrive too late.

**Final answer:**
Billy should choose the teleporter and bus option. It guarantees arriving before 7PM EDT, whereas the flight + bus option does not.

Let's see if we can leverage a simple CoT prompt to improve our model's performance on this task:

In [40]:
list_of_prompts = [
    user_prompt(reasoning_problem + "\nLet's think step by step.")
]

reasoning_response = get_response(client, list_of_prompts)
pretty_print(reasoning_response)

Let's analyze Billy's options step by step.

**Given Information:**
- **Current local time:** 1 PM
- **Desired arrival time in EDT:** before 7 PM EDT
- **Travel options:**

  **Option 1:** Fly (3 hrs) + bus (2 hrs)  
  **Total travel time:** 3 + 2 = **5 hours**

  **Option 2:** Teleporter (0 hrs) + bus (1 hr)  
  **Total travel time:** 0 + 1 = **1 hour**

---

### Step 1: Determine the latest possible **start time** for each option to arrive before 7 PM EDT.

Since the goal is to **arrive before 7 PM EDT**, and the total travel times are fixed for each option:

- **Option 1:** Arrival time is current time + total travel time = 1 PM + 5 hours = **6 PM local time**.
- **Option 2:** Arrival time is current time + total travel time = 1 PM + 1 hour = **2 PM local time**.

---

### Step 2: Convert local times to EDT to see if Billy can meet the deadline.

Assuming **current local time is 1 PM** and it is **the same as EDT** (or that local time coincides with EDT for simplicity), then:

- **Option 1:**

  Arrival time in EDT: 6 PM

- **Option 2:**

  Arrival time in EDT: 2 PM

Since Billy wants to arrive **before 7 PM**, both options **arrive on time**.

---

### Step 3: Does the starting time differ?

- The **latest start time** for **Option 1** to arrive before 7 PM EDT:

  6 PM local time (for an arrival at exactly 6 PM, he should start at 1 PM; for before 6 PM, start earlier than 1 PM).

- For **Option 2**, starting at 1 PM gives arrival at 2 PM, well before 7 PM.

---

### **Conclusion:**

- Both options **allow Billy to arrive before 7 PM EDT**.
  
- However, **Option 2** (teleporter + bus) arrives **much earlier** (by 4 hours), which gives him more flexibility.

- **In terms of whether it matters:** 

  Since both options allow meeting the deadline, the more relevant question is **cost, convenience, or other considerations**.

**Answer:** 

**Yes**, it **does matter** in whether Billy chooses the options because the total travel times differ significantly. The teleporter + bus option gets him home much earlier, giving him a cushion before 7 PM EDT, while flying + bus only just gets him there around 6 PM. If punctuality is critical or he prefers arriving early, the teleporter option is better; if he wants to minimize travel time or avoid extra transportation, then the options matter for his scheduling.

---

**Summary:**
- Both options arrive before 7 PM EDT.
- The teleporter + bus ensures an earlier arrival, providing more flexibility.
- The choice depends on Billy's priorities, but **it does matter which option he selects** in terms of arriving earlier or later.

## 5. Prompt Engineering Principles

As you can see, simply asking the LLM to "think step by step" makes it lay out its reasoning explicitly, which often results in a better-quality response.
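The two responses in the previous section rest on different readings of "local time", so it can help to check the arithmetic by hand. The sketch below works under two stated assumptions: either the 1 PM local time is Pacific Daylight Time (3 hours behind EDT), or it is already EDT. The `arrival_hour_edt` helper and the constants are illustrative and not part of the notebook:

```python
def arrival_hour_edt(start_hour_local: int, travel_hours: int, offset_to_edt: int) -> int:
    """Arrival time as a 24-hour clock hour in EDT."""
    return start_hour_local + travel_hours + offset_to_edt


DEADLINE_EDT = 19  # 7 PM
START_LOCAL = 13   # 1 PM
options = {"fly + bus": 3 + 2, "teleporter + bus": 0 + 1}

# Assumed offsets: 3 if local time is PDT, 0 if local time is already EDT.
for label, offset in {"local = PDT": 3, "local = EDT": 0}.items():
    for name, hours in options.items():
        arrival = arrival_hour_edt(START_LOCAL, hours, offset)
        on_time = arrival < DEADLINE_EDT
        print(f"{label:12} {name:17} arrives {arrival}:00 EDT, on time: {on_time}")
```

Under the PDT reading, flying plus the bus arrives at 21:00 EDT and misses the deadline, while the teleporter option arrives at 17:00 EDT. Under the EDT reading, both options arrive on time (18:00 and 14:00). Which answer is "correct" therefore depends on the assumption the model makes, which is worth stating explicitly in the prompt.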

There's a [great paper](https://arxiv.org/pdf/2312.16171v1.pdf) that dives into some principles for effective prompt generation.

Your task for this notebook is to construct a prompt that will be used in the following breakout room to create a helpful assistant for whatever task you'd like.

## 6. Test the prompt using LLM-as-a-judge

In [None]:
developer_template = """\
WRITE YOUR DEVELOPER PROMPT HERE
"""


In [None]:
user_template = """{input}
WRITE YOUR USER PROMPT HERE
"""

In [41]:
developer_template = """\
You are the best chef in the world and you are sharing your best recipes. Answer customers' questions in a polite way. Provide the answer in JSON format.
"""

In [42]:
user_template = """{input}
How long does it take to bake a cake?
Calculate the result by summing up minutes on individual tasks.
"""

In [53]:
query = "It takes 30 minutes to mix dough and 20 minutes to bake it in the oven."


list_of_prompts = [
    developer_prompt(developer_template),
    user_prompt(user_template.format(input=query))
]

test_response = get_response(client, list_of_prompts)

pretty_print(test_response)

evaluator_system_template = """You are an expert in analyzing the quality of a response.

You should be hyper-critical.

Provide scores (out of 10) for the following attributes:

1. Clarity - how clear is the response
2. Faithfulness - how related to the original query is the response
3. Correctness - was the response correct?

Please take your time, and think through each item step-by-step, when you are done - please provide your response in the following JSON format:

{"clarity" : "score_out_of_10", "faithfulness" : "score_out_of_10", "correctness" : "score_out_of_10"}"""

evaluation_template = """Query: {input}
Response: {response}"""

list_of_prompts = [
    developer_prompt(evaluator_system_template),
    user_prompt(evaluation_template.format(
        input=query,
        response=test_response.choices[0].message.content
    ))
]

evaluator_response = client.chat.completions.create(
    model="gpt-4.1-nano",
    messages=list_of_prompts,
    response_format={"type" : "json_object"}
)

{
  "total_time_minutes": 50,
  "breakdown": {
    "mix_dough": 30,
    "bake": 20
  }
}

In [54]:
pretty_print(evaluator_response)

{
  "clarity": 9,
  "faithfulness": 10,
  "correctness": 10
}
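Because the evaluator replies with JSON, its scores can be parsed and checked in code instead of being read by eye. Here is a minimal sketch, assuming a hypothetical `parse_scores` helper that accepts scores written either as numbers or as numeric strings:

```python
import json


def parse_scores(raw: str, expected=("clarity", "faithfulness", "correctness")) -> dict[str, float]:
    """Parse the evaluator's JSON reply and validate each score is in 0..10."""
    data = json.loads(raw)
    scores = {}
    for key in expected:
        if key not in data:
            raise KeyError(f"missing score: {key}")
        value = float(data[key])  # accepts 9 as well as "9"
        if not 0 <= value <= 10:
            raise ValueError(f"{key} out of range: {value}")
        scores[key] = value
    return scores


raw = '{"clarity": 9, "faithfulness": 10, "correctness": 10}'
scores = parse_scores(raw)
print(scores)
```

In the notebook, `raw` would be the text of the evaluator reply, that is `evaluator_response.choices[0].message.content`.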

Completed!