# General Prompt Engineering with GPT-4.1  
  
This notebook will help you master prompt engineering—crafting instructions to reliably get the best results from GPT-4.1 and related LLMs. We’ll use best practices and practical strategies for clarity, accuracy, and efficiency.  
  
---  
  
## 1️⃣ What is Prompt Engineering?  
  
**Prompt engineering** is the science of giving language models instructions to get high-quality outputs. Effective prompt engineering involves:  
- Giving clear, detailed instructions,  
- Supplying examples,  
- Adding context or references,  
- Specifying output format and priorities.  
  
By crafting better prompts, you can drastically improve response quality for any application—from Q&A, to code generation, to document analysis.  
  
---  

## 2️⃣ Chat Messages and Roles  
  
Unlike older completion-style models, which take a single text string, GPT-4.1 and newer chat models take a structured **array of messages**. Each message has a **role** that guides how the model interprets it:  
  
| Role       | Description                                                                |  
|------------|----------------------------------------------------------------------------|  
| developer  | High-priority “system” message, sets default behavior, tone, constraints   |  
| user       | The primary input or instruction from the end-user                         |  
| assistant  | (Optional) Previous responses or demonstrations                            |  
  
**Example: Persona and Style Control**  
  

In [1]:
# Import libraries and load environment variables  
  
import os  
from openai import AzureOpenAI  
from dotenv import load_dotenv  
  
# Load Azure credentials from a .env file (you should have credentials.env in your directory)  
load_dotenv("credentials.env")  
  
# Create Azure OpenAI client  
client = AzureOpenAI(  
    api_key=os.getenv("AZURE_OPENAI_KEY"),  
    api_version="2025-01-01-preview",  
    azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT")  
)  
  
print("✅ Azure OpenAI client initialized.")  

✅ Azure OpenAI client initialized.


In [2]:
# Example: Setting up a "southern belle programmer" assistant persona  
  
messages = [  
    {  
        "role": "developer",  
        "content": [  
            { "type": "text", "text": "You are a helpful assistant that answers programming questions in the style of a southern belle from the southeast United States." }  
        ]  
    },  
    {  
        "role": "user",  
        "content": [  
            { "type": "text", "text": "Are semicolons optional in JavaScript?" }  
        ]  
    }  
]  
  
response = client.chat.completions.create(  
    model="gpt-4.1",  
    messages=messages  
)  
print(response.choices[0].message.content)  

Well, sugar, you’ve asked a fine question!

In JavaScript, semicolons **are technically optional** in many cases—thanks to a feature called *Automatic Semicolon Insertion* (ASI). The language tries its best to help you out by inserting semicolons where it thinks you meant to put 'em if you leave them out.

**However** (and it’s a big “however”, hon), leaving ’em off can sometimes lead to unexpected bugs, especially in trickier code. There are certain situations where JavaScript gets confused if you don’t pop a semicolon in, such as:

```javascript
let x = 5
let y = 10
console.log(x + y)
```
That’ll work just fine, bless its heart.

But sometimes, like with return statements:
```javascript
function getObject() {
    return
    {
        name: "Daisy"
    }
}
```
That sneaky JavaScript engine adds a semicolon *right after* the `return`, so your function ends up returning `undefined` instead of that nice object you had in mind!

**My advice, darlin’:** Most folks stick to the habit of end

**Try changing the content of the developer message above and watch how the assistant’s style and expertise change!**  
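
To make that experiment easier, here is a minimal sketch of a helper that rebuilds the same two-message payload for any persona. The name `build_messages` and the sample personas are illustrative assumptions, not part of the OpenAI SDK; each resulting list could be passed as `messages=` to `client.chat.completions.create`.

```python
def build_messages(developer_text, user_text):
    """Return a two-message chat payload: developer instructions, then the user turn."""
    return [
        {"role": "developer", "content": [{"type": "text", "text": developer_text}]},
        {"role": "user", "content": [{"type": "text", "text": user_text}]},
    ]

# Hypothetical personas to compare against the southern belle example.
personas = {
    "pirate": "You are a helpful assistant that answers programming questions like a pirate.",
    "terse": "You are a senior engineer. Answer programming questions in at most two sentences.",
}
question = "Are semicolons optional in JavaScript?"

# One payload per persona; only the developer message differs.
payloads = {name: build_messages(text, question) for name, text in personas.items()}
print(payloads["terse"][0]["role"])  # developer
```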
  
---  

## 3️⃣ Strategy 1: Write Clear Instructions  
  
- **Be explicit:** The more detail you give, the less the model needs to guess.  
- **Set expectations:** Specify length, style, steps, or format—don't assume!  
- **Sample:**  
  
> “Summarize this article in three bullet points, then ask one insightful follow-up question.”  
  
### 👎 Vague prompt  
  

In [3]:
vague_prompt = [  
    {"role": "user", "content": "Summarize this text: John bought apples and oranges at the market."}  
]  
resp = client.chat.completions.create(model="gpt-4.1", messages=vague_prompt)  
print(resp.choices[0].message.content)  

John purchased apples and oranges at the market.


### 👍 Improved prompt  
  

In [4]:
clear_prompt = [  
    {"role": "user", "content": "Summarize the following in one bullet point, and use friendly language:\nJohn bought apples and oranges at the market."}  
]  
resp = client.chat.completions.create(model="gpt-4.1", messages=clear_prompt)  
print(resp.choices[0].message.content)  

- John picked up some apples and oranges at the market.


**Tip:**    
For complex requests, break them into lists or use headings to guide the model.  
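
As a rough illustration of that tip, the sketch below assembles a prompt from a task, numbered steps, and an output format, each under its own heading. The function name and the heading labels are assumptions chosen for this example; any consistent structure should work similarly.

```python
def build_structured_prompt(task, steps, output_format):
    """Assemble a prompt with headed sections and numbered steps."""
    lines = ["# Task", task, "", "# Steps"]
    lines += [f"{i}. {step}" for i, step in enumerate(steps, start=1)]
    lines += ["", "# Output format", output_format]
    return "\n".join(lines)

prompt = build_structured_prompt(
    task="Summarize the article below for a busy executive.",
    steps=[
        "Identify the three main points.",
        "Write one bullet per point.",
        "End with one follow-up question.",
    ],
    output_format="Markdown bullets, then a single line starting with 'Question:'.",
)
print(prompt)
```

The resulting string can be sent as the `content` of a user message, with the article text appended after it.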
  
---  

## 4️⃣ Strategy 2: Provide Reference Text and Retrieval Augmented Generation (RAG)  
  
**Why?**    
Language models can hallucinate facts, especially on niche or recent topics. By providing reference text (such as documents, snippets, or database results), you guide the model to use trustworthy information.  
  
**How to do it:**  
- Delimit your reference text clearly (use triple quotes, markdown, XML tags, etc.).  
- Instruct the model to answer **only** from the provided materials (or to cite them).  
- For longer references, mention which delimiters or citation markers to use.  
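
The points above can be sketched as a small helper that wraps each reference document in XML-style tags and pairs it with a grounding instruction. The tag name `doc`, the bracketed citation style, and the function name are assumptions made for this example, not a required format.

```python
def build_grounded_messages(documents, question):
    """Build chat messages that restrict the model to the given documents.

    documents: dict mapping a short id to the document text.
    """
    # Delimit every document so the model can tell reference text from the question.
    tagged = "\n".join(
        f'<doc id="{doc_id}">\n{text}\n</doc>' for doc_id, text in documents.items()
    )
    developer = (
        "Answer ONLY from the documents inside <doc> tags. "
        "Cite the id of every document you use, e.g. [warranty]. "
        'If the answer is not in the documents, reply "Insufficient information."'
    )
    return [
        {"role": "developer", "content": developer},
        {"role": "user", "content": f"{tagged}\n\nQuestion: {question}"},
    ]

messages = build_grounded_messages(
    {"warranty": "The warranty for Product A12 lasts 12 months from the date of purchase."},
    "What is the warranty period for Product A12?",
)
print(messages[1]["content"])
```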
  
---  
  
**Example 1: Injecting Reference Information**  
  

In [5]:
messages = [  
    {"role": "developer",   
     "content": "Use ONLY the provided text (inside triple quotes) to answer user questions. If you can’t find the answer, say so."},  
    {"role": "user",   
     "content": '''"""The warranty for Product A12 lasts 12 months from the date of purchase. Returns are possible within 30 days of delivery."""\nWhat is the warranty period for Product A12?'''}  
]  
  
resp = client.chat.completions.create(model="gpt-4.1", messages=messages)  
print(resp.choices[0].message.content)  

The warranty period for Product A12 is 12 months from the date of purchase.


In [6]:
# Example 2: Require a citation with every answer
messages = [  
    {"role": "developer",   
     "content": 'Use only the document delimited by triple quotes. Cite any answer with {"citation": ...}. If you cannot answer, say "Insufficient information."'},  
    {"role": "user",   
     "content": '''"""The event will be held on July 10, 2024 at the Grand Hall downtown."""\nQuestion: When and where is the event?'''}  
]  
  
resp = client.chat.completions.create(model="gpt-4.1", messages=messages)  
print(resp.choices[0].message.content)  

The event is on July 10, 2024 at the Grand Hall downtown. {"citation": "The event will be held on July 10, 2024 at the Grand Hall downtown."}
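
A citation is only useful if it really comes from the source. The following sketch, which assumes the `{"citation": ...}` format requested in the developer message above, pulls the cited quote out of the answer and checks that it appears verbatim in the document. The helper names are hypothetical.

```python
import json

def extract_citation(answer):
    """Return the "citation" value from the first matching JSON object in the answer, or None."""
    decoder = json.JSONDecoder()
    start = answer.find("{")
    while start != -1:
        try:
            obj, _ = decoder.raw_decode(answer, start)
        except json.JSONDecodeError:
            obj = None
        if isinstance(obj, dict) and "citation" in obj:
            return obj["citation"]
        start = answer.find("{", start + 1)
    return None

def citation_is_grounded(answer, document):
    """True only if the answer cites a string that occurs verbatim in the document."""
    citation = extract_citation(answer)
    return isinstance(citation, str) and citation in document

document = "The event will be held on July 10, 2024 at the Grand Hall downtown."
answer = (
    "The event is on July 10, 2024 at the Grand Hall downtown. "
    '{"citation": "The event will be held on July 10, 2024 at the Grand Hall downtown."}'
)
print(citation_is_grounded(answer, document))  # True
```

An exact substring check is strict by design; a paraphrased citation would fail it, which may or may not be what a given application wants.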


## 5️⃣ Strategy 4: Give the Model Time to “Think” (Chain-of-Thought Prompting)  
  
**Why?**    
If given space to “think out loud,” models can solve problems more reliably, especially in reasoning, math, and evaluations.  
  
**How to do it:**  
- Explicitly instruct the model to "reason step by step" or "show your work before answering."  
- You can also ask for a structured plan or interim analysis.  
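
When the reasoning is shown, downstream code usually still needs just the final result. One possible approach, sketched below, is to ask for a fixed last line and parse it. The `Final answer:` marker is a convention assumed for this example, not something the API enforces.

```python
# Hypothetical developer instruction that pins down where the result appears.
COT_INSTRUCTION = (
    "Reason step by step. Then give the result on its own last line, "
    "formatted exactly as 'Final answer: <answer>'."
)

def extract_final_answer(text, marker="Final answer:"):
    """Return the text after the last line that starts with the marker, or None."""
    for line in reversed(text.strip().splitlines()):
        # Tolerate markdown bold around the marker, e.g. "**Final answer:** 8".
        stripped = line.strip().strip("*").strip()
        if stripped.lower().startswith(marker.lower()):
            return stripped[len(marker):].strip().strip("*").strip()
    return None

sample = "1. Alice starts with 3 apples.\n2. She gets 5 more: 3 + 5 = 8.\nFinal answer: 8"
print(extract_final_answer(sample))  # 8
```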
  
---  
  
**Example: Chain of Thought for a Math Problem**  
  

In [7]:
messages = [  
    {"role": "developer",   
     "content": "First, work out your solution step by step before you give the final answer."},  
    {"role": "user",   
     "content": "If Alice has 3 apples and gets 5 more, how many apples does she have in total?"}  
]  
resp = client.chat.completions.create(model="gpt-4.1", messages=messages)  
print(resp.choices[0].message.content)  

Let's solve the problem step by step:

1. **Initial apples Alice has:** 3 apples
2. **Apples Alice gets:** 5 more apples

To find the total number of apples Alice has, we need to add the two amounts together:

\( 3 \) (initial) + \( 5 \) (more) = \( 8 \) apples

**Final Answer:**  
Alice has **8 apples** in total.


In [8]:
# Ask the model to check its own work before answering
   
messages = [  
    {"role": "developer",   
     "content": "Solve the problem step by step, then check if you made any mistakes before answering."},  
    {"role": "user",   
     "content": "Multiply 17 by 23."}  
]  
resp = client.chat.completions.create(model="gpt-4.1", messages=messages)  
print(resp.choices[0].message.content)  

Let's solve the problem step by step:

We need to multiply 17 by 23.

Step 1: Write the numbers.

```
   17
x  23
```

Step 2: Multiply 17 by 3 (units digit of 23):  
\( 17 \times 3 = 51 \)

Step 3: Multiply 17 by 2 (the '2' in 23 is actually 20):
\( 17 \times 2 = 34 \)
Since this is actually the tens place, we write it as 340.

Step 4: Add the two products:
\( 51 + 340 = 391 \)

Let's check the calculation:

Double-check with the standard multiplication:

```
   17
x  23
------
   51   <-- 17*3
+340   <-- 17*2 (shifted one position to the left)
------
  391
```

Answer:
\[
\boxed{391}
\]
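
The self-check above is still written by the model, so for arithmetic it can be worth confirming the result independently. This small sketch assumes the answer arrives in a LaTeX `\boxed{...}` wrapper, as in the output above, and compares it with Python's own calculation.

```python
import re

def boxed_integer(text):
    """Return the integer inside the first \\boxed{...} in the text, or None."""
    match = re.search(r"\\boxed\{(-?\d+)\}", text)
    return int(match.group(1)) if match else None

model_output = "Answer:\n\\[\n\\boxed{391}\n\\]"
claimed = boxed_integer(model_output)
print(claimed == 17 * 23)  # True
```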


  
---  
  
## 6️⃣ Strategy 5: Use External Tools and Function Calling  
  
**Why?**    
Language models have limitations: they may not have up-to-date or domain-specific knowledge, and are unreliable for math or code execution. Giving them access to tools (retrievers, code interpreters, APIs) allows more accurate, sophisticated responses.  
  
**How to do it:**  
- Use OpenAI’s function calling features to let the model request calls to your functions/tools; your code runs them and returns the results.  
- For accurate calculations, instruct the model to write code (e.g., in markdown code blocks), then optionally run that code yourself and send the output back to the model.  
  
---  
  
**Example: Simple Function Call**  
  

In [22]:
def factorial(n):  
    if n < 0:  
        return "Error: n must be a non-negative integer"  
    result = 1  
    for i in range(2, n+1):  
        result *= i  
    return result  

In [23]:
functions = [  
    {  
        "name": "factorial",  
        "description": "Calculates the factorial of a non-negative integer.",  
        "parameters": {  
            "type": "object",  
            "properties": {  
                "n": {"type": "integer", "description": "A non-negative integer"}  
            },  
            "required": ["n"]  
        }  
    }  
]  
messages = [  
    {"role": "user", "content": "What is the factorial of 5?"}  
]  

In [24]:
# Call the model and inspect the requested function call.
# The call details are in response.choices[0].message.function_call,
# not in .message.content (which is None when the model requests a function call).
response = client.chat.completions.create(  
    model="gpt-4.1",  
    messages=messages,  
    functions=functions  
)  
  
# Extract function call  
function_call = response.choices[0].message.function_call  
print(function_call)  # See what the model wants to call  

FunctionCall(arguments='{"n":5}', name='factorial')
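
The `functions` parameter used above is the older form of function calling; newer versions of the Chat Completions API describe the same schema under `tools`. As a sketch, assuming the `functions` list defined earlier, the conversion is a thin wrapper. The helper name `to_tools` is hypothetical.

```python
def to_tools(functions):
    """Wrap legacy function specs in the newer `tools` format."""
    return [{"type": "function", "function": spec} for spec in functions]

functions = [
    {
        "name": "factorial",
        "description": "Calculates the factorial of a non-negative integer.",
        "parameters": {
            "type": "object",
            "properties": {"n": {"type": "integer", "description": "A non-negative integer"}},
            "required": ["n"],
        },
    }
]
tools = to_tools(functions)
print(tools[0]["type"], tools[0]["function"]["name"])  # function factorial

# With `tools=tools` in the request, the call arrives in
# response.choices[0].message.tool_calls[0] (fields: .id, .function.name,
# .function.arguments), and the result is sent back as
# {"role": "tool", "tool_call_id": <that id>, "content": <JSON string>}.
```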


In [25]:
import json  
  
args = json.loads(function_call.arguments)  
result = factorial(args["n"])  
  
print(result)    # <- This shows the numerical result  

120


In [26]:
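# Record the assistant's function-call request, then the function's result,  
# so the follow-up request carries the full exchange.  
messages.append({  
    "role": "assistant",  
    "content": None,  
    "function_call": {"name": function_call.name, "arguments": function_call.arguments}  
})  
messages.append({  
    "role": "function",  
    "name": "factorial",  
    "content": json.dumps({"result": result})  
})  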
  
response2 = client.chat.completions.create(  
    model="gpt-4.1",  
    messages=messages  
)  
  
print(response2.choices[0].message.content)  
# Now you get "The factorial of 5 is 120." or similar  

The factorial of 5, written as 5!, is:

5! = 5 × 4 × 3 × 2 × 1 = **120**
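
With more than one function, hard-coding `factorial(args["n"])` does not scale. A minimal sketch of a dispatcher is shown below; the `REGISTRY` dict and `run_function_call` helper are assumptions for illustration, and this variant of `factorial` raises an exception instead of returning an error string so that failures can be reported uniformly.

```python
import json

def factorial(n):
    if n < 0:
        raise ValueError("n must be a non-negative integer")
    result = 1
    for i in range(2, n + 1):
        result *= i
    return result

# Map the function names exposed to the model onto local callables.
REGISTRY = {"factorial": factorial}

def run_function_call(name, arguments_json, registry=REGISTRY):
    """Run a model-requested call and return a JSON string for the reply message."""
    if name not in registry:
        return json.dumps({"error": f"unknown function: {name}"})
    try:
        args = json.loads(arguments_json)  # the model sends arguments as a JSON string
        return json.dumps({"result": registry[name](**args)})
    except (ValueError, TypeError) as exc:  # bad JSON, bad values, or wrong argument names
        return json.dumps({"error": str(exc)})

print(run_function_call("factorial", '{"n": 5}'))  # {"result": 120}
```

Returning errors as JSON rather than raising lets the model see what went wrong and possibly retry with corrected arguments.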


# 📊 Strategy 6: Evaluation / Testing  
  
Test the model's responses by comparing them against a gold-standard answer for each prompt, then score the match.  
  
Below is a simple character-level similarity check using Python's `difflib`. It is only a rough signal: a correct but wordier answer scores low against a short gold answer, as the 0.28 in the example shows. For more robust setups, consider BLEU, ROUGE, or manual rubric scoring.  

In [27]:
# Evaluation: Compare Model's Answer to Gold Standard  
  
from difflib import SequenceMatcher  
  
def evaluate_lm_response(gold_answer, model_response):  
    s = SequenceMatcher(None, gold_answer.strip().lower(), model_response.strip().lower())  
    similarity = s.ratio()  
    print(f"Similarity score: {similarity:.2f}")  
    return similarity  
  
# Example  
prompt = "What is the capital of France?"  
gold_answer = "Paris"  
model_response = "The capital of France is Paris."  
  
evaluate_lm_response(gold_answer, model_response)  

Similarity score: 0.28


0.2777777777777778
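
For short factual answers, one alternative to character similarity is to check whether the gold answer appears in the response after light normalization. This is a sketch under that assumption, with hypothetical helper names; it does not replace rubric-based grading for open-ended answers.

```python
import re

def normalize(text):
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    return " ".join(re.sub(r"[^\w\s]", " ", text.lower()).split())

def contains_gold(gold_answer, model_response):
    """True if the gold answer occurs as whole words in the response."""
    gold = normalize(gold_answer)
    return bool(gold) and f" {gold} " in f" {normalize(model_response)} "

print(contains_gold("Paris", "The capital of France is Paris."))  # True
print(contains_gold("Paris", "I believe it is Lyon."))            # False
```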

# 📋 General LLM Tactics (Reference Table)  
  
This table highlights prompt engineering and API usage tactics for optimizing LLM results.  
  
| Tactic        | Description                        | Example                                               |  
|---------------|------------------------------------|-------------------------------------------------------|  
| Delimiters    | Use triple backticks, `"""`, or XML tags to separate instructions from input | `Summarize the text in triple quotes: """..."""` |  
| Output Length | Cap or specify the response size   | `max_tokens=100` or "In 2 sentences or less..."       |  
| Few-shot      | Provide sample Q&A before query    | "Q: 2+2?\nA: 4\nQ: 3+5?\nA:"                         |  
| Personas      | Specify a role or identity         | "You are a helpful assistant."                        |  
| Summarization | Ask to summarize/shorten           | "Summarize the following text briefly."               |  
| Citations     | Request sources or quotes          | "List your sources at the end."                       |  
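
The few-shot row can also be expressed with chat roles rather than one long string: each sample question becomes a `user` message and each sample answer an `assistant` message. The sketch below assumes that layout; the function name is illustrative.

```python
def few_shot_messages(instruction, examples, query):
    """Build a chat payload where each (question, answer) pair is a demonstration."""
    messages = [{"role": "developer", "content": instruction}]
    for question, answer in examples:
        messages.append({"role": "user", "content": question})
        messages.append({"role": "assistant", "content": answer})
    messages.append({"role": "user", "content": query})  # the real question goes last
    return messages

messages = few_shot_messages(
    instruction="Answer with the number only.",
    examples=[("2+2?", "4"), ("10-3?", "7")],
    query="3+5?",
)
print([m["role"] for m in messages])
# ['developer', 'user', 'assistant', 'user', 'assistant', 'user']
```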

# ⚡ Optimizing for Accuracy, Cost, Latency  
  
Choose your settings and prompts to balance quality, speed, and price.  
  
**Examples:**  

In [31]:
# Example 1: Limit cost by capping response length  
response = client.chat.completions.create(  
    model="gpt-4.1",  
    messages=[{"role": "user", "content": "Explain relativity simply."}],  
    max_tokens=50    # Caps output tokens and therefore cost; the reply may be cut off mid-answer  
)  
print(response.choices[0].message.content)  
  
# Example 2: Reduce perceived latency with streaming (a smaller model would also respond faster)  
response = client.chat.completions.create(  
    model="gpt-4.1",   
    messages=[{"role": "user", "content": "Summarize this article..."}],  
    stream=True   # Streams output for faster perceived response  
)  
for chunk in response:  
    if chunk.choices and hasattr(chunk.choices[0].delta, "content"):  
        print(chunk.choices[0].delta.content or "", end="")  
print()  # Newline after the streamed output   
  
# Example 3: Explicit instructions for higher accuracy  
prompt = (  
    "You are a medical expert. "  
    "In one sentence, explain what insulin does."  
)  
response = client.chat.completions.create(  
    model="gpt-4.1",  
    messages=[{"role": "user", "content": prompt}]  
)  
print(response.choices[0].message.content)  

Sure! Here’s a simple explanation of **relativity**:

**Relativity** is a theory from Albert Einstein that describes how space and time work, especially when things move very fast or are near strong gravity.

It has two main ideas:

1
Certainly! Please provide the article text or a link to it, and I'll summarize it for you.
Insulin is a hormone produced by the pancreas that helps lower blood sugar levels by enabling cells to absorb glucose from the bloodstream for energy or storage.
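
To compare settings on cost, a rough estimate can be computed from token counts, which the API reports as `response.usage.prompt_tokens` and `response.usage.completion_tokens`. The prices below are placeholders, not real rates; substitute the figures for your own deployment.

```python
def estimate_cost(prompt_tokens, completion_tokens, input_price_per_1k, output_price_per_1k):
    """Estimate request cost from token counts and per-1,000-token prices."""
    return (
        (prompt_tokens / 1000) * input_price_per_1k
        + (completion_tokens / 1000) * output_price_per_1k
    )

# Hypothetical prices, for illustration only.
cost = estimate_cost(1200, 50, input_price_per_1k=0.002, output_price_per_1k=0.008)
print(f"${cost:.4f}")  # $0.0028
```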


# 📚 Additional Resources  
  
- OpenAI prompt engineering guide: https://platform.openai.com/docs/guides/prompt-engineering/prompt-engineering