# Prompt Engineering Principles for OpenAI Reasoning Models (o4-mini & o1)


## Principles Overview

| Prompting Principle | What it means for **o-series reasoning models (o4-mini & o1)** |
|---|---|
| **1 — Use a `developer` message (not system) and give a clear role** | The newest reasoning models treat a *system* message as a *developer* message; mixing both is discouraged. Put the high‑level instruction in a single `role:"developer"` message, e.g., “You are an expert tax lawyer…”. |
| **2 — Keep prompts simple & direct — don’t force chain‑of‑thought** | Reasoning models already “think” internally. Over‑specifying (“think step‑by‑step… explain every thought”) wastes tokens and can leak private reasoning. Ask only for the final answer unless you truly need a public explanation. |
| **3 — Use explicit delimiters to structure input** | Wrap long passages, code, or multi‑part instructions in clear fences (` ````, `<doc>…</doc>`, Markdown headings). Delimiters help the model parse sections correctly and reduce mis‑interpretation. |
| **4 — Supply *only* the relevant context** | With 200 k‑token windows you can paste huge docs—but you *shouldn’t*. Include the minimal excerpts the task needs so the model focuses on the right evidence and stays concise. |
| **5 — Decompose or iterate instead of one giant ask (try zero‑shot first)** | Start with a straightforward version, examine the answer, then refine or break the workflow into numbered sub‑tasks. This lets the model reason deeply on each step and saves tokens. |
| **6 — Specify output format & boundaries** | Tell the model exactly *how* to answer (JSON schema, Markdown table, “≤ 120 words”, etc.). Reasoning models follow detailed format guards well and will keep their lengthy analysis inside your boundaries. |


---

## Define Chat Functions

In [20]:
# Load environment variables from .env
import os
from dotenv import load_dotenv
load_dotenv()

if not all(os.getenv(var) for var in ["AZURE_OPENAI_KEY", "AZURE_OPENAI_ENDPOINT", "AZURE_OPENAI_API_VERSION", "REASONING_NEW"]): 
    raise ValueError("❌ Missing one or more required env vars: Check .env.")

print("✅ Environment looks good: All variables are set.")

✅ Environment looks good: All variables are set.


In [21]:
# Pretty print stats
def print_token_and_filter_info(r):

    # Print token usage
    print(".........................")
    print("Token Costs:")
    print(f"Total Tokens: {r.usage.total_tokens}")
    print(f"Prompt Tokens: {r.usage.prompt_tokens}")
    print(f"Completion Tokens: {r.usage.completion_tokens}")
    print(f"Reasoning Tokens: {r.usage.completion_tokens_details.reasoning_tokens}")
    print(f"Output Tokens: {r.usage.total_tokens - r.usage.completion_tokens_details.reasoning_tokens}")
    print(".........................")

    # Print content filter results
    '''
    print("Content Filter Results:")
    filter_results = getattr(r.choices[0], "content_filter_results", None)
    if filter_results is not None:
        for k, v in filter_results.items():
            print(f"{k}: {v}")
    else:
        print("No content filter results available.")
    print(".........................")
    '''

In [None]:
# Default chat completion with developer persona
import os
from openai import AzureOpenAI
client = AzureOpenAI(
    api_key=os.environ["AZURE_OPENAI_KEY"],
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_version=os.environ["AZURE_OPENAI_API_VERSION"],
)

def chat(model: str = os.getenv("REASONING_NEW"), persona: str="You are a friendly and helpful assistant.", query: str = "Hello!", **kwargs):
    r = client.chat.completions.create(
        model=model,
        messages=[
            {"role":"developer","content":persona},
            {"role":"user","content":query}
        ],      
        **kwargs
    )
    print(f"🗣️ {model} returned:")
    print(r.choices[0].message.content)

    # pretty print stats
    print_token_and_filter_info(r)
    return r



In [None]:
# Lets you share history of messages and customize the persona
def chat_with_history(messages: list, model: str=os.getenv("REASONING_NEW"), query: str = "Hello!", **kwargs):
    messages = messages + [{"role": "user", "content": query}]
    r = client.chat.completions.create(
        model=model,
        messages=messages,
        **kwargs
    )
    print(f"🗣️ {model} returned:")
    print(r.choices[0].message.content)
    print_token_and_filter_info(r)
    return r

---

## Principle 1: Developer Messages

- Use a developer message (not system) - and don't mix them
- Define a clear task scope or persona (Act as XYZ)
- Specify an output format if it make sense (Give me a list of items)
- Set boundaries (Use simple language)

In [6]:
# BAD PROMPT
# Uses a system message as well as a developer message.
# Sets no boundaries or clarity for response
messages=[{"role":"system","content":"You are ChatGPT"}, {"role":"developer","content":"You are a 1st grade teacher."}]
query="Explain reasoning models"
test = chat_with_history(messages=messages, query=query)

🗣️ o4-mini returned:
Hi, friends! Today we’re going to learn about “reasoning models.” That sounds like a big phrase, but it’s really just talking about ways our brain figures things out. Think of it like different tools or helpers we can use when we want to solve a problem or answer a question. Let’s look at three simple helpers:

1. “Step‑by‑Step” Helper  
   • Imagine you’re making a peanut‑butter sandwich. You follow steps:  
     1) Take two slices of bread.  
     2) Spread peanut butter on one slice.  
     3) Put the slices together.  
   • When we solve a question, we can do the same—go one small step at a time until we get our answer!

2. “Pattern Detective” Helper  
   • A detective looks for clues and patterns. For example, you see: 2, 4, 6, 8, __.  
   • You notice “each number goes up by 2.” So the next number is 10!  
   • Our brain can be a pattern detective, too, spotting what’s the same or what comes next.

3. “Treasure‑Map” Helper (Working Backward)  
   • Pretend yo

In [7]:
# GOOD PROMPT
# Using the system role
# Role description is very vague
messages=[{"role": "developer", "content": "You are a 1st grade teacher. Explain concepts with analogies and rhymes. Keep answer simple ans short."}]
query="Explain reasoning models"
test = chat_with_history(messages=messages, query=query)

🗣️ o4-mini returned:
Reasoning models are like treasure maps in your head:  
They give you steps to follow, from clue A to clue Z.  
Step by step, you never stray,  
Finding answers along the way!
.........................
Token Costs:
Total Tokens: 354
Prompt Tokens: 37
Completion Tokens: 317
Reasoning Tokens: 256
Output Tokens: 98
.........................
Content Filter Results:
hate: {'filtered': False, 'severity': 'safe'}
self_harm: {'filtered': False, 'severity': 'safe'}
sexual: {'filtered': False, 'severity': 'safe'}
violence: {'filtered': False, 'severity': 'safe'}
.........................


### 2. Simple & Direct. No CoT

- Keep it simple. 
- Give high-level guidance and let the model figure it out. 
- Don't offer irrelevant details - less is more
- Don't ask it to think step by step - it internalizes chain of thought already

| | Bad Prompt | Good Prompt |
|---|---|---|
| **Prompt** | "Explain step by step who is lying and who is telling the truth. Show all your reasoning in detail so I can follow your chain of thought" | "A says ‘B is a liar.’ B says ‘C is a knight.’ C says nothing. Who is telling the truth?" |

In [17]:
# BAD PROMPT
# Overly verbose and forces chain-of-thought reasoning
bad_query = (
    "A says ‘B is a liar.’ B says ‘C is a knight.’ C says nothing. Explain step by step who is lying and who is telling the truth. Show all your reasoning in detail so I can follow your chain of thought"
)
test = chat(query=bad_query)

🗣️ o4-mini returned:
Let’s call a person a “Knight” if they always tell the truth, and a “Liar” if they always lie.  We have three statements:

 1. A says “B is a liar.”  
 2. B says “C is a knight.”  
 3. C says nothing.  

We ask: which assignments of Knight (K) and Liar (L) to A, B, and C are logically consistent?

---

Step 1: Translate A’s statement.

– If A is a Knight, then his statement “B is a liar” is true ⇒ B is L.  
– If A is a Liar, then his statement is false ⇒ B is not a liar ⇒ B is K.  

So from A we get two branches:
  
  Branch 1: A = K ⇒ B = L  
  Branch 2: A = L ⇒ B = K  

---

Step 2: In each branch, analyze B’s statement “C is a knight.”

Branch 1 (A=K, B=L):  
– B is a Liar, so his statement is false.  
– “C is a knight” is false ⇒ C = L.

Conclusion of Branch 1:  
  A = K, B = L, C = L  
  (Check: A’s true statement B=L is good. B, being Liar, lies about C, so C really is L.  C says nothing—no contradiction.)

Branch 2 (A=L, B=K):  
– B is a Knight, so his state

In [None]:
# GOOD PROMPT
# Simple, direct, and does not force chain-of-thought reasoning
good_query = "A says ‘B is a liar.’ B says ‘C is a knight.’ C says nothing. Who is telling the truth?"
test = chat(query=good_query)

🗣️ o4-mini returned:
Let A, B, C be “knights” (always tell the truth) or “liars” (always lie).  
A says “B is a liar.”  
B says “C is a knight.”  
C says nothing.  

Call A, B, C = T (knight) or F (liar).  Then truth‐value of  
  • A’s statement (“B is a liar”) is [B = F].  
  • B’s statement (“C is a knight”) is [C = T].  

We must have  
  – If A = T, then B = F.  
      But then B = F ⇒ B’s statement is false ⇒ C = F.  
      ⇒ (A,B,C) = (T,F,F) is self‐consistent.  
  – If A = F, then A’s statement is false ⇒ B = T.  
      Then B = T ⇒ B’s statement is true ⇒ C = T.  
      ⇒ (A,B,C) = (F,T,T) is also self‐consistent.  

No other assignment works.  Hence there are exactly two solutions:  
 1) A = knight, B = liar, C = liar  
 2) A = liar,  B = knight, C = knight  

In particular exactly one of A or B is telling the truth, and C turns out to be the same type as B.
.........................
Token Costs:
Total Tokens: 2022
Prompt Tokens: 44
Completion Tokens: 1978
Reasoning Tokens: 1

### 3 Delimiters for structure


| | Bad Prompt | Good Prompt |
|---|---|---|
| **Pair A** | “Summarize this plus write code” (pastes code & prose un‑separated) | “Summarize the prose, then improve the code in **Section 2** below.\n### Section 1 – Prose\n```text\n...\n```\n### Section 2 – Code\n```python\n...\n```” |
| **Pair B** | “Fix errors in my SQL:” + random HTML fragment mixed in | “Between `<sql>` tags is my query; return only the corrected query.\n<sql>\nSELECT * FROM orders o JOIN customers c ON id;\n</sql>” |


In [None]:
# GOOD PROMPT using XML delimiters for structure
query = """
Summarize the recipe below in 3 bullet points, focusing on key ingredients and preparation steps.
<recipe>
<name>Mango Margaritas</name>
<ingredients>2 cups mango, 1 cup tequila, 1/2 cup lime juice, 1/4 cup triple sec</ingredients>
<instructions>Blend mango, tequila, lime juice, and triple sec until smooth. Serve over ice.</instructions>
<serving>4</serving>
<calories>200</calories>
<prep_time>10 minutes</prep_time>
<total_time>10 minutes</total_time>
<notes>Refreshing summer drink, perfect for parties!</notes>
<tips>Use fresh mango for best flavor.</tips>
</recipe>
"""
test = chat(query=query)

### 4 Relevant context only


| | Bad Prompt | Good Prompt |
|---|---|---|
| **Pair A** | “Summarize ACME’s entire 300‑page 10‑K (pasted below) in 3 bullets.” | “Summarize **Risk Factors** (pp 12‑15) from ACME’s 2024 10‑K into 3 bullets.” |
| **Pair B** | “Based on these 10 articles (pasted), who won the case?” | “Using the quoted judgment excerpt below, identify which party (Smith or Jones) prevailed.\n```<judgment>…</judgment>```” |


### 5 Decompose / iterate


| | Bad Prompt | Good Prompt |
|---|---|---|
| **Pair A** | “Write a 40‑page business plan including market, finances, HR, legal.” | “Step 1 — outline sections & bullet points. *Wait.*\nStep 2 — expand the **Market Analysis** section to ~500 words.” |
| **Pair B** | “Translate, summarize and turn into slides—in one go.” | “(a) Translate the article to English.  (b) Summarize it in 5 bullets.  (c) Provide slide headlines.” |


### 6 Output format & boundaries


| | Bad Prompt | Good Prompt |
|---|---|---|
| **Pair A** | “Tell me key poll data for EU elections.” | “Return JSON array `{country, pollster, sample_size, lead_pct}` for Germany, France, Spain (2024 polls only).” |
| **Pair B** | “Explain transformers.” | “Explain transformer architecture in **exactly five bullet points of ≤ 15 words each**, no code blocks.” |



### References

- OpenAI Reasoning Guide – <https://platform.openai.com/docs/guides/reasoning>  
- OpenAI Reasoning Best Practices – <https://platform.openai.com/docs/guides/reasoning-best-practices>  
- Azure TechCommunity Blog: *Prompt Engineering for OpenAI’s O1 and O3‑mini Reasoning Models* – <https://techcommunity.microsoft.com/blog/azure-ai-services-blog/prompt-engineering-for-openai%E2%80%99s-o1-and-o3-mini-reasoning-models/4374010>
