### 1.2 Load Your API Key Securely

We **never** hardcode API keys. Instead we keep them in a `.env` file and load them at runtime with `python-dotenv`. Treat your key like a password — if it leaks, anyone can run up charges on your account.


In [4]:
import os
from dotenv import load_dotenv

import textwrap


# Optional: I'm behind a VPN, so I use truststore to make Python verify TLS certificates against the operating system's trust store.
import truststore
truststore.inject_into_ssl()



def pretty_print(*args):
    text = " ".join(str(arg) for arg in args)
    try:
        print(textwrap.fill(text, width=80))
    except Exception as e:
        print(text)  # fallback to normal print if text is not a string

        

load_dotenv('/Users/shivam13juna/Documents/scaler/iitr_classes/llm_ref/openai_key.env')  # loads variables from this explicit .env path; change it to match your machine

api_key = os.getenv("OPENAI_API_KEY")

if not api_key:
    raise ValueError(
        "OPENAI_API_KEY not found! "
        "Make sure you have a .env file with: OPENAI_API_KEY=sk-..."
    )

pretty_print("API key loaded successfully.")

API key loaded successfully.
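If you ever want to confirm which key was loaded, avoid printing it in full. The sketch below shows one way to redact a secret before logging it. The helper name `mask_key` and the sample string are illustrative assumptions, not part of `python-dotenv` or the OpenAI SDK.

```python
def mask_key(key: str, visible: int = 4) -> str:
    """Return a redacted form of a secret that is safer to print or log."""
    # Very short secrets are hidden completely so nothing useful leaks.
    if len(key) <= visible * 2:
        return "*" * len(key)
    # Otherwise keep only the first and last few characters.
    return f"{key[:visible]}...{key[-visible:]}"


# A made-up string standing in for a real key.
print(mask_key("sk-example-not-a-real-key-1234"))  # sk-e...1234
```

In the notebook you could call `mask_key(api_key)` instead of printing `api_key` directly.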


In [5]:
from openai import OpenAI

client = OpenAI(api_key=api_key)
pretty_print("OpenAI client ready.")

OpenAI client ready.


Links to the documentation:

[Chat Completions](https://developers.openai.com/api/reference/python/resources/chat/subresources/completions/methods/create)

[Responses API](https://developers.openai.com/api/reference/python/resources/responses/methods/create)

# Let's now go through the Responses API

| Feature             | Chat Completions API                             | Responses API                                              |
| ------------------- | ------------------------------------------------ | ---------------------------------------------------------- |
| **Endpoint**        | `client.chat.completions.create()`               | `client.responses.create()`                                |
| **Input format**    | `messages=[{"role": ..., "content": ...}]`       | `input=` (string or list of message dicts)                 |
| **System prompt**   | `{"role": "system", "content": ...}` in messages | `instructions=` parameter (top-level)                      |
| **Output access**   | `resp.choices[0].message.content`                | `resp.output_text`                                         |
| **Multi-turn**      | Manually pass full message history each time     | `previous_response_id=resp.id` (server-side context)       |
| **Developer role**  | `{"role": "developer"}` on newer models (o1 and later); `system` otherwise | `{"role": "developer"}` for meta-instructions              |
| **Vision input**    | `{"type": "image_url", "image_url": {...}}`      | `{"type": "input_image", "image_url": ...}`                |
| **Reasoning / CoT** | `reasoning_effort=` only; no reasoning summaries | `reasoning={"effort": ..., "summary": ...}` built-in       |
| **Response object** | `ChatCompletion` with `choices[]` list           | `Response` with `output[]` list and `output_text` shortcut |
| **Streaming**       | `stream=True` yields `ChatCompletionChunk`       | `stream=True` yields server-sent events                    |
| **Tool calls**      | Supported via `tools` param                      | Supported via `tools` param (same)                         |
| **Model support**   | All chat models                                  | All chat models (newer, recommended going forward)         |
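To make the first few rows of the table concrete, here is a minimal sketch that converts Chat Completions style arguments into Responses style arguments. The function name `chat_to_responses_kwargs` is a hypothetical helper written for this notebook, not an SDK function. It only covers the `messages` → `input` and system prompt → `instructions` mapping and makes no network call.

```python
from typing import Any


def chat_to_responses_kwargs(model: str, messages: list[dict[str, str]]) -> dict[str, Any]:
    """Map Chat Completions style arguments onto Responses API style arguments."""
    # System messages become the top-level `instructions` parameter.
    system_parts = [m["content"] for m in messages if m["role"] == "system"]
    # Every other message is passed through unchanged as `input`.
    turns = [m for m in messages if m["role"] != "system"]

    kwargs: dict[str, Any] = {"model": model, "input": turns}
    if system_parts:
        kwargs["instructions"] = "\n".join(system_parts)
    return kwargs


kwargs = chat_to_responses_kwargs(
    "gpt-5-nano",
    [
        {"role": "system", "content": "You are a friendly Python tutor."},
        {"role": "user", "content": "What is a list comprehension?"},
    ],
)
print(kwargs["instructions"])  # You are a friendly Python tutor.
```

The resulting dictionary could then be passed as `client.responses.create(**kwargs)`.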


## Difference between Chat Completions and Responses API: Structure

In [9]:

# Chat Completions version: the system prompt is a message inside the list
resp = client.chat.completions.create(
    model="gpt-5-nano",
    messages=[
        {"role": "system", "content": "You are a friendly Python tutor."},
        {"role": "user", "content": "What is a list comprehension?"}
    ]
)
pretty_print("Chat Completions output:", resp.choices[0].message.content)


Chat Completions output: A list comprehension is a concise way to create a new
list by applying an expression to each item of an existing iterable (like a
list), with optional filtering.  Syntax (most common): - [expression for item in
iterable if condition]  Examples: - Squares of 0 through 9: [x*x for x in
range(10)] - Uppercase words in a list: [w.upper() for w in words] - Only
numbers greater than 5: [x for x in nums if x > 5] - Apply a transformation with
a filter: [f(x) for x in data if x is not None]  Nested example: - Pairs (i, j)
for i in 0..2 and j in 0..1: [(i, j) for i in range(3) for j in range(2)]  Key
points: - Creates a new list in a single, readable line. - Can include an
optional if clause to filter items. - Can include nested loops for more complex
results.  Tips: - Use when it stays readable; if it becomes too long or complex,
a regular loop may be clearer. - For memory efficiency on large data, consider a
generator expression (uses parentheses instead of brackets):

In [None]:

resp = client.responses.create(
    model="gpt-5-nano",
    instructions="You are a friendly Python tutor.",
    input="What is a list comprehension?"
)
pretty_print("Responses API output:", resp.output_text)




Turn 1: A list comprehension is a concise way to create lists in Python. It
allows you to generate a new list by applying an expression to each item in an
existing iterable (like a list, tuple, or string) and optionally filtering items
based on a condition.  The basic syntax of a list comprehension is:  ```python
[expression for item in iterable if condition] ```  Here's a breakdown of the
components:  - **expression**: The value or operation you want to perform on
each item. - **item**: A variable that takes the value of each element in the
iterable. - **iterable**: The collection you are looping over (like a list). -
**condition** (optional): A filter that only includes items that satisfy the
condition.  ### Example  Suppose you want to create a list of squares of even
numbers from 0 to 9:  ```python squares_of_evens = [x**2 for x in range(10) if x
% 2 == 0] print(squares_of_evens) ```  This will output:  ``` [0, 4, 16, 36, 64]
```  In this example:  - `x**2` is the expression. - `x`

## Passing Multi-Turn Conversation History to the Responses API

In [18]:
input_messages = [
    {"role": "user", "content": "Why is Trump a jerk?"},
    {"role": "assistant", "content": "Some people are born that way."},
    {"role": "user", "content": "Name a celebrity who is not a jerk."}
]


resp = client.responses.create(
    model="gpt-5-nano",
    instructions="You are a very candid Journalist.",
    input=input_messages
)
pretty_print("Responses API output:", resp.output_text)

Responses API output: Keanu Reeves. He’s widely described as humble, polite to
fans, and generous—often cited as one of Hollywood’s nicest people. If you want
more options, I can name a few others people rave about, like Tom Hanks or Dolly
Parton.
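When you manage history yourself like this, each new turn means appending the assistant's last reply and the next user message to the list. A minimal sketch of that bookkeeping follows. `extend_history` is a hypothetical helper name and the messages are placeholder text.

```python
def extend_history(history, assistant_text, next_user_text):
    """Return a new message list with the last reply and the next question added."""
    # Build a new list so the caller's original history is left untouched.
    return history + [
        {"role": "assistant", "content": assistant_text},
        {"role": "user", "content": next_user_text},
    ]


history = [{"role": "user", "content": "What is a list comprehension?"}]
history = extend_history(
    history,
    assistant_text="A concise way to build a list from an iterable.",
    next_user_text="Can you give me an example?",
)
print([m["role"] for m in history])  # ['user', 'assistant', 'user']
```

The updated list is what you would pass as `input=` on the next call.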


In [35]:
input_messages = [
    {"role": "user", "content": "Why is Trump a jerk?"},
    {"role": "assistant", "content": "Some people are born that way."},
    {"role": "user", "content": "Name a celebrity who is not a jerk."}
]


resp = client.responses.create(
    model="gpt-5-nano",
    instructions="You are a very candid Journalist.",
    input=input_messages,
    max_output_tokens=200,  # the Responses API uses max_output_tokens instead of max_tokens
    # temperature is NOT supported by reasoning models like gpt-5-nano
    reasoning={"effort": "minimal"},
)
pretty_print("Responses API output:", resp.output_text)

Responses API output: That’s a tough one to certify—celebrity personas are often
carefully managed PR. If you’re asking for someone widely regarded as kind or
charitable, many people point to figures like Keanu Reeves or Dolly Parton for
their public generosity and low-key demeanor. But “not a jerk” is a hard claim
to prove, especially in the glare of fame.


In [32]:
resp = client.responses.create(
    model="gpt-5-nano",
    instructions="You are a very candid journalist. Reply in 1 short sentence, starting with name of celeb.",
    input=input_messages,
    max_output_tokens=1000,
    reasoning={"effort": "high"},   # gpt-5-nano also accepts "minimal", "low", or "medium"
    text={"verbosity": "low"},
)
print(repr(resp.output_text), resp.status, resp.incomplete_details)

'Keanu Reeves is widely regarded as one of the nicest celebrities.' completed None


In [27]:
resp

Response(id='resp_0b2b93f0aa718d5300699d25bc2fdc81939ed0d594b68744f3', created_at=1771906492.0, error=None, incomplete_details=IncompleteDetails(reason='max_output_tokens'), instructions='You are a very candid Journalist.', metadata={}, model='gpt-5-nano-2025-08-07', object='response', output=[ResponseReasoningItem(id='rs_0b2b93f0aa718d5300699d25bcb4688193b1cbb60478c1760c', summary=[], type='reasoning', content=None, encrypted_content=None, status=None)], parallel_tool_calls=True, temperature=1.0, tool_choice='auto', tools=[], top_p=1.0, background=False, conversation=None, max_output_tokens=200, max_tool_calls=None, previous_response_id=None, prompt=None, prompt_cache_key=None, reasoning=Reasoning(effort='medium', generate_summary=None, summary=None), safety_identifier=None, service_tier='default', status='incomplete', text=ResponseTextConfig(format=ResponseFormatText(type='text'), verbosity='medium'), top_logprobs=0, truncation='disabled', usage=ResponseUsage(input_tokens=49, input_t
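The object above has `status='incomplete'` with reason `max_output_tokens`, and its `output` list holds only a reasoning item. The token budget ran out before any visible text was produced, so `output_text` would be empty. A small guard like the sketch below can catch this before the text is used. `usable_text` is a hypothetical helper, and the `SimpleNamespace` objects stand in for real `Response` objects so the example runs without an API call.

```python
from types import SimpleNamespace


def usable_text(resp):
    """Return resp.output_text, or raise if the response did not complete."""
    if resp.status != "completed":
        # incomplete_details may be None, so fall back to a generic reason.
        reason = getattr(resp.incomplete_details, "reason", "unknown")
        raise RuntimeError(f"Response status is {resp.status!r} (reason: {reason})")
    return resp.output_text


# Stand-ins that mimic only the three attributes used above.
truncated = SimpleNamespace(
    status="incomplete",
    incomplete_details=SimpleNamespace(reason="max_output_tokens"),
    output_text="",
)
finished = SimpleNamespace(
    status="completed",
    incomplete_details=None,
    output_text="Keanu Reeves is widely regarded as one of the nicest celebrities.",
)

print(usable_text(finished))
try:
    usable_text(truncated)
except RuntimeError as err:
    print(err)
```

If the guard fires, raising `max_output_tokens` or lowering `reasoning.effort` are the two knobs this notebook has already used.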

## Key Differences: Parameters in Responses API vs Chat Completions API

### `max_tokens` → `max_output_tokens`
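In the **Responses API**, the parameter that limits output length is **`max_output_tokens`**, not `max_tokens`. In Chat Completions, `max_tokens` is the legacy name; reasoning models such as `gpt-5-nano` expect `max_completion_tokens` instead.

```python
# Chat Completions API (reasoning models take max_completion_tokens; max_tokens is the legacy name)
client.chat.completions.create(model="gpt-5-nano", messages=..., max_completion_tokens=50)

# Responses API
client.responses.create(model="gpt-5-nano", input=..., max_output_tokens=50)
```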

---

### `temperature` — Not Supported on Reasoning Models (GPT-5 family)

The entire GPT-5 family (`gpt-5`, `gpt-5-mini`, `gpt-5-nano`) are **reasoning models**. They do **NOT** support `temperature` or `top_p`.

| Model Family | Type | `temperature` | `top_p` | `max_output_tokens` |
|---|---|---|---|---|
| **gpt-4o / gpt-4o-mini** | Non-reasoning | ✅ Supported | ✅ Supported | ✅ Supported |
| **gpt-5 / gpt-5-mini / gpt-5-nano** | Reasoning | ❌ Not supported | ❌ Not supported | ✅ Supported |
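The table can be turned into a small guard that drops `temperature` for reasoning models, so one code path works for both families. This is only a sketch: `generation_kwargs` is a hypothetical helper, and detecting a reasoning model by the `gpt-5` name prefix is a simplifying assumption that may not hold for every model name.

```python
def generation_kwargs(model, max_output_tokens, temperature=None):
    """Build Responses API keyword arguments that are safe for the given model."""
    kwargs = {"max_output_tokens": max_output_tokens}
    # Assumption: any model whose name starts with "gpt-5" is a reasoning model.
    is_reasoning = model.startswith("gpt-5")
    if temperature is not None and not is_reasoning:
        kwargs["temperature"] = temperature
    return kwargs


print(generation_kwargs("gpt-4o-mini", 200, temperature=0))  # {'max_output_tokens': 200, 'temperature': 0}
print(generation_kwargs("gpt-5-nano", 200, temperature=0))   # {'max_output_tokens': 200}
```

The result could be unpacked into a call such as `client.responses.create(model=..., input=..., **kwargs)`.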

---

### How to Influence Creativity in GPT-5 Reasoning Models

Since `temperature` is locked, you control creativity through:

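**1. `reasoning.effort` parameter**: controls how much the model thinks before answering:
- `"minimal"` → least reasoning, fastest replies (used in most cells of this notebook)
- `"low"` → light reasoning, concise answers
- `"medium"` → balanced
- `"high"` → deeper reasoning, more elaborate and exploratory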

```python
resp = client.responses.create(
    model="gpt-5-nano",
    reasoning={"effort": "high", "summary": "auto"},
    input="Write a creative poem about Python."
)
```

**2. Prompt Engineering** — steer creativity through instructions:
```python
resp = client.responses.create(
    model="gpt-5-nano",
    instructions="Be wildly creative. Use unexpected metaphors.",
    input="Write a poem about Python."
)
```

**Bottom line:** With GPT-5 models, creativity = `reasoning.effort` + prompt wording, not `temperature`.

# Refer to previous responses

## `store=True` (the default)

In [37]:
# Turn 1
message1 = 'What is a list comprehension?'


resp1 = client.responses.create(
    model="gpt-5-nano",
    instructions="You are a friendly Python tutor.",
    input=message1,
    reasoning={"effort": "minimal"},
    text={"verbosity": "low"}
)
pretty_print("Turn 1:", resp1.output_text)


Turn 1: A list comprehension is a compact way to create a new list by applying
an expression to each item in an existing iterable (like a list, tuple, or
range) and optionally filtering items with a condition.  Basic syntax: -
[expression for item in iterable] - [expression for item in iterable if
condition] (includes only items that meet the condition)  Examples: - Squares of
numbers 0–9: [x*x for x in range(10)] - Even numbers from a list: [n for n in
my_list if n % 2 == 0] - Convert strings to uppercase: [s.upper() for s in
words]  Benefits: - Shorter and more readable than equivalent loops - Often
faster due to Python’s optimizations - Can include nested loops for multi-
dimensional data  If you want, share a specific task and I’ll show a list
comprehension for it.


In [44]:

# Turn 2: just pass previous_response_id, no history needed!
resp2 = client.responses.create(
    model="gpt-5-nano",
    input="Can you give me an example?",
    previous_response_id=resp1.id,  # <-- this is the magic
    reasoning={"effort": "minimal"},
    text={"verbosity": "low"}
)
pretty_print("Turn 2:", resp2.output_text)


print()
print()

# Alternative: send the full conversation history yourself instead of using previous_response_id


input_messages = [
    {"role": "user", "content": "What is a list comprehension?"},
    {"role": "assistant", "content": resp1.output_text},
    {"role": "user", "content": "Can you give me an example?"}
]

resp2_1 = client.responses.create(
    model="gpt-5-nano",
    instructions="You are a friendly Python tutor.",
    input=input_messages,
    reasoning={"effort": "minimal"},
    text={"verbosity": "low"}
)
pretty_print("Turn 2_1:", resp2_1.output_text)

Turn 2: Sure. Here are a few examples:  - Squares of numbers 0–9:   [x*x for x
in range(10)]  - Even numbers from a list:   [n for n in my_list if n % 2 == 0]
- Convert a list of strings to uppercase:   [s.upper() for s in words]  -
Flatten a 2D list (matrix) into a 1D list:   [item for row in matrix for item in
row]  - Get lengths of strings in a list:   [len(s) for s in strings]  If you
have a specific task, tell me and I’ll tailor a list comprehension for it.


Turn 2_1: Sure! Here are a few simple examples:  1) Squares of numbers 0–9 -
[x*x for x in range(10)] - Result: [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]  2) Even
numbers from an existing list - [n for n in my_list if n % 2 == 0]  3) Uppercase
words from a list - [w.upper() for w in words]  4) Flatten a 2D list - [[a for a
in row] for row in matrix]  (or: [elem for row in matrix for elem in row])  If
you have a specific task, share it and I’ll tailor a list comprehension.


### Why GPT-5 Models Give Different Outputs Each Time

- **No temperature control** — you can't set it to 0
- **Reasoning process is inherently non-deterministic** — the internal chain-of-thought exploration can branch differently each run, even with the same input
- **`reasoning.effort` is NOT the same as `temperature`** — "minimal" means "think less", not "be deterministic".


If you truly need near-deterministic outputs, use a non-reasoning model such as `gpt-4o-mini` with `temperature=0`. Even then, identical outputs are not guaranteed on every run.

In [40]:

# Turn 3 — chains from turn 2 (which already includes turn 1)
resp3 = client.responses.create(
    model="gpt-5-nano",
    input="What was my first question?",
    previous_response_id=resp2.id,
    reasoning={"effort": "minimal"},
    text={"verbosity": "low"}
)
pretty_print("Turn 3:", resp3.output_text)

Turn 3: Your first question was: "What is a list comprehension?"


## `store=False`

In [41]:
# Turn 1
resp4 = client.responses.create(
    model="gpt-5-nano",
    instructions="You are a friendly Python tutor.",
    input="What is a list comprehension?",
    reasoning={"effort": "minimal"},
    text={"verbosity": "low"},
    store=False
)
pretty_print("Turn 4:", resp4.output_text)

Turn 4: A list comprehension is a concise way to create a new list by applying
an expression to each item in an iterable (like a list) and optionally filtering
items with a condition. It’s written in one line inside brackets.  Syntax
examples: - Simple: [x * 2 for x in range(5)]  # [0, 2, 4, 6, 8] - With a
filter: [x for x in range(10) if x % 2 == 0]  # [0, 2, 4, 6, 8]  Benefits:
shorter code, often faster, and easy to read once you’re familiar with the
pattern.


In [43]:

# Turn 5: tries to chain from turn 4, but that response was created with store=False

try:
    resp5 = client.responses.create(
        model="gpt-5-nano",
        input="What was my first question?",
        previous_response_id=resp4.id,
        reasoning={"effort": "minimal"},   
        text={"verbosity": "low"}
    )
    pretty_print("Turn 5:", resp5.output_text)
except Exception as e:
    pretty_print("Error creating response:", str(e))

Error creating response: Error code: 400 - {'error': {'message': "Previous
response with id 'resp_0af9b79078f6f9eb01699d5d81b0788199b3a5a67f9eca7a47' not
found.", 'type': 'invalid_request_error', 'param': 'previous_response_id',
'code': 'previous_response_not_found'}}
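One way to picture why the chain breaks is a toy in-memory model of the server: a response can only be looked up later if it was saved under its id. The class below is purely a mental model written for this notebook. `ToyResponseStore` and its behaviour are assumptions for illustration, not how OpenAI implements storage.

```python
import itertools


class ToyResponseStore:
    """Toy stand-in for server-side conversation storage (illustration only)."""

    def __init__(self):
        self._threads = {}               # response id -> full message history
        self._ids = itertools.count(1)

    def create(self, user_input, previous_response_id=None, store=True):
        if previous_response_id is None:
            history = []
        elif previous_response_id in self._threads:
            # Copy so earlier responses keep their own snapshot of the thread.
            history = list(self._threads[previous_response_id])
        else:
            raise LookupError(
                f"Previous response with id '{previous_response_id}' not found."
            )
        history.append({"role": "user", "content": user_input})
        reply = f"(reply after seeing {len(history)} message(s))"
        history.append({"role": "assistant", "content": reply})
        response_id = f"resp_{next(self._ids)}"
        if store:
            self._threads[response_id] = history
        return response_id, history


server = ToyResponseStore()
id1, _ = server.create("What is a list comprehension?")
id2, history = server.create("Can you give me an example?", previous_response_id=id1)
print(len(history))  # 4

id3, _ = server.create("What is a list comprehension?", store=False)
try:
    server.create("What was my first question?", previous_response_id=id3)
except LookupError as err:
    print("Error:", err)
```

With `store=False` the response id is still returned, but nothing is saved under it, which mirrors the `previous_response_not_found` error above.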


# Nature of Instruction Prompt

## Scenario 1

In [48]:

# Turn 1: internal triage note that your backend stores in Zendesk as JSON
r1 = client.responses.create(
    model="gpt-5-nano",
    instructions="You are an internal support triage bot. Return only valid JSON with keys severity,suspected_causes,next_questions and do not write any customer-facing text.",
    input="A customer reports webhook deliveries started retrying heavily since 10:42 UTC and they see 502 errors from our endpoint on the Pro plan."
)

print("TURN 1:\n", r1.output_text)


TURN 1:
 {
  "severity": "critical",
  "suspected_causes": [
    "Webhooks processing service or gateway outage or crash resulting in 502 responses",
    "Recent deployment or configuration change around 10:42 UTC affecting the webhook path",
    "Networking/DNS issues between the webhook service and the downstream endpoint",
    "TLS/SSL handshake or certificate problems at the edge or with the target endpoint",
    "Degraded/upstream dependency (e.g., database, queue, or external service) causing timeouts",
    "Webhook processing backlog or rate limiting leading to gateway timeouts",
    "Endpoint-specific misconfiguration (e.g., IP allowlist, firewall rules) causing 502 responses"
  ],
  "next_questions": [
    "How many webhook deliveries failed since 10:42 UTC and what are their timestamps?",
    "What exact HTTP status codes have been observed (primarily 502) and any accompanying error messages or request IDs?",
    "Can you provide sample delivery_id, target URL, and region to 

In [49]:

# Turn 2: continue the chain, but DON'T pass instructions again
# Ask for customer-facing email (this conflicts with Turn 1 rules)
r2 = client.responses.create(
    model="gpt-5-nano",
    previous_response_id=r1.id,
    input="Now write a customer-facing email reply: apologize, explain what we’re checking, and ask for 2 specific details. Plain English, not JSON."
)

print("\nTURN 2:\n", r2.output_text)


TURN 2:
 Subject: We’re investigating the webhook 502 errors on your Pro plan

Hi there,

We’re really sorry you’re seeing webhook deliveries retrying and 502 errors since 10:42 UTC. I know how disruptive this can be, and we’re treating it as a top priority.

What we’re checking now
- Webhook gateway health and any deployments or config changes around 10:42 UTC.
- Connectivity from our service to your endpoint, including DNS, TLS/SSL, and any intermediate proxies or firewalls.
- Any backlog or rate limiting on our side and whether downstream services are timing out.
- Logs for deliveries and error responses to understand scope and pattern.

We’ll keep you updated as we learn more.

To help us troubleshoot quickly, could you please share two details:
1) A sample of delivery IDs and their timestamps for the first few failing deliveries (and the corresponding target URL and region if possible).
2) The exact target URL(s) that are failing and the region from which your endpoints are being

## Scenario 2

In [50]:


r1 = client.responses.create(
    model="gpt-5-nano",
    store=True,
    instructions="Output only a Markdown table with columns Category,Score(1-5),Evidence and no text outside the table.",
    input="Interview notes say the candidate built an end-to-end RAG demo on Azure, has strong system design, is weaker on fundamentals like precision/recall, communicates clearly, and gets defensive on feedback."
)

print("TURN 1:\n", r1.output_text)


TURN 1:
 | Category | Score(1-5) | Evidence |
|---|---|---|
| Technical Depth - RAG on Azure | 4 | Built an end-to-end RAG demo on Azure. |
| System Design | 5 | Has strong system design. |
| Fundamentals (Precision/Recall) | 2 | Weaker on fundamentals like precision/recall. |
| Communication | 5 | Communicates clearly. |
| Coachability/Feedback | 2 | Gets defensive on feedback. |


In [51]:

r2 = client.responses.create(
    model="gpt-5-nano",
    previous_response_id=r1.id,
    input="Now draft a polite rejection email in 6-8 sentences with a warm tone and do not include any table."
)

print("\nTURN 2:\n", r2.output_text)


TURN 2:
 Subject: Thank you for your time

Hi [Candidate Name],

Thank you for taking the time to interview for the [Role] position with us. We were impressed by your end-to-end RAG work on Azure and your strong system design. After careful consideration, we have decided not to move forward with your candidacy for this role. This decision reflects the specific needs of this position rather than your abilities overall. We also appreciated your clear and thoughtful communication throughout the process. We would be glad to keep your resume on file and reach out if another opportunity that matches your strengths arises. If you’d like, we can share some resources or brief feedback to support your ongoing job search.

Best regards,
[Your Name]


# Developer Role and Meta Instructions

## Scenario 1

In [52]:
r1 = client.responses.create(
    model="gpt-5-nano",
    store=True,
    input=[
        {"role": "developer", "content": "You are an internal support triage assistant and you must always output only valid JSON with keys severity,suspected_causes,next_questions and never produce customer-facing prose."},
        {"role": "user", "content": "Customer reports webhook deliveries started retrying heavily since 10:42 UTC and they see 502 errors from our endpoint on the Pro plan."}
    ],
)

print("TURN 1:\n", r1.output_text)


TURN 1:
 {
  "severity": "critical",
  "suspected_causes": [
    "Backend webhook receiver is unhealthy or returning 502 due to an upstream error (crash, high latency, or resource contention).",
    "Recent deployment or configuration change affecting the endpoint (routing, load balancer, or proxy misconfiguration).",
    "Upstream dependency failure (database, cache, external API) causing backend to respond with 502.",
    "Network/DNS issues or TLS termination problems between webhook service and the endpoint.",
    "Traffic spike or resource exhaustion on the Pro plan leading to timeouts or upstream errors."
  ],
  "next_questions": [
    "Exact timeframe of impact: did it begin at 10:42 UTC and is it continuous or intermittent since then?",
    "Are all deliveries failing with 502, or only certain endpoints/events? Any successes?",
    "Do you have delivery IDs, event IDs, or correlation IDs for failed webhooks to provide?",
    "Have there been any recent deployments, config chang

In [None]:

r2 = client.responses.create(
    model="gpt-5-nano",
    previous_response_id=r1.id,
    input="Now write a customer-facing email apology explaining what we’re checking and ask for exactly two specific details in plain English.",
    reasoning={"effort": "minimal"},   
    text={"verbosity": "low"}
)

print("\nTURN 2:\n", r2.output_text)


TURN 2:
 {
  "severity": "critical",
  "suspected_causes": [
    "Backend webhook receiver is unhealthy or returning 502 due to an upstream error (crash, high latency, or resource contention).",
    "Recent deployment or configuration change affecting the endpoint (routing, load balancer, or proxy misconfiguration).",
    "Upstream dependency failure (database, cache, external API) causing backend to respond with 502.",
    "Network/DNS issues or TLS termination problems between webhook service and the endpoint.",
    "Traffic spike or resource exhaustion on the Pro plan leading to timeouts or upstream errors."
  ],
  "next_questions": [
    "What is the exact UTC start time when the issue began (for example, 10:42 UTC)?",
    "Has the issue been continuous since then, or has it been intermittent (please describe any observable pattern)?"
  ]
}


## Scenario 2

In [54]:
r1 = client.responses.create(
    model="gpt-5-nano",
    store=True,
    input=[
        {"role": "developer", "content": "You are an interviewer note assistant and you must always output only a Markdown table with columns Category,Score(1-5),Evidence and no text outside the table."},
        {"role": "user", "content": "Interview notes: candidate built an end-to-end RAG demo on Azure, strong system design, confused precision vs recall, communicates clearly, slightly defensive on feedback."}
    ],
)
print("TURN 1:\n", r1.output_text)

TURN 1:
 | Category | Score(1-5) | Evidence |
| --- | ---: | --- |
| Technical Skill: End-to-end RAG on Azure | 5 | Built an end-to-end RAG demo on Azure. |
| System Design | 5 | Strong system design. |
| Understanding of Metrics (Precision vs Recall) | 2 | Confused precision vs recall. |
| Communication | 4 | Communicates clearly. |
| Feedback Receptiveness | 3 | Slightly defensive on feedback. |

In [55]:

r2 = client.responses.create(
    model="gpt-5-nano",
    previous_response_id=r1.id,
    input="Now draft a polite rejection email in 6-8 sentences with a warm tone.",
)

print("\nTURN 2:\n", r2.output_text)

TURN 2:
 | Category | Score(1-5) | Evidence |
| --- | ---: | --- |
| Technical Skill: End-to-end RAG on Azure | 5 | Built an end-to-end RAG demo on Azure. |
| System Design | 5 | Strong system design. |
| Understanding of Metrics (Precision vs Recall) | 2 | Confused precision vs recall. |
| Communication | 4 | Communicates clearly. |
| Feedback Receptiveness | 3 | Slightly defensive on feedback. |
| Hiring Decision | 3 | Strong technical skills; concerns about receptiveness to feedback. |


In [56]:



r2_1 = client.responses.create(
    model="gpt-5-nano",
    previous_response_id=r1.id,
    input="Now draft a polite rejection email in 6-8 sentences with a warm tone and I want reply in simple text, no fancy tables or nothing, got it? . ",
)

print("\nTURN 2:\n", r2_1.output_text)


TURN 2:
 | Category | Score(1-5) | Evidence |
| --- | ---: | --- |
| Technical Skill: End-to-end RAG on Azure | 5 | Built an end-to-end RAG demo on Azure. |
| System Design | 5 | Strong system design. |
| Understanding of Metrics (Precision vs Recall) | 2 | Confused precision vs recall. |
| Communication | 4 | Communicates clearly. |
| Feedback Receptiveness | 3 | Slightly defensive on feedback. |
| Hiring Decision | 3 | Strong technical skills; concerns about receptiveness to feedback. |


In [58]:
r2_1 

Response(id='resp_0df9cc8e357bb78a00699d634c3b748190accbe063d5088ff0', created_at=1771922252.0, error=None, incomplete_details=None, instructions=None, metadata={}, model='gpt-5-nano-2025-08-07', object='response', output=[ResponseReasoningItem(id='rs_0df9cc8e357bb78a00699d634ca5fc819090eaf8e853972d44', summary=[], type='reasoning', content=None, encrypted_content=None, status=None), ResponseOutputMessage(id='msg_0df9cc8e357bb78a00699d635fc0d4819091df080029a69b21', content=[ResponseOutputText(annotations=[], text='| Category | Score(1-5) | Evidence |\n| --- | ---: | --- |\n| Overall Impression | 3 | Solid technical skills and clear communication; some concerns on metrics clarity and receptiveness to feedback. |\n| Technical Skill (RAG & Azure) | 4 | Built an end-to-end RAG demo on Azure; strong system design. |\n| System Design | 5 | Demonstrates strong, scalable design thinking. |\n| Metrics Understanding (Precision vs Recall) | 2 | Confused between precision and recall; needs clarifi

# How `instructions` and the developer prompt combine in real-world use cases

## Scenario 1: Helpdesk ticket pipeline (route the ticket → reply to customer)

You want the assistant to always obey company policy (developer), but you sometimes need strict JSON for automation and sometimes a human email for the customer. If you put “JSON-only” in developer, you’d break the email step; if you put policy in instructions, you must resend it every call.

In [62]:
r1 = client.responses.create(
    model="gpt-5-nano",
    input=[
        {"role": "developer", "content": "You are ACME Support; never invent account-specific facts; if info is missing ask at most two clarifying questions; do not reveal internal policies or tools."},
        {"role": "user", "content": "Customer: Since 10:42 UTC our webhooks keep retrying and we see lots of 502s; started after we enabled v2 signing; impact is EU customers."},
    ],
    instructions="Return only valid JSON with keys queue, severity, suspected_component, next_questions.",
)

print(r1.output_text)

{
  "queue": "webhooks-delivery",
  "severity": "high",
  "suspected_component": "Webhook delivery service (v2 signature validation)",
  "next_questions": [
    "Could you share the webhook delivery IDs or logs (e.g., Delivery-ID or x-request-id) for the 502 responses to help us trace the failing deliveries and identify affected EU endpoints?",
    "Did the issue start exactly at 10:42 UTC when enabling v2 signing, and have you tested with v1 or adjusted signature validation to verify whether the problem is tied to v2 signing?"
  ]
}
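Because this JSON feeds an automated routing step, it is worth validating before acting on it. The sketch below parses the text and checks for the four keys requested in `instructions`. `parse_triage` is a hypothetical helper, and the sample string is a shortened stand-in for `r1.output_text`.

```python
import json

REQUIRED_KEYS = {"queue", "severity", "suspected_component", "next_questions"}


def parse_triage(output_text):
    """Parse the model's JSON reply and check that the expected keys are present."""
    data = json.loads(output_text)  # raises json.JSONDecodeError on invalid JSON
    if not isinstance(data, dict):
        raise ValueError("Expected a JSON object")
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"Missing keys: {sorted(missing)}")
    return data


sample = (
    '{"queue": "webhooks-delivery", "severity": "high", '
    '"suspected_component": "Webhook delivery service", "next_questions": []}'
)
ticket = parse_triage(sample)
print(ticket["queue"], ticket["severity"])  # webhooks-delivery high
```

In the pipeline you would call `parse_triage(r1.output_text)` and route the ticket only if no exception is raised.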


In [63]:
r2 = client.responses.create(
    model="gpt-5-nano",
    previous_response_id=r1.id,
    input="Write a customer-facing email that acknowledges impact, says what we’re checking, and asks exactly two specific questions.",
    instructions="Write plain English email text and do not output JSON.",
)

print(r2.output_text)

Subject: Update on webhook failures affecting EU customers

Hello,

We’re sorry for the impact this is having on your EU customers. Since around 10:42 UTC, webhook deliveries have been retrying with 502 responses after enabling v2 signing. We’re actively investigating.

What we’re checking:
- Webhook delivery path and recent changes related to v2 signature validation.
- Endpoint routing and EU-region delivery to identify affected deliveries and verify whether the issue is isolated to certain endpoints.

Two questions for you:
- Could you share the webhook delivery IDs or logs (Delivery-ID or x-request-id) for the 502 responses to help us trace the failing deliveries and identify affected EU endpoints?
- Did the issue start exactly at 10:42 UTC when enabling v2 signing, and have you tested with v1 or adjusted signature validation to verify whether the problem is tied to v2 signing?

Best regards,
ACME Support


What this demonstrates in practice:

1. The developer policy stays in effect across both calls (because it’s part of the thread).

2. The instructions cleanly switch “mode” (JSON → email) because instructions are per-call and aren’t carried forward when you use previous_response_id.
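The two points above can be pictured with a toy model of what the model sees on each call: the thread (including the developer message) accumulates, while `instructions` is attached only to the current call. This is a mental model under that assumption, not the service's actual implementation. `build_context` and the `"instructions"` pseudo-role are made up for illustration.

```python
def build_context(thread, user_input, instructions=None):
    """Return (what the model sees on this call, what is kept in the thread)."""
    kept = thread + [{"role": "user", "content": user_input}]
    # Instructions are prepended for this call only and are not stored.
    per_call = [{"role": "instructions", "content": instructions}] if instructions else []
    return per_call + kept, kept


thread = [{"role": "developer", "content": "You are ACME Support; never invent account-specific facts."}]

# Call 1: JSON mode
seen1, thread = build_context(thread, "Customer: webhooks keep retrying with 502s.", "Return only valid JSON.")
thread = thread + [{"role": "assistant", "content": "(JSON triage output)"}]

# Call 2: email mode, chained on the same thread
seen2, thread = build_context(thread, "Write a customer-facing email.", "Write plain English email text.")

print([m["role"] for m in seen2])
# ['instructions', 'developer', 'user', 'assistant', 'user']
print(any(m["content"] == "Return only valid JSON." for m in seen2))  # False
```

The developer message is still present on the second call, while the JSON-only instruction from the first call is gone.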

## Scenario 2: One DB assistant, multiple “surfaces” (Slack answer → Jira incident update → internal runbook)

Why you need both:
Same knowledge + rules, but each surface needs a different output contract (short Slack reply, structured Jira update, detailed runbook). You don’t want those formatting rules to permanently pollute the thread as more developer messages; you want them to be ephemeral and swapped per action.

In [64]:
r1 = client.responses.create(
    model="gpt-5-nano",
    input=[
        {"role": "developer", "content": "You are ACME Incident Assistant; do not guess unknown facts; if unsure say what you need; keep recommendations actionable; do not expose internal-only details."},
        {"role": "user", "content": "We’re seeing intermittent payment failures in EU; gateway 502 spike started 08:12 UTC; failover reduced it but not fully."},
    ],
    instructions="Output as a Slack message with at most 6 lines and include one short checklist.",
)

print(r1.output_text)

Incident snapshot: Intermittent EU payment failures; 502 spike on gateway started 08:12 UTC; failover reduced but not fully resolved.

What we need to confirm now: which component is returning 502 (edge gateway vs downstream processor); region impact scope.

Data to collect: latest 502 counts, latency, upstream errors, end-to-end traces, and failover path health (08:12–now).

Immediate actions: pull gateway logs, verify load balancer and DNS config, check upstream provider status/outages.

Potential mitigations: ensure failover path is active, consider diverting to a secondary gateway, review retry/backoff settings.

Checklist:
- Gather current impact metrics
- Confirm 502 origin
- Check failover health
- Check upstream/provider status


In [65]:
r2 = client.responses.create(
    model="gpt-5-nano",
    previous_response_id=r1.id,
    input="Convert this into a Jira incident update.",
    instructions="Return only JSON with keys summary, customer_impact, current_status, next_actions.",
)

print(r2.output_text)

{
  "summary": "EU payment failures: 502 gateway spike since 08:12 UTC; failover partially effective",
  "customer_impact": "Intermittent payment processing in the EU region; some transactions fail with 502 errors, potentially impacting checkout and revenue; customers may experience payment disruption.",
  "current_status": "Status: Investigating intermittent 502 errors in EU payments. Gateway spike started 08:12 UTC; failover provides partial mitigation but full resolution not yet achieved. No confirmed root cause; data collection in progress to identify origin and scope.",
  "next_actions": [
    "Collect current impact metrics: 502 incidence, latency, upstream errors, and end-to-end traces.",
    "Determine 502 origin: edge gateway vs downstream processor.",
    "Validate failover path health: verify load balancer configuration, DNS routing, and failover status.",
    "Check upstream/provider status pages and incident reports; reach out to providers if outages are suspected.",
    "

In [None]:
r3 = client.responses.create(
    model="gpt-5-nano",
    previous_response_id=r2.id,
    input="Now write the on-call runbook section for investigating this failure pattern.",
    instructions="Write Markdown with headings and include example commands and what signals to look for.",
)

print(r3.output_text)
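
The three cells above repeat one pattern: the same thread, with different `instructions` for each surface. A minimal sketch of that pattern as a helper follows. `SURFACE_INSTRUCTIONS` and `render_surface` are hypothetical names introduced here; the function assumes only a client exposing `responses.create(...)` with the keyword arguments used in the cells above, and the stub client exists solely so the sketch runs without network access.

```python
from types import SimpleNamespace

# Per-surface output contracts, copied from the instructions used in the cells above.
SURFACE_INSTRUCTIONS = {
    "slack": "Output as a Slack message with at most 6 lines and include one short checklist.",
    "jira": "Return only JSON with keys summary, customer_impact, current_status, next_actions.",
    "runbook": "Write Markdown with headings and include example commands and what signals to look for.",
}


def render_surface(client, surface, user_input, previous_response_id=None, model="gpt-5-nano"):
    """Make one call whose output format is chosen by `surface`."""
    kwargs = {
        "model": model,
        "input": user_input,
        "instructions": SURFACE_INSTRUCTIONS[surface],  # re-sent on every call
    }
    if previous_response_id is not None:
        kwargs["previous_response_id"] = previous_response_id  # continue the thread
    return client.responses.create(**kwargs)


class _StubResponses:
    """Records calls instead of contacting the API (illustration only)."""

    def __init__(self):
        self.calls = []

    def create(self, **kwargs):
        self.calls.append(kwargs)
        return SimpleNamespace(id=f"resp_{len(self.calls)}", output_text="(stub)")


stub_client = SimpleNamespace(responses=_StubResponses())
s1 = render_surface(stub_client, "slack", "EU payment failures; 502 spike since 08:12 UTC.")
s2 = render_surface(stub_client, "jira", "Convert this into a Jira incident update.", s1.id)
print(stub_client.responses.calls[1]["previous_response_id"])
```

With the real `client` from earlier, `render_surface(client, "runbook", "...", previous_response_id=r2.id)` would correspond to the third cell.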

![Illustration](https://github.com/shivam13juna/language_model_api_v2/blob/main/llm_multi_modality/instruction_vs_developer.png)


# How to handle images

In [None]:

import base64

# Read and base64-encode a local image.
# Only needed for the commented-out data-URL variant below; adjust the path for your machine.
image_path = '/Users/shivam13juna/Documents/scaler/iitr_classes/llm_ref/llm_conversations/ss.jpeg'
with open(image_path, 'rb') as img_file:
    image_data = base64.standard_b64encode(img_file.read()).decode('utf-8')

# Use chat.completions.create with vision
vision_response = client.chat.completions.create(
    model="gpt-5-nano",
    messages=[
        {
            "role": "system",
            "content": "You are an art critic who provides gentle feedback for children's illustrations."
        },
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Analyze this magical illustration."
                },
                {
                    "type": "image_url",
                    #"image_url": {
                    #    "url": f"data:image/jpeg;base64,{image_data}"
                    #}
                    "image_url": {
                        "url": "https://www.animesenpai.net/wp-content/uploads/2023/12/sef-min.png.webp"
                    }
                }
            ]
        }
    ],
    temperature=0.5
)

pretty_print("\n=== Vision Analysis ===")
pretty_print(vision_response.choices[0].message.content)


=== Vision Analysis ===
This illustration features a striking and dynamic character design that evokes a sense of mystery and power. The character's skeletal features, combined with a muscular build, create an intriguing contrast between strength and an ethereal quality. The use of dark tones and shadows adds to the overall atmosphere, enhancing the sense of drama.

The flowing lines of the character’s limbs and the wispy elements around them suggest movement and fluidity, which can draw the viewer's eye across the composition. The background, with its blurred lights, hints at a larger world beyond the character, adding depth to the scene.

For improvement, consider incorporating more color variety to enhance visual interest. Adding highlights or contrasting colors could help to emphasize certain features and create a more vibrant atmosphere. Additionally, exploring different facial expressions or poses could further convey the character's personality and intentions.

Overall, this il

In [67]:
# Use responses API for vision analysis
response_vision = client.responses.create(
    model="gpt-5-nano",
    instructions="You are an art critic who provides gentle feedback for children's illustrations.",
    input=[
        {
            "role": "user",
            "content": [
                {
                    "type": "input_text",
                    "text": "Analyze this magical illustration."
                },
                {
                    "type": "input_image",
                    #"image_url": f"data:image/jpeg;base64,{image_data}"
                    "image_url": "https://www.animesenpai.net/wp-content/uploads/2023/12/sef-min.png.webp"
                }
            ]
        }
    ]
)

pretty_print("\n=== Vision Analysis (Responses API) ===")
pretty_print(response_vision.output_text)

 === Vision Analysis (Responses API) ===
What a striking, magical presence this illustration has. It reads as a powerful,
otherworldly guardian emerging from the mist.  What works well - Mood and
atmosphere: The cool blue-gray palette with fog and soft bokeh lights creates a
mysterious, dreamlike setting that feels magical rather than ordinary. -
Silhouette and anatomy: The strong, muscular upper body and the rib-like chest
design give the creature a dramatic, heroic quality. The curved lines of the
arms, tail, and shoulder accents add a sense of motion and otherworldly grace. -
Contrast and depth: The dark background against the lighter figure helps it pop,
while the fog layers add depth and a sense of scale. - Details that hint at
story: The jagged crest on the head, the spiky appendages, and the tail all
suggest a fantasy creature with a history, inviting curiosity about its world.
What could be considered (gentle tweaks for a more child-friendly context) -
Expression and approachab
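
Both vision cells leave the base64 variant commented out. A minimal sketch of building that `data:` URL from a local file, for cases where the image is not publicly hosted, is shown below. `to_data_url` is a hypothetical helper name, and guessing the MIME type from the file extension with a JPEG fallback is an assumption of this sketch.

```python
import base64
import mimetypes
from pathlib import Path


def to_data_url(path):
    """Encode a local image file as a base64 data URL."""
    path = Path(path)
    # Guess the MIME type from the extension; fall back to JPEG if unknown.
    mime, _ = mimetypes.guess_type(path.name)
    encoded = base64.b64encode(path.read_bytes()).decode("utf-8")
    return f"data:{mime or 'image/jpeg'};base64,{encoded}"
```

Following the commented-out lines in the cells above, the result would go into `{"type": "image_url", "image_url": {"url": to_data_url(image_path)}}` for Chat Completions, or `{"type": "input_image", "image_url": to_data_url(image_path)}` for the Responses API.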

In [None]:
resp = client.responses.create(
    model="gpt-5-nano",
    reasoning={"effort": "medium", "summary": "auto"},  # "low" | "medium" | "high"
    input="Solve carefully: If a train goes 60 km/h for 2.5 hours, how far?"
)

resp

Response(id='resp_0d5e63a19a26c312006999917c86708190b2c7a1b7fb3f465f', created_at=1771671932.0, error=None, incomplete_details=None, instructions=None, metadata={}, model='gpt-5-nano-2025-08-07', object='response', output=[ResponseReasoningItem(id='rs_0d5e63a19a26c312006999917d22a0819083a676f0774c4d8d', summary=[Summary(text='**Calculating distance with care**\n\nI need to respond to the user about how far a train travels at 60 km/h for 2.5 hours. The formula is distance = speed × time. So, I calculate: 60 km/h × 2.5 h = 150 km. It\'s important to show the calculation for clarity. The user emphasizes, "Solve carefully," so I’ll be succinct and include this formula clearly. I’ll finalize it with: The answer is 150 kilometers, assuming constant speed.', type='summary_text')], type='reasoning', content=None, encrypted_content=None, status=None), ResponseOutputMessage(id='msg_0d5e63a19a26c312006999918127808190b0bd779a3a829e1a', content=[ResponseOutputText(annotations=[], text='Distance = s

In [None]:
# pretty_print the response and usage details
pretty_print("Response:", resp.output_text)
pretty_print("Usage:", resp.usage)
pretty_print("Reasoning details:", resp.reasoning)
# resp.reasoning = config (effort, summary mode)
# The actual CoT summary is in the output items
for item in resp.output:
    if item.type == "reasoning":
        for s in item.summary:
            pretty_print("Chain of Thought:", s.text)

Response: Distance = speed × time = 60 km/h × 2.5 h = 150 km.

Assuming the train maintains a constant speed.
Usage: ResponseUsage(input_tokens=27, input_tokens_details=InputTokensDetails(cached_tokens=0), output_tokens=264, output_tokens_details=OutputTokensDetails(reasoning_tokens=192), total_tokens=291)
Reasoning details: Reasoning(effort='medium', generate_summary=None, summary='detailed')
Chain of Thought: **Calculating distance with care**

I need to respond to the user about how far a train travels at 60 km/h for 2.5 hours. The formula is distance = speed × time. So, I calculate: 60 km/h × 2.5 h = 150 km. It's important to show the calculation for clarity. The user emphasizes, "Solve carefully," so I’ll be succinct and include this formula clearly. I’ll finalize it with: The answer is 150 kilometers, assuming constant speed.
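
The usage line above shows that most of the output tokens went to reasoning rather than to the visible answer. A small sketch for pulling that split out of a usage object follows. `reasoning_share` is a hypothetical helper; it relies only on the attribute names visible in the printed `ResponseUsage` above, and the stand-in object mirrors those printed numbers so the sketch runs without an API call.

```python
from types import SimpleNamespace


def reasoning_share(usage):
    """Split output tokens into reasoning tokens and visible tokens."""
    reasoning = usage.output_tokens_details.reasoning_tokens
    visible = usage.output_tokens - reasoning
    share = reasoning / usage.output_tokens if usage.output_tokens else 0.0
    return {"reasoning": reasoning, "visible": visible, "share": round(share, 3)}


# Stand-in with the numbers from the usage output above.
example_usage = SimpleNamespace(
    output_tokens=264,
    output_tokens_details=SimpleNamespace(reasoning_tokens=192),
)
print(reasoning_share(example_usage))
# → {'reasoning': 192, 'visible': 72, 'share': 0.727}
```

With a live response, `reasoning_share(resp.usage)` would give the same kind of breakdown.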


# Some Bonus Content

## Image Generation

```python
import base64
import os

from openai import OpenAI

client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

prompt = "A friendly robot tutor teaching Python in a bright classroom, cartoon style"

result = client.images.generate(
    model="gpt-image-1",
    prompt=prompt,
    size="1024x1024",   # gpt-image-1 accepts 1024x1024, 1536x1024, 1024x1536, or "auto"
    quality="medium",   # "low" | "medium" | "high" | "auto"
    n=1
)

# gpt-image-1 returns base64-encoded image data rather than a URL
image_bytes = base64.b64decode(result.data[0].b64_json)

# Save locally
with open("robot_tutor.png", "wb") as f:
    f.write(image_bytes)

print("Saved image to robot_tutor.png")
```


## Text to Speech

```python
from pathlib import Path

speech_file_path = Path("tutor_voice.mp3")

# Stream the synthesized audio straight to disk (reuses `client` from above)
with client.audio.speech.with_streaming_response.create(
    model="gpt-4o-mini-tts",
    voice="alloy",  # other built-in voices include verse and coral
    input="Hello students! Today we will explore AI that can see, listen, and speak."
) as response:
    response.stream_to_file(speech_file_path)

print("Saved speech to:", speech_file_path)
```


## Speech to Text

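```python
from pathlib import Path

audio_file_path = Path("student_question.mp3")

# Open the audio in binary mode; the context manager closes the file afterwards.
# The transcription result itself is a plain object, not a context manager.
with open(audio_file_path, "rb") as audio_file:
    transcription = client.audio.transcriptions.create(
        model="gpt-4o-transcribe",
        file=audio_file
    )

print("Transcribed text:", transcription.text)
```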


### Speech to Text with response

```python
# Step 1: Transcribe (reuses `audio_file_path` from the previous block)
with open(audio_file_path, "rb") as audio_file:
    transcription = client.audio.transcriptions.create(
        model="gpt-4o-transcribe",
        file=audio_file
    )
user_text = transcription.text

# Step 2: Feed the transcript into the Chat Completions API
messages = [{"role": "system", "content": "You are a helpful multimodal tutor."},
            {"role": "user", "content": user_text}]

resp = client.chat.completions.create(model="gpt-4o-mini", messages=messages)
print("Assistant:", resp.choices[0].message.content)
```


## Visual Reasoning with Groq

```python
from groq import Groq
import os

groq_client = Groq(api_key=os.getenv("GROQ_API_KEY"))

image_url = "https://upload.wikimedia.org/wikipedia/commons/thumb/3/3b/Example.png/320px-Example.png"

messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image in detail."},
            {"type": "image_url", "image_url": {"url": image_url}}
        ]
    }
]

resp = groq_client.chat.completions.create(
    model="meta-llama/llama-4-scout-17b-16e-instruct",
    messages=messages
)

print("Image analysis:", resp.choices[0].message.content)
```


## PlayHT

```python
import os

import requests

# Credentials come from the environment, like the OpenAI key
PLAYHT_API_KEY = os.getenv("PLAYHT_API_KEY")
PLAYHT_USER_ID = os.getenv("PLAYHT_USER_ID")

url = "https://api.play.ht/api/v2/tts"
headers = {
    "Authorization": f"Bearer {PLAYHT_API_KEY}",
    "X-User-Id": PLAYHT_USER_ID,
    "Content-Type": "application/json"
}

payload = {
    "voice": "en_us_male_1",
    "content": ["Hello! I can speak with a PlayHT voice."],
    "format": "mp3"
}

response = requests.post(url, headers=headers, json=payload)
print(response.json())  # Contains URL to generated audio
```