# OpenAI Responses API

Welcome to AI Makerspace! This notebook demonstrates the **OpenAI Responses API**, a streamlined interface for working with OpenAI's language models. Simple text generation needs only a model name and an input, with no tool schemas to define.

The Responses API offers several key advantages:
- **Simplified interface** - Direct text generation without tool schemas
- **Built-in reasoning controls** - Adjust effort levels for different use cases  
- **Structured output parsing** - Native Pydantic model support
- **Multimodal capabilities** - Text and image inputs in a single request
- **Streaming support** - Real-time response generation
- **Developer instructions** - System-level guidance separate from user content

This notebook walks through the core features with practical examples, showing how you can integrate this API into your AI applications for cleaner, more maintainable code.


### Setup and Authentication

First, we need to set up our OpenAI API credentials. We'll use `getpass` to securely input the API key without exposing it in the notebook.


In [12]:
import os
import getpass

os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API Key:")

OpenAI API Key: ········

Now we'll initialize the OpenAI client using our API key. This client will be used for all subsequent API calls.

In [13]:
from openai import OpenAI
client = OpenAI()
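
If the notebook is re-run often, prompting for the key every time gets tedious. The following is a minimal sketch, not part of the original notebook, that prompts only when the variable is missing. The helper name `ensure_api_key` and its `prompt_fn` parameter are our own illustrative choices.

```python
import getpass
import os


def ensure_api_key(env_var="OPENAI_API_KEY", prompt_fn=getpass.getpass):
    """Return the key from the environment, prompting only when it is missing.

    `ensure_api_key` is a hypothetical helper for this notebook, not an SDK call.
    """
    key = os.environ.get(env_var)
    if not key:
        # Hidden prompt, as in the cell above; the typed value is not echoed.
        key = prompt_fn(f"{env_var}: ")
        os.environ[env_var] = key
    return key


# Usage (commented out so the sketch does not prompt when run as-is):
# ensure_api_key()
# client = OpenAI()  # picks up OPENAI_API_KEY from the environment
```

Passing the prompt function as a parameter keeps the helper easy to exercise in tests, where no terminal is available.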

### Basic Response Generation

Let's start with a simple example using the `responses.create()` method, the core interface of the Responses API. A model name and an input string are all it needs, which is less boilerplate than building a message list for the Chat Completions API.


In [16]:
response = client.responses.create(
    model="gpt-5",
    input="Define what 'AI Engineering' is."
)

print(response.output_text)

AI engineering is the discipline of designing, building, deploying, and operating AI-enabled systems using rigorous software and systems engineering practices so they work reliably, safely, and at scale in real-world contexts.

It blends machine learning, data engineering, and software engineering to manage the full lifecycle of AI, including:
- Data management: collection, labeling, quality, lineage, and governance
- Model development: training, evaluation, experiment tracking, and reproducibility
- Infrastructure: training/serving platforms, performance, cost, and scalability
- MLOps: pipelines, CI/CD for models, monitoring, alerts, rollback, and drift handling
- Safety and trust: robustness, security, privacy, fairness, interpretability, and compliance
- Human-in-the-loop and UX: integrating AI with user workflows and oversight
- Product integration: APIs, services, edge/cloud deployment, and architecture
- Lifecycle governance: versioning, documentation, audits, and incident respon
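
The `output_text` attribute printed above is a convenience: the response itself carries a list of output items, and, as far as we understand the SDK, `output_text` joins the text parts of the message items. Here is a rough sketch of that aggregation over a plain dictionary. The `sample` payload is hand-written for illustration and is not real API output.

```python
def collect_output_text(response_json):
    """Join every `output_text` part found in the message items of a response."""
    parts = []
    for item in response_json.get("output", []):
        # Other item types (for example reasoning items) can appear in `output`; skip them.
        if item.get("type") != "message":
            continue
        for content in item.get("content", []):
            if content.get("type") == "output_text":
                parts.append(content["text"])
    return "".join(parts)


# Hypothetical payload in the general shape of a Responses API result.
sample = {
    "output": [
        {"type": "reasoning", "summary": []},
        {
            "type": "message",
            "role": "assistant",
            "content": [{"type": "output_text", "text": "AI engineering is ..."}],
        },
    ]
}

print(collect_output_text(sample))
```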

### Reasoning Control and Instructions

One of the powerful features of the Responses API is the ability to control the reasoning effort and provide developer instructions. Here we're using:

- `reasoning={"effort": "low"}` - Controls how much computational effort the model puts into reasoning
- `instructions` - Developer-level instructions that guide the model's behavior (separate from user content)


In [17]:
response = client.responses.create(
    model="gpt-5",
    reasoning={"effort": "low"},
    instructions="Talk like a manager.",
    input="How to write an efficient loop with NumPy?",
)

print(response.output_text)

Short answer: in NumPy, the most efficient “loop” is no Python loop at all. You push the work into vectorized, compiled NumPy operations and BLAS calls. Here’s how to do that in practice.

What to do instead of loops
- Use vectorized ufuncs for elementwise math.
  - Example: y = a*x + b becomes y = a * x + b (x is an array).
- Use broadcasting to align shapes without copying.
  - Example: add a column vector to every row: A + b[None, :]
- Use reductions and aggregations provided by NumPy.
  - sum, mean, min/max, argmin/argmax, cumsum, nansum, etc.
- Prefer linear algebra operations that call BLAS/LAPACK.
  - A @ B, np.dot, np.linalg.solve, np.linalg.norm, np.einsum/np.tensordot.
- Use boolean masks instead of looping over conditions.
  - y = np.where(x > 0, x, 0); A[A < 0] = 0
- Use advanced indexing rather than per-element assignments.
  - A[idx_rows[:, None], idx_cols] = values
- For sliding windows, avoid loops with stride tricks.
  - windows = np.lib.stride_tricks.sliding_window_vi
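
When several cells share the same options, it can help to assemble the keyword arguments in one place. This is a small sketch under our own assumptions: `build_request` is a hypothetical helper, and the `EFFORT_LEVELS` tuple lists the levels we believe GPT-5 accepts, so check it against the model you are using.

```python
# "low" and "minimal" appear in this notebook; "medium" and "high" are assumed
# to be the remaining levels. The accepted set may differ by model.
EFFORT_LEVELS = ("minimal", "low", "medium", "high")


def build_request(user_input, model="gpt-5", effort=None, instructions=None):
    """Assemble keyword arguments for `client.responses.create(**kwargs)`."""
    kwargs = {"model": model, "input": user_input}
    if effort is not None:
        if effort not in EFFORT_LEVELS:
            raise ValueError(f"unknown effort {effort!r}; expected one of {EFFORT_LEVELS}")
        kwargs["reasoning"] = {"effort": effort}
    if instructions is not None:
        # Developer-level guidance, kept separate from the user input.
        kwargs["instructions"] = instructions
    return kwargs


kwargs = build_request(
    "How to write an efficient loop with NumPy?",
    effort="low",
    instructions="Talk like a manager.",
)
print(kwargs)
# With the `client` from the setup cell:
# response = client.responses.create(**kwargs)
```

Validating the effort level locally surfaces typos before a request is sent.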

### Message-Based Input Format

The Responses API also supports the familiar message format with roles. Here we demonstrate:

- Using `input` as a list of message objects instead of a simple string
- `developer` role - A new role type for system-level instructions
- `user` role - Standard user input

This approach gives you more granular control over the conversation structure while maintaining the simplified API interface.

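> NOTE: This is *roughly* equivalent in structure to the above cell - the `developer` message plays the same role as the `instructions` parameter. The persona differs, though: the previous cell asked for a manager, this one asks for a wizard.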


In [18]:
response = client.responses.create(
    model="gpt-5",
    reasoning={"effort": "low"},
    input=[
        {
            "role": "developer",
            "content": "Talk like a wizard."
        },
        {
            "role": "user",
            "content": "How to write an efficient loop with NumPy?"
        }
    ]
)

print(response.output_text)

Ah, seeker of speed, heed these arcane runes: with NumPy, the mightiest loop is oft no loop at all. Invoke the powers of vectorization and broadcasting, and your spells shall fly.

Principles of efficiency
- Prefer whole-array operations (vectorization) over Python for-loops.
- Use broadcasting to combine arrays without explicit iteration.
- Wield ufuncs (universal functions) and reductions (sum, mean, min, max) along axes.
- Preallocate arrays; shun repeated append/concatenate within loops.
- Favor in-place updates (out=..., arr += ...) to spare memory.
- Keep dtypes numeric (no object), and arrays contiguous when possible.
- If a Python loop is truly needed, consider Numba to JIT-compile it.

Common patterns and their incantations
1) Elementwise transformations
- Slow loop:
  for i in range(n): y[i] = a*x[i] + b
- Fast vectorization:
  y = a*x + b

2) Conditional logic
- if/else per element:
  y = np.where(x > 0, np.log1p(x), 0.0)
- Boolean masks:
  y = np.empty_like(x)
  mask = x > 
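
To make the relationship between the two input styles concrete, here is a small sketch that turns an instructions string plus a user string into the message list used above. `to_messages` is a hypothetical helper of ours, not part of the SDK.

```python
def to_messages(user_text, instructions=None):
    """Express optional instructions plus a user string as role-tagged messages."""
    messages = []
    if instructions:
        # The `developer` role carries the guidance that `instructions` would carry.
        messages.append({"role": "developer", "content": instructions})
    messages.append({"role": "user", "content": user_text})
    return messages


messages = to_messages(
    "How to write an efficient loop with NumPy?",
    instructions="Talk like a wizard.",
)
print(messages)
# With the `client` from the setup cell:
# response = client.responses.create(model="gpt-5", input=messages)
```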

### Structured Output with Pydantic

One of the most exciting features is native structured output parsing using Pydantic models. Instead of trying to parse JSON from text responses, the API can directly return structured data.

Key features:
- `responses.parse()` method for structured output
- `text_format` parameter accepts Pydantic models
- Automatic validation and type checking
- Clean, typed responses that integrate seamlessly with Python applications


In [9]:
from pydantic import BaseModel

class CalendarEvent(BaseModel):
    name: str
    date: str
    participants: list[str]

response = client.responses.parse(
    model="gpt-5",
    input=[
        {
            "role": "developer",
            "content": "Extract the event information."
        },
        {
            "role": "user",
            "content": "Alice and Bob are going to the AI Engineering Bootcamp kickoff on September 9th at 7PM. Registration closes Tuesday (9/9) at Noon EDT.",
        },
    ],
    text_format=CalendarEvent,
)

print(response.output_parsed)

name='AI Engineering Bootcamp kickoff' date='September 9 at 7:00 PM' participants=['Alice', 'Bob']
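
For contrast, this is roughly what the manual approach looks like when a model returns JSON as plain text. It is a sketch using only the standard library; `CalendarEventManual` and `parse_event` are illustrative names of ours, and the `raw` string is hand-written to mirror the output above.

```python
import json
from dataclasses import dataclass


@dataclass
class CalendarEventManual:
    name: str
    date: str
    participants: list


def parse_event(raw_text):
    """Parse and check a JSON string by hand; raises ValueError on bad input."""
    try:
        data = json.loads(raw_text)
    except json.JSONDecodeError as exc:
        raise ValueError(f"model did not return valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = {"name", "date", "participants"} - data.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    if not isinstance(data["participants"], list):
        raise ValueError("participants must be a list")
    return CalendarEventManual(data["name"], data["date"], data["participants"])


raw = (
    '{"name": "AI Engineering Bootcamp kickoff", '
    '"date": "September 9 at 7:00 PM", "participants": ["Alice", "Bob"]}'
)
print(parse_event(raw))
```

Every check in `parse_event` is something the Pydantic model handles for us when we pass it as `text_format`.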


### Multimodal Input - Text and Images

The Responses API provides excellent support for multimodal inputs, allowing you to combine text and images in a single request. This example shows:

- `input_text` and `input_image` content types
- Direct image URL support
- Unified processing of text and visual information

This makes it easy to build applications that need to analyze images, documents, or other visual content alongside text instructions.


In [10]:
response = client.responses.create(
    model="gpt-5",
    input=[
        {
            "role": "user",
            "content": [
                {
                    "type": "input_text",
                    "text": "Describe the visual appearance of the people in this image.",
                },
                {
                    "type": "input_image",
                    "image_url": "https://d2426xcxuh3ht5.cloudfront.net/rnDUeTj9Sqysz9tCeGdB_AIE_Cohort_7_Banner.jpg"
                }
            ]
        }
    ]
)

print(response.output_text)

- Left person: Light-skinned man, smiling with teeth showing. He wears a black hoodie and a black-and-white patterned baseball cap facing forward. Short facial hair/stubble. He’s outlined with a pink glow.
- Right person: Light-skinned man with a short goatee and mustache, hair pulled back. He’s smiling softly with arms crossed. He wears a blue-and-white tie-dye style polo shirt and a dark smartwatch on his left wrist. He’s outlined with a blue glow.
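
The example above points at a hosted image. For a local file, our understanding is that `image_url` also accepts a base64 data URL; treat that as an assumption to confirm against the API documentation. A minimal sketch, with `to_data_url` and `image_message` as hypothetical helper names:

```python
import base64


def to_data_url(image_bytes, mime_type="image/jpeg"):
    """Encode raw image bytes as a `data:` URL."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def image_message(prompt, image_url):
    """Build one user message that combines a text part and an image part."""
    return {
        "role": "user",
        "content": [
            {"type": "input_text", "text": prompt},
            {"type": "input_image", "image_url": image_url},
        ],
    }


# Usage with a local file ("photo.jpg" is a placeholder path):
# with open("photo.jpg", "rb") as f:
#     message = image_message("Describe this image.", to_data_url(f.read()))
# response = client.responses.create(model="gpt-5", input=[message])
```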


### Streaming Responses with Structured Output

For real-time applications, the Responses API supports streaming even with structured output. This example demonstrates:

- `responses.stream()` context manager for streaming
- Event-based streaming with different event types:
  - `response.output_text.delta` - Incremental text updates
  - `response.refusal.delta` - Safety refusal messages  
  - `response.error` - Error handling
  - `response.completed` - Stream completion
- `get_final_response()` to retrieve the complete parsed result
- Structured output streaming with Pydantic models

This is perfect for building responsive UIs that show progress while maintaining type safety.


In [11]:
from typing import List

class EntitiesModel(BaseModel):
    attributes: List[str]
    colors: List[str]
    animals: List[str]

with client.responses.stream(
    model="gpt-5-mini",
    input=[
        {"role": "system", "content": "Extract entities from the input text"},
        {
            "role": "user",
            "content": "The quick brown fox jumps over the lazy dog with piercing blue eyes",
        },
    ],
    text_format=EntitiesModel,
) as stream:
    for event in stream:
        if event.type == "response.refusal.delta":
            print(event.delta, end="")
        elif event.type == "response.output_text.delta":
            print(event.delta, end="")
        elif event.type == "response.error":
            print(event.error, end="")
        elif event.type == "response.completed":
            print("Completed")
            # print(event.response.output)

    final_response = stream.get_final_response()
    print(final_response)

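BadRequestError: Error code: 400 - {'error': {'message': 'Your organization must be verified to stream this model. Please go to: https://platform.openai.com/settings/organization/general and click on Verify Organization. If you just verified, it can take up to 15 minutes for access to propagate.', 'type': 'invalid_request_error', 'param': 'stream', 'code': 'unsupported_value'}}

> NOTE: The cell above failed because the organization behind this API key was not verified to stream this model. As the error message explains, verify the organization in the platform settings, wait for access to propagate, and re-run the cell to see the streamed output.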
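
The event-handling logic can still be exercised offline. The sketch below runs the same kind of type-based dispatch over stand-in events built with `SimpleNamespace`. These fake events and the `consume` helper are our own illustration, and real stream events carry more fields than shown here.

```python
from types import SimpleNamespace


def consume(events):
    """Accumulate text and refusal deltas and note whether the stream completed."""
    text_parts, refusal_parts = [], []
    completed = False
    for event in events:
        if event.type == "response.output_text.delta":
            text_parts.append(event.delta)
        elif event.type == "response.refusal.delta":
            refusal_parts.append(event.delta)
        elif event.type == "response.completed":
            completed = True
    return {
        "text": "".join(text_parts),
        "refusal": "".join(refusal_parts),
        "completed": completed,
    }


# Stand-in events; a real stream would yield SDK event objects instead.
fake_events = [
    SimpleNamespace(type="response.output_text.delta", delta='{"colors": ['),
    SimpleNamespace(type="response.output_text.delta", delta='"brown", "blue"]}'),
    SimpleNamespace(type="response.completed"),
]

summary = consume(fake_events)
print(summary["text"])
```

Collecting deltas into a list and joining once at the end avoids repeated string concatenation on long streams.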

## Summary

The OpenAI Responses API represents a significant step forward in making AI integration simpler and more powerful. Key takeaways:

✅ **Cleaner code** - Less boilerplate, more focus on your application logic  
✅ **Built-in structure** - Native Pydantic support eliminates JSON parsing headaches  
✅ **Flexible control** - Fine-tune reasoning effort and output style  
✅ **Multimodal ready** - Text and image inputs work seamlessly together  
✅ **Production ready** - Streaming support for responsive applications  

---

### Advanced Configuration Options

The final example showcases some advanced configuration options:

- `reasoning={"effort": "minimal"}` - Even lower computational effort for simple tasks
- `text={"verbosity": "low"}` - Control output verbosity for concise responses
- Fine-tuning the balance between speed, cost, and output quality

With minimal reasoning effort and low verbosity, GPT-5 should respond faster and more tersely, which is likely the closest it gets to the feel of GPT-4o, just in case you miss the old model.

In [None]:
result = client.responses.create(
    model="gpt-5",
    input="Write a haiku about code.",
    reasoning={ "effort": "minimal" },
    text={ "verbosity": "low" },
)

print(result.output_text)
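
If you switch between these settings often, named presets keep the trade-off explicit. This is a sketch under our own assumptions: the preset names are made up, and the `"high"` values assume that effort and verbosity both accept a `"high"` level.

```python
# Hypothetical presets; only the "fast" values come from the cell above.
PRESETS = {
    "fast": {"reasoning": {"effort": "minimal"}, "text": {"verbosity": "low"}},
    "thorough": {"reasoning": {"effort": "high"}, "text": {"verbosity": "high"}},
}


def with_preset(name, **request):
    """Merge a named preset with request arguments; explicit arguments win."""
    merged = dict(PRESETS[name])  # shallow copy so PRESETS itself is not modified
    merged.update(request)
    return merged


kwargs = with_preset("fast", model="gpt-5", input="Write a haiku about code.")
print(kwargs)
# With the `client` from the setup cell:
# result = client.responses.create(**kwargs)
```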