diff --git a/examples/building_w_rt_mini/byo_realtime.ipynb b/examples/building_w_rt_mini/byo_realtime.ipynb new file mode 100644 index 0000000000..e2c623469d --- /dev/null +++ b/examples/building_w_rt_mini/byo_realtime.ipynb @@ -0,0 +1,507 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "d06162f1-643b-4b1a-9300-ca1beb0b937c", + "metadata": {}, + "source": [ + "# Build with Realtime Mini\n", + "\n", + "Growing up, I was fascinated by the idea of Jarvis—an intelligent assistant that could autonomously handle complex workflows. What I didn’t realize back then was that I was imagining the future of voice agents. OpenAI was the first to make this vision real with the launch of `4o-audio`, and more recently made it even more accessible—cutting costs by 70%—with the release of [GPT Realtime Mini](https://platform.openai.com/docs/models/gpt-realtime-mini), which offers lower latency and major improvements in tool calling.\n", + "\n", + "Building with speech models, however, is fundamentally different from working with text-only interfaces. In addition to prompt engineering, audio models bring new challenges: they’re more latency-sensitive, require managing a WebRTC session, and introduce additional variability through voice activity detection (VAD).\n", + "\n", + "To make this process easier, OpenAI has released the Agents SDK in both Python and TypeScript, along with detailed examples that showcase our recommended design patterns for building reliable voice agents.\n", + "\n", + "Before diving into code, let’s map out exactly what we’ll be building—and how it fits into the broader agent handoff architecture." + ] + }, + { + "cell_type": "markdown", + "id": "0652dc8b-9756-4093-a0b6-8ffb34d5a48e", + "metadata": {}, + "source": [ + "## System Architecture\n", + "For our application today we are going to be building an extremely simple customer support app using the **“handoff architecture”**. 
**“Handoff Architecture”** means a **primary agent** acts as the orchestrator for all incoming customer queries. Rather than handling every request directly, the primary agent analyzes the intent behind the user’s message and **categorizes it into one of two core pathways**:\n", + "\n", + "1. General questions and basic support (no authentication required).\n", + "2. Specific questions (user authentication required before lookup is performed).\n", + "\n", + "Based on this categorization, the primary agent **hands off the conversation** to the appropriate specialist agent designed for that specific task.\n", + "\n", + "![alt text](./byo_realtime_diagram.png)" + ] + }, + { + "cell_type": "markdown", + "id": "fadab09b-0ffb-4a4c-8993-1c7ae7acca53", + "metadata": {}, + "source": [ + "## Setup\n", + "Instead of starting from scratch, we're going to work from the [openai-agents-js](https://github.com/openai/openai-agents-js/tree/main) repo, so let's start by cloning it, installing the necessary dependencies, and building the web demo:\n", + "```bash\n", + "git clone https://github.com/openai/openai-agents-js.git\n", + "```\n", + "\n", + "After cloning, follow the steps in the README to get started:\n", + "```bash\n", + "npm install @openai/agents zod@3\n", + "pnpm examples:realtime-next\n", + "```\n", + "\n", + "If everything works as expected, you should see a simple chat interface:\n", + "![alt text](./byo_realtime_starting.png)" + ] + }, + { + "cell_type": "markdown", + "id": "6f142bbe-8a06-4a12-99cd-297212a57b8f", + "metadata": {}, + "source": [ + "## Main Agent\n", + "Great! Now that we've cloned the repo, we are going to modify `openai-agents-js/examples/realtime-next/src/app/page.tsx`, starting with the **Main Agent**. Our **Main Agent** is the point of entry for the application stack. 
It acts as an intent classifier, deciding which layer each user query should be routed to.\n", + "\n", + "The implementation is fairly straightforward; `checkFlightsTool` and `qaAgent` are defined in the sections below:\n", + "```js\n", + "const mainAgent = new RealtimeAgent({\n", + " name: 'Main Agent',\n", + " instructions:\n", + " 'You are the entry point for all customer queries. Default to the no-auth QA flow. If authentication is needed and validated, escalate to the Auth Layer by handing off to either the Flight Status Checker or Rebooking Agent. Do not answer policy questions from your own knowledge; rely on subordinate agents and tools.',\n", + " tools: [\n", + " checkFlightsTool,\n", + " ],\n", + " handoffs: [qaAgent],\n", + "});\n", + "```" + ] + }, + { + "cell_type": "markdown", + "id": "f0e09bb2-480d-473b-9a52-5e694ae048a4", + "metadata": {}, + "source": [ + "## QA Agent\n", + "\n", + "Now that we’ve built the main agent, the next step is to add a specialized supporting agent to handle a specific class of customer queries. For general airline policy questions, this will be the QA Agent.\n", + "\n", + "In a real-world product, this agent would power a more sophisticated experience: it would ingest company-specific PDFs and other reference materials, embed them, and dynamically query those documents at runtime to provide accurate, policy-grounded answers.\n", + "\n", + "```\n", + "┌────────────┐ ┌────────────┐ ┌────────────────────────┐ ┌────────────┐\n", + "│ User Query │ ───► │ QA Agent │ ───► │ Vector DB / Retriever │ ───► │ LLM Answer │\n", + "└────────────┘ └────────────┘ └────────────────────────┘ └────────────┘\n", + " │ │\n", + " │ build search │ top-k context\n", + " ▼ ▼\n", + " (semantic search) (grounded generation)\n", + "\n", + "```\n", + "\n", + "This would typically involve building a full vector database service that embeds the customer’s query and retrieves the most relevant results. 
For the sake of simplicity in this demo, we’ll mock that part of the pipeline.\n", + "\n", + "If you’re interested in learning how to implement a fully featured retrieval system, take a look at our other cookbooks on the topic [here](https://cookbook.openai.com/examples/vector_databases/pinecone/readme).\n", + "\n", + "```js\n", + "const documentLookupTool = tool({\n", + " name: 'document_lookup_tool',\n", + " description: 'Looks up answers from known airline documentation to handle general questions without authentication.',\n", + " parameters: z.object({\n", + " request: z.string(),\n", + " }),\n", + " execute: async ({ request }) => {\n", + " const mockDocument = `**Airline Customer Support — Quick Reference**\n", + "\n", + "1. Each passenger may bring 1 carry-on (22 x 14 x 9) and 1 personal item.\n", + "2. Checked bags must be under 50 lbs; overweight fees apply.\n", + "3. Online check-in opens 24 hours before departure.\n", + "4. Seat upgrades can be requested up to 1 hour before boarding.\n", + "5. Wi‑Fi is complimentary on all flights over 2 hours.\n", + "6. Customers can change flights once for free within 24 hours of booking.\n", + "7. Exit rows offer extra legroom and require passengers to meet safety criteria.\n", + "8. Refunds can be requested for canceled or delayed flights exceeding 3 hours.\n", + "9. Pets are allowed in the cabin if under 20 lbs and in an approved carrier.\n", + "10. For additional help, contact our support team via chat or call center.`;\n", + " return mockDocument;\n", + " },\n", + "});\n", + "```\n", + "\n", + "As with the Main Agent, we create another instance of `RealtimeAgent`, this time supplying the `documentLookupTool`.\n", + "\n", + "```js\n", + "const qaAgent = new RealtimeAgent({\n", + " name: 'QA Agent',\n", + " instructions:\n", + " 'You handle general customer questions using the document lookup tool. Use only the document lookup for answers. 
If the request may involve personal data or operations (rebooking, flight status), call the auth check tool. If auth is required and validated, handoff to the appropriate Auth Layer agent.',\n", + " tools: [documentLookupTool],\n", + "});\n", + "```" + ] + }, + { + "cell_type": "markdown", + "id": "9f3f0b37-8b7e-463c-8534-78ec37591e0a", + "metadata": {}, + "source": [ + "## Flight Status Agent\n", + "We’ve already built a powerful foundation: a main agent that can handle inbound customer queries, and a QA agent that searches our document store to provide accurate, policy-based answers.\n", + "\n", + "What’s missing is a layer for customer-specific information—for example, queries like “What’s the status of my flight?” or “Which terminal should I go to?”. To support these kinds of personalized interactions, we need to embed an authentication layer into the workflow so the system can securely access and respond with user-specific data.\n", + "\n", + "```\n", + "┌────────────┐ ┌──────────────┐ ┌───────────────────────┐ ┌───────────────────────┐\n", + "│ User Query │ ───► │ Auth Layer │ ───► │ Customer Data Access │ ───► │ LLM Answer (Personal) │\n", + "└────────────┘ └──────────────┘ └───────────────────────┘ └───────────────────────┘\n", + " │ │\n", + " │ verify identity │ query flight / account\n", + " ▼ ▼\n", + " (token, SSO, OTP, etc.) (e.g., flight status, profile info)\n", + "```\n", + "Fortunately, the Agents SDK is designed to support this kind of use case. 
For customer support scenarios that involve sensitive, account-level information, we can gate access with the `needsApproval` parameter of `tool`, which pauses the tool call until it is explicitly approved. In this demo, we use that pause to collect the user’s credentials before any protected data is accessed.\n", + "\n", + "```js\n", + "const checkFlightsTool = tool({\n", + " name: 'checkFlightsTool',\n", + " description: 'Call this tool if the user queries about their current flight status',\n", + " parameters: z.object({}),\n", + " // Require approval so the UI can collect creds before executing.\n", + " needsApproval: true,\n", + " execute: async () => {\n", + " if (!credState.username || !credState.password) {\n", + " return 'Authentication missing.';\n", + " }\n", + " return `${credState.username} you are currently booked on the 8am flight from SFO to JFK`;\n", + " },\n", + "});\n", + "```\n", + "\n", + "The tool reads the collected credentials from `credState`, a demo-only, module-level store defined in the final listing. When a tool registered with `needsApproval: true` is called, the session emits a `tool_approval_requested` event. This lets us register a listener right after the `RealtimeSession` is created in our web application and update the UI accordingly, for example by prompting the user to approve or authenticate before continuing.\n", + "\n", + "```js\n", + " const [credOpen, setCredOpen] = useState(false);\n", + " const [credUsername, setCredUsername] = useState('');\n", + " const [credPassword, setCredPassword] = useState('');\n", + " const [pendingApproval, setPendingApproval] = useState(null);\n", + " useEffect(() => {\n", + " session.current = new RealtimeSession(mainAgent, {\n", + " // other configs go here! 
\n", + " });\n", + " // various other event based logic goes here!\n", + " session.current.on(\n", + " 'tool_approval_requested',\n", + " (_context, _agent, approvalRequest) => {\n", + " setPendingApproval(approvalRequest.approvalItem); // <- Alterations to react state!\n", + " setCredUsername('');\n", + " setCredPassword('');\n", + " setCredOpen(true);\n", + " },\n", + " );\n", + " }, []);\n", + " // ....\n", + " return (\n", + " {credOpen && (\n", + "
\n", + " // ... remainder of component logic\n", + "
\n", + " )}\n", + " )\n", + "```" + ] + }, + { + "cell_type": "markdown", + "id": "25359c31-2ac1-4d7f-9d47-a08dba45d23f", + "metadata": {}, + "source": [ + "## Final Code Snippet\n", + "And with that, we’re done! You’ve now built the core components of a customer support application:\n", + "\n", + "* A generalist agent capable of handling a wide range of customer support queries\n", + "* An authentication workflow that verifies user identity and retrieves customer-specific information\n", + "\n", + "With everything in place, the final version of `realtime-next/src/app/page.tsx` should look like this.\n", + "\n", + "```js\n", + "'use client';\n", + "\n", + "import {\n", + " RealtimeAgent,\n", + " RealtimeSession,\n", + " tool,\n", + " TransportEvent,\n", + " RealtimeOutputGuardrail,\n", + " OutputGuardrailTripwireTriggered,\n", + " RealtimeItem,\n", + "} from '@openai/agents/realtime';\n", + "import { useEffect, useRef, useState } from 'react';\n", + "import { z } from 'zod';\n", + "import { getToken } from './server/token.action';\n", + "import { App } from '@/components/App';\n", + "import { CameraCapture } from '@/components/CameraCapture';\n", + "\n", + "// Demo-only credential store the tool can read at execution time\n", + "const credState: { username?: string; password?: string } = {};\n", + "\n", + "// ---------------------------------------------\n", + "// Tools.\n", + "\n", + "const documentLookupTool = tool({\n", + " name: 'document_lookup_tool',\n", + " description: 'Looks up answers from known airline documentation to handle general questions without authentication.',\n", + " parameters: z.object({\n", + " request: z.string(),\n", + " }),\n", + " execute: async ({ request }) => {\n", + " const mockDocument = `**Airline Customer Support — Quick Reference**\n", + "\n", + "1. Each passenger may bring 1 carry-on (22 x 14 x 9) and 1 personal item.\n", + "2. Checked bags must be under 50 lbs; overweight fees apply.\n", + "3. 
Online check-in opens 24 hours before departure.\n", + "4. Seat upgrades can be requested up to 1 hour before boarding.\n", + "5. Wi‑Fi is complimentary on all flights over 2 hours.\n", + "6. Customers can change flights once for free within 24 hours of booking.\n", + "7. Exit rows offer extra legroom and require passengers to meet safety criteria.\n", + "8. Refunds can be requested for canceled or delayed flights exceeding 3 hours.\n", + "9. Pets are allowed in the cabin if under 20 lbs and in an approved carrier.\n", + "10. For additional help, contact our support team via chat or call center.`;\n", + " return mockDocument;\n", + " },\n", + "});\n", + "\n", + "const checkFlightsTool = tool({\n", + " name: 'checkFlightsTool',\n", + " description: 'Call this tool if the user queries about their current flight status',\n", + " parameters: z.object({}),\n", + " // Require approval so the UI can collect creds before executing.\n", + " needsApproval: true,\n", + " execute: async () => {\n", + " if (!credState.username || !credState.password) {\n", + " return 'Authentication missing.';\n", + " }\n", + " return `${credState.username} you are currently booked on the 8am flight from SFO to JFK`;\n", + " },\n", + "});\n", + "\n", + "// ---------------------------------------------\n", + "// Agents for each layer.\n", + "\n", + "// 2. No-Auth Layer: QA Agent with doc lookup and auth check tool.\n", + "const qaAgent = new RealtimeAgent({\n", + " name: 'QA Agent',\n", + " instructions:\n", + " 'You handle general customer questions using the document lookup tool. Use only the document lookup for answers. If the request may involve personal data or operations (rebooking, flight status), call the auth check tool. If auth is required and validated, handoff to the appropriate Auth Layer agent.',\n", + " tools: [documentLookupTool],\n", + "});\n", + "\n", + "// 1. 
Main Agent: entry point and routing.\n", + "const mainAgent = new RealtimeAgent({\n", + " name: 'Main Agent',\n", + " instructions:\n", + " 'You are the entry point for all customer queries. Default to the no-auth QA flow. If authentication is needed and validated, escalate to the Auth Layer by handing off to either the Flight Status Checker or Rebooking Agent. Do not answer policy questions from your own knowledge; rely on subordinate agents and tools.',\n", + " tools: [\n", + " checkFlightsTool,\n", + " ],\n", + " handoffs: [qaAgent],\n", + "});\n", + "\n", + "// Cross-handoffs so agents can return or escalate.\n", + "qaAgent.handoffs = [mainAgent];\n", + "\n", + "export default function Home() {\n", + " const session = useRef<RealtimeSession<any> | null>(null);\n", + " const [isConnected, setIsConnected] = useState(false);\n", + " const [isMuted, setIsMuted] = useState(false);\n", + " const [outputGuardrailResult, setOutputGuardrailResult] =\n", + " useState<OutputGuardrailTripwireTriggered<any> | null>(null);\n", + "\n", + " const [events, setEvents] = useState<TransportEvent[]>([]);\n", + " const [history, setHistory] = useState<RealtimeItem[]>([]);\n", + " const [mcpTools, setMcpTools] = useState<string[]>([]);\n", + " const [credOpen, setCredOpen] = useState(false);\n", + " const [credUsername, setCredUsername] = useState('');\n", + " const [credPassword, setCredPassword] = useState('');\n", + " const [pendingApproval, setPendingApproval] = useState(null);\n", + "\n", + " useEffect(() => {\n", + " session.current = new RealtimeSession(mainAgent, {\n", + " model: 'gpt-realtime-mini',\n", + " outputGuardrailSettings: {\n", + " debounceTextLength: 200,\n", + " },\n", + " config: {\n", + " audio: {\n", + " output: {\n", + " voice: 'cedar',\n", + " },\n", + " },\n", + " },\n", + " });\n", + " session.current.on('transport_event', (event) => {\n", + " setEvents((events) => [...events, event]);\n", + " });\n", + " session.current.on('mcp_tools_changed', (tools) => {\n", + " setMcpTools(tools.map((t) => t.name));\n", + " });\n", + " session.current.on(\n", + " 
'guardrail_tripped',\n", + " (_context, _agent, guardrailError) => {\n", + " setOutputGuardrailResult(guardrailError);\n", + " },\n", + " );\n", + " session.current.on('history_updated', (history) => {\n", + " setHistory(history);\n", + " });\n", + " session.current.on(\n", + " 'tool_approval_requested',\n", + " (_context, _agent, approvalRequest) => {\n", + " setPendingApproval(approvalRequest.approvalItem);\n", + " setCredUsername('');\n", + " setCredPassword('');\n", + " setCredOpen(true);\n", + " },\n", + " );\n", + " }, []);\n", + "\n", + " async function connect() {\n", + " if (isConnected) {\n", + " await session.current?.close();\n", + " setIsConnected(false);\n", + " } else {\n", + " const token = await getToken();\n", + " try {\n", + " await session.current?.connect({\n", + " apiKey: token,\n", + " });\n", + " setIsConnected(true);\n", + " } catch (error) {\n", + " console.error('Error connecting to session', error);\n", + " }\n", + " }\n", + " }\n", + "\n", + " async function toggleMute() {\n", + " if (isMuted) {\n", + " await session.current?.mute(false);\n", + " setIsMuted(false);\n", + " } else {\n", + " await session.current?.mute(true);\n", + " setIsMuted(true);\n", + " }\n", + " }\n", + "\n", + " function handleCredCancel() {\n", + " const approval = pendingApproval;\n", + " setCredOpen(false);\n", + " setPendingApproval(null);\n", + " if (approval) session.current?.reject(approval);\n", + " }\n", + "\n", + " function handleCredSubmit(e: React.FormEvent) {\n", + " e.preventDefault();\n", + " if (!credUsername || !credPassword) return;\n", + " // Store creds for the tool to read\n", + " credState.username = credUsername;\n", + " credState.password = credPassword;\n", + " const approval = pendingApproval;\n", + " setCredOpen(false);\n", + " setPendingApproval(null);\n", + " setCredUsername('');\n", + " setCredPassword('');\n", + " if (approval) session.current?.approve(approval);\n", + " }\n", + "\n", + " return (\n", + "
\n", + " {credOpen && (\n", + "
\n", + "
\n", + "
\n", + " \n", + "
Authentication Required
\n", + "
\n", + " Enter username and password to continue.\n", + "
\n", + " setCredUsername(e.target.value)}\n", + " />\n", + " setCredPassword(e.target.value)}\n", + " />\n", + "
\n", + " \n", + " Cancel\n", + " \n", + " \n", + " Continue\n", + " \n", + "
\n", + " \n", + "
\n", + "
\n", + " )}\n", + " \n", + "
\n", + " {\n", + " if (!session.current) return;\n", + " session.current.addImage(dataUrl, { triggerResponse: false });\n", + " }}\n", + " />\n", + "
\n", + "
\n", + " );\n", + "}\n", + "```" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.12.9" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/examples/building_w_rt_mini/byo_realtime_diagram.png b/examples/building_w_rt_mini/byo_realtime_diagram.png new file mode 100644 index 0000000000..588fa5f250 Binary files /dev/null and b/examples/building_w_rt_mini/byo_realtime_diagram.png differ diff --git a/examples/building_w_rt_mini/byo_realtime_starting.png b/examples/building_w_rt_mini/byo_realtime_starting.png new file mode 100644 index 0000000000..05f3737eb3 Binary files /dev/null and b/examples/building_w_rt_mini/byo_realtime_starting.png differ diff --git a/registry.yaml b/registry.yaml index 6a384f31f8..8eb421591d 100644 --- a/registry.yaml +++ b/registry.yaml @@ -4,6 +4,14 @@ # should build pages for, and indicates metadata such as tags, creation date and # authors for each page. +- title: Build with Realtime Mini + path: examples/building_w_rt_mini/byo_realtime.ipynb + date: 2025-10-11 + authors: + - carter-openai + tags: + - gpt-realtime-mini + - title: Sora 2 Prompting Guide path: examples/sora/sora2_prompting_guide.ipynb date: 2025-10-06