diff --git a/.gitignore b/.gitignore index 16d2ebb7e6..ba26b18d65 100644 --- a/.gitignore +++ b/.gitignore @@ -138,6 +138,7 @@ dmypy.json tmp_* examples/fine-tuned_qa/local_cache/* examples/multimodal/.local_cache/* +examples/coding-agent-workspace # PyCharm files .idea/ diff --git a/examples/Build_a_coding_agent_with_GPT-5.1.ipynb b/examples/Build_a_coding_agent_with_GPT-5.1.ipynb new file mode 100644 index 0000000000..9171d5d8c9 --- /dev/null +++ b/examples/Build_a_coding_agent_with_GPT-5.1.ipynb @@ -0,0 +1,1803 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "500ad5dd", + "metadata": {}, + "source": [ + "# Building a Coding Agent with GPT-5.1 and the OpenAI Agents SDK\n", + "\n", + "GPT-5.1 is exceptionally strong at coding, and with the new code-editing and command-execution tools available in the [Responses API](https://platform.openai.com/docs/api-reference/responses), it’s now easier than ever to build coding agents that can work across full codebases and iterate quickly.\n", + "\n", + "In this guide, we’ll use the [Agents SDK](https://openai.github.io/openai-agents-python/) to build a **coding agent that can scaffold a brand-new app from a prompt and refine it through user feedback**. Our agent will be equipped with the following tools:\n", + "\n", + "- **apply_patch** — to edit files\n", + "- **shell** — to run shell commands\n", + "- **web_search** — to pull fresh information from the web\n", + "- **Context7 MCP** — to access up-to-date documentation\n", + "\n", + "We’ll begin by focusing on the `shell` and `web_search` tools to generate a new project with web-sourced context. Then we’ll add `apply_patch` so the agent can iterate on the codebase, and we’ll connect it to the [Context7 MCP server](https://context7.com/) so it can write code informed by the most recent docs." 
+ ] + }, + { + "cell_type": "markdown", + "id": "7d1bea10", + "metadata": {}, + "source": [ + "## Set up the agent\n", + "\n", + "With the Agents SDK, defining an agent is as simple as providing instructions and a list of tools. In this example, we want to use the newest `gpt-5.1` model for its state-of-the-art coding abilities.\n", + "\n", + "We’ll start by enabling `web_search`, which gives the agent the ability to look up current information online, and `shell`, which lets the agent propose shell commands for tasks like scaffolding, installing dependencies, and running build steps.\n", + "\n", + "The shell tool works by letting the model propose commands it believes should be executed. Your environment is responsible for actually running those commands and returning the output.\n", + "\n", + "The Agents SDK automates most of this command-execution handshake for you: you only need to implement the shell executor, which defines the environment in which those commands run." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "e03e427a", + "metadata": {}, + "outputs": [], + "source": [ + "%pip install openai-agents openai" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "4e7a48c0", + "metadata": {}, + "outputs": [], + "source": [ + "import os \n", + "\n", + "# Make sure your OpenAI API key is defined (you can set it on your global environment, or export it manually)\n", + "# export OPENAI_API_KEY=\"sk-...\"\n", + "assert \"OPENAI_API_KEY\" in os.environ, \"Please set OPENAI_API_KEY first.\"" + ] + }, + { + "cell_type": "markdown", + "id": "82ac5519", + "metadata": {}, + "source": [ + "### Define a working environment and shell executor\n", + "\n", + "For simplicity, we'll run shell commands locally in a dedicated workspace directory. Commands start in that folder, which keeps the agent's work contained by default, though the working directory alone does not stop a command from reaching files elsewhere.\n", + "\n", + "**Note:** In production, **always execute shell commands in a sandboxed environment**. 
Arbitrary command execution is inherently risky and must be tightly controlled." + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "42b89fc1", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Workspace directory: /Users/katia/dev/openai-cookbook/examples/coding-agent-workspace\n" + ] + } + ], + "source": [ + "# Create an isolated workspace for shell commands\n", + "from pathlib import Path\n", + "\n", + "workspace_dir = Path(\"coding-agent-workspace\").resolve()\n", + "workspace_dir.mkdir(exist_ok=True)\n", + "\n", + "print(f\"Workspace directory: {workspace_dir}\")" + ] + }, + { + "cell_type": "markdown", + "id": "d8eca9ba", + "metadata": {}, + "source": [ + "We’ll now define a small `ShellExecutor` class that:\n", + "\n", + "- Receives a `ShellCommandRequest` from the agent\n", + "- Optionally asks for approval before running commands\n", + "- Runs them using `asyncio.create_subprocess_shell`\n", + "- Returns a `ShellResult` with the outputs\n", + "\n", + "All commands will run with `cwd=workspace_dir`, so they only affect files in that subfolder." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "e8bae5bb", + "metadata": {}, + "outputs": [], + "source": [ + "import asyncio\n", + "import os\n", + "from collections.abc import Sequence\n", + "from pathlib import Path\n", + "from typing import Literal\n", + "\n", + "from agents import (\n", + " ShellTool,\n", + " ShellCommandRequest,\n", + " ShellCommandOutput,\n", + " ShellCallOutcome,\n", + " ShellResult,\n", + ")\n", + "\n", + "\n", + "async def require_approval(commands: Sequence[str]) -> None:\n", + " \"\"\"\n", + " Ask for confirmation before running shell commands.\n", + "\n", + " Set SHELL_AUTO_APPROVE=1 in your environment to skip this prompt\n", + " (useful when you're iterating a lot or running in CI).\n", + " \"\"\"\n", + " if os.environ.get(\"SHELL_AUTO_APPROVE\") == \"1\":\n", + " return\n", + "\n", + " print(\"Shell command approval required:\")\n", + " for entry in commands:\n", + " print(\" \", entry)\n", + " response = input(\"Proceed? [y/N] \").strip().lower()\n", + " if response not in {\"y\", \"yes\"}:\n", + " raise RuntimeError(\"Shell command execution rejected by user.\")\n", + "\n", + "\n", + "class ShellExecutor:\n", + " \"\"\"\n", + " Shell executor for the notebook cookbook.\n", + "\n", + " - Runs all commands inside `workspace_dir`\n", + " - Captures stdout/stderr\n", + " - Enforces an optional timeout from `action.timeout_ms`\n", + " - Returns a ShellResult with ShellCommandOutput entries using ShellCallOutcome\n", + " \"\"\"\n", + "\n", + " def __init__(self, cwd: Path):\n", + " self.cwd = cwd\n", + "\n", + " async def __call__(self, request: ShellCommandRequest) -> ShellResult:\n", + " action = request.data.action\n", + " await require_approval(action.commands)\n", + "\n", + " outputs: list[ShellCommandOutput] = []\n", + "\n", + " for command in action.commands:\n", + " proc = await asyncio.create_subprocess_shell(\n", + " command,\n", + " cwd=self.cwd,\n", + " env=os.environ.copy(),\n", + " 
stdout=asyncio.subprocess.PIPE,\n", + " stderr=asyncio.subprocess.PIPE,\n", + " )\n", + "\n", + " timed_out = False\n", + " try:\n", + " timeout = (action.timeout_ms or 0) / 1000 or None\n", + " stdout_bytes, stderr_bytes = await asyncio.wait_for(\n", + " proc.communicate(),\n", + " timeout=timeout,\n", + " )\n", + " except asyncio.TimeoutError:\n", + " proc.kill()\n", + " stdout_bytes, stderr_bytes = await proc.communicate()\n", + " timed_out = True\n", + "\n", + " stdout = stdout_bytes.decode(\"utf-8\", errors=\"ignore\")\n", + " stderr = stderr_bytes.decode(\"utf-8\", errors=\"ignore\")\n", + "\n", + " # Use ShellCallOutcome instead of exit_code/status fields directly\n", + " outcome = ShellCallOutcome(\n", + " type=\"timeout\" if timed_out else \"exit\",\n", + " exit_code=getattr(proc, \"returncode\", None),\n", + " )\n", + "\n", + " outputs.append(\n", + " ShellCommandOutput(\n", + " command=command,\n", + " stdout=stdout,\n", + " stderr=stderr,\n", + " outcome=outcome,\n", + " )\n", + " )\n", + "\n", + " if timed_out:\n", + " # Stop running further commands if this one timed out\n", + " break\n", + "\n", + " return ShellResult(\n", + " output=outputs,\n", + " provider_data={\"working_directory\": str(self.cwd)},\n", + " )\n", + "\n", + "\n", + "shell_tool = ShellTool(executor=ShellExecutor(cwd=workspace_dir))" + ] + }, + { + "cell_type": "markdown", + "id": "9c9b2a74", + "metadata": {}, + "source": [ + "### Define the agent" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "81ab508a", + "metadata": {}, + "outputs": [], + "source": [ + "# Define the agent's instructions\n", + "INSTRUCTIONS = '''\n", + "You are a coding assistant. The user will explain what they want to build, and your goal is to run commands to generate a new app.\n", + "You can search the web to find which command you should use based on the technical stack, and use commands to create code files. 
 \n", + "You should also install necessary dependencies for the project to work. \n", + "'''" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "a1d804d9", + "metadata": {}, + "outputs": [], + "source": [ + "from agents import Agent, Runner, ShellTool, WebSearchTool\n", + "\n", + "coding_agent = Agent(\n", + " name=\"Coding Agent\",\n", + " model=\"gpt-5.1\",\n", + " instructions=INSTRUCTIONS,\n", + " tools=[\n", + " WebSearchTool(),\n", + " shell_tool\n", + " ]\n", + ")" + ] + }, + { + "cell_type": "markdown", + "id": "e56a68b9", + "metadata": {}, + "source": [ + "## Start a new project\n", + "\n", + "Let’s send a prompt to our coding agent and then inspect the files it created in the `workspace_dir`.\n", + "In this example, we'll create a Next.js dashboard using the [shadcn](https://ui.shadcn.com/) library.\n", + "\n", + "**Note:** Sometimes you might run into a `MaxTurnsExceeded` error, or the generated project might have a dependency error. If so, simply run the agent loop again. In a production environment, you would implement an external loop or user input handling to iterate if the project creation fails." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "79aaeecd", + "metadata": {}, + "outputs": [], + "source": [ + "prompt = \"Create a new NextJS app that shows dashboard-01 from https://ui.shadcn.com/blocks on the home page\"" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "ce5880c5", + "metadata": {}, + "outputs": [], + "source": [ + "import asyncio\n", + "from agents import ItemHelpers, RunConfig\n", + "\n", + "async def run_coding_agent_with_logs(prompt: str):\n", + " \"\"\"\n", + " Run the coding agent and stream logs about what's happening\n", + " \"\"\"\n", + " print(\"=== Run starting ===\")\n", + " print(f\"[user] {prompt}\\n\")\n", + "\n", + " result = Runner.run_streamed(\n", + " coding_agent,\n", + " input=prompt\n", + " )\n", + "\n", + " async for event in result.stream_events():\n", + " \n", + " # High-level items: messages, tool calls, tool outputs, MCP, etc.\n", + " if event.type == \"run_item_stream_event\":\n", + " item = event.item\n", + "\n", + " # 1) Tool calls (function tools, web_search, shell, MCP, etc.)\n", + " if item.type == \"tool_call_item\":\n", + " raw = item.raw_item\n", + " raw_type_name = type(raw).__name__\n", + "\n", + " # Special-case the ones we care most about in this cookbook\n", + " if raw_type_name == \"ResponseFunctionWebSearch\":\n", + " print(\"[tool] web_search_call – agent is calling web search\")\n", + " elif raw_type_name == \"LocalShellCall\":\n", + " # LocalShellCall.action.commands is where the commands live\n", + " commands = getattr(getattr(raw, \"action\", None), \"commands\", None)\n", + " if commands:\n", + " print(f\"[tool] shell – running commands: {commands}\")\n", + " else:\n", + " print(\"[tool] shell – running command\")\n", + " else:\n", + " # Generic fallback for other tools (MCP, function tools, etc.)\n", + " print(f\"[tool] {raw_type_name} called\")\n", + "\n", + " # 2) Tool call outputs\n", + " elif item.type == \"tool_call_output_item\":\n", + " # item.output 
is whatever your tool returned (could be structured)\n", + " output_preview = str(item.output)\n", + " if len(output_preview) > 400:\n", + " output_preview = output_preview[:400] + \"…\"\n", + " print(f\"[tool output] {output_preview}\")\n", + "\n", + " # 3) Normal assistant messages\n", + " elif item.type == \"message_output_item\":\n", + " text = ItemHelpers.text_message_output(item)\n", + " print(f\"[assistant]\\n{text}\\n\")\n", + "\n", + " # 4) Other event types (reasoning, MCP list tools, etc.) – ignore\n", + " else:\n", + " pass\n", + "\n", + " print(\"=== Run complete ===\\n\")\n", + "\n", + " # Once streaming is done, result.final_output contains the final answer\n", + " print(\"Final answer:\\n\")\n", + " print(result.final_output)\n" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "id": "4efc56ee", + "metadata": { + "scrolled": false + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "=== Run starting ===\n", + "[user] Create a new NextJS app that shows dashboard-01 from https://ui.shadcn.com/blocks on the home page\n", + "\n", + "Shell command approval required:\n", + " npx create-next-app@latest shadcn-dashboard --typescript --eslint --tailwind --app --src-dir --import-alias \"@/*\"\n", + " cd shadcn-dashboard && npm install shadcn-ui class-variance-authority clsx tailwind-merge lucide-react\n", + " cd shadcn-dashboard && npx shadcn-ui@latest init -y\n", + "Proceed? 
[y/N] y\n", + "[tool] ResponseOutputMessage called\n", + "[tool output] $ npx create-next-app@latest shadcn-dashboard --typescript --eslint --tailwind --app --src-dir --import-alias \"@/*\"\n", + "\u001b[?25l\u001b[2K\u001b[1G\u001b[36m?\u001b[39m \u001b[1mWould you like to use \u001b[34mReact Compiler\u001b[39m?\u001b[22m \u001b[90m›\u001b[39m \u001b[36m\u001b[4mNo\u001b[39m\u001b[24m \u001b[90m/\u001b[39m Yes\n", + "\n", + "$ cd shadcn-dashboard && npm install shadcn-ui class-variance-authority clsx tailwind-merge lucide-react\n", + "stderr:\n", + "/bin/sh: line 0: cd: shadcn-dashboard…\n", + "Shell command approval required:\n", + " yes \"No\" | npx create-next-app@latest shadcn-dashboard --typescript --eslint --tailwind --app --src-dir --import-alias \"@/*\"\n", + " cd shadcn-dashboard && npm install shadcn-ui class-variance-authority clsx tailwind-merge lucide-react\n", + " cd shadcn-dashboard && npx shadcn@latest init -y\n", + "Proceed? [y/N] y\n", + "[tool] ResponseOutputMessage called\n", + "[tool output] $ yes \"No\" | npx create-next-app@latest shadcn-dashboard --typescript --eslint --tailwind --app --src-dir --import-alias \"@/*\"\n", + "\u001b[?25l\u001b[2K\u001b[1G\u001b[36m?\u001b[39m \u001b[1mWould you like to use \u001b[34mReact Compiler\u001b[39m?\u001b[22m \u001b[90m›\u001b[39m \u001b[36m\u001b[4mNo\u001b[39m\u001b[24m \u001b[90m/\u001b[39m Yes\u0007\u0007\u001b[2K\u001b[1G\u001b[2K\u001b[1G\u001b[32m✔\u001b[39m \u001b[1mWould you like to use \u001b[34mReact Compiler\u001b[39m?\u001b[22m \u001b[90m…\u001b[39m \u001b[36m\u001b[4mNo\u001b[39m\u001b[24m \u001b[90m/\u001b[39m Yes\n", + "\u001b[?2…\n", + "Shell command approval required:\n", + " cd shadcn-dashboard && yes \"\" | npx shadcn@latest init\n", + " cd shadcn-dashboard && npx shadcn@latest add button card dropdown-menu input label progress select separator sheet sidebar skeleton tabs avatar\n", + "Proceed? 
[y/N] y\n", + "[tool] ResponseOutputMessage called\n", + "[tool output] $ cd shadcn-dashboard && yes \"\" | npx shadcn@latest init\n", + "\u001b[?25l\u001b[36m?\u001b[39m \u001b[1mWhich color would you like to use as the \u001b[36mbase color\u001b[39m?\u001b[22m \u001b[90m›\u001b[39m \u001b[90m- Use arrow-keys. Return to submit.\u001b[39m\n", + "\u001b[36m❯\u001b[39m \u001b[36m\u001b[4mNeutral\u001b[39m\u001b[24m\u001b[90m\u001b[39m\n", + " Gray\u001b[90m\u001b[39m\n", + " Zinc\u001b[90m\u001b[39m\n", + " Stone\u001b[90m\u001b[39m\n", + " Slate\u001b[90m\u001b[39m\n", + "\u001b[2K\u001b[1G\u001b[32m✔\u001b[39m \u001b[1mWhich color would you like to use as the \u001b…\n", + "Shell command approval required:\n", + " cd shadcn-dashboard && ls\n", + " cd shadcn-dashboard && sed -n '1,200p' src/app/page.tsx\n", + " cd shadcn-dashboard && sed -n '1,260p' src/app/layout.tsx\n", + "Proceed? [y/N] y\n", + "[tool] ResponseOutputMessage called\n", + "[tool output] $ cd shadcn-dashboard && ls\n", + "components.json\n", + "eslint.config.mjs\n", + "next-env.d.ts\n", + "next.config.ts\n", + "\u001b[1m\u001b[36mnode_modules\u001b[m\u001b[m\n", + "package-lock.json\n", + "package.json\n", + "postcss.config.mjs\n", + "\u001b[1m\u001b[36mpublic\u001b[m\u001b[m\n", + "README.md\n", + "\u001b[1m\u001b[36msrc\u001b[m\u001b[m\n", + "tsconfig.json\n", + "\n", + "$ cd shadcn-dashboard && sed -n '1,200p' src/app/page.tsx\n", + "import Image from \"next/image\";\n", + "\n", + "export default function Home() {\n", + " return (\n", + "
\n", + " \n", + " \n", + " {/* Dashboard-01 main content */}\n", + " {/* We can just reuse the dashboard page component structure here if desired. */}\n", + " \n", + " \n", + " )\n", + "}\n", + "EOF\n", + "\n", + " cd shadcn-dashboard && sed -n '1,260p' src/app/page.tsx\n", + "Proceed? [y/N] y\n", + "[tool] ResponseOutputMessage called\n", + "[tool output] $ cd shadcn-dashboard && sed -n '1,260p' src/app/dashboard/page.tsx\n", + "import { AppSidebar } from \"@/components/app-sidebar\"\n", + "import { ChartAreaInteractive } from \"@/components/chart-area-interactive\"\n", + "import { DataTable } from \"@/components/data-table\"\n", + "import { SectionCards } from \"@/components/section-cards\"\n", + "import { SiteHeader } from \"@/components/site-header\"\n", + "import {\n", + " SidebarInset,\n", + " SidebarProvi…\n", + "Shell command approval required:\n", + " cd shadcn-dashboard && cat > src/app/page.tsx << 'EOF'\n", + "import { AppSidebar } from \"@/components/app-sidebar\"\n", + "import { ChartAreaInteractive } from \"@/components/chart-area-interactive\"\n", + "import { DataTable } from \"@/components/data-table\"\n", + "import { SectionCards } from \"@/components/section-cards\"\n", + "import { SiteHeader } from \"@/components/site-header\"\n", + "import {\n", + " SidebarInset,\n", + " SidebarProvider,\n", + "} from \"@/components/ui/sidebar\"\n", + "\n", + "import data from \"./dashboard/data.json\"\n", + "import React from \"react\"\n", + "\n", + "export default function Home() {\n", + " return (\n", + " \n", + " \n", + " \n", + " \n", + "
\n", + "
\n", + "
\n", + " \n", + "
\n", + " \n", + "
\n", + " \n", + "
\n", + "
\n", + "
\n", + "
\n", + " \n", + " )\n", + "}\n", + "EOF\n", + "\n", + " cd shadcn-dashboard && sed -n '1,260p' src/app/page.tsx\n", + "Proceed? [y/N] y\n", + "[tool] ResponseOutputMessage called\n", + "[tool output] $ cd shadcn-dashboard && cat > src/app/page.tsx << 'EOF'\n", + "import { AppSidebar } from \"@/components/app-sidebar\"\n", + "import { ChartAreaInteractive } from \"@/components/chart-area-interactive\"\n", + "import { DataTable } from \"@/components/data-table\"\n", + "import { SectionCards } from \"@/components/section-cards\"\n", + "import { SiteHeader } from \"@/components/site-header\"\n", + "import {\n", + " SidebarInset,\n", + " SidebarProvider,\n", + "} from…\n", + "[assistant]\n", + "I’ve created the Next.js app and wired it so that `dashboard-01` is rendered on the **home page (`/`)**.\n", + "\n", + "Here’s what was done and how to run it locally:\n", + "\n", + "1. **App creation & setup**\n", + "\n", + "```bash\n", + "npx create-next-app@latest shadcn-dashboard --typescript --eslint --tailwind --app --src-dir --import-alias \"@/*\"\n", + "cd shadcn-dashboard\n", + "\n", + "# shadcn/ui & dependencies\n", + "npm install shadcn-ui class-variance-authority clsx tailwind-merge lucide-react\n", + "\n", + "# Initialize shadcn/ui\n", + "npx shadcn@latest init\n", + "\n", + "# Add components required by dashboard-01\n", + "npx shadcn@latest add button card dropdown-menu input label progress select separator sheet sidebar skeleton tabs avatar\n", + "npx shadcn@latest add dashboard-01\n", + "```\n", + "\n", + "2. 
**Home page now renders dashboard-01**\n", + "\n", + "I replaced `src/app/page.tsx` with a version that reuses the generated `dashboard-01` layout and data, so `/` shows that dashboard:\n", + "\n", + "`src/app/page.tsx`:\n", + "\n", + "```tsx\n", + "import { AppSidebar } from \"@/components/app-sidebar\"\n", + "import { ChartAreaInteractive } from \"@/components/chart-area-interactive\"\n", + "import { DataTable } from \"@/components/data-table\"\n", + "import { SectionCards } from \"@/components/section-cards\"\n", + "import { SiteHeader } from \"@/components/site-header\"\n", + "import {\n", + " SidebarInset,\n", + " SidebarProvider,\n", + "} from \"@/components/ui/sidebar\"\n", + "\n", + "import data from \"./dashboard/data.json\"\n", + "import React from \"react\"\n", + "\n", + "export default function Home() {\n", + " return (\n", + " \n", + " \n", + " \n", + " \n", + "
\n", + "
\n", + "
\n", + " \n", + "
\n", + " \n", + "
\n", + " \n", + "
\n", + "
\n", + "
\n", + "
\n", + " \n", + " )\n", + "}\n", + "```\n", + "\n", + "This uses:\n", + "\n", + "- `src/app/dashboard/data.json` (created by `npx shadcn add dashboard-01`)\n", + "- All the supporting components (`AppSidebar`, `SiteHeader`, `SectionCards`, charts, table, etc.) that the block installer added.\n", + "\n", + "3. **Run the app**\n", + "\n", + "From the `shadcn-dashboard` folder:\n", + "\n", + "```bash\n", + "npm run dev\n", + "```\n", + "\n", + "Then open `http://localhost:3000` — you should see the **dashboard-01** block as the home page.\n", + "\n", + "If you’d like, I can also remove the `/dashboard` route or keep it as a separate page; right now, the main dashboard layout is mirrored on `/`.\n", + "\n", + "=== Run complete ===\n", + "\n", + "Final answer:\n", + "\n", + "I’ve created the Next.js app and wired it so that `dashboard-01` is rendered on the **home page (`/`)**.\n", + "\n", + "Here’s what was done and how to run it locally:\n", + "\n", + "1. **App creation & setup**\n", + "\n", + "```bash\n", + "npx create-next-app@latest shadcn-dashboard --typescript --eslint --tailwind --app --src-dir --import-alias \"@/*\"\n", + "cd shadcn-dashboard\n", + "\n", + "# shadcn/ui & dependencies\n", + "npm install shadcn-ui class-variance-authority clsx tailwind-merge lucide-react\n", + "\n", + "# Initialize shadcn/ui\n", + "npx shadcn@latest init\n", + "\n", + "# Add components required by dashboard-01\n", + "npx shadcn@latest add button card dropdown-menu input label progress select separator sheet sidebar skeleton tabs avatar\n", + "npx shadcn@latest add dashboard-01\n", + "```\n", + "\n", + "2. 
**Home page now renders dashboard-01**\n", + "\n", + "I replaced `src/app/page.tsx` with a version that reuses the generated `dashboard-01` layout and data, so `/` shows that dashboard:\n", + "\n", + "`src/app/page.tsx`:\n", + "\n", + "```tsx\n", + "import { AppSidebar } from \"@/components/app-sidebar\"\n", + "import { ChartAreaInteractive } from \"@/components/chart-area-interactive\"\n", + "import { DataTable } from \"@/components/data-table\"\n", + "import { SectionCards } from \"@/components/section-cards\"\n", + "import { SiteHeader } from \"@/components/site-header\"\n", + "import {\n", + " SidebarInset,\n", + " SidebarProvider,\n", + "} from \"@/components/ui/sidebar\"\n", + "\n", + "import data from \"./dashboard/data.json\"\n", + "import React from \"react\"\n", + "\n", + "export default function Home() {\n", + " return (\n", + " \n", + " \n", + " \n", + " \n", + "
\n", + "
\n", + "
\n", + " \n", + "
\n", + " \n", + "
\n", + " \n", + "
\n", + "
\n", + "
\n", + "
 \n", + " \n", + " )\n", + "}\n", + "```\n", + "\n", + "This uses:\n", + "\n", + "- `src/app/dashboard/data.json` (created by `npx shadcn add dashboard-01`)\n", + "- All the supporting components (`AppSidebar`, `SiteHeader`, `SectionCards`, charts, table, etc.) that the block installer added.\n", + "\n", + "3. **Run the app**\n", + "\n", + "From the `shadcn-dashboard` folder:\n", + "\n", + "```bash\n", + "npm run dev\n", + "```\n", + "\n", + "Then open `http://localhost:3000` — you should see the **dashboard-01** block as the home page.\n", + "\n", + "If you’d like, I can also remove the `/dashboard` route or keep it as a separate page; right now, the main dashboard layout is mirrored on `/`.\n" + ] + } + ], + "source": [ + "await run_coding_agent_with_logs(prompt)" + ] + }, + { + "cell_type": "markdown", + "id": "145a012c", + "metadata": {}, + "source": [ + "Once the agent is done creating the initial project (you should see a \"=== Run complete ===\" log followed by the final answer), you can check the output with the following commands. The project folder name is chosen by the agent (`shadcn-dashboard` in the run above), so adjust the path if yours differs:\n", + "\n", + "```bash\n", + "cd coding-agent-workspace/shadcn-dashboard/\n", + "npm run dev\n", + "```\n", + "\n", + "You should see something like this: \n", + "![dashboard screenshot](https://cdn.openai.com/cookbook/dashboard_screenshot1.jpg)\n" + ] + }, + { + "cell_type": "markdown", + "id": "c749350f", + "metadata": {}, + "source": [ + "## Iterate on the project \n", + "\n", + "Now that we have an initial version of the app, we can start iterating using the `apply_patch` tool. We also want to include calls to the OpenAI Responses API, and for that, the model should have access to the most recent documentation. To make this possible, we’ll connect the agent to the [Context7 MCP server](https://context7.com/), which provides up-to-date docs." 
+ ] + }, + { + "cell_type": "markdown", + "id": "c6c56b7b", + "metadata": {}, + "source": [ + "### Set up the `apply_patch` tool for in-place edits\n", + "\n", + "Note: in production you’ll typically want to run these edits in a sandboxed project workspace (e.g. ephemeral containers), and work with IDEs." + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "id": "66b720f2", + "metadata": {}, + "outputs": [], + "source": [ + "import hashlib\n", + "import os\n", + "from pathlib import Path\n", + "\n", + "from agents import ApplyPatchTool\n", + "from agents.editor import ApplyPatchOperation, ApplyPatchResult\n", + "\n", + "\n", + "class ApprovalTracker:\n", + " \"\"\"Tracks which apply_patch operations have already been approved.\"\"\"\n", + "\n", + " def __init__(self) -> None:\n", + " self._approved: set[str] = set()\n", + "\n", + " def fingerprint(self, operation: ApplyPatchOperation, relative_path: str) -> str:\n", + " hasher = hashlib.sha256()\n", + " hasher.update(operation.type.encode(\"utf-8\"))\n", + " hasher.update(b\"\\0\")\n", + " hasher.update(relative_path.encode(\"utf-8\"))\n", + " hasher.update(b\"\\0\")\n", + " hasher.update((operation.diff or \"\").encode(\"utf-8\"))\n", + " return hasher.hexdigest()\n", + "\n", + " def remember(self, fingerprint: str) -> None:\n", + " self._approved.add(fingerprint)\n", + "\n", + " def is_approved(self, fingerprint: str) -> bool:\n", + " return fingerprint in self._approved\n", + "\n", + "\n", + "class WorkspaceEditor:\n", + " \"\"\"\n", + " Minimal editor for the apply_patch tool:\n", + " - keeps all edits under `root`\n", + " - optional manual approval (APPLY_PATCH_AUTO_APPROVE=1 to skip prompts)\n", + " \"\"\"\n", + "\n", + " def __init__(self, root: Path, approvals: ApprovalTracker, auto_approve: bool = False) -> None:\n", + " self._root = root.resolve()\n", + " self._approvals = approvals\n", + " self._auto_approve = auto_approve or os.environ.get(\"APPLY_PATCH_AUTO_APPROVE\") == \"1\"\n", + "\n", + " 
def create_file(self, operation: ApplyPatchOperation) -> ApplyPatchResult:\n", + " relative = self._relative_path(operation.path)\n", + " self._require_approval(operation, relative)\n", + " target = self._resolve(operation.path, ensure_parent=True)\n", + " diff = operation.diff or \"\"\n", + " content = apply_unified_diff(\"\", diff, create=True)\n", + " target.write_text(content, encoding=\"utf-8\")\n", + " return ApplyPatchResult(output=f\"Created {relative}\")\n", + "\n", + " def update_file(self, operation: ApplyPatchOperation) -> ApplyPatchResult:\n", + " relative = self._relative_path(operation.path)\n", + " self._require_approval(operation, relative)\n", + " target = self._resolve(operation.path)\n", + " original = target.read_text(encoding=\"utf-8\")\n", + " diff = operation.diff or \"\"\n", + " patched = apply_unified_diff(original, diff)\n", + " target.write_text(patched, encoding=\"utf-8\")\n", + " return ApplyPatchResult(output=f\"Updated {relative}\")\n", + "\n", + " def delete_file(self, operation: ApplyPatchOperation) -> ApplyPatchResult:\n", + " relative = self._relative_path(operation.path)\n", + " self._require_approval(operation, relative)\n", + " target = self._resolve(operation.path)\n", + " target.unlink(missing_ok=True)\n", + " return ApplyPatchResult(output=f\"Deleted {relative}\")\n", + "\n", + " def _relative_path(self, value: str) -> str:\n", + " resolved = self._resolve(value)\n", + " return resolved.relative_to(self._root).as_posix()\n", + "\n", + " def _resolve(self, relative: str, ensure_parent: bool = False) -> Path:\n", + " candidate = Path(relative)\n", + " target = candidate if candidate.is_absolute() else (self._root / candidate)\n", + " target = target.resolve()\n", + " try:\n", + " target.relative_to(self._root)\n", + " except ValueError:\n", + " raise RuntimeError(f\"Operation outside workspace: {relative}\") from None\n", + " if ensure_parent:\n", + " target.parent.mkdir(parents=True, exist_ok=True)\n", + " return target\n", 
+ "\n", + " def _require_approval(self, operation: ApplyPatchOperation, display_path: str) -> None:\n", + " fingerprint = self._approvals.fingerprint(operation, display_path)\n", + " if self._auto_approve or self._approvals.is_approved(fingerprint):\n", + " self._approvals.remember(fingerprint)\n", + " return\n", + "\n", + " print(\"\\n[apply_patch] approval required\")\n", + " print(f\"- type: {operation.type}\")\n", + " print(f\"- path: {display_path}\")\n", + " if operation.diff:\n", + " preview = operation.diff if len(operation.diff) < 400 else f\"{operation.diff[:400]}…\"\n", + " print(\"- diff preview:\\n\", preview)\n", + " answer = input(\"Proceed? [y/N] \").strip().lower()\n", + " if answer not in {\"y\", \"yes\"}:\n", + " raise RuntimeError(\"Apply patch operation rejected by user.\")\n", + " self._approvals.remember(fingerprint)\n", + "\n", + "\n", + "def apply_unified_diff(original: str, diff: str, create: bool = False) -> str:\n", + " \"\"\"\n", + " Simple \"diff\" applier (adapt this based on your environment)\n", + "\n", + " - For create_file, the diff can be the full desired file contents,\n", + " optionally with leading '+' on each line.\n", + " - For update_file, we treat the diff as the new file contents:\n", + " keep lines starting with ' ' or '+', drop '-' lines and diff headers.\n", + "\n", + " This avoids context/delete mismatch errors while still letting the model\n", + " send familiar diff-like patches.\n", + " \"\"\"\n", + " if not diff:\n", + " return original\n", + "\n", + " lines = diff.splitlines()\n", + " body: list[str] = []\n", + "\n", + " for line in lines:\n", + " if not line:\n", + " body.append(\"\")\n", + " continue\n", + "\n", + " # Skip typical unified diff headers / metadata\n", + " if line.startswith(\"@@\") or line.startswith(\"---\") or line.startswith(\"+++\"):\n", + " continue\n", + "\n", + " prefix = line[0]\n", + " content = line[1:]\n", + "\n", + " if prefix in (\"+\", \" \"):\n", + " body.append(content)\n", + " 
 elif prefix in (\"-\", \"\\\\\"):\n", + " # skip deletions and \"\\ No newline at end of file\"\n", + " continue\n", + " else:\n", + " # If it doesn't look like diff syntax, keep the full line\n", + " body.append(line)\n", + "\n", + " text = \"\\n\".join(body)\n", + " if diff.endswith(\"\\n\"):\n", + " text += \"\\n\"\n", + " return text\n", + "\n", + "\n", + "approvals = ApprovalTracker()\n", + "editor = WorkspaceEditor(root=workspace_dir, approvals=approvals, auto_approve=True)\n", + "apply_patch_tool = ApplyPatchTool(editor=editor)\n" + ] + }, + { + "cell_type": "markdown", + "id": "fcbba5aa", + "metadata": {}, + "source": [ + "### Connect to the Context7 MCP server" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "id": "00ac7b70", + "metadata": {}, + "outputs": [], + "source": [ + "# Optional: set CONTEXT7_API_KEY in your environment for higher rate limits\n", + "CONTEXT7_API_KEY = os.getenv(\"CONTEXT7_API_KEY\")" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "id": "ea7c32a1", + "metadata": {}, + "outputs": [], + "source": [ + "from agents import HostedMCPTool\n", + "\n", + "context7_tool = HostedMCPTool(\n", + " tool_config={\n", + " \"type\": \"mcp\",\n", + " \"server_label\": \"context7\",\n", + " \"server_url\": \"https://mcp.context7.com/mcp\",\n", + " # Basic usage works without auth; for higher rate limits, pass your key here.\n", + " **(\n", + " {\"authorization\": f\"Bearer {CONTEXT7_API_KEY}\"}\n", + " if CONTEXT7_API_KEY\n", + " else {}\n", + " ),\n", + " \"require_approval\": \"never\",\n", + " },\n", + ")\n" + ] + }, + { + "cell_type": "markdown", + "id": "429b9e59", + "metadata": {}, + "source": [ + "### Update the agent\n", + "\n", + "Let's create a new agent that also uses these two additional tools, and update the instructions accordingly.\n", + "To avoid a context mismatch when applying the diffs, we'll instruct this agent not to edit files via shell commands." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 12, + "id": "cbadcc51", + "metadata": {}, + "outputs": [], + "source": [ + "UPDATED_INSTRUCTIONS = \"\"\"\n", + "You are a coding assistant helping a user with an existing project.\n", + "Use the apply_patch tool to edit files based on their feedback. \n", + "When editing files:\n", + "- Never edit code via shell commands.\n", + "- Always read the file first using `cat` with the shell tool.\n", + "- Then generate a unified diff relative to EXACTLY that content.\n", + "- Use apply_patch only once per edit attempt.\n", + "- If apply_patch fails, stop and report the error; do NOT retry.\n", + "You can search the web to find which command you should use based on the technical stack, and use commands to install dependencies if needed.\n", + "When the user refers to an external API, use the Context7 MCP server to fetch docs for that API.\n", + "For example, if they want to use the OpenAI API, search docs for the openai-python or openai-node sdk depending on the project stack.\n", + "\"\"\"" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "id": "4924ea5e", + "metadata": {}, + "outputs": [], + "source": [ + "updated_coding_agent = Agent(\n", + " name=\"Updated Coding Agent\",\n", + " model=\"gpt-5.1\",\n", + " instructions=UPDATED_INSTRUCTIONS,\n", + " tools=[\n", + " WebSearchTool(),\n", + " shell_tool,\n", + " apply_patch_tool,\n", + " context7_tool,\n", + " ]\n", + ")" + ] + }, + { + "cell_type": "markdown", + "id": "9c5dcf1c", + "metadata": {}, + "source": [ + "### Run the agent to edit the project" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "id": "496c885e", + "metadata": {}, + "outputs": [], + "source": [ + "import asyncio\n", + "from agents import ItemHelpers, Runner\n", + "\n", + "\n", + "async def run_updated_coding_agent_with_logs(prompt: str):\n", + " \"\"\"\n", + " Run the updated coding agent (shell + web + apply_patch + Context7 MCP)\n", + " and stream logs about 
what's happening.\n", + "\n", + " - Logs web_search, shell, apply_patch, and MCP (Context7) calls.\n", + " - For apply_patch, logs the outputs returned by the editor.\n", + " - At the end, reports whether any apply_patch calls were executed.\n", + " \"\"\"\n", + " print(\"=== Run starting ===\")\n", + " print(f\"[user] {prompt}\\n\")\n", + "\n", + " apply_patch_seen = False\n", + "\n", + " # Start streamed run\n", + " result = Runner.run_streamed(\n", + " updated_coding_agent,\n", + " input=prompt,\n", + " )\n", + "\n", + " async for event in result.stream_events():\n", + " if event.type != \"run_item_stream_event\":\n", + " continue\n", + "\n", + " item = event.item\n", + "\n", + " # 1) Tool calls (function tools, web_search, shell, MCP, etc.)\n", + " if item.type == \"tool_call_item\":\n", + " raw = item.raw_item\n", + " raw_type_name = type(raw).__name__\n", + "\n", + " # web_search (hosted Responses tool)\n", + " if raw_type_name == \"ResponseFunctionWebSearch\":\n", + " print(\"[tool] web_search – agent is calling web search\")\n", + "\n", + " # shell (new ShellTool executor)\n", + " elif raw_type_name == \"LocalShellCall\":\n", + " action = getattr(raw, \"action\", None)\n", + " commands = getattr(action, \"commands\", None) if action else None\n", + " if commands:\n", + " print(f\"[tool] shell – running commands: {commands}\")\n", + " else:\n", + " print(\"[tool] shell – running command\")\n", + "\n", + " # MCP (e.g. 
Context7)\n", + " elif \"MCP\" in raw_type_name or \"Mcp\" in raw_type_name:\n", + " tool_name = getattr(raw, \"tool_name\", None)\n", + " if tool_name is None:\n", + " action = getattr(raw, \"action\", None)\n", + " tool_name = getattr(action, \"tool\", None) if action else None\n", + " server_label = getattr(raw, \"server_label\", None)\n", + " label_str = f\" (server={server_label})\" if server_label else \"\"\n", + " if tool_name:\n", + " print(f\"[tool] mcp{label_str} – calling tool {tool_name!r}\")\n", + " else:\n", + " print(f\"[tool] mcp{label_str} – MCP tool call\")\n", + "\n", + " # Generic fallback for other tools (including hosted ones)\n", + " else:\n", + " print(f\"[tool] {raw_type_name} called\")\n", + "\n", + " # 2) Tool call outputs (where apply_patch shows up)\n", + " elif item.type == \"tool_call_output_item\":\n", + " raw = item.raw_item\n", + " output_preview = str(item.output)\n", + "\n", + " # Detect apply_patch via raw_item type or output format\n", + " is_apply_patch = False\n", + " if isinstance(raw, dict) and raw.get(\"type\") == \"apply_patch_call_output\":\n", + " is_apply_patch = True\n", + " elif any(\n", + " output_preview.startswith(prefix)\n", + " for prefix in (\"Created \", \"Updated \", \"Deleted \")\n", + " ):\n", + " is_apply_patch = True\n", + "\n", + " if is_apply_patch:\n", + " apply_patch_seen = True\n", + " if len(output_preview) > 400:\n", + " output_preview = output_preview[:400] + \"…\"\n", + " print(f\"[apply_patch] {output_preview}\\n\")\n", + " else:\n", + " if len(output_preview) > 400:\n", + " output_preview = output_preview[:400] + \"…\"\n", + " print(f\"[tool output]\\n{output_preview}\\n\")\n", + "\n", + " # 3) Normal assistant messages\n", + " elif item.type == \"message_output_item\":\n", + " text = ItemHelpers.text_message_output(item)\n", + " print(f\"[assistant]\\n{text}\\n\")\n", + "\n", + " # 4) Other event types – ignore for now\n", + " else:\n", + " pass\n", + "\n", + " print(\"=== Run complete 
===\\n\")\n", + "\n", + " # Final answer\n", + " print(\"Final answer:\\n\")\n", + " print(result.final_output)\n", + "\n", + " # End-of-run notice: report whether any edits were applied\n", + " if apply_patch_seen:\n", + " print(\"\\n[apply_patch] One or more apply_patch calls were executed.\")\n", + " else:\n", + " print(\"\\n[apply_patch] No apply_patch calls detected in this run.\")" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "id": "07401187", + "metadata": {}, + "outputs": [], + "source": [ + "edit_prompt = '''Update the dashboard to add a 'summarize' button in the top right corner.\n", + "When clicked, use the OpenAI Responses API with the gpt-5.1 model to generate a summary of the metrics on the dashboard, and display it in a modal.'''" + ] + }, + { + "cell_type": "code", + "execution_count": 25, + "id": "e569d3b9", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "=== Run starting ===\n", + "[user] Update the dashboard to add a 'summarize' button in the top right corner.\n", + "When clicked, use the OpenAI Responses API with the gpt-5.1 model to generate a summary of the metrics on the dashboard, and display it in a modal.\n", + "\n", + "Shell command approval required:\n", + " ls\n", + " ls -R\n", + " cat package.json || pip show flask || pip show django || echo 'no package.json'\n", + "Proceed? 
[y/N] y\n", + "[tool] ResponseOutputMessage called\n", + "[tool output]\n", + "$ ls\n", + "\u001b[1m\u001b[36mshadcn-dashboard\u001b[m\u001b[m\n", + "\n", + "$ ls -R\n", + "\u001b[1m\u001b[36mshadcn-dashboard\u001b[m\u001b[m\n", + "\n", + "./shadcn-dashboard:\n", + "components.json\n", + "eslint.config.mjs\n", + "next-env.d.ts\n", + "next.config.ts\n", + "package-lock.json\n", + "package.json\n", + "postcss.config.mjs\n", + "\u001b[1m\u001b[36mpublic\u001b[m\u001b[m\n", + "README.md\n", + "\u001b[1m\u001b[36msrc\u001b[m\u001b[m\n", + "tsconfig.json\n", + "\n", + "./shadcn-dashboard/public:\n", + "file.svg\n", + "globe.svg\n", + "next.svg\n", + "vercel.svg\n", + "window.svg\n", + "\n", + "./shadcn-dashboard/src:\n", + "\u001b[1m\u001b[36mapp\u001b[m\u001b[m\n", + "\u001b…\n", + "\n", + "Shell command approval required:\n", + " cd shadcn-dashboard && cat package.json\n", + " cd shadcn-dashboard && cat src/components/site-header.tsx\n", + " cd shadcn-dashboard && cat src/app/dashboard/page.tsx\n", + "Proceed? [y/N] y\n", + "[tool] ResponseOutputMessage called\n", + "[tool output]\n", + "$ cd shadcn-dashboard && cat package.json\n", + "{\n", + " \"name\": \"shadcn-dashboard\",\n", + " \"version\": \"0.1.0\",\n", + " \"private\": true,\n", + " \"scripts\": {\n", + " \"dev\": \"next dev\",\n", + " \"build\": \"next build\",\n", + " \"start\": \"next start\",\n", + " \"lint\": \"eslint\"\n", + " },\n", + " \"dependencies\": {\n", + " \"@dnd-kit/core\": \"^6.3.1\",\n", + " \"@dnd-kit/modifiers\": \"^9.0.0\",\n", + " \"@dnd-kit/sortable\": \"^10.0.0\",\n", + " \"@dnd-kit/utilities\": \"^3.2.2\",\n", + " \"@ra…\n", + "\n", + "[tool] mcp (server=context7) – MCP tool call\n", + "[tool] mcp (server=context7) – MCP tool call\n", + "Shell command approval required:\n", + " cd shadcn-dashboard && cat src/components/ui/dialog.tsx || echo 'no dialog'\n", + " cd shadcn-dashboard && cat src/app/layout.tsx\n", + "Proceed? 
[y/N] y\n", + "Shell command approval required:\n", + " cd shadcn-dashboard && npm install openai\n", + " cd shadcn-dashboard && ls src/components/ui\n", + "Proceed? [y/N] y\n", + "[tool] ResponseOutputMessage called\n", + "[tool] ResponseOutputMessage called\n", + "[tool output]\n", + "$ cd shadcn-dashboard && cat src/components/ui/dialog.tsx || echo 'no dialog'\n", + "no dialog\n", + "\n", + "stderr:\n", + "cat: src/components/ui/dialog.tsx: No such file or directory\n", + "\n", + "$ cd shadcn-dashboard && cat src/app/layout.tsx\n", + "import type { Metadata } from \"next\";\n", + "import { Geist, Geist_Mono } from \"next/font/google\";\n", + "import \"./globals.css\";\n", + "\n", + "const geistSans = Geist({\n", + " variable: \"--font-geist-sans\",\n", + " subsets: [\"lati…\n", + "\n", + "[tool output]\n", + "$ cd shadcn-dashboard && npm install openai\n", + "\n", + "added 513 packages, and audited 514 packages in 10s\n", + "\n", + "150 packages are looking for funding\n", + " run `npm fund` for details\n", + "\n", + "found 0 vulnerabilities\n", + "\n", + "$ cd shadcn-dashboard && ls src/components/ui\n", + "avatar.tsx\n", + "badge.tsx\n", + "breadcrumb.tsx\n", + "button.tsx\n", + "card.tsx\n", + "chart.tsx\n", + "checkbox.tsx\n", + "drawer.tsx\n", + "dropdown-menu.tsx\n", + "input.tsx\n", + "label.tsx\n", + "progress.tsx\n", + "select.tsx\n", + "separator.ts…\n", + "\n", + "[tool] ResponseOutputMessage called\n", + "[tool] ResponseOutputMessage called\n", + "[tool] ResponseOutputMessage called\n", + "[apply_patch] Created shadcn-dashboard/src/components/ui/dialog.tsx\n", + "\n", + "[apply_patch] Updated shadcn-dashboard/src/components/site-header.tsx\n", + "\n", + "[apply_patch] Created shadcn-dashboard/src/app/api/summarize/route.ts\n", + "\n", + "[assistant]\n", + "I’ve wired up a “Summarize” button and hooked it to the OpenAI Responses API with `gpt-5.1`, returning the result in a modal.\n", + "\n", + "Here’s what changed and how to use it.\n", + "\n", + 
"---\n", + "\n", + "## 1. New dialog UI primitive\n", + "\n", + "Added a Radix-based dialog component so we can show the summary in a modal.\n", + "\n", + "**File:** `src/components/ui/dialog.tsx` \n", + "(standard shadcn-style Dialog wrapper)\n", + "\n", + "You now have:\n", + "\n", + "```ts\n", + "export {\n", + " Dialog,\n", + " DialogPortal,\n", + " DialogOverlay,\n", + " DialogClose,\n", + " DialogTrigger,\n", + " DialogContent,\n", + " DialogHeader,\n", + " DialogFooter,\n", + " DialogTitle,\n", + " DialogDescription,\n", + "}\n", + "```\n", + "\n", + "You don’t need to change anything here unless you want to tweak styling/animations.\n", + "\n", + "---\n", + "\n", + "## 2. Header updated with “Summarize” button + modal\n", + "\n", + "**File:** `src/components/site-header.tsx`\n", + "\n", + "Key updates:\n", + "\n", + "```tsx\n", + "import { useState } from \"react\"\n", + "\n", + "import { Button } from \"@/components/ui/button\"\n", + "import {\n", + " Dialog,\n", + " DialogContent,\n", + " DialogHeader,\n", + " DialogTitle,\n", + "} from \"@/components/ui/dialog\"\n", + "import { Separator } from \"@/components/ui/separator\"\n", + "import { SidebarTrigger } from \"@/components/ui/sidebar\"\n", + "\n", + "export function SiteHeader() {\n", + " const [open, setOpen] = useState(false)\n", + " const [loading, setLoading] = useState(false)\n", + " const [summary, setSummary] = useState(null)\n", + " const [error, setError] = useState(null)\n", + "\n", + " async function handleSummarize() {\n", + " try {\n", + " setLoading(true)\n", + " setError(null)\n", + "\n", + " const res = await fetch(\"/api/summarize\", {\n", + " method: \"POST\",\n", + " })\n", + "\n", + " if (!res.ok) {\n", + " throw new Error(\"Failed to generate summary\")\n", + " }\n", + "\n", + " const data = (await res.json()) as { summary: string }\n", + " setSummary(data.summary)\n", + " } catch (err) {\n", + " console.error(err)\n", + " setError(\n", + " err instanceof Error\n", + " ? 
err.message\n", + " : \"Something went wrong generating the summary.\",\n", + " )\n", + " } finally {\n", + " setLoading(false)\n", + " }\n", + " }\n", + "\n", + " return (\n", + " <>\n", + "
\n", + "
\n", + " \n", + " \n", + "

Dashboard

\n", + "
\n", + " {\n", + " setOpen(true)\n", + " void handleSummarize()\n", + " }}\n", + " >\n", + " {loading ? \"Summarizing...\" : \"Summarize\"}\n", + " \n", + "
\n", + "
\n", + "
\n", + "\n", + " \n", + " \n", + " \n", + " Dashboard summary\n", + " \n", + "
\n", + " {loading &&

Analyzing metrics…

}\n", + " {!loading && error && (\n", + "

{error}

\n", + " )}\n", + " {!loading && !error && summary &&

{summary}

}\n", + " {!loading && !error && !summary && (\n", + "

No summary available yet.

\n", + " )}\n", + "
\n", + "
\n", + "
\n", + " \n", + " )\n", + "}\n", + "```\n", + "\n", + "Behavior:\n", + "\n", + "- The **Summarize** button is in the top-right of the header.\n", + "- Clicking it:\n", + " - Opens the modal.\n", + " - Immediately calls `/api/summarize`.\n", + "- The modal shows:\n", + " - “Analyzing metrics…” while loading.\n", + " - The summary when complete.\n", + " - An error message if the API call fails.\n", + "\n", + "---\n", + "\n", + "## 3. API route using OpenAI Responses API with `gpt-5.1`\n", + "\n", + "**File:** `src/app/api/summarize/route.ts`\n", + "\n", + "```ts\n", + "import { NextResponse } from \"next/server\"\n", + "import OpenAI from \"openai\"\n", + "\n", + "const client = new OpenAI({\n", + " apiKey: process.env.OPENAI_API_KEY,\n", + "})\n", + "\n", + "export async function POST() {\n", + " try {\n", + " const response = await client.responses.create({\n", + " model: \"gpt-5.1\",\n", + " instructions:\n", + " \"You are a concise analytics assistant. Summarize the key insights from a metrics dashboard that includes KPI cards, a time-series chart, and a data table. Focus on trends, anomalies, and overall performance in 3-5 short bullet points. Do not fabricate exact numbers.\",\n", + " input:\n", + " \"Summarize the current metrics view for a business stakeholder at a high level.\",\n", + " })\n", + "\n", + " const summary = (response as any).output_text ?? \"No summary generated.\"\n", + "\n", + " return NextResponse.json({ summary })\n", + " } catch (error) {\n", + " console.error(\"Error generating summary:\", error)\n", + " return NextResponse.json(\n", + " { error: \"Failed to generate summary\" },\n", + " { status: 500 },\n", + " )\n", + " }\n", + "}\n", + "```\n", + "\n", + "Notes:\n", + "\n", + "- Uses the **OpenAI Responses API**.\n", + "- Model is explicitly set to `\"gpt-5.1\"` as requested.\n", + "- Returns JSON `{ summary: string }` to the client.\n", + "- Reads API key from `process.env.OPENAI_API_KEY`.\n", + "\n", + "---\n", + "\n", + "## 4. 
Environment setup\n", + "\n", + "Ensure you have your OpenAI key set:\n", + "\n", + "- Create a `.env.local` in `shadcn-dashboard` (if you don’t already):\n", + "\n", + "```bash\n", + "OPENAI_API_KEY=sk-...\n", + "```\n", + "\n", + "Next.js will expose this to the server runtime only (safe for this API route).\n", + "\n", + "---\n", + "\n", + "## 5. How to run\n", + "\n", + "From `shadcn-dashboard`:\n", + "\n", + "```bash\n", + "npm run dev\n", + "```\n", + "\n", + "Navigate to `/dashboard`:\n", + "\n", + "- You’ll see the **Summarize** button in the header.\n", + "- Click it to open the modal and generate a metrics summary.\n", + "\n", + "---\n", + "\n", + "If you’d like the summary to reflect actual numbers from your dashboard (e.g., pull from `data.json`, cards, or charts), I can next wire the API route to read that data and send it as structured input to the Responses API instead of the generic instructions.\n", + "\n", + "=== Run complete ===\n", + "\n", + "Final answer:\n", + "\n", + "I’ve wired up a “Summarize” button and hooked it to the OpenAI Responses API with `gpt-5.1`, returning the result in a modal.\n", + "\n", + "Here’s what changed and how to use it.\n", + "\n", + "---\n", + "\n", + "## 1. New dialog UI primitive\n", + "\n", + "Added a Radix-based dialog component so we can show the summary in a modal.\n", + "\n", + "**File:** `src/components/ui/dialog.tsx` \n", + "(standard shadcn-style Dialog wrapper)\n", + "\n", + "You now have:\n", + "\n", + "```ts\n", + "export {\n", + " Dialog,\n", + " DialogPortal,\n", + " DialogOverlay,\n", + " DialogClose,\n", + " DialogTrigger,\n", + " DialogContent,\n", + " DialogHeader,\n", + " DialogFooter,\n", + " DialogTitle,\n", + " DialogDescription,\n", + "}\n", + "```\n", + "\n", + "You don’t need to change anything here unless you want to tweak styling/animations.\n", + "\n", + "---\n", + "\n", + "## 2. 
Header updated with “Summarize” button + modal\n", + "\n", + "**File:** `src/components/site-header.tsx`\n", + "\n", + "Key updates:\n", + "\n", + "```tsx\n", + "import { useState } from \"react\"\n", + "\n", + "import { Button } from \"@/components/ui/button\"\n", + "import {\n", + " Dialog,\n", + " DialogContent,\n", + " DialogHeader,\n", + " DialogTitle,\n", + "} from \"@/components/ui/dialog\"\n", + "import { Separator } from \"@/components/ui/separator\"\n", + "import { SidebarTrigger } from \"@/components/ui/sidebar\"\n", + "\n", + "export function SiteHeader() {\n", + " const [open, setOpen] = useState(false)\n", + " const [loading, setLoading] = useState(false)\n", + " const [summary, setSummary] = useState(null)\n", + " const [error, setError] = useState(null)\n", + "\n", + " async function handleSummarize() {\n", + " try {\n", + " setLoading(true)\n", + " setError(null)\n", + "\n", + " const res = await fetch(\"/api/summarize\", {\n", + " method: \"POST\",\n", + " })\n", + "\n", + " if (!res.ok) {\n", + " throw new Error(\"Failed to generate summary\")\n", + " }\n", + "\n", + " const data = (await res.json()) as { summary: string }\n", + " setSummary(data.summary)\n", + " } catch (err) {\n", + " console.error(err)\n", + " setError(\n", + " err instanceof Error\n", + " ? err.message\n", + " : \"Something went wrong generating the summary.\",\n", + " )\n", + " } finally {\n", + " setLoading(false)\n", + " }\n", + " }\n", + "\n", + " return (\n", + " <>\n", + "
\n", + "
\n", + " \n", + " \n", + "

Dashboard

\n", + "
\n", + " {\n", + " setOpen(true)\n", + " void handleSummarize()\n", + " }}\n", + " >\n", + " {loading ? \"Summarizing...\" : \"Summarize\"}\n", + " \n", + "
\n", + "
\n", + "
\n", + "\n", + " \n", + " \n", + " \n", + " Dashboard summary\n", + " \n", + "
\n", + " {loading &&

Analyzing metrics…

}\n", + " {!loading && error && (\n", + "

{error}

\n", + " )}\n", + " {!loading && !error && summary &&

{summary}

}\n", + " {!loading && !error && !summary && (\n", + "

No summary available yet.

\n", + " )}\n", + "
\n", + "
\n", + "
\n", + " \n", + " )\n", + "}\n", + "```\n", + "\n", + "Behavior:\n", + "\n", + "- The **Summarize** button is in the top-right of the header.\n", + "- Clicking it:\n", + " - Opens the modal.\n", + " - Immediately calls `/api/summarize`.\n", + "- The modal shows:\n", + " - “Analyzing metrics…” while loading.\n", + " - The summary when complete.\n", + " - An error message if the API call fails.\n", + "\n", + "---\n", + "\n", + "## 3. API route using OpenAI Responses API with `gpt-5.1`\n", + "\n", + "**File:** `src/app/api/summarize/route.ts`\n", + "\n", + "```ts\n", + "import { NextResponse } from \"next/server\"\n", + "import OpenAI from \"openai\"\n", + "\n", + "const client = new OpenAI({\n", + " apiKey: process.env.OPENAI_API_KEY,\n", + "})\n", + "\n", + "export async function POST() {\n", + " try {\n", + " const response = await client.responses.create({\n", + " model: \"gpt-5.1\",\n", + " instructions:\n", + " \"You are a concise analytics assistant. Summarize the key insights from a metrics dashboard that includes KPI cards, a time-series chart, and a data table. Focus on trends, anomalies, and overall performance in 3-5 short bullet points. Do not fabricate exact numbers.\",\n", + " input:\n", + " \"Summarize the current metrics view for a business stakeholder at a high level.\",\n", + " })\n", + "\n", + " const summary = (response as any).output_text ?? \"No summary generated.\"\n", + "\n", + " return NextResponse.json({ summary })\n", + " } catch (error) {\n", + " console.error(\"Error generating summary:\", error)\n", + " return NextResponse.json(\n", + " { error: \"Failed to generate summary\" },\n", + " { status: 500 },\n", + " )\n", + " }\n", + "}\n", + "```\n", + "\n", + "Notes:\n", + "\n", + "- Uses the **OpenAI Responses API**.\n", + "- Model is explicitly set to `\"gpt-5.1\"` as requested.\n", + "- Returns JSON `{ summary: string }` to the client.\n", + "- Reads API key from `process.env.OPENAI_API_KEY`.\n", + "\n", + "---\n", + "\n", + "## 4. 
Environment setup\n", + "\n", + "Ensure you have your OpenAI key set:\n", + "\n", + "- Create a `.env.local` in `shadcn-dashboard` (if you don’t already):\n", + "\n", + "```bash\n", + "OPENAI_API_KEY=sk-...\n", + "```\n", + "\n", + "Next.js will expose this to the server runtime only (safe for this API route).\n", + "\n", + "---\n", + "\n", + "## 5. How to run\n", + "\n", + "From `shadcn-dashboard`:\n", + "\n", + "```bash\n", + "npm run dev\n", + "```\n", + "\n", + "Navigate to `/dashboard`:\n", + "\n", + "- You’ll see the **Summarize** button in the header.\n", + "- Click it to open the modal and generate a metrics summary.\n", + "\n", + "---\n", + "\n", + "If you’d like the summary to reflect actual numbers from your dashboard (e.g., pull from `data.json`, cards, or charts), I can next wire the API route to read that data and send it as structured input to the Responses API instead of the generic instructions.\n", + "\n", + "[apply_patch] One or more apply_patch calls were executed.\n" + ] + } + ], + "source": [ + "await run_updated_coding_agent_with_logs(edit_prompt)" + ] + }, + { + "cell_type": "markdown", + "id": "2a80ce58", + "metadata": {}, + "source": [ + "Once the agent is done updating the project (you should see a \"=== Run complete ===\" log followed by the final answer), you will see the updated UI, with the OpenAI Responses API call to summarize what's on the dashboard.\n", + "\n", + "**Note**: If this step fails, you can re-run the agent loop. 
In a production environment, you would implement an outer loop that handles errors, or waits for user input, and then iterates.\n", + "\n", + "![final dashboard screenshot](https://cdn.openai.com/cookbook/dashboard_screenshot2.jpg)" + ] + }, + { + "cell_type": "markdown", + "id": "d665cf74", + "metadata": {}, + "source": [ + "## Wrapping up\n", + "\n", + "In this cookbook guide, we built a coding agent that can scaffold a project, refine it through patches, execute commands, and stay up to date with external documentation. By combining GPT-5.1 with the Agents SDK and tools like `shell`, `apply_patch`, `web_search`, and the Context7 MCP server, you can create agents that don’t just generate code; they actively work with codebases: running commands, applying edits, pulling in fresh context, and evolving a project end-to-end.\n", + "\n", + "This workflow is a blueprint for building agents that feel less like tools and more like collaborators. You can extend this pattern to integrate agents into IDEs or code sandboxes, generate new apps from scratch, work across large codebases, or even collaborate with developers in real time." 
+ ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python (.venv)", + "language": "python", + "name": ".venv" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.12.11" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/registry.yaml b/registry.yaml index 645bb0ae6a..3cf8bd12ed 100644 --- a/registry.yaml +++ b/registry.yaml @@ -2635,6 +2635,14 @@ tags: - codex +- title: Build a coding agent with GPT 5.1 + path: examples/Build_a_coding_agent_with_GPT-5.1.ipynb + date: 2025-11-13 + authors: + - katiagg + tags: + - agents-sdk + - title: GPT-5.1 Prompting Guide path: examples/gpt-5/gpt-5-1_prompting_guide.ipynb date: 2025-11-13 @@ -2643,4 +2651,4 @@ tags: - gpt-5.1 - responses - - reasoning \ No newline at end of file + - reasoning