# ArcLLM Step 5: OpenAI Adapter + StopReason Normalization

This notebook walks through everything built in Step 5 — the **OpenAI adapter** that translates between ArcLLM's universal types and OpenAI's Chat Completions API, plus the **StopReason** type that normalizes stop reasons across providers.

**What was built:**
- `StopReason` — a `Literal` type that normalizes stop reasons across all providers
- `OpenaiAdapter` — request building + response parsing for OpenAI's Chat Completions API
- Stop reason mapping: OpenAI `finish_reason` → canonical `StopReason`

**Why it matters:** With two adapters, the abstraction proves itself. An agent writes the same code regardless of whether it's talking to Anthropic or OpenAI — different auth, different message formats, different tool wrapping, different stop reasons — all hidden by the adapter.

In [None]:
import os
import json
from unittest.mock import AsyncMock
import httpx

from arcllm import (
    AnthropicAdapter, OpenaiAdapter, BaseAdapter,
    ArcLLMAPIError, ArcLLMConfigError, ArcLLMError, ArcLLMParseError,
    ProviderConfig, ProviderSettings, ModelMetadata,
    Message, TextBlock, ToolUseBlock, ToolResultBlock,
    Tool, ToolCall, Usage, LLMResponse, StopReason,
)
print("All imports successful — including StopReason and OpenaiAdapter!")

In [None]:
# Setup: fake OpenAI config for mocked tests
os.environ["ARCLLM_TEST_KEY"] = "sk-test-openai-key-for-walkthrough"

FAKE_MODEL = "gpt-4o-test"

fake_config = ProviderConfig(
    provider=ProviderSettings(
        api_format="openai-chat",
        base_url="https://api.openai.com",
        api_key_env="ARCLLM_TEST_KEY",
        default_model=FAKE_MODEL,
        default_temperature=0.7,
    ),
    models={
        FAKE_MODEL: ModelMetadata(
            context_window=128000,
            max_output_tokens=16384,
            supports_tools=True,
            supports_vision=True,
            supports_thinking=False,
            input_modalities=["text", "image"],
            cost_input_per_1m=2.50,
            cost_output_per_1m=10.00,
            cost_cache_read_per_1m=1.25,
            cost_cache_write_per_1m=2.50,
        )
    },
)
print(f"Fake config ready: model={FAKE_MODEL}, key_env=ARCLLM_TEST_KEY")

In [None]:
# Response helpers — simulate OpenAI API responses
def make_openai_text_response(
    text="Hello!",
    model=FAKE_MODEL,
    prompt_tokens=10,
    completion_tokens=5,
    finish_reason="stop",
    **extra_usage,
):
    usage = {
        "prompt_tokens": prompt_tokens,
        "completion_tokens": completion_tokens,
        "total_tokens": prompt_tokens + completion_tokens,
    }
    usage.update(extra_usage)
    return {
        "id": "chatcmpl-walkthrough",
        "object": "chat.completion",
        "model": model,
        "choices": [
            {
                "index": 0,
                "message": {"role": "assistant", "content": text},
                "finish_reason": finish_reason,
            }
        ],
        "usage": usage,
    }


def make_openai_tool_response(
    tool_id="call_01",
    tool_name="search",
    tool_args=None,
    text=None,
):
    tool_calls = [
        {
            "id": tool_id,
            "type": "function",
            "function": {
                "name": tool_name,
                "arguments": json.dumps(tool_args or {"query": "test"}),
            },
        }
    ]
    return {
        "id": "chatcmpl-walkthrough",
        "object": "chat.completion",
        "model": FAKE_MODEL,
        "choices": [
            {
                "index": 0,
                "message": {
                    "role": "assistant",
                    "content": text,
                    "tool_calls": tool_calls,
                },
                "finish_reason": "tool_calls",
            }
        ],
        "usage": {
            "prompt_tokens": 20,
            "completion_tokens": 15,
            "total_tokens": 35,
        },
    }


print("Response helpers ready.")

---
## 1. StopReason — Normalized Across Providers

Before Step 5, `LLMResponse.stop_reason` was a plain `str`. Different providers use different values:

| What happened | Anthropic says | OpenAI says | ArcLLM canonical |
|---------------|---------------|-------------|------------------|
| Model finished | `"end_turn"` | `"stop"` | `"end_turn"` |
| Model wants to use a tool | `"tool_use"` | `"tool_calls"` | `"tool_use"` |
| Hit max tokens | `"max_tokens"` | `"length"` | `"max_tokens"` |
| Stop sequence | `"stop_sequence"` | `"stop"` (not distinguished from a natural finish) | `"stop_sequence"` (OpenAI: `"end_turn"`) |
| Content filter | (N/A) | `"content_filter"` | `"end_turn"` |

Step 5 added `StopReason = Literal["end_turn", "tool_use", "max_tokens", "stop_sequence"]` and updated `LLMResponse` to use it. Now agents can check `resp.stop_reason == "tool_use"` regardless of provider.
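Pydantic enforces the `Literal` when an `LLMResponse` is constructed. The stdlib-only sketch below shows the same idea: `typing.get_args` exposes the allowed values, so a plain function can reject anything else. The helper `check_stop_reason` is hypothetical and not part of ArcLLM; `StopReason` is redefined locally so the cell runs on its own.

```python
from typing import Literal, cast, get_args

# Same four values as ArcLLM's StopReason, redefined so this sketch is self-contained.
StopReason = Literal["end_turn", "tool_use", "max_tokens", "stop_sequence"]


def check_stop_reason(value: str) -> StopReason:
    """Hypothetical helper: accept only canonical values, as pydantic does for LLMResponse."""
    allowed = get_args(StopReason)  # tuple of the four literal strings
    if value not in allowed:
        raise ValueError(f"invalid stop_reason {value!r}; expected one of {allowed}")
    return cast(StopReason, value)


print(check_stop_reason("tool_use"))  # tool_use
try:
    check_stop_reason("stop")  # raw OpenAI value, not canonical
except ValueError as exc:
    print(exc)
```

This is why adapters must translate provider values *before* building an `LLMResponse`. Validation rejects raw values instead of silently passing them through.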

In [None]:
# StopReason is a Literal type — pydantic validates it
from typing import get_args

valid_values = get_args(StopReason)
print(f"StopReason = Literal{list(valid_values)}")
print(f"\nType: {StopReason}")

In [None]:
# All 4 valid values work in LLMResponse
for reason in get_args(StopReason):
    resp = LLMResponse(
        content="test",
        usage=Usage(input_tokens=1, output_tokens=1, total_tokens=2),
        model="test",
        stop_reason=reason,
    )
    print(f"  {reason:<15} -> accepted")

In [None]:
# Invalid values are rejected by pydantic
from pydantic import ValidationError

for bad_value in ["done", "finished", "stop", "length", "tool_calls"]:
    try:
        LLMResponse(
            content="test",
            usage=Usage(input_tokens=1, output_tokens=1, total_tokens=2),
            model="test",
            stop_reason=bad_value,
        )
        print(f"  {bad_value:<15} -> ACCEPTED (unexpected!)")
    except ValidationError:
        print(f"  {bad_value:<15} -> REJECTED (correct)")

print("\nNote: raw OpenAI values like 'stop' and 'length' are rejected.")
print("The adapter maps them to canonical values before creating LLMResponse.")

---
## 2. OpenAI vs Anthropic — The Differences

Let's see every difference the adapter hides, side by side.

In [None]:
adapter = OpenaiAdapter(fake_config, FAKE_MODEL)
print(f"Adapter name: {adapter.name}")
print(f"Is BaseAdapter? {isinstance(adapter, BaseAdapter)}")
print(f"Is LLMProvider? True (inherits via BaseAdapter)")

In [None]:
# Side-by-side comparison of every adapter difference
diffs = [
    ("Auth header",
     "x-api-key: sk-ant-...",
     "Authorization: Bearer sk-..."),
    ("System messages",
     "Extracted to top-level 'system' param",
     "Stay in-line in messages array"),
    ("Tool definition key",
     "'input_schema' (renamed from parameters)",
     "'parameters' (stays as-is)"),
    ("Tool wrapping",
     "Flat: {name, description, input_schema}",
     'Nested: {type: "function", function: {...}}'),
    ("Tool call location",
     "Content blocks: [{type: 'tool_use', ...}]",
     "Message-level: message.tool_calls[]"),
    ("Tool call args",
     "'input' key (dict)",
     "'arguments' key (JSON string!)"),
    ("Tool results",
     "role='user' with tool_result blocks",
     "role='tool' flattened (one msg per result)"),
    ("Stop reason",
     "Already canonical (end_turn, tool_use)",
     "Mapped: stop->end_turn, tool_calls->tool_use"),
    ("Response content",
     "content: [{type: 'text', text: '...'}]",
     "choices[0].message.content: '...'"),
    ("Usage tokens",
     "input_tokens / output_tokens",
     "prompt_tokens / completion_tokens"),
    ("API version",
     "anthropic-version header required",
     "No version header needed"),
]

print(f"{'Concern':<22} {'Anthropic':<42} {'OpenAI'}")
print("-" * 110)
for concern, anthropic, openai in diffs:
    print(f"{concern:<22} {anthropic:<42} {openai}")

---
## 3. Request Building — Headers

OpenAI uses Bearer token auth (not a custom header like Anthropic).
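For comparison, the sketch below writes the two header styles as plain functions. The function names are hypothetical. The header names follow each provider's public API documentation, but the exact set ArcLLM sends may differ, so compare with the output of `adapter._build_headers()`.

```python
def anthropic_headers(api_key: str) -> dict[str, str]:
    # Anthropic: custom x-api-key header plus a required API version header.
    return {
        "x-api-key": api_key,
        "anthropic-version": "2023-06-01",
        "content-type": "application/json",
    }


def openai_headers(api_key: str) -> dict[str, str]:
    # OpenAI: standard Bearer token in Authorization; no version header.
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }


print(anthropic_headers("sk-ant-test"))
print(openai_headers("sk-test"))
```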

In [None]:
headers = adapter._build_headers()
print("OpenAI request headers:")
for key, value in headers.items():
    display_val = value[:25] + "..." if "Bearer" in value else value
    print(f"  {key}: {display_val}")

print(f"\nNo 'anthropic-version' header — OpenAI doesn't need one.")
print(f"Bearer prefix: {headers['Authorization'].startswith('Bearer ')}")

---
## 4. Request Building — System Messages (In-line)

Unlike Anthropic (which extracts system messages to a top-level param), OpenAI keeps system messages **in the messages array**. The adapter just passes them through.
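The sketch below contrasts the two strategies using plain dicts in place of ArcLLM `Message` objects. Both function names are hypothetical. Joining several system messages with a blank line in the Anthropic variant is an assumption of this sketch and may not match what `AnthropicAdapter` actually does.

```python
def anthropic_style(messages: list[dict]) -> dict:
    # Anthropic: pull system messages out into a top-level "system" field.
    system_parts = [m["content"] for m in messages if m["role"] == "system"]
    body: dict = {"messages": [m for m in messages if m["role"] != "system"]}
    if system_parts:
        body["system"] = "\n\n".join(system_parts)  # joining rule is an assumption
    return body


def openai_style(messages: list[dict]) -> dict:
    # OpenAI: system messages stay in the array, in their original position.
    return {"messages": list(messages)}


convo = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
]
print(anthropic_style(convo))  # 'system' key plus 1 message
print(openai_style(convo))     # 2 messages, no 'system' key
```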

In [None]:
# System messages stay in-line — not extracted
messages = [
    Message(role="system", content="You are a helpful assistant."),
    Message(role="user", content="Hello!"),
]

body = adapter._build_request_body(messages)
print(f"Messages in request body: {len(body['messages'])}")
for i, msg in enumerate(body["messages"]):
    print(f"  [{i}] role={msg['role']!r}, content={msg['content']!r}")

print(f"\n'system' key in body? {'system' in body}")
print("System message stays in the messages array — NOT extracted like Anthropic.")

---
## 5. Request Building — Tool Definitions

OpenAI wraps tools in `{"type": "function", "function": {...}}` and keeps the key as `parameters` (Anthropic renames it to `input_schema`).
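The sketch below shows both shapes side by side as plain functions. The function names are hypothetical. The dict layouts match the formats described in this section and in the comparison table above.

```python
def to_openai_tool(name: str, description: str, parameters: dict) -> dict:
    # OpenAI: wrapped in {"type": "function", "function": {...}}; key stays "parameters".
    return {
        "type": "function",
        "function": {"name": name, "description": description, "parameters": parameters},
    }


def to_anthropic_tool(name: str, description: str, parameters: dict) -> dict:
    # Anthropic: flat object; the same JSON Schema goes under "input_schema".
    return {"name": name, "description": description, "input_schema": parameters}


schema = {"type": "object", "properties": {"query": {"type": "string"}}, "required": ["query"]}
print(to_openai_tool("search", "Search the knowledge base", schema))
print(to_anthropic_tool("search", "Search the knowledge base", schema))
```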

In [None]:
# Tool definition formatting
tool = Tool(
    name="search_database",
    description="Search the knowledge base",
    parameters={
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "Search query"},
            "limit": {"type": "integer", "default": 10},
        },
        "required": ["query"],
    },
)

formatted = adapter._format_tool(tool)
print("OpenAI tool format:")
print(json.dumps(formatted, indent=2))

print(f"\nOuter 'type' key: {formatted['type']!r}")
print(f"Nested under 'function': {list(formatted['function'].keys())}")
print(f"Key is 'parameters' (not 'input_schema'): {'parameters' in formatted['function']}")

---
## 6. Request Building — Message Formatting

The OpenAI adapter handles three special cases:
1. **Plain text messages** — pass through as-is
2. **Assistant messages with ToolUseBlocks** — format as `tool_calls` array with JSON string arguments
3. **Tool result messages** — flatten from one message with N blocks to N individual messages
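The first two cases can be sketched as a single function. Blocks are plain dicts standing in for `TextBlock` and `ToolUseBlock`, and the function name is hypothetical. Joining several text blocks with a newline is an assumption of this sketch. Case 3 is deliberately left out because it changes the number of messages, so it needs a list-level step.

```python
import json


def format_message(role: str, content) -> dict:
    """content is a str or a list of block dicts (stand-ins for TextBlock / ToolUseBlock)."""
    if isinstance(content, str):
        return {"role": role, "content": content}  # case 1: plain text passes through

    texts = [b["text"] for b in content if b["type"] == "text"]
    tool_calls = [
        {
            "id": b["id"],
            "type": "function",
            # case 2: OpenAI expects arguments as a JSON *string*, not a dict
            "function": {"name": b["name"], "arguments": json.dumps(b["arguments"])},
        }
        for b in content
        if b["type"] == "tool_use"
    ]
    # Joining several text blocks with a newline is an assumption of this sketch.
    msg = {"role": role, "content": "\n".join(texts) if texts else None}
    if tool_calls:
        msg["tool_calls"] = tool_calls
    return msg


print(format_message("user", "Hello!"))
print(format_message("assistant", [
    {"type": "text", "text": "Let me look that up."},
    {"type": "tool_use", "id": "call_01", "name": "search", "arguments": {"query": "cats"}},
]))
```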

In [None]:
# Simple text message — passthrough
msg = Message(role="user", content="Hello!")
formatted = adapter._format_message(msg)
print("Text message:")
print(f"  {json.dumps(formatted)}")

In [None]:
# Assistant message with ToolUseBlocks -> tool_calls array
msg = Message(
    role="assistant",
    content=[
        TextBlock(text="Let me look that up."),
        ToolUseBlock(id="call_01", name="search", arguments={"query": "cats"}),
        ToolUseBlock(id="call_02", name="lookup", arguments={"id": 42}),
    ],
)
formatted = adapter._format_message(msg)
print("Assistant with tool calls:")
print(json.dumps(formatted, indent=2))

print(f"\nArguments are JSON STRINGS (not dicts):")
for tc in formatted["tool_calls"]:
    args = tc["function"]["arguments"]
    print(f"  {tc['function']['name']}: {args!r}  (type: {type(args).__name__})")

print("\nThis is a key OpenAI quirk — arguments must be serialized to JSON strings.")

In [None]:
# Assistant with ONLY tool calls (no text) -> content is None
msg = Message(
    role="assistant",
    content=[
        ToolUseBlock(id="call_01", name="calc", arguments={"x": 1}),
    ],
)
formatted = adapter._format_message(msg)
print(f"content: {formatted['content']}  (None when no text blocks)")
print(f"tool_calls: {len(formatted['tool_calls'])} call(s)")

### 6a. Tool Result Flattening

This is the most interesting OpenAI adapter behavior.

In ArcLLM, a single `Message(role="tool")` can contain **multiple** `ToolResultBlock`s, for example when the agent ran two tools and returns both results in one message.

But OpenAI requires **one message per tool result**, each with its own `tool_call_id`. So one ArcLLM message becomes N OpenAI messages.

```
ArcLLM (1 message):
Message(role="tool", content=[
    ToolResultBlock(tool_use_id="t1", content="42"),
    ToolResultBlock(tool_use_id="t2", content="hello"),
])

        |  _format_messages()
        v

OpenAI (2 messages):
{"role": "tool", "tool_call_id": "t1", "content": "42"}
{"role": "tool", "tool_call_id": "t2", "content": "hello"}
```

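This flattening happens in `_format_messages()`, not `_format_message()`. That keeps `_format_message()` a one-message-to-one-dict function, while the list-level `_format_messages()` handles the 1-to-N expansion.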
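The list-level flattening step can be sketched as a short loop. Messages and blocks are plain dicts here, and the function name is hypothetical. In the real adapter, non-tool messages would go through `_format_message()` rather than pass through unchanged.

```python
def flatten_tool_results(messages: list[dict]) -> list[dict]:
    out: list[dict] = []
    for m in messages:
        if m["role"] == "tool":
            # One OpenAI message per result, each keyed by the call it answers.
            for block in m["content"]:
                out.append({
                    "role": "tool",
                    "tool_call_id": block["tool_use_id"],
                    "content": block["content"],
                })
        else:
            out.append(m)  # stand-in for per-message formatting
    return out


convo = [
    {"role": "user", "content": "Do two things."},
    {"role": "tool", "content": [
        {"tool_use_id": "call_01", "content": "Result from tool 1"},
        {"tool_use_id": "call_02", "content": "Result from tool 2"},
    ]},
]
for msg in flatten_tool_results(convo):
    print(msg)
```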

In [None]:
# Tool result flattening: 1 ArcLLM message -> 2 OpenAI messages
messages = [
    Message(role="user", content="Do two things."),
    Message(
        role="tool",
        content=[
            ToolResultBlock(tool_use_id="call_01", content="Result from tool 1"),
            ToolResultBlock(tool_use_id="call_02", content="Result from tool 2"),
        ],
    ),
]

formatted = adapter._format_messages(messages)
print(f"ArcLLM messages: {len(messages)}")
print(f"OpenAI messages: {len(formatted)}  (flattened!)")
print()
for i, msg in enumerate(formatted):
    print(f"  [{i}] {json.dumps(msg)}")

In [None]:
# Single tool result — no flattening needed (1 -> 1)
messages = [
    Message(
        role="tool",
        content=[ToolResultBlock(tool_use_id="call_01", content="42")],
    ),
]

formatted = adapter._format_messages(messages)
print(f"1 ArcLLM message -> {len(formatted)} OpenAI message(s)")
print(f"  {json.dumps(formatted[0])}")

---
## 7. Request Building — Full Request Body

`_build_request_body()` combines everything into the final API payload.
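A minimal sketch of that assembly follows. It assumes that `temperature` falls back to the provider config's `default_temperature` and `max_tokens` falls back to the model metadata's `max_output_tokens`, as the override chain printed in this section suggests. The function and parameter names are hypothetical.

```python
from typing import Optional


def build_body(
    model: str,
    messages: list[dict],
    *,
    default_temperature: float,   # assumed source: provider config
    max_output_tokens: int,       # assumed source: model metadata
    tools: Optional[list[dict]] = None,
    max_tokens: Optional[int] = None,
    temperature: Optional[float] = None,
) -> dict:
    body = {
        "model": model,
        "messages": messages,
        # Per-call kwargs win; "is not None" keeps an explicit 0.0 temperature.
        "max_tokens": max_tokens if max_tokens is not None else max_output_tokens,
        "temperature": temperature if temperature is not None else default_temperature,
    }
    if tools:
        body["tools"] = tools  # only sent when tools were provided
    return body


hi = [{"role": "user", "content": "Hi"}]
print(build_body("gpt-4o-test", hi, default_temperature=0.7, max_output_tokens=16384))
print(build_body("gpt-4o-test", hi, default_temperature=0.7, max_output_tokens=16384,
                 max_tokens=1000, temperature=0.2))
```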

In [None]:
# Simple text request
body = adapter._build_request_body(
    [Message(role="user", content="What's the weather?")]
)
print("Simple request body:")
print(json.dumps(body, indent=2))
print(f"\nNo 'tools' key (not provided).")
print(f"No 'system' key (OpenAI doesn't extract system messages).")

In [None]:
# Request with system message + tools
body = adapter._build_request_body(
    [
        Message(role="system", content="You are a research assistant."),
        Message(role="user", content="Find papers about LLM agents"),
    ],
    tools=[tool],  # from above
)
print("Full request body with system + tools:")
print(json.dumps(body, indent=2))

In [None]:
# kwargs override config defaults
body_default = adapter._build_request_body(
    [Message(role="user", content="Hi")]
)
body_override = adapter._build_request_body(
    [Message(role="user", content="Hi")],
    max_tokens=1000, temperature=0.2,
)

print(f"Default:  max_tokens={body_default['max_tokens']}, temperature={body_default['temperature']}")
print(f"Override: max_tokens={body_override['max_tokens']}, temperature={body_override['temperature']}")
print(f"\nOverride chain: kwargs > provider config > model metadata")

---
## 8. Response Parsing — Text

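OpenAI wraps the assistant message in `choices[0].message`, puts the stop reason in `choices[0].finish_reason`, and reports token counts at the top level under `usage`. The adapter extracts content, stop reason, and usage from those three places.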
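The sketch below follows those extraction paths on a plain dict. The function name is hypothetical. It deliberately leaves `finish_reason` unmapped, because the canonical mapping is a separate step (Section 10).

```python
def unpack_openai_response(raw: dict) -> dict:
    choice = raw["choices"][0]          # take the first choice
    message = choice["message"]
    usage = raw.get("usage") or {}
    return {
        "content": message.get("content"),             # str, or None for pure tool calls
        "finish_reason": choice.get("finish_reason"),  # still the raw OpenAI value
        "model": raw.get("model"),
        "prompt_tokens": usage.get("prompt_tokens", 0),
        "completion_tokens": usage.get("completion_tokens", 0),
    }


example = {
    "model": "gpt-4o-test",
    "choices": [{
        "index": 0,
        "message": {"role": "assistant", "content": "The answer is 42."},
        "finish_reason": "stop",
    }],
    "usage": {"prompt_tokens": 10, "completion_tokens": 5, "total_tokens": 15},
}
print(unpack_openai_response(example))
```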

In [None]:
# Parse a text response
raw = make_openai_text_response(text="The answer is 42.")
print("Raw OpenAI response structure:")
print(json.dumps(raw, indent=2))

In [None]:
# Parse it into LLMResponse
resp = adapter._parse_response(raw)
print(f"Type:        {type(resp).__name__}")
print(f"Content:     {resp.content!r}")
print(f"Tool calls:  {resp.tool_calls}")
print(f"Stop reason: {resp.stop_reason!r}  (mapped from 'stop')")
print(f"Model:       {resp.model}")
print(f"Usage:       {resp.usage.input_tokens}in + {resp.usage.output_tokens}out = {resp.usage.total_tokens}")
print(f"Raw stored:  {resp.raw is raw}")

---
## 9. Response Parsing — Tool Calls

OpenAI puts tool calls at the **message level** (`message.tool_calls[]`), not as content blocks like Anthropic. Arguments come as JSON **strings** that need parsing.
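The core of that step can be sketched as a loop over `message.tool_calls` that decodes each `arguments` string. The function name is hypothetical, and it omits the error handling covered in Section 12. A message without tool calls may omit the key or set it to `null`, and both cases yield an empty list here.

```python
import json


def extract_tool_calls(message: dict) -> list[dict]:
    calls = []
    for tc in message.get("tool_calls") or []:  # key may be missing or None
        fn = tc["function"]
        calls.append({
            "id": tc["id"],
            "name": fn["name"],
            "arguments": json.loads(fn["arguments"]),  # JSON string -> dict
        })
    return calls


message = {
    "role": "assistant",
    "content": None,
    "tool_calls": [{
        "id": "call_abc",
        "type": "function",
        "function": {"name": "search", "arguments": '{"query": "LLM agents", "limit": 5}'},
    }],
}
print(extract_tool_calls(message))
```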

In [None]:
# Parse a tool use response
raw = make_openai_tool_response(
    tool_id="call_abc",
    tool_name="search",
    tool_args={"query": "LLM agents", "limit": 5},
)

print("Raw tool call in OpenAI format:")
raw_tc = raw["choices"][0]["message"]["tool_calls"][0]
print(f"  id: {raw_tc['id']}")
print(f"  function.name: {raw_tc['function']['name']}")
print(f"  function.arguments: {raw_tc['function']['arguments']!r}")
print(f"  arguments type: {type(raw_tc['function']['arguments']).__name__}  <- JSON string!")

In [None]:
# After parsing — arguments are a proper dict
resp = adapter._parse_response(raw)
print(f"Parsed tool call:")
print(f"  id:        {resp.tool_calls[0].id}")
print(f"  name:      {resp.tool_calls[0].name}")
print(f"  arguments: {resp.tool_calls[0].arguments}")
print(f"  type:      {type(resp.tool_calls[0].arguments).__name__}  <- proper dict!")
print(f"\nStop reason: {resp.stop_reason!r}  (mapped from 'tool_calls')")

In [None]:
# Mixed response: text + tool calls together
raw = make_openai_tool_response(
    text="Let me search for that.",
    tool_id="call_xyz",
    tool_name="search",
    tool_args={"query": "cats"},
)
resp = adapter._parse_response(raw)
print(f"Content:     {resp.content!r}")
print(f"Tool calls:  {len(resp.tool_calls)}")
print("Both present — text AND tool calls in the same response.")

In [None]:
# Null content — when model only uses tools (no text)
raw = make_openai_tool_response()
raw["choices"][0]["message"]["content"] = None
resp = adapter._parse_response(raw)
print(f"Content: {resp.content}  (None — pure tool call, no text)")
print(f"Tool calls: {len(resp.tool_calls)}")

---
## 10. Stop Reason Mapping

The module-level `_STOP_REASON_MAP` dict maps each known OpenAI `finish_reason` to a canonical `StopReason`.
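A minimal sketch of a lookup table with a safe fallback follows. The entries mirror the table in Section 1, and `map_finish_reason` is a hypothetical name. ArcLLM's actual `_STOP_REASON_MAP` may contain additional entries.

```python
from typing import Literal, Optional

StopReason = Literal["end_turn", "tool_use", "max_tokens", "stop_sequence"]

# Mirrors the OpenAI column of the table in Section 1.
OPENAI_TO_CANONICAL: dict[str, StopReason] = {
    "stop": "end_turn",
    "tool_calls": "tool_use",
    "length": "max_tokens",
    "content_filter": "end_turn",
}


def map_finish_reason(finish_reason: Optional[str]) -> StopReason:
    # Unknown or missing values fall back to "end_turn" instead of raising.
    return OPENAI_TO_CANONICAL.get(finish_reason, "end_turn")


for reason in ["stop", "tool_calls", "length", "content_filter", "some_future_reason", None]:
    print(f"{reason!r:<22} -> {map_finish_reason(reason)!r}")
```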

In [None]:
# Show the full mapping
from arcllm.adapters.openai import _STOP_REASON_MAP

print("OpenAI finish_reason -> ArcLLM StopReason:")
print("-" * 45)
for openai_val, arcllm_val in _STOP_REASON_MAP.items():
    print(f"  {openai_val!r:<20} -> {arcllm_val!r}")

print(f"\nUnknown values default to 'end_turn' (safe fallback).")

In [None]:
# Test each mapping through _parse_response
test_cases = [
    ("stop", "end_turn"),
    ("tool_calls", "tool_use"),
    ("length", "max_tokens"),
    ("content_filter", "end_turn"),
]

for openai_reason, expected in test_cases:
    if openai_reason == "tool_calls":
        raw = make_openai_tool_response()
    else:
        raw = make_openai_text_response(finish_reason=openai_reason)
    resp = adapter._parse_response(raw)
    status = "pass" if resp.stop_reason == expected else "FAIL"
    print(f"  {openai_reason!r:<18} -> {resp.stop_reason!r:<14} (expected {expected!r}) [{status}]")

In [None]:
# Unknown finish_reason -> safe default
result = adapter._map_stop_reason("some_future_reason")
print(f"Unknown reason 'some_future_reason' -> {result!r}")
print("Safe default prevents crashes when OpenAI adds new finish reasons.")

---
## 11. Usage Parsing

OpenAI names its token fields differently from ArcLLM's canonical `Usage`. Reasoning models such as o1/o3 also report `reasoning_tokens`, nested under `usage.completion_tokens_details`.
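The renaming can be sketched as a small function over the raw `usage` dict. The function name is hypothetical and the output keys mirror ArcLLM's `Usage` fields shown in this section. Because `completion_tokens_details` may be absent, the lookup is guarded.

```python
def map_usage(usage: dict) -> dict:
    details = usage.get("completion_tokens_details") or {}  # absent on some responses
    prompt = usage["prompt_tokens"]
    completion = usage["completion_tokens"]  # already includes any reasoning tokens
    return {
        "input_tokens": prompt,
        "output_tokens": completion,
        "total_tokens": usage.get("total_tokens", prompt + completion),
        "reasoning_tokens": details.get("reasoning_tokens"),  # None when not reported
    }


plain = {"prompt_tokens": 100, "completion_tokens": 50, "total_tokens": 150}
reasoning = {
    "prompt_tokens": 200,
    "completion_tokens": 150,
    "total_tokens": 350,
    "completion_tokens_details": {"reasoning_tokens": 80},
}
print(map_usage(plain))
print(map_usage(reasoning))
```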

In [None]:
# Standard usage mapping
raw = make_openai_text_response(prompt_tokens=100, completion_tokens=50)
resp = adapter._parse_response(raw)

print("Token name mapping:")
print(f"  OpenAI 'prompt_tokens'     -> ArcLLM 'input_tokens':  {resp.usage.input_tokens}")
print(f"  OpenAI 'completion_tokens'  -> ArcLLM 'output_tokens': {resp.usage.output_tokens}")
print(f"  OpenAI 'total_tokens'       -> ArcLLM 'total_tokens':  {resp.usage.total_tokens}")
print(f"  reasoning_tokens:           {resp.usage.reasoning_tokens}  (None: no completion_tokens_details in this mock)")

In [None]:
# Reasoning tokens (o1/o3 models include completion_tokens_details)
raw = make_openai_text_response(prompt_tokens=200, completion_tokens=150)
raw["usage"]["completion_tokens_details"] = {"reasoning_tokens": 80}

resp = adapter._parse_response(raw)
print("o1/o3 model with reasoning tokens:")
print(f"  input_tokens:     {resp.usage.input_tokens}")
print(f"  output_tokens:    {resp.usage.output_tokens}")
print(f"  reasoning_tokens: {resp.usage.reasoning_tokens}")
print(f"  total_tokens:     {resp.usage.total_tokens}")
print("\nReasoning tokens tell agents how much 'thinking' happened.")
print("OpenAI counts them inside completion_tokens, so output_tokens already includes them.")

---
## 12. Tool Call Parsing Edge Cases

OpenAI sends tool call arguments as JSON strings. The adapter handles three cases:
1. JSON string (normal) -> `json.loads()`
2. Dict (defensive) -> pass-through
3. Garbage -> `ArcLLMParseError`
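The three-way handling can be sketched with a hypothetical `ToolArgsParseError` that carries `raw_string` and `original_error`, mirroring the attributes of `ArcLLMParseError` used below. This is not ArcLLM's own code. The extra check that decoded JSON is an object is an addition of this sketch.

```python
import json


class ToolArgsParseError(Exception):
    """Hypothetical stand-in for ArcLLMParseError."""

    def __init__(self, raw_string: str, original_error: Exception):
        super().__init__(f"Failed to parse tool arguments: {original_error}")
        self.raw_string = raw_string
        self.original_error = original_error


def coerce_arguments(raw) -> dict:
    if isinstance(raw, dict):
        return raw  # defensive: already decoded, pass through
    if isinstance(raw, str):
        try:
            parsed = json.loads(raw)  # normal case: JSON string from OpenAI
        except json.JSONDecodeError as exc:
            raise ToolArgsParseError(raw, exc) from exc
        if not isinstance(parsed, dict):  # extra check added by this sketch
            raise ToolArgsParseError(raw, TypeError(f"expected a JSON object, got {type(parsed).__name__}"))
        return parsed
    # Anything else (int, list, None, ...) is unusable as tool arguments.
    raise ToolArgsParseError(str(raw), TypeError(f"unsupported arguments type {type(raw).__name__}"))


print(coerce_arguments('{"expression": "2+2", "precision": 2}'))
print(coerce_arguments({"expression": "2+2"}))
for bad in ("not valid json {{{", 12345):
    try:
        coerce_arguments(bad)
    except ToolArgsParseError as exc:
        print(f"{exc} | raw={exc.raw_string!r}")
```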

In [None]:
# Normal case: JSON string arguments (this is what OpenAI sends)
tc = adapter._parse_tool_call({
    "id": "call_1",
    "type": "function",
    "function": {
        "name": "calc",
        "arguments": '{"expression": "2+2", "precision": 2}',
    },
})
print(f"JSON string -> parsed dict: {tc.arguments}")
print(f"Type: {type(tc.arguments).__name__}")

In [None]:
# Defensive case: dict arguments (pass-through)
tc = adapter._parse_tool_call({
    "id": "call_1",
    "type": "function",
    "function": {
        "name": "calc",
        "arguments": {"expression": "2+2"},
    },
})
print(f"Dict -> pass-through: {tc.arguments}")
print(f"Type: {type(tc.arguments).__name__}")

In [None]:
# Bad JSON string -> ArcLLMParseError with raw data preserved
try:
    adapter._parse_tool_call({
        "id": "call_1",
        "type": "function",
        "function": {
            "name": "calc",
            "arguments": "not valid json {{{",
        },
    })
except ArcLLMParseError as e:
    print(f"ArcLLMParseError: {e}")
    print(f"Raw string:      {e.raw_string!r}")
    print(f"Original error:  {type(e.original_error).__name__}")

In [None]:
# Unexpected type (not dict, not string) -> ArcLLMParseError
try:
    adapter._parse_tool_call({
        "id": "call_1",
        "type": "function",
        "function": {
            "name": "calc",
            "arguments": 12345,
        },
    })
except ArcLLMParseError as e:
    print(f"Unexpected type caught: {e}")
    print(f"Raw string: {e.raw_string!r}")

---
## 13. Full invoke() Cycle (Mocked)

Let's simulate the complete `invoke()` flow with mocked HTTP responses.

In [None]:
# Text conversation
async def demo_text_invoke():
    adapter = OpenaiAdapter(fake_config, FAKE_MODEL)

    response_data = make_openai_text_response(text="Austin is 75F and sunny.")
    mock_response = httpx.Response(
        200, json=response_data,
        request=httpx.Request("POST", "https://api.openai.com/v1/chat/completions"),
    )
    adapter._client = AsyncMock()
    adapter._client.post = AsyncMock(return_value=mock_response)

    resp = await adapter.invoke([
        Message(role="system", content="You are a weather bot."),
        Message(role="user", content="What's the weather in Austin?"),
    ])

    print(f"Response type: {type(resp).__name__}")
    print(f"Content:       {resp.content}")
    print(f"Stop reason:   {resp.stop_reason}")
    print(f"Tokens:        {resp.usage.total_tokens}")

    # Verify the request was built correctly
    call_kwargs = adapter._client.post.call_args
    url = call_kwargs[0][0]
    body = call_kwargs[1]["json"]
    print(f"\nURL:           {url}")
    print(f"Messages:      {len(body['messages'])} (system stays in-line)")
    print(f"Auth header:   {call_kwargs[1]['headers']['Authorization'][:25]}...")

await demo_text_invoke()

In [None]:
# Tool use conversation
async def demo_tool_invoke():
    adapter = OpenaiAdapter(fake_config, FAKE_MODEL)

    response_data = make_openai_tool_response(
        tool_id="call_weather",
        tool_name="get_weather",
        tool_args={"city": "Austin", "units": "fahrenheit"},
        text="Let me check the weather.",
    )
    mock_response = httpx.Response(
        200, json=response_data,
        request=httpx.Request("POST", "https://api.openai.com/v1/chat/completions"),
    )
    adapter._client = AsyncMock()
    adapter._client.post = AsyncMock(return_value=mock_response)

    tools = [
        Tool(
            name="get_weather",
            description="Get current weather for a city",
            parameters={
                "type": "object",
                "properties": {
                    "city": {"type": "string"},
                    "units": {"type": "string", "enum": ["celsius", "fahrenheit"]},
                },
                "required": ["city"],
            },
        )
    ]

    resp = await adapter.invoke(
        [Message(role="user", content="What's the weather in Austin?")],
        tools=tools,
    )

    print(f"Content:     {resp.content!r}")
    print(f"Stop reason: {resp.stop_reason}")
    print(f"Tool calls:  {len(resp.tool_calls)}")
    for tc in resp.tool_calls:
        print(f"  {tc.name}({tc.arguments})")

await demo_tool_invoke()

In [None]:
# HTTP error handling
async def demo_error_invoke():
    adapter = OpenaiAdapter(fake_config, FAKE_MODEL)

    for status, label in [(429, "Rate limited"), (401, "Invalid key"), (500, "Server error")]:
        mock_response = httpx.Response(
            status, text=label,
            request=httpx.Request("POST", "https://api.openai.com/v1/chat/completions"),
        )
        adapter._client = AsyncMock()
        adapter._client.post = AsyncMock(return_value=mock_response)

        try:
            await adapter.invoke([Message(role="user", content="Hi")])
        except ArcLLMAPIError as e:
            print(f"  HTTP {e.status_code}: provider={e.provider!r}, body={e.body!r}")

await demo_error_invoke()

---
## 14. Simulated Agentic Tool-Calling Loop (OpenAI)

The exact same loop pattern from the Anthropic walkthrough, but using `OpenaiAdapter`. The agent code is **identical** — only the adapter changes.

In [None]:
async def demo_openai_agentic_loop():
    """Simulate a full agentic tool-calling loop with OpenAI adapter."""
    adapter = OpenaiAdapter(fake_config, FAKE_MODEL)

    tools = [
        Tool(
            name="get_weather",
            description="Get weather for a city",
            parameters={
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        )
    ]

    messages = [
        Message(role="system", content="Use tools to answer questions."),
        Message(role="user", content="What's the weather in Austin?"),
    ]

    # --- Turn 1: LLM wants to use a tool ---
    turn1_data = make_openai_tool_response(
        tool_id="call_weather_01",
        tool_name="get_weather",
        tool_args={"city": "Austin"},
    )
    mock_resp1 = httpx.Response(
        200, json=turn1_data,
        request=httpx.Request("POST", "https://api.openai.com/v1/chat/completions"),
    )
    adapter._client = AsyncMock()
    adapter._client.post = AsyncMock(return_value=mock_resp1)

    resp = await adapter.invoke(messages, tools=tools)
    print(f"Turn 1: stop_reason={resp.stop_reason}")
    print(f"  LLM wants to call: {resp.tool_calls[0].name}({resp.tool_calls[0].arguments})")

    # Agent executes the tool
    tool_result = "75 deg F, sunny, humidity 45%"
    print(f"  Agent executes tool -> result: {tool_result!r}")

    # Agent adds assistant response + tool result to conversation
    tc = resp.tool_calls[0]
    messages.append(Message(
        role="assistant",
        content=[ToolUseBlock(id=tc.id, name=tc.name, arguments=tc.arguments)],
    ))
    messages.append(Message(
        role="tool",
        content=[ToolResultBlock(tool_use_id=tc.id, content=tool_result)],
    ))

    # --- Turn 2: LLM gives final answer ---
    turn2_data = make_openai_text_response(
        text="The weather in Austin is 75 deg F and sunny with 45% humidity."
    )
    mock_resp2 = httpx.Response(
        200, json=turn2_data,
        request=httpx.Request("POST", "https://api.openai.com/v1/chat/completions"),
    )
    adapter._client.post = AsyncMock(return_value=mock_resp2)

    resp = await adapter.invoke(messages, tools=tools)
    print(f"\nTurn 2: stop_reason={resp.stop_reason}")
    print(f"  Final answer: {resp.content}")
    print(f"\nLoop complete! {len(messages)} messages in conversation.")

    # Verify the request — tool result was flattened correctly
    call_kwargs = adapter._client.post.call_args
    sent_body = call_kwargs[1]["json"]
    tool_msgs = [m for m in sent_body["messages"] if m["role"] == "tool"]
    print(f"\nTool result messages sent to OpenAI: {len(tool_msgs)}")
    for tm in tool_msgs:
        print(f"  tool_call_id={tm['tool_call_id']}, content={tm['content']!r}")

await demo_openai_agentic_loop()

---
## 15. Cross-Provider Proof: Same Agent Code, Different Adapters

This is the whole point of ArcLLM. The agent writes `adapter.invoke(messages, tools=tools)` once, and it works with **any** provider. Let's prove it.

In [None]:
# Setup both adapters with fake configs
os.environ["ARCLLM_ANTHROPIC_KEY"] = "sk-ant-test-key"
os.environ["ARCLLM_OPENAI_KEY"] = "sk-openai-test-key"

anthropic_config = ProviderConfig(
    provider=ProviderSettings(
        api_format="anthropic-messages",
        base_url="https://api.anthropic.com",
        api_key_env="ARCLLM_ANTHROPIC_KEY",
        default_model="claude-test",
        default_temperature=0.7,
    ),
    models={
        "claude-test": ModelMetadata(
            context_window=200000, max_output_tokens=8192,
            supports_tools=True, supports_vision=True, supports_thinking=True,
            input_modalities=["text", "image"],
            cost_input_per_1m=3.0, cost_output_per_1m=15.0,
            cost_cache_read_per_1m=0.3, cost_cache_write_per_1m=3.75,
        )
    },
)

openai_config = ProviderConfig(
    provider=ProviderSettings(
        api_format="openai-chat",
        base_url="https://api.openai.com",
        api_key_env="ARCLLM_OPENAI_KEY",
        default_model="gpt-4o-test",
        default_temperature=0.7,
    ),
    models={
        "gpt-4o-test": ModelMetadata(
            context_window=128000, max_output_tokens=16384,
            supports_tools=True, supports_vision=True, supports_thinking=False,
            input_modalities=["text", "image"],
            cost_input_per_1m=2.50, cost_output_per_1m=10.00,
            cost_cache_read_per_1m=1.25, cost_cache_write_per_1m=2.50,
        )
    },
)

print("Both configs ready.")

In [None]:
async def agent_tool_loop(adapter, mock_responses):
    """
    Generic agentic loop — works with ANY adapter.
    This is the code agents actually write.
    """
    adapter._client = AsyncMock()
    response_iter = iter(mock_responses)

    tools = [
        Tool(
            name="calculate",
            description="Evaluate a math expression",
            parameters={
                "type": "object",
                "properties": {"expression": {"type": "string"}},
                "required": ["expression"],
            },
        )
    ]

    messages = [Message(role="user", content="What is 7 * 8?")]
    turn = 0

    while True:
        turn += 1
        mock_resp = next(response_iter)
        adapter._client.post = AsyncMock(return_value=mock_resp)

        resp = await adapter.invoke(messages, tools=tools)

        if resp.stop_reason == "end_turn":
            return resp.content, turn

        if resp.stop_reason == "tool_use":
            tc = resp.tool_calls[0]
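            # Demo only: never eval() model output in a real agent. Builtins are
            # stripped here, which limits but does not eliminate the risk.
            result = str(eval(tc.arguments["expression"], {"__builtins__": {}}, {}))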

            messages.append(Message(
                role="assistant",
                content=[ToolUseBlock(id=tc.id, name=tc.name, arguments=tc.arguments)],
            ))
            messages.append(Message(
                role="tool",
                content=[ToolResultBlock(tool_use_id=tc.id, content=result)],
            ))

        if turn > 5:
            return "(max turns)", turn


# Mock responses for Anthropic format
anthropic_responses = [
    httpx.Response(200, json={
        "id": "msg_1", "type": "message", "role": "assistant", "model": "claude-test",
        "content": [{"type": "tool_use", "id": "toolu_01", "name": "calculate", "input": {"expression": "7 * 8"}}],
        "stop_reason": "tool_use",
        "usage": {"input_tokens": 50, "output_tokens": 20},
    }, request=httpx.Request("POST", "https://api.anthropic.com/v1/messages")),
    httpx.Response(200, json={
        "id": "msg_2", "type": "message", "role": "assistant", "model": "claude-test",
        "content": [{"type": "text", "text": "7 * 8 = 56"}],
        "stop_reason": "end_turn",
        "usage": {"input_tokens": 80, "output_tokens": 10},
    }, request=httpx.Request("POST", "https://api.anthropic.com/v1/messages")),
]

# Mock responses for OpenAI format
openai_responses = [
    httpx.Response(200, json=make_openai_tool_response(
        tool_id="call_01", tool_name="calculate", tool_args={"expression": "7 * 8"},
    ), request=httpx.Request("POST", "https://api.openai.com/v1/chat/completions")),
    httpx.Response(200, json=make_openai_text_response(
        text="7 * 8 = 56",
    ), request=httpx.Request("POST", "https://api.openai.com/v1/chat/completions")),
]

# Run the SAME agent code with both adapters
print("=" * 60)
print("Same agent code, different adapters:")
print("=" * 60)

anthropic_adapter = AnthropicAdapter(anthropic_config, "claude-test")
answer, turns = await agent_tool_loop(anthropic_adapter, anthropic_responses)
print(f"\nAnthropic: '{answer}' (in {turns} turns)")

openai_adapter = OpenaiAdapter(openai_config, "gpt-4o-test")
answer, turns = await agent_tool_loop(openai_adapter, openai_responses)
print(f"OpenAI:    '{answer}' (in {turns} turns)")

print("\nThe agent_tool_loop() function is IDENTICAL for both.")
print("It checks resp.stop_reason == 'end_turn' and 'tool_use' — works everywhere.")
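
On the OpenAI side of that loop, the tool call arrives in `choices[0].message.tool_calls`, and its `arguments` field is a JSON-encoded string rather than a dict. The sketch below shows roughly what the adapter has to do with such a payload, using plain dicts. The payload is hand-written in the documented Chat Completions shape (it is not the output of `make_openai_tool_response`). `parse_openai_choice` and the `input_tokens`/`output_tokens` key names are hypothetical, not ArcLLM's actual parsing code.

```python
import json

# Hand-written payload in the Chat Completions response shape (illustrative values)
raw = {
    "choices": [{
        "message": {
            "role": "assistant",
            "content": None,  # OpenAI sends null content when it only calls tools
            "tool_calls": [{
                "id": "call_01",
                "type": "function",
                "function": {"name": "calculate", "arguments": '{"expression": "7 * 8"}'},
            }],
        },
        "finish_reason": "tool_calls",
    }],
    "usage": {"prompt_tokens": 50, "completion_tokens": 20, "total_tokens": 70},
}


def parse_openai_choice(body: dict) -> dict:
    """Hypothetical helper: flatten an OpenAI response into provider-neutral fields."""
    choice = body["choices"][0]
    message = choice["message"]
    tool_calls = [
        {
            "id": tc["id"],
            "name": tc["function"]["name"],
            # arguments is always a JSON string on OpenAI, so it must be decoded
            "arguments": json.loads(tc["function"]["arguments"]),
        }
        for tc in message.get("tool_calls") or []
    ]
    usage = body["usage"]
    return {
        "content": message.get("content"),
        "tool_calls": tool_calls,
        "finish_reason": choice["finish_reason"],
        # Rename prompt/completion counts to input/output names (naming is an assumption)
        "input_tokens": usage["prompt_tokens"],
        "output_tokens": usage["completion_tokens"],
    }


parsed = parse_openai_choice(raw)
print(parsed["tool_calls"][0]["arguments"]["expression"])  # 7 * 8
```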

---
## 16. Live API Calls (Real Anthropic + Cross-Provider Comparison)

Let's use the real Anthropic API to run the same agent pattern we tested with mocks above — proving the full stack works end-to-end.

(No OpenAI key is configured, so OpenAI calls use mocks. The agent code is the same either way.)

In [None]:
from dotenv import load_dotenv
from arcllm import load_provider_config

load_dotenv()

real_config = load_provider_config("anthropic")
real_model = real_config.provider.default_model

print(f"Provider: {real_config.provider.api_format}")
print(f"Model:    {real_model}")
api_key = os.environ.get(real_config.provider.api_key_env, "")
print(f"API key:  {api_key[:12] + '...' if api_key else '(not set)'}")

### 16a. StopReason in Action (Real API)

Make a real call with and without tools to see both `end_turn` and `tool_use` stop reasons.

In [None]:
async def live_stop_reasons():
    calc_tool = Tool(
        name="calculate",
        description="Evaluate a mathematical expression",
        parameters={
            "type": "object",
            "properties": {"expression": {"type": "string"}},
            "required": ["expression"],
        },
    )

    async with AnthropicAdapter(real_config, real_model) as adapter:
        # Text response -> end_turn
        resp1 = await adapter.invoke(
            [Message(role="user", content="Say 'hello' and nothing else.")],
            max_tokens=10, temperature=0.0,
        )
        print("Text response:")
        print(f"  content:     {resp1.content!r}")
        print(f"  stop_reason: {resp1.stop_reason!r}  <- StopReason type")

        # Tool response -> tool_use
        resp2 = await adapter.invoke(
            [Message(role="user", content="What is 847 * 293? Use the calculator.")],
            tools=[calc_tool], max_tokens=200, temperature=0.0,
        )
        print("\nTool response:")
        print(f"  content:     {resp2.content!r}")
        print(f"  stop_reason: {resp2.stop_reason!r}  <- StopReason type")
        if resp2.tool_calls:
            tc = resp2.tool_calls[0]
            print(f"  tool:        {tc.name}({tc.arguments})")

    print("\nBoth stop reasons are canonical StopReason values.")
    print("An agent checks resp.stop_reason == 'tool_use', and that works for Anthropic AND OpenAI.")

await live_stop_reasons()

### 16b. Real Agentic Loop (Same Pattern as OpenAI Mock)

The same multi-tool loop pattern from Section 14, but hitting the real Anthropic API.

In [None]:
async def live_agentic_loop():
    tools = [
        Tool(
            name="lookup_price",
            description="Look up the price of an item in USD",
            parameters={
                "type": "object",
                "properties": {"item": {"type": "string"}},
                "required": ["item"],
            },
        ),
        Tool(
            name="calculate",
            description="Evaluate a math expression",
            parameters={
                "type": "object",
                "properties": {"expression": {"type": "string"}},
                "required": ["expression"],
            },
        ),
    ]

    prices = {"coffee": 4.50, "bagel": 3.25, "orange juice": 5.00}

    def execute_tool(name, arguments):
        if name == "lookup_price":
            item = arguments["item"].lower()
            price = prices.get(item)
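            # Compare against None so a legitimate $0.00 price isn't reported as missing
            return f"${price:.2f}" if price is not None else f"Item '{item}' not found"
        elif name == "calculate":
            try:
                # eval() keeps the demo short, but the expression is model-written:
                # empty __builtins__ blocks casual misuse, yet this is still not a sandbox
                return str(eval(arguments["expression"], {"__builtins__": {}}, {}))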
            except Exception as e:
                return f"Error: {e}"
        return "Unknown tool"

    messages = [
        Message(role="system", content="Use tools to answer. Be concise."),
        Message(role="user", content="I want 2 coffees and 1 bagel. What's the total?"),
    ]

    total_tokens = 0
    turn = 0

    async with AnthropicAdapter(real_config, real_model) as adapter:
        while True:
            turn += 1
            resp = await adapter.invoke(messages, tools=tools, max_tokens=300, temperature=0.0)
            total_tokens += resp.usage.total_tokens

            print(f"--- Turn {turn}: stop_reason={resp.stop_reason} ---")

            if resp.stop_reason == "end_turn":
                print(f"  Final answer: {resp.content}")
                break

            if resp.stop_reason == "tool_use":
                content_blocks = []
                if resp.content:
                    content_blocks.append(TextBlock(text=resp.content))
                for tc in resp.tool_calls:
                    content_blocks.append(
                        ToolUseBlock(id=tc.id, name=tc.name, arguments=tc.arguments)
                    )
                messages.append(Message(role="assistant", content=content_blocks))

                result_blocks = []
                for tc in resp.tool_calls:
                    result = execute_tool(tc.name, tc.arguments)
                    print(f"  Tool: {tc.name}({tc.arguments}) -> {result}")
                    result_blocks.append(
                        ToolResultBlock(tool_use_id=tc.id, content=result)
                    )
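                messages.append(Message(role="tool", content=result_blocks))
            else:
                # max_tokens or stop_sequence: resending the same history would not make progress
                print(f"  Stopped without a final answer: {resp.content!r}")
                break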

            if turn > 6:
                print("  (safety valve)")
                break

    print(f"\nLoop complete: {turn} turns, {total_tokens} total tokens")
    print("\nThis EXACT same loop works with OpenaiAdapter — just swap the adapter.")

await live_agentic_loop()
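
`execute_tool` hands a model-written string to `eval()`, which is tolerable in a throwaway notebook but runs arbitrary Python. Below is a sketch of a safer `calculate` tool, assuming only plain arithmetic on numbers is needed. It parses the expression with the standard-library `ast` module and evaluates a whitelist of operators, rejecting everything else. `safe_calculate` is a hypothetical name, not part of ArcLLM. `**` is deliberately left out, because something like `9**9**9` can tie up the process.

```python
import ast
import operator

# Whitelisted operators; any other node type in the expression is rejected
_BIN_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.FloorDiv: operator.floordiv,
    ast.Mod: operator.mod,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def _eval_node(node: ast.AST) -> float:
    if isinstance(node, ast.Expression):
        return _eval_node(node.body)
    # Numbers only; bool is a subclass of int, so exclude it explicitly
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)) \
            and not isinstance(node.value, bool):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
        return _BIN_OPS[type(node.op)](_eval_node(node.left), _eval_node(node.right))
    if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
        return _UNARY_OPS[type(node.op)](_eval_node(node.operand))
    raise ValueError(f"unsupported expression element: {type(node).__name__}")


def safe_calculate(expression: str) -> str:
    """Evaluate plain arithmetic; return an 'Error: ...' string instead of raising."""
    try:
        return str(_eval_node(ast.parse(expression, mode="eval")))
    except (ValueError, SyntaxError, ZeroDivisionError) as e:
        return f"Error: {e}"


print(safe_calculate("2 * 4.50 + 3.25"))
print(safe_calculate("__import__('os').getcwd()"))  # rejected: function calls are not whitelisted
```

Swapping `eval(...)` for `safe_calculate(arguments["expression"])` inside `execute_tool` keeps the tool's contract: the tool still returns a string result, and failures still come back as `Error: ...` for the model to read.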

---
## Summary

Step 5 built the **OpenAI adapter** and the **StopReason** normalization:

```
types.py                 ->  + StopReason = Literal["end_turn", "tool_use", "max_tokens", "stop_sequence"]
                             + LLMResponse.stop_reason now uses StopReason (not str)
adapters/openai.py       ->  OpenaiAdapter (request building + response parsing)
```
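
The adapter's real `_STOP_REASON_MAP` lives in `adapters/openai.py` and is not reproduced in this notebook. The sketch below shows what such a mapping plausibly looks like, built only from OpenAI's documented `finish_reason` values (`stop`, `length`, `tool_calls`, `content_filter`). The `content_filter` entry and the fallback to `end_turn` are assumptions. OpenAI reports `stop` both for a natural finish and for a caller-supplied stop sequence, so a plain lookup like this can never yield `stop_sequence`.

```python
from typing import Literal, Optional

# Same four members as arcllm's StopReason; redeclared so this cell is standalone
StopReason = Literal["end_turn", "tool_use", "max_tokens", "stop_sequence"]

# Hypothetical mapping; the adapter's actual table may differ
STOP_REASON_MAP: dict[str, StopReason] = {
    "stop": "end_turn",            # natural stop OR caller-supplied stop sequence
    "length": "max_tokens",
    "tool_calls": "tool_use",
    "content_filter": "end_turn",  # assumption: no dedicated canonical value exists
}


def map_finish_reason(finish_reason: Optional[str]) -> StopReason:
    """Translate an OpenAI finish_reason into a canonical StopReason."""
    # Missing or unrecognized values fall back to end_turn (assumption)
    return STOP_REASON_MAP.get(finish_reason or "", "end_turn")


for fr in ["stop", "length", "tool_calls", None]:
    print(f"{fr!r:12} -> {map_finish_reason(fr)}")
```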

**Key differences from Anthropic adapter:**

| Feature | Anthropic | OpenAI |
|---------|-----------|--------|
| Auth | `x-api-key` header | `Authorization: Bearer` |
| System messages | Extracted to top-level | Stay in messages array |
| Tool key | `input_schema` | `parameters` |
| Tool wrapping | Flat | `{type: "function", function: {...}}` |
| Tool call location | Content blocks | `message.tool_calls[]` |
| Tool call args | Dict (usually) | JSON string (always) |
| Tool results | `role="user"` with `tool_result` blocks | One `role="tool"` message per result (1-to-N) |
| Stop reasons | Already canonical | Mapped via `_STOP_REASON_MAP` |
| Response content | `content: [{type, text}]` | `choices[0].message.content` |
| Usage fields | `input_tokens`/`output_tokens` | `prompt_tokens`/`completion_tokens` |
| Reasoning tokens | N/A | `completion_tokens_details.reasoning_tokens` |

**The point:** Agents don't care about any of this. They write:
```python
resp = await adapter.invoke(messages, tools=tools)
if resp.stop_reason == "tool_use":
    ...  # execute resp.tool_calls, append the results, call invoke() again
```
...and it works with both providers.
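
Behind that uniform call, two rows of the table do most of the request-side work. OpenAI wraps each tool as `{"type": "function", "function": {...}}` where Anthropic keeps it flat under `input_schema`. A single ArcLLM `role="tool"` message carrying N `ToolResultBlock`s becomes N separate OpenAI `role="tool"` messages, each keyed by `tool_call_id`. Below is a plain-dict sketch of both translations. The helper names and the dict stand-ins for `Tool` and `ToolResultBlock` are hypothetical, not the adapter's code.

```python
def to_openai_tool(name: str, description: str, parameters: dict) -> dict:
    """Wrap a flat tool definition in OpenAI's function envelope."""
    return {
        "type": "function",
        "function": {"name": name, "description": description, "parameters": parameters},
    }


def to_anthropic_tool(name: str, description: str, parameters: dict) -> dict:
    """Anthropic keeps the definition flat and names the schema input_schema."""
    return {"name": name, "description": description, "input_schema": parameters}


def flatten_tool_results(results: list[dict]) -> list[dict]:
    """One message with N results -> N OpenAI tool messages (1-to-N)."""
    return [
        {"role": "tool", "tool_call_id": r["tool_use_id"], "content": r["content"]}
        for r in results
    ]


schema = {"type": "object", "properties": {"item": {"type": "string"}}, "required": ["item"]}
print(to_openai_tool("lookup_price", "Look up the price of an item in USD", schema))
print(to_anthropic_tool("lookup_price", "Look up the price of an item in USD", schema))

# Dict stand-ins for two ToolResultBlocks returned in the same turn
results = [
    {"tool_use_id": "call_a", "content": "$4.50"},
    {"tool_use_id": "call_b", "content": "$3.25"},
]
for msg in flatten_tool_results(results):
    print(msg)
```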