# ArcLLM Step 3: Anthropic Adapter + Tool Support

This notebook walks through everything built in Step 3 — the **adapter layer** that translates between ArcLLM's universal types and the Anthropic Messages API.

**What was built:**
- `ArcLLMAPIError` — new exception for HTTP errors from providers
- `BaseAdapter` — shared plumbing all adapters inherit (config, API key, httpx client, context manager)
- `AnthropicAdapter` — request building + response parsing for Anthropic's API

**Why it matters:** This is where the abstraction earns its keep. Agents call `model.invoke(messages, tools)` with universal types, and the adapter handles everything provider-specific: request headers, system message extraction, the `parameters` → `input_schema` rename, the `role="tool"` → `role="user"` mapping, and tool call parsing.

In [None]:
import os
import json
from unittest.mock import AsyncMock
import httpx

from arcllm import (
    AnthropicAdapter, BaseAdapter,
    ArcLLMAPIError, ArcLLMConfigError, ArcLLMError, ArcLLMParseError,
    ProviderConfig, ProviderSettings, ModelMetadata,
    Message, TextBlock, ToolUseBlock, ToolResultBlock, ImageBlock,
    Tool, ToolCall, Usage, LLMResponse, LLMProvider,
)
print("All imports successful!")

In [None]:
# Setup: create a fake config and set a test API key
# We'll use these throughout the notebook
os.environ["ARCLLM_TEST_KEY"] = "sk-test-key-for-walkthrough"

FAKE_MODEL = "claude-test-1"

fake_config = ProviderConfig(
    provider=ProviderSettings(
        api_format="anthropic-messages",
        base_url="https://api.anthropic.com",
        api_key_env="ARCLLM_TEST_KEY",
        default_model=FAKE_MODEL,
        default_temperature=0.7,
    ),
    models={
        FAKE_MODEL: ModelMetadata(
            context_window=200000,
            max_output_tokens=8192,
            supports_tools=True,
            supports_vision=True,
            supports_thinking=True,
            input_modalities=["text", "image"],
            cost_input_per_1m=3.0,
            cost_output_per_1m=15.0,
            cost_cache_read_per_1m=0.3,
            cost_cache_write_per_1m=3.75,
        )
    },
)
print(f"Fake config ready: model={FAKE_MODEL}, key_env=ARCLLM_TEST_KEY")

---
## 1. ArcLLMAPIError

Step 3 added a new exception to the hierarchy:

```
ArcLLMError (base)
├── ArcLLMParseError   (tool call JSON couldn't be parsed)
├── ArcLLMConfigError  (config validation failed)
└── ArcLLMAPIError     (provider returned HTTP error)  ← NEW
```

`ArcLLMAPIError` carries three pieces of info:
- `status_code` — HTTP status (429, 401, 500, etc.)
- `body` — raw response body from the provider
- `provider` — which provider returned the error

This lets agents and the retry module make smart decisions (e.g., 429 → retry, 401 → don't).
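
One way an exception like this could be structured is sketched below. The class names `SketchLLMError` and `SketchAPIError` and the message format are assumptions made for illustration; ArcLLM's actual implementation may differ.

```python
class SketchLLMError(Exception):
    """Stand-in for the ArcLLMError base class (hypothetical name)."""


class SketchAPIError(SketchLLMError):
    """Stand-in for ArcLLMAPIError: an HTTP error with structured fields."""

    def __init__(self, status_code: int, body: str, provider: str) -> None:
        self.status_code = status_code
        self.body = body
        self.provider = provider
        # The message format is an assumption, not ArcLLM's actual wording.
        super().__init__(f"{provider} API error (HTTP {status_code}): {body}")


try:
    raise SketchAPIError(429, "rate limited", "anthropic")
except SketchLLMError as exc:  # catching the base class also catches API errors
    caught = exc

print(caught.status_code, caught.provider)
```

Keeping the status code as an attribute, rather than only inside the message string, is what lets calling code branch on it without parsing text.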

In [None]:
# Create an API error
err = ArcLLMAPIError(status_code=429, body="rate limited", provider="anthropic")
print(f"Error message: {err}")
print(f"Status code:   {err.status_code}")
print(f"Body:          {err.body}")
print(f"Provider:      {err.provider}")
print(f"Is ArcLLMError? {isinstance(err, ArcLLMError)}")

In [None]:
# Smart retry logic based on status code
def should_retry(error: ArcLLMAPIError) -> bool:
    """Example: retry on transient errors, don't on auth/validation."""
    if error.status_code == 429:  # Rate limited
        return True
    if error.status_code >= 500:  # Server error
        return True
    return False  # 401, 403, 400 — don't retry

errors = [
    ArcLLMAPIError(429, "rate limited", "anthropic"),
    ArcLLMAPIError(500, "internal error", "anthropic"),
    ArcLLMAPIError(401, "invalid key", "anthropic"),
    ArcLLMAPIError(400, "bad request", "anthropic"),
]

for e in errors:
    print(f"  HTTP {e.status_code}: retry={should_retry(e)}")

---
## 2. BaseAdapter — Shared Plumbing

`BaseAdapter` is a concrete class that all provider adapters inherit from. It handles:

| Responsibility | How |
|----------------|-----|
| Config storage | `_config`, `_model_name`, `_model_meta` |
| API key resolution | Reads from `os.environ` at init, fails fast |
| HTTP client | Creates `httpx.AsyncClient` with 60s timeout |
| Lifecycle | `async with adapter:` context manager, `.close()` |
| Interface | Inherits from `LLMProvider` ABC |
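
The fail-fast key resolution in the table can be sketched with the standard library alone. `resolve_api_key` and `SketchConfigError` are hypothetical names, and treating a whitespace-only value as missing is an assumption about the desired behavior.

```python
import os


class SketchConfigError(Exception):
    """Stand-in for ArcLLMConfigError (hypothetical name)."""


def resolve_api_key(env_var: str) -> str:
    """Return the key stored in env_var, or raise if it is missing or blank."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        # Name the variable in the error, never the key itself.
        raise SketchConfigError(
            f"API key environment variable {env_var!r} is not set"
        )
    return key


os.environ["SKETCH_DEMO_KEY"] = "sk-demo"
print(resolve_api_key("SKETCH_DEMO_KEY"))

os.environ["SKETCH_DEMO_KEY"] = ""
try:
    resolve_api_key("SKETCH_DEMO_KEY")
except SketchConfigError as exc:
    print(f"rejected: {exc}")
```

Calling a check like this from the constructor means a misconfigured environment surfaces when the adapter is created, not on the first request.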

In [None]:
# Create a BaseAdapter — it stores config and resolves the API key
adapter = BaseAdapter(fake_config, FAKE_MODEL)
print(f"Config stored:   {adapter._config.provider.api_format}")
print(f"Model name:      {adapter._model_name}")
print(f"Model meta:      {adapter._model_meta is not None}")
print(f"API key resolved: {adapter._api_key[:10]}...")
print(f"HTTP client:     {type(adapter._client).__name__}")
print(f"Is LLMProvider?  {isinstance(adapter, LLMProvider)}")

In [None]:
# Model metadata is looked up automatically from config
print(f"Model metadata for '{FAKE_MODEL}':")
print(f"  Context window: {adapter._model_meta.context_window:,}")
print(f"  Max output:     {adapter._model_meta.max_output_tokens:,}")
print(f"  Supports tools: {adapter._model_meta.supports_tools}")

# Unknown model name → _model_meta is None (not an error)
adapter_unknown = BaseAdapter(fake_config, "nonexistent-model")
print(f"\nUnknown model meta: {adapter_unknown._model_meta}")

In [None]:
# Security: missing API key → fail fast at init
# (Not during an LLM call 3 hours into a batch job)
saved_key = os.environ.pop("ARCLLM_TEST_KEY")
try:
    BaseAdapter(fake_config, FAKE_MODEL)
except ArcLLMConfigError as e:
    print(f"BLOCKED at init: {e}")
finally:
    os.environ["ARCLLM_TEST_KEY"] = saved_key

In [None]:
# Empty API key is also rejected
saved_key = os.environ["ARCLLM_TEST_KEY"]
os.environ["ARCLLM_TEST_KEY"] = ""
try:
    BaseAdapter(fake_config, FAKE_MODEL)
except ArcLLMConfigError as e:
    print(f"BLOCKED (empty key): {e}")
finally:
    os.environ["ARCLLM_TEST_KEY"] = saved_key

In [None]:
# Async context manager: clean resource lifecycle
# (top-level `await` works because the notebook already runs an event loop)

async def demo_context_manager():
    async with BaseAdapter(fake_config, FAKE_MODEL) as adapter:
        print(f"Inside context: client={adapter._client is not None}")
    print(f"After context:  client={adapter._client}  (closed!)")

await demo_context_manager()

In [None]:
# close() is idempotent — safe to call multiple times
async def demo_close_twice():
    adapter = BaseAdapter(fake_config, FAKE_MODEL)
    await adapter.close()
    await adapter.close()  # no error
    print("close() called twice — no error")

await demo_close_twice()

---
## 3. AnthropicAdapter — Request Building

The `AnthropicAdapter` translates ArcLLM's universal types into Anthropic's specific API format. Let's see each translation step.

In [None]:
adapter = AnthropicAdapter(fake_config, FAKE_MODEL)
print(f"Adapter name: {adapter.name}")
print(f"Is LLMProvider? {isinstance(adapter, LLMProvider)}")
print(f"Is BaseAdapter? {isinstance(adapter, BaseAdapter)}")

### 3a. Headers

Anthropic requires specific headers: API key, API version, content type.
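
A minimal sketch of how these headers could be assembled follows. The helper names are hypothetical, and the version string `2023-06-01` is an assumption about what the adapter sends; compare it with the real output of `_build_headers()` before relying on it.

```python
def build_headers(api_key: str, api_version: str = "2023-06-01") -> dict:
    """Assemble the three headers named above (hypothetical helper)."""
    return {
        "x-api-key": api_key,
        "anthropic-version": api_version,
        "content-type": "application/json",
    }


def mask(value: str, visible: int = 4) -> str:
    """Keep only the first few characters so a key is never printed whole."""
    return value[:visible] + "..." if len(value) > visible else "..."


headers = build_headers("sk-test-key-for-walkthrough")
for name, value in headers.items():
    shown = mask(value) if name == "x-api-key" else value
    print(f"{name}: {shown}")
```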

In [None]:
headers = adapter._build_headers()
print("Request headers:")
for key, value in headers.items():
    display_val = value[:15] + "..." if key == "x-api-key" else value
    print(f"  {key}: {display_val}")

### 3b. System Message Extraction

Anthropic's API takes `system` as a **top-level parameter**, not inside the messages array. ArcLLM uses `role="system"` in the universal Message type, so the adapter must extract it.

Multiple system messages are concatenated with newlines.
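
The extraction step can be sketched on plain dictionaries. `extract_system` here is a hypothetical stand-in that mirrors the behavior described above, not the adapter's actual `_extract_system` code.

```python
from typing import List, Optional, Tuple


def extract_system(messages: List[dict]) -> Tuple[Optional[str], List[dict]]:
    """Split system messages out of a conversation.

    Returns the system texts joined by newlines (or None if there are none)
    and the remaining messages in their original order.
    """
    system_parts = [m["content"] for m in messages if m["role"] == "system"]
    remaining = [m for m in messages if m["role"] != "system"]
    system_text = "\n".join(system_parts) if system_parts else None
    return system_text, remaining


conversation = [
    {"role": "system", "content": "Be concise."},
    {"role": "system", "content": "Use tools when needed."},
    {"role": "user", "content": "Hi"},
]
system_text, remaining = extract_system(conversation)
print(repr(system_text))
print(remaining)
```

Returning `None` instead of an empty string makes it easy for the caller to leave the `system` key out of the request entirely.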

In [None]:
# Single system message → extracted as top-level param
messages = [
    Message(role="system", content="You are a helpful assistant."),
    Message(role="user", content="Hello!"),
]

system_text, remaining = adapter._extract_system(messages)
print(f"System text: {system_text!r}")
print(f"Remaining messages: {len(remaining)}")
print(f"  [0] role={remaining[0].role}, content={remaining[0].content!r}")

In [None]:
# Multiple system messages → concatenated
messages = [
    Message(role="system", content="Be concise."),
    Message(role="system", content="Use tools when needed."),
    Message(role="user", content="Hi"),
]

system_text, remaining = adapter._extract_system(messages)
print(f"Concatenated system: {system_text!r}")
print(f"Remaining: {len(remaining)} messages")

In [None]:
# No system message → None (won't be included in request)
messages = [Message(role="user", content="Hi")]
system_text, remaining = adapter._extract_system(messages)
print(f"No system message: {system_text}")

### 3c. Content Block Formatting

Each ArcLLM content block type maps to a specific Anthropic API format.
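
The mapping can be sketched with small dataclasses standing in for ArcLLM's block types. The `*Part` class names and `format_block` are hypothetical; the output shapes follow Anthropic's documented content block format, and the recursive handling of nested tool results is an assumption about how the adapter behaves.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class TextPart:
    text: str


@dataclass
class ImagePart:
    source: str  # base64-encoded image data
    media_type: str


@dataclass
class ToolUsePart:
    id: str
    name: str
    arguments: dict


@dataclass
class ToolResultPart:
    tool_use_id: str
    content: Union[str, List[TextPart]]


def format_block(block) -> dict:
    """Translate one universal block into an Anthropic-style dict."""
    if isinstance(block, TextPart):
        return {"type": "text", "text": block.text}
    if isinstance(block, ImagePart):
        return {
            "type": "image",
            "source": {
                "type": "base64",
                "media_type": block.media_type,
                "data": block.source,
            },
        }
    if isinstance(block, ToolUsePart):
        # 'arguments' on the universal side becomes 'input' on the wire.
        return {"type": "tool_use", "id": block.id, "name": block.name,
                "input": block.arguments}
    if isinstance(block, ToolResultPart):
        content = block.content
        if not isinstance(content, str):
            content = [format_block(inner) for inner in content]
        return {"type": "tool_result", "tool_use_id": block.tool_use_id,
                "content": content}
    raise TypeError(f"unsupported block type: {type(block).__name__}")


print(format_block(ToolUsePart("toolu_01", "search", {"query": "cats"})))
```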

In [None]:
# TextBlock → simple passthrough
text_formatted = adapter._format_content_block(TextBlock(text="Hello!"))
print(f"TextBlock ->")
print(f"  {json.dumps(text_formatted, indent=2)}")

In [None]:
# ImageBlock → nested source object with base64
img_formatted = adapter._format_content_block(
    ImageBlock(source="base64data...", media_type="image/png")
)
print(f"ImageBlock ->")
print(f"  {json.dumps(img_formatted, indent=2)}")

In [None]:
# ToolUseBlock → 'arguments' becomes 'input' (Anthropic's naming)
tool_formatted = adapter._format_content_block(
    ToolUseBlock(id="toolu_01", name="search", arguments={"query": "cats"})
)
print(f"ToolUseBlock ->")
print(f"  {json.dumps(tool_formatted, indent=2)}")
print(f"\nNote: 'arguments' (ArcLLM) → 'input' (Anthropic)")

In [None]:
# ToolResultBlock with string content
result_str = adapter._format_content_block(
    ToolResultBlock(tool_use_id="toolu_01", content="Found 3 results")
)
print(f"ToolResultBlock (string) ->")
print(f"  {json.dumps(result_str, indent=2)}")

# ToolResultBlock with nested content blocks
result_list = adapter._format_content_block(
    ToolResultBlock(
        tool_use_id="toolu_01",
        content=[TextBlock(text="Result 1"), TextBlock(text="Result 2")],
    )
)
print(f"\nToolResultBlock (list) ->")
print(f"  {json.dumps(result_list, indent=2)}")

### 3d. Role Translation

ArcLLM uses `role="tool"` for tool results. Anthropic expects `role="user"` with tool_result content blocks. The adapter handles this mapping.
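
As a rough sketch, the role mapping amounts to a one-entry lookup table. `ROLE_MAP`, `to_wire_role`, and `format_message` are hypothetical names operating on plain dictionaries.

```python
# Only "tool" needs translating; every other role is sent as-is.
ROLE_MAP = {"tool": "user"}


def to_wire_role(role: str) -> str:
    return ROLE_MAP.get(role, role)


def format_message(message: dict) -> dict:
    """Return a copy with the role translated; content is left untouched."""
    return {"role": to_wire_role(message["role"]), "content": message["content"]}


tool_message = {
    "role": "tool",
    "content": [{"type": "tool_result", "tool_use_id": "t1", "content": "42"}],
}
print(format_message(tool_message)["role"])
```

Building a new dictionary instead of mutating the input keeps the agent's own conversation history in universal form.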

In [None]:
# role="tool" → role="user" in the Anthropic request
tool_msg = Message(
    role="tool",
    content=[ToolResultBlock(tool_use_id="t1", content="42")],
)
formatted = adapter._format_message(tool_msg)
print(f"ArcLLM role:    {tool_msg.role!r}")
print(f"Anthropic role: {formatted['role']!r}")
print(f"Content:        {json.dumps(formatted['content'], indent=2)}")

In [None]:
# Other roles pass through unchanged
for role in ["user", "assistant"]:
    msg = Message(role=role, content="test")
    fmt = adapter._format_message(msg)
    print(f"  {role:>10} → {fmt['role']}")

### 3e. Tool Definition Formatting

ArcLLM uses `parameters` (JSON Schema). Anthropic expects `input_schema`. Same data, different key name.
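
The rename can be sketched in a few lines. `format_tool` below is a hypothetical helper working on a plain dictionary, not the adapter's `_format_tool` itself.

```python
def format_tool(tool: dict) -> dict:
    """Rename 'parameters' to 'input_schema'; the schema itself is unchanged."""
    return {
        "name": tool["name"],
        "description": tool["description"],
        "input_schema": tool["parameters"],
    }


universal_tool = {
    "name": "search_database",
    "description": "Search the knowledge base",
    "parameters": {
        "type": "object",
        "properties": {"query": {"type": "string"}},
        "required": ["query"],
    },
}
wire_tool = format_tool(universal_tool)
print(sorted(wire_tool))
```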

In [None]:
# Tool definition: parameters → input_schema
tool = Tool(
    name="search_database",
    description="Search the knowledge base",
    parameters={
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "Search query"},
            "limit": {"type": "integer", "default": 10},
        },
        "required": ["query"],
    },
)

formatted_tool = adapter._format_tool(tool)
print(f"ArcLLM key:    'parameters'")
print(f"Anthropic key: 'input_schema'")
print(f"\nFormatted tool:")
print(json.dumps(formatted_tool, indent=2))
print(f"\n'parameters' in output? {'parameters' in formatted_tool}")
print(f"'input_schema' in output? {'input_schema' in formatted_tool}")

### 3f. Full Request Body

`_build_request_body()` combines everything into the final API payload.
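
A minimal sketch of this assembly step, assuming the already-formatted pieces arrive as plain values. The function name, its parameters, and the default values are hypothetical; the real method reads its defaults from the provider config and model metadata.

```python
def build_request_body(model, messages, *, system=None, tools=None,
                       default_max_tokens=1024, default_temperature=0.7,
                       **overrides):
    """Combine formatted parts into one payload (hypothetical helper)."""
    body = {
        "model": model,
        "messages": messages,
        # dict.get keeps an explicit 0.0 override; `x or default` would drop it.
        "max_tokens": overrides.get("max_tokens", default_max_tokens),
        "temperature": overrides.get("temperature", default_temperature),
    }
    # Optional keys are omitted entirely when there is nothing to send.
    if system is not None:
        body["system"] = system
    if tools:
        body["tools"] = tools
    return body


user_turn = [{"role": "user", "content": "What's the weather?"}]
plain = build_request_body("claude-test-1", user_turn)
tuned = build_request_body("claude-test-1", user_turn,
                           system="Be concise.", temperature=0.0, max_tokens=10)
print(sorted(plain))
print(tuned["temperature"], tuned["max_tokens"])
```

Other keyword overrides are ignored in this sketch; a fuller version would decide which extra parameters to pass through.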

In [None]:
# Simple text request
messages = [Message(role="user", content="What's the weather?")]
body = adapter._build_request_body(messages)
print("Simple request body:")
print(json.dumps(body, indent=2))
print(f"\nNote: no 'system' key (no system message)")
print(f"Note: no 'tools' key (no tools provided)")

In [None]:
# Request with system message + tools
messages = [
    Message(role="system", content="You are a research assistant."),
    Message(role="user", content="Find papers about LLM agents"),
]
tools = [tool]  # from above

body = adapter._build_request_body(messages, tools=tools)
print("Full request body:")
print(json.dumps(body, indent=2))

In [None]:
# kwargs override config defaults
body_default = adapter._build_request_body(messages)
body_override = adapter._build_request_body(messages, max_tokens=1000, temperature=0.0)

print(f"Default:  max_tokens={body_default['max_tokens']}, temperature={body_default['temperature']}")
print(f"Override: max_tokens={body_override['max_tokens']}, temperature={body_override['temperature']}")
print(f"\nOverride chain: kwargs > provider config > model metadata")

---
## 4. AnthropicAdapter — Response Parsing

When Anthropic responds, the adapter parses the raw JSON into ArcLLM's universal `LLMResponse` type.
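
The parsing step can be sketched as a single pass over the response's content blocks. `parse_response` below is a hypothetical stand-in that returns a plain dictionary instead of an `LLMResponse`; joining multiple text blocks with newlines and skipping unknown block types are assumptions.

```python
def parse_response(raw: dict) -> dict:
    """Sort content blocks into text, thinking, and tool calls."""
    text_parts, thinking_parts, tool_calls = [], [], []
    for block in raw.get("content", []):
        kind = block.get("type")
        if kind == "text":
            text_parts.append(block["text"])
        elif kind == "thinking":
            thinking_parts.append(block["thinking"])
        elif kind == "tool_use":
            tool_calls.append({"id": block["id"], "name": block["name"],
                               "arguments": block["input"]})
        # Unknown block types are skipped in this sketch.

    usage = raw.get("usage", {})
    input_tokens = usage.get("input_tokens", 0)
    output_tokens = usage.get("output_tokens", 0)
    return {
        "content": "\n".join(text_parts) or None,
        "thinking": "\n".join(thinking_parts) or None,
        "tool_calls": tool_calls,
        "stop_reason": raw.get("stop_reason"),
        "usage": {
            "input_tokens": input_tokens,
            "output_tokens": output_tokens,
            "total_tokens": input_tokens + output_tokens,
            # Provider-specific names are normalized to shorter universal ones.
            "cache_read_tokens": usage.get("cache_read_input_tokens"),
            "cache_write_tokens": usage.get("cache_creation_input_tokens"),
        },
    }


sample = {
    "content": [
        {"type": "thinking", "thinking": "The user wants a search."},
        {"type": "text", "text": "Let me search for that."},
        {"type": "tool_use", "id": "toolu_02", "name": "search",
         "input": {"query": "cats"}},
    ],
    "stop_reason": "tool_use",
    "usage": {"input_tokens": 100, "output_tokens": 50,
              "cache_read_input_tokens": 1200},
}
parsed = parse_response(sample)
print(parsed["content"], parsed["tool_calls"], parsed["usage"]["total_tokens"])
```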

In [None]:
# Helper: simulate an Anthropic API response
def make_anthropic_response(content_blocks, stop_reason="end_turn",
                            input_tokens=100, output_tokens=50, **extra_usage):
    usage = {"input_tokens": input_tokens, "output_tokens": output_tokens}
    usage.update(extra_usage)
    return {
        "id": "msg_walkthrough",
        "type": "message",
        "role": "assistant",
        "model": FAKE_MODEL,
        "content": content_blocks,
        "stop_reason": stop_reason,
        "usage": usage,
    }

print("Helper ready.")

In [None]:
# Parse a text response
raw = make_anthropic_response([{"type": "text", "text": "The weather is sunny."}])
resp = adapter._parse_response(raw)

print(f"Type:        {type(resp).__name__}")
print(f"Content:     {resp.content!r}")
print(f"Tool calls:  {resp.tool_calls}")
print(f"Stop reason: {resp.stop_reason}")
print(f"Model:       {resp.model}")
print(f"Usage:       {resp.usage.input_tokens}in + {resp.usage.output_tokens}out = {resp.usage.total_tokens} total")

In [None]:
# Parse a tool use response
raw = make_anthropic_response(
    [
        {"type": "tool_use", "id": "toolu_01", "name": "search", "input": {"query": "LLM agents"}},
    ],
    stop_reason="tool_use",
)
resp = adapter._parse_response(raw)

print(f"Content:     {resp.content}  (None — pure tool call)")
print(f"Stop reason: {resp.stop_reason}")
print(f"Tool calls:  {len(resp.tool_calls)}")
print(f"  [0] id={resp.tool_calls[0].id}, name={resp.tool_calls[0].name}")
print(f"      arguments={resp.tool_calls[0].arguments}")

In [None]:
# Parse a mixed response (text + tool calls)
raw = make_anthropic_response(
    [
        {"type": "text", "text": "Let me search for that."},
        {"type": "tool_use", "id": "toolu_02", "name": "search", "input": {"query": "cats"}},
    ],
    stop_reason="tool_use",
)
resp = adapter._parse_response(raw)

print(f"Content:     {resp.content!r}")
print(f"Tool calls:  {len(resp.tool_calls)}")
print(f"Both present — text AND tool calls in same response")

In [None]:
# Parse extended thinking (Claude's chain-of-thought)
raw = make_anthropic_response(
    [
        {"type": "thinking", "thinking": "Let me reason about this step by step..."},
        {"type": "text", "text": "The answer is 42."},
    ]
)
resp = adapter._parse_response(raw)

print(f"Content:  {resp.content!r}")
print(f"Thinking: {resp.thinking!r}")
print(f"\nThinking is separated from content — agents can log it without exposing it.")

In [None]:
# Usage parsing with cache tokens
raw = make_anthropic_response(
    [{"type": "text", "text": "Cached response."}],
    cache_read_input_tokens=1200,
    cache_creation_input_tokens=300,
)
resp = adapter._parse_response(raw)

print(f"Usage:")
print(f"  Input tokens:       {resp.usage.input_tokens}")
print(f"  Output tokens:      {resp.usage.output_tokens}")
print(f"  Total tokens:       {resp.usage.total_tokens}")
print(f"  Cache read tokens:  {resp.usage.cache_read_tokens}")
print(f"  Cache write tokens: {resp.usage.cache_write_tokens}")
print(f"\nNote: Anthropic uses 'cache_read_input_tokens' → ArcLLM normalizes to 'cache_read_tokens'")

In [None]:
# Stop reasons pass through as-is
for reason in ["end_turn", "tool_use", "max_tokens", "stop_sequence"]:
    raw = make_anthropic_response([{"type": "text", "text": "x"}], stop_reason=reason)
    resp = adapter._parse_response(raw)
    print(f"  {reason:<15} → resp.stop_reason = {resp.stop_reason!r}")

In [None]:
# Raw response preserved for debugging
raw = make_anthropic_response([{"type": "text", "text": "Hi"}])
resp = adapter._parse_response(raw)

print(f"resp.raw is the original dict: {resp.raw is raw}")
print(f"resp.raw['id'] = {resp.raw['id']}")
print(f"\nUseful for debugging, but never logged in production.")

---
## 5. Tool Call Parsing Edge Cases

Tool call arguments can arrive as a dict (normal) or a string (edge case). The adapter handles both, and raises `ArcLLMParseError` on garbage.
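
The three branches can be sketched as follows. `parse_arguments` and `SketchParseError` are hypothetical names; rejecting valid JSON that is not an object (for example a list) is an extra assumption not stated above.

```python
import json


class SketchParseError(Exception):
    """Stand-in for ArcLLMParseError, keeping the raw input for debugging."""

    def __init__(self, raw_string: str, original_error: Exception) -> None:
        self.raw_string = raw_string
        self.original_error = original_error
        super().__init__(f"could not parse tool arguments: {original_error}")


def parse_arguments(raw_input) -> dict:
    """Normalize tool call input to a dict, or raise SketchParseError."""
    if isinstance(raw_input, dict):
        return raw_input  # normal case: pass through unchanged
    if isinstance(raw_input, str):
        try:
            parsed = json.loads(raw_input)
        except json.JSONDecodeError as exc:
            raise SketchParseError(raw_input, exc) from exc
        if not isinstance(parsed, dict):
            raise SketchParseError(raw_input, TypeError("expected a JSON object"))
        return parsed
    raise SketchParseError(
        repr(raw_input),
        TypeError(f"unexpected type {type(raw_input).__name__}"),
    )


print(parse_arguments('{"expression": "2+2"}'))
```

Whatever the provider sends, downstream code only ever sees a dictionary or a single, well-defined exception type.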

In [None]:
# Normal case: arguments as dict (pass-through)
tc = adapter._parse_tool_call({
    "type": "tool_use", "id": "t1", "name": "calc",
    "input": {"expression": "2+2"}
})
print(f"Dict input → arguments: {tc.arguments}")
print(f"Type: {type(tc.arguments).__name__}")

In [None]:
# Edge case: arguments as JSON string (json.loads it)
tc = adapter._parse_tool_call({
    "type": "tool_use", "id": "t1", "name": "calc",
    "input": '{"expression": "2+2"}'
})
print(f"String input → parsed to dict: {tc.arguments}")
print(f"Type: {type(tc.arguments).__name__}")

In [None]:
# Bad JSON string → ArcLLMParseError with raw data preserved
try:
    adapter._parse_tool_call({
        "type": "tool_use", "id": "t1", "name": "calc",
        "input": "not valid json {{{"
    })
except ArcLLMParseError as e:
    print(f"ArcLLMParseError: {e}")
    print(f"Raw string:      {e.raw_string!r}")
    print(f"Original error:  {type(e.original_error).__name__}")

In [None]:
# Unexpected type (not dict, not string) → ArcLLMParseError
try:
    adapter._parse_tool_call({
        "type": "tool_use", "id": "t1", "name": "calc",
        "input": 12345
    })
except ArcLLMParseError as e:
    print(f"Unexpected type caught: {e}")

---
## 6. Full invoke() Cycle (Mocked)

Let's simulate the full `invoke()` flow — building the request, sending it (mocked), and parsing the response. No real API calls.

In [None]:
# Simulate a text conversation
async def demo_text_invoke():
    adapter = AnthropicAdapter(fake_config, FAKE_MODEL)

    # Mock the HTTP client to return a fake response
    response_data = make_anthropic_response(
        [{"type": "text", "text": "Austin is 75F and sunny."}]
    )
    mock_response = httpx.Response(
        200, json=response_data,
        request=httpx.Request("POST", "https://api.anthropic.com/v1/messages"),
    )
    adapter._client = AsyncMock()
    adapter._client.post = AsyncMock(return_value=mock_response)

    # Call invoke() just like an agent would
    resp = await adapter.invoke([
        Message(role="system", content="You are a weather bot."),
        Message(role="user", content="What's the weather in Austin?"),
    ])

    print(f"Response type: {type(resp).__name__}")
    print(f"Content:       {resp.content}")
    print(f"Stop reason:   {resp.stop_reason}")
    print(f"Tokens:        {resp.usage.total_tokens}")

await demo_text_invoke()

In [None]:
# Simulate a tool use conversation
async def demo_tool_invoke():
    adapter = AnthropicAdapter(fake_config, FAKE_MODEL)

    response_data = make_anthropic_response(
        [
            {"type": "text", "text": "Let me look that up."},
            {"type": "tool_use", "id": "toolu_abc", "name": "get_weather",
             "input": {"city": "Austin", "units": "fahrenheit"}},
        ],
        stop_reason="tool_use",
    )
    mock_response = httpx.Response(
        200, json=response_data,
        request=httpx.Request("POST", "https://api.anthropic.com/v1/messages"),
    )
    adapter._client = AsyncMock()
    adapter._client.post = AsyncMock(return_value=mock_response)

    # Define tools
    tools = [
        Tool(
            name="get_weather",
            description="Get current weather for a city",
            parameters={
                "type": "object",
                "properties": {
                    "city": {"type": "string"},
                    "units": {"type": "string", "enum": ["celsius", "fahrenheit"]},
                },
                "required": ["city"],
            },
        )
    ]

    resp = await adapter.invoke(
        [Message(role="user", content="What's the weather in Austin?")],
        tools=tools,
    )

    print(f"Content:     {resp.content!r}")
    print(f"Stop reason: {resp.stop_reason}")
    print(f"Tool calls:  {len(resp.tool_calls)}")
    for tc in resp.tool_calls:
        print(f"  {tc.name}({tc.arguments})")

await demo_tool_invoke()

In [None]:
# Simulate an HTTP error from the API
async def demo_error_invoke():
    adapter = AnthropicAdapter(fake_config, FAKE_MODEL)

    mock_response = httpx.Response(
        429, text='{"error": {"message": "Rate limit exceeded"}}',
        request=httpx.Request("POST", "https://api.anthropic.com/v1/messages"),
    )
    adapter._client = AsyncMock()
    adapter._client.post = AsyncMock(return_value=mock_response)

    try:
        await adapter.invoke([Message(role="user", content="Hi")])
    except ArcLLMAPIError as e:
        print(f"Caught: {e}")
        print(f"  status_code: {e.status_code}")
        print(f"  provider:    {e.provider}")
        print(f"  body:        {e.body}")

await demo_error_invoke()

---
## 7. Simulated Agentic Tool-Calling Loop

This is the core pattern ArcLLM is built for. Let's walk through a complete tool-calling loop with mocked responses.
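
Stripped of provider details, the loop pattern looks roughly like the sketch below. `run_agent_loop`, the scripted `fake_model`, and the dictionary shapes are all hypothetical stand-ins for the adapter, a real model, and ArcLLM's types.

```python
def run_agent_loop(call_model, execute_tool, messages, max_turns=5):
    """Call the model until it stops asking for tools or max_turns is hit."""
    for turn in range(1, max_turns + 1):
        response = call_model(messages)
        if response["stop_reason"] != "tool_use":
            return response, turn
        # Record the assistant's tool request, then the results, in order.
        messages.append({"role": "assistant",
                         "tool_calls": response["tool_calls"]})
        results = [
            {"tool_use_id": call["id"],
             "content": execute_tool(call["name"], call["arguments"])}
            for call in response["tool_calls"]
        ]
        messages.append({"role": "tool", "content": results})
    raise RuntimeError(f"no final answer after {max_turns} turns")


scripted = iter([
    {"stop_reason": "tool_use", "content": None,
     "tool_calls": [{"id": "toolu_01", "name": "get_weather",
                     "arguments": {"city": "Austin"}}]},
    {"stop_reason": "end_turn", "content": "It is 75F and sunny in Austin.",
     "tool_calls": []},
])


def fake_model(_messages):
    return next(scripted)


def fake_tool(name, arguments):
    return "75F, sunny"


messages = [{"role": "user", "content": "What's the weather in Austin?"}]
final, turns = run_agent_loop(fake_model, fake_tool, messages)
print(turns, final["content"], len(messages))
```

The turn cap matters most against a real model, where a loop that never reaches a final answer would otherwise keep spending tokens.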

In [None]:
async def demo_agentic_loop():
    """Simulate a full agentic tool-calling loop."""
    adapter = AnthropicAdapter(fake_config, FAKE_MODEL)

    # Define a tool
    tools = [
        Tool(
            name="get_weather",
            description="Get weather for a city",
            parameters={
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        )
    ]

    # Agent manages the conversation
    messages = [
        Message(role="system", content="Use tools to answer questions."),
        Message(role="user", content="What's the weather in Austin?"),
    ]

    # --- Turn 1: LLM wants to use a tool ---
    turn1_data = make_anthropic_response(
        [{"type": "tool_use", "id": "toolu_01", "name": "get_weather",
          "input": {"city": "Austin"}}],
        stop_reason="tool_use",
    )
    mock_resp1 = httpx.Response(
        200, json=turn1_data,
        request=httpx.Request("POST", "https://api.anthropic.com/v1/messages"),
    )
    adapter._client = AsyncMock()
    adapter._client.post = AsyncMock(return_value=mock_resp1)

    resp = await adapter.invoke(messages, tools=tools)
    print(f"Turn 1: stop_reason={resp.stop_reason}")
    print(f"  LLM wants to call: {resp.tool_calls[0].name}({resp.tool_calls[0].arguments})")

    # Agent executes the tool
    tool_result = "75°F, sunny, humidity 45%"  # simulated
    print(f"  Agent executes tool → result: {tool_result!r}")

    # Agent adds assistant response + tool result to conversation
    messages.append(Message(
        role="assistant",
        content=[ToolUseBlock(id="toolu_01", name="get_weather", arguments={"city": "Austin"})],
    ))
    messages.append(Message(
        role="tool",
        content=[ToolResultBlock(tool_use_id="toolu_01", content=tool_result)],
    ))

    # --- Turn 2: LLM gives final answer ---
    turn2_data = make_anthropic_response(
        [{"type": "text", "text": "The weather in Austin is 75°F and sunny with 45% humidity."}],
        stop_reason="end_turn",
    )
    mock_resp2 = httpx.Response(
        200, json=turn2_data,
        request=httpx.Request("POST", "https://api.anthropic.com/v1/messages"),
    )
    adapter._client.post = AsyncMock(return_value=mock_resp2)

    resp = await adapter.invoke(messages, tools=tools)
    print(f"\nTurn 2: stop_reason={resp.stop_reason}")
    print(f"  Final answer: {resp.content}")
    print(f"\nLoop complete! {len(messages)} messages in conversation.")

await demo_agentic_loop()

---
## 8. What the Adapter Hides From You

Here's a summary of every Anthropic quirk the adapter handles so agents don't have to.

In [None]:
quirks = [
    ("System messages",
     "role='system' in message list",
     "top-level 'system' param, not in messages"),
    ("Tool definitions",
     "Tool.parameters (JSON Schema)",
     "'input_schema' key instead of 'parameters'"),
    ("Tool results",
     "role='tool'",
     "role='user' with tool_result content blocks"),
    ("Tool call args",
     "ToolCall.arguments (always dict)",
     "'input' key, can be dict OR string"),
    ("Headers",
     "(handled internally)",
     "x-api-key + anthropic-version required"),
    ("Usage tokens",
     "Usage.cache_read_tokens",
     "'cache_read_input_tokens' (different name)"),
    ("Thinking",
     "LLMResponse.thinking (separate field)",
     "'thinking' content block type"),
]

print(f"{'Quirk':<20} {'ArcLLM (universal)':<40} {'Anthropic (specific)'}")
print("-" * 100)
for quirk, arcllm, anthropic in quirks:
    print(f"{quirk:<20} {arcllm:<40} {anthropic}")

---
## Summary

Step 3 built the **adapter layer**:

```
exceptions.py            ->  + ArcLLMAPIError (status_code, body, provider)
adapters/base.py         ->  BaseAdapter (config, key, httpx client, context manager)
adapters/anthropic.py    ->  AnthropicAdapter (request building + response parsing)
```

**Key design decisions:**
- Config object injection (adapter doesn't know how config was loaded)
- API key resolved at init (fail-fast, not during LLM call)
- Private methods per concern (each independently testable)
- Adapter owns its httpx client (connection reuse across loop iterations)
- `ArcLLMAPIError` carries status code for smart retry decisions
- `BaseAdapter` is concrete (not abstract): DRY for shared plumbing
- Tool call parsing: dict pass-through, string → json.loads, garbage → ArcLLMParseError
- All 38 tests pass with mocked HTTP responses (no real API calls)

---
## 9. Live API Calls (Real Anthropic)

Everything above used mocks. Now let's hit the real Anthropic API using the key from `.env`.

These cells load the actual provider config from TOML and make real HTTP calls.

In [None]:
# Load the real API key from .env and the real provider config from TOML
from dotenv import load_dotenv
from arcllm import load_provider_config

load_dotenv()

real_config = load_provider_config("anthropic")
real_model = real_config.provider.default_model

print(f"Provider:  {real_config.provider.api_format}")
print(f"Base URL:  {real_config.provider.base_url}")
print(f"Model:     {real_model}")
print(f"API key:   {os.environ[real_config.provider.api_key_env][:12]}...")
print(f"Context:   {real_config.models[real_model].context_window:,} tokens")

### 9a. Simple Text Call

The most basic call — send a message, get text back.

In [None]:
# Real text call
async def live_text_call():
    async with AnthropicAdapter(real_config, real_model) as adapter:
        resp = await adapter.invoke(
            [Message(role="user", content="What is 2 + 2? Answer in exactly one word.")],
            max_tokens=10,
            temperature=0.0,
        )

    print(f"Content:     {resp.content!r}")
    print(f"Model:       {resp.model}")
    print(f"Stop reason: {resp.stop_reason}")
    print(f"Tokens:      {resp.usage.input_tokens}in + {resp.usage.output_tokens}out = {resp.usage.total_tokens}")
    print(f"Cache read:  {resp.usage.cache_read_tokens}")
    print(f"Raw keys:    {list(resp.raw.keys())}")

await live_text_call()

### 9b. System Message + Temperature Override

This call shows that system messages reach the model and that kwargs override config defaults.

In [None]:
# System message + deterministic temperature
async def live_system_message():
    async with AnthropicAdapter(real_config, real_model) as adapter:
        resp = await adapter.invoke(
            [
                Message(role="system", content="You are a pirate. Respond in pirate speak."),
                Message(role="user", content="How do I make a sandwich?"),
            ],
            max_tokens=100,
            temperature=0.0,
        )

    print(f"System message worked — pirate response:")
    print(f"  {resp.content}")
    print(f"\nTokens: {resp.usage.total_tokens}")

await live_system_message()

### 9c. Real Tool Call

Give the model a tool and watch it decide to use it. Then feed back the result and get a final answer.
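
The calculator tool in this section is executed with Python's `eval`, which runs arbitrary code and should not receive model-generated input outside a demo. A small arithmetic-only evaluator built on the standard `ast` module is sketched here as one possible replacement; `safe_eval` is a hypothetical helper, and limiting it to `+`, `-`, `*`, and `/` is a design assumption.

```python
import ast
import operator

_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def safe_eval(expression: str):
    """Evaluate plain arithmetic; reject names, calls, and everything else."""

    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if (isinstance(node, ast.Constant)
                and isinstance(node.value, (int, float))
                and not isinstance(node.value, bool)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            return _BINARY_OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](walk(node.operand))
        raise ValueError(f"unsupported expression element: {type(node).__name__}")

    return walk(ast.parse(expression, mode="eval"))


print(safe_eval("847 * 293"))
```

Malformed input still raises `SyntaxError` from `ast.parse`, and division by zero raises `ZeroDivisionError`, so a tool wrapper would want to catch those and return an error string to the model.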

In [None]:
# Real tool call — the model will decide to use the tool
async def live_tool_call():
    # Define a calculator tool
    calc_tool = Tool(
        name="calculate",
        description="Evaluate a mathematical expression and return the result",
        parameters={
            "type": "object",
            "properties": {
                "expression": {
                    "type": "string",
                    "description": "The math expression to evaluate, e.g. '(15 * 7) + 23'"
                }
            },
            "required": ["expression"],
        },
    )

    async with AnthropicAdapter(real_config, real_model) as adapter:
        # Turn 1: ask a math question — model should call the tool
        messages = [
            Message(role="user", content="What is 847 * 293? Use the calculator tool."),
        ]

        print("--- Turn 1: invoke() with tool ---")
        resp = await adapter.invoke(messages, tools=[calc_tool], max_tokens=200, temperature=0.0)
        print(f"Stop reason: {resp.stop_reason}")
        print(f"Content:     {resp.content!r}")
        print(f"Tool calls:  {len(resp.tool_calls)}")

        if resp.tool_calls:
            tc = resp.tool_calls[0]
            print(f"  Tool: {tc.name}")
            print(f"  Args: {tc.arguments}")
            print(f"  ID:   {tc.id}")

            # Agent executes the tool (we'll just eval it)
            expr = tc.arguments.get("expression", "0")
            try:
                result = str(eval(expr))  # in real code, use a safe evaluator!
            except Exception:
                result = "Error evaluating expression"
            print(f"\n  Agent executes: eval({expr!r}) = {result}")

            # Turn 2: feed result back, get final answer
            messages.append(Message(
                role="assistant",
                content=[ToolUseBlock(id=tc.id, name=tc.name, arguments=tc.arguments)],
            ))
            messages.append(Message(
                role="tool",
                content=[ToolResultBlock(tool_use_id=tc.id, content=result)],
            ))

            print("\n--- Turn 2: invoke() with tool result ---")
            resp2 = await adapter.invoke(messages, tools=[calc_tool], max_tokens=200, temperature=0.0)
            print(f"Stop reason: {resp2.stop_reason}")
            print(f"Content:     {resp2.content!r}")
            print(f"Total tokens across both turns: {resp.usage.total_tokens + resp2.usage.total_tokens}")

await live_tool_call()

### 9d. Multi-Turn Agentic Loop (Real)

A complete agentic loop with multiple tools — the model decides which tool to call, the agent executes it, and the loop continues until `stop_reason == "end_turn"`.

In [None]:
# Full agentic loop — runs until the model says "end_turn"
async def live_agentic_loop():
    # Two tools: a lookup and a calculator
    tools = [
        Tool(
            name="lookup_price",
            description="Look up the price of an item in USD",
            parameters={
                "type": "object",
                "properties": {
                    "item": {"type": "string", "description": "The item to look up"}
                },
                "required": ["item"],
            },
        ),
        Tool(
            name="calculate",
            description="Evaluate a math expression",
            parameters={
                "type": "object",
                "properties": {
                    "expression": {"type": "string", "description": "Math expression"}
                },
                "required": ["expression"],
            },
        ),
    ]

    # Fake database for the lookup tool
    prices = {"coffee": 4.50, "bagel": 3.25, "orange juice": 5.00}

    def execute_tool(name: str, arguments: dict) -> str:
        if name == "lookup_price":
            item = arguments["item"].lower()
            price = prices.get(item)
            return f"${price:.2f}" if price is not None else f"Item '{item}' not found"
        elif name == "calculate":
            try:
                # eval() is for this demo only; real code needs a safe evaluator
                return str(eval(arguments["expression"]))
            except Exception as e:
                return f"Error: {e}"
        return "Unknown tool"

    messages = [
        Message(role="system", content="Use tools to answer questions. Be concise."),
        Message(role="user", content="I want 2 coffees and 1 bagel. What's the total?"),
    ]

    total_tokens = 0
    turn = 0

    async with AnthropicAdapter(real_config, real_model) as adapter:
        while True:
            turn += 1
            resp = await adapter.invoke(messages, tools=tools, max_tokens=300, temperature=0.0)
            total_tokens += resp.usage.total_tokens

            print(f"--- Turn {turn}: stop_reason={resp.stop_reason} ---")

            if resp.stop_reason == "end_turn":
                print(f"  Final answer: {resp.content}")
                break

            if resp.stop_reason == "tool_use":
                # Add the assistant's tool-use message to conversation
                content_blocks = []
                if resp.content:
                    content_blocks.append(TextBlock(text=resp.content))
                for tc in resp.tool_calls:
                    content_blocks.append(
                        ToolUseBlock(id=tc.id, name=tc.name, arguments=tc.arguments)
                    )
                messages.append(Message(role="assistant", content=content_blocks))

                # Execute each tool and add results
                result_blocks = []
                for tc in resp.tool_calls:
                    result = execute_tool(tc.name, tc.arguments)
                    print(f"  Tool: {tc.name}({tc.arguments}) → {result}")
                    result_blocks.append(
                        ToolResultBlock(tool_use_id=tc.id, content=result)
                    )
                messages.append(Message(role="tool", content=result_blocks))
            else:
                # Any other stop reason (e.g. "max_tokens") would resend the same messages
                print(f"  Stopping early: stop_reason={resp.stop_reason!r}")
                break

            if turn > 5:  # safety valve
                print("  (max turns reached)")
                break

    print(f"\nLoop complete: {turn} turns, {len(messages)} messages, {total_tokens} total tokens")

await live_agentic_loop()

### 9e. Inspect the Raw Response

The `raw` field on `LLMResponse` gives you the full provider response for debugging.

In [None]:
# Inspect the raw Anthropic response
async def live_raw_response():
    async with AnthropicAdapter(real_config, real_model) as adapter:
        resp = await adapter.invoke(
            [Message(role="user", content="Say 'hello' and nothing else.")],
            max_tokens=10,
            temperature=0.0,
        )

    print("ArcLLM LLMResponse (normalized):")
    print(f"  content:     {resp.content!r}")
    print(f"  stop_reason: {resp.stop_reason}")
    print(f"  model:       {resp.model}")

    print(f"\nRaw Anthropic response (what the API actually returned):")
    print(json.dumps(resp.raw, indent=2))

await live_raw_response()