# ArcLLM Step 7: Module System + Retry + Fallback

This notebook walks through everything built in Step 7 — the **module system** that adds opt-in behaviors (retry, fallback) by wrapping adapters in middleware layers.

**What was built:**
- `BaseModule` — transparent wrapper foundation all modules inherit from
- `RetryModule` — exponential backoff with jitter on transient failures (429, 500, 502, 503, 529, connection errors, timeouts)
- `FallbackModule` — automatic provider chain switching on failure
- `load_model()` updated — module kwargs (`retry=True`, `fallback={...}`) enable wrapping

**Why it matters:** In production with thousands of concurrent agents, transient failures are guaranteed: rate limiting (429), server errors (5xx), and network blips all occur routinely. Without retry or fallback, each of these failures reaches the agent as an unhandled exception. With these modules, agents recover transparently:

```python
model = load_model("anthropic", retry=True, fallback={"chain": ["openai"]})
resp = await model.invoke(messages)  # retries and falls back automatically
```

The agent code doesn't change. The modules wrap `invoke()` and handle failures invisibly.

In [None]:
import os
import json
import asyncio
import logging
from unittest.mock import AsyncMock, MagicMock, patch
import httpx

from arcllm import (
    load_model, clear_cache,
    AnthropicAdapter, OpenaiAdapter, BaseAdapter,
    ArcLLMAPIError, ArcLLMConfigError,
    Message, Tool, LLMResponse, LLMProvider, Usage, StopReason,
)
from arcllm.modules import BaseModule, RetryModule, FallbackModule

print("All imports successful — including BaseModule, RetryModule, FallbackModule!")

In [None]:
# Setup: API keys for adapter construction
os.environ["ANTHROPIC_API_KEY"] = "sk-ant-test-key-for-walkthrough"
os.environ["OPENAI_API_KEY"] = "sk-openai-test-key-for-walkthrough"
clear_cache()

# Helper: mock inner provider
def make_inner(side_effects, name="test-provider"):
    """Create a mock LLMProvider with specified invoke() side effects."""
    inner = MagicMock(spec=LLMProvider)
    inner.name = name
    inner.model_name = "test-model"
    inner.validate_config.return_value = True
    inner.invoke = AsyncMock(side_effect=side_effects)
    return inner

# Helper: standard OK response
OK = LLMResponse(
    content="Success!",
    usage=Usage(input_tokens=10, output_tokens=5, total_tokens=15),
    model="test-model",
    stop_reason="end_turn",
)

# Helper: create API errors
def api_error(status_code, provider="anthropic"):
    return ArcLLMAPIError(status_code=status_code, body=f"HTTP {status_code}", provider=provider)

print("Helpers ready.")

---
## 1. The Module Pattern: Wrapper Classes (Middleware)

Each module is a wrapper that implements `LLMProvider` and wraps an inner `LLMProvider`. This is the decorator pattern — modules stack around the adapter.

```
Agent calls invoke()
       ↓
RetryModule.invoke()      ← outermost: handles retries
       ↓
FallbackModule.invoke()   ← middle: handles provider switching
       ↓
AnthropicAdapter.invoke() ← innermost: actual HTTP call
```

The agent only sees `LLMProvider.invoke()`. It doesn't know modules exist.
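
As a rough illustration of the wrapper idea, independent of ArcLLM, the sketch below stacks two wrappers around a stand-in provider. `EchoProvider`, `PassThrough`, and `CountingWrapper` are hypothetical names invented for this example; the real `BaseModule` may differ in detail.

```python
import asyncio


class EchoProvider:
    """Hypothetical stand-in for an adapter; not part of ArcLLM."""

    name = "echo"

    async def invoke(self, messages, **kwargs):
        return f"echo: {messages[-1]}"


class PassThrough:
    """Hypothetical wrapper exposing the same interface as what it wraps."""

    def __init__(self, inner):
        self._inner = inner

    @property
    def name(self):
        return self._inner.name  # delegate attribute access to the inner layer

    async def invoke(self, messages, **kwargs):
        return await self._inner.invoke(messages, **kwargs)


class CountingWrapper(PassThrough):
    """Adds a single behavior (call counting) around invoke()."""

    def __init__(self, inner):
        super().__init__(inner)
        self.calls = 0

    async def invoke(self, messages, **kwargs):
        self.calls += 1
        return await super().invoke(messages, **kwargs)


stack = CountingWrapper(PassThrough(EchoProvider()))
# In a notebook cell, use `await stack.invoke(["hi"])` instead of asyncio.run().
reply = asyncio.run(stack.invoke(["hi"]))
```

Because every layer exposes the same `invoke()` signature, the caller cannot tell how many layers sit between it and the provider.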

In [None]:
# BaseModule: transparent wrapper — delegates everything to inner
inner = make_inner([OK])
module = BaseModule({}, inner)

print(f"BaseModule is LLMProvider: {isinstance(module, LLMProvider)}")
print(f"module.name:              {module.name}  (delegates to inner)")
print(f"module.validate_config(): {module.validate_config()}  (delegates to inner)")

# invoke() passes through
messages = [Message(role="user", content="hi")]
result = await module.invoke(messages)
print(f"\ninvoke() result: {result.content!r}  (from inner, unchanged)")
print(f"inner.invoke called: {inner.invoke.await_count} time(s)")

In [None]:
# Modules stack — each wraps the previous
inner = make_inner([OK])
fallback = FallbackModule({"chain": []}, inner)
retry = RetryModule({}, fallback)

print("Stacking: Retry -> Fallback -> Inner")
print(f"  retry._inner         = {type(retry._inner).__name__}")
print(f"  fallback._inner      = {type(fallback._inner).__name__}")
print(f"  All are LLMProvider: {all(isinstance(x, LLMProvider) for x in [retry, fallback, inner])}")

---
## 2. RetryModule — Transient Failure Recovery

The `RetryModule` wraps `invoke()` with a retry loop. When a transient error occurs, it waits with exponential backoff + jitter and tries again.

### What's retryable?

| Error | Retryable? | Why |
|-------|------------|-----|
| HTTP 429 | Yes | Rate limited — wait and retry |
| HTTP 500 | Yes | Server error — transient |
| HTTP 502 | Yes | Bad gateway — transient |
| HTTP 503 | Yes | Service unavailable — transient |
| HTTP 529 | Yes | Anthropic overload — transient |
| Connection error | Yes | Network blip |
| Timeout | Yes | Server slow |
| HTTP 400 | **No** | Bad request — won't fix itself |
| HTTP 401 | **No** | Auth error — wrong key |
| HTTP 403 | **No** | Forbidden — won't fix itself |
| ValueError etc. | **No** | Programming error |
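
The table can be read as a small classification function. The sketch below is an assumption about how such a check might look, not the actual `RetryModule` code: `FakeAPIError` stands in for `ArcLLMAPIError`, and the builtin `ConnectionError` and `TimeoutError` stand in for the `httpx` exceptions.

```python
# Status codes the table marks as transient.
RETRYABLE_STATUS_CODES = frozenset({429, 500, 502, 503, 529})


class FakeAPIError(Exception):
    """Hypothetical stand-in for an API error that carries a status code."""

    def __init__(self, status_code):
        super().__init__(f"HTTP {status_code}")
        self.status_code = status_code


def is_retryable(exc, retryable_codes=RETRYABLE_STATUS_CODES):
    """Return True when the error is worth retrying."""
    if isinstance(exc, FakeAPIError):
        return exc.status_code in retryable_codes
    # Network-level failures are transient; anything else is treated as a bug.
    return isinstance(exc, (ConnectionError, TimeoutError))
```

Passing a different `retryable_codes` set narrows or widens the policy without touching the function.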

In [None]:
# Success on first try — no retry needed
inner = make_inner([OK])
module = RetryModule({"backoff_base_seconds": 0.01}, inner)

result = await module.invoke(messages)
print(f"First try succeeded: {result.content!r}")
print(f"Attempts: {inner.invoke.await_count}")

In [None]:
# Retry on 429 (rate limited) — fail once, succeed on retry
inner = make_inner([api_error(429), OK])
module = RetryModule({"backoff_base_seconds": 0.01}, inner)

result = await module.invoke(messages)
print(f"429 -> retry -> success: {result.content!r}")
print(f"Attempts: {inner.invoke.await_count}")

In [None]:
# Retry on all transient codes
for code in [429, 500, 502, 503, 529]:
    inner = make_inner([api_error(code), OK])
    module = RetryModule({"backoff_base_seconds": 0.001}, inner)
    result = await module.invoke(messages)
    print(f"  HTTP {code} -> retry -> {result.content!r}  (attempts: {inner.invoke.await_count})")

In [None]:
# Retry on connection errors and timeouts
for error in [httpx.ConnectError("refused"), httpx.ReadTimeout("timed out")]:
    inner = make_inner([error, OK])
    module = RetryModule({"backoff_base_seconds": 0.001}, inner)
    result = await module.invoke(messages)
    print(f"  {type(error).__name__} -> retry -> {result.content!r}")

In [None]:
# NO retry on non-transient errors — raised immediately
for code in [400, 401, 403]:
    inner = make_inner([api_error(code)])
    module = RetryModule({"backoff_base_seconds": 0.001}, inner)
    try:
        await module.invoke(messages)
    except ArcLLMAPIError as e:
        print(f"  HTTP {code} -> NOT retried (attempts: {inner.invoke.await_count})")

# Non-API errors pass through too
inner = make_inner([ValueError("bad value")])
module = RetryModule({"backoff_base_seconds": 0.001}, inner)
try:
    await module.invoke(messages)
except ValueError:
    print(f"  ValueError -> NOT retried (attempts: {inner.invoke.await_count})")

---
## 3. Retry Exhaustion — When All Retries Fail

After the initial attempt and all `max_retries` retries have failed, the **last error** is raised. The original error type is preserved.
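
A minimal retry loop makes the attempt count concrete. This is a sketch under simplifying assumptions (a fixed wait, builtin exception types), and `invoke_with_retry` is a hypothetical helper rather than ArcLLM code.

```python
import asyncio


async def invoke_with_retry(call, max_retries, wait_seconds=0.0):
    """Hypothetical helper: try once, then retry up to max_retries times."""
    for attempt in range(max_retries + 1):
        try:
            return await call()
        except (ConnectionError, TimeoutError):
            if attempt == max_retries:
                raise  # re-raise the last error with its original type
            await asyncio.sleep(wait_seconds)


attempts = []  # one entry per call, so the total can be counted afterwards


async def always_times_out():
    attempts.append(1)
    raise TimeoutError("still down")


async def demo():
    try:
        await invoke_with_retry(always_times_out, max_retries=3)
    except TimeoutError as exc:
        return type(exc).__name__


# In a notebook cell, use `await demo()` instead of asyncio.run().
raised = asyncio.run(demo())
```

With `max_retries=3` the callable runs four times, and the caller still sees a `TimeoutError` rather than a wrapper exception.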

In [None]:
# max_retries=3 means 4 attempts total (1 initial + 3 retries)
config = {"max_retries": 3, "backoff_base_seconds": 0.001}
inner = make_inner([api_error(429)] * 4)  # all 4 fail
module = RetryModule(config, inner)

try:
    await module.invoke(messages)
except ArcLLMAPIError as e:
    print(f"All retries exhausted!")
    print(f"  Attempts:     {inner.invoke.await_count}")
    print(f"  Error type:   {type(e).__name__}")
    print(f"  Status code:  {e.status_code}")
    print(f"  Provider:     {e.provider}")

In [None]:
# Error type preserved — connection error stays a connection error
inner = make_inner([httpx.ConnectError("refused")] * 4)
module = RetryModule(config, inner)

try:
    await module.invoke(messages)
except httpx.ConnectError as e:
    print(f"Original error type preserved: {type(e).__name__}")
    print(f"Agents can catch the specific type for custom handling.")

---
## 4. Exponential Backoff + Jitter

The retry wait time uses **exponential backoff with proportional jitter**:

```
backoff = base_seconds * 2^attempt
jitter  = random.uniform(0, backoff)  # proportional to backoff
wait    = min(backoff + jitter, max_wait)
```

**Why jitter?** With thousands of agents hitting the same rate limit, they'd all retry at the exact same time (thundering herd). Jitter spreads them out.
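
The formula above can be written as a pure function, which makes the numbers easy to check by hand. `compute_wait` and its `rng` parameter are hypothetical names used only for this sketch; injecting `rng` lets the jitter be switched off for a deterministic run.

```python
import random


def compute_wait(attempt, base_seconds, max_wait, rng=random.uniform):
    """Exponential backoff plus jitter drawn from [0, backoff], capped at max_wait."""
    backoff = base_seconds * 2 ** attempt
    jitter = rng(0, backoff)
    return min(backoff + jitter, max_wait)


def no_jitter(low, high):
    """Deterministic replacement for random.uniform that always returns 0."""
    return 0.0


# base=1.0, no jitter: the wait doubles on each attempt.
doubling = [compute_wait(a, 1.0, 100.0, rng=no_jitter) for a in range(4)]
# -> [1.0, 2.0, 4.0, 8.0]

# base=10.0 with a 15s cap: the second and third waits hit the cap.
capped = [compute_wait(a, 10.0, 15.0, rng=no_jitter) for a in range(3)]
# -> [10.0, 15.0, 15.0]
```

With the default `random.uniform`, each uncapped wait falls somewhere between `backoff` and `2 * backoff`, which is what spreads concurrent agents apart.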

In [None]:
# Demonstrate exponential backoff (with jitter=0 for clarity)
config = {
    "max_retries": 4,
    "backoff_base_seconds": 1.0,
    "max_wait_seconds": 100.0,
}
inner = make_inner([api_error(429)] * 4 + [OK])
module = RetryModule(config, inner)

with patch("arcllm.modules.retry.asyncio.sleep", new_callable=AsyncMock) as mock_sleep:
    with patch("arcllm.modules.retry.random.uniform", return_value=0.0):  # no jitter
        await module.invoke(messages)

waits = [call.args[0] for call in mock_sleep.await_args_list]
print("Exponential backoff (base=1.0, jitter=0):")
for i, wait in enumerate(waits):
    formula = f"1.0 * 2^{i}"
    print(f"  Attempt {i+1}: wait {wait:.1f}s  ({formula} = {1.0 * 2**i:.1f})")

In [None]:
# Backoff capped at max_wait_seconds
config = {
    "max_retries": 3,
    "backoff_base_seconds": 10.0,
    "max_wait_seconds": 15.0,  # cap
}
inner = make_inner([api_error(500)] * 3 + [OK])
module = RetryModule(config, inner)

with patch("arcllm.modules.retry.asyncio.sleep", new_callable=AsyncMock) as mock_sleep:
    with patch("arcllm.modules.retry.random.uniform", return_value=0.0):
        await module.invoke(messages)

waits = [call.args[0] for call in mock_sleep.await_args_list]
print(f"Capped backoff (base=10, max_wait=15):")
for i, wait in enumerate(waits):
    uncapped = 10.0 * 2**i
    capped = "(capped!)" if uncapped > 15 else ""
    print(f"  Attempt {i+1}: wait {wait:.1f}s  (uncapped: {uncapped:.1f}s {capped})")

In [None]:
# Jitter in action: random.uniform is patched to return a fixed 0.5s
module2 = RetryModule(
    {"max_retries": 3, "backoff_base_seconds": 1.0, "max_wait_seconds": 100.0},
    make_inner([api_error(429)] * 3 + [OK]),
)

# The mock ignores its (0, backoff) arguments, so jitter is a constant 0.5s,
# not 50% of each backoff. In real runs jitter is drawn from [0, backoff].
with patch("arcllm.modules.retry.asyncio.sleep", new_callable=AsyncMock) as mock_sleep:
    with patch("arcllm.modules.retry.random.uniform", return_value=0.5):
        await module2.invoke(messages)

waits = [call.args[0] for call in mock_sleep.await_args_list]
print("Fixed jitter of 0.5s added to each backoff:")
for i, wait in enumerate(waits):
    base = 1.0 * 2**i
    print(f"  Attempt {i+1}: backoff={base:.1f} + jitter=0.5 = {wait:.1f}s")

print(f"\nJitter prevents thundering herd with thousands of concurrent agents.")

---
## 5. Retry-After Header Support

When a provider sends a `Retry-After` header (exposed as `ArcLLMAPIError.retry_after`), the module honors it instead of calculating backoff. The wait is still capped at `max_wait_seconds`.
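
The rule reduces to a few lines. `choose_wait` is a hypothetical helper written for this sketch, assuming `retry_after` is either a number of seconds or `None` when the header is absent.

```python
def choose_wait(retry_after, computed_backoff, max_wait):
    """Prefer the server's Retry-After hint; cap the result either way."""
    if retry_after is not None:
        return min(retry_after, max_wait)
    return min(computed_backoff, max_wait)


honored = choose_wait(retry_after=5.0, computed_backoff=1.0, max_wait=100.0)   # -> 5.0
capped = choose_wait(retry_after=10.0, computed_backoff=1.0, max_wait=3.0)     # -> 3.0
fallback = choose_wait(retry_after=None, computed_backoff=1.0, max_wait=100.0)  # -> 1.0
```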

In [None]:
# Retry-After honored
config = {"max_retries": 1, "backoff_base_seconds": 1.0, "max_wait_seconds": 100.0}
error_with_retry_after = ArcLLMAPIError(
    status_code=429, body="rate limited", provider="anthropic", retry_after=5.0
)
inner = make_inner([error_with_retry_after, OK])
module = RetryModule(config, inner)

with patch("arcllm.modules.retry.asyncio.sleep", new_callable=AsyncMock) as mock_sleep:
    await module.invoke(messages)

actual_wait = mock_sleep.await_args_list[0].args[0]
print(f"Retry-After: 5.0s")
print(f"Actual wait: {actual_wait}s  (honored the header, not calculated)")

In [None]:
# Retry-After capped at max_wait_seconds
config = {"max_retries": 1, "backoff_base_seconds": 1.0, "max_wait_seconds": 3.0}
error = ArcLLMAPIError(status_code=429, body="rate limited", provider="test", retry_after=10.0)
inner = make_inner([error, OK])
module = RetryModule(config, inner)

with patch("arcllm.modules.retry.asyncio.sleep", new_callable=AsyncMock) as mock_sleep:
    await module.invoke(messages)

actual = mock_sleep.await_args_list[0].args[0]
print(f"Retry-After: 10.0s, max_wait: 3.0s")
print(f"Actual wait: {actual}s  (capped at max_wait)")

---
## 6. Retry Config Customization

Every aspect of retry is configurable.

In [None]:
# Custom max_retries
inner = make_inner([api_error(429)] * 5)
module = RetryModule({"max_retries": 1, "backoff_base_seconds": 0.001}, inner)

try:
    await module.invoke(messages)
except ArcLLMAPIError:
    print(f"max_retries=1 -> {inner.invoke.await_count} total attempts (1 initial + 1 retry)")

In [None]:
# Custom retryable codes — only retry 503
config = {
    "max_retries": 3,
    "backoff_base_seconds": 0.001,
    "retryable_status_codes": [503],  # ONLY 503
}

# 429 should NOT be retried
inner = make_inner([api_error(429)])
module = RetryModule(config, inner)
try:
    await module.invoke(messages)
except ArcLLMAPIError:
    print(f"429 with custom codes: {inner.invoke.await_count} attempt (NOT retried)")

# 503 SHOULD be retried
inner = make_inner([api_error(503), OK])
module = RetryModule(config, inner)
result = await module.invoke(messages)
print(f"503 with custom codes: {inner.invoke.await_count} attempts (retried -> success)")

In [None]:
# Config validation — bad values rejected at construction
inner = make_inner([OK])

for config, expected_error in [
    ({"max_retries": -1}, "max_retries must be >= 0"),
    ({"backoff_base_seconds": 0}, "backoff_base_seconds must be > 0"),
    ({"max_wait_seconds": 0}, "max_wait_seconds must be > 0"),
]:
    try:
        RetryModule(config, inner)
    except ArcLLMConfigError as e:
        print(f"  {config} -> REJECTED: {e}")

# max_retries=0 is valid (1 attempt, no retries)
module = RetryModule({"max_retries": 0}, inner)
print(f"\n  max_retries=0 -> ALLOWED (1 attempt, no retries)")

---
## 7. FallbackModule — Provider Chain Switching

When the primary provider fails (any exception), the `FallbackModule` walks a **chain** of alternative providers, creating each on-demand via `load_model()`.

```
invoke() -> primary fails
         -> try chain[0] (e.g., "openai") -> load_model("openai") -> invoke()
         -> chain[0] fails
         -> try chain[1] (e.g., "ollama") -> load_model("ollama") -> invoke()
         -> all fail? raise PRIMARY error (not the last fallback error)
```

**Key:** Fallback adapters are created **on-demand** — no wasted memory for unused providers.
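
The chain walk can be sketched without ArcLLM. Everything here is hypothetical (`FakeProvider`, `invoke_with_fallback`, and the `factory` callable standing in for `load_model()`); the point is only the control flow: build a provider when needed, close it afterwards, and surface the primary error if the chain runs out.

```python
import asyncio


class FakeProvider:
    """Hypothetical provider used only for this sketch."""

    def __init__(self, name, error=None):
        self.name, self._error, self.closed = name, error, False

    async def invoke(self, messages):
        if self._error is not None:
            raise self._error
        return f"reply from {self.name}"

    async def close(self):
        self.closed = True


async def invoke_with_fallback(primary, chain, factory, messages):
    try:
        return await primary.invoke(messages)
    except Exception as primary_error:
        for name in chain:
            provider = factory(name)  # built only when actually needed
            try:
                return await provider.invoke(messages)
            except Exception:
                continue  # try the next provider in the chain
            finally:
                await provider.close()  # runs on success and on failure
        raise primary_error  # chain exhausted: surface the original failure


created = {}


def factory(name):
    # In this made-up scenario "openai" fails and "ollama" succeeds.
    error = RuntimeError("openai down") if name == "openai" else None
    created[name] = FakeProvider(name, error)
    return created[name]


primary = FakeProvider("anthropic", error=RuntimeError("primary broke"))
# In a notebook cell, use `await` instead of asyncio.run().
reply = asyncio.run(invoke_with_fallback(primary, ["openai", "ollama"], factory, ["hi"]))
```

Raising the primary error keeps the failure attributable to the provider the caller actually asked for.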

In [None]:
# Primary succeeds — fallback never touched
inner = make_inner([OK])
module = FallbackModule({"chain": ["openai"]}, inner)

with patch("arcllm.modules.fallback.load_model") as mock_load:
    result = await module.invoke(messages)
    print(f"Primary succeeded: {result.content!r}")
    print(f"load_model called: {mock_load.call_count} times (fallback not needed)")

In [None]:
# Primary fails -> first fallback succeeds
inner = make_inner([api_error(500)])
fallback_response = LLMResponse(
    content="Fallback response from OpenAI!",
    usage=Usage(input_tokens=10, output_tokens=5, total_tokens=15),
    model="gpt-4o",
    stop_reason="end_turn",
)
fallback_inner = make_inner([fallback_response])

module = FallbackModule({"chain": ["openai"]}, inner)

with patch("arcllm.modules.fallback.load_model", return_value=fallback_inner) as mock_load:
    result = await module.invoke(messages)
    print(f"Primary failed (500), fallback succeeded!")
    print(f"  Response: {result.content!r}")
    print(f"  Model:    {result.model}")
    print(f"  load_model called with: {mock_load.call_args_list}")

In [None]:
# Primary fails -> first fallback fails -> second fallback succeeds
inner = make_inner([api_error(500)])
fallback_1 = make_inner([api_error(503)])
fallback_2_response = LLMResponse(
    content="Third time's the charm!",
    usage=Usage(input_tokens=10, output_tokens=5, total_tokens=15),
    model="ollama-local", stop_reason="end_turn",
)
fallback_2 = make_inner([fallback_2_response])

module = FallbackModule({"chain": ["openai", "ollama"]}, inner)

with patch("arcllm.modules.fallback.load_model", side_effect=[fallback_1, fallback_2]) as mock_load:
    result = await module.invoke(messages)
    print(f"Primary: 500, Fallback 1 (openai): 503, Fallback 2 (ollama): success!")
    print(f"  Response: {result.content!r}")
    print(f"  Chain walked: {[call.args[0] for call in mock_load.call_args_list]}")

In [None]:
# All fallbacks fail -> raises PRIMARY error (not the last fallback error)
primary_error = ArcLLMAPIError(status_code=500, body="primary broke", provider="anthropic")
inner = make_inner([primary_error])
fallback_inner = make_inner([api_error(503)])

module = FallbackModule({"chain": ["openai"]}, inner)

with patch("arcllm.modules.fallback.load_model", return_value=fallback_inner):
    try:
        await module.invoke(messages)
    except ArcLLMAPIError as e:
        print(f"All fallbacks exhausted!")
        print(f"  Raised error: HTTP {e.status_code} from {e.provider}")
        print(f"  Body: {e.body!r}")
        print(f"  This is the PRIMARY error, not the fallback error.")

In [None]:
# Empty chain — error passes through immediately
inner = make_inner([api_error(500)])
module = FallbackModule({"chain": []}, inner)

try:
    await module.invoke(messages)
except ArcLLMAPIError as e:
    print(f"Empty chain: error passes through (HTTP {e.status_code})")

In [None]:
# Chain length validated — max 10 providers
inner = make_inner([OK])
try:
    FallbackModule({"chain": [f"p{i}" for i in range(11)]}, inner)
except ArcLLMConfigError as e:
    print(f"Chain too long: {e}")

# 10 is allowed
module = FallbackModule({"chain": [f"p{i}" for i in range(10)]}, inner)
print(f"10 providers: allowed (len={len(module._chain)})")

---
## 8. Fallback Cleanup — Adapters Closed After Use

Fallback adapters are created on-demand and **closed after each use** (whether the fallback succeeded or failed). No connection pool leaks.

In [None]:
# Verify fallback adapter closed after success
inner = make_inner([api_error(500)])
fallback_inner = make_inner([fallback_response])
fallback_inner.close = AsyncMock()

module = FallbackModule({"chain": ["openai"]}, inner)

with patch("arcllm.modules.fallback.load_model", return_value=fallback_inner):
    await module.invoke(messages)

print(f"Fallback adapter close() called: {fallback_inner.close.await_count} time(s)")
print("Cleanup happens in a 'finally' block — even on failure.")

---
## 9. Module Stacking via load_model()

The updated `load_model()` reads module config and wraps the adapter automatically.

```python
# Simple — no modules
model = load_model("anthropic")

# With retry
model = load_model("anthropic", retry=True)

# With custom retry config
model = load_model("anthropic", retry={"max_retries": 5})

# With fallback
model = load_model("anthropic", fallback={"chain": ["openai"]})

# Both — stacking order: Retry(Fallback(Adapter))
model = load_model("anthropic", retry=True, fallback={"chain": ["openai"]})
```
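
To see what the `Retry(Fallback(Adapter))` order means at runtime, the sketch below uses three hypothetical classes (`Flaky`, `Fallback`, `Retry`) that mimic the stacking with no backoff and a single backup provider. It is a simplified model, not the ArcLLM implementation.

```python
import asyncio


class Flaky:
    """Hypothetical provider that fails a fixed number of times, then succeeds."""

    def __init__(self, name, failures, log):
        self.name, self._failures, self._log = name, failures, log

    async def invoke(self, messages):
        self._log.append(self.name)  # record the order in which providers are hit
        if self._failures > 0:
            self._failures -= 1
            raise ConnectionError(f"{self.name} unavailable")
        return f"ok from {self.name}"


class Fallback:
    def __init__(self, inner, backup):
        self._inner, self._backup = inner, backup

    async def invoke(self, messages):
        try:
            return await self._inner.invoke(messages)
        except Exception as primary_error:
            try:
                return await self._backup.invoke(messages)
            except Exception:
                raise primary_error  # backup failed too: report the primary error


class Retry:
    def __init__(self, inner, max_retries):
        self._inner, self._max_retries = inner, max_retries

    async def invoke(self, messages):
        for attempt in range(self._max_retries + 1):
            try:
                return await self._inner.invoke(messages)
            except ConnectionError:
                if attempt == self._max_retries:
                    raise


calls = []
stack = Retry(Fallback(Flaky("primary", 2, calls), Flaky("backup", 1, calls)), max_retries=3)
# In a notebook cell, use `await stack.invoke(["hi"])` instead of asyncio.run().
result = asyncio.run(stack.invoke(["hi"]))
# calls -> ["primary", "backup", "primary", "backup"]
```

Each retry attempt re-runs the whole inner stack, so the backup is consulted inside every attempt rather than only after retries are used up.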

In [None]:
clear_cache()

# No modules — returns adapter directly
model = load_model("anthropic")
print(f"No modules:        {type(model).__name__}")

# retry=True — wraps with RetryModule
model = load_model("anthropic", retry=True)
print(f"retry=True:        {type(model).__name__} -> {type(model._inner).__name__}")

# fallback=dict — wraps with FallbackModule
model = load_model("anthropic", fallback={"chain": ["openai"]})
print(f"fallback={{chain}}: {type(model).__name__} -> {type(model._inner).__name__}")

In [None]:
# Both modules — stacking order: Retry(Fallback(Adapter))
clear_cache()
model = load_model("anthropic", retry=True, fallback={"chain": ["openai"]})

print("Stacking order (outermost first):")
print(f"  Outermost: {type(model).__name__}")
print(f"  Middle:    {type(model._inner).__name__}")
print(f"  Innermost: {type(model._inner._inner).__name__}")
print("\nWhy this order?")
print("  Retry wraps Fallback, so each retry attempt runs the whole inner stack.")
print("  If Anthropic fails, Fallback tries OpenAI within the same attempt.")
print("  If the whole chain fails, the primary error reaches Retry, which backs off and retries when that error is transient.")

In [None]:
# Custom retry config via dict
clear_cache()
model = load_model("anthropic", retry={"max_retries": 5, "backoff_base_seconds": 2.0})

print(f"Custom retry config:")
print(f"  max_retries:        {model._max_retries}")
print(f"  backoff_base:       {model._backoff_base}")
print(f"  max_wait:           {model._max_wait}  (default, not overridden)")

---
## 10. Config Resolution Priority

Module config follows a 4-level priority chain:

```
1. kwarg=False  -> DISABLED (overrides everything)
2. kwarg={dict} -> ENABLED with custom values (merged over config.toml)
3. kwarg=True   -> ENABLED with config.toml defaults
4. kwarg=None   -> check config.toml 'enabled' flag
```
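
The four levels map onto a short function. This is a sketch of the rule as listed, not the registry code: `resolve_module_config` is a hypothetical name, and the dictionary stands in for one module's section of `config.toml` with made-up values.

```python
def resolve_module_config(kwarg, toml_section):
    """Return the effective settings dict, or None when the module is disabled."""
    defaults = {k: v for k, v in toml_section.items() if k != "enabled"}
    if kwarg is False:            # 1. explicit False always disables
        return None
    if isinstance(kwarg, dict):   # 2. dict enables and overrides the defaults
        return {**defaults, **kwarg}
    if kwarg is True:             # 3. True enables with the defaults
        return defaults
    # 4. kwarg is None: defer to the 'enabled' flag in the config section
    return defaults if toml_section.get("enabled", False) else None


# Made-up values standing in for the retry section of config.toml.
toml_retry = {"enabled": False, "max_retries": 3, "backoff_base_seconds": 1.0}

merged = resolve_module_config({"max_retries": 5}, toml_retry)
# -> {"max_retries": 5, "backoff_base_seconds": 1.0}
```

Checking `False` before `True` and `None` matters: it is what lets a caller switch a module off even when the config file enables it.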

The `config.toml` has defaults for all modules:

In [None]:
# Show config.toml module defaults
from arcllm.config import load_global_config

global_config = load_global_config()
print("config.toml module settings:")
for name, cfg in global_config.modules.items():
    settings = cfg.model_dump()
    print(f"  [{name}]: {settings}")

In [None]:
# kwarg=False overrides config.toml enabled=true
# (Normally retry is disabled in config.toml, but let's test the logic)
clear_cache()

# retry=False -> always disabled, even if config.toml says enabled
model = load_model("anthropic", retry=False)
print(f"retry=False: {type(model).__name__}  (no RetryModule wrapping)")

# retry=True -> enabled with config.toml defaults
model = load_model("anthropic", retry=True)
print(f"retry=True:  {type(model).__name__}  (wrapped with config.toml defaults)")
print(f"  max_retries from config.toml: {model._max_retries}")

---
## 11. Agent Code Is Unchanged

This is the whole point of the module pattern: agent code stays exactly the same whether modules are enabled or not.

In [None]:
async def agent_invoke(model, query):
    """Generic agent code — works with or without modules."""
    resp = await model.invoke(
        [Message(role="user", content=query)],
        max_tokens=50, temperature=0.0,
    )
    return resp

# Mock the HTTP layer for demonstration
def mock_adapter(model):
    """Walk the wrapper chain and mock the innermost adapter's client."""
    target = model
    while hasattr(target, '_inner'):
        target = target._inner
    target._client = AsyncMock()
    target._client.post = AsyncMock(return_value=httpx.Response(
        200, json={
            "id": "msg_test", "type": "message", "role": "assistant",
            "model": target._model_name,
            "content": [{"type": "text", "text": "Hello from the adapter!"}],
            "stop_reason": "end_turn",
            "usage": {"input_tokens": 10, "output_tokens": 5},
        },
        request=httpx.Request("POST", "https://api.anthropic.com/v1/messages"),
    ))

clear_cache()

# Without modules
model_plain = load_model("anthropic")
mock_adapter(model_plain)
resp = await agent_invoke(model_plain, "Hello!")
print(f"Without modules: {resp.content!r}  (type: {type(model_plain).__name__})")

# With retry
model_retry = load_model("anthropic", retry=True)
mock_adapter(model_retry)
resp = await agent_invoke(model_retry, "Hello!")
print(f"With retry:      {resp.content!r}  (type: {type(model_retry).__name__})")

# With retry + fallback
model_full = load_model("anthropic", retry=True, fallback={"chain": ["openai"]})
mock_adapter(model_full)
resp = await agent_invoke(model_full, "Hello!")
print(f"Retry+Fallback:  {resp.content!r}  (type: {type(model_full).__name__})")

print(f"\nagent_invoke() is IDENTICAL in all three cases.")
print(f"Modules are invisible to the agent.")

---
## 12. Retry Logging

Both modules log their activity — useful for observability without changing agent code.

In [None]:
# Set up logging to see retry/fallback activity
import sys

handler = logging.StreamHandler(sys.stdout)
handler.setFormatter(logging.Formatter("%(name)s [%(levelname)s] %(message)s"))

retry_logger = logging.getLogger("arcllm.modules.retry")
retry_logger.addHandler(handler)
retry_logger.setLevel(logging.WARNING)

fallback_logger = logging.getLogger("arcllm.modules.fallback")
fallback_logger.addHandler(handler)
fallback_logger.setLevel(logging.WARNING)

print("Logging configured.")

In [None]:
# Watch retry logging
inner = make_inner([api_error(429), api_error(429), OK])
module = RetryModule({"max_retries": 3, "backoff_base_seconds": 0.001}, inner)

print("Retry with 2 failures then success:")
print("-" * 60)
result = await module.invoke(messages)
print("-" * 60)
print(f"Result: {result.content!r}")

In [None]:
# Watch retry exhaustion logging
inner = make_inner([api_error(500)] * 4)
module = RetryModule({"max_retries": 3, "backoff_base_seconds": 0.001}, inner)

print("Retry exhaustion:")
print("-" * 60)
try:
    await module.invoke(messages)
except ArcLLMAPIError:
    pass
print("-" * 60)

# Clean up loggers
retry_logger.removeHandler(handler)
fallback_logger.removeHandler(handler)

---
## 13. Live API — Retry + Fallback in Action

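Let's use the real Anthropic API with retry enabled. These cells need a valid `ANTHROPIC_API_KEY` in the environment or in a `.env` file; fallback is not exercised here.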

In [None]:
from dotenv import load_dotenv

load_dotenv(override=True)
clear_cache()

print(f"API key: {os.environ.get('ANTHROPIC_API_KEY', 'NOT SET')[:12]}...")

In [None]:
# load_model with retry=True — production-ready with one kwarg
async def live_retry_demo():
    model = load_model("anthropic", retry=True)
    print(f"Model type:  {type(model).__name__}")
    print(f"Inner type:  {type(model._inner).__name__}")
    print(f"Max retries: {model._max_retries}")
    print(f"Backoff:     {model._backoff_base}s base")

    async with model:
        resp = await model.invoke(
            [Message(role="user", content="What is 2 + 2? One word.")],
            max_tokens=10, temperature=0.0,
        )

    print(f"\nResponse:    {resp.content!r}")
    print(f"Model:       {resp.model}")
    print(f"Tokens:      {resp.usage.total_tokens}")
    print(f"Stop reason: {resp.stop_reason}")
    print(f"\nIf this had gotten a 429, it would have automatically retried.")

await live_retry_demo()

In [None]:
# Custom retry config — more aggressive for critical tasks
async def live_custom_retry():
    model = load_model("anthropic", retry={"max_retries": 5, "backoff_base_seconds": 0.5})
    print(f"Custom config: max_retries={model._max_retries}, backoff={model._backoff_base}s")

    async with model:
        resp = await model.invoke(
            [Message(role="user", content="Say 'hello' and nothing else.")],
            max_tokens=10, temperature=0.0,
        )

    print(f"Response: {resp.content!r}")

await live_custom_retry()

In [None]:
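# Full agentic loop with retry: same pattern as before, just add retry=True
# Assumption: ToolUseBlock and ToolResultBlock are exported by arcllm alongside Message and Tool.
from arcllm import ToolUseBlock, ToolResultBlock
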
async def live_agentic_with_retry():
    model = load_model("anthropic", retry=True)

    tools = [
        Tool(
            name="calculate",
            description="Evaluate a mathematical expression",
            parameters={
                "type": "object",
                "properties": {"expression": {"type": "string"}},
                "required": ["expression"],
            },
        )
    ]

    msgs = [Message(role="user", content="What is 123 * 456? Use the calculator.")]
    turn = 0

    async with model:
        while True:
            turn += 1
            resp = await model.invoke(msgs, tools=tools, max_tokens=200, temperature=0.0)

            if resp.stop_reason == "end_turn":
                print(f"Turn {turn}: {resp.content}")
                break

            if resp.stop_reason == "tool_use":
                for tc in resp.tool_calls:
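                    try:
                        # Demo only: eval() on model-supplied text is unsafe.
                        # Stripping builtins narrows, but does not remove, the risk.
                        result = str(eval(tc.arguments["expression"], {"__builtins__": {}}, {}))
                    except Exception:
                        result = "Error"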
                    print(f"Turn {turn}: calculate({tc.arguments['expression']}) = {result}")
                    msgs.append(Message(role="assistant", content=[
                        ToolUseBlock(id=tc.id, name=tc.name, arguments=tc.arguments),
                    ]))
                    msgs.append(Message(role="tool", content=[
                        ToolResultBlock(tool_use_id=tc.id, content=result),
                    ]))

            if turn > 5:
                break

    expected = 123 * 456
    print(f"\nExpected: {expected}")
    print(f"Retry was active throughout — any transient error would have been handled.")

await live_agentic_with_retry()

---
## Summary

Step 7 built the **module system** and the first two modules:

```
modules/base.py       ->  BaseModule — transparent wrapper, delegates all calls
modules/retry.py      ->  RetryModule — exponential backoff + jitter
modules/fallback.py   ->  FallbackModule — provider chain on failure
registry.py (updated) ->  load_model() now accepts retry= and fallback= kwargs
```

**Module stacking:**
```
load_model("anthropic", retry=True, fallback={"chain": ["openai"]})
  -> RetryModule(FallbackModule(AnthropicAdapter))
```

**Key design decisions:**
- Wrapper pattern (middleware) — each module wraps `invoke()`, composable and testable
- Retry: 429/500/502/503/529 + connection errors and timeouts. Exponential backoff with jitter.
- Retry-After header honored when present, capped at max_wait
- Fallback: config-driven chain, adapters created on-demand, closed after use
- Fallback raises primary error when chain exhausted (not last fallback error)
- Config resolution: kwarg=False > kwarg={dict} > kwarg=True > config.toml
- Agent code unchanged — `model.invoke()` just works
- Circular import solved with lazy import in fallback.py

**What's next:** Rate limiter (Step 8), Router (Step 9), then Observability (Steps 10-13).