Implement AnthropicModelInterface and AnthropicCacheProvider #39

@sbeardsley

Description

Overview

Implement AnthropicModelInterface — the real provider implementation of the ModelInterface trait that makes actual Anthropic API calls. Also implement AnthropicCacheProvider as a natural companion: both are Anthropic-specific and share the same API client.

This is a leaf-node implementation — it only depends on the ModelInterface and CacheProvider traits being defined (#1, #25).

When to Build

Implement before ContextManager (#7). Once compaction decisions, cache block hashing, and budget tracking all depend on token counts, counting stops being an acceptable approximation and becomes a correctness issue. The bytes / 4 heuristic in ReplayModelInterface must be replaced with the real Anthropic token counting API before #7 is implemented.


AnthropicModelInterface

Constructor

AnthropicModelInterface {
  api_key: String,
  model_id: String,        // "claude-sonnet-4-6", "claude-opus-4-6", etc.
  base_url: Option<String>, // default: "https://api.anthropic.com"
  timeout: Duration,        // default: 120s
  max_retries: u32,         // default: 3 (exponential backoff for 429, 529)
  http_client: HttpClient,  // injected for testability
}

// Constructors
AnthropicModelInterface::new(api_key, model_id) -> Self
AnthropicModelInterface::from_env(env_var, model_id) -> Result<Self>
  // reads API key from env_var (e.g. "ANTHROPIC_API_KEY")
  // returns error if env var not set
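The constructors above can be sketched as follows. This is a minimal illustration, not the real implementation: the error type is a stand-in, and the injected `http_client` field is omitted to keep the example dependency-free.

```rust
use std::env;
use std::time::Duration;

// Illustrative subset of the struct above; `http_client` omitted for brevity.
struct AnthropicModelInterface {
    api_key: String,
    model_id: String,
    base_url: Option<String>,
    timeout: Duration,
    max_retries: u32,
}

impl AnthropicModelInterface {
    fn new(api_key: String, model_id: String) -> Self {
        Self {
            api_key,
            model_id,
            base_url: None, // defaults to https://api.anthropic.com
            timeout: Duration::from_secs(120),
            max_retries: 3,
        }
    }

    // Reads the API key from the named env var; errors if it is unset.
    // `Result<Self, String>` is a placeholder for the crate's real error type.
    fn from_env(env_var: &str, model_id: String) -> Result<Self, String> {
        let api_key = env::var(env_var)
            .map_err(|_| format!("environment variable {env_var} is not set"))?;
        Ok(Self::new(api_key, model_id))
    }
}
```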

ModelInterface implementation

call(request)

  • Translate ModelRequest → Anthropic Messages API JSON body
  • POST to https://api.anthropic.com/v1/messages
  • Translate Anthropic response → ModelResponse
  • Extract TokenUsage from response usage field
  • Handle provider errors as typed ModelError variants:
    • 429 → ModelError::RateLimited { retry_after }
    • 529 → ModelError::RateLimited { retry_after: None }
    • 408/504 → ModelError::Timeout
    • 4xx → ModelError::ProviderError { code, message }
  • Retry on RateLimited and Timeout with exponential backoff (handled internally, caller never sees these)
  • ContextLimitExceeded checked before the API call using count_tokens
  • BudgetExceeded checked before the API call using request.params.max_tokens
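The error mapping and backoff policy above can be sketched as pure functions, which is also how the unit tests exercise them without a network. `ModelError` here is a stand-in for the trait-level error type from #1, and the backoff base of 500 ms is an illustrative choice, not a specified value.

```rust
use std::time::Duration;

#[derive(Debug, PartialEq)]
enum ModelError {
    RateLimited { retry_after: Option<Duration> },
    Timeout,
    ProviderError { code: u16, message: String },
}

// Maps an HTTP status (plus any Retry-After hint) to a typed ModelError,
// per the table above.
fn map_http_error(status: u16, retry_after: Option<Duration>, body: &str) -> ModelError {
    match status {
        429 => ModelError::RateLimited { retry_after },
        529 => ModelError::RateLimited { retry_after: None }, // "overloaded"
        408 | 504 => ModelError::Timeout,
        code => ModelError::ProviderError { code, message: body.to_string() },
    }
}

// Exponential backoff delay for retry attempt N (0-based): 500ms, 1s, 2s, ...
fn backoff_delay(attempt: u32) -> Duration {
    Duration::from_millis(500 * (1u64 << attempt))
}
```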

call_streaming(request, on_token)

  • POST with stream: true
  • Parse Anthropic SSE event stream:
    • message_start → extract input token count
    • content_block_start { type: "text" } → begin text block
    • content_block_start { type: "thinking" } → begin thinking block
    • content_block_start { type: "tool_use" } → begin tool use block
    • content_block_delta { type: "text_delta" } → fire StreamEvent::TextDelta
    • content_block_delta { type: "thinking_delta" } → fire StreamEvent::ThinkingDelta
    • content_block_delta { type: "input_json_delta" } → fire StreamEvent::ToolCallDelta
    • content_block_stop → close current block
    • message_delta { stop_reason, usage } → extract output tokens
    • message_stop → fire StreamEvent::Done, return complete ModelResponse
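The event routing above can be sketched as a single match on the SSE event name plus (for deltas) the delta type. This is only the dispatch skeleton: a real implementation would also parse each event's JSON `data:` payload (e.g. with serde_json) and accumulate blocks into the final ModelResponse. `SseRoute` is a hypothetical name for illustration.

```rust
#[derive(Debug, PartialEq)]
enum SseRoute {
    TextDelta,       // → StreamEvent::TextDelta
    ThinkingDelta,   // → StreamEvent::ThinkingDelta
    ToolCallDelta,   // → StreamEvent::ToolCallDelta
    BlockBoundary,   // content_block_start / content_block_stop
    UsageUpdate,     // message_start (input tokens) / message_delta (output tokens)
    Done,            // message_stop → StreamEvent::Done
    Unknown,
}

// Routes one SSE event to the handler category described above.
fn route_event(event_name: &str, delta_type: Option<&str>) -> SseRoute {
    match (event_name, delta_type) {
        ("content_block_delta", Some("text_delta")) => SseRoute::TextDelta,
        ("content_block_delta", Some("thinking_delta")) => SseRoute::ThinkingDelta,
        ("content_block_delta", Some("input_json_delta")) => SseRoute::ToolCallDelta,
        ("content_block_start", _) | ("content_block_stop", _) => SseRoute::BlockBoundary,
        ("message_start", _) | ("message_delta", _) => SseRoute::UsageUpdate,
        ("message_stop", _) => SseRoute::Done,
        _ => SseRoute::Unknown,
    }
}
```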

count_tokens(request)

  • POST to https://api.anthropic.com/v1/messages/count_tokens
  • Returns exact token count from Anthropic's tokenizer
  • Replaces the bytes / 4 heuristic used in ReplayModelInterface
  • Note: this endpoint has its own rate limits — cache results where possible
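One way to honor the caching note above is a memoizing wrapper keyed on a hash of the serialized request, so repeated counts of an unchanged prefix never hit the endpoint. The types and the string-body representation are assumptions for illustration; a real implementation would key on a stable hash of the ModelRequest.

```rust
use std::collections::HashMap;
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

struct TokenCountCache {
    cache: HashMap<u64, u32>,
    hits: u32,
    misses: u32,
}

impl TokenCountCache {
    fn new() -> Self {
        Self { cache: HashMap::new(), hits: 0, misses: 0 }
    }

    // Returns the cached count for this request body, calling `count_remote`
    // (the real count_tokens endpoint) only on a cache miss.
    fn get_or_count(&mut self, body: &str, count_remote: impl FnOnce(&str) -> u32) -> u32 {
        let mut h = DefaultHasher::new();
        body.hash(&mut h);
        let key = h.finish();
        if let Some(&n) = self.cache.get(&key) {
            self.hits += 1;
            n
        } else {
            self.misses += 1;
            let n = count_remote(body);
            self.cache.insert(key, n);
            n
        }
    }
}
```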

provider()

ProviderInfo {
  name: "anthropic",
  model_id: self.model_id,
  context_window: model_context_window(self.model_id),
  // claude-sonnet-4-6: 200_000
  // claude-opus-4-6:   200_000
  // claude-haiku-4-5:  200_000
}
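A sketch of the lookup behind `provider()`, following the window sizes in the comments above. The fallback for unrecognized model IDs is an assumption, not specified behavior.

```rust
#[derive(Debug, PartialEq)]
struct ProviderInfo {
    name: &'static str,
    model_id: String,
    context_window: u32,
}

// All models listed above share a 200k window; the lookup exists so future
// models with different windows slot in without touching call sites.
fn model_context_window(model_id: &str) -> u32 {
    match model_id {
        "claude-sonnet-4-6" | "claude-opus-4-6" | "claude-haiku-4-5" => 200_000,
        _ => 200_000, // assumed conservative fallback for unknown ids
    }
}

fn provider(model_id: &str) -> ProviderInfo {
    ProviderInfo {
        name: "anthropic",
        model_id: model_id.to_string(),
        context_window: model_context_window(model_id),
    }
}
```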

AnthropicCacheProvider

Implements CacheProvider for the Anthropic prefix caching API.

AnthropicCacheProvider {
  max_cache_anchors: u32,   // default 4
}

annotate(context)

  • Insert cache_control: { type: "ephemeral" } on the last block at each cache breakpoint:
    • After the last Static chunk in the system prompt (Block 1 breakpoint)
    • After the last PerSession segment in the system prompt (Block 2 breakpoint)
    • After the last tool result in message history
    • Fourth anchor available for large tool schemas if needed
  • Anthropic API format: cache control goes on individual content blocks, not on the system string
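The anchor-placement rule above can be sketched by reducing content blocks to a kind tag and marking the last block of each targeted kind, stopping at the anchor budget. All types here are illustrative stand-ins for the real context structures, and the fourth (tool-schema) anchor is left out of the target list since it is conditional.

```rust
#[derive(Debug, PartialEq)]
enum BlockKind {
    StaticSystem,     // Block 1: static system prompt chunks
    PerSessionSystem, // Block 2: per-session system prompt segments
    ToolResult,       // Block 3: message history up to last tool result
    Other,
}

struct Block {
    kind: BlockKind,
    cache_control: bool, // true ⇒ cache_control: { type: "ephemeral" }
}

// Marks the last block of each targeted kind, up to max_cache_anchors.
fn annotate(blocks: &mut [Block], max_cache_anchors: usize) {
    let targets = [BlockKind::StaticSystem, BlockKind::PerSessionSystem, BlockKind::ToolResult];
    let mut used = 0;
    for kind in targets {
        if used >= max_cache_anchors {
            break;
        }
        // Find the LAST block of this kind; that is the cache breakpoint.
        if let Some(b) = blocks.iter_mut().rev().find(|b| b.kind == kind) {
            b.cache_control = true;
            used += 1;
        }
    }
}
```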

parse_cache_stats(response)

  • Extract from response usage:
    • cache_read_input_tokens → CacheStats.cache_read_tokens
    • cache_creation_input_tokens → CacheStats.cache_write_tokens
  • Compute costs using Anthropic's published cache pricing for the model
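A sketch of the field extraction above. A real implementation would deserialize the response usage object with a proper JSON library (serde_json in Rust); the naive key scan here only exists to keep the example dependency-free, and `CacheStats` mirrors the trait-level type by assumption.

```rust
#[derive(Debug, PartialEq)]
struct CacheStats {
    cache_read_tokens: u64,
    cache_write_tokens: u64,
}

// Naive extraction of the integer value following `key` in a JSON string.
// Returns 0 when the key is absent (the API omits fields when caching is off).
fn usage_field(json: &str, key: &str) -> u64 {
    json.find(key)
        .map(|i| {
            let rest = &json[i + key.len()..];
            let digits: String = rest
                .chars()
                .skip_while(|c| !c.is_ascii_digit())
                .take_while(|c| c.is_ascii_digit())
                .collect();
            digits.parse().unwrap_or(0)
        })
        .unwrap_or(0)
}

fn parse_cache_stats(usage_json: &str) -> CacheStats {
    CacheStats {
        cache_read_tokens: usage_field(usage_json, "cache_read_input_tokens"),
        cache_write_tokens: usage_field(usage_json, "cache_creation_input_tokens"),
    }
}
```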

supports_caching() → true

provider_name() → "anthropic"


Implementation Notes

  • All four languages must implement both AnthropicModelInterface and AnthropicCacheProvider
  • HTTP client: use the language's standard async HTTP library (reqwest in Rust, fetch in TypeScript, httpx in Python, net/http in Go)
  • API key must never appear in logs or traces — redact in ObservabilityProvider spans
  • The AnthropicModelInterface is not a mock — integration tests that use it make real API calls and are tagged accordingly (Level 3 tests per Decision: Testing Strategy #20, excluded from default CI)
  • Retry logic lives inside the implementation, not in the harness loop — the caller never sees RateLimited from a transient 429
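For the redaction note above, one common convention is to keep a short recognizable prefix and mask the remainder before the key ever reaches a span attribute. The prefix length and mask shape here are illustrative choices, and the sketch assumes ASCII keys.

```rust
// Redacts an API key for ObservabilityProvider spans: keep an 8-char prefix
// for debuggability, mask the rest. Keys at or under 8 chars are fully masked.
// Assumes ASCII keys (byte slicing would panic mid-codepoint otherwise).
fn redact_api_key(key: &str) -> String {
    if key.len() <= 8 {
        "***".to_string()
    } else {
        format!("{}…***", &key[..8])
    }
}
```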

Test Structure

Unit tests (no real API calls):

  • Request serialization: verify ModelRequest translates to correct Anthropic JSON body
  • Response deserialization: verify Anthropic JSON response translates to correct ModelResponse
  • Error mapping: verify each HTTP error code maps to the correct ModelError variant
  • Cache annotation: verify AnthropicCacheProvider.annotate() inserts markers in the right positions
  • Cache stats parsing: verify parse_cache_stats() extracts correct values from mock response JSON

Integration tests (real API, Level 3, tagged #[ignore] / skipped by default):

  • One real call() — verifies the response shape and token usage fields
  • One real call_streaming() — verifies all SSE event types are handled
  • One real count_tokens() — verifies it returns a non-zero count

Checklist

  • Rust: AnthropicModelInterface + AnthropicCacheProvider
  • TypeScript: AnthropicModelInterface + AnthropicCacheProvider
  • Python: AnthropicModelInterface + AnthropicCacheProvider
  • Go: AnthropicModelInterface + AnthropicCacheProvider
  • All unit tests pass in all four languages
  • Integration tests tagged and passing against real API
  • count_tokens heuristic replaced in ReplayModelInterface
  • Fixture fixtures/model_responses/model_interface/basic_text.jsonl regenerated by recording against real API
