Implement AnthropicModelInterface and AnthropicCacheProvider #39

@sbeardsley

Description

Overview

Implement AnthropicModelInterface — the real provider implementation of the ModelInterface trait that makes actual Anthropic API calls. Also implement AnthropicCacheProvider as a natural companion: both are Anthropic-specific and share the same API client.

This is a leaf-node implementation — it only depends on the ModelInterface and CacheProvider traits being defined (#1, #25).

When to Build

Implement before ContextManager (#7). Once compaction decisions, cache block hashing, and budget tracking all depend on token counts, counting stops being an acceptable approximation and becomes a correctness issue. The bytes / 4 heuristic in ReplayModelInterface must be replaced with the real Anthropic token counting API before #7 is implemented.


AnthropicModelInterface

Constructor

AnthropicModelInterface {
  api_key: String,
  model_id: String,        // "claude-sonnet-4-6", "claude-opus-4-6", etc.
  base_url: Option<String>, // default: "https://api.anthropic.com"
  timeout: Duration,        // default: 120s
  max_retries: u32,         // default: 3 (exponential backoff for 429, 529)
  http_client: HttpClient,  // injected for testability
}

// Constructors
AnthropicModelInterface::new(api_key, model_id) -> Self
AnthropicModelInterface::from_env(env_var, model_id) -> Result<Self>
  // reads API key from env_var (e.g. "ANTHROPIC_API_KEY")
  // returns error if env var not set
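The constructors above can be sketched as follows. This is a minimal illustration, not the real implementation: the error type is a stand-in, and the injected `http_client` field is omitted to keep the example dependency-free.

```rust
use std::env;
use std::time::Duration;

// Illustrative subset of the struct above; `http_client` omitted for brevity.
struct AnthropicModelInterface {
    api_key: String,
    model_id: String,
    base_url: Option<String>,
    timeout: Duration,
    max_retries: u32,
}

impl AnthropicModelInterface {
    fn new(api_key: String, model_id: String) -> Self {
        Self {
            api_key,
            model_id,
            base_url: None, // defaults to https://api.anthropic.com
            timeout: Duration::from_secs(120),
            max_retries: 3,
        }
    }

    // Reads the API key from the named env var; errors if it is unset.
    // `Result<Self, String>` is a placeholder for the crate's real error type.
    fn from_env(env_var: &str, model_id: String) -> Result<Self, String> {
        let api_key = env::var(env_var)
            .map_err(|_| format!("environment variable {env_var} is not set"))?;
        Ok(Self::new(api_key, model_id))
    }
}
```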

ModelInterface implementation

call(request)

  • Translate ModelRequest → Anthropic Messages API JSON body
  • POST to https://api.anthropic.com/v1/messages
  • Translate Anthropic response → ModelResponse
  • Extract TokenUsage from response usage field
  • Handle provider errors as typed ModelError variants:
    • 429 → ModelError::RateLimited { retry_after }
    • 529 → ModelError::RateLimited { retry_after: None }
    • 408/504 → ModelError::Timeout
    • 4xx → ModelError::ProviderError { code, message }
  • Retry on RateLimited and Timeout with exponential backoff (handled internally, caller never sees these)
  • ContextLimitExceeded checked before the API call using count_tokens
  • BudgetExceeded checked before the API call using request.params.max_tokens
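The error mapping and backoff policy above can be sketched as pure functions, which is also how the unit tests exercise them without a network. `ModelError` here is a stand-in for the trait-level error type from #1, and the backoff base of 500 ms is an illustrative choice, not a specified value.

```rust
use std::time::Duration;

#[derive(Debug, PartialEq)]
enum ModelError {
    RateLimited { retry_after: Option<Duration> },
    Timeout,
    ProviderError { code: u16, message: String },
}

// Maps an HTTP status (plus any Retry-After hint) to a typed ModelError,
// per the table above.
fn map_http_error(status: u16, retry_after: Option<Duration>, body: &str) -> ModelError {
    match status {
        429 => ModelError::RateLimited { retry_after },
        529 => ModelError::RateLimited { retry_after: None }, // "overloaded"
        408 | 504 => ModelError::Timeout,
        code => ModelError::ProviderError { code, message: body.to_string() },
    }
}

// Exponential backoff delay for retry attempt N (0-based): 500ms, 1s, 2s, ...
fn backoff_delay(attempt: u32) -> Duration {
    Duration::from_millis(500 * (1u64 << attempt))
}
```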

call_streaming(request, on_token)

  • POST with stream: true
  • Parse Anthropic SSE event stream:
    • message_start → extract input token count
    • content_block_start { type: "text" } → begin text block
    • content_block_start { type: "thinking" } → begin thinking block
    • content_block_start { type: "tool_use" } → begin tool use block
    • content_block_delta { type: "text_delta" } → fire StreamEvent::TextDelta
    • content_block_delta { type: "thinking_delta" } → fire StreamEvent::ThinkingDelta
    • content_block_delta { type: "input_json_delta" } → fire StreamEvent::ToolCallDelta
    • content_block_stop → close current block
    • message_delta { stop_reason, usage } → extract output tokens
    • message_stop → fire StreamEvent::Done, return complete ModelResponse
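The event routing above can be sketched as a single match on the SSE event name plus (for deltas) the delta type. This is only the dispatch skeleton: a real implementation would also parse each event's JSON `data:` payload (e.g. with serde_json) and accumulate blocks into the final ModelResponse. `SseRoute` is a hypothetical name for illustration.

```rust
#[derive(Debug, PartialEq)]
enum SseRoute {
    TextDelta,       // → StreamEvent::TextDelta
    ThinkingDelta,   // → StreamEvent::ThinkingDelta
    ToolCallDelta,   // → StreamEvent::ToolCallDelta
    BlockBoundary,   // content_block_start / content_block_stop
    UsageUpdate,     // message_start (input tokens) / message_delta (output tokens)
    Done,            // message_stop → StreamEvent::Done
    Unknown,
}

// Routes one SSE event to the handler category described above.
fn route_event(event_name: &str, delta_type: Option<&str>) -> SseRoute {
    match (event_name, delta_type) {
        ("content_block_delta", Some("text_delta")) => SseRoute::TextDelta,
        ("content_block_delta", Some("thinking_delta")) => SseRoute::ThinkingDelta,
        ("content_block_delta", Some("input_json_delta")) => SseRoute::ToolCallDelta,
        ("content_block_start", _) | ("content_block_stop", _) => SseRoute::BlockBoundary,
        ("message_start", _) | ("message_delta", _) => SseRoute::UsageUpdate,
        ("message_stop", _) => SseRoute::Done,
        _ => SseRoute::Unknown,
    }
}
```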

count_tokens(request)

  • POST to https://api.anthropic.com/v1/messages/count_tokens
  • Returns exact token count from Anthropic's tokenizer
  • Replaces the bytes / 4 heuristic used in ReplayModelInterface
  • Note: this endpoint has its own rate limits — cache results where possible
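One way to honor the caching note above is a memoizing wrapper keyed on a hash of the serialized request, so repeated counts of an unchanged prefix never hit the endpoint. The types and the string-body representation are assumptions for illustration; a real implementation would key on a stable hash of the ModelRequest.

```rust
use std::collections::HashMap;
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

struct TokenCountCache {
    cache: HashMap<u64, u32>,
    hits: u32,
    misses: u32,
}

impl TokenCountCache {
    fn new() -> Self {
        Self { cache: HashMap::new(), hits: 0, misses: 0 }
    }

    // Returns the cached count for this request body, calling `count_remote`
    // (the real count_tokens endpoint) only on a cache miss.
    fn get_or_count(&mut self, body: &str, count_remote: impl FnOnce(&str) -> u32) -> u32 {
        let mut h = DefaultHasher::new();
        body.hash(&mut h);
        let key = h.finish();
        if let Some(&n) = self.cache.get(&key) {
            self.hits += 1;
            n
        } else {
            self.misses += 1;
            let n = count_remote(body);
            self.cache.insert(key, n);
            n
        }
    }
}
```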

provider()

ProviderInfo {
  name: "anthropic",
  model_id: self.model_id,
  context_window: model_context_window(self.model_id),
  // claude-sonnet-4-6: 200_000
  // claude-opus-4-6:   200_000
  // claude-haiku-4-5:  200_000
}
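A sketch of the lookup behind `provider()`, following the window sizes in the comments above. The fallback for unrecognized model IDs is an assumption, not specified behavior.

```rust
#[derive(Debug, PartialEq)]
struct ProviderInfo {
    name: &'static str,
    model_id: String,
    context_window: u32,
}

// All models listed above share a 200k window; the lookup exists so future
// models with different windows slot in without touching call sites.
fn model_context_window(model_id: &str) -> u32 {
    match model_id {
        "claude-sonnet-4-6" | "claude-opus-4-6" | "claude-haiku-4-5" => 200_000,
        _ => 200_000, // assumed conservative fallback for unknown ids
    }
}

fn provider(model_id: &str) -> ProviderInfo {
    ProviderInfo {
        name: "anthropic",
        model_id: model_id.to_string(),
        context_window: model_context_window(model_id),
    }
}
```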

AnthropicCacheProvider

Implements CacheProvider for the Anthropic prefix caching API.

AnthropicCacheProvider {
  max_cache_anchors: u32,   // default 4
}

annotate(context)

  • Insert cache_control: { type: "ephemeral" } on the last block at each cache breakpoint:
    • After the last Static chunk in the system prompt (Block 1 breakpoint)
    • After the last PerSession segment in the system prompt (Block 2 breakpoint)
    • After the last tool result in message history
    • Fourth anchor available for large tool schemas if needed
  • Anthropic API format: cache control goes on individual content blocks, not on the system string
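The anchor-placement rule above can be sketched by reducing content blocks to a kind tag and marking the last block of each targeted kind, stopping at the anchor budget. All types here are illustrative stand-ins for the real context structures, and the fourth (tool-schema) anchor is left out of the target list since it is conditional.

```rust
#[derive(Debug, PartialEq)]
enum BlockKind {
    StaticSystem,     // Block 1: static system prompt chunks
    PerSessionSystem, // Block 2: per-session system prompt segments
    ToolResult,       // Block 3: message history up to last tool result
    Other,
}

struct Block {
    kind: BlockKind,
    cache_control: bool, // true ⇒ cache_control: { type: "ephemeral" }
}

// Marks the last block of each targeted kind, up to max_cache_anchors.
fn annotate(blocks: &mut [Block], max_cache_anchors: usize) {
    let targets = [BlockKind::StaticSystem, BlockKind::PerSessionSystem, BlockKind::ToolResult];
    let mut used = 0;
    for kind in targets {
        if used >= max_cache_anchors {
            break;
        }
        // Find the LAST block of this kind; that is the cache breakpoint.
        if let Some(b) = blocks.iter_mut().rev().find(|b| b.kind == kind) {
            b.cache_control = true;
            used += 1;
        }
    }
}
```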

parse_cache_stats(response)

  • Extract from response usage:
    • cache_read_input_tokens → CacheStats.cache_read_tokens
    • cache_creation_input_tokens → CacheStats.cache_write_tokens
  • Compute costs using Anthropic's published cache pricing for the model
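A sketch of the field extraction above. A real implementation would deserialize the response usage object with a proper JSON library (serde_json in Rust); the naive key scan here only exists to keep the example dependency-free, and `CacheStats` mirrors the trait-level type by assumption.

```rust
#[derive(Debug, PartialEq)]
struct CacheStats {
    cache_read_tokens: u64,
    cache_write_tokens: u64,
}

// Naive extraction of the integer value following `key` in a JSON string.
// Returns 0 when the key is absent (the API omits fields when caching is off).
fn usage_field(json: &str, key: &str) -> u64 {
    json.find(key)
        .map(|i| {
            let rest = &json[i + key.len()..];
            let digits: String = rest
                .chars()
                .skip_while(|c| !c.is_ascii_digit())
                .take_while(|c| c.is_ascii_digit())
                .collect();
            digits.parse().unwrap_or(0)
        })
        .unwrap_or(0)
}

fn parse_cache_stats(usage_json: &str) -> CacheStats {
    CacheStats {
        cache_read_tokens: usage_field(usage_json, "cache_read_input_tokens"),
        cache_write_tokens: usage_field(usage_json, "cache_creation_input_tokens"),
    }
}
```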

supports_caching() → true

provider_name() → "anthropic"


Implementation Notes

  • All four languages must implement both AnthropicModelInterface and AnthropicCacheProvider
  • HTTP client: use the language's standard async HTTP library (reqwest in Rust, fetch in TypeScript, httpx in Python, net/http in Go)
  • API key must never appear in logs or traces — redact in ObservabilityProvider spans
  • The AnthropicModelInterface is not a mock — integration tests that use it make real API calls and are tagged accordingly (Level 3 tests per Decision: Testing Strategy #20, excluded from default CI)
  • Retry logic lives inside the implementation, not in the harness loop — the caller never sees RateLimited from a transient 429
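For the redaction note above, one common convention is to keep a short recognizable prefix and mask the remainder before the key ever reaches a span attribute. The prefix length and mask shape here are illustrative choices, and the sketch assumes ASCII keys.

```rust
// Redacts an API key for ObservabilityProvider spans: keep an 8-char prefix
// for debuggability, mask the rest. Keys at or under 8 chars are fully masked.
// Assumes ASCII keys (byte slicing would panic mid-codepoint otherwise).
fn redact_api_key(key: &str) -> String {
    if key.len() <= 8 {
        "***".to_string()
    } else {
        format!("{}…***", &key[..8])
    }
}
```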

Test Structure

Unit tests (no real API calls):

  • Request serialization: verify ModelRequest translates to correct Anthropic JSON body
  • Response deserialization: verify Anthropic JSON response translates to correct ModelResponse
  • Error mapping: verify each HTTP error code maps to the correct ModelError variant
  • Cache annotation: verify AnthropicCacheProvider.annotate() inserts markers in the right positions
  • Cache stats parsing: verify parse_cache_stats() extracts correct values from mock response JSON

Integration tests (real API, Level 3, tagged #[ignore] / skipped by default):

  • One real call() — verifies the response shape and token usage fields
  • One real call_streaming() — verifies all SSE event types are handled
  • One real count_tokens() — verifies it returns a non-zero count

Checklist

  • Rust: AnthropicModelInterface + AnthropicCacheProvider
  • TypeScript: AnthropicModelInterface + AnthropicCacheProvider
  • Python: AnthropicModelInterface + AnthropicCacheProvider
  • Go: AnthropicModelInterface + AnthropicCacheProvider
  • All unit tests pass in all four languages
  • Integration tests tagged and passing against real API
  • count_tokens heuristic replaced in ReplayModelInterface
  • Fixture fixtures/model_responses/model_interface/basic_text.jsonl regenerated by recording against real API
