Skip to content

v0.5.0

Choose a tag to compare

@skundu42 skundu42 released this 10 Jun 09:25
· 1 commit to main since this release
  • Native SSE streaming for Anthropic & OpenAI — the single-shot fallback is
    gone. Text/thinking deltas stream as partial chunks, tool-call arguments are
    reassembled across fragments, and the final chunk carries the stop reason and
    usage. All three providers now share one chunk contract.
  • Multimodal input for Anthropic & OpenAI — inline images map to base64
    image blocks / data: URI image_url parts, inline PDFs to Anthropic
    document blocks, https:// file references to URL sources. Unsupported
    parts are dropped with a warning — never silently.
  • Anthropic prompt caching — the same ContextCacheConfig that drives
    Gemini's explicit cache becomes a cache_control breakpoint on the system
    block (or last tool); ttl_seconds >= 3600 selects the 1-hour tier. Cache
    activity surfaces via cache_metadata and cached_content_token_count.
  • Reasoning-model fixmax_output_tokens is sent as
    max_completion_tokens for o-series / gpt-5 models (which reject the
    deprecated max_tokens), keeping older OpenAI-compatible servers on the
    legacy field.
  • examples/compat_check — live wire-compatibility smoke test against the
    real Anthropic and OpenAI APIs (generation, streaming, tools, structured
    output, images, caching). Run it with your keys; exits non-zero on failure.

Full Changelog: v0.4.0...v0.5.0