v0.7.18

@sherifkozman sherifkozman released this 10 Jun 00:38
c195527

LLM Council 0.7.18 Release Notes

Highlights

  • Added provider-specific prompt-cache support without pretending every provider
    uses the same cache model.
  • Added Gemini API and Vertex Gemini cached-content lifecycle handling:
    create, TTL refresh, expiry recreation, best-effort cleanup, billing warnings,
    and response metadata.
  • Added OpenRouter route-gated prompt-cache passthrough for eligible Anthropic
    routes.
  • Added OpenAI cache telemetry parsing without sending unsupported request
    controls.
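The cached-content lifecycle named above (create, TTL refresh, expiry recreation, best-effort cleanup) can be sketched roughly as follows. This is a minimal illustration, not Council's implementation: `InMemoryCacheBackend`, `CachedContentLifecycle`, `refresh_margin`, and the injected `clock` are all hypothetical names, and the backend is an in-memory stand-in rather than a real Gemini or Vertex client.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class CacheEntry:
    name: str          # opaque resource handle (demo value, not a real resource)
    expires_at: float  # absolute expiry time in seconds


class InMemoryCacheBackend:
    """Hypothetical stand-in for a provider's cached-content service."""

    def __init__(self) -> None:
        self.entries: Dict[str, CacheEntry] = {}
        self._next_id = 0

    def create(self, source_text: str, ttl: float, now: float) -> CacheEntry:
        # Drop entries whose TTL has passed, as a real service would.
        self.entries = {n: e for n, e in self.entries.items() if e.expires_at > now}
        self._next_id += 1
        entry = CacheEntry(f"cachedContents/demo-{self._next_id}", now + ttl)
        self.entries[entry.name] = entry
        return entry

    def refresh(self, name: str, ttl: float, now: float) -> float:
        self.entries[name].expires_at = now + ttl
        return self.entries[name].expires_at

    def delete(self, name: str) -> None:
        del self.entries[name]  # raises KeyError if the resource is gone


class CachedContentLifecycle:
    """Tracks one cached resource for one explicit, stable source_text."""

    def __init__(self, backend: InMemoryCacheBackend, source_text: str,
                 ttl: float, refresh_margin: float,
                 clock: Callable[[], float]) -> None:
        if not source_text:
            raise ValueError("cached content requires explicit source_text")
        self.backend, self.source_text = backend, source_text
        self.ttl, self.refresh_margin, self.clock = ttl, refresh_margin, clock
        self.entry: Optional[CacheEntry] = None
        self.events: List[str] = []  # lifecycle metadata for the response

    def ensure(self) -> str:
        """Return a usable resource name, creating or refreshing as needed."""
        now = self.clock()
        if self.entry is None or now >= self.entry.expires_at:
            self.events.append("create" if self.entry is None else "recreate")
            self.entry = self.backend.create(self.source_text, self.ttl, now)
        elif self.entry.expires_at - now <= self.refresh_margin:
            self.entry.expires_at = self.backend.refresh(self.entry.name, self.ttl, now)
            self.events.append("refresh")
        return self.entry.name

    def cleanup(self) -> bool:
        """Best-effort delete: a failure is recorded, never raised."""
        if self.entry is None:
            return True
        try:
            self.backend.delete(self.entry.name)
            ok = True
        except Exception:
            ok = False  # the resource may keep accruing storage cost until TTL expiry
        self.events.append("cleanup" if ok else "cleanup_failed")
        self.entry = None
        return ok
```

Injecting the clock keeps the expiry logic testable without sleeping; a real integration would pass `time.time` and swap the backend for the provider SDK.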

Live Proof

  • OpenRouter live proof returned cache_read_tokens=8116 on the second
    identical call through an Anthropic route.
  • Gemini API live proof created cachedContents/..., returned
    cache_read_tokens=12603, and deleted the resource successfully.
  • Vertex Gemini live proof created projects/.../cachedContents/..., returned
    cache_read_tokens=14002, and deleted the resource successfully.
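Each provider reports cache hits in a differently shaped usage payload, so a single cache_read_tokens figure implies some normalization step. The sketch below shows one way that could look. The function name and provider labels are hypothetical, the field names reflect the providers' public usage schemas as best understood here, and Council's actual parser may differ.

```python
from typing import Any, Mapping


def cache_read_tokens(provider: str, usage: Mapping[str, Any]) -> int:
    """Normalize a provider usage payload to one cache-read token count.

    Returns 0 when the payload carries no cache information.
    """
    if provider in ("openai", "openrouter"):
        # OpenAI-style usage nests the count under prompt_tokens_details.
        details = usage.get("prompt_tokens_details") or {}
        return int(details.get("cached_tokens") or 0)
    if provider == "anthropic":
        return int(usage.get("cache_read_input_tokens") or 0)
    if provider in ("gemini", "vertex"):
        # Accept both the SDK-style and REST-style spellings (assumption).
        value = usage.get("cached_content_token_count")
        if value is None:
            value = usage.get("cachedContentTokenCount")
        return int(value or 0)
    raise ValueError(f"unknown provider: {provider}")


# Illustrative payloads only; the numbers are made up for the example.
print(cache_read_tokens("openai", {"prompt_tokens_details": {"cached_tokens": 1024}}))
print(cache_read_tokens("gemini", {"cached_content_token_count": 2048}))
```

Treating a missing field as 0 keeps a cache miss and an absent telemetry field indistinguishable; a stricter variant could return None for the latter.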

Live proof currently covers only the non-streaming generation paths. Streaming
cached-content lifecycle behavior is covered by regression tests and documented
with a stricter safety contract: a request is retried after cache expiry only if
no chunk has been yielded yet, and final cleanup metadata is emitted when a
stream completes successfully.
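The streaming contract described above can be sketched as a generator wrapper. This is an illustration under stated assumptions rather than Council's code: `CacheExpiredError`, `stream_with_expiry_retry`, and the three callables are hypothetical names, and the event dictionaries are an invented shape.

```python
from typing import Any, Callable, Dict, Iterator


class CacheExpiredError(Exception):
    """Hypothetical error raised when the cached resource has expired."""


def stream_with_expiry_retry(
    open_stream: Callable[[], Iterator[str]],
    recreate_cache: Callable[[], None],
    cleanup: Callable[[], bool],
) -> Iterator[Dict[str, Any]]:
    yielded = False  # becomes True once any chunk has reached the caller
    retried = False  # allow at most one recreation attempt
    while True:
        try:
            for text in open_stream():
                yielded = True
                yield {"type": "chunk", "text": text}
            break  # stream finished normally
        except CacheExpiredError:
            # Retrying after output was emitted would duplicate or splice
            # content, so expiry is only recoverable before the first chunk.
            if yielded or retried:
                raise
            retried = True
            recreate_cache()
    # Reached only on success: report the best-effort cleanup result.
    yield {"type": "final", "cache_deleted": cleanup()}
```

Because the final event is produced after the loop, a stream that fails mid-way raises to the caller without emitting cleanup metadata.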

Upgrade Notes

  • Install all provider SDKs with:
        pip install 'the-llm-council[all]>=0.7.18'
  • Gemini/Vertex cached-content creation requires explicit stable source_text;
    Council does not infer cacheable source content from dynamic prompts.
  • Created or refreshed cached-content resources can incur storage-duration
    billing until TTL expiry or successful cleanup.

Validation

  • Focused cached-content lifecycle tests: 28 passed.
  • Broader prompt-cache suite: 157 passed.
  • Full suite: 431 passed.
  • Final council review: approved with no blocking issues.