LLM Council 0.7.18 Release Notes
Highlights
- Added provider-specific prompt-cache support without pretending every provider
  uses the same cache model.
- Added Gemini API and Vertex Gemini cached-content lifecycle handling:
  create, TTL refresh, expiry recreation, best-effort cleanup, billing warnings,
  and response metadata.
- Added OpenRouter route-gated prompt-cache passthrough for eligible Anthropic
  routes.
- Added OpenAI cache telemetry parsing without sending unsupported request
  controls.
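The cached-content lifecycle listed above (create, TTL refresh, expiry recreation) can be pictured as a small decision function. The following is a minimal sketch, not Council's actual implementation: `CacheAction`, `CachedContentEntry`, `plan_cache_action`, and the `refresh_window_s` threshold are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class CacheAction(Enum):
    """Hypothetical lifecycle actions; names are illustrative only."""
    CREATE = "create"
    REUSE = "reuse"
    REFRESH_TTL = "refresh_ttl"
    RECREATE = "recreate"


@dataclass
class CachedContentEntry:
    """Hypothetical local record of a provider cached-content resource."""
    name: str          # provider resource name
    expires_at: float  # expiry as epoch seconds


def plan_cache_action(
    entry: Optional[CachedContentEntry],
    now: float,
    refresh_window_s: float = 60.0,  # assumed threshold, not a Council default
) -> CacheAction:
    """Decide what to do with a cached-content resource before a request."""
    if entry is None:
        return CacheAction.CREATE        # nothing cached yet
    if now >= entry.expires_at:
        return CacheAction.RECREATE      # resource already expired
    if entry.expires_at - now <= refresh_window_s:
        return CacheAction.REFRESH_TTL   # close to expiry: extend the TTL
    return CacheAction.REUSE             # still comfortably valid
```

Keeping the decision separate from the provider calls makes each branch easy to cover with unit tests, without touching a billable resource.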
Live Proof
- OpenRouter live proof returned cache_read_tokens=8116 on the second
  identical call through an Anthropic route.
- Gemini API live proof created cachedContents/..., returned
  cache_read_tokens=12603, and deleted the resource successfully.
- Vertex Gemini live proof created projects/.../cachedContents/..., returned
  cache_read_tokens=14002, and deleted the resource successfully.
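Each provider reports cache hits under a different usage field, so a single cache_read_tokens figure implies some normalization step. The sketch below shows one way that could look. The function name `extract_cache_read_tokens` is hypothetical, and the field names reflect the providers' public usage payloads as commonly documented (Anthropic `cache_read_input_tokens`, OpenAI-style `prompt_tokens_details.cached_tokens`, Gemini `cachedContentTokenCount` or `cached_content_token_count`). Treat them as assumptions to verify against current provider docs, not as Council's actual parser.

```python
from typing import Any, Mapping


def extract_cache_read_tokens(usage: Mapping[str, Any]) -> int:
    """Map a provider usage payload to one cache-read token count.

    Returns 0 when no recognized cache field is present.
    """
    # Anthropic-style usage block.
    if usage.get("cache_read_input_tokens") is not None:
        return int(usage["cache_read_input_tokens"])

    # OpenAI-style usage block (nested details object).
    details = usage.get("prompt_tokens_details") or {}
    if details.get("cached_tokens") is not None:
        return int(details["cached_tokens"])

    # Gemini-style usage metadata (REST camelCase or SDK snake_case).
    for key in ("cachedContentTokenCount", "cached_content_token_count"):
        if usage.get(key) is not None:
            return int(usage[key])

    return 0
```

Returning 0 rather than raising keeps the telemetry read-only: a provider that omits cache fields still yields a valid response.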
Live proof currently covers non-streaming generation paths. Streaming
cached-content lifecycle behavior is covered by regression tests and documented
with a stricter safety contract: retry expiry only before any chunk is yielded,
then emit final cleanup metadata on successful streams.
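The streaming safety contract described above can be sketched as a generator wrapper. This is an illustration under stated assumptions, not Council's code: `CacheExpiredError`, `stream_with_expiry_retry`, and the `open_stream`, `recreate_cache`, and `cleanup` callables are hypothetical stand-ins for provider-specific logic.

```python
from typing import Callable, Iterator


class CacheExpiredError(Exception):
    """Hypothetical error raised when the cached-content resource has expired."""


def stream_with_expiry_retry(
    open_stream: Callable[[], Iterator[str]],
    recreate_cache: Callable[[], None],
    cleanup: Callable[[], bool] = lambda: True,
    max_retries: int = 1,
) -> Iterator[dict]:
    """Retry on cache expiry only if no chunk has been yielded yet."""
    attempts = 0
    while True:
        yielded_any = False
        try:
            for chunk in open_stream():
                yielded_any = True
                yield {"type": "chunk", "text": chunk}
        except CacheExpiredError:
            # Once a chunk has reached the caller, a retry would duplicate
            # output, so the error is surfaced instead.
            if yielded_any or attempts >= max_retries:
                raise
            attempts += 1
            recreate_cache()
            continue

        # Successful stream: best-effort cleanup, then final metadata.
        try:
            cleaned = bool(cleanup())
        except Exception:
            cleaned = False
        yield {"type": "final", "cache_retries": attempts, "cleanup_ok": cleaned}
        return
```

The `yielded_any` flag is the whole contract: expiry before the first chunk is recoverable, expiry after it is not, and cleanup metadata is emitted only when the stream completes.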
Upgrade Notes
- Install all provider SDKs with:
  pip install 'the-llm-council[all]>=0.7.18'
- Gemini/Vertex cached-content creation requires explicit stable source_text;
  Council does not infer cacheable source content from dynamic prompts.
- Created or refreshed cached-content resources can incur storage-duration
  billing until TTL expiry or successful cleanup.
Validation
- Focused cached-content lifecycle tests: 28 passed.
- Broader prompt-cache suite: 157 passed.
- Full suite: 431 passed.
- Final council review: approved with no blocking issues.