Release v2.8.1 — prompt caching for the AiNative runtime · mabou7agar/laravel-ai-support

Makes repeated AiNative planner steps faster/cheaper by caching the stable instruction prefix.

Caching

Anthropic — the planner sends its stable instruction prefix as a cache_control: ephemeral system block, so Claude orchestration caches the ~5 KB rules block across a turn's steps (faster first token, cheaper input) instead of re-reading it every call. Gated by ai-engine.engines.anthropic.prompt_caching (default true). (#60)
OpenAI — already auto-caches the longest common prompt prefix (no cache_control needed); the single-message prompt is left byte-identical, so the default path is unchanged. (Live-proven: 1408/1455 input tokens served from cache on a repeat call.)

Observability (#61)

AIResponse->getUsage() now surfaces cache token counts:

cached_tokens — OpenAI prompt-prefix hit / Anthropic cache_read
cache_creation_tokens — Anthropic cache write
normalized for Gemini and the openai-family drivers (deepseek, perplexity, nvidia) too

Note

Why no tool-search subagent: tool selection is already handled by the keyword/semantic selectors + find_tools (the ToolSearch-equivalent); caching is the orthogonal speed lever, now wired.

Full suite green: 1924 tests.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

v2.8.1 — prompt caching for the AiNative runtime

Choose a tag to compare

Sorry, something went wrong.

Sorry, something went wrong.

Uh oh!

No results found

Caching

Observability (#61)

Note

Uh oh!