Makes repeated AiNative planner steps faster/cheaper by caching the stable instruction prefix.
Caching
- Anthropic — the planner sends its stable instruction prefix as a
cache_control: ephemeralsystem block, so Claude orchestration caches the ~5 KB rules block across a turn's steps (faster first token, cheaper input) instead of re-reading it every call. Gated byai-engine.engines.anthropic.prompt_caching(default true). (#60) - OpenAI — already auto-caches the longest common prompt prefix (no
cache_controlneeded); the single-message prompt is left byte-identical, so the default path is unchanged. (Live-proven: 1408/1455 input tokens served from cache on a repeat call.)
Observability (#61)
AIResponse->getUsage() now surfaces cache token counts:
cached_tokens— OpenAI prompt-prefix hit / Anthropiccache_readcache_creation_tokens— Anthropic cache write- normalized for Gemini and the openai-family drivers (deepseek, perplexity, nvidia) too
Note
Why no tool-search subagent: tool selection is already handled by the keyword/semantic selectors + find_tools (the ToolSearch-equivalent); caching is the orthogonal speed lever, now wired.
Full suite green: 1924 tests.