What's Fixed
This release fixes a silent, structural bug that blocked caching for 100% of Claude Code, Claude Desktop, and Codex CLI traffic since v3.6.0.
Critical fixes
- Cache bypassed for all tool-bearing clients — The proxy had a
has_toolsguard (if not has_tools and cache:) on every cache path. Claude Code and Codex CLI always send atoolsarray. Zero savings, regardless of how many identical prompts were sent. Guard removed. - SSE parser returned None for tool_use responses —
_parse_sse_to_jsondidn't handletool_usecontent blocks (onlytext). Even without the has_tools guard, the parser would reject every Claude Code response. Fixed to accumulateinput_json_deltaevents and store full{id, name, input}blocks. - Gemini SSE never triggered cache store —
_stream_and_cache_forwardused amessage_stop/[DONE]sentinel to fire the cache callback. Gemini SSE has neither — stream ends by connection close. Sentinel removed; completeness delegated to per-surface parser. - CompressRouter never instantiated — Was defined but never wired into
ProxyApp.startup(). Dead code since v3.6.0. Fixed. - Gemini surface was pure pass-through — No cache, no compression at all. Full cache + compress pipeline added for both Gemini native (
/v1beta/models/*) and Gemini OpenAI-compat (/v1beta/openai/chat/completions) surfaces.
E2E verification
Real Anthropic API through proxy, same prompt twice:
- Call 1: cache MISS → stored in
llmcache_entries(312 bytes) - Call 2: cache HIT → served from cache in <5ms, no Anthropic API call
- Live metrics after session:
hits=6 misses=11 tokens_saved_output=596
New
docs/proxy-setup.md— Complete wiring guide for all clients (Claude Code CLI, Claude Desktop, Cursor, Windsurf, Codex CLI, Gemini CLI, AGY/Antigravity, Python/Node SDKs, LangChain, LlamaIndex, raw curl,install.shMac/Linux,install.ps1Windows)
Install
pip install -U superlocalmemory
# or
npm install -g superlocalmemory
# or clone and run:
bash scripts/install.sh # Mac/Linux
.\scripts\install.ps1 # Windows PowerShellUpgrade path
slm restart # restart daemon to pick up 3.6.3Part of Qualixar — AI Reliability Engineering | @varunPbhardwaj