Skip to content

v3.6.3 — Cache + compression now work for Claude Code, Codex, Gemini

Latest

Choose a tag to compare

@varun369 varun369 released this 07 Jun 21:03
· 8 commits to main since this release
Immutable release. Only release title and notes can be modified.

What's Fixed

This release fixes a silent, structural bug that blocked caching for 100% of Claude Code, Claude Desktop, and Codex CLI traffic since v3.6.0.

Critical fixes

  • Cache bypassed for all tool-bearing clients — The proxy had a has_tools guard (if not has_tools and cache:) on every cache path. Claude Code and Codex CLI always send a tools array. Zero savings, regardless of how many identical prompts were sent. Guard removed.
  • SSE parser returned None for tool_use responses_parse_sse_to_json didn't handle tool_use content blocks (only text). Even without the has_tools guard, the parser would reject every Claude Code response. Fixed to accumulate input_json_delta events and store full {id, name, input} blocks.
  • Gemini SSE never triggered cache store_stream_and_cache_forward used a message_stop/[DONE] sentinel to fire the cache callback. Gemini SSE has neither — stream ends by connection close. Sentinel removed; completeness delegated to per-surface parser.
  • CompressRouter never instantiated — Was defined but never wired into ProxyApp.startup(). Dead code since v3.6.0. Fixed.
  • Gemini surface was pure pass-through — No cache, no compression at all. Full cache + compress pipeline added for both Gemini native (/v1beta/models/*) and Gemini OpenAI-compat (/v1beta/openai/chat/completions) surfaces.

E2E verification

Real Anthropic API through proxy, same prompt twice:

  • Call 1: cache MISS → stored in llmcache_entries (312 bytes)
  • Call 2: cache HIT → served from cache in <5ms, no Anthropic API call
  • Live metrics after session: hits=6 misses=11 tokens_saved_output=596

New

  • docs/proxy-setup.md — Complete wiring guide for all clients (Claude Code CLI, Claude Desktop, Cursor, Windsurf, Codex CLI, Gemini CLI, AGY/Antigravity, Python/Node SDKs, LangChain, LlamaIndex, raw curl, install.sh Mac/Linux, install.ps1 Windows)

Install

pip install -U superlocalmemory
# or
npm install -g superlocalmemory
# or clone and run:
bash scripts/install.sh       # Mac/Linux
.\scripts\install.ps1         # Windows PowerShell

Upgrade path

slm restart   # restart daemon to pick up 3.6.3

Part of Qualixar — AI Reliability Engineering | @varunPbhardwaj