First tagged release. This locks in two big shifts that landed this week.
🔒 Verified offline — not just claimed offline
Tadrianonet (@tadrianonet) confirmed via lsof that Claude Code 2.1 reaches out to api.anthropic.com on startup even with ANTHROPIC_BASE_URL set — telemetry, statsig feature flags, marketplace auto-install, and the autoupdater all bypass the configured base URL. That means setups following the old README were not actually offline.
Fixed in PR #32 by setting four env vars in the launchers:
CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC=1
DISABLE_AUTOUPDATER=1
CLAUDE_CODE_DISABLE_OFFICIAL_MARKETPLACE_AUTOINSTALL=1
CLAUDE_CODE_DISABLE_BACKGROUND_TASKS=1

After the patch, lsof on the claude process shows only localhost:4000. Validated end-to-end. This is the foundation the project needs to be honestly used in air-gapped legal/medical/compliance environments.
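As a rough illustration of what the launchers do, the four variables can be layered on top of the inherited environment before Claude Code starts. This is a minimal Python sketch, not the launcher code from PR #32: the helper name `offline_env` is hypothetical, and the proxy address `http://localhost:4000` is assumed from the lsof result above.

```python
import os

# The four variables named in PR #32. Values are strings because
# environment variables are always strings.
OFFLINE_FLAGS = {
    "CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC": "1",
    "DISABLE_AUTOUPDATER": "1",
    "CLAUDE_CODE_DISABLE_OFFICIAL_MARKETPLACE_AUTOINSTALL": "1",
    "CLAUDE_CODE_DISABLE_BACKGROUND_TASKS": "1",
}


def offline_env(base_url="http://localhost:4000", parent=None):
    """Return a copy of the parent environment with the offline flags set.

    `base_url` is an assumed local proxy address; `parent` defaults to
    the current process environment.
    """
    env = dict(os.environ if parent is None else parent)
    env["ANTHROPIC_BASE_URL"] = base_url
    env.update(OFFLINE_FLAGS)
    return env


if __name__ == "__main__":
    env = offline_env()
    for name in sorted(OFFLINE_FLAGS):
        print(f"{name}={env[name]}")
```

The resulting mapping could be passed as `env=` to `subprocess.run` when starting the CLI. Running something like `lsof -a -i -p <pid>` against the resulting process is one way to repeat the network check described above.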
💻 Mac base/Pro 16GB now works out of the box
The previous default (Gemma 4 31B, 18GB weights) crashed on 16GB unified memory with kIOGPUCommandBufferCallbackErrorOutOfMemory. New default for the 16GB tier is Qwen 2.5 Coder 14B 4-bit MLX (~7.8GB weights), with new Claude Chat.command and Claude Agentico.command launchers that:
- automate the macOS keychain auth workaround (hasCompletedOnboarding=true, ANTHROPIC_AUTH_TOKEN, DISABLE_LOGIN_COMMAND=1)
- force --effort low so Claude Code 2.1's extended-thinking flow doesn't exhaust small models on the first call
- enable 4-bit KV cache from token 0 so the ~5860-token Claude Code system prompt fits in 16GB
Full PT-BR setup guide at docs/MAC-BASE-SETUP.md.
🔧 Other fixes
- ChatML / Llama 3.x stop markers: clean_response now strips <|im_end|>, <|endoftext|>, <|im_start|>, <|end_of_text|>, <|eot_id|> (previously only Gemma 4's <turn|> markers were handled, so any non-Gemma model leaked special tokens into output)
- Qwen 2.5 <tools> parser (Format 3.5): Qwen 2.5 fine-tunes emit tool calls inside <tools>...</tools> instead of <tool_call>. Without this, tool calls were returned as plain text and Claude Code never executed them.
- Markdown-fenced JSON tool calls (Format 3.6): handles json-fenced code blocks containing a single JSON object or several back-to-back objects
- Anthropic SSE streaming: the proxy now responds with text/event-stream when the client requests stream: true. Claude Code 2.1 silently discards non-streaming responses for any tool-using request, so this was a hidden cause of "(No output)" failures.
- Input-key filtering against input_schema: Claude Code 2.1 silently rejects tool_use blocks with extra fields, which Qwen 2.5 likes to add. These are now stripped automatically.
- Empty text block bug: fixed an issue where an empty {"type":"text","text":""} was prepended to tool_use responses, causing Claude Code 2.1 to read it as "(No output)" and discard the actual tool calls
- uninstall.sh: proper reverse of setup.sh (#23 by @tripathiprateek)
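To make the stop-marker and tool-call parsing fixes above concrete, here is a minimal Python sketch of the general approach. It is not the project's parser: only the name `clean_response` and the marker list come from these notes, while `decode_json_objects`, `extract_tool_calls`, and the `{"name": ..., "arguments": ...}` payload shape are assumptions made for illustration.

```python
import json
import re

# Stop markers listed in the release note (ChatML and Llama 3.x).
STOP_MARKERS = ["<|im_end|>", "<|endoftext|>", "<|im_start|>",
                "<|end_of_text|>", "<|eot_id|>"]

FENCE = "`" * 3  # a Markdown code fence, built here to keep this block readable
TOOLS_RE = re.compile(r"<tools>(.*?)</tools>", re.DOTALL)
FENCE_RE = re.compile(FENCE + r"json\s*(.*?)" + FENCE, re.DOTALL)


def clean_response(text):
    """Remove every known stop marker and trim surrounding whitespace."""
    for marker in STOP_MARKERS:
        text = text.replace(marker, "")
    return text.strip()


def decode_json_objects(payload):
    """Decode one or several back-to-back JSON objects from a string."""
    decoder = json.JSONDecoder()
    objects, pos = [], 0
    while pos < len(payload):
        if payload[pos].isspace():
            pos += 1  # raw_decode does not skip leading whitespace
            continue
        try:
            obj, pos = decoder.raw_decode(payload, pos)
        except json.JSONDecodeError:
            break  # stop at the first chunk that is not valid JSON
        if isinstance(obj, list):
            objects.extend(o for o in obj if isinstance(o, dict))
        elif isinstance(obj, dict):
            objects.append(obj)
    return objects


def extract_tool_calls(text):
    """Collect tool calls from <tools> wrappers and json-fenced blocks."""
    calls = []
    for pattern in (TOOLS_RE, FENCE_RE):
        for match in pattern.finditer(text):
            calls.extend(decode_json_objects(match.group(1)))
    return calls
```

Using `json.JSONDecoder.raw_decode` lets the sketch walk through several objects in a row without requiring them to be wrapped in an array, which is the back-to-back case the Format 3.6 note mentions.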
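The input-key filtering and empty-text-block fixes above both operate on the content blocks of a response. The Python sketch below shows one way such a cleanup pass could look. The function names and the `schemas` mapping (tool name to its `input_schema`) are hypothetical, and it assumes each schema lists its accepted keys under `properties`.

```python
def filter_tool_input(tool_input, input_schema):
    """Keep only the keys declared under the schema's "properties"."""
    allowed = set(input_schema.get("properties", {}))
    return {k: v for k, v in tool_input.items() if k in allowed}


def sanitize_content(blocks, schemas):
    """Drop empty text blocks and strip undeclared keys from tool_use inputs.

    `schemas` maps a tool name to its input_schema dict (an assumption
    of this sketch). Input blocks are not mutated.
    """
    cleaned = []
    for block in blocks:
        if block.get("type") == "text" and not block.get("text", "").strip():
            continue  # an empty text block would mask the tool calls
        if block.get("type") == "tool_use" and block.get("name") in schemas:
            filtered = filter_tool_input(block.get("input", {}),
                                         schemas[block["name"]])
            block = {**block, "input": filtered}
        cleaned.append(block)
    return cleaned
```

Tools without a known schema are passed through unchanged here. A stricter variant might reject them instead, depending on how the proxy should behave.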
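For the SSE streaming item above, a proxy that already has a complete response can replay it as a stream. The following Python sketch assumes the event sequence of the Anthropic Messages streaming format (message_start, content_block_start, content_block_delta, content_block_stop, message_delta, message_stop). The helper names are hypothetical and this is not the project's implementation.

```python
import json


def sse(event, data):
    """Format one Server-Sent Event, mirroring the event name in the payload."""
    payload = json.dumps({"type": event, **data})
    return f"event: {event}\ndata: {payload}\n\n"


def message_to_sse(message):
    """Yield SSE frames that replay a finished message as a stream."""
    head = {**message, "content": [], "stop_reason": None}
    yield sse("message_start", {"message": head})
    for index, block in enumerate(message["content"]):
        if block["type"] == "text":
            start = {"type": "text", "text": ""}
            delta = {"type": "text_delta", "text": block["text"]}
        else:  # every other block is treated as tool_use in this sketch
            start = {**block, "input": {}}
            delta = {"type": "input_json_delta",
                     "partial_json": json.dumps(block["input"])}
        yield sse("content_block_start",
                  {"index": index, "content_block": start})
        yield sse("content_block_delta", {"index": index, "delta": delta})
        yield sse("content_block_stop", {"index": index})
    usage = message.get("usage", {})
    yield sse("message_delta", {
        "delta": {"stop_reason": message.get("stop_reason"),
                  "stop_sequence": None},
        "usage": {"output_tokens": usage.get("output_tokens", 0)},
    })
    yield sse("message_stop", {})
```

The frames would be written to the client under a `text/event-stream` content type. Each block is sent as a single delta here, which keeps the sketch short at the cost of not streaming incrementally.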
Contributors
Thank you to @tadrianonet (PR #32 — the bulk of this release), @tripathiprateek (PR #23), and @kevinbarns on Discord for independently identifying the same SSE streaming issue and digging into the prefill bottleneck (follow-up PR coming).