
v0.1.0 — verified-offline + 16GB Mac support


@nicedreamzapp nicedreamzapp released this 08 May 16:13
· 13 commits to main since this release
751caae

First tagged release. This locks in two big shifts that landed this week.

🔒 Verified offline — not just claimed offline

Tadrianonet (@tadrianonet) confirmed via lsof that Claude Code 2.1 reaches out to api.anthropic.com on startup even with ANTHROPIC_BASE_URL set — telemetry, statsig feature flags, marketplace auto-install, and the autoupdater all bypass the configured base URL. That means setups following the old README were not actually offline.

Fixed in PR #32 by setting four env vars in the launchers:

CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC=1
DISABLE_AUTOUPDATER=1
CLAUDE_CODE_DISABLE_OFFICIAL_MARKETPLACE_AUTOINSTALL=1
CLAUDE_CODE_DISABLE_BACKGROUND_TASKS=1

After the patch, lsof on the claude process shows only localhost:4000, validated end-to-end. This is the prerequisite for using the project honestly in air-gapped legal, medical, and compliance environments.
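
As an illustration of what "setting four env vars in the launchers" amounts to, here is a minimal Python sketch. It is not the launcher code from PR #32. The helper names `offline_env` and `launch_claude` are hypothetical, and the proxy address `http://localhost:4000` is an assumption based on the port that lsof reports above.

```python
import os
import subprocess

# The four variables listed in these release notes.
OFFLINE_FLAGS = {
    "CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC": "1",
    "DISABLE_AUTOUPDATER": "1",
    "CLAUDE_CODE_DISABLE_OFFICIAL_MARKETPLACE_AUTOINSTALL": "1",
    "CLAUDE_CODE_DISABLE_BACKGROUND_TASKS": "1",
}


def offline_env(base=None, base_url="http://localhost:4000"):
    """Return a copy of the environment with the offline flags applied.

    base_url is an assumed local proxy address, not a value taken from the project.
    """
    env = dict(os.environ if base is None else base)
    env["ANTHROPIC_BASE_URL"] = base_url
    env.update(OFFLINE_FLAGS)
    return env


def launch_claude(args=()):
    """Start the claude CLI with the offline environment (hypothetical wrapper)."""
    return subprocess.run(["claude", *args], env=offline_env(), check=False)
```

Building the environment as a copy rather than mutating `os.environ` keeps the flags scoped to the child process, which is roughly what a launcher script does with `export` before starting the CLI.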

💻 Mac base/Pro 16GB now works out of the box

The previous default (Gemma 4 31B, 18GB weights) crashed on 16GB unified memory with kIOGPUCommandBufferCallbackErrorOutOfMemory. New default for the 16GB tier is Qwen 2.5 Coder 14B 4-bit MLX (~7.8GB weights), with new Claude Chat.command and Claude Agentico.command launchers that:

  • automate the macOS keychain auth workaround (hasCompletedOnboarding=true, ANTHROPIC_AUTH_TOKEN, DISABLE_LOGIN_COMMAND=1)
  • force --effort low so Claude Code 2.1's extended-thinking flow doesn't exhaust small models on the first call
  • enable 4-bit KV cache from token 0 so the ~5860-token Claude Code system prompt fits in 16GB
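
To give a feel for why the KV cache precision matters at this prompt length, the rough estimate below compares a 16-bit and a 4-bit cache for a ~5860-token prompt. The model dimensions (48 layers, 8 KV heads, head size 128) are assumptions taken from the commonly published Qwen 2.5 14B configuration, not from this release, and the function `kv_cache_bytes` is hypothetical. Real 4-bit caches also store quantization scales, so actual usage will be somewhat higher than this figure.

```python
def kv_cache_bytes(tokens, layers, kv_heads, head_dim, bits):
    """Approximate KV cache size: keys and values for every layer and token."""
    elements = 2 * layers * kv_heads * head_dim * tokens  # 2 = keys + values
    return elements * bits // 8


# Assumed Qwen 2.5 14B dimensions (not stated in the release notes).
LAYERS, KV_HEADS, HEAD_DIM = 48, 8, 128
PROMPT_TOKENS = 5860  # approximate system prompt size quoted above

fp16 = kv_cache_bytes(PROMPT_TOKENS, LAYERS, KV_HEADS, HEAD_DIM, bits=16)
q4 = kv_cache_bytes(PROMPT_TOKENS, LAYERS, KV_HEADS, HEAD_DIM, bits=4)

print(f"16-bit cache: {fp16 / 1e9:.2f} GB")
print(f" 4-bit cache: {q4 / 1e9:.2f} GB")
```

Under these assumptions the system prompt alone costs about four times less cache memory at 4 bits, which is headroom that matters when ~7.8GB of weights already share 16GB of unified memory with macOS.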

Full PT-BR setup guide at docs/MAC-BASE-SETUP.md.

🔧 Other fixes

  • ChatML / Llama 3.x stop markers: clean_response now strips <|im_end|>, <|endoftext|>, <|im_start|>, <|end_of_text|>, <|eot_id|> (previously only Gemma 4's <turn|> markers were handled, so any non-Gemma model leaked special tokens into output)
  • Qwen 2.5 <tools> parser (Format 3.5) — Qwen 2.5 fine-tunes emit tool calls inside <tools>...</tools> instead of <tool_call>. Without this, tool calls were returned as plain text and Claude Code never executed them.
  • Markdown-fenced JSON tool calls (Format 3.6) — handles ```json fenced blocks containing one or more back-to-back JSON objects
  • Anthropic SSE streaming — proxy now responds with text/event-stream when the client requests stream: true. Claude Code 2.1 silently discards non-streaming responses for any tool-using request, so this was a hidden cause of "(No output)" failures.
  • Input-key filtering against input_schema — Claude Code 2.1 silently rejects tool_use blocks with extra fields, which Qwen 2.5 likes to add. Now stripped automatically.
  • Empty text block bug — fixed an issue where an empty {"type":"text","text":""} was prepended to tool_use responses, causing Claude Code 2.1 to read it as "(No output)" and discard the actual tool calls
  • uninstall.sh — proper reverse of setup.sh (#23 by @tripathiprateek)
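
Several of the fixes above are small response post-processing steps. The sketch below shows one way each could look. It is not the proxy's actual code, and all function names (`strip_stop_markers`, `parse_fenced_json`, `filter_tool_input`, `drop_empty_text_blocks`) are hypothetical. It assumes tool definitions carry a JSON Schema `input_schema` with a `properties` map, as in the Anthropic Messages API.

```python
import json
import re

# Stop markers named in these release notes (ChatML and Llama 3.x).
STOP_MARKERS = ("<|im_end|>", "<|endoftext|>", "<|im_start|>",
                "<|end_of_text|>", "<|eot_id|>")

FENCE = "`" * 3  # built at runtime so this example contains no literal fence
FENCE_RE = re.compile(FENCE + r"json\s*(.*?)" + FENCE, re.DOTALL)


def strip_stop_markers(text):
    """Remove leaked special tokens from model output."""
    for marker in STOP_MARKERS:
        text = text.replace(marker, "")
    return text.strip()


def parse_fenced_json(text):
    """Decode every JSON value inside json fences, including back-to-back ones."""
    decoder = json.JSONDecoder()
    values = []
    for body in FENCE_RE.findall(text):
        pos = 0
        while pos < len(body):
            if body[pos].isspace():  # raw_decode does not skip leading whitespace
                pos += 1
                continue
            try:
                value, pos = decoder.raw_decode(body, pos)
            except json.JSONDecodeError:
                break  # give up on the rest of this fence
            values.append(value)
    return values


def filter_tool_input(tool_input, input_schema):
    """Keep only the keys that the tool's input_schema declares."""
    allowed = input_schema.get("properties", {})
    return {k: v for k, v in tool_input.items() if k in allowed}


def drop_empty_text_blocks(content):
    """Remove text blocks whose text is the empty string."""
    return [b for b in content
            if not (b.get("type") == "text" and b.get("text", "") == "")]
```

`json.JSONDecoder.raw_decode` returns both the decoded value and the index where it ended, which is what makes it practical to walk through several objects placed one after another in a single fence.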

Contributors

Thank you to @tadrianonet (PR #32 — the bulk of this release), @tripathiprateek (PR #23), and @kevinbarns on Discord for independently identifying the same SSE streaming issue and digging into the prefill bottleneck (follow-up PR coming).