QwenCode is a local compatibility layer that lets Claude Code talk to an Ollama-hosted Qwen model instead of Anthropic's cloud API.
Claude Code is Anthropic's terminal-based AI coding agent. It reads your files, runs shell commands, and edits code through a structured tool-calling protocol. Normally it requires an Anthropic API key and sends your code to the cloud. QwenCode removes both of those requirements by routing the same interface through a local model.
QwenCode is actively maintained on a best-effort basis. It is not a completed, maintenance-only project: new Claude Code releases can change the internal API contract in ways that break the translation layer, and the shim is updated reactively when that happens.
Current state: Tested and working against Claude Code versions available as of April 2026.
Maintenance model: There is no fixed release schedule. Updates are issued when Claude Code introduces breaking changes to its API contract or when reliability improvements are warranted. The test suite is designed to catch regressions introduced by upstream changes. Watch the GitHub repository for releases to stay current.
Long-term viability caveat: QwenCode depends on Claude Code as an external dependency. If Anthropic significantly changes Claude Code's internal protocol, the shim will need to follow. There is no committed SLA for how quickly a breaking upstream release will be addressed. If you depend on QwenCode in a time-sensitive environment, pin your Claude Code version and upgrade only after confirming shim compatibility.
- Initial public release
- Tested against Claude Code versions available as of April 2026
- Supports `POST /v1/messages` (streaming and non-streaming), `GET /v1/models`, and `GET /health`
- Synthetic fallback layer covering: create, overwrite, append, replace, insert, rename, delete, create directory, list directory, multi-file scaffold
- Post-execution verification for all synthetic file operations
- Full error-path test suite covering timeouts, malformed JSON, interrupted streams, rate limits, and large-context continuation
- Windows support via PowerShell launchers (requires Git Bash or WSL for synthetic Bash execution)
- Zero Node.js dependencies
Releases are tagged on the GitHub repository. If you are evaluating whether QwenCode is current relative to a recent Claude Code release, the release tags are the authoritative reference.
Claude Code expects Messages-style request and streaming semantics. Local models exposed through Ollama speak a different protocol and often fail to emit reliable native tool calls under Claude Code's tool-heavy prompt.
QwenCode bridges that gap by:
- exposing a subset of a Messages-style API on localhost
- translating Claude Code requests into Ollama `/api/chat` calls
- translating Ollama responses back into compatible response bodies and SSE events
- rescuing common coding workflows with verified synthetic Bash tool calls when the model describes an action instead of calling a tool
The result: you keep Claude Code's full interface (file editing, bash commands, tool orchestration) running entirely on your machine with no API key and no per-token cost.
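To make the request-side translation concrete, here is an illustrative sketch (not the shim's actual code, which lives in `src/server.mjs`) of mapping a Messages-style body onto Ollama's `/api/chat` format. The field names on the Ollama side follow Ollama's documented chat API; the tool-call mapping is omitted for brevity.

```javascript
// Illustrative sketch: map a Messages-style request body to an
// Ollama /api/chat body. Not the shim's real implementation.
function toOllamaChat(body, defaultModel = "qwen2.5-coder:14b") {
  const messages = [];
  // The Messages format carries the system prompt as a top-level field.
  if (body.system) messages.push({ role: "system", content: body.system });
  for (const m of body.messages ?? []) {
    // Content may be a plain string or an array of content blocks.
    const text = typeof m.content === "string"
      ? m.content
      : m.content.filter((b) => b.type === "text").map((b) => b.text).join("\n");
    messages.push({ role: m.role, content: text });
  }
  return {
    model: body.model ?? defaultModel,
    messages,
    stream: Boolean(body.stream),
    // Forward the context window override when OLLAMA_NUM_CTX is set.
    options: { num_ctx: Number(process.env.OLLAMA_NUM_CTX) || undefined },
  };
}
```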
If you are evaluating local terminal coding agents, you have likely also looked at Aider and Continue. Here is how they differ from QwenCode:
Aider is its own standalone agent with its own interface, edit formats, and model abstraction layer. It supports many local and cloud models directly. If you want a fully self-contained agent that does not depend on Claude Code, Aider is a reasonable choice. QwenCode is not trying to replace Aider — it is a different bet: you get Claude Code's specific tool-calling loop and interface, running against a local model instead.
Continue is primarily an IDE extension (VS Code, JetBrains) rather than a terminal agent. It integrates local models into your editor's chat and autocomplete. If you prefer staying inside your editor, Continue covers that use case. QwenCode is terminal-first and does not require an IDE.
QwenCode's specific tradeoff: You are preserving Claude Code's exact agent loop — its tool orchestration, file-editing protocol, and bash execution flow — while replacing only the backend. This means:
- You get Claude Code's structured tool use rather than a different agent's edit conventions
- The synthetic fallback layer means more of Claude Code's workflows succeed even when the local model does not emit a native tool call
- Post-execution verification means file operations are confirmed correct rather than silently assumed
- There is no separate CLI to learn; if you already use Claude Code, the interface is identical
The honest limitation: Claude Code is the dependency. QwenCode does not work without it, and local model tool-calling is still less reliable than Claude-hosted models. Aider's own model abstraction may be more forgiving of model-specific quirks if you plan to switch models frequently. QwenCode is optimized specifically for Qwen models running through Ollama under Claude Code's protocol.
QwenCode runs on Windows, macOS, and Linux. The shim itself is plain Node.js with zero dependencies.
| Platform | Shim Launcher | Client Launcher |
|---|---|---|
| Windows | `launch-shim.ps1` | `launch-client.ps1` |
| macOS | `launch-shim.sh` | `launch-client.sh` |
| Linux | `launch-shim.sh` | `launch-client.sh` |
On macOS/Linux, make the scripts executable after cloning:
```sh
chmod +x launch-shim.sh launch-client.sh
```

Windows note: Synthetic Bash tool execution on Windows requires a working bash executable in PATH. Git Bash or WSL both satisfy this requirement. This is easy to miss: if Claude Code's bash operations fail on Windows, this is the first thing to check. See Runtime Requirements below.
- Node.js 18+
- Ollama running locally (or on a reachable host)
- A Qwen-compatible model pulled in Ollama
- Claude Code installed and runnable as `claude`
- On Windows, a working `bash` executable in PATH for synthetic Bash tool execution (Git Bash or WSL both work)
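A quick way to verify the bash requirement before launching anything is a one-off Node check. This is an illustrative helper, not part of QwenCode itself:

```javascript
// Sanity-check sketch: confirm a usable `bash` is reachable in PATH,
// which synthetic Bash execution depends on (notably on Windows).
import { spawnSync } from "node:child_process";

function hasBash() {
  // spawnSync returns a non-zero or null status if bash is missing or broken.
  const r = spawnSync("bash", ["--version"], { encoding: "utf8" });
  return r.status === 0;
}

console.log(hasBash() ? "bash found in PATH" : "no usable bash in PATH");
```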
Recommended model: qwen2.5-coder:14b (default in the launchers). This performed better for constrained coding tasks than larger general-purpose models in testing.
GPU memory requirement: qwen2.5-coder:14b requires approximately 9GB of VRAM to run fully on-GPU. Cards with 8GB of VRAM (such as the RTX 3070 8GB or RTX 4060 8GB) will not fit the model entirely in GPU memory and will fall back to partial CPU offload, which substantially reduces throughput. If you have an 8GB card, consider qwen2.5-coder:7b as an alternative, or accept CPU-offloaded performance. See the Performance Expectations table for context.
```sh
ollama pull qwen2.5-coder:14b
```

```sh
git clone https://github.com/strifero/QwenCode.git
cd QwenCode
```

No `npm install` needed. Zero dependencies.
macOS / Linux:
```sh
./launch-shim.sh
```

Windows (PowerShell):

```powershell
.\launch-shim.ps1
```

In a second terminal:
macOS / Linux:
```sh
./launch-client.sh
```

Windows (PowerShell):

```powershell
.\launch-client.ps1
```

You can pass Claude Code arguments through:

```sh
./launch-client.sh -- --dangerously-skip-permissions
```

QwenCode adds minimal overhead; the shim itself is a thin translation layer. What you will notice is that local model inference is slower than the Anthropic API.
Rough expectations with qwen2.5-coder:14b:
| Hardware | First token | Throughput | Practical feel |
|---|---|---|---|
| RTX 3060 (12GB) | 2-4s | ~25 tok/s | Usable for focused tasks |
| RTX 4070 (12GB) | 1-3s | ~35 tok/s | Comfortable daily driver |
| RTX 4090 (24GB) | <1s | ~50 tok/s | Near-cloud feel |
| M2 Pro (16GB) | 2-5s | ~20 tok/s | Usable, patience helps |
| CPU only | 10-30s | ~3 tok/s | Not recommended |
The benchmarks above are from macOS and Linux systems. Windows performance with the same hardware is expected to be similar when the model runs fully on GPU, but has not been independently benchmarked. If you are evaluating QwenCode on Windows for long-running use, treat the figures above as directional rather than exact.
Cards with less than 9GB of VRAM will not appear in this table because qwen2.5-coder:14b requires approximately 9GB of VRAM for full GPU inference. Running with CPU offload on such cards will produce throughput closer to the CPU-only row.
The main latency you will feel is on the first response of each turn, because Claude Code's system prompt is large and the model must process it. Subsequent exchanges in the same session are faster.
For comparison, native Claude Code against the Anthropic API typically responds in under 2 seconds with throughput limited only by network speed.
QwenCode is built and tested against Claude Code CLI (the claude command).
Tested with: Claude Code versions available as of April 2026. The shim translates the Messages API format and tool-calling protocol, which has been stable across Claude Code releases.
Maintenance cadence: QwenCode does not follow a fixed release schedule tied to Claude Code releases. The shim is updated reactively when Claude Code changes its internal API contract in ways that break the translation layer. If you are evaluating QwenCode for long-term or production use, the practical risk is that a Claude Code update could require a shim update before things work again. Watching the GitHub repository for releases is the best way to stay current.
There is no committed SLA for how quickly a breaking Claude Code release will be addressed. Fixes are issued on a best-effort basis. If you depend on QwenCode in a time-sensitive environment, pin your Claude Code version and upgrade only after confirming shim compatibility.
What could break: If Anthropic changes Claude Code's internal API contract (new required fields, new tool formats, new streaming event types), the shim may need updates. The test suite is designed to catch these regressions.
What is not supported:
- Multimodal content (image uploads)
- Provider-specific beta features
- Extended thinking / streaming thinking blocks
- MCP server passthrough from Claude Code
Ollama defaults to a context window of 2048 tokens for most models, which is much smaller than what Claude Code expects for a typical session. Claude Code's system prompt alone is large, and a multi-turn session accumulates conversation history quickly. If you run QwenCode without adjusting this, Ollama will silently truncate the conversation once it exceeds 2048 tokens. The model will still respond, but it will have lost earlier context — edits may become inconsistent, the model may forget files it has already read, and tool-calling reliability will degrade without any obvious error.
QwenCode exposes the OLLAMA_NUM_CTX environment variable to set the context window size passed to Ollama on every request. The launch scripts do not set a default value for this variable, so Ollama's own default applies unless you override it.
Recommended approach: Set OLLAMA_NUM_CTX explicitly before starting the shim. A value of 8192 is a reasonable starting point for most Claude Code sessions. Higher values consume more VRAM; the right number depends on your available GPU memory and session length.
```sh
# macOS / Linux
OLLAMA_NUM_CTX=8192 ./launch-shim.sh
```

```powershell
# Windows (PowerShell)
$env:OLLAMA_NUM_CTX = "8192"
.\launch-shim.ps1
```

If you notice response quality degrading mid-session, especially on tasks that reference earlier conversation turns, context truncation is the first thing to investigate. Enable `SHIM_LOG=debug` to see what context size is being sent to Ollama on each request.
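If you want a rough feel for when a session will hit the window, a character-count heuristic is enough for a warning. The sketch below assumes roughly 4 characters per token, which is a common ballpark, not an exact tokenizer:

```javascript
// Heuristic sketch: estimate token usage of a conversation (~4 chars/token)
// and warn when it approaches the configured context window.
function approxTokens(messages) {
  const chars = messages.reduce((n, m) => n + (m.content?.length ?? 0), 0);
  return Math.ceil(chars / 4);
}

const numCtx = Number(process.env.OLLAMA_NUM_CTX) || 2048; // Ollama's default
const session = [{ role: "user", content: "x".repeat(12000) }];

if (approxTokens(session) > numCtx * 0.8) {
  console.log("warning: session is nearing the context window; consider raising OLLAMA_NUM_CTX");
}
```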
The most common failure mode with local models is: Claude Code asks the model to call a tool, and instead the model writes out what it would do in plain text. For example, instead of calling the Write tool, the model says "I'll create a file called app.py with the following content..."
QwenCode handles this with a synthetic fallback layer:
1. Intent parsing. When the model returns text instead of a tool call, the shim parses the text against a set of known intent patterns (create file, read file, append, replace, insert, delete, rename, create directory, list directory, multi-file scaffold).
2. Command synthesis. If a pattern matches, the shim builds the equivalent bash command, using `perl` for content injection and verification. Content is passed through environment variables, never interpolated into the command string.
3. Post-execution verification. After every synthetic file operation, the shim runs a verification step:
   - Create/write: Reads the file back and compares byte-for-byte against expected content
   - Append: Verifies the appended text appears in the file
   - Replace: Verifies the new string is present and the old string is gone
   - Insert: Verifies the combined anchor+insertion text appears in the file
   - Delete: Verifies the file no longer exists
   - Rename: Verifies the source is gone and the destination exists
4. Failure on mismatch. If verification fails, the shim returns a non-zero exit code with a specific error (e.g., `verification failed: create, exit 91`). It never silently claims success.
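Steps 1 and 3 above can be sketched in miniature. The pattern below is a single hypothetical create-file matcher; the real shim matches a much wider set of phrasings and routes through bash, but the parse-then-verify shape is the same:

```javascript
// Illustrative sketch of intent parsing + post-execution verification.
// The regex is a hypothetical example, not one of the shim's actual patterns.
import { writeFileSync, readFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

function parseCreateIntent(text) {
  const m = text.match(/create a file called (\S+) with the content "([^"]*)"/i);
  return m ? { op: "create", path: m[1], content: m[2] } : null;
}

function createAndVerify(path, content) {
  writeFileSync(path, content);
  // Post-execution verification: read back and compare exactly.
  if (readFileSync(path, "utf8") !== content) {
    throw new Error("verification failed: create"); // never silently claim success
  }
  return true;
}

const intent = parseCreateIntent('create a file called demo.txt with the content "hello"');
if (intent) {
  createAndVerify(join(tmpdir(), intent.path), intent.content);
}
```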
QwenCode handles upstream failures gracefully:
| Failure | What happens |
|---|---|
| Ollama is unreachable | Returns a 502 error with the upstream error message |
| Ollama request times out | Returns a 504 with timeout details (default: 120s, configurable) |
| Model returns malformed JSON | Returns a 502 "invalid upstream response" error |
| Streaming is interrupted mid-response | Cancels the upstream request, closes the SSE stream cleanly |
| Model ignores tool calls entirely | Falls back to text-only response (no crash, no loop) |
| Model's tool call has malformed arguments | Coerces arguments to a valid object; proceeds rather than crashing |
| Synthetic fallback verification fails | Returns the verification error; Claude Code sees a failed tool result and can retry |
| Client disconnects mid-stream | Upstream request is cancelled, resources are cleaned up |
| Request body exceeds 50MB | Returns 413 immediately |
The shim never retries automatically. If something fails, it surfaces the error clearly and lets Claude Code decide what to do next.
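The table above implies a small mapping from failure kind to HTTP status. The sketch below is an assumed mapping mirroring that table; the error shapes (`err.code`, `err.name`, `err.tooLarge`) are illustrative, not the shim's actual internals:

```javascript
// Sketch: translate an upstream failure into the status code returned to
// Claude Code, following the error-handling table. Illustrative only.
function statusForUpstreamFailure(err) {
  if (err.code === "ECONNREFUSED") return 502; // Ollama unreachable
  if (err.name === "TimeoutError") return 504; // upstream request timed out
  if (err.name === "SyntaxError") return 502;  // malformed upstream JSON
  if (err.tooLarge) return 413;                // request body over the limit
  return 502;                                  // any other upstream failure
}
```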
The shim supports:
- `POST /v1/messages` (streaming and non-streaming)
- `GET /v1/models`
- `GET /health`
- `system` + `messages` array
- `text`, `tool_use`, and `tool_result` content blocks
- Usage fields and stop reasons
- Ollama tool calls mapped back to `tool_use` blocks
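For orientation, these are the content-block shapes involved, in the standard Messages format. The tool name and id below are made-up examples, not values the shim produces:

```javascript
// Example content-block shapes (standard Messages format).
// "Read" and "toolu_01" are hypothetical illustration values.
const assistantTurn = {
  role: "assistant",
  content: [
    { type: "text", text: "I'll read the file first." },
    { type: "tool_use", id: "toolu_01", name: "Read", input: { file_path: "package.json" } },
  ],
};

const toolResultTurn = {
  role: "user",
  content: [
    { type: "tool_result", tool_use_id: "toolu_01", content: '{"name":"demo"}' },
  ],
};

console.log(assistantTurn.content[1].name, toolResultTurn.content[0].tool_use_id);
```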
- Create file / overwrite file / read file
- Append text / replace exact string
- Insert text before or after an exact string
- Rename or move file / delete file
- Create directory / list directory
- Multi-file scaffold from a fenced bash script
- Upstream request timeout handling (configurable)
- Malformed upstream JSON handling
- Interrupted-stream cancellation with cleanup
- Rate-limit error mapping
- Binary file detection for reads
- Large-file truncation (configurable)
- Client disconnect detection and upstream cancellation
- Continuation-turn handling
- Large-context conversation handling
QwenCode is strongest when the request is explicit and operationally narrow.
Works well:
- "create a file called `x` with the text `y`"
- "append `y` to `x`"
- "replace the exact string `a` with `b` in `x`"
- "insert `y` after `x` in file `z`"
- "create a Python script and a README for a demo app"
- "read `package.json` and summarize it"
- "what files are in this directory?"
Still weaker:
- Broad open-ended refactors across many files
- Complicated multi-file architectural edits from vague prose
- Tasks where the model must choose many precise tool calls without rescue
| Variable | Default | Description |
|---|---|---|
| `HOST` | `127.0.0.1` | Bind host |
| `PORT` | `8000` | Bind port |
| `OLLAMA_BASE_URL` | `http://127.0.0.1:11434` | Ollama endpoint |
| `OLLAMA_MODEL` | `qwen2.5-coder:14b` | Model name |
| `OLLAMA_AUTH_TOKEN` | | Optional bearer token for Ollama gateways |
| `OLLAMA_NUM_CTX` | | Context window override; strongly recommended (see Context Window Handling) |
| `SHIM_MAX_TOOLS` | `8` | Max tools forwarded to the model |
| `SHIM_API_KEY` | | Optional shared secret |
| `SHIM_USE_REQUESTED_MODEL` | | Honor incoming `model` field instead of forcing `OLLAMA_MODEL` |
| `SHIM_LOG` | `info` | Set to `debug` for verbose logging |
| `SHIM_MAX_READ_BYTES` | `200000` | Max bytes for synthetic file reads |
| `SHIM_REQUEST_TIMEOUT_MS` | `120000` | Upstream request timeout |
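The defaults above follow the usual environment-with-fallback pattern. This is an illustrative sketch of that resolution, not the shim's actual code:

```javascript
// Sketch: resolve configuration from environment variables with the
// documented defaults. Illustrative; the real resolution is in src/server.mjs.
const config = {
  host: process.env.HOST || "127.0.0.1",
  port: Number(process.env.PORT) || 8000,
  ollamaBaseUrl: process.env.OLLAMA_BASE_URL || "http://127.0.0.1:11434",
  model: process.env.OLLAMA_MODEL || "qwen2.5-coder:14b",
  maxTools: Number(process.env.SHIM_MAX_TOOLS) || 8,
  maxReadBytes: Number(process.env.SHIM_MAX_READ_BYTES) || 200000,
  requestTimeoutMs: Number(process.env.SHIM_REQUEST_TIMEOUT_MS) || 120000,
  logLevel: process.env.SHIM_LOG || "info",
};

console.log(`shim config: ${config.host}:${config.port} -> ${config.ollamaBaseUrl}`);
```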
```
QwenCode/
├── src/
│   └── server.mjs            # The entire shim (single file, zero dependencies)
├── scripts/
│   ├── smoke-tests.mjs       # Happy-path integration tests
│   ├── error-smoke-tests.mjs # Error/reliability tests
│   └── mock-ollama.mjs       # Mock upstream for error tests
├── launch-shim.ps1           # Windows shim launcher
├── launch-shim.sh            # macOS/Linux shim launcher
├── launch-client.ps1         # Windows client launcher
├── launch-client.sh          # macOS/Linux client launcher
├── package.json
├── LICENSE                   # MIT
└── README.md
```
`npm run smoke` exercises the live shim against a real Ollama backend and validates: create, append, replace, multiline content preservation, empty-string replace, binary reads, large-file truncation, directory creation, nested file creation, unicode content, rename, streaming text, streaming tool use, and continuation-turn fallback behavior.
`npm run smoke:error` uses a mock Ollama server to simulate bad upstream behavior and validates: request timeouts, malformed JSON, malformed streaming output, interrupted streaming cancellation, rate limits, fenced multi-file scaffold rescue, and large-context continuation behavior.
Both test suites are self-contained and use a local `.tmp-smoke` folder.
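The mock-upstream idea behind `npm run smoke:error` can be sketched in a few lines: a throwaway HTTP server that deliberately returns malformed JSON, so the shim's 502 path can be exercised without touching a real Ollama. This is an illustration, not the contents of `scripts/mock-ollama.mjs`:

```javascript
// Sketch: a mock upstream that returns malformed JSON, for exercising
// the shim's "invalid upstream response" handling. Illustrative only.
import { createServer } from "node:http";

const mock = createServer((req, res) => {
  res.writeHead(200, { "content-type": "application/json" });
  res.end("{ this is not valid JSON"); // deliberately unparseable
});

mock.listen(0, "127.0.0.1", () => {
  const { port } = mock.address();
  console.log(`mock upstream listening on 127.0.0.1:${port}`);
  mock.close(); // shut down immediately; a real test would point the shim here
});
```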
This is not a full Anthropic API implementation. It supports the subset needed to make Claude Code work against a local Qwen backend.
- Native model tool-calling is still less reliable than Claude-hosted models
- Many successful flows depend on synthetic Bash fallback
- Broad open-ended editing is less trustworthy than constrained edits
- Multimodal content and provider-specific beta features are not supported
- Claude Code's system prompt is large, so local latency is noticeable even with a fast GPU
- Claude Code must be installed separately; QwenCode does not work without it
- `qwen2.5-coder:14b` requires approximately 9GB of VRAM; 8GB cards will fall back to CPU offload
- On Windows, synthetic Bash execution requires Git Bash or WSL in PATH
- Ollama defaults to a 2048-token context window; long sessions will silently truncate without setting `OLLAMA_NUM_CTX`
- Shim updates are reactive to Claude Code changes rather than proactively versioned alongside Claude Code releases; there is no committed SLA for how quickly a breaking Claude Code release will be addressed
- Bug reports and feature requests: GitHub Issues
- Questions: Open a Discussion or file an issue
MIT. See LICENSE.
Built by Strife Technologies