QwenCode is a local compatibility layer that lets Claude Code talk to an Ollama-hosted Qwen model instead of Anthropic's cloud API.
Claude Code is Anthropic's terminal-based AI coding agent. It reads your files, runs shell commands, and edits code through a structured tool-calling protocol. Normally it requires an Anthropic API key and sends your code to the cloud. QwenCode removes both of those requirements by routing the same interface through a local model.
QwenCode is actively maintained on a best-effort basis. It is not a completed, maintenance-only project: new Claude Code releases can change the internal API contract in ways that break the translation layer, and the shim is updated reactively when that happens.
Current state: Tested and working against Claude Code versions available as of April 2026.
Maintenance model: There is no fixed release schedule. Updates are issued when Claude Code introduces breaking changes to its API contract or when reliability improvements are warranted. The test suite is designed to catch regressions introduced by upstream changes. Watch the GitHub repository for releases to stay current.
Long-term viability caveat: QwenCode depends on Claude Code as an external dependency. If Anthropic significantly changes Claude Code's internal protocol, the shim will need to follow. There is no committed SLA for how quickly a breaking upstream release will be addressed. If you depend on QwenCode in a time-sensitive environment, pin your Claude Code version and upgrade only after confirming shim compatibility.
- Initial public release
- Tested against Claude Code versions available as of April 2026
- Supports `POST /v1/messages` (streaming and non-streaming), `GET /v1/models`, and `GET /health`
- Synthetic fallback layer covering: create, overwrite, append, replace, insert, rename, delete, create directory, list directory, multi-file scaffold
- Post-execution verification for all synthetic file operations
- Full error-path test suite covering timeouts, malformed JSON, interrupted streams, rate limits, and large-context continuation
- Windows support via PowerShell launchers (requires Git Bash or WSL for synthetic Bash execution)
- Zero Node.js dependencies
Releases are tagged on the GitHub repository. If you are evaluating whether QwenCode is current relative to a recent Claude Code release, the release tags are the authoritative reference.
Claude Code expects Messages-style request and streaming semantics. Local models exposed through Ollama speak a different protocol and often fail to emit reliable native tool calls under Claude Code's tool-heavy prompt.
QwenCode bridges that gap by:
- exposing a subset of a Messages-style API on localhost
- translating Claude Code requests into Ollama `/api/chat` calls
- translating Ollama responses back into compatible response bodies and SSE events
- rescuing common coding workflows with verified synthetic Bash tool calls when the model describes an action instead of calling a tool
The result: you keep Claude Code's full interface (file editing, bash commands, tool orchestration) running entirely on your machine with no API key and no per-token cost.
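To make the request-side translation concrete, here is an illustrative sketch (not the shim's actual code, which lives in `src/server.mjs`) of mapping a Messages-style body onto Ollama's `/api/chat` format. The field names on the Ollama side follow Ollama's documented chat API; the tool-call mapping is omitted for brevity.

```javascript
// Illustrative sketch: map a Messages-style request body to an
// Ollama /api/chat body. Not the shim's real implementation.
function toOllamaChat(body, defaultModel = "qwen2.5-coder:14b") {
  const messages = [];
  // The Messages format carries the system prompt as a top-level field.
  if (body.system) messages.push({ role: "system", content: body.system });
  for (const m of body.messages ?? []) {
    // Content may be a plain string or an array of content blocks.
    const text = typeof m.content === "string"
      ? m.content
      : m.content.filter((b) => b.type === "text").map((b) => b.text).join("\n");
    messages.push({ role: m.role, content: text });
  }
  return {
    model: body.model ?? defaultModel,
    messages,
    stream: Boolean(body.stream),
    // Forward the context window override when OLLAMA_NUM_CTX is set.
    options: { num_ctx: Number(process.env.OLLAMA_NUM_CTX) || undefined },
  };
}
```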
If you are evaluating local terminal coding agents, you have likely also looked at Aider and Continue. Here is how they differ from QwenCode:
Aider is its own standalone agent with its own interface, edit formats, and model abstraction layer. It supports many local and cloud models directly. If you want a fully self-contained agent that does not depend on Claude Code, Aider is a reasonable choice. QwenCode is not trying to replace Aider — it is a different bet: you get Claude Code's specific tool-calling loop and interface, running against a local model instead.
Continue is primarily an IDE extension (VS Code, JetBrains) rather than a terminal agent. It integrates local models into your editor's chat and autocomplete. If you prefer staying inside your editor, Continue covers that use case. QwenCode is terminal-first and does not require an IDE.
QwenCode's specific tradeoff: You are preserving Claude Code's exact agent loop — its tool orchestration, file-editing protocol, and bash execution flow — while replacing only the backend. This means:
- You get Claude Code's structured tool use rather than a different agent's edit conventions
- The synthetic fallback layer means more of Claude Code's workflows succeed even when the local model does not emit a native tool call
- Post-execution verification means file operations are confirmed correct rather than silently assumed
- There is no separate CLI to learn; if you already use Claude Code, the interface is identical
The honest limitation: Claude Code is the dependency. QwenCode does not work without it, and local model tool-calling is still less reliable than Claude-hosted models. Aider's own model abstraction may be more forgiving of model-specific quirks if you plan to switch models frequently. QwenCode is optimized specifically for Qwen models running through Ollama under Claude Code's protocol.
QwenCode runs on Windows, macOS, and Linux. The shim itself is plain Node.js with zero dependencies.
| Platform | Shim Launcher | Client Launcher |
|---|---|---|
| Windows | `launch-shim.ps1` | `launch-client.ps1` |
| macOS | `launch-shim.sh` | `launch-client.sh` |
| Linux | `launch-shim.sh` | `launch-client.sh` |
On macOS/Linux, make the scripts executable after cloning:
```sh
chmod +x launch-shim.sh launch-client.sh
```

Windows note: Synthetic Bash tool execution on Windows requires a working bash executable in PATH. Git Bash or WSL both satisfy this requirement. This is easy to miss: if Claude Code's bash operations fail on Windows, this is the first thing to check. See Runtime Requirements below.
- Node.js 18+
- Ollama running locally (or on a reachable host)
- A Qwen-compatible model pulled in Ollama
- Claude Code installed and runnable as `claude`
- On Windows, a working `bash` executable in PATH for synthetic Bash tool execution (Git Bash or WSL both work)
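A quick way to verify the bash requirement before launching anything is a one-off Node check. This is an illustrative helper, not part of QwenCode itself:

```javascript
// Sanity-check sketch: confirm a usable `bash` is reachable in PATH,
// which synthetic Bash execution depends on (notably on Windows).
import { spawnSync } from "node:child_process";

function hasBash() {
  // spawnSync returns a non-zero or null status if bash is missing or broken.
  const r = spawnSync("bash", ["--version"], { encoding: "utf8" });
  return r.status === 0;
}

console.log(hasBash() ? "bash found in PATH" : "no usable bash in PATH");
```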
Recommended model: qwen2.5-coder:14b (default in the launchers). This performed better for constrained coding tasks than larger general-purpose models in testing.
GPU memory requirement: qwen2.5-coder:14b requires approximately 9GB of VRAM to run fully on-GPU. Cards with 8GB of VRAM (such as the RTX 3070 8GB or RTX 4060 8GB) will not fit the model entirely in GPU memory and will fall back to partial CPU offload, which substantially reduces throughput. If you have an 8GB card, consider qwen2.5-coder:7b as an alternative, or accept CPU-offloaded performance. See the Performance Expectations table for context.
```sh
ollama pull qwen2.5-coder:14b
```

```sh
git clone https://github.com/strifero/QwenCode.git
cd QwenCode
```

No `npm install` needed. Zero dependencies.
macOS / Linux:
```sh
./launch-shim.sh
```

Windows (PowerShell):

```powershell
.\launch-shim.ps1
```

In a second terminal:
macOS / Linux:
```sh
./launch-client.sh
```

Windows (PowerShell):

```powershell
.\launch-client.ps1
```

You can pass Claude Code arguments through:

```sh
./launch-client.sh -- --dangerously-skip-permissions
```

QwenCode adds minimal overhead; the shim itself is a thin translation layer. What you will notice is that local model inference is slower than the Anthropic API.
Rough expectations with qwen2.5-coder:14b:
| Hardware | First token | Throughput | Practical feel |
|---|---|---|---|
| RTX 3060 (12GB) | 2-4s | ~25 tok/s | Usable for focused tasks |
| RTX 4070 (12GB) | 1-3s | ~35 tok/s | Comfortable daily driver |
| RTX 4090 (24GB) | <1s | ~50 tok/s | Near-cloud feel |
| M2 Pro (16GB) | 2-5s | ~20 tok/s | Usable, patience helps |
| CPU only | 10-30s | ~3 tok/s | Not recommended |
The benchmarks above are from macOS and Linux systems. Windows performance with the same hardware is expected to be similar when the model runs fully on GPU, but has not been independently benchmarked. If you are evaluating QwenCode on Windows for long-running use, treat the figures above as directional rather than exact.
Cards with less than 9GB of VRAM will not appear in this table because qwen2.5-coder:14b requires approximately 9GB of VRAM for full GPU inference. Running with CPU offload on such cards will produce throughput closer to the CPU-only row.
The main latency you will feel is on the first response of each turn, because Claude Code's system prompt is large and the model must process it. Subsequent exchanges in the same session are faster.
For comparison, native Claude Code against the Anthropic API typically responds in under 2 seconds with throughput limited only by network speed.
QwenCode is built and tested against Claude Code CLI (the claude command).
Tested with: Claude Code versions available as of April 2026. The shim translates the Messages API format and tool-calling protocol, which has been stable across Claude Code releases.
Maintenance cadence: QwenCode does not follow a fixed release schedule tied to Claude Code releases. The shim is updated reactively when Claude Code changes its internal API contract in ways that break the translation layer. If you are evaluating QwenCode for long-term or production use, the practical risk is that a Claude Code update could require a shim update before things work again. Watching the GitHub repository for releases is the best way to stay current.
There is no committed SLA for how quickly a breaking Claude Code release will be addressed. Fixes are issued on a best-effort basis. If you depend on QwenCode in a time-sensitive environment, pin your Claude Code version and upgrade only after confirming shim compatibility.
What could break: If Anthropic changes Claude Code's internal API contract (new required fields, new tool formats, new streaming event types), the shim may need updates. The test suite is designed to catch these regressions.
What is not supported:
- Multimodal content (image uploads)
- Provider-specific beta features
- Extended thinking / streaming thinking blocks
- MCP server passthrough from Claude Code
Ollama defaults to a context window of 2048 tokens for most models, which is much smaller than what Claude Code expects for a typical session. Claude Code's system prompt alone is large, and a multi-turn session accumulates conversation history quickly. If you run QwenCode without adjusting this, Ollama will silently truncate the conversation once it exceeds 2048 tokens. The model will still respond, but it will have lost earlier context — edits may become inconsistent, the model may forget files it has already read, and tool-calling reliability will degrade without any obvious error.
QwenCode exposes the OLLAMA_NUM_CTX environment variable to set the context window size passed to Ollama on every request. The launch scripts do not set a default value for this variable, so Ollama's own default applies unless you override it.
Recommended approach: Set OLLAMA_NUM_CTX explicitly before starting the shim. A value of 8192 is a reasonable starting point for most Claude Code sessions. Higher values consume more VRAM; the right number depends on your available GPU memory and session length.
```sh
# macOS / Linux
OLLAMA_NUM_CTX=8192 ./launch-shim.sh
```

```powershell
# Windows (PowerShell)
$env:OLLAMA_NUM_CTX = "8192"
.\launch-shim.ps1
```

If you notice response quality degrading mid-session, especially on tasks that reference earlier conversation turns, context truncation is the first thing to investigate. Enable `SHIM_LOG=debug` to see what context size is being sent to Ollama on each request.
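If you want a rough feel for when a session will hit the window, a character-count heuristic is enough for a warning. The sketch below assumes roughly 4 characters per token, which is a common ballpark, not an exact tokenizer:

```javascript
// Heuristic sketch: estimate token usage of a conversation (~4 chars/token)
// and warn when it approaches the configured context window.
function approxTokens(messages) {
  const chars = messages.reduce((n, m) => n + (m.content?.length ?? 0), 0);
  return Math.ceil(chars / 4);
}

const numCtx = Number(process.env.OLLAMA_NUM_CTX) || 2048; // Ollama's default
const session = [{ role: "user", content: "x".repeat(12000) }];

if (approxTokens(session) > numCtx * 0.8) {
  console.log("warning: session is nearing the context window; consider raising OLLAMA_NUM_CTX");
}
```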
The most common failure mode with local models is: Claude Code asks the model to call a tool, and instead the model writes out what it would do in plain text. For example, instead of calling the Write tool, the model says "I'll create a file called app.py with the following content..."
QwenCode handles this with a synthetic fallback layer:
1. Intent parsing. When the model returns text instead of a tool call, the shim parses the text against a set of known intent patterns (create file, read file, append, replace, insert, delete, rename, create directory, list directory, multi-file scaffold).
2. Command synthesis. If a pattern matches, the shim builds the equivalent bash command, using `perl` for content injection and verification. Content is passed through environment variables, never interpolated into the command string.
3. Post-execution verification. After every synthetic file operation, the shim runs a verification step:
   - Create/write: Reads the file back and compares byte-for-byte against expected content
   - Append: Verifies the appended text appears in the file
   - Replace: Verifies the new string is present and the old string is gone
   - Insert: Verifies the combined anchor+insertion text appears in the file
   - Delete: Verifies the file no longer exists
   - Rename: Verifies the source is gone and the destination exists
4. Failure on mismatch. If verification fails, the shim returns a non-zero exit code with a specific error (e.g., `verification failed: create, exit 91`). It never silently claims success.
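Steps 1 and 3 above can be sketched in miniature. The pattern below is a single hypothetical create-file matcher; the real shim matches a much wider set of phrasings and routes through bash, but the parse-then-verify shape is the same:

```javascript
// Illustrative sketch of intent parsing + post-execution verification.
// The regex is a hypothetical example, not one of the shim's actual patterns.
import { writeFileSync, readFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

function parseCreateIntent(text) {
  const m = text.match(/create a file called (\S+) with the content "([^"]*)"/i);
  return m ? { op: "create", path: m[1], content: m[2] } : null;
}

function createAndVerify(path, content) {
  writeFileSync(path, content);
  // Post-execution verification: read back and compare exactly.
  if (readFileSync(path, "utf8") !== content) {
    throw new Error("verification failed: create"); // never silently claim success
  }
  return true;
}

const intent = parseCreateIntent('create a file called demo.txt with the content "hello"');
if (intent) {
  createAndVerify(join(tmpdir(), intent.path), intent.content);
}
```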
QwenCode handles upstream failures gracefully:
| Failure | What happens |
|---|---|
| Ollama is unreachable | Returns a 502 error with the upstream error message |
| Ollama request times out | Returns a 504 with timeout details (default: 120s, configurable) |
| Model returns malformed JSON | Returns a 502 "invalid upstream response" error |
| Streaming is interrupted mid-response | Cancels the upstream request, closes the SSE stream cleanly |
| Model ignores tool calls entirely | Falls back to text-only response (no crash, no loop) |
| Model's tool call has malformed arguments | Coerces arguments to a valid object; proceeds rather than crashing |
| Synthetic fallback verification fails | Returns the verification error; Claude Code sees a failed tool result and can retry |
| Client disconnects mid-stream | Upstream request is cancelled, resources are cleaned up |
| Request body exceeds 50MB | Returns 413 immediately |
The shim never retries automatically. If something fails, it surfaces the error clearly and lets Claude Code decide what to do next.
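The table above implies a small mapping from failure kind to HTTP status. The sketch below is an assumed mapping mirroring that table; the error shapes (`err.code`, `err.name`, `err.tooLarge`) are illustrative, not the shim's actual internals:

```javascript
// Sketch: translate an upstream failure into the status code returned to
// Claude Code, following the error-handling table. Illustrative only.
function statusForUpstreamFailure(err) {
  if (err.code === "ECONNREFUSED") return 502; // Ollama unreachable
  if (err.name === "TimeoutError") return 504; // upstream request timed out
  if (err.name === "SyntaxError") return 502;  // malformed upstream JSON
  if (err.tooLarge) return 413;                // request body over the limit
  return 502;                                  // any other upstream failure
}
```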
The shim supports:
- `POST /v1/messages` (streaming and non-streaming)
- `GET /v1/models`
- `GET /health`
- `system` + `messages` array
- `text`, `tool_use`, and `tool_result` content blocks
- Usage fields and stop reasons
- Ollama tool calls mapped back to `tool_use` blocks
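For orientation, these are the content-block shapes involved, in the standard Messages format. The tool name and id below are made-up examples, not values the shim produces:

```javascript
// Example content-block shapes (standard Messages format).
// "Read" and "toolu_01" are hypothetical illustration values.
const assistantTurn = {
  role: "assistant",
  content: [
    { type: "text", text: "I'll read the file first." },
    { type: "tool_use", id: "toolu_01", name: "Read", input: { file_path: "package.json" } },
  ],
};

const toolResultTurn = {
  role: "user",
  content: [
    { type: "tool_result", tool_use_id: "toolu_01", content: '{"name":"demo"}' },
  ],
};

console.log(assistantTurn.content[1].name, toolResultTurn.content[0].tool_use_id);
```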
- Create file / overwrite file / read file
- Append text / replace exact string
- Insert text before or after an exact string
- Rename or move file / delete file
- Create directory / list directory
- Multi-file scaffold from a fenced bash script
- Upstream request timeout handling (configurable)
- Malformed upstream JSON handling
- Interrupted-stream cancellation with cleanup
- Rate-limit error mapping
- Binary file detection for reads
- Large-file truncation (configurable)
- Client disconnect detection and upstream cancellation
- Continuation-turn handling
- Large-context conversation handling
QwenCode is strongest when the request is explicit and operationally narrow.
Works well:
- "create a file called `x` with the text `y`"
- "append `y` to `x`"
- "replace the exact string `a` with `b` in `x`"
- "insert `y` after `x` in file `z`"
- "create a Python script and a README for a demo app"
- "read `package.json` and summarize it"
- "what files are in this directory?"
Still weaker:
- Broad open-ended refactors across many files
- Complicated multi-file architectural edits from vague prose
- Tasks where the model must choose many precise tool calls without rescue
| Variable | Default | Description |
|---|---|---|
| `HOST` | `127.0.0.1` | Bind host |
| `PORT` | `8000` | Bind port |
| `OLLAMA_BASE_URL` | `http://127.0.0.1:11434` | Ollama endpoint |
| `OLLAMA_MODEL` | `qwen2.5-coder:14b` | Model name |
| `OLLAMA_AUTH_TOKEN` | | Optional bearer token for Ollama gateways |
| `OLLAMA_NUM_CTX` | | Context window override; strongly recommended (see Context Window Handling) |
| `SHIM_MAX_TOOLS` | `8` | Max tools forwarded to the model |
| `SHIM_API_KEY` | | Optional shared secret |
| `SHIM_USE_REQUESTED_MODEL` | | Honor incoming `model` field instead of forcing `OLLAMA_MODEL` |
| `SHIM_LOG` | `info` | Set to `debug` for verbose logging |
| `SHIM_MAX_READ_BYTES` | `200000` | Max bytes for synthetic file reads |
| `SHIM_REQUEST_TIMEOUT_MS` | `120000` | Upstream request timeout |
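The defaults above follow the usual environment-with-fallback pattern. This is an illustrative sketch of that resolution, not the shim's actual code:

```javascript
// Sketch: resolve configuration from environment variables with the
// documented defaults. Illustrative; the real resolution is in src/server.mjs.
const config = {
  host: process.env.HOST || "127.0.0.1",
  port: Number(process.env.PORT) || 8000,
  ollamaBaseUrl: process.env.OLLAMA_BASE_URL || "http://127.0.0.1:11434",
  model: process.env.OLLAMA_MODEL || "qwen2.5-coder:14b",
  maxTools: Number(process.env.SHIM_MAX_TOOLS) || 8,
  maxReadBytes: Number(process.env.SHIM_MAX_READ_BYTES) || 200000,
  requestTimeoutMs: Number(process.env.SHIM_REQUEST_TIMEOUT_MS) || 120000,
  logLevel: process.env.SHIM_LOG || "info",
};

console.log(`shim config: ${config.host}:${config.port} -> ${config.ollamaBaseUrl}`);
```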
```
QwenCode/
├── src/
│   └── server.mjs            # The entire shim (single file, zero dependencies)
├── scripts/
│   ├── smoke-tests.mjs       # Happy-path integration tests
│   ├── error-smoke-tests.mjs # Error/reliability tests
│   └── mock-ollama.mjs       # Mock upstream for error tests
├── launch-shim.ps1           # Windows shim launcher
├── launch-shim.sh            # macOS/Linux shim launcher
├── launch-client.ps1         # Windows client launcher
├── launch-client.sh          # macOS/Linux client launcher
├── package.json
├── LICENSE                   # MIT
└── README.md
```
`npm run smoke` exercises the live shim against a real Ollama backend and validates: create, append, replace, multiline content preservation, empty-string replace, binary reads, large-file truncation, directory creation, nested file creation, unicode content, rename, streaming text, streaming tool use, and continuation-turn fallback behavior.
`npm run smoke:error` uses a mock Ollama server to simulate bad upstream behavior and validates: request timeouts, malformed JSON, malformed streaming output, interrupted streaming cancellation, rate limits, fenced multi-file scaffold rescue, and large-context continuation behavior.
Both test suites are self-contained and use a local `.tmp-smoke` folder.
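The mock-upstream idea behind `npm run smoke:error` can be sketched in a few lines: a throwaway HTTP server that deliberately returns malformed JSON, so the shim's 502 path can be exercised without touching a real Ollama. This is an illustration, not the contents of `scripts/mock-ollama.mjs`:

```javascript
// Sketch: a mock upstream that returns malformed JSON, for exercising
// the shim's "invalid upstream response" handling. Illustrative only.
import { createServer } from "node:http";

const mock = createServer((req, res) => {
  res.writeHead(200, { "content-type": "application/json" });
  res.end("{ this is not valid JSON"); // deliberately unparseable
});

mock.listen(0, "127.0.0.1", () => {
  const { port } = mock.address();
  console.log(`mock upstream listening on 127.0.0.1:${port}`);
  mock.close(); // shut down immediately; a real test would point the shim here
});
```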
This is not a full Anthropic API implementation. It supports the subset needed to make Claude Code work against a local Qwen backend.
- Native model tool-calling is still less reliable than Claude-hosted models
- Many successful flows depend on synthetic Bash fallback
- Broad open-ended editing is less trustworthy than constrained edits
- Multimodal content and provider-specific beta features are not supported
- Claude Code's system prompt is large, so local latency is noticeable even with a fast GPU
- Claude Code must be installed separately; QwenCode does not work without it
- `qwen2.5-coder:14b` requires approximately 9GB of VRAM; 8GB cards will fall back to CPU offload
- On Windows, synthetic Bash execution requires Git Bash or WSL in PATH
- Ollama defaults to a 2048-token context window; long sessions will silently truncate without setting `OLLAMA_NUM_CTX`
- Shim updates are reactive to Claude Code changes rather than proactively versioned alongside Claude Code releases; there is no committed SLA for how quickly a breaking Claude Code release will be addressed
- Bug reports and feature requests: GitHub Issues
- Questions: Open a Discussion or file an issue
MIT. See LICENSE.
Built by Strife Technologies