QwenCode

QwenCode is a local compatibility layer that lets Claude Code talk to an Ollama-hosted Qwen model instead of Anthropic's cloud API.

Claude Code is Anthropic's terminal-based AI coding agent. It reads your files, runs shell commands, and edits code through a structured tool-calling protocol. Normally it requires an Anthropic API key and sends your code to the cloud. QwenCode removes both of those requirements by routing the same interface through a local model.

Project Status

QwenCode is actively maintained; it is not a completed, maintenance-only project. New Claude Code releases can change the internal API contract in ways that break the translation layer, and the shim is updated reactively, on a best-effort basis, when that happens.

Current state: Tested and working against Claude Code versions available as of April 2026.

Maintenance model: There is no fixed release schedule. Updates are issued when Claude Code introduces breaking changes to its API contract or when reliability improvements are warranted. The test suite is designed to catch regressions introduced by upstream changes. Watch the GitHub repository for releases to stay current.

Long-term viability caveat: QwenCode depends on Claude Code as an external dependency. If Anthropic significantly changes Claude Code's internal protocol, the shim will need to follow. There is no committed SLA for how quickly a breaking upstream release will be addressed. If you depend on QwenCode in a time-sensitive environment, pin your Claude Code version and upgrade only after confirming shim compatibility.

Changelog

April 2026

  • Initial public release
  • Tested against Claude Code versions available as of April 2026
  • Supports POST /v1/messages (streaming and non-streaming), GET /v1/models, GET /health
  • Synthetic fallback layer covering: create, overwrite, append, replace, insert, rename, delete, create directory, list directory, multi-file scaffold
  • Post-execution verification for all synthetic file operations
  • Full error-path test suite covering timeouts, malformed JSON, interrupted streams, rate limits, and large-context continuation
  • Windows support via PowerShell launchers (requires Git Bash or WSL for synthetic Bash execution)
  • Zero Node.js dependencies

Releases are tagged on the GitHub repository. If you are evaluating whether QwenCode is current relative to a recent Claude Code release, the release tags are the authoritative reference.

Why This Exists

Claude Code expects Messages-style request and streaming semantics. Local models exposed through Ollama speak a different protocol and often fail to emit reliable native tool calls under Claude Code's tool-heavy prompt.

QwenCode bridges that gap by:

  • exposing a subset of a Messages-style API on localhost
  • translating Claude Code requests into Ollama /api/chat calls
  • translating Ollama responses back into compatible responses and SSE events
  • rescuing common coding workflows with verified synthetic Bash tool calls when the model describes an action instead of calling a tool

The result: you keep Claude Code's full interface (file editing, bash commands, tool orchestration) running entirely on your machine with no API key and no per-token cost.
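
As a concrete sketch, a minimal request against the shim's POST /v1/messages endpoint could look like this. The field names follow the Messages-style schema the shim mirrors; the model value and prompt are placeholders, not captured shim traffic:

```shell
# Build a minimal Messages-style request body (placeholder model and prompt).
cat > /tmp/qwencode-req.json <<'EOF'
{
  "model": "qwen2.5-coder:14b",
  "max_tokens": 1024,
  "system": "You are a coding assistant.",
  "messages": [
    { "role": "user", "content": "Create hello.py that prints hello" }
  ]
}
EOF

# With the shim running on its defaults, the request would be sent as:
#   curl -s http://127.0.0.1:8000/v1/messages \
#        -H 'content-type: application/json' \
#        -d @/tmp/qwencode-req.json

# Sanity-check the body parses as JSON.
python3 -m json.tool /tmp/qwencode-req.json > /dev/null && echo "request body is valid JSON"
```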

How QwenCode Compares to Aider and Continue

If you are evaluating local terminal coding agents, you have likely also looked at Aider and Continue. Here is how they differ from QwenCode:

Aider is its own standalone agent with its own interface, edit formats, and model abstraction layer. It supports many local and cloud models directly. If you want a fully self-contained agent that does not depend on Claude Code, Aider is a reasonable choice. QwenCode is not trying to replace Aider — it is a different bet: you get Claude Code's specific tool-calling loop and interface, running against a local model instead.

Continue is primarily an IDE extension (VS Code, JetBrains) rather than a terminal agent. It integrates local models into your editor's chat and autocomplete. If you prefer staying inside your editor, Continue covers that use case. QwenCode is terminal-first and does not require an IDE.

QwenCode's specific tradeoff: You are preserving Claude Code's exact agent loop — its tool orchestration, file-editing protocol, and bash execution flow — while replacing only the backend. This means:

  • You get Claude Code's structured tool use rather than a different agent's edit conventions
  • The synthetic fallback layer means more of Claude Code's workflows succeed even when the local model does not emit a native tool call
  • Post-execution verification means file operations are confirmed correct rather than silently assumed
  • There is no separate CLI to learn; if you already use Claude Code, the interface is identical

The honest limitation: Claude Code is the dependency. QwenCode does not work without it, and local model tool-calling is still less reliable than Claude-hosted models. Aider's own model abstraction may be more forgiving of model-specific quirks if you plan to switch models frequently. QwenCode is optimized specifically for Qwen models running through Ollama under Claude Code's protocol.

Platform Support

QwenCode runs on Windows, macOS, and Linux. The shim itself is plain Node.js with zero dependencies.

| Platform | Shim launcher | Client launcher |
| --- | --- | --- |
| Windows | launch-shim.ps1 | launch-client.ps1 |
| macOS | launch-shim.sh | launch-client.sh |
| Linux | launch-shim.sh | launch-client.sh |

On macOS/Linux, make the scripts executable after cloning:

chmod +x launch-shim.sh launch-client.sh

Windows note: Synthetic Bash tool execution on Windows requires a working bash executable in PATH. Git Bash or WSL both satisfy this requirement. This is easy to miss — if Claude Code's bash operations fail on Windows, this is the first thing to check. See Runtime Requirements below.
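
A quick preflight check for this requirement, from Git Bash, WSL, or any POSIX shell (from PowerShell, `where.exe bash` is the equivalent check):

```shell
# Confirm a bash executable is visible on PATH before launching the shim.
# Git Bash and WSL both provide one.
command -v bash && bash --version | head -n 1
```

If `command -v bash` prints nothing, install Git Bash or WSL before expecting synthetic Bash execution to work.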

Runtime Requirements

  • Node.js 18+
  • Ollama running locally (or on a reachable host)
  • A Qwen-compatible model pulled in Ollama
  • Claude Code installed and runnable as claude
  • On Windows, a working bash executable in PATH for synthetic Bash tool execution (Git Bash or WSL work)

Recommended model: qwen2.5-coder:14b (default in the launchers). This performed better for constrained coding tasks than larger general-purpose models in testing.

GPU memory requirement: qwen2.5-coder:14b requires approximately 9GB of VRAM to run fully on-GPU. Cards with 8GB of VRAM (such as the RTX 3070 8GB or RTX 4060 8GB) will not fit the model entirely in GPU memory and will fall back to partial CPU offload, which substantially reduces throughput. If you have an 8GB card, consider qwen2.5-coder:7b as an alternative, or accept CPU-offloaded performance. See the Performance Expectations table for context.

Quick Start

1. Pull the model

ollama pull qwen2.5-coder:14b

2. Clone and install

git clone https://github.com/strifero/QwenCode.git
cd QwenCode

No npm install needed. Zero dependencies.

3. Start the shim

macOS / Linux:

./launch-shim.sh

Windows (PowerShell):

.\launch-shim.ps1

4. Launch Claude Code against it

In a second terminal:

macOS / Linux:

./launch-client.sh

Windows (PowerShell):

.\launch-client.ps1

You can pass Claude Code arguments through:

./launch-client.sh -- --dangerously-skip-permissions

Performance Expectations

QwenCode adds minimal overhead. The shim itself is a thin translation layer. What you will notice is that local model inference is slower than the Anthropic API.

Rough expectations with qwen2.5-coder:14b:

| Hardware | First token | Throughput | Practical feel |
| --- | --- | --- | --- |
| RTX 3060 (12GB) | 2-4s | ~25 tok/s | Usable for focused tasks |
| RTX 4070 (12GB) | 1-3s | ~35 tok/s | Comfortable daily driver |
| RTX 4090 (24GB) | <1s | ~50 tok/s | Near-cloud feel |
| M2 Pro (16GB) | 2-5s | ~20 tok/s | Usable, patience helps |
| CPU only | 10-30s | ~3 tok/s | Not recommended |

The benchmarks above are from macOS and Linux systems. Windows performance with the same hardware is expected to be similar when the model runs fully on GPU, but has not been independently benchmarked. If you are evaluating QwenCode on Windows for long-running use, treat the figures above as directional rather than exact.

Cards with less than 9GB of VRAM will not appear in this table because qwen2.5-coder:14b requires approximately 9GB of VRAM for full GPU inference. Running with CPU offload on such cards will produce throughput closer to the CPU-only row.

The main latency you will feel is on the first response of each turn, because Claude Code's system prompt is large and the model must process it. Subsequent exchanges in the same session are faster.

For comparison, native Claude Code against the Anthropic API typically responds in under 2 seconds with throughput limited only by network speed.

Claude Code Compatibility

QwenCode is built and tested against Claude Code CLI (the claude command).

Tested with: Claude Code versions available as of April 2026. The shim translates the Messages API format and tool-calling protocol, which have been stable across Claude Code releases.

What could break: If Anthropic changes Claude Code's internal API contract (new required fields, new tool formats, new streaming event types), the shim may need updates before things work again. The test suite is designed to catch these regressions.

Maintenance cadence: As described under Project Status, there is no fixed release schedule and no committed SLA; fixes for breaking Claude Code releases are issued reactively, on a best-effort basis. For long-term or production use, pin your Claude Code version, upgrade only after confirming shim compatibility, and watch the GitHub repository for releases to stay current.

What is not supported:

  • Multimodal content (image uploads)
  • Provider-specific beta features
  • Extended thinking / streaming thinking blocks
  • MCP server passthrough from Claude Code

Context Window Handling

Ollama defaults to a context window of 2048 tokens for most models, which is much smaller than what Claude Code expects for a typical session. Claude Code's system prompt alone is large, and a multi-turn session accumulates conversation history quickly. If you run QwenCode without adjusting this, Ollama will silently truncate the conversation once it exceeds 2048 tokens. The model will still respond, but it will have lost earlier context — edits may become inconsistent, the model may forget files it has already read, and tool-calling reliability will degrade without any obvious error.

QwenCode exposes the OLLAMA_NUM_CTX environment variable to set the context window size passed to Ollama on every request. The launch scripts do not set a default value for this variable, so Ollama's own default applies unless you override it.

Recommended approach: Set OLLAMA_NUM_CTX explicitly before starting the shim. A value of 8192 is a reasonable starting point for most Claude Code sessions. Higher values consume more VRAM; the right number depends on your available GPU memory and session length.

# macOS / Linux
OLLAMA_NUM_CTX=8192 ./launch-shim.sh

# Windows (PowerShell)
$env:OLLAMA_NUM_CTX = "8192"
.\launch-shim.ps1

If you notice response quality degrading mid-session — especially on tasks that reference earlier conversation turns — context truncation is the first thing to investigate. Enable SHIM_LOG=debug to see what context size is being sent to Ollama on each request.
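
For reference, num_ctx is Ollama's per-request context-size option on /api/chat, and conceptually the shim forwards your override there on every request. The payload below is an illustrative sketch of that shape, not captured shim output:

```shell
# Illustrative /api/chat payload fragment. "options.num_ctx" is Ollama's
# documented per-request context-size option; the rest is a minimal sketch.
cat > /tmp/qwencode-chat-req.json <<'EOF'
{
  "model": "qwen2.5-coder:14b",
  "messages": [{ "role": "user", "content": "hello" }],
  "options": { "num_ctx": 8192 },
  "stream": true
}
EOF

# With Ollama running locally, this could be sent as:
#   curl -s http://127.0.0.1:11434/api/chat -d @/tmp/qwencode-chat-req.json

python3 -m json.tool /tmp/qwencode-chat-req.json > /dev/null && echo "payload is valid JSON"
```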

How Synthetic Fallback Works

The most common failure mode with local models is: Claude Code asks the model to call a tool, and instead the model writes out what it would do in plain text. For example, instead of calling the Write tool, the model says "I'll create a file called app.py with the following content..."

QwenCode handles this with a synthetic fallback layer:

  1. Intent parsing. When the model returns text instead of a tool call, the shim parses the text against a set of known intent patterns (create file, read file, append, replace, insert, delete, rename, create directory, list directory, multi-file scaffold).

  2. Command synthesis. If a pattern matches, the shim builds the equivalent bash command using perl for content injection and verification. Content is passed through environment variables, never interpolated into the command string.

  3. Post-execution verification. After every synthetic file operation, the shim runs a verification step:

    • Create/write: Reads the file back and compares byte-for-byte against expected content
    • Append: Verifies the appended text appears in the file
    • Replace: Verifies the new string is present and the old string is gone
    • Insert: Verifies the combined anchor+insertion text appears in the file
    • Delete: Verifies the file no longer exists
    • Rename: Verifies the source is gone and the destination exists
  4. Failure on mismatch. If verification fails, the shim returns a non-zero exit code with a specific error (e.g., verification failed: create, exit 91). It never silently claims success.

When Things Go Wrong

QwenCode handles upstream failures gracefully:

| Failure | What happens |
| --- | --- |
| Ollama is unreachable | Returns a 502 error with the upstream error message |
| Ollama request times out | Returns a 504 with timeout details (default: 120s, configurable) |
| Model returns malformed JSON | Returns a 502 "invalid upstream response" error |
| Streaming is interrupted mid-response | Cancels the upstream request, closes the SSE stream cleanly |
| Model ignores tool calls entirely | Falls back to a text-only response (no crash, no loop) |
| Model's tool call has malformed arguments | Coerces arguments to a valid object and proceeds rather than crashing |
| Synthetic fallback verification fails | Returns the verification error; Claude Code sees a failed tool result and can retry |
| Client disconnects mid-stream | Upstream request is cancelled and resources are cleaned up |
| Request body exceeds 50MB | Returns 413 immediately |

The shim never retries automatically. If something fails, it surfaces the error clearly and lets Claude Code decide what to do next.

Core Capabilities

API compatibility

The shim supports:

  • POST /v1/messages (streaming and non-streaming)
  • GET /v1/models
  • GET /health

Message translation

  • system + messages array
  • text, tool_use, and tool_result content blocks
  • Usage fields and stop reasons
  • Ollama tool calls mapped back to tool_use blocks
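
As an illustration of the last mapping, a native Ollama tool call would surface to the client as a Messages-style tool_use content block shaped like the following. The id and input values are placeholders, and the file_path/content fields follow Claude Code's Write tool convention rather than captured shim output:

```shell
# Placeholder example of a translated tool_use content block.
cat > /tmp/qwencode-tool-use.json <<'EOF'
{
  "type": "tool_use",
  "id": "toolu_example",
  "name": "Write",
  "input": { "file_path": "app.py", "content": "print(\"hello\")" }
}
EOF
python3 -m json.tool /tmp/qwencode-tool-use.json > /dev/null && echo "block is valid JSON"
```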

Synthetic fallback operations

  • Create file / overwrite file / read file
  • Append text / replace exact string
  • Insert text before or after an exact string
  • Rename or move file / delete file
  • Create directory / list directory
  • Multi-file scaffold from a fenced bash script

Reliability features

  • Upstream request timeout handling (configurable)
  • Malformed upstream JSON handling
  • Interrupted-stream cancellation with cleanup
  • Rate-limit error mapping
  • Binary file detection for reads
  • Large-file truncation (configurable)
  • Client disconnect detection and upstream cancellation
  • Continuation-turn handling
  • Large-context conversation handling

Supported Workflow Profile

QwenCode is strongest when the request is explicit and operationally narrow.

Works well:

  • "create a file called x with the text y"
  • "append y to x"
  • "replace the exact string a with b in x"
  • "insert y after x in file z"
  • "create a Python script and a README for a demo app"
  • "read package.json and summarize it"
  • "what files are in this directory?"

Still weaker:

  • Broad open-ended refactors across many files
  • Complicated multi-file architectural edits from vague prose
  • Tasks where the model must choose many precise tool calls without rescue

Configuration

| Variable | Default | Description |
| --- | --- | --- |
| HOST | 127.0.0.1 | Bind host |
| PORT | 8000 | Bind port |
| OLLAMA_BASE_URL | http://127.0.0.1:11434 | Ollama endpoint |
| OLLAMA_MODEL | qwen2.5-coder:14b | Model name |
| OLLAMA_AUTH_TOKEN | (unset) | Optional bearer token for Ollama gateways |
| OLLAMA_NUM_CTX | (unset) | Context window override; strongly recommended, see Context Window Handling |
| SHIM_MAX_TOOLS | 8 | Max tools forwarded to the model |
| SHIM_API_KEY | (unset) | Optional shared secret |
| SHIM_USE_REQUESTED_MODEL | (unset) | Honor the incoming model field instead of forcing OLLAMA_MODEL |
| SHIM_LOG | info | Set to debug for verbose logging |
| SHIM_MAX_READ_BYTES | 200000 | Max bytes for synthetic file reads |
| SHIM_REQUEST_TIMEOUT_MS | 120000 | Upstream request timeout |
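
The variables combine freely. For example, pointing the shim at a remote Ollama host with a shared secret and a longer timeout might look like this (the host and secret are placeholders):

```shell
# Placeholder values; adjust for your environment.
export OLLAMA_BASE_URL="http://gpu-box.local:11434"   # remote Ollama host (placeholder)
export SHIM_API_KEY="change-me"                       # shared secret (placeholder)
export SHIM_REQUEST_TIMEOUT_MS=300000                 # 5-minute upstream timeout
export OLLAMA_NUM_CTX=8192                            # see Context Window Handling
# then start the shim:
#   ./launch-shim.sh
```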

Project Layout

QwenCode/
├── src/
│   └── server.mjs          # The entire shim (single file, zero dependencies)
├── scripts/
│   ├── smoke-tests.mjs      # Happy-path integration tests
│   ├── error-smoke-tests.mjs # Error/reliability tests
│   └── mock-ollama.mjs      # Mock upstream for error tests
├── launch-shim.ps1          # Windows shim launcher
├── launch-shim.sh           # macOS/Linux shim launcher
├── launch-client.ps1        # Windows client launcher
├── launch-client.sh         # macOS/Linux client launcher
├── package.json
├── LICENSE                  # MIT
└── README.md

Tests

Happy-path smoke tests

npm run smoke exercises the live shim against a real Ollama backend and validates: create, append, replace, multiline content preservation, empty-string replace, binary reads, large-file truncation, directory creation, nested file creation, unicode content, rename, streaming text, streaming tool use, and continuation-turn fallback behavior.

Error-path smoke tests

npm run smoke:error uses a mock Ollama server to simulate bad upstream behavior and validates: request timeouts, malformed JSON, malformed streaming output, interrupted streaming cancellation, rate limits, fenced multi-file scaffold rescue, and large-context continuation behavior.

Both test suites are self-contained and use a local .tmp-smoke folder.

Limitations

This is not a full Anthropic API implementation. It supports the subset needed to make Claude Code work against a local Qwen backend.

  • Native model tool-calling is still less reliable than Claude-hosted models
  • Many successful flows depend on synthetic Bash fallback
  • Broad open-ended editing is less trustworthy than constrained edits
  • Multimodal content and provider-specific beta features are not supported
  • Claude Code's system prompt is large, so local latency is noticeable even with a fast GPU
  • Claude Code must be installed separately; QwenCode does not work without it
  • qwen2.5-coder:14b requires approximately 9GB of VRAM; 8GB cards will fall back to CPU offload
  • On Windows, synthetic Bash execution requires Git Bash or WSL in PATH
  • Ollama defaults to a 2048-token context window; long sessions will silently truncate without setting OLLAMA_NUM_CTX
  • Shim updates are reactive to Claude Code changes rather than proactively versioned alongside Claude Code releases; there is no committed SLA for how quickly a breaking Claude Code release will be addressed


License

MIT. See LICENSE.


Built by Strife Technologies
