
Claude Code Proxy

Run any Cloudflare Workers AI model in Claude Code — Nemotron 120B, Gemma 4, Llama 3.3, and more.

A lightweight Cloudflare Worker that translates between Anthropic's Messages API (what Claude Code speaks) and OpenAI's Chat Completions API (what Workers AI speaks). Supports streaming, tool calling, and thinking/reasoning tokens.

How it works

Claude Code CLI (Anthropic format)
  → This proxy (translates Anthropic ↔ OpenAI)
    → Cloudflare AI Gateway (routing, caching, analytics)
      → Workers AI (Nemotron, Gemma, Llama, etc.)

Quick start

1. Clone and configure

git clone https://github.com/quantum-encoding/claude-code-proxy.git
cd claude-code-proxy
npm install

Edit src/models.ts to add/remove models.

2. Set up Cloudflare

You need:

  • A Cloudflare account with Workers AI enabled
  • An AI Gateway created in the Cloudflare dashboard (its URL goes into the CF_AI_GATEWAY_URL secret)
  • An AI Gateway authentication token (the CF_AIG_TOKEN secret)

3. Deploy

# Set your secrets
echo "YOUR_GATEWAY_URL" | npx wrangler secret put CF_AI_GATEWAY_URL
echo "YOUR_AIG_TOKEN" | npx wrangler secret put CF_AIG_TOKEN
echo "YOUR_PROXY_PASSWORD" | npx wrangler secret put PROXY_AUTH_TOKEN

# Deploy
npx wrangler deploy

The gateway URL format is: https://gateway.ai.cloudflare.com/v1/{ACCOUNT_ID}/{GATEWAY_NAME}

4. Create a wrapper script

cp run-model.sh.example ~/.local/bin/run-nemo
chmod +x ~/.local/bin/run-nemo

Edit the script and fill in your proxy URL and auth token. Then:

run-nemo
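
If you don't have run-model.sh.example handy, a minimal wrapper along these lines should work. The URL, token, and model alias are placeholders you must replace, and it assumes Claude Code honors the ANTHROPIC_BASE_URL, ANTHROPIC_AUTH_TOKEN, and ANTHROPIC_MODEL environment variables:

```shell
#!/usr/bin/env bash
# Placeholder values: point these at your own deployment.
export ANTHROPIC_BASE_URL="https://your-proxy.example.workers.dev"
export ANTHROPIC_AUTH_TOKEN="YOUR_PROXY_PASSWORD"   # must match PROXY_AUTH_TOKEN
export ANTHROPIC_MODEL="nemotron"                   # a key from MODEL_MAP in src/models.ts
exec claude "$@"
```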

Supported features

Feature                  Status
Text generation          Working
Streaming (SSE)          Working
Tool calling             Working
Thinking/reasoning       Working (as Anthropic thinking blocks)
System prompts           Working
Multi-turn conversation  Working
Vision/images            Not supported (Workers AI limitation)

Model configuration

Edit src/models.ts:

export const MODEL_MAP: Record<string, string> = {
  'nemotron': 'workers-ai/@cf/nvidia/nemotron-3-120b-a12b',
  'gemma4': 'workers-ai/@cf/google/gemma-4-26b-a4b-it',
  'llama': 'workers-ai/@cf/meta/llama-3.3-70b-instruct-fp8-fast',
  // Add your own:
  // 'my-model': 'workers-ai/@cf/provider/model-name',
};

Then redeploy: npx wrangler deploy
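
The left-hand keys are short aliases; the proxy resolves them to full Workers AI model IDs. A minimal sketch of that lookup (the proxy's actual resolution code may differ, and the pass-through behavior for unmapped names is an assumption):

```typescript
// Sketch of alias resolution; MODEL_MAP mirrors src/models.ts.
const MODEL_MAP: Record<string, string> = {
  nemotron: "workers-ai/@cf/nvidia/nemotron-3-120b-a12b",
  gemma4: "workers-ai/@cf/google/gemma-4-26b-a4b-it",
};

function resolveModel(name: string): string {
  // Assumed behavior: unknown names pass through unchanged,
  // so full model IDs keep working alongside aliases.
  return MODEL_MAP[name] ?? name;
}
```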

Pricing

The proxy itself runs on Cloudflare Workers free tier (100k requests/day). You only pay for Workers AI inference:

Model            Input           Output
Nemotron 3 120B  $0.50/M tokens  $1.50/M tokens
Gemma 4 26B      $0.10/M tokens  $0.30/M tokens
Llama 3.3 70B    $0.20/M tokens  $0.60/M tokens
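
For a sense of scale, a worked example with illustrative (made-up) token counts:

```typescript
// Cost of one Nemotron 3 120B request at the per-million-token rates above.
const inputTokens = 10_000;   // illustrative
const outputTokens = 2_000;   // illustrative
const costUSD =
  (inputTokens / 1_000_000) * 0.5 +   // input: $0.50/M tokens
  (outputTokens / 1_000_000) * 1.5;   // output: $1.50/M tokens
console.log(costUSD.toFixed(3)); // 0.008
```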

How the translation works

The proxy handles two critical translations:

Request: Anthropic → OpenAI

  • system prompt → messages[0].role: "system"
  • tool_choice.type: "any" → "auto" (some models don't support "required")
  • input_schema → parameters
  • Content blocks (text, tool_use, tool_result) → flat messages
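
The request-side mapping above can be sketched as follows. This is a simplified illustration, not the proxy's actual source: the function name, the narrowed types, and the handling of only text and tool_result blocks are assumptions made to keep it short:

```typescript
type AnthropicBlock =
  | { type: "text"; text: string }
  | { type: "tool_result"; tool_use_id: string; content: string };

interface AnthropicRequest {
  model: string;
  system?: string;
  messages: { role: "user" | "assistant"; content: string | AnthropicBlock[] }[];
  tools?: { name: string; description?: string; input_schema: object }[];
  tool_choice?: { type: "auto" | "any" | "tool"; name?: string };
}

function toOpenAIRequest(req: AnthropicRequest) {
  const messages: { role: string; content: string }[] = [];
  // system prompt -> leading system message
  if (req.system) messages.push({ role: "system", content: req.system });
  // content blocks -> flat messages
  for (const m of req.messages) {
    const content =
      typeof m.content === "string"
        ? m.content
        : m.content.map((b) => (b.type === "text" ? b.text : b.content)).join("\n");
    messages.push({ role: m.role, content });
  }
  return {
    model: req.model,
    messages,
    // input_schema -> parameters
    tools: req.tools?.map((t) => ({
      type: "function",
      function: { name: t.name, description: t.description, parameters: t.input_schema },
    })),
    // "any" -> "auto", since some models reject "required"
    tool_choice: req.tool_choice?.type === "any" ? "auto" : req.tool_choice?.type,
  };
}
```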

Response: OpenAI → Anthropic (streaming)

  • delta.content → content_block_delta with text_delta
  • delta.reasoning → content_block_delta with thinking_delta
  • delta.tool_calls → buffered, then emitted as content_block_start + input_json_delta
  • finish_reason: "tool_calls" → stop_reason: "tool_use"

The streaming translation uses a TransformStream to pipe events in real-time.
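
That idea can be sketched like this. The real proxy parses raw SSE bytes; to stay short, this sketch works on already-parsed OpenAI deltas, covers only the text and reasoning cases, and uses illustrative function names rather than the proxy's actual exports:

```typescript
interface OpenAIDelta {
  content?: string;
  reasoning?: string;
}

// Pure mapping: one OpenAI delta -> zero or more Anthropic SSE events.
function deltaToEvents(delta: OpenAIDelta): string[] {
  const events: string[] = [];
  const emit = (d: object) =>
    events.push(
      `event: content_block_delta\ndata: ${JSON.stringify({ type: "content_block_delta", delta: d })}\n\n`,
    );
  // delta.reasoning -> thinking_delta
  if (delta.reasoning) emit({ type: "thinking_delta", thinking: delta.reasoning });
  // delta.content -> text_delta
  if (delta.content) emit({ type: "text_delta", text: delta.content });
  return events;
}

// Wired into a TransformStream so events flow to the client as they arrive.
function anthropicDeltaStream(): TransformStream<OpenAIDelta, string> {
  return new TransformStream({
    transform(delta, controller) {
      for (const e of deltaToEvents(delta)) controller.enqueue(e);
    },
  });
}
```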

Known quirks

  • Nemotron uses reasoning for all output even with enable_thinking: false. The proxy maps this to Anthropic thinking blocks.
  • tool_choice: "required" crashes Nemotron. The proxy maps Anthropic's "any" to "auto" instead.
  • No vision support. Workers AI models don't support image inputs through the gateway.

License

MIT
