Run any Cloudflare Workers AI model in Claude Code — Nemotron 120B, Gemma 4, Llama 3.3, and more.
A lightweight Cloudflare Worker that translates between Anthropic's Messages API (what Claude Code speaks) and OpenAI's Chat Completions API (what Workers AI speaks). Supports streaming, tool calling, and thinking/reasoning tokens.
Claude Code CLI (Anthropic format)
→ This proxy (translates Anthropic ↔ OpenAI)
→ Cloudflare AI Gateway (routing, caching, analytics)
→ Workers AI (Nemotron, Gemma, Llama, etc.)
```shell
git clone https://github.com/quantum-encoding/claude-code-proxy.git
cd claude-code-proxy
npm install
```

Edit `src/models.ts` to add/remove models.
You need:
- A Cloudflare account
- An AI Gateway (free, takes 30 seconds)
- An AI Gateway API token with "Run" permission
```shell
# Set your secrets
echo "YOUR_GATEWAY_URL" | npx wrangler secret put CF_AI_GATEWAY_URL
echo "YOUR_AIG_TOKEN" | npx wrangler secret put CF_AIG_TOKEN
echo "YOUR_PROXY_PASSWORD" | npx wrangler secret put PROXY_AUTH_TOKEN

# Deploy
npx wrangler deploy
```

The gateway URL format is: `https://gateway.ai.cloudflare.com/v1/{ACCOUNT_ID}/{GATEWAY_NAME}`
```shell
cp run-model.sh.example ~/.local/bin/run-nemo
chmod +x ~/.local/bin/run-nemo
```

Edit the script and fill in your proxy URL and auth token. Then:

```shell
run-nemo
```

| Feature | Status |
|---|---|
| Text generation | Working |
| Streaming (SSE) | Working |
| Tool calling | Working |
| Thinking/reasoning | Working (as Anthropic thinking blocks) |
| System prompts | Working |
| Multi-turn conversation | Working |
| Vision/images | Not supported (Workers AI limitation) |
Edit `src/models.ts`:

```typescript
export const MODEL_MAP: Record<string, string> = {
  'nemotron': 'workers-ai/@cf/nvidia/nemotron-3-120b-a12b',
  'gemma4': 'workers-ai/@cf/google/gemma-4-26b-a4b-it',
  'llama': 'workers-ai/@cf/meta/llama-3.3-70b-instruct-fp8-fast',
  // Add your own:
  // 'my-model': 'workers-ai/@cf/provider/model-name',
};
```

Then redeploy: `npx wrangler deploy`
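As a sketch of how the alias lookup could be used (`resolveModel` is a hypothetical helper, not necessarily the repo's actual function): map the short name Claude Code sends to the full Workers AI model ID, and pass unknown names through so fully-qualified IDs still work.

```typescript
// Hypothetical alias-resolution helper (illustrative, not from the repo).
const MODEL_MAP: Record<string, string> = {
  'nemotron': 'workers-ai/@cf/nvidia/nemotron-3-120b-a12b',
  'gemma4': 'workers-ai/@cf/google/gemma-4-26b-a4b-it',
};

function resolveModel(requested: string): string {
  // Exact alias match first; otherwise pass the ID through unchanged.
  return MODEL_MAP[requested] ?? requested;
}
```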
The proxy itself runs on Cloudflare Workers free tier (100k requests/day). You only pay for Workers AI inference:
| Model | Input | Output |
|---|---|---|
| Nemotron 3 120B | $0.50/M tokens | $1.50/M tokens |
| Gemma 4 26B | $0.10/M tokens | $0.30/M tokens |
| Llama 3.3 70B | $0.20/M tokens | $0.60/M tokens |
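For a back-of-envelope estimate from the per-million-token rates above (a sketch; `costUSD` is an illustrative helper, not part of the proxy):

```typescript
// Estimate a single call's cost from token counts and $/M-token rates.
function costUSD(
  inputTokens: number,
  outputTokens: number,
  inPerM: number,
  outPerM: number,
): number {
  return (inputTokens / 1e6) * inPerM + (outputTokens / 1e6) * outPerM;
}

// e.g. a 10k-input / 2k-output Nemotron call:
// costUSD(10_000, 2_000, 0.50, 1.50) → $0.008
```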
The proxy handles two critical translations:
**Request (Anthropic → OpenAI):**

- `system` prompt → `messages[0]` with `role: "system"`
- `tool_choice.type: "any"` → `"auto"` (some models don't support `"required"`)
- `input_schema` → `parameters`
- Content blocks (text, tool_use, tool_result) → flat messages
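A minimal sketch of that request translation, with simplified types (the real proxy also flattens tool_use/tool_result blocks; `toOpenAI` is an illustrative name):

```typescript
// Illustrative Anthropic → OpenAI request translation (simplified).
type AnthropicRequest = {
  system?: string;
  messages: { role: 'user' | 'assistant'; content: string }[];
  tool_choice?: { type: 'auto' | 'any' | 'tool' };
};

type OpenAIMessage = { role: 'system' | 'user' | 'assistant'; content: string };

function toOpenAI(req: AnthropicRequest) {
  const messages: OpenAIMessage[] = [];
  // The system prompt becomes the first chat message.
  if (req.system) messages.push({ role: 'system', content: req.system });
  messages.push(...req.messages);
  // "any" maps to "auto" because some models reject "required".
  const tool_choice =
    req.tool_choice?.type === 'any' ? 'auto' : req.tool_choice?.type;
  return { messages, tool_choice };
}
```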
**Streaming (OpenAI → Anthropic):**

- `delta.content` → `content_block_delta` with `text_delta`
- `delta.reasoning` → `content_block_delta` with `thinking_delta`
- `delta.tool_calls` → buffered, then emitted as `content_block_start` + `input_json_delta`
- `finish_reason: "tool_calls"` → `stop_reason: "tool_use"`
The streaming translation uses a TransformStream to pipe events in real-time.
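The per-delta mapping can be sketched like this (assumed shapes, omitting the SSE framing, tool-call buffering, and TransformStream plumbing; `toAnthropicEvent` is a hypothetical name):

```typescript
// Illustrative mapping of one OpenAI streaming delta to an Anthropic event.
type OpenAIDelta = { content?: string; reasoning?: string };

function toAnthropicEvent(delta: OpenAIDelta, index = 0) {
  if (delta.reasoning !== undefined) {
    // Reasoning tokens surface as Anthropic thinking blocks.
    return {
      type: 'content_block_delta',
      index,
      delta: { type: 'thinking_delta', thinking: delta.reasoning },
    };
  }
  if (delta.content !== undefined) {
    return {
      type: 'content_block_delta',
      index,
      delta: { type: 'text_delta', text: delta.content },
    };
  }
  return null; // e.g. role-only or empty keep-alive deltas
}
```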
- Nemotron uses `reasoning` for all output even with `enable_thinking: false`. The proxy maps this to Anthropic thinking blocks.
- `tool_choice: "required"` crashes Nemotron. The proxy maps Anthropic's `"any"` to `"auto"` instead.
- No vision support. Workers AI models don't support image inputs through the gateway.
MIT