This repository contains a Nebius-focused Claude API proxy plus bundled MCP servers for local tool integration.
The proxy accepts Claude-compatible requests from Claude Code, translates them into OpenAI-compatible requests for Nebius-backed models, and converts the responses back into Claude format. The bundled MCP servers live under `MCP/`.
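As a rough illustration of that translation, here is a minimal sketch of the Claude-to-OpenAI direction (illustrative only; the real converter also handles tools, images, streaming, and non-text content blocks):

```python
def claude_to_openai(claude_req, backend_model):
    """Minimal sketch of the Claude -> OpenAI request translation.

    Illustrative only: the proxy's actual converter is more complete.
    """
    messages = []
    # Claude carries the system prompt as a top-level field;
    # OpenAI expects it as the first chat message.
    if claude_req.get("system"):
        messages.append({"role": "system", "content": claude_req["system"]})
    for m in claude_req["messages"]:
        content = m["content"]
        # Claude content may be a list of typed blocks; flatten text blocks.
        if isinstance(content, list):
            content = "".join(b["text"] for b in content if b.get("type") == "text")
        messages.append({"role": m["role"], "content": content})
    return {
        "model": backend_model,
        "messages": messages,
        "max_tokens": claude_req.get("max_tokens", 4096),
    }
```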
- Repository Layout
- Features
- Quick Start
- MCP Support
- Testing
- Observability
- Development
- Documentation
- Scope
- License
## Repository Layout

```
claude-code-proxy/
├── src/              # Proxy implementation
├── tests/            # Automated tests
├── docs/             # Architecture and integration docs
├── MCP/              # Bundled MCP servers
├── scripts/          # Developer utilities
├── start_proxy.py    # Local convenience launcher
├── .mcp.json         # Project-level Claude Code MCP config
├── pyproject.toml    # Python package metadata
└── README.md
```
## Features

- Claude `/v1/messages` proxying to Nebius OpenAI-compatible endpoints
- Claude-to-OpenAI request conversion and OpenAI-to-Claude response conversion
- Streaming SSE support
- Schema-less Claude Code tool conversion
- Image-aware routing to a vision model
- Bundled MCP support with repo-relative launchers
- Deterministic prefix-cache discipline for vLLM/SGLang KV reuse on Nebius
- Anthropic-compatible `/v1/messages/count_tokens` (counts tools too)
- Pair-aware context auto-truncation (never orphans tool_results)
- Tool-call JSON repair (trailing commas, unescaped newlines) and duplicate tool-call dedup for open models — always on, no configuration needed
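The repair pass can be pictured with a minimal sketch (hypothetical code illustrating only the two fixes named above, not the proxy's actual implementation):

```python
import json
import re

def repair_tool_json(raw):
    """Hypothetical sketch of the tool-call JSON repair pass."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        # Drop trailing commas before a closing brace/bracket
        # (the \s* also eats raw newlines in that position).
        fixed = re.sub(r",\s*([}\]])", r"\1", raw)
        # Escape remaining literal newlines (assumed to sit inside strings).
        fixed = fixed.replace("\n", "\\n")
        return json.loads(fixed)
```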
## Quick Start

- Python 3.9+
- Claude Code
- Nebius API credentials
- `uv` (optional but recommended)
Using uv:

```bash
uv sync
```

Using pip:

```bash
python -m pip install -e ".[dev]"
```

Copy the example environment file:

```bash
cp .env.example .env
```

Required values:
```bash
OPENAI_API_KEY="your-nebius-api-key"
OPENAI_BASE_URL="https://api.tokenfactory.nebius.com/v1"
```

Common model settings:

```bash
BIG_MODEL="zai-org/GLM-4.7-FP8"
MIDDLE_MODEL="zai-org/GLM-4.7-FP8"
SMALL_MODEL="zai-org/GLM-4.7-FP8"
VISION_MODEL="Qwen/Qwen2.5-VL-72B-Instruct"
STRIP_IMAGE_CONTEXT="true"
```

Several Nebius-hosted models emit hidden reasoning tokens before producing
visible output. These tokens count against max_tokens, so very small budgets
can return empty content. Known reasoning-style models on Nebius:
- `moonshotai/Kimi-K2.5`
- `deepseek-ai/DeepSeek-V3.2`
- `zai-org/GLM-5`
- `Qwen/Qwen3-Next-80B-A3B-Thinking`
- `Qwen/Qwen3-235B-A22B-Thinking-2507-fast`
Implication: keep `MAX_TOKENS_LIMIT` and per-request `max_tokens` generous
(>=4096 is recommended; 16k+ is safer for agentic tool-use loops). If a
reasoning model returns empty text with a non-zero `output_tokens` count, the
budget was exhausted by reasoning before any visible output was produced.
Raise the limit and retry.
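That retry heuristic can be sketched as follows (a minimal illustration; `call_model` and the response dict shape are assumptions for the example, not this proxy's API):

```python
def complete_with_retry(call_model, messages, max_tokens=4096, ceiling=32768):
    """Retry with a doubled budget when hidden reasoning exhausts it.

    `call_model` is a hypothetical callable returning a dict with
    "text" and "output_tokens" keys -- adapt to your client library.
    """
    while True:
        resp = call_model(messages=messages, max_tokens=max_tokens)
        # Empty text + non-zero output_tokens: the budget was consumed
        # by hidden reasoning before any visible output appeared.
        exhausted = (not resp["text"]) and resp["output_tokens"] > 0
        if not exhausted or max_tokens >= ceiling:
            return resp
        max_tokens *= 2  # raise the limit and retry
```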
Verify model availability and pick alternatives with:

```bash
curl -s https://api.tokenfactory.nebius.com/v1/models \
  -H "Authorization: Bearer $OPENAI_API_KEY" | jq '.data[].id'
```

Start the proxy:

```bash
python start_proxy.py
```

Or:

```bash
uv run claude-code-proxy-nebius
```

Claude Code talks to the proxy via two environment variables:
`ANTHROPIC_BASE_URL` (where to send requests) and `ANTHROPIC_API_KEY`
(by default, the proxy ignores the client key and accepts any non-empty
string).
To wire this up permanently, add the following to your shell rc
(`~/.zshrc` or `~/.bashrc`), then open a new terminal:

```bash
export ANTHROPIC_BASE_URL=http://localhost:8083
export ANTHROPIC_API_KEY=claude-local
```

Or run as a one-off, prefixing the env vars on the command line:

```bash
ANTHROPIC_BASE_URL=http://localhost:8083 ANTHROPIC_API_KEY=claude-local claude
```

If `IGNORE_CLIENT_API_KEY=false`, the client key must match `ANTHROPIC_API_KEY`.
Claude Code displays the model it requested (e.g. `claude-sonnet-4-5`),
not the backend model the proxy actually served (e.g. `moonshotai/Kimi-K2.5`),
so by default there is no in-UI indicator that you're routed through this
proxy. A custom statusline fixes that. Add to `~/.claude/settings.json`:
```json
{
  "statusLine": {
    "type": "command",
    "command": "[ -z \"$ANTHROPIC_BASE_URL\" ] && exit 0; env_file=\"/path/to/claude-code-proxy/.env\"; model=$(grep -m1 '^BIG_MODEL=' \"$env_file\" 2>/dev/null | cut -d= -f2-); obs=$(grep -m1 '^OBSERVABILITY_ENABLED=' \"$env_file\" 2>/dev/null | cut -d= -f2-); port=$(grep -m1 '^PORT=' \"$env_file\" 2>/dev/null | cut -d= -f2-); port=${port:-8083}; if [ \"$obs\" = \"true\" ] && [ -n \"$model\" ]; then echo \"[nebius://$model] http://localhost:$port/dashboard\"; elif [ -n \"$model\" ]; then echo \"[nebius://$model]\"; else echo \"[proxy://$ANTHROPIC_BASE_URL]\"; fi"
  }
}
```

Replace `/path/to/claude-code-proxy/.env` with the absolute path to your
`.env`. Behavior:
- Bare `claude` (no proxy) → statusline is blank, no clutter.
- Proxy-routed + observability enabled → statusline shows e.g. `[nebius://MiniMax-M2.5] http://localhost:8083/dashboard`.
- Proxy-routed + observability disabled → statusline shows e.g. `[nebius://MiniMax-M2.5]`.
- If the `.env` path is unreadable → falls back to `[proxy://<ANTHROPIC_BASE_URL>]` so you still know an interceptor is active.
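The one-line shell command is dense; the same logic, restated as a Python sketch for readability (illustrative only, not part of the repo):

```python
def statusline(env_text, base_url):
    """Python restatement of the shell statusline command (illustrative)."""
    if not base_url:
        return ""  # bare `claude`, no proxy: stay blank
    vals = {}
    for line in env_text.splitlines():
        if "=" in line and not line.startswith("#"):
            key, _, value = line.partition("=")
            vals.setdefault(key, value)  # first match wins, like grep -m1
    model = vals.get("BIG_MODEL", "")
    port = vals.get("PORT") or "8083"
    if vals.get("OBSERVABILITY_ENABLED") == "true" and model:
        return f"[nebius://{model}] http://localhost:{port}/dashboard"
    if model:
        return f"[nebius://{model}]"
    return f"[proxy://{base_url}]"  # .env unreadable/empty: generic fallback
```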
The command is read at session start, so re-open Claude Code after editing
`settings.json`.
## MCP Support

Bundled MCP servers live under `MCP/`.

Current MCPs:

- `MCP/macoscontrol-mcp`: local macOS screen-control MCP

The project-level `.mcp.json` is checked in with repo-relative paths so the bundled MCP can be launched from a fresh clone without machine-specific absolute paths.
## Testing

Run the full suite:

```bash
pytest -q
```

Useful targeted runs:

```bash
pytest -q tests/test_request_converter.py tests/test_response_converter.py
pytest -q tests/test_image_routing.py
RUN_PROXY_INTEGRATION_TESTS=1 pytest -q tests/test_main.py
```

## Observability

The proxy serves a local dashboard at:

```
http://localhost:8083/dashboard
```

It tracks configured provider/model routing, token usage, estimated cost from
`MODEL_PRICES_JSON`, latency, failures, and tool calls. Docker Compose persists
the dashboard database under `./data/observability.sqlite3`.
## Development

Common commands:

```bash
uv run black src tests
uv run isort src tests
uv run mypy src
```

## Documentation

Tracked project documentation lives in `docs/`:
- docs/README.md
- docs/ARCHITECTURE.md
- docs/TOOL_CALL_FORMAT.md
- docs/MCP_SERVER_GUIDE.md
- docs/OBSERVABILITY.md
- docs/GLM_QUIRKS.md
- docs/BUGS_FIXED.md
- docs/BINARY_PACKAGING.md
## Scope

This project is designed and tested specifically for Nebius Token Factory infrastructure. The current proxy behavior, defaults, and troubleshooting guidance are Nebius-centric rather than provider-agnostic.
## License

MIT