# 🤖 Proxy

Use Claude Code CLI & VSCode for free. No Anthropic API key required.

License: MIT · Python 3.14 · uv · Tested with Pytest · Type checking: Ty · Code style: Ruff · Logging: Loguru

A lightweight proxy that routes Claude Code's Anthropic API calls to NVIDIA NIM (40 req/min free), OpenRouter (hundreds of models), LM Studio (fully local), or llama.cpp (local with Anthropic endpoints).

Quick Start · Providers · Discord Bot · Configuration · Development


*Free Claude Code in action: Claude Code running via NVIDIA NIM, completely free.*

## Features

| Feature | Description |
| --- | --- |
| Zero Cost | 40 req/min free on NVIDIA NIM, free models on OpenRouter, or fully local with LM Studio |
| Drop-in Replacement | Set two env vars; no modifications to the Claude Code CLI or VSCode extension needed |
| 4 Providers | NVIDIA NIM, OpenRouter (hundreds of models), LM Studio (local), llama.cpp (`llama-server`) |
| Per-Model Mapping | Route Opus / Sonnet / Haiku to different models and providers; mix providers freely |
| Thinking Token Support | Parses `<think>` tags and `reasoning_content` into native Claude thinking blocks |
| Heuristic Tool Parser | Models that emit tool calls as plain text are auto-parsed into structured tool use |
| Request Optimization | 5 categories of trivial API calls are intercepted locally, saving quota and latency |
| Smart Rate Limiting | Proactive rolling-window throttle, reactive 429 exponential backoff, optional concurrency cap |
| Discord / Telegram Bot | Remote autonomous coding with tree-based threading, session persistence, and live progress |
| Subagent Control | Task tool interception forces `run_in_background=False`; no runaway subagents |
| Extensible | Clean `BaseProvider` and `MessagingPlatform` ABCs; add new providers or platforms easily |

## Quick Start

### Prerequisites

1. Get an API key (or use LM Studio / llama.cpp locally).
2. Install Claude Code.
3. Install `uv` (or run `uv self update` if already installed).

### Clone & Configure

```bash
git clone https://github.com/raghavx03/proxy.git
cd proxy
cp .env.example .env
```

Choose your provider and edit `.env`:

**NVIDIA NIM (40 req/min free, recommended)**

```bash
NVIDIA_NIM_API_KEY="nvapi-your-key-here"

MODEL_OPUS="nvidia_nim/z-ai/glm4.7"
MODEL_SONNET="nvidia_nim/moonshotai/kimi-k2-thinking"
MODEL_HAIKU="nvidia_nim/stepfun-ai/step-3.5-flash"
MODEL="nvidia_nim/z-ai/glm4.7"                     # fallback
```

**OpenRouter (hundreds of models)**

```bash
OPENROUTER_API_KEY="sk-or-your-key-here"

MODEL_OPUS="open_router/deepseek/deepseek-r1-0528:free"
MODEL_SONNET="open_router/openai/gpt-oss-120b:free"
MODEL_HAIKU="open_router/stepfun/step-3.5-flash:free"
MODEL="open_router/stepfun/step-3.5-flash:free"     # fallback
```

**LM Studio (fully local, no API key)**

```bash
MODEL_OPUS="lmstudio/unsloth/MiniMax-M2.5-GGUF"
MODEL_SONNET="lmstudio/unsloth/Qwen3.5-35B-A3B-GGUF"
MODEL_HAIKU="lmstudio/unsloth/GLM-4.7-Flash-GGUF"
MODEL="lmstudio/unsloth/GLM-4.7-Flash-GGUF"         # fallback
```

**llama.cpp (fully local, no API key)**

```bash
LLAMACPP_BASE_URL="http://localhost:8080/v1"

MODEL_OPUS="llamacpp/local-model"
MODEL_SONNET="llamacpp/local-model"
MODEL_HAIKU="llamacpp/local-model"
MODEL="llamacpp/local-model"
```
**Mix providers**

Each `MODEL_*` variable can use a different provider. `MODEL` is the fallback for unrecognized Claude models.

```bash
NVIDIA_NIM_API_KEY="nvapi-your-key-here"
OPENROUTER_API_KEY="sk-or-your-key-here"

MODEL_OPUS="nvidia_nim/moonshotai/kimi-k2.5"
MODEL_SONNET="open_router/deepseek/deepseek-r1-0528:free"
MODEL_HAIKU="lmstudio/unsloth/GLM-4.7-Flash-GGUF"
MODEL="nvidia_nim/z-ai/glm4.7"                      # fallback
```
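The per-model fallback rule can be sketched in a few lines of Python. This is an illustration of the behavior described above, not the proxy's actual code; the function name `resolve_model` is hypothetical.

```python
import os

def resolve_model(claude_model: str) -> str:
    """Map an incoming Claude model name to a configured backend model.

    Opus/Sonnet/Haiku requests use their MODEL_* variable when set;
    anything else (or an unset/empty variable) falls back to MODEL.
    """
    fallback = os.environ["MODEL"]
    for family in ("opus", "sonnet", "haiku"):
        if family in claude_model.lower():
            return os.environ.get(f"MODEL_{family.upper()}", fallback) or fallback
    return fallback
```

For example, with `MODEL_OPUS` unset, an Opus request would route to whatever `MODEL` points at.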

### Run It

Terminal 1: start the proxy server:

```bash
uv run uvicorn server:app --host 0.0.0.0 --port 8082
```

Terminal 2: run Claude Code:

```bash
ANTHROPIC_AUTH_TOKEN="freecc" ANTHROPIC_BASE_URL="http://localhost:8082" claude
```

That's it! Claude Code now uses your configured provider for free.
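Before pointing Claude Code at the proxy, you can sanity-check it with a hand-built request. The sketch below builds a minimal Anthropic Messages API call aimed at the proxy; the payload shape is the standard Anthropic format, and the assumption (marked in comments) is that the proxy accepts the dummy `freecc` token as a bearer credential.

```python
import json
import urllib.request

def build_probe(base_url: str = "http://localhost:8082") -> urllib.request.Request:
    """Build a minimal Anthropic Messages API request aimed at the proxy."""
    payload = {
        "model": "claude-sonnet-4",  # resolved to MODEL_SONNET by the proxy
        "max_tokens": 64,
        "messages": [{"role": "user", "content": "Say hello."}],
    }
    return urllib.request.Request(
        f"{base_url}/v1/messages",
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            # Assumption: the proxy checks for a token but not its validity.
            "Authorization": "Bearer freecc",
            "anthropic-version": "2023-06-01",
        },
        method="POST",
    )

# To actually send it (requires the proxy to be running):
# with urllib.request.urlopen(build_probe()) as resp:
#     print(resp.read().decode())
```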

### VSCode Extension Setup

1. Start the proxy server (same as above).
2. Open Settings (Ctrl + ,) and search for `claude-code.environmentVariables`.
3. Click Edit in settings.json and add:

   ```json
   "claudeCode.environmentVariables": [
     { "name": "ANTHROPIC_BASE_URL", "value": "http://localhost:8082" },
     { "name": "ANTHROPIC_AUTH_TOKEN", "value": "freecc" }
   ]
   ```

4. Reload extensions.
5. If you see the login screen, click Anthropic Console and authorize; the extension will then start working. You may be redirected to buy credits in the browser; ignore it, the extension already works.

To switch back to Anthropic models, comment out the added block and reload extensions.

### Multi-Model Support (Model Picker)

`claude-pick` is an interactive model selector that lets you choose any model from your active provider each time you launch Claude, without editing `MODEL` in `.env`.


1. Install `fzf`:

   ```bash
   brew install fzf        # macOS/Linux
   ```

2. Add the alias to `~/.zshrc` or `~/.bashrc`:

   ```bash
   alias claude-pick="/absolute/path/to/proxy/claude-pick"
   ```

Then reload your shell (`source ~/.zshrc` or `source ~/.bashrc`) and run `claude-pick`.

Or use a fixed model alias (no picker needed):

```bash
alias claude-kimi='ANTHROPIC_BASE_URL="http://localhost:8082" ANTHROPIC_AUTH_TOKEN="freecc:moonshotai/kimi-k2.5" claude'
```

### Install as a Package (no clone needed)

```bash
uv tool install git+https://github.com/raghavx03/proxy.git
fcc-init        # creates ~/.config/proxy/.env from the built-in template
```

Edit `~/.config/proxy/.env` with your API keys and model names, then:

```bash
free-claude-code    # starts the server
```

To update: `uv tool upgrade proxy`


## How It Works

```
┌─────────────────┐        ┌──────────────────────┐        ┌──────────────────┐
│  Claude Code    │───────>│  Proxy               │───────>│  LLM Provider    │
│  CLI / VSCode   │<───────│  Server (:8082)      │<───────│  NIM / OR / LMS  │
└─────────────────┘        └──────────────────────┘        └──────────────────┘
   Anthropic API                                             OpenAI-compatible
   format (SSE)                                              format (SSE)
```

- **Transparent proxy**: Claude Code sends standard Anthropic API requests; the proxy forwards them to your configured provider
- **Per-model routing**: Opus / Sonnet / Haiku requests resolve to their model-specific backend, with `MODEL` as the fallback
- **Request optimization**: 5 categories of trivial requests (quota probes, title generation, prefix detection, suggestions, filepath extraction) are answered locally without using API quota
- **Format translation**: requests are translated from Anthropic format to the provider's OpenAI-compatible format and streamed back
- **Thinking tokens**: `<think>` tags and `reasoning_content` fields are converted into native Claude thinking blocks
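The thinking-token step can be pictured as splitting provider output on `<think>` markers and emitting Claude-style content blocks. This is a simplified sketch of the idea, not the proxy's actual (streaming) parser:

```python
import re

THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)

def to_thinking_blocks(text: str) -> list[dict]:
    """Split provider output into Claude-style thinking/text content blocks."""
    blocks, pos = [], 0
    for m in THINK_RE.finditer(text):
        if m.start() > pos:  # plain text before the tag
            blocks.append({"type": "text", "text": text[pos:m.start()]})
        blocks.append({"type": "thinking", "thinking": m.group(1)})
        pos = m.end()
    if pos < len(text):      # trailing plain text
        blocks.append({"type": "text", "text": text[pos:]})
    return blocks
```

The real proxy does this incrementally over SSE chunks; the block structure is the same.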

## Providers

| Provider | Cost | Rate Limit | Best For |
| --- | --- | --- | --- |
| NVIDIA NIM | Free | 40 req/min | Daily driver, generous free tier |
| OpenRouter | Free / Paid | Varies | Model variety, fallback options |
| LM Studio | Free (local) | Unlimited | Privacy, offline use, no rate limits |
| llama.cpp | Free (local) | Unlimited | Lightweight local inference engine |

Models use a prefix format: `provider_prefix/model/name`. An invalid prefix causes an error.

| Provider | MODEL prefix | API Key Variable | Default Base URL |
| --- | --- | --- | --- |
| NVIDIA NIM | `nvidia_nim/...` | `NVIDIA_NIM_API_KEY` | `integrate.api.nvidia.com/v1` |
| OpenRouter | `open_router/...` | `OPENROUTER_API_KEY` | `openrouter.ai/api/v1` |
| LM Studio | `lmstudio/...` | (none) | `localhost:1234/v1` |
| llama.cpp | `llamacpp/...` | (none) | `localhost:8080/v1` |
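The prefix rule splits on the first slash only, so the provider-side model name can itself contain slashes. A minimal illustration (the function name is hypothetical, but the prefixes and error behavior are as documented above):

```python
KNOWN_PREFIXES = {"nvidia_nim", "open_router", "lmstudio", "llamacpp"}

def split_model_id(model_id: str) -> tuple[str, str]:
    """Split 'provider_prefix/model/name' into (provider, model).

    Only the first slash separates the provider; the remainder is the
    provider-side model name, which may contain further slashes.
    """
    prefix, _, rest = model_id.partition("/")
    if prefix not in KNOWN_PREFIXES or not rest:
        raise ValueError(f"Invalid model prefix in {model_id!r}")
    return prefix, rest
```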
### NVIDIA NIM models

Popular models (full list in `nvidia_nim_models.json`):

- `nvidia_nim/minimaxai/minimax-m2.5`
- `nvidia_nim/qwen/qwen3.5-397b-a17b`
- `nvidia_nim/z-ai/glm5`
- `nvidia_nim/moonshotai/kimi-k2.5`
- `nvidia_nim/stepfun-ai/step-3.5-flash`

Browse: build.nvidia.com · Update the list: `curl "https://integrate.api.nvidia.com/v1/models" > nvidia_nim_models.json`

### OpenRouter models

Popular free models:

- `open_router/arcee-ai/trinity-large-preview:free`
- `open_router/stepfun/step-3.5-flash:free`
- `open_router/deepseek/deepseek-r1-0528:free`
- `open_router/openai/gpt-oss-120b:free`

Browse: openrouter.ai/models · Free models

### LM Studio models

Run models locally with LM Studio. Load a model in the Chat or Developer tab, then set `MODEL` to its identifier.

Examples with native tool-use support:

- `LiquidAI/LFM2-24B-A2B-GGUF`
- `unsloth/MiniMax-M2.5-GGUF`
- `unsloth/GLM-4.7-Flash-GGUF`
- `unsloth/Qwen3.5-35B-A3B-GGUF`

Browse: model.lmstudio.ai

### llama.cpp models

Run models locally using `llama-server`. Ensure you have a tool-capable GGUF. Set `MODEL` to any name you like (e.g. `llamacpp/my-model`); llama-server ignores the model name when requests arrive via `/v1/messages`.

See the Unsloth docs for detailed instructions and capable models: https://unsloth.ai/docs/models/qwen3.5#qwen3.5-small-0.8b-2b-4b-9b


## Discord Bot

Control Claude Code remotely from Discord (or Telegram). Send tasks, watch live progress, and manage multiple concurrent sessions.

Capabilities:

- Tree-based message threading: reply to a message to fork the conversation
- Session persistence across server restarts
- Live streaming of thinking tokens, tool calls, and results
- Unlimited concurrent Claude CLI sessions (concurrency controlled by `PROVIDER_MAX_CONCURRENCY`)
- Voice notes: send voice messages; they are transcribed and processed as regular prompts
- Commands: `/stop` (cancel a task; reply to a message to stop only that task), `/clear` (reset all sessions, or reply to clear a branch), `/stats`

### Setup

1. Create a Discord bot: go to the Discord Developer Portal, create an application, add a bot, and copy the token. Enable Message Content Intent under Bot settings.

2. Edit `.env`:

   ```bash
   MESSAGING_PLATFORM="discord"
   DISCORD_BOT_TOKEN="your_discord_bot_token"
   ALLOWED_DISCORD_CHANNELS="123456789,987654321"
   ```

   Enable Developer Mode in Discord (Settings → Advanced), then right-click a channel and choose "Copy ID". Comma-separate multiple channels. If empty, no channels are allowed.

3. Configure the workspace (where Claude will operate):

   ```bash
   CLAUDE_WORKSPACE="./agent_workspace"
   ALLOWED_DIR="C:/Users/yourname/projects"
   ```

4. Start the server:

   ```bash
   uv run uvicorn server:app --host 0.0.0.0 --port 8082
   ```

5. Invite the bot via the OAuth2 URL Generator (scopes: bot; permissions: Read Messages, Send Messages, Manage Messages, Read Message History).
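The allowlist semantics (comma-separated channel IDs, empty string allows nothing) amount to a check like the following. The helper name is hypothetical; the behavior matches the description above:

```python
def is_channel_allowed(channel_id: int, allowed_env: str) -> bool:
    """ALLOWED_DISCORD_CHANNELS semantics: an empty value allows no channels."""
    allowed = {c.strip() for c in allowed_env.split(",") if c.strip()}
    return str(channel_id) in allowed
```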

### Telegram

Set `MESSAGING_PLATFORM=telegram` and configure:

```bash
TELEGRAM_BOT_TOKEN="123456789:ABCdefGHIjklMNOpqrSTUvwxYZ"
ALLOWED_TELEGRAM_USER_ID="your_telegram_user_id"
```

Get a token from @BotFather; find your user ID via @userinfobot.

### Voice Notes

Send voice messages on Discord or Telegram; they are transcribed and processed as regular prompts.

| Backend | Description | API Key |
| --- | --- | --- |
| Local Whisper (default) | Hugging Face Whisper: free, offline, CUDA-compatible | not required |
| NVIDIA NIM | Whisper/Parakeet models via gRPC | `NVIDIA_NIM_API_KEY` |

Install the voice extras:

```bash
# If you cloned the repo:
uv sync --extra voice_local          # Local Whisper
uv sync --extra voice                # NVIDIA NIM
uv sync --extra voice --extra voice_local  # Both

# If you installed as a package (no clone):
uv tool install "proxy[voice_local] @ git+https://github.com/raghavx03/proxy.git"
uv tool install "proxy[voice] @ git+https://github.com/raghavx03/proxy.git"
uv tool install "proxy[voice,voice_local] @ git+https://github.com/raghavx03/proxy.git"
```

Configure via `WHISPER_DEVICE` (`cpu` | `cuda` | `nvidia_nim`) and `WHISPER_MODEL`. See the Configuration table for all voice variables and supported model values.


## Configuration

### Core

| Variable | Description | Default |
| --- | --- | --- |
| `MODEL` | Fallback model (`provider_prefix/model/name` format; invalid prefix → error) | `nvidia_nim/stepfun-ai/step-3.5-flash` |
| `MODEL_OPUS` | Model for Claude Opus requests (falls back to `MODEL`) | `nvidia_nim/z-ai/glm4.7` |
| `MODEL_SONNET` | Model for Claude Sonnet requests (falls back to `MODEL`) | `open_router/arcee-ai/trinity-large-preview:free` |
| `MODEL_HAIKU` | Model for Claude Haiku requests (falls back to `MODEL`) | `open_router/stepfun/step-3.5-flash:free` |
| `NVIDIA_NIM_API_KEY` | NVIDIA API key | required for NIM |
| `OPENROUTER_API_KEY` | OpenRouter API key | required for OpenRouter |
| `LM_STUDIO_BASE_URL` | LM Studio server URL | `http://localhost:1234/v1` |
| `LLAMACPP_BASE_URL` | llama.cpp server URL | `http://localhost:8080/v1` |

### Rate Limiting & Timeouts

| Variable | Description | Default |
| --- | --- | --- |
| `PROVIDER_RATE_LIMIT` | LLM API requests per window | 40 |
| `PROVIDER_RATE_WINDOW` | Rate limit window (seconds) | 60 |
| `PROVIDER_MAX_CONCURRENCY` | Max simultaneous open provider streams | 5 |
| `HTTP_READ_TIMEOUT` | Read timeout for provider requests (s) | 120 |
| `HTTP_WRITE_TIMEOUT` | Write timeout for provider requests (s) | 10 |
| `HTTP_CONNECT_TIMEOUT` | Connect timeout for provider requests (s) | 2 |

### Messaging & Voice

| Variable | Description | Default |
| --- | --- | --- |
| `MESSAGING_PLATFORM` | `discord` or `telegram` | `discord` |
| `DISCORD_BOT_TOKEN` | Discord bot token | `""` |
| `ALLOWED_DISCORD_CHANNELS` | Comma-separated channel IDs (empty = none allowed) | `""` |
| `TELEGRAM_BOT_TOKEN` | Telegram bot token | `""` |
| `ALLOWED_TELEGRAM_USER_ID` | Allowed Telegram user ID | `""` |
| `CLAUDE_WORKSPACE` | Directory where the agent operates | `./agent_workspace` |
| `ALLOWED_DIR` | Allowed directories for the agent | `""` |
| `MESSAGING_RATE_LIMIT` | Messaging messages per window | 1 |
| `MESSAGING_RATE_WINDOW` | Messaging window (seconds) | 1 |
| `VOICE_NOTE_ENABLED` | Enable voice note handling | `true` |
| `WHISPER_DEVICE` | `cpu` \| `cuda` \| `nvidia_nim` | `cpu` |
| `WHISPER_MODEL` | Whisper model (local: `tiny`/`base`/`small`/`medium`/`large-v2`/`large-v3`/`large-v3-turbo`; NIM: `openai/whisper-large-v3`, `nvidia/parakeet-ctc-1.1b-asr`, etc.) | `base` |
| `HF_TOKEN` | Hugging Face token for faster downloads (local Whisper, optional) | |
### Advanced: Request Optimization Flags

These are enabled by default and intercept trivial Claude Code requests locally to save API quota.

| Variable | Description | Default |
| --- | --- | --- |
| `FAST_PREFIX_DETECTION` | Enable fast prefix detection | `true` |
| `ENABLE_NETWORK_PROBE_MOCK` | Mock network probe requests | `true` |
| `ENABLE_TITLE_GENERATION_SKIP` | Skip title generation requests | `true` |
| `ENABLE_SUGGESTION_MODE_SKIP` | Skip suggestion mode requests | `true` |
| `ENABLE_FILEPATH_EXTRACTION_MOCK` | Mock filepath extraction | `true` |

See `.env.example` for all supported parameters.


## Development

### Project Structure

```
proxy/
├── server.py              # Entry point
├── api/                   # FastAPI routes, request detection, optimization handlers
├── providers/             # BaseProvider, OpenAICompatibleProvider, NIM, OpenRouter, LM Studio, llamacpp
│   └── common/            # Shared utils (SSE builder, message converter, parsers, error mapping)
├── messaging/             # MessagingPlatform ABC + Discord/Telegram bots, session management
├── config/                # Settings, NIM config, logging
├── cli/                   # CLI session and process management
└── tests/                 # Pytest test suite
```

### Commands

```bash
uv run ruff format     # Format code
uv run ruff check      # Lint
uv run ty check        # Type checking
uv run pytest          # Run tests
```

### Extending

Adding an OpenAI-compatible provider (Groq, Together AI, etc.): extend `OpenAICompatibleProvider`:

```python
from providers.openai_compat import OpenAICompatibleProvider
from providers.base import ProviderConfig

class MyProvider(OpenAICompatibleProvider):
    def __init__(self, config: ProviderConfig):
        super().__init__(config, provider_name="MYPROVIDER",
                         base_url="https://api.example.com/v1", api_key=config.api_key)
```

Adding a fully custom provider: extend `BaseProvider` directly and implement `stream_response()`.

Adding a messaging platform: extend `MessagingPlatform` in `messaging/` and implement `start()`, `stop()`, `send_message()`, `edit_message()`, and `on_message()`.
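A platform skeleton built on that interface might look like the sketch below. The `MessagingPlatform` class here is a stand-in for the real ABC in `messaging/`, and the method signatures are guesses from the method names listed above (the real ones may be async and take different arguments); `SlackPlatform` is a hypothetical example backend.

```python
from abc import ABC, abstractmethod
from typing import Callable

class MessagingPlatform(ABC):
    """Stand-in for the real ABC in messaging/; signatures are assumptions."""

    @abstractmethod
    def start(self) -> None: ...
    @abstractmethod
    def stop(self) -> None: ...
    @abstractmethod
    def send_message(self, channel_id: str, text: str) -> str: ...
    @abstractmethod
    def edit_message(self, message_id: str, text: str) -> None: ...
    @abstractmethod
    def on_message(self, handler: Callable[[str], None]) -> None: ...

class SlackPlatform(MessagingPlatform):
    """Hypothetical Slack backend: each method would wrap Slack's API."""

    def start(self) -> None:
        self._connected = True

    def stop(self) -> None:
        self._connected = False

    def send_message(self, channel_id: str, text: str) -> str:
        return "msg-id"  # would return the platform's message ID

    def edit_message(self, message_id: str, text: str) -> None:
        pass  # would update the existing message in place

    def on_message(self, handler: Callable[[str], None]) -> None:
        self._handler = handler  # invoked by the platform on each incoming message
```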



## License

MIT License. See LICENSE for details.

Built with FastAPI, the OpenAI Python SDK, discord.py, and python-telegram-bot.
