A lightweight local proxy that lets any AI CLI tool (Claude Code, Codex CLI, etc.) use third-party LLM APIs through a unified local endpoint.
One binary, one config, any AI CLI → any LLM API.
- Multi-protocol: Auto-detects client protocol (Responses API, Anthropic Messages, Chat Completions) and converts transparently
- Zero-intrusion: No changes to your CLI config files, just point base_url to the local proxy
- Scene routing: Route Claude Code requests to different models based on request type (thinking, web search, background tasks)
- Model mapping: Map client model names to upstream model names at the route level
- Cross-provider routing: Route different scenes to different providers (e.g. thinking → DeepSeek, web search → Zhipu)
- Hot reload: Update config without restart (POST /api/reload or kill -HUP)
- Admin UI: Built-in web dashboard for managing providers, routes, viewing usage statistics, and debugging requests
- Request tracing: Inspect every request/response pair with raw viewer, Diff view, and TTFB waterfall chart
- Usage statistics: Track token usage (input, output, cache) by provider and model with dashboard charts
- Multi-key fallback: Automatically switch to fallback API keys on 429/529 rate limiting or service overload
- Context compaction: Support compact endpoint for context window management, with LLM-based summarization for non-OpenAI upstreams
- Lightweight: Pure Go, single binary, no CGO
Linux / macOS:
curl -sL https://raw.githubusercontent.com/keepmind9/ai-switch/main/scripts/install.sh | bash
Windows (PowerShell):
irm https://raw.githubusercontent.com/keepmind9/ai-switch/main/scripts/install.ps1 | iex
This downloads the latest release for your platform, installs to ~/.local/bin, and adds it to PATH.
git clone https://github.com/keepmind9/ai-switch.git
cd ai-switch
make build-all # build frontend + Go binary (includes Admin UI)
If you don't need the Admin UI, use make build instead (Go only, faster).
ais serve
No config file needed: it auto-creates ~/.ai-switch/config.yaml with defaults on first run.
Open http://localhost:12345 in your browser to add providers and routes.
Claude Code:
export ANTHROPIC_BASE_URL=http://localhost:12345
export ANTHROPIC_API_KEY=<route-key>
Codex CLI:
[model_providers.proxy]
name = "ai-switch"
base_url = "http://localhost:12345/v1"
api_key = "ais-default"
wire_api = "responses"
Any OpenAI-compatible tool:
export OPENAI_BASE_URL=http://localhost:12345/v1
export OPENAI_API_KEY=<route-key>
That's it: your CLI tool will now route requests through ai-switch to your configured provider.
Claude Code ──→ ai-switch ──→ DeepSeek (chat)
Codex CLI   ──→           ──→ Zhipu (anthropic)
Any tool    ──→           ──→ Gemini (gemini)
Any tool    ──→           ──→ MiniMax (chat)
ai-switch sits between your CLI tool and upstream LLM providers. It:
- Detects the client protocol automatically (Anthropic / Responses / Chat)
- Routes requests to the correct provider based on the API key (route key)
- Converts between protocols when needed (e.g. Anthropic → Chat Completions)
- Detects request scenes (thinking, web search, etc.) for smart routing
The route key (<route-key> in the example above) serves as both the API key for authentication and the routing identifier.
Define your upstream LLM vendor connections:
providers:
  deepseek:
    name: "DeepSeek"
    base_url: "https://api.deepseek.com/v1"
    api_key: "${DEEPSEEK_API_KEY}" # supports ${ENV_VAR} expansion
    format: "chat" # chat (default) | responses | anthropic | gemini
    think_tag: "think" # optional: strip reasoning tags from responses
    fallback_keys: # optional: fallback API keys on 429 rate limiting
      - "${DEEPSEEK_API_KEY_2}"
      - "${DEEPSEEK_API_KEY_3}"
    models: # optional: for validation warnings
      - "deepseek-chat"
      - "deepseek-reasoner"
Use Google Gemini as upstream:
providers:
  google:
    name: "Google Gemini"
    base_url: "https://generativelanguage.googleapis.com"
    api_key: "${GOOGLE_API_KEY}"
    format: "gemini"
No path needed: ai-switch automatically builds /v1beta/models/{model}:generateContent.
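The two behaviours mentioned in the provider examples, ${ENV_VAR} expansion and the Gemini request path, can be sketched roughly as follows. This is an illustrative Python sketch, not ai-switch's actual Go code: the helper names expand_env and gemini_url are hypothetical, and leaving an unset variable untouched is an assumption, since the text above does not say what happens in that case.

```python
import re

# Matches ${NAME} where NAME looks like a shell environment variable name.
_ENV_REF = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_env(value: str, env: dict) -> str:
    """Replace each ${NAME} in value with env[NAME].

    Assumption: names missing from env are left as the literal ${NAME}.
    """
    return _ENV_REF.sub(lambda m: env.get(m.group(1), m.group(0)), value)


def gemini_url(base_url: str, model: str) -> str:
    """Append the generateContent path described above to a bare base URL."""
    return f"{base_url.rstrip('/')}/v1beta/models/{model}:generateContent"
```

In real use the env argument would typically be dict(os.environ); passing it explicitly keeps the helper easy to test.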
Routes map API keys to providers and models:
routes:
  "ais-default":
    provider: "deepseek"
    default_model: "deepseek-chat"
Route Claude Code requests to different models based on what it's doing:
routes:
  "ais-claude":
    provider: "zhipu"
    default_model: "glm-5.1"
    long_context_threshold: 60000
    scene_map:
      default: "glm-5.1"
      think: "glm-5.1"
      websearch: "glm-4.7"
      background: "glm-4.5-air"
      longContext: "glm-5.1"
| Scene | Key | Detection |
|---|---|---|
| Long Context | longContext | Token count exceeds long_context_threshold |
| Background | background | Model name contains "haiku" |
| Web Search | websearch | Tools contain web_search_* type |
| Thinking | think | thinking field present |
| Image | image | User messages contain image blocks |
| Default | default | Fallback |
Priority: longContext > background > websearch > think > image > default
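The detection rules and their priority order can be read as a chain of checks, first match wins. The following Python sketch is only an illustration under stated assumptions: detect_scene is a hypothetical name, the request is assumed to be an Anthropic Messages-style JSON body, the token count is passed in because the text does not say how it is computed, and the "haiku" match is assumed to be case-insensitive.

```python
from typing import Any, Dict


def detect_scene(request: Dict[str, Any], token_count: int,
                 long_context_threshold: int) -> str:
    """Return the scene key, checking rules in the stated priority order."""
    # 1. longContext: token count exceeds the configured threshold.
    if token_count > long_context_threshold:
        return "longContext"
    # 2. background: model name contains "haiku" (case-insensitivity assumed).
    if "haiku" in str(request.get("model", "")).lower():
        return "background"
    # 3. websearch: any tool whose type starts with "web_search_".
    for tool in request.get("tools") or []:
        if isinstance(tool, dict) and str(tool.get("type", "")).startswith("web_search_"):
            return "websearch"
    # 4. think: the thinking field is present.
    if "thinking" in request:
        return "think"
    # 5. image: a user message carries an image content block.
    for msg in request.get("messages") or []:
        content = msg.get("content")
        if msg.get("role") == "user" and isinstance(content, list):
            if any(isinstance(b, dict) and b.get("type") == "image" for b in content):
                return "image"
    # 6. default: nothing else matched.
    return "default"
```

Because the checks run top to bottom, a request that both uses a haiku model and has thinking enabled resolves to background, matching the priority line above.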
Control which route is used when a request has no matching API key:
default_route: "ais-default" # global fallback
default_anthropic_route: "ais-zhipu" # /v1/messages (Claude Code)
default_responses_route: "ais-default" # /v1/responses (Codex CLI)
default_chat_route: "ais-default" # /v1/chat/completions
Routing priority: route key match > protocol-specific default > global default_route
All fields are optional. Protocol-specific defaults fall back to default_route when not set.
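The routing priority above amounts to a three-step lookup. The sketch below is a hypothetical Python illustration, not the project's code: resolve_route and the dict-shaped config are assumptions, and the path-to-field mapping is taken from the comments in the config example.

```python
from typing import Any, Dict, Optional

# Endpoint path -> protocol-specific default field (from the config comments).
PROTOCOL_DEFAULT_FIELD = {
    "/v1/messages": "default_anthropic_route",
    "/v1/responses": "default_responses_route",
    "/v1/chat/completions": "default_chat_route",
}


def resolve_route(config: Dict[str, Any], path: str,
                  api_key: Optional[str]) -> Optional[str]:
    """Pick a route key: exact key match, then protocol default, then global."""
    routes = config.get("routes") or {}
    # 1. The API key sent by the client is itself a configured route key.
    if api_key and api_key in routes:
        return api_key
    # 2. Protocol-specific default for the endpoint that was called.
    field = PROTOCOL_DEFAULT_FIELD.get(path)
    if field and config.get(field):
        return config[field]
    # 3. Global fallback; None here means no route could be chosen.
    return config.get("default_route")
```

With the example values above, an unknown key on /v1/messages would resolve to ais-zhipu, while the same key on an endpoint without its own default would fall through to default_route.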
Control how many days of log files to keep (default: 30):
log_retention_days: 7
Logs are stored in ~/.ai-switch/logs/.
When binding to a non-localhost address, restrict access to trusted IPs:
server:
  host: "0.0.0.0"
  port: 12345
  allowed_ips:
    - "192.168.1.0/24"
    - "10.0.0.5"
Supports CIDR notation and bare IP addresses. When host is 127.0.0.1 or localhost, the whitelist is ignored (even if configured).
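A whitelist check of this shape, mixing CIDR ranges and bare addresses, could look roughly like the Python sketch below. It uses only the standard ipaddress module; is_allowed is a hypothetical name, and treating an empty allowed_ips list as "no restriction" is an assumption the text above does not confirm.

```python
import ipaddress
from typing import List


def is_allowed(client_ip: str, host: str, allowed_ips: List[str]) -> bool:
    """Return True if client_ip may connect, given the server host and whitelist."""
    # The whitelist is ignored when the server only listens on loopback.
    if host in ("127.0.0.1", "localhost"):
        return True
    # Assumption: no configured entries means no restriction.
    if not allowed_ips:
        return True
    addr = ipaddress.ip_address(client_ip)
    # ip_network() accepts both "192.168.1.0/24" and a bare "10.0.0.5"
    # (the latter becomes a single-address /32 network).
    return any(addr in ipaddress.ip_network(entry, strict=False)
               for entry in allowed_ips)
```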
Route upstream LLM API requests through an HTTP/SOCKS5 proxy:
server:
  proxy_url: "socks5://127.0.0.1:1080"
providers:
  openai:
    enable_proxy: true
Set proxy_url globally, then enable it per provider with enable_proxy: true. Supported schemes: http, https, socks5.
Map client model names to upstream models:
routes:
  "ais-default":
    provider: "deepseek"
    default_model: "deepseek-chat"
    model_map:
      "claude-sonnet-4-5": "deepseek-chat"
      "gpt-4o": "deepseek-chat"
Use provider|model to route to a different provider within the same route:
routes:
  "ais-default":
    provider: "minimax"
    default_model: "MiniMax-M2.5"
    scene_map:
      default: "MiniMax-M2.5"
      think: "deepseek|deepseek-chat"
      websearch: "zhipu|glm-4.7"
- ModelMap: exact model name match (case-insensitive)
- SceneMap: scene detection (Anthropic protocol only)
- DefaultModel: fallback
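Read as a resolution order, the three entries above suggest a lookup like the following Python sketch. It assumes the entries are tried in the order listed and that the provider|model syntax is also accepted in model_map values; neither is stated outright above, and resolve_model and split_target are hypothetical names.

```python
from typing import Any, Dict, Optional, Tuple


def split_target(value: str, default_provider: str) -> Tuple[str, str]:
    """Split "provider|model"; a plain model name keeps the route's provider."""
    if "|" in value:
        provider, model = value.split("|", 1)
        return provider, model
    return default_provider, value


def resolve_model(route: Dict[str, Any], client_model: str,
                  scene: Optional[str] = None) -> Tuple[str, str]:
    """Return (provider, upstream_model) for one request on this route."""
    default_provider = route["provider"]
    # 1. ModelMap: exact, case-insensitive match on the client model name.
    for name, target in (route.get("model_map") or {}).items():
        if name.lower() == client_model.lower():
            return split_target(target, default_provider)
    # 2. SceneMap: only when a scene was detected (Anthropic protocol only).
    scene_map = route.get("scene_map") or {}
    if scene is not None and scene in scene_map:
        return split_target(scene_map[scene], default_provider)
    # 3. DefaultModel: fallback.
    return split_target(route["default_model"], default_provider)
```

For the cross-provider example above, a think scene would resolve to the deepseek provider with model deepseek-chat, while a request with no detected scene stays on minimax with MiniMax-M2.5.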
ais serve # Start in foreground
ais serve -d # Start as background daemon
ais serve -c config.yaml # Start with custom config
ais stop # Stop the background daemon
ais check -c config.yaml # Validate config without starting
ais version # Print version info
ais update # Check for updates and download latest version
ais update --apply # Apply the downloaded update
ais shortcut # Create desktop shortcuts to start/stop ais
ais agent <route-key> claude # Launch Claude Code via ais
ais agent <route-key> codex # Launch Codex CLI via ais
Running without a subcommand defaults to serve:
ais -c config.yaml # Same as: ais serve -c config.yaml
Launch AI agents with environment variables auto-configured from a route key:
# Launch Claude Code
ais agent my-route-key claude --continue
# Launch Codex CLI
ais agent my-route-key codex --model o4-mini
This auto-configures environment variables and overrides the agent's own config (via --settings for Claude, -c for Codex) to ensure requests route through ais using the route key. No manual configuration needed.
The route key serves as the API key. Agent args and exit codes are passed through.
$ ais check -c config.yaml
Checking config.yaml ...
Providers: 3
Routes: 3
Default: ais-default
✓ Config is valid.
Exit codes: 0 = valid, 1 = has errors, 2 = warnings only.
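Because warnings and errors use different exit codes, a deploy or CI script can decide separately how strict to be. The Python sketch below is one possible wrapper: the function names and the allow_warnings policy are hypothetical, and only the ais check -c command and the three exit codes come from the text above.

```python
import subprocess

# Exit codes documented above for `ais check`.
EXIT_MEANINGS = {0: "valid", 1: "errors", 2: "warnings"}


def classify_exit(code: int) -> str:
    """Map an exit code to its documented meaning ("unknown" otherwise)."""
    return EXIT_MEANINGS.get(code, "unknown")


def config_acceptable(code: int, allow_warnings: bool = True) -> bool:
    """Decide whether a config passes, optionally tolerating warnings."""
    if code == 0:
        return True
    if code == 2:
        return allow_warnings
    return False


def run_check(config_path: str) -> int:
    """Run `ais check -c <path>` and return its exit code (needs ais on PATH)."""
    return subprocess.run(["ais", "check", "-c", config_path]).returncode
```

A strict pipeline would call config_acceptable(run_check("config.yaml"), allow_warnings=False) and fail the build on anything but 0.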
Open http://localhost:12345 in your browser for a built-in dashboard to manage providers, routes, view usage statistics, and inspect request traces.
Every request is recorded with full request/response details. Click any trace to inspect:
- Raw viewer: See the exact request and response payloads
- Diff view: Side-by-side comparison of request and response
- TTFB waterfall: Visualize time-to-first-byte and upstream latency
The stats page shows token usage broken down by provider and model, including cache token metrics, with daily trend charts.
make build # fmt + vet + compile
make build-all # build frontend + Go binary
make dev # run in dev mode
make test # run tests
make clean # remove binary