A premium, developer-first local gateway that routes traffic from your AI coding tools to 40+ providers with format translation, smart fallback, quota tracking, and 20–40% token savings.
Connect Claude Code, Cursor, Antigravity, Copilot, Codex, Gemini, OpenCode, Cline, OpenClaw and any OpenAI/Anthropic-compatible client to 100+ models through one endpoint.
Stop wasting money, tokens and hitting limits:
- ❌ Subscription quota expires unused every month
- ❌ Rate limits stop you mid-coding
- ❌ Tool outputs (git diff, grep, ls...) burn tokens fast
- ❌ Expensive APIs ($20-50/month per provider)
- ❌ Manual switching between providers
ExtremeRouter solves this:
- ✅ RTK Token Saver - Auto-compress tool_result content, save 20-40% tokens per request
- ✅ Maximize subscriptions - Track quota, use every bit before reset
- ✅ Auto fallback - Subscription → Cheap → Free, zero downtime
- ✅ Multi-account - Round-robin between accounts per provider
- ✅ Universal - Works with Claude Code, Codex, Cursor, Cline, any CLI tool
┌─────────────┐
│ Your CLI │ (Claude Code, Codex, OpenClaw, Cursor, Cline...)
│ Tool │
└──────┬──────┘
│ http://localhost:20128/v1
↓
┌─────────────────────────────────────────────┐
│ ExtremeRouter (Smart Router) │
│ • RTK Token Saver (cut tool_result tokens) │
│ • Format translation (OpenAI ↔ Claude) │
│ • Quota tracking │
│ • Auto token refresh │
└──────┬──────────────────────────────────────┘
│
├─→ [Tier 1: SUBSCRIPTION] Claude Code, Codex, GitHub Copilot
│ ↓ quota exhausted
├─→ [Tier 2: CHEAP] GLM ($0.6/1M), MiniMax ($0.2/1M)
│ ↓ budget limit
└─→ [Tier 3: FREE] Kiro, OpenCode Free, Vertex ($300 credits)
Result: Never stop coding, minimal cost + 20-40% token savings via RTK
1. Install globally:
npm install -g @rsalmn/extremerouter
extremerouter🎉 Dashboard opens at http://localhost:20128
2. Connect a FREE provider (no signup needed):
Dashboard → Providers → Connect Kiro AI (free Claude unlimited) or OpenCode Free (no auth) → Done!
3. Use in your CLI tool:
Claude Code/Codex/OpenClaw/Cursor/Cline Settings:
Endpoint: http://localhost:20128/v1
API Key: [copy from dashboard]
Model: kr/claude-sonnet-4.5
That's it! Start coding with FREE AI models.
Alternative: run from source (this repository):
This repository package is private (@rsalmn/extremerouter-app), so source/Docker execution is the expected local development path.
cp .env.example .env
npm install
PORT=20128 NEXT_PUBLIC_BASE_URL=http://localhost:20128 npm run devProduction mode:
npm run build
PORT=20128 HOSTNAME=0.0.0.0 NEXT_PUBLIC_BASE_URL=http://localhost:20128 npm run startDefault URLs:
- Dashboard:
http://localhost:20128/dashboard - OpenAI-compatible API:
http://localhost:20128/v1
ExtremeRouter works seamlessly with all major AI coding tools:
![]() Kiro AI Claude 4.5 + GLM-5 + MiniMax Unlimited FREE |
![]() OpenCode Free No auth • Auto-fetch models Unlimited FREE |
![]() Vertex AI Gemini 3 Pro + GLM-5 + DeepSeek $300 credits free |
Note: iFlow, Qwen and Gemini CLI free tiers were discontinued in 2026. Use Kiro / OpenCode Free / Vertex instead.
![]() OpenRouter |
![]() GLM |
![]() Kimi |
![]() MiniMax |
![]() OpenAI |
![]() Anthropic |
![]() Gemini |
![]() DeepSeek |
![]() Groq |
![]() xAI |
![]() Mistral |
![]() Perplexity |
![]() Together AI |
![]() Fireworks |
![]() Cerebras |
![]() Cohere |
![]() NVIDIA |
SiliconFlow |
...and 20+ more providers including Nebius, Chutes, Hyperbolic, and custom OpenAI/Anthropic compatible endpoints
| Feature | What It Does | Why It Matters |
|---|---|---|
| 🚀 RTK Token Saver (RTK ⭐40K) | Compress tool outputs (git diff, grep, ls, tree...) before sending to LLM |
Save 20-40% input tokens per request |
| 🧠 Headroom Token Saver (Headroom) | Optional external /v1/compress proxy before provider routing |
Save more context tokens without changing clients |
| 🪨 Caveman Mode (Caveman ⭐52K) | Inject caveman-speak prompt → LLM replies terse, technical substance preserved | Save up to 65% output tokens |
| 🐴 Ponytail (Ponytail) | Inject "lazy senior dev" prompt → LLM writes minimal, YAGNI-first code (Lite/Full/Ultra) | Fewer output tokens, less refactoring |
| 🎯 Smart 3-Tier Fallback | Auto-route: Subscription → Cheap → Free | Never stop coding, zero downtime |
| 📊 Real-Time Quota Tracking | Live token count + reset countdown | Maximize subscription value |
| 🔄 Format Translation | OpenAI ↔ Claude ↔ Gemini ↔ Cursor ↔ Kiro ↔ Vertex | Works with any CLI tool |
| 👥 Multi-Account Support | Multiple accounts per provider | Load balancing + redundancy |
| 🔄 Auto Token Refresh | OAuth tokens refresh automatically | No manual re-login needed |
| 🎨 Custom Combos | Create unlimited model combinations | Tailor fallback to your needs |
| 🐝 Hierarchical Swarm Engine | Multi-agent orchestration: Manager → Staff → Workers → Audit → Synthesis with persona-bleed protection | Tackle complex tasks via parallel agents with a smart gatekeeper |
| 🩺 Health Monitor | Live per-provider health (in-memory sliding window) + SSE dashboard | Spot degraded providers before they break your flow |
| 🔌 Circuit Breaker | Per-provider CLOSED/OPEN/HALF_OPEN state machine that auto-skips failing upstreams | Stop wasting requests on a down provider; auto-recover when it heals |
| 🔐 Per-Key Model ACL | Restrict which models each API key may call (allowedModels) |
Hand out scoped keys to teammates/clients without opening everything |
| 🍪 Cookies Providers | 20 web-chat providers via browser cookies (ChatGLM, DeepSeek, Qwen, Kimi, Blackbox, T3, DuckDuckGo, Venice, DouBao, v0, Poe, Copilot, Meta AI, Adapta, VeoAI, Claude/ChatGPT/Gemini web, Grok, Perplexity) | Use free web tiers with one pasted cookie — no API key needed |
| 🤖 Devin CLI Provider | Session-based adapter for the Devin (Cognition) API | Route to Devin agent modes (normal/fast/lite/ultra) from any OpenAI client |
| 🧪 Model Test All | Search box + "Test All" button on every provider's Available Models | Validate reachability of all models in one click, scoped to search results |
| 📝 Request Logging | Debug mode with full request/response logs | Troubleshoot issues easily |
| 💾 Cloud Sync | Sync config across devices | Same setup everywhere |
| 📊 Usage Analytics | Track tokens, cost, trends over time | Optimize spending |
| 🌐 Deploy Anywhere | Localhost, VPS, Docker, Cloudflare Workers | Flexible deployment options |
📖 Feature Details
Tool outputs (git diff, grep, find, ls, tree, log dumps...) often eat 30-50% of your prompt budget. RTK detects them and applies smart, lossless compression before the request hits the LLM:
- Filters:
git-diff,git-status,grep,find,ls,tree,dedup-log,smart-truncate,read-numbered,search-list - Auto-detect: No config needed — RTK peeks the first 1KB of each
tool_resultand picks the right filter. - Safe by design: If a filter fails, throws, or makes output bigger, RTK silently keeps the original text. Errors never break your request.
- Universal: Works across all formats (OpenAI, Claude, Gemini, Cursor, Kiro, OpenAI Responses) because it runs before any format translation.
- Default ON: Toggle anytime in Dashboard → Endpoint settings.
Without RTK: 47K tokens sent to LLM
With RTK: 28K tokens sent to LLM (40% saved · same context · same answer)
Headroom is optional and runs separately. ExtremeRouter calls Headroom's local /v1/compress endpoint, then keeps normal routing, fallback, auth, and usage tracking:
Client → ExtremeRouter → Headroom /v1/compress → ExtremeRouter → provider
Local setup:
pip install "headroom-ai[proxy]"
headroom proxy --port 8787Enable in Dashboard → Endpoint → Token Saver → Headroom. Default URL: http://localhost:8787.
Docker examples:
# Headroom service in same Docker network
http://headroom:8787
# Headroom running on host machine
http://host.docker.internal:8787If Headroom is down or returns an error, ExtremeRouter fails open and sends the original request.
Ponytail injects a "lazy senior dev" system prompt into every request, biasing the LLM toward minimal, YAGNI-first code — deletion over addition, stdlib over new deps, one-liners over abstractions. Adapted from DietrichGebert/ponytail.
- Lite — Build what's asked, name the lazier alternative.
- Full — YAGNI ladder enforced: stdlib → native → existing deps → one-liner → minimal code.
- Ultra — YAGNI extremist: deletion first, ship the one-liner, challenge the rest of the requirement in the same response.
Without Ponytail: verbose code, extra abstractions, "just in case" scaffolding
With Ponytail: shortest working diff, no unrequested abstractions, fewer tokens
Never trades away: input validation, error handling that prevents data loss, security, accessibility, or anything explicitly requested. Enable in Dashboard → Endpoint → Ponytail. Stacks with Caveman (output terseness) and RTK (input compression).
Create combos with automatic fallback:
Combo: "my-coding-stack"
1. cc/claude-opus-4-6 (your subscription)
2. glm/glm-4.7 (cheap backup, $0.6/1M)
3. if/kimi-k2-thinking (free fallback)
→ Auto switches when quota runs out or errors occur
- Token consumption per provider
- Reset countdown (5-hour, daily, weekly)
- Cost estimation for paid tiers
- Monthly spending reports
Seamless translation between formats:
- OpenAI ↔ Claude ↔ Gemini ↔ Cursor ↔ Kiro ↔ Vertex ↔ Antigravity ↔ Ollama ↔ OpenAI Responses
- Your CLI tool sends OpenAI format → ExtremeRouter translates → Provider receives native format
- Works with any tool that supports custom OpenAI endpoints
- Add multiple accounts per provider
- Auto round-robin or priority-based routing
- Fallback to next account when one hits quota
- OAuth tokens automatically refresh before expiration
- No manual re-authentication needed
- Seamless experience across all providers
- Create unlimited model combinations
- Mix subscription, cheap, and free tiers
- Name your combos for easy access
- Share combos across devices with Cloud Sync
A multi-agent orchestration combo strategy for complex, multi-step tasks. Instead of one model answering, a small team of agents collaborates:
- Manager plans the strategy and splits the task
- Staff workers run subtasks in parallel (load-balanced across providers)
- Audit stage reviews worker output
- Manager synthesizes the final answer
- A Smart Gatekeeper triages the prompt first (simple → single-model fast path; complex → swarm)
- Persona-bleed protection keeps each stage's role isolated
Live telemetry streams to the Dashboard → Swarm page (SSE). Configure in Dashboard → Combos → choose the "Hierarchical Swarm" strategy.
Two reliability layers that keep requests flowing even when a provider degrades:
- Health Monitor records a sliding window of success/failure samples per provider (in-memory ring buffer) and exposes a live SSE feed + the Dashboard → Health page. Spot a provider going red before it ruins your coding session.
- Circuit Breaker wraps every provider in a CLOSED / OPEN / HALF_OPEN state machine. When a provider fails repeatedly, the breaker OPENS and routing auto-skips it (no wasted requests, no timeouts) — then probes with HALF_OPEN to auto-recover the moment it heals.
Both are zero-config; tunable in Settings.
Each API key you mint can carry an allowedModels allow-list. Requests using that key are rejected up-front (403) if they target a model outside the list — so you can hand out scoped keys to teammates, clients, or downstream tools without exposing your full catalog. Managed per-key in Dashboard → Endpoint.
A dedicated Cookies Provider category (above OAuth on the Providers page) for free web-chat services that authenticate via a browser cookie instead of an API key. Paste your cookies (or just the session token) into a multi-line field and ExtremeRouter handles token refresh, WAF cookies, PoW challenges, and SSE translation per site:
- No anti-bot (reliable): ChatGLM, DeepSeek, Qwen, Kimi, Blackbox, T3 Chat, DuckDuckGo, Venice, DouBao, v0, Poe, Copilot, Meta AI (Muse), Adapta, VeoAI
- Anti-bot (best-effort, may be Cloudflare-blocked): Claude Web, ChatGPT Web, Gemini Web
- Plus existing cookie providers: Grok Web, Perplexity Web
Note: Claude/ChatGPT/Gemini web deploy aggressive Cloudflare anti-bot. Plain server-side fetch is often blocked (403) even with valid cookies — these are included best-effort.
Devin (by Cognition) is session-based, not OpenAI-compatible. The Devin adapter bridges it: it creates a Devin session, polls for completion, and synthesizes an OpenAI SSE stream back. Models map to Devin agent modes — devin-normal, devin-fast, devin-lite, devin-ultra. Auth is API-key only (cog_...).
- Enable debug mode for full request/response logs
- Track API calls, headers, and payloads
- Troubleshoot integration issues
- Export logs for analysis
- Sync providers, combos, and settings across devices
- Automatic background sync
- Secure encrypted storage
- Access your setup from anywhere
- Prefer server-side cloud variables in production:
BASE_URL(internal callback URL used by sync scheduler)CLOUD_URL(cloud sync endpoint base)
NEXT_PUBLIC_BASE_URLandNEXT_PUBLIC_CLOUD_URLare still supported for compatibility/UI, but server runtime now prioritizesBASE_URL/CLOUD_URL.- Cloud sync requests now use timeout + fail-fast behavior to avoid UI hanging when cloud DNS/network is unavailable.
- Track token usage per provider and model
- Cost estimation and spending trends
- Monthly reports and insights
- Optimize your AI spending
💡 IMPORTANT - Understanding Dashboard Costs:
The "cost" displayed in Usage Analytics is for tracking and comparison purposes only. ExtremeRouter itself never charges you anything. You only pay providers directly (if using paid services).
Example: If your dashboard shows "$290 total cost" while using iFlow models, this represents what you would have paid using paid APIs directly. Your actual cost = $0 (iFlow is free unlimited).
Think of it as a "savings tracker" showing how much you're saving by using free models or routing through ExtremeRouter!
- 💻 Localhost - Default, works offline
- ☁️ VPS/Cloud - Share across devices
- 🐳 Docker - One-command deployment
- 🚀 Cloudflare Workers - Global edge network
| Tier | Provider | Cost | Quota Reset | Best For |
|---|---|---|---|---|
| 🚀 TOKEN SAVER | RTK (built-in) | FREE | Always on | Save 20-40% tokens on EVERY request |
| 💳 SUBSCRIPTION | Claude Code (Pro/Max) | $20-200/mo | 5h + weekly | Already subscribed |
| Codex (Plus/Pro) | $20-200/mo | 5h + weekly | OpenAI users | |
| GitHub Copilot | $10-19/mo | Monthly | GitHub users | |
| Cursor IDE | $20/mo | Monthly | Cursor users | |
| 💰 CHEAP | GLM-5.1 / GLM-4.7 | $0.6/1M | Daily 10AM | Budget backup |
| MiniMax M2.7 | $0.2/1M | 5-hour rolling | Cheapest option | |
| Kimi K2.5 | $9/mo flat | 10M tokens/mo | Predictable cost | |
| 🆓 FREE | Kiro AI | $0 | Unlimited | Claude 4.5 + GLM-5 + MiniMax free |
| OpenCode Free | $0 | Unlimited | No auth, auto-fetch models | |
| Vertex AI | $300 credits | New GCP accounts | Gemini 3 Pro + DeepSeek + GLM-5 |
💡 Pro Tip: RTK + Kiro AI + OpenCode Free combo = $0 cost + 20-40% token savings!
ExtremeRouter Billing Reality:
✅ ExtremeRouter software = FREE forever (open source, never charges)
✅ Dashboard "costs" = Display/tracking only (not actual bills)
✅ You pay providers directly (subscriptions or API fees)
✅ FREE providers stay FREE (iFlow, Kiro, Qwen = $0 unlimited)
❌ ExtremeRouter never sends invoices or charges your card
How Cost Display Works:
The dashboard shows estimated costs as if you were using paid APIs directly. This is not billing - it's a comparison tool to show your savings.
Example Scenario:
Dashboard Display:
• Total Requests: 1,662
• Total Tokens: 47M
• Display Cost: $290
Reality Check:
• Provider: iFlow (FREE unlimited)
• Actual Payment: $0.00
• What $290 Means: Amount you SAVED by using free models!
Payment Rules:
- Subscription providers (Claude Code, Codex): Pay them directly via their websites
- Cheap providers (GLM, MiniMax): Pay them directly, ExtremeRouter just routes
- FREE providers (iFlow, Kiro, Qwen): Genuinely free forever, no hidden charges
- ExtremeRouter: Never charges anything, ever
Problem: Quota expires unused, rate limits during heavy coding
Solution:
Combo: "maximize-claude"
1. cc/claude-opus-4-7 (use subscription fully)
2. glm/glm-5.1 (cheap backup when quota out)
3. kr/claude-sonnet-4.5 (free emergency fallback)
Monthly cost: $20 (subscription) + ~$5 (backup) = $25 total
vs. $20 + hitting limits = frustration
Problem: Can't afford subscriptions, need reliable AI coding
Solution:
Combo: "free-forever"
1. kr/claude-sonnet-4.5 (Claude 4.5 free unlimited)
2. kr/glm-5 (GLM-5 free via Kiro)
3. oc/<auto> (OpenCode Free, no auth)
Monthly cost: $0
Quality: Production-ready models + RTK saves 20-40% tokens
Problem: Deadlines, can't afford downtime
Solution:
Combo: "always-on"
1. cc/claude-opus-4-7 (best quality)
2. cx/gpt-5.5 (second subscription)
3. glm/glm-5.1 (cheap, resets daily)
4. minimax/MiniMax-M2.7 (cheapest, 5h reset)
5. kr/claude-sonnet-4.5 (free unlimited)
Result: 5 layers of fallback = zero downtime
Monthly cost: $20-200 (subscriptions) + $10-20 (backup)
Problem: Need AI assistant in messaging apps (WhatsApp, Telegram, Slack...), completely free
Solution:
Combo: "openclaw-free"
1. kr/claude-sonnet-4.5 (Claude 4.5 free)
2. kr/glm-5 (GLM-5 free)
3. kr/MiniMax-M2.5 (MiniMax free)
Monthly cost: $0
Access via: WhatsApp, Telegram, Slack, Discord, iMessage, Signal...
📊 Why does my dashboard show high costs?
The dashboard tracks your token usage and displays estimated costs as if you were using paid APIs directly. This is not actual billing - it's a reference to show how much you're saving by using free models or existing subscriptions through ExtremeRouter.
Example:
- Dashboard shows: "$290 total cost"
- Reality: You're using iFlow (FREE unlimited)
- Your actual cost: $0.00
- What $290 means: Amount you saved by using free models instead of paid APIs!
The cost display is a "savings tracker" to help you understand your usage patterns and optimization opportunities.
💳 Will I be charged by ExtremeRouter?
No. ExtremeRouter is free, open-source software that runs on your own computer. It never charges you anything.
You only pay:
- ✅ Subscription providers (Claude Code $20/mo, Codex $20-200/mo) → Pay them directly on their websites
- ✅ Cheap providers (GLM, MiniMax) → Pay them directly, ExtremeRouter just routes your requests
- ❌ ExtremeRouter itself → Never charges anything, ever
ExtremeRouter is a local proxy/router. It doesn't have your credit card, can't send invoices, and has no billing system. It's completely free software.
🆓 Are FREE providers really unlimited?
Yes! The current FREE providers (Kiro, OpenCode Free, Vertex) are genuinely free with no hidden charges.
These are free services offered by those respective companies:
- Kiro AI: Free unlimited Claude 4.5 + GLM-5 + MiniMax via AWS Builder ID / Google / GitHub OAuth
- OpenCode Free: No-auth passthrough proxy, models auto-fetched from
opencode.ai/zen/v1/models - Vertex AI: $300 free credits for new Google Cloud accounts (90 days)
ExtremeRouter just routes your requests to them - there's no "catch" or future billing. They're truly free services, and ExtremeRouter makes them easy to use with fallback support.
Discontinued free tiers (no longer recommended):
- ❌ iFlow: Was free unlimited, now changed to paid (2026)
- ❌ Qwen Code: Free OAuth tier discontinued by Alibaba on 2026-04-15
- ❌ Gemini CLI: Still works, but using it with non-CLI tools (Claude, Codex, Cursor...) may result in account bans — only use if you stick to Gemini CLI itself
💰 How do I minimize my actual AI costs?
Free-First Strategy:
-
Start with 100% free combo:
1. gc/gemini-3-flash (180K/month free from Google) 2. if/kimi-k2-thinking (unlimited free from iFlow) 3. qw/qwen3-coder-plus (unlimited free from Qwen)Cost: $0/month
-
Add cheap backup only if you need it:
4. glm/glm-4.7 ($0.6/1M tokens)Additional cost: Only pay for what you actually use
-
Use subscription providers last:
- Only if you already have them
- ExtremeRouter helps maximize their value through quota tracking
Result: Most users can operate at $0/month using only free tiers!
📈 What if my usage suddenly spikes?
ExtremeRouter's smart fallback prevents surprise charges:
Scenario: You're on a coding sprint and blow through your quotas
Without ExtremeRouter:
- ❌ Hit rate limit → Work stops → Frustration
- ❌ Or: Accidentally rack up huge API bills
With ExtremeRouter:
- ✅ Subscription hits limit → Auto-fallback to cheap tier
- ✅ Cheap tier gets expensive → Auto-fallback to free tier
- ✅ Never stop coding → Predictable costs
You're in control: Set spending limits per provider in dashboard, and ExtremeRouter respects them.
🔐 Subscription Providers (Maximize Value)
Dashboard → Providers → Connect Claude Code
→ OAuth login → Auto token refresh
→ 5-hour + weekly quota tracking
Models:
cc/claude-opus-4-7
cc/claude-opus-4-6
cc/claude-sonnet-4-6
cc/claude-haiku-4-5-20251001Pro Tip: Use Opus for complex tasks, Sonnet for speed. ExtremeRouter tracks quota per model!
Dashboard → Providers → Connect Codex
→ OAuth login (port 1455)
→ 5-hour + weekly reset
Models:
cx/gpt-5.5
cx/gpt-5.4
cx/gpt-5.3-codex
cx/gpt-5.2-codexDashboard → Providers → Connect GitHub
→ OAuth via GitHub
→ Monthly reset (1st of month)
Models:
gh/gpt-5.4
gh/claude-opus-4.7
gh/claude-sonnet-4.6
gh/gemini-3.1-pro-preview
gh/grok-code-fast-1Dashboard → Providers → Connect Cursor
→ OAuth login
→ Monthly subscription
Models:
cu/claude-4.6-opus-max
cu/claude-4.5-sonnet-thinking
cu/gpt-5.3-codex💰 Cheap Providers (Backup)
- Sign up: Zhipu AI
- Get API key from Coding Plan
- Dashboard → Add API Key:
- Provider:
glm - API Key:
your-key
- Provider:
Use: glm/glm-5.1, glm/glm-5, glm/glm-4.7
Pro Tip: Coding Plan offers 3× quota at 1/7 cost! Reset daily 10:00 AM.
- Sign up: MiniMax
- Get API key
- Dashboard → Add API Key
Use: minimax/MiniMax-M2.7, minimax/MiniMax-M2.5
Pro Tip: Cheapest option for long context (1M tokens)!
- Subscribe: Moonshot AI
- Get API key
- Dashboard → Add API Key
Use: kimi/kimi-k2.5, kimi/kimi-k2.5-thinking
Pro Tip: Fixed $9/month for 10M tokens = $0.90/1M effective cost!
🆓 FREE Providers (Recommended)
Dashboard → Connect Kiro
→ AWS Builder ID, AWS IAM Identity Center, Google, or GitHub
→ Unlimited usage
Models:
kr/claude-sonnet-4.5
kr/claude-haiku-4.5
kr/glm-5
kr/MiniMax-M2.5
kr/qwen3-coder-next
kr/deepseek-3.2Pro Tip: Best free option for Claude. No API key, no payment, fully unlimited.
Dashboard → Connect OpenCode Free
→ No login required (passthrough proxy)
→ Models auto-fetched from opencode.ai/zen/v1/modelsPro Tip: Fastest setup. Just connect and start coding.
Dashboard → Connect Vertex AI
→ Upload Google Cloud Service Account JSON
→ Enable Vertex AI API in your GCP project
Models:
vertex/gemini-3.1-pro-preview
vertex/gemini-3-flash-preview
vertex/gemini-2.5-flash
Vertex Partner (Anthropic / DeepSeek / GLM / Qwen via Vertex):
vertex-partner/glm-5-maas
vertex-partner/deepseek-v3.2-maas
vertex-partner/qwen3-next-80b-a3b-thinking-maasPro Tip: New Google Cloud accounts get $300 credits free for 90 days. Plenty for daily coding.
🎨 Create Combos
Dashboard → Combos → Create New
Name: premium-coding
Models:
1. cc/claude-opus-4-7 (Subscription primary)
2. glm/glm-5.1 (Cheap backup, $0.6/1M)
3. minimax/MiniMax-M2.7 (Cheapest fallback, $0.20/1M)
Use in CLI: premium-coding
Monthly cost example (100M tokens):
80M via Claude (subscription): $0 extra
15M via GLM: $9
5M via MiniMax: $1
Total: $10 + your subscription
Name: free-combo
Models:
1. kr/claude-sonnet-4.5 (Claude 4.5 free unlimited)
2. kr/glm-5 (GLM-5 free via Kiro)
3. vertex/gemini-3.1-pro-preview ($300 free credits)
Cost: $0 forever (+ 20-40% token savings via RTK)!
🔧 CLI Integration
Settings → Models → Advanced:
OpenAI API Base URL: http://localhost:20128/v1
OpenAI API Key: [from extremerouter dashboard]
Model: cc/claude-opus-4-7
Or use combo: premium-coding
Edit ~/.claude/config.json:
{
"anthropic_api_base": "http://localhost:20128/v1",
"anthropic_api_key": "your-extremerouter-api-key"
}export OPENAI_BASE_URL="http://localhost:20128"
export OPENAI_API_KEY="your-extremerouter-api-key"
codex "your prompt"Option 1 — Dashboard (recommended):
Dashboard → CLI Tools → OpenClaw → Select Model → Apply
Option 2 — Manual: Edit ~/.openclaw/openclaw.json:
{
"agents": {
"defaults": {
"model": {
"primary": "extremerouter/kr/claude-sonnet-4.5"
}
}
},
"models": {
"providers": {
"@rsalmn/extremerouter": {
"baseUrl": "http://127.0.0.1:20128/v1",
"apiKey": "sk_extremerouter",
"api": "openai-completions",
"models": [
{
"id": "kr/claude-sonnet-4.5",
"name": "Claude Sonnet 4.5 (Kiro Free)"
}
]
}
}
}
}Note: OpenClaw only works with local ExtremeRouter. Use
127.0.0.1instead oflocalhostto avoid IPv6 resolution issues.
Provider: OpenAI Compatible
Base URL: http://localhost:20128/v1
API Key: [from dashboard]
Model: cc/claude-opus-4-7
🚀 Deployment
# Clone and install
git clone https://github.com/rsalmn/extremerouter.git
cd extremerouter
npm install
npm run build
# Configure
export JWT_SECRET="your-secure-secret-change-this"
export INITIAL_PASSWORD="your-password"
export DATA_DIR="/var/lib/extremerouter"
export PORT="20128"
export HOSTNAME="0.0.0.0"
export NODE_ENV="production"
export NEXT_PUBLIC_BASE_URL="http://localhost:20128"
export NEXT_PUBLIC_CLOUD_URL="https://github.com/rsalmn/extremerouter"
export API_KEY_SECRET="endpoint-proxy-api-key-secret"
export MACHINE_ID_SALT="endpoint-proxy-salt"
# Start
npm run start
# Or use PM2
npm install -g pm2
pm2 start npm --name extremerouter -- start
pm2 save
pm2 startupPublished images (multi-platform linux/amd64 + linux/arm64):
- Docker Hub:
rsalmn/extremerouter - GHCR:
ghcr.io/rsalmn/extremerouter
Quick start (use published image):
docker run -d \
--name extremerouter \
-p 20128:20128 \
-v "$HOME/.extremerouter:/app/data" \
-e DATA_DIR=/app/data \
rsalmn/extremerouter:latest→ Open http://localhost:20128
Build from source (dev):
git clone https://github.com/rsalmn/extremerouter.git
cd extremerouter/app
docker build -t extremerouter .
docker run -d --name extremerouter -p 20128:20128 \
-v "$HOME/.extremerouter:/app/data" -e DATA_DIR=/app/data extremerouterContainer defaults:
PORT=20128HOSTNAME=0.0.0.0
Useful commands:
docker logs -f extremerouter
docker restart extremerouter
docker stop extremerouter && docker rm extremerouter
docker pull rsalmn/extremerouter:latest # update to latestData persistence: $HOME/.extremerouter/db/data.sqlite on host ↔ /app/data/db/data.sqlite in container.
| Variable | Default | Description |
|---|---|---|
JWT_SECRET |
Auto-generated (~/.extremerouter/jwt-secret) |
JWT signing secret for dashboard auth cookie (override to share across instances) |
INITIAL_PASSWORD |
123456 |
First login password when no saved hash exists |
DATA_DIR |
~/.extremerouter |
Main app data location (SQLite at $DATA_DIR/db/data.sqlite) |
PORT |
framework default | Service port (20128 in examples) |
HOSTNAME |
framework default | Bind host (Docker defaults to 0.0.0.0) |
NODE_ENV |
runtime default | Set production for deploy |
BASE_URL |
http://localhost:20128 |
Server-side internal base URL used by cloud sync jobs |
CLOUD_URL |
https://github.com/rsalmn/extremerouter |
Server-side cloud sync endpoint base URL |
NEXT_PUBLIC_BASE_URL |
http://localhost:3000 |
Backward-compatible/public base URL (prefer BASE_URL for server runtime) |
NEXT_PUBLIC_CLOUD_URL |
https://github.com/rsalmn/extremerouter |
Backward-compatible/public cloud URL (prefer CLOUD_URL for server runtime) |
API_KEY_SECRET |
endpoint-proxy-api-key-secret |
HMAC secret for generated API keys |
MACHINE_ID_SALT |
endpoint-proxy-salt |
Salt for stable machine ID hashing |
ENABLE_REQUEST_LOGS |
false |
Enables request/response logs under logs/ |
AUTH_COOKIE_SECURE |
false |
Force Secure auth cookie (set true behind HTTPS reverse proxy) |
REQUIRE_API_KEY |
false |
Enforce Bearer API key on /v1/* routes (recommended for internet-exposed deploys) |
HTTP_PROXY, HTTPS_PROXY, ALL_PROXY, NO_PROXY |
empty | Optional outbound proxy for upstream provider calls |
Notes:
- Lowercase proxy variables are also supported:
http_proxy,https_proxy,all_proxy,no_proxy. .envis not baked into Docker image (.dockerignore); inject runtime config with--env-fileor-e.- On Windows,
APPDATAcan be used for local storage path resolution. INSTANCE_NAMEappears in older docs/env templates, but is currently not used at runtime.
- Main app state:
${DATA_DIR}/db/data.sqlite(SQLite — providers, combos, aliases, keys, settings, usage history) - Auto backups:
${DATA_DIR}/db/backups/ - Optional request/translator logs:
<repo>/logs/...whenENABLE_REQUEST_LOGS=true - Both
${DATA_DIR}and~/.extremerouterresolve to the same location in a Docker container — the symlink/root/.extremerouter -> /app/datais created at build time.
View all available models
Claude Code (cc/) - Pro/Max:
cc/claude-opus-4-7cc/claude-opus-4-6cc/claude-sonnet-4-6cc/claude-sonnet-4-5-20250929cc/claude-haiku-4-5-20251001
Codex (cx/) - Plus/Pro:
cx/gpt-5.5cx/gpt-5.4cx/gpt-5.3-codexcx/gpt-5.2-codexcx/gpt-5.1-codex-max
GitHub Copilot (gh/):
gh/gpt-5.4gh/claude-opus-4.7gh/claude-sonnet-4.6gh/gemini-3.1-pro-previewgh/grok-code-fast-1
Cursor (cu/) - Subscription:
cu/claude-4.6-opus-maxcu/claude-4.5-sonnet-thinkingcu/gpt-5.3-codexcu/kimi-k2.5
GLM (glm/) - $0.6/1M:
glm/glm-5.1glm/glm-5glm/glm-4.7
MiniMax (minimax/) - $0.2/1M:
minimax/MiniMax-M2.7minimax/MiniMax-M2.5
Kimi (kimi/) - $9/mo flat:
kimi/kimi-k2.5kimi/kimi-k2.5-thinking
Kiro (kr/) - FREE unlimited:
kr/claude-sonnet-4.5kr/claude-haiku-4.5kr/glm-5kr/MiniMax-M2.5kr/qwen3-coder-nextkr/deepseek-3.2
OpenCode Free (oc/) - FREE no-auth:
- Auto-fetched from
opencode.ai/zen/v1/models
Vertex AI (vertex/) - $300 free credits:
vertex/gemini-3.1-pro-previewvertex/gemini-3-flash-previewvertex/gemini-2.5-flashvertex-partner/glm-5-maasvertex-partner/deepseek-v3.2-maas
"Language model did not provide messages"
- Provider quota exhausted → Check dashboard quota tracker
- Solution: Use combo fallback or switch to cheaper tier
Rate limiting
- Subscription quota out → Fallback to GLM/MiniMax
- Add combo:
cc/claude-opus-4-7 → glm/glm-5.1 → kr/claude-sonnet-4.5
OAuth token expired
- Auto-refreshed by ExtremeRouter
- If issues persist: Dashboard → Provider → Reconnect
High costs
- Enable RTK in Dashboard → Endpoint settings (default ON, saves 20-40% tokens)
- Check usage stats in Dashboard
- Switch primary model to GLM/MiniMax
- Use free tier (Kiro, OpenCode Free, Vertex) for non-critical tasks
Dashboard opens on wrong port
- Set
PORT=20128andNEXT_PUBLIC_BASE_URL=http://localhost:20128
First login not working
- Check
INITIAL_PASSWORDin.env - If unset, fallback password is
123456
No request logs under logs/
- Set
ENABLE_REQUEST_LOGS=true
- Runtime: Node.js 20+
- Framework: Next.js 16
- UI: React 19 + Tailwind CSS 4
- Database: SQLite (better-sqlite3 / node:sqlite / sql.js fallback)
- Streaming: Server-Sent Events (SSE)
- Auth: OAuth 2.0 (PKCE) + JWT + API Keys
POST http://localhost:20128/v1/chat/completions
Authorization: Bearer your-api-key
Content-Type: application/json
{
"model": "cc/claude-opus-4-6",
"messages": [
{"role": "user", "content": "Write a function to..."}
],
"stream": true
}GET http://localhost:20128/v1/models
Authorization: Bearer your-api-key
→ Returns all models + combos in OpenAI format- Website: github.com/rsalmn/extremerouter
- GitHub: github.com/rsalmn/extremerouter
- Issues: github.com/rsalmn/extremerouter/issues
Thanks to all contributors who helped make ExtremeRouter better!
Comunity forks will be listed here. Submit a Pull Request to add yours.
Built on the shoulders of giants:
- CLIProxyAPI — original Go implementation that inspired this JavaScript port.
- RTK
— Rust token-saver. ExtremeRouter ports its compression pipeline to JS → −20-40% input tokens on every request.
- Caveman
by @JuliusBrussee — viral "why use many token when few token do trick". ExtremeRouter adapts its prompt → −65% output tokens.
- Ponytail
by @DietrichGebert — "lazy senior dev" skill. ExtremeRouter injects its YAGNI-first ladder → fewer tokens, less code, shorter diffs.
Huge thanks to these authors — without their work, ExtremeRouter's token-saving features wouldn't exist. ⭐ them on GitHub!
MIT License - see LICENSE for details.































