ExtremeRouter — AI Gateway Control Plane

A premium, developer-first local gateway that routes traffic from your AI coding tools to 40+ providers with format translation, smart fallback, quota tracking, and 20–40% token savings.

Connect Claude Code, Cursor, Antigravity, Copilot, Codex, Gemini, OpenCode, Cline, OpenClaw and any OpenAI/Anthropic-compatible client to 100+ models through one endpoint.

🚀 Quick Start • 💡 Features • 📖 Setup Guide • 🌐 Repository

🤔 Why ExtremeRouter?

Stop wasting money and tokens, and stop hitting limits:

❌ Subscription quota expires unused every month
❌ Rate limits stop you mid-coding
❌ Tool outputs (git diff, grep, ls...) burn tokens fast
❌ Expensive APIs ($20-50/month per provider)
❌ Manual switching between providers

ExtremeRouter solves this:

✅ RTK Token Saver - Auto-compress tool_result content, save 20-40% tokens per request
✅ Maximize subscriptions - Track quota, use every bit before reset
✅ Auto fallback - Subscription → Cheap → Free, zero downtime
✅ Multi-account - Round-robin between accounts per provider
✅ Universal - Works with Claude Code, Codex, Cursor, Cline, any CLI tool

🔄 How It Works

┌─────────────┐
│  Your CLI   │  (Claude Code, Codex, OpenClaw, Cursor, Cline...)
│   Tool      │
└──────┬──────┘
       │ http://localhost:20128/v1
       ↓
┌─────────────────────────────────────────────┐
│        ExtremeRouter (Smart Router)         │
│  • RTK Token Saver (cut tool_result tokens) │
│  • Format translation (OpenAI ↔ Claude)     │
│  • Quota tracking                           │
│  • Auto token refresh                       │
└──────┬──────────────────────────────────────┘
       │
       ├─→ [Tier 1: SUBSCRIPTION] Claude Code, Codex, GitHub Copilot
       │   ↓ quota exhausted
       ├─→ [Tier 2: CHEAP] GLM ($0.6/1M), MiniMax ($0.2/1M)
       │   ↓ budget limit
       └─→ [Tier 3: FREE] Kiro, OpenCode Free, Vertex ($300 credits)

Result: Never stop coding, minimal cost + 20-40% token savings via RTK

⚡ Quick Start

1. Install globally:

npm install -g @rsalmn/extremerouter
extremerouter

🎉 Dashboard opens at http://localhost:20128

2. Connect a FREE provider (no signup needed):

Dashboard → Providers → Connect Kiro AI (free Claude unlimited) or OpenCode Free (no auth) → Done!

3. Use in your CLI tool:

Claude Code/Codex/OpenClaw/Cursor/Cline Settings:
  Endpoint: http://localhost:20128/v1
  API Key: [copy from dashboard]
  Model: kr/claude-sonnet-4.5

That's it! Start coding with FREE AI models.

Alternative: run from source (this repository):

This repository package is private (@rsalmn/extremerouter-app), so source/Docker execution is the expected local development path.

cp .env.example .env
npm install
PORT=20128 NEXT_PUBLIC_BASE_URL=http://localhost:20128 npm run dev

Production mode:

npm run build
PORT=20128 HOSTNAME=0.0.0.0 NEXT_PUBLIC_BASE_URL=http://localhost:20128 npm run start

Default URLs:

Dashboard: http://localhost:20128/dashboard
OpenAI-compatible API: http://localhost:20128/v1

🛠️ Supported CLI Tools

ExtremeRouter works seamlessly with all major AI coding tools:

Claude-Code	OpenClaw	Codex	OpenCode	Cursor	Antigravity
Cline	Continue	Droid	Roo	Copilot	Kilo Code

🌐 Supported Providers

🔐 OAuth Providers

Claude-Code

Antigravity

Codex

GitHub

Cursor

Kimchi

🆓 Free Providers

Kiro AI
Claude 4.5 + GLM-5 + MiniMax
Unlimited FREE

OpenCode Free
No auth • Auto-fetch models
Unlimited FREE

Vertex AI
Gemini 3 Pro + GLM-5 + DeepSeek
$300 credits free

Note: The iFlow and Qwen free tiers were discontinued in 2026, and Gemini CLI is no longer recommended outside Gemini CLI itself. Use Kiro / OpenCode Free / Vertex instead.

🔑 API Key Providers (40+)

OpenRouter	GLM	Kimi	MiniMax	OpenAI	Anthropic
Gemini	DeepSeek	Groq	xAI	Mistral	Perplexity
Together AI	Fireworks	Cerebras	Cohere	NVIDIA	SiliconFlow

...and 20+ more providers including Nebius, Chutes, Hyperbolic, and custom OpenAI/Anthropic compatible endpoints

💡 Key Features

Feature	What It Does	Why It Matters
🚀 RTK Token Saver (RTK ⭐40K)	Compress tool outputs (`git diff`, `grep`, `ls`, `tree`...) before sending to LLM	Save 20-40% input tokens per request
🧠 Headroom Token Saver (Headroom)	Optional external `/v1/compress` proxy before provider routing	Save more context tokens without changing clients
🪨 Caveman Mode (Caveman ⭐52K)	Inject caveman-speak prompt → LLM replies terse, technical substance preserved	Save up to 65% output tokens
🐴 Ponytail (Ponytail)	Inject "lazy senior dev" prompt → LLM writes minimal, YAGNI-first code (Lite/Full/Ultra)	Fewer output tokens, less refactoring
🎯 Smart 3-Tier Fallback	Auto-route: Subscription → Cheap → Free	Never stop coding, zero downtime
📊 Real-Time Quota Tracking	Live token count + reset countdown	Maximize subscription value
🔄 Format Translation	OpenAI ↔ Claude ↔ Gemini ↔ Cursor ↔ Kiro ↔ Vertex	Works with any CLI tool
👥 Multi-Account Support	Multiple accounts per provider	Load balancing + redundancy
🔄 Auto Token Refresh	OAuth tokens refresh automatically	No manual re-login needed
🎨 Custom Combos	Create unlimited model combinations	Tailor fallback to your needs
🐝 Hierarchical Swarm Engine	Multi-agent orchestration: Manager → Staff → Workers → Audit → Synthesis with persona-bleed protection	Tackle complex tasks via parallel agents with a smart gatekeeper
🩺 Health Monitor	Live per-provider health (in-memory sliding window) + SSE dashboard	Spot degraded providers before they break your flow
🔌 Circuit Breaker	Per-provider CLOSED/OPEN/HALF_OPEN state machine that auto-skips failing upstreams	Stop wasting requests on a down provider; auto-recover when it heals
🔐 Per-Key Model ACL	Restrict which models each API key may call (`allowedModels`)	Hand out scoped keys to teammates/clients without opening everything
🍪 Cookies Providers	20 web-chat providers via browser cookies (ChatGLM, DeepSeek, Qwen, Kimi, Blackbox, T3, DuckDuckGo, Venice, DouBao, v0, Poe, Copilot, Meta AI, Adapta, VeoAI, Claude/ChatGPT/Gemini web, Grok, Perplexity)	Use free web tiers with one pasted cookie — no API key needed
🤖 Devin CLI Provider	Session-based adapter for the Devin (Cognition) API	Route to Devin agent modes (normal/fast/lite/ultra) from any OpenAI client
🧪 Model Test All	Search box + "Test All" button on every provider's Available Models	Validate reachability of all models in one click, scoped to search results
📝 Request Logging	Debug mode with full request/response logs	Troubleshoot issues easily
💾 Cloud Sync	Sync config across devices	Same setup everywhere
📊 Usage Analytics	Track tokens, cost, trends over time	Optimize spending
🌐 Deploy Anywhere	Localhost, VPS, Docker, Cloudflare Workers	Flexible deployment options

📖 Feature Details

🚀 RTK Token Saver

Tool outputs (git diff, grep, find, ls, tree, log dumps...) often eat 30-50% of your prompt budget. RTK detects them and applies smart, lossless compression before the request hits the LLM:

Filters: git-diff, git-status, grep, find, ls, tree, dedup-log, smart-truncate, read-numbered, search-list
Auto-detect: No config needed. RTK peeks at the first 1 KB of each tool_result and picks the right filter.
Safe by design: If a filter fails, throws, or makes output bigger, RTK silently keeps the original text. Errors never break your request.
Universal: Works across all formats (OpenAI, Claude, Gemini, Cursor, Kiro, OpenAI Responses) because it runs before any format translation.
Default ON: Toggle anytime in Dashboard → Endpoint settings.

Without RTK: 47K tokens sent to LLM
With RTK:    28K tokens sent to LLM   (40% saved · same context · same answer)
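
The "safe by design" rule above can be sketched as a small wrapper. This is an illustration under stated assumptions, not ExtremeRouter's actual code: `applyFilterSafely` and `dedupLog` are hypothetical names, and the toy filter only collapses runs of identical consecutive lines.

```javascript
// Hypothetical sketch of the "safe by design" rule: keep the original
// text whenever a filter throws or makes the output bigger.
function applyFilterSafely(text, filter) {
  try {
    const compressed = filter(text);
    // Reject non-string results and outputs that grew.
    if (typeof compressed !== "string" || compressed.length > text.length) {
      return text;
    }
    return compressed;
  } catch {
    return text; // a failing filter never breaks the request
  }
}

// Toy stand-in for a "dedup-log" style filter (not the real one):
// collapse runs of identical consecutive lines into "line (xN)".
function dedupLog(text) {
  const out = [];
  let prev = null;
  let count = 0;
  const flush = () => {
    if (prev !== null) out.push(count > 1 ? `${prev} (x${count})` : prev);
  };
  for (const line of text.split("\n")) {
    if (line === prev) {
      count += 1;
    } else {
      flush();
      prev = line;
      count = 1;
    }
  }
  flush();
  return out.join("\n");
}

console.log(applyFilterSafely("retry\nretry\nretry\nok", dedupLog));
```

Note that for a tiny input such as "a\na" the toy filter would produce a longer string, so the wrapper returns the original unchanged.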

🧠 Headroom Token Saver

Headroom is optional and runs separately. ExtremeRouter calls Headroom's local /v1/compress endpoint, then keeps normal routing, fallback, auth, and usage tracking:

Client → ExtremeRouter → Headroom /v1/compress → ExtremeRouter → provider

Local setup:

pip install "headroom-ai[proxy]"
headroom proxy --port 8787

Enable in Dashboard → Endpoint → Token Saver → Headroom. Default URL: http://localhost:8787.

Docker examples:

# Headroom service in same Docker network
http://headroom:8787

# Headroom running on host machine
http://host.docker.internal:8787

If Headroom is down or returns an error, ExtremeRouter fails open and sends the original request.

🐴 Ponytail (Lazy Senior Dev)

Ponytail injects a "lazy senior dev" system prompt into every request, biasing the LLM toward minimal, YAGNI-first code — deletion over addition, stdlib over new deps, one-liners over abstractions. Adapted from DietrichGebert/ponytail.

Lite — Build what's asked, name the lazier alternative.
Full — YAGNI ladder enforced: stdlib → native → existing deps → one-liner → minimal code.
Ultra — YAGNI extremist: deletion first, ship the one-liner, challenge the rest of the requirement in the same response.

Without Ponytail: verbose code, extra abstractions, "just in case" scaffolding
With Ponytail:    shortest working diff, no unrequested abstractions, fewer tokens

Never trades away: input validation, error handling that prevents data loss, security, accessibility, or anything explicitly requested. Enable in Dashboard → Endpoint → Ponytail. Stacks with Caveman (output terseness) and RTK (input compression).

🎯 Smart 3-Tier Fallback

Create combos with automatic fallback:

Combo: "my-coding-stack"
  1. cc/claude-opus-4-6        (your subscription)
  2. glm/glm-4.7               (cheap backup, $0.6/1M)
  3. kr/claude-sonnet-4.5      (free fallback)

→ Auto switches when quota runs out or errors occur
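
The ordering logic of a combo can be sketched as "first entry that is still usable wins". The function name `pickModel` and the idea of tracking unavailable models in a `Set` are assumptions for illustration; the real router may track quota and errors differently.

```javascript
// Hypothetical sketch: walk a combo in order and return the first entry
// that is not currently marked unavailable (quota exhausted or erroring).
function pickModel(combo, unavailable) {
  for (const model of combo) {
    if (!unavailable.has(model)) return model;
  }
  return null; // every tier is exhausted
}

const combo = ["cc/claude-opus-4-6", "glm/glm-4.7", "kr/claude-sonnet-4.5"];
const unavailable = new Set(["cc/claude-opus-4-6"]); // subscription quota used up
console.log(pickModel(combo, unavailable)); // prints "glm/glm-4.7"
```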

📊 Real-Time Quota Tracking

Token consumption per provider
Reset countdown (5-hour, daily, weekly)
Cost estimation for paid tiers
Monthly spending reports

🔄 Format Translation

Seamless translation between formats:

OpenAI ↔ Claude ↔ Gemini ↔ Cursor ↔ Kiro ↔ Vertex ↔ Antigravity ↔ Ollama ↔ OpenAI Responses
Your CLI tool sends OpenAI format → ExtremeRouter translates → Provider receives native format
Works with any tool that supports custom OpenAI endpoints
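
As a rough illustration of one translation direction, the sketch below converts an OpenAI-style chat request into the shape used by Anthropic's Messages API, which takes system text as a top-level field and requires `max_tokens`. The function name and the default of 1024 are assumptions; tool calls, images, and model-prefix mapping are deliberately left out.

```javascript
// Sketch only: OpenAI chat request -> Anthropic Messages request.
// Handles plain-text content; everything else is out of scope here.
function openAiToClaude(request, defaultMaxTokens = 1024) {
  const systemParts = [];
  const messages = [];
  for (const msg of request.messages) {
    if (msg.role === "system") {
      systemParts.push(msg.content); // Claude takes system text at top level
    } else {
      messages.push({ role: msg.role, content: msg.content });
    }
  }
  const out = {
    model: request.model,
    max_tokens: request.max_tokens ?? defaultMaxTokens, // required by Claude
    messages,
  };
  if (systemParts.length > 0) out.system = systemParts.join("\n\n");
  if (request.stream !== undefined) out.stream = request.stream;
  return out;
}
```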

👥 Multi-Account Support

Add multiple accounts per provider
Auto round-robin or priority-based routing
Fallback to next account when one hits quota
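
A minimal sketch of round-robin selection that skips accounts which have hit their quota. The account shape (`{ id, exhausted }`) and the function name are hypothetical and only meant to show the rotation.

```javascript
// Hypothetical sketch of per-provider round-robin with quota skipping.
// `accounts` is an array of { id, exhausted } objects (assumed shape).
function createRoundRobin(accounts) {
  let cursor = 0;
  return function next() {
    for (let i = 0; i < accounts.length; i += 1) {
      const account = accounts[(cursor + i) % accounts.length];
      if (!account.exhausted) {
        // Resume after the account we just handed out.
        cursor = (cursor + i + 1) % accounts.length;
        return account;
      }
    }
    return null; // every account has hit its quota
  };
}
```

Calling `next()` repeatedly cycles through the accounts in order; once an account is flagged `exhausted`, the rotation steps over it until the flag is cleared.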

🔄 Auto Token Refresh

OAuth tokens automatically refresh before expiration
No manual re-authentication needed
Seamless experience across all providers

🎨 Custom Combos

Create unlimited model combinations
Mix subscription, cheap, and free tiers
Name your combos for easy access
Share combos across devices with Cloud Sync

🐝 Hierarchical Swarm Engine

A multi-agent orchestration combo strategy for complex, multi-step tasks. Instead of one model answering, a small team of agents collaborates:

Manager plans the strategy and splits the task
Staff workers run subtasks in parallel (load-balanced across providers)
Audit stage reviews worker output
Manager synthesizes the final answer
A Smart Gatekeeper triages the prompt first (simple → single-model fast path; complex → swarm)
Persona-bleed protection keeps each stage's role isolated

Live telemetry streams to the Dashboard → Swarm page (SSE). Configure in Dashboard → Combos → choose the "Hierarchical Swarm" strategy.

🩺 Health Monitor + 🔌 Circuit Breaker

Two reliability layers that keep requests flowing even when a provider degrades:

Health Monitor records a sliding window of success/failure samples per provider (in-memory ring buffer) and exposes a live SSE feed + the Dashboard → Health page. Spot a provider going red before it ruins your coding session.
Circuit Breaker wraps every provider in a CLOSED / OPEN / HALF_OPEN state machine. When a provider fails repeatedly, the breaker OPENS and routing auto-skips it (no wasted requests, no timeouts) — then probes with HALF_OPEN to auto-recover the moment it heals.

Both are zero-config; tunable in Settings.
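
The breaker's three states can be sketched as follows. The thresholds, cooldown, and injectable clock are assumptions chosen for illustration and do not reflect ExtremeRouter's real defaults.

```javascript
// Hypothetical sketch of a CLOSED / OPEN / HALF_OPEN breaker.
class CircuitBreaker {
  constructor({ failureThreshold = 3, cooldownMs = 30000, now = Date.now } = {}) {
    this.failureThreshold = failureThreshold;
    this.cooldownMs = cooldownMs;
    this.now = now;
    this.state = "CLOSED";
    this.failures = 0;
    this.openedAt = 0;
  }

  // Returns true if a request may be sent to this provider right now.
  canRequest() {
    if (this.state === "OPEN" && this.now() - this.openedAt >= this.cooldownMs) {
      this.state = "HALF_OPEN"; // let probe traffic through until a result is recorded
    }
    return this.state !== "OPEN";
  }

  recordSuccess() {
    this.state = "CLOSED";
    this.failures = 0;
  }

  recordFailure() {
    this.failures += 1;
    // A failed probe reopens immediately; otherwise wait for the threshold.
    if (this.state === "HALF_OPEN" || this.failures >= this.failureThreshold) {
      this.state = "OPEN";
      this.openedAt = this.now();
    }
  }
}
```

A router would call `canRequest()` before choosing a provider and then report the outcome with `recordSuccess()` or `recordFailure()`.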

🔐 Per-Key Model Access Control (ACL)

Each API key you mint can carry an allowedModels allow-list. Requests using that key are rejected up-front (403) if they target a model outside the list — so you can hand out scoped keys to teammates, clients, or downstream tools without exposing your full catalog. Managed per-key in Dashboard → Endpoint.
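
The allow-list check described above might look roughly like this. The key object shape and the choice to treat a missing or empty `allowedModels` list as "allow everything" are assumptions, not confirmed behavior.

```javascript
// Hypothetical sketch of a per-key allow-list check.
function checkModelAccess(apiKey, model) {
  const allowed = apiKey.allowedModels;
  // Assumption: no list (or an empty one) means the key is unrestricted.
  if (!Array.isArray(allowed) || allowed.length === 0) {
    return { ok: true, status: 200 };
  }
  if (allowed.includes(model)) {
    return { ok: true, status: 200 };
  }
  return {
    ok: false,
    status: 403,
    error: `Model "${model}" is not allowed for this key`,
  };
}

const teammateKey = { allowedModels: ["kr/claude-sonnet-4.5", "glm/glm-4.7"] };
console.log(checkModelAccess(teammateKey, "cc/claude-opus-4-7").status); // prints 403
```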

🍪 Cookies Providers

A dedicated Cookies Provider category (above OAuth on the Providers page) for free web-chat services that authenticate via a browser cookie instead of an API key. Paste your cookies (or just the session token) into a multi-line field and ExtremeRouter handles token refresh, WAF cookies, PoW challenges, and SSE translation per site:

No anti-bot (reliable): ChatGLM, DeepSeek, Qwen, Kimi, Blackbox, T3 Chat, DuckDuckGo, Venice, DouBao, v0, Poe, Copilot, Meta AI (Muse), Adapta, VeoAI
Anti-bot (best-effort, may be Cloudflare-blocked): Claude Web, ChatGPT Web, Gemini Web
Plus existing cookie providers: Grok Web, Perplexity Web

Note: Claude/ChatGPT/Gemini web deploy aggressive Cloudflare anti-bot. Plain server-side fetch is often blocked (403) even with valid cookies — these are included best-effort.

🤖 Devin CLI Provider

Devin (by Cognition) is session-based, not OpenAI-compatible. The Devin adapter bridges it: it creates a Devin session, polls for completion, and synthesizes an OpenAI SSE stream back. Models map to Devin agent modes — devin-normal, devin-fast, devin-lite, devin-ultra. Auth is API-key only (cog_...).
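
The last step, turning a finished answer into an OpenAI-style SSE stream, can be sketched as below. The function name and the placeholder `id` are illustrative; session creation and polling against the Devin API are not shown because their details are not documented here.

```javascript
// Sketch: wrap a completed text answer as OpenAI chat.completion.chunk events.
// `toOpenAiSse` and the default `id` are hypothetical names.
function toOpenAiSse(text, model, id = "chatcmpl-devin-demo") {
  const created = Math.floor(Date.now() / 1000);
  const chunk = (delta, finishReason) =>
    `data: ${JSON.stringify({
      id,
      object: "chat.completion.chunk",
      created,
      model,
      choices: [{ index: 0, delta, finish_reason: finishReason }],
    })}\n\n`;
  return [
    chunk({ role: "assistant", content: "" }, null), // opening role chunk
    chunk({ content: text }, null), // the whole answer as one delta
    chunk({}, "stop"), // terminal chunk
    "data: [DONE]\n\n", // end-of-stream sentinel
  ].join("");
}
```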

📝 Request Logging

Enable debug mode for full request/response logs
Track API calls, headers, and payloads
Troubleshoot integration issues
Export logs for analysis

💾 Cloud Sync

Sync providers, combos, and settings across devices
Automatic background sync
Secure encrypted storage
Access your setup from anywhere

Cloud Runtime Notes

Prefer server-side cloud variables in production:
- BASE_URL (internal callback URL used by sync scheduler)
- CLOUD_URL (cloud sync endpoint base)
NEXT_PUBLIC_BASE_URL and NEXT_PUBLIC_CLOUD_URL are still supported for compatibility/UI, but server runtime now prioritizes BASE_URL/CLOUD_URL.
Cloud sync requests now use timeout + fail-fast behavior to avoid UI hanging when cloud DNS/network is unavailable.
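
The precedence described in these notes can be sketched as a small lookup. `resolveServerUrl` is a hypothetical helper, and the fallback value passed in the usage line is taken from the Environment Variables table; whether the real server falls back in exactly this order is an assumption.

```javascript
// Sketch: server-side variables win, NEXT_PUBLIC_* values are a
// compatibility fallback, and a caller-supplied default comes last.
function resolveServerUrl(env, name, fallback) {
  return env[name] || env[`NEXT_PUBLIC_${name}`] || fallback;
}

const baseUrl = resolveServerUrl(process.env, "BASE_URL", "http://localhost:20128");
console.log(baseUrl);

// Fail-fast requests can use a timeout signal (the 5000 ms value is an assumption):
//   await fetch(url, { signal: AbortSignal.timeout(5000) });
```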

📊 Usage Analytics

Track token usage per provider and model
Cost estimation and spending trends
Monthly reports and insights
Optimize your AI spending

💡 IMPORTANT - Understanding Dashboard Costs:

The "cost" displayed in Usage Analytics is for tracking and comparison purposes only. ExtremeRouter itself never charges you anything. You only pay providers directly (if using paid services).

Example: If your dashboard shows "$290 total cost" while using Kiro models, this represents what you would have paid using paid APIs directly. Your actual cost = $0 (Kiro is free unlimited).

Think of it as a "savings tracker" showing how much you're saving by using free models or routing through ExtremeRouter!

🌐 Deploy Anywhere

💻 Localhost - Default, works offline
☁️ VPS/Cloud - Share across devices
🐳 Docker - One-command deployment
🚀 Cloudflare Workers - Global edge network

💰 Pricing at a Glance

Tier	Provider	Cost	Quota Reset	Best For
🚀 TOKEN SAVER	RTK (built-in)	FREE	Always on	Save 20-40% tokens on EVERY request
💳 SUBSCRIPTION	Claude Code (Pro/Max)	$20-200/mo	5h + weekly	Already subscribed
	Codex (Plus/Pro)	$20-200/mo	5h + weekly	OpenAI users
	GitHub Copilot	$10-19/mo	Monthly	GitHub users
	Cursor IDE	$20/mo	Monthly	Cursor users
💰 CHEAP	GLM-5.1 / GLM-4.7	$0.6/1M	Daily 10AM	Budget backup
	MiniMax M2.7	$0.2/1M	5-hour rolling	Cheapest option
	Kimi K2.5	$9/mo flat	10M tokens/mo	Predictable cost
🆓 FREE	Kiro AI	$0	Unlimited	Claude 4.5 + GLM-5 + MiniMax free
	OpenCode Free	$0	Unlimited	No auth, auto-fetch models
	Vertex AI	$300 credits	New GCP accounts	Gemini 3 Pro + DeepSeek + GLM-5

💡 Pro Tip: RTK + Kiro AI + OpenCode Free combo = $0 cost + 20-40% token savings!

📊 Understanding ExtremeRouter Costs & Billing

ExtremeRouter Billing Reality:

✅ ExtremeRouter software = FREE forever (open source, never charges)
✅ Dashboard "costs" = Display/tracking only (not actual bills)
✅ You pay providers directly (subscriptions or API fees)
✅ FREE providers stay FREE (Kiro, OpenCode Free = $0 unlimited)
❌ ExtremeRouter never sends invoices or charges your card

How Cost Display Works:

The dashboard shows estimated costs as if you were using paid APIs directly. This is not billing - it's a comparison tool to show your savings.

Example Scenario:

Dashboard Display:
• Total Requests: 1,662
• Total Tokens: 47M
• Display Cost: $290

Reality Check:
• Provider: Kiro (FREE unlimited)
• Actual Payment: $0.00
• What $290 Means: Amount you SAVED by using free models!

Payment Rules:

Subscription providers (Claude Code, Codex): Pay them directly via their websites
Cheap providers (GLM, MiniMax): Pay them directly, ExtremeRouter just routes
FREE providers (Kiro, OpenCode Free): Genuinely free, no hidden charges
ExtremeRouter: Never charges anything, ever

🎯 Use Cases

Case 1: "I have Claude Pro subscription"

Problem: Quota expires unused, rate limits during heavy coding

Solution:

Combo: "maximize-claude"
  1. cc/claude-opus-4-7        (use subscription fully)
  2. glm/glm-5.1               (cheap backup when quota out)
  3. kr/claude-sonnet-4.5      (free emergency fallback)

Monthly cost: $20 (subscription) + ~$5 (backup) = $25 total
vs. $20 + hitting limits = frustration

Case 2: "I want zero cost"

Problem: Can't afford subscriptions, need reliable AI coding

Solution:

Combo: "free-forever"
  1. kr/claude-sonnet-4.5      (Claude 4.5 free unlimited)
  2. kr/glm-5                  (GLM-5 free via Kiro)
  3. oc/<auto>                 (OpenCode Free, no auth)

Monthly cost: $0
Quality: Production-ready models + RTK saves 20-40% tokens

Case 3: "I need 24/7 coding, no interruptions"

Problem: Deadlines, can't afford downtime

Solution:

Combo: "always-on"
  1. cc/claude-opus-4-7        (best quality)
  2. cx/gpt-5.5                (second subscription)
  3. glm/glm-5.1               (cheap, resets daily)
  4. minimax/MiniMax-M2.7      (cheapest, 5h reset)
  5. kr/claude-sonnet-4.5      (free unlimited)

Result: 5 layers of fallback = zero downtime
Monthly cost: $20-200 (subscriptions) + $10-20 (backup)

Case 4: "I want FREE AI in OpenClaw"

Problem: Need AI assistant in messaging apps (WhatsApp, Telegram, Slack...), completely free

Solution:

Combo: "openclaw-free"
  1. kr/claude-sonnet-4.5      (Claude 4.5 free)
  2. kr/glm-5                  (GLM-5 free)
  3. kr/MiniMax-M2.5           (MiniMax free)

Monthly cost: $0
Access via: WhatsApp, Telegram, Slack, Discord, iMessage, Signal...

❓ Frequently Asked Questions

📊 Why does my dashboard show high costs?

The dashboard tracks your token usage and displays estimated costs as if you were using paid APIs directly. This is not actual billing - it's a reference to show how much you're saving by using free models or existing subscriptions through ExtremeRouter.

Example:

Dashboard shows: "$290 total cost"
Reality: You're using Kiro (FREE unlimited)
Your actual cost: $0.00
What $290 means: Amount you saved by using free models instead of paid APIs!

The cost display is a "savings tracker" to help you understand your usage patterns and optimization opportunities.

💳 Will I be charged by ExtremeRouter?

No. ExtremeRouter is free, open-source software that runs on your own computer. It never charges you anything.

You only pay:

✅ Subscription providers (Claude Code $20-200/mo, Codex $20-200/mo) → Pay them directly on their websites
✅ Cheap providers (GLM, MiniMax) → Pay them directly, ExtremeRouter just routes your requests
❌ ExtremeRouter itself → Never charges anything, ever

ExtremeRouter is a local proxy/router. It doesn't have your credit card, can't send invoices, and has no billing system. It's completely free software.

🆓 Are FREE providers really unlimited?

Mostly, yes. Kiro and OpenCode Free are free with no hidden charges, and Vertex AI is free until its $300 new-account credits run out or expire.

These are free services offered by those respective companies:

Kiro AI: Free unlimited Claude 4.5 + GLM-5 + MiniMax via AWS Builder ID / Google / GitHub OAuth
OpenCode Free: No-auth passthrough proxy, models auto-fetched from opencode.ai/zen/v1/models
Vertex AI: $300 free credits for new Google Cloud accounts (90 days)

ExtremeRouter just routes your requests to them - there's no "catch" or future billing. They're truly free services, and ExtremeRouter makes them easy to use with fallback support.

Discontinued free tiers (no longer recommended):

❌ iFlow: Was free unlimited, now paid (2026)
❌ Qwen Code: Free OAuth tier discontinued by Alibaba on 2026-04-15
❌ Gemini CLI: Still works, but using it with non-CLI tools (Claude, Codex, Cursor...) may result in account bans — only use if you stick to Gemini CLI itself

💰 How do I minimize my actual AI costs?

Free-First Strategy:

Start with a 100% free combo:

1. kr/claude-sonnet-4.5 (free via Kiro)
2. kr/glm-5 (free via Kiro)
3. oc/<auto> (OpenCode Free, no auth)

Cost: $0/month

Add a cheap backup only if you need it:

4. glm/glm-4.7 ($0.6/1M tokens)

Additional cost: you pay only for what you actually use

Use subscription providers last:
- Only if you already have them
- ExtremeRouter helps maximize their value through quota tracking

Result: Most users can operate at $0/month using only free tiers!

📈 What if my usage suddenly spikes?

ExtremeRouter's smart fallback prevents surprise charges:

Scenario: You're on a coding sprint and blow through your quotas

Without ExtremeRouter:

❌ Hit rate limit → Work stops → Frustration
❌ Or: Accidentally rack up huge API bills

With ExtremeRouter:

✅ Subscription hits limit → Auto-fallback to cheap tier
✅ Cheap tier hits its budget limit → Auto-fallback to free tier
✅ Never stop coding → Predictable costs

You're in control: Set spending limits per provider in dashboard, and ExtremeRouter respects them.

📖 Setup Guide

🔐 Subscription Providers (Maximize Value)

Claude Code (Pro/Max)

Dashboard → Providers → Connect Claude Code
→ OAuth login → Auto token refresh
→ 5-hour + weekly quota tracking

Models:
  cc/claude-opus-4-7
  cc/claude-opus-4-6
  cc/claude-sonnet-4-6
  cc/claude-haiku-4-5-20251001

Pro Tip: Use Opus for complex tasks, Sonnet for speed. ExtremeRouter tracks quota per model!

OpenAI Codex (Plus/Pro)

Dashboard → Providers → Connect Codex
→ OAuth login (port 1455)
→ 5-hour + weekly reset

Models:
  cx/gpt-5.5
  cx/gpt-5.4
  cx/gpt-5.3-codex
  cx/gpt-5.2-codex

GitHub Copilot

Dashboard → Providers → Connect GitHub
→ OAuth via GitHub
→ Monthly reset (1st of month)

Models:
  gh/gpt-5.4
  gh/claude-opus-4.7
  gh/claude-sonnet-4.6
  gh/gemini-3.1-pro-preview
  gh/grok-code-fast-1

Cursor IDE

Dashboard → Providers → Connect Cursor
→ OAuth login
→ Monthly subscription

Models:
  cu/claude-4.6-opus-max
  cu/claude-4.5-sonnet-thinking
  cu/gpt-5.3-codex

💰 Cheap Providers (Backup)

GLM-5.1 / GLM-4.7 (Daily reset, $0.6/1M)

Sign up: Zhipu AI
Get API key from Coding Plan
Dashboard → Add API Key:
- Provider: glm
- API Key: your-key

Use: glm/glm-5.1, glm/glm-5, glm/glm-4.7

Pro Tip: Coding Plan offers 3× quota at 1/7 cost! Reset daily 10:00 AM.

MiniMax M2.7 (5h reset, $0.20/1M)

Sign up: MiniMax
Get API key
Dashboard → Add API Key

Use: minimax/MiniMax-M2.7, minimax/MiniMax-M2.5

Pro Tip: Cheapest option for long context (1M tokens)!

Kimi K2.5 ($9/month flat)

Subscribe: Moonshot AI
Get API key
Dashboard → Add API Key

Use: kimi/kimi-k2.5, kimi/kimi-k2.5-thinking

Pro Tip: Fixed $9/month for 10M tokens = $0.90/1M effective cost!

🆓 FREE Providers (Recommended)

Kiro AI (Claude 4.5 + GLM-5 + MiniMax FREE)

Dashboard → Connect Kiro
→ AWS Builder ID, AWS IAM Identity Center, Google, or GitHub
→ Unlimited usage

Models:
  kr/claude-sonnet-4.5
  kr/claude-haiku-4.5
  kr/glm-5
  kr/MiniMax-M2.5
  kr/qwen3-coder-next
  kr/deepseek-3.2

Pro Tip: Best free option for Claude. No API key, no payment, fully unlimited.

OpenCode Free (No auth, auto-fetch models)

Dashboard → Connect OpenCode Free
→ No login required (passthrough proxy)
→ Models auto-fetched from opencode.ai/zen/v1/models

Pro Tip: Fastest setup. Just connect and start coding.

Vertex AI ($300 free credits for new GCP accounts)

Dashboard → Connect Vertex AI
→ Upload Google Cloud Service Account JSON
→ Enable Vertex AI API in your GCP project

Models:
  vertex/gemini-3.1-pro-preview
  vertex/gemini-3-flash-preview
  vertex/gemini-2.5-flash

Vertex Partner (Anthropic / DeepSeek / GLM / Qwen via Vertex):
  vertex-partner/glm-5-maas
  vertex-partner/deepseek-v3.2-maas
  vertex-partner/qwen3-next-80b-a3b-thinking-maas

Pro Tip: New Google Cloud accounts get $300 credits free for 90 days. Plenty for daily coding.

🎨 Create Combos

Example 1: Maximize Subscription → Cheap Backup

Dashboard → Combos → Create New

Name: premium-coding
Models:
  1. cc/claude-opus-4-7 (Subscription primary)
  2. glm/glm-5.1 (Cheap backup, $0.6/1M)
  3. minimax/MiniMax-M2.7 (Cheapest fallback, $0.20/1M)

Use in CLI: premium-coding

Monthly cost example (100M tokens):
  80M via Claude (subscription): $0 extra
  15M via GLM: $9
  5M via MiniMax: $1
  Total: $10 + your subscription
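
The arithmetic in this example can be reproduced with a few lines. Prices are USD per 1M tokens as listed in this README, subscription traffic is counted as $0 extra, and `estimateExtraCost` is a hypothetical helper name.

```javascript
// Reproduces the monthly cost example: sum of (millions of tokens x price per 1M).
function estimateExtraCost(usage) {
  return usage.reduce(
    (total, { tokensMillions, pricePerMillion }) => total + tokensMillions * pricePerMillion,
    0
  );
}

const usage = [
  { model: "cc/claude-opus-4-7", tokensMillions: 80, pricePerMillion: 0 }, // subscription
  { model: "glm/glm-5.1", tokensMillions: 15, pricePerMillion: 0.6 }, // $9
  { model: "minimax/MiniMax-M2.7", tokensMillions: 5, pricePerMillion: 0.2 }, // $1
];
console.log(estimateExtraCost(usage).toFixed(2)); // prints "10.00"
```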

Example 2: Free-Only (Zero Cost)

Name: free-combo
Models:
  1. kr/claude-sonnet-4.5 (Claude 4.5 free unlimited)
  2. kr/glm-5 (GLM-5 free via Kiro)
  3. vertex/gemini-3.1-pro-preview ($300 free credits)

Cost: $0 (Vertex stays free only while its $300 credits last) + 20-40% token savings via RTK!

🔧 CLI Integration

Cursor IDE

Settings → Models → Advanced:
  OpenAI API Base URL: http://localhost:20128/v1
  OpenAI API Key: [from extremerouter dashboard]
  Model: cc/claude-opus-4-7

Or use combo: premium-coding

Claude Code

Edit ~/.claude/config.json:

{
  "anthropic_api_base": "http://localhost:20128/v1",
  "anthropic_api_key": "your-extremerouter-api-key"
}

Codex CLI

export OPENAI_BASE_URL="http://localhost:20128/v1"
export OPENAI_API_KEY="your-extremerouter-api-key"

codex "your prompt"

OpenClaw

Option 1 — Dashboard (recommended):

Dashboard → CLI Tools → OpenClaw → Select Model → Apply

Option 2 — Manual: Edit ~/.openclaw/openclaw.json:

{
  "agents": {
    "defaults": {
      "model": {
        "primary": "extremerouter/kr/claude-sonnet-4.5"
      }
    }
  },
  "models": {
    "providers": {
      "extremerouter": {
        "baseUrl": "http://127.0.0.1:20128/v1",
        "apiKey": "sk_extremerouter",
        "api": "openai-completions",
        "models": [
          {
            "id": "kr/claude-sonnet-4.5",
            "name": "Claude Sonnet 4.5 (Kiro Free)"
          }
        ]
      }
    }
  }
}

Note: OpenClaw only works with local ExtremeRouter. Use 127.0.0.1 instead of localhost to avoid IPv6 resolution issues.

Cline / Continue / RooCode

Provider: OpenAI Compatible
Base URL: http://localhost:20128/v1
API Key: [from dashboard]
Model: cc/claude-opus-4-7

🚀 Deployment

VPS Deployment

# Clone and install
git clone https://github.com/rsalmn/extremerouter.git
cd extremerouter
npm install
npm run build

# Configure
export JWT_SECRET="your-secure-secret-change-this"
export INITIAL_PASSWORD="your-password"
export DATA_DIR="/var/lib/extremerouter"
export PORT="20128"
export HOSTNAME="0.0.0.0"
export NODE_ENV="production"
export NEXT_PUBLIC_BASE_URL="http://localhost:20128"
export NEXT_PUBLIC_CLOUD_URL="https://github.com/rsalmn/extremerouter"
export API_KEY_SECRET="endpoint-proxy-api-key-secret"
export MACHINE_ID_SALT="endpoint-proxy-salt"

# Start
npm run start

# Or use PM2
npm install -g pm2
pm2 start npm --name extremerouter -- start
pm2 save
pm2 startup

Docker

Published images (multi-platform linux/amd64 + linux/arm64):

Docker Hub: rsalmn/extremerouter
GHCR: ghcr.io/rsalmn/extremerouter

Quick start (use published image):

docker run -d \
  --name extremerouter \
  -p 20128:20128 \
  -v "$HOME/.extremerouter:/app/data" \
  -e DATA_DIR=/app/data \
  rsalmn/extremerouter:latest

→ Open http://localhost:20128

Build from source (dev):

git clone https://github.com/rsalmn/extremerouter.git
cd extremerouter/app
docker build -t extremerouter .
docker run -d --name extremerouter -p 20128:20128 \
  -v "$HOME/.extremerouter:/app/data" -e DATA_DIR=/app/data extremerouter

Container defaults:

PORT=20128
HOSTNAME=0.0.0.0

Useful commands:

docker logs -f extremerouter
docker restart extremerouter
docker stop extremerouter && docker rm extremerouter
docker pull rsalmn/extremerouter:latest   # update to latest

Data persistence: $HOME/.extremerouter/db/data.sqlite on host ↔ /app/data/db/data.sqlite in container.

Environment Variables

Variable	Default	Description
`JWT_SECRET`	Auto-generated (`~/.extremerouter/jwt-secret`)	JWT signing secret for dashboard auth cookie (override to share across instances)
`INITIAL_PASSWORD`	`123456`	First login password when no saved hash exists
`DATA_DIR`	`~/.extremerouter`	Main app data location (SQLite at `$DATA_DIR/db/data.sqlite`)
`PORT`	framework default	Service port (`20128` in examples)
`HOSTNAME`	framework default	Bind host (Docker defaults to `0.0.0.0`)
`NODE_ENV`	runtime default	Set `production` for deploy
`BASE_URL`	`http://localhost:20128`	Server-side internal base URL used by cloud sync jobs
`CLOUD_URL`	`https://github.com/rsalmn/extremerouter`	Server-side cloud sync endpoint base URL
`NEXT_PUBLIC_BASE_URL`	`http://localhost:3000`	Backward-compatible/public base URL (prefer `BASE_URL` for server runtime)
`NEXT_PUBLIC_CLOUD_URL`	`https://github.com/rsalmn/extremerouter`	Backward-compatible/public cloud URL (prefer `CLOUD_URL` for server runtime)
`API_KEY_SECRET`	`endpoint-proxy-api-key-secret`	HMAC secret for generated API keys
`MACHINE_ID_SALT`	`endpoint-proxy-salt`	Salt for stable machine ID hashing
`ENABLE_REQUEST_LOGS`	`false`	Enables request/response logs under `logs/`
`AUTH_COOKIE_SECURE`	`false`	Force `Secure` auth cookie (set `true` behind HTTPS reverse proxy)
`REQUIRE_API_KEY`	`false`	Enforce Bearer API key on `/v1/*` routes (recommended for internet-exposed deploys)
`HTTP_PROXY`, `HTTPS_PROXY`, `ALL_PROXY`, `NO_PROXY`	empty	Optional outbound proxy for upstream provider calls

Notes:

Lowercase proxy variables are also supported: http_proxy, https_proxy, all_proxy, no_proxy.
.env is not baked into Docker image (.dockerignore); inject runtime config with --env-file or -e.
On Windows, APPDATA can be used for local storage path resolution.
INSTANCE_NAME appears in older docs/env templates, but is currently not used at runtime.

Runtime Files and Storage

Main app state: ${DATA_DIR}/db/data.sqlite (SQLite — providers, combos, aliases, keys, settings, usage history)
Auto backups: ${DATA_DIR}/db/backups/
Optional request/translator logs: <repo>/logs/... when ENABLE_REQUEST_LOGS=true
Both ${DATA_DIR} and ~/.extremerouter resolve to the same location in a Docker container — the symlink /root/.extremerouter -> /app/data is created at build time.

📊 Available Models

Claude Code (cc/) - Pro/Max:

cc/claude-opus-4-7
cc/claude-opus-4-6
cc/claude-sonnet-4-6
cc/claude-sonnet-4-5-20250929
cc/claude-haiku-4-5-20251001

Codex (cx/) - Plus/Pro:

cx/gpt-5.5
cx/gpt-5.4
cx/gpt-5.3-codex
cx/gpt-5.2-codex
cx/gpt-5.1-codex-max

GitHub Copilot (gh/):

gh/gpt-5.4
gh/claude-opus-4.7
gh/claude-sonnet-4.6
gh/gemini-3.1-pro-preview
gh/grok-code-fast-1

Cursor (cu/) - Subscription:

cu/claude-4.6-opus-max
cu/claude-4.5-sonnet-thinking
cu/gpt-5.3-codex
cu/kimi-k2.5

GLM (glm/) - $0.6/1M:

glm/glm-5.1
glm/glm-5
glm/glm-4.7

MiniMax (minimax/) - $0.2/1M:

minimax/MiniMax-M2.7
minimax/MiniMax-M2.5

Kimi (kimi/) - $9/mo flat:

kimi/kimi-k2.5
kimi/kimi-k2.5-thinking

Kiro (kr/) - FREE unlimited:

kr/claude-sonnet-4.5
kr/claude-haiku-4.5
kr/glm-5
kr/MiniMax-M2.5
kr/qwen3-coder-next
kr/deepseek-3.2

OpenCode Free (oc/) - FREE no-auth:

Auto-fetched from opencode.ai/zen/v1/models

Vertex AI (vertex/) - $300 free credits:

vertex/gemini-3.1-pro-preview
vertex/gemini-3-flash-preview
vertex/gemini-2.5-flash
vertex-partner/glm-5-maas
vertex-partner/deepseek-v3.2-maas

🐛 Troubleshooting

"Language model did not provide messages"

Provider quota exhausted → Check dashboard quota tracker
Solution: Use combo fallback or switch to cheaper tier

Rate limiting

Subscription quota exhausted → fall back to GLM/MiniMax
Add a combo: cc/claude-opus-4-7 → glm/glm-5.1 → kr/claude-sonnet-4.5
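
The combo above is an ordered fallback chain. The sketch below illustrates only the selection idea; it is not ExtremeRouter's routing code. `pick_model` and the `exhausted` set are hypothetical names standing in for the router's internal quota tracking.

```python
from typing import Iterable, Optional, Set

# Combo from the example above, in priority order.
COMBO = ["cc/claude-opus-4-7", "glm/glm-5.1", "kr/claude-sonnet-4.5"]


def pick_model(combo: Iterable[str], exhausted: Set[str]) -> Optional[str]:
    """Return the first model whose provider prefix is not exhausted.

    `exhausted` holds provider prefixes (e.g. "cc") that are currently
    rate limited or out of quota. It is a hypothetical stand-in for the
    quota tracker.
    """
    for model in combo:
        prefix = model.split("/", 1)[0]
        if prefix not in exhausted:
            return model
    return None  # every tier in the combo is unavailable
```

For example, `pick_model(COMBO, {"cc"})` skips the subscription tier and selects `glm/glm-5.1`.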

OAuth token expired

Auto-refreshed by ExtremeRouter
If issues persist: Dashboard → Provider → Reconnect

High costs

Enable RTK in Dashboard → Endpoint settings (default ON, saves 20-40% tokens)
Check usage stats in Dashboard
Switch primary model to GLM/MiniMax
Use free tier (Kiro, OpenCode Free, Vertex) for non-critical tasks

Dashboard opens on wrong port

Set PORT=20128 and NEXT_PUBLIC_BASE_URL=http://localhost:20128

First login not working

Check INITIAL_PASSWORD in .env
If unset, the fallback password is 123456

No request logs under logs/

Set ENABLE_REQUEST_LOGS=true

🛠️ Tech Stack

Runtime: Node.js 20+
Framework: Next.js 16
UI: React 19 + Tailwind CSS 4
Database: SQLite (better-sqlite3 / node:sqlite / sql.js fallback)
Streaming: Server-Sent Events (SSE)
Auth: OAuth 2.0 (PKCE) + JWT + API Keys

📝 API Reference

Chat Completions

POST http://localhost:20128/v1/chat/completions
Authorization: Bearer your-api-key
Content-Type: application/json

{
  "model": "cc/claude-opus-4-6",
  "messages": [
    {"role": "user", "content": "Write a function to..."}
  ],
  "stream": true
}
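
A minimal client sketch for the request above, using only the Python standard library. The helper names (`build_chat_request`, `parse_sse_line`, `stream_chat`) are illustrative, and the chunk shape assumes the gateway streams OpenAI-style SSE lines (`data: {...}` ending with `data: [DONE]`), which is an assumption based on the OpenAI-compatible endpoint rather than something this README spells out.

```python
import json
import urllib.request
from typing import Iterator, Optional

BASE_URL = "http://localhost:20128/v1"  # default local endpoint from this README


def build_chat_request(api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Build the POST request shown above, with streaming enabled."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": True,
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def parse_sse_line(line: str) -> Optional[str]:
    """Extract the text delta from one SSE line, or None if there is none."""
    line = line.strip()
    if not line.startswith("data:"):
        return None
    payload = line[len("data:"):].strip()
    if not payload or payload == "[DONE]":
        return None
    choices = json.loads(payload).get("choices") or []
    if not choices:
        return None
    return (choices[0].get("delta") or {}).get("content")


def stream_chat(api_key: str, model: str, prompt: str) -> Iterator[str]:
    """Yield text deltas as they arrive (requires a running gateway)."""
    with urllib.request.urlopen(build_chat_request(api_key, model, prompt)) as resp:
        for raw in resp:
            text = parse_sse_line(raw.decode("utf-8"))
            if text:
                yield text
```

With the gateway running, `for piece in stream_chat("your-api-key", "cc/claude-opus-4-6", "Write a function to..."): print(piece, end="")` would print the response as it streams.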

List Models

GET http://localhost:20128/v1/models
Authorization: Bearer your-api-key

→ Returns all models + combos in OpenAI format
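
Since the response uses the OpenAI list format (`{"object": "list", "data": [{"id": ...}, ...]}`), the IDs can be grouped by provider prefix on the client side. This is a sketch with hypothetical helper names (`fetch_models`, `group_by_prefix`); how combo IDs are named in the response is not specified here, so IDs without a `/` are simply collected under "(no prefix)".

```python
import json
import urllib.request
from collections import defaultdict
from typing import Dict, List


def fetch_models(api_key: str, base_url: str = "http://localhost:20128/v1") -> dict:
    """GET /v1/models from a running gateway and return the parsed JSON."""
    req = urllib.request.Request(
        f"{base_url}/models",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def group_by_prefix(payload: dict) -> Dict[str, List[str]]:
    """Group model IDs such as "cc/claude-opus-4-6" by their prefix ("cc")."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for entry in payload.get("data", []):
        model_id = entry["id"]
        prefix, sep, _ = model_id.partition("/")
        groups[prefix if sep else "(no prefix)"].append(model_id)
    return dict(groups)
```

Typical use would be `group_by_prefix(fetch_models("your-api-key"))`, which needs the gateway to be running.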

📧 Support

Website: github.com/rsalmn/extremerouter
GitHub: github.com/rsalmn/extremerouter
Issues: github.com/rsalmn/extremerouter/issues

👥 Contributors

Thanks to all contributors who helped make ExtremeRouter better!

📊 Star Chart

🔀 Forks

Community forks will be listed here. Submit a Pull Request to add yours.

🙏 Acknowledgments

Built on the shoulders of giants:

CLIProxyAPI — original Go implementation that inspired this JavaScript port.
RTK — Rust token-saver. ExtremeRouter ports its compression pipeline to JS → −20-40% input tokens on every request.
Caveman by @JuliusBrussee — viral "why use many token when few token do trick". ExtremeRouter adapts its prompt → −65% output tokens.
Ponytail by @DietrichGebert — "lazy senior dev" skill. ExtremeRouter injects its YAGNI-first ladder → fewer tokens, less code, shorter diffs.

Huge thanks to these authors — without their work, ExtremeRouter's token-saving features wouldn't exist. ⭐ them on GitHub!

📄 License

MIT License - see LICENSE for details.

Built with ❤️ for developers who code 24/7
