Save your Claude Pro credits by automatically routing coding tasks to free AI models — transparently, with zero changes to your workflow.
Built with Beads for AI-native issue tracking and Graphify for codebase knowledge — both stay active across every claude-mix session automatically.
Tip: For better session management across AI model switches, install Beads and Graphify — see Development Tooling.
Run `claude-mix` instead of `claude`. Everything else stays the same.
ModelRouter is a local HTTP proxy that sits between Claude Code and Anthropic's API. Every request is classified by task type and routed to the cheapest model capable of handling it. Only genuinely complex tasks (architecture, security, system design) ever reach Claude Pro.
```text
You type a message in Claude Code
         │
         ▼
┌──────────────────┐
│   ModelRouter    │  ← transparent proxy on localhost:8082
│   Classifier     │  ← matches routing rules (YAML, first-match-wins)
└──────┬───────────┘
       │
       ├─ Trivial / Read / Basic Q  → Gemma 3 27B (Google free API / local Ollama, ~1s)
       ├─ Tests / Debugging         → Codex CLI (OpenAI free tier)
       ├─ Feature implementation    → Gemini 2.5 Pro (Google free API)
       ├─ [Fallback chain]          → tries next backend if one fails
       └─ Architecture / Complex    → Claude Pro (your subscription)
```
Each routing decision is tagged in the response:
`[via: gemma-3-27b (Google) • 1.2s]`
| Backend | Model | Cost | Speed | Used for |
|---|---|---|---|---|
| Gemma | `gemma-3-27b-it` via Google AI | Free | ~1s | Trivial, read, basic questions |
| Gemini Flash | `gemini-2.5-flash` | Free tier | ~2s | Medium tasks, fallback |
| Gemini Pro | `gemini-2.5-pro` | Free tier | ~3s | Feature implementation |
| Codex CLI | OpenAI Codex | Free tier | ~4s | Tests, debugging |
| Ollama | Any local model | Free (local) | varies | Offline fallback |
| Claude Pro | `claude-*` | Subscription | ~3s | Complex architecture only |
- Node.js 18+
- Claude Code installed
- A free Google AI Studio API key
- (Optional) Ollama for offline fallback
- Codex CLI (`npm install -g @openai/codex`)
```bash
git clone https://github.com/your-username/ModelRouter
cd ModelRouter
npm install
```

Then add `GOOGLE_API_KEY=your_key_here` to a `.env` file in the project root.
Get a free key at aistudio.google.com — no billing required.
Windows:

```bash
npm link
```

macOS / Linux:

```bash
sudo npm link
```

```bash
# Instead of: claude
claude-mix

# Pass any claude flags normally
claude-mix --model claude-opus-4-7
```

The router starts automatically in the background. Claude Code is launched with `ANTHROPIC_BASE_URL` pointing at the local proxy.
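Under the hood, the launch step only has to start Claude Code with one extra environment variable. The sketch below illustrates the idea, assuming the proxy listens on `ROUTER_PORT` (default 8082) and that `claude` is on your `PATH`. `buildClaudeEnv` and `launchClaude` are hypothetical names; this is not the actual `bin/claude-mix.js`.

```javascript
const { spawn } = require("child_process");

// Build the child environment: inherit everything, but point Claude Code
// at the local proxy instead of Anthropic's API.
function buildClaudeEnv(baseEnv, port = 8082) {
  return { ...baseEnv, ANTHROPIC_BASE_URL: `http://127.0.0.1:${port}` };
}

// Hypothetical launcher: forwards any CLI flags (e.g. --model) unchanged.
function launchClaude(args, baseEnv = process.env) {
  const port = Number(baseEnv.ROUTER_PORT || 8082);
  return spawn("claude", args, {
    env: buildClaudeEnv(baseEnv, port),
    stdio: "inherit",
    // On Windows, npm installs `claude` as a .cmd shim, which needs a shell.
    shell: process.platform === "win32",
  });
}

module.exports = { buildClaudeEnv, launchClaude };
```

Calling `launchClaude(process.argv.slice(2))` would forward flags such as `--model claude-opus-4-7` as-is. Overriding only `ANTHROPIC_BASE_URL` leaves the rest of your environment (credentials, `PATH`, proxies) untouched.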
Rules live in config/routing-rules.yaml. They are evaluated top-to-bottom — first match wins.
```yaml
rules:
  - name: "Trivial"
    target: gemma
    conditions:
      patterns: ["syntax error", "typo", "rename", "format", "indent"]
      max_length: 300
      exclude_patterns: ["architect", "implement", "refactor"]

  - name: "Read & Explain"
    target: gemma
    conditions:
      patterns: ["read", "show me", "explain", "summarize", "what does"]

  - name: "Testing"
    target: codex
    conditions:
      patterns: ["write test", "unit test", "debug", "stack trace"]

  - name: "Features"
    target: gemini-pro
    conditions:
      patterns: ["implement", "create", "build", "add feature", "endpoint"]
      max_length: 2000

  - name: "Complex"
    target: claude
    conditions:
      patterns: ["architect", "design system", "microservice", "security audit"]
```

Force a specific model for any single message:
```text
[use:gemini-pro] refactor this entire module
[use:claude] design the auth system
[use:gemma] what does this function return?
```
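A minimal sketch of first-match-wins classification with the `[use:…]` override, assuming the YAML has already been parsed into plain objects (only a subset of the rules is inlined) and that patterns are matched as case-insensitive substrings. The real `src/classifier.js` may behave differently; in particular, what happens when no rule matches is not documented, so this sketch returns `target: null` and leaves the default to the caller.

```javascript
// A subset of routing-rules.yaml, as it might look after YAML parsing.
const rules = [
  {
    name: "Trivial",
    target: "gemma",
    conditions: {
      patterns: ["syntax error", "typo", "rename", "format", "indent"],
      max_length: 300,
      exclude_patterns: ["architect", "implement", "refactor"],
    },
  },
  {
    name: "Testing",
    target: "codex",
    conditions: { patterns: ["write test", "unit test", "debug", "stack trace"] },
  },
  {
    name: "Complex",
    target: "claude",
    conditions: { patterns: ["architect", "design system", "microservice", "security audit"] },
  },
];

// Does one rule match this prompt? (assumed: case-insensitive substring match)
function matches(rule, prompt) {
  const text = prompt.toLowerCase();
  const c = rule.conditions || {};
  if (c.max_length !== undefined && prompt.length > c.max_length) return false;
  if ((c.exclude_patterns || []).some((p) => text.includes(p.toLowerCase()))) return false;
  return (c.patterns || []).some((p) => text.includes(p.toLowerCase()));
}

// A leading [use:x] tag overrides all rules and is stripped from the prompt;
// otherwise rules are tried top to bottom and the first match wins.
function classify(prompt, ruleList = rules) {
  const override = prompt.match(/^\s*\[use:([\w-]+)\]\s*/);
  if (override) {
    return { rule: "override", target: override[1], prompt: prompt.slice(override[0].length) };
  }
  for (const rule of ruleList) {
    if (matches(rule, prompt)) return { rule: rule.name, target: rule.target, prompt };
  }
  return { rule: null, target: null, prompt }; // no match: caller picks a default
}
```

Because the loop returns on the first hit, rule order matters: with the full rule list above, a prompt such as "implement a microservice architecture" hits `Features` (via `implement`) before `Complex` is ever checked.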
If a backend fails (rate limited, offline, error), the router automatically tries the next one:
| Primary target | Fallback chain |
|---|---|
| `gemma` | gemma → ollama → gemini-flash → passthrough |
| `codex` | codex → gemma → gemini-flash → passthrough |
| `gemini-flash` | gemini-flash → gemma → gemini-pro → passthrough |
| `gemini-pro` | gemini-pro → gemini-flash → gemma → passthrough |
| `claude` | passthrough (direct to Anthropic) |
passthrough means the request goes directly to Claude Pro as normal.
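The fallback behaviour boils down to a loop over the chain. Here is a sketch, assuming each backend is an async function that throws on rate limits, timeouts, or connection errors. The `backends` map and `routeWithFallback` are hypothetical stand-ins for the connectors in `src/connectors/`, not the router's actual code.

```javascript
// Chains copied from the table above.
const FALLBACK_CHAINS = {
  gemma: ["gemma", "ollama", "gemini-flash", "passthrough"],
  codex: ["codex", "gemma", "gemini-flash", "passthrough"],
  "gemini-flash": ["gemini-flash", "gemma", "gemini-pro", "passthrough"],
  "gemini-pro": ["gemini-pro", "gemini-flash", "gemma", "passthrough"],
  claude: ["passthrough"],
};

// Try each backend in order and return the first successful answer, tagged
// with the backend that produced it. Failures are collected, not swallowed.
async function routeWithFallback(target, request, backends) {
  const chain = FALLBACK_CHAINS[target] || ["passthrough"];
  const errors = [];
  for (const name of chain) {
    const backend = backends[name];
    if (!backend) continue; // e.g. Ollama not installed: skip it
    try {
      const response = await backend(request);
      return { via: name, response, errors };
    } catch (err) {
      errors.push({ backend: name, message: err.message });
    }
  }
  throw new Error(`All backends failed for ${target}: ${JSON.stringify(errors)}`);
}
```

Returning `via` together with the collected `errors` is what makes a `[via: …]` tag and "which backends were skipped" logging cheap to produce.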
```bash
# Start router + Claude Code
claude-mix

# Start router only (background daemon)
node start.js

# Health check
curl http://127.0.0.1:8082/health

# Kill router
npx kill-port 8082 8083
```

Note: `start.js` must be running for `claude-mix` to route requests.
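For an end-to-end check beyond `/health`, you can send a message in the Anthropic Messages API format, which is what Claude Code itself sends to `ANTHROPIC_BASE_URL`. This sketch assumes the proxy accepts `POST /v1/messages` on port 8082; the model name is only a placeholder since the router picks the backend, and no auth header is set, so a prompt that ends up at Claude may be rejected. `buildTestRequest` and `sendTestMessage` are hypothetical helpers.

```javascript
// Build a minimal Anthropic Messages API request aimed at the local proxy.
function buildTestRequest(prompt, { port = 8082 } = {}) {
  return {
    url: `http://127.0.0.1:${port}/v1/messages`,
    init: {
      method: "POST",
      headers: {
        "content-type": "application/json",
        "anthropic-version": "2023-06-01",
      },
      body: JSON.stringify({
        model: "claude-sonnet-4-5", // placeholder: the router decides the real backend
        max_tokens: 256,
        messages: [{ role: "user", content: prompt }],
      }),
    },
  };
}

// Send it with the fetch built into Node 18+ and return the parsed JSON body.
async function sendTestMessage(prompt, options) {
  const { url, init } = buildTestRequest(prompt, options);
  const res = await fetch(url, init);
  if (!res.ok) throw new Error(`Router returned HTTP ${res.status}`);
  return res.json();
}
```

For example, `sendTestMessage("[use:gemma] what does this function return?").then((r) => console.log(JSON.stringify(r, null, 2)))`. The `[use:gemma]` prefix forces a free backend, so the request should not need to reach Anthropic at all.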
| Variable | Default | Description |
|---|---|---|
| `GOOGLE_API_KEY` | hardcoded in `gemini.js` | Google AI Studio API key |
| `ROUTER_PORT` | `8082` | Port for the Anthropic proxy |
| `OLLAMA_HOST` | `http://127.0.0.1:11434` | Ollama server URL |
| `OLLAMA_MODEL` | `gemma3:1b` | Local Ollama model to use |
| `OLLAMA_KEEP_ALIVE` | `-1` | Keep model loaded in RAM indefinitely |
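These variables map naturally onto a small config loader. The sketch below assumes the defaults in the table and shows how `OLLAMA_MODEL` and `OLLAMA_KEEP_ALIVE` could feed a request to Ollama's `/api/chat` endpoint, where `keep_alive` accepts a number of seconds (negative keeps the model loaded) or a duration string such as `"10m"`. `loadConfig` and `buildOllamaChatRequest` are hypothetical names, not functions from `src/`; `GOOGLE_API_KEY` is left out because its fallback lives in `gemini.js`.

```javascript
// Resolve router settings from an environment object, using the documented defaults.
function loadConfig(env = process.env) {
  // Env vars are strings; convert plain integers like "-1" to numbers,
  // but pass duration strings like "10m" through unchanged.
  const rawKeepAlive = env.OLLAMA_KEEP_ALIVE || "-1";
  return {
    routerPort: Number(env.ROUTER_PORT || 8082),
    ollamaHost: env.OLLAMA_HOST || "http://127.0.0.1:11434",
    ollamaModel: env.OLLAMA_MODEL || "gemma3:1b",
    ollamaKeepAlive: /^-?\d+$/.test(rawKeepAlive) ? Number(rawKeepAlive) : rawKeepAlive,
  };
}

// Build a non-streaming request for Ollama's /api/chat endpoint.
function buildOllamaChatRequest(config, messages) {
  return {
    url: `${config.ollamaHost.replace(/\/+$/, "")}/api/chat`,
    body: {
      model: config.ollamaModel,
      messages, // [{ role: "system" | "user" | "assistant", content: "..." }]
      stream: false,
      keep_alive: config.ollamaKeepAlive,
    },
  };
}
```

POSTing `JSON.stringify(body)` to `url` with `fetch` would then query the local model without unloading it afterwards.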
```text
ModelRouter/
├── bin/
│   ├── claude-mix.js          # Main entry point: starts services + launches Claude Code
│   ├── claude-mix.cmd         # Windows shim
│   └── claude-router.js       # Stats/management CLI
├── src/
│   ├── router-service.js      # HTTP proxy server (port 8082)
│   ├── openai-adapter.js      # OpenAI-format proxy for Codex CLI (port 8083)
│   ├── classifier.js          # Prompt → routing target
│   ├── stats.js               # Usage tracking
│   └── connectors/
│       ├── gemini.js          # Google Gemini + Gemma API
│       ├── ollama.js          # Local Ollama
│       └── codex.js           # Codex CLI via stdin
├── config/
│   └── routing-rules.yaml     # Routing rules (editable)
├── start.js                   # Starts both proxy servers
└── package.json
```
Stats are saved to `~/.claude-router/usage-stats.json`. View a summary:

```bash
node -e "const s=require('./src/stats'); console.log(JSON.stringify(s.summary(7*24*60*60*1000), null, 2))"
```

Example output:

```json
{
  "total": 312,
  "byTarget": {
    "gemma→gemma": 187,
    "Testing→codex": 43,
    "Features→gemini-pro": 61,
    "claude": 21
  }
}
```

Claude Code sends the full conversation history with every request. ModelRouter does not need to manage sessions; it reads the history Claude Code provides and passes a tailored window to each backend:
- Gemma / Ollama: last 6–8 messages, truncated to 400 chars each
- Gemini: last 8 messages, 500-char system prompt
- Claude: full history, no truncation
This means context is preserved across model switches within a session. Cheaper models see only the most recent turns, each truncated, but because Claude Code resends the full history with every request, nothing is lost: a later request routed to Claude still receives the complete conversation.
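The per-backend windowing described above can be sketched as a single function. The limits come from the list, taking 8 as the upper end of the "6–8 messages" range for Gemma/Ollama. The message shape (`{ role, content }` with string content), the `WINDOWS` table and `buildWindow` are simplifying assumptions; Claude Code's real messages can also carry structured content blocks, and the list does not say how Gemma's system prompt is handled, so it is left as-is here.

```javascript
// Window limits per backend; null means "no limit".
const WINDOWS = {
  gemma: { lastN: 8, maxChars: 400, systemChars: null }, // list says 6–8; 8 assumed
  ollama: { lastN: 8, maxChars: 400, systemChars: null },
  gemini: { lastN: 8, maxChars: null, systemChars: 500 },
  claude: { lastN: null, maxChars: null, systemChars: null }, // full history
};

// Cut text to at most `max` characters (null = leave untouched).
const clip = (text, max) => (max === null || text.length <= max ? text : text.slice(0, max));

// Return the system prompt and messages a given backend should receive.
// Unknown backends get the full history, like Claude. Input is not mutated.
function buildWindow(backend, system = "", messages = []) {
  const w = WINDOWS[backend] || WINDOWS.claude;
  const recent = w.lastN === null ? messages : messages.slice(-w.lastN);
  return {
    system: clip(system, w.systemChars),
    messages: recent.map((m) => ({ ...m, content: clip(m.content, w.maxChars) })),
  };
}
```

Slicing from the end keeps the most recent turns, which is usually what matters for a follow-up question, while the untouched original array is what the next Claude-bound request will carry.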
Q: Does claude still work normally?
A: Yes. claude-mix is a separate command. claude is untouched.
Q: What if I don't have Ollama?
A: Ollama is optional. The fallback chain skips it and goes to Gemini if Ollama is unavailable.
Q: Is my code sent to multiple providers?
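A: Normally, no. Each request goes to ONE backend, whichever the classifier selects. Only if that backend fails is the same request retried on the next backend in the fallback chain, which may belong to a different provider.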
Q: What if Gemini rate limits me?
A: The fallback chain automatically tries the next backend. The `[via: ...]` tag in the response shows which backend actually answered.
Q: Can I add my own routing rules?
A: Yes — edit config/routing-rules.yaml. Rules are hot-reloaded on each request.
Q: Does this work on macOS/Linux?
A: Yes. The Codex connector uses cmd /c only on Windows; macOS/Linux uses the binary directly.
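The platform split amounts to choosing which command to spawn. A sketch, assuming the connector pipes the prompt to Codex over stdin as the project layout describes; `buildCodexCommand` and `runCodex` are hypothetical helpers, and the Codex arguments are left to the caller rather than guessed.

```javascript
const { spawn } = require("child_process");

// On Windows, npm installs `codex` as a .cmd shim, so it has to go through cmd /c.
function buildCodexCommand(codexArgs, platform = process.platform) {
  return platform === "win32"
    ? { command: "cmd", args: ["/c", "codex", ...codexArgs] }
    : { command: "codex", args: codexArgs };
}

// Run Codex, write the prompt to its stdin, and resolve with its stdout.
function runCodex(prompt, codexArgs = []) {
  const { command, args } = buildCodexCommand(codexArgs);
  return new Promise((resolve, reject) => {
    const child = spawn(command, args, { stdio: ["pipe", "pipe", "pipe"] });
    let out = "";
    let err = "";
    child.stdout.on("data", (chunk) => (out += chunk));
    child.stderr.on("data", (chunk) => (err += chunk));
    child.on("error", reject); // e.g. codex is not installed
    child.on("close", (code) =>
      code === 0 ? resolve(out) : reject(new Error(`codex exited with ${code}: ${err}`))
    );
    child.stdin.end(prompt);
  });
}
```

Keeping the command choice in a pure function means the Windows branch can be checked on any OS without spawning anything.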
These tools are optional. The router works with just `npm install`. Beads and Graphify are for contributors who want the same AI-native development workflow used to build this project.
This project is built and maintained using two AI-native dev tools that work inside claude-mix sessions:
Beads — Issue Tracker
Beads is a native binary (not an npm package) — install it separately:
```bash
# Install Beads CLI
# Download the latest release for your OS from:
# https://github.com/badlogic/beads/releases

# Then initialise in the project root
cd ModelRouter
bd init
```

All tasks and bugs are tracked with `bd`:
```bash
bd ready                                          # see available work
bd create --title="..." --type=feature --priority=2
bd update <id> --claim                            # start working
bd close <id>                                     # mark done
bd remember "insight to keep"                     # persist knowledge across sessions
```

Beads is automatically primed on every `claude-mix` session via the `UserPromptSubmit` hook in `~/.claude/settings.json`.
Graphify — Code Knowledge Graph
Graphify is a Python package — install it separately:
```bash
pip install git+https://github.com/safishamsi/graphify
```

Then build the knowledge graph:
```bash
cd ModelRouter
python3 -c "from graphify.watch import _rebuild_code; from pathlib import Path; _rebuild_code(Path('.'))"
```

The graph output lives in `graphify-out/`. Claude reads `graphify-out/GRAPH_REPORT.md` automatically before architecture decisions.
Both tools are injected via --append-system-prompt in claude-mix so they stay active regardless of context compaction.
Contributions are welcome! Here's how to get started:
- Fork the repo and clone it locally
- Discuss first: open a GitHub issue or a beads issue (`bd create`) before starting work on large changes
- Make your change: routing rules in `config/routing-rules.yaml`, connectors in `src/connectors/`, classifier logic in `src/classifier.js`
- Test it: run `node start.js` and send a test request via `curl` (see Commands)
- Submit a PR with a clear description of what changed and why
Good first contributions:
- Add a new routing rule to `config/routing-rules.yaml`
- Add a new backend connector in `src/connectors/`
src/connectors/ - Improve response time or token usage in an existing connector
- macOS/Linux fixes (most dev was done on Windows)
Girish Sahu — girish.sahu@gmail.com
MIT — see LICENSE
Created by Girish Sahu.
Built with:
- Ollama — local model serving
- Google Gemini API — free hosted inference
- OpenAI Codex CLI — free coding assistant
- Beads — AI-native issue tracker
- Graphify — codebase knowledge graph
- Claude Code — the best coding assistant, used where it matters