ModelRouter

Save your Claude Pro credits by automatically routing coding tasks to free AI models — transparently, with zero changes to your workflow.

Built with Beads for AI-native issue tracking and Graphify for codebase knowledge — both stay active across every claude-mix session automatically.

Tip: For better session management across AI model switches, install Beads and Graphify — see Development Tooling.

Run claude-mix instead of claude. Everything else stays the same.

How It Works

ModelRouter is a local HTTP proxy that sits between Claude Code and Anthropic's API. Every request is classified by task type and routed to the cheapest model capable of handling it. Only genuinely complex tasks (architecture, security, system design) ever reach Claude Pro.

You type a message in Claude Code
        │
        ▼
┌──────────────────┐
│  ModelRouter     │  ← transparent proxy on localhost:8082
│  Classifier      │  ← matches routing rules (YAML, first-match-wins)
└──────┬───────────┘
       │
       ├─ Trivial / Read / Basic Q   →  Gemma 3 27B  (Google free API / local Ollama, ~1s)
       ├─ Tests / Debugging          →  Codex CLI     (OpenAI free tier)
       ├─ Feature implementation     →  Gemini 2.5 Pro (Google free API)
       ├─ [Fallback chain]           →  tries next backend if one fails
       └─ Architecture / Complex     →  Claude Pro    (your subscription)

Each routing decision is tagged in the response:

`[via: gemma-3-27b (Google) • 1.2s]`
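As a rough illustration of how such a tag could be produced, here is a minimal sketch. The helper name `formatViaTag` and its parameters are hypothetical and not taken from the ModelRouter source; only the tag layout comes from the example above.

```javascript
// Hypothetical helper: builds the tag shown above from a model name,
// a provider label, and the elapsed time in milliseconds.
function formatViaTag(model, provider, elapsedMs) {
  const seconds = (elapsedMs / 1000).toFixed(1); // one decimal place, e.g. "1.2"
  return `[via: ${model} (${provider}) • ${seconds}s]`;
}

console.log(formatViaTag('gemma-3-27b', 'Google', 1200));
```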

Backends

| Backend | Model | Cost | Speed | Used for |
|---|---|---|---|---|
| Gemma | `gemma-3-27b-it` via Google AI | Free | ~1s | Trivial, read, basic questions |
| Gemini Flash | `gemini-2.5-flash` | Free tier | ~2s | Medium tasks, fallback |
| Gemini Pro | `gemini-2.5-pro` | Free tier | ~3s | Feature implementation |
| Codex CLI | OpenAI Codex | Free tier | ~4s | Tests, debugging |
| Ollama | Any local model | Free (local) | varies | Offline fallback |
| Claude Pro | `claude-*` | Subscription | ~3s | Complex architecture only |

Quick Start

Prerequisites

Node.js 18+
Claude Code installed
A free Google AI Studio API key
(Optional) Ollama for offline fallback
Codex CLI (npm install -g @openai/codex)

Install

git clone https://github.com/your-username/ModelRouter
cd ModelRouter
npm install

Configure your Gemini API key

Set GOOGLE_API_KEY in the .env file (or export it in your shell):

GOOGLE_API_KEY=your_key_here

Get a free key at aistudio.google.com — no billing required.

Register the `claude-mix` command

Windows:

npm link

macOS / Linux:

sudo npm link

Run

# Instead of: claude
claude-mix

# Pass any claude flags normally
claude-mix --model claude-opus-4-7

The router starts automatically in the background. Claude Code is launched with ANTHROPIC_BASE_URL pointing at the local proxy.
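The launch step can be pictured roughly as follows. This is a sketch under assumptions, not the contents of bin/claude-mix.js: the function names `buildClaudeEnv` and `launchClaude` are hypothetical, and the proxy address is assumed to be `http://127.0.0.1:8082` based on the default port.

```javascript
const { spawn } = require('child_process');

// Copy the caller's environment and point Claude Code at the local proxy.
function buildClaudeEnv(baseEnv, port) {
  return { ...baseEnv, ANTHROPIC_BASE_URL: `http://127.0.0.1:${port}` };
}

// Start `claude` with the user's flags passed through unchanged.
// Defined for illustration only; it is not called in this sketch.
function launchClaude(args, port = 8082) {
  return spawn('claude', args, {
    stdio: 'inherit', // share the terminal with Claude Code
    env: buildClaudeEnv(process.env, port),
    // npm-installed CLIs are .cmd shims on Windows and need a shell to run
    shell: process.platform === 'win32',
  });
}

console.log(buildClaudeEnv({}, 8082).ANTHROPIC_BASE_URL);
```

Because only the child process sees the modified environment, a plain `claude` started from the same shell would still talk to Anthropic directly.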

Routing Rules

Rules live in config/routing-rules.yaml. They are evaluated top-to-bottom — first match wins.

rules:
  - name: "Trivial"
    target: gemma
    conditions:
      patterns: ["syntax error", "typo", "rename", "format", "indent"]
      max_length: 300
    exclude_patterns: ["architect", "implement", "refactor"]

  - name: "Read & Explain"
    target: gemma
    conditions:
      patterns: ["read", "show me", "explain", "summarize", "what does"]

  - name: "Testing"
    target: codex
    conditions:
      patterns: ["write test", "unit test", "debug", "stack trace"]

  - name: "Features"
    target: gemini-pro
    conditions:
      patterns: ["implement", "create", "build", "add feature", "endpoint"]
      max_length: 2000

  - name: "Complex"
    target: claude
    conditions:
      patterns: ["architect", "design system", "microservice", "security audit"]
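
To make first-match-wins concrete, here is a small sketch of how rules like these might be evaluated. It is not the code in src/classifier.js. It assumes patterns are case-insensitive substring matches, that `max_length` is compared against the prompt length, that `exclude_patterns` vetoes a rule, and that unmatched prompts default to `claude`. The rules are a subset of the YAML above, written as plain objects.

```javascript
// A subset of the rules above, in the same top-to-bottom order.
const rules = [
  { name: 'Trivial', target: 'gemma',
    conditions: { patterns: ['syntax error', 'typo', 'rename', 'format', 'indent'], max_length: 300 },
    exclude_patterns: ['architect', 'implement', 'refactor'] },
  { name: 'Features', target: 'gemini-pro',
    conditions: { patterns: ['implement', 'create', 'build', 'add feature', 'endpoint'], max_length: 2000 } },
  { name: 'Complex', target: 'claude',
    conditions: { patterns: ['architect', 'design system', 'microservice', 'security audit'] } },
];

// True if the (lowercased) text contains any of the patterns.
function includesAny(text, patterns = []) {
  return patterns.some((p) => text.includes(p.toLowerCase()));
}

// Walk the rules in order and return the first one that matches.
function classify(prompt, ruleList, defaultTarget = 'claude') {
  const text = prompt.toLowerCase();
  for (const rule of ruleList) {
    const { patterns, max_length: maxLength } = rule.conditions;
    if (maxLength !== undefined && prompt.length > maxLength) continue;
    if (includesAny(text, rule.exclude_patterns)) continue;
    if (includesAny(text, patterns)) return { rule: rule.name, target: rule.target };
  }
  return { rule: null, target: defaultTarget }; // assumed default, see lead-in
}

console.log(classify('fix the typo in README', rules));
console.log(classify('implement a login endpoint', rules));
console.log(classify('architect and implement the billing service', rules));
```

Under these assumptions the third prompt lands on `gemini-pro`, because "implement" matches the Features rule before the Complex rule is reached. Rule order matters as much as the patterns themselves.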

Inline override

Force a specific model for any single message:

[use:gemini-pro] refactor this entire module
[use:claude] design the auth system
[use:gemma] what does this function return?
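
One way such a prefix could be parsed is sketched below. The function name `parseOverride` is hypothetical. The sketch assumes the tag must appear at the very start of the message and is stripped before the message is forwarded, which the README does not confirm.

```javascript
// Matches a leading "[use:<target>]" tag, e.g. "[use:gemini-pro] ...".
const OVERRIDE_RE = /^\s*\[use:([a-z0-9-]+)\]\s*/i;

// Returns the forced target (or null) and the message without the tag.
function parseOverride(message) {
  const match = OVERRIDE_RE.exec(message);
  if (!match) return { target: null, text: message };
  return {
    target: match[1].toLowerCase(),
    text: message.slice(match[0].length),
  };
}

console.log(parseOverride('[use:gemini-pro] refactor this entire module'));
console.log(parseOverride('what does this function return?'));
```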

Fallback Chains

If a backend fails (rate limited, offline, error), the router automatically tries the next one:

| Primary target | Fallback chain |
|---|---|
| `gemma` | gemma → ollama → gemini-flash → passthrough |
| `codex` | codex → gemma → gemini-flash → passthrough |
| `gemini-flash` | gemini-flash → gemma → gemini-pro → passthrough |
| `gemini-pro` | gemini-pro → gemini-flash → gemma → passthrough |
| `claude` | passthrough (direct to Anthropic) |

passthrough means the request goes directly to Claude Pro as normal.
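
The chains above could be walked with a loop like the following. This is an illustrative sketch, not the router's actual code. The `backends` map of async functions and the name `runWithFallback` are assumptions, and any thrown error is treated as a reason to try the next backend.

```javascript
// Chains copied from the table above.
const FALLBACK_CHAINS = {
  gemma: ['gemma', 'ollama', 'gemini-flash', 'passthrough'],
  codex: ['codex', 'gemma', 'gemini-flash', 'passthrough'],
  'gemini-flash': ['gemini-flash', 'gemma', 'gemini-pro', 'passthrough'],
  'gemini-pro': ['gemini-pro', 'gemini-flash', 'gemma', 'passthrough'],
  claude: ['passthrough'],
};

// `backends` maps a backend name to an async function (hypothetical interface).
async function runWithFallback(target, request, backends) {
  const errors = [];
  for (const name of FALLBACK_CHAINS[target] ?? ['passthrough']) {
    const call = backends[name];
    if (!call) continue; // backend not configured, e.g. Ollama not installed
    try {
      return { backend: name, response: await call(request) };
    } catch (err) {
      errors.push(`${name}: ${err.message}`); // remember why it failed, move on
    }
  }
  throw new Error(`all backends failed (${errors.join('; ')})`);
}

// Usage with stub backends: gemma is rate limited and ollama is absent.
const stubs = {
  gemma: async () => { throw new Error('429 rate limited'); },
  'gemini-flash': async (req) => `flash answered: ${req}`,
  passthrough: async (req) => `claude answered: ${req}`,
};

runWithFallback('gemma', 'what does this return?', stubs)
  .then((result) => console.log(result.backend));
```

With these stubs the request ends up on `gemini-flash`: gemma throws, ollama is skipped because it is not configured, and the next entry in the chain answers.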

Commands

# Start router + Claude Code
claude-mix

# Start router only (background daemon)
node start.js

# Health check
curl http://127.0.0.1:8082/health

# Kill router
npx kill-port 8082 8083

Note: start.js needs to be running for claude-mix to route requests.
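
If you want to check from a script whether the router is up, something like the sketch below could work. It only assumes that `GET /health` answers with HTTP 200 when the router is running. The response body is not inspected because its format is not documented here, and the function name `isRouterHealthy` is hypothetical.

```javascript
const http = require('http');

// Resolves true if GET /health answers with HTTP 200, false otherwise.
function isRouterHealthy(port = 8082, host = '127.0.0.1', timeoutMs = 1000) {
  return new Promise((resolve) => {
    const req = http.get({ host, port, path: '/health', timeout: timeoutMs }, (res) => {
      res.resume(); // discard the body; only the status code is checked
      resolve(res.statusCode === 200);
    });
    req.on('timeout', () => req.destroy()); // destroy() triggers the error handler
    req.on('error', () => resolve(false)); // refused, timed out, or reset
  });
}

isRouterHealthy().then((ok) => {
  console.log(ok ? 'router is up' : 'router is not running; start it with: node start.js');
});
```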

Environment Variables

| Variable | Default | Description |
|---|---|---|
| `GOOGLE_API_KEY` | hardcoded in gemini.js | Google AI Studio API key |
| `ROUTER_PORT` | `8082` | Port for Anthropic proxy |
| `OLLAMA_HOST` | `http://127.0.0.1:11434` | Ollama server URL |
| `OLLAMA_MODEL` | `gemma3:1b` | Local Ollama model to use |
| `OLLAMA_KEEP_ALIVE` | `-1` | Keep model loaded in RAM indefinitely |
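
For reference, resolving these variables with their documented defaults might look like the sketch below. The function name `loadConfig` and the shape of the returned object are hypothetical. No default is assumed for `GOOGLE_API_KEY`, so it stays `undefined` when unset.

```javascript
// Resolve settings from an env object, falling back to the documented defaults.
function loadConfig(env = process.env) {
  return {
    googleApiKey: env.GOOGLE_API_KEY, // undefined when unset
    routerPort: Number(env.ROUTER_PORT ?? 8082),
    ollamaHost: env.OLLAMA_HOST ?? 'http://127.0.0.1:11434',
    ollamaModel: env.OLLAMA_MODEL ?? 'gemma3:1b',
    ollamaKeepAlive: env.OLLAMA_KEEP_ALIVE ?? '-1', // -1 keeps the model loaded
  };
}

console.log(loadConfig({ ROUTER_PORT: '9000' }));
```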

Project Structure

ModelRouter/
├── bin/
│   ├── claude-mix.js        # Main entry point — starts services + launches Claude Code
│   ├── claude-mix.cmd       # Windows shim
│   └── claude-router.js     # Stats/management CLI
├── src/
│   ├── router-service.js    # HTTP proxy server (port 8082)
│   ├── openai-adapter.js    # OpenAI-format proxy for Codex CLI (port 8083)
│   ├── classifier.js        # Prompt → routing target
│   ├── stats.js             # Usage tracking
│   └── connectors/
│       ├── gemini.js        # Google Gemini + Gemma API
│       ├── ollama.js        # Local Ollama
│       └── codex.js         # Codex CLI via stdin
├── config/
│   └── routing-rules.yaml   # Routing rules (editable)
├── start.js                 # Starts both proxy servers
└── package.json

Usage Stats

Stats are saved to ~/.claude-router/usage-stats.json. View a summary:

node -e "const s=require('./src/stats'); console.log(JSON.stringify(s.summary(7*24*60*60*1000), null, 2))"

Example output:

{
  "total": 312,
  "byTarget": {
    "gemma→gemma": 187,
    "Testing→codex": 43,
    "Features→gemini-pro": 61,
    "claude": 21
  }
}
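
To see how much traffic still reaches your subscription, the summary can be reduced to a single number. The sketch below works on the example output above. It assumes `byTarget` keys end with the target name after an optional `→`, which is inferred from the example and not from src/stats.js. The name `claudeShare` is hypothetical.

```javascript
// The example output from above.
const summary = {
  total: 312,
  byTarget: {
    'gemma→gemma': 187,
    'Testing→codex': 43,
    'Features→gemini-pro': 61,
    claude: 21,
  },
};

// Percentage of requests that ended up on Claude, rounded to one decimal.
function claudeShare({ total, byTarget }) {
  if (!total) return 0;
  const claudeCount = Object.entries(byTarget)
    .filter(([key]) => key.split('→').pop() === 'claude')
    .reduce((sum, [, count]) => sum + count, 0);
  return Math.round((claudeCount / total) * 1000) / 10;
}

console.log(`${claudeShare(summary)}% of requests reached Claude`);
```

For the example data this gives 6.7%, that is, 21 of 312 requests.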

Session Context Across Models

Claude Code sends the full conversation history with every request. ModelRouter does not need to manage sessions — it reads the history Claude Code provides and passes a tailored window to each backend:

Gemma / Ollama: last 6–8 messages, truncated to 400 chars each
Gemini: last 8 messages, 500-char system prompt
Claude: full history, no truncation

This means context is preserved across model switches within a session. Cheaper models see only the most recent turns, truncated, but nothing is lost: Claude Code keeps the full history and resends it with every request, so a later request routed to Claude still sees the whole conversation.
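
The windowing described above could be sketched as follows. This is not the connector code. It assumes each message is `{ role, content }` with `content` as a plain string, uses 8 as the upper end of the 6 to 8 range for Gemma and Ollama, and leaves out the 500-character system prompt limit for Gemini. The names `WINDOWS` and `tailorHistory` are hypothetical.

```javascript
// Window sizes from the list above (per-message truncation only where stated).
const WINDOWS = {
  gemma: { maxMessages: 8, maxChars: 400 },
  ollama: { maxMessages: 8, maxChars: 400 },
  gemini: { maxMessages: 8, maxChars: Infinity },
  claude: { maxMessages: Infinity, maxChars: Infinity },
};

// Keep the most recent messages and truncate each one for the given backend.
function tailorHistory(messages, backend) {
  const { maxMessages, maxChars } = WINDOWS[backend] ?? WINDOWS.claude;
  const recent = maxMessages === Infinity ? messages : messages.slice(-maxMessages);
  return recent.map((m) => ({ ...m, content: m.content.slice(0, maxChars) }));
}

// Ten long messages: Gemma sees the last 8 at 400 chars, Claude sees all of them.
const history = Array.from({ length: 10 }, (_, i) => ({
  role: i % 2 === 0 ? 'user' : 'assistant',
  content: 'x'.repeat(1000),
}));
console.log(tailorHistory(history, 'gemma').length, tailorHistory(history, 'claude').length);
```

The original `history` array is never modified, which matches the idea that the full conversation stays available for later requests.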

FAQ

Q: Does claude still work normally?
A: Yes. claude-mix is a separate command. claude is untouched.

Q: What if I don't have Ollama?
A: Ollama is optional. The fallback chain skips it and goes to Gemini if Ollama is unavailable.

Q: Is my code sent to multiple providers?
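A: No. Each request is sent to one backend at a time: whichever the classifier selects. It reaches another provider only if that backend fails and the fallback chain moves on to the next one.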

Q: What if Gemini rate limits me?
A: The fallback chain automatically tries the next backend. You won't notice.

Q: Can I add my own routing rules?
A: Yes — edit config/routing-rules.yaml. Rules are hot-reloaded on each request.

Q: Does this work on macOS/Linux?
A: Yes. The Codex connector uses cmd /c only on Windows; macOS/Linux uses the binary directly.
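
The platform difference mentioned in the last answer can be illustrated with a short sketch. It is not the code in src/connectors/codex.js. It assumes the CLI binary is named `codex`, and the function name `codexCommand` is hypothetical.

```javascript
// Build the command used to start the Codex CLI on a given platform.
function codexCommand(args = [], platform = process.platform) {
  if (platform === 'win32') {
    // On Windows the CLI is launched through the command interpreter.
    return { command: 'cmd', args: ['/c', 'codex', ...args] };
  }
  // On macOS and Linux the binary is invoked directly.
  return { command: 'codex', args };
}

console.log(codexCommand(['--help'], 'win32'));
console.log(codexCommand(['--help'], 'linux'));
```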

Development Tooling

These tools are optional. The router works with just npm install. Beads and Graphify are for contributors who want the same AI-native development workflow used to build this project.

This project is built and maintained using two AI-native dev tools that work inside claude-mix sessions:

Beads — Issue Tracker

Beads is a native binary (not an npm package) — install it separately:

# Install Beads CLI
# Download the latest release for your OS from:
# https://github.com/badlogic/beads/releases

# Then initialise in the project root
cd ModelRouter
bd init

All tasks and bugs are tracked with bd:

bd ready                        # see available work
bd create --title="..." --type=feature --priority=2
bd update <id> --claim          # start working
bd close <id>                   # mark done
bd remember "insight to keep"   # persist knowledge across sessions

Beads is automatically primed on every claude-mix session via the UserPromptSubmit hook in ~/.claude/settings.json.

Graphify — Code Knowledge Graph

Graphify is a Python package — install it separately:

pip install git+https://github.com/safishamsi/graphify

Then build the knowledge graph:

cd ModelRouter
python3 -c "from graphify.watch import _rebuild_code; from pathlib import Path; _rebuild_code(Path('.'))"

The graph output lives in graphify-out/. Claude reads graphify-out/GRAPH_REPORT.md automatically before architecture decisions.

Both tools are injected via --append-system-prompt in claude-mix so they stay active regardless of context compaction.

Contributing

Contributions are welcome! Here's how to get started:

Fork the repo and clone it locally
Discuss first — open a GitHub issue or a beads issue (bd create) before starting work on large changes
Make your change — routing rules in config/routing-rules.yaml, connectors in src/connectors/, classifier logic in src/classifier.js
Test it — run node start.js and send a test request via curl (see Commands)
Submit a PR with a clear description of what changed and why

Good first contributions:

Add a new routing rule to config/routing-rules.yaml
Add a new backend connector in src/connectors/
Improve response time or token usage in an existing connector
macOS/Linux fixes (most dev was done on Windows)

Author

Girish Sahu — girish.sahu@gmail.com

License

MIT — see LICENSE

Credits

Created by Girish Sahu.

Built with:

Ollama — local model serving
Google Gemini API — free hosted inference
OpenAI Codex CLI — free coding assistant
Beads — AI-native issue tracker
Graphify — codebase knowledge graph
Claude Code — the best coding assistant, used where it matters
