claude-multi-model

Parallel multi-provider CLI dispatcher for Claude Code plugins. Dispatches prompts to codex, gemini, and cursor-agent in parallel, routing each model to its native CLI when possible (separate rate-limit buckets) and falling back to cursor-agent for everything else. Includes a 3-stage peer-review "council" pattern inspired by karpathy/llm-council, adapted for CLI providers with strict JSON parsing.

No API keys required. The library talks to CLI tools (codex, gemini, cursor-agent) which carry their own OAuth flows. There's no OPENAI_API_KEY or ANTHROPIC_API_KEY anywhere.

Not a Claude Code plugin. This is a Python library with three console scripts. It's designed to be consumed as a dependency by other Python-based Claude Code plugins (e.g. gw, aisci) via uv pip install.

Rate-bucket spreading. When all three CLIs are installed, a 3-way panel dispatches across 3 independent rate buckets: Codex/OpenAI OAuth + Google AI OAuth + Cursor subscription. One hot bucket doesn't throttle the rest.

Install

Via uv from git:

uv pip install "claude-multi-model @ git+https://github.com/stmailabs/claude-multi-model.git"

Or add it as a dependency in a consuming project's pyproject.toml:

dependencies = [
    "claude-multi-model @ git+https://github.com/stmailabs/claude-multi-model.git",
]

Provider CLIs

The library needs at least one of these installed on $PATH:

CLI            Install                             Native models                      OAuth
codex          npm install -g @openai/codex        GPT-5.x, o3, o4, Codex family      codex login
gemini         npm install -g @google/gemini-cli   Gemini 3.x family                  gemini auth
cursor-agent   cursor.com (install Cursor,         85 models across 7 families        Cursor subscription
               accept the CLI prompt)              (Claude, GPT, Gemini, Grok,
                                                   Kimi, Composer)

Check what you have:

uv run mm-detect

Output:

  ✓ codex          codex-cli 0.118.0
        path: /Users/c/.npm-global/bin/codex
  ✓ gemini         0.37.0
        path: /Users/c/.npm-global/bin/gemini
  ✓ cursor-agent   2026.04.08-a41fba1
        path: /Users/c/.local/bin/cursor-agent

Routing philosophy

GPT / Codex family                     → codex exec     (OpenAI OAuth bucket)
Gemini family                          → gemini -p      (Google OAuth bucket)
Grok / Kimi / Composer / anything else → cursor-agent   (Cursor bucket)

Claude family                          → [REFUSED by default]
                                         From inside Claude Code, use the Agent tool instead.
                                         Opt-in override: pass --allow-cursor-claude to route via cursor.

Why Claude models are refused: inside Claude Code, you have the Agent tool primitive that spawns a Claude subagent in-process (no subprocess, no auth, shared session budget, structured return value). Routing Claude through cursor-agent would add a subprocess spawn, a separate rate bucket, and output parsing. If the caller really wants that — maybe to use cursor-agent as a second Claude source for parallelism — they pass --allow-cursor-claude and acknowledge the trade-off.

Why cursor-agent is the fallback for everything else: cursor-agent exposes 85 models from 7 provider families via a single CLI with native JSON output. It's the widest reach for any model we can't route to a direct CLI.
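The routing rule above can be sketched in a few lines. This is an illustrative sketch, not the library's actual code (the real rules live in routing.py and may differ in detail; the function name here is hypothetical):

```python
def route_model(model: str, allow_cursor_claude: bool = False) -> str:
    """Map a model ID to the CLI (and rate bucket) that should serve it."""
    name = model.lower()
    if "claude" in name:
        if not allow_cursor_claude:
            # Inside Claude Code, the Agent tool is the better primitive.
            raise ValueError(
                f"{model}: use the Agent tool, or opt in with --allow-cursor-claude"
            )
        return "cursor-agent"
    if name.startswith(("gpt", "o3", "o4", "codex")):
        return "codex"          # OpenAI OAuth bucket
    if name.startswith("gemini"):
        return "gemini"         # Google OAuth bucket
    return "cursor-agent"       # widest fallback: Grok, Kimi, Composer, ...
```

The key property is that each branch lands in a different rate bucket, so a panel spread across branches never shares a throttle.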

Python API

Low-level: parallel dispatch

from multi_model import dispatch

responses = dispatch(
    prompt="Verify these 30 citations and return JSON: [...]",
    models=["gpt-5.4-high", "gemini-3.1-pro", "grok-4-20-thinking"],
    timeout=300,
)

for r in responses:
    print(f"{r.model} via {r.cli_used}: {r.text[:80]}...")
    # All fields: model, text, cli_used, exit_code, duration_s, error, metadata

Dispatch runs the three models in parallel through three different CLIs (and three separate rate buckets) via concurrent.futures.ThreadPoolExecutor. Results come back in the same order as the input models list.

High-level: 3-stage council

from multi_model.council import run_council

result = run_council(
    prompt="What's the best approach to verifying citations in a grant proposal?",
    panel=["gpt-5.4-high", "gemini-3.1-pro", "grok-4-20-thinking"],
    chairman="gpt-5.4-high",
)

print(result.stage3_synthesis)    # Chairman's final synthesized answer
print(result.aggregate_ranks)     # {model: avg_peer_rank_position}
print(result.stage2_rankings)     # {reviewer: [model1, model2, ...]}
print(result.stage1_responses)    # {model: text}

The council protocol:

  1. Stage 1 — Parallel Open Answer. All panel models answer the prompt independently.
  2. Stage 2 — Blind Peer Review. Responses are anonymized as "Response A", "Response B", etc. Each reviewer sees the full anonymized set and ranks them best-to-worst, returning structured JSON. The anonymization prevents models from favoring their own responses.
  3. Stage 3 — Chairman Synthesis. A designated chairman model receives the de-anonymized stage 1 responses + stage 2 rankings and produces a single final answer.
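The stage-2 anonymization can be sketched as follows; the helper name and return shape are hypothetical, chosen only to show the relabeling and the key kept aside for stage-3 de-anonymization:

```python
import string

def anonymize(responses: dict[str, str]) -> tuple[dict[str, str], dict[str, str]]:
    """Return (label -> text, label -> model) for blind peer review."""
    labeled: dict[str, str] = {}
    key: dict[str, str] = {}
    for letter, (model, text) in zip(string.ascii_uppercase, responses.items()):
        label = f"Response {letter}"
        labeled[label] = text   # what reviewers see
        key[label] = model      # held back until stage 3
    return labeled, key
```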

Aggregate rankings are computed across all reviewers: each model's average rank position across every reviewer, lower = better peer-perceived quality.
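That aggregation amounts to averaging each model's 1-based position across all reviewer lists. A sketch of the computation (hypothetical function name, same semantics as described above):

```python
def compute_aggregate_ranks(rankings: dict[str, list[str]]) -> dict[str, float]:
    """Average each model's best-to-worst position across all reviewers."""
    positions: dict[str, list[int]] = {}
    for ordered in rankings.values():
        for pos, model in enumerate(ordered, start=1):
            positions.setdefault(model, []).append(pos)
    return {m: sum(p) / len(p) for m, p in positions.items()}
```

So a model ranked first by one reviewer and second by another gets 1.5, and unanimous first place gets 1.0.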

Polymorphic dispatch for Claude-aware skills

If you're writing a Claude Code skill that wants to mix Claude workers (via the Agent tool) with external workers (via this library), inject a custom dispatch_fn:

from multi_model.council import run_council, run_stage1, run_stage2, run_stage3
from multi_model.types import Response

def skill_dispatch(prompt: str, models: list[str]) -> list[Response]:
    """Hybrid dispatch: Claude via Agent tool, others via multi_model."""
    from multi_model import dispatch
    claude_models = [m for m in models if "claude" in m.lower()]
    other_models = [m for m in models if "claude" not in m.lower()]

    # Claude via Agent tool (skill-side, not in this function — see below)
    claude_responses = [
        Response(
            model=m,
            text=agent_tool_result_for(m),  # filled in by the skill orchestrator
            cli_used="agent-tool",
            exit_code=0,
            duration_s=...,
        )
        for m in claude_models
    ]

    # Other providers via multi_model.dispatch
    other_responses = dispatch(prompt, other_models) if other_models else []

    return claude_responses + other_responses

result = run_council(
    prompt="...",
    panel=["claude-opus", "gpt-5.4-high", "gemini-3.1-pro"],
    chairman="claude-opus",
    dispatch_fn=skill_dispatch,
)

The dispatch_fn hook is how gw and aisci skills combine in-process Agent tool workers with subprocess CLI workers.

CLI reference

mm-ask — parallel multi-model dispatch

uv run mm-ask \
    --models gpt-5.4-high,gemini-3.1-pro,grok-4-20-thinking \
    --prompt "Verify these citations: ..." \
    --output results.json \
    --timeout 300

Flags:

  • --models a,b,c — comma-separated model IDs (required)
  • --prompt TEXT / --prompt-file FILE / stdin — prompt input (exactly one)
  • --output FILE / stdout — JSON output destination
  • --timeout N — per-model timeout seconds (default 300)
  • --allow-cursor-claude — opt-in to route Claude via cursor-agent
  • --verbose / -v — print routing summary to stderr

Exit codes: 0 = ok, 2 = bad args, 3 = Claude rejected, 4 = unreachable model, 5 = all dispatches failed.

mm-council — 3-stage council

uv run mm-council \
    --panel gpt-5.4-high,gemini-3.1-pro,grok-4-20-thinking \
    --chairman gpt-5.4-high \
    --prompt-file prompt.txt \
    --output council.json

Flags:

  • --panel a,b,c — panel models (default: gpt-5.4-high,gemini-3.1-pro,grok-4-20-thinking)
  • --chairman M — stage-3 synthesizer (default: gpt-5.4-high)
  • --prompt / --prompt-file / stdin
  • --output FILE — JSON output
  • --timeout N — per-model timeout
  • --allow-cursor-claude, --verbose

Output JSON has the full CouncilResult.to_dict() shape: stage1_responses, stage2_rankings, aggregate_ranks, stage3_synthesis, chairman_model, metadata.
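A downstream consumer can read that file with nothing but the stdlib. A minimal sketch, assuming only the top-level keys listed above (the helper name is illustrative):

```python
import json

def summarize_council(path: str) -> str:
    """Load an mm-council output file and report the winner + synthesis."""
    with open(path) as f:
        result = json.load(f)
    # Lower average rank = better peer-perceived quality.
    ranks = result["aggregate_ranks"]
    best = min(ranks, key=ranks.get)
    return f"best-ranked panelist: {best}\n{result['stage3_synthesis']}"
```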

mm-detect — show installed CLIs

uv run mm-detect                    # human-readable
uv run mm-detect --json             # JSON
uv run mm-detect --check-auth       # also probe auth state (slower)

Architecture

src/multi_model/
├── __init__.py              # public API
├── types.py                 # Response dataclass
├── constants.py             # DEFAULT_PANEL, DEFAULT_CHAIRMAN, DEFAULT_TIMEOUT
├── detect.py                # CLI availability + auth probe
├── routing.py               # model → CLI routing rules
├── dispatch.py              # ThreadPoolExecutor parallel dispatch
├── council.py               # 3-stage council pattern
├── cli.py                   # mm-ask, mm-council, mm-detect entry points
└── providers/
    ├── __init__.py
    ├── _subprocess.py       # shared subprocess wrapper
    ├── codex.py             # codex exec dispatcher + stdout cleaner
    ├── gemini.py            # gemini -p dispatcher + ANSI/noise strip
    └── cursor_agent.py      # cursor-agent --print + JSON parser

Total core library: ~1,200 lines of Python, stdlib-only (no runtime dependencies). Compare with claude-octopus at ~14,400 lines of bash for the same "parallel multi-provider dispatch" feature.

Why not claude-octopus?

We evaluated forking or depending on claude-octopus and concluded it was the wrong shape for this use case:

  • Footprint: octopus is ~14,400 lines of bash across 52 lib/ files with 855 source files total. We use ~4% of its surface area (the dispatch primitives from lib/dispatch.sh and lib/workflows.sh).
  • Tight coupling: the core probe_single_agent() function depends on 19 helper functions spanning 15 other bash files, plus ~30 global env vars. Extracting it cleanly is impractical.
  • Interactive gates: /octo:research blocks on AskUserQuestion asking "how thorough?" before dispatching — incompatible with headless/CI pipelines.
  • Mandatory banners: every invocation emits 🐙 CLAUDE OCTOPUS ACTIVATED headers.
  • Output location: octopus writes to ~/.claude-octopus/debates/<session>/ — not where our consumers want their state.
  • Plugin coupling: consumers invoke /octo:debate as a slash command, passing prompts through free-form text and parsing free-form responses. A Python library + import is strictly cleaner for Python consumers.

In contrast, claude-multi-model:

  • 1,200 lines of Python, stdlib-only, easy to read
  • Zero interactive gates, fully headless
  • Outputs land wherever the caller writes them
  • Clean Python API with typed dataclasses + CLI entry points
  • Ships as a regular Python dependency via uv pip install

Acknowledgments

The 3-stage council pattern (parallel answer → blind peer review → chairman synthesis) is adapted from karpathy/llm-council. We re-implement the same idea with CLI providers instead of OpenRouter, stricter JSON parsing instead of regex-matching on "FINAL RANKING:" text markers, and polymorphic dispatch so skills can mix in-process Agent tool calls with subprocess dispatches.

The provider CLI knowledge (especially codex exec flags, gemini stdin piping, auth retry patterns, and macOS keychain workarounds) was informed by reading claude-octopus's bash source. We don't copy any code, but octopus's well-commented dispatch logic is a useful reference for the sharp edges of each CLI.

License

MIT. See LICENSE.
