💡 Complexity-Aware Dynamic Model Routing for Cost-Optimized PR Review #564
Replies: 1 comment
Cross-link. Related to canonical model-selection discussion #413 and tracking issue #553 (capability/cost-aware routing across dev-lead + pr-review, gated on #195). Keep this proposal's routing direction consolidated with that work.
Summary
Implement a PR complexity scorer in the review pipeline that analyzes PR characteristics (lines changed, file count, language diversity, security-label presence, test coverage delta) before model selection. Route simple PRs to cheaper model tiers and reserve expensive models (Opus, Fable) for genuinely complex reviews. Industry write-ups report 50-80% cost reductions from complexity-based routing (see Market Signal).
Market Signal
LLM model routing is a rapidly maturing practice in 2026. Industry research shows well-designed routing systems can outperform even the strongest individual models while reducing costs 50-80% (Model Routing Strategies 2026). Anthropic's adaptive thinking in Opus 4.8 does per-turn reasoning calibration internally. Multiple frameworks (MindStudio, Burnwise) now offer production-grade routing. The typical implementation is 50-100 lines of code with a cost-aware router that picks the cheapest model likely to succeed, escalating only on low-confidence results.
User Signal
engine.sh implements basic tier routing (triage→deep→audit), but the initial tier selection is static: every PR gets the same model at each stage regardless of complexity. Issue #553 requests "capability-aware selection" specifically. The Token Cost Observatory (#332, #464) shows cost is actively monitored, suggesting appetite for optimization. The engine's model chain architecture (CLAUDE_TRIAGE_MODEL_CHAIN, etc.) was designed for exactly this kind of dynamic selection.
Technical Opportunity
engine.sh's set_engine_config() defines per-tier model chains, but tier assignment in review-one-pr.sh is fixed. A complexity scorer in the preflight step could set environment variables (e.g., PR_COMPLEXITY=simple|medium|complex|critical) that engine.sh uses to adjust model selection. The scoring function itself is deterministic shell (~50-100 lines, no LLM needed). Explicit safety floors prevent dangerous under-routing.
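A minimal sketch of what such a preflight scorer might look like. Only the PR_COMPLEXITY variable and its four levels come from the proposal; the function names (score_pr, classify_pr), the weights, and every threshold are hypothetical placeholders, not values taken from engine.sh or review-one-pr.sh. Safety floors are deliberately left out of this sketch and would be applied as a separate step.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: names, weights, and thresholds are assumptions,
# not taken from engine.sh or review-one-pr.sh.

# score_pr LINES FILES LANGS SECURITY_LABEL CI_TOUCHED SELF_MOD
# The last three arguments are 0/1 flags. Prints an integer score.
score_pr() {
  local lines=$1 files=$2 langs=$3 security=$4 ci=$5 selfmod=$6
  local score=0

  # Lines changed: 0-3 points.
  if   (( lines > 1000 )); then score=$(( score + 3 ))
  elif (( lines > 200  )); then score=$(( score + 2 ))
  elif (( lines > 50   )); then score=$(( score + 1 ))
  fi

  # Files touched: 0-2 points.
  if   (( files > 20 )); then score=$(( score + 2 ))
  elif (( files > 5  )); then score=$(( score + 1 ))
  fi

  # More than two languages in one PR: 1 point.
  if (( langs > 2 )); then score=$(( score + 1 )); fi

  # Risk flags carry more weight than raw size.
  score=$(( score + 3 * security + 2 * ci + 3 * selfmod ))

  echo "$score"
}

# classify_pr takes the same arguments as score_pr and prints one of
# simple | medium | complex | critical.
classify_pr() {
  local score
  score=$(score_pr "$@")
  if   (( score >= 8 )); then echo critical
  elif (( score >= 5 )); then echo complex
  elif (( score >= 2 )); then echo medium
  else                        echo simple
  fi
}
```

In a preflight step this might be used as `PR_COMPLEXITY=$(classify_pr "$lines" "$files" "$langs" "$security" "$ci" "$selfmod"); export PR_COMPLEXITY`, where the six inputs are assumed to have been gathered from the PR metadata beforehand. Note that with these placeholder weights a 3-line security-labeled change already lands in medium rather than simple, since the flag alone contributes 3 points.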
Assessment
Adversarial Review
Strongest objection: Simple heuristics (line count, file count) may not capture actual review difficulty. A subtle security vulnerability in a 3-line change would be routed to Haiku and missed, creating a false sense of security.
Rebuttal: The multi-tier review pipeline already guards against this — triage (Haiku) examines every PR regardless and flags complexity signals for the deep tier. The routing optimizer would adjust the deep/audit tier model selection, not skip tiers entirely. Adding explicit floors ("never route security-labeled PRs below Sonnet", "never skip audit for infrastructure files") prevents the worst case. The existing pipeline preserves defense-in-depth; complexity routing optimizes within it, not around it.
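The first of these floors ("never route security-labeled PRs below Sonnet") could be expressed as a small clamp applied after the complexity-based choice. This is a sketch under assumptions: the lowercase tier names, their ordering, the function names, and the 0/1 flag are hypothetical and do not reflect how engine.sh actually names or orders its model chains.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: tier names, ordering, and function names are
# assumptions for illustration only.

# model_rank MODEL: prints a numeric rank (higher = more capable).
model_rank() {
  case "$1" in
    haiku)  echo 1 ;;
    sonnet) echo 2 ;;
    opus)   echo 3 ;;
    *)      echo 0 ;;  # unknown models rank lowest, so the floor applies to them
  esac
}

# apply_model_floor CANDIDATE SECURITY_LABELED
# Prints sonnet when the PR is security-labeled (flag = 1) and the
# candidate ranks below sonnet; otherwise prints CANDIDATE unchanged.
apply_model_floor() {
  local candidate=$1 security=$2
  if (( security == 1 )) && (( $(model_rank "$candidate") < $(model_rank sonnet) )); then
    echo sonnet
  else
    echo "$candidate"
  fi
}
```

The clamp only ever raises the selection, so a model chosen above the floor passes through untouched, and PRs without the security label are not affected at all.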
Suggested Next Step
Define a PR complexity heuristic in shell that scores PRs on 5-6 dimensions (lines changed, file count, language count, security labels, CI config touched, agent self-modification). Prototype in a branch, then test against recent PR data from merged_prs_30d to validate that the scoring correlates with actual review effort and token consumption.
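One way to run that check is a Pearson coefficient between the heuristic score and observed token usage. The sketch below assumes a hypothetical two-column, whitespace-separated input (score, then tokens) exported from merged_prs_30d; the real layout of that dataset is not specified here, and the function name is illustrative. A rank-based measure such as Spearman may be a better fit if the relationship turns out to be non-linear.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: the input layout (one "score tokens" pair per line)
# is an assumption about how merged_prs_30d data might be exported.

# pearson_r: reads "score tokens" pairs on stdin and prints r to three
# decimals, or "nan" when there are fewer than two rows or either column
# has no variance.
pearson_r() {
  awk '
    NF >= 2 { n++; sx += $1; sy += $2; sxx += $1*$1; syy += $2*$2; sxy += $1*$2 }
    END {
      if (n < 2) { print "nan"; exit }
      cov = n*sxy - sx*sy
      vx  = n*sxx - sx*sx
      vy  = n*syy - sy*sy
      if (vx <= 0 || vy <= 0) { print "nan"; exit }
      printf "%.3f\n", cov / sqrt(vx * vy)
    }'
}
```

It could be invoked as `pearson_r < scores_vs_tokens.tsv`, where the file name is a placeholder. A value close to 1 would suggest the heuristic tracks token consumption; a value near 0 would suggest the dimensions or weights need rework before routing on them.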