perf(core): skip model routing classification when redundant#25554
Introduces a new `model.gemma4Variant` setting that allows users to optionally redirect all requests destined for `gemini-pro` and `gemini-flash` (and their related aliases) to the selected Gemma 4 variant (`gemma-4-26b-a4b-it` or `gemma-4-31b-it`). The router model (`flash-lite`) remains unaffected.
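The redirect described above can be sketched roughly as follows. This is an illustration under assumptions, not the code in this PR: `resolveModel`, `REDIRECTED_MODELS`, and the short `pro`/`flash` aliases are hypothetical names chosen for the example; only the two Gemma 4 variant identifiers, `gemini-pro`, `gemini-flash`, and `flash-lite` come from the description.

```typescript
// Hypothetical sketch of the gemma4Variant redirect; names are illustrative only.
type Gemma4Variant = 'gemma-4-26b-a4b-it' | 'gemma-4-31b-it';

// Assumed set of redirected names. The real alias list is not shown in this PR text.
const REDIRECTED_MODELS: ReadonlySet<string> = new Set<string>([
  'gemini-pro',
  'gemini-flash',
  'pro', // hypothetical alias
  'flash', // hypothetical alias
]);

// Returns the model that a request would actually be sent to.
function resolveModel(requested: string, gemma4Variant?: Gemma4Variant): string {
  // Only pro/flash-family requests are redirected, and only when the setting is present.
  if (gemma4Variant !== undefined && REDIRECTED_MODELS.has(requested)) {
    return gemma4Variant;
  }
  // Everything else, including the router model, passes through unchanged.
  return requested;
}
```

With the setting unset, every request resolves to the model it asked for, so the redirect stays opt-in.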
Hi @akh64bit, thank you so much for your contribution to Gemini CLI! We really appreciate the time and effort you've put into this. We're making some updates to our contribution process to improve how we track and review changes. Please take a moment to review our recent discussion post: Improving Our Contribution Process & Introducing New Guidelines. Key Update: Starting January 26, 2026, the Gemini CLI project will require all pull requests to be associated with an existing issue. Any pull requests not linked to an issue by that date will be automatically closed. Thank you for your understanding and for being a part of our community!
Summary of Changes
Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request implements a performance optimization for the model routing service by detecting and bypassing unnecessary classification steps when model tiers resolve to identical targets. Additionally, it expands the CLI's capabilities by adding support for routing requests to specific Gemma 4 model variants, complete with necessary configuration updates, documentation, and validation tests.
Size Change: +8.02 kB (+0.02%) Total Size: 33.6 MB
Code Review
This pull request introduces support for Gemma 4 models (gemma-4-26b-a4b-it and gemma-4-31b-it) by implementing a routing mechanism that redirects Gemini Pro and Flash requests to a user-configured Gemma 4 variant. The implementation includes updates to the configuration schema, model resolution logic, and documentation. Furthermore, the classifier routing strategies were optimized to skip redundant classification steps when both Pro and Flash tiers resolve to the same model. I have no feedback to provide.
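The classification skip mentioned in this review can be sketched as below. It is a minimal illustration under stated assumptions rather than the PR's implementation: `routeWithFastPath`, `RoutingDecision`, its field names, and the `resolveTier` and `classify` callbacks are all hypothetical stand-ins for the real strategy classes.

```typescript
// Hypothetical sketch of the fast path; types and names are assumptions.
type Tier = 'pro' | 'flash';

interface RoutingDecision {
  model: string;
  source: string;
  latencyMs: number;
  reasoning: string;
}

async function routeWithFastPath(
  prompt: string,
  resolveTier: (tier: Tier) => string, // maps a tier to its resolved model name
  classify: (prompt: string) => Promise<Tier>, // stands in for the classifier LLM call
  source: string = 'Classifier',
): Promise<RoutingDecision> {
  // Resolve both tiers before spending an API call on classification.
  const proModel = resolveTier('pro');
  const flashModel = resolveTier('flash');

  if (proModel === flashModel) {
    // Either answer from the classifier would select the same model, so skip it.
    return {
      model: proModel,
      source,
      latencyMs: 0,
      reasoning: 'Classification skipped: pro and flash resolve to the same model.',
    };
  }

  // Tiers differ, so the classifier result actually matters.
  const start = Date.now();
  const tier = await classify(prompt);
  return {
    model: tier === 'pro' ? proModel : flashModel,
    source,
    latencyMs: Date.now() - start,
    reasoning: `Classifier selected the ${tier} tier.`,
  };
}
```

Resolving both tiers first keeps the check cheap: it is a string comparison, so the normal path pays essentially nothing when the tiers differ.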
Fixes an issue where the CLI hangs on 'Thinking...' for models (like Gemma 4) that return thought text in the 'thought' field instead of 'text'. Also updates Gemma 4 model definitions to accurately reflect their 'thinking' capabilities.
🛑 Action Required: Evaluation Approval
Steering changes have been detected in this PR. To prevent regressions, a maintainer must approve the evaluation run before this PR can be merged. Once approved, the evaluation results will be posted here automatically.
Summary
This PR introduces an optimization in the `ModelRouterService` to skip the lightweight model classification step when both the `pro` and `flash` tiers resolve to the same underlying model. This happens, for example, when the user overrides both tiers using settings like `model.gemma4Variant`. Skipping the redundant API call noticeably improves the Time To First Token (TTFT) for all requests in this scenario.

Details
The `ClassifierStrategy` and `GemmaClassifierStrategy` now resolve the model for both the `pro` and `flash` tiers before calling the classification LLM. If the two resolved models match, the strategy takes a fast path: it returns the resolved model immediately, with zero latency and a reasoning string indicating that classification was skipped.

Related Issues

How to Validate
1. Edit `.gemini/settings.json` to route to a specific model across both tiers, e.g., set `"gemma4Variant": "gemma-4-31b-it"`.
2. Run `gemini "hello"`.
3. The `Classifier` (or `GemmaClassifier`) source should indicate it skipped classification.
4. `classifierStrategy.test.ts` and `gemmaClassifierStrategy.test.ts` assert this behavior.

Pre-Merge Checklist