perf(core): skip model routing classification when redundant#25554

Open
akh64bit wants to merge 6 commits into main from redirect-to-gemma4

Conversation

@akh64bit
Contributor

Summary

This PR introduces an optimization in the ModelRouterService that skips the lightweight model classification step when both the pro and flash tiers resolve to the same underlying model. This happens, for example, when the user overrides both tiers using settings like model.gemma4Variant. Skipping the redundant classification call removes one LLM round trip before generation starts, which improves Time To First Token (TTFT) for every request in this scenario.

Details

The ClassifierStrategy and GemmaClassifierStrategy now resolve the model for both the pro and flash tiers before calling the classification LLM. If the two resolved models match, the strategy takes a fast path: it returns the resolved model immediately, reports a latency of zero, and sets the reasoning to indicate that classification was skipped.
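The fast path described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the `RoutingDecision` shape, the `resolveTier` and `classifyTier` callbacks, the function name, and the reasoning strings are all assumptions made for the example.

```typescript
// Hypothetical types for illustration; the real strategy classes may differ.
type Tier = "pro" | "flash";

interface RoutingDecision {
  model: string;
  metadata: { source: string; latencyMs: number; reasoning: string };
}

// Assumed callbacks: one maps a tier to a concrete model name,
// the other stands in for the classification LLM call.
type ResolveTier = (tier: Tier) => string;
type ClassifyTier = (prompt: string) => Promise<Tier>;

async function routeWithFastPath(
  prompt: string,
  resolveTier: ResolveTier,
  classifyTier: ClassifyTier,
): Promise<RoutingDecision> {
  const proModel = resolveTier("pro");
  const flashModel = resolveTier("flash");

  // Fast path: both tiers point at the same model, so the classifier
  // result cannot change the outcome and the call is skipped.
  if (proModel === flashModel) {
    return {
      model: proModel,
      metadata: {
        source: "Classifier",
        latencyMs: 0,
        reasoning: "Classification skipped: pro and flash resolve to the same model.",
      },
    };
  }

  // Slow path: ask the classifier which tier fits the prompt.
  const start = Date.now();
  const tier = await classifyTier(prompt);
  return {
    model: tier === "pro" ? proModel : flashModel,
    metadata: {
      source: "Classifier",
      latencyMs: Date.now() - start,
      reasoning: `Classifier selected the ${tier} tier.`,
    },
  };
}
```

Comparing the resolved model names, rather than checking for a particular setting, keeps the check independent of how the two tiers came to match.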

Related Issues

How to Validate

  1. Configure .gemini/settings.json to route both tiers to a specific model; for example, set "gemma4Variant": "gemma-4-31b-it" (the model.gemma4Variant setting).
  2. Run a command like gemini "hello".
  3. Check the debug logs or DevTools. The routing step should report zero latency, and the decision from the Classifier (or GemmaClassifier) source should state that classification was skipped.
  4. Tests have been added to classifierStrategy.test.ts and gemmaClassifierStrategy.test.ts to assert this behavior.

Pre-Merge Checklist

  • Updated relevant documentation and README (if needed)
  • Added/updated tests (if needed)
  • Noted breaking changes (if any)
  • Validated on required platforms/methods:
    • macOS
      • npm run
      • npx
      • Docker
      • Podman
      • Seatbelt
    • Windows
      • npm run
      • npx
      • Docker
    • Linux
      • npm run
      • npx
      • Docker

Introduces a new `model.gemma4Variant` setting that allows users to optionally
redirect all requests destined for `gemini-pro` and `gemini-flash` (and their
related aliases) to the selected Gemma 4 variant (`gemma-4-26b-a4b-it` or
`gemma-4-31b-it`). The router model (`flash-lite`) remains unaffected.
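
As a rough sketch of the redirect this commit message describes, assuming the model names can be compared as plain strings: the function name and the `REDIRECTED_MODELS` set below are hypothetical, and the related aliases are left out because they are not listed here.

```typescript
// The two variants named in the commit message.
type Gemma4Variant = "gemma-4-26b-a4b-it" | "gemma-4-31b-it";

// Hypothetical list of redirectable models; the real CLI also covers
// related aliases that are not enumerated in this sketch.
const REDIRECTED_MODELS = new Set<string>(["gemini-pro", "gemini-flash"]);

// Returns the model that should actually serve the request.
function applyGemma4Redirect(
  requestedModel: string,
  variant?: Gemma4Variant,
): string {
  // Setting not configured: leave the request untouched.
  if (variant === undefined) {
    return requestedModel;
  }
  // Only pro and flash are redirected; anything else, including the
  // router model (flash-lite), passes through unchanged.
  return REDIRECTED_MODELS.has(requestedModel) ? variant : requestedModel;
}
```

With `variant` set to `"gemma-4-31b-it"`, a request for `gemini-pro` or `gemini-flash` would resolve to the same model, which is the situation the classification skip targets.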
@akh64bit akh64bit requested review from a team as code owners April 16, 2026 18:58
@gemini-cli
Contributor

gemini-cli bot commented Apr 16, 2026

Hi @akh64bit, thank you so much for your contribution to Gemini CLI! We really appreciate the time and effort you've put into this.

We're making some updates to our contribution process to improve how we track and review changes. Please take a moment to review our recent discussion post: Improving Our Contribution Process & Introducing New Guidelines.

Key Update: Starting January 26, 2026, the Gemini CLI project will require all pull requests to be associated with an existing issue. Any pull requests not linked to an issue by that date will be automatically closed.

Thank you for your understanding and for being a part of our community!

@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist[1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request implements a performance optimization for the model routing service by detecting and bypassing unnecessary classification steps when model tiers resolve to identical targets. Additionally, it expands the CLI's capabilities by adding support for routing requests to specific Gemma 4 model variants, complete with necessary configuration updates, documentation, and validation tests.

Highlights

  • Performance Optimization: Introduced a fast-path in ModelRouterService to skip redundant model classification when both pro and flash tiers resolve to the same model, significantly improving Time To First Token (TTFT).
  • Gemma 4 Support: Added support for routing requests to Gemma 4 models (gemma-4-26b-a4b-it and gemma-4-31b-it) via the model.gemma4Variant configuration setting.
  • Configuration and Documentation: Updated settings schema, CLI configuration, and documentation to support the new Gemma 4 routing options.
  • Testing: Added comprehensive integration and unit tests to verify the new routing logic and the classification skip optimization.
Ignored Files
  • Ignored by pattern: .gemini/** (1)
    • .gemini/settings.json

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@github-actions

github-actions bot commented Apr 16, 2026

Size Change: +8.02 kB (+0.02%)

Total Size: 33.6 MB

| Filename | Size | Change |
| --- | --- | --- |
| ./bundle/chunk-B4ZHRZIJ.js | 0 B | -3.8 kB (removed) 🏆 |
| ./bundle/chunk-H6CYASTD.js | 0 B | -3.43 MB (removed) 🏆 |
| ./bundle/chunk-HA6367YL.js | 0 B | -14.5 MB (removed) 🏆 |
| ./bundle/chunk-LV2OUWFA.js | 0 B | -1.97 MB (removed) 🏆 |
| ./bundle/core-7A6JUEEV.js | 0 B | -46.7 kB (removed) 🏆 |
| ./bundle/devtoolsService-EW7GFBVS.js | 0 B | -28.4 kB (removed) 🏆 |
| ./bundle/gemini-ZHRKSWKT.js | 0 B | -553 kB (removed) 🏆 |
| ./bundle/interactiveCli-EGAYNZZX.js | 0 B | -1.29 MB (removed) 🏆 |
| ./bundle/oauth2-provider-BPIBGFXK.js | 0 B | -9.16 kB (removed) 🏆 |
| ./bundle/chunk-CD4XTC63.js | 1.97 MB | +1.97 MB (new file) 🆕 |
| ./bundle/chunk-DL5RXHDU.js | 14.5 MB | +14.5 MB (new file) 🆕 |
| ./bundle/chunk-INX4K5LW.js | 3.8 kB | +3.8 kB (new file) 🆕 |
| ./bundle/chunk-RMJEFT2R.js | 3.43 MB | +3.43 MB (new file) 🆕 |
| ./bundle/core-M4AIFNNX.js | 46.8 kB | +46.8 kB (new file) 🆕 |
| ./bundle/devtoolsService-U4MOSHLI.js | 28.4 kB | +28.4 kB (new file) 🆕 |
| ./bundle/gemini-RZLRC7XS.js | 553 kB | +553 kB (new file) 🆕 |
| ./bundle/interactiveCli-P5S75ZY2.js | 1.29 MB | +1.29 MB (new file) 🆕 |
| ./bundle/oauth2-provider-H6BBWTCF.js | 9.16 kB | +9.16 kB (new file) 🆕 |
ℹ️ View Unchanged
| Filename | Size | Change |
| --- | --- | --- |
| ./bundle/bundled/third_party/index.js | 8 MB | 0 B |
| ./bundle/chunk-34MYV7JD.js | 2.45 kB | 0 B |
| ./bundle/chunk-5AUYMPVF.js | 858 B | 0 B |
| ./bundle/chunk-5PS3AYFU.js | 1.18 kB | 0 B |
| ./bundle/chunk-664ZODQF.js | 124 kB | 0 B |
| ./bundle/chunk-DAHVX5MI.js | 206 kB | 0 B |
| ./bundle/chunk-IUUIT4SU.js | 56.5 kB | 0 B |
| ./bundle/chunk-RJTRUG2J.js | 39.8 kB | 0 B |
| ./bundle/cleanup-IX5GZ2QQ.js | 0 B | -932 B (removed) 🏆 |
| ./bundle/devtools-36NN55EP.js | 696 kB | 0 B |
| ./bundle/dist-T73EYRDX.js | 356 B | 0 B |
| ./bundle/events-XB7DADIJ.js | 418 B | 0 B |
| ./bundle/examples/hooks/scripts/on-start.js | 188 B | 0 B |
| ./bundle/examples/mcp-server/example.js | 1.43 kB | 0 B |
| ./bundle/gemini.js | 4.97 kB | 0 B |
| ./bundle/getMachineId-bsd-TXG52NKR.js | 1.55 kB | 0 B |
| ./bundle/getMachineId-darwin-7OE4DDZ6.js | 1.55 kB | 0 B |
| ./bundle/getMachineId-linux-SHIFKOOX.js | 1.34 kB | 0 B |
| ./bundle/getMachineId-unsupported-5U5DOEYY.js | 1.06 kB | 0 B |
| ./bundle/getMachineId-win-6KLLGOI4.js | 1.72 kB | 0 B |
| ./bundle/memoryDiscovery-CMNXJICE.js | 0 B | -980 B (removed) 🏆 |
| ./bundle/multipart-parser-KPBZEGQU.js | 11.7 kB | 0 B |
| ./bundle/node_modules/@google/gemini-cli-devtools/dist/client/main.js | 222 kB | 0 B |
| ./bundle/node_modules/@google/gemini-cli-devtools/dist/src/_client-assets.js | 229 kB | 0 B |
| ./bundle/node_modules/@google/gemini-cli-devtools/dist/src/index.js | 13.4 kB | 0 B |
| ./bundle/node_modules/@google/gemini-cli-devtools/dist/src/types.js | 132 B | 0 B |
| ./bundle/sandbox-macos-permissive-open.sb | 890 B | 0 B |
| ./bundle/sandbox-macos-permissive-proxied.sb | 1.31 kB | 0 B |
| ./bundle/sandbox-macos-restrictive-open.sb | 3.36 kB | 0 B |
| ./bundle/sandbox-macos-restrictive-proxied.sb | 3.56 kB | 0 B |
| ./bundle/sandbox-macos-strict-open.sb | 4.82 kB | 0 B |
| ./bundle/sandbox-macos-strict-proxied.sb | 5.02 kB | 0 B |
| ./bundle/src-QVCVGIUX.js | 47 kB | 0 B |
| ./bundle/tree-sitter-7U6MW5PS.js | 274 kB | 0 B |
| ./bundle/tree-sitter-bash-34ZGLXVX.js | 1.84 MB | 0 B |
| ./bundle/cleanup-UGAIL5OE.js | 932 B | +932 B (new file) 🆕 |
| ./bundle/memoryDiscovery-44SKOJDH.js | 980 B | +980 B (new file) 🆕 |

compressed-size-action

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces support for Gemma 4 models (gemma-4-26b-a4b-it and gemma-4-31b-it) by implementing a routing mechanism that redirects Gemini Pro and Flash requests to a user-configured Gemma 4 variant. The implementation includes updates to the configuration schema, model resolution logic, and documentation. Furthermore, the classifier routing strategies were optimized to skip redundant classification steps when both Pro and Flash tiers resolve to the same model. I have no feedback to provide.

@gemini-cli gemini-cli bot added the priority/p1 Important and should be addressed in the near term. label Apr 16, 2026
Fixes an issue where the CLI hangs on 'Thinking...' for models (like Gemma 4) that return thought text in the 'thought' field instead of 'text'. Also updates Gemma 4 model definitions to accurately reflect their 'thinking' capabilities.
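
One way to picture the fix described in this commit message is a helper that accepts thought text from either field. This is only a sketch under assumptions: the `ResponsePart` shape, the idea that `thought` may arrive as either a boolean flag or a string, and the function name are all hypothetical and not taken from the PR.

```typescript
// Hypothetical response part: some models flag a thought with a boolean
// and put the content in `text`; others (assumed here) put the content
// directly in `thought`.
interface ResponsePart {
  text?: string;
  thought?: boolean | string;
}

// Returns the thought text for a part, or undefined if the part is not
// a thought at all.
function extractThoughtText(part: ResponsePart): string | undefined {
  // Thought content carried in the `thought` field itself.
  if (typeof part.thought === "string") {
    return part.thought;
  }
  // Thought flagged with a boolean; content lives in `text`.
  if (part.thought === true) {
    return part.text;
  }
  // Ordinary (non-thought) part.
  return undefined;
}
```

Handling both shapes in one place would let the UI leave the 'Thinking...' state as soon as any thought content arrives, whichever field carries it.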
@akh64bit akh64bit requested a review from a team as a code owner April 16, 2026 20:36
@github-actions

🛑 Action Required: Evaluation Approval

Steering changes have been detected in this PR. To prevent regressions, a maintainer must approve the evaluation run before this PR can be merged.

Maintainers:

  1. Go to the Workflow Run Summary.
  2. Click the yellow 'Review deployments' button.
  3. Select the 'eval-gate' environment and click 'Approve'.

Once approved, the evaluation results will be posted here automatically.
