common : enable reasoning budget sampler for gemma4 #21697
Merged
pwilkin merged 2 commits into ggml-org:master on Apr 10, 2026
Add thinking_start_tag and thinking_end_tag to common_chat_params_init_gemma4(). Without these, the reasoning budget sampler never activates for gemma4. Make the newline after "thought" optional in the PEG parser to handle budget=0 (sampler forces end tag before the newline). Add test case for empty thinking block. Fixes ggml-org#21487
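To illustrate why the newline has to be optional, here is a minimal sketch that uses a hand-written matcher instead of the actual PEG parser. The names `thought_result`, `is_space`, `parse_thought` and the `require_newline` flag are hypothetical and exist only for this example; the tag strings are the ones from this PR.

```cpp
#include <string_view>

// Hypothetical result type for this sketch (not a llama.cpp type).
struct thought_result {
    bool             ok = false;
    std::string_view reasoning;
};

constexpr bool is_space(char c) {
    return c == ' ' || c == '\n' || c == '\t' || c == '\r';
}

// Parses "<|channel>thought" [whitespace] reasoning "<channel|>".
// require_newline = true mimics a strict rule that demands exactly one '\n'
// after "thought"; false mimics an optional-whitespace rule.
constexpr thought_result parse_thought(std::string_view text, bool require_newline) {
    constexpr std::string_view start = "<|channel>thought";
    constexpr std::string_view end   = "<channel|>";

    if (text.substr(0, start.size()) != start) {
        return {};
    }
    text.remove_prefix(start.size());

    if (require_newline) {
        // Strict rule: fail when the end tag follows "thought" directly.
        if (text.empty() || text.front() != '\n') {
            return {};
        }
        text.remove_prefix(1);
    } else {
        // Optional rule: skip zero or more whitespace characters.
        while (!text.empty() && is_space(text.front())) {
            text.remove_prefix(1);
        }
    }

    const auto pos = text.find(end);
    if (pos == std::string_view::npos) {
        return {};
    }
    return {true, text.substr(0, pos)};
}
```

With budget=0 the forced output is `<|channel>thought<channel|>`, which the strict variant rejects and the optional variant accepts with an empty reasoning string. The sketch requires C++17 for `std::string_view`.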
Member:
Any reason

Contributor (Author):
Good point, no particular reason; I was overfitting to the exact pattern I saw (it was the same in all experiments). Changed to p.space().
Overview
As #21487 also reports, the gemma4 thinking budget doesn't work. I noticed that `common_chat_params_init_gemma4()` sets `supports_thinking = true` but never populates `thinking_start_tag` / `thinking_end_tag`. The budget sampler in `server-common.cpp` is conditional on `thinking_end_tag` being non-empty, so it skips gemma4 entirely.

So I added the missing tags. The main fix is just two lines (`chat.cpp:1087-1088`). The rest of the diff is about making budget=0 work cleanly: while testing for my personal use (see the details of the local testing environment below), I found that budget=0 causes a PEG parse error because the sampler forces the end tag before the model emits a newline after "thought". Even though `--reasoning off` already handles the no-thinking case, I didn't want to introduce a parse error at that edge case. I made the newline optional in the parser and added a test case for it.

Fixes #21487
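The gating described above can be sketched as follows. This is only an illustration under stated assumptions, not the code in `server-common.cpp`: `chat_params_sketch`, `budget_sampler_enabled`, `gemma4_before` and `gemma4_after` are hypothetical names, and the rule that a negative budget means "no enforcement" is inferred from `--reasoning-budget -1` being the unlimited default.

```cpp
#include <string_view>

// Hypothetical stand-in for the chat params; the real struct in llama.cpp
// has more fields and different member types.
struct chat_params_sketch {
    bool             supports_thinking = false;
    std::string_view thinking_start_tag;
    std::string_view thinking_end_tag;
};

// Assumed gating rule: a budget is only enforced when a non-negative budget
// is requested AND the template declares an end tag that can be forced.
constexpr bool budget_sampler_enabled(const chat_params_sketch & p, int reasoning_budget) {
    return reasoning_budget >= 0 && !p.thinking_end_tag.empty();
}

// gemma4 params as described before the fix: thinking supported, tags empty.
constexpr chat_params_sketch gemma4_before() {
    return chat_params_sketch{true, {}, {}};
}

// gemma4 params after the fix: both tags populated with the values from this PR.
constexpr chat_params_sketch gemma4_after() {
    return chat_params_sketch{true, "<|channel>thought", "<channel|>"};
}
```

Under these assumptions, `budget_sampler_enabled(gemma4_before(), 1024)` is false even though `supports_thinking` is set, while the same call with `gemma4_after()` is true, which mirrors the behavior change this PR is after.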
Changes
- Set `thinking_start_tag = "<|channel>thought"` and `thinking_end_tag = "<channel|>"` in `common_chat_params_init_gemma4()`
- Make `\n` optional after `thought` in both PEG parser rules (extract and non-extract paths) so the parser handles empty thinking blocks

Testing
Unit tests: `test-reasoning-budget` (5/5), `test-chat` (all passed).

Integration tested on RTX 5090 with `gemma-4-26B-A4B-it-Q4_K_M.gguf` (ggml-org), llama.cpp b8718, CUDA 13.0:

- `--reasoning-budget -1` (unlimited)
- `--reasoning-budget 1024`
- `--reasoning-budget 0`
- `thinking_budget_tokens: 256`

I didn't run perplexity/bench since this PR simply allows the existing sampler, which already works for other models, to also apply to gemma4. Default behavior (`--reasoning-budget -1`) is unchanged, so the only people affected are those who explicitly set a reasoning budget (and currently don't get one). (Also, the CI test uses Qwen3, so my diff wouldn't be on the code path that gets exercised.)

Note on the start tag: the tag omits the trailing `\n` because gemma4 generates a double-newline token (`\n\n`, token 108) after "thought", not the single-newline token (`\n`, token 107) that the tokenizer produces for `"thought\n"`. Without this, the sampler's token matcher fails on the third token and never activates.

Requirements