Skip to content

common : enable reasoning budget sampler for gemma4#21697

Merged
pwilkin merged 2 commits intoggml-org:masterfrom
berkidem:fix/gemma4-reasoning-budget
Apr 10, 2026
Merged

common : enable reasoning budget sampler for gemma4#21697
pwilkin merged 2 commits intoggml-org:masterfrom
berkidem:fix/gemma4-reasoning-budget

Conversation

@berkidem
Copy link
Copy Markdown
Contributor

@berkidem berkidem commented Apr 9, 2026

Overview

As #21487 also reports, gemma4 thinking budget doesn't work. I noticed that common_chat_params_init_gemma4() sets supports_thinking = true but never populates thinking_start_tag / thinking_end_tag. The budget sampler in server-common.cpp works conditional on thinking_end_tag being non-empty, so it skips gemma4 entirely.

So I added the missing tags. The main fix is just two lines (chat.cpp:1087-1088). The rest of the diff is about making budget=0 work cleanly: while testing for my personal use (see the details of the local testing environment below), I found that budget=0 causes a PEG parse error because the sampler forces the end tag before the model emits a newline after "thought". Even though --reasoning off already handles the no-thinking case, I didn't want to introduce a parse error at that edge case. I made the newline optional in the parser, and added a test case for it.

Fixes #21487

Changes

  • Set thinking_start_tag = "<|channel>thought" and thinking_end_tag = "<channel|>" in common_chat_params_init_gemma4()
  • Make \n optional after thought in both PEG parser rules (extract and non-extract paths) so the parser handles empty thinking blocks
  • Add test case for empty thinking block (budget=0 scenario)

Testing

Unit tests: test-reasoning-budget (5/5), test-chat (all passed).

Integration tested on RTX 5090 with gemma-4-26B-A4B-it-Q4_K_M.gguf (ggml-org), llama.cpp b8718, CUDA 13.0:

Config Thinking tokens
--reasoning-budget -1 (unlimited) 1,447
--reasoning-budget 1024 1,024
--reasoning-budget 0 0
Per-request thinking_budget_tokens: 256 256

I didn't run perplexity/bench since my PR simply allows the existing sampler that already works for other models to also apply to gemma4. Default behavior (--reasoning-budget -1) is unchanged so the only people affected are those who explicitly set a reasoning budget (and don't get it). (Also the CI test uses Qwen3 so my diff wouldn't be on the code path that gets exercised.)

Note on the start tag: the tag omits the trailing \n because gemma4 generates a double-newline token (\n\n, token 108) after "thought", not the single-newline token (\n, token 107) that the tokenizer produces for "thought\n". Without this, the sampler's token matcher fails on the third token and never activates.

Requirements

  • I have read and agree with the contributing guidelines
  • AI usage disclosure: AI assisted with locating the root cause in the source code and suggested a fix. Fix was reviewed and tested manually, and extensively both for this PR and for my own local projects as well.

Add thinking_start_tag and thinking_end_tag to
common_chat_params_init_gemma4(). Without these, the reasoning
budget sampler never activates for gemma4.

Make the newline after "thought" optional in the PEG parser to
handle budget=0 (sampler forces end tag before the newline).

Add test case for empty thinking block.

Fixes ggml-org#21487
@berkidem berkidem requested review from a team and pwilkin as code owners April 9, 2026 22:06
@pwilkin
Copy link
Copy Markdown
Member

pwilkin commented Apr 9, 2026

Any reason p.optional(p.literal("\n")) can't be simply replaced with p.space()?

Copy link
Copy Markdown
Member

@pwilkin pwilkin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but would feel safe if @aldehir took a peek at it too.

@berkidem
Copy link
Copy Markdown
Contributor Author

berkidem commented Apr 9, 2026

Any reason p.optional(p.literal("\n")) can't be simply replaced with p.space()?

Good point, no particular reason; I was overfitting to the exact pattern I saw (it was the same in all experiments). Changed to p.space().

@github-actions github-actions bot added the testing Everything test related label Apr 9, 2026
Copy link
Copy Markdown
Contributor

@aldehir aldehir left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thank you.

@pwilkin pwilkin merged commit d7ff074 into ggml-org:master Apr 10, 2026
45 of 47 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

testing Everything test related

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Eval bug: Regression: --reasoning-budget N is ignored

3 participants