common : enable reasoning budget sampler for gemma4 by berkidem · Pull Request #21697 · ggml-org/llama.cpp

berkidem · 2026-04-09T22:05:57Z

Overview

As #21487 also reports, the gemma4 thinking budget doesn't work. I noticed that common_chat_params_init_gemma4() sets supports_thinking = true but never populates thinking_start_tag / thinking_end_tag. The budget sampler in server-common.cpp only activates when thinking_end_tag is non-empty, so it skips gemma4 entirely.
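The gating described above can be illustrated with a small sketch. The struct and function names here are hypothetical stand-ins, not the actual llama.cpp types; only the field names and the tag strings come from this PR, and the real check in server-common.cpp may look different.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the chat params fields named in the PR description.
struct chat_params_sketch {
    bool        supports_thinking = false;
    std::string thinking_start_tag;
    std::string thinking_end_tag;
};

// Assumed gate: the budget sampler is only set up when an end tag is known.
inline bool budget_sampler_enabled(const chat_params_sketch & p) {
    return !p.thinking_end_tag.empty();
}

// State before the fix: thinking is advertised, but both tags are empty.
inline chat_params_sketch gemma4_before_fix() {
    chat_params_sketch p;
    p.supports_thinking = true;
    return p;
}

// State after the fix: tags populated with the values listed under "Changes".
inline chat_params_sketch gemma4_after_fix() {
    chat_params_sketch p = gemma4_before_fix();
    p.thinking_start_tag = "<|channel>thought";
    p.thinking_end_tag   = "<channel|>";
    return p;
}

// Usage: the sampler is skipped before the fix and enabled after it.
inline void gate_example() {
    assert(!budget_sampler_enabled(gemma4_before_fix()));
    assert(budget_sampler_enabled(gemma4_after_fix()));
}
```

Under this assumption, supports_thinking alone has no effect on the sampler, which matches the reported behavior.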

So I added the missing tags. The main fix is just two lines (chat.cpp:1087-1088). The rest of the diff is about making budget=0 work cleanly: while testing for my personal use (see the details of the local testing environment below), I found that budget=0 causes a PEG parse error because the sampler forces the end tag before the model emits a newline after "thought". Even though --reasoning off already handles the no-thinking case, I didn't want to introduce a parse error at that edge case. I made the newline optional in the parser, and added a test case for it.

Fixes #21487

Changes

Set thinking_start_tag = "<|channel>thought" and thinking_end_tag = "<channel|>" in common_chat_params_init_gemma4()
Make \n optional after "thought" in both PEG parser rules (extract and non-extract paths) so the parser handles empty thinking blocks
Add test case for empty thinking block (budget=0 scenario)
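To make the budget=0 edge case concrete, here is a minimal hand-rolled sketch of the parsing problem. It does not use the llama.cpp PEG parser API; parse_thought and parsed_output are hypothetical names introduced only for illustration, and the tag strings are the ones listed above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical result type: reasoning text and the remaining content.
struct parsed_output {
    std::string reasoning;
    std::string content;
};

// Parses "<|channel>thought[\n]...<channel|>rest".
// When newline_required is true, a missing '\n' after "thought" is an error,
// which is what an empty thinking block (budget=0) runs into.
inline bool parse_thought(const std::string & text, bool newline_required, parsed_output & out) {
    const std::string start = "<|channel>thought";
    const std::string end   = "<channel|>";

    if (text.compare(0, start.size(), start) != 0) {
        return false; // start tag missing
    }
    std::size_t pos = start.size();
    if (pos < text.size() && text[pos] == '\n') {
        ++pos; // consume the newline when present
    } else if (newline_required) {
        return false; // strict rule: newline is mandatory
    }
    const std::size_t close = text.find(end, pos);
    if (close == std::string::npos) {
        return false; // end tag missing
    }
    out.reasoning = text.substr(pos, close - pos);
    out.content   = text.substr(close + end.size());
    return true;
}

// Usage: the empty block only parses when the newline is optional.
inline void empty_block_example() {
    parsed_output out;
    assert(!parse_thought("<|channel>thought<channel|>Hi", true, out));
    assert(parse_thought("<|channel>thought<channel|>Hi", false, out));
    assert(out.reasoning.empty() && out.content == "Hi");
}
```

The strict variant still accepts ordinary output such as "<|channel>thought\nplan<channel|>Hi", so relaxing the rule only changes behavior for the forced-end-tag case.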

Testing

Unit tests: test-reasoning-budget (5/5), test-chat (all passed).

Integration tested on RTX 5090 with gemma-4-26B-A4B-it-Q4_K_M.gguf (ggml-org), llama.cpp b8718, CUDA 13.0:

| Config | Thinking tokens |
| --- | --- |
| `--reasoning-budget -1` (unlimited) | 1,447 |
| `--reasoning-budget 1024` | 1,024 |
| `--reasoning-budget 0` | 0 |
| Per-request `thinking_budget_tokens: 256` | 256 |

I didn't run perplexity or benchmarks, since this PR simply lets the existing sampler, which already works for other models, apply to gemma4 as well. Default behavior (--reasoning-budget -1) is unchanged, so the only people affected are those who explicitly set a reasoning budget (and currently don't get it). (Also, the CI test uses Qwen3, so my diff wouldn't be on the code path that gets exercised.)

Note on the start tag: the tag omits the trailing \n because gemma4 generates a double-newline token (\n\n, token 108) after "thought", not the single-newline token (\n, token 107) that the tokenizer produces for "thought\n". If the tag included the trailing \n, the sampler's token matcher would fail on the third token and never activate.
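The token mismatch can be sketched with a simplified sequence matcher. This is not the sampler's actual matcher; stream_contains_tag is a hypothetical helper, and the ids for "<|channel>" and "thought" are made up for illustration. Only the newline ids (107 and 108) come from the note above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Token ids: the two newline ids are from the PR note; the others are hypothetical.
const int TOK_CHANNEL_OPEN = 1001; // "<|channel>" (hypothetical id)
const int TOK_THOUGHT      = 1002; // "thought"    (hypothetical id)
const int TOK_NL           = 107;  // "\n"
const int TOK_NLNL         = 108;  // "\n\n"

// Simplified matcher: returns true once the whole tag appears contiguously.
// On a mismatch it restarts, keeping the current token if it begins the tag.
inline bool stream_contains_tag(const std::vector<int> & tag, const std::vector<int> & stream) {
    if (tag.empty()) {
        return false;
    }
    std::size_t matched = 0;
    for (int tok : stream) {
        if (tok == tag[matched]) {
            if (++matched == tag.size()) {
                return true;
            }
        } else {
            matched = (tok == tag[0]) ? 1 : 0;
        }
    }
    return false;
}

// Usage: what the model emits versus the two candidate start tags.
inline void start_tag_example() {
    const std::vector<int> generated   = { TOK_CHANNEL_OPEN, TOK_THOUGHT, TOK_NLNL, 500 };
    const std::vector<int> tag_with_nl = { TOK_CHANNEL_OPEN, TOK_THOUGHT, TOK_NL };
    const std::vector<int> tag_no_nl   = { TOK_CHANNEL_OPEN, TOK_THOUGHT };

    assert(!stream_contains_tag(tag_with_nl, generated)); // fails on the third token
    assert(stream_contains_tag(tag_no_nl, generated));    // matches after two tokens
}
```

Dropping the trailing newline from the tag sidesteps the question of which newline token the model chooses.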

Requirements

I have read and agree with the contributing guidelines
AI usage disclosure: AI assisted with locating the root cause in the source code and suggested a fix. The fix was reviewed manually and tested extensively, both for this PR and in my own local projects.

Add thinking_start_tag and thinking_end_tag to common_chat_params_init_gemma4(). Without these, the reasoning budget sampler never activates for gemma4. Make the newline after "thought" optional in the PEG parser to handle budget=0 (sampler forces end tag before the newline). Add test case for empty thinking block. Fixes ggml-org#21487

pwilkin · 2026-04-09T22:08:14Z

Any reason p.optional(p.literal("\n")) can't be simply replaced with p.space()?

pwilkin

LGTM, but would feel safer if @aldehir took a peek at it too.


berkidem · 2026-04-09T23:18:38Z

Any reason p.optional(p.literal("\n")) can't be simply replaced with p.space()?

Good point, no particular reason; I was overfitting to the exact pattern I saw (it was the same in all experiments). Changed to p.space().
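For readers comparing the two rules, here is a rough sketch of the difference, assuming p.space() accepts zero or more whitespace characters (that semantics is an assumption here, not confirmed in this thread). The helper names are hypothetical and operate on plain strings rather than on the PEG parser.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Analogue of p.optional(p.literal("\n")): consume at most one '\n'.
inline std::size_t skip_optional_newline(const std::string & s, std::size_t pos) {
    return (pos < s.size() && s[pos] == '\n') ? pos + 1 : pos;
}

// Assumed analogue of p.space(): consume any run of whitespace, possibly empty.
inline std::size_t skip_space(const std::string & s, std::size_t pos) {
    while (pos < s.size() && std::isspace(static_cast<unsigned char>(s[pos]))) {
        ++pos;
    }
    return pos;
}

// Usage: both accept the empty case; only skip_space consumes a double newline.
inline void space_example() {
    assert(skip_optional_newline("<channel|>", 0) == 0);
    assert(skip_space("<channel|>", 0) == 0);
    assert(skip_optional_newline("\n\nplan", 0) == 1);
    assert(skip_space("\n\nplan", 0) == 2);
}
```

Both variants handle the empty thinking block; the whitespace-run variant is simply less tied to one exact newline pattern.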

aldehir

Looks good, thank you.

berkidem requested review from a team and pwilkin as code owners April 9, 2026 22:06

pwilkin approved these changes Apr 9, 2026


use p.space() instead of p.optional(p.literal("\n")) in gemma4 thought parser

b6e7201

github-actions bot added the testing Everything test related label Apr 9, 2026

aldehir approved these changes Apr 10, 2026


pwilkin merged commit d7ff074 into ggml-org:master Apr 10, 2026
45 of 47 checks passed

pwilkin merged 2 commits into ggml-org:master from berkidem:fix/gemma4-reasoning-budget

3 participants
