
webui : store reasoning_content so it is sent back in subsequent requests #21249

Merged

allozaur merged 2 commits into ggml-org:master from aldehir:webui-agentic-reasoning-content on Apr 7, 2026

Conversation

Contributor

@aldehir aldehir commented Apr 1, 2026

Overview

The reasoning_content of an assistant message should be sent back on subsequent requests. It was not being saved in the conversation history and was therefore lost.

Additional information

For models that support interleaved thinking, it is important to send back the reasoning during agentic loops.
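As a minimal TypeScript sketch of what the fix amounts to: when rebuilding the messages array for the next request, any stored reasoning_content on assistant turns must be echoed back. The field name follows llama.cpp's OpenAI-compatible chat completions API; the ChatMessage shape and buildHistory helper here are illustrative, not the actual webui code.

```typescript
// Illustrative message shape; reasoning_content follows the field name
// used by llama.cpp's OpenAI-compatible chat completions endpoint.
interface ChatMessage {
  role: "system" | "user" | "assistant" | "tool";
  content: string;
  reasoning_content?: string; // present on assistant turns that produced thinking
}

// Build the message history for the next request. Before this fix,
// reasoning_content was dropped when the assistant turn was stored,
// so it could never be sent back here.
function buildHistory(stored: ChatMessage[]): ChatMessage[] {
  return stored.map((m) =>
    m.role === "assistant" && m.reasoning_content !== undefined
      ? { role: m.role, content: m.content, reasoning_content: m.reasoning_content }
      : { role: m.role, content: m.content }
  );
}

const history = buildHistory([
  { role: "user", content: "What is 2 + 2?" },
  { role: "assistant", content: "4", reasoning_content: "2 + 2 = 4." },
]);
```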


Contributor Author

aldehir commented Apr 4, 2026

Apologies, I did not realize the type check failed.

@allozaur allozaur requested a review from ServeurpersoCom April 6, 2026 11:19
Contributor

allozaur commented Apr 7, 2026

Hey @aldehir, please re-run npm run build and we'll have this merged.

@aldehir force-pushed the webui-agentic-reasoning-content branch from 05e0e3a to a6d6862 on April 7, 2026 10:28
Contributor Author

aldehir commented Apr 7, 2026

@allozaur rebased and rebuilt.

Contributor

allozaur commented Apr 7, 2026

@ServeurpersoCom plz give 2nd approval and let's merge it :)

@allozaur allozaur merged commit 482192f into ggml-org:master Apr 7, 2026
6 checks passed

julmb commented Apr 7, 2026

Isn't this model specific? The Gemma 4 model description says:

No Thinking Content in History: In multi-turn conversations, the historical model output should only include the final response. Thoughts from previous model turns must not be added before the next user turn begins.

Maybe there should be a flag embedded in the model metadata to specify this?

@ServeurpersoCom
Contributor

Isn't this model specific? The Gemma 4 model description says:

No Thinking Content in History: In multi-turn conversations, the historical model output should only include the final response. Thoughts from previous model turns must not be added before the next user turn begins.

Maybe there should be a flag embedded in the model metadata to specify this?

Good point, but this is already covered. On the server side, the chat template is the source of truth: if a model's template doesn't reference reasoning_content, the field is simply ignored. On the WebUI side, there's already a toggle: "Exclude reasoning from context" that lets the user strip it explicitly. So no extra flag needed.
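To illustrate the toggle mentioned above, here is a hypothetical TypeScript sketch (not the actual webui code) of what "Exclude reasoning from context" could look like client-side: when the setting is enabled, reasoning_content is stripped before the request is sent; otherwise the field is passed through and the server-side chat template decides whether to render it.

```typescript
interface ChatMessage {
  role: string;
  content: string;
  reasoning_content?: string;
}

// Hypothetical helper mirroring the "Exclude reasoning from context" setting.
// With excludeReasoning = true, reasoning_content is removed from every
// message; with false, messages pass through unchanged and the chat
// template remains the source of truth.
function applyReasoningSetting(
  messages: ChatMessage[],
  excludeReasoning: boolean
): ChatMessage[] {
  if (!excludeReasoning) return messages;
  // Destructure to drop the reasoning_content key entirely.
  return messages.map(({ reasoning_content, ...rest }) => rest);
}

const msgs: ChatMessage[] = [
  { role: "assistant", content: "4", reasoning_content: "2 + 2 = 4." },
];
const stripped = applyReasoningSetting(msgs, true);
const kept = applyReasoningSetting(msgs, false);
```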

@aldehir
Contributor Author

aldehir commented Apr 7, 2026

Isn't this model specific? The Gemma 4 model description says:

No Thinking Content in History: In multi-turn conversations, the historical model output should only include the final response. Thoughts from previous model turns must not be added before the next user turn begins.

Maybe there should be a flag embedded in the model metadata to specify this?

1. Reasoning should be retained for Gemma 4 models between tool calls: https://ai.google.dev/gemma/docs/core/prompt-formatting-gemma4

2. Models that don't need it will simply not include it in their templates. We delegate to the chat template, as @ServeurpersoCom mentioned.

@julmb

julmb commented Apr 7, 2026

I was not aware of the template stripping the reasoning as necessary, but that actually makes a lot of sense, thank you for explaining!

ronaldmannak pushed a commit to PicoMLX/llama.cpp that referenced this pull request Apr 7, 2026
iamwavecut pushed a commit to iamwavecut/llama-cpp-turboquant that referenced this pull request Apr 8, 2026


4 participants