
webui : store reasoning_content so it is sent back in subsequent requests #21249

Merged

allozaur merged 2 commits into ggml-org:master from aldehir:webui-agentic-reasoning-content on Apr 7, 2026

Conversation

Contributor

@aldehir aldehir commented Apr 1, 2026

Overview

The reasoning_content of an assistant message should be sent back on subsequent requests. It was not being saved in the conversation history and was therefore lost.

Additional information

For models that support interleaved thinking, it is important to send back the reasoning during agentic loops.
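As a minimal TypeScript sketch of what the fix amounts to: when rebuilding the messages array for the next request, any stored reasoning_content on assistant turns must be echoed back. The field name follows llama.cpp's OpenAI-compatible chat completions API; the ChatMessage shape and buildHistory helper here are illustrative, not the actual webui code.

```typescript
// Illustrative message shape; reasoning_content follows the field name
// used by llama.cpp's OpenAI-compatible chat completions endpoint.
interface ChatMessage {
  role: "system" | "user" | "assistant" | "tool";
  content: string;
  reasoning_content?: string; // present on assistant turns that produced thinking
}

// Build the message history for the next request. Before this fix,
// reasoning_content was dropped when the assistant turn was stored,
// so it could never be sent back here.
function buildHistory(stored: ChatMessage[]): ChatMessage[] {
  return stored.map((m) =>
    m.role === "assistant" && m.reasoning_content !== undefined
      ? { role: m.role, content: m.content, reasoning_content: m.reasoning_content }
      : { role: m.role, content: m.content }
  );
}

const history = buildHistory([
  { role: "user", content: "What is 2 + 2?" },
  { role: "assistant", content: "4", reasoning_content: "2 + 2 = 4." },
]);
```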


Contributor Author

aldehir commented Apr 4, 2026

Apologies, I did not realize the type check failed.

@allozaur allozaur requested a review from ServeurpersoCom April 6, 2026 11:19
Contributor

allozaur commented Apr 7, 2026

Hey @aldehir, please re-run npm run build and we'll have this merged.

@aldehir force-pushed the webui-agentic-reasoning-content branch from 05e0e3a to a6d6862 on April 7, 2026 10:28
Contributor Author

aldehir commented Apr 7, 2026

@allozaur rebased and rebuilt.

Contributor

allozaur commented Apr 7, 2026

@ServeurpersoCom plz give 2nd approval and let's merge it :)

@allozaur allozaur merged commit 482192f into ggml-org:master Apr 7, 2026
6 checks passed

julmb commented Apr 7, 2026

Isn't this model specific? The Gemma 4 model description says:

No Thinking Content in History: In multi-turn conversations, the historical model output should only include the final response. Thoughts from previous model turns must not be added before the next user turn begins.

Maybe there should be a flag embedded in the model metadata to specify this?

@ServeurpersoCom
Contributor

Isn't this model specific? The Gemma 4 model description says:

No Thinking Content in History: In multi-turn conversations, the historical model output should only include the final response. Thoughts from previous model turns must not be added before the next user turn begins.

Maybe there should be a flag embedded in the model metadata to specify this?

Good point, but this is already covered. On the server side, the chat template is the source of truth: if a model's template doesn't reference reasoning_content, the field is simply ignored. On the WebUI side, there's already a toggle: "Exclude reasoning from context" that lets the user strip it explicitly. So no extra flag needed.
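To illustrate the toggle mentioned above, here is a hypothetical TypeScript sketch (not the actual webui code) of what "Exclude reasoning from context" could look like client-side: when the setting is enabled, reasoning_content is stripped before the request is sent; otherwise the field is passed through and the server-side chat template decides whether to render it.

```typescript
interface ChatMessage {
  role: string;
  content: string;
  reasoning_content?: string;
}

// Hypothetical helper mirroring the "Exclude reasoning from context" setting.
// With excludeReasoning = true, reasoning_content is removed from every
// message; with false, messages pass through unchanged and the chat
// template remains the source of truth.
function applyReasoningSetting(
  messages: ChatMessage[],
  excludeReasoning: boolean
): ChatMessage[] {
  if (!excludeReasoning) return messages;
  // Destructure to drop the reasoning_content key entirely.
  return messages.map(({ reasoning_content, ...rest }) => rest);
}

const msgs: ChatMessage[] = [
  { role: "assistant", content: "4", reasoning_content: "2 + 2 = 4." },
];
const stripped = applyReasoningSetting(msgs, true);
const kept = applyReasoningSetting(msgs, false);
```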

@aldehir
Contributor Author

aldehir commented Apr 7, 2026

Isn't this model specific? The Gemma 4 model description says:

No Thinking Content in History: In multi-turn conversations, the historical model output should only include the final response. Thoughts from previous model turns must not be added before the next user turn begins.

Maybe there should be a flag embedded in the model metadata to specify this?

1. Reasoning should be retained for Gemma 4 models between tool calls: https://ai.google.dev/gemma/docs/core/prompt-formatting-gemma4

2. Models that don't need it will simply not include it in their templates. We delegate to the chat template, as @ServeurpersoCom mentioned.

@julmb

julmb commented Apr 7, 2026

I was not aware of the template stripping the reasoning as necessary, but that actually makes a lot of sense, thank you for explaining!

ronaldmannak pushed a commit to PicoMLX/llama.cpp that referenced this pull request Apr 7, 2026
iamwavecut pushed a commit to iamwavecut/llama-cpp-turboquant that referenced this pull request Apr 8, 2026


4 participants