fix(reasoning): stop <think> leaking into content when autoparser is in pure-content mode #9991
Merged
…in pure-content mode

When LocalAI templates a thinking model outside of jinja (the default for the qwen3 gallery family), llama.cpp's chat parser falls back to a "pure content" PEG parser that dumps the entire raw response into ChatDelta.Content with an empty ReasoningContent. The Go side then trusted that content verbatim and overrode tokenCallback's correctly-split reasoning, so <think>...</think> blocks ended up in the OpenAI `content` field. Regression from v4.0.0 introduced when the autoparser ChatDeltas path was added (#9224).

The override now runs Go-side reasoning extraction defensively when the autoparser delivered content but no reasoning. The streaming worker gains a sticky preferAutoparser flag that flips on the first chunk where the autoparser classified reasoning_content; until then we use the streaming Go-side extractor. Realtime mirrors the non-streaming fallback. When the autoparser already populated ReasoningContent we trust it untouched, so jinja-enabled installs are not regressed.

gallery/qwen3.yaml now enables use_jinja, letting the autoparser classify <think> natively for all 20+ qwen3 family entries that share this template.

Fixes #9985

Assisted-by: Claude:opus-4-7 [Read] [Edit] [Bash] [Write]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
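The sticky flag described in the commit message can be illustrated with a small sketch. This is not the LocalAI streaming worker: the `chunk`, `delta`, and `streamState` types and the `next` method are hypothetical names invented for illustration, and each chunk is assumed to carry both the autoparser's classification and the Go-side extractor's deltas side by side.

```go
package main

import "fmt"

// chunk is a hypothetical streaming unit that carries both views of one token batch.
type chunk struct {
	AutoContent, AutoReasoning string // what the autoparser reported
	GoContent, GoReasoning     string // what the Go-side streaming extractor produced
}

// delta is the (hypothetical) value forwarded to the client for one chunk.
type delta struct{ Content, Reasoning string }

// streamState holds the sticky flag for a single streaming request.
type streamState struct{ preferAutoparser bool }

// next picks which source to trust for this chunk.
func (s *streamState) next(c chunk) delta {
	// Flip on the first chunk where the autoparser classified reasoning.
	// The flag is sticky: it is never reset for the rest of the stream.
	if c.AutoReasoning != "" {
		s.preferAutoparser = true
	}
	if s.preferAutoparser {
		return delta{Content: c.AutoContent, Reasoning: c.AutoReasoning}
	}
	// Until then, use the Go-side extractor's deltas.
	return delta{Content: c.GoContent, Reasoning: c.GoReasoning}
}

func main() {
	s := &streamState{}
	stream := []chunk{
		{AutoContent: "<think>a", GoReasoning: "a"}, // autoparser in pure-content mode
		{AutoReasoning: "b", GoReasoning: "b"},      // autoparser starts classifying
		{AutoContent: "hi", GoContent: "ignored"},   // flag stays set
	}
	for _, c := range stream {
		fmt.Printf("%+v\n", s.next(c))
	}
}
```

Keeping the flag sticky means a stream never switches back to the Go-side extractor once the autoparser has shown it can classify reasoning, so the two sources are not interleaved after that point.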
This was referenced May 25, 2026
Summary
Fixes #9985: qwen3-4b (and the rest of the qwen3 family) was returning the `<think>...</think>` block inside the OpenAI `content` field instead of in a separate `reasoning` field. Regression from v4.0.0, introduced by the C++ autoparser ChatDeltas path (#9224).

Root cause
When LocalAI templates a thinking model outside of jinja (the default for the qwen3 gallery), llama.cpp's chat parser falls back to a "pure content" PEG parser. It dumps the entire raw response, `<think>` tags and all, into `ChatDelta.Content` and leaves `ChatDelta.ReasoningContent` empty. The Go side in `chat.go` then preferred the autoparser's content over `tokenCallback`'s correctly-split result, so the tags leaked through.

Debug log showing the bug:
Fix shape
- `applyAutoparserOverride` (extracted from chat.go's inline override) now runs Go-side `ExtractReasoningWithConfig` when the autoparser delivered content but no reasoning. When the autoparser DID populate `ReasoningContent`, we trust it untouched, so jinja-enabled installs are not regressed.
- The streaming worker gains a sticky `preferAutoparser` flag. It flips on the first chunk where the autoparser classified `reasoning_content`; until then the streaming worker uses the Go-side extractor's deltas.
- `gallery/qwen3.yaml` now enables `use_jinja: true` so the autoparser classifies `<think>` natively for the 20+ qwen3 family entries sharing this template. The Go-side fallback still covers older on-disk installs and any future imported models without jinja.

Test plan
- `go test ./core/http/endpoints/openai/ ./core/http/endpoints/openresponses/ ./pkg/reasoning/ ./pkg/functions/`: green
- `chat_test.go` covering:
  - `<think>` in content + empty reasoning → split correctly (red without fix, green with fix)
  - `<think></think>` block from qwen3 `/no_think` → tags stripped, no spurious reasoning field
- `golangci-lint run --new-from-merge-base=master`: 0 new issues
- `/no_think` mode: empty think block stripped cleanly
- `delta.reasoning`, content chunks clean
- `use_jinja: true` variant (working-autoparser baseline): `content_len=39 reasoning_len=376` from autoparser; Go-side fallback bypassed as expected

🤖 Generated with Claude Code
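The non-streaming fallback rule (trust the autoparser when it populated reasoning, otherwise split `<think>` out of the content on the Go side) can be sketched as below. This is a simplified illustration, not the code in this PR: `chatDelta`, `splitThink`, and `applyOverride` are hypothetical stand-ins, and the real `ExtractReasoningWithConfig` is config-driven and likely handles more tag variants than the single hard-coded `<think>` pair assumed here.

```go
package main

import (
	"fmt"
	"strings"
)

// chatDelta is a hypothetical stand-in for the autoparser output;
// only the two fields discussed in this PR are modeled.
type chatDelta struct {
	Content          string
	ReasoningContent string
}

// splitThink is a simplified, hypothetical extractor: it removes the first
// <think>...</think> block from s and returns (reasoning, content).
func splitThink(s string) (reasoning, content string) {
	const open, closeTag = "<think>", "</think>"
	start := strings.Index(s, open)
	if start < 0 {
		return "", s // no tag: everything is content
	}
	rest := s[start+len(open):]
	end := strings.Index(rest, closeTag)
	if end < 0 {
		// Unclosed block: treat everything after the tag as reasoning.
		return strings.TrimSpace(rest), strings.TrimSpace(s[:start])
	}
	reasoning = strings.TrimSpace(rest[:end])
	content = strings.TrimSpace(s[:start] + rest[end+len(closeTag):])
	return reasoning, content
}

// applyOverride sketches the fallback rule: if the autoparser already
// classified reasoning, trust it untouched; otherwise run the Go-side split.
func applyOverride(d chatDelta) (content, reasoning string) {
	if d.ReasoningContent != "" {
		return d.Content, d.ReasoningContent
	}
	reasoning, content = splitThink(d.Content)
	return content, reasoning
}

func main() {
	cases := []chatDelta{
		{Content: "<think>plan</think>Hello"},            // pure-content mode leak
		{Content: "<think>\n\n</think>\n\nHi"},           // empty block, as with /no_think
		{Content: "answer", ReasoningContent: "because"}, // autoparser already split
	}
	for _, c := range cases {
		content, reasoning := applyOverride(c)
		fmt.Printf("content=%q reasoning=%q\n", content, reasoning)
	}
}
```

The three inputs in `main` correspond to the cases named in the test plan: a leaked block that must be split, an empty block that must be stripped without producing a reasoning value, and an already-classified response that must pass through unchanged.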