fix(reasoning): stop <think> leaking into content when autoparser is in pure-content mode by localai-bot · Pull Request #9991 · mudler/LocalAI

localai-bot · 2026-05-25T20:00:04Z

Summary

Fixes #9985 — qwen3-4b (and the rest of the qwen3 family) was returning the <think>...</think> block inside the OpenAI content field instead of in a separate reasoning field. Regression from v4.0.0, introduced by the C++ autoparser ChatDeltas path (#9224).

Root cause

When LocalAI templates a thinking model outside of jinja (the default for the qwen3 gallery), llama.cpp's chat parser falls back to a "pure content" PEG parser. It dumps the entire raw response — <think> tags and all — into ChatDelta.Content and leaves ChatDelta.ReasoningContent empty. The Go side in chat.go then preferred the autoparser's content over tokenCallback's correctly-split result, so the tags leaked through.

Debug log showing the bug:

[ChatDeltas] non-streaming Predict received deltas from C++ autoparser total_deltas=1
[ChatDeltas] non-SSE no-tools: overriding result with C++ autoparser deltas content_len=376 reasoning_len=0

Fix shape

Conditional fallback. applyAutoparserOverride (extracted from chat.go's inline override) now runs Go-side ExtractReasoningWithConfig when the autoparser delivered content but no reasoning. When the autoparser DID populate ReasoningContent, we trust it untouched — jinja-enabled installs are not regressed.
Streaming gets a sticky preferAutoparser flag. It flips on the first chunk where the autoparser classified reasoning_content; until then the streaming worker uses the Go-side extractor's deltas.
Realtime mirrors the non-streaming fallback.
gallery/qwen3.yaml now enables use_jinja:true so the autoparser classifies <think> natively for the 20+ qwen3 family entries sharing this template. The Go-side fallback still covers older on-disk installs and any future imported models without jinja.
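
The conditional fallback described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in chat.go: `chatDelta`, `splitThink`, and `applyOverride` are hypothetical stand-ins (the PR's real functions are applyAutoparserOverride and ExtractReasoningWithConfig, whose signatures are not shown here), and `splitThink` only handles a single leading `<think>...</think>` block.

```go
package main

import "strings"

// chatDelta is a hypothetical stand-in for one autoparser delta.
type chatDelta struct {
	Content          string
	ReasoningContent string
}

// splitThink is a simplified stand-in for Go-side reasoning extraction.
// Assumption: only one leading <think>...</think> block is recognised;
// anything else (including an unterminated block) is returned unchanged.
func splitThink(raw string) (reasoning, content string) {
	const openTag, closeTag = "<think>", "</think>"
	trimmed := strings.TrimSpace(raw)
	if !strings.HasPrefix(trimmed, openTag) {
		return "", raw
	}
	rest := trimmed[len(openTag):]
	end := strings.Index(rest, closeTag)
	if end < 0 {
		return "", raw
	}
	return strings.TrimSpace(rest[:end]), strings.TrimSpace(rest[end+len(closeTag):])
}

// applyOverride returns (content, reasoning).
// - no deltas: keep the existing result
// - autoparser populated reasoning: trust it untouched
// - content but no reasoning: run the Go-side split as a fallback
func applyOverride(prevContent, prevReasoning string, deltas []chatDelta) (string, string) {
	if len(deltas) == 0 {
		return prevContent, prevReasoning
	}
	var content, reasoning strings.Builder
	for _, d := range deltas {
		content.WriteString(d.Content)
		reasoning.WriteString(d.ReasoningContent)
	}
	if reasoning.Len() > 0 {
		return content.String(), reasoning.String()
	}
	r, c := splitThink(content.String())
	return c, r
}
```

With this sketch, a single delta whose content is `<think>plan</think>answer` yields content `answer` and reasoning `plan`, while a delta that already carries ReasoningContent is passed through unchanged.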

Test plan

go test ./core/http/endpoints/openai/ ./core/http/endpoints/openresponses/ ./pkg/reasoning/ ./pkg/functions/ — green
New Ginkgo specs in chat_test.go covering:
- autoparser delivered <think> in content + empty reasoning → split correctly (red without fix, green with fix)
- autoparser already populated reasoning → passthrough untouched (no-regression on jinja path)
- plain content, no reasoning tags → passthrough
- empty <think></think> block from qwen3 /no_think → tags stripped, no spurious reasoning field
- empty chatDeltas → returns existing result
golangci-lint run --new-from-merge-base=master — 0 new issues
End-to-end against running qwen3-4b (Q4_K_M):
- Default thinking mode: content clean, reasoning in its own field
- /no_think mode: empty think block stripped cleanly
- Streaming: reasoning chunks delivered in delta.reasoning, content chunks clean
- use_jinja:true variant (working-autoparser baseline): content_len=39 reasoning_len=376 from autoparser — Go-side fallback bypassed as expected
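
The streaming behaviour exercised above (Go-side deltas until the autoparser first classifies reasoning, autoparser deltas from then on) can be sketched as a small selector. The type and field names here are hypothetical and chosen for illustration only; the PR names the flag preferAutoparser but does not show how the streaming worker is structured.

```go
package main

// streamChunk holds what each source produced for one streamed chunk.
// Both halves are assumptions about shape, not LocalAI's real types.
type streamChunk struct {
	AutoContent   string // autoparser content delta
	AutoReasoning string // autoparser reasoning_content delta
	GoContent     string // Go-side extractor content delta
	GoReasoning   string // Go-side extractor reasoning delta
}

// emitted is the delta that would be sent to the client.
type emitted struct {
	Content   string
	Reasoning string
}

// streamSelector carries the sticky flag across chunks.
type streamSelector struct {
	preferAutoparser bool
}

// next picks the source for one chunk. The flag flips on the first chunk
// where the autoparser classified reasoning and never flips back.
func (s *streamSelector) next(ch streamChunk) emitted {
	if !s.preferAutoparser && ch.AutoReasoning != "" {
		s.preferAutoparser = true
	}
	if s.preferAutoparser {
		return emitted{Content: ch.AutoContent, Reasoning: ch.AutoReasoning}
	}
	return emitted{Content: ch.GoContent, Reasoning: ch.GoReasoning}
}
```

Making the flag sticky means a stream never switches back to the Go-side extractor after the autoparser has proven it can classify reasoning, which avoids mixing the two sources mid-response.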

🤖 Generated with Claude Code

…in pure-content mode

When LocalAI templates a thinking model outside of jinja (the default for the qwen3 gallery family), llama.cpp's chat parser falls back to a "pure content" PEG parser that dumps the entire raw response into ChatDelta.Content with an empty ReasoningContent. The Go side then trusted that content verbatim and overrode tokenCallback's correctly-split reasoning, so <think>...</think> blocks ended up in the OpenAI `content` field. Regression from v4.0.0 introduced when the autoparser ChatDeltas path was added (#9224).

The override now runs Go-side reasoning extraction defensively when the autoparser delivered content but no reasoning. The streaming worker gains a sticky preferAutoparser flag that flips on the first chunk where the autoparser classified reasoning_content; until then we use the streaming Go-side extractor. Realtime mirrors the non-streaming fallback. When the autoparser already populated ReasoningContent we trust it untouched, so jinja-enabled installs are not regressed.

gallery/qwen3.yaml now enables use_jinja, letting the autoparser classify <think> natively for all 20+ qwen3 family entries that share this template.

Fixes #9985

Assisted-by: Claude:opus-4-7 [Read] [Edit] [Bash] [Write]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

mudler mentioned this pull request May 25, 2026

Regression: Reasoning/thinking output provided as regular output #9985

Closed

mudler merged commit 1c6c3ad into master May 25, 2026
57 checks passed

mudler deleted the fix/9985-autoparser-reasoning-leak branch May 25, 2026 20:39

This was referenced May 25, 2026

fix(streaming/tools): stop healing-marker stubs from gating off content #9999

Merged

fix(streaming/tools): don't leak prefill-misclassified content as trailing reasoning chunk #10000

Merged

BrewTestBot mentioned this pull request May 27, 2026

localai 4.3.2 Homebrew/homebrew-core#285003

Merged
