Description
Name and Version
commit d8359f5 (tag: b6615, upstream/master, origin/master, origin/HEAD, master)
Operating systems
No response
Which llama.cpp modules do you know to be affected?
No response
Command line
Problem description & steps to reproduce
Assistant messages whose `<think>…</think>` block contains a double quote render correctly while streaming, but the reasoning text disappears after the message list re-renders or when a saved conversation is imported.
The parser that strips `<think>…</think>` blocks during rehydration drops the reasoning payload, leaving the thinking pane empty even though the original stream contained the text.
Steps to reproduce:
(I used Llama-3_3-Nemotron-Super-49B)
- Generate an assistant reply whose `<think>…</think>` block contains a double quote (`"`).
- While the response streams, the reasoning text is visible.
- After the response completes, observe that the thinking panel for that message is partial or empty.
- Refresh the page or import the saved conversation JSON: the same truncation occurs.
Expected result:
The `<think>…</think>` block remains intact and visible after any re-render or import.
Actual result:
The `<think>…</think>` block becomes blank or incomplete on re-render or import whenever it contains a double quote.
Fix:
Switch the parsing logic from `split('</think>')` to a regex match that preserves all content inside `<think>…</think>` and removes only the matched segment.
Working proposal: master...ServeurpersoCom:llama.cpp:fix-thinking-blocks-with-quotes
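A minimal sketch of the regex-based approach (hypothetical helper name and shape; the actual webui code in the linked proposal differs):

```typescript
// Hypothetical sketch: extract the reasoning with a regex capture instead of
// splitting on the literal closing tag, so the captured text (quotes and all)
// survives intact and only the matched <think>…</think> segment is removed.
const THINK_RE = /<think>([\s\S]*?)<\/think>/;

function splitReasoning(raw: string): { reasoning: string; content: string } {
  const match = raw.match(THINK_RE);
  if (!match) {
    // No thinking block: everything is visible content.
    return { reasoning: '', content: raw };
  }
  return {
    reasoning: match[1],                       // full payload, quotes preserved
    content: raw.replace(THINK_RE, '').trim(), // visible text minus the block
  };
}

// A reasoning block containing a double quote stays intact:
const sample = '<think>He said "hello" first.</think>Final answer.';
const parsed = splitReasoning(sample);
// parsed.reasoning -> 'He said "hello" first.'
// parsed.content   -> 'Final answer.'
```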
First Bad Commit
No response