Description
Name and Version
$ ./build/bin/llama-server --version
version: 6684 (606a73f)
built with cc (Debian 14.2.0-19) 14.2.0 for x86_64-linux-gnu
Operating systems
Linux
Which llama.cpp modules do you know to be affected?
llama-server
Command line
./build/bin/llama-server -m /mnt/models/DeepSeek-R1-0528-UD-IQ1_S-00001-of-00004.gguf --ctx-size 20480 --port 8080 --jinja --no-warmup --no-repack
./build/bin/llama-server -m /mnt/models/granite-4.0-h-small-Q6_K.gguf --ctx-size 16384 --port 8080 --jinja
./build/bin/llama-server -m /mnt/models/Qwen3-30B-A3B-Thinking-2507-Q4_K_M.gguf --ctx-size 16384 --port 8080 --jinja
Problem description & steps to reproduce
I use the web UI to attach a ~30KB .txt file containing my notes, along with instructions to create an HTML page: a table presenting the results from my notes with some specific requested columns, plus a summary and analysis.
For thinking models, the web UI correctly shows the reasoning tokens, but it frequently drops parts of the final HTML output, or sometimes the entire thing. I can see that tokens are being produced, because the counters at the bottom of the page keep increasing, but nothing shows up.
For non-thinking models, some parts of the output might show up (like a single <p>...</p>) or nothing at all.
I also tested sending my query directly to the /v1/chat/completions API endpoint, and that endpoint produces a complete and correct response for the same model and server instance, with no tokens missing.
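For reference, the direct API test was a request of roughly this shape (exact prompt elided; the real payload contained the attached notes plus my instructions):

curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "<notes and instructions here>"}], "stream": true}'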
I tested multiple models (DeepSeek, Qwen, Granite) and all of them were affected, though some occasionally managed to produce parts of the output in the web UI.
Since this happens with multiple models in the web UI while the API endpoint does not have the problem, I assume the issue is in how the web UI handles HTML in the model output.
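For illustration (a hypothetical reply, not captured verbatim), output of roughly this shape loses most or all of its body in the web UI while the API returns it intact, as if the raw HTML were being interpreted or sanitized instead of displayed:

<think>...reasoning tokens, which render correctly...</think>
<html>
  <body>
    <table>
      <tr><th>Note</th><th>Result</th></tr>
    </table>
    <p>Summary: ...</p>
  </body>
</html>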
First Bad Commit
No response