Description
Prerequisites
- I am running the latest code. Mention the version if possible as well.
- I carefully followed the README.md.
- I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
- I reviewed the Discussions, and have a new and useful enhancement to share.
Feature Description
TL;DR
Add a debug option to the WebUI that displays raw toolcall chunks (like reasoning blocks) and lets users inject custom Harmony-formatted tool documentation.
A simple and transparent way to inspect model behavior and help the community improve llama.cpp.
Summary
Introduce a new WebUI Settings option to display OpenAI-Compatible toolcall chunks, similar to the existing reasoning_content (thinking blocks) display.
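For reference, here is a hedged sketch of what such chunks look like on the wire, assuming the server follows the usual OpenAI-compatible streaming schema; the field values are invented for illustration and actual field presence depends on the model and server build:

```ts
// Illustrative shapes only, not taken from the WebUI source.

// A "thinking" chunk, already surfaced by the WebUI via reasoning_content:
const reasoningChunk = {
  object: "chat.completion.chunk",
  choices: [
    { index: 0, delta: { reasoning_content: "The user wants the weather, so..." }, finish_reason: null },
  ],
};

// A tool-call chunk, which this feature would surface as a read-only block:
const toolCallChunk = {
  object: "chat.completion.chunk",
  choices: [
    {
      index: 0,
      delta: {
        tool_calls: [
          {
            index: 0,
            id: "call_0",
            type: "function",
            // `arguments` streams in as partial JSON text spread across several chunks
            function: { name: "get_weather", arguments: '{"location": "Par' },
          },
        ],
      },
      finish_reason: null,
    },
  ],
};
```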
This idea was inspired by PR #13501 by @samolego, who did excellent exploratory work on tool calling in the WebUI.
Even though that PR was eventually closed, it sparked this simpler and safer approach: a read-only visualization of the Harmony toolcall field (if present) that fits cleanly into the existing WebUI logic.
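To make "fits cleanly into the existing WebUI logic" concrete, below is a minimal, hypothetical sketch (not based on the actual WebUI code) of how streamed tool_calls deltas could be merged into read-only blocks for display, mirroring how reasoning content is accumulated:

```ts
// Hypothetical helper, not part of the current WebUI: merges streamed
// OpenAI-style tool_call deltas into complete, display-only blocks.
interface ToolCallDelta {
  index: number;
  id?: string;
  type?: string;
  function?: { name?: string; arguments?: string };
}

interface ToolCallBlock {
  id: string;
  name: string;
  arguments: string; // raw JSON text, shown verbatim, never executed
}

function mergeToolCallDeltas(deltas: ToolCallDelta[]): ToolCallBlock[] {
  const blocks = new Map<number, ToolCallBlock>();
  for (const d of deltas) {
    const block = blocks.get(d.index) ?? { id: "", name: "", arguments: "" };
    if (d.id) block.id = d.id;
    if (d.function?.name) block.name = d.function.name;
    if (d.function?.arguments) block.arguments += d.function.arguments; // concatenate partial JSON
    blocks.set(d.index, block);
  }
  return Array.from(blocks.values());
}

// The blocks would only be rendered when the proposed
// "Show toolcall chunks" setting is enabled; nothing is executed.
```

Because the blocks are just concatenated text, malformed or partial arguments would be displayed as-is, which is exactly what a debugging view wants.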
Rationale
In addition to the chunk display, this feature would include a small optional text field for injecting custom tool documentation. Together, the checkbox and the input field would turn the WebUI into a lightweight debugging console, useful for verifying model compatibility or for observing backend behavior during refactoring.
Proposal
- Add a checkbox in Settings (e.g. Show toolcall chunks).
- When enabled, the WebUI displays toolcall-related chunks as structured blocks, similar to reasoning content.
- Add an optional, empty-by-default Tool prompt field to inject custom tool documentation, formatted according to the Harmony specification, directly into the JSON request. Alternatively, reusing the existing "Custom JSON parameters to send to the API. Must be valid JSON format." field may work (see the sketch after this list).
- No runtime execution, no security implications, no additional parsing complexity — purely a read-only display option.
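As a purely illustrative example of the injection path, the object below (serialized to JSON) is what a user might paste into the existing custom JSON parameters field. The assumption to verify is that the server's chat template then renders a standard `tools` array into the model's native tool documentation (e.g. Harmony for gpt-oss), and that the WebUI merges the field into the request body unchanged:

```ts
// Hypothetical payload; the tool name and schema are made up for illustration.
const customParameters = {
  tools: [
    {
      type: "function",
      function: {
        name: "get_weather", // example tool, not a real endpoint
        description: "Get the current weather for a given city.",
        parameters: {
          type: "object",
          properties: {
            location: { type: "string", description: "City name" },
          },
          required: ["location"],
        },
      },
    },
  ],
};
```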
Benefits
- Fully consistent with the existing OpenAI-Compatible API logic.
- Helps developers debug and understand model outputs in real time.
- Zero execution risk (read-only visualization).
- Educational for those learning about tool calls and chunked responses.
- No impact on inference stability or backend performance.
Motivation
The main goal is transparency: allowing users to see the exact toolcall chunks emitted by models in real time, without executing anything client-side.
It provides valuable insight for debugging, education, and the development of larger integrations built on top of llama.cpp.
This also aligns perfectly with ongoing refactoring work and non-regression testing, helping ensure consistent and predictable behavior across models and backend changes.
Possible Implementation
No response