Feature Request: Add a debug option to display OpenAI-Compatible toolcall chunks in the WebUI #16597

@ServeurpersoCom

Description

Prerequisites

  • I am running the latest code. Mention the version if possible as well.
  • I carefully followed the README.md.
  • I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • I reviewed the Discussions, and have a new and useful enhancement to share.

Feature Description

TL;DR

Add a debug option to the WebUI that displays raw toolcall chunks (like reasoning blocks) and lets users inject custom Harmony-formatted tool documentation: a simple, transparent way to inspect model behavior and help the community improve llama.cpp.

Summary

Introduce a new WebUI Settings option to display OpenAI-Compatible toolcall chunks, similar to the existing reasoning_content (thinking blocks) display.
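
For context, in the OpenAI-Compatible streaming API a tool call arrives as a sequence of tool_calls deltas inside chat.completion.chunk events, roughly like this (abbreviated SSE payloads; the id and argument values are illustrative):

```
data: {"choices":[{"index":0,"delta":{"tool_calls":[{"index":0,"id":"call_abc123","type":"function","function":{"name":"get_weather","arguments":""}}]},"finish_reason":null}]}

data: {"choices":[{"index":0,"delta":{"tool_calls":[{"index":0,"function":{"arguments":"{\"location\":\"Paris\"}"}}]},"finish_reason":null}]}

data: {"choices":[{"index":0,"delta":{},"finish_reason":"tool_calls"}]}
```

These are exactly the chunks the proposed option would surface verbatim in the UI.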

This idea was inspired by PR #13501 by @samolego, who did excellent exploratory work on tool calling in the WebUI.
Even though that PR was eventually closed, it sparked this simpler and safer approach: a read-only visualization of the Harmony toolcall field (if present) that fits cleanly into the existing WebUI logic.
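
On the display side, a read-only view could reuse the same accumulation pattern the WebUI already applies to reasoning_content: gather the streamed deltas per tool-call index and render the joined result as a collapsible block. A minimal TypeScript sketch (the names here are hypothetical, not existing WebUI code):

```ts
// Hypothetical accumulator: merges streamed tool_calls deltas for display only.
interface ToolCallView {
  id?: string;
  name?: string;
  arguments: string; // raw JSON text, rendered verbatim; never parsed or executed
}

function accumulateToolCallDelta(
  views: ToolCallView[],
  delta: { index: number; id?: string; function?: { name?: string; arguments?: string } }
): void {
  const view = (views[delta.index] ??= { arguments: "" });
  if (delta.id) view.id = delta.id;
  if (delta.function?.name) view.name = delta.function.name;
  if (delta.function?.arguments) view.arguments += delta.function.arguments;
}
```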

Rationale

This feature would also include a small optional text field to inject custom tool documentation. Together, the display checkbox and this input field would turn the WebUI into a lightweight debugging console, useful for verifying model compatibility or observing backend behavior during refactoring.

Proposal

  • Add a checkbox in Settings (e.g. Show toolcall chunks).
  • When enabled, the WebUI displays toolcall-related chunks as structured blocks, similar to reasoning content.
  • Add an optional, initially empty Tool prompt field to inject custom tool documentation, formatted according to the Harmony specification, directly into the JSON request. Alternatively, the existing "Custom JSON parameters to send to the API. Must be valid JSON format." field may already work for this (see the sketch after this list).
  • No runtime execution, no security implications, no additional parsing complexity — purely a read-only display option.
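
As a concrete example for the tool-prompt point above, a standard OpenAI-style tools array passed through the custom-parameters field might look like the following (illustrative get_weather definition; whether the existing field merges it into the request unchanged would need to be verified):

```json
{
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get the current weather for a given city.",
        "parameters": {
          "type": "object",
          "properties": { "location": { "type": "string" } },
          "required": ["location"]
        }
      }
    }
  ]
}
```

The server would presumably remain responsible for rendering this into the model's native tool section (e.g. Harmony for gpt-oss), keeping the WebUI itself format-agnostic.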

Benefits

  • Fully consistent with the existing OpenAI-Compatible API logic.
  • Helps developers debug and understand model outputs in real time.
  • Zero execution risk (read-only visualization).
  • Educational for those learning about tool calls and chunked responses.
  • No impact on inference stability or backend performance.

@ngxson @allozaur @ggerganov

Motivation

The main goal is transparency: allowing users to see the exact toolcall chunks emitted by models in real time, without executing anything client-side.
It provides valuable insight for debugging, education, and development of larger integrations built on top of llama.cpp.
This also aligns perfectly with ongoing refactoring work and non-regression testing, helping ensure consistent and predictable behavior across models and backend changes.

Possible Implementation

No response
