Classify tool call tokens #10
Merged
Conversation
- Rename `ReasoningTokenClassifier` to `SampledTokenClassifier` and accept optional reasoning + tool-call marker pairs.
- Add a `SampledToken::ToolCall` variant and a `TokenUsage` `tool_call_tokens` counter.
- Expose an `llama_rs_detect_tool_call_markers` FFI that reports the autoparser's `tools.format.section_start`/`section_end` strings.
- `completion_tokens` now sums every classified output kind, so OpenAI-style totals match the generated output even for models without reasoning markers.
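A minimal Rust sketch of the classification surface described above; the variant and field names follow the PR text, but the exact layout and `record` helper are assumptions:

```rust
// Hypothetical sketch of the token classification types; only the
// names SampledToken, ToolCall, TokenUsage, tool_call_tokens and
// completion_tokens come from the PR, the rest is assumed.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SampledToken {
    Content,
    Reasoning,
    ToolCall,
}

#[derive(Debug, Default)]
struct TokenUsage {
    content_tokens: u64,
    reasoning_tokens: u64,
    tool_call_tokens: u64,
}

impl TokenUsage {
    // Bump the counter matching the classified token kind.
    fn record(&mut self, token: SampledToken) {
        match token {
            SampledToken::Content => self.content_tokens += 1,
            SampledToken::Reasoning => self.reasoning_tokens += 1,
            SampledToken::ToolCall => self.tool_call_tokens += 1,
        }
    }

    /// OpenAI-style total: every classified output kind counts,
    /// so the total matches generated output even when a model
    /// emits no reasoning markers at all.
    fn completion_tokens(&self) -> u64 {
        self.content_tokens + self.reasoning_tokens + self.tool_call_tokens
    }
}
```

Summing all kinds in `completion_tokens` is what keeps the OpenAI-style total honest: a model that never emits reasoning markers still reports every generated token.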
… gating

The autoparser's `analyze_template` only runs tool-call analysis when `jinja_caps.supports_tool_calls` is true, which is itself computed by trying to render the template against a synthetic tool-using conversation. Templates that can't render that exact conversation (Qwen3 is one) end up reporting `supports_tool_calls=false` even though they happily emit tool calls in real use, and the autoparser then leaves `tools.format` empty.

`llama_rs_detect_tool_call_markers` now reproduces the autoparser's diff-based detection directly: render the template with and without a tool-call assistant turn (using plain ASCII synthetic names), strip reasoning markers, locate the JSON payload by brace matching, and return the surrounding text as the open/close markers. This stays grounded in the template's actual emitted output instead of falling back to model-specific heuristics.

Also adds `llama_rs_diagnose_tool_call_synthetic_renders` so callers can inspect the rendered no-tools/with-tools outputs when detection fails.
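The diff-based detection above can be sketched roughly as follows. This is a simplified sketch, not the actual implementation: it assumes the two renders share an exact common prefix, and it omits the reasoning-marker stripping step.

```rust
// Diff a baseline render against one containing a tool-call turn,
// locate the JSON payload by brace matching, and treat the text
// around the payload as the section_start/section_end markers.
fn detect_tool_call_markers(baseline: &str, with_tool_call: &str) -> Option<(String, String)> {
    // Drop the prefix the two renders have in common. Back off to a
    // char boundary in case the renders diverge mid-codepoint.
    let mut common = baseline
        .bytes()
        .zip(with_tool_call.bytes())
        .take_while(|(a, b)| a == b)
        .count();
    while !with_tool_call.is_char_boundary(common) {
        common -= 1;
    }
    let added = &with_tool_call[common..];

    // Locate the JSON payload by matching braces.
    let open = added.find('{')?;
    let mut depth = 0usize;
    let mut close = None;
    for (i, c) in added[open..].char_indices() {
        match c {
            '{' => depth += 1,
            '}' => {
                depth -= 1;
                if depth == 0 {
                    close = Some(open + i + 1);
                    break;
                }
            }
            _ => {}
        }
    }
    let close = close?;

    // Whatever surrounds the payload are the open/close markers.
    let start_marker = added[..open].trim().to_string();
    let end_marker = added[close..].trim().to_string();
    Some((start_marker, end_marker))
}
```

Because the markers come straight out of the template's own emitted text, the detector needs no per-model knowledge; a template that wraps calls in, say, `<tool_call>…</tool_call>` reports exactly those strings.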
A round-trip test confirms the configured marker pairs come back through `markers()`, and that the `undetermined()` constructor reports `None` for both, matching the runtime behaviour the diff-based detector now relies on.
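The behaviour under test can be sketched like this; only the `markers()` and `undetermined()` names come from the PR, the struct layout and constructor signature are assumptions:

```rust
// Hypothetical classifier shape: a configured marker pair, or None
// when detection could not determine markers for the template.
struct SampledTokenClassifier {
    markers: Option<(String, String)>,
}

impl SampledTokenClassifier {
    // Construct with a known open/close marker pair.
    fn new(open: &str, close: &str) -> Self {
        Self {
            markers: Some((open.to_string(), close.to_string())),
        }
    }

    // Construct for templates where no markers could be detected.
    fn undetermined() -> Self {
        Self { markers: None }
    }

    // Round-trip accessor: whatever was configured comes back out.
    fn markers(&self) -> Option<(&str, &str)> {
        self.markers.as_ref().map(|(o, c)| (o.as_str(), c.as_str()))
    }
}
```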
Merge the `ToolCall` and `Undeterminable` arms into one branch where they share a no-op body, document the new `diagnose_tool_call_synthetic_renders` helper's errors section, and backtick "OpenAI" in the `TokenUsage::completion_tokens` docstring.
New `wrapper_chat_parse.{h,cpp}` wrap llama.cpp's `common_chat_parse` so Paddler can recover structured tool-call data without ever deserialising model-output JSON in Rust. The handle owns the parsed `common_chat_msg`; accessor functions return owned strings (a count plus indexed getters for the `tool_calls` list, and `content`/`reasoning_content` getters), and a free function tears down the handle.
`ParsedChatMessage`/`ParsedToolCall` value objects (Rust side) are pure data and carry their own unit tests. `Model::parse_chat_message` wraps the FFI behind a typed `Result`, with `ParseChatMessageError` variants per failure mode (`FfiError`, `ParseException`, `StringUtf8Error`, `ToolsSerialization`, `NoChatTemplate`).
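A sketch of that error surface; only the variant names come from the PR, each variant's payload is an assumption:

```rust
use std::str::Utf8Error;

// Hypothetical error type mirroring the per-failure-mode variants
// listed above; the payload carried by each variant is assumed.
#[derive(Debug)]
enum ParseChatMessageError {
    FfiError(String),
    ParseException(String),
    StringUtf8Error(Utf8Error),
    ToolsSerialization(String),
    NoChatTemplate,
}

// Because each failure mode is its own variant, callers can branch
// on it with an ordinary match instead of parsing error strings.
fn describe(err: &ParseChatMessageError) -> &'static str {
    match err {
        ParseChatMessageError::FfiError(_) => "FFI call failed",
        ParseChatMessageError::ParseException(_) => "parser threw",
        ParseChatMessageError::StringUtf8Error(_) => "non-UTF-8 string from C++",
        ParseChatMessageError::ToolsSerialization(_) => "could not serialise tools",
        ParseChatMessageError::NoChatTemplate => "model has no chat template",
    }
}
```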
`TestFixture::shared` now uses `OnceLock::get_or_init` so multiple tests in a binary don't race on `LlamaBackend::init`. New integration tests exercise `parse_chat_message` on the env-driven default model (pure content, a Qwen3 tool-call payload, partial input, multiple calls, a reasoning section, empty input). The classifier marker-detection test that used to live in `paddler_tests` now lives in `bindings-tests` so the bindings carry their own quality bar.
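The fixture pattern can be sketched as below; `Backend` is a stand-in for the real `LlamaBackend`, whose initialisation must happen at most once per process:

```rust
use std::sync::OnceLock;

// Stand-in for LlamaBackend: construction is expensive and
// non-reentrant, so it must not run twice even when many #[test]
// functions start concurrently in one binary.
struct Backend {
    initialised: bool,
}

impl Backend {
    fn init() -> Backend {
        Backend { initialised: true }
    }
}

// get_or_init guarantees exactly one initialisation: the first
// caller runs Backend::init, every later caller (including ones
// racing with the first) just receives the shared reference.
fn shared() -> &'static Backend {
    static SHARED: OnceLock<Backend> = OnceLock::new();
    SHARED.get_or_init(Backend::init)
}
```

Unlike a `static mut` guarded by hand, `OnceLock` makes the "exactly once" guarantee hold under concurrency with no unsafe code.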
…tring detector; require compiled gpu backend in test fixture
…del classifier tests
… for Gemma 4, Mistral 3, Qwen XML
…Rust, tighten lint attributes
…classify-tool-call-tokens
…ly downgraded to March 30
…sts via --no-report accumulation
…synthesis into parse_chat_message
…overage for GLM-4.7 and DeepSeek-R1-8B
…easoning tokens classify correctly
…nd JsonObject duck-type parser
… of letting GGML_ASSERT abort
…cept MTL as a Metal backend name