Xsn/mtmd placeholder chunks#106
Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughRefactors CLIP/MTMD image and bitmap types to encapsulated accessors with placeholder support; updates image preprocessing, CLIP usage across vision graphs, media ingestion, removes legacy CLIP helpers, and adds server-side token-counting endpoints with tests and documentation. ChangesImage Container Encapsulation and CLIP API Update
Media Bitmap and Tokenization Refactoring
Vision Graph and Model Updates
Server Token Counting Endpoints
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 3❌ Failed checks (2 warnings, 1 inconclusive)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tools/mtmd/clip-impl.h (1)
7-15:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winMake
tools/mtmd/clip-impl.hself-contained.
tools/mtmd/clip-impl.hthrowsstd::runtime_error(435-437, 544-546) but doesn’t include<stdexcept>.tools/mtmd/clip.cppincludesclip-impl.hbefore its own<stdexcept>, so the build currently depends on transitive include order rather than the header.Proposed fix
`#include` <array> `#include` <climits> +#include <stdexcept> `#include` <cstdarg> `#include` <cinttypes>🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tools/mtmd/clip-impl.h` around lines 7 - 15, The header tools/mtmd/clip-impl.h is not self-contained because it throws std::runtime_error in functions that raise exceptions (see uses around the throw sites), but doesn't include <stdexcept>; add `#include` <stdexcept> at the top of clip-impl.h so the declarations that use or throw std::runtime_error compile without relying on transitive includes (ensure the include sits with the other standard headers already present).
🧹 Nitpick comments (2)
tools/server/tests/unit/test_vision_api.py (1)
101-117: ⚡ Quick winVerify image content actually affects token counting.
input_tokens > 10is a weak proxy. Add a text-only baseline and assert the multimodal request counts more tokens.Proposed assertion upgrade
def test_vision_chat_completion_token_count(): global server server.start() - res = server.make_request("POST", "/chat/completions/input_tokens", data={ + res = server.make_request("POST", "/chat/completions/input_tokens", data={ "temperature": 0.0, "top_k": 1, "messages": [ {"role": "user", "content": [ {"type": "text", "text": "What is this:"}, {"type": "image_url", "image_url": { "url": get_img_url("IMG_URL_0"), }}, ]}, ], }) assert res.status_code == 200 + assert res.body["object"] == "response.input_tokens" assert res.body["input_tokens"] > 10 + + text_only = server.make_request("POST", "/chat/completions/input_tokens", data={ + "messages": [{"role": "user", "content": "What is this:"}], + }) + assert text_only.status_code == 200 + assert res.body["input_tokens"] > text_only.body["input_tokens"]🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tools/server/tests/unit/test_vision_api.py` around lines 101 - 117, The test test_vision_chat_completion_token_count currently only asserts res.body["input_tokens"] > 10; add a text-only baseline request using server.make_request to the same "/chat/completions/input_tokens" endpoint with an equivalent messages payload that contains only the text part (e.g., {"role":"user","content":[{"type":"text","text":"What is this:"}]}) and capture its input_tokens, then assert the multimodal response's input_tokens is greater than the text-only baseline (res.body["input_tokens"] > baseline["input_tokens"]). Ensure you reuse the same request parameters (temperature, top_k) and message ordering so the only difference is the image_url content.tools/server/tests/unit/test_chat_completion.py (1)
578-592: ⚡ Quick winStrengthen token-count contract assertions.
This test currently only checks status and a loose lower bound. It should also validate the response discriminator and deterministic count across identical requests.
Proposed test hardening
def test_chat_completions_token_count(): global server server.start() - # make sure cache can be reused across multiple choices and multiple requests - # ref: https://github.com/ggml-org/llama.cpp/pull/18663 - for _ in range(2): + counts = [] + for _ in range(2): res = server.make_request("POST", "/chat/completions/input_tokens", data={ "messages": [ {"role": "system", "content": "Book"}, {"role": "user", "content": "What is the best book"}, ], }) assert res.status_code == 200 - assert res.body["input_tokens"] > 5 + assert res.body["object"] == "response.input_tokens" + assert isinstance(res.body["input_tokens"], int) + assert res.body["input_tokens"] > 5 + counts.append(res.body["input_tokens"]) + assert counts[0] == counts[1]🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tools/server/tests/unit/test_chat_completion.py` around lines 578 - 592, The test test_chat_completions_token_count only asserts status and a loose lower bound; update it to also assert the response discriminator and deterministic token counts: after calling server.make_request("POST", "/chat/completions/input_tokens", ...) verify res.body contains a discriminator (e.g., res.body["discriminator"] == "chat.completion" or the expected discriminator key/value used by the API) and capture res.body["input_tokens"] on the first request then assert on the second identical request that res.body["input_tokens"] equals the first captured value (in addition to the existing > 5 check) to ensure deterministic counts across identical requests. Ensure asserts reference the test function name test_chat_completions_token_count and the server.make_request response fields res.body["input_tokens"] and res.body["discriminator"] (or the API's exact discriminator key).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@tools/mtmd/clip.cpp`:
- Around line 3432-3439: The GGML_ASSERT is comparing bytes to element count:
change the assertion to check element counts (use GGML_ASSERT(n_step * n_mel ==
buf.size());) because mel_inp->get_ro_buf() returns a std::vector<float> where
buf.size() is number of floats; keep the memcpy as-is (it should still copy
n_step * n_mel * sizeof(float) bytes into inp_raw).
In `@tools/server/server-context.cpp`:
- Around line 4838-4843: The token-counting path for /input_tokens currently
calls process_mtmd_prompt(mctx, prompt.get<std::string>(), files) which runs
full MTMD preprocessing; change this to the placeholder-mode path so counting
uses cheap placeholder chunks (e.g., call a placeholder variant such as
process_mtmd_prompt_placeholder(mctx, prompt.get<std::string>(), files) or add a
boolean flag to process_mtmd_prompt like process_mtmd_prompt(mctx,
prompt.get<std::string>(), files, /*placeholder=*/true)) so that when mctx is
non-null in the /input_tokens flow you compute n_tokens from the
placeholder-mode result instead of performing full preprocessing.
In `@tools/server/server.cpp`:
- Around line 192-193: Duplicate POST route registration for
"/responses/input_tokens" exists: locate the ctx_http.post calls that reference
routes.post_responses_tok_oai (the two entries registering
"/responses/input_tokens") and remove the redundant registration so only a
single ctx_http.post("/responses/input_tokens",
ex_wrapper(routes.post_responses_tok_oai)) remains; also scan the nearby block
(the other occurrence around the 207-211 region) to ensure no other duplicate
registrations remain and consolidate them to a single registration to avoid
route ambiguity.
---
Outside diff comments:
In `@tools/mtmd/clip-impl.h`:
- Around line 7-15: The header tools/mtmd/clip-impl.h is not self-contained
because it throws std::runtime_error in functions that raise exceptions (see
uses around the throw sites), but doesn't include <stdexcept>; add `#include`
<stdexcept> at the top of clip-impl.h so the declarations that use or throw
std::runtime_error compile without relying on transitive includes (ensure the
include sits with the other standard headers already present).
---
Nitpick comments:
In `@tools/server/tests/unit/test_chat_completion.py`:
- Around line 578-592: The test test_chat_completions_token_count only asserts
status and a loose lower bound; update it to also assert the response
discriminator and deterministic token counts: after calling
server.make_request("POST", "/chat/completions/input_tokens", ...) verify
res.body contains a discriminator (e.g., res.body["discriminator"] ==
"chat.completion" or the expected discriminator key/value used by the API) and
capture res.body["input_tokens"] on the first request then assert on the second
identical request that res.body["input_tokens"] equals the first captured value
(in addition to the existing > 5 check) to ensure deterministic counts across
identical requests. Ensure asserts reference the test function name
test_chat_completions_token_count and the server.make_request response fields
res.body["input_tokens"] and res.body["discriminator"] (or the API's exact
discriminator key).
In `@tools/server/tests/unit/test_vision_api.py`:
- Around line 101-117: The test test_vision_chat_completion_token_count
currently only asserts res.body["input_tokens"] > 10; add a text-only baseline
request using server.make_request to the same "/chat/completions/input_tokens"
endpoint with an equivalent messages payload that contains only the text part
(e.g., {"role":"user","content":[{"type":"text","text":"What is this:"}]}) and
capture its input_tokens, then assert the multimodal response's input_tokens is
greater than the text-only baseline (res.body["input_tokens"] >
baseline["input_tokens"]). Ensure you reuse the same request parameters
(temperature, top_k) and message ordering so the only difference is the
image_url content.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: abc7b4b0-c5df-4e97-862f-f86950f9c5c9
📒 Files selected for processing (25)
tools/mtmd/clip-impl.htools/mtmd/clip.cpptools/mtmd/clip.htools/mtmd/models/conformer.cpptools/mtmd/models/glm4v.cpptools/mtmd/models/granite-speech.cpptools/mtmd/models/kimik25.cpptools/mtmd/models/mimovl.cpptools/mtmd/models/qwen2vl.cpptools/mtmd/models/qwen3vl.cpptools/mtmd/models/whisper-enc.cpptools/mtmd/mtmd-cli.cpptools/mtmd/mtmd-helper.cpptools/mtmd/mtmd-helper.htools/mtmd/mtmd-image.cpptools/mtmd/mtmd.cpptools/mtmd/mtmd.htools/server/README.mdtools/server/server-common.cpptools/server/server-common.htools/server/server-context.cpptools/server/server-context.htools/server/server.cpptools/server/tests/unit/test_chat_completion.pytools/server/tests/unit/test_vision_api.py
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tools/mtmd/mtmd-image.cpp (1)
1265-1270:⚠️ Potential issue | 🟠 Major | ⚡ Quick winPropagate the placeholder flag into
dst.Line 1265 hardcodes
dstas non-placeholder, but the function immediately returns on placeholder inputs. That breaks the new placeholder flow for the Step3VL preprocessing path by leavingdstshaped like a real image without populated pixels. Passsrc.is_placeholder()through when sizingdstinstead of hardcodingfalse.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tools/mtmd/mtmd-image.cpp` around lines 1265 - 1270, The dst image is always being created as non-placeholder by calling dst.set_size({target_width, target_height}, false, false) even when src.is_placeholder() is true; change the call in the function that handles Step3VL preprocessing so that the placeholder flag is propagated (pass src.is_placeholder() as the placeholder argument to dst.set_size) instead of hardcoding false, ensuring dst retains the placeholder state when src.is_placeholder() returns true.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@tools/mtmd/mtmd-image.cpp`:
- Around line 1265-1270: The dst image is always being created as
non-placeholder by calling dst.set_size({target_width, target_height}, false,
false) even when src.is_placeholder() is true; change the call in the function
that handles Step3VL preprocessing so that the placeholder flag is propagated
(pass src.is_placeholder() as the placeholder argument to dst.set_size) instead
of hardcoding false, ensuring dst retains the placeholder state when
src.is_placeholder() returns true.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: c10f5ec2-d209-4a85-886d-8d9c18d2baff
📒 Files selected for processing (4)
tools/mtmd/clip-impl.htools/mtmd/clip.cpptools/mtmd/mtmd-image.cpptools/mtmd/mtmd.cpp
🚧 Files skipped from review as they are similar to previous changes (2)
- tools/mtmd/clip-impl.h
- tools/mtmd/mtmd.cpp
For AI review
Mirror upstream ggml-org#23913
Summary by CodeRabbit
New Features
Documentation
Tests
Bug Fixes