Skip to content

mtmd, server: add "placeholder bitmap" for counting tokens , add */input_tokens API#23913

Open
ngxson wants to merge 13 commits into
masterfrom
xsn/mtmd_placeholder_chunks
Open

mtmd, server: add "placeholder bitmap" for counting tokens , add */input_tokens API#23913
ngxson wants to merge 13 commits into
masterfrom
xsn/mtmd_placeholder_chunks

Conversation

@ngxson
Copy link
Copy Markdown
Contributor

@ngxson ngxson commented May 30, 2026

Overview

Tokenizing / preprocessing multimodal input is much more CPU-intensive than tokenizing, because it runs on single thread. This can be wasteful if the user just want to count the number of tokens occupied by an image/audio chunk, without actually using the underlay data.

This PR allow create a "placeholder" bitmap that only contains the dimension, no data buffer will be allocated. Preprocessing ops (i.e. image manipulation) will skip processing it

New server APIs are also added to demonstrate this (support counting both tools input tokens and multimodal input tokens)

  • /v1/chat/completions/input_tokens
  • /v1/responses/input_tokens

In next PRs:

  • Skip preprocess audio
  • Move std/mean f32 to cgraph
  • Update for places where process_mtmd_prompt is being used for counting tokens

Requirements

@github-actions github-actions Bot added python python script changes server labels May 30, 2026
@ngxson ngxson changed the title mtmd: add "placeholder bitmap" for counting tokens w/o preprocessing mtmd, server: add "placeholder bitmap" for counting tokens , add */input_tokens API May 30, 2026
@ngxson ngxson marked this pull request as ready for review May 30, 2026 16:27
@ngxson ngxson requested review from a team as code owners May 30, 2026 16:27
@ngxson
Copy link
Copy Markdown
Contributor Author

ngxson commented May 30, 2026

not quite sure why the server linux CI fails (same on master branch), but I ran the test locally on my mac and it passes 100%

@aldehir
Copy link
Copy Markdown
Contributor

aldehir commented May 31, 2026

ERROR unit/test_basic.py::test_server_start_simple - RuntimeError: Server process died with return code -4

Looks like the server process is getting killed with SIGILL (Illegal operation), -N maps to signal N according to https://docs.python.org/3/library/subprocess.html#subprocess.Popen.returncode

       Signal        x86/ARM     Alpha/   MIPS   PARISC   Notes
                   most others   SPARC
       ─────────────────────────────────────────────────────────────────
       SIGHUP           1           1       1       1
       SIGINT           2           2       2       2
       SIGQUIT          3           3       3       3
       SIGILL           4           4       4       4

@CISC
Copy link
Copy Markdown
Member

CISC commented May 31, 2026

Looks like the server process is getting killed with SIGILL (Illegal operation), -N maps to signal N according to https://docs.python.org/3/library/subprocess.html#subprocess.Popen.returncode

Yep, this is a ccache issue, I've deleted the caches and all is fine now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

examples python python script changes server

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants