vocab : reduce debug logs about non-EOG control tokens #18541

Merged
ggerganov merged 2 commits into master from
gg/context-reduce-logs
Jan 2, 2026

@ggerganov
Member

  • Reduce logs for unused control tokens not being marked as EOG
  • Append control attribute instead of assigning
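
To illustrate the second bullet: a minimal sketch of the difference between assigning a control attribute and appending it, assuming token attributes are stored as bit flags. The enum and both helper functions are hypothetical names made up for this example; they are not taken from the PR diff.

```cpp
#include <cstdint>

// Hypothetical bit-flag attributes, for illustration only.
enum token_attr : uint32_t {
    TOKEN_ATTR_NONE    = 0,
    TOKEN_ATTR_NORMAL  = 1u << 0,
    TOKEN_ATTR_CONTROL = 1u << 1,
    TOKEN_ATTR_RSTRIP  = 1u << 2,
};

// Assigning: the result is CONTROL only, any flag set earlier is lost.
constexpr uint32_t assign_control(uint32_t /*attr*/) {
    return TOKEN_ATTR_CONTROL;
}

// Appending: bitwise OR keeps the existing flags and adds CONTROL.
constexpr uint32_t append_control(uint32_t attr) {
    return attr | TOKEN_ATTR_CONTROL;
}
```

With these definitions, a token that already carries the hypothetical RSTRIP flag keeps it after `append_control`, while `assign_control` would drop it.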

Some models have many unused tokens, which leads to a lot of debug logs. For example:

./bin/llama-server -hf ggml-org/embeddinggemma-300M-GGUF --embeddings -lv 5

...

0.00.960.046 D init_tokenizer: initializing tokenizer for type 1
0.00.966.704 D load: control token: 262141 '<unused6239>' is not marked as EOG
0.00.966.706 D load: control token: 262140 '<unused6238>' is not marked as EOG
0.00.966.706 D load: control token: 262137 '<unused6235>' is not marked as EOG
0.00.966.707 D load: control token: 262136 '<unused6234>' is not marked as EOG
0.00.966.707 D load: control token: 262135 '<unused6233>' is not marked as EOG
0.00.966.707 D load: control token: 262134 '<unused6232>' is not marked as EOG
...
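
The description above does not spell out how the log volume is reduced. One possible approach, sketched here purely as an assumption, is to skip the per-token line for placeholder tokens such as `<unused6239>` and report them once as a count. Both functions and the message wording are hypothetical and are not the code merged in this PR.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: true for names of the form "<unusedN>", N being digits.
bool looks_unused(const std::string & text) {
    const std::string prefix = "<unused";
    if (text.size() <= prefix.size() + 1) {
        return false; // too short to hold at least one digit and '>'
    }
    if (text.compare(0, prefix.size(), prefix) != 0 || text.back() != '>') {
        return false;
    }
    for (size_t i = prefix.size(); i + 1 < text.size(); ++i) {
        if (text[i] < '0' || text[i] > '9') {
            return false;
        }
    }
    return true;
}

// Builds the debug lines for control tokens that are not marked as EOG.
// Placeholder tokens are folded into a single summary line at the end.
std::vector<std::string> non_eog_log_lines(const std::vector<std::string> & control_tokens) {
    std::vector<std::string> lines;
    size_t n_unused = 0;
    for (const auto & text : control_tokens) {
        if (looks_unused(text)) {
            ++n_unused;
            continue;
        }
        lines.push_back("control token '" + text + "' is not marked as EOG");
    }
    if (n_unused > 0) {
        lines.push_back(std::to_string(n_unused) + " unused control tokens are not marked as EOG");
    }
    return lines;
}
```

Under this sketch, the thousands of `<unusedN>` lines in the log above would collapse into one summary line, while any other non-EOG control token would still be reported individually.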

@ggerganov ggerganov requested a review from CISC as a code owner January 2, 2026 08:26
@CISC
Collaborator

CISC commented Jan 2, 2026

Just an aside...

In #18500 it was discovered that, for example, jina-v2-en has many [unusedX] tokens that are marked as NORMAL but were treated as (though not set to) CONTROL because of the [] in their names.

AutoTokenizer does not treat them as control tokens either, so the new behavior after that PR is the correct one.

@ggerganov ggerganov merged commit d84a6a9 into master Jan 2, 2026
69 of 71 checks passed
@ggerganov ggerganov deleted the gg/context-reduce-logs branch January 2, 2026 14:17
blime4 referenced this pull request in blime4/llama.cpp Feb 5, 2026
* vocab : reduce debug logs about non-EOG control tokens

* cont : add comment
