Contributor

@o7si o7si commented Nov 12, 2025

In the constructor of llm_tokenizer_ugm, xcda_array is an array with xcda_array_size elements:

xcda_array = (const uint32_t *) &precompiled_charsmap[charsmap_offset];
xcda_array_size = xcda_blob_size / sizeof(uint32_t);

The valid index range is [0, xcda_array_size).

The xcda_array_view is then constructed from these same values:

struct xcda_array_view xcda_view(tokenizer.xcda_array, tokenizer.xcda_array_size);

The accessible index range remains [0, xcda_array_size).

Therefore, the bounds check in xcda_array_view::get_node should use >= instead of >. With >, an index equal to xcda_array_size passes the check and reads one element past the end of the array.
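To illustrate the off-by-one, here is a minimal sketch of such a bounds check. The names array_view_sketch and rejects are hypothetical stand-ins for illustration only; this is not the actual llama.cpp code, and the real xcda_array_view may differ in its members and error message.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>

// Hypothetical, simplified stand-in for xcda_array_view (an assumption,
// not copied from llama.cpp).
struct array_view_sketch {
    const uint32_t * data;
    size_t           size;

    uint32_t get_node(size_t index) const {
        // Valid indices are [0, size), so index == size must be rejected.
        // A check written as `index > size` would let index == size through
        // and read one element past the end.
        if (index >= size) {
            throw std::runtime_error("index out of array bounds");
        }
        return data[index];
    }
};

// Helper for the example: returns true if get_node rejects the index.
bool rejects(const array_view_sketch & view, size_t index) {
    try {
        view.get_node(index);
        return false;
    } catch (const std::runtime_error &) {
        return true;
    }
}
```

With a three-element array, indices 0 through 2 are accepted and index 3 (equal to the size) is rejected, which is the case the original > comparison would have missed.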

Collaborator

CISC commented Nov 12, 2025

Nice catch, thank you, merging when CIs are done.

@CISC CISC merged commit ffb6f3d into ggml-org:master Nov 12, 2025
72 checks passed