fix(llama-cpp): terminate tensor_buft_overrides with sentinel by richiejp · Pull Request #9919 · mudler/LocalAI

richiejp · 2026-05-21T08:54:22Z

Description

Fix tensor overrides by null terminating the override array after the overrides have been added.

Notes for Reviewers

llama.cpp's model loader asserts back().pattern == nullptr on
params.tensor_buft_overrides (and on params.kv_overrides.back().key[0]
== 0) before binding them into llama_model_params. PR #8560 attempted
to satisfy llama_params_fit's placeholder requirement by pre-filling
params.tensor_buft_overrides up to llama_max_tensor_buft_overrides()
before the option-parse loop. Any subsequent push_back from
override_tensor / draft_cpu_moe / draft_n_cpu_moe / draft_override_tensor
then appended real entries after the placeholders, leaving back() with
a real pattern and tripping the assert. The draft override vector
likewise had no terminator at all.

Mirror upstream common/arg.cpp:645-658 instead: real entries are
pushed during option parsing, and after parsing we pad the main vector
up to ntbo (placeholders land at the end, so back() is always nullptr)
and append a single {nullptr, nullptr} to the draft vector when it is
non-empty. The existing kv_overrides terminator block already matches
upstream and stays.

Verified against ggml-org/llama.cpp@5cbaa5e: only tensor_buft_overrides
(main + draft) and kv_overrides are sentinel-terminated common_params
fields; everything else is size-driven std::vector.

Assisted-by: claude-code:claude-opus-4-7
Signed-off-by: Richard Palethorpe io@richiejp.com

Signed commits

Yes, I signed my commits.

llama.cpp's model loader asserts back().pattern == nullptr on params.tensor_buft_overrides (and on params.kv_overrides.back().key[0] == 0) before binding them into llama_model_params. PR mudler#8560 attempted to satisfy llama_params_fit's placeholder requirement by pre-filling params.tensor_buft_overrides up to llama_max_tensor_buft_overrides() *before* the option-parse loop. Any subsequent push_back from override_tensor / draft_cpu_moe / draft_n_cpu_moe / draft_override_tensor then appended real entries after the placeholders, leaving back() with a real pattern and tripping the assert. The draft override vector likewise had no terminator at all. Mirror upstream common/arg.cpp:645-658 instead: real entries are pushed during option parsing, and after parsing we pad the main vector up to ntbo (placeholders land at the end, so back() is always nullptr) and append a single {nullptr, nullptr} to the draft vector when it is non-empty. The existing kv_overrides terminator block already matches upstream and stays. Verified against ggml-org/llama.cpp@5cbaa5e: only tensor_buft_overrides (main + draft) and kv_overrides are sentinel-terminated common_params fields; everything else is size-driven std::vector. Assisted-by: claude-code:claude-opus-4-7 Signed-off-by: Richard Palethorpe <io@richiejp.com>

mudler merged commit c68818a into mudler:master May 21, 2026
62 of 63 checks passed

localai-bot added the bug Something isn't working label May 24, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix(llama-cpp): terminate tensor_buft_overrides with sentinel#9919

fix(llama-cpp): terminate tensor_buft_overrides with sentinel#9919
mudler merged 1 commit into
mudler:masterfrom
richiejp:fix/override-tensor2

richiejp commented May 21, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uh oh!

Conversation

richiejp commented May 21, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants