
Conversation

@Acly (Collaborator) commented Oct 3, 2025

Reallocation is needed if a single chunk grows in size, even if total allocation size stays the same or is lower.
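
To make the failure mode concrete, here is a minimal sketch in C (illustrative only, not the actual patch; the helper `needs_realloc` and its signature are made up for this example). The point is that the decision has to be made per chunk, because a single chunk can outgrow its buffer even while the sum over all chunks stays flat or shrinks:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

// Hypothetical illustration: with multiple chunks, reallocation must be
// decided per chunk. Comparing totals alone misses the case where one
// chunk grows while another shrinks by the same amount.
static bool needs_realloc(const size_t * allocated, const size_t * required, int n_chunks) {
    for (int i = 0; i < n_chunks; i++) {
        if (required[i] > allocated[i]) {
            return true;  // this chunk no longer fits its existing buffer
        }
    }
    return false;  // every chunk still fits; the totals never mattered
}

int main(void) {
    size_t allocated[] = {64, 32};  // current chunk sizes
    size_t required[]  = {32, 64};  // same total (96), but chunk 1 grew
    printf("realloc needed: %d\n", needs_realloc(allocated, required, 2));  // prints 1
    return 0;
}
```

Going from chunk sizes {64, 32} to {32, 64} keeps the total at 96, yet the second chunk's buffer is now too small, so a total-size comparison would wrongly skip reallocation.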

I was able to reproduce #16383 and confirm this was the cause of the issue. I had to use a remote machine with more VRAM, though, and it took a few tries. This is a cleaned-up version of the fix that I haven't re-verified yet; I can probably do that later today.

Also added a simple test which triggered the same assertion prior to the fix.
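
For context, a test along these lines might look as follows. This is a hedged sketch against the public ggml-alloc API (ggml_gallocr_reserve / ggml_gallocr_alloc_graph from ggml-alloc.h) with a CPU backend, not the test actually added in this PR; in particular, the real test presumably also constrains the maximum buffer size so the allocator actually splits its memory into multiple chunks:

```c
#include "ggml.h"
#include "ggml-alloc.h"
#include "ggml-backend.h"

// Build a trivial graph whose tensor sizes are controlled by n, so a
// second, larger graph can force one allocator chunk to grow.
static struct ggml_cgraph * build_graph(struct ggml_context * ctx, int64_t n) {
    struct ggml_tensor * a = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, n);
    struct ggml_tensor * b = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, n);
    struct ggml_cgraph * gf = ggml_new_graph(ctx);
    ggml_build_forward_expand(gf, ggml_add(ctx, a, b));
    return gf;
}

int main(void) {
    struct ggml_init_params params = {
        /*.mem_size   =*/ ggml_tensor_overhead() * 64 + ggml_graph_overhead() * 2,
        /*.mem_buffer =*/ NULL,
        /*.no_alloc   =*/ true,  // tensor data is placed by the graph allocator
    };
    struct ggml_context * ctx = ggml_init(params);

    ggml_gallocr_t galloc = ggml_gallocr_new(ggml_backend_cpu_buffer_type());

    // Reserve buffers for a small graph first ...
    ggml_gallocr_reserve(galloc, build_graph(ctx, 1024));
    // ... then allocate a graph with a larger tensor; before the fix this
    // pattern could leave a chunk too small and trip the assertion.
    ggml_gallocr_alloc_graph(galloc, build_graph(ctx, 4096));

    ggml_gallocr_free(galloc);
    ggml_free(ctx);
    return 0;
}
```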

@Acly requested review from ggerganov and slaren as code owners October 3, 2025 01:34
@github-actions bot added the testing (Everything test related) and ggml (changes relating to the ggml tensor library for machine learning) labels Oct 3, 2025
@Beinsezii (Contributor) commented:

Fixes #16298 as well, it seems.

@slaren merged commit 638d330 into ggml-org:master on Oct 3, 2025 (66 of 68 checks passed)
gabe-l-hart added a commit to gabe-l-hart/llama.cpp that referenced this pull request Oct 3, 2025
* origin/master: (124 commits)
metal : fix loop bound in ggml_mem_ranges (ggml-org#16412)
llama : fix shapes for bert/mpt q/k norm (ggml-org#16409)
ggml : fix graph reallocation with multiple chunks (ggml-org#16396)
Fix missing messages on sibling navigation (ggml-org#16408)
vulkan: Replace uses of maxMemoryAllocationSize and VK_WHOLE_SIZE (ggml-org#16354)
vulkan: Fix FA coopmat1 invalid array indexing (ggml-org#16365)
ci : change macos-13 to macos-15-intel (ggml-org#16401)
Capture model name only after first token (streaming) or completed request (ggml-org#16405)
vulkan: in flash attention, bounds check against nem1 (don't rely on GGML_KQ_MASK_PAD) (ggml-org#16316)
webui : Fix messages payload sent to chat completions (ggml-org#16402)
fix: track viewportHeight via window.innerHeight to avoid unwanted scrolling (ggml-org#16356)
test-barrier : do not use more threads than physically available (ggml-org#16389)
ggml webgpu: add support for soft_max, optimize rms_norm (ggml-org#16357)
model : Apertus model implementation (ggml-org#15852)
musa: update compile flags (ggml-org#16265)
ci : fix ubuntu-latest-cmake-rpc (disable ccache) (ggml-org#16388)
ci: update vulkan ci (ggml-org#16294)
ci : fix clean-up of old logs (ggml-org#16381)
SYCL: Update to oneAPI 2025.2 (ggml-org#16371)
HIP: add IMbackK to codeowner (ggml-org#16375)
...
Successfully merging this pull request may close these issues:

- Misc. bug: Core dumped with Vulkan using Default Physical Batch Size
- Eval bug: GGML_ASSERT failed for GLM AIR after f2a789