
Conversation

@Acly (Collaborator) commented Oct 3, 2025

Reallocation is needed if a single chunk grows in size, even if total allocation size stays the same or is lower.
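
To make the failure mode concrete, here is a minimal sketch in C (illustrative only, not the actual patch; the helper `needs_realloc` and its signature are made up for this example). The point is that the decision has to be made per chunk, because a single chunk can outgrow its buffer even while the sum over all chunks stays flat or shrinks:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

// Hypothetical illustration: with multiple chunks, reallocation must be
// decided per chunk. Comparing totals alone misses the case where one
// chunk grows while another shrinks by the same amount.
static bool needs_realloc(const size_t * allocated, const size_t * required, int n_chunks) {
    for (int i = 0; i < n_chunks; i++) {
        if (required[i] > allocated[i]) {
            return true;  // this chunk no longer fits its existing buffer
        }
    }
    return false;  // every chunk still fits; the totals never mattered
}

int main(void) {
    size_t allocated[] = {64, 32};  // current chunk sizes
    size_t required[]  = {32, 64};  // same total (96), but chunk 1 grew
    printf("realloc needed: %d\n", needs_realloc(allocated, required, 2));  // prints 1
    return 0;
}
```

Going from chunk sizes {64, 32} to {32, 64} keeps the total at 96, yet the second chunk's buffer is now too small, so a total-size comparison would wrongly skip reallocation.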

I was able to reproduce #16383 and confirm this was the cause of the issue. I had to use a remote machine with more VRAM, though, and it took a few tries. This is a cleaned-up version of the fix that I haven't re-verified yet; I can probably do that later today.

Also added a simple test which triggered the same assertion prior to the fix.
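
For context, a test along these lines might look as follows. This is a hedged sketch against the public ggml-alloc API (ggml_gallocr_reserve / ggml_gallocr_alloc_graph from ggml-alloc.h) with a CPU backend, not the test actually added in this PR; in particular, the real test presumably also constrains the maximum buffer size so the allocator actually splits its memory into multiple chunks:

```c
#include "ggml.h"
#include "ggml-alloc.h"
#include "ggml-backend.h"

// Build a trivial graph whose tensor sizes are controlled by n, so a
// second, larger graph can force one allocator chunk to grow.
static struct ggml_cgraph * build_graph(struct ggml_context * ctx, int64_t n) {
    struct ggml_tensor * a = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, n);
    struct ggml_tensor * b = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, n);
    struct ggml_cgraph * gf = ggml_new_graph(ctx);
    ggml_build_forward_expand(gf, ggml_add(ctx, a, b));
    return gf;
}

int main(void) {
    struct ggml_init_params params = {
        /*.mem_size   =*/ ggml_tensor_overhead() * 64 + ggml_graph_overhead() * 2,
        /*.mem_buffer =*/ NULL,
        /*.no_alloc   =*/ true,  // tensor data is placed by the graph allocator
    };
    struct ggml_context * ctx = ggml_init(params);

    ggml_gallocr_t galloc = ggml_gallocr_new(ggml_backend_cpu_buffer_type());

    // Reserve buffers for a small graph first ...
    ggml_gallocr_reserve(galloc, build_graph(ctx, 1024));
    // ... then allocate a graph with a larger tensor; before the fix this
    // pattern could leave a chunk too small and trip the assertion.
    ggml_gallocr_alloc_graph(galloc, build_graph(ctx, 4096));

    ggml_gallocr_free(galloc);
    ggml_free(ctx);
    return 0;
}
```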

@Acly requested review from ggerganov and slaren as code owners October 3, 2025 01:34
@github-actions bot added the testing (Everything test related) and ggml (changes relating to the ggml tensor library for machine learning) labels Oct 3, 2025
@Beinsezii (Contributor) commented:

Fixes #16298 as well, it seems.

@slaren merged commit 638d330 into ggml-org:master on Oct 3, 2025 (66 of 68 checks passed)
gabe-l-hart added a commit to gabe-l-hart/llama.cpp that referenced this pull request Oct 3, 2025
* origin/master: (124 commits)
metal : fix loop bound in ggml_mem_ranges (ggml-org#16412)
llama : fix shapes for bert/mpt q/k norm (ggml-org#16409)
ggml : fix graph reallocation with multiple chunks (ggml-org#16396)
Fix missing messages on sibling navigation (ggml-org#16408)
vulkan: Replace uses of maxMemoryAllocationSize and VK_WHOLE_SIZE (ggml-org#16354)
vulkan: Fix FA coopmat1 invalid array indexing (ggml-org#16365)
ci : change macos-13 to macos-15-intel (ggml-org#16401)
Capture model name only after first token (streaming) or completed request (ggml-org#16405)
vulkan: in flash attention, bounds check against nem1 (don't rely on GGML_KQ_MASK_PAD) (ggml-org#16316)
webui : Fix messages payload sent to chat completions (ggml-org#16402)
fix: track viewportHeight via window.innerHeight to avoid unwanted scrolling (ggml-org#16356)
test-barrier : do not use more threads than physically available (ggml-org#16389)
ggml webgpu: add support for soft_max, optimize rms_norm (ggml-org#16357)
model : Apertus model implementation (ggml-org#15852)
musa: update compile flags (ggml-org#16265)
ci : fix ubuntu-latest-cmake-rpc (disable ccache) (ggml-org#16388)
ci: update vulkan ci (ggml-org#16294)
ci : fix clean-up of old logs (ggml-org#16381)
SYCL: Update to oneAPI 2025.2 (ggml-org#16371)
HIP: add IMbackK to codeowner (ggml-org#16375)
...
Successfully merging this pull request may close these issues:

- Misc. bug: Core dumped with Vulkan using Default Physical Batch Size
- Eval bug: GGML_ASSERT failed for GLM AIR after f2a789