
Misc. bug: Core dumped with Vulkan using Default Physical Batch Size. #16383

@Nabokov86

Name and Version

ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = AMD Radeon Graphics (RADV RENOIR) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 0 | matrix cores: none
version: 6665 (95ce098)
built with cc (Ubuntu 14.2.0-19ubuntu2) 14.2.0 for x86_64-linux-gnu

Operating systems

Linux (Ubuntu 25.04 plucky)

Which llama.cpp modules do you know to be affected?

llama-server

Command line

./llama-server --gpu-layers 100 -dev Vulkan0 -t 1 -tb 1 --no-mmap -c 10000 -fa off -m ./gpt/gpt-oss-20b-mxfp4.gguf

Problem description & steps to reproduce

After the recent changes, llama.cpp consistently crashes when using Vulkan with the default physical batch size (-ub 512).

I've found that reducing the physical batch size to 401 or lower avoids the crash (see the example below).
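
For example, the same command as above with the physical batch size capped at 401 (by adding -ub 401) runs without hitting the assert on my setup:

./llama-server --gpu-layers 100 -dev Vulkan0 -t 1 -tb 1 --no-mmap -c 10000 -fa off -ub 401 -m ./gpt/gpt-oss-20b-mxfp4.gguf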

First Bad Commit

No response

Relevant log output

...
slot update_slots: id  0 | task 0 | prompt processing progress, n_past = 4096, n_tokens = 2048, progress = 0.590457
/llama.cpp/ggml/src/ggml-backend.cpp:1850: GGML_ASSERT((char *)addr + ggml_backend_buffer_get_alloc_size(buffer, tensor) <= (char *)ggml_backend_buffer_get_base(buffer) + ggml_backend_buffer_get_size(buffer)) failed
[New LWP 82368]
[New LWP 82367]
[New LWP 82366]
[New LWP 82365]
[New LWP 82364]
[New LWP 82363]
[New LWP 82362]
[New LWP 82361]
[New LWP 82360]
[New LWP 82359]
[New LWP 82358]
[New LWP 82357]
[New LWP 82356]
[New LWP 82355]
[New LWP 82354]
[New LWP 82353]
[New LWP 82352]
[New LWP 82350]
[New LWP 82349]

This GDB supports auto-downloading debuginfo from the following URLs:
  <https://debuginfod.ubuntu.com>
Enable debuginfod for this session? (y or [n]) [answered N; input not from terminal]
Debuginfod has been disabled.
To make this setting permanent, add 'set debuginfod enabled off' to .gdbinit.
Function(s) ^std::(move|forward|as_const|(__)?addressof) will be skipped when stepping.
Function(s) ^std::(shared|unique)_ptr<.*>::(get|operator) will be skipped when stepping.
Function(s) ^std::(basic_string|vector|array|deque|(forward_)?list|(unordered_|flat_)?(multi)?(map|set)|span)<.*>::(c?r?(begin|end)|front|back|data|size|empty) will be skipped when stepping.
Function(s) ^std::(basic_string|vector|array|deque|span)<.*>::operator.] will be skipped when stepping.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
__syscall_cancel_arch () at ../sysdeps/unix/sysv/linux/x86_64/syscall_cancel.S:56
warning: 56	../sysdeps/unix/sysv/linux/x86_64/syscall_cancel.S: No such file or directory
#0  __syscall_cancel_arch () at ../sysdeps/unix/sysv/linux/x86_64/syscall_cancel.S:56
56	in ../sysdeps/unix/sysv/linux/x86_64/syscall_cancel.S
#1  0x000077136fc9eb63 in __internal_syscall_cancel (a1=<optimized out>, a2=<optimized out>, a3=<optimized out>, a4=<optimized out>, a5=0, a6=0, nr=61) at ./nptl/cancellation.c:49
warning: 49	./nptl/cancellation.c: No such file or directory
#2  __syscall_cancel (a1=<optimized out>, a2=<optimized out>, a3=<optimized out>, a4=<optimized out>, a5=a5@entry=0, a6=a6@entry=0, nr=61) at ./nptl/cancellation.c:75
75	in ./nptl/cancellation.c
#3  0x000077136fd1ae9f in __GI___wait4 (pid=<optimized out>, stat_loc=<optimized out>, options=<optimized out>, usage=<optimized out>) at ../sysdeps/unix/sysv/linux/wait4.c:30
warning: 30	../sysdeps/unix/sysv/linux/wait4.c: No such file or directory
#4  0x00007713706cdf13 in ggml_print_backtrace () from /llama.cpp/build/bin/libggml-base.so
#5  0x00007713706ce0bb in ggml_abort () from /llama.cpp/build/bin/libggml-base.so
#6  0x00007713706e853d in ggml_backend_tensor_alloc () from /llama.cpp/build/bin/libggml-base.so
#7  0x00007713706e2367 in ggml_gallocr_alloc_graph () from /llama.cpp/build/bin/libggml-base.so
#8  0x00007713706e818f in ggml_backend_sched_alloc_graph () from /llama.cpp/build/bin/libggml-base.so
#9  0x000077137049fa88 in llama_context::process_ubatch(llama_ubatch const&, llm_graph_type, llama_memory_context_i*, ggml_status&) () from /llama.cpp/build/bin/libllama.so
#10 0x00007713704a4c5f in llama_context::decode(llama_batch const&) () from /llama.cpp/build/bin/libllama.so
#11 0x00007713704a5c0f in llama_decode () from /llama.cpp/build/bin/libllama.so
#12 0x000062cb89e63220 in server_context::update_slots() ()
#13 0x000062cb89e263b2 in server_queue::start_loop() ()
#14 0x000062cb89de2bc9 in main ()
[Inferior 1 (process 82348) detached]
Aborted (core dumped)
