Name and Version
version: 6568 (f2a789e)
built with cc (GCC) 15.2.1 20250813 for x86_64-pc-linux-gnu
Operating systems
Linux
GGML backends
Vulkan
Hardware
7900 XTX + 7900X
Models
https://huggingface.co/Beinsezii/GLM-4.5-Air-Q4F-Q8A-Q8SH-GGUF
Problem description & steps to reproduce
llama-server -hf Beinsezii/GLM-4.5-Air-Q4F-Q8A-Q8SH-GGUF -c 16384 -ncmoe 37 -ub 2048
http post http://localhost:8080/completion --content-type application/json { prompt: ( 1..1230 | each { 'cat ' } | str join | str trim ), n_predict: 1 }
Any request of ≥ 1230 tokens will trigger it.
The context size needs to hit at least the 10240 slot, i.e. ≥ 9985 tokens (9985 is the smallest count that rounds up to 10240 in steps of 256, which is presumably the padding granularity).
Micro batch sizes of 1024 and 4096 do not seem to trigger it (the command above uses -ub 2048).
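For anyone without Nushell, here is a minimal Python equivalent of the request above (a sketch; it assumes the server started by the first command is listening on localhost:8080 and targets the standard /completion endpoint):

```python
import json
import urllib.request

# Build the same prompt as the Nushell pipeline: "cat " repeated 1230 times,
# with the trailing space trimmed.
prompt = ("cat " * 1230).strip()

req = urllib.request.Request(
    "http://localhost:8080/completion",
    data=json.dumps({"prompt": prompt, "n_predict": 1}).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

# Instead of answering, the server aborts with the GGML_ASSERT shown in the
# log output below.
with urllib.request.urlopen(req) as resp:
    print(resp.read().decode("utf-8"))
```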
First Bad Commit
Relevant log output
slot launch_slot_: id 0 | task 0 | processing task
slot update_slots: id 0 | task 0 | new prompt, n_ctx_slot = 16384, n_keep = 0, n_prompt_tokens = 11192
slot update_slots: id 0 | task 0 | kv cache rm [0, end)
slot update_slots: id 0 | task 0 | prompt processing progress, n_past = 2048, n_tokens = 2048, progress = 0.182988
/tmp/llama.cpp/ggml/src/ggml-backend.cpp:1842: GGML_ASSERT((char *)addr + ggml_backend_buffer_get_alloc_size(buffer, tensor) <= (char *)ggml_backend_buffer_get_base(buffer) + ggml_backend_buffer_get_size(buffer)) failed
/tmp/llama.cpp/vulkan/bin/libggml-base.so(+0x146b6) [0x7f6ab416e6b6]
/tmp/llama.cpp/vulkan/bin/libggml-base.so(ggml_print_backtrace+0x203) [0x7f6ab416eaf3]
/tmp/llama.cpp/vulkan/bin/libggml-base.so(ggml_abort+0x130) [0x7f6ab416ec90]
/tmp/llama.cpp/vulkan/bin/libggml-base.so(ggml_backend_tensor_alloc+0xd9) [0x7f6ab41898c9]
/tmp/llama.cpp/vulkan/bin/libggml-base.so(ggml_gallocr_alloc_graph+0x495) [0x7f6ab4182dd5]
/tmp/llama.cpp/vulkan/bin/libggml-base.so(ggml_backend_sched_alloc_graph+0x16f) [0x7f6ab418883f]
/tmp/llama.cpp/vulkan/bin/libllama.so(_ZN13llama_context14process_ubatchERK12llama_ubatch14llm_graph_typeP22llama_memory_context_iR11ggml_status+0xca) [0x7f6ab7095c9a]
/tmp/llama.cpp/vulkan/bin/libllama.so(_ZN13llama_context6decodeERK11llama_batch+0x40f) [0x7f6ab709accf]
/tmp/llama.cpp/vulkan/bin/libllama.so(llama_decode+0xe) [0x7f6ab709bc6e]
./vulkan/bin/llama-server(+0xd8631) [0x5649a0055631]
./vulkan/bin/llama-server(+0xa4388) [0x5649a0021388]
./vulkan/bin/llama-server(+0x5fb55) [0x56499ffdcb55]
/usr/lib/libc.so.6(+0x27675) [0x7f6ab3a27675]
/usr/lib/libc.so.6(__libc_start_main+0x89) [0x7f6ab3a27729]
./vulkan/bin/llama-server(+0x619a5) [0x56499ffde9a5]
/tmp/llama.cpp/launch.sh: line 25: 2882474 Aborted (core dumped)