Name and Version
version: 6568 (f2a789e)
built with cc (GCC) 15.2.1 20250813 for x86_64-pc-linux-gnu
Operating systems
Linux
GGML backends
Vulkan
Hardware
7900 XTX + 7900X
Models
https://huggingface.co/Beinsezii/GLM-4.5-Air-Q4F-Q8A-Q8SH-GGUF
Problem description & steps to reproduce
llama-server -hf Beinsezii/GLM-4.5-Air-Q4F-Q8A-Q8SH-GGUF -c 16384 -ncmoe 37 -ub 2048
http post http://localhost:8080/completion --content-type application/json { prompt: ( 1..1230 | each { 'cat ' } | str join | str trim ), n_predict: 1 }
Any request of ≥ 1230 tokens will trigger it.
The context size needs to hit at least the 10240 slot, i.e. ≥ 9985 tokens (9985 is the smallest count that rounds up to 10240 in steps of 256, which is presumably the padding granularity).
Micro batch sizes of 1024 and 4096 do not seem to trigger it (the command above uses -ub 2048).
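For anyone without Nushell, here is a minimal Python equivalent of the request above (a sketch; it assumes the server started by the first command is listening on localhost:8080 and targets the standard /completion endpoint):

```python
import json
import urllib.request

# Build the same prompt as the Nushell pipeline: "cat " repeated 1230 times,
# with the trailing space trimmed.
prompt = ("cat " * 1230).strip()

req = urllib.request.Request(
    "http://localhost:8080/completion",
    data=json.dumps({"prompt": prompt, "n_predict": 1}).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

# Instead of answering, the server aborts with the GGML_ASSERT shown in the
# log output below.
with urllib.request.urlopen(req) as resp:
    print(resp.read().decode("utf-8"))
```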
First Bad Commit
Relevant log output
slot launch_slot_: id 0 | task 0 | processing task
slot update_slots: id 0 | task 0 | new prompt, n_ctx_slot = 16384, n_keep = 0, n_prompt_tokens = 11192
slot update_slots: id 0 | task 0 | kv cache rm [0, end)
slot update_slots: id 0 | task 0 | prompt processing progress, n_past = 2048, n_tokens = 2048, progress = 0.182988
/tmp/llama.cpp/ggml/src/ggml-backend.cpp:1842: GGML_ASSERT((char *)addr + ggml_backend_buffer_get_alloc_size(buffer, tensor) <= (char *)ggml_backend_buffer_get_base(buffer) + ggml_backend_buffer_get_size(buffer)) failed
/tmp/llama.cpp/vulkan/bin/libggml-base.so(+0x146b6) [0x7f6ab416e6b6]
/tmp/llama.cpp/vulkan/bin/libggml-base.so(ggml_print_backtrace+0x203) [0x7f6ab416eaf3]
/tmp/llama.cpp/vulkan/bin/libggml-base.so(ggml_abort+0x130) [0x7f6ab416ec90]
/tmp/llama.cpp/vulkan/bin/libggml-base.so(ggml_backend_tensor_alloc+0xd9) [0x7f6ab41898c9]
/tmp/llama.cpp/vulkan/bin/libggml-base.so(ggml_gallocr_alloc_graph+0x495) [0x7f6ab4182dd5]
/tmp/llama.cpp/vulkan/bin/libggml-base.so(ggml_backend_sched_alloc_graph+0x16f) [0x7f6ab418883f]
/tmp/llama.cpp/vulkan/bin/libllama.so(_ZN13llama_context14process_ubatchERK12llama_ubatch14llm_graph_typeP22llama_memory_context_iR11ggml_status+0xca) [0x7f6ab7095c9a]
/tmp/llama.cpp/vulkan/bin/libllama.so(_ZN13llama_context6decodeERK11llama_batch+0x40f) [0x7f6ab709accf]
/tmp/llama.cpp/vulkan/bin/libllama.so(llama_decode+0xe) [0x7f6ab709bc6e]
./vulkan/bin/llama-server(+0xd8631) [0x5649a0055631]
./vulkan/bin/llama-server(+0xa4388) [0x5649a0021388]
./vulkan/bin/llama-server(+0x5fb55) [0x56499ffdcb55]
/usr/lib/libc.so.6(+0x27675) [0x7f6ab3a27675]
/usr/lib/libc.so.6(__libc_start_main+0x89) [0x7f6ab3a27729]
./vulkan/bin/llama-server(+0x619a5) [0x56499ffde9a5]
/tmp/llama.cpp/launch.sh: line 25: 2882474 Aborted (core dumped)