Name and Version
llama-cli --version
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 5080, compute capability 12.0, VMM: yes
Device 1: NVIDIA GeForce GTX 1660, compute capability 7.5, VMM: yes
version: 7046 (879dec3)
built with cc (Debian 12.2.0-14+deb12u1) 12.2.0 for x86_64-linux-gnu
Operating systems
Linux
GGML backends
CUDA
Hardware
NVIDIA GeForce RTX 5080 + GTX 1660
Models
Qwen2.5-Coder-3B-Q8_0.gguf
Problem description & steps to reproduce
When using the llama-vscode plugin for automatic code completion (infill), the server occasionally crashes.
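For reference, traffic similar to what the editor plugin generates can be produced without the extension. This is a minimal sketch, assuming the documented /infill request fields (input_prefix, input_suffix, n_predict, cache_prompt); the server address, prompt text, request count and timing are placeholders, and the exact payload llama-vscode sends may differ:

```python
# Hypothetical reproduction sketch: rapid overlapping /infill requests, similar
# to the bursts a FIM completion plugin sends while typing. Field names follow
# the llama-server /infill docs; the real extension payload may differ.
import threading
import requests

SERVER = "http://127.0.0.1:8080"    # assumed llama-server address
PREFIX = "def fib(n):\n    "        # placeholder editor context before cursor
SUFFIX = "\n\nprint(fib(10))\n"     # placeholder editor context after cursor

def send_infill(i: int) -> None:
    body = {
        # Vary the prefix slightly per request so the server reuses and
        # shifts cached prompt chunks instead of hitting an exact cache match.
        "input_prefix": PREFIX + "x" * i,
        "input_suffix": SUFFIX,
        "n_predict": 64,
        "cache_prompt": True,
    }
    try:
        r = requests.post(f"{SERVER}/infill", json=body, timeout=30)
        print(i, r.status_code)
    except requests.RequestException as e:
        print(i, "request failed:", e)

# Fire several overlapping requests, mimicking fast typing where earlier
# completions are cancelled while new ones arrive.
threads = [threading.Thread(target=send_infill, args=(i,)) for i in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```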
First Bad Commit
No response
Relevant log output
```
slot update_slots: id 0 | task 458 | reusing chunk with size 1, shifting KV cache [505, 506) -> [24, 25)
slot update_slots: id 0 | task 458 | reusing chunk with size 4969, shifting KV cache [820, 5789) -> [25, 4994)
srv stop: cancel task, id_task = 450
srv stop: cancel task, id_task = 455
srv log_server_r: request: POST /infill 172.18.0.1 500
srv log_server_r: request: POST /infill 172.18.0.1 500
srv log_server_r: request: POST /infill 172.18.0.1 500
srv log_server_r: request: POST /infill 172.18.0.1 500
slot update_slots: id 0 | task 458 | n_tokens = 4994, memory_seq_rm [4994, end)
slot update_slots: id 0 | task 458 | prompt processing progress, n_tokens = 5666, batch.n_tokens = 672, progress = 1.000000
slot update_slots: id 0 | task 458 | prompt done, n_tokens = 5666, batch.n_tokens = 672
init: the tokens of sequence 0 in the input batch have inconsistent sequence positions:
- the last position stored in the memory module of the context (i.e. the KV cache) for sequence 0 is X = 4784
- the tokens for sequence 0 in the input batch have a starting position of Y = 4994
it is required that the sequence positions remain consecutive: Y = X + 1
decode: failed to initialize batch
llama_decode: failed to decode, ret = -1
srv update_slots: Invalid input batch. i = 0, n_batch = 2048, ret = -1
srv send_error: task id = 458, error: Invalid input batch.
slot release: id 0 | task 458 | stop processing: n_tokens = 5666, truncated = 0
srv update_slots: all slots are idle
srv stop: cancel task, id_task = 458
srv update_slots: all slots are idle
srv log_server_r: request: POST /infill 172.18.0.1 500
slot get_availabl: id 1 | task -1 | selected slot by LRU, t_last = 14806193847
slot launch_slot_: id 1 | task -1 | sampler chain: logits -> logit-bias -> dist
slot launch_slot_: id 1 | task 465 | processing task
slot update_slots: id 1 | task 465 | new prompt, n_ctx_slot = 8192, n_keep = 0, task.n_tokens = 5752
slot update_slots: id 1 | task 465 | n_past = 22, slot.prompt.tokens.size() = 350, seq_id = 1, pos_min = -1
/opt/llama.cpp/tools/server/server.cpp:3747: pos_min == -1, but n_past > 0 - should not happen: https://github.com/ggml-org/llama.cpp/pull/13833#discussion_r2116181237
/usr/local/lib/libggml-base.so.0(+0x16298)[0x7f3bbb94d298]
/usr/local/lib/libggml-base.so.0(ggml_print_backtrace+0x1e4)[0x7f3bbb94d664]
/usr/local/lib/libggml-base.so.0(ggml_abort+0x11e)[0x7f3bbb94d7ee]
llama-server(+0xccf9f)[0x55b10b371f9f]
llama-server(+0x9c1e9)[0x55b10b3411e9]
llama-server(+0x6b7c9)[0x55b10b3107c9]
/usr/lib/x86_64-linux-gnu/libc.so.6(+0x2724a)[0x7f3bbb46024a]
/usr/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x85)[0x7f3bbb460305]
llama-server(+0x6d501)[0x55b10b312501]
```
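For context, the abort corresponds to the position-consistency condition quoted in the log above: the starting position Y of a sequence's tokens in the incoming batch must directly follow the last position X already stored in the KV cache for that sequence, i.e. Y = X + 1. A toy illustration of that condition only (not the actual llama.cpp code; the function name is made up and the numbers are taken from the log):

```python
# Toy illustration of the check described in the error message above,
# not the llama.cpp implementation.
def batch_positions_consistent(last_cached_pos: int, batch_start_pos: int) -> bool:
    # X = last position stored in the KV cache for this sequence
    # Y = starting position of this sequence's tokens in the input batch
    return batch_start_pos == last_cached_pos + 1

# Values from the log: X = 4784 in the cache, but the batch starts at Y = 4994,
# so the check fails and llama_decode returns -1.
print(batch_positions_consistent(4784, 4994))  # False
```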