Name and Version
docker image: ghcr.io/ggml-org/llama.cpp:server-b6755
Operating systems
Linux
Which llama.cpp modules do you know to be affected?
llama-server
Command line
llama-server -hf LiquidAI/LFM2-VL-1.6B
Problem description & steps to reproduce
Run llama-server -hf LiquidAI/LFM2-VL-1.6B
Open the web UI at http://localhost:8080
Upload a photo and enter a prompt
Once the model is done generating text, click Regenerate; the server should crash with the log below (see the curl sketch after the backtrace):
main: server is listening on http://0.0.0.0:8080 - starting the main loop
srv update_slots: all slots are idle
srv params_from_: Chat format: Content-only
slot get_availabl: id 0 | task -1 | selected slot by LRU, t_last = -1
slot launch_slot_: id 0 | task 0 | processing task
slot update_slots: id 0 | task 0 | new prompt, n_ctx_slot = 4096, n_keep = 0, n_prompt_tokens = 270
slot update_slots: id 0 | task 0 | n_past = 0, memory_seq_rm [0, end)
slot update_slots: id 0 | task 0 | prompt processing progress, n_past = 9, n_tokens = 9, progress = 0.033333
slot update_slots: id 0 | task 0 | n_past = 9, memory_seq_rm [9, end)
srv process_chun: processing image...
srv process_chun: image processed in 20362 ms
slot update_slots: id 0 | task 0 | prompt processing progress, n_past = 270, n_tokens = 5, progress = 1.000000
slot update_slots: id 0 | task 0 | prompt done, n_past = 270, n_tokens = 5
slot update_slots: id 0 | task 0 | created context checkpoint 1 of 8 (pos_min = 264, pos_max = 264, size = 0.156 MiB)
srv log_server_r: request: GET /health 127.0.0.1 200
slot print_timing: id 0 | task 0 |
prompt eval time = 20707.07 ms / 270 tokens ( 76.69 ms per token, 13.04 tokens per second)
eval time = 5239.30 ms / 142 tokens ( 36.90 ms per token, 27.10 tokens per second)
total time = 25946.36 ms / 412 tokens
slot release: id 0 | task 0 | stop processing: n_past = 411, truncated = 0
srv update_slots: all slots are idle
srv log_server_r: request: POST /v1/chat/completions 192.168.20.1 200
srv log_server_r: request: GET /health 127.0.0.1 200
srv log_server_r: request: GET /health 127.0.0.1 200
srv params_from_: Chat format: Content-only
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 1.000 (> 0.100 thold), f_keep = 0.657
slot launch_slot_: id 0 | task 144 | processing task
slot update_slots: id 0 | task 144 | new prompt, n_ctx_slot = 4096, n_keep = 0, n_prompt_tokens = 270
slot update_slots: id 0 | task 144 | old: ...
<|im_start|>assistant
slot update_slots: id 0 | task 144 | new: ...
<|im_start|>assistant
slot update_slots: id 0 | task 144 | 708 6 64015 708
slot update_slots: id 0 | task 144 | 708 6 64015 708
slot update_slots: id 0 | task 144 | n_past = 270, cache_tokens.size() = 411, seq_id = 0, pos_min = 410, n_swa = 1
slot update_slots: id 0 | task 144 | restored context checkpoint (pos_min = 264, pos_max = 264, size = 0.156 MiB)
slot update_slots: id 0 | task 144 | n_past = 265, memory_seq_rm [265, end)
libggml-base.so(+0x183cb)[0x772c4dada3cb]
libggml-base.so(ggml_print_backtrace+0x21f)[0x772c4dada82f]
libggml-base.so(+0x2b20f)[0x772c4daed20f]
/lib/x86_64-linux-gnu/libstdc++.so.6(+0xae20c)[0x772c4d94220c]
/lib/x86_64-linux-gnu/libstdc++.so.6(+0xae277)[0x772c4d942277]
/lib/x86_64-linux-gnu/libstdc++.so.6(+0xae4d8)[0x772c4d9424d8]
/app/llama-server(+0x8b8d9)[0x5dd3a4a818d9]
/app/llama-server(+0xf4fd6)[0x5dd3a4aeafd6]
/app/llama-server(+0x9550f)[0x5dd3a4a8b50f]
/app/llama-server(+0x552b3)[0x5dd3a4a4b2b3]
/lib/x86_64-linux-gnu/libc.so.6(+0x29d90)[0x772c4d58dd90]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80)[0x772c4d58de40]
/app/llama-server(+0x56d35)[0x5dd3a4a4cd35]
terminate called after throwing an instance of 'std::runtime_error'
what(): Chunk not found
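For reference, clicking Regenerate in the webui appears to simply re-send the same /v1/chat/completions request, so the crash can presumably also be reproduced without the webui by posting the identical multimodal request twice. A rough, untested sketch of that sequence (the image file photo.jpg and the prompt text are placeholders, not taken from the original session):

# send the same multimodal request twice; the second request mimics "Regenerate"
IMG_B64=$(base64 -w0 photo.jpg)
for i in 1 2; do
  curl -s http://localhost:8080/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
          "messages": [{
            "role": "user",
            "content": [
              {"type": "text", "text": "Describe this photo."},
              {"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,'"$IMG_B64"'"}}
            ]
          }]
        }'
  echo
done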
First Bad Commit
No response