
Bug: server (at least) crashes using Vulkan #7769

Closed
metal3d opened this issue Jun 5, 2024 · 4 comments · Fixed by #7806
Labels: bug-unconfirmed, high severity

Comments

metal3d (Contributor) commented Jun 5, 2024

What happened?

Hi,
I compiled the server with the Vulkan backend (since OpenCL was removed :sad:). I can start the server with a model, but as soon as inference is requested, I get an error message.

PS: Vulkan is theoretically not intended to replace OpenCL

GGML_ASSERT: /home/metal3d/Projects/ML/llama.cpp/ggml-vulkan.cpp:4069: d_D != nullptr
[New LWP 787893]
[New LWP 787894]
[New LWP 787895]
[New LWP 787897]
[New LWP 787900]
[New LWP 787917]
[New LWP 787918]
[New LWP 787919]
[New LWP 787924]
[New LWP 787925]
[New LWP 787926]
[New LWP 787927]
[New LWP 787928]
[New LWP 787929]
[New LWP 787930]
[New LWP 787931]

This GDB supports auto-downloading debuginfo from the following URLs:
  <https://debuginfod.fedoraproject.org/>
Enable debuginfod for this session? (y or [n]) [answered N; input not from terminal]
Debuginfod has been disabled.
To make this setting permanent, add 'set debuginfod enabled off' to .gdbinit.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
0x00007f5864430e03 in wait4 () from /lib64/libc.so.6
#0  0x00007f5864430e03 in wait4 () from /lib64/libc.so.6
#1  0x00000000005d811b in ggml_print_backtrace ()
#2  0x000000000067dafc in void ggml_vk_op_f32<vk_op_unary_push_constants>(ggml_backend_vk_context*, vk_context*, ggml_tensor const*, ggml_tensor const*, ggml_tensor const*, ggml_tensor*, ggml_op, vk_op_unary_push_constants const&&) [clone .constprop.0] ()
#3  0x00000000006a1507 in ggml_backend_vk_graph_compute(ggml_backend*, ggml_cgraph*) ()
#4  0x000000000062aad4 in ggml_backend_sched_graph_compute_async ()
#5  0x000000000056acb9 in llama_decode_internal(llama_context&, llama_batch) [clone .isra.0] ()
#6  0x000000000056c949 in llama_decode ()
#7  0x00000000004e0171 in server_context::update_slots() ()
#8  0x00000000004b1978 in server_queue::start_loop() ()
#9  0x00000000004552cc in main ()
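
For context, the assertion corresponds to a null destination device buffer in ggml_vk_op_f32: the op is dispatched to Vulkan, but no device-side buffer was ever allocated for the destination tensor. A paraphrased sketch of the kind of guard involved (variable and field names here are reconstructed from the backtrace for illustration, not quoted from ggml-vulkan.cpp):

    // Inside ggml_vk_op_f32: look up the destination tensor's device buffer.
    // If the Vulkan allocator never created one (e.g. for a tensor type the
    // backend cannot handle), d_D remains null and the assert aborts.
    ggml_backend_vk_buffer_context * dst_ctx = (ggml_backend_vk_buffer_context *) dst->buffer->context;
    vk_buffer d_D = dst_ctx->dev_buffer;
    GGML_ASSERT(d_D != nullptr); // ggml-vulkan.cpp:4069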

Name and Version

version adc9ff3

Edit: sorry, the correct version is 2b33896

What operating system are you seeing the problem on?

Linux Fedora 40

Relevant log output

(Same assertion and backtrace as in the description above.)
metal3d added the bug-unconfirmed and high severity labels on Jun 5, 2024
metal3d (Contributor, Author) commented Jun 5, 2024

Sorry, using 2b33896

metal3d (Contributor, Author) commented Jun 5, 2024

It seems that I must use an f16 quantized model, while bf16 fails... Is this a bug or normal behavior?
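
If the root cause is that the Vulkan backend cannot handle bf16 tensors, the usual ggml pattern would be for the backend to reject the op in its supports_op callback, so the scheduler falls back to the CPU instead of asserting mid-graph. A hypothetical sketch of such a guard (the bf16 check is illustrative only, not the actual fix that landed in #7806):

    // Hypothetical: declare bf16 unsupported so ggml_backend_sched routes
    // such ops to a fallback backend instead of dispatching them to Vulkan
    // with no destination device buffer.
    static bool ggml_backend_vk_supports_op(ggml_backend_t backend, const ggml_tensor * op) {
        GGML_UNUSED(backend);
        for (int i = 0; i < GGML_MAX_SRC; i++) {
            if (op->src[i] != nullptr && op->src[i]->type == GGML_TYPE_BF16) {
                return false;
            }
        }
        // A real implementation also checks the op kind and quantization types.
        return op->type != GGML_TYPE_BF16;
    }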

stduhpf (Contributor) commented Jun 6, 2024

I have a similar issue. According to a quick git bisect, bde7cd3 (#7640) seems to be the commit that introduced this issue.

Edit: oh, 2b33896 is broken for me too, but a5735e4 isn't...? I guess this was fixed at some point, and then bde7cd3 broke it again?

stduhpf (Contributor) commented Jun 6, 2024

@0cc4m, do you have any idea what's up with that?
