
Bug: server (at least) crashes using Vulkan #7769

Closed
metal3d opened this issue Jun 5, 2024 · 4 comments · Fixed by #7806
Labels: bug-unconfirmed, high severity

Comments

metal3d (Contributor) commented Jun 5, 2024

What happened?

Hi,
I compiled the server with the Vulkan backend (since OpenCL was removed :sad:). I can start the server with a model, but as soon as inference is requested, I get an error message.

PS: Vulkan is theoretically not intended to replace OpenCL

GGML_ASSERT: /home/metal3d/Projects/ML/llama.cpp/ggml-vulkan.cpp:4069: d_D != nullptr
[New LWP 787893]
[New LWP 787894]
[New LWP 787895]
[New LWP 787897]
[New LWP 787900]
[New LWP 787917]
[New LWP 787918]
[New LWP 787919]
[New LWP 787924]
[New LWP 787925]
[New LWP 787926]
[New LWP 787927]
[New LWP 787928]
[New LWP 787929]
[New LWP 787930]
[New LWP 787931]

This GDB supports auto-downloading debuginfo from the following URLs:
  <https://debuginfod.fedoraproject.org/>
Enable debuginfod for this session? (y or [n]) [answered N; input not from terminal]
Debuginfod has been disabled.
To make this setting permanent, add 'set debuginfod enabled off' to .gdbinit.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
0x00007f5864430e03 in wait4 () from /lib64/libc.so.6
#0  0x00007f5864430e03 in wait4 () from /lib64/libc.so.6
#1  0x00000000005d811b in ggml_print_backtrace ()
#2  0x000000000067dafc in void ggml_vk_op_f32<vk_op_unary_push_constants>(ggml_backend_vk_context*, vk_context*, ggml_tensor const*, ggml_tensor const*, ggml_tensor const*, ggml_tensor*, ggml_op, vk_op_unary_push_constants const&&) [clone .constprop.0] ()
#3  0x00000000006a1507 in ggml_backend_vk_graph_compute(ggml_backend*, ggml_cgraph*) ()
#4  0x000000000062aad4 in ggml_backend_sched_graph_compute_async ()
#5  0x000000000056acb9 in llama_decode_internal(llama_context&, llama_batch) [clone .isra.0] ()
#6  0x000000000056c949 in llama_decode ()
#7  0x00000000004e0171 in server_context::update_slots() ()
#8  0x00000000004b1978 in server_queue::start_loop() ()
#9  0x00000000004552cc in main ()
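
For context, the assertion corresponds to a null destination device buffer in ggml_vk_op_f32: the op is dispatched to Vulkan, but no device-side buffer was ever allocated for the destination tensor. A paraphrased sketch of the kind of guard involved (variable and field names here are reconstructed from the backtrace for illustration, not quoted from ggml-vulkan.cpp):

    // Inside ggml_vk_op_f32: look up the destination tensor's device buffer.
    // If the Vulkan allocator never created one (e.g. for a tensor type the
    // backend cannot handle), d_D remains null and the assert aborts.
    ggml_backend_vk_buffer_context * dst_ctx = (ggml_backend_vk_buffer_context *) dst->buffer->context;
    vk_buffer d_D = dst_ctx->dev_buffer;
    GGML_ASSERT(d_D != nullptr); // ggml-vulkan.cpp:4069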

Name and Version

version adc9ff3

Edit: sorry, the correct version is 2b33896

What operating system are you seeing the problem on?

Linux Fedora 40

Relevant log output

(Same assertion and backtrace as in the description above.)
metal3d added the bug-unconfirmed and high severity labels on Jun 5, 2024
metal3d (Contributor, Author) commented Jun 5, 2024

Sorry, using 2b33896

metal3d (Contributor, Author) commented Jun 5, 2024

It seems that I must use an f16 quantized model, while bf16 fails... Is this a bug or normal behavior?
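
If the root cause is that the Vulkan backend cannot handle bf16 tensors, the usual ggml pattern would be for the backend to reject the op in its supports_op callback, so the scheduler falls back to the CPU instead of asserting mid-graph. A hypothetical sketch of such a guard (the bf16 check is illustrative only, not the actual fix that landed in #7806):

    // Hypothetical: declare bf16 unsupported so ggml_backend_sched routes
    // such ops to a fallback backend instead of dispatching them to Vulkan
    // with no destination device buffer.
    static bool ggml_backend_vk_supports_op(ggml_backend_t backend, const ggml_tensor * op) {
        GGML_UNUSED(backend);
        for (int i = 0; i < GGML_MAX_SRC; i++) {
            if (op->src[i] != nullptr && op->src[i]->type == GGML_TYPE_BF16) {
                return false;
            }
        }
        // A real implementation also checks the op kind and quantization types.
        return op->type != GGML_TYPE_BF16;
    }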

stduhpf (Contributor) commented Jun 6, 2024

I have a similar issue. According to a quick git bisect, bde7cd3 (#7640) seems to be the commit that introduced this issue.

Edit: oh, 2b33896 is broken for me too, but a5735e4 isn't...? I guess this was fixed at some point, and then bde7cd3 broke it again?

stduhpf (Contributor) commented Jun 6, 2024

@0cc4m, do you have any idea what's up with that?
