Name and Version
x86
$ build/bin/llama-cli --version
version: 6573 (e7a5130a2)
built with cc (GCC) 14.3.1 20250523 (Red Hat 14.3.1-1) for x86_64-redhat-linux
s390x
$ build/bin/llama-cli --version
version: 6492 (885a6ed5c)
built with gcc (GCC) 15.1.0 for s390x-redhat-linux
Operating systems
Linux
Which llama.cpp modules do you know to be affected?
Test code
Command line
# On x86
$ build/bin/test-thread-safety -hf ggml-org/models -hff tinyllamas/stories15M-q4_0.gguf -ngl 99 -p "The meaning of life is" -n 1 -c 256 -ub 32 -np 4 -t 2
# On s390x
$ build/bin/test-thread-safety -m /devfield/taronaeo/hf_models/stories15M-be.Q4_0.gguf -ngl 99 -p "The meaning of life is" -n 1 -c 256 -ub 32 -np 4 -t 2
Problem description & steps to reproduce
Steps to reproduce
As above.
Problem description
Running the test-thread-safety tests with -DLLAMA_SANITIZE_THREAD=OFF showed no problems. However, with -DLLAMA_SANITIZE_THREAD=ON, the test fails with data-race warnings like the following:
==================
Model 1/2, Context 4/4: The meaning of life is a
Model 2/2, Context 3/4: The meaning of life is long
Model 1/2, Context 2/4: The meaning of life is a
Model 2/2, Context 4/4: The meaning of life is a
==================
WARNING: ThreadSanitizer: data race (pid=1482004)
Write of size 4 at 0xbf100005d174 by thread T54:
#0 ggml_compute_forward_rope_f32(ggml_compute_params const*, ggml_tensor*, bool) <null> (libggml-cpu.so+0xa6b91) (BuildId: e9373907839b72312ebac1e2ff855d49f1b0e5a8)
#1 ggml_compute_forward_rope <null> (libggml-cpu.so+0xc3a67) (BuildId: e9373907839b72312ebac1e2ff855d49f1b0e5a8)
#2 ggml_graph_compute_thread.isra.0 <null> (libggml-cpu.so+0x15521) (BuildId: e9373907839b72312ebac1e2ff855d49f1b0e5a8)
#3 ggml_graph_compute._omp_fn.0 <null> (libggml-cpu.so+0x156dd) (BuildId: e9373907839b72312ebac1e2ff855d49f1b0e5a8)
#4 gomp_thread_start ../../../gcc-15.1.0-src/libgomp/team.c:129 (libgomp.so.1+0x233ed) (BuildId: 1260605dd662b26fdf634f98b2cb00aa005c76e4)
Previous write of size 1 at 0xbf100005d174 by thread T47:
#0 quantize_row_q8_0 <null> (libggml-cpu.so+0x10b5ef) (BuildId: e9373907839b72312ebac1e2ff855d49f1b0e5a8)
#1 ggml_compute_forward_mul_mat <null> (libggml-cpu.so+0x13635) (BuildId: e9373907839b72312ebac1e2ff855d49f1b0e5a8)
#2 ggml_graph_compute_thread.isra.0 <null> (libggml-cpu.so+0x15287) (BuildId: e9373907839b72312ebac1e2ff855d49f1b0e5a8)
#3 ggml_graph_compute._omp_fn.0 <null> (libggml-cpu.so+0x156dd) (BuildId: e9373907839b72312ebac1e2ff855d49f1b0e5a8)
#4 GOMP_parallel ../../../gcc-15.1.0-src/libgomp/parallel.c:178 (libgomp.so.1+0x18465) (BuildId: 1260605dd662b26fdf634f98b2cb00aa005c76e4)
#5 ggml_backend_cpu_graph_compute(ggml_backend*, ggml_cgraph*) <null> (libggml-cpu.so+0x173cf) (BuildId: e9373907839b72312ebac1e2ff855d49f1b0e5a8)
#6 ggml_backend_graph_compute_async <null> (libggml-base.so+0x37e5d) (BuildId: 9a1b0dd2116330cc7362e57a49756cb4737a57ca)
#7 ggml_backend_sched_graph_compute_async <null> (libggml-base.so+0x41779) (BuildId: 9a1b0dd2116330cc7362e57a49756cb4737a57ca)
#8 llama_context::graph_compute(ggml_cgraph*, bool) <null> (libllama.so+0x9d319) (BuildId: 3a4372146df9c4d47be1488c0d38650a3801294b)
#9 llama_context::process_ubatch(llama_ubatch const&, llm_graph_type, llama_memory_context_i*, ggml_status&) <null> (libllama.so+0x9da3b) (BuildId: 3a4372146df9c4d47be1488c0d38650a3801294b)
#10 llama_context::decode(llama_batch const&) <null> (libllama.so+0xa7c45) (BuildId: 3a4372146df9c4d47be1488c0d38650a3801294b)
#11 llama_decode <null> (libllama.so+0xa912b) (BuildId: 3a4372146df9c4d47be1488c0d38650a3801294b)
#12 std::thread::_State_impl<std::thread::_Invoker<std::tuple<main::{lambda()#1}> > >::_M_run() <null> (test-thread-safety+0x102a311) (BuildId: fa75a093058478083ba71485087678931ae65af5)
#13 execute_native_thread_routine ../../../../../gcc-15.1.0-src/libstdc++-v3/src/c++11/thread.cc:104 (libstdc++.so.6+0x10ba5d) (BuildId: 46543fbd6e17080db802196d5fe8b465aed8f200)
Location is heap block of size 5408 at 0xbf100005d000 allocated by thread T47:
#0 operator new[](unsigned long) ../../../../gcc-15.1.0-src/libsanitizer/tsan/tsan_new_delete.cpp:70 (libtsan.so.2+0x8e50b) (BuildId: 191f5b3fa68e2fe752f1bc5fcd75f4139bade7da)
#1 ggml_backend_cpu_graph_compute(ggml_backend*, ggml_cgraph*) <null> (libggml-cpu.so+0x1734b) (BuildId: e9373907839b72312ebac1e2ff855d49f1b0e5a8)
#2 ggml_backend_graph_compute_async <null> (libggml-base.so+0x37e5d) (BuildId: 9a1b0dd2116330cc7362e57a49756cb4737a57ca)
#3 ggml_backend_sched_graph_compute_async <null> (libggml-base.so+0x41779) (BuildId: 9a1b0dd2116330cc7362e57a49756cb4737a57ca)
#4 llama_context::graph_compute(ggml_cgraph*, bool) <null> (libllama.so+0x9d319) (BuildId: 3a4372146df9c4d47be1488c0d38650a3801294b)
#5 llama_context::process_ubatch(llama_ubatch const&, llm_graph_type, llama_memory_context_i*, ggml_status&) <null> (libllama.so+0x9da3b) (BuildId: 3a4372146df9c4d47be1488c0d38650a3801294b)
#6 llama_context::decode(llama_batch const&) <null> (libllama.so+0xa7c45) (BuildId: 3a4372146df9c4d47be1488c0d38650a3801294b)
#7 llama_decode <null> (libllama.so+0xa912b) (BuildId: 3a4372146df9c4d47be1488c0d38650a3801294b)
#8 std::thread::_State_impl<std::thread::_Invoker<std::tuple<main::{lambda()#1}> > >::_M_run() <null> (test-thread-safety+0x1029f7b) (BuildId: fa75a093058478083ba71485087678931ae65af5)
#9 execute_native_thread_routine ../../../../../gcc-15.1.0-src/libstdc++-v3/src/c++11/thread.cc:104 (libstdc++.so.6+0x10ba5d) (BuildId: 46543fbd6e17080db802196d5fe8b465aed8f200)
Thread T54 (tid=1482063, running) created by thread T47 at:
#0 pthread_create ../../../../gcc-15.1.0-src/libsanitizer/tsan/tsan_interceptors_posix.cpp:1041 (libtsan.so.2+0x647e3) (BuildId: 191f5b3fa68e2fe752f1bc5fcd75f4139bade7da)
#1 gomp_team_start ../../../gcc-15.1.0-src/libgomp/team.c:859 (libgomp.so.1+0x2399b) (BuildId: 1260605dd662b26fdf634f98b2cb00aa005c76e4)
#2 ggml_backend_cpu_graph_compute(ggml_backend*, ggml_cgraph*) <null> (libggml-cpu.so+0x173cf) (BuildId: e9373907839b72312ebac1e2ff855d49f1b0e5a8)
#3 ggml_backend_graph_compute_async <null> (libggml-base.so+0x37e5d) (BuildId: 9a1b0dd2116330cc7362e57a49756cb4737a57ca)
#4 ggml_backend_sched_graph_compute_async <null> (libggml-base.so+0x41779) (BuildId: 9a1b0dd2116330cc7362e57a49756cb4737a57ca)
#5 llama_context::graph_compute(ggml_cgraph*, bool) <null> (libllama.so+0x9d319) (BuildId: 3a4372146df9c4d47be1488c0d38650a3801294b)
#6 llama_context::process_ubatch(llama_ubatch const&, llm_graph_type, llama_memory_context_i*, ggml_status&) <null> (libllama.so+0x9da3b) (BuildId: 3a4372146df9c4d47be1488c0d38650a3801294b)
#7 llama_context::decode(llama_batch const&) <null> (libllama.so+0xa7c45) (BuildId: 3a4372146df9c4d47be1488c0d38650a3801294b)
#8 llama_decode <null> (libllama.so+0xa912b) (BuildId: 3a4372146df9c4d47be1488c0d38650a3801294b)
#9 std::thread::_State_impl<std::thread::_Invoker<std::tuple<main::{lambda()#1}> > >::_M_run() <null> (test-thread-safety+0x1029f7b) (BuildId: fa75a093058478083ba71485087678931ae65af5)
#10 execute_native_thread_routine ../../../../../gcc-15.1.0-src/libstdc++-v3/src/c++11/thread.cc:104 (libstdc++.so.6+0x10ba5d) (BuildId: 46543fbd6e17080db802196d5fe8b465aed8f200)
Thread T47 (tid=1482053, running) created by main thread at:
#0 pthread_create ../../../../gcc-15.1.0-src/libsanitizer/tsan/tsan_interceptors_posix.cpp:1041 (libtsan.so.2+0x647e3) (BuildId: 191f5b3fa68e2fe752f1bc5fcd75f4139bade7da)
#1 __gthread_create(unsigned long*, void* (*)(void*), void*) /devfield/taronaeo/gcc-15.1.0-build/s390x-redhat-linux/libstdc++-v3/include/s390x-redhat-linux/bits/gthr-default.h:709 (libstdc++.so.6+0x10bba5) (BuildId: 46543fbd6e17080db802196d5fe8b465aed8f200)
#2 std::thread::_M_start_thread(std::unique_ptr<std::thread::_State, std::default_delete<std::thread::_State> >, void (*)()) ../../../../../gcc-15.1.0-src/libstdc++-v3/src/c++11/thread.cc:172 (libstdc++.so.6+0x10bba5)
#3 __libc_start_call_main <null> (libc.so.6+0x2a641) (BuildId: 6d6d6b5b19538c7e90ca1433b99afbd42605ea4d)
SUMMARY: ThreadSanitizer: data race (/devfield/taronaeo/llama.cpp/build/bin/libggml-cpu.so+0xa6b91) (BuildId: e9373907839b72312ebac1e2ff855d49f1b0e5a8) in ggml_compute_forward_rope_f32(ggml_compute_params const*, ggml_tensor*, bool)
First Bad Commit
N/A
Relevant log output
See https://gist.github.com/taronaeo/dc9d3ef76c0124d805a84b1a77db6c44 for the full x86 log.