Name and Version
x86
$ build/bin/llama-cli --version
version: 6573 (e7a5130a2)
built with cc (GCC) 14.3.1 20250523 (Red Hat 14.3.1-1) for x86_64-redhat-linux
s390x
$ build/bin/llama-cli --version
version: 6492 (885a6ed5c)
built with gcc (GCC) 15.1.0 for s390x-redhat-linux
Operating systems
Linux
Which llama.cpp modules do you know to be affected?
Test code
Command line
# On x86
$ build/bin/test-thread-safety -hf ggml-org/models -hff tinyllamas/stories15M-q4_0.gguf -ngl 99 -p "The meaning of life is" -n 1 -c 256 -ub 32 -np 4 -t 2
# On s390x
$ build/bin/test-thread-safety -m /devfield/taronaeo/hf_models/stories15M-be.Q4_0.gguf -ngl 99 -p "The meaning of life is" -n 1 -c 256 -ub 32 -np 4 -t 2
Problem description & steps to reproduce
Steps to reproduce
As above.
Problem description
Running the test-thread-safety tests with -DLLAMA_SANITIZE_THREAD=OFF showed no problems. However, with -DLLAMA_SANITIZE_THREAD=ON, the test fails with data-race warnings like the following:
==================
Model 1/2, Context 4/4: The meaning of life is a
Model 2/2, Context 3/4: The meaning of life is long
Model 1/2, Context 2/4: The meaning of life is a
Model 2/2, Context 4/4: The meaning of life is a
==================
WARNING: ThreadSanitizer: data race (pid=1482004)
Write of size 4 at 0xbf100005d174 by thread T54:
#0 ggml_compute_forward_rope_f32(ggml_compute_params const*, ggml_tensor*, bool) <null> (libggml-cpu.so+0xa6b91) (BuildId: e9373907839b72312ebac1e2ff855d49f1b0e5a8)
#1 ggml_compute_forward_rope <null> (libggml-cpu.so+0xc3a67) (BuildId: e9373907839b72312ebac1e2ff855d49f1b0e5a8)
#2 ggml_graph_compute_thread.isra.0 <null> (libggml-cpu.so+0x15521) (BuildId: e9373907839b72312ebac1e2ff855d49f1b0e5a8)
#3 ggml_graph_compute._omp_fn.0 <null> (libggml-cpu.so+0x156dd) (BuildId: e9373907839b72312ebac1e2ff855d49f1b0e5a8)
#4 gomp_thread_start ../../../gcc-15.1.0-src/libgomp/team.c:129 (libgomp.so.1+0x233ed) (BuildId: 1260605dd662b26fdf634f98b2cb00aa005c76e4)
Previous write of size 1 at 0xbf100005d174 by thread T47:
#0 quantize_row_q8_0 <null> (libggml-cpu.so+0x10b5ef) (BuildId: e9373907839b72312ebac1e2ff855d49f1b0e5a8)
#1 ggml_compute_forward_mul_mat <null> (libggml-cpu.so+0x13635) (BuildId: e9373907839b72312ebac1e2ff855d49f1b0e5a8)
#2 ggml_graph_compute_thread.isra.0 <null> (libggml-cpu.so+0x15287) (BuildId: e9373907839b72312ebac1e2ff855d49f1b0e5a8)
#3 ggml_graph_compute._omp_fn.0 <null> (libggml-cpu.so+0x156dd) (BuildId: e9373907839b72312ebac1e2ff855d49f1b0e5a8)
#4 GOMP_parallel ../../../gcc-15.1.0-src/libgomp/parallel.c:178 (libgomp.so.1+0x18465) (BuildId: 1260605dd662b26fdf634f98b2cb00aa005c76e4)
#5 ggml_backend_cpu_graph_compute(ggml_backend*, ggml_cgraph*) <null> (libggml-cpu.so+0x173cf) (BuildId: e9373907839b72312ebac1e2ff855d49f1b0e5a8)
#6 ggml_backend_graph_compute_async <null> (libggml-base.so+0x37e5d) (BuildId: 9a1b0dd2116330cc7362e57a49756cb4737a57ca)
#7 ggml_backend_sched_graph_compute_async <null> (libggml-base.so+0x41779) (BuildId: 9a1b0dd2116330cc7362e57a49756cb4737a57ca)
#8 llama_context::graph_compute(ggml_cgraph*, bool) <null> (libllama.so+0x9d319) (BuildId: 3a4372146df9c4d47be1488c0d38650a3801294b)
#9 llama_context::process_ubatch(llama_ubatch const&, llm_graph_type, llama_memory_context_i*, ggml_status&) <null> (libllama.so+0x9da3b) (BuildId: 3a4372146df9c4d47be1488c0d38650a3801294b)
#10 llama_context::decode(llama_batch const&) <null> (libllama.so+0xa7c45) (BuildId: 3a4372146df9c4d47be1488c0d38650a3801294b)
#11 llama_decode <null> (libllama.so+0xa912b) (BuildId: 3a4372146df9c4d47be1488c0d38650a3801294b)
#12 std::thread::_State_impl<std::thread::_Invoker<std::tuple<main::{lambda()#1}> > >::_M_run() <null> (test-thread-safety+0x102a311) (BuildId: fa75a093058478083ba71485087678931ae65af5)
#13 execute_native_thread_routine ../../../../../gcc-15.1.0-src/libstdc++-v3/src/c++11/thread.cc:104 (libstdc++.so.6+0x10ba5d) (BuildId: 46543fbd6e17080db802196d5fe8b465aed8f200)
Location is heap block of size 5408 at 0xbf100005d000 allocated by thread T47:
#0 operator new[](unsigned long) ../../../../gcc-15.1.0-src/libsanitizer/tsan/tsan_new_delete.cpp:70 (libtsan.so.2+0x8e50b) (BuildId: 191f5b3fa68e2fe752f1bc5fcd75f4139bade7da)
#1 ggml_backend_cpu_graph_compute(ggml_backend*, ggml_cgraph*) <null> (libggml-cpu.so+0x1734b) (BuildId: e9373907839b72312ebac1e2ff855d49f1b0e5a8)
#2 ggml_backend_graph_compute_async <null> (libggml-base.so+0x37e5d) (BuildId: 9a1b0dd2116330cc7362e57a49756cb4737a57ca)
#3 ggml_backend_sched_graph_compute_async <null> (libggml-base.so+0x41779) (BuildId: 9a1b0dd2116330cc7362e57a49756cb4737a57ca)
#4 llama_context::graph_compute(ggml_cgraph*, bool) <null> (libllama.so+0x9d319) (BuildId: 3a4372146df9c4d47be1488c0d38650a3801294b)
#5 llama_context::process_ubatch(llama_ubatch const&, llm_graph_type, llama_memory_context_i*, ggml_status&) <null> (libllama.so+0x9da3b) (BuildId: 3a4372146df9c4d47be1488c0d38650a3801294b)
#6 llama_context::decode(llama_batch const&) <null> (libllama.so+0xa7c45) (BuildId: 3a4372146df9c4d47be1488c0d38650a3801294b)
#7 llama_decode <null> (libllama.so+0xa912b) (BuildId: 3a4372146df9c4d47be1488c0d38650a3801294b)
#8 std::thread::_State_impl<std::thread::_Invoker<std::tuple<main::{lambda()#1}> > >::_M_run() <null> (test-thread-safety+0x1029f7b) (BuildId: fa75a093058478083ba71485087678931ae65af5)
#9 execute_native_thread_routine ../../../../../gcc-15.1.0-src/libstdc++-v3/src/c++11/thread.cc:104 (libstdc++.so.6+0x10ba5d) (BuildId: 46543fbd6e17080db802196d5fe8b465aed8f200)
Thread T54 (tid=1482063, running) created by thread T47 at:
#0 pthread_create ../../../../gcc-15.1.0-src/libsanitizer/tsan/tsan_interceptors_posix.cpp:1041 (libtsan.so.2+0x647e3) (BuildId: 191f5b3fa68e2fe752f1bc5fcd75f4139bade7da)
#1 gomp_team_start ../../../gcc-15.1.0-src/libgomp/team.c:859 (libgomp.so.1+0x2399b) (BuildId: 1260605dd662b26fdf634f98b2cb00aa005c76e4)
#2 ggml_backend_cpu_graph_compute(ggml_backend*, ggml_cgraph*) <null> (libggml-cpu.so+0x173cf) (BuildId: e9373907839b72312ebac1e2ff855d49f1b0e5a8)
#3 ggml_backend_graph_compute_async <null> (libggml-base.so+0x37e5d) (BuildId: 9a1b0dd2116330cc7362e57a49756cb4737a57ca)
#4 ggml_backend_sched_graph_compute_async <null> (libggml-base.so+0x41779) (BuildId: 9a1b0dd2116330cc7362e57a49756cb4737a57ca)
#5 llama_context::graph_compute(ggml_cgraph*, bool) <null> (libllama.so+0x9d319) (BuildId: 3a4372146df9c4d47be1488c0d38650a3801294b)
#6 llama_context::process_ubatch(llama_ubatch const&, llm_graph_type, llama_memory_context_i*, ggml_status&) <null> (libllama.so+0x9da3b) (BuildId: 3a4372146df9c4d47be1488c0d38650a3801294b)
#7 llama_context::decode(llama_batch const&) <null> (libllama.so+0xa7c45) (BuildId: 3a4372146df9c4d47be1488c0d38650a3801294b)
#8 llama_decode <null> (libllama.so+0xa912b) (BuildId: 3a4372146df9c4d47be1488c0d38650a3801294b)
#9 std::thread::_State_impl<std::thread::_Invoker<std::tuple<main::{lambda()#1}> > >::_M_run() <null> (test-thread-safety+0x1029f7b) (BuildId: fa75a093058478083ba71485087678931ae65af5)
#10 execute_native_thread_routine ../../../../../gcc-15.1.0-src/libstdc++-v3/src/c++11/thread.cc:104 (libstdc++.so.6+0x10ba5d) (BuildId: 46543fbd6e17080db802196d5fe8b465aed8f200)
Thread T47 (tid=1482053, running) created by main thread at:
#0 pthread_create ../../../../gcc-15.1.0-src/libsanitizer/tsan/tsan_interceptors_posix.cpp:1041 (libtsan.so.2+0x647e3) (BuildId: 191f5b3fa68e2fe752f1bc5fcd75f4139bade7da)
#1 __gthread_create(unsigned long*, void* (*)(void*), void*) /devfield/taronaeo/gcc-15.1.0-build/s390x-redhat-linux/libstdc++-v3/include/s390x-redhat-linux/bits/gthr-default.h:709 (libstdc++.so.6+0x10bba5) (BuildId: 46543fbd6e17080db802196d5fe8b465aed8f200)
#2 std::thread::_M_start_thread(std::unique_ptr<std::thread::_State, std::default_delete<std::thread::_State> >, void (*)()) ../../../../../gcc-15.1.0-src/libstdc++-v3/src/c++11/thread.cc:172 (libstdc++.so.6+0x10bba5)
#3 __libc_start_call_main <null> (libc.so.6+0x2a641) (BuildId: 6d6d6b5b19538c7e90ca1433b99afbd42605ea4d)
SUMMARY: ThreadSanitizer: data race (/devfield/taronaeo/llama.cpp/build/bin/libggml-cpu.so+0xa6b91) (BuildId: e9373907839b72312ebac1e2ff855d49f1b0e5a8) in ggml_compute_forward_rope_f32(ggml_compute_params const*, ggml_tensor*, bool)
First Bad Commit
N/A
Relevant log output
See https://gist.github.com/taronaeo/dc9d3ef76c0124d805a84b1a77db6c44 for the full x86 log.