
Conversation

@xctan (Collaborator) commented Nov 25, 2025

Also added VLEN-agnostic kernel selection to ggml_vec_dot_q2_K_q8_K for RVV-disabled and wider devices.

@xctan requested a review from ggerganov as a code owner on November 25, 2025 15:11
@xctan (Collaborator, Author) commented Nov 25, 2025

Further refactoring of related kernels is planned for follow-up PRs.

@github-actions bot added the ggml label (changes relating to the ggml tensor library for machine learning) on Nov 25, 2025
Comment on lines +743 to +757
    // allow benign data race here
    static volatile ggml_vec_dot_t func_ptr = NULL;
    ggml_vec_dot_t func = func_ptr;
    if (func == NULL) {
        func = ggml_vec_dot_q2_K_q8_K_generic;
#if defined(__riscv_v)
        const int vlen = ggml_cpu_get_riscv_vlen();
        if (vlen >= 256) {
            func = ggml_vec_dot_q2_K_q8_K_rvv256;
        } else if (vlen >= 128) {
            func = ggml_vec_dot_q2_K_q8_K_rvv128;
        }
#endif
        func_ptr = func;
    }
Member commented:

Although it is indeed benign, it would be better to avoid the race so it doesn't trip thread sanitizers. What is the concern that makes you introduce the func_ptr cache - do you want to avoid repeated calls to ggml_cpu_get_riscv_vlen()?

@xctan (Collaborator, Author) replied:

To simplify compilation and avoid building the CPU library twice for different targets, I've implemented detection for devices lacking RVV support via an additional getauxval() call. This guarding is essential because the VLEN query instruction will trigger a SIGILL signal if RVV is absent or disabled, making ggml_cpu_get_riscv_vlen() inherently heavier than a direct __riscv_vlenb() call.
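For illustration, the guarded query described above could look roughly like the sketch below; the helper name and the HWCAP bit handling are assumptions for the example, not code from this PR.

#include <sys/auxv.h>                       // getauxval(), AT_HWCAP

// Linux reports single-letter RISC-V ISA extensions as bits of AT_HWCAP;
// bit ('V' - 'A') corresponds to the vector extension.
#define RISCV_HWCAP_V (1UL << ('V' - 'A'))

// hypothetical stand-in for ggml_cpu_get_riscv_vlen(); returns 0 when RVV is unusable
static int riscv_vlen_bits(void) {
    if (!(getauxval(AT_HWCAP) & RISCV_HWCAP_V)) {
        return 0;                           // RVV absent or disabled: reading vlenb would SIGILL
    }
#if defined(__riscv_v)
    unsigned long vlenb;
    __asm__ ("csrr %0, vlenb" : "=r"(vlenb));   // bytes per vector register, what __riscv_vlenb() returns
    return (int) (vlenb * 8);                   // VLEN in bits
#else
    return 0;
#endif
}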

For improved TSAN compatibility, I propose modifying the initialization to use a relaxed atomic store. Currently, ggml_vec_dot_q2_K_q8_K() is compiled to a lightweight 4-instruction trampoline ahead of its initialization logic, as shown here:

000000000005d8f4 <ggml_vec_dot_q2_K_q8_K>:
   5d8f4:       00089317                auipc   t1,0x89
   5d8f8:       ec433303                ld      t1,-316(t1) # e67b8 <func_ptr.0>
   5d8fc:       00030363                beqz    t1,5d902 <ggml_vec_dot_q2_K_q8_K+0xe>
   5d900:       8302                    jr      t1
// [snip]

I will proceed with implementing a TSAN-friendly version.
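For illustration, a minimal sketch of the relaxed-atomic variant using C11 stdatomic; the signature is paraphrased from ggml's ggml_vec_dot_t, and the surrounding wrapper is an assumption, not the final implementation.

#include <stdatomic.h>

void ggml_vec_dot_q2_K_q8_K(int n, float * s, size_t bs, const void * vx, size_t bx,
                            const void * vy, size_t by, int nrc) {
    static _Atomic(ggml_vec_dot_t) func_ptr = NULL;

    // relaxed ordering is sufficient: every thread computes the same pointer value,
    // so the only requirement is that the load/store pair is atomic (no torn reads)
    // and therefore well-defined for TSAN
    ggml_vec_dot_t func = atomic_load_explicit(&func_ptr, memory_order_relaxed);
    if (func == NULL) {
        func = ggml_vec_dot_q2_K_q8_K_generic;
#if defined(__riscv_v)
        const int vlen = ggml_cpu_get_riscv_vlen();
        if (vlen >= 256) {
            func = ggml_vec_dot_q2_K_q8_K_rvv256;
        } else if (vlen >= 128) {
            func = ggml_vec_dot_q2_K_q8_K_rvv128;
        }
#endif
        atomic_store_explicit(&func_ptr, func, memory_order_relaxed);
    }

    func(n, s, bs, vx, bx, vy, by, nrc);
}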

Member replied:

If you have an idea of how to resolve the TSAN issue via atomics, that should be OK. I was thinking there is probably a simple way by caching the vlen result:

// this is thread-safe
static const int vlen = ggml_cpu_get_riscv_vlen();

And then this vlen could be used to index a static table of functions, f.ex:

const int idx = vlen/128;
func = func_table[idx];

Ignore my comment if I am missing something.

@xctan (Collaborator, Author) replied:

While a static lookup table isn't future-proof for wider VLEN devices, we can clamp the detected VLEN stored in the cache. I'll try your suggestion.
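For illustration, a clamped version of that table lookup might look like the sketch below. The selector function and table are made up for the example, and the run-once initialization of the local vlen relies on C++ thread-safe statics; a C translation unit would cache the value with the relaxed atomic shown earlier instead.

#if defined(__riscv_v)
static ggml_vec_dot_t ggml_vec_dot_q2_K_q8_K_select(void) {   // illustrative helper name
    static const ggml_vec_dot_t table[3] = {
        ggml_vec_dot_q2_K_q8_K_generic,   // VLEN < 128 or RVV unavailable
        ggml_vec_dot_q2_K_q8_K_rvv128,    // 128 <= VLEN < 256
        ggml_vec_dot_q2_K_q8_K_rvv256,    // VLEN >= 256
    };
    static const int vlen = ggml_cpu_get_riscv_vlen();   // queried once (C++ semantics)

    int idx = vlen / 128;
    if (idx > 2) {
        idx = 2;      // clamp: future wider-VLEN devices fall back to the widest kernel available
    }
    return table[idx];
}
#endif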
