vulkan: fuse snake activation (mul, sin, sqr, mul, add)#22855
Open
ServeurpersoCom wants to merge 4 commits into
Conversation
Add snake.comp shader with F32 / F16 / BF16 pipelines and ggml_vk_snake_dispatch_fused. The matcher recognizes the naive 5-op decomposition emitted by audio decoders (BigVGAN, Vocos) for the snake activation y = x + sin(a*x)^2 * inv_b and rewrites it to a single elementwise kernel. test_snake_fuse from the CUDA PR now also compares the CPU naive path against the Vulkan fused path across F32 / F16 / BF16.
jeffbolznv
reviewed
May 8, 2026
Rename T / C to ne0 / ne1 in the shader and push constants to match the standard naming convention used across the Vulkan backend. Tighten ggml_vk_can_fuse_snake: require x and dst to be contiguous (the shader uses idx = i0 + i1 * ne0) and require a / inv_b to be tightly packed on the broadcast dim (the shader reads data_a[i1]).
jeffbolznv
reviewed
May 9, 2026
Contributor
|
I noticed that the unit tests only run for F32, probably because one of the ops isn't supported for the other types. I guess that's fine; the change LGTM.
jeffbolznv
approved these changes
May 9, 2026
Contributor
Author
Confirmed on my side too. I saw the gate before coding the kernel and chose to keep the shader symmetric with the CUDA path (F32 / F16 / BF16) so the matcher and the shader are ready as soon as supports_op lifts SIN and SQR beyond F32. I'll look into that in a follow-up PR. Thanks for the thorough review!
Overview
Vulkan version of the snake activation fusion. Symmetric counterpart of #22667 (CUDA): same matcher (mul, sin, sqr, mul, add rewritten to y = x + sin(a*x)^2 * inv_b), same broadcast contract (a / inv_b shaped [1, C] over x [T, C]), same F32 / F16 / BF16 coverage.
The shader uses a native 2D dispatch via gl_GlobalInvocationID.x/y so the c = idx / T resolution that needs fastdiv on CUDA is free here. Otherwise the design is one-to-one with the CUDA path.
Validation
test_snake_fuse from the CUDA PR is backend-agnostic and now also covers Vulkan: it builds the 5-op chain a frontend emits and compares the CPU naive path against the Vulkan fused path via run_whole_graph(), so a pass implies the rewrite preserves the math.
NMSE tolerances unchanged: 5e-3 for BF16, 5e-5 for F16, 1e-7 for F32.
Requirements