
vulkan: fuse snake activation (mul, sin, sqr, mul, add) #22855

Open
ServeurpersoCom wants to merge 4 commits into ggml-org:master from ServeurpersoCom:ggml/vulkan-snake-fusion

Conversation

@ServeurpersoCom (Contributor)

Overview

Vulkan version of the snake activation fusion, the symmetric counterpart of #22667 (CUDA): same matcher (mul, sin, sqr, mul, add rewritten to y = x + sin(a*x)^2 * inv_b), same broadcast contract (a and inv_b shaped [1, C], broadcast over x shaped [T, C]), and same F32 / F16 / BF16 coverage.
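
For reference, the unfused chain as a frontend emits it through the ggml API looks roughly like this (a sketch, not the verbatim test code; the helper name build_snake_chain is hypothetical):

```cpp
#include "ggml.h"

// builds the naive 5-op decomposition the matcher recognizes:
// y = x + sin(a*x)^2 * inv_b, with a / inv_b broadcast over x
static ggml_tensor * build_snake_chain(ggml_context * ctx, ggml_tensor * x,
                                       ggml_tensor * a, ggml_tensor * inv_b) {
    ggml_tensor * ax   = ggml_mul(ctx, x,  a);      // mul: a*x
    ggml_tensor * s    = ggml_sin(ctx, ax);         // sin: sin(a*x)
    ggml_tensor * s2   = ggml_sqr(ctx, s);          // sqr: sin(a*x)^2
    ggml_tensor * term = ggml_mul(ctx, s2, inv_b);  // mul: sin(a*x)^2 * inv_b
    return ggml_add(ctx, x, term);                  // add: x + sin(a*x)^2 * inv_b
}
```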

The shader uses a native 2D dispatch via gl_GlobalInvocationID.x/y, so the c = idx / T resolution that requires fastdiv on CUDA comes for free here. Otherwise the design is one-to-one with the CUDA path.
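
A scalar CPU reference of what the fused kernel computes, using the same idx = i0 + i1 * ne0 layout as the shader (a sketch for illustration; the function name is hypothetical, and i1 / i0 play the roles of gl_GlobalInvocationID.y / .x):

```cpp
#include <cmath>
#include <cstdint>

// CPU reference of the fused snake op: x and y are [ne0, ne1] contiguous,
// a / inv_b are [1, ne1] and indexed by the channel i1 only
static void snake_ref_f32(const float * x, const float * a, const float * inv_b,
                          float * y, int64_t ne0, int64_t ne1) {
    for (int64_t i1 = 0; i1 < ne1; ++i1) {      // channel (shader: .y)
        for (int64_t i0 = 0; i0 < ne0; ++i0) {  // time    (shader: .x)
            const int64_t idx = i0 + i1*ne0;
            const float s = std::sin(a[i1]*x[idx]);
            y[idx] = x[idx] + s*s*inv_b[i1];
        }
    }
}
```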

Validation

test_snake_fuse from the CUDA PR is backend-agnostic and now also covers Vulkan: it builds the 5-op chain a frontend emits and compares the naive CPU path against the Vulkan fused path via run_whole_graph(), so a pass implies the rewrite preserves the math.
NMSE tolerances are unchanged: 1e-7 for F32, 5e-5 for F16, 5e-3 for BF16.
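
For context, NMSE here is the usual normalized squared error between fused output and reference (a sketch of the metric, not the exact harness code):

```cpp
#include <cstdint>

// NMSE = sum((out - ref)^2) / sum(ref^2), accumulated in double so the
// metric itself stays well below the tolerances being checked
static double nmse(const float * ref, const float * out, int64_t n) {
    double num = 0.0, den = 0.0;
    for (int64_t i = 0; i < n; ++i) {
        const double d = (double) out[i] - (double) ref[i];
        num += d*d;
        den += (double) ref[i] * (double) ref[i];
    }
    return den > 0.0 ? num/den : 0.0;
}
```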

Requirements

Add a snake.comp shader with F32 / F16 / BF16 pipelines and ggml_vk_snake_dispatch_fused. The matcher recognizes the naive 5-op decomposition emitted by audio decoders (BigVGAN, Vocos) for the snake activation y = x + sin(a*x)^2 * inv_b and rewrites it to a single elementwise kernel.

test_snake_fuse from the CUDA PR now also compares the naive CPU path against the Vulkan fused path across F32 / F16 / BF16.
@ServeurpersoCom requested a review from a team as a code owner May 8, 2026 21:41
github-actions bot added the Vulkan (issues specific to the Vulkan backend) and ggml (changes relating to the ggml tensor library for machine learning) labels May 8, 2026
Comment thread on ggml/src/ggml-vulkan/vulkan-shaders/snake.comp (outdated):
Rename T / C to ne0 / ne1 in the shader and push constants to match the standard naming convention used across the Vulkan backend.

Tighten ggml_vk_can_fuse_snake: require x and dst to be contiguous (the shader uses idx = i0 + i1 * ne0) and require a / inv_b to be tightly packed on the broadcast dim (the shader reads data_a[i1]).
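
In code, the tightened guard amounts to roughly the following (a sketch; the helper name snake_layout_ok is hypothetical, and the real check in ggml-vulkan.cpp also validates ops, types, and shapes):

```cpp
#include "ggml.h"

// layout preconditions for the fused snake dispatch (sketch)
static bool snake_layout_ok(const ggml_tensor * x, const ggml_tensor * a,
                            const ggml_tensor * inv_b, const ggml_tensor * dst) {
    // idx = i0 + i1 * ne0 assumes a dense row-major layout for x and dst
    if (!ggml_is_contiguous(x) || !ggml_is_contiguous(dst)) {
        return false;
    }
    // data_a[i1] / data_b[i1] assume the broadcast dim is tightly packed:
    // with ne0 == 1, that means nb1 must equal the element size
    if (a->nb[1]     != ggml_type_size(a->type) ||
        inv_b->nb[1] != ggml_type_size(inv_b->type)) {
        return false;
    }
    return true;
}
```
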
@ServeurpersoCom (Contributor, Author)

+-----------------+----------+-----------+---------+
| Config          | Mean     | Diff (ms) | Speedup |
+-----------------+----------+-----------+---------+
| CUDA NOFUSION   | 616.9 ms |           |         |
| CUDA FUSION     | 432.2 ms | -184.7    | +42.7%  |
| Vulkan NOFUSION | 950.9 ms |           |         |
| Vulkan FUSION   | 761.7 ms | -189.2    | +24.8%  |
+-----------------+----------+-----------+---------+
NOFUSION rows measured with GGML_CUDA_DISABLE_FUSION / GGML_VK_DISABLE_FUSION set.
Non-regression checked end-to-end on BigVGAN; output drift is within BF16 round-off.

Comment threads (3) on ggml/src/ggml-vulkan/ggml-vulkan.cpp
@jeffbolznv (Contributor)

I noticed that the unit tests only run for F32, probably because one of the ops isn't supported for the other types. I guess that's fine; the change LGTM.

@ServeurpersoCom (Contributor, Author)

> I noticed that the unit tests only run for F32, probably because one of the ops isn't supported for the other types. I guess that's fine; the change LGTM.

Confirmed on my side too. I saw the gate before coding the kernel and chose to keep the shader symmetric with the CUDA path (F32 / F16 / BF16), so the matcher and the shader are ready as soon as supports_op allows SIN and SQR beyond F32. I'll look into that in a follow-up PR.
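
For anyone following along, the gate in question is the Vulkan backend's supports_op check, which currently accepts SIN / SQR only for F32 inputs; schematically (an assumption about its shape, not the verbatim source):

```cpp
#include "ggml.h"

// schematic version of the current gate; the real switch in ggml-vulkan.cpp
// covers many more ops (the helper name here is hypothetical)
static bool vk_supports_sin_sqr(const ggml_tensor * op) {
    switch (op->op) {
        case GGML_OP_SIN:
        case GGML_OP_SQR:
            return op->src[0]->type == GGML_TYPE_F32; // F32-only today
        default:
            return false;
    }
}
```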

Thanks for the thorough review!

@ggerganov requested a review from a team May 15, 2026 05:53
