vulkan: fuse snake activation (mul, sin, sqr, mul, add)#22855
Open
ServeurpersoCom wants to merge 4 commits into
Conversation
Add snake.comp shader with F32 / F16 / BF16 pipelines and ggml_vk_snake_dispatch_fused. The matcher recognizes the naive 5-op decomposition emitted by audio decoders (BigVGAN, Vocos) for the snake activation y = x + sin(a*x)^2 * inv_b and rewrites it to a single elementwise kernel. test_snake_fuse from the CUDA PR now also compares the CPU naive path against the Vulkan fused path across F32 / F16 / BF16.
jeffbolznv
reviewed
May 8, 2026
Rename T / C to ne0 / ne1 in the shader and push constants to match the standard naming convention used across the Vulkan backend. Tighten ggml_vk_can_fuse_snake: require x and dst to be contiguous (the shader uses idx = i0 + i1 * ne0) and require a / inv_b to be tightly packed on the broadcast dim (the shader reads data_a[i1]).
jeffbolznv
reviewed
May 9, 2026
Contributor
|
I noticed that the unit tests only run for F32, probably because one of the ops isn't supported for the other types. I guess that's fine; the change LGTM.
jeffbolznv
approved these changes
May 9, 2026
Contributor
Author
Confirmed on my side too. I saw the gate before coding the kernel and chose to keep the shader symmetric with the CUDA path (F32 / F16 / BF16) so the matcher and the shader are ready as soon as supports_op lifts SIN and SQR beyond F32. I'll look into that in a follow-up PR. Thanks for the thorough review!
Overview
Vulkan version of the snake activation fusion. Symmetric counterpart of #22667 (CUDA): same matcher (mul, sin, sqr, mul, add rewritten to y = x + sin(a*x)^2 * inv_b), same broadcast contract (a / inv_b shaped [1, C] over x [T, C]), same F32 / F16 / BF16 coverage.
The shader uses a native 2D dispatch via gl_GlobalInvocationID.x/y so the c = idx / T resolution that needs fastdiv on CUDA is free here. Otherwise the design is one-to-one with the CUDA path.
Validation
test_snake_fuse from the CUDA PR is backend-agnostic and now also covers Vulkan: it builds the 5-op chain a frontend emits and compares the CPU naive path against the Vulkan fused path via run_whole_graph(), so a pass implies the rewrite preserves the math.
NMSE tolerances unchanged: 5e-3 for BF16, 5e-5 for F16, 1e-7 for F32.
Requirements