
Conversation

@0cc4m (Collaborator) commented Nov 18, 2025

Fixes #17297

To use subgroup operations on Intel, we have to force full subgroups; otherwise the subgroup size, and how many of its threads are actually active, may vary. I don't think this change should have any negative effect on other drivers.

@jeffbolznv (Collaborator)

While I agree this is worth a try, I don't understand why the failing change actually needs it. It doesn't assume a specific subgroup size or mapping of invocations to subgroups.

@0cc4m (Collaborator, Author) commented Nov 18, 2025

I don't fully understand it, but I think if VK_PIPELINE_SHADER_STAGE_CREATE_REQUIRE_FULL_SUBGROUPS_BIT is not set, a Vulkan driver may disable specific threads of a subgroup for performance reasons, which leads to undefined behaviour when subgroup operations are used.

I can't find much information about this in the specification, it's mostly experience from dealing with Intel GPUs which do vary their subgroup size. I guess someone would need to dig deeper into the ANV driver to figure out what exactly it is doing.


Labels

- ggml — changes relating to the ggml tensor library for machine learning
- Vulkan — issues specific to the Vulkan backend


Development

Successfully merging this pull request may close these issues.

Misc. bug: Vulkan\Llama-server.exe (b7064+) hangs during prompt processing if "--flash-attn on"
