Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add subgroups proposal #4368
Add subgroups proposal #4368
Changes from all commits
edb05dc
72ea9d2
15a1ef5
a89ad23
590bb0d
70075ed
77d43bb
c665322
5e45780
145392c
161220e
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like these percentages are fractions of unique reports.
It would be more meaningful to weight by shipped (and active) units. But this data is hard to get. (As a committee we should decide on a set of representative popular devices. But that goes way beyond this issue.)
I think generally the guidance works out ok though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need an additional
subgroups-f16
feature? The existingf16
feature already requires SM6.2 and almost all devices that support the proposedsubgroups
functionality + thef16
feature also supportshaderSubgroupExtendedTypes
(see reports analysis).There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some numbers for Vulkan (pulled from the analysis):
subgroup
featuref16
f16
andsubgroup
100% being core WebGPU functionality.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's interesting data. I'd prefer not to cleave the feature. Can you clarify what the f16 feature is? Does it represent the set of features required to enable f16 in WGSL? The other features seem to mirror Vulkan features.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added this as a todo for now (and linked to your report).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code for the
f16
features is here.This is the whole commit: teoxoy/gpuinfo-vulkan-query@8681e00 (includes the src and the analysis).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok so it is all the features needed to implement WGSL f16. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to this report analysis, splitting off the arithmetic ops looks the most promising, but considering the devices we get back are only Gen7 (Ivy Bridge, Haswell) intel it might not be worth it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens when subgroup_invocation_id +- delta goes OOB?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An indeterminate value is returned. I was trying to avoid all the fiddly cases in the proposal that will have to be addressed in the final spec.
MSL says lower invocations are unmodified, but SPIR-V says undefined value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually MSL has a variant of shuffles which fill-in a defined value in the OOB cases. But leaving it as UB is fine as well IMO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe I'm misunderstanding this, but given that at least one invocation would end up with an indeterminate value, wouldn't the developer always have to end up doing some comparison against
subgroup_invocation_id
to ensure the indeterminate value doesn't cause trouble?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a definition of how invocations are grouped into quads (how it maps to subgroup_invocation_id)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Vulkan explicitly requires that quads are consequence subgroup invocations. See https://registry.khronos.org/vulkan/specs/1.3-extensions/html/vkspec.html#shaders-scope-quad. MSL provides a quad id. D3D12 says they are evenly divided in the subgroup (so I assume just like vulkan).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we are adding subgroup operations to fragment shaders yet right? So are there helper invocations?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah it seems we are, but i thought these were even newer functionality than compute subgroups, are we looking to add them anyway? It seems it could lose some reach.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to gpuinfo.org very few devices don't support fragment shaders.
Because general portability seems to be off the table based on empirical testing, it seemed beneficial to allow them in fragment shaders.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Requiring
subgroupSupportedStages
to include the fragment stage seems to drop support for the Adreno 600 series (see reports analysis).This is probably fine given that D3D12 and Metal support subgroup ops in the fragment stage unconditionally.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This does feel somewhat counter to all the efforts we've put in for portability. This is certainly an implementor's decision, but I would feel more comfortable making the extension unavailable for known drivers that perform the
aggressive opimizations that do not preserve the correct active invocations
. At least then there's incentive for these things to be fixed, instead of forever worked-around.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a todo to see how much more portable we can make individual functions.