Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

wgsl: array size may be module-scope constant (possibly overridable) #1792

Closed

Conversation

dneto0
Copy link
Contributor

@dneto0 dneto0 commented Jun 1, 2021

It's still an i32; it was only ever an INT_LITERAL in the first place.

Fixed: #1431

wgsl/index.bs Outdated
array_type_decl
| attribute_list* ARRAY LESS_THAN type_decl (COMMA array_size_expr)? GREATER_THAN

array_size_expr
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This constrains the expression by grammar.
I could have used literal_or_ident which is defined in the "Attributes" section, and then constrained with a validation rule. I could go either way.

@github-actions
Copy link
Contributor

github-actions bot commented Jun 1, 2021

Previews, as seen when this build job started (5476593):
WebGPU | IDL
WGSL
Explainer

wgsl/index.bs Outdated Show resolved Hide resolved
@github-actions
Copy link
Contributor

github-actions bot commented Jun 2, 2021

Previews, as seen when this build job started (88153ac):
WebGPU | IDL
WGSL
Explainer

Copy link
Contributor

@kvark kvark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dneto0 one thing I don't understand, maybe you can clarify it. So this is supported in SPIR-V, right? If I have a structure with such an array, what would be the offset declarations on the fields after this array? They can't be known without a concrete value for a constant, so how would SPIR-V decorate them?

@dneto0 dneto0 added a11y-needs-resolution Issue the Accessibility Group has raised and looks for a response on. wgsl WebGPU Shading Language Issues labels Jun 3, 2021
@dneto0 dneto0 added this to the MVP milestone Jun 3, 2021
@dneto0 dneto0 added this to Open PRs in WGSL Jun 3, 2021
@plehegar
Copy link

plehegar commented Jun 3, 2021

is this an accessibility issue that needs particular attention from the Accessibility Group (APA) ? a11y-needs-resolution is reserved for accessibility issues where APA expects it to be resolved to their satisfaction before a transition. If this issue is related to accessibility, please use a11y-tracker instead and let the APA Group decides on the importance of the issue.

@kvark kvark removed the a11y-needs-resolution Issue the Accessibility Group has raised and looks for a response on. label Jun 3, 2021
@dneto0
Copy link
Contributor Author

dneto0 commented Jun 3, 2021

So this is supported in SPIR-V, right? If I have a structure with such an array, what would be the offset declarations on the fields after this array? They can't be known without a concrete value for a constant, so how would SPIR-V decorate them?

Yes, this is supported in SPIR-V directly. SPIR-V allows

Length must come from a constant instruction of an integer-type scalar whose value is at least 1.

And "constant instruction" includes spec constants.

Yes, offsets are literals, and members can't overlap. So the offset of the field after such an array must be high enough so that it accommodates any array size that is actually used.
This is checked at pipeline creation time.

@dneto0
Copy link
Contributor Author

dneto0 commented Jun 3, 2021

(In fact SPIR-V allows OpSpecConstantOp, which is basically const_expr, which is #1272 and is a post-MVP item.

@w3cbot w3cbot added the a11y-needs-resolution Issue the Accessibility Group has raised and looks for a response on. label Jun 4, 2021
Copy link
Contributor

@kvark kvark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dneto0 oh wow, let's talk about it some more (maybe on the call?). I'm feeling concerned about this now.

So you are saying that SPIR-V offsets are literal, and they have to take into account all possible (valid) values for such a specialization constant here. To us, it means that SPIR-V can't be automatically generated from WGSL any more. Even though we can specialize the array size in SPIR-V, we can't possibly know what to put into the offset decorations of any fields that follow.

I suppose the same applies to Metal. The long discussion of #1064 was aimed to produce an MTLLibrary at shader module creation. But it's not possible if any of the array sizes are specialized, if I understand correctly. So having this feature hinders our ability to do useful work at shader module creation (at least on SPIR-V and Metal).

@dneto0 dneto0 removed the a11y-needs-resolution Issue the Accessibility Group has raised and looks for a response on. label Jun 4, 2021
@dneto0
Copy link
Contributor Author

dneto0 commented Jun 4, 2021

To us, it means that SPIR-V can't be automatically generated from WGSL any more. Even though we can specialize the array size in SPIR-V, we can't possibly know what to put into the offset decorations of any fields that follow.

No. We can do the following:

  • Default to the value of the initializer of the overridable constant. Or use 1 if no initializer.
  • Otherwise, the shader author can put a size attribute on the structure field containing the array.
  • If it's a nested array inside another, the author can adjust the strides (on the outer array) for the max case.

The same issue arises in GLSL without a fundamental problem.

Example:

#version 450

layout(constant_id = 0) const uint N = 10;
layout(set=0,binding=0) buffer B { float data[N]; uint after; } buf;

void main() {
  buf.data[1] = 42;
}

Produces this SPIR-V:

               OpCapability Shader
          %1 = OpExtInstImport "GLSL.std.450"
               OpMemoryModel Logical GLSL450
               OpEntryPoint GLCompute %main "main"
               OpExecutionMode %main LocalSize 1 1 1
               OpSource GLSL 450
               OpSourceExtension "GL_GOOGLE_cpp_style_line_directive"
               OpSourceExtension "GL_GOOGLE_include_directive"
               OpName %main "main"
               OpName %N "N"
               OpName %B "B"
               OpMemberName %B 0 "data"
               OpMemberName %B 1 "after"
               OpName %buf "buf"
               OpDecorate %N SpecId 0
               OpDecorate %_arr_float_N ArrayStride 4
               OpMemberDecorate %B 0 Offset 0
               OpMemberDecorate %B 1 Offset 40
               OpDecorate %B BufferBlock
               OpDecorate %buf DescriptorSet 0
               OpDecorate %buf Binding 0
       %void = OpTypeVoid
          %3 = OpTypeFunction %void
      %float = OpTypeFloat 32
       %uint = OpTypeInt 32 0
          %N = OpSpecConstant %uint 10
%_arr_float_N = OpTypeArray %float %N
          %B = OpTypeStruct %_arr_float_N %uint
%_ptr_Uniform_B = OpTypePointer Uniform %B
        %buf = OpVariable %_ptr_Uniform_B Uniform
        %int = OpTypeInt 32 1
      %int_0 = OpConstant %int 0
      %int_1 = OpConstant %int 1
   %float_42 = OpConstant %float 42
%_ptr_Uniform_float = OpTypePointer Uniform %float
       %main = OpFunction %void None %3
          %5 = OpLabel
         %18 = OpAccessChain %_ptr_Uniform_float %buf %int_0 %int_1
               OpStore %18 %float_42
               OpReturn
               OpFunctionEnd

In this case the compiler (Glslang) assumed an element count of 10, so the offset of the subsequent field was 40.

@kvark
Copy link
Contributor

kvark commented Jun 4, 2021

No. We can do the following:

All of these require changes to either the syntax, or the semantics of WGSL. So it's something we should consider in strong relation with this PR. We can't land this without having a solution either available, or as a part of this PR.

In this case the compiler (Glslang) assumed an element count of 10, so the offset of the subsequent field was 40.

I don't understand how this is supposed to work. So if I instantiate this entry point with the value of 11, then what happens? Will the pipeline creation fail? or am I going into UB territory with this?

@dneto0
Copy link
Contributor Author

dneto0 commented Jun 4, 2021

I don't understand how this is supposed to work. So if I instantiate this entry point with the value of 11, then what happens? Will the pipeline creation fail? or am I going into UB territory with this?

Pipeline creation fails.

We can keep things simple follow Glslang's lead: for a overridable constant that's used in this way, it must have an initializer, and assumes that value is the element count for the purpose of sizing in host-shareable. Then the pipeline check is "between 1 and the initializer value".

@dneto0
Copy link
Contributor Author

dneto0 commented Jun 4, 2021

So it's something we should consider in strong relation with this PR. We can't land this without having a solution either available, or as a part of this PR.

Agreed. Thanks for the important feedback.

@kvark
Copy link
Contributor

kvark commented Jun 4, 2021

We can keep things simple follow Glslang's lead: for a overridable constant that's used in this way, it must have an initializer, and assumes that value is the element count for the purpose of sizing in host-shareable. Then the pipeline check is "between 1 and the initializer value".

That sounds good to me, not seeing any obvious issues with it right now.
Thanks for moving this forward!
We should bring this to the group.

@dneto0
Copy link
Contributor Author

dneto0 commented Jun 4, 2021

This is the GLSL-for-Vulkan rule:

https://github.com/KhronosGroup/GLSL/blob/master/extensions/khr/GL_KHR_vulkan_glsl.txt#L1203

Arrays inside a block may be sized with a specialization constant, but the block will have a static layout. Changing the specialized size will not re-layout the block. In the absence of explicit offsets, the layout will be based on the default size of the array.

So it's not just Glslang making it up. :-)

@kvark
Copy link
Contributor

kvark commented Jun 4, 2021

Hold on. If the specialization doesn't change any layout... what is the point in specializing the array sizes at all?
Wouldn't all the same use-cases be served by just declaring the array with the maximum size (which is the default value for the specialization in the current/future proposal)?

@dneto0
Copy link
Contributor Author

dneto0 commented Jun 4, 2021

The first use case for this is for sizing arrays in workgroup storage. E.g. one element per invocation in the workgroup.
And if you want to have this in storage buffer, then the easiest thing to do is to put that flexibly-sized thing at the end of the block (struct). Then there's no subsequent field and you're done.

When it says "doesn't change layout" what it means is it doesn't change the offsets. (So choose a big offset to start with).

@dneto0
Copy link
Contributor Author

dneto0 commented Jul 14, 2021

I'm currently rebasing this change and incorporating the effect of #1135

- rename "array size" in many places to the more specific term
  "element count"
- element count may be an unsigned integer literal, as per gpuweb#1135
- describe array type matching in the positive sense, and as an
  if-and-only-if set of rules.
- update example to show array size with unsigned integer literal.
- Reword array layout to make "element stride" a defined term, and
  separately write out how its value is determined.
@dneto0 dneto0 force-pushed the issue-1431-specializable-array-size branch from 30ce338 to f9b4596 Compare July 14, 2021 21:07
@github-actions
Copy link
Contributor

Previews, as seen when this build job started (f9b4596):
WebGPU | IDL
WGSL
Explainer

@dneto0
Copy link
Contributor Author

dneto0 commented Jul 14, 2021

I've rebased, and added a commit that does:

PTAL

Copy link
Contributor

@kvark kvark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks reasonable, if we want to land it. We still need to get feedback from Apple.

| attribute_list* ARRAY LESS_THAN type_decl (COMMA element_count_expression)? GREATER_THAN

element_count_expression
: INT_LITERAL
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this be our constexpr thing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That depends on where #1272 goes. We can refactor rules later if it converges that way.

* They have the same element type.
* Their element count specifications match, i.e. any of the following are true:
* They are both runtime-sized.
* They are both fixed-sized with element counts |N1| and |N2| such that one of the following is true:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was a particular design choice which I'm not 100% sold on, so I wanted to highlight it.

The example below shows cases where certain types are forced to be different by these rules.

But this also means that we lose some ability to 'substitute equals for equals'.
E.g. before:

             let a : array<i32,4> = array<i32,4>();  / / zero-init

after:

             let size = 4;
             let a : array<i32,size> = array<i32,4>();  / /  fails type-check

An alternative formulation would allow size-specifiers to match if they are:
- Case 1: literals or non-overridable module-scope constants with the same value
- Case 2: the name of the same overridable constant

We definitely don't want to allow different overridable constants to match, because they can be overridden to different values. The type check should depend only on the source text.

On the whole, I decided to go with the more restrictive formulation. We can relax the rules later. We can't go in the other direction.

@litherum
Copy link
Contributor

litherum commented Jul 20, 2021

After discussing internally, we are against the ability to make array sizes overridable, for a few reasons:

  1. Neither HLSL for MSL have support for this language feature. GLSL only gained support for this language feature starting in the latest version, version 4.6. This is a pretty clear signal that developers aren't generally using this feature.
  2. Adding this feature into the API/language would lead authors astray. Instead of using this feature, an author could reorganize their data layout to use runtime-sized arrays instead of overridden-sized arrays, and doing this would lead them to consistent performance on all devices. If we accept this proposal, authors will be tempted to use this instead of runtime-sized arrays, and thereby hit a performance cliff on one (or two?) of the backend APIs. (And it isn't reasonable to expect an implementation to automatically be able to reorganize the data to use runtime sized arrays under the hood.)
  3. From discussions with the Metal team, this feature is philosophically contrary to, and unimplementable on, MSL. C++ expects that the compiler can determine the sizeof all types at compilation time, and this expectation is embedded into the core of the C++ language itself.

@alan-baker
Copy link
Contributor

alan-baker commented Jul 20, 2021

After discussing internally, we are against the ability to make array sizes overridable, for a few reasons:

  1. Neither HLSL for MSL have support for this language feature. GLSL only gained support for this language feature starting in the latest version, version 4.6. This is a pretty clear signal that developers aren't generally using this feature.
  2. Adding this feature into the API/language would lead authors astray. Instead of using this feature, an author could reorganize their data layout to use runtime-sized arrays instead of overridden-sized arrays, and doing this would lead them to consistent performance on all devices. If we accept this proposal, authors will be tempted to use this instead of runtime-sized arrays, and thereby hit a performance cliff on one (or two?) of the backend APIs. (And it isn't reasonable to expect an implementation to automatically be able to reorganize the data to use runtime sized arrays under the hood.)
  3. From discussions with the Metal team, this feature is philosophically contrary to, and unimplementable on, MSL. C++ expects that the compiler can determine the sizeof all types at compilation time, and this expectation is embedded into the core of the C++ language itself.

The problem with 2 is that this feature is mainly for workgroup storage where the runtime sized arrays are disallowed. We would be happy to limit this feature to prevent it being used in uniform or storage storage classes if there is a clean way.

For 3, I thought Metal allowed an entry point parameter of pointer to local, would that not work to some extent?

@litherum
Copy link
Contributor

litherum commented Jul 20, 2021

The following is unrepresentable in MSL:

const constant size_t chunkSize [[function_constant(0)]];

struct MyStruct {
    float chunk[chunkSize];
    int extraData;
};

Because the sizeof(MyStruct) would be unknown. So, the particular address space isn't important here.

@litherum
Copy link
Contributor

this feature is mainly for workgroup storage

Right, Metal has different facilities for dynamic workgroup storage: https://developer.apple.com/documentation/metal/mtlrenderpassdescriptor/2866527-threadgroupmemorylength

Coming up with a feature that is specifically for solving this more narrow problem would be great!

@kainino0x
Copy link
Contributor

https://developer.apple.com/documentation/metal/mtlrenderpassdescriptor/2866527-threadgroupmemorylength

This appears to be for render pass tile memory (and only in the very latest version of Metal). Is there an equivalent for workgroup storage?

@litherum
Copy link
Contributor

https://developer.apple.com/documentation/metal/mtlcomputecommandencoder/1443142-setthreadgroupmemorylength

@kvark
Copy link
Contributor

kvark commented Jul 20, 2021

So basically, if this proposal is rejected, we need a way for users to control the size of threadgroup memory.
Something like var<workgroup> usizedWorkgroupData: array<vec4<f32>>;, but with a way to communicate the size of this binding from the host side:

  • Vulkan shader side: declare this as an array with specialized length
  • Vulkan host side: set the specialized constant
  • Metal shader side: declare a raw pointer to threadgroup data as an argument to the compute entry point
  • Metal host side: set setThreadgroupMemoryLength value

Edit: fixed array<vec4<f32, 4>> to array<vec4<f32>>

@kdashg
Copy link
Contributor

kdashg commented Jul 27, 2021

WGSL meeting minutes 2021-07-27
  • AB: What we want to turn this into, to support this on Metal, is to allow a single outermost level of array in workgroup storage that can be sized by a specialization constant. This can be compiled to SPIR-V and Metal. They’re pretty much the exact same.
  • AB: The tooling shouldn’t assign a number under the hood. We need to determine what is associated with what global. If we
  • MM: That approach isn’t unreasonable. Think structural features should be imposed structurally. Unfortunate to have this limited by a bunch of prose. Think this feature is sufficiently distinct from spec constants. Would like a different grammar or type to represent this. Just because it seems kind of silly to have something in this spec that can only be used in a corner case that is surprising and non-obvious.
  • AB: Sympathetic to that argument, but don’t want a lot of different array types. There’s a tension there.
  • MM: Can write down some ideas.
  • AB: Important thing here is that we should have a feature that functions this way (non-statically sized workgroup storage array). How it’s spelled is a second step.
  • MM: +1 to that. Want one language feature that maps to both Vulkan and Metal. How does an HLSL author write code that exercises this feature? Is it possible?
  • GR: Don’t know.
  • MM: If it’s possible in D3D, then the WGSL feature should map to that too.
  • AB: As I recall, it’s an #ifdef that is then late-compiled.
  • MM: I plan to talk to Greg about this offline, and come back with examples/proposals.
  • GR: Heard back: there is no such D3D feature.
  • MM: Ok, then will limit viable targets to match Vulkan and Metal.

@kdashg kdashg moved this from Needs Discussion to Needs Action in WGSL Jul 30, 2021
@litherum
Copy link
Contributor

litherum commented Aug 8, 2021

I've split out the dynamically-sized threadgroup memory feature to #2024, so we can discuss paths forward on that in its own place, rather than on this PR.

@kdashg
Copy link
Contributor

kdashg commented Aug 10, 2021

WGSL meeting minutes 2021-08-10
  • (Rebased, and updated to incorporate Buffer indices should be unsigned #1135 (allow array element count to be unsigned), and a bunch of cleanups)
  • Further discussion: allow this only the case where:
    • The array type is the store type for a workgroup variable. (only the outer dimension. (Then can be implemented in MSL as a pointer-to-local entry point param.)
  • [Deferred until this week. Myles didn’t do his homework]
  • Dynamically sized threadgroup memory #2024
  • MM: for 1792, allow non-overridable module scope constant for array sizes
  • DN: well separate overridable part out
  • Resolved module scope constants (non-overridable) are useable for array sizes
  • MM: Don’t want arbitrary overridable arrays due to Metal philosophy (and C++). Want to know the static size whenever possible. Willing to limit to workgroup arrays. Size is known later than shader creation time. How do we expose workgroup arrays sized after shader creation. Option 1 use specialization constants and limit to just outermost workgroup array. Option 2 make a new feature, add attribute to the workgroup variable, specified via the API. Could get the length of an array the same as a runtime sized array.
  • DN: control knob specified after shader creation that affects “stuff”. Seems like specialization constants to me. Hard to think past. Would prefer fewer new features.
  • MM: new use of specialization constants feels distinct from existing use cases.
  • DN: feels parallel to sizing the workgroup. Single knob to control multiple language constructs.
  • MM: workgroup size is not specializable.
  • DN: we changed the spec to allow overridable constants.
  • JP: allow workgroup size components to be pipeline-overrideable constants #1442 made that change.
  • MM: seems compelling. One source of truth for what the workgroup variable is.
  • JP: not always 1:1, but usually a function of it.
  • MM: that seems more in line with option 2
  • DN: worry about the coupling between api and shader
  • MM: Metal exposes one variable
  • JG: should we tackle constexpr sooner?
  • JP: not sure there is a single source of truth with option 2. Still modify API and shader. Works even better with contexpr
  • MM: seems too narrow a view to me. Regard it as part of the binding feature.
  • DN: think it is often solvable just in the shader.

@dneto0
Copy link
Contributor Author

dneto0 commented Aug 10, 2021

Calling it out a decision confirmed today:

Resolved module scope constants (non-overridable) are useable for array sizes

@dneto0
Copy link
Contributor Author

dneto0 commented Aug 10, 2021

In the interest of preserving past decisions, I'll close this PR and open another that allows an element count to be a module-scope constant, but not overridable.

@dneto0 dneto0 closed this Aug 10, 2021
WGSL automation moved this from Needs Action to Done Aug 10, 2021
ben-clayton pushed a commit to ben-clayton/gpuweb that referenced this pull request Sep 6, 2022
This exposes an issue with the ldexp tests that I have documented in
gpuweb/cts#1791

Fixes gpuweb#1785
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
wgsl WebGPU Shading Language Issues
Projects
WGSL
Done
Development

Successfully merging this pull request may close these issues.

allow array size that is a pipeline-overrideable constant
9 participants