WGSL 2020 08 18

Dean Jackson	:	Chair
Dan Sinclair	:	⌨️ Scribe
Google Meet	:	Location
https://webgpu.dev/wgsl	:	Specification
WGSL Issues	:	Open Issues
Marked Issues	:	Meeting Issues

Tentative Agenda

2020-08-11 Action item review:
- Apple: Will look into validating grammar on CI.
- MM: Updated proposal for #854
Valid or invalid: Return statement in the middle of a function body (#996)
snake_case or lowerCamelCase or UpperCamelCase? (#993)
Support #line directive (#606)
Extern declarations (#883)
Function overloading (#876)
Introduce Subgroup Operations Extension (#954)
Move array stride decoration to the struct fields instead of types (#960)
Invariant qualifier (#893)
Support .length() for runtime-sized arrays (#989)
WGSL should require uniform storage class variables to be block decorated. (#1004)
Shader portability (and performance portability). (#895)

📋 Attendance

WIP, the list of all the people invited to the meeting. In bold, the people that have been seen in the meeting:

Apple
- Dean Jackson
- Fil Pizlo
- Myles C. Maxfield
- Robin Morisset
Google
- Dan Sinclair
- David Neto
- Kai Ninomiya
- Ryan Harrison
- Sarah Mashayekhi
Intel
- Yunchao He
- Narifumi Iwamoto
Microsoft
- Damyan Pepper
- Rafael Cintron
- Greg Roth
- Michael Dougherty
- Tex Riddell
Mozilla
- Dzmitry Malyshau
- Jeff Gilbert
Joshua Groves
Mehmet Oguz Derin
Timo de Kort
Lukasz Pasek
Tyler Larson
Pelle Johnsen
Matijs Toonen
Hamada Gasmallah
Dominic Cerisano

⚖️ Discussion

2020-08-11 Action item review

Apple: Will look into validating grammar on CI.
- No update
MM: Updated proposal for #854
- No update

Valid or invalid: Return statement in the middle of a function body ([#996](https://github.com/gpuweb/gpuweb/issues/996))

DS: My preference is to disallow it for now. Easier to not have it.
RM: Just to be clear, return in middle of block.
DS: Yes, basically, if there is a return it must be the last statement in a block.
GR: Usually only seen as bugs
DS: Possibly useful for debugging
DN: You could also do if(true) { return; } as well.
DJ: Ok, disallow for now. Will move to resolved.

snake_case or lowerCamelCase or UpperCamelCase? ([#993](https://github.com/gpuweb/gpuweb/issues/993))

DS: we are currently inconsistent with ourselves and should be consistent.
DS: Just need to work out which we care about?
DN: What is the API side doing, camel case?
DM: Yea
DN: We should do that, what to do about acronyms. E.g. is IBM that or Ibm or something else.
DS: First letter of an acronym is uppercase, rest is lowercase.
DS: What about the first letter in camelCase?
DN: Depends.
DM: We don’t have class names.
DS: Would TextureLoad or textureLoad?
DM: e.g. in Rust we have methods camelCase and variables with underscores. It doesn’t matter.
DS: I don’t care which we pick, as long as it is consistent
DS: camelCase, first letter lower. Acronyms are similar. (i.e. writeToPbo)

Sampling from MSAA (#990)

DM: I suggest we move them to separate types. This is all in the proposal.
DS: Sounds good to me.

Support #line directive ([#606](https://github.com/gpuweb/gpuweb/issues/606))

DS: I think we agreed to use source maps, which means we don’t need this.
Close this issue.

Extern declarations ([#883](https://github.com/gpuweb/gpuweb/issues/883))

DS: Doesn’t seem to be needed for MVP.
DM: If he has a use case required for MVP, he should post it.

Function overloading ([#876](https://github.com/gpuweb/gpuweb/issues/876))

DS: Again, I don’t think this is necessary for MVP. You can use a different name. We don’t have generics.
DS: We might do it internally with the stdlib, but users won’t be able to do it in the MVP.
JG: Ultimate goal is that people can rewrite the stdlib in userspace, but that can come later.
DM: Put a post-MVP label on it.

Introduce Subgroup Operations Extension ([#954](https://github.com/gpuweb/gpuweb/pull/954))

DJ: Was waiting until David returned
DN: Dragons dragons dragons. Dark corners in the execution model. Lots of things from the last two years where this is hard to get portable. Hard to pin down consistent behaviour
DM: Can we document the dragons so we can come back to them later
DN: Tried to explain as good as possible in my comment. If GR has comments, please add them.
GR: Missed that comment, will look it over. Would welcome any help to document the dark corners of this area.
RM: Myles started talking to the Metal team to better understand these corners and we already got some information that confirms it’s full of dragons. He was hoping to get a meeting with them where they could explain in more detail. So, hopefully more information coming in a week or two.
DN: Can also point to work by AMD engineers in LLVM around the convergent attribute and it’s been in the tree ~1.5 years and the engineer driving wants to make changes. Looking for an LLVM dev meeting talk, but can’t find it. Will keep looking. Ongoing within the LLVM community on how to handle aspects in the compiler stack.
MD: Understand the concerns, but are we making these based on the PR operations or the broader set? There are tricky ops, but the exposed ones are small.
DN: Haven’t look at the proposal in detail, would have to look.
MD: Would suggest looking at the PR. Less controversial for the stdlib.
GR: Will take a look. Quickly, are helper lanes involved?
MD: Should I list the ops?
GR: That’s all right.
DN: If not in a fragment shader, then less problematic.
GR: That’s true
DN: Limited to compute would be better.
MD: Discussions around invocations and some ops are less defined between frag and compute. Current extension restricted to compute only. Only exposes ops you can’t make arbitrary indexing like give me a mask or index but implicit only.
GR: So, limited to compute
MD: Yes.
GR: That reduces my disagreement by some sizable percentage.
MD: Maybe wait for Metal team, and take a look at exposed ops.
GR: Will take a look.
DJ: Let’s move this back to discussion and wait for more feedback from Apple and others.

Move array stride decoration to the struct fields instead of types ([#960](https://github.com/gpuweb/gpuweb/issues/960))

DJ: Got to the point we were trying to work out DMs comment on if we can restrict to not require padding
DM: Do we need explicit stride at all. Confusion is if the stride has to be a multiple of the actual stride or anything above the sum of sizes
DN: Must be a multiple of the storage size of the element
DM: Is there a valid use case where the user wants to be a multiple but not a single element.
DN: For example, an array of uint in a UBO the uint is 4 bytes because it’s a uniform the stride is 16. So, 4 bytes and 12 bytes padding because that's what underlying api gives for UBO. Imposed by system, and not use. In order to format data for buffer need to know the stride.
JG: Brings back to we’re weirdly asking the user to prove their knowledge so they don’t get this wrong. So, we might have the stride even if there is only 1 or 2 right answers.
DN: Usually 1 natural right answer given the constraints. Could limit the valid answers to that natural one. Don’t think itj simplifies implementations as you need to inject padding. Need to compute the amount of padding. Would have to write tests for invalid cases as well.
DM: So, same vein as explicit offsets
DN: Yup
JG: Did we write non-normative text on why we did this weird thing?
DN: Not yet
JG: Should consider doing that.
DN: One of the next things for the spec is these size constraints and the io size counting as that defines the interface. Needed for CTS. Next low-hanging fruit to go after so will add that note.
DM: Same vein as offsets, but specified in a different place. Slightly confusing. Can you confirm if you want different strides to be assignable to each other. You mentioned you wanted that to be allowed.
DN: Yes. That would have to be written, can go either way. Don’t think it’s common to load an entire array. Could ban loing the entire array if we want. Text needs to be written and reviewed.
JG: Can you do that in SPIR-V or DXIL?
DN: You can load it all at once, yes. Don’t go loading a 1k element array, will probably break on some driver.
JG: Little surprised, it would seem these assignments of structs in one go, certainly of arrays is surprising. Structs not so much. Ifwe push that back and do it later, for now load in a loop. Then, no stride problems with assignments.
DN: Maybe that’s the right way to do it is to ban the assignment of arrays.
JG: From that point there are 2 directions. Either towards putting the annotation on the struct field and it’s never part of the type, then you can’t go back and do assignable arrays or you go the other way and it has to be an array . Could polyfilll on the backend for assigning arrays of strides.
DN: There was a SPIR-V 1.4 instruction to do logical copy of data from 1 thing to another with layout differences. Does happen, but not sure if motivation was structs or arrays.
JG: Interesting.
DJ: So does this mean we require the decoration and waiting on David to describe how to do this and the other IO/padding/alignment, etc
DN: Yes. We still need the stride decoration , I think it needs to be on the array type due to multi-dimensional arrays being weird. Can review again when more text is written.

Invariant qualifier ([#893](https://github.com/gpuweb/gpuweb/issues/893))

DJ: Nearly resolved, deferred decision to this week. Consensus was to have invariant.
DM: I think invariant is a portability concern and in most cases isn’t a perf concern to have a position invariant as most vertex shaders would just do a matrix mult. So, not stressing ALU too much. So, default to safe side and allow users to opt into fast path if shader is too complicated. So, have other key world to make position variant. Only talking about position output of vertex shader
RM: Sounds reasonable. Would like to say last week Myles was arguing the other side that the cost of invariant was significant. So, if we decide to go that way should delay closing issue so Myles can have his chance to argue.
JG: Concerned that on the graphics side, there are weird trade offs and it’s the nature of fast graphics APIs you play fast and loose and backfill correctness. Hard to intuit what variable sizes are big enough and other things which cause perf issues. Historically the preference has been to do the fast thing by default as it generally works. If you find it doesn’t work, move to slower. This would be an inversion of that and don’t feel like this is an area we should innovate in vs existing APIs
GR: not easy to identify which calculations contribute to that result. Would just be nix-ing fast math throughout.
DM: You don’t get a message saying your code is non-portable. It will work on some machines and stop working on some some. People will complain and blame app or WebGPU. Don’t recall any WebGPU case where we say let’s be non-portable as the cost may be higher. We’ve measured the cost and if we saw a high cost, made it non-portable.
JG: I think this is a surprising thing to innovate
GR: Wouldn’t say it’s portability. If people use low precision types they’ll get type errors and inconsistencies. Not portability, just how graphics is done.
DM: We don't have low precision types.
GR: In the case of WGSL example doesn’t apply, but same kind of argument.
DM: If you want low precision types we’ll have this argument.
GR: Was an analogy.
JG: If there were prior art which had an API making the more portable decision i’d be more compelled. Lack of innovation by APIs which care about portability and had trouble with this that they didn’t choose to make invariant by default is a clear direction and that should be our default. Others who have thought hard and worked on APIs haven’t found this to be a portability concern to make this the default.
DM: Other people have made proper decisions for their APIs. We have more portability constraints than others.
JG: We have the same constraints
DM: Largely invisible to users. Not something needed to consider when developing programs unless minority.
JG: But it is. Hard to show you actually need it. Concerned this is part of the fast-math cargo-cult which you throw at things to make it faster. The default really is to be variant as that matches the usecase most people have. In the exceptional usecase you can reach for the invariant. That’s where all previous APIs centered and what tWebGL did.
DM: Would be fine, but needing the invariant would not be discoverable. It’s a footgun
JG: It’s graphics programming
DM: We’re re-defining what that is. We're building a safe graphics API
JG: Reconsider in light of what webgl offers. WebGL has done much of this. Programmable shaders already exist and ship to lots of places. Timeline to have WebGPU portability of WebGL is a decade.
DM: Doesn’t mean we can’t have a higher bar.
JG: Then need to express that. We have done this before, we did it the other way. You’re recommending we could do better but has to be phrased as everyone else ships with the other thing and it works well and it isn’t the primary concern. It would be innovation. Is this where we want to innovate not that we have a new thing and will do it this other way. That’s the trade off. Should think about this one more.
DN: Separate concern. We have to be careful how we phrase which expressions are affected by this annotation. Needs to be expressed as data-dependencies. Writing to an invariant expression propagates back through previous expressions. Lots of text and subtle stuff to make clear.
DJ: What state is this?
JG: No resolution yet. Need to think/talk more. Also need Myle’s input again. Come back to it next week?

Support .length() for runtime-sized arrays ([#989](https://github.com/gpuweb/gpuweb/issues/989))

DS: Right now, no way to get this info. We need it. Exists in SPIRV and HLSL.
DS: Suggest array_length, to differentiate from vector length. On Metal, it will have to be passed in.
RM: What does Metal do?
DS: In Metal you seem to declare an array of length 1 and simply read off the end of it, where you know the length.
DN: Similar to OpenCL
TR: Where are these arrays coming from?
DS: They are shader input, the last parameter. You can’t define one in the shader itself.
DN: Storage buffer only. Like VLA from C99.
DN: In CL you get a pointer to the base of the struct, not to the array.
JG: We would implement this manually on MSL since we need to do bounds checking already. It is implementable.
DM: Seems useful
DS: We could not have runtime-sized.
DN: They are useful.
JG: In most cases you know how many there are in the array. But this seems useful.
JG: Need to pick a name that doesn’t suggest we have arbitrary runtime-sized arrays.
DS: runtimeArraySize?
JG: storageBufferLength?
DN: Boundless can have buffers of buffers, that are runtime sized.
JG: Generalize the function name, but only its use. This is not a commonly called function.
TR: Similar to append and consume buffers in D3D? They have a counter attached.
DN: Different use case. It might use a runtime array for implementation.
DM: Can we call it arrayLength or arraySize if we have different types?
JG: Seems reasonable. Documentation could say that it only applies to particular types.
Resolved: arrayLength

WGSL should require uniform storage class variables to be [[block]] decorated. ([#1004](https://github.com/gpuweb/gpuweb/issues/1004))

DS: UBOs in Vulkan must be block decorated structs. So this is a constraint we have.
DN: In GLSL you can write loose uniforms, collect them and invent a binding for you. Similar to D3D. I want to avoid that in WGSL.
DM: That’s outdated. Only used in compatibility mode.
DN: Yes. I want to make it clear they can’t use this.
JG: MM might have opinions so I suggest we defer.
DM: What if you use it as a storage buffer?
DN: It’s fine. THe offsets have slightly relaxed rules. It works as long as you have the right alignments.
DM: If you can statically define …… I think it can be inferred.
DN: Not sure what you are asking the implementation to do.
DM: Implementation can see what the variable is bound to.
DM: There seems to be only one right way to do it, so why make the user say it?
JG: Is there ever a reason to omit block
DN: You can’t embed a block inside another block.
DS: But any struct that is attached to a module scoped variable we could behave as if block
DN: Only valid for SBO and UBO. For other cases may not work.
DM: Going to be hard with nested structures. This has the benefit of being explicit.
JG: Wonder if we’re generally going to be de-duplicating and synthesizing types. I think Dan was saying if we synthesized the block version for global variables isn’t terrible. Being able to say we see thing, emit code is nicer.
DN: Wary of having types not matching what the user wrote and what the program is type checking. Now translating types and that gets hairy. One part of code in the implementation has to trick another.
DS: If we add it now, we can remove it later. We could just put the text that a uniform variable must be a block decorated structure.
JG: If we wanted to make the block optional, we could raise a new issue? I will file it.
- https://github.com/gpuweb/gpuweb/issues/1008

Shader portability (and performance portability). ([#895](https://github.com/gpuweb/gpuweb/issues/895))

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

WGSL 2020 08 18

Tentative Agenda

📋 Attendance

⚖️ Discussion

Clone this wiki locally