WGSL 2020 08 25

Dean Jackson	:	Chair
Dan Sinclair	:	⌨️ Scribe
Google Meet	:	Location
https://webgpu.dev/wgsl	:	Specification
WGSL Issues	:	Open Issues
Marked Issues	:	Meeting Issues

Tentative Agenda

2020-08-18 Action item review:
- Look into validating grammar on CI (Apple)
- Updated proposal for #854 (Myles)
PR checks
- methods to camelCase 🐪🧳 #1016
- popcount #968
Introduce Subgroup Operations Extension (#954)
Invariant qualifier (#893)
WGSL should require uniform storage class variables to be block decorated. (#1004)
Shader portability (and performance portability). (#895)
WorkGroup memory size and barriers? (#863)
Move texture methods into an import? (#1018)
Are entryPoints namespaces per module or per stage per module? (#1023)
Should the spec mention location aliasing? (#956)
Disallow fragment atomic ops on texture storages (#728)
Support pack / unpack functions (#988)
Textures and Samplers status check #573 #574

📋 Attendance

WIP, the list of all the people invited to the meeting. In bold, the people that have been seen in the meeting:

Apple
- Dean Jackson
- Fil Pizlo
- Myles C. Maxfield
- Robin Morisset
Google
- Dan Sinclair
- David Neto
- Kai Ninomiya
- Ryan Harrison
- Sarah Mashayekhi
Intel
- Yunchao He
- Narifumi Iwamoto
Microsoft
- Damyan Pepper
- Rafael Cintron
- Greg Roth
- Michael Dougherty
- Tex Riddell
Mozilla
- Dzmitry Malyshau
- Jeff Gilbert
Joshua Groves
Mehmet Oguz Derin
Timo de Kort
Lukasz Pasek
Tyler Larson
Pelle Johnsen
Matijs Toonen
Hamada Gasmallah
Dominic Cerisano

⚖️ Discussion

2020-08-18 Action item review

Look into validating grammar on CI (Apple)

MM: Not done, started looking but not enough progress
DJ: I volunteer as tribute.

Updated proposal for #854 (Myles)

MM: No progress

Open Easy(?) Pull Requests

methods to camelCase 🐪🧳 #1016

Merged
DS: Open question. Do we want to convert other things like type names, storage classes, etc. Or only methods?
DN: Would like the answer to be no. I like snake case for enums and attributes.
JG: Should probably be split into another issue. Pure bikeshedding ….
DJ: Lets do this by github.
DS: I’ll file an issue on github and immediately close it. It seems people want snake case.
JG: I’m not that keen on having two different types of things.
PJ: Consistency would be nice.
JG: Snake case feels more low-level to me.
DS: OK. I’ll leave bug open and we can discuss.

popcount #968

DJ: Just checking in if we were going to merge or not.
MM: Think we should do it.

MSAA (#1029)

DM: Previously discussed here and main group agreed to have separate type. PR to main API merged. API for WGSL is approved by Dan. Can merge unless objections.
JG: Is this what SPIRV does?
DM: SPIRV may do many things like load or fetch from textures but we only have one based on what we load
JG: For texture types, do they have a proper type for multisample?
DM: Yes, Multisample has a separate flag.
JG: Ok. Sounds good to me.
MM: Seems like something we have to do, so may as well merge and can open a followup if needed.

Introduce Subgroup Operations Extension (#954)

DJ: Summary, seems like general consensus was to stop, especially after comments from David. New data from Myles.
MM: Had conversation with Metal team. Metal is based off c++ so has same language requirements/behaviour as c++. In particular, calls can be sunk or hoisted as in c++. So for most functions no big deal, for these it is. Sinking or hoisting these is different behaviour. Way it’s been defined for ‘valid’ in MSL allows these optimizations. Does not affect the validity of the program. How ever have found it is unintuitive and authors hit this problem. Will write src and wonder why it doesn’t do what they expect. So, for webgpu, propose a strong definition of validity, at least for these functions. Hoisting or sinking these into loops or other things should be disallowed. Just due to programmer intuition.
DN: Same issue comes up with compute barrier (workgroup barrier). Call these collective operations because invocations have to cooperate to create result. True if you never move in/out of control flow everything works out. Has been problematic to define language on what is legal (valid/invalid) transformations. Normally better to specify what a given program does (src text does this thing). Typically wack-a-mole if you define what transforms are possible because people get creative.
MM: We have a notion of being stronger then C++. What would that take to improve the language. However, problematic as MSL compiler is based off LLVM and LLVM doesn’t have concepts needed to solve problem. Impossible to solve using LLVM pipeline today. Upstream LLVM is working on it. Apple is also interested. Sort of un-implementable to day to have strong guarantees. We’d like this, but can’t. Two other points. Right now, we guarantee (in MSL) if you run the function giving active threads in bitmask and don't have control flow and run SIMD function then the active threads in first call will be threads in SIMD function. This is a guarantee authors use when writing MSL. Last thing, all of these issues go away if we limit the subgroup operations to just uniform control flow. If we do limil SIMD operations to uniform control flow this is still useful. Can write useful programs with this constraint. Direction forward for WGSL is to allow these functions in uniform control flow. Wait for LLVM (or backend languages) to solve problem in general case, can then open to non-uniform control flow. Important piece, it is unlikely the function signature will change in the future. So, likely won’t have competing solutions if we go down this path. Just makes function calls available in more places then they are today.
DN: The guarantee mentioned is no-spontaneous divergence or re-convergence. To my knowledge, MSL would be the only one to make the guarantee. Even if we had completely uniform control flow, we can't guarantee that property in Vulkan or other places. For CUDA nvidia hs cooperative group feature which takes an extra param. https://developer.nvidia.com/blog/cooperative-groups/#:~:text=At%20its%20simplest%2C%20Cooperative%20Groups,GPUs%20(Compute%20Capability%203.0%2B).
MM: Have looked at that, and our understanding is that will not be the long term solution. That is information already in the program. Nvidia has made programmer turn that into explicit dependency but they don’t need too. A good solution wouldn't have that parameter and we think it’s a stopgap solution and won’t be the case long term. This is a path that would be workable with MSL but don’t know about other languages. If other languages can’t support then ….
JG: Is any of this implementatable? What subset is implementable and how much do we have to trim our expectations on how reliable it is to match underlying languages. Sounds like some aspects are ‘here be dragons’ in that it might do what you want but we can’t guarantee it. Except, in some cases for MSL. But we don’t have, or they’re different for SPIR-V/Vulkan. Does uniform control flow not buy us safely here?
DN: No, only at the total uniform control flow points. The algorithm for calculating sine/cosine is a loop. This is a hidden way to get a loop that the programmer didn’t write. You need a lot of cooperation in the entire stack which unless you’ve tested you’ll be non-portable. Seen this in things like progress as well.
DM: Doesn’t that mean your cos function ruins uniform control flow you’re talking about something strong?
DN In SPIR-V you either have complete uniformity for the entire invocation group (workgroup for compute shader) or you don’t. Nothing guarantees that a subgroup executing together will stay together, even through straight line code
JG: Because it’s secretly not straight line
DN: Yes
MD: But all operations are active reductions, so isn’t that expected and programmer should be programming around that?
DN: I think what happens is people get the behaviour they want in practice but there is an edge they crossed they don’t know and the system doesn’t guarantee.
MM: Do you think offline you could point to an example program that uses this function in uniform but the mask isn’t all 1s. The reason this is interesting is Metal runs on arbitrary GPUs. This is something we can guarantee. This is the guarantee that Metal the language has. There may be some devices and some drivers which don’t fulfill this guarantee and those are non-conformant devices/drivers. Even though they claim to support Metal, they are not compliant with the Metal guarantees. As such, they should not be running WebGPU.
DN: I assume Metal only runs on apple devices.
MM: Yes, but Apple devices have arbitrary GPUs.
DN: If I bought an Apple device would I run into this situation
MM: You could. For those devices we would not allow WebGPU to run on it. They are rare enough that’s fine. The reason to bring this up is that if this is a guarantee metal mostly provides then devices you go out and buy can have this guarantee so would like to see a counter example.
DN: Do you have test suites that test for this or going off this device does not promise this?
MM: Will get back to you. Trying to draw distinction between whats practically true on all cards and what’s a guarantee of the API
DM: Trying to think about this more. If you have diverging threads from cos and you do texture fetch you need derivatives requiring threads in sync, the compiler adds sync or something else?
DN: Compiler is aware and will arrange a barrier. Not necessarily the case for subgroup ops as they’re typically non-uniform so no implied extra work to keep it straight.
JG: Is there a place these rules are laid out?
DN: I wish there was a good execution spec at this level of detail but I’ve not seen one in public. The reason I see this is, I’m a worry wart (worried beyond normal) or have been in conversation with folks who see these bugs on driver X development and are now executing on driver Y. This is why AMD is working on this in LLVM. LLVM is a great CPU compiler, not a great GPU compiler. In addition that LLVM is constantly under flux.
DJ: What do they do if they hit that bug? What are the steps to avoid?
DN: First hammer is to convert subgroup builtins and anything with special behaviours into an opaque LLVM builtin. Then you go through optimization framework and play wack-a-mole, modifying transforms to avoid doing anything in the presence of unknown builtins. More better, what AMD is doing to mark things as convergent and then plumb through the stack. Lot of wack-a-mole to make sure someone patching upstream doesn’t violate conventions
JG: Can we skate where the puck is going? Is there a cohesive fix being aimed for?
DN: You could get the maximum list view by saying I trust nothing. Which I think was going for by turning off convergent turns off all optimizations. Then you say no spontaneous reconvergence. Then maybe everything is explicit. Speculating at this point.
DJ: Back to our task, what do you suggest for now? Avoid until post MVP?
DN: Yes. Don’t think world is ready where we can promise portable semantics.
MM: Would like to see an example where there is a card (GPU and driver) which can’t abide by the proposal.
DN: Do you allow builtin call to cosine.
MM: Yes.
DN: K, what about call to user written helper
MM: Yes.
DJ: Both an example card and API as well?
MM: Just something where there is a computer I can run a program which shows the bad behaviour. Just an existence proof that the Metal guarantees are not good enough for WGSL.
JG: Or at least aren’t present everywhere
MM: Present enough, don’t want to say everywhere.
DJ: David you were referencing an LLVM pull request, can you add a link into the doc.
DN: Will copy over.
MM: It’s in the issue.
DJ: No problem then.
DN: Also subject of talk
GR: Which conf?
DN: LLVM Dev conf
MM: So, are you going to try to find example to convince Myles and then we punt?
DN: Would re-order but will try to find example.
DJ: Will comment we aren’t going to address right now but won’t close the issue but will mark as resolved and close once we get an example and analyze it.
MM: Don’t think we should punt until we have an example.
DJ: That’s what I said, just mark resolution from today.

Invariant qualifier (#893)

DJ: Quick one, discussed last week and waiting for feedback from Myles but can push off to next week.

WGSL should require uniform storage class variables to be block decorated. (#1004)

DJ: Resolved from last week, #1008 is a followup issue. Nothing more to say here.

Shader portability (and performance portability). (#895)

DJ: Been open for a while. Don’t know if there is anything to discuss, but would like response or action on issue.
DM: Don’t think it’s actinonable. Know there are places where shader performance is non-portable but no plan.
DN: The great unsolved compute shader problem. Don’t think it’s possible to fix
MM: Only solution is make some shaders slower on some devices artifically. Noone wil do that.
DJ: Need to know how much slower. What do we want to do on this issue? Leave it open, or comment and close
DM: Don’t have it filed anywhere else and know it’s a problem so may just leave it.
MM: Maybe turn into checklist of things to do and as we add those features we can check them off and we can eventually close.

WorkGroup memory size and barriers? (#863)

DJ: Similar topic.

Move texture methods into an import? (#1018)

DS: I thought this would be interesting. We have sin/cos in an import. Do we want to do it for textures and other things in the stdlib. Should these be imports?
MM: How many imports will we have in the stdlib? Would like 1 or lots. Not eg. 3
DS: Depends. We could have uber-imports that grab everything.
JG:Feels weird to not have everything imported. The use case for not doing that is narrow.
DS: Question for Myles - does anyone not import <metal/stdlib>?
MM: I don’t think so.
MM: We give you a tool to import everything. It’s on you if we add something to the stdlib and it clashes with your function. It’s so common.
JG: Solved in GLSL with reserved prefixes.
MM: Isn’t that the same as namespaces? eg. wgsl::
DS: This would require tex::
JG: They are similar but feel different.
JG: It means you need to reserve the namespaces.
DS: You can’t make namespaces at the moment.
MM: The suggestion is that everything is in a stdlib namespace, and is the only forbidden name if we give people the ability to create their own
DS: You import into a namespace, so it would be up to you. e.g import textures as t.
DN: Namespace for modules. They collide or not.
DS: If we don’t want to do it, what do we do with glsl std450 now?
MM: I don’t think the fact that in one of our backends, functions are hidden behind a namespace should make a difference.
DS: WGSL already does this. You import things like sin, cos through Import “GLSL.std.450”
DN: Where this comes in is if you have cos with some precision bounds and another with different bounds. I can see different namespaces for those things. I would advocate a w3c or wgsl namespace that can have different slices of the stdlib. Different methods get slotted into different namespaces.
MM: That example, precision.. the design problem isn’t namespaces. It’s different names of those functions.
DN: doesn’t address portability
JG: I think mix-n-match is the right solution.
MM: Does HLSL have two different sin functions?
DN: No. But you can build your own.
GR: There are fast and slow versions.
MM: How do you pick?
GR: It can be a compiler function, or they can do it in the shader (on the variable).
DN: There is a “native” square root. https://man.opencl.org/sqrt.html shows sqrt and native_sqrt with different error guarantees. See table here: https://www.khronos.org/registry/OpenCL/specs/2.2/html/OpenCL_C.html for native_cos, native_exp, etc.
GR: Also applies to derivative functions.
DN: Advocate for a style that can have multiple namespaces under a global “wgsl” that can accomodate.
MM: Those choices don’t exist yet.
DN: I think it’s useful for people to build a module-based namespace.
JG: Worried about the amount of typing
MM: Concern about transitive namespaces automatically including each other. Can break a program.
JG: We want to avoid collisions. Thus we need reservation. Can be namespaces, or prefixes. I like that namespaces can be renamed/imported however you like. Don’t thin it is useful to split it up into different parts. Just a single import.
DS: Gives you everything in 450?
MM: I like the idea of user functions winning over stdlib functions
DN: I think that’s bad for portability over time. If I read the code I don’t know what will run.
JG: Not ideal, but does allow monkey patching.
DN: yuck
JG: One way to have both is a “std” namespace, or a way to import all of “std”. Then people can choose one to use. It does make copy & paste harder, because you’re not sure where things are coming from.
MM: This is similar to python? “import X” or “from X import *”
DS: Kai made a proposal for this in the issue. Means you know what you are getting.
JG: Or “import load from std”
PJ: Similar to JS too now.
JG: The “as” in JS is for import naming
discussion about JS imports
JG: I think we’re making progress here.
MM: I think the python design makes sense.
DS: So, get rid of glsl std450, make that wgsl.1. Move new stuff into wgsl.2. Numbers are for versioning. Then add the alternate import method as well.
MM: Can we avoid versioning? No other language has needed it for stdlib.
JG: Only reason we need versions is if a browser had implemented a new version and you wanted to detect it / rewrite.
DN: How do I tell if a shader uses a particular version? Or will run against an implementation?
MM: Compile would fail.
JG: Like C++.
JG: Maybe delay versioning. We’re close to a decision on namespaces.
DS: Call it wgsl:: now. If we need a version later, we can always add the .1

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

WGSL 2020 08 25

Tentative Agenda

📋 Attendance

⚖️ Discussion

2020-08-18 Action item review

Look into validating grammar on CI (Apple)

Updated proposal for #854 (Myles)

Open Easy(?) Pull Requests

methods to camelCase 🐪🧳 #1016

popcount #968

MSAA (#1029)

Introduce Subgroup Operations Extension (#954)

Invariant qualifier (#893)

WGSL should require uniform storage class variables to be block decorated. (#1004)

Shader portability (and performance portability). (#895)

WorkGroup memory size and barriers? (#863)

Move texture methods into an import? (#1018)

Clone this wiki locally