Skip to content

WGSL 2020 08 11

François Daoust edited this page Dec 2, 2020 · 1 revision
Dean Jackson


Dan Sinclair


⌨️ Scribe
Google Meet




WGSL Issues


Open Issues
Marked Issues


Meeting Issues

Tentative Agenda

  • 2020-08-04 Action item review:
    • Apple: Will look into validating grammar on CI.
    • JG: Will update builtin PR to fixup merge conflicts.
    • MM: Updated proposal for #854
    • DM: File an issue on module validation (#974)
  • Add popcount and reverse_bits integer built-in functions (#947)
  • Express popcount, reverse_bits as function prototypes (#968)
  • Forbid --x (#801)
  • Introduce Subgroup Operations Extension (#954)
  • Interface matching rules (#644)
  • Move array stride decoration to the struct fields instead of types (#960)
  • Support #line directive (#606)
  • Extern declarations (#883)
  • Validation of whole shader modules (#974)
  • Invariant qualifier (#893)
  • Function overloading (#876)
  • Support .length() for runtime-sized arrays (#989)
  • Shader portability (and performance portability). (#895)

📋 Attendance

WIP, the list of all the people invited to the meeting. In bold, the people that have been seen in the meeting:

  • Apple
    • Dean Jackson
    • Fil Pizlo
    • Myles C. Maxfield
    • Robin Morisset
  • Google
    • Dan Sinclair
    • David Neto
    • Kai Ninomiya
    • Ryan Harrison
    • Sarah Mashayekhi
  • Intel
    • Yunchao He
    • Narifumi Iwamoto
  • Microsoft
    • Damyan Pepper
    • Rafael Cintron
    • Greg Roth
    • Michael Dougherty
    • Tex Riddell
  • Mozilla
    • Dzmitry Malyshau
    • Jeff Gilbert
  • Joshua Groves
  • Mehmet Oguz Derin
  • Timo de Kort
  • Lukasz Pasek
  • Tyler Larson
  • Pelle Johnsen
  • Matijs Toonen
  • Hamada Gasmallah
  • Dominic Cerisano

⚖️ Discussion

2020-08-04 Action item review:

  • Apple: Will look into validating grammar on CI.
    • No progress.
  • JG: Will update builtin PR to fixup merge conflicts.
    • Done
  • MM: Updated proposal for #854
    • No progress
  • DM: File an issue on module validation (#974)
    • Done

Add popcount and reverse_bits integer built-in functions (#947)

  • DS: Basically adding more built-ins, and how they map. HLSL only has the unsigned versions but we can cast, do it, then cast back. Same with #968.
  • MM: We should add these as they’re useful to authors.
  • DM: Consider different names.
  • DJ: Better name?
  • DM: Maybe count_bits.
  • MM: popcount is term of art in at least one shading language and x86 instruction.
  • DM: That’s a convincing argument.
  • DS: I would like to make the names consistent - either add or remove one of the underscores
  • MM: prefer camel case
  • DS: Partly my fault that I’ve been inconsistent in naming.
  • JG: I will file an issue to make all the names consistent.
  • RESOLUTION: Accept these.

Express popcount, reverse_bits as function prototypes (#968)

  • DJ: Also covered by the above?
  • DS: Just a different way to expose the above.
  • DS: Once we get #947 merged, the editors can decide whether or not to merge in this?
  • RESOLUTION: Editors can decide if they want to merge this in.

Forbid --x (#801)

  • MM: Don’t suggest adding them, but want to reserve them. Add it to the lexer but not to any production in the grammar.
  • DS: OK with reserving. JG’s comment about “should --5 be allowed?”. Current grammar will parse the negative 5 part first as part of the number.
  • RM: I don’t think the issues are with literals, because it doesn’t make sense there.
  • DS: “--5” is just a silly way to type positive 5
  • RM: I would like people to get an error if they type “--x” (for now)
  • DS: Currently “a--5” would do the right thing. Should we make it a parse error?
  • RM: I suggest parse error.
  • DS: What about “a minus space minus 5”?
  • RM: That’s ok
  • JG: In generated/templated shaders, having “a--5” would be fairly easy to produce (accidentally).
  • JG: I feel like if we don’t add unary minus now, we likely won’t add it later. Seems like an unusual thing to add after MVP. We can make the call now.
  • RM: I disagree that we wouldn’t add it. Since it is syntactic sugar, we can easily add it if there is a lot of feedback requesting it.
  • JG: Difference between wanting to use it for familiarity and whether or not they need it. Many languages that don’t have it are ok. Then the postfix v prefix forms are a source of errors. From an understandability and usability perspective, we don’t need it.
  • MM: Since every other shading language has this, I think it is possible that we’d add it.
  • JG: I’m simply suggesting that we don’t have to make the decision now with the reasoning that we might want to add it later.
  • MM: It’s a cost-benefit analysis and we don’t know the answer yet.
  • JG: It’s that we don’t have consensus, not that we can’t know whether or not to add it right now. So yeah, we can reserve it.
  • DS: I’m ok without ++ and --, but I’ll note that we also do not have += and -=.
  • MM: We’re not talking about += because it would create an invalid program.
  • MM: I don’t think we need to mention the literals yet.
  • JG: I think it will all work out.
  • RESOLUTION: Reserve “--” and “++” for now

Introduce Subgroup Operations Extension (#954)

  • MD: Matched PR to reorg of spec. Have shown this is a direct translation to all 3 shading languages or IRs. Don’t think there is anything contentious left
  • MM: Need David to discuss?
  • DM: Yes.
  • RM: Question from comments, long debate about divergence and convergence and what I get is compilers from AMD and Nvidia compiles don’t guarantee reconvergence. So, how do graphics developers make it work, is there a case you are guaranteed reconvergence? Do you get it at the earliest time even though it isn’t in the spec? How can you know you’re in uniform control flow?
  • MD: In the case of simd groups no hardware providers guarantee it will be convergent. All the operations we have are implicit. If you have diverged and you have a sum … (missed) …. Doesn’t matter if it diverges or reconverges.
  • MM: Specific example, ballot takes 2 args, payload and which thread to read from. If you put in the 2nd arg a thread that isn't hitting the instruction then that’s undefined behaviour. How do you know which values are legal?
    MD: Don’t have such an instruction.
  • MM: I mean for native languages like HLSL. How would I know?
  • MD: I think it’s undefined behaviour.
  • RM: I think we can delay the debate until David is back. Will look more at the reconvergence problem and try to see if there are any hardware guarantees. Hard to guarantee uniformity as a compiler if the programmer can’t guarantee it.
  • MM: I thought SPIR-V had these annotations where control flow diverges and reconverges. If drivers that implement SPIR-V don’t have these guarantees we need this investigation.
  • MD: Correct concern, suggest making investigation on set of operations in PR instead of the full set in other APIs. PR is more restrictive.
  • RM: Makes sense.
  • MD: Also need to decide if we want the naming to be subgroup or simd.
  • MM: Need to decide if we’re going to add them first, functions more important than names.
  • SUMMARY: We will discuss this at a future meeting when David Neto is present

Interface matching rules (#644 and #974)

  • MM: Still waiting for Microsoft for #644. For #974 talked last week, unsure about the next step.
  • DM: Issues has proposal, next step is to decide if agree or disagree
  • MM: For #947
  • DM: Yes.
  • MM: For all errors will be reported at compilation or link time. Mapping is listed in the spec.
  • DM: Yes.
  • MM: I’m of 2 minds. First, would be convenient for implementers to let error float between complication and link and app authors will have to check both of them. The exact location of error isn’t super important. On the other hand, implementing this there is a fairly natural break of where errors would occur. If you want to create a MetalLibrary inside CreateShaderLibrary that requires certain error checking and forbids other error checking. So, I think either way is fine.
  • DM: I think the proposal is orthogonal to if you want to create an MSL Library at shader module creation. You can still parse and validate the shader module according to the rules listed. At pipeline creation you have to make sure a certain class of errors doesn’t stop the library creation.
  • MM: Right, don’t have to create the metal library in one place or another. We should design WGSL so you could do it at ShaderModuleCreation time. If you didn’t, then you wouldn't be upholding the portable performance desires. You’re right architecturally, but in reality all implementations will.
  • DM: I don’t think we can, right now. Unless we can patch the library after it’s built. The main reason is the binding model comes from Vulkan and has to map to Metal. That mapping depends on the pipeline layout, only known at link time, not compile time.
  • JG: Don’t think there is a real perf difference between the two places to create the MSL library. Both have to fail, both have to take enough time so they have promises in the fast path. Don’t think we have a forcing function there. Not on the performance side. So, I’d also add in WebGL we’ve had portability issues when having shaders that compile but don’t link, or don’t compile and dont’ link on different browsers. Not a big concern, but comes up occasionally. Typically, whatever browser they happened to write against is too strict or too liberal and the author thinks if the shader compiles it will link, or doesn't check compile status and just link results and info on the browser console is lacking. Minor portability concern but survivable.
  • DM: Possible big problem hiding there. May only have a shader module that links under certain runtime conditions. We’d possibly have two conditions with one failing and one succeeding. Don’t want an application to pass on one implementation and fail on another.
  • JG: Not quite following.
  • DM: You have 20 different pixel shaders for lighting and a scene has a specific type of light using a shader module and the player enters the scene and you create the pipeline and at that point if you deferred shader creation the pipeline creation would fail. Before that you created the shader module and everything is fine. A different implementation validates at shader module creation and fails at start.
  • JG: Because you assume compilation successes then link successes.
  • DM: Can’t assume that.
  • JG: That’s the error you’re talking about?
  • DM: Linking always has a class of errors. Linking can always fail.
  • JG: If allowed to say use an extension that doesn’t exist. You can fail at module or pipeline creation. An application could be written to assume without the extension compilation fails it will not use that extension. That’s the signal they use. That’s wrong if we defer that error to link time, and it will see shader compile succeed and then assume it can use htat extension. Is that what’ you're concerned about?
    DM: No, not features or extensions. Modules that don’t participate in pipeline creation. If you defer the user doesn’t know the module is broken.
  • JG: Yes. I don’t think that’s what we have, or we’d go with> Would never have a trivially broken entry point that we allow implementations to either fail early at compile with the assumption that you’ll use the entry point and other implementations deferring until usage. That part can’t be flexible.
  • DM: That’s what the issue is about. For each class of errors, this is the stage this needs to be validated. If you give any freedom the implementations can diverge. That’s a problem with the spec.
  • MM: Sounds like you’re agreeing and therefore the spec should dictate where the errors are.
  • DM: I think Jeff is saying WebGL showed we shouldn’t guarantee that shader module creation fails under certain conditions
  • JG: No, WebGL tells the opposite.
  • DM: Wonderful.
  • MM: Anyone else want to chime in?
  • …. [nothing]
  • MM: Everyone who spoke up are all in agreement so … resolved?
  • JG: Talking about the high level but haven’t discussed really [missed]
  • DM: Issues does go into the low level if you’d like to take time to review. Only saw one comment from Corentin so far.
  • DS: There is agreement that this is the correct approach. Next week we can decide if the error exposure is in the right place, and whether we want to turn it into a PR.
  • JG: I thought Myles wanted freedom to put errors in different places. That is something this issue proposal does not support
  • MM: It would be convenient but in reality probably wouldn't use it
  • JG: So, we can forbid it?
  • MM: Yes.
    DM: So move from needs discussion to needs spec
  • MM: Yes.
  • RESOLUTION: Accept this and move it into Needs Specification.

Move array stride decoration to the struct fields instead of types (#960)

  • DM: I think the arguments were strong and willing to withdraw the issue.
  • MM: Ok. Aren’t there unanswered questions, can you assign from two arrays with different strides?
  • DM: My understanding is this would be allowed. A type has an attachment, technically it’s the same type on the LHS and RHS, just different attachments. So type checking works. If implementation needs to change the stride it would be done automatically.
  • MM: Little scary for us because in Metal these will be different types because of inserted padding members. So, assignment from one to another isnt’ a memcpy where all the stuff is in registers. Would be a scatter / gather. Potential performance problem but haven’t measured so don’t know if it will be or not. The other interesting part is if you have a function that accepts an array as an argument is the Metal implementation going to have to specialize for each type of array passed into the function?
  • DM: Can’t you resolve the difference in the caller instead of in the function?
  • JG: So templated and inlined everywhere?
  • MM: Would be nice if we didn't have to inline if not required. If inlining could be a perf vs correctness decision.
  • DM: Do we know if HLSL has the same problem?
    MM: Can you stride an array? If they don’t they’d have to do the same as Metal.
  • JG: Isn't it a property of the type for SPIR-V? I think that makes it our default which is where we are now.
  • DM: I thought we’d have the decoration on an array on a host shareable structure and omit it in other places. Could lead to function called from a host-shareable and from a different host-shareable. David suggested this would work and we do the stride conversion itself.
  • MM: THe alternative is the author converts and the stride is part of the type. So, then if you want to make that function call with different stride arrays you as the author have to do the conversion. That is how Metal works now. If you have two arrays with different strides (meaning, different field types) then if you want to run the same function you have to write two functions. Although, in Metal you could use templates. I guess the argument holds for HLSL?
  • DM: Sounds like we need to talk with David and someone from Microsoft about this.
  • GR: Is the question about strides on matrices, HLSL has no issue with that.
  • DM: Arrays
  • GR: Arrays yes.
  • MM: How do I annotate that?
  • GR: You don’t.
  • MM: How do I do it, add padding members?
  • GR: If you want to do it, then yes.
  • MM: Then everything said is applicable to HLSL
  • DM: Can we restrict stride to not require padding members in HLSL or MSL and then we relax it later if we want?
  • MM: Can reopen if there is going to be one right answer for the stride. If there is only one set of things that makes it not fail we shouldn’t make the author fill out paper work to write their program.
  • DJ: Let’s come back to this next week.

Invariant qualifier (#893)

  • JG: I think this is basically, we don’t have this, we should have it. Investigation shows we could have a limited subset. We should spec what we can have.
  • DM: I thought we could say all position outputs are invariant and not have extra API. Is there a perf downside?
    MM: The perf downside is big enough it’s off by default in MSL. The reason is because the attribute means we disable, effectively, all optimizations that lead up to the variable.
  • JG: That sounds more like precise, possibly. Ok. Resolved, should add what we can.
  • DM: Should just say for MVP it’s enforce and we can add fine tuning.
  • JG: Not against, but so easy to add it.
  • DM: Source of foot guns for people. Whoever does it, it may or may not work until they realize there is a qualifier
  • MM: Most shaders don’t need it. Only certain shaders care about exact bit results.
  • JG: Uncommon you’d want this. People put work into optimizing when you don’t need it.
  • DM: Can make same argument that most shaders just do matrix multiply of positions. Only talking bout positions.
  • JG: Still much slower if a tonne of vertex operations. Definitely the state of art in other APIs is fast math and you can opt-in if you need invariant. Needing invariant is exceptional.
  • DM: Doesn’t invariant allow fast path?
    JG: Other way around. By default fast mode and invariant means not so fast that different calculations in different shaders doing the same ops give different results. And precise means ab == ba. Invariant means ab == ba which by default is not true, but faster.
  • DM: Will need to dig up more.
  • JG: Just imagine there are weird ops the shader does on a per-waveform basis.
  • DM: Talking about matrix mul. The common case.
  • JG: Based on the waveform the op is scheduled into the matrix op may use different hardware opcodes and get different precision. Happens today
  • DM: Recompiled for different ways?
  • JG: Different microops, yea. May not be exactly, but good way to think about it. a*b sent to different vertex invocations could get different slight errors, good enough on screen, but computations and seams to match, you need invariant.
  • DM: True, but would be surprised if it applied to this in particular.
  • GR: This comes up a lot
  • DM: As in it happens?
  • GR: Yes, developers ask us why it doesn't work and we tell them what was just described.
  • MM: So, does that mean we go slow mode, or that we tell them to use the qualifier?
  • GR: A lot of shader developers are used to requiring this. Should be fast by default.
  • DC: Does fast mean low precision?
  • MM: No.
  • JG: No necessarily precision but that the results could be different. But probably reduced precision. 3 levels, fast and loose. Fast but same results everywhere, maybe lower precision. Actually get a precise result. Talking about first 2 of a*b same in all invocations. Not true by defaut in any of our graphics APIs.
  • DJ: Can we resolve to add this for the ones we can support across languages, whith DM Still investigating?
    MM: Usually investigate before resolution
  • JG: Defer to next meeting.

Support #line directive (#606)

Extern declarations (#883)

Validation of whole shader modules (#974)

Function overloading (#876)

Support .length() for runtime-sized arrays (#989)

Shader portability (and performance portability). (#895)

Clone this wiki locally