shaders: semantics of "discard" #361

dneto0 · 2019-07-09T22:21:06Z

GLSL and HLSL have long had a useful "discard" primitive. MSL has discard_fragment().
GLSL's discard maps to OpKill in SPIR-V.

In D3D discard, the invocation continues execution so it can later participate in computing derivatives.

In SPIR-V OpKill stops later side effects from occurring, and control flow becomes non-uniform. So you can't compute derivatives after executing OpKill.

SPIR-V has a recently proposed multi-vendor extension to add D3D-style discard, via SPV_EXT_demote_to_helper_invocation. See KhronosGroup/SPIRV-Registry#43 for details.

The MSL discard_fragment() builtin is similar to OpKill:

Multiple fragment threads or helper threads associated with a fragment thread execute
together to compute derivatives. If any (but not all) of these threads executes the
discard_fragment function, the behavior of any derivative computations (explicit or implicit)
is undefined.

krogovin · 2019-08-08T08:57:56Z

I'd advocate the following:

the default behavior is that derivative ops after discard might be correct or not.
as an extension, have a shader property that can specify to follow D3D's style discard.

This way, the MSL is safe. My personal 2 cents is that the D3D discard style can leave some performance on the floor for situations where a fragment shader has an early out via discard.

grorg · 2020-07-07T18:30:06Z

Discussed at the 2020-07-07 meeting.

kdashg · 2020-07-07T18:38:04Z

Resolved: discard/kill breaks uniform control flow. Since derivatives must be within UCF, this is safe on all backends.

dneto0 · 2020-07-09T18:35:59Z

Revisiting this:

In this weeks meeting @litherum raised the issue of blending of fragment outputs. From the minutes:

MM: The three different backend languages have different definitions of discard. Discard is different then an early return because the blending step after the fragment shader doesn’t execute if you hit a discard statement. Early return would still blend after the fragment shader.

I want to highlight that as an important requirement I hadn't captured in the summary above.
OpKill is an early return from the entry point, and also throws away the fragment so it has no further effects in the pipeline.
Vulkan 25. Fragment operations says:

Fragments can also be programmatically discarded in a fragment shader by executing OpKill.

Reviewing the MSL spec, I see that in Metal 2.3 the behaviour of discard_fragment() flipped from GLSL-discard semantics to HLSL-discard semantics:

Multiple fragment threads or helper threads associated with a fragment thread execute
together to compute derivatives. Prior to Metal 2.3, if any (but not all) of these threads executes
the discard_fragment function, the thread is terminated and the behavior of any derivative
computations (explicit or implicit) is undefined. Since Metal 2.3, discard_fragment marks
the fragment is terminated while continuing to execute in parallel to make derivative
computations well defined. Even though execution continues, the write behavior remains the
same as before. The fragment will discard the fragment output and discard all writes to buffer
or texture after discard_fragment.

Sure, that looks like it doesn't matter for WGSL because of our plan for an extra statically-checked uniformity rule.

However, there is a separating case which does come up in real life. Schematically in HLSL:

  while (true) {
      discard;
  }

If you have semantics of "super-return", i.e. terminate the invocation right away, then it terminates. Yay.
If you have "demote to helper" then you either have:
- run forever. (boo)
- look like you run forever, except that the implementation sees that when all invocations in the quad have been demoted to helper, then they all terminate themselves.

With a slightly more complex example you can coax one of the 4 invocations in a quad to run forever, hanging up the other ones.

int quad_idx = (int(gl_FragCoord.x) & 1) + (int(gl_FragCoord.y) & 1) * 2;
if (quad_idx == 0) {
   terminate; 
}
while (quad_idx < 8) {
   quad_idx *= 2;
}

The quad_idx will be 0 for a "top-left" fragment, and non-zero for the other 3 it's partnered with.
So with the demote-to-helper semantics you get infinite loop for the top-left, and its partners are stuck there as well.

My point is that there are considerations that go beyond uniformity-for-the-purpose-of-derivatives.

I think whatever path we choose, there will be work to do in translation to different back-ends.

magcius · 2020-07-09T18:41:19Z

Just to be clear, while (true) { discard; } in HLSL is an infinite loop, because HLSL's discard is not a "super-return".

magcius · 2020-07-09T18:45:19Z

Sure, that looks like it doesn't matter for WGSL because of our plan for an extra statically-checked uniformity rule.

So what you're saying is that code like this is invalid?

Output main(texcoord : TEXCOORD0) {
    float4 alphaTex = AlphaTex.Sample(texcoord);
    if (alphaTex.a <= 0.25) {
        // No need to process the rest of the shader, do expensive lighting, etc.
        discard;
        return;
    }

    float4 diffuse = DiffuseTex.Sample(texcoord);
    out.color = doExpensiveLighting(diffuse);
}

How would you suggest we write it? The goal is to kill the shader if all four helper invocations are dead, but still let them continue to interact for correct derivatives. Will we just be tanking performance and have to evaluate expensive lighting on all fragments?

dneto0 · 2020-07-09T19:30:44Z

Yes, that code is invalid in WGSL by committee decision from at least San Diego F2F in January 2019. There would be a conservative static analysis that would reject such code before being run.

I noted at the time that there were going to be applications that would not work, or would suffer. We left it at that. We didn't have metrics on hand, and the sentiment at the time was to error on consistency and portability.

Please do challenge that.

dneto0 · 2020-07-09T19:31:18Z

Just to be clear, while (true) { discard; } in HLSL is an infinite loop, because HLSL's discard is not a "super-return".

Except that some implementations will terminate. So whether it's spec'd or not, you get inconsistent behaviour from implementations.

magcius · 2020-07-09T19:33:27Z

Except that some implementations will terminate.

That's surprising to me. It will eventually terminate in a TDR, but I expect this to be something MS tests for compatibility. I know of a port house that had to emulate the DX12 behavior in Vulkan by translating all discards into while loops while porting games over.

dneto0 · 2020-07-09T20:01:23Z

Specifically, I'm told that NVIDIA drivers will terminate all the quad invocations when all four in the quad become helpers.

kvark · 2020-07-09T20:30:27Z

@magcius that's a good piece of alpha-testing code. I think it's fine for us to disallow it, in accordance to the decision we made for "discard" (that is makes control flow non-uniform). The users can work around it:

Output main(texcoord : TEXCOORD0) {
    float2 gradX = ddx(texcoord);
    float2 gradY = ddy(texcoord);
    float4 alphaTex = AlphaTex.SampleGrad(texcoord, gradX, gradY);
    if (alphaTex.a <= 0.25) {
        // No need to process the rest of the shader, do expensive lighting, etc.
        discard;
        return;
    }

    float4 diffuse = DiffuseTex.SampleGrad(texcoord, gradX, gradY);
    out.color = doExpensiveLighting(diffuse);
}

To clarify, this would work because SampleGrad doesn't rely on uniform control flow, and the gradients are obtained while we still are in the control flow, so calling ddx and ddy there is valid.

magcius · 2020-07-09T20:48:03Z

@kvark that's going to be more expensive. Submitting gradients alongside texture coordinates is double the amount of data going to the sampler unit. On Apple platforms, I believe it was substantially slower (I believe about 20% overhead?)

kvark · 2020-07-09T20:54:02Z

We can talk great length about the performance aspects of this workaround, but it is nevertheless a rather non-intrusive workaround that people can do. The other alternative is having non-portable shader code, which is not even on the table.

krogovin · 2020-07-10T06:58:10Z

The idea that some pretty standard shader constructs allowed in all other shading languages are not allowed in WGSL is going raise the barrier to entry IMO.

The examples in these threads all have really simple control flow; things can get much nastier later with more complicated control flow. I confess that I think that the goal of shaders are guaranteed to be portable across all GPU vendors (especially in mobile) is only possible on paper. Shader compilers in drivers are often strictly restricted with respect to control flow "depth" where the computation of that depth is GPU architecture specific and the limits also being GPU architecture specific. When that depth is exceeded one gets lovely messages like "internal compiler error" or "shader too complicated to compile".

Trying to write something that will detect non-uniform control flow I think is going to fail in that some uniform flows will be misidentified. I have a large battery of techniques that guarantee that the flow control is uniform within a primitive, but the flow control is generated by data read (per fragment). The upshot being that the shader is portable with the data I feed it, but not on arbitrary data.

Making the WGSL require that using derivative ops is uniform flow control for all data fed to it will make a fair number of techniques that I use (and have used all the way back to GL3) not possible.

To play devil's advocate, what the current situation with WebGL's discard and uniform flow control when it maps to Direct3D?

dneto0 · 2020-07-21T22:01:36Z

To play devil's advocate, what the current situation with WebGL's discard and uniform flow control when it maps to Direct3D?

That is to be discussed in #921

I confess that I think that the goal of shaders are guaranteed to be portable across all GPU vendors (especially in mobile) is only possible on paper.

This has been discussed at the community group. The cost of attaining consistent behaviour across implementation depends a lot on the particular feature you're talking about. We're dealing with them on a case by case basis.

We're trying to iterate on this. Regarding uniformity vs. discard, we've chosen on a path which is strictly conservative and which we know, as you point out, will reject techniques that are known to work in other APIs. We do know that there is a performance cost to it; we have not quantified the performance/power cost, and don't have a good way to understand the prevalence/importance of those techniques.

I can see a future where W3C releases an initial WebGPU which is more restrictive, and over time folks like you can quantify the importance for us, which then lets us revisit particular decisions with more data in hand.

magcius · 2020-07-22T00:26:55Z

This is where I will suggest "please talk to IHVs". You have no ability to meaningfully profile as an ISV, as the shader compiler can do magic beyond your recognition and thwart pretty much any benchmark. Only IHVs with knowledge of what things compile to can make informed decisions.

Metal intentionally switched to D3D semantics because it's what everybody wants. I have suggested before that emulation of D3D-like semantics is a good idea, and I still partially believe it. Perhaps opt-in?

krogovin · 2020-07-22T07:36:58Z

This is where I will suggest "please talk to IHVs". You have no ability to meaningfully profile as an ISV, as the shader compiler can do magic beyond your recognition and thwart pretty much any benchmark. Only IHVs with knowledge of what things compile to can make informed decisions.

This is spot on. Control flow is the most miserable part of making a shader compiler and some of the weirdness that can happen from control flow can be very GPU specific.

teoxoy · 2023-01-10T17:40:03Z

Considering #3176 landed, should this be closed?

kdashg · 2023-01-10T18:41:56Z

I think so.

dneto0 added the programming model label Jul 9, 2019

oscarbg mentioned this issue Jul 12, 2019

Support SPV_EXT_demote_to_helper_invocation KhronosGroup/SPIRV-Cross#1052

Closed

dneto0 mentioned this issue Apr 2, 2020

Task: Define the kill statement #676

Closed

grorg added the wgsl WebGPU Shading Language Issues label Jun 23, 2020

kdashg mentioned this issue Jul 7, 2020

s/kill/discard/ #911

Closed

dneto0 mentioned this issue Jul 21, 2020

Investigation: kill vs demoteToHelper behavior in native APIs #921

Closed

dneto0 mentioned this issue Mar 4, 2021

Describe discard statement #1476

Merged

Kangz added this to the V1.0 milestone Feb 2, 2022

kdashg added the wgsl resolved Resolved - waiting for a change to the WGSL specification label Apr 26, 2022

Kangz added the copyediting Pure editorial stuff (copyediting, *.bs file syntax, etc.) label Sep 20, 2022

Kangz removed the wgsl resolved Resolved - waiting for a change to the WGSL specification label Sep 20, 2022

kdashg closed this as completed Jan 10, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

shaders: semantics of "discard" #361

shaders: semantics of "discard" #361

dneto0 commented Jul 9, 2019

krogovin commented Aug 8, 2019

grorg commented Jul 7, 2020

kdashg commented Jul 7, 2020

dneto0 commented Jul 9, 2020

magcius commented Jul 9, 2020

magcius commented Jul 9, 2020 •

edited

Loading

dneto0 commented Jul 9, 2020

dneto0 commented Jul 9, 2020

magcius commented Jul 9, 2020

dneto0 commented Jul 9, 2020 •

edited

Loading

kvark commented Jul 9, 2020

magcius commented Jul 9, 2020 •

edited

Loading

kvark commented Jul 9, 2020

krogovin commented Jul 10, 2020

dneto0 commented Jul 21, 2020

magcius commented Jul 22, 2020

krogovin commented Jul 22, 2020

teoxoy commented Jan 10, 2023

kdashg commented Jan 10, 2023

shaders: semantics of "discard" #361

shaders: semantics of "discard" #361

Comments

dneto0 commented Jul 9, 2019

krogovin commented Aug 8, 2019

grorg commented Jul 7, 2020

kdashg commented Jul 7, 2020

dneto0 commented Jul 9, 2020

magcius commented Jul 9, 2020

magcius commented Jul 9, 2020 • edited Loading

dneto0 commented Jul 9, 2020

dneto0 commented Jul 9, 2020

magcius commented Jul 9, 2020

dneto0 commented Jul 9, 2020 • edited Loading

kvark commented Jul 9, 2020

magcius commented Jul 9, 2020 • edited Loading

kvark commented Jul 9, 2020

krogovin commented Jul 10, 2020

dneto0 commented Jul 21, 2020

magcius commented Jul 22, 2020

krogovin commented Jul 22, 2020

teoxoy commented Jan 10, 2023

kdashg commented Jan 10, 2023

magcius commented Jul 9, 2020 •

edited

Loading

dneto0 commented Jul 9, 2020 •

edited

Loading

magcius commented Jul 9, 2020 •

edited

Loading