
Fragment shader not writing to depth buffer correctly #314

Closed
s-perron opened this issue Nov 1, 2019 · 9 comments

Comments

s-perron (Contributor) commented Nov 1, 2019

I have noticed a shader that does not write to the depth buffer correctly. It seems that two OpKills combined with an image load are needed to reproduce it. I find it very odd that the image load makes a difference.

I will attach a reduced version of the SPIR-V and the GCN that was generated. In the GCN, there is a sample trace through the code that shows the problem. Please let me know if I have misunderstood any of the instructions; I'm still learning GCN.

s-perron (Contributor, Author) commented Nov 1, 2019

s-perron (Contributor, Author) commented Nov 1, 2019

I need to go back to the drawing board. The first instruction in BB0_4 is an OR, but I treated it like an AND.

linqun (Member) commented Nov 1, 2019

What will happen if we change the sample function from texture to textureLod? Using textureLod avoids having to set whole quad mode (s_wqm_b64) and simplifies the generated GCN code.

s-perron (Contributor, Author) commented Nov 1, 2019

After looking at it more, I believe the implicit LOD is required to see the problem. The s_wqm_b64 instruction is executed at the start of the function, presumably because the implicit LOD needs all threads in the quad to be active.

However, after the code related to the first OpKill, some threads will be turned off. When the image_sample_b is reached, not every thread in quads with active threads will be active.

Does this sound right?

s-perron (Contributor, Author) commented Nov 1, 2019

After looking at it further, I believe that the "SI Whole Quad Mode" pass is not complete.

Non-uniform control flow can cause a wave that is in whole quad mode to fall out of it, because part of a quad could go one way and the rest another. However, no instruction is emitted to put it back into whole quad mode.
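The way non-uniform control flow breaks whole quad mode can be sketched with a toy exec-mask model in Python (the lane count, the wqm helper, and the mask values here are illustrative, not real GCN semantics):

```python
# Toy model of "whole quad mode" (WQM) on a 16-lane wave.

def wqm(exec_mask, lanes=16):
    """If any lane of a 2x2 quad is live, enable the whole quad."""
    out = 0
    for quad in range(lanes // 4):
        if (exec_mask >> (quad * 4)) & 0xF:
            out |= 0xF << (quad * 4)
    return out

# Lanes 0-5 are covered by the primitive.
exec_mask = 0b0000000000111111
# s_wqm_b64: lanes 6 and 7 become helper lanes so quad 1 is complete.
assert wqm(exec_mask) == 0b0000000011111111

# A non-uniform branch (e.g. the code around an OpKill) turns off
# lane 5 only:
exec_mask = wqm(exec_mask) & ~(1 << 5)
# The wave is no longer in whole quad mode: quad 1 is partially
# active, and nothing later re-applies wqm() to repair it.
assert wqm(exec_mask) != exec_mask
```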

I can take a look at it next week if nobody is looking at it yet.

Here is another example:

               OpCapability Shader
               OpCapability SampledBuffer
               OpCapability StorageImageExtendedFormats
          %1 = OpExtInstImport "GLSL.std.450"
               OpMemoryModel Logical GLSL450
               OpEntryPoint Fragment %ShadePixel "main" %gl_FragCoord
               OpExecutionMode %ShadePixel OriginUpperLeft
               OpSource HLSL 600
               OpName %ShadePixel "ShadePixel"
               OpDecorate %gl_FragCoord BuiltIn FragCoord
      %float = OpTypeFloat 32
    %v4float = OpTypeVector %float 4
       %bool = OpTypeBool
    %float_0 = OpConstant %float 0
%_ptr_Input_v4float = OpTypePointer Input %v4float
       %void = OpTypeVoid
         %26 = OpTypeFunction %void
%gl_FragCoord = OpVariable %_ptr_Input_v4float Input
 %ShadePixel = OpFunction %void None %26
         %27 = OpLabel
         %28 = OpLoad %v4float %gl_FragCoord
         %29 = OpCompositeExtract %float %28 0
         %30 = OpFOrdLessThan %bool %29 %float_0
               OpSelectionMerge %42 None
               OpBranchConditional %30 %42 %31
         %31 = OpLabel
         %37 = OpDPdx %v4float %28
         %38 = OpCompositeExtract %float %37 0
         %39 = OpFOrdLessThan %bool %38 %float_0
               OpSelectionMerge %40 None
               OpBranchConditional %39 %41 %40
         %41 = OpLabel
               OpKill
         %40 = OpLabel
               OpBranch %42
         %42 = OpLabel
               OpReturn
               OpFunctionEnd
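As pseudocode (Python standing in for shader code, with a hypothetical ddx_x parameter representing the OpDPdx result on gl_FragCoord.x), the control flow of the SPIR-V above is roughly:

```python
# Rough control-flow skeleton of the SPIR-V example. The kill sits
# inside a branch that is not uniform across a quad, while the
# derivative feeding its condition needs the whole quad.

def shade_pixel(frag_coord_x, ddx_x):
    """Returns 'killed' if the fragment is discarded, else 'kept'."""
    if not (frag_coord_x < 0.0):      # OpBranchConditional %30 %42 %31
        if ddx_x < 0.0:               # %39: depends on OpDPdx (%37)
            return 'killed'           # OpKill in non-uniform control flow
    return 'kept'                     # merge block %42

assert shade_pixel(-1.0, -1.0) == 'kept'
assert shade_pixel(1.0, -1.0) == 'killed'
assert shade_pixel(1.0, 1.0) == 'kept'
```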

s-perron (Contributor, Author) commented Nov 1, 2019

Here is the code that LLPC generates for the new example:

Disassembly of section .text:
_amdgpu_ps_main:
        s_mov_b64 s[0:1], exec                                     // 000000000000: BE80017E 
        s_wqm_b64 exec, exec                                       // 000000000004: BEFE077E 
        v_cmp_ngt_f32_e32 vcc, 0, v2                               // 000000000008: 7C960480 
        s_and_saveexec_b64 s[2:3], vcc                             // 00000000000C: BE82206A 
        s_xor_b64 s[2:3], exec, s[2:3]                             // 000000000010: 8882027E 
        s_cbranch_execz BB0_4                                      // 000000000014: BF88000D 

BB0_1:
        v_mov_b32_e32 v0, v2                                       // 000000000018: 7E000302 
        s_nop 1                                                    // 00000000001C: BF800001 
        v_mov_b32_dpp v0, v0  quad_perm:[1,1,1,1] row_mask:0xf bank_mask:0xf bound_ctrl:0// 000000000020: 7E0002FA FF085500 
        s_nop 1                                                    // 000000000028: BF800001 
        v_subrev_f32_dpp v0, v2, v0  quad_perm:[0,0,0,0] row_mask:0xf bank_mask:0xf bound_ctrl:0// 00000000002C: 060000FA FF080002 
        v_mov_b32_e32 v0, v0                                       // 000000000034: 7E000300 
        v_cmp_gt_f32_e32 vcc, 0, v0                                // 000000000038: 7C880080 
        s_and_saveexec_b64 s[4:5], vcc                             // 00000000003C: BE84206A 
        s_xor_b64 s[4:5], exec, s[4:5]                             // 000000000040: 8884047E 

BB0_2:
        s_mov_b64 exec, 0                                          // 000000000044: BEFE0180 

BB0_3:
        s_or_b64 exec, exec, s[4:5]                                // 000000000048: 87FE047E 

BB0_4:
        s_or_b64 exec, exec, s[2:3]                                // 00000000004C: 87FE027E 
        s_and_b64 exec, exec, s[0:1]                               // 000000000050: 86FE007E 
        v_mov_b32_e32 v0, 0                                        // 000000000054: 7E000280 
        exp mrt0 v0, off, off, off done vm                         // 000000000058: C4001801 00000000 
        s_endpgm                                                   // 000000000060: BF810000 
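The scalar-mask arithmetic of the listing can be replayed on a toy 8-lane wave (the mask values and the wqm helper are illustrative, not real hardware semantics). Note that the final restore sequence in BB0_4 produces the right export mask; the problem is the derivative computed earlier in BB0_1 with a partially active quad:

```python
# Replaying the exec-mask operations of the listing above on a toy
# 8-lane wave (real waves are 64 lanes wide).

def wqm(mask, lanes=8):
    # s_wqm: if any lane of a 2x2 quad is live, enable the whole quad
    out = 0
    for q in range(lanes // 4):
        if (mask >> (q * 4)) & 0xF:
            out |= 0xF << (q * 4)
    return out

live   = 0b00000111            # lanes covered by the primitive
exec_m = live
s0     = exec_m                # s_mov_b64  s[0:1], exec
exec_m = wqm(exec_m)           # s_wqm_b64  exec, exec  -> 0b00001111

vcc = 0b00000101               # outer v_cmp: non-uniform inside quad 0
s2  = exec_m                   # s_and_saveexec_b64 s[2:3], vcc
exec_m &= vcc
s2 ^= exec_m                   # s_xor_b64  s[2:3], exec, s[2:3]

# BB0_1: the DPP-based derivative runs here with a partially active
# quad -- the wave is no longer in whole quad mode.
assert wqm(exec_m) != exec_m

vcc = 0b00000001               # inner v_cmp: lane 0 takes the kill
s4  = exec_m                   # s_and_saveexec_b64 s[4:5], vcc
exec_m &= vcc
s4 ^= exec_m                   # s_xor_b64  s[4:5], exec, s[4:5]

exec_m = 0                     # BB0_2: s_mov_b64 exec, 0 (the kill)
exec_m |= s4                   # BB0_3: s_or_b64  exec, exec, s[4:5]
exec_m |= s2                   # BB0_4: s_or_b64  exec, exec, s[2:3]
exec_m &= s0                   #        s_and_b64 exec, exec, s[0:1]

# Final export mask = live lanes minus the killed lane: correct.
assert exec_m == 0b00000110
```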

perlfu (Contributor) commented Nov 2, 2019

The "SI Whole Quad Mode" implementation currently only supports Vulkan's model of implicit derivatives. The behaviour of these is only defined for uniform control flow.

From the SPIR-V 1.5 spec:

2.20 Code Motion
Texturing instructions in the Fragment Execution Model that rely on an implicit derivative cannot be moved into control flow that is not known to be uniform control flow within each derivative group.

I haven't stared hard at the shader here, but I suspect this limitation might be what you are running into.

perlfu (Contributor) commented Nov 2, 2019

If this is the case, then you probably need to use the DemoteToHelperInvocationEXT extension, substituting OpDemoteToHelperInvocationEXT for OpKill instructions.
Note that OpDemoteToHelperInvocationEXT is not a terminator, so it is not always a simple substitution, as branches must converge.
I've been looking at adding a pass to do this automatically for shaders that rely on this kind of behaviour.
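The semantic difference between the two can be sketched with a toy per-lane state model (the state names and helper functions are illustrative, not spec text):

```python
# Toy contrast of OpKill vs OpDemoteToHelperInvocationEXT
# for one 2x2 quad.

lanes = ['live'] * 4           # per-lane state: 'live', 'helper', 'dead'

def kill(i):
    # OpKill: the lane stops contributing to anything,
    # including implicit derivatives.
    lanes[i] = 'dead'

def demote(i):
    # OpDemoteToHelperInvocationEXT: the lane keeps computing values
    # for derivative purposes, but its outputs are discarded.
    lanes[i] = 'helper'

def derivatives_defined():
    # Implicit derivatives need every lane of the quad to still be
    # producing values ('live' or 'helper').
    return all(s != 'dead' for s in lanes)

kill(3)
assert not derivatives_defined()

lanes = ['live'] * 4
demote(3)
assert derivatives_defined()
assert lanes.count('live') == 3   # only live lanes export results
```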

s-perron (Contributor, Author) commented Nov 4, 2019

Thanks for the explanation. That sounds exactly like what is going on. This was an HLSL shader being compiled for Vulkan instead of DirectX. I now think this should be fixed in DXC: it should generate the new instruction instead of an OpKill.

Also, the pass you are looking at writing might fit better in SPIRV-Tools, where it could be used by anybody instead of being specific to AMD hardware. I'll see whether it will be needed on our side.

@s-perron s-perron closed this as completed Nov 4, 2019