RWStructuredBuffer<uint> Out : register(u0);

[numthreads(8,8,1)]
void main(uint3 TID : SV_GroupThreadID) {
  for (uint i = 0; i < 8; i++) {
    for (uint j = 0; j < 8; j++) {
      if (i == TID.x && j == TID.y) {
        uint index = TID.x * 8 + TID.y;
        Out[index] = WaveActiveMax(index);
        break;
      }
    }
  }
}
Found while working on #164496. Child of #136930.
The likely culprits are the SimplifyCFG, JumpThreading, and GVN passes, which appear to fold convergence control intrinsics incorrectly.
The verifier reports:
Loop intrinsic cannot be preceded by a convergent operation in the same basic block.
%4 = call token @llvm.experimental.convergence.loop() [ "convergencectrl"(token %0) ]
in function main