This test case shows that the register allocation pipeline ends up emitting 14 instructions that copy undefined lanes.
Test case: many-copies-of-undef-lanes.ll.zip
```
# Output of -stop-after=register-coalescer
---
name: copies_of_undef_lanes_tuple_copy
tracksRegLiveness: true
machineFunctionInfo:
  isEntryFunction: true
  stackPtrOffsetReg: '$sgpr32'
  occupancy: 8
body: |
  bb.0:
    undef %41.sub0:sgpr_64 = S_MOV_B32 0
    undef %42.sub9:areg_512_align2 = V_ACCVGPR_WRITE_B32_e64 0, implicit $exec
    %42.sub8:areg_512_align2 = V_ACCVGPR_WRITE_B32_e64 0, implicit $exec
    %41.sub1:sgpr_64 = COPY %41.sub0
    %43:vreg_64_align2 = COPY %41
    %47:sreg_64 = S_AND_B64 $exec, -1, implicit-def dead $scc

  bb.1:
    undef %67.sub1:vreg_64_align2 = COPY %42.sub9
    %67.sub0:vreg_64_align2 = COPY %42.sub8
    undef %52.sub0_sub1:vreg_512_align2 = nofpexcept V_PK_MUL_F32 8, %67, 0, 0, 0, 0, 0, 0, 0, implicit $mode, implicit $exec
    %42:areg_512_align2 = COPY %52
    %42:areg_512_align2 = V_MFMA_F32_32X32X8F16_mac_e64 %43, %43, %42, 0, 0, 0, implicit $mode, implicit $exec
    $vcc = COPY %47
    S_CBRANCH_VCCNZ %bb.1, implicit killed $vcc
    S_BRANCH %bb.2

  bb.2:
    S_ENDPGM 0
...
```
The problem is `%42:areg_512_align2 = COPY %52`. Only `sub0_sub1` of `%52` is defined (by `undef %52.sub0_sub1:vreg_512_align2 = V_PK_MUL_F32 ...`), yet this copy of the full tuple expands to 16 instructions, one for every lane. It could be rewritten as `undef %42.sub0_sub1:areg_512_align2 = COPY %52`, which should need only 2 instructions, one for each live lane.
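To make the arithmetic concrete, here is a small standalone C++ model of the lane bookkeeping. It is not LLVM code: `LaneMask`, `subRegMask`, and `copyCost` are hypothetical names, and the model assumes, as the counts above suggest, that each copied 32-bit lane costs one instruction after expansion.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>

// Hypothetical model: one bit per 32-bit lane of a 512-bit register tuple
// such as areg_512_align2 (sub0 .. sub15).
constexpr std::size_t kNumLanes = 16;
using LaneMask = std::bitset<kNumLanes>;

// Mask for a contiguous subregister index covering lanes First..Last,
// e.g. subRegMask(0, 1) stands in for sub0_sub1.
LaneMask subRegMask(std::size_t First, std::size_t Last) {
  assert(First <= Last && Last < kNumLanes);
  LaneMask M;
  for (std::size_t L = First; L <= Last; ++L)
    M.set(L);
  return M;
}

// Assumption: after expansion every copied 32-bit lane costs one
// instruction, so a copy's cost is the number of lanes it touches.
std::size_t copyCost(const LaneMask &CopiedLanes) {
  return CopiedLanes.count();
}
```

With `Defined = subRegMask(0, 1)` (the lanes written by the `V_PK_MUL_F32`), `copyCost(~LaneMask())` for the full-tuple copy is 16, `copyCost(Defined)` is 2, and the undefined lanes `~Defined` account for the remaining 14.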
In the general case, partially undefined copies should be expanded similarly to what SplitKit does, emitting a sequence of copies that covers only the minimal set of live lanes. I'm not sure where this should go; perhaps the register coalescer could do it?
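One possible shape for that expansion, again as a standalone C++ sketch rather than anything in LLVM: split the live-lane mask into maximal contiguous runs, each of which would become one subregister `COPY`. `splitLiveLanes` and `LaneRange` are hypothetical. The sketch assumes every contiguous run has a matching subregister index; a real implementation would have to choose among the indices the target actually defines, and computing the live lanes in the first place is left out.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical model: one bit per 32-bit lane of a 16-lane register tuple.
constexpr std::size_t kNumLanes = 16;
using LaneMask = std::bitset<kNumLanes>;

// Inclusive lane range [First, Last], standing in for a subregister index.
using LaneRange = std::pair<std::size_t, std::size_t>;

// Split the live lanes of a partially undefined copy into maximal
// contiguous runs; each run would be emitted as one subregister COPY,
// and undefined lanes are skipped entirely.
std::vector<LaneRange> splitLiveLanes(const LaneMask &Live) {
  std::vector<LaneRange> Ranges;
  std::size_t L = 0;
  while (L < kNumLanes) {
    if (!Live.test(L)) {
      ++L; // undefined lane: nothing to copy
      continue;
    }
    std::size_t First = L;
    // Extend the run while the next lane is also live.
    while (L + 1 < kNumLanes && Live.test(L + 1))
      ++L;
    Ranges.emplace_back(First, L);
    ++L;
  }
  assert(Ranges.size() <= kNumLanes);
  return Ranges;
}
```

For this test case the live mask contains only lanes 0 and 1, so `splitLiveLanes` returns the single range `{0, 1}`, i.e. one `sub0_sub1` copy instead of a copy of the whole tuple.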