Description
We (Modular) integrated LLVM commit 2108c623e618265c4146c405f196953a9c157e73 to our build which includes the changes to turn on amdgpu-uniform-intrinsic-combine by default (#162819).
The kernel located here regressed as a result: #166657 (comment)
Some of the readfirstlane intrinsics were removed from this kernel, but later compiler passes then used VGPRs for some of the calculations. This in turn caused si-fix-sgpr-copies to generate code like this at every buffer load instruction:
.LBB0_9: ; Parent Loop BB0_7 Depth=1
; => This Inner Loop Header: Depth=2
v_readfirstlane_b32 s4, v0
v_readfirstlane_b32 s5, v1
v_readfirstlane_b32 s6, v254
v_readfirstlane_b32 s7, v255
v_cmp_eq_u64_e32 vcc, s[4:5], v[0:1]
s_nop 0
v_cmp_eq_u64_e64 s[2:3], s[6:7], v[254:255]
s_and_b64 s[2:3], vcc, s[2:3]
s_and_saveexec_b64 s[2:3], s[2:3]
buffer_load_dwordx4 v[6:9], v10, s[4:7], s48 offen
s_xor_b64 exec, exec, s[2:3]
s_cbranch_execnz .LBB0_9
; %bb.10: ; in Loop: Header=BB0_7 Depth=1
s_cmp_lg_u32 s55, 0
s_mov_b64 exec, s[38:39]
s_cselect_b32 s55, 1, 0
s_mov_b64 s[38:39], exec
.LBB0_11:
We have worked around this locally by turning off the pass, but the pass has value and we would like it to be enabled. Either the original determination that this was uniform was wrong, or later instruction selection should have picked a scalar form of some instruction to avoid generating the above sequence.
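For anyone not familiar with the sequence in the listing, it has the shape usually called a waterfall loop: read the resource from the first active lane, compare it against every lane, run the load for the lanes that match, drop them from exec, and repeat. Below is a rough Python model of that control flow, only as an illustration. The function name and the list-based exec mask are made up for this sketch; it is not how the backend implements anything.

```python
def waterfall(lane_values, exec_mask):
    """Model a waterfall loop over per-lane values (illustrative only).

    lane_values: one hashable value per lane, e.g. a 4-dword descriptor tuple.
    exec_mask:   one bool per lane, True if the lane is active.
    Returns one (scalar_value, lanes_served) pair per loop iteration.
    """
    remaining = list(exec_mask)
    iterations = []
    while any(remaining):                       # s_cbranch_execnz
        first = remaining.index(True)
        scalar = lane_values[first]             # v_readfirstlane_b32
        matching = [                            # v_cmp_eq_u64 + s_and_saveexec_b64
            lane for lane, active in enumerate(remaining)
            if active and lane_values[lane] == scalar
        ]
        iterations.append((scalar, matching))   # buffer_load for matching lanes
        for lane in matching:                   # s_xor_b64 exec, exec, ...
            remaining[lane] = False
    return iterations
```

If the value really is uniform, the model runs a single iteration, but the readfirstlane and compare instructions are still paid for at every load, which matches what we see in the regressed kernel.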
I was comparing the code from:
llc -O3 -mcpu=gfx950 mha.ll --amdgpu-enable-uniform-intrinsic-combine=false
llc -O3 -mcpu=gfx950 mha.ll --amdgpu-enable-uniform-intrinsic-combine=true
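To compare the two outputs quickly, a small script along these lines can count the instructions of interest in each assembly file. This is a sketch under some assumptions: the helper name is hypothetical, the mnemonic list is just the ones visible in the listing above, and it assumes each llc invocation was given `-o` with a file name of your choosing (for example `off.s` and `on.s`).

```python
def count_mnemonics(asm_text, mnemonics=("v_readfirstlane_b32",
                                         "s_and_saveexec_b64",
                                         "buffer_load_dwordx4")):
    """Count how often each mnemonic appears as an instruction in asm_text."""
    counts = {m: 0 for m in mnemonics}
    for line in asm_text.splitlines():
        code = line.split(";", 1)[0].strip()    # drop trailing ';' comments
        if not code or code.endswith(":"):      # skip blank lines and labels
            continue
        opcode = code.split()[0]
        if opcode in counts:
            counts[opcode] += 1
    return counts

# Example use (file names are assumptions, see above):
#   with open("off.s") as f: print(count_mnemonics(f.read()))
#   with open("on.s") as f:  print(count_mnemonics(f.read()))
```

A jump in the `v_readfirstlane_b32` and `s_and_saveexec_b64` counts with the pass enabled would be consistent with the waterfall loops described above.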