MachineSSAUpdater: use all vreg attributes instead of reg class only #78431

petar-avramovic · 2024-01-17T11:50:37Z

When initializing MachineSSAUpdater save all attributes of current
virtual register and create new virtual registers with same attributes.
Now new virtual registers have same both register class or bank and LLT.
Previously new virtual registers had same register class but LLT was not
set (LLT was set to default/empty LLT).
Required by GlobalISel for AMDGPU, new 'lane mask' virtual registers
created by MachineSSAUpdater need to have both register class and LLT.

patch 4 from: #73337

llvmbot · 2024-01-17T11:51:08Z

@llvm/pr-subscribers-backend-amdgpu

@llvm/pr-subscribers-llvm-globalisel

Author: Petar Avramovic (petar-avramovic)

Changes

GlobalISel works with registers that could have register class, register bank and LLT as attributes.
When initializing MachineSSAUpdater save actual register and use it to clone attributes for all new registers instead of only using its register class.

patch 4 from: #73337

Patch is 95.24 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/78431.diff

12 Files Affected:

(modified) llvm/include/llvm/CodeGen/MachineSSAUpdater.h (+2-3)
(modified) llvm/lib/CodeGen/MachineSSAUpdater.cpp (+19-25)
(modified) llvm/test/CodeGen/AMDGPU/GlobalISel/divergence-divergent-i1-phis-no-lane-mask-merging.ll (+53-46)
(modified) llvm/test/CodeGen/AMDGPU/GlobalISel/divergence-divergent-i1-phis-no-lane-mask-merging.mir (+6-6)
(modified) llvm/test/CodeGen/AMDGPU/GlobalISel/divergence-divergent-i1-used-outside-loop.ll (+130-68)
(modified) llvm/test/CodeGen/AMDGPU/GlobalISel/divergence-divergent-i1-used-outside-loop.mir (+54-54)
(modified) llvm/test/CodeGen/AMDGPU/GlobalISel/divergence-structurizer.ll (+84-33)
(modified) llvm/test/CodeGen/AMDGPU/GlobalISel/divergence-structurizer.mir (+48-48)
(modified) llvm/test/CodeGen/AMDGPU/GlobalISel/divergence-temporal-divergent-i1.ll (+49-33)
(modified) llvm/test/CodeGen/AMDGPU/GlobalISel/divergence-temporal-divergent-i1.mir (+10-10)
(modified) llvm/test/CodeGen/AMDGPU/GlobalISel/divergence-temporal-divergent-reg.ll (+1-1)
(modified) llvm/test/CodeGen/AMDGPU/GlobalISel/divergent-control-flow.ll (+25-18)

diff --git a/llvm/include/llvm/CodeGen/MachineSSAUpdater.h b/llvm/include/llvm/CodeGen/MachineSSAUpdater.h
index bbd09d7d151ba0..765cabdb313097 100644
--- a/llvm/include/llvm/CodeGen/MachineSSAUpdater.h
+++ b/llvm/include/llvm/CodeGen/MachineSSAUpdater.h
@@ -40,8 +40,8 @@ class MachineSSAUpdater {
   //typedef DenseMap<MachineBasicBlock*, Register> AvailableValsTy;
   void *AV = nullptr;
 
-  /// VRC - Register class of the current virtual register.
-  const TargetRegisterClass *VRC = nullptr;
+  /// RegAttrs - current virtual register, new registers copy its attributes.
+  Register RegAttrs;
 
   /// InsertedPHIs - If this is non-null, the MachineSSAUpdater adds all PHI
   /// nodes that it creates to the vector.
@@ -62,7 +62,6 @@ class MachineSSAUpdater {
   /// Initialize - Reset this object to get ready for a new set of SSA
   /// updates.
   void Initialize(Register V);
-  void Initialize(const TargetRegisterClass *RC);
 
   /// AddAvailableValue - Indicate that a rewritten value is available at the
   /// end of the specified block with the specified value.
diff --git a/llvm/lib/CodeGen/MachineSSAUpdater.cpp b/llvm/lib/CodeGen/MachineSSAUpdater.cpp
index 48076663ddf538..48537057e2031a 100644
--- a/llvm/lib/CodeGen/MachineSSAUpdater.cpp
+++ b/llvm/lib/CodeGen/MachineSSAUpdater.cpp
@@ -51,17 +51,13 @@ MachineSSAUpdater::~MachineSSAUpdater() {
 
 /// Initialize - Reset this object to get ready for a new set of SSA
 /// updates.
-void MachineSSAUpdater::Initialize(const TargetRegisterClass *RC) {
+void MachineSSAUpdater::Initialize(Register V) {
   if (!AV)
     AV = new AvailableValsTy();
   else
     getAvailableVals(AV).clear();
 
-  VRC = RC;
-}
-
-void MachineSSAUpdater::Initialize(Register V) {
-  Initialize(MRI->getRegClass(V));
+  RegAttrs = V;
 }
 
 /// HasValueForBlock - Return true if the MachineSSAUpdater already has a value for
@@ -115,13 +111,12 @@ Register LookForIdenticalPHI(MachineBasicBlock *BB,
 /// InsertNewDef - Insert an empty PHI or IMPLICIT_DEF instruction which define
 /// a value of the given register class at the start of the specified basic
 /// block. It returns the virtual register defined by the instruction.
-static
-MachineInstrBuilder InsertNewDef(unsigned Opcode,
-                           MachineBasicBlock *BB, MachineBasicBlock::iterator I,
-                           const TargetRegisterClass *RC,
-                           MachineRegisterInfo *MRI,
-                           const TargetInstrInfo *TII) {
-  Register NewVR = MRI->createVirtualRegister(RC);
+static MachineInstrBuilder InsertNewDef(unsigned Opcode, MachineBasicBlock *BB,
+                                        MachineBasicBlock::iterator I,
+                                        Register RegAttrs,
+                                        MachineRegisterInfo *MRI,
+                                        const TargetInstrInfo *TII) {
+  Register NewVR = MRI->cloneVirtualRegister(RegAttrs);
   return BuildMI(*BB, I, DebugLoc(), TII->get(Opcode), NewVR);
 }
 
@@ -158,9 +153,9 @@ Register MachineSSAUpdater::GetValueInMiddleOfBlock(MachineBasicBlock *BB,
     if (ExistingValueOnly)
       return Register();
     // Insert an implicit_def to represent an undef value.
-    MachineInstr *NewDef = InsertNewDef(TargetOpcode::IMPLICIT_DEF,
-                                        BB, BB->getFirstTerminator(),
-                                        VRC, MRI, TII);
+    MachineInstr *NewDef =
+        InsertNewDef(TargetOpcode::IMPLICIT_DEF, BB, BB->getFirstTerminator(),
+                     RegAttrs, MRI, TII);
     return NewDef->getOperand(0).getReg();
   }
 
@@ -197,8 +192,8 @@ Register MachineSSAUpdater::GetValueInMiddleOfBlock(MachineBasicBlock *BB,
 
   // Otherwise, we do need a PHI: insert one now.
   MachineBasicBlock::iterator Loc = BB->empty() ? BB->end() : BB->begin();
-  MachineInstrBuilder InsertedPHI = InsertNewDef(TargetOpcode::PHI, BB,
-                                                 Loc, VRC, MRI, TII);
+  MachineInstrBuilder InsertedPHI =
+      InsertNewDef(TargetOpcode::PHI, BB, Loc, RegAttrs, MRI, TII);
 
   // Fill in all the predecessors of the PHI.
   for (unsigned i = 0, e = PredValues.size(); i != e; ++i)
@@ -300,10 +295,9 @@ class SSAUpdaterTraits<MachineSSAUpdater> {
   static Register GetUndefVal(MachineBasicBlock *BB,
                               MachineSSAUpdater *Updater) {
     // Insert an implicit_def to represent an undef value.
-    MachineInstr *NewDef = InsertNewDef(TargetOpcode::IMPLICIT_DEF,
-                                        BB, BB->getFirstNonPHI(),
-                                        Updater->VRC, Updater->MRI,
-                                        Updater->TII);
+    MachineInstr *NewDef =
+        InsertNewDef(TargetOpcode::IMPLICIT_DEF, BB, BB->getFirstNonPHI(),
+                     Updater->RegAttrs, Updater->MRI, Updater->TII);
     return NewDef->getOperand(0).getReg();
   }
 
@@ -312,9 +306,9 @@ class SSAUpdaterTraits<MachineSSAUpdater> {
   static Register CreateEmptyPHI(MachineBasicBlock *BB, unsigned NumPreds,
                                  MachineSSAUpdater *Updater) {
     MachineBasicBlock::iterator Loc = BB->empty() ? BB->end() : BB->begin();
-    MachineInstr *PHI = InsertNewDef(TargetOpcode::PHI, BB, Loc,
-                                     Updater->VRC, Updater->MRI,
-                                     Updater->TII);
+    MachineInstr *PHI =
+        InsertNewDef(TargetOpcode::PHI, BB, Loc, Updater->RegAttrs,
+                     Updater->MRI, Updater->TII);
     return PHI->getOperand(0).getReg();
   }
 
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/divergence-divergent-i1-phis-no-lane-mask-merging.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/divergence-divergent-i1-phis-no-lane-mask-merging.ll
index 38a4e81b5c2596..88508f7140b679 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/divergence-divergent-i1-phis-no-lane-mask-merging.ll
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/divergence-divergent-i1-phis-no-lane-mask-merging.ll
@@ -1,6 +1,5 @@
 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 3
 ; RUN: llc -global-isel -amdgpu-global-isel-risky-select -mtriple=amdgcn-amd-amdpal -mcpu=gfx1010 < %s | FileCheck -check-prefix=GFX10 %s
-; REQUIRES: do-not-run-me
 
 ; Divergent phis that don't require lowering using lane mask merging
 
@@ -66,15 +65,16 @@ exit:
 define amdgpu_ps void @divergent_i1_phi_uniform_branch_simple(ptr addrspace(1) %out, i32 %tid, i32 inreg %cond) {
 ; GFX10-LABEL: divergent_i1_phi_uniform_branch_simple:
 ; GFX10:       ; %bb.0: ; %A
+; GFX10-NEXT:    v_cmp_le_u32_e64 s1, 6, v2
 ; GFX10-NEXT:    s_cmp_lg_u32 s0, 0
-; GFX10-NEXT:    s_cbranch_scc0 .LBB1_2
-; GFX10-NEXT:  ; %bb.1:
-; GFX10-NEXT:    v_cmp_le_u32_e64 s0, 6, v2
-; GFX10-NEXT:    s_branch .LBB1_3
-; GFX10-NEXT:  .LBB1_2: ; %B
-; GFX10-NEXT:    v_cmp_gt_u32_e64 s0, 1, v2
-; GFX10-NEXT:  .LBB1_3: ; %exit
-; GFX10-NEXT:    v_cndmask_b32_e64 v2, 0, -1, s0
+; GFX10-NEXT:    s_cbranch_scc1 .LBB1_2
+; GFX10-NEXT:  ; %bb.1: ; %B
+; GFX10-NEXT:    v_cmp_gt_u32_e32 vcc_lo, 1, v2
+; GFX10-NEXT:    s_andn2_b32 s0, s1, exec_lo
+; GFX10-NEXT:    s_and_b32 s1, exec_lo, vcc_lo
+; GFX10-NEXT:    s_or_b32 s1, s0, s1
+; GFX10-NEXT:  .LBB1_2: ; %exit
+; GFX10-NEXT:    v_cndmask_b32_e64 v2, 0, -1, s1
 ; GFX10-NEXT:    v_add_nc_u32_e32 v2, 2, v2
 ; GFX10-NEXT:    global_store_dword v[0:1], v2, off
 ; GFX10-NEXT:    s_endpgm
@@ -101,23 +101,27 @@ define void @divergent_i1_phi_used_inside_loop(float %val, ptr %addr) {
 ; GFX10-LABEL: divergent_i1_phi_used_inside_loop:
 ; GFX10:       ; %bb.0: ; %entry
 ; GFX10-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX10-NEXT:    s_mov_b32 s4, 0
+; GFX10-NEXT:    s_mov_b32 s5, 0
 ; GFX10-NEXT:    v_mov_b32_e32 v3, 1
-; GFX10-NEXT:    v_mov_b32_e32 v4, s4
+; GFX10-NEXT:    v_mov_b32_e32 v4, s5
+; GFX10-NEXT:    ; implicit-def: $sgpr6
 ; GFX10-NEXT:  .LBB2_1: ; %loop
 ; GFX10-NEXT:    ; =>This Inner Loop Header: Depth=1
-; GFX10-NEXT:    v_cvt_f32_u32_e32 v5, v4
 ; GFX10-NEXT:    v_xor_b32_e32 v3, 1, v3
+; GFX10-NEXT:    v_cvt_f32_u32_e32 v5, v4
 ; GFX10-NEXT:    v_add_nc_u32_e32 v4, 1, v4
+; GFX10-NEXT:    v_and_b32_e32 v6, 1, v3
 ; GFX10-NEXT:    v_cmp_gt_f32_e32 vcc_lo, v5, v0
-; GFX10-NEXT:    s_or_b32 s4, vcc_lo, s4
-; GFX10-NEXT:    s_andn2_b32 exec_lo, exec_lo, s4
+; GFX10-NEXT:    v_cmp_ne_u32_e64 s4, 0, v6
+; GFX10-NEXT:    s_or_b32 s5, vcc_lo, s5
+; GFX10-NEXT:    s_andn2_b32 s6, s6, exec_lo
+; GFX10-NEXT:    s_and_b32 s4, exec_lo, s4
+; GFX10-NEXT:    s_or_b32 s6, s6, s4
+; GFX10-NEXT:    s_andn2_b32 exec_lo, exec_lo, s5
 ; GFX10-NEXT:    s_cbranch_execnz .LBB2_1
 ; GFX10-NEXT:  ; %bb.2: ; %exit
-; GFX10-NEXT:    s_or_b32 exec_lo, exec_lo, s4
-; GFX10-NEXT:    v_and_b32_e32 v0, 1, v3
-; GFX10-NEXT:    v_cmp_ne_u32_e32 vcc_lo, 0, v0
-; GFX10-NEXT:    v_cndmask_b32_e64 v0, 0, 1.0, vcc_lo
+; GFX10-NEXT:    s_or_b32 exec_lo, exec_lo, s5
+; GFX10-NEXT:    v_cndmask_b32_e64 v0, 0, 1.0, s6
 ; GFX10-NEXT:    flat_store_dword v[1:2], v0
 ; GFX10-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX10-NEXT:    s_setpc_b64 s[30:31]
@@ -144,44 +148,49 @@ define void @divergent_i1_phi_used_inside_loop_bigger_loop_body(float %val, floa
 ; GFX10:       ; %bb.0: ; %entry
 ; GFX10-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
 ; GFX10-NEXT:    v_cmp_lt_f32_e32 vcc_lo, 1.0, v1
-; GFX10-NEXT:    s_mov_b32 s4, 0
-; GFX10-NEXT:    v_mov_b32_e32 v8, 0x3e8
-; GFX10-NEXT:    v_mov_b32_e32 v9, s4
-; GFX10-NEXT:    v_cndmask_b32_e64 v1, 0, 1, vcc_lo
+; GFX10-NEXT:    s_mov_b32 s5, 0
+; GFX10-NEXT:    v_mov_b32_e32 v1, 0x3e8
+; GFX10-NEXT:    v_mov_b32_e32 v8, s5
+; GFX10-NEXT:    ; implicit-def: $sgpr6
+; GFX10-NEXT:    v_cndmask_b32_e64 v9, 0, 1, vcc_lo
 ; GFX10-NEXT:    s_branch .LBB3_2
 ; GFX10-NEXT:  .LBB3_1: ; %loop_body
 ; GFX10-NEXT:    ; in Loop: Header=BB3_2 Depth=1
-; GFX10-NEXT:    v_cvt_f32_u32_e32 v10, v9
-; GFX10-NEXT:    v_xor_b32_e32 v1, 1, v1
-; GFX10-NEXT:    v_add_nc_u32_e32 v9, 1, v9
-; GFX10-NEXT:    v_cmp_gt_f32_e32 vcc_lo, v10, v0
-; GFX10-NEXT:    s_or_b32 s4, vcc_lo, s4
-; GFX10-NEXT:    s_andn2_b32 exec_lo, exec_lo, s4
+; GFX10-NEXT:    v_cvt_f32_u32_e32 v9, v8
+; GFX10-NEXT:    s_xor_b32 s4, s4, -1
+; GFX10-NEXT:    v_add_nc_u32_e32 v8, 1, v8
+; GFX10-NEXT:    v_cmp_gt_f32_e32 vcc_lo, v9, v0
+; GFX10-NEXT:    v_cndmask_b32_e64 v9, 0, 1, s4
+; GFX10-NEXT:    s_or_b32 s5, vcc_lo, s5
+; GFX10-NEXT:    s_andn2_b32 s6, s6, exec_lo
+; GFX10-NEXT:    s_and_b32 s4, exec_lo, s4
+; GFX10-NEXT:    s_or_b32 s6, s6, s4
+; GFX10-NEXT:    s_andn2_b32 exec_lo, exec_lo, s5
 ; GFX10-NEXT:    s_cbranch_execz .LBB3_6
 ; GFX10-NEXT:  .LBB3_2: ; %loop_start
 ; GFX10-NEXT:    ; =>This Inner Loop Header: Depth=1
-; GFX10-NEXT:    v_cmp_ge_i32_e32 vcc_lo, 0x3e8, v9
-; GFX10-NEXT:    s_mov_b32 s5, 1
+; GFX10-NEXT:    v_and_b32_e32 v9, 1, v9
+; GFX10-NEXT:    v_cmp_ge_i32_e32 vcc_lo, 0x3e8, v8
+; GFX10-NEXT:    s_mov_b32 s7, 1
+; GFX10-NEXT:    v_cmp_ne_u32_e64 s4, 0, v9
 ; GFX10-NEXT:    s_cbranch_vccz .LBB3_4
 ; GFX10-NEXT:  ; %bb.3: ; %else
 ; GFX10-NEXT:    ; in Loop: Header=BB3_2 Depth=1
-; GFX10-NEXT:    s_mov_b32 s5, 0
-; GFX10-NEXT:    flat_store_dword v[6:7], v8
+; GFX10-NEXT:    s_mov_b32 s7, 0
+; GFX10-NEXT:    flat_store_dword v[6:7], v1
 ; GFX10-NEXT:  .LBB3_4: ; %Flow
 ; GFX10-NEXT:    ; in Loop: Header=BB3_2 Depth=1
-; GFX10-NEXT:    s_xor_b32 s5, s5, 1
-; GFX10-NEXT:    s_and_b32 s5, s5, 1
-; GFX10-NEXT:    s_cmp_lg_u32 s5, 0
+; GFX10-NEXT:    s_xor_b32 s7, s7, 1
+; GFX10-NEXT:    s_and_b32 s7, s7, 1
+; GFX10-NEXT:    s_cmp_lg_u32 s7, 0
 ; GFX10-NEXT:    s_cbranch_scc1 .LBB3_1
 ; GFX10-NEXT:  ; %bb.5: ; %if
 ; GFX10-NEXT:    ; in Loop: Header=BB3_2 Depth=1
-; GFX10-NEXT:    flat_store_dword v[4:5], v8
+; GFX10-NEXT:    flat_store_dword v[4:5], v1
 ; GFX10-NEXT:    s_branch .LBB3_1
 ; GFX10-NEXT:  .LBB3_6: ; %exit
-; GFX10-NEXT:    s_or_b32 exec_lo, exec_lo, s4
-; GFX10-NEXT:    v_and_b32_e32 v0, 1, v1
-; GFX10-NEXT:    v_cmp_ne_u32_e32 vcc_lo, 0, v0
-; GFX10-NEXT:    v_cndmask_b32_e64 v0, 0, 1.0, vcc_lo
+; GFX10-NEXT:    s_or_b32 exec_lo, exec_lo, s5
+; GFX10-NEXT:    v_cndmask_b32_e64 v0, 0, 1.0, s6
 ; GFX10-NEXT:    flat_store_dword v[2:3], v0
 ; GFX10-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX10-NEXT:    s_setpc_b64 s[30:31]
@@ -221,8 +230,8 @@ exit:
 define amdgpu_cs void @single_lane_execution_attribute(i32 inreg %.userdata0, <3 x i32> inreg %.WorkgroupId, <3 x i32> %.LocalInvocationId) #0 {
 ; GFX10-LABEL: single_lane_execution_attribute:
 ; GFX10:       ; %bb.0: ; %.entry
-; GFX10-NEXT:    s_mov_b32 s12, 0
 ; GFX10-NEXT:    s_getpc_b64 s[4:5]
+; GFX10-NEXT:    s_mov_b32 s12, 0
 ; GFX10-NEXT:    s_mov_b32 s13, -1
 ; GFX10-NEXT:    s_mov_b32 s2, s0
 ; GFX10-NEXT:    s_and_b64 s[4:5], s[4:5], s[12:13]
@@ -230,7 +239,6 @@ define amdgpu_cs void @single_lane_execution_attribute(i32 inreg %.userdata0, <3
 ; GFX10-NEXT:    v_mbcnt_lo_u32_b32 v1, -1, 0
 ; GFX10-NEXT:    s_or_b64 s[2:3], s[4:5], s[2:3]
 ; GFX10-NEXT:    s_load_dwordx8 s[4:11], s[2:3], 0x0
-; GFX10-NEXT:    s_mov_b32 s2, 1
 ; GFX10-NEXT:    v_mbcnt_hi_u32_b32 v1, -1, v1
 ; GFX10-NEXT:    v_lshlrev_b32_e32 v2, 2, v1
 ; GFX10-NEXT:    v_and_b32_e32 v3, 1, v1
@@ -257,13 +265,12 @@ define amdgpu_cs void @single_lane_execution_attribute(i32 inreg %.userdata0, <3
 ; GFX10-NEXT:    s_cbranch_vccnz .LBB4_2
 ; GFX10-NEXT:  ; %bb.3: ; %.preheader._crit_edge
 ; GFX10-NEXT:    v_cmp_eq_u32_e32 vcc_lo, v4, v2
+; GFX10-NEXT:    s_mov_b32 s13, 0
 ; GFX10-NEXT:    s_or_b32 s2, s0, vcc_lo
 ; GFX10-NEXT:    v_cndmask_b32_e64 v3, 0, 1, s2
-; GFX10-NEXT:    s_mov_b32 s2, 0
 ; GFX10-NEXT:  .LBB4_4: ; %Flow
-; GFX10-NEXT:    s_and_b32 s2, s2, 1
-; GFX10-NEXT:    s_cmp_lg_u32 s2, 0
-; GFX10-NEXT:    s_cbranch_scc0 .LBB4_6
+; GFX10-NEXT:    s_and_b32 vcc_lo, exec_lo, s13
+; GFX10-NEXT:    s_cbranch_vccz .LBB4_6
 ; GFX10-NEXT:  ; %bb.5: ; %.19
 ; GFX10-NEXT:    v_cndmask_b32_e64 v1, 0, 1, s0
 ; GFX10-NEXT:    v_or_b32_e32 v3, 2, v1
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/divergence-divergent-i1-phis-no-lane-mask-merging.mir b/llvm/test/CodeGen/AMDGPU/GlobalISel/divergence-divergent-i1-phis-no-lane-mask-merging.mir
index 56f2812b590a8d..a58773a7c9228a 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/divergence-divergent-i1-phis-no-lane-mask-merging.mir
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/divergence-divergent-i1-phis-no-lane-mask-merging.mir
@@ -143,8 +143,8 @@ body: |
   ; GFX10-NEXT:   [[S_OR_B32_:%[0-9]+]]:sreg_32(s1) = S_OR_B32 [[S_ANDN2_B32_]](s1), [[S_AND_B32_]](s1), implicit-def $scc
   ; GFX10-NEXT: {{  $}}
   ; GFX10-NEXT: bb.2:
-  ; GFX10-NEXT:   [[PHI:%[0-9]+]]:sreg_32 = PHI [[ICMP]](s1), %bb.0, [[S_OR_B32_]](s1), %bb.1
-  ; GFX10-NEXT:   [[COPY6:%[0-9]+]]:sreg_32(s1) = COPY [[PHI]]
+  ; GFX10-NEXT:   [[PHI:%[0-9]+]]:sreg_32(s1) = PHI [[ICMP]](s1), %bb.0, [[S_OR_B32_]](s1), %bb.1
+  ; GFX10-NEXT:   [[COPY6:%[0-9]+]]:sreg_32(s1) = COPY [[PHI]](s1)
   ; GFX10-NEXT:   [[C3:%[0-9]+]]:_(s32) = G_CONSTANT i32 2
   ; GFX10-NEXT:   [[C4:%[0-9]+]]:_(s32) = G_CONSTANT i32 1
   ; GFX10-NEXT:   [[SELECT:%[0-9]+]]:_(s32) = G_SELECT [[COPY6]](s1), [[C4]], [[C3]]
@@ -202,11 +202,11 @@ body: |
   ; GFX10-NEXT: bb.1:
   ; GFX10-NEXT:   successors: %bb.2(0x04000000), %bb.1(0x7c000000)
   ; GFX10-NEXT: {{  $}}
-  ; GFX10-NEXT:   [[PHI:%[0-9]+]]:sreg_32 = PHI [[DEF]](s1), %bb.0, %22(s1), %bb.1
+  ; GFX10-NEXT:   [[PHI:%[0-9]+]]:sreg_32(s1) = PHI [[DEF]](s1), %bb.0, %22(s1), %bb.1
   ; GFX10-NEXT:   [[PHI1:%[0-9]+]]:_(s32) = G_PHI %7(s32), %bb.1, [[C1]](s32), %bb.0
   ; GFX10-NEXT:   [[PHI2:%[0-9]+]]:_(s32) = G_PHI [[C1]](s32), %bb.0, %9(s32), %bb.1
   ; GFX10-NEXT:   [[PHI3:%[0-9]+]]:_(s1) = G_PHI [[C]](s1), %bb.0, %11(s1), %bb.1
-  ; GFX10-NEXT:   [[COPY3:%[0-9]+]]:sreg_32(s1) = COPY [[PHI]]
+  ; GFX10-NEXT:   [[COPY3:%[0-9]+]]:sreg_32(s1) = COPY [[PHI]](s1)
   ; GFX10-NEXT:   [[C2:%[0-9]+]]:_(s1) = G_CONSTANT i1 true
   ; GFX10-NEXT:   [[XOR:%[0-9]+]]:_(s1) = G_XOR [[PHI3]], [[C2]]
   ; GFX10-NEXT:   [[COPY4:%[0-9]+]]:sreg_32(s1) = COPY [[XOR]](s1)
@@ -297,11 +297,11 @@ body: |
   ; GFX10-NEXT: bb.1:
   ; GFX10-NEXT:   successors: %bb.4(0x40000000), %bb.2(0x40000000)
   ; GFX10-NEXT: {{  $}}
-  ; GFX10-NEXT:   [[PHI:%[0-9]+]]:sreg_32 = PHI [[DEF]](s1), %bb.0, %39(s1), %bb.5
+  ; GFX10-NEXT:   [[PHI:%[0-9]+]]:sreg_32(s1) = PHI [[DEF]](s1), %bb.0, %39(s1), %bb.5
   ; GFX10-NEXT:   [[PHI1:%[0-9]+]]:_(s32) = G_PHI %15(s32), %bb.5, [[C]](s32), %bb.0
   ; GFX10-NEXT:   [[PHI2:%[0-9]+]]:_(s32) = G_PHI [[C]](s32), %bb.0, %17(s32), %bb.5
   ; GFX10-NEXT:   [[PHI3:%[0-9]+]]:sreg_32(s1) = G_PHI [[FCMP]](s1), %bb.0, %19(s1), %bb.5
-  ; GFX10-NEXT:   [[COPY8:%[0-9]+]]:sreg_32(s1) = COPY [[PHI]]
+  ; GFX10-NEXT:   [[COPY8:%[0-9]+]]:sreg_32(s1) = COPY [[PHI]](s1)
   ; GFX10-NEXT:   [[C2:%[0-9]+]]:_(s1) = G_CONSTANT i1 true
   ; GFX10-NEXT:   [[C3:%[0-9]+]]:_(s32) = G_CONSTANT i32 1000
   ; GFX10-NEXT:   [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(sle), [[PHI2]](s32), [[C3]]
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/divergence-divergent-i1-used-outside-loop.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/divergence-divergent-i1-used-outside-loop.ll
index 6d29abafda4091..b07056f30af185 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/divergence-divergent-i1-used-outside-loop.ll
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/divergence-divergent-i1-used-outside-loop.ll
@@ -1,6 +1,5 @@
 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 3
 ; RUN: llc -global-isel -mtriple=amdgcn-amd-amdpal -amdgpu-global-isel-risky-select -mcpu=gfx1010 < %s | FileCheck -check-prefix=GFX10 %s
-; REQUIRES: do-not-run-me
 
 ; This file contains various tests that have divergent i1s used outside of
 ; the loop. These are lane masks is sgpr and need to have correct value in
@@ -17,22 +16,29 @@ define void @divergent_i1_phi_used_outside_loop(float %val, float %pre.cond.val,
 ; GFX10-NEXT:    v_cmp_lt_f32_e32 vcc_lo, 1.0, v1
 ; GFX10-NEXT:    s_mov_b32 s4, 0
 ; GFX10-NEXT:    v_mov_b32_e32 v1, s4
-; GFX10-NEXT:    v_cndmask_b32_e64 v4, 0, 1, vcc_lo
+; GFX10-NEXT:    s_andn2_b32 s5, s4, exec_lo
+; GFX10-NEXT:    s_and_b32 s6, exec_lo, vcc_lo
+; GFX10-NEXT:    s_or_b32 s6, s5, s6
+; GFX10-NEXT:    ; implicit-def: $sgpr5
 ; GFX10-NEXT:  .LBB0_1: ; %loop
 ; GFX10-NEXT:    ; =>This Inner Loop Header: Depth=1
-; GFX10-NEXT:    v_cvt_f32_u32_e32 v6, v1
-; GFX10-NEXT:    v_mov_b32_e32 v5, v4
+; GFX10-NEXT:    v_cvt_f32_u32_e32 v4, v1
+; GFX10-NEXT:    s_xor_b32 s7, s6, -1
 ; GFX10-NEXT:    v_add_nc_u32_e32 v1, 1, v1
-; GFX10-NEXT:    v_cmp_gt_f32_e32 vcc_lo, v6, v0
-; GFX10-NEXT:    v_xor_b32_e32 v4, 1, v5
+; GFX10-NEXT:    v_cmp_gt_f32_e32 vcc_lo, v4, v0
 ; GFX10-NEXT:    s_or_b32 s4, vcc_lo, s4
+; GFX10-NEXT:    s_andn2_b32 s8, s6, exec_lo
+; GFX10-NEXT:    s_and_b32 s7, exec_lo, s7
+; GFX10-NEXT:    s_andn2_b32 s5, s5, exec_lo
+; GFX10-NEXT:    s_and_b32 s6, exec_lo, s6
+; GFX10-NEXT:    s_or_b32 s7, s8, s7
+; GFX10-NEXT:    s_or_b32 s5, s5, s6
+; GFX10-NEXT:    s_mov_b32 s6, s7
 ; GFX10-NEXT:    s_andn2_b32 exec_lo, exec_lo, s4
 ; GFX10-NEXT:    s_cbranch_execnz .LBB0_1
 ; GFX10-NEXT:  ; %bb.2: ; %exit
 ; GFX10-NEXT:    s_or_b32 exec_lo, exec_lo, s4
-; GFX10-NEXT:    v_and_b32_e32 v0, 1, v5
-; GFX10-NEXT:    v_cmp_ne_u32_e32 vcc_lo, 0, v0
-; GFX10-NEXT:    v_cndmask_b32_e64 v0, 0, 1.0, vcc_lo
+; GFX10-NEXT:    v_cndmask_b32_e64 v0, 0, 1.0, s5
 ; GFX10-NEXT:    flat_store_dword v[2:3], v0
 ; GFX10-NEXT:    s_waitcnt lgkmcnt(0)
 ; GFX10-NEXT:    s_setpc_b64 s[30:31]
@@ -60,8 +66,11 @@ define void @divergent_i1_phi_used_outside_loop_larger_loop_body(float %val, ptr
 ; GFX10:       ; %bb.0: ; %entry
 ; GFX10-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
 ; GFX10-NEXT:    s_mov_b32 s4, -1
-; GFX10-NEXT:    v_mov_b32_e32 v5, 1
+; GFX10-NEXT:    ; implicit-def: $sgpr6
 ; GFX10-NEXT:    v_mov_b32_e32 v0, s4
+; GFX10-NEXT:    s_andn2_b32 s5, s4, exec_lo
+; GFX10-NEXT:    s_and_b32 s4, exec_lo, -1
+; GFX10-NEXT:    s_or_b32 s4, s5, s4
 ; GFX10-NEXT:    s_branch .LBB1_2
 ; GFX10-NEXT:  .LBB1_1: ; %loop.cond
 ; GFX10-NEXT:    ; in Loop: Header=BB1_2 Depth=1
@@ -70,20 +79,26 @@ define void @divergent_i1_phi_used_outside_loop_larger_loop_body(float %val, ptr
 ; GFX10-NEXT:    v_add_co_u32 v1, s4, v1, 4
 ; GFX10-NEXT:    v_add_co_ci_u32_e64 v2, s4, 0, v2, s4
 ; GFX10-NEXT:    v_cmp_le_i32_e32 vcc_lo, 10, v0
-; GFX10-NEXT:    v_cndmask_b32_e64 v5, 0, 1, s6
+; GFX10-NEXT:    s_andn2_b32 s7, s5, exec_lo
+; GFX10-NEXT:    s_and_b32 s8, exec_lo, s6
+; G...
[truncated]

arsenm · 2024-01-18T05:19:38Z

llvm/include/llvm/CodeGen/MachineSSAUpdater.h

-  /// VRC - Register class of the current virtual register.
-  const TargetRegisterClass *VRC = nullptr;
+  /// RegAttrs - current virtual register, new registers copy its attributes.
+  Register RegAttrs;


Relying on an existing register (which may mutate?) may be cumbersome. Maybe would be better to store the RegClassOrRegBank + LLT?

Proposed changes are in #73337, had to update amdgpu to do the same

llvm/test/CodeGen/AMDGPU/GlobalISel/divergence-divergent-i1-phis-no-lane-mask-merging.ll

arsenm · 2024-01-18T05:21:17Z

llvm/lib/CodeGen/MachineSSAUpdater.cpp

-                                        BB, BB->getFirstTerminator(),
-                                        VRC, MRI, TII);
+    MachineInstr *NewDef =
+        InsertNewDef(TargetOpcode::IMPLICIT_DEF, BB, BB->getFirstTerminator(),


If it's a generic virtual register, should switch to G_IMPLICIT_DEF/G_PHI

arsenm

Seems out of date, and needs to swap in G_* opcodes for generic registers

petar-avramovic · 2024-02-05T13:33:42Z

Seems out of date, and needs to swap in G_* opcodes for generic registers

For global-isel lane-mask registers are not generic registers, they have regclass + LLT instead of only regclass.
PHI lane masks should be fully instruction selected (all reg operands need to have reg class) when they are introduced.
Undef lane-masks should also be inst-selected, so use of IMPLICIT_DEF is fine.

arsenm · 2024-02-05T13:35:45Z

Seems out of date, and needs to swap in G_* opcodes for generic registers

For global-isel lane-mask registers are not generic registers, they have regclass + LLT instead of only regclass. PHI lane masks should be fully instruction selected (all reg operands need to have reg class) when they are introduced. Undef lane-masks should also be inst-selected, so use of IMPLICIT_DEF is fine.

If that's the case, then this patch is unnecessary? Allowing MachineSSAUpdater in GlobalISel means handling generic virtual registers, and emitting the appropriate G_* opcodes instead of the post-selection equivalents.

arsenm · 2024-02-05T13:38:10Z

Seems out of date, and needs to swap in G_* opcodes for generic registers

For global-isel lane-mask registers are not generic registers, they have regclass + LLT instead of only regclass. PHI lane masks should be fully instruction selected (all reg operands need to have reg class) when they are introduced. Undef lane-masks should also be inst-selected, so use of IMPLICIT_DEF is fine.

If that's the case, then this patch is unnecessary? Allowing MachineSSAUpdater in GlobalISel means handling generic virtual registers, and emitting the appropriate G_* opcodes instead of the post-selection equivalents.

I guess preserve-LLT is one property, but is much more narrow than the title suggests. The title implies it will handle emitting G_* opcodes

When initializing MachineSSAUpdater save all attributes of current virtual register and create new virtual registers with same attributes. Now new virtual registers have same both register class or bank and LLT. Previously new virtual registers had same register class but LLT was not set (LLT was set to default/empty LLT). Required by GlobalISel for AMDGPU, new 'lane mask' virtual registers created by MachineSSAUpdater need to have both register class and LLT. patch 4 from: llvm#73337

petar-avramovic · 2024-02-21T14:52:44Z

Rebase and ping, previously updated commit summary to not mention adapt for global-isel

Basic implementation of lane mask merging for GlobalISel. Lane masks on GlobalISel are registers with sgpr register class and S1 LLT - required by machine uniformity analysis. Implements equivalent of lowerPhis from SILowerI1Copies.cpp in: patch 1: llvm#75340 patch 2: llvm#75349 patch 3: llvm#80003 patch 4: llvm#78431 patch 5: is in this commit: AMDGPU/GlobalISelDivergenceLowering: constrain incoming registers Previously, in PHIs that represent lane masks, incoming registers taken as-is were not selected as lane masks. Such registers are not being merged with another lane mask and most often only have S1 LLT. Implement constrainAsLaneMask by constraining incoming registers taken as-is with lane mask attributes, essentially transforming them to lane masks. This is final step in having PHI instructions created in this pass to be fully instruction-selected.

Basic implementation of lane mask merging for GlobalISel. Lane masks on GlobalISel are registers with sgpr register class and S1 LLT - required by machine uniformity analysis. Implements equivalent of lowerPhis from SILowerI1Copies.cpp in: patch 1: #75340 patch 2: #75349 patch 3: #80003 patch 4: #78431 patch 5: is in this commit: AMDGPU/GlobalISelDivergenceLowering: constrain incoming registers Previously, in PHIs that represent lane masks, incoming registers taken as-is were not selected as lane masks. Such registers are not being merged with another lane mask and most often only have S1 LLT. Implement constrainAsLaneMask by constraining incoming registers taken as-is with lane mask attributes, essentially transforming them to lane masks. This is final step in having PHI instructions created in this pass to be fully instruction-selected.

llvmbot added backend:AMDGPU llvm:globalisel labels Jan 17, 2024

petar-avramovic requested review from jayfoad, arsenm, nhaehnle and Pierre-vh January 17, 2024 11:51

petar-avramovic mentioned this pull request Jan 17, 2024

AMDGPU/GlobalISel: lane masks merging #73337

Merged

arsenm reviewed Jan 18, 2024

View reviewed changes

arsenm requested changes Feb 5, 2024

View reviewed changes

petar-avramovic force-pushed the global-isel-lane-masks-patch-4 branch from 2e3de82 to 37966ec Compare February 5, 2024 13:28

petar-avramovic force-pushed the global-isel-lane-masks-patch-4 branch from 37966ec to 48b2e50 Compare February 5, 2024 14:20

petar-avramovic changed the title ~~GlobalISel: adapt MachineSSAUpdater for use in GlobalISel path~~ MachineSSAUpdater: use all vreg attributes instead of reg class only Feb 5, 2024

petar-avramovic force-pushed the global-isel-lane-masks-patch-4 branch from 48b2e50 to dcdc98f Compare February 21, 2024 14:50

arsenm approved these changes Feb 25, 2024

View reviewed changes

petar-avramovic merged commit 433f8e7 into llvm:main Feb 26, 2024
4 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

MachineSSAUpdater: use all vreg attributes instead of reg class only #78431

MachineSSAUpdater: use all vreg attributes instead of reg class only #78431

petar-avramovic commented Jan 17, 2024 •

edited

llvmbot commented Jan 17, 2024 •

edited

arsenm Jan 18, 2024

petar-avramovic Jan 18, 2024

arsenm Jan 18, 2024

arsenm left a comment

petar-avramovic commented Feb 5, 2024

arsenm commented Feb 5, 2024

arsenm commented Feb 5, 2024

petar-avramovic commented Feb 21, 2024

MachineSSAUpdater: use all vreg attributes instead of reg class only #78431

MachineSSAUpdater: use all vreg attributes instead of reg class only #78431

Conversation

petar-avramovic commented Jan 17, 2024 • edited

llvmbot commented Jan 17, 2024 • edited

arsenm Jan 18, 2024

Choose a reason for hiding this comment

petar-avramovic Jan 18, 2024

Choose a reason for hiding this comment

arsenm Jan 18, 2024

Choose a reason for hiding this comment

arsenm left a comment

Choose a reason for hiding this comment

petar-avramovic commented Feb 5, 2024

arsenm commented Feb 5, 2024

arsenm commented Feb 5, 2024

petar-avramovic commented Feb 21, 2024

petar-avramovic commented Jan 17, 2024 •

edited

llvmbot commented Jan 17, 2024 •

edited