[AMDGPU] Fix GCNRewritePartialRegUses pass: vector regclass is selected instead of scalar. #69957

vpykhtin · 2023-10-23T18:17:41Z

For the following testcase:

undef %1.sub1:sgpr_96 = COPY undef %0:sgpr_32
%3:vgpr_32 = V_LSHL_ADD_U32_e64 %1.sub1:sgpr_96, 2, undef %2:vgpr_32, implicit $exec

GCNRewritePartialRegUses produced:

%4:vgpr_32 = COPY undef %1:sgpr_32
dead %2:vgpr_32 = V_LSHL_ADD_U32_e64 %4, 2, undef %3:vgpr_32, implicit $exec

Register class for %4 is incorrect: there should be sgpr_32 instead of vgpr_32 because the original %1 had scalar regclass. This happens because %4 participate only in two instructions from which:

def in COPY has no reglass information from the instruction description.
use in V_LSHL_ADD_U32_e64 has VS_32 class from the instruction description.

Thus GCNRewritePartialRegUses used only VS_32 class and selected vector subclass as the largest - what it missed here is that it should take into account the regclass for %1.sub1:sgpr_96. This patch fixes that, debug output after the fix:

Try to rewrite partial reg %0:SGPR_96
  sub1:SGPR_32 ; <- regclass for %1.sub1:sgpr_96
  sub1:SGPR_32 & VS_32 = SGPR_32 ; <- common class for %1.sub1:sgpr_96 and VS_32 from V_LSHL_ADD_U32_e64 opnd
  Shift 32, reg align 32
  sub1:SGPR_32 -> whole reg, num regclasses 1
  Success %0:SGPR_96 -> %4:SGPR_32

And here starts another story. All of this deduction of final regclass using regclasses from instruction operands description is really not needed because TRI->getSubRegisterClass for %1.sub1:sgpr_96 is already returning the right class - SGPR_32 as we see in the debug output above but that haven't been always like that. I did a bisect search and found that TRI->getSubRegisterClass behavior has been fixed by the #67245, thanks to @kosarev. Before his fix getSubRegisterClass returned strange classes like:

SGPR_128:sub0_sub1 -> SReg_1_with_sub0_and_SReg_1_with_lo16_in_SGPR_LO16

which isn't even allocatable.

I'm going to cleanup the implementation but I would like to do this in a separate commit. I've tested my patch before the #67245 and it also works.

Note that GCNRewritePartialRegUses pass isn't enabled by default yet.

llvmbot · 2023-10-23T18:18:49Z

@llvm/pr-subscribers-backend-amdgpu

Author: Valery Pykhtin (vpykhtin)

Changes

For the following testcase:

undef %1.sub1:sgpr_96 = COPY undef %0:sgpr_32
%3:vgpr_32 = V_LSHL_ADD_U32_e64 %1.sub1:sgpr_96, 2, undef %2:vgpr_32, implicit $exec

GCNRewritePartialRegUses produced:

%4:vgpr_32 = COPY undef %1:sgpr_32
dead %2:vgpr_32 = V_LSHL_ADD_U32_e64 %4, 2, undef %3:vgpr_32, implicit $exec

Register class for %4 is incorrect: there should be sgpr_32 instead of vgpr_32 because the original %1 had scalar regclass. This happens because %4 participate only in two instructions from which:

def in COPY has no reglass information from the instruction description.
use in V_LSHL_ADD_U32_e64 has VS_32 class from the instruction description.

Thus GCNRewritePartialRegUses used only VS_32 class and selected vector subclass as the largest - what it missed here is that it should take into account the regclass for %1.sub1:sgpr_96. This patch fixes that, debug output after the fix:

Try to rewrite partial reg %0:SGPR_96
  sub1:SGPR_32 ; &lt;- regclass for %1.sub1:sgpr_96
  sub1:SGPR_32 &amp; VS_32 = SGPR_32 ; &lt;- common class for %1.sub1:sgpr_96 and VS_32 from V_LSHL_ADD_U32_e64 opnd
  Shift 32, reg align 32
  sub1:SGPR_32 -&gt; whole reg, num regclasses 1
  Success %0:SGPR_96 -&gt; %4:SGPR_32

And here starts another story. All of this deduction of final regclass using regclasses from instruction operands description is really not needed because TRI->getSubRegisterClass for %1.sub1:sgpr_96 is already returning the right class - SGPR_32 as we see in the debug output above but that haven't been always like that. I did a bisect search and found that TRI->getSubRegisterClass behavior has been fixed by the #67245, thanks to @kosarev. Before his fix getSubRegisterClass returned strange classes like:

SGPR_128:sub0_sub1 -> SReg_1_with_sub0_and_SReg_1_with_lo16_in_SGPR_LO16

which isn't even allocatable.

I'm going to cleanup the implementation but I would like to do this in a separate commit. I've tested my patch before the #67245 and it also works.

Note that GCNRewritePartialRegUses pass isn't enabled by default yet.

Patch is 321.63 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/69957.diff

4 Files Affected:

(modified) llvm/lib/Target/AMDGPU/GCNRewritePartialRegUses.cpp (+30-29)
(modified) llvm/test/CodeGen/AMDGPU/rewrite-partial-reg-uses-dbg.mir (+15-16)
(modified) llvm/test/CodeGen/AMDGPU/rewrite-partial-reg-uses-gen.mir (+1806-1806)
(modified) llvm/test/CodeGen/AMDGPU/rewrite-partial-reg-uses.mir (+35-25)

diff --git a/llvm/lib/Target/AMDGPU/GCNRewritePartialRegUses.cpp b/llvm/lib/Target/AMDGPU/GCNRewritePartialRegUses.cpp
index 99db7e4af9fd1c9..c94e5c16fb2cf2e 100644
--- a/llvm/lib/Target/AMDGPU/GCNRewritePartialRegUses.cpp
+++ b/llvm/lib/Target/AMDGPU/GCNRewritePartialRegUses.cpp
@@ -101,17 +101,16 @@ class GCNRewritePartialRegUses : public MachineFunctionPass {
   /// find new regclass such that:
   ///   1. It has subregs obtained by shifting each OldSubReg by RShift number
   ///      of bits to the right. Every "shifted" subreg should have the same
-  ///      SubRegRC. SubRegRC can be null, in this case it initialized using
-  ///      getSubRegisterClass. If CoverSubregIdx is not zero it's a subreg that
-  ///      "covers" all other subregs in pairs. Basically such subreg becomes a
-  ///      whole register.
+  ///      SubRegRC. If CoverSubregIdx is not zero it's a subreg that "covers"
+  ///      all other subregs in pairs. Basically such subreg becomes a whole
+  ///      register.
   ///   2. Resulting register class contains registers of minimal size but not
   ///      less than RegNumBits.
   ///
   /// SubRegs is map of OldSubReg -> [SubRegRC, NewSubReg] and is used as in/out
   /// parameter:
   ///   OldSubReg - input parameter,
-  ///   SubRegRC  - in/out, should be changed for unknown regclass,
+  ///   SubRegRC  - input parameter (cannot be null),
   ///   NewSubReg - output, contains shifted subregs on return.
   const TargetRegisterClass *
   getRegClassWithShiftedSubregs(const TargetRegisterClass *RC, unsigned RShift,
@@ -228,19 +227,7 @@ GCNRewritePartialRegUses::getRegClassWithShiftedSubregs(
   BitVector ClassMask(getAllocatableAndAlignedRegClassMask(RCAlign));
   for (auto &[OldSubReg, SRI] : SubRegs) {
     auto &[SubRegRC, NewSubReg] = SRI;
-
-    // Register class may be unknown, for example:
-    //   undef %0.sub4:sgpr_1024 = S_MOV_B32 01
-    //   %0.sub5:sgpr_1024 = S_MOV_B32 02
-    //   %1:vreg_64 = COPY %0.sub4_sub5
-    // Register classes for subregs 'sub4' and 'sub5' are known from the
-    // description of destination operand of S_MOV_B32 instruction but the
-    // class for the subreg 'sub4_sub5' isn't specified by the COPY instruction.
-    if (!SubRegRC)
-      SubRegRC = TRI->getSubRegisterClass(RC, OldSubReg);
-
-    if (!SubRegRC)
-      return nullptr;
+    assert(SubRegRC);
 
     LLVM_DEBUG(dbgs() << "  " << TRI->getSubRegIndexName(OldSubReg) << ':'
                       << TRI->getRegClassName(SubRegRC)
@@ -248,6 +235,8 @@ GCNRewritePartialRegUses::getRegClassWithShiftedSubregs(
                       << " -> ");
 
     if (OldSubReg == CoverSubregIdx) {
+      // Covering subreg will become a full register, RC should be allocatable.
+      assert(SubRegRC->isAllocatable());
       NewSubReg = AMDGPU::NoSubRegister;
       LLVM_DEBUG(dbgs() << "whole reg");
     } else {
@@ -425,7 +414,7 @@ bool GCNRewritePartialRegUses::rewriteReg(Register Reg) const {
     return false;
 
   for (MachineOperand &MO : Range) {
-    if (MO.getSubReg() == AMDGPU::NoSubRegister) // Whole reg used, quit.
+    if (MO.getSubReg() == AMDGPU::NoSubRegister) // Whole reg used, quit. [1]
       return false;
   }
 
@@ -433,21 +422,33 @@ bool GCNRewritePartialRegUses::rewriteReg(Register Reg) const {
   LLVM_DEBUG(dbgs() << "Try to rewrite partial reg " << printReg(Reg, TRI)
                     << ':' << TRI->getRegClassName(RC) << '\n');
 
-  // Collect used subregs and constrained reg classes infered from instruction
+  // Collect used subregs and their reg classes infered from instruction
   // operands.
   SubRegMap SubRegs;
   for (MachineOperand &MO : MRI->reg_nodbg_operands(Reg)) {
-    assert(MO.getSubReg() != AMDGPU::NoSubRegister);
-    auto *OpDescRC = getOperandRegClass(MO);
-    const auto [I, Inserted] = SubRegs.try_emplace(MO.getSubReg(), OpDescRC);
-    if (!Inserted && OpDescRC) {
-      SubRegInfo &SRI = I->second;
-      SRI.RC = SRI.RC ? TRI->getCommonSubClass(SRI.RC, OpDescRC) : OpDescRC;
-      if (!SRI.RC) {
-        LLVM_DEBUG(dbgs() << "  Couldn't find common target regclass\n");
-        return false;
+    const unsigned SubReg = MO.getSubReg();
+    assert(SubReg != AMDGPU::NoSubRegister); // Due to [1].
+    LLVM_DEBUG(dbgs() << "  " << TRI->getSubRegIndexName(SubReg) << ':');
+
+    const auto [I, Inserted] = SubRegs.try_emplace(SubReg);
+    const TargetRegisterClass *&SubRegRC = I->second.RC;
+
+    if (Inserted)
+      SubRegRC = TRI->getSubRegisterClass(RC, SubReg);
+
+    if (SubRegRC) {
+      if (const TargetRegisterClass *OpDescRC = getOperandRegClass(MO)) {
+        LLVM_DEBUG(dbgs() << TRI->getRegClassName(SubRegRC) << " & "
+                          << TRI->getRegClassName(OpDescRC) << " = ");
+        SubRegRC = TRI->getCommonSubClass(SubRegRC, OpDescRC);
       }
     }
+
+    if (!SubRegRC) {
+      LLVM_DEBUG(dbgs() << "couldn't find target regclass\n");
+      return false;
+    } else
+      LLVM_DEBUG(dbgs() << TRI->getRegClassName(SubRegRC) << '\n');
   }
 
   auto *NewRC = getMinSizeReg(RC, SubRegs);
diff --git a/llvm/test/CodeGen/AMDGPU/rewrite-partial-reg-uses-dbg.mir b/llvm/test/CodeGen/AMDGPU/rewrite-partial-reg-uses-dbg.mir
index 4e1f912a9d6f98e..85d0c054754d03d 100644
--- a/llvm/test/CodeGen/AMDGPU/rewrite-partial-reg-uses-dbg.mir
+++ b/llvm/test/CodeGen/AMDGPU/rewrite-partial-reg-uses-dbg.mir
@@ -7,7 +7,6 @@
     unreachable, !dbg !11
   }
 
-  ; Function Attrs: nocallback nofree nosync nounwind speculatable willreturn memory(none)
   declare void @llvm.dbg.value(metadata, metadata, metadata) #0
 
   attributes #0 = { nocallback nofree nosync nounwind speculatable willreturn memory(none) }
@@ -36,21 +35,21 @@ name:            test_vreg_96_w64
 body:             |
   bb.0:
     ; CHECK-LABEL: name: test_vreg_96_w64
-    ; CHECK: undef %3.sub0:vreg_64 = V_MOV_B32_e32 0, implicit $exec, debug-location !11
-    ; CHECK-NEXT: DBG_VALUE %3.sub0, $noreg, !9, !DIExpression(), debug-location !11
-    ; CHECK-NEXT: %3.sub1:vreg_64 = V_MOV_B32_e32 1, implicit $exec, debug-location !DILocation(line: 2, column: 1, scope: !5)
-    ; CHECK-NEXT: DBG_VALUE %3.sub1, $noreg, !9, !DIExpression(), debug-location !DILocation(line: 2, column: 1, scope: !5)
-    ; CHECK-NEXT: S_NOP 0, implicit %3, debug-location !DILocation(line: 3, column: 1, scope: !5)
-    ; CHECK-NEXT: undef %4.sub0:vreg_64 = V_MOV_B32_e32 11, implicit $exec, debug-location !DILocation(line: 4, column: 1, scope: !5)
-    ; CHECK-NEXT: DBG_VALUE %4.sub0, $noreg, !9, !DIExpression(), debug-location !DILocation(line: 4, column: 1, scope: !5)
-    ; CHECK-NEXT: %4.sub1:vreg_64 = V_MOV_B32_e32 12, implicit $exec, debug-location !DILocation(line: 5, column: 1, scope: !5)
-    ; CHECK-NEXT: DBG_VALUE %4.sub1, $noreg, !9, !DIExpression(), debug-location !DILocation(line: 5, column: 1, scope: !5)
-    ; CHECK-NEXT: S_NOP 0, implicit %4, debug-location !DILocation(line: 6, column: 1, scope: !5)
-    ; CHECK-NEXT: undef %5.sub0:vreg_64 = V_MOV_B32_e32 11, implicit $exec, debug-location !DILocation(line: 4, column: 1, scope: !5)
-    ; CHECK-NEXT: DBG_VALUE %5, $noreg, !9, !DIExpression(), debug-location !DILocation(line: 4, column: 1, scope: !5)
-    ; CHECK-NEXT: %5.sub1:vreg_64 = V_MOV_B32_e32 12, implicit $exec, debug-location !DILocation(line: 5, column: 1, scope: !5)
-    ; CHECK-NEXT: DBG_VALUE %5, $noreg, !9, !DIExpression(), debug-location !DILocation(line: 5, column: 1, scope: !5)
-    ; CHECK-NEXT: S_NOP 0, implicit %5, debug-location !DILocation(line: 6, column: 1, scope: !5)
+    ; CHECK: undef [[V_MOV_B32_e32_:%[0-9]+]].sub0:vreg_64 = V_MOV_B32_e32 0, implicit $exec, debug-location !11
+    ; CHECK-NEXT: DBG_VALUE [[V_MOV_B32_e32_]].sub0, $noreg, !9, !DIExpression(), debug-location !11
+    ; CHECK-NEXT: [[V_MOV_B32_e32_:%[0-9]+]].sub1:vreg_64 = V_MOV_B32_e32 1, implicit $exec, debug-location !DILocation(line: 2, column: 1, scope: !5)
+    ; CHECK-NEXT: DBG_VALUE [[V_MOV_B32_e32_]].sub1, $noreg, !9, !DIExpression(), debug-location !DILocation(line: 2, column: 1, scope: !5)
+    ; CHECK-NEXT: S_NOP 0, implicit [[V_MOV_B32_e32_]], debug-location !DILocation(line: 3, column: 1, scope: !5)
+    ; CHECK-NEXT: undef [[V_MOV_B32_e32_1:%[0-9]+]].sub0:vreg_64 = V_MOV_B32_e32 11, implicit $exec, debug-location !DILocation(line: 4, column: 1, scope: !5)
+    ; CHECK-NEXT: DBG_VALUE [[V_MOV_B32_e32_1]].sub0, $noreg, !9, !DIExpression(), debug-location !DILocation(line: 4, column: 1, scope: !5)
+    ; CHECK-NEXT: [[V_MOV_B32_e32_1:%[0-9]+]].sub1:vreg_64 = V_MOV_B32_e32 12, implicit $exec, debug-location !DILocation(line: 5, column: 1, scope: !5)
+    ; CHECK-NEXT: DBG_VALUE [[V_MOV_B32_e32_1]].sub1, $noreg, !9, !DIExpression(), debug-location !DILocation(line: 5, column: 1, scope: !5)
+    ; CHECK-NEXT: S_NOP 0, implicit [[V_MOV_B32_e32_1]], debug-location !DILocation(line: 6, column: 1, scope: !5)
+    ; CHECK-NEXT: undef [[V_MOV_B32_e32_2:%[0-9]+]].sub0:vreg_64 = V_MOV_B32_e32 11, implicit $exec, debug-location !DILocation(line: 4, column: 1, scope: !5)
+    ; CHECK-NEXT: DBG_VALUE [[V_MOV_B32_e32_2]], $noreg, !9, !DIExpression(), debug-location !DILocation(line: 4, column: 1, scope: !5)
+    ; CHECK-NEXT: [[V_MOV_B32_e32_2:%[0-9]+]].sub1:vreg_64 = V_MOV_B32_e32 12, implicit $exec, debug-location !DILocation(line: 5, column: 1, scope: !5)
+    ; CHECK-NEXT: DBG_VALUE [[V_MOV_B32_e32_2]], $noreg, !9, !DIExpression(), debug-location !DILocation(line: 5, column: 1, scope: !5)
+    ; CHECK-NEXT: S_NOP 0, implicit [[V_MOV_B32_e32_2]], debug-location !DILocation(line: 6, column: 1, scope: !5)
     undef %0.sub0:vreg_96 = V_MOV_B32_e32 0, implicit $exec, debug-location !11
     DBG_VALUE %0.sub0, $noreg, !9, !DIExpression(), debug-location !11
     %0.sub1:vreg_96 = V_MOV_B32_e32 1, implicit $exec, debug-location !DILocation(line: 2, column: 1, scope: !5)
diff --git a/llvm/test/CodeGen/AMDGPU/rewrite-partial-reg-uses-gen.mir b/llvm/test/CodeGen/AMDGPU/rewrite-partial-reg-uses-gen.mir
index d51e63f92e69141..037f39df8c3e06e 100644
--- a/llvm/test/CodeGen/AMDGPU/rewrite-partial-reg-uses-gen.mir
+++ b/llvm/test/CodeGen/AMDGPU/rewrite-partial-reg-uses-gen.mir
@@ -6,26 +6,26 @@ tracksRegLiveness: true
 body:             |
   bb.0:
     ; CHECK-LABEL: name: test_subregs_composition_vreg_1024
-    ; CHECK: undef %5.sub0:vreg_96 = V_MOV_B32_e32 1, implicit $exec
-    ; CHECK-NEXT: %5.sub1:vreg_96 = V_MOV_B32_e32 2, implicit $exec
-    ; CHECK-NEXT: S_NOP 0, implicit %5.sub0_sub1
-    ; CHECK-NEXT: S_NOP 0, implicit %5.sub1_sub2
-    ; CHECK-NEXT: undef %6.sub0:vreg_128 = V_MOV_B32_e32 11, implicit $exec
-    ; CHECK-NEXT: %6.sub1:vreg_128 = V_MOV_B32_e32 12, implicit $exec
-    ; CHECK-NEXT: S_NOP 0, implicit %6.sub0_sub1_sub2
-    ; CHECK-NEXT: S_NOP 0, implicit %6.sub1_sub2_sub3
-    ; CHECK-NEXT: undef %7.sub0:vreg_160 = V_MOV_B32_e32 21, implicit $exec
-    ; CHECK-NEXT: %7.sub1:vreg_160 = V_MOV_B32_e32 22, implicit $exec
-    ; CHECK-NEXT: S_NOP 0, implicit %7.sub0_sub1_sub2_sub3
-    ; CHECK-NEXT: S_NOP 0, implicit %7.sub1_sub2_sub3_sub4
-    ; CHECK-NEXT: undef %8.sub0:vreg_192 = V_MOV_B32_e32 31, implicit $exec
-    ; CHECK-NEXT: %8.sub1:vreg_192 = V_MOV_B32_e32 32, implicit $exec
-    ; CHECK-NEXT: S_NOP 0, implicit %8.sub0_sub1_sub2_sub3_sub4
-    ; CHECK-NEXT: S_NOP 0, implicit %8.sub1_sub2_sub3_sub4_sub5
-    ; CHECK-NEXT: undef %9.sub0:vreg_256 = V_MOV_B32_e32 41, implicit $exec
-    ; CHECK-NEXT: %9.sub2:vreg_256 = V_MOV_B32_e32 43, implicit $exec
-    ; CHECK-NEXT: S_NOP 0, implicit %9.sub0_sub1_sub2_sub3_sub4_sub5
-    ; CHECK-NEXT: S_NOP 0, implicit %9.sub2_sub3_sub4_sub5_sub6_sub7
+    ; CHECK: undef [[V_MOV_B32_e32_:%[0-9]+]].sub0:vreg_96 = V_MOV_B32_e32 1, implicit $exec
+    ; CHECK-NEXT: [[V_MOV_B32_e32_:%[0-9]+]].sub1:vreg_96 = V_MOV_B32_e32 2, implicit $exec
+    ; CHECK-NEXT: S_NOP 0, implicit [[V_MOV_B32_e32_]].sub0_sub1
+    ; CHECK-NEXT: S_NOP 0, implicit [[V_MOV_B32_e32_]].sub1_sub2
+    ; CHECK-NEXT: undef [[V_MOV_B32_e32_1:%[0-9]+]].sub0:vreg_128 = V_MOV_B32_e32 11, implicit $exec
+    ; CHECK-NEXT: [[V_MOV_B32_e32_1:%[0-9]+]].sub1:vreg_128 = V_MOV_B32_e32 12, implicit $exec
+    ; CHECK-NEXT: S_NOP 0, implicit [[V_MOV_B32_e32_1]].sub0_sub1_sub2
+    ; CHECK-NEXT: S_NOP 0, implicit [[V_MOV_B32_e32_1]].sub1_sub2_sub3
+    ; CHECK-NEXT: undef [[V_MOV_B32_e32_2:%[0-9]+]].sub0:vreg_160 = V_MOV_B32_e32 21, implicit $exec
+    ; CHECK-NEXT: [[V_MOV_B32_e32_2:%[0-9]+]].sub1:vreg_160 = V_MOV_B32_e32 22, implicit $exec
+    ; CHECK-NEXT: S_NOP 0, implicit [[V_MOV_B32_e32_2]].sub0_sub1_sub2_sub3
+    ; CHECK-NEXT: S_NOP 0, implicit [[V_MOV_B32_e32_2]].sub1_sub2_sub3_sub4
+    ; CHECK-NEXT: undef [[V_MOV_B32_e32_3:%[0-9]+]].sub0:vreg_192 = V_MOV_B32_e32 31, implicit $exec
+    ; CHECK-NEXT: [[V_MOV_B32_e32_3:%[0-9]+]].sub1:vreg_192 = V_MOV_B32_e32 32, implicit $exec
+    ; CHECK-NEXT: S_NOP 0, implicit [[V_MOV_B32_e32_3]].sub0_sub1_sub2_sub3_sub4
+    ; CHECK-NEXT: S_NOP 0, implicit [[V_MOV_B32_e32_3]].sub1_sub2_sub3_sub4_sub5
+    ; CHECK-NEXT: undef [[V_MOV_B32_e32_4:%[0-9]+]].sub0:vreg_256 = V_MOV_B32_e32 41, implicit $exec
+    ; CHECK-NEXT: [[V_MOV_B32_e32_4:%[0-9]+]].sub2:vreg_256 = V_MOV_B32_e32 43, implicit $exec
+    ; CHECK-NEXT: S_NOP 0, implicit [[V_MOV_B32_e32_4]].sub0_sub1_sub2_sub3_sub4_sub5
+    ; CHECK-NEXT: S_NOP 0, implicit [[V_MOV_B32_e32_4]].sub2_sub3_sub4_sub5_sub6_sub7
     undef %0.sub1:vreg_1024 = V_MOV_B32_e32 01, implicit $exec
     %0.sub2:vreg_1024 = V_MOV_B32_e32 02, implicit $exec
     S_NOP 0, implicit %0.sub1_sub2
@@ -97,12 +97,12 @@ tracksRegLiveness: true
 body:             |
   bb.0:
     ; CHECK-LABEL: name: test_vreg_96_w64
-    ; CHECK: undef %2.sub0:vreg_64 = V_MOV_B32_e32 0, implicit $exec
-    ; CHECK-NEXT: %2.sub1:vreg_64 = V_MOV_B32_e32 1, implicit $exec
-    ; CHECK-NEXT: S_NOP 0, implicit %2
-    ; CHECK-NEXT: undef %3.sub0:vreg_64 = V_MOV_B32_e32 11, implicit $exec
-    ; CHECK-NEXT: %3.sub1:vreg_64 = V_MOV_B32_e32 12, implicit $exec
-    ; CHECK-NEXT: S_NOP 0, implicit %3
+    ; CHECK: undef [[V_MOV_B32_e32_:%[0-9]+]].sub0:vreg_64 = V_MOV_B32_e32 0, implicit $exec
+    ; CHECK-NEXT: [[V_MOV_B32_e32_:%[0-9]+]].sub1:vreg_64 = V_MOV_B32_e32 1, implicit $exec
+    ; CHECK-NEXT: S_NOP 0, implicit [[V_MOV_B32_e32_]]
+    ; CHECK-NEXT: undef [[V_MOV_B32_e32_1:%[0-9]+]].sub0:vreg_64 = V_MOV_B32_e32 11, implicit $exec
+    ; CHECK-NEXT: [[V_MOV_B32_e32_1:%[0-9]+]].sub1:vreg_64 = V_MOV_B32_e32 12, implicit $exec
+    ; CHECK-NEXT: S_NOP 0, implicit [[V_MOV_B32_e32_1]]
     undef %0.sub0:vreg_96 = V_MOV_B32_e32 00, implicit $exec
     %0.sub1:vreg_96 = V_MOV_B32_e32 01, implicit $exec
     S_NOP 0, implicit %0.sub0_sub1
@@ -140,15 +140,15 @@ tracksRegLiveness: true
 body:             |
   bb.0:
     ; CHECK-LABEL: name: test_vreg_128_w64
-    ; CHECK: undef %3.sub0:vreg_64 = V_MOV_B32_e32 0, implicit $exec
-    ; CHECK-NEXT: %3.sub1:vreg_64 = V_MOV_B32_e32 1, implicit $exec
-    ; CHECK-NEXT: S_NOP 0, implicit %3
-    ; CHECK-NEXT: undef %4.sub0:vreg_64 = V_MOV_B32_e32 11, implicit $exec
-    ; CHECK-NEXT: %4.sub1:vreg_64 = V_MOV_B32_e32 12, implicit $exec
-    ; CHECK-NEXT: S_NOP 0, implicit %4
-    ; CHECK-NEXT: undef %5.sub0:vreg_64 = V_MOV_B32_e32 22, implicit $exec
-    ; CHECK-NEXT: %5.sub1:vreg_64 = V_MOV_B32_e32 23, implicit $exec
-    ; CHECK-NEXT: S_NOP 0, implicit %5
+    ; CHECK: undef [[V_MOV_B32_e32_:%[0-9]+]].sub0:vreg_64 = V_MOV_B32_e32 0, implicit $exec
+    ; CHECK-NEXT: [[V_MOV_B32_e32_:%[0-9]+]].sub1:vreg_64 = V_MOV_B32_e32 1, implicit $exec
+    ; CHECK-NEXT: S_NOP 0, implicit [[V_MOV_B32_e32_]]
+    ; CHECK-NEXT: undef [[V_MOV_B32_e32_1:%[0-9]+]].sub0:vreg_64 = V_MOV_B32_e32 11, implicit $exec
+    ; CHECK-NEXT: [[V_MOV_B32_e32_1:%[0-9]+]].sub1:vreg_64 = V_MOV_B32_e32 12, implicit $exec
+    ; CHECK-NEXT: S_NOP 0, implicit [[V_MOV_B32_e32_1]]
+    ; CHECK-NEXT: undef [[V_MOV_B32_e32_2:%[0-9]+]].sub0:vreg_64 = V_MOV_B32_e32 22, implicit $exec
+    ; CHECK-NEXT: [[V_MOV_B32_e32_2:%[0-9]+]].sub1:vreg_64 = V_MOV_B32_e32 23, implicit $exec
+    ; CHECK-NEXT: S_NOP 0, implicit [[V_MOV_B32_e32_2]]
     undef %0.sub0:vreg_128 = V_MOV_B32_e32 00, implicit $exec
     %0.sub1:vreg_128 = V_MOV_B32_e32 01, implicit $exec
     S_NOP 0, implicit %0.sub0_sub1
@@ -168,14 +168,14 @@ tracksRegLiveness: true
 body:             |
   bb.0:
     ; CHECK-LABEL: name: test_vreg_128_w96
-    ; CHECK: undef %2.sub0:vreg_96 = V_MOV_B32_e32 0, implicit $exec
-    ; CHECK-NEXT: %2.sub1:vreg_96 = V_MOV_B32_e32 1, implicit $exec
-    ; CHECK-NEXT: %2.sub2:vreg_96 = V_MOV_B32_e32 2, implicit $exec
-    ; CHECK-NEXT: S_NOP 0, implicit %2
-    ; CHECK-NEXT: undef %3.sub0:vreg_96 = V_MOV_B32_e32 11, implicit $exec
-    ; CHECK-NEXT: %3.sub1:vreg_96 = V_MOV_B32_e32 12, implicit $exec
-    ; CHECK-NEXT: %3.sub2:vreg_96 = V_MOV_B32_e32 13, implicit $exec
-    ; CHECK-NEXT: S_NOP 0, implicit %3
+    ; CHECK: undef [[V_MOV_B32_e32_:%[0-9]+]].sub0:vreg_96 = V_MOV_B32_e32 0, implicit $exec
+    ; CHECK-NEXT: [[V_MOV_B32_e32_:%[0-9]+]].sub1:vreg_96 = V_MOV_B32_e32 1, implicit $exec
+    ; CHECK-NEXT: [[V_MOV_B32_e32_:%[0-9]+]].sub2:vreg_96 = V_MOV_B32_e32 2, implicit $exec
+    ; CHECK-NEXT: S_NOP 0, implicit [[V_MOV_B32_e32_]]
+    ; CHECK-NEXT: undef [[V_MOV_B32_e32_1:%[0-9]+]].sub0:vreg_96 = V_MOV_B32_e32 11, implicit $exec
+    ; CHECK-NEXT: [[V_MOV_B32_e32_1:%[0-9]+]].sub1:vreg_96 = V_MOV_B32_e32 12, implicit $exec
+    ; CHECK-NEXT: [[V_MOV_B32_e32_1:%[0-9]+]].sub2:vreg_96 = V_MOV_B32_e32 13, implicit $exec
+    ; CHECK-NEXT: S_NOP 0, implicit [[V_MOV_B32_e32_1]]
     undef %0.sub0:vreg_128 = V_MOV_B32_e32 00, implicit $exec
     %0.sub1:vreg_128 = V_MOV_B32_e32 01, implicit $exec
     %0.sub2:vreg_128 = V_MOV_B32_e32 02, implicit $exec
@@ -215,15 +215,15 @@ tracksRegLiveness: true
 body:             |
   bb.0:
     ; CHECK-LABEL: name: test_vreg_160_w64
-    ; CHECK: undef %3.sub0:vreg_64 = V_MOV_B32_e32 0, implicit $exec
-    ; CHECK-NEXT: %3.sub1:vreg_64 = V_MOV_B32_e32 1, implicit $exec
-    ; CHECK-NEXT: S_NOP 0, implicit %3
-    ; CHECK-NEXT: undef %4.sub0:vreg_64 = V_MOV_B32_e32 11, implicit $exec
-    ; CHECK-NEXT: %4.sub1:vreg_64 = V_MOV_B32_e32 12, implicit $exec
-    ; CHECK-NEXT: S_NOP 0, implicit %4
-    ; CHECK-NEXT: undef %5.sub0:vreg_64 = V_MOV_B32_e32 23, implicit $exec
-    ; CHECK-NEXT: %5.sub1:vreg_64 = V_MOV_B32_e32 24, implicit $exec
-    ; CHECK-NEXT: S_NOP 0, implicit %5
+    ; CHECK: undef [[V_MOV_B32_e32_:%[0-9]+]].sub0:vreg_64 = V_MOV_B32_e32 0, implicit $exec
+    ; CHECK-NEXT: [[V_MOV_B32_e32_:%[0-9]+]].sub1:vreg_64 = V_MOV_B32_e32 1, implicit $exec
+    ; CHECK-NEXT: S_NOP 0, implicit [[V_MOV_B32_e32_]]
+    ; CHECK-NEXT: undef [[V_MOV_B32_e32_1:%[0-9]+]].sub0:vreg_64 = V_MOV_B32_e32 11, implicit $exec
+    ; CHECK-NEXT: [[V_MOV_B32_e32_1:%[0-9]+]].sub1:vreg_64 = V_MOV_B32_e32 12, implicit $exec
+    ; CHECK-NEXT: S_NOP 0, implicit [[V_MOV_B32_e32_1]]
+    ; CHECK-NEXT: undef [[V_MOV_B32_e32_2:%[0-9]+]].sub0:vreg_64 = V_MOV_B32_e32 23, implicit $exec
+    ; CHECK-NEXT: [[V_MOV_B32_e32_2:%[0-9]+]].sub1:vreg_64 = V_MOV_B32_e32 24, implicit $exec
+    ; CHECK-NEXT: S_NOP 0, implicit [[V_MOV_B32_e32_2]]
     undef %0.sub0:vreg_160 = V_MOV_B32_e32 00, implicit $exec
     %0.sub1:vreg_160 = V_MOV_B32_e32 01, implicit $exec
     S_NOP 0, implicit %0.sub0_sub1
@@ -243,18 +243,18 @@ tracksRegLiveness: true
 body:             |
   bb.0:
     ; CHECK-LABEL: name: test_vreg_160_w96
-    ; CHECK: undef %3.sub0:vreg_96 = V_MOV_B32_e32 0, implicit $exec
-    ; CHECK-NEXT: %3.sub1:vreg_96 = V_MOV_B32_e32 1, implicit $exec
-    ; CHECK-NEXT: %3.sub2:vreg_96 = V_MOV_B32_e32 2, implicit $exec
-    ; CHECK-NEXT: S_NOP 0, implicit %3
-    ; CHECK-NEXT: undef %4.sub0:vreg_96 = V_MOV_B32_e32 11, implicit $exec
-    ; CHECK-NEXT: %4.sub1:vreg_96 = V_MOV_B32_e32 12, implicit $exec
-    ; CHECK-NEXT: %4.sub2:vreg_96 = V_MOV_B32_e32 13, implicit $exec
-    ; CHECK-NEXT: S_NOP 0, implicit %4
-    ; CHECK-NEXT: undef %5.sub0:vreg_96 = V_MOV_B32_e32 22, implicit $exec
-    ; CHECK-NEXT: %5.sub1:vreg_96 = V_MOV_B32_e32 23, implicit $exec
-    ; CHECK-NEXT: %5.sub2:vreg_96 = V_MOV_B32_e32 24, implicit $exec
-    ; CHECK-NEXT: S_NOP 0, implicit %5
+    ; CHECK: undef [[V_MOV_B32_e32_:%[0-9]+]].sub0:vreg_96 = V_MOV_B32_e32 0, implicit $exec
+    ; CHECK-NEXT: [[V_MOV_B32_e32_:%[0-9]+]].sub1:vreg_96 = V_MOV_B32_e32 1, implicit $exec
+    ; CHECK-NEXT: [[V_MOV_B32_e32_:%[0-9]+]].sub2:vreg_96 = V_MOV_B32_e32 2, implicit $exec
+    ; CHECK-NEX...
[truncated]

llvm/lib/Target/AMDGPU/GCNRewritePartialRegUses.cpp

vpykhtin · 2023-10-31T20:35:46Z

@arsenm let's submit this and I create another PR to cleanup unnecessary register class deduction, since TRI->getSubRegisterClass works correctly now.

vpykhtin · 2023-11-06T16:57:26Z

Ping.

vpykhtin · 2023-11-13T13:11:38Z

Ping.

perlfu

I think the code change looks fine.
I left a few questions inline about existing code -- you don't necessarily need to address these.

Could you regenerate the tests as a separate NFC commit (or PR), so this PR only shows test changes from the code change?

llvm/lib/Target/AMDGPU/GCNRewritePartialRegUses.cpp

perlfu · 2023-11-15T07:59:27Z

llvm/test/CodeGen/AMDGPU/rewrite-partial-reg-uses.mir

@@ -1,32 +1,31 @@
 # NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
 # RUN: llc -mtriple=amdgcn-amd-amdhsa -amdgpu-enable-rewrite-partial-reg-uses=true -verify-machineinstrs -start-before=rename-independent-subregs -stop-after=rewrite-partial-reg-uses %s -o - | FileCheck -check-prefix=CHECK %s
-# RUN: llc -mtriple=amdgcn-amd-amdhsa -amdgpu-enable-rewrite-partial-reg-uses=true -verify-machineinstrs -start-before=rename-independent-subregs %s -o /dev/null 2>&1


Is this intentional?

Right, the MIR in the test isn't fully valid to be emitted in asm so I deleted this line.

vpykhtin · 2023-11-15T18:24:40Z

I think the code change looks fine. I left a few questions inline about existing code -- you don't necessarily need to address these.

Could you regenerate the tests as a separate NFC commit (or PR), so this PR only shows test changes from the code change?

Thank you for the review, Carl!

I can regenerate the tests but I'm not sure how to do it best here: submit the updated tests, rebase this PR's branch on top of the commit and then force push the branch? Or I can simply merge main with the updated tests commit into this PR's branch?

perlfu · 2023-11-16T03:30:01Z

I can regenerate the tests but I'm not sure how to do it best here: submit the updated tests, rebase this PR's branch on top of the commit and then force push the branch? Or I can simply merge main with the updated tests commit into this PR's branch?

I do not know what is expected/common practice, I suspect either approach is valid.
I have certainly seen people merge main before, but typically I opt for a rebase and force push.

vpykhtin · 2023-11-16T11:06:49Z

I do not know what is expected/common practice, I suspect either approach is valid. I have certainly seen people merge main before, but typically I opt for a rebase and force push.

Thanks, done.

perlfu

LGTM

…ed instead of scalar. (llvm#69957) For the following testcase: undef %1.sub1:sgpr_96 = COPY undef %0:sgpr_32 %3:vgpr_32 = V_LSHL_ADD_U32_e64 %1.sub1:sgpr_96, ... GCNRewritePartialRegUses produced: %4:vgpr_32 = COPY undef %1:sgpr_32 dead %2:vgpr_32 = V_LSHL_ADD_U32_e64 %4, ... Register class for %4 is incorrect: there should be sgpr_32 instead of vgpr_32 because the original %1 had scalar regclass. This patch fixes that. Note that GCNRewritePartialRegUses pass isn't enabled by default yet.

Let's try once again after #69957 has landed.

llvmbot added the backend:AMDGPU label Oct 23, 2023

vpykhtin requested a review from arsenm October 23, 2023 18:18

arsenm reviewed Oct 24, 2023

View reviewed changes

llvm/lib/Target/AMDGPU/GCNRewritePartialRegUses.cpp Outdated Show resolved Hide resolved

perlfu reviewed Nov 15, 2023

View reviewed changes

vpykhtin added 2 commits November 16, 2023 11:53

add test to illustrate the problem

dc4538e

fix by taking original subreg regclass into account.

22a5879

vpykhtin force-pushed the fix_rewrite_partial_reg_uses_conservative branch from e62a4d1 to 22a5879 Compare November 16, 2023 11:05

perlfu approved these changes Nov 16, 2023

View reviewed changes

vpykhtin merged commit 667ba7f into llvm:main Nov 16, 2023
3 checks passed

vpykhtin mentioned this pull request Nov 21, 2023

[AMDGPU] Enable GCNRewritePartialRegUses pass by default. #72975

Merged

vpykhtin added a commit that referenced this pull request Dec 14, 2023

[AMDGPU] Enable GCNRewritePartialRegUses pass by default. (#72975)

dd05129

Let's try once again after #69957 has landed.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[AMDGPU] Fix GCNRewritePartialRegUses pass: vector regclass is selected instead of scalar. #69957

[AMDGPU] Fix GCNRewritePartialRegUses pass: vector regclass is selected instead of scalar. #69957

vpykhtin commented Oct 23, 2023

llvmbot commented Oct 23, 2023

vpykhtin commented Oct 31, 2023

vpykhtin commented Nov 6, 2023

vpykhtin commented Nov 13, 2023

perlfu left a comment

perlfu Nov 15, 2023

vpykhtin Nov 15, 2023

vpykhtin commented Nov 15, 2023

perlfu commented Nov 16, 2023

vpykhtin commented Nov 16, 2023

perlfu left a comment

[AMDGPU] Fix GCNRewritePartialRegUses pass: vector regclass is selected instead of scalar. #69957

[AMDGPU] Fix GCNRewritePartialRegUses pass: vector regclass is selected instead of scalar. #69957

Conversation

vpykhtin commented Oct 23, 2023

llvmbot commented Oct 23, 2023

vpykhtin commented Oct 31, 2023

vpykhtin commented Nov 6, 2023

vpykhtin commented Nov 13, 2023

perlfu left a comment

Choose a reason for hiding this comment

perlfu Nov 15, 2023

Choose a reason for hiding this comment

vpykhtin Nov 15, 2023

Choose a reason for hiding this comment

vpykhtin commented Nov 15, 2023

perlfu commented Nov 16, 2023

vpykhtin commented Nov 16, 2023

perlfu left a comment

Choose a reason for hiding this comment