Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AMDGPU] Fix GCNRewritePartialRegUses pass: vector regclass is selected instead of scalar. #69957

Merged

Conversation

vpykhtin
Copy link
Contributor

For the following testcase:

undef %1.sub1:sgpr_96 = COPY undef %0:sgpr_32
%3:vgpr_32 = V_LSHL_ADD_U32_e64 %1.sub1:sgpr_96, 2, undef %2:vgpr_32, implicit $exec

GCNRewritePartialRegUses produced:

%4:vgpr_32 = COPY undef %1:sgpr_32
dead %2:vgpr_32 = V_LSHL_ADD_U32_e64 %4, 2, undef %3:vgpr_32, implicit $exec

Register class for %4 is incorrect: there should be sgpr_32 instead of vgpr_32 because the original %1 had scalar regclass. This happens because %4 participate only in two instructions from which:

  • def in COPY has no reglass information from the instruction description.
  • use in V_LSHL_ADD_U32_e64 has VS_32 class from the instruction description.

Thus GCNRewritePartialRegUses used only VS_32 class and selected vector subclass as the largest - what it missed here is that it should take into account the regclass for %1.sub1:sgpr_96. This patch fixes that, debug output after the fix:

Try to rewrite partial reg %0:SGPR_96
  sub1:SGPR_32 ; <- regclass for %1.sub1:sgpr_96
  sub1:SGPR_32 & VS_32 = SGPR_32 ; <- common class for %1.sub1:sgpr_96 and VS_32 from V_LSHL_ADD_U32_e64 opnd
  Shift 32, reg align 32
  sub1:SGPR_32 -> whole reg, num regclasses 1
  Success %0:SGPR_96 -> %4:SGPR_32

And here starts another story. All of this deduction of final regclass using regclasses from instruction operands description is really not needed because TRI->getSubRegisterClass for %1.sub1:sgpr_96 is already returning the right class - SGPR_32 as we see in the debug output above but that haven't been always like that. I did a bisect search and found that TRI->getSubRegisterClass behavior has been fixed by the #67245, thanks to @kosarev. Before his fix getSubRegisterClass returned strange classes like:

SGPR_128:sub0_sub1 -> SReg_1_with_sub0_and_SReg_1_with_lo16_in_SGPR_LO16

which isn't even allocatable.

I'm going to cleanup the implementation but I would like to do this in a separate commit. I've tested my patch before the #67245 and it also works.

Note that GCNRewritePartialRegUses pass isn't enabled by default yet.

@llvmbot
Copy link
Collaborator

llvmbot commented Oct 23, 2023

@llvm/pr-subscribers-backend-amdgpu

Author: Valery Pykhtin (vpykhtin)

Changes

For the following testcase:

undef %1.sub1:sgpr_96 = COPY undef %0:sgpr_32
%3:vgpr_32 = V_LSHL_ADD_U32_e64 %1.sub1:sgpr_96, 2, undef %2:vgpr_32, implicit $exec

GCNRewritePartialRegUses produced:

%4:vgpr_32 = COPY undef %1:sgpr_32
dead %2:vgpr_32 = V_LSHL_ADD_U32_e64 %4, 2, undef %3:vgpr_32, implicit $exec

Register class for %4 is incorrect: there should be sgpr_32 instead of vgpr_32 because the original %1 had scalar regclass. This happens because %4 participate only in two instructions from which:

  • def in COPY has no reglass information from the instruction description.
  • use in V_LSHL_ADD_U32_e64 has VS_32 class from the instruction description.

Thus GCNRewritePartialRegUses used only VS_32 class and selected vector subclass as the largest - what it missed here is that it should take into account the regclass for %1.sub1:sgpr_96. This patch fixes that, debug output after the fix:

Try to rewrite partial reg %0:SGPR_96
  sub1:SGPR_32 ; &lt;- regclass for %1.sub1:sgpr_96
  sub1:SGPR_32 &amp; VS_32 = SGPR_32 ; &lt;- common class for %1.sub1:sgpr_96 and VS_32 from V_LSHL_ADD_U32_e64 opnd
  Shift 32, reg align 32
  sub1:SGPR_32 -&gt; whole reg, num regclasses 1
  Success %0:SGPR_96 -&gt; %4:SGPR_32

And here starts another story. All of this deduction of final regclass using regclasses from instruction operands description is really not needed because TRI->getSubRegisterClass for %1.sub1:sgpr_96 is already returning the right class - SGPR_32 as we see in the debug output above but that haven't been always like that. I did a bisect search and found that TRI->getSubRegisterClass behavior has been fixed by the #67245, thanks to @kosarev. Before his fix getSubRegisterClass returned strange classes like:

SGPR_128:sub0_sub1 -&gt; SReg_1_with_sub0_and_SReg_1_with_lo16_in_SGPR_LO16

which isn't even allocatable.

I'm going to cleanup the implementation but I would like to do this in a separate commit. I've tested my patch before the #67245 and it also works.

Note that GCNRewritePartialRegUses pass isn't enabled by default yet.


Patch is 321.63 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/69957.diff

4 Files Affected:

  • (modified) llvm/lib/Target/AMDGPU/GCNRewritePartialRegUses.cpp (+30-29)
  • (modified) llvm/test/CodeGen/AMDGPU/rewrite-partial-reg-uses-dbg.mir (+15-16)
  • (modified) llvm/test/CodeGen/AMDGPU/rewrite-partial-reg-uses-gen.mir (+1806-1806)
  • (modified) llvm/test/CodeGen/AMDGPU/rewrite-partial-reg-uses.mir (+35-25)
diff --git a/llvm/lib/Target/AMDGPU/GCNRewritePartialRegUses.cpp b/llvm/lib/Target/AMDGPU/GCNRewritePartialRegUses.cpp
index 99db7e4af9fd1c9..c94e5c16fb2cf2e 100644
--- a/llvm/lib/Target/AMDGPU/GCNRewritePartialRegUses.cpp
+++ b/llvm/lib/Target/AMDGPU/GCNRewritePartialRegUses.cpp
@@ -101,17 +101,16 @@ class GCNRewritePartialRegUses : public MachineFunctionPass {
   /// find new regclass such that:
   ///   1. It has subregs obtained by shifting each OldSubReg by RShift number
   ///      of bits to the right. Every "shifted" subreg should have the same
-  ///      SubRegRC. SubRegRC can be null, in this case it initialized using
-  ///      getSubRegisterClass. If CoverSubregIdx is not zero it's a subreg that
-  ///      "covers" all other subregs in pairs. Basically such subreg becomes a
-  ///      whole register.
+  ///      SubRegRC. If CoverSubregIdx is not zero it's a subreg that "covers"
+  ///      all other subregs in pairs. Basically such subreg becomes a whole
+  ///      register.
   ///   2. Resulting register class contains registers of minimal size but not
   ///      less than RegNumBits.
   ///
   /// SubRegs is map of OldSubReg -> [SubRegRC, NewSubReg] and is used as in/out
   /// parameter:
   ///   OldSubReg - input parameter,
-  ///   SubRegRC  - in/out, should be changed for unknown regclass,
+  ///   SubRegRC  - input parameter (cannot be null),
   ///   NewSubReg - output, contains shifted subregs on return.
   const TargetRegisterClass *
   getRegClassWithShiftedSubregs(const TargetRegisterClass *RC, unsigned RShift,
@@ -228,19 +227,7 @@ GCNRewritePartialRegUses::getRegClassWithShiftedSubregs(
   BitVector ClassMask(getAllocatableAndAlignedRegClassMask(RCAlign));
   for (auto &[OldSubReg, SRI] : SubRegs) {
     auto &[SubRegRC, NewSubReg] = SRI;
-
-    // Register class may be unknown, for example:
-    //   undef %0.sub4:sgpr_1024 = S_MOV_B32 01
-    //   %0.sub5:sgpr_1024 = S_MOV_B32 02
-    //   %1:vreg_64 = COPY %0.sub4_sub5
-    // Register classes for subregs 'sub4' and 'sub5' are known from the
-    // description of destination operand of S_MOV_B32 instruction but the
-    // class for the subreg 'sub4_sub5' isn't specified by the COPY instruction.
-    if (!SubRegRC)
-      SubRegRC = TRI->getSubRegisterClass(RC, OldSubReg);
-
-    if (!SubRegRC)
-      return nullptr;
+    assert(SubRegRC);
 
     LLVM_DEBUG(dbgs() << "  " << TRI->getSubRegIndexName(OldSubReg) << ':'
                       << TRI->getRegClassName(SubRegRC)
@@ -248,6 +235,8 @@ GCNRewritePartialRegUses::getRegClassWithShiftedSubregs(
                       << " -> ");
 
     if (OldSubReg == CoverSubregIdx) {
+      // Covering subreg will become a full register, RC should be allocatable.
+      assert(SubRegRC->isAllocatable());
       NewSubReg = AMDGPU::NoSubRegister;
       LLVM_DEBUG(dbgs() << "whole reg");
     } else {
@@ -425,7 +414,7 @@ bool GCNRewritePartialRegUses::rewriteReg(Register Reg) const {
     return false;
 
   for (MachineOperand &MO : Range) {
-    if (MO.getSubReg() == AMDGPU::NoSubRegister) // Whole reg used, quit.
+    if (MO.getSubReg() == AMDGPU::NoSubRegister) // Whole reg used, quit. [1]
       return false;
   }
 
@@ -433,21 +422,33 @@ bool GCNRewritePartialRegUses::rewriteReg(Register Reg) const {
   LLVM_DEBUG(dbgs() << "Try to rewrite partial reg " << printReg(Reg, TRI)
                     << ':' << TRI->getRegClassName(RC) << '\n');
 
-  // Collect used subregs and constrained reg classes infered from instruction
+  // Collect used subregs and their reg classes infered from instruction
   // operands.
   SubRegMap SubRegs;
   for (MachineOperand &MO : MRI->reg_nodbg_operands(Reg)) {
-    assert(MO.getSubReg() != AMDGPU::NoSubRegister);
-    auto *OpDescRC = getOperandRegClass(MO);
-    const auto [I, Inserted] = SubRegs.try_emplace(MO.getSubReg(), OpDescRC);
-    if (!Inserted && OpDescRC) {
-      SubRegInfo &SRI = I->second;
-      SRI.RC = SRI.RC ? TRI->getCommonSubClass(SRI.RC, OpDescRC) : OpDescRC;
-      if (!SRI.RC) {
-        LLVM_DEBUG(dbgs() << "  Couldn't find common target regclass\n");
-        return false;
+    const unsigned SubReg = MO.getSubReg();
+    assert(SubReg != AMDGPU::NoSubRegister); // Due to [1].
+    LLVM_DEBUG(dbgs() << "  " << TRI->getSubRegIndexName(SubReg) << ':');
+
+    const auto [I, Inserted] = SubRegs.try_emplace(SubReg);
+    const TargetRegisterClass *&SubRegRC = I->second.RC;
+
+    if (Inserted)
+      SubRegRC = TRI->getSubRegisterClass(RC, SubReg);
+
+    if (SubRegRC) {
+      if (const TargetRegisterClass *OpDescRC = getOperandRegClass(MO)) {
+        LLVM_DEBUG(dbgs() << TRI->getRegClassName(SubRegRC) << " & "
+                          << TRI->getRegClassName(OpDescRC) << " = ");
+        SubRegRC = TRI->getCommonSubClass(SubRegRC, OpDescRC);
       }
     }
+
+    if (!SubRegRC) {
+      LLVM_DEBUG(dbgs() << "couldn't find target regclass\n");
+      return false;
+    } else
+      LLVM_DEBUG(dbgs() << TRI->getRegClassName(SubRegRC) << '\n');
   }
 
   auto *NewRC = getMinSizeReg(RC, SubRegs);
diff --git a/llvm/test/CodeGen/AMDGPU/rewrite-partial-reg-uses-dbg.mir b/llvm/test/CodeGen/AMDGPU/rewrite-partial-reg-uses-dbg.mir
index 4e1f912a9d6f98e..85d0c054754d03d 100644
--- a/llvm/test/CodeGen/AMDGPU/rewrite-partial-reg-uses-dbg.mir
+++ b/llvm/test/CodeGen/AMDGPU/rewrite-partial-reg-uses-dbg.mir
@@ -7,7 +7,6 @@
     unreachable, !dbg !11
   }
 
-  ; Function Attrs: nocallback nofree nosync nounwind speculatable willreturn memory(none)
   declare void @llvm.dbg.value(metadata, metadata, metadata) #0
 
   attributes #0 = { nocallback nofree nosync nounwind speculatable willreturn memory(none) }
@@ -36,21 +35,21 @@ name:            test_vreg_96_w64
 body:             |
   bb.0:
     ; CHECK-LABEL: name: test_vreg_96_w64
-    ; CHECK: undef %3.sub0:vreg_64 = V_MOV_B32_e32 0, implicit $exec, debug-location !11
-    ; CHECK-NEXT: DBG_VALUE %3.sub0, $noreg, !9, !DIExpression(), debug-location !11
-    ; CHECK-NEXT: %3.sub1:vreg_64 = V_MOV_B32_e32 1, implicit $exec, debug-location !DILocation(line: 2, column: 1, scope: !5)
-    ; CHECK-NEXT: DBG_VALUE %3.sub1, $noreg, !9, !DIExpression(), debug-location !DILocation(line: 2, column: 1, scope: !5)
-    ; CHECK-NEXT: S_NOP 0, implicit %3, debug-location !DILocation(line: 3, column: 1, scope: !5)
-    ; CHECK-NEXT: undef %4.sub0:vreg_64 = V_MOV_B32_e32 11, implicit $exec, debug-location !DILocation(line: 4, column: 1, scope: !5)
-    ; CHECK-NEXT: DBG_VALUE %4.sub0, $noreg, !9, !DIExpression(), debug-location !DILocation(line: 4, column: 1, scope: !5)
-    ; CHECK-NEXT: %4.sub1:vreg_64 = V_MOV_B32_e32 12, implicit $exec, debug-location !DILocation(line: 5, column: 1, scope: !5)
-    ; CHECK-NEXT: DBG_VALUE %4.sub1, $noreg, !9, !DIExpression(), debug-location !DILocation(line: 5, column: 1, scope: !5)
-    ; CHECK-NEXT: S_NOP 0, implicit %4, debug-location !DILocation(line: 6, column: 1, scope: !5)
-    ; CHECK-NEXT: undef %5.sub0:vreg_64 = V_MOV_B32_e32 11, implicit $exec, debug-location !DILocation(line: 4, column: 1, scope: !5)
-    ; CHECK-NEXT: DBG_VALUE %5, $noreg, !9, !DIExpression(), debug-location !DILocation(line: 4, column: 1, scope: !5)
-    ; CHECK-NEXT: %5.sub1:vreg_64 = V_MOV_B32_e32 12, implicit $exec, debug-location !DILocation(line: 5, column: 1, scope: !5)
-    ; CHECK-NEXT: DBG_VALUE %5, $noreg, !9, !DIExpression(), debug-location !DILocation(line: 5, column: 1, scope: !5)
-    ; CHECK-NEXT: S_NOP 0, implicit %5, debug-location !DILocation(line: 6, column: 1, scope: !5)
+    ; CHECK: undef [[V_MOV_B32_e32_:%[0-9]+]].sub0:vreg_64 = V_MOV_B32_e32 0, implicit $exec, debug-location !11
+    ; CHECK-NEXT: DBG_VALUE [[V_MOV_B32_e32_]].sub0, $noreg, !9, !DIExpression(), debug-location !11
+    ; CHECK-NEXT: [[V_MOV_B32_e32_:%[0-9]+]].sub1:vreg_64 = V_MOV_B32_e32 1, implicit $exec, debug-location !DILocation(line: 2, column: 1, scope: !5)
+    ; CHECK-NEXT: DBG_VALUE [[V_MOV_B32_e32_]].sub1, $noreg, !9, !DIExpression(), debug-location !DILocation(line: 2, column: 1, scope: !5)
+    ; CHECK-NEXT: S_NOP 0, implicit [[V_MOV_B32_e32_]], debug-location !DILocation(line: 3, column: 1, scope: !5)
+    ; CHECK-NEXT: undef [[V_MOV_B32_e32_1:%[0-9]+]].sub0:vreg_64 = V_MOV_B32_e32 11, implicit $exec, debug-location !DILocation(line: 4, column: 1, scope: !5)
+    ; CHECK-NEXT: DBG_VALUE [[V_MOV_B32_e32_1]].sub0, $noreg, !9, !DIExpression(), debug-location !DILocation(line: 4, column: 1, scope: !5)
+    ; CHECK-NEXT: [[V_MOV_B32_e32_1:%[0-9]+]].sub1:vreg_64 = V_MOV_B32_e32 12, implicit $exec, debug-location !DILocation(line: 5, column: 1, scope: !5)
+    ; CHECK-NEXT: DBG_VALUE [[V_MOV_B32_e32_1]].sub1, $noreg, !9, !DIExpression(), debug-location !DILocation(line: 5, column: 1, scope: !5)
+    ; CHECK-NEXT: S_NOP 0, implicit [[V_MOV_B32_e32_1]], debug-location !DILocation(line: 6, column: 1, scope: !5)
+    ; CHECK-NEXT: undef [[V_MOV_B32_e32_2:%[0-9]+]].sub0:vreg_64 = V_MOV_B32_e32 11, implicit $exec, debug-location !DILocation(line: 4, column: 1, scope: !5)
+    ; CHECK-NEXT: DBG_VALUE [[V_MOV_B32_e32_2]], $noreg, !9, !DIExpression(), debug-location !DILocation(line: 4, column: 1, scope: !5)
+    ; CHECK-NEXT: [[V_MOV_B32_e32_2:%[0-9]+]].sub1:vreg_64 = V_MOV_B32_e32 12, implicit $exec, debug-location !DILocation(line: 5, column: 1, scope: !5)
+    ; CHECK-NEXT: DBG_VALUE [[V_MOV_B32_e32_2]], $noreg, !9, !DIExpression(), debug-location !DILocation(line: 5, column: 1, scope: !5)
+    ; CHECK-NEXT: S_NOP 0, implicit [[V_MOV_B32_e32_2]], debug-location !DILocation(line: 6, column: 1, scope: !5)
     undef %0.sub0:vreg_96 = V_MOV_B32_e32 0, implicit $exec, debug-location !11
     DBG_VALUE %0.sub0, $noreg, !9, !DIExpression(), debug-location !11
     %0.sub1:vreg_96 = V_MOV_B32_e32 1, implicit $exec, debug-location !DILocation(line: 2, column: 1, scope: !5)
diff --git a/llvm/test/CodeGen/AMDGPU/rewrite-partial-reg-uses-gen.mir b/llvm/test/CodeGen/AMDGPU/rewrite-partial-reg-uses-gen.mir
index d51e63f92e69141..037f39df8c3e06e 100644
--- a/llvm/test/CodeGen/AMDGPU/rewrite-partial-reg-uses-gen.mir
+++ b/llvm/test/CodeGen/AMDGPU/rewrite-partial-reg-uses-gen.mir
@@ -6,26 +6,26 @@ tracksRegLiveness: true
 body:             |
   bb.0:
     ; CHECK-LABEL: name: test_subregs_composition_vreg_1024
-    ; CHECK: undef %5.sub0:vreg_96 = V_MOV_B32_e32 1, implicit $exec
-    ; CHECK-NEXT: %5.sub1:vreg_96 = V_MOV_B32_e32 2, implicit $exec
-    ; CHECK-NEXT: S_NOP 0, implicit %5.sub0_sub1
-    ; CHECK-NEXT: S_NOP 0, implicit %5.sub1_sub2
-    ; CHECK-NEXT: undef %6.sub0:vreg_128 = V_MOV_B32_e32 11, implicit $exec
-    ; CHECK-NEXT: %6.sub1:vreg_128 = V_MOV_B32_e32 12, implicit $exec
-    ; CHECK-NEXT: S_NOP 0, implicit %6.sub0_sub1_sub2
-    ; CHECK-NEXT: S_NOP 0, implicit %6.sub1_sub2_sub3
-    ; CHECK-NEXT: undef %7.sub0:vreg_160 = V_MOV_B32_e32 21, implicit $exec
-    ; CHECK-NEXT: %7.sub1:vreg_160 = V_MOV_B32_e32 22, implicit $exec
-    ; CHECK-NEXT: S_NOP 0, implicit %7.sub0_sub1_sub2_sub3
-    ; CHECK-NEXT: S_NOP 0, implicit %7.sub1_sub2_sub3_sub4
-    ; CHECK-NEXT: undef %8.sub0:vreg_192 = V_MOV_B32_e32 31, implicit $exec
-    ; CHECK-NEXT: %8.sub1:vreg_192 = V_MOV_B32_e32 32, implicit $exec
-    ; CHECK-NEXT: S_NOP 0, implicit %8.sub0_sub1_sub2_sub3_sub4
-    ; CHECK-NEXT: S_NOP 0, implicit %8.sub1_sub2_sub3_sub4_sub5
-    ; CHECK-NEXT: undef %9.sub0:vreg_256 = V_MOV_B32_e32 41, implicit $exec
-    ; CHECK-NEXT: %9.sub2:vreg_256 = V_MOV_B32_e32 43, implicit $exec
-    ; CHECK-NEXT: S_NOP 0, implicit %9.sub0_sub1_sub2_sub3_sub4_sub5
-    ; CHECK-NEXT: S_NOP 0, implicit %9.sub2_sub3_sub4_sub5_sub6_sub7
+    ; CHECK: undef [[V_MOV_B32_e32_:%[0-9]+]].sub0:vreg_96 = V_MOV_B32_e32 1, implicit $exec
+    ; CHECK-NEXT: [[V_MOV_B32_e32_:%[0-9]+]].sub1:vreg_96 = V_MOV_B32_e32 2, implicit $exec
+    ; CHECK-NEXT: S_NOP 0, implicit [[V_MOV_B32_e32_]].sub0_sub1
+    ; CHECK-NEXT: S_NOP 0, implicit [[V_MOV_B32_e32_]].sub1_sub2
+    ; CHECK-NEXT: undef [[V_MOV_B32_e32_1:%[0-9]+]].sub0:vreg_128 = V_MOV_B32_e32 11, implicit $exec
+    ; CHECK-NEXT: [[V_MOV_B32_e32_1:%[0-9]+]].sub1:vreg_128 = V_MOV_B32_e32 12, implicit $exec
+    ; CHECK-NEXT: S_NOP 0, implicit [[V_MOV_B32_e32_1]].sub0_sub1_sub2
+    ; CHECK-NEXT: S_NOP 0, implicit [[V_MOV_B32_e32_1]].sub1_sub2_sub3
+    ; CHECK-NEXT: undef [[V_MOV_B32_e32_2:%[0-9]+]].sub0:vreg_160 = V_MOV_B32_e32 21, implicit $exec
+    ; CHECK-NEXT: [[V_MOV_B32_e32_2:%[0-9]+]].sub1:vreg_160 = V_MOV_B32_e32 22, implicit $exec
+    ; CHECK-NEXT: S_NOP 0, implicit [[V_MOV_B32_e32_2]].sub0_sub1_sub2_sub3
+    ; CHECK-NEXT: S_NOP 0, implicit [[V_MOV_B32_e32_2]].sub1_sub2_sub3_sub4
+    ; CHECK-NEXT: undef [[V_MOV_B32_e32_3:%[0-9]+]].sub0:vreg_192 = V_MOV_B32_e32 31, implicit $exec
+    ; CHECK-NEXT: [[V_MOV_B32_e32_3:%[0-9]+]].sub1:vreg_192 = V_MOV_B32_e32 32, implicit $exec
+    ; CHECK-NEXT: S_NOP 0, implicit [[V_MOV_B32_e32_3]].sub0_sub1_sub2_sub3_sub4
+    ; CHECK-NEXT: S_NOP 0, implicit [[V_MOV_B32_e32_3]].sub1_sub2_sub3_sub4_sub5
+    ; CHECK-NEXT: undef [[V_MOV_B32_e32_4:%[0-9]+]].sub0:vreg_256 = V_MOV_B32_e32 41, implicit $exec
+    ; CHECK-NEXT: [[V_MOV_B32_e32_4:%[0-9]+]].sub2:vreg_256 = V_MOV_B32_e32 43, implicit $exec
+    ; CHECK-NEXT: S_NOP 0, implicit [[V_MOV_B32_e32_4]].sub0_sub1_sub2_sub3_sub4_sub5
+    ; CHECK-NEXT: S_NOP 0, implicit [[V_MOV_B32_e32_4]].sub2_sub3_sub4_sub5_sub6_sub7
     undef %0.sub1:vreg_1024 = V_MOV_B32_e32 01, implicit $exec
     %0.sub2:vreg_1024 = V_MOV_B32_e32 02, implicit $exec
     S_NOP 0, implicit %0.sub1_sub2
@@ -97,12 +97,12 @@ tracksRegLiveness: true
 body:             |
   bb.0:
     ; CHECK-LABEL: name: test_vreg_96_w64
-    ; CHECK: undef %2.sub0:vreg_64 = V_MOV_B32_e32 0, implicit $exec
-    ; CHECK-NEXT: %2.sub1:vreg_64 = V_MOV_B32_e32 1, implicit $exec
-    ; CHECK-NEXT: S_NOP 0, implicit %2
-    ; CHECK-NEXT: undef %3.sub0:vreg_64 = V_MOV_B32_e32 11, implicit $exec
-    ; CHECK-NEXT: %3.sub1:vreg_64 = V_MOV_B32_e32 12, implicit $exec
-    ; CHECK-NEXT: S_NOP 0, implicit %3
+    ; CHECK: undef [[V_MOV_B32_e32_:%[0-9]+]].sub0:vreg_64 = V_MOV_B32_e32 0, implicit $exec
+    ; CHECK-NEXT: [[V_MOV_B32_e32_:%[0-9]+]].sub1:vreg_64 = V_MOV_B32_e32 1, implicit $exec
+    ; CHECK-NEXT: S_NOP 0, implicit [[V_MOV_B32_e32_]]
+    ; CHECK-NEXT: undef [[V_MOV_B32_e32_1:%[0-9]+]].sub0:vreg_64 = V_MOV_B32_e32 11, implicit $exec
+    ; CHECK-NEXT: [[V_MOV_B32_e32_1:%[0-9]+]].sub1:vreg_64 = V_MOV_B32_e32 12, implicit $exec
+    ; CHECK-NEXT: S_NOP 0, implicit [[V_MOV_B32_e32_1]]
     undef %0.sub0:vreg_96 = V_MOV_B32_e32 00, implicit $exec
     %0.sub1:vreg_96 = V_MOV_B32_e32 01, implicit $exec
     S_NOP 0, implicit %0.sub0_sub1
@@ -140,15 +140,15 @@ tracksRegLiveness: true
 body:             |
   bb.0:
     ; CHECK-LABEL: name: test_vreg_128_w64
-    ; CHECK: undef %3.sub0:vreg_64 = V_MOV_B32_e32 0, implicit $exec
-    ; CHECK-NEXT: %3.sub1:vreg_64 = V_MOV_B32_e32 1, implicit $exec
-    ; CHECK-NEXT: S_NOP 0, implicit %3
-    ; CHECK-NEXT: undef %4.sub0:vreg_64 = V_MOV_B32_e32 11, implicit $exec
-    ; CHECK-NEXT: %4.sub1:vreg_64 = V_MOV_B32_e32 12, implicit $exec
-    ; CHECK-NEXT: S_NOP 0, implicit %4
-    ; CHECK-NEXT: undef %5.sub0:vreg_64 = V_MOV_B32_e32 22, implicit $exec
-    ; CHECK-NEXT: %5.sub1:vreg_64 = V_MOV_B32_e32 23, implicit $exec
-    ; CHECK-NEXT: S_NOP 0, implicit %5
+    ; CHECK: undef [[V_MOV_B32_e32_:%[0-9]+]].sub0:vreg_64 = V_MOV_B32_e32 0, implicit $exec
+    ; CHECK-NEXT: [[V_MOV_B32_e32_:%[0-9]+]].sub1:vreg_64 = V_MOV_B32_e32 1, implicit $exec
+    ; CHECK-NEXT: S_NOP 0, implicit [[V_MOV_B32_e32_]]
+    ; CHECK-NEXT: undef [[V_MOV_B32_e32_1:%[0-9]+]].sub0:vreg_64 = V_MOV_B32_e32 11, implicit $exec
+    ; CHECK-NEXT: [[V_MOV_B32_e32_1:%[0-9]+]].sub1:vreg_64 = V_MOV_B32_e32 12, implicit $exec
+    ; CHECK-NEXT: S_NOP 0, implicit [[V_MOV_B32_e32_1]]
+    ; CHECK-NEXT: undef [[V_MOV_B32_e32_2:%[0-9]+]].sub0:vreg_64 = V_MOV_B32_e32 22, implicit $exec
+    ; CHECK-NEXT: [[V_MOV_B32_e32_2:%[0-9]+]].sub1:vreg_64 = V_MOV_B32_e32 23, implicit $exec
+    ; CHECK-NEXT: S_NOP 0, implicit [[V_MOV_B32_e32_2]]
     undef %0.sub0:vreg_128 = V_MOV_B32_e32 00, implicit $exec
     %0.sub1:vreg_128 = V_MOV_B32_e32 01, implicit $exec
     S_NOP 0, implicit %0.sub0_sub1
@@ -168,14 +168,14 @@ tracksRegLiveness: true
 body:             |
   bb.0:
     ; CHECK-LABEL: name: test_vreg_128_w96
-    ; CHECK: undef %2.sub0:vreg_96 = V_MOV_B32_e32 0, implicit $exec
-    ; CHECK-NEXT: %2.sub1:vreg_96 = V_MOV_B32_e32 1, implicit $exec
-    ; CHECK-NEXT: %2.sub2:vreg_96 = V_MOV_B32_e32 2, implicit $exec
-    ; CHECK-NEXT: S_NOP 0, implicit %2
-    ; CHECK-NEXT: undef %3.sub0:vreg_96 = V_MOV_B32_e32 11, implicit $exec
-    ; CHECK-NEXT: %3.sub1:vreg_96 = V_MOV_B32_e32 12, implicit $exec
-    ; CHECK-NEXT: %3.sub2:vreg_96 = V_MOV_B32_e32 13, implicit $exec
-    ; CHECK-NEXT: S_NOP 0, implicit %3
+    ; CHECK: undef [[V_MOV_B32_e32_:%[0-9]+]].sub0:vreg_96 = V_MOV_B32_e32 0, implicit $exec
+    ; CHECK-NEXT: [[V_MOV_B32_e32_:%[0-9]+]].sub1:vreg_96 = V_MOV_B32_e32 1, implicit $exec
+    ; CHECK-NEXT: [[V_MOV_B32_e32_:%[0-9]+]].sub2:vreg_96 = V_MOV_B32_e32 2, implicit $exec
+    ; CHECK-NEXT: S_NOP 0, implicit [[V_MOV_B32_e32_]]
+    ; CHECK-NEXT: undef [[V_MOV_B32_e32_1:%[0-9]+]].sub0:vreg_96 = V_MOV_B32_e32 11, implicit $exec
+    ; CHECK-NEXT: [[V_MOV_B32_e32_1:%[0-9]+]].sub1:vreg_96 = V_MOV_B32_e32 12, implicit $exec
+    ; CHECK-NEXT: [[V_MOV_B32_e32_1:%[0-9]+]].sub2:vreg_96 = V_MOV_B32_e32 13, implicit $exec
+    ; CHECK-NEXT: S_NOP 0, implicit [[V_MOV_B32_e32_1]]
     undef %0.sub0:vreg_128 = V_MOV_B32_e32 00, implicit $exec
     %0.sub1:vreg_128 = V_MOV_B32_e32 01, implicit $exec
     %0.sub2:vreg_128 = V_MOV_B32_e32 02, implicit $exec
@@ -215,15 +215,15 @@ tracksRegLiveness: true
 body:             |
   bb.0:
     ; CHECK-LABEL: name: test_vreg_160_w64
-    ; CHECK: undef %3.sub0:vreg_64 = V_MOV_B32_e32 0, implicit $exec
-    ; CHECK-NEXT: %3.sub1:vreg_64 = V_MOV_B32_e32 1, implicit $exec
-    ; CHECK-NEXT: S_NOP 0, implicit %3
-    ; CHECK-NEXT: undef %4.sub0:vreg_64 = V_MOV_B32_e32 11, implicit $exec
-    ; CHECK-NEXT: %4.sub1:vreg_64 = V_MOV_B32_e32 12, implicit $exec
-    ; CHECK-NEXT: S_NOP 0, implicit %4
-    ; CHECK-NEXT: undef %5.sub0:vreg_64 = V_MOV_B32_e32 23, implicit $exec
-    ; CHECK-NEXT: %5.sub1:vreg_64 = V_MOV_B32_e32 24, implicit $exec
-    ; CHECK-NEXT: S_NOP 0, implicit %5
+    ; CHECK: undef [[V_MOV_B32_e32_:%[0-9]+]].sub0:vreg_64 = V_MOV_B32_e32 0, implicit $exec
+    ; CHECK-NEXT: [[V_MOV_B32_e32_:%[0-9]+]].sub1:vreg_64 = V_MOV_B32_e32 1, implicit $exec
+    ; CHECK-NEXT: S_NOP 0, implicit [[V_MOV_B32_e32_]]
+    ; CHECK-NEXT: undef [[V_MOV_B32_e32_1:%[0-9]+]].sub0:vreg_64 = V_MOV_B32_e32 11, implicit $exec
+    ; CHECK-NEXT: [[V_MOV_B32_e32_1:%[0-9]+]].sub1:vreg_64 = V_MOV_B32_e32 12, implicit $exec
+    ; CHECK-NEXT: S_NOP 0, implicit [[V_MOV_B32_e32_1]]
+    ; CHECK-NEXT: undef [[V_MOV_B32_e32_2:%[0-9]+]].sub0:vreg_64 = V_MOV_B32_e32 23, implicit $exec
+    ; CHECK-NEXT: [[V_MOV_B32_e32_2:%[0-9]+]].sub1:vreg_64 = V_MOV_B32_e32 24, implicit $exec
+    ; CHECK-NEXT: S_NOP 0, implicit [[V_MOV_B32_e32_2]]
     undef %0.sub0:vreg_160 = V_MOV_B32_e32 00, implicit $exec
     %0.sub1:vreg_160 = V_MOV_B32_e32 01, implicit $exec
     S_NOP 0, implicit %0.sub0_sub1
@@ -243,18 +243,18 @@ tracksRegLiveness: true
 body:             |
   bb.0:
     ; CHECK-LABEL: name: test_vreg_160_w96
-    ; CHECK: undef %3.sub0:vreg_96 = V_MOV_B32_e32 0, implicit $exec
-    ; CHECK-NEXT: %3.sub1:vreg_96 = V_MOV_B32_e32 1, implicit $exec
-    ; CHECK-NEXT: %3.sub2:vreg_96 = V_MOV_B32_e32 2, implicit $exec
-    ; CHECK-NEXT: S_NOP 0, implicit %3
-    ; CHECK-NEXT: undef %4.sub0:vreg_96 = V_MOV_B32_e32 11, implicit $exec
-    ; CHECK-NEXT: %4.sub1:vreg_96 = V_MOV_B32_e32 12, implicit $exec
-    ; CHECK-NEXT: %4.sub2:vreg_96 = V_MOV_B32_e32 13, implicit $exec
-    ; CHECK-NEXT: S_NOP 0, implicit %4
-    ; CHECK-NEXT: undef %5.sub0:vreg_96 = V_MOV_B32_e32 22, implicit $exec
-    ; CHECK-NEXT: %5.sub1:vreg_96 = V_MOV_B32_e32 23, implicit $exec
-    ; CHECK-NEXT: %5.sub2:vreg_96 = V_MOV_B32_e32 24, implicit $exec
-    ; CHECK-NEXT: S_NOP 0, implicit %5
+    ; CHECK: undef [[V_MOV_B32_e32_:%[0-9]+]].sub0:vreg_96 = V_MOV_B32_e32 0, implicit $exec
+    ; CHECK-NEXT: [[V_MOV_B32_e32_:%[0-9]+]].sub1:vreg_96 = V_MOV_B32_e32 1, implicit $exec
+    ; CHECK-NEXT: [[V_MOV_B32_e32_:%[0-9]+]].sub2:vreg_96 = V_MOV_B32_e32 2, implicit $exec
+    ; CHECK-NEX...
[truncated]

@vpykhtin
Copy link
Contributor Author

@arsenm let's submit this and I create another PR to cleanup unnecessary register class deduction, since TRI->getSubRegisterClass works correctly now.

@vpykhtin
Copy link
Contributor Author

vpykhtin commented Nov 6, 2023

Ping.

1 similar comment
@vpykhtin
Copy link
Contributor Author

Ping.

Copy link
Contributor

@perlfu perlfu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the code change looks fine.
I left a few questions inline about existing code -- you don't necessarily need to address these.

Could you regenerate the tests as a separate NFC commit (or PR), so this PR only shows test changes from the code change?

llvm/lib/Target/AMDGPU/GCNRewritePartialRegUses.cpp Outdated Show resolved Hide resolved
llvm/lib/Target/AMDGPU/GCNRewritePartialRegUses.cpp Outdated Show resolved Hide resolved
@@ -1,32 +1,31 @@
# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
# RUN: llc -mtriple=amdgcn-amd-amdhsa -amdgpu-enable-rewrite-partial-reg-uses=true -verify-machineinstrs -start-before=rename-independent-subregs -stop-after=rewrite-partial-reg-uses %s -o - | FileCheck -check-prefix=CHECK %s
# RUN: llc -mtriple=amdgcn-amd-amdhsa -amdgpu-enable-rewrite-partial-reg-uses=true -verify-machineinstrs -start-before=rename-independent-subregs %s -o /dev/null 2>&1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this intentional?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, the MIR in the test isn't fully valid to be emitted in asm so I deleted this line.

@vpykhtin
Copy link
Contributor Author

I think the code change looks fine. I left a few questions inline about existing code -- you don't necessarily need to address these.

Could you regenerate the tests as a separate NFC commit (or PR), so this PR only shows test changes from the code change?

Thank you for the review, Carl!

I can regenerate the tests but I'm not sure how to do it best here: submit the updated tests, rebase this PR's branch on top of the commit and then force push the branch? Or I can simply merge main with the updated tests commit into this PR's branch?

@perlfu
Copy link
Contributor

perlfu commented Nov 16, 2023

I can regenerate the tests but I'm not sure how to do it best here: submit the updated tests, rebase this PR's branch on top of the commit and then force push the branch? Or I can simply merge main with the updated tests commit into this PR's branch?

I do not know what is expected/common practice, I suspect either approach is valid.
I have certainly seen people merge main before, but typically I opt for a rebase and force push.

@vpykhtin vpykhtin force-pushed the fix_rewrite_partial_reg_uses_conservative branch from e62a4d1 to 22a5879 Compare November 16, 2023 11:05
@vpykhtin
Copy link
Contributor Author

I do not know what is expected/common practice, I suspect either approach is valid. I have certainly seen people merge main before, but typically I opt for a rebase and force push.

Thanks, done.

Copy link
Contributor

@perlfu perlfu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@vpykhtin vpykhtin merged commit 667ba7f into llvm:main Nov 16, 2023
3 checks passed
sr-tream pushed a commit to sr-tream/llvm-project that referenced this pull request Nov 20, 2023
…ed instead of scalar. (llvm#69957)

For the following testcase:

undef %1.sub1:sgpr_96 = COPY undef %0:sgpr_32
%3:vgpr_32 = V_LSHL_ADD_U32_e64 %1.sub1:sgpr_96, ...

GCNRewritePartialRegUses produced:

%4:vgpr_32 = COPY undef %1:sgpr_32
dead %2:vgpr_32 = V_LSHL_ADD_U32_e64 %4, ...

Register class for %4 is incorrect: there should be sgpr_32 instead of
vgpr_32 because the original %1 had scalar regclass. This patch fixes
that.

Note that GCNRewritePartialRegUses pass isn't enabled by default yet.
zahiraam pushed a commit to zahiraam/llvm-project that referenced this pull request Nov 20, 2023
…ed instead of scalar. (llvm#69957)

For the following testcase:

undef %1.sub1:sgpr_96 = COPY undef %0:sgpr_32
%3:vgpr_32 = V_LSHL_ADD_U32_e64 %1.sub1:sgpr_96, ...

GCNRewritePartialRegUses produced:

%4:vgpr_32 = COPY undef %1:sgpr_32
dead %2:vgpr_32 = V_LSHL_ADD_U32_e64 %4, ...

Register class for %4 is incorrect: there should be sgpr_32 instead of
vgpr_32 because the original %1 had scalar regclass. This patch fixes
that.

Note that GCNRewritePartialRegUses pass isn't enabled by default yet.
vpykhtin added a commit that referenced this pull request Dec 14, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants