-
Notifications
You must be signed in to change notification settings - Fork 10.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[AMDGPU] Fix GCNRewritePartialRegUses pass: vector regclass is selected instead of scalar. #69957
[AMDGPU] Fix GCNRewritePartialRegUses pass: vector regclass is selected instead of scalar. #69957
Conversation
@llvm/pr-subscribers-backend-amdgpu Author: Valery Pykhtin (vpykhtin) ChangesFor the following testcase: undef %1.sub1:sgpr_96 = COPY undef %0:sgpr_32 GCNRewritePartialRegUses produced: %4:vgpr_32 = COPY undef %1:sgpr_32 Register class for %4 is incorrect: there should be sgpr_32 instead of vgpr_32 because the original %1 had scalar regclass. This happens because %4 participate only in two instructions from which:
Thus GCNRewritePartialRegUses used only VS_32 class and selected vector subclass as the largest - what it missed here is that it should take into account the regclass for %1.sub1:sgpr_96. This patch fixes that, debug output after the fix:
And here starts another story. All of this deduction of final regclass using regclasses from instruction operands description is really not needed because TRI->getSubRegisterClass for %1.sub1:sgpr_96 is already returning the right class - SGPR_32 as we see in the debug output above but that haven't been always like that. I did a bisect search and found that TRI->getSubRegisterClass behavior has been fixed by the #67245, thanks to @kosarev. Before his fix getSubRegisterClass returned strange classes like:
which isn't even allocatable. I'm going to cleanup the implementation but I would like to do this in a separate commit. I've tested my patch before the #67245 and it also works. Note that GCNRewritePartialRegUses pass isn't enabled by default yet. Patch is 321.63 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/69957.diff 4 Files Affected:
diff --git a/llvm/lib/Target/AMDGPU/GCNRewritePartialRegUses.cpp b/llvm/lib/Target/AMDGPU/GCNRewritePartialRegUses.cpp
index 99db7e4af9fd1c9..c94e5c16fb2cf2e 100644
--- a/llvm/lib/Target/AMDGPU/GCNRewritePartialRegUses.cpp
+++ b/llvm/lib/Target/AMDGPU/GCNRewritePartialRegUses.cpp
@@ -101,17 +101,16 @@ class GCNRewritePartialRegUses : public MachineFunctionPass {
/// find new regclass such that:
/// 1. It has subregs obtained by shifting each OldSubReg by RShift number
/// of bits to the right. Every "shifted" subreg should have the same
- /// SubRegRC. SubRegRC can be null, in this case it initialized using
- /// getSubRegisterClass. If CoverSubregIdx is not zero it's a subreg that
- /// "covers" all other subregs in pairs. Basically such subreg becomes a
- /// whole register.
+ /// SubRegRC. If CoverSubregIdx is not zero it's a subreg that "covers"
+ /// all other subregs in pairs. Basically such subreg becomes a whole
+ /// register.
/// 2. Resulting register class contains registers of minimal size but not
/// less than RegNumBits.
///
/// SubRegs is map of OldSubReg -> [SubRegRC, NewSubReg] and is used as in/out
/// parameter:
/// OldSubReg - input parameter,
- /// SubRegRC - in/out, should be changed for unknown regclass,
+ /// SubRegRC - input parameter (cannot be null),
/// NewSubReg - output, contains shifted subregs on return.
const TargetRegisterClass *
getRegClassWithShiftedSubregs(const TargetRegisterClass *RC, unsigned RShift,
@@ -228,19 +227,7 @@ GCNRewritePartialRegUses::getRegClassWithShiftedSubregs(
BitVector ClassMask(getAllocatableAndAlignedRegClassMask(RCAlign));
for (auto &[OldSubReg, SRI] : SubRegs) {
auto &[SubRegRC, NewSubReg] = SRI;
-
- // Register class may be unknown, for example:
- // undef %0.sub4:sgpr_1024 = S_MOV_B32 01
- // %0.sub5:sgpr_1024 = S_MOV_B32 02
- // %1:vreg_64 = COPY %0.sub4_sub5
- // Register classes for subregs 'sub4' and 'sub5' are known from the
- // description of destination operand of S_MOV_B32 instruction but the
- // class for the subreg 'sub4_sub5' isn't specified by the COPY instruction.
- if (!SubRegRC)
- SubRegRC = TRI->getSubRegisterClass(RC, OldSubReg);
-
- if (!SubRegRC)
- return nullptr;
+ assert(SubRegRC);
LLVM_DEBUG(dbgs() << " " << TRI->getSubRegIndexName(OldSubReg) << ':'
<< TRI->getRegClassName(SubRegRC)
@@ -248,6 +235,8 @@ GCNRewritePartialRegUses::getRegClassWithShiftedSubregs(
<< " -> ");
if (OldSubReg == CoverSubregIdx) {
+ // Covering subreg will become a full register, RC should be allocatable.
+ assert(SubRegRC->isAllocatable());
NewSubReg = AMDGPU::NoSubRegister;
LLVM_DEBUG(dbgs() << "whole reg");
} else {
@@ -425,7 +414,7 @@ bool GCNRewritePartialRegUses::rewriteReg(Register Reg) const {
return false;
for (MachineOperand &MO : Range) {
- if (MO.getSubReg() == AMDGPU::NoSubRegister) // Whole reg used, quit.
+ if (MO.getSubReg() == AMDGPU::NoSubRegister) // Whole reg used, quit. [1]
return false;
}
@@ -433,21 +422,33 @@ bool GCNRewritePartialRegUses::rewriteReg(Register Reg) const {
LLVM_DEBUG(dbgs() << "Try to rewrite partial reg " << printReg(Reg, TRI)
<< ':' << TRI->getRegClassName(RC) << '\n');
- // Collect used subregs and constrained reg classes infered from instruction
+ // Collect used subregs and their reg classes infered from instruction
// operands.
SubRegMap SubRegs;
for (MachineOperand &MO : MRI->reg_nodbg_operands(Reg)) {
- assert(MO.getSubReg() != AMDGPU::NoSubRegister);
- auto *OpDescRC = getOperandRegClass(MO);
- const auto [I, Inserted] = SubRegs.try_emplace(MO.getSubReg(), OpDescRC);
- if (!Inserted && OpDescRC) {
- SubRegInfo &SRI = I->second;
- SRI.RC = SRI.RC ? TRI->getCommonSubClass(SRI.RC, OpDescRC) : OpDescRC;
- if (!SRI.RC) {
- LLVM_DEBUG(dbgs() << " Couldn't find common target regclass\n");
- return false;
+ const unsigned SubReg = MO.getSubReg();
+ assert(SubReg != AMDGPU::NoSubRegister); // Due to [1].
+ LLVM_DEBUG(dbgs() << " " << TRI->getSubRegIndexName(SubReg) << ':');
+
+ const auto [I, Inserted] = SubRegs.try_emplace(SubReg);
+ const TargetRegisterClass *&SubRegRC = I->second.RC;
+
+ if (Inserted)
+ SubRegRC = TRI->getSubRegisterClass(RC, SubReg);
+
+ if (SubRegRC) {
+ if (const TargetRegisterClass *OpDescRC = getOperandRegClass(MO)) {
+ LLVM_DEBUG(dbgs() << TRI->getRegClassName(SubRegRC) << " & "
+ << TRI->getRegClassName(OpDescRC) << " = ");
+ SubRegRC = TRI->getCommonSubClass(SubRegRC, OpDescRC);
}
}
+
+ if (!SubRegRC) {
+ LLVM_DEBUG(dbgs() << "couldn't find target regclass\n");
+ return false;
+ } else
+ LLVM_DEBUG(dbgs() << TRI->getRegClassName(SubRegRC) << '\n');
}
auto *NewRC = getMinSizeReg(RC, SubRegs);
diff --git a/llvm/test/CodeGen/AMDGPU/rewrite-partial-reg-uses-dbg.mir b/llvm/test/CodeGen/AMDGPU/rewrite-partial-reg-uses-dbg.mir
index 4e1f912a9d6f98e..85d0c054754d03d 100644
--- a/llvm/test/CodeGen/AMDGPU/rewrite-partial-reg-uses-dbg.mir
+++ b/llvm/test/CodeGen/AMDGPU/rewrite-partial-reg-uses-dbg.mir
@@ -7,7 +7,6 @@
unreachable, !dbg !11
}
- ; Function Attrs: nocallback nofree nosync nounwind speculatable willreturn memory(none)
declare void @llvm.dbg.value(metadata, metadata, metadata) #0
attributes #0 = { nocallback nofree nosync nounwind speculatable willreturn memory(none) }
@@ -36,21 +35,21 @@ name: test_vreg_96_w64
body: |
bb.0:
; CHECK-LABEL: name: test_vreg_96_w64
- ; CHECK: undef %3.sub0:vreg_64 = V_MOV_B32_e32 0, implicit $exec, debug-location !11
- ; CHECK-NEXT: DBG_VALUE %3.sub0, $noreg, !9, !DIExpression(), debug-location !11
- ; CHECK-NEXT: %3.sub1:vreg_64 = V_MOV_B32_e32 1, implicit $exec, debug-location !DILocation(line: 2, column: 1, scope: !5)
- ; CHECK-NEXT: DBG_VALUE %3.sub1, $noreg, !9, !DIExpression(), debug-location !DILocation(line: 2, column: 1, scope: !5)
- ; CHECK-NEXT: S_NOP 0, implicit %3, debug-location !DILocation(line: 3, column: 1, scope: !5)
- ; CHECK-NEXT: undef %4.sub0:vreg_64 = V_MOV_B32_e32 11, implicit $exec, debug-location !DILocation(line: 4, column: 1, scope: !5)
- ; CHECK-NEXT: DBG_VALUE %4.sub0, $noreg, !9, !DIExpression(), debug-location !DILocation(line: 4, column: 1, scope: !5)
- ; CHECK-NEXT: %4.sub1:vreg_64 = V_MOV_B32_e32 12, implicit $exec, debug-location !DILocation(line: 5, column: 1, scope: !5)
- ; CHECK-NEXT: DBG_VALUE %4.sub1, $noreg, !9, !DIExpression(), debug-location !DILocation(line: 5, column: 1, scope: !5)
- ; CHECK-NEXT: S_NOP 0, implicit %4, debug-location !DILocation(line: 6, column: 1, scope: !5)
- ; CHECK-NEXT: undef %5.sub0:vreg_64 = V_MOV_B32_e32 11, implicit $exec, debug-location !DILocation(line: 4, column: 1, scope: !5)
- ; CHECK-NEXT: DBG_VALUE %5, $noreg, !9, !DIExpression(), debug-location !DILocation(line: 4, column: 1, scope: !5)
- ; CHECK-NEXT: %5.sub1:vreg_64 = V_MOV_B32_e32 12, implicit $exec, debug-location !DILocation(line: 5, column: 1, scope: !5)
- ; CHECK-NEXT: DBG_VALUE %5, $noreg, !9, !DIExpression(), debug-location !DILocation(line: 5, column: 1, scope: !5)
- ; CHECK-NEXT: S_NOP 0, implicit %5, debug-location !DILocation(line: 6, column: 1, scope: !5)
+ ; CHECK: undef [[V_MOV_B32_e32_:%[0-9]+]].sub0:vreg_64 = V_MOV_B32_e32 0, implicit $exec, debug-location !11
+ ; CHECK-NEXT: DBG_VALUE [[V_MOV_B32_e32_]].sub0, $noreg, !9, !DIExpression(), debug-location !11
+ ; CHECK-NEXT: [[V_MOV_B32_e32_:%[0-9]+]].sub1:vreg_64 = V_MOV_B32_e32 1, implicit $exec, debug-location !DILocation(line: 2, column: 1, scope: !5)
+ ; CHECK-NEXT: DBG_VALUE [[V_MOV_B32_e32_]].sub1, $noreg, !9, !DIExpression(), debug-location !DILocation(line: 2, column: 1, scope: !5)
+ ; CHECK-NEXT: S_NOP 0, implicit [[V_MOV_B32_e32_]], debug-location !DILocation(line: 3, column: 1, scope: !5)
+ ; CHECK-NEXT: undef [[V_MOV_B32_e32_1:%[0-9]+]].sub0:vreg_64 = V_MOV_B32_e32 11, implicit $exec, debug-location !DILocation(line: 4, column: 1, scope: !5)
+ ; CHECK-NEXT: DBG_VALUE [[V_MOV_B32_e32_1]].sub0, $noreg, !9, !DIExpression(), debug-location !DILocation(line: 4, column: 1, scope: !5)
+ ; CHECK-NEXT: [[V_MOV_B32_e32_1:%[0-9]+]].sub1:vreg_64 = V_MOV_B32_e32 12, implicit $exec, debug-location !DILocation(line: 5, column: 1, scope: !5)
+ ; CHECK-NEXT: DBG_VALUE [[V_MOV_B32_e32_1]].sub1, $noreg, !9, !DIExpression(), debug-location !DILocation(line: 5, column: 1, scope: !5)
+ ; CHECK-NEXT: S_NOP 0, implicit [[V_MOV_B32_e32_1]], debug-location !DILocation(line: 6, column: 1, scope: !5)
+ ; CHECK-NEXT: undef [[V_MOV_B32_e32_2:%[0-9]+]].sub0:vreg_64 = V_MOV_B32_e32 11, implicit $exec, debug-location !DILocation(line: 4, column: 1, scope: !5)
+ ; CHECK-NEXT: DBG_VALUE [[V_MOV_B32_e32_2]], $noreg, !9, !DIExpression(), debug-location !DILocation(line: 4, column: 1, scope: !5)
+ ; CHECK-NEXT: [[V_MOV_B32_e32_2:%[0-9]+]].sub1:vreg_64 = V_MOV_B32_e32 12, implicit $exec, debug-location !DILocation(line: 5, column: 1, scope: !5)
+ ; CHECK-NEXT: DBG_VALUE [[V_MOV_B32_e32_2]], $noreg, !9, !DIExpression(), debug-location !DILocation(line: 5, column: 1, scope: !5)
+ ; CHECK-NEXT: S_NOP 0, implicit [[V_MOV_B32_e32_2]], debug-location !DILocation(line: 6, column: 1, scope: !5)
undef %0.sub0:vreg_96 = V_MOV_B32_e32 0, implicit $exec, debug-location !11
DBG_VALUE %0.sub0, $noreg, !9, !DIExpression(), debug-location !11
%0.sub1:vreg_96 = V_MOV_B32_e32 1, implicit $exec, debug-location !DILocation(line: 2, column: 1, scope: !5)
diff --git a/llvm/test/CodeGen/AMDGPU/rewrite-partial-reg-uses-gen.mir b/llvm/test/CodeGen/AMDGPU/rewrite-partial-reg-uses-gen.mir
index d51e63f92e69141..037f39df8c3e06e 100644
--- a/llvm/test/CodeGen/AMDGPU/rewrite-partial-reg-uses-gen.mir
+++ b/llvm/test/CodeGen/AMDGPU/rewrite-partial-reg-uses-gen.mir
@@ -6,26 +6,26 @@ tracksRegLiveness: true
body: |
bb.0:
; CHECK-LABEL: name: test_subregs_composition_vreg_1024
- ; CHECK: undef %5.sub0:vreg_96 = V_MOV_B32_e32 1, implicit $exec
- ; CHECK-NEXT: %5.sub1:vreg_96 = V_MOV_B32_e32 2, implicit $exec
- ; CHECK-NEXT: S_NOP 0, implicit %5.sub0_sub1
- ; CHECK-NEXT: S_NOP 0, implicit %5.sub1_sub2
- ; CHECK-NEXT: undef %6.sub0:vreg_128 = V_MOV_B32_e32 11, implicit $exec
- ; CHECK-NEXT: %6.sub1:vreg_128 = V_MOV_B32_e32 12, implicit $exec
- ; CHECK-NEXT: S_NOP 0, implicit %6.sub0_sub1_sub2
- ; CHECK-NEXT: S_NOP 0, implicit %6.sub1_sub2_sub3
- ; CHECK-NEXT: undef %7.sub0:vreg_160 = V_MOV_B32_e32 21, implicit $exec
- ; CHECK-NEXT: %7.sub1:vreg_160 = V_MOV_B32_e32 22, implicit $exec
- ; CHECK-NEXT: S_NOP 0, implicit %7.sub0_sub1_sub2_sub3
- ; CHECK-NEXT: S_NOP 0, implicit %7.sub1_sub2_sub3_sub4
- ; CHECK-NEXT: undef %8.sub0:vreg_192 = V_MOV_B32_e32 31, implicit $exec
- ; CHECK-NEXT: %8.sub1:vreg_192 = V_MOV_B32_e32 32, implicit $exec
- ; CHECK-NEXT: S_NOP 0, implicit %8.sub0_sub1_sub2_sub3_sub4
- ; CHECK-NEXT: S_NOP 0, implicit %8.sub1_sub2_sub3_sub4_sub5
- ; CHECK-NEXT: undef %9.sub0:vreg_256 = V_MOV_B32_e32 41, implicit $exec
- ; CHECK-NEXT: %9.sub2:vreg_256 = V_MOV_B32_e32 43, implicit $exec
- ; CHECK-NEXT: S_NOP 0, implicit %9.sub0_sub1_sub2_sub3_sub4_sub5
- ; CHECK-NEXT: S_NOP 0, implicit %9.sub2_sub3_sub4_sub5_sub6_sub7
+ ; CHECK: undef [[V_MOV_B32_e32_:%[0-9]+]].sub0:vreg_96 = V_MOV_B32_e32 1, implicit $exec
+ ; CHECK-NEXT: [[V_MOV_B32_e32_:%[0-9]+]].sub1:vreg_96 = V_MOV_B32_e32 2, implicit $exec
+ ; CHECK-NEXT: S_NOP 0, implicit [[V_MOV_B32_e32_]].sub0_sub1
+ ; CHECK-NEXT: S_NOP 0, implicit [[V_MOV_B32_e32_]].sub1_sub2
+ ; CHECK-NEXT: undef [[V_MOV_B32_e32_1:%[0-9]+]].sub0:vreg_128 = V_MOV_B32_e32 11, implicit $exec
+ ; CHECK-NEXT: [[V_MOV_B32_e32_1:%[0-9]+]].sub1:vreg_128 = V_MOV_B32_e32 12, implicit $exec
+ ; CHECK-NEXT: S_NOP 0, implicit [[V_MOV_B32_e32_1]].sub0_sub1_sub2
+ ; CHECK-NEXT: S_NOP 0, implicit [[V_MOV_B32_e32_1]].sub1_sub2_sub3
+ ; CHECK-NEXT: undef [[V_MOV_B32_e32_2:%[0-9]+]].sub0:vreg_160 = V_MOV_B32_e32 21, implicit $exec
+ ; CHECK-NEXT: [[V_MOV_B32_e32_2:%[0-9]+]].sub1:vreg_160 = V_MOV_B32_e32 22, implicit $exec
+ ; CHECK-NEXT: S_NOP 0, implicit [[V_MOV_B32_e32_2]].sub0_sub1_sub2_sub3
+ ; CHECK-NEXT: S_NOP 0, implicit [[V_MOV_B32_e32_2]].sub1_sub2_sub3_sub4
+ ; CHECK-NEXT: undef [[V_MOV_B32_e32_3:%[0-9]+]].sub0:vreg_192 = V_MOV_B32_e32 31, implicit $exec
+ ; CHECK-NEXT: [[V_MOV_B32_e32_3:%[0-9]+]].sub1:vreg_192 = V_MOV_B32_e32 32, implicit $exec
+ ; CHECK-NEXT: S_NOP 0, implicit [[V_MOV_B32_e32_3]].sub0_sub1_sub2_sub3_sub4
+ ; CHECK-NEXT: S_NOP 0, implicit [[V_MOV_B32_e32_3]].sub1_sub2_sub3_sub4_sub5
+ ; CHECK-NEXT: undef [[V_MOV_B32_e32_4:%[0-9]+]].sub0:vreg_256 = V_MOV_B32_e32 41, implicit $exec
+ ; CHECK-NEXT: [[V_MOV_B32_e32_4:%[0-9]+]].sub2:vreg_256 = V_MOV_B32_e32 43, implicit $exec
+ ; CHECK-NEXT: S_NOP 0, implicit [[V_MOV_B32_e32_4]].sub0_sub1_sub2_sub3_sub4_sub5
+ ; CHECK-NEXT: S_NOP 0, implicit [[V_MOV_B32_e32_4]].sub2_sub3_sub4_sub5_sub6_sub7
undef %0.sub1:vreg_1024 = V_MOV_B32_e32 01, implicit $exec
%0.sub2:vreg_1024 = V_MOV_B32_e32 02, implicit $exec
S_NOP 0, implicit %0.sub1_sub2
@@ -97,12 +97,12 @@ tracksRegLiveness: true
body: |
bb.0:
; CHECK-LABEL: name: test_vreg_96_w64
- ; CHECK: undef %2.sub0:vreg_64 = V_MOV_B32_e32 0, implicit $exec
- ; CHECK-NEXT: %2.sub1:vreg_64 = V_MOV_B32_e32 1, implicit $exec
- ; CHECK-NEXT: S_NOP 0, implicit %2
- ; CHECK-NEXT: undef %3.sub0:vreg_64 = V_MOV_B32_e32 11, implicit $exec
- ; CHECK-NEXT: %3.sub1:vreg_64 = V_MOV_B32_e32 12, implicit $exec
- ; CHECK-NEXT: S_NOP 0, implicit %3
+ ; CHECK: undef [[V_MOV_B32_e32_:%[0-9]+]].sub0:vreg_64 = V_MOV_B32_e32 0, implicit $exec
+ ; CHECK-NEXT: [[V_MOV_B32_e32_:%[0-9]+]].sub1:vreg_64 = V_MOV_B32_e32 1, implicit $exec
+ ; CHECK-NEXT: S_NOP 0, implicit [[V_MOV_B32_e32_]]
+ ; CHECK-NEXT: undef [[V_MOV_B32_e32_1:%[0-9]+]].sub0:vreg_64 = V_MOV_B32_e32 11, implicit $exec
+ ; CHECK-NEXT: [[V_MOV_B32_e32_1:%[0-9]+]].sub1:vreg_64 = V_MOV_B32_e32 12, implicit $exec
+ ; CHECK-NEXT: S_NOP 0, implicit [[V_MOV_B32_e32_1]]
undef %0.sub0:vreg_96 = V_MOV_B32_e32 00, implicit $exec
%0.sub1:vreg_96 = V_MOV_B32_e32 01, implicit $exec
S_NOP 0, implicit %0.sub0_sub1
@@ -140,15 +140,15 @@ tracksRegLiveness: true
body: |
bb.0:
; CHECK-LABEL: name: test_vreg_128_w64
- ; CHECK: undef %3.sub0:vreg_64 = V_MOV_B32_e32 0, implicit $exec
- ; CHECK-NEXT: %3.sub1:vreg_64 = V_MOV_B32_e32 1, implicit $exec
- ; CHECK-NEXT: S_NOP 0, implicit %3
- ; CHECK-NEXT: undef %4.sub0:vreg_64 = V_MOV_B32_e32 11, implicit $exec
- ; CHECK-NEXT: %4.sub1:vreg_64 = V_MOV_B32_e32 12, implicit $exec
- ; CHECK-NEXT: S_NOP 0, implicit %4
- ; CHECK-NEXT: undef %5.sub0:vreg_64 = V_MOV_B32_e32 22, implicit $exec
- ; CHECK-NEXT: %5.sub1:vreg_64 = V_MOV_B32_e32 23, implicit $exec
- ; CHECK-NEXT: S_NOP 0, implicit %5
+ ; CHECK: undef [[V_MOV_B32_e32_:%[0-9]+]].sub0:vreg_64 = V_MOV_B32_e32 0, implicit $exec
+ ; CHECK-NEXT: [[V_MOV_B32_e32_:%[0-9]+]].sub1:vreg_64 = V_MOV_B32_e32 1, implicit $exec
+ ; CHECK-NEXT: S_NOP 0, implicit [[V_MOV_B32_e32_]]
+ ; CHECK-NEXT: undef [[V_MOV_B32_e32_1:%[0-9]+]].sub0:vreg_64 = V_MOV_B32_e32 11, implicit $exec
+ ; CHECK-NEXT: [[V_MOV_B32_e32_1:%[0-9]+]].sub1:vreg_64 = V_MOV_B32_e32 12, implicit $exec
+ ; CHECK-NEXT: S_NOP 0, implicit [[V_MOV_B32_e32_1]]
+ ; CHECK-NEXT: undef [[V_MOV_B32_e32_2:%[0-9]+]].sub0:vreg_64 = V_MOV_B32_e32 22, implicit $exec
+ ; CHECK-NEXT: [[V_MOV_B32_e32_2:%[0-9]+]].sub1:vreg_64 = V_MOV_B32_e32 23, implicit $exec
+ ; CHECK-NEXT: S_NOP 0, implicit [[V_MOV_B32_e32_2]]
undef %0.sub0:vreg_128 = V_MOV_B32_e32 00, implicit $exec
%0.sub1:vreg_128 = V_MOV_B32_e32 01, implicit $exec
S_NOP 0, implicit %0.sub0_sub1
@@ -168,14 +168,14 @@ tracksRegLiveness: true
body: |
bb.0:
; CHECK-LABEL: name: test_vreg_128_w96
- ; CHECK: undef %2.sub0:vreg_96 = V_MOV_B32_e32 0, implicit $exec
- ; CHECK-NEXT: %2.sub1:vreg_96 = V_MOV_B32_e32 1, implicit $exec
- ; CHECK-NEXT: %2.sub2:vreg_96 = V_MOV_B32_e32 2, implicit $exec
- ; CHECK-NEXT: S_NOP 0, implicit %2
- ; CHECK-NEXT: undef %3.sub0:vreg_96 = V_MOV_B32_e32 11, implicit $exec
- ; CHECK-NEXT: %3.sub1:vreg_96 = V_MOV_B32_e32 12, implicit $exec
- ; CHECK-NEXT: %3.sub2:vreg_96 = V_MOV_B32_e32 13, implicit $exec
- ; CHECK-NEXT: S_NOP 0, implicit %3
+ ; CHECK: undef [[V_MOV_B32_e32_:%[0-9]+]].sub0:vreg_96 = V_MOV_B32_e32 0, implicit $exec
+ ; CHECK-NEXT: [[V_MOV_B32_e32_:%[0-9]+]].sub1:vreg_96 = V_MOV_B32_e32 1, implicit $exec
+ ; CHECK-NEXT: [[V_MOV_B32_e32_:%[0-9]+]].sub2:vreg_96 = V_MOV_B32_e32 2, implicit $exec
+ ; CHECK-NEXT: S_NOP 0, implicit [[V_MOV_B32_e32_]]
+ ; CHECK-NEXT: undef [[V_MOV_B32_e32_1:%[0-9]+]].sub0:vreg_96 = V_MOV_B32_e32 11, implicit $exec
+ ; CHECK-NEXT: [[V_MOV_B32_e32_1:%[0-9]+]].sub1:vreg_96 = V_MOV_B32_e32 12, implicit $exec
+ ; CHECK-NEXT: [[V_MOV_B32_e32_1:%[0-9]+]].sub2:vreg_96 = V_MOV_B32_e32 13, implicit $exec
+ ; CHECK-NEXT: S_NOP 0, implicit [[V_MOV_B32_e32_1]]
undef %0.sub0:vreg_128 = V_MOV_B32_e32 00, implicit $exec
%0.sub1:vreg_128 = V_MOV_B32_e32 01, implicit $exec
%0.sub2:vreg_128 = V_MOV_B32_e32 02, implicit $exec
@@ -215,15 +215,15 @@ tracksRegLiveness: true
body: |
bb.0:
; CHECK-LABEL: name: test_vreg_160_w64
- ; CHECK: undef %3.sub0:vreg_64 = V_MOV_B32_e32 0, implicit $exec
- ; CHECK-NEXT: %3.sub1:vreg_64 = V_MOV_B32_e32 1, implicit $exec
- ; CHECK-NEXT: S_NOP 0, implicit %3
- ; CHECK-NEXT: undef %4.sub0:vreg_64 = V_MOV_B32_e32 11, implicit $exec
- ; CHECK-NEXT: %4.sub1:vreg_64 = V_MOV_B32_e32 12, implicit $exec
- ; CHECK-NEXT: S_NOP 0, implicit %4
- ; CHECK-NEXT: undef %5.sub0:vreg_64 = V_MOV_B32_e32 23, implicit $exec
- ; CHECK-NEXT: %5.sub1:vreg_64 = V_MOV_B32_e32 24, implicit $exec
- ; CHECK-NEXT: S_NOP 0, implicit %5
+ ; CHECK: undef [[V_MOV_B32_e32_:%[0-9]+]].sub0:vreg_64 = V_MOV_B32_e32 0, implicit $exec
+ ; CHECK-NEXT: [[V_MOV_B32_e32_:%[0-9]+]].sub1:vreg_64 = V_MOV_B32_e32 1, implicit $exec
+ ; CHECK-NEXT: S_NOP 0, implicit [[V_MOV_B32_e32_]]
+ ; CHECK-NEXT: undef [[V_MOV_B32_e32_1:%[0-9]+]].sub0:vreg_64 = V_MOV_B32_e32 11, implicit $exec
+ ; CHECK-NEXT: [[V_MOV_B32_e32_1:%[0-9]+]].sub1:vreg_64 = V_MOV_B32_e32 12, implicit $exec
+ ; CHECK-NEXT: S_NOP 0, implicit [[V_MOV_B32_e32_1]]
+ ; CHECK-NEXT: undef [[V_MOV_B32_e32_2:%[0-9]+]].sub0:vreg_64 = V_MOV_B32_e32 23, implicit $exec
+ ; CHECK-NEXT: [[V_MOV_B32_e32_2:%[0-9]+]].sub1:vreg_64 = V_MOV_B32_e32 24, implicit $exec
+ ; CHECK-NEXT: S_NOP 0, implicit [[V_MOV_B32_e32_2]]
undef %0.sub0:vreg_160 = V_MOV_B32_e32 00, implicit $exec
%0.sub1:vreg_160 = V_MOV_B32_e32 01, implicit $exec
S_NOP 0, implicit %0.sub0_sub1
@@ -243,18 +243,18 @@ tracksRegLiveness: true
body: |
bb.0:
; CHECK-LABEL: name: test_vreg_160_w96
- ; CHECK: undef %3.sub0:vreg_96 = V_MOV_B32_e32 0, implicit $exec
- ; CHECK-NEXT: %3.sub1:vreg_96 = V_MOV_B32_e32 1, implicit $exec
- ; CHECK-NEXT: %3.sub2:vreg_96 = V_MOV_B32_e32 2, implicit $exec
- ; CHECK-NEXT: S_NOP 0, implicit %3
- ; CHECK-NEXT: undef %4.sub0:vreg_96 = V_MOV_B32_e32 11, implicit $exec
- ; CHECK-NEXT: %4.sub1:vreg_96 = V_MOV_B32_e32 12, implicit $exec
- ; CHECK-NEXT: %4.sub2:vreg_96 = V_MOV_B32_e32 13, implicit $exec
- ; CHECK-NEXT: S_NOP 0, implicit %4
- ; CHECK-NEXT: undef %5.sub0:vreg_96 = V_MOV_B32_e32 22, implicit $exec
- ; CHECK-NEXT: %5.sub1:vreg_96 = V_MOV_B32_e32 23, implicit $exec
- ; CHECK-NEXT: %5.sub2:vreg_96 = V_MOV_B32_e32 24, implicit $exec
- ; CHECK-NEXT: S_NOP 0, implicit %5
+ ; CHECK: undef [[V_MOV_B32_e32_:%[0-9]+]].sub0:vreg_96 = V_MOV_B32_e32 0, implicit $exec
+ ; CHECK-NEXT: [[V_MOV_B32_e32_:%[0-9]+]].sub1:vreg_96 = V_MOV_B32_e32 1, implicit $exec
+ ; CHECK-NEXT: [[V_MOV_B32_e32_:%[0-9]+]].sub2:vreg_96 = V_MOV_B32_e32 2, implicit $exec
+ ; CHECK-NEX...
[truncated]
|
@arsenm let's submit this and I create another PR to cleanup unnecessary register class deduction, since TRI->getSubRegisterClass works correctly now. |
Ping. |
1 similar comment
Ping. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the code change looks fine.
I left a few questions inline about existing code -- you don't necessarily need to address these.
Could you regenerate the tests as a separate NFC commit (or PR), so this PR only shows test changes from the code change?
@@ -1,32 +1,31 @@ | |||
# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py | |||
# RUN: llc -mtriple=amdgcn-amd-amdhsa -amdgpu-enable-rewrite-partial-reg-uses=true -verify-machineinstrs -start-before=rename-independent-subregs -stop-after=rewrite-partial-reg-uses %s -o - | FileCheck -check-prefix=CHECK %s | |||
# RUN: llc -mtriple=amdgcn-amd-amdhsa -amdgpu-enable-rewrite-partial-reg-uses=true -verify-machineinstrs -start-before=rename-independent-subregs %s -o /dev/null 2>&1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this intentional?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, the MIR in the test isn't fully valid to be emitted in asm so I deleted this line.
Thank you for the review, Carl! I can regenerate the tests but I'm not sure how to do it best here: submit the updated tests, rebase this PR's branch on top of the commit and then force push the branch? Or I can simply merge main with the updated tests commit into this PR's branch? |
I do not know what is expected/common practice, I suspect either approach is valid. |
e62a4d1
to
22a5879
Compare
Thanks, done. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
…ed instead of scalar. (llvm#69957) For the following testcase: undef %1.sub1:sgpr_96 = COPY undef %0:sgpr_32 %3:vgpr_32 = V_LSHL_ADD_U32_e64 %1.sub1:sgpr_96, ... GCNRewritePartialRegUses produced: %4:vgpr_32 = COPY undef %1:sgpr_32 dead %2:vgpr_32 = V_LSHL_ADD_U32_e64 %4, ... Register class for %4 is incorrect: there should be sgpr_32 instead of vgpr_32 because the original %1 had scalar regclass. This patch fixes that. Note that GCNRewritePartialRegUses pass isn't enabled by default yet.
…ed instead of scalar. (llvm#69957) For the following testcase: undef %1.sub1:sgpr_96 = COPY undef %0:sgpr_32 %3:vgpr_32 = V_LSHL_ADD_U32_e64 %1.sub1:sgpr_96, ... GCNRewritePartialRegUses produced: %4:vgpr_32 = COPY undef %1:sgpr_32 dead %2:vgpr_32 = V_LSHL_ADD_U32_e64 %4, ... Register class for %4 is incorrect: there should be sgpr_32 instead of vgpr_32 because the original %1 had scalar regclass. This patch fixes that. Note that GCNRewritePartialRegUses pass isn't enabled by default yet.
Let's try once again after #69957 has landed.
For the following testcase:
undef %1.sub1:sgpr_96 = COPY undef %0:sgpr_32
%3:vgpr_32 = V_LSHL_ADD_U32_e64 %1.sub1:sgpr_96, 2, undef %2:vgpr_32, implicit $exec
GCNRewritePartialRegUses produced:
%4:vgpr_32 = COPY undef %1:sgpr_32
dead %2:vgpr_32 = V_LSHL_ADD_U32_e64 %4, 2, undef %3:vgpr_32, implicit $exec
Register class for %4 is incorrect: there should be sgpr_32 instead of vgpr_32 because the original %1 had scalar regclass. This happens because %4 participate only in two instructions from which:
Thus GCNRewritePartialRegUses used only VS_32 class and selected vector subclass as the largest - what it missed here is that it should take into account the regclass for %1.sub1:sgpr_96. This patch fixes that, debug output after the fix:
And here starts another story. All of this deduction of final regclass using regclasses from instruction operands description is really not needed because TRI->getSubRegisterClass for %1.sub1:sgpr_96 is already returning the right class - SGPR_32 as we see in the debug output above but that haven't been always like that. I did a bisect search and found that TRI->getSubRegisterClass behavior has been fixed by the #67245, thanks to @kosarev. Before his fix getSubRegisterClass returned strange classes like:
SGPR_128:sub0_sub1 -> SReg_1_with_sub0_and_SReg_1_with_lo16_in_SGPR_LO16
which isn't even allocatable.
I'm going to cleanup the implementation but I would like to do this in a separate commit. I've tested my patch before the #67245 and it also works.
Note that GCNRewritePartialRegUses pass isn't enabled by default yet.