-
Notifications
You must be signed in to change notification settings - Fork 10.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[AMDGPU] Clear kills for aliasing registers after forming V_CMPX #68675
Conversation
This fixes machine verification problems when V_CMPX formation effectively moves a "V_CMP vgprN" instruction past other instructions that may kill a subregister or superregister of vgprN. Fixes #68221
@llvm/pr-subscribers-backend-amdgpu Author: Jay Foad (jayfoad) ChangesThis fixes machine verification problems when V_CMPX formation Fixes #68221 Full diff: https://github.com/llvm/llvm-project/pull/68675.diff 2 Files Affected:
diff --git a/llvm/lib/Target/AMDGPU/SIOptimizeExecMasking.cpp b/llvm/lib/Target/AMDGPU/SIOptimizeExecMasking.cpp
index 04c9a6457944c5f..4730cec8d9abceb 100644
--- a/llvm/lib/Target/AMDGPU/SIOptimizeExecMasking.cpp
+++ b/llvm/lib/Target/AMDGPU/SIOptimizeExecMasking.cpp
@@ -594,10 +594,14 @@ bool SIOptimizeExecMasking::optimizeVCMPSaveExecSequence(
TryAddImmediateValueFromNamedOperand(AMDGPU::OpName::clamp);
// The kill flags may no longer be correct.
- if (Src0->isReg())
- MRI->clearKillFlags(Src0->getReg());
- if (Src1->isReg())
- MRI->clearKillFlags(Src1->getReg());
+ if (Src0->isReg()) {
+ for (MCRegAliasIterator I(Src0->getReg(), TRI, true); I.isValid(); ++I)
+ MRI->clearKillFlags(*I);
+ }
+ if (Src1->isReg()) {
+ for (MCRegAliasIterator I(Src1->getReg(), TRI, true); I.isValid(); ++I)
+ MRI->clearKillFlags(*I);
+ }
SaveExecInstr.eraseFromParent();
VCmp.eraseFromParent();
diff --git a/llvm/test/CodeGen/AMDGPU/vcmp-saveexec-to-vcmpx-set-kill.mir b/llvm/test/CodeGen/AMDGPU/vcmp-saveexec-to-vcmpx-set-kill.mir
new file mode 100644
index 000000000000000..0b9f824775cc3c9
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/vcmp-saveexec-to-vcmpx-set-kill.mir
@@ -0,0 +1,41 @@
+# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 3
+# RUN: llc -mtriple=amdgcn -mcpu=gfx1100 -run-pass si-optimize-exec-masking -verify-machineinstrs %s -o - | FileCheck %s
+
+---
+name: main
+tracksRegLiveness: true
+body: |
+ ; CHECK-LABEL: name: main
+ ; CHECK: bb.0:
+ ; CHECK-NEXT: successors: %bb.1(0x40000000), %bb.2(0x40000000)
+ ; CHECK-NEXT: liveins: $vgpr0_vgpr1, $sgpr0_sgpr1_sgpr2_sgpr3
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: BUFFER_STORE_DWORDX2_OFFSET_exact $vgpr0_vgpr1, killed $sgpr0_sgpr1_sgpr2_sgpr3, 0, 344, 1, 0, implicit $exec
+ ; CHECK-NEXT: $sgpr0 = S_MOV_B32 $exec_lo
+ ; CHECK-NEXT: V_CMPX_EQ_U32_nosdst_e64 0, $vgpr0, implicit-def $exec, implicit $exec
+ ; CHECK-NEXT: S_CBRANCH_EXECZ %bb.2, implicit $exec
+ ; CHECK-NEXT: S_BRANCH %bb.1
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: bb.1:
+ ; CHECK-NEXT: successors: %bb.2(0x80000000)
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: bb.2:
+ ; CHECK-NEXT: S_ENDPGM 0
+ bb.0:
+ successors: %bb.1, %bb.2
+ liveins: $vgpr0_vgpr1, $sgpr0_sgpr1_sgpr2_sgpr3
+
+ $vcc_lo = V_CMP_EQ_U32_e64 0, $vgpr0, implicit $exec
+ BUFFER_STORE_DWORDX2_OFFSET_exact killed $vgpr0_vgpr1, killed $sgpr0_sgpr1_sgpr2_sgpr3, 0, 344, 1, 0, implicit $exec
+ $sgpr0 = COPY $exec_lo, implicit-def $exec_lo
+ $sgpr0 = S_AND_B32 killed $sgpr0, killed $vcc_lo, implicit-def dead $scc
+ $exec_lo = S_MOV_B32_term killed $sgpr0
+ S_CBRANCH_EXECZ %bb.2, implicit $exec
+ S_BRANCH %bb.1
+
+ bb.1:
+
+ bb.2:
+ S_ENDPGM 0
+...
|
This is an alternative to #68293. |
if (Src1->isReg()) | ||
MRI->clearKillFlags(Src1->getReg()); | ||
if (Src0->isReg()) { | ||
for (MCRegAliasIterator I(Src0->getReg(), TRI, true); I.isValid(); ++I) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't like introducing new uses of MCRegAliasIterator
because it is potentially expensive, but it does seem like the simplest way to implement this fix.
An alternative might be to scan all operands of all instructions between the old v_cmp and the new v_cmpx, and clear kill flags if they overlap with Src0 or Src1.
Sounds reasonable to me, and I assume this fix will land, but I'll let others discuss their opinion :-) |
Closed in favour of #68293. |
This fixes machine verification problems when V_CMPX formation
effectively moves a "V_CMP vgprN" instruction past other instructions
that may kill a subregister or superregister of vgprN.
Fixes #68221