Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AMDGPU] Clear kills for aliasing registers after forming V_CMPX #68675

Closed
wants to merge 1 commit into from
Closed

[AMDGPU] Clear kills for aliasing registers after forming V_CMPX #68675

wants to merge 1 commit into from

Conversation

jayfoad
Copy link
Contributor

@jayfoad jayfoad commented Oct 10, 2023

This fixes machine verification problems when V_CMPX formation
effectively moves a "V_CMP vgprN" instruction past other instructions
that may kill a subregister or superregister of vgprN.

Fixes #68221

This fixes machine verification problems when V_CMPX formation
effectively moves a "V_CMP vgprN" instruction past other instructions
that may kill a subregister or superregister of vgprN.

Fixes #68221
@llvmbot
Copy link
Collaborator

llvmbot commented Oct 10, 2023

@llvm/pr-subscribers-backend-amdgpu

Author: Jay Foad (jayfoad)

Changes

This fixes machine verification problems when V_CMPX formation
effectively moves a "V_CMP vgprN" instruction past other instructions
that may kill a subregister or superregister of vgprN.

Fixes #68221


Full diff: https://github.com/llvm/llvm-project/pull/68675.diff

2 Files Affected:

  • (modified) llvm/lib/Target/AMDGPU/SIOptimizeExecMasking.cpp (+8-4)
  • (added) llvm/test/CodeGen/AMDGPU/vcmp-saveexec-to-vcmpx-set-kill.mir (+41)
diff --git a/llvm/lib/Target/AMDGPU/SIOptimizeExecMasking.cpp b/llvm/lib/Target/AMDGPU/SIOptimizeExecMasking.cpp
index 04c9a6457944c5f..4730cec8d9abceb 100644
--- a/llvm/lib/Target/AMDGPU/SIOptimizeExecMasking.cpp
+++ b/llvm/lib/Target/AMDGPU/SIOptimizeExecMasking.cpp
@@ -594,10 +594,14 @@ bool SIOptimizeExecMasking::optimizeVCMPSaveExecSequence(
   TryAddImmediateValueFromNamedOperand(AMDGPU::OpName::clamp);
 
   // The kill flags may no longer be correct.
-  if (Src0->isReg())
-    MRI->clearKillFlags(Src0->getReg());
-  if (Src1->isReg())
-    MRI->clearKillFlags(Src1->getReg());
+  if (Src0->isReg()) {
+    for (MCRegAliasIterator I(Src0->getReg(), TRI, true); I.isValid(); ++I)
+      MRI->clearKillFlags(*I);
+  }
+  if (Src1->isReg()) {
+    for (MCRegAliasIterator I(Src1->getReg(), TRI, true); I.isValid(); ++I)
+      MRI->clearKillFlags(*I);
+  }
 
   SaveExecInstr.eraseFromParent();
   VCmp.eraseFromParent();
diff --git a/llvm/test/CodeGen/AMDGPU/vcmp-saveexec-to-vcmpx-set-kill.mir b/llvm/test/CodeGen/AMDGPU/vcmp-saveexec-to-vcmpx-set-kill.mir
new file mode 100644
index 000000000000000..0b9f824775cc3c9
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/vcmp-saveexec-to-vcmpx-set-kill.mir
@@ -0,0 +1,41 @@
+# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 3
+# RUN: llc -mtriple=amdgcn -mcpu=gfx1100 -run-pass si-optimize-exec-masking -verify-machineinstrs %s -o - | FileCheck %s
+
+---
+name: main
+tracksRegLiveness: true
+body:             |
+  ; CHECK-LABEL: name: main
+  ; CHECK: bb.0:
+  ; CHECK-NEXT:   successors: %bb.1(0x40000000), %bb.2(0x40000000)
+  ; CHECK-NEXT:   liveins: $vgpr0_vgpr1, $sgpr0_sgpr1_sgpr2_sgpr3
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT:   BUFFER_STORE_DWORDX2_OFFSET_exact $vgpr0_vgpr1, killed $sgpr0_sgpr1_sgpr2_sgpr3, 0, 344, 1, 0, implicit $exec
+  ; CHECK-NEXT:   $sgpr0 = S_MOV_B32 $exec_lo
+  ; CHECK-NEXT:   V_CMPX_EQ_U32_nosdst_e64 0, $vgpr0, implicit-def $exec, implicit $exec
+  ; CHECK-NEXT:   S_CBRANCH_EXECZ %bb.2, implicit $exec
+  ; CHECK-NEXT:   S_BRANCH %bb.1
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT: bb.1:
+  ; CHECK-NEXT:   successors: %bb.2(0x80000000)
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT: bb.2:
+  ; CHECK-NEXT:   S_ENDPGM 0
+  bb.0:
+    successors: %bb.1, %bb.2
+    liveins: $vgpr0_vgpr1, $sgpr0_sgpr1_sgpr2_sgpr3
+
+    $vcc_lo = V_CMP_EQ_U32_e64 0, $vgpr0, implicit $exec
+    BUFFER_STORE_DWORDX2_OFFSET_exact killed $vgpr0_vgpr1, killed $sgpr0_sgpr1_sgpr2_sgpr3, 0, 344, 1, 0, implicit $exec
+    $sgpr0 = COPY $exec_lo, implicit-def $exec_lo
+    $sgpr0 = S_AND_B32 killed $sgpr0, killed $vcc_lo, implicit-def dead $scc
+    $exec_lo = S_MOV_B32_term killed $sgpr0
+    S_CBRANCH_EXECZ %bb.2, implicit $exec
+    S_BRANCH %bb.1
+
+  bb.1:
+
+  bb.2:
+    S_ENDPGM 0
+...

@jayfoad
Copy link
Contributor Author

jayfoad commented Oct 10, 2023

This is an alternative to #68293.

if (Src1->isReg())
MRI->clearKillFlags(Src1->getReg());
if (Src0->isReg()) {
for (MCRegAliasIterator I(Src0->getReg(), TRI, true); I.isValid(); ++I)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like introducing new uses of MCRegAliasIterator because it is potentially expensive, but it does seem like the simplest way to implement this fix.

An alternative might be to scan all operands of all instructions between the old v_cmp and the new v_cmpx, and clear kill flags if they overlap with Src0 or Src1.

@tsymalla
Copy link
Contributor

Sounds reasonable to me, and I assume this fix will land, but I'll let others discuss their opinion :-)

@jayfoad
Copy link
Contributor Author

jayfoad commented Nov 2, 2023

Closed in favour of #68293.

@jayfoad jayfoad closed this Nov 2, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

SIOptimizeExecMasking causes Bad machine code: Using an undefined physical register
3 participants