Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[TwoAddressInstruction] Propagate undef flags for partial defs #79286

Merged
merged 5 commits into from
Feb 7, 2024

Conversation

perlfu
Copy link
Contributor

@perlfu perlfu commented Jan 24, 2024

If part of a register (lowered from REG_SEQUENCE) is undefined then we should propagate undef flags to uses of those lanes. This is only performed when live intervals are present as it requires live intervals to correctly match uses to defs, and the primary goal is to allow precise computation of subrange intervals.

@llvmbot
Copy link
Collaborator

llvmbot commented Jan 24, 2024

@llvm/pr-subscribers-llvm-globalisel

@llvm/pr-subscribers-backend-amdgpu

Author: Carl Ritson (perlfu)

Changes

If part of a register (lowered from REG_SEQUENCE) is undefined then we should propagate undef flags to uses of those lanes. This is only performed when live intervals are present as it requires live intervals to correctly match uses to defs, and the primary goal is to allow precise computation of subrange intervals.


Full diff: https://github.com/llvm/llvm-project/pull/79286.diff

2 Files Affected:

  • (modified) llvm/lib/CodeGen/TwoAddressInstructionPass.cpp (+24-6)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/extractelement.ll (+1)
diff --git a/llvm/lib/CodeGen/TwoAddressInstructionPass.cpp b/llvm/lib/CodeGen/TwoAddressInstructionPass.cpp
index 74d7904aee33a2d..0b8bef24d1a7bb6 100644
--- a/llvm/lib/CodeGen/TwoAddressInstructionPass.cpp
+++ b/llvm/lib/CodeGen/TwoAddressInstructionPass.cpp
@@ -1929,21 +1929,26 @@ eliminateRegSequence(MachineBasicBlock::iterator &MBBI) {
   Register DstReg = MI.getOperand(0).getReg();
 
   SmallVector<Register, 4> OrigRegs;
+  VNInfo *DefVN = nullptr;
   if (LIS) {
     OrigRegs.push_back(MI.getOperand(0).getReg());
     for (unsigned i = 1, e = MI.getNumOperands(); i < e; i += 2)
       OrigRegs.push_back(MI.getOperand(i).getReg());
+    if (LIS->hasInterval(DstReg)) {
+      DefVN = LIS->getInterval(DstReg).Query(
+        LIS->getInstructionIndex(MI)).valueOut();
+    }
   }
 
+  LaneBitmask UndefLanes = LaneBitmask::getNone();
   bool DefEmitted = false;
-  bool DefIsPartial = false;
   for (unsigned i = 1, e = MI.getNumOperands(); i < e; i += 2) {
     MachineOperand &UseMO = MI.getOperand(i);
     Register SrcReg = UseMO.getReg();
     unsigned SubIdx = MI.getOperand(i+1).getImm();
     // Nothing needs to be inserted for undef operands.
     if (UseMO.isUndef()) {
-      DefIsPartial = true;
+      UndefLanes |= TRI->getSubRegIndexLaneMask(SubIdx);
       continue;
     }
 
@@ -1991,11 +1996,24 @@ eliminateRegSequence(MachineBasicBlock::iterator &MBBI) {
       MI.removeOperand(j);
   } else {
     if (LIS) {
-      // Force interval recomputation if we moved from full definition
-      // of register to partial.
-      if (DefIsPartial && LIS->hasInterval(DstReg) &&
-          MRI->shouldTrackSubRegLiveness(DstReg))
+      // Force live interval recomputation if we moved to a partial defintion
+      // of the register.  Undef flags must be propagate to uses of undefined
+      // subregister for accurate interval computation.
+      if (UndefLanes.any() && DefVN && MRI->shouldTrackSubRegLiveness(DstReg)) {
+        auto &LI = LIS->getInterval(DstReg);
+        for (MachineOperand &UseOp : MRI->use_operands(DstReg)) {
+          unsigned SubReg = UseOp.getSubReg();
+          if (UseOp.isUndef() || !SubReg)
+            continue;
+          auto *VN = LI.getVNInfoAt(LIS->getInstructionIndex(*UseOp.getParent()));
+          if (DefVN != VN)
+            continue;
+          LaneBitmask LaneMask = TRI->getSubRegIndexLaneMask(SubReg);
+          if ((UndefLanes & LaneMask).any())
+            UseOp.setIsUndef(true);
+        }
         LIS->removeInterval(DstReg);
+      }
       LIS->RemoveMachineInstrFromMaps(MI);
     }
 
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/extractelement.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/extractelement.ll
index ac153183be642a2..3f11c122a681463 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/extractelement.ll
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/extractelement.ll
@@ -1,5 +1,6 @@
 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
 ; RUN: llc -global-isel -mtriple=amdgcn-mesa-mesa3d -mcpu=gfx900 -verify-machineinstrs < %s | FileCheck -check-prefixes=GCN,GPRIDX %s
+; RUN: llc -global-isel -mtriple=amdgcn-mesa-mesa3d -mcpu=gfx900 -early-live-intervals -verify-machineinstrs < %s | FileCheck -check-prefixes=GCN,GPRIDX %s
 ; RUN: llc -global-isel -mtriple=amdgcn-mesa-mesa3d -mcpu=fiji -verify-machineinstrs < %s | FileCheck -check-prefixes=GCN,MOVREL %s
 ; RUN: llc -global-isel -mtriple=amdgcn-mesa-mesa3d -mcpu=gfx1010 -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX10PLUS,GFX10 %s
 ; RUN: llc -global-isel -mtriple=amdgcn-mesa-mesa3d -mcpu=gfx1100 -amdgpu-enable-delay-alu=0 -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX10PLUS,GFX11 %s

Copy link

github-actions bot commented Jan 24, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@perlfu perlfu force-pushed the twoaddr-lis-propagate-undef branch from 19bba42 to 27f8e24 Compare January 24, 2024 13:24
Copy link
Contributor

@jayfoad jayfoad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this is required for correctness, but it is not clear to me what the failure is without this. Do you have a small MIR example that needs this?

@perlfu
Copy link
Contributor Author

perlfu commented Jan 25, 2024

I've switch this to a concrete MIR example.

The root cause could be seen as an issue/bug with GlobalISel code gen, although I think this change is still beneficial.
In this case there is an extractelement on a v7 type, which must be widened to v8 for code gen:
%ext = extractelement <7 x double> %vec, i32 %sel

The result is that GlobalISel generates a REG_SEQUENCE with an undef (see last element):

%34:vreg_512 = REG_SEQUENCE %16, %subreg.sub0_sub1, %17, %subreg.sub2_sub3, %18, %subreg.sub4_sub5, %19, %subreg.sub6_sub7, %20, %subreg.sub8_sub9, %21, %subreg.sub10_sub11, %22, %subreg.sub12_sub13, undef %35:vreg_64, %subreg.sub14_sub15

Then generates code to extract that element if selected:

%80:vgpr_32 = V_CNDMASK_B32_e64 0, %76, 0, %34.sub14, %79, implicit $exec
%81:vgpr_32 = V_CNDMASK_B32_e64 0, %77, 0, %34.sub15, %79, implicit $exec

Without propagating the undef flags, the above instructions will fail the verifier if LiveIntervals are available.

If part of a register (lowered from REG_SEQUENCE) is undefined then
we should propagate undef flags to uses of those lanes.
This is only performed when live intervals are present as it
requires live intervals to correctly match uses to defs, and the
primary goal is to allow precise computation of subrange intervals.
@perlfu perlfu force-pushed the twoaddr-lis-propagate-undef branch from b9bc693 to 95b7bab Compare February 7, 2024 06:22
@perlfu
Copy link
Contributor Author

perlfu commented Feb 7, 2024

  • Rebase
  • Address reviewer comments: remove test IR, fix comment typo

@perlfu perlfu merged commit 9bda1de into llvm:main Feb 7, 2024
4 checks passed
@perlfu perlfu deleted the twoaddr-lis-propagate-undef branch February 7, 2024 07:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants