[AMDGPU] Prevent hang in SIFoldOperands #82099

choikwa · 2024-02-17T07:40:54Z

In SIFoldOperands::foldOperand, the recursion in the REG_SEQUENCE handling could result in an infinite loop if UseMI and RSUseMI share a common use operand, flip-flopping between the two instructions until the stack overflows. The fix is to prevent the cycle by tracking already-visited instructions in a static seenMI set.

@jrbyrnes @bcahoon

llvmbot · 2024-02-17T07:41:23Z

@llvm/pr-subscribers-backend-amdgpu

Author: choikwa (choikwa)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/82099.diff

1 Files Affected:

(modified) llvm/lib/Target/AMDGPU/SIFoldOperands.cpp (+8-3)

diff --git a/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp b/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
index 8bf05682cbe7ea..808412809c9a77 100644
--- a/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
+++ b/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
@@ -15,6 +15,7 @@
 #include "llvm/ADT/DepthFirstIterator.h"
 #include "llvm/CodeGen/MachineFunctionPass.h"
 #include "llvm/CodeGen/MachineOperand.h"
+#include <unordered_set>
 
 #define DEBUG_TYPE "si-fold-operands"
 using namespace llvm;
@@ -74,7 +75,7 @@ class SIFoldOperands : public MachineFunctionPass {
   const SIRegisterInfo *TRI;
   const GCNSubtarget *ST;
   const SIMachineFunctionInfo *MFI;
-
+  
   bool frameIndexMayFold(const MachineInstr &UseMI, int OpNo,
                          const MachineOperand &OpToFold) const;
 
@@ -772,7 +773,7 @@ void SIFoldOperands::foldOperand(
   if (UseMI->isRegSequence()) {
     Register RegSeqDstReg = UseMI->getOperand(0).getReg();
     unsigned RegSeqDstSubReg = UseMI->getOperand(UseOpIdx + 1).getImm();
-
+    static std::unordered_set<MachineInstr*> seenMI;
     for (auto &RSUse : make_early_inc_range(MRI->use_nodbg_operands(RegSeqDstReg))) {
       MachineInstr *RSUseMI = RSUse.getParent();
 
@@ -782,7 +783,11 @@ void SIFoldOperands::foldOperand(
 
       if (RSUse.getSubReg() != RegSeqDstSubReg)
         continue;
-
+      
+      if (seenMI.count(RSUseMI) != 0)
+        continue;
+      seenMI.insert(RSUseMI);
+      
       foldOperand(OpToFold, RSUseMI, RSUseMI->getOperandNo(&RSUse), FoldList,
                   CopiesToReplace);
     }

github-actions · 2024-02-17T07:43:21Z

✅ With the latest revision this PR passed the C/C++ code formatter.

llvm/lib/Target/AMDGPU/SIFoldOperands.cpp

Pierre-vh

Thanks, needs a testcase as well. Please add a .mir testcase, there's already a few for si-fold-operands I think

llvm/lib/Target/AMDGPU/SIFoldOperands.cpp

jayfoad · 2024-02-19T11:51:22Z

> In SIFoldOperands::foldOperand, the recursion in REG_SEQUENCE handling could result in infinite loop if UseMI and RSUseMI share a common use operand, flipflopping between two instructions until stack overflows. The fix is to prevent a cycle by using static seenMI set.

What does "share a common use operand" mean? Are you saying instruction A uses the result of B, and B uses the result of A? Is there a PHI involved?

choikwa · 2024-02-19T15:43:31Z

An example would be

UseMI: %49:vgpr_32 = V_MUL_HI_U32_e64 %5:vgpr_32, %35.sub0:sreg_64, implicit $exec

and

RSUseMI: %77:vgpr_32 = V_MUL_HI_U32_e64 %75:vgpr_32, %35.sub0:sreg_64, implicit $exec

jayfoad · 2024-02-19T16:02:01Z

> An example would be
>
> UseMI: %49:vgpr_32 = V_MUL_HI_U32_e64 %5:vgpr_32, %35.sub0:sreg_64, implicit $exec
>
> and
>
> RSUseMI: %77:vgpr_32 = V_MUL_HI_U32_e64 %75:vgpr_32, %35.sub0:sreg_64, implicit $exec

I don't understand how this would cause infinite recursion.

We only go into the RSUseMI code if UseMI is a REG_SEQUENCE instruction.

choikwa · 2024-02-19T16:45:34Z

You are right, I think I was incorrectly explaining the behaviour:
The first entry point was from this:
%35:sreg_64 = REG_SEQUENCE killed %34:sreg_32, %subreg.sub0, %33:sreg_32, %subreg.sub1

And the subsequent UseMI is just RSUseMI and does not go into the isRegSequence() path. However, the cycle still exists with RSUseMI flipflopping between %77 and %49.

jayfoad · 2024-02-19T16:55:49Z

Please try to get an actual test case.

choikwa · 2024-02-20T06:57:10Z

Addressed feedback w/ latest commit.

jayfoad · 2024-02-20T10:09:02Z

Thanks for the test case! Now can you please try to explain what goes wrong in more detail? I took a very quick look and it seems like this loop never terminates:

    for (auto &RSUse : make_early_inc_range(MRI->use_nodbg_operands(RegSeqDstReg)))

But why? If we understand the problem better, there may be a simpler fix.

choikwa · 2024-02-20T15:23:21Z

So I did more digging and found something interesting. It turns out that the early_inc iterator (which I hand-converted) was incrementing and then returning a previously visited use operand after going through the foldOperand call. I think the reason has to do with tryAddToFoldList, at the end of foldOperand, calling commuteInstruction. There may be some odd interaction between commuting the operands of the V_MUL instructions and iterating over the use operands of the REG_SEQUENCE that affects the iterator increment and causes the infinite loop. This interaction doesn't look intentional -- disabling commutation prevents the infinite loop. Perhaps an option to disable commutation in tryAddToFoldList is the answer?

Pierre-vh · 2024-02-21T07:28:20Z

I think it goes like this:

1. defusechain_iterator works on a given MachineOperand *
2. We commute operands within an instruction, so the pointer stays valid but now points to a completely different operand
3. defusechain_iterator keeps going on that operand (basically iterates a different range)
4. We keep commuting further and going back and forth between those two ranges

That makes sense I think, and the better fix would be to stop commuting inside that loop, or commute in bulk after the loop.
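The retargeting described in the steps above can be sketched with a toy model. This is not LLVM's data structure: `Operand`, `UseLists`, `commute`, and `regsSeenFrom` are hypothetical names, the use lists are plain singly linked lists, and the cursor is a raw pointer rather than an early-incrementing iterator. It only illustrates how a pointer that stays valid can end up walking another register's uses.

```cpp
#include <cassert>
#include <vector>

// Toy operand: linked into the use list of the register it currently reads.
struct Operand {
  int Reg;
  Operand *Next = nullptr;
  explicit Operand(int R) : Reg(R) {}
};

// One singly linked use list per register (hypothetical, not LLVM's layout).
struct UseLists {
  std::vector<Operand *> Head; // Head[R] is the first use of register R
  explicit UseLists(int NumRegs) : Head(NumRegs, nullptr) {}

  // Push the operand onto the front of its register's list.
  void link(Operand &Op) {
    Op.Next = Head[Op.Reg];
    Head[Op.Reg] = &Op;
  }
  // Remove the operand from its register's list (it must be linked).
  void unlink(Operand &Op) {
    Operand **P = &Head[Op.Reg];
    while (*P != &Op)
      P = &(*P)->Next;
    *P = Op.Next;
    Op.Next = nullptr;
  }
  // The operand keeps its address but moves to another register's list.
  void setReg(Operand &Op, int NewReg) {
    unlink(Op);
    Op.Reg = NewReg;
    link(Op);
  }
};

// Swap the registers read by two operands of the same instruction, in place.
void commute(UseLists &UL, Operand &A, Operand &B) {
  int RegA = A.Reg, RegB = B.Reg;
  UL.setReg(A, RegB);
  UL.setReg(B, RegA);
}

// Walk onward from a held operand pointer, recording each register read.
std::vector<int> regsSeenFrom(const Operand *Cursor) {
  std::vector<int> Seen;
  for (; Cursor; Cursor = Cursor->Next)
    Seen.push_back(Cursor->Reg);
  return Seen;
}
```

In this model, if a cursor rests on an operand of register 0 and that operand's instruction is commuted with an operand of register 1, the same pointer afterwards reads register 1, and following `Next` from it enumerates register 1's uses instead of the remaining uses of register 0.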

jayfoad · 2024-02-21T10:41:33Z

Yes that explanation makes sense, but this problem has already been solved in a different way in the main loop that calls foldOperand on each use of a register, in foldInstOperand:

  SmallVector<MachineOperand *, 4> UsesToProcess;
  for (auto &Use : MRI->use_nodbg_operands(Dst.getReg()))
    UsesToProcess.push_back(&Use);
  for (auto *U : UsesToProcess) {
    MachineInstr *UseMI = U->getParent();
    foldOperand(OpToFold, UseMI, UseMI->getOperandNo(U), FoldList,
                CopiesToReplace);
  }

So perhaps we should just copy that solution here? (I.e. copy the uses into a temporary vector, to avoid any problem with the list being mutated while we are iterating over it.)
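A minimal standalone sketch of why the snapshot helps, under a stated assumption: processing a use removes it from the list and re-appends it at the tail, which is one way commuting could mutate the list. `Use`, `processAndRequeue`, `walkLive`, and `walkSnapshot` are hypothetical names, and `std::list` stands in for the real use list.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <vector>

struct Use {
  int Id;
};

// Hypothetical stand-in for a fold step that commutes the using instruction:
// the use is dropped from the list and re-registered at the tail.
void processAndRequeue(std::list<Use *> &Uses, Use *U) {
  Uses.remove(U);
  Uses.push_back(U);
}

// Early-increment walk over the live list. The cap exists only so that the
// demonstration terminates; without it the walk below would not end when the
// list holds two or more uses.
std::size_t walkLive(std::list<Use *> &Uses, std::size_t Cap) {
  std::size_t Visits = 0;
  auto It = Uses.begin();
  while (It != Uses.end() && Visits < Cap) {
    Use *U = *It;
    ++It; // advance before the body runs, as an early-inc range does
    processAndRequeue(Uses, U);
    ++Visits;
  }
  return Visits;
}

// Snapshot the uses first, then mutate the list freely while processing.
std::size_t walkSnapshot(std::list<Use *> &Uses) {
  std::vector<Use *> UsesToProcess(Uses.begin(), Uses.end());
  std::size_t Visits = 0;
  for (Use *U : UsesToProcess) {
    processAndRequeue(Uses, U);
    ++Visits;
  }
  return Visits;
}
```

With two uses in the list, `walkLive` keeps alternating between them until it hits the cap, because each visit pushes the current use behind the one the iterator has already advanced to. `walkSnapshot` visits each use exactly once regardless of how the list is reordered.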

choikwa · 2024-02-21T19:37:16Z

Updated by caching the uses

jayfoad

LGTM, thanks.

jayfoad · 2024-02-21T20:06:09Z

llvm/lib/Target/AMDGPU/SIFoldOperands.cpp

+
+    // Grab the use operands first
+    SmallVector<MachineOperand *, 4> UsesToProcess;
+    for (auto &Use : MRI->use_nodbg_operands(RegSeqDstReg)) {


Nit: don't need braces around a single physical line.

choikwa · 2024-02-21T23:50:22Z

some NFC's

piotrAMD · 2024-02-22T08:35:51Z

llvm/lib/Target/AMDGPU/SIFoldOperands.cpp

@@ -219,10 +219,8 @@ bool SIFoldOperands::canUseImmWithOpSel(FoldCandidate &Fold) const {
  default:
    return false;
  case AMDGPU::OPERAND_REG_IMM_V2FP16:
-  case AMDGPU::OPERAND_REG_IMM_V2BF16:


Rebase issue? This was added in #82435.

choikwa

Thanks for catching! Not sure if I ever touched that, but it seems concerning if the rebase did that.

arsenm · 2024-02-26T15:50:27Z

llvm/test/CodeGen/AMDGPU/si-fold-reg-sequence.mir

+body:             |
+  bb.0:
+    liveins: $vgpr0_vgpr1, $vgpr2
+    %33:sreg_32 = S_MOV_B32 0


would be good to compact the register numbers with -run-pass=none

foldOperand() for REG_SEQUENCE has recursion that can trigger an infinite loop, as the method can modify the use operand order, which messes up the range-based for loop. Cache the uses for processing beforehand so that the iterators don't get messed up. Added a repro MIR test case.

foldOperand() for REG_SEQUENCE has recursion that can trigger an infinite loop, as the method can modify the operand order, which messes up the range-based for loop. This patch fixes the issue by caching the uses for processing beforehand, and then iterating over the cache rather than using the instruction iterator.

Change-Id: Iac081f4e363984cfd9917672e7d93107c51c97ac

llvmbot added the backend:AMDGPU label Feb 17, 2024

choikwa force-pushed the sifoldcycle branch from 6cc8018 to dff426c Compare February 17, 2024 08:38

Pierre-vh requested changes Feb 19, 2024

choikwa force-pushed the sifoldcycle branch from dff426c to 20ded73 Compare February 20, 2024 06:13

choikwa changed the title ~~[AMDGPU] Prevent cyclic behaviour in SIFoldOperands~~ [AMDGPU] Prevent hang in SIFoldOperands Feb 20, 2024

choikwa force-pushed the sifoldcycle branch from 20ded73 to 36c55f4 Compare February 21, 2024 19:36

jayfoad approved these changes Feb 21, 2024

choikwa force-pushed the sifoldcycle branch 2 times, most recently from 5164ef4 to 3f8006d Compare February 21, 2024 23:49

jayfoad approved these changes Feb 22, 2024

piotrAMD reviewed Feb 22, 2024

choikwa force-pushed the sifoldcycle branch from 3f8006d to 13eb08b Compare February 22, 2024 16:10

arsenm reviewed Feb 26, 2024

choikwa force-pushed the sifoldcycle branch from 13eb08b to a8bba69 Compare February 26, 2024 16:09

choikwa requested a review from Pierre-vh February 27, 2024 08:34

Pierre-vh approved these changes Feb 27, 2024

bcahoon merged commit 04db60d into llvm:main Feb 27, 2024
3 of 4 checks passed
