Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AMDGPU] Prevent hang in SIFoldOperands #82099

Merged
merged 1 commit into from
Feb 27, 2024
Merged

Conversation

choikwa
Copy link
Contributor

@choikwa choikwa commented Feb 17, 2024

In SIFoldOperands::foldOperand, the recursion in REG_SEQUENCE handling could result in infinite loop if UseMI and RSUseMI share a common use operand, flipflopping between two instructions until stack overflows. The fix is to prevent a cycle by using static seenMI set.

@jrbyrnes @bcahoon

@llvmbot
Copy link
Collaborator

llvmbot commented Feb 17, 2024

@llvm/pr-subscribers-backend-amdgpu

Author: choikwa (choikwa)

Changes

In SIFoldOperands::foldOperand, the recursion in REG_SEQUENCE handling could result in infinite loop if UseMI and RSUseMI share a common use operand, flipflopping between two instructions until stack overflows. The fix is to prevent a cycle by using static seenMI set.

@jrbyrnes @bcahoon


Full diff: https://github.com/llvm/llvm-project/pull/82099.diff

1 Files Affected:

  • (modified) llvm/lib/Target/AMDGPU/SIFoldOperands.cpp (+8-3)
diff --git a/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp b/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
index 8bf05682cbe7ea..808412809c9a77 100644
--- a/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
+++ b/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
@@ -15,6 +15,7 @@
 #include "llvm/ADT/DepthFirstIterator.h"
 #include "llvm/CodeGen/MachineFunctionPass.h"
 #include "llvm/CodeGen/MachineOperand.h"
+#include <unordered_set>
 
 #define DEBUG_TYPE "si-fold-operands"
 using namespace llvm;
@@ -74,7 +75,7 @@ class SIFoldOperands : public MachineFunctionPass {
   const SIRegisterInfo *TRI;
   const GCNSubtarget *ST;
   const SIMachineFunctionInfo *MFI;
-
+  
   bool frameIndexMayFold(const MachineInstr &UseMI, int OpNo,
                          const MachineOperand &OpToFold) const;
 
@@ -772,7 +773,7 @@ void SIFoldOperands::foldOperand(
   if (UseMI->isRegSequence()) {
     Register RegSeqDstReg = UseMI->getOperand(0).getReg();
     unsigned RegSeqDstSubReg = UseMI->getOperand(UseOpIdx + 1).getImm();
-
+    static std::unordered_set<MachineInstr*> seenMI;
     for (auto &RSUse : make_early_inc_range(MRI->use_nodbg_operands(RegSeqDstReg))) {
       MachineInstr *RSUseMI = RSUse.getParent();
 
@@ -782,7 +783,11 @@ void SIFoldOperands::foldOperand(
 
       if (RSUse.getSubReg() != RegSeqDstSubReg)
         continue;
-
+      
+      if (seenMI.count(RSUseMI) != 0)
+        continue;
+      seenMI.insert(RSUseMI);
+      
       foldOperand(OpToFold, RSUseMI, RSUseMI->getOperandNo(&RSUse), FoldList,
                   CopiesToReplace);
     }

Copy link

github-actions bot commented Feb 17, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

Copy link
Contributor

@Pierre-vh Pierre-vh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, needs a testcase as well. Please add a .mir testcase, there's already a few for si-fold-operands I think

llvm/lib/Target/AMDGPU/SIFoldOperands.cpp Outdated Show resolved Hide resolved
llvm/lib/Target/AMDGPU/SIFoldOperands.cpp Outdated Show resolved Hide resolved
@jayfoad
Copy link
Contributor

jayfoad commented Feb 19, 2024

In SIFoldOperands::foldOperand, the recursion in REG_SEQUENCE handling could result in infinite loop if UseMI and RSUseMI share a common use operand, flipflopping between two instructions until stack overflows. The fix is to prevent a cycle by using static seenMI set.

What does "share a common use operand" mean? Are you saying instruction A uses the result of B, and B uses the result of A? Is there a PHI involved?

@choikwa
Copy link
Contributor Author

choikwa commented Feb 19, 2024

An example would be

UseMI: %49:vgpr_32 = V_MUL_HI_U32_e64 %5:vgpr_32, %35.sub0:sreg_64, implicit $exec

and

RSUseMI: %77:vgpr_32 = V_MUL_HI_U32_e64 %75:vgpr_32, %35.sub0:sreg_64, implicit $exec

@jayfoad
Copy link
Contributor

jayfoad commented Feb 19, 2024

An example would be

UseMI: %49:vgpr_32 = V_MUL_HI_U32_e64 %5:vgpr_32, %35.sub0:sreg_64, implicit $exec

and

RSUseMI: %77:vgpr_32 = V_MUL_HI_U32_e64 %75:vgpr_32, %35.sub0:sreg_64, implicit $exec

I don't understand how this would cause infinite recursion.

We only go into the RSUseMI code if UseMI is a REG_SEQUENCE instruction.

@choikwa
Copy link
Contributor Author

choikwa commented Feb 19, 2024

You are right, I think I was incorrectly explaining the behaviour:
The first entry point was from this:
%35:sreg_64 = REG_SEQUENCE killed %34:sreg_32, %subreg.sub0, %33:sreg_32, %subreg.sub1

And the subsequent UseMI is just RSUseMI and does not go into the isRegSequence() path. However, the cycle still exists with RSUseMI flipflopping between %77 and %49.

@jayfoad
Copy link
Contributor

jayfoad commented Feb 19, 2024

Please try to get an actual test case.

@choikwa
Copy link
Contributor Author

choikwa commented Feb 20, 2024

Addressed feedback w/ latest commit.

@choikwa choikwa changed the title [AMDGPU] Prevent cyclic behaviour in SIFoldOperands [AMDGPU] Prevent hang in SIFoldOperands Feb 20, 2024
@jayfoad
Copy link
Contributor

jayfoad commented Feb 20, 2024

Thanks for the test case! Now can you please try to explain what goes wrong in more detail? I took a very quick look and it seems like this loop never terminates:

    for (auto &RSUse : make_early_inc_range(MRI->use_nodbg_operands(RegSeqDstReg)))

But why? If we understand the problem better, there may be a simpler fix.

@choikwa
Copy link
Contributor Author

choikwa commented Feb 20, 2024

So I did more digging and the search yielded some interesting findings. It turns out that the early_inc iterator (I hand-converted) was incrementing and returning previous use operand iterator after going through foldOperand call. And the reason I think has to do with tryAddToFoldList at the end of foldOperand calling commuteInstruction. There may be some weird interaction going on with commutating the operands for V_MUL instructions and iterating use operands of REG_SEQUENCE that affects the iterator increment to cause infinite loop. It doesn't look like this interaction is intentional -- disabling commutation prevents the infinite loop. Perhaps option to disable commutation in tryAddToFoldList is the answer?

@Pierre-vh
Copy link
Contributor

I think it goes like this:

  • defusechain_iterator works on a given MachineOperand *
  • We commute operands within an instruction so that pointer stays valid, but now points to a completely different operand
  • defusechain_iterator keeps going on that operand (basically iterates a different range)
  • We keep commuting further and going back and forth between those two ranges

That makes sense I think, and the better fix would be to stop commuting inside that loop, or commute in bulk after the loop.

@jayfoad
Copy link
Contributor

jayfoad commented Feb 21, 2024

Yes that explanation makes sense, but this problem has already been solved in a different way in the main loop that calls foldOperand on each use of a register, in foldInstOperand:

  SmallVector<MachineOperand *, 4> UsesToProcess;
  for (auto &Use : MRI->use_nodbg_operands(Dst.getReg()))
    UsesToProcess.push_back(&Use);
  for (auto *U : UsesToProcess) {
    MachineInstr *UseMI = U->getParent();
    foldOperand(OpToFold, UseMI, UseMI->getOperandNo(U), FoldList,
                CopiesToReplace);
  }

So perhaps we should just copy that solution here? (I.e. copy the uses into a temporary vector, to avoid any problem with the list being mutated while we are iterating over it.)

@choikwa
Copy link
Contributor Author

choikwa commented Feb 21, 2024

Updated by caching the uses

Copy link
Contributor

@jayfoad jayfoad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks.


// Grab the use operands first
SmallVector<MachineOperand *, 4> UsesToProcess;
for (auto &Use : MRI->use_nodbg_operands(RegSeqDstReg)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: don't need braces around a single physical line.

@choikwa choikwa force-pushed the sifoldcycle branch 2 times, most recently from 5164ef4 to 3f8006d Compare February 21, 2024 23:49
@choikwa
Copy link
Contributor Author

choikwa commented Feb 21, 2024

some NFC's

@@ -219,10 +219,8 @@ bool SIFoldOperands::canUseImmWithOpSel(FoldCandidate &Fold) const {
default:
return false;
case AMDGPU::OPERAND_REG_IMM_V2FP16:
case AMDGPU::OPERAND_REG_IMM_V2BF16:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rebase issue? This was added in #82435.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for catching! not sure if I ever touched that but it seems concerning if rebase did that.

body: |
bb.0:
liveins: $vgpr0_vgpr1, $vgpr2
%33:sreg_32 = S_MOV_B32 0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would be good to compact the register numbers with -run-pass=none

foldOperands() for REG_SEQUENCE has recursion that can trigger infinite loop
as the method can modify use operand order which messes up the range-based
for loop. Cache the uses for processing beforehand so that iterators don't
get messed up.
Added repro mir testcase.
@bcahoon bcahoon merged commit 04db60d into llvm:main Feb 27, 2024
3 of 4 checks passed
rocm-ci pushed a commit to ROCm/llvm-project that referenced this pull request Apr 16, 2024
foldOperands() for REG_SEQUENCE has recursion that can trigger an infinite loop
as the method can modify the operand order, which messes up the range-based
for loop. This patch fixes the issue by caching the uses for processing beforehand,
and then iterating over the cache rather using the instruction iterator.

Change-Id: Iac081f4e363984cfd9917672e7d93107c51c97ac
searlmc1 pushed a commit to ROCm/llvm-project that referenced this pull request Apr 25, 2024
foldOperands() for REG_SEQUENCE has recursion that can trigger an infinite loop
as the method can modify the operand order, which messes up the range-based
for loop. This patch fixes the issue by caching the uses for processing beforehand,
and then iterating over the cache rather using the instruction iterator.

Change-Id: Iac081f4e363984cfd9917672e7d93107c51c97ac
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants