[AMDGPU] Prevent hang in SIFoldOperands #82099
@llvm/pr-subscribers-backend-amdgpu

Author: choikwa (choikwa)

Changes

In SIFoldOperands::foldOperand, the recursion in the REG_SEQUENCE handling could result in an infinite loop if UseMI and RSUseMI share a common use operand, flip-flopping between the two instructions until the stack overflows. The fix is to prevent a cycle by using a static seenMI set.

@jrbyrnes @bcahoon

Full diff: https://github.com/llvm/llvm-project/pull/82099.diff

1 Files Affected:
diff --git a/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp b/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
index 8bf05682cbe7ea..808412809c9a77 100644
--- a/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
+++ b/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
@@ -15,6 +15,7 @@
#include "llvm/ADT/DepthFirstIterator.h"
#include "llvm/CodeGen/MachineFunctionPass.h"
#include "llvm/CodeGen/MachineOperand.h"
+#include <unordered_set>
#define DEBUG_TYPE "si-fold-operands"
using namespace llvm;
@@ -74,7 +75,7 @@ class SIFoldOperands : public MachineFunctionPass {
const SIRegisterInfo *TRI;
const GCNSubtarget *ST;
const SIMachineFunctionInfo *MFI;
-
+
bool frameIndexMayFold(const MachineInstr &UseMI, int OpNo,
const MachineOperand &OpToFold) const;
@@ -772,7 +773,7 @@ void SIFoldOperands::foldOperand(
if (UseMI->isRegSequence()) {
Register RegSeqDstReg = UseMI->getOperand(0).getReg();
unsigned RegSeqDstSubReg = UseMI->getOperand(UseOpIdx + 1).getImm();
-
+ static std::unordered_set<MachineInstr*> seenMI;
for (auto &RSUse : make_early_inc_range(MRI->use_nodbg_operands(RegSeqDstReg))) {
MachineInstr *RSUseMI = RSUse.getParent();
@@ -782,7 +783,11 @@ void SIFoldOperands::foldOperand(
if (RSUse.getSubReg() != RegSeqDstSubReg)
continue;
-
+
+ if (seenMI.count(RSUseMI) != 0)
+ continue;
+ seenMI.insert(RSUseMI);
+
foldOperand(OpToFold, RSUseMI, RSUseMI->getOperandNo(&RSUse), FoldList,
CopiesToReplace);
}
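To make the idea behind this first revision concrete, here is a minimal standalone sketch of a visited-set recursion guard. It is not the patch itself: `Graph`, `visit`, and `countReachable` are hypothetical names, plain integers stand in for `MachineInstr` pointers, and, unlike the `static` set in the diff above, the set here is created by the caller so that it does not persist between top-level calls (an assumption made for the illustration).

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>
#include <vector>

// Hypothetical graph: Edges[N] lists the nodes reachable from node N.
using Graph = std::vector<std::vector<std::size_t>>;

// Recursively visits every node reachable from Node, skipping nodes that are
// already in Seen. Returns the number of nodes visited by this call.
inline std::size_t visit(const Graph &Edges, std::size_t Node,
                         std::unordered_set<std::size_t> &Seen) {
  assert(Node < Edges.size() && "node index out of range");
  if (!Seen.insert(Node).second)
    return 0; // Already visited: stop instead of recursing again.
  std::size_t Count = 1;
  for (std::size_t Next : Edges[Node])
    Count += visit(Edges, Next, Seen);
  return Count;
}

// Top-level entry point. The set is local to this call, so a second call
// starts from a clean state.
inline std::size_t countReachable(const Graph &Edges, std::size_t Start) {
  std::unordered_set<std::size_t> Seen;
  return visit(Edges, Start, Seen);
}
```

With two nodes that point at each other, the guard ends the recursion after each node has been visited once, and repeated top-level calls give the same answer because the set is rebuilt each time.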
✅ With the latest revision this PR passed the C/C++ code formatter.
Thanks, needs a testcase as well. Please add a .mir testcase; there are already a few for si-fold-operands.
I think
What does "share a common use operand" mean? Are you saying instruction A uses the result of B, and B uses the result of A? Is there a PHI involved?
An example would be UseMI: %49:vgpr_32 = V_MUL_HI_U32_e64 %5:vgpr_32, %35.sub0:sreg_64, implicit $exec and RSUseMI: %77:vgpr_32 = V_MUL_HI_U32_e64 %75:vgpr_32, %35.sub0:sreg_64, implicit $exec
I don't understand how this would cause infinite recursion. We only go into the RSUseMI code if UseMI is a REG_SEQUENCE instruction.
You are right, I think I was incorrectly explaining the behaviour: And the subsequent UseMI is just RSUseMI and does not go into the isRegSequence() path. However, the cycle still exists, with RSUseMI flip-flopping between %77 and %49.
Please try to get an actual test case.
Addressed feedback with the latest commit.
Thanks for the test case! Now can you please try to explain what goes wrong in more detail? I took a very quick look and it seems like this loop never terminates:
But why? If we understand the problem better, there may be a simpler fix.
So I did more digging, and the search yielded some interesting findings. It turns out that the early_inc iterator (which I hand-converted) was incrementing and then returning the previous use operand iterator after going through the foldOperand call. I think the reason has to do with tryAddToFoldList, at the end of foldOperand, calling commuteInstruction. There may be some weird interaction between commuting the operands of V_MUL instructions and iterating over the use operands of the REG_SEQUENCE that affects the iterator increment and causes the infinite loop. It doesn't look like this interaction is intentional -- disabling commutation prevents the infinite loop. Perhaps an option to disable commutation in tryAddToFoldList is the answer?
I think it goes like this:
That makes sense, I think, and the better fix would be to stop commuting inside that loop, or to commute in bulk after the loop.
Yes that explanation makes sense, but this problem has already been solved in a different way in the main loop that calls
So perhaps we should just copy that solution here? (I.e. copy the uses into a temporary vector, to avoid any problem with the list being mutated while we are iterating over it.)
Updated by caching the uses.
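The difference between walking the live use list and walking a cached copy can be sketched outside LLVM. The following is a hedged, standalone model, not the code from this PR: `UseList`, `relinkAtTail`, `visitLive`, and `visitSnapshot` are hypothetical names, a `std::list<int>` stands in for the register's use list, and the assumption is that the fold step (for example via a commute) relinks the visited use at the tail of that list.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <list>
#include <vector>

// Hypothetical stand-in for a register's use list; ints stand for operands.
using UseList = std::list<int>;

// Hypothetical fold step: moves the visited use to the tail of the list,
// modelling a mutation of the list while it is being walked.
inline void relinkAtTail(UseList &Uses, UseList::iterator It) {
  assert(!Uses.empty() && "cannot relink in an empty list");
  Uses.splice(Uses.end(), Uses, It);
}

// Walks the live list with an early-increment pattern. Every visited use
// reappears at the tail, so the walk only stops at the MaxVisits cap.
inline std::size_t visitLive(UseList Uses, std::size_t MaxVisits) {
  std::size_t Visits = 0;
  for (auto It = Uses.begin(); It != Uses.end() && Visits < MaxVisits;) {
    auto Next = std::next(It); // Saved before the list is mutated.
    relinkAtTail(Uses, It);
    ++Visits;
    It = Next;
  }
  return Visits;
}

// Grabs the uses first, then walks the snapshot. Mutating the live list no
// longer changes which uses are visited or how many times.
inline std::size_t visitSnapshot(UseList Uses) {
  std::vector<UseList::iterator> UsesToProcess;
  for (auto It = Uses.begin(); It != Uses.end(); ++It)
    UsesToProcess.push_back(It);
  std::size_t Visits = 0;
  for (UseList::iterator It : UsesToProcess) {
    relinkAtTail(Uses, It); // std::list iterators stay valid across splice.
    ++Visits;
  }
  return Visits;
}
```

In this model the live walk over three uses never reaches the end on its own and stops only at the cap, while the snapshot walk visits each of the three uses exactly once. Whether the real use list is reordered in exactly this way is an assumption of the sketch.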
LGTM, thanks.
// Grab the use operands first
SmallVector<MachineOperand *, 4> UsesToProcess;
for (auto &Use : MRI->use_nodbg_operands(RegSeqDstReg)) {
Nit: don't need braces around a single physical line.
5164ef4 to 3f8006d
some NFCs
@@ -219,10 +219,8 @@ bool SIFoldOperands::canUseImmWithOpSel(FoldCandidate &Fold) const {
default:
return false;
case AMDGPU::OPERAND_REG_IMM_V2FP16:
case AMDGPU::OPERAND_REG_IMM_V2BF16:
Rebase issue? This was added in #82435.
Thanks for catching! Not sure if I ever touched that, but it seems concerning if the rebase did that.
body: |
bb.0:
liveins: $vgpr0_vgpr1, $vgpr2
%33:sreg_32 = S_MOV_B32 0
It would be good to compact the register numbers with -run-pass=none.
foldOperand() for REG_SEQUENCE has recursion that can trigger an infinite loop, as the method can modify the use operand order, which messes up the range-based for loop. Cache the uses for processing beforehand so that the iterators don't get messed up. Added a repro MIR testcase.
foldOperand() for REG_SEQUENCE has recursion that can trigger an infinite loop, as the method can modify the operand order, which messes up the range-based for loop. This patch fixes the issue by caching the uses for processing beforehand, and then iterating over the cache rather than using the instruction iterator.
Change-Id: Iac081f4e363984cfd9917672e7d93107c51c97ac