Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AMDGPU: fix isSafeToSink expecting exactly one predecessor #89224

Merged
merged 1 commit into from
May 10, 2024

Conversation

petar-avramovic
Copy link
Collaborator

@petar-avramovic petar-avramovic commented Apr 18, 2024

isSafeToSink needs to check if machine cycle has divergent exit branch
but first it needs the MBB that contains cycle exit branch.
Early-tailduplication can delete exit block created by structurize-cfg
so there is still exactly one cycle exit block but the new cycle exit
block can have multiple predecessors.
Simplify search for MBBs that contain cycle exit branch by introducing
helper method getExitingBlocks in GenericCycle.

Fixes #89200

@llvmbot
Copy link
Collaborator

llvmbot commented Apr 18, 2024

@llvm/pr-subscribers-llvm-adt
@llvm/pr-subscribers-llvm-globalisel

@llvm/pr-subscribers-backend-amdgpu

Author: Petar Avramovic (petar-avramovic)

Changes

isSafeToSink needs to check if machine cycle has divergent exit branch
but first it needs the MBB that contains cycle exit branch.
Machine cycle has information of cycle exit blocks (blocks that are
destinations of cycle exit branches).
After structurize-cfg, cycle should have exactly one exit block and
exit block should have exactly one predecessor.
Early-tailduplication can delete exit block created by structurize-cfg
so there is still exactly one cycle exit block but the new cycle exit
block can have multiple predecessors. We need predecessor that belongs
to the cycle.


Full diff: https://github.com/llvm/llvm-project/pull/89224.diff

2 Files Affected:

  • (modified) llvm/lib/Target/AMDGPU/SIInstrInfo.cpp (+13-3)
  • (added) llvm/test/CodeGen/AMDGPU/GlobalISel/is-safe-to-sink-bug.ll (+93)
diff --git a/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp b/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
index f4b21b7dfac391..a37f1358425522 100644
--- a/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
@@ -217,10 +217,20 @@ bool SIInstrInfo::isSafeToSink(MachineInstr &MI,
         SmallVector<MachineBasicBlock *, 1> ExitBlocks;
         FromCycle->getExitBlocks(ExitBlocks);
         assert(ExitBlocks.size() == 1);
-        assert(ExitBlocks[0]->getSinglePredecessor());
-
+        // After structurize-cfg, cycle exit block should have exactly one
+        // predecessor (the cycle exit). Early-tailduplication can removed that
+        // block so we have to search for predecessor that is in cycle.
+        // most of the times there is only one predecessor.
+        const MachineBasicBlock *CycleExitMBB = nullptr;
+        for (MachineBasicBlock *MBB : ExitBlocks[0]->predecessors()) {
+          if (FromCycle->contains(MBB)) {
+            assert(CycleExitMBB == nullptr);
+            CycleExitMBB = MBB;
+          }
+        }
+        assert(CycleExitMBB);
         // FromCycle has divergent exit condition.
-        if (hasDivergentBranch(ExitBlocks[0]->getSinglePredecessor())) {
+        if (hasDivergentBranch(CycleExitMBB)) {
           return false;
         }
 
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/is-safe-to-sink-bug.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/is-safe-to-sink-bug.ll
new file mode 100644
index 00000000000000..d3bc661f5940b6
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/is-safe-to-sink-bug.ll
@@ -0,0 +1,93 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 4
+; RUN: llc -mtriple=amdgcn -mcpu=gfx1030 -global-isel -verify-machineinstrs < %s | FileCheck %s
+
+; early-tailduplication deletes cycle exit block created by structurize-cfg
+; that had exactly one predecessor. Now, new cycle exit block has two
+; predecessors, we need to find predecessor that belongs to the cycle.
+
+define amdgpu_ps void @_amdgpu_ps_main(i1 %arg) {
+; CHECK-LABEL: _amdgpu_ps_main:
+; CHECK:       ; %bb.0: ; %bb
+; CHECK-NEXT:    s_mov_b32 s4, 0
+; CHECK-NEXT:    v_and_b32_e32 v0, 1, v0
+; CHECK-NEXT:    s_mov_b32 s5, s4
+; CHECK-NEXT:    s_mov_b32 s6, s4
+; CHECK-NEXT:    s_mov_b32 s7, s4
+; CHECK-NEXT:    s_mov_b32 s8, SCRATCH_RSRC_DWORD0
+; CHECK-NEXT:    s_buffer_load_dword s1, s[4:7], 0x0
+; CHECK-NEXT:    s_mov_b32 s9, SCRATCH_RSRC_DWORD1
+; CHECK-NEXT:    s_mov_b32 s10, -1
+; CHECK-NEXT:    s_mov_b32 s11, 0x31c16000
+; CHECK-NEXT:    s_add_u32 s8, s8, s0
+; CHECK-NEXT:    v_cmp_ne_u32_e64 s0, 0, v0
+; CHECK-NEXT:    s_addc_u32 s9, s9, 0
+; CHECK-NEXT:    s_mov_b32 s32, 0
+; CHECK-NEXT:    s_waitcnt lgkmcnt(0)
+; CHECK-NEXT:    s_cmp_ge_i32 s1, 0
+; CHECK-NEXT:    s_cbranch_scc0 .LBB0_2
+; CHECK-NEXT:  .LBB0_1: ; %bb12
+; CHECK-NEXT:    v_cndmask_b32_e64 v0, 1.0, 0, s4
+; CHECK-NEXT:    v_mov_b32_e32 v1, 0
+; CHECK-NEXT:    v_mov_b32_e32 v2, 0
+; CHECK-NEXT:    v_mov_b32_e32 v3, 0
+; CHECK-NEXT:    v_mov_b32_e32 v4, 0
+; CHECK-NEXT:    s_mov_b64 s[0:1], s[8:9]
+; CHECK-NEXT:    s_mov_b64 s[2:3], s[10:11]
+; CHECK-NEXT:    s_swappc_b64 s[30:31], 0
+; CHECK-NEXT:  .LBB0_2: ; %bb2.preheader
+; CHECK-NEXT:    s_mov_b32 s1, 0
+; CHECK-NEXT:    v_mov_b32_e32 v0, s1
+; CHECK-NEXT:    s_branch .LBB0_4
+; CHECK-NEXT:    .p2align 6
+; CHECK-NEXT:  .LBB0_3: ; %bb6
+; CHECK-NEXT:    ; in Loop: Header=BB0_4 Depth=1
+; CHECK-NEXT:    s_or_b32 exec_lo, exec_lo, s3
+; CHECK-NEXT:    s_and_b32 s2, 1, s2
+; CHECK-NEXT:    v_or_b32_e32 v1, 1, v0
+; CHECK-NEXT:    v_cmp_ne_u32_e64 s2, 0, s2
+; CHECK-NEXT:    v_cmp_gt_i32_e32 vcc_lo, 0, v0
+; CHECK-NEXT:    v_mov_b32_e32 v0, v1
+; CHECK-NEXT:    s_and_b32 s4, s2, s1
+; CHECK-NEXT:    s_andn2_b32 s1, s1, exec_lo
+; CHECK-NEXT:    s_and_b32 s2, exec_lo, s4
+; CHECK-NEXT:    s_or_b32 s1, s1, s2
+; CHECK-NEXT:    s_cbranch_vccz .LBB0_1
+; CHECK-NEXT:  .LBB0_4: ; %bb2
+; CHECK-NEXT:    ; =>This Inner Loop Header: Depth=1
+; CHECK-NEXT:    s_mov_b32 s2, 0
+; CHECK-NEXT:    s_and_saveexec_b32 s3, s0
+; CHECK-NEXT:    s_cbranch_execz .LBB0_3
+; CHECK-NEXT:  ; %bb.5: ; %bb5
+; CHECK-NEXT:    ; in Loop: Header=BB0_4 Depth=1
+; CHECK-NEXT:    s_mov_b32 s2, 1
+; CHECK-NEXT:    s_branch .LBB0_3
+bb:
+  %i = call i32 @llvm.amdgcn.s.buffer.load.i32(<4 x i32> zeroinitializer, i32 0, i32 0)
+  %i1 = icmp slt i32 %i, 0
+  br i1 %i1, label %bb2, label %bb12
+
+bb2:
+  %i3 = phi i1 [ %i9, %bb6 ], [ false, %bb ]
+  %i4 = phi i32 [ %i10, %bb6 ], [ 0, %bb ]
+  br i1 %arg, label %bb5, label %bb6
+
+bb5:
+  br label %bb6
+
+bb6:
+  %i7 = phi i32 [ 0, %bb2 ], [ 1, %bb5 ]
+  %i8 = icmp ne i32 %i7, 0
+  %i9 = select i1 %i8, i1 %i3, i1 false
+  %i10 = or i32 %i4, 1
+  %i11 = icmp slt i32 %i4, 0
+  br i1 %i11, label %bb2, label %bb12
+
+bb12:
+  %i13 = phi i1 [ false, %bb ], [ %i9, %bb6 ]
+  %i14 = select i1 %i13, float 0.000000e+00, float 1.000000e+00
+  %i15 = insertelement <4 x float> zeroinitializer, float %i14, i64 0
+  call amdgpu_gfx addrspace(4) void null(<4 x float> %i15, i32 0)
+  unreachable
+}
+
+declare i32 @llvm.amdgcn.s.buffer.load.i32(<4 x i32>, i32, i32 immarg)

@petar-avramovic petar-avramovic linked an issue Apr 18, 2024 that may be closed by this pull request
@jayfoad jayfoad requested a review from ssahasra April 18, 2024 12:31
@@ -217,10 +217,20 @@ bool SIInstrInfo::isSafeToSink(MachineInstr &MI,
SmallVector<MachineBasicBlock *, 1> ExitBlocks;
FromCycle->getExitBlocks(ExitBlocks);
assert(ExitBlocks.size() == 1);
assert(ExitBlocks[0]->getSinglePredecessor());

// After structurize-cfg, cycle exit block should have exactly one
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't it simpler to write this code so that it makes no assumptions about structurized CFGs? E.g.

  FromCycle->getExitBlocks(ExitBlocks);
  for (ExitBlock : ExitBlocks)
    for (Pred : ExitBlock->predecessors())
      if (FromCycle->contains(Pred) && hasDivergentBranch(Pred))
        return false;

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, write it like that for simplicity and delete asserts. Can we leave some of the comments? I don't like fully hiding expected CFG as it could give wrong impression that we can deal with cycle with two divergent exits or similar

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it could give wrong impression that we can deal with cycle with two divergent exits or similar

This code can deal with all those cases, can't it?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your proposal without asserts would return something even if input was not "structurized-cfg". Do we want to return something even when CFG is something we don't handle?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, if this code can easily handle arbitrary CFGs then it should do so.

If other parts of the code cannot handle arbitrary CFGs, then they should detect the problems.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, whether we return a block should be independent of structrize-cfg. But we should keep the comment for our own future help.

Copy link
Collaborator

@ssahasra ssahasra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but please wait for @jayfoad

// After structurize-cfg, there should be exactly one cycle exit. Also,
// cycle exit block should have exactly one predecessor, the cycle exit.
// Early-tailduplication can removed that block so we have to search for
// predecessor that is in cycle.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment here is not useful anymore. The logic should work in general CFG. Meanwhile, I would like to see the code cleaner. Instead of getting exit blocks and check its predecessors. Let's just check against cycle exiting blocks.

FromCycle->getExitingBlocks(ExitingBlocks);
for (auto *Exiting : ExistingBlocks)
  if (hasDivergentBranch(Exiting))
    return false;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LoopInfo has getExitingBlocks but CycleInfo does not.

I would be happy with removing the comment but @ssahasra suggested to keep it.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like getExitingBlocks more.
btw, is that the formal name for blocks that contain branch that exits the cycle?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should I switch to using LoopInfo then? For our targets cycles at this point should all be natural loops.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I maintain it is fully invalid to expect anything out of the MIR CFG. If it isn't enforced by the machine verifier, it isn't required. You cannot assume natural loops

petar-avramovic added a commit to petar-avramovic/llvm-project that referenced this pull request May 9, 2024
isSafeToSink needs to check if machine cycle has divergent exit branch
but first it needs the MBB that contains cycle exit branch.
Early-tailduplication can delete exit block created by structurize-cfg
so there is still exactly one cycle exit block but the new cycle exit
block can have multiple predecessors.
Simplify search for MBBs that contain cycle exit branch by introducing
helper method getExitingBlocks in GenericCycle.
for (BlockT *Block : blocks()) {
for (BlockT *Succ : successors(Block)) {
if (!contains(Succ))
TmpStorage.push_back(Block);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

break from the inner loop here.


// FromCycle has divergent exit condition.
if (hasDivergentBranch(ExitBlocks[0]->getSinglePredecessor())) {
if (hasDivergentBranch(ExitingBlocks[0])) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please return false if any exiting block has a divergent branch

Copy link
Contributor

@jayfoad jayfoad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks reasonable to me but please check with @ssahasra and @ruiling.

for (MachineBasicBlock *ExitingBlock : ExitingBlocks) {
if (hasDivergentBranch(ExitingBlock)) {
return false;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: don't need braces around a single physical line.

petar-avramovic added a commit to petar-avramovic/llvm-project that referenced this pull request May 9, 2024
isSafeToSink needs to check if machine cycle has divergent exit branch
but first it needs the MBB that contains cycle exit branch.
Early-tailduplication can delete exit block created by structurize-cfg
so there is still exactly one cycle exit block but the new cycle exit
block can have multiple predecessors.
Simplify search for MBBs that contain cycle exit branch by introducing
helper method getExitingBlocks in GenericCycle.
Copy link
Contributor

@ruiling ruiling left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for fixing this. LGTM

@@ -214,14 +214,13 @@ bool SIInstrInfo::isSafeToSink(MachineInstr &MI,
// does not contain SuccToSinkTo and also has divergent exit condition.
while (FromCycle && !FromCycle->contains(ToCycle)) {
// After structurize-cfg, there should be exactly one cycle exit.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just notice the old comment is still there, please help remove it. Thanks!

isSafeToSink needs to check if machine cycle has divergent exit branch
but first it needs the MBB that contains cycle exit branch.
Early-tailduplication can delete exit block created by structurize-cfg
so there is still exactly one cycle exit block but the new cycle exit
block can have multiple predecessors.
Simplify search for MBBs that contain cycle exit branch by introducing
helper method getExitingBlocks in GenericCycle.
@petar-avramovic petar-avramovic merged commit f5e4927 into llvm:main May 10, 2024
3 of 4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Crash in SIInstrInfo::isSafeToSink
6 participants