-
Notifications
You must be signed in to change notification settings - Fork 15.2k
[SimplifyCFG] Fix SimplifyCFG pass to skip folding when both blocks contain convergence intrinsics.
#166452
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
@llvm/pr-subscribers-llvm-transforms Author: Lucie Choi (luciechoi) ChangesFixes a bug #165642 LLVM Spec states that only a single convergence intrinsic can be included in a basic block. This PR fixes the issue in Full diff: https://github.com/llvm/llvm-project/pull/166452.diff 2 Files Affected:
diff --git a/llvm/lib/Transforms/Utils/BasicBlockUtils.cpp b/llvm/lib/Transforms/Utils/BasicBlockUtils.cpp
index 11db0ec487328..c1b6140abb471 100644
--- a/llvm/lib/Transforms/Utils/BasicBlockUtils.cpp
+++ b/llvm/lib/Transforms/Utils/BasicBlockUtils.cpp
@@ -230,6 +230,15 @@ bool llvm::MergeBlockIntoPredecessor(BasicBlock *BB, DomTreeUpdater *DTU,
// Don't break self-loops.
if (PredBB == BB) return false;
+ // Don't break if both the basic block and the predecessor contain convergent
+ // intrinsics.
+ for (Instruction &I : *BB)
+ if (isa<ConvergenceControlInst>(I)) {
+ for (Instruction &I : *PredBB)
+ if (isa<ConvergenceControlInst>(I))
+ return false;
+ }
+
// Don't break unwinding instructions or terminators with other side-effects.
Instruction *PTI = PredBB->getTerminator();
if (PTI->isSpecialTerminator() || PTI->mayHaveSideEffects())
diff --git a/llvm/test/Transforms/SimplifyCFG/skip-merging-duplicate-convergence-instrinsics.ll b/llvm/test/Transforms/SimplifyCFG/skip-merging-duplicate-convergence-instrinsics.ll
new file mode 100644
index 0000000000000..d5ae64f6897e3
--- /dev/null
+++ b/llvm/test/Transforms/SimplifyCFG/skip-merging-duplicate-convergence-instrinsics.ll
@@ -0,0 +1,68 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
+; RUN: opt < %s -S -passes=simplifycfg | FileCheck %s
+
+declare token @llvm.experimental.convergence.entry() #0
+
+define void @nested(i32 %tidx, i32 %tidy, ptr %array) #0 {
+; CHECK-LABEL: @nested(
+; CHECK-NEXT: entry:
+; CHECK-NEXT: [[TMP0:%.*]] = tail call token @llvm.experimental.convergence.entry()
+; CHECK-NEXT: [[TMP1:%.*]] = or i32 [[TIDY:%.*]], [[TIDX:%.*]]
+; CHECK-NEXT: [[OR_COND_I:%.*]] = icmp eq i32 [[TMP1]], 0
+; CHECK-NEXT: br label [[FOR_COND_I:%.*]]
+; CHECK: for.cond.i:
+; CHECK-NEXT: [[TMP2:%.*]] = call token @llvm.experimental.convergence.loop() [ "convergencectrl"(token [[TMP0]]) ]
+; CHECK-NEXT: br label [[FOR_COND1_I:%.*]]
+; CHECK: for.cond1.i:
+; CHECK-NEXT: [[CMP2_I:%.*]] = phi i1 [ false, [[FOR_BODY4_I:%.*]] ], [ true, [[FOR_COND_I]] ]
+; CHECK-NEXT: [[TMP3:%.*]] = call token @llvm.experimental.convergence.loop() [ "convergencectrl"(token [[TMP2]]) ]
+; CHECK-NEXT: br i1 [[CMP2_I]], label [[FOR_BODY4_I]], label [[EXIT:%.*]]
+; CHECK: for.body4.i:
+; CHECK-NEXT: br i1 [[OR_COND_I]], label [[IF_THEN_I:%.*]], label [[FOR_COND1_I]]
+; CHECK: if.then.i:
+; CHECK-NEXT: [[HLSL_WAVE_ACTIVE_MAX7_I:%.*]] = call spir_func i32 @llvm.spv.wave.reduce.umax.i32(i32 0) [ "convergencectrl"(token [[TMP3]]) ]
+; CHECK-NEXT: [[TMP4:%.*]] = getelementptr inbounds i32, ptr [[ARRAY:%.*]], i32 0
+; CHECK-NEXT: store i32 [[HLSL_WAVE_ACTIVE_MAX7_I]], ptr [[TMP4]], align 4
+; CHECK-NEXT: br label [[EXIT]]
+; CHECK: exit:
+; CHECK-NEXT: ret void
+;
+entry:
+ %0 = tail call token @llvm.experimental.convergence.entry()
+ %2 = or i32 %tidy, %tidx
+ %or.cond.i = icmp eq i32 %2, 0
+ br label %for.cond.i
+
+for.cond.i:
+ %3 = call token @llvm.experimental.convergence.loop() [ "convergencectrl"(token %0) ]
+ br label %for.cond1.i
+
+for.cond1.i:
+ %cmp2.i = phi i1 [ false, %for.body4.i ], [ true, %for.cond.i ]
+ %4 = call token @llvm.experimental.convergence.loop() [ "convergencectrl"(token %3) ]
+ br i1 %cmp2.i, label %for.body4.i, label %cleanup.i.loopexit
+
+for.body4.i:
+ br i1 %or.cond.i, label %if.then.i, label %for.cond1.i
+
+if.then.i:
+ %hlsl.wave.active.max7.i = call spir_func i32 @llvm.spv.wave.reduce.umax.i32(i32 0) [ "convergencectrl"(token %4) ]
+ %5 = getelementptr inbounds i32, ptr %array, i32 0
+ store i32 %hlsl.wave.active.max7.i, ptr %5, align 4
+ br label %cleanup.i
+
+cleanup.i.loopexit:
+ br label %cleanup.i
+
+cleanup.i:
+ br label %exit
+
+exit:
+ ret void
+}
+
+declare token @llvm.experimental.convergence.loop() #0
+
+declare i32 @llvm.spv.wave.reduce.umax.i32(i32) #0
+
+attributes #0 = { convergent }
|
|
It seems bad that the convergence control LangRef rules make it illegal to merge basic blocks connected by a single direct branch. If this every happens, it's a sign that the loop is degenerate, i.e. it runs no more than once. Could the backends requiring convergence be taught to tolerate the possibility of multiple convergence control intrinsics, perhaps simply by running a simple cleanup pass that eliminates degenerate loops and replaces the loop token with the token used to create it? Does that transform preserve semantics, or can it change observable behavior? It seems like the best outcome would be that we relax the LangRef rules so fewer passes are required to scan for this non-local information. |
@ssahasra Do you know which backends use the convergence tokens? Do you have any thoughts on modifying the spec? I think the SPIR-V backend could handle having two tokens defined in the same BB as long as the second is a loop that uses the first. We just have to do a type of copy propagation on it. However, I'm not convinced many transforms will benefit from the rule change. We still need any transform that replicates code to be skipped if it replicates the token. How many transformations merge two block without replicating code? We will still need to add a legality check to other passes. |
Keenuts
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
an initial comment regarding the test file (waiting to see if the discussion over the change makes progress)
llvm/test/Transforms/SimplifyCFG/skip-merging-duplicate-convergence-instrinsics.ll
Outdated
Show resolved
Hide resolved
6228972 to
0cad7f0
Compare
🐧 Linux x64 Test Results
Failed Tests(click on a test name to see its output) LLVMLLVM.Transforms/LoopUnroll/convergent.controlled.llIf these failures are unrelated to your changes (for example tests are broken or flaky at HEAD), please open an issue at https://github.com/llvm/llvm-project/issues and add the |
| for (Instruction &I : *BB) | ||
| if (isa<ConvergenceControlInst>(I)) { | ||
| for (Instruction &I : *PredBB) | ||
| if (isa<ConvergenceControlInst>(I)) | ||
| return false; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a move expensive check because it has to traverse potentially two basic blocks. We should move it later. I would make it the last check before deciding to merge.
Also the way this is written is a little unclear. First if we know that there can be at most one convergence token per basic block, we could make this two separate checks and take advantage of short-circuit evaluation:
bool HasConvergenceToken(const BasicBlock *BB) {
for (const Instruction &I : *BB)
if (isa<ConvergenceControlInst>(I))
return true;
return false;
}
Then the check becomes:
if (HasConvergenceToken(BB) && HasConvervenceToken(PredBB))
return false;
Fixes a bug #165642. Similar fix is being made in
IndVarSimplifypass to account for convergence tokens.LLVM Spec states that only a single convergence intrinsic can be included in a basic block.
This PR fixes the issue in
SimplifyCFGpass so that when a basic block and its predecessor both contain convergence intrinsics, it skips merging the two blocks.