[SimplifyCFG] Prevent merging cbranch to cbranch if the branch probability from the first to second is too low. #69375
@llvm/pr-subscribers-llvm-transforms
Author: Valery Pykhtin (vpykhtin)
Changes
The AMDGPU target has run into a situation that can be illustrated by the new test @dont_merge_cbranches1 in the diff below.
SimplifyCFG merges the branches on %uniform_cond and %divergent_cond, which is undesirable: the first branch's edge to bb2 is taken extremely rarely, while the second branch is expensive (as the names indicate, the first branch is uniform and the second is divergent and requires EXEC mask handling). The merged branch becomes as expensive as the second.
@nhaehnle suggested that we stop merging such branches based on branch probability information.
The threshold value is somewhat arbitrary at the moment. It should probably be chosen as the lowest of the branch probability values already used in LLVM code, to preserve the current behaviour. We may also end up using a target-dependent threshold.
Full diff: https://github.com/llvm/llvm-project/pull/69375.diff
2 Files Affected:
diff --git a/llvm/lib/Transforms/Utils/SimplifyCFG.cpp b/llvm/lib/Transforms/Utils/SimplifyCFG.cpp
index 35fead111aa9666..06e0fc91483a514 100644
--- a/llvm/lib/Transforms/Utils/SimplifyCFG.cpp
+++ b/llvm/lib/Transforms/Utils/SimplifyCFG.cpp
@@ -180,6 +180,11 @@ static cl::opt<unsigned> MaxSwitchCasesPerResult(
"max-switch-cases-per-result", cl::Hidden, cl::init(16),
cl::desc("Limit cases to analyze when converting a switch to select"));
+static cl::opt<unsigned> CondBranchToCondBranchWeightRatio(
+ "simplifycfg-cbranch-to-cbranch-weight-ratio", cl::Hidden, cl::init(10000),
+ cl::desc("Don't merge conditional branches if the branch probability from "
+ "the first to second is below the reciprocal of this value"));
+
STATISTIC(NumBitMaps, "Number of switch instructions turned into bitmaps");
STATISTIC(NumLinearMaps,
"Number of switch instructions turned into linear mapping");
@@ -4347,6 +4352,15 @@ static bool SimplifyCondBranchToCondBranch(BranchInst *PBI, BranchInst *BI,
if (PBI->getSuccessor(PBIOp) == BB)
return false;
+ // If predecessor's branch probability to BB is too low don't merge branches.
+ SmallVector<uint32_t, 2> PredWeights;
+ if (extractBranchWeights(*PBI, PredWeights)) {
+ auto BIWeight = PredWeights[PBIOp ^ 1];
+ auto CommonWeight = PredWeights[PBIOp];
+ if (CommonWeight / BIWeight > CondBranchToCondBranchWeightRatio)
+ return false;
+ }
+
// Do not perform this transformation if it would require
// insertion of a large number of select instructions. For targets
// without predication/cmovs, this is a big pessimization.
diff --git a/llvm/test/Transforms/SimplifyCFG/branch-cond-dont-merge.ll b/llvm/test/Transforms/SimplifyCFG/branch-cond-dont-merge.ll
new file mode 100644
index 000000000000000..6dcdfee21932f12
--- /dev/null
+++ b/llvm/test/Transforms/SimplifyCFG/branch-cond-dont-merge.ll
@@ -0,0 +1,58 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
+; RUN: opt < %s -passes=simplifycfg -simplifycfg-cbranch-to-cbranch-weight-ratio=100 -S | FileCheck %s
+
+declare void @bar()
+declare i1 @uniform_result(i1 %c)
+
+define void @dont_merge_cbranches1(i32 %V) {
+; CHECK-LABEL: @dont_merge_cbranches1(
+; CHECK-NEXT: [[DIVERGENT_COND:%.*]] = icmp ne i32 [[V:%.*]], 0
+; CHECK-NEXT: [[UNIFORM_COND:%.*]] = call i1 @uniform_result(i1 [[DIVERGENT_COND]])
+; CHECK-NEXT: br i1 [[UNIFORM_COND]], label [[BB2:%.*]], label [[EXIT:%.*]], !prof [[PROF0:![0-9]+]]
+; CHECK: bb2:
+; CHECK-NEXT: br i1 [[DIVERGENT_COND]], label [[BB3:%.*]], label [[EXIT]]
+; CHECK: bb3:
+; CHECK-NEXT: call void @bar()
+; CHECK-NEXT: br label [[EXIT]]
+; CHECK: exit:
+; CHECK-NEXT: ret void
+;
+ %divergent_cond = icmp ne i32 %V, 0
+ %uniform_cond = call i1 @uniform_result(i1 %divergent_cond)
+ br i1 %uniform_cond, label %bb2, label %exit, !prof !0
+bb2:
+ br i1 %divergent_cond, label %bb3, label %exit
+bb3:
+ call void @bar( )
+ br label %exit
+exit:
+ ret void
+}
+
+define void @dont_merge_cbranches2(i32 %V) {
+; CHECK-LABEL: @dont_merge_cbranches2(
+; CHECK-NEXT: [[DIVERGENT_COND:%.*]] = icmp ne i32 [[V:%.*]], 0
+; CHECK-NEXT: [[UNIFORM_COND:%.*]] = call i1 @uniform_result(i1 [[DIVERGENT_COND]])
+; CHECK-NEXT: br i1 [[UNIFORM_COND]], label [[EXIT:%.*]], label [[BB2:%.*]], !prof [[PROF1:![0-9]+]]
+; CHECK: bb2:
+; CHECK-NEXT: br i1 [[DIVERGENT_COND]], label [[BB3:%.*]], label [[EXIT]]
+; CHECK: bb3:
+; CHECK-NEXT: call void @bar()
+; CHECK-NEXT: br label [[EXIT]]
+; CHECK: exit:
+; CHECK-NEXT: ret void
+;
+ %divergent_cond = icmp ne i32 %V, 0
+ %uniform_cond = call i1 @uniform_result(i1 %divergent_cond)
+ br i1 %uniform_cond, label %exit, label %bb2, !prof !1
+bb2:
+ br i1 %divergent_cond, label %bb3, label %exit
+bb3:
+ call void @bar( )
+ br label %exit
+exit:
+ ret void
+}
+
+!0 = !{!"branch_weights", i32 1, i32 1000}
+!1 = !{!"branch_weights", i32 1000, i32 1}
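The check added to SimplifyCondBranchToCondBranch can be modelled outside LLVM as a plain function, which makes it easier to see how the test's branch weights interact with the ratio. This is a sketch, not LLVM API: `shouldSkipCBranchMerge` is a hypothetical name, and the two-element array stands in for the `PredWeights` vector filled by `extractBranchWeights`. As in the diff, `PBIOp` indexes the predecessor's successor shared with BB's branch, so `PBIOp ^ 1` selects the edge into BB. The diff divides by `BIWeight` unconditionally, which would divide by zero for a zero-weight edge into BB; the guard below, and the choice to treat that case as "do not merge", are assumptions of this sketch.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical standalone model of the ratio check from the diff (not LLVM API).
//   PredWeights: the predecessor branch's "branch_weights" (two entries).
//   PBIOp:       index of the predecessor successor shared with BB's branch,
//                so PredWeights[PBIOp ^ 1] is the weight of the edge into BB.
//   Ratio:       value of -simplifycfg-cbranch-to-cbranch-weight-ratio.
// Returns true if the two conditional branches should NOT be merged.
bool shouldSkipCBranchMerge(const std::array<uint32_t, 2> &PredWeights,
                            unsigned PBIOp, uint32_t Ratio) {
  assert(PBIOp < 2 && "a conditional branch has exactly two successors");
  uint32_t BIWeight = PredWeights[PBIOp ^ 1];  // edge into BB
  uint32_t CommonWeight = PredWeights[PBIOp];  // edge to the common destination

  // Assumed guard (not in the diff): a zero-weight edge into BB counts as
  // "too unlikely", unless both weights are zero and carry no information.
  if (BIWeight == 0)
    return CommonWeight != 0;

  // Integer division, as in the diff.
  return CommonWeight / BIWeight > Ratio;
}
```

Because of the integer division, the check fires only when CommonWeight >= (Ratio + 1) * BIWeight. A 1:1000 profile therefore does not block merging under the default ratio of 10000, which is consistent with the test's RUN line lowering the option to 100.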
Thanks! I don't have better guidance on the threshold value either. The default choice seems pretty conservative, which is probably a good thing. Perhaps somebody else wants to chime in, but this LGTM.
Any particular reason this should not use
Thank you, that is what I was looking for; updated.
It looks like I should check MD_unpredictable as well.
@nikic or other reviewers, do you have any objections to submitting this patch?
I've been puzzling over the rationale for looking at
I'm curious too; I just decided to follow the 'standard' pattern used in this source file.
Guys, I really need this change submitted; I'm going to submit it on Monday if there are no objections.
…ility from the first to second is too low. (llvm#69375)
The AMDGPU target has run into a situation that can be illustrated with the following testcase:
define void @dont_merge_cbranches(i32 %V) {
  %divergent_cond = icmp ne i32 %V, 0
  %uniform_cond = call i1 @uniform_result(i1 %divergent_cond)
  br i1 %uniform_cond, label %bb2, label %exit, !prof !0
bb2:
  br i1 %divergent_cond, label %bb3, label %exit
bb3:
  call void @bar( )
  br label %exit
exit:
  ret void
}
!0 = !{!"branch_weights", i32 1, i32 100000}
SimplifyCFG merges the branches on %uniform_cond and %divergent_cond, which is undesirable because the first branch's edge to bb2 is taken extremely rarely and the second branch is expensive. The merged branch becomes as expensive as the second.
This patch prevents such merging if the branch to the second branch is unlikely to be taken.
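As a worked check of this example, assume the integer-division ratio test and the default of 10000 from the diff above (the patch was updated during review, so the condition that finally landed may be formulated differently). Let B be the weight of the edge into bb2, C the weight of the edge to exit, and R the ratio, with B > 0:

```latex
\[
\left\lfloor \frac{C}{B} \right\rfloor > R
\iff \frac{C}{B} \ge R + 1
\iff P(\mathrm{entry} \to \mathrm{bb2}) = \frac{B}{B + C} \le \frac{1}{R + 2}
\]
\[
B = 1,\; C = 100000,\; R = 10000:\quad
\left\lfloor \frac{100000}{1} \right\rfloor = 100000 > 10000,\qquad
P = \frac{1}{100001} \approx 1.0 \times 10^{-5} \le \frac{1}{10002}
\]
```

So under these assumptions the effective cutoff is a probability of 1/(R + 2) rather than the 1/R named in the option's description; for R = 10000 the difference is negligible.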