Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SimplifyCFG] Prevent merging cbranch to cbranch if the branch probability from the first to second is too low. #69375

Merged
merged 6 commits into from
Nov 13, 2023

Conversation

vpykhtin
Copy link
Contributor

@vpykhtin vpykhtin commented Oct 17, 2023

AMDGPU target has faced the situation which can be illustrated with the following testcase:

define void @dont_merge_cbranches(i32 %V) {
  %divergent_cond = icmp ne i32 %V, 0
  %uniform_cond = call i1 @uniform_result(i1 %divergent_cond)
  br i1 %uniform_cond, label %bb2, label %exit, !prof !0
bb2:
  br i1 %divergent_cond, label %bb3, label %exit
bb3:
  call void @bar( )
  br label %exit
exit:
  ret void
}
!0 = !{!"branch_weights", i32 1, i32 100000}

SimplifyCFG merges branches on %uniform_cond and %divergent_cond which is undesirable because the first branch to bb2 is taken extremely rare and the second branch is expensive (as names state the first branch is uniform and the second is divergent and requires EXEC mask handling). The merged branch becomes as expensive as the second.

@nhaehnle suggested to stop merging such branches on the basis of branch probability information.

Threshold value is somewhat voluntaristic at the moment and probably should be selected as the lowest from the known used branch probability values in the LLVM code to preserve current behaviour. It may also happen that we end up using target dependent threshold.

…ranch probabililty from the first to second is too low.
@llvmbot
Copy link
Collaborator

llvmbot commented Oct 17, 2023

@llvm/pr-subscribers-llvm-transforms

Author: Valery Pykhtin (vpykhtin)

Changes

AMDGPU target has faced the situation which can be illustrated with the following testcase:

define void @<!-- -->dont_merge_cbranches(i32 %V) {
  %divergent_cond = icmp ne i32 %V, 0
  %uniform_cond = call i1 @<!-- -->uniform_result(i1 %divergent_cond)
  br i1 %uniform_cond, label %bb2, label %exit, !prof !0
bb2:
  br i1 %divergent_cond, label %bb3, label %exit
bb3:
  call void @<!-- -->bar( )
  br label %exit
exit:
  ret void
}
!0 = !{!"branch_weights", i32 1, i32 100000}

SimplifyCFG merges branches on %uniform_cond and %divergent_cond which is undesirable because the first branch to bb2 is taken extremely rare and the second branch is expensive (as names state the first branch is uniform and the second is divergent and requires EXEC mask handling).

@nhaehnle suggested to stop merging such branches on the basis of branch probability information.

Threshold value is somewhat voluntaristic at the moment and probably should be selected as the lowest from the known used branch probability values in the LLVM code to preserve current behaviour. It may also happen that we end up using target dependent threshold.


Full diff: https://github.com/llvm/llvm-project/pull/69375.diff

2 Files Affected:

  • (modified) llvm/lib/Transforms/Utils/SimplifyCFG.cpp (+14)
  • (added) llvm/test/Transforms/SimplifyCFG/branch-cond-dont-merge.ll (+58)
diff --git a/llvm/lib/Transforms/Utils/SimplifyCFG.cpp b/llvm/lib/Transforms/Utils/SimplifyCFG.cpp
index 35fead111aa9666..06e0fc91483a514 100644
--- a/llvm/lib/Transforms/Utils/SimplifyCFG.cpp
+++ b/llvm/lib/Transforms/Utils/SimplifyCFG.cpp
@@ -180,6 +180,11 @@ static cl::opt<unsigned> MaxSwitchCasesPerResult(
     "max-switch-cases-per-result", cl::Hidden, cl::init(16),
     cl::desc("Limit cases to analyze when converting a switch to select"));
 
+static cl::opt<unsigned> CondBranchToCondBranchWeightRatio(
+    "simplifycfg-cbranch-to-cbranch-weight-ratio", cl::Hidden, cl::init(10000),
+    cl::desc("Don't merge conditional branches if the branch probability from "
+             "the first to second is below of the reciprocal of this value"));
+
 STATISTIC(NumBitMaps, "Number of switch instructions turned into bitmaps");
 STATISTIC(NumLinearMaps,
           "Number of switch instructions turned into linear mapping");
@@ -4347,6 +4352,15 @@ static bool SimplifyCondBranchToCondBranch(BranchInst *PBI, BranchInst *BI,
   if (PBI->getSuccessor(PBIOp) == BB)
     return false;
 
+  // If predecessor's branch probability to BB is too low don't merge branches.
+  SmallVector<uint32_t, 2> PredWeights;
+  if (extractBranchWeights(*PBI, PredWeights)) {
+    auto BIWeight = PredWeights[PBIOp ^ 1];
+    auto CommonWeight = PredWeights[PBIOp];
+    if (CommonWeight / BIWeight > CondBranchToCondBranchWeightRatio)
+      return false;
+  }
+
   // Do not perform this transformation if it would require
   // insertion of a large number of select instructions. For targets
   // without predication/cmovs, this is a big pessimization.
diff --git a/llvm/test/Transforms/SimplifyCFG/branch-cond-dont-merge.ll b/llvm/test/Transforms/SimplifyCFG/branch-cond-dont-merge.ll
new file mode 100644
index 000000000000000..6dcdfee21932f12
--- /dev/null
+++ b/llvm/test/Transforms/SimplifyCFG/branch-cond-dont-merge.ll
@@ -0,0 +1,58 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
+; RUN: opt < %s -passes=simplifycfg -simplifycfg-cbranch-to-cbranch-weight-ratio=100 -S | FileCheck %s
+
+declare void @bar()
+declare i1 @uniform_result(i1 %c)
+
+define void @dont_merge_cbranches1(i32 %V) {
+; CHECK-LABEL: @dont_merge_cbranches1(
+; CHECK-NEXT:    [[DIVERGENT_COND:%.*]] = icmp ne i32 [[V:%.*]], 0
+; CHECK-NEXT:    [[UNIFORM_COND:%.*]] = call i1 @uniform_result(i1 [[DIVERGENT_COND]])
+; CHECK-NEXT:    br i1 [[UNIFORM_COND]], label [[BB2:%.*]], label [[EXIT:%.*]], !prof [[PROF0:![0-9]+]]
+; CHECK:       bb2:
+; CHECK-NEXT:    br i1 [[DIVERGENT_COND]], label [[BB3:%.*]], label [[EXIT]]
+; CHECK:       bb3:
+; CHECK-NEXT:    call void @bar()
+; CHECK-NEXT:    br label [[EXIT]]
+; CHECK:       exit:
+; CHECK-NEXT:    ret void
+;
+  %divergent_cond = icmp ne i32 %V, 0
+  %uniform_cond = call i1 @uniform_result(i1 %divergent_cond)
+  br i1 %uniform_cond, label %bb2, label %exit, !prof !0
+bb2:
+  br i1 %divergent_cond, label %bb3, label %exit
+bb3:
+  call void @bar( )
+  br label %exit
+exit:
+  ret void
+}
+
+define void @dont_merge_cbranches2(i32 %V) {
+; CHECK-LABEL: @dont_merge_cbranches2(
+; CHECK-NEXT:    [[DIVERGENT_COND:%.*]] = icmp ne i32 [[V:%.*]], 0
+; CHECK-NEXT:    [[UNIFORM_COND:%.*]] = call i1 @uniform_result(i1 [[DIVERGENT_COND]])
+; CHECK-NEXT:    br i1 [[UNIFORM_COND]], label [[EXIT:%.*]], label [[BB2:%.*]], !prof [[PROF1:![0-9]+]]
+; CHECK:       bb2:
+; CHECK-NEXT:    br i1 [[DIVERGENT_COND]], label [[BB3:%.*]], label [[EXIT]]
+; CHECK:       bb3:
+; CHECK-NEXT:    call void @bar()
+; CHECK-NEXT:    br label [[EXIT]]
+; CHECK:       exit:
+; CHECK-NEXT:    ret void
+;
+  %divergent_cond = icmp ne i32 %V, 0
+  %uniform_cond = call i1 @uniform_result(i1 %divergent_cond)
+  br i1 %uniform_cond, label %exit, label %bb2, !prof !1
+bb2:
+  br i1 %divergent_cond, label %bb3, label %exit
+bb3:
+  call void @bar( )
+  br label %exit
+exit:
+  ret void
+}
+
+!0 = !{!"branch_weights", i32 1, i32 1000}
+!1 = !{!"branch_weights", i32 1000, i32 1}

@vpykhtin vpykhtin self-assigned this Oct 18, 2023
Copy link
Collaborator

@nhaehnle nhaehnle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I don't have better guidance on the threshold value either. The default choice seems pretty conservative which is probably a good thing. Perhaps somebody else wants to chime in, but this LGTM.

@nikic
Copy link
Contributor

nikic commented Oct 21, 2023

Any particular reason this should not use TTI->getPredictableBranchThreshold()?

@vpykhtin
Copy link
Contributor Author

Any particular reason this should not use TTI->getPredictableBranchThreshold()?

Thank you, that is what I have been looking for, updated.

@vpykhtin
Copy link
Contributor Author

It looks like I should check MD_unpredictable as well

@vpykhtin
Copy link
Contributor Author

@nikic or other reviewers, do you have objections to submit this patch?

@nhaehnle
Copy link
Collaborator

nhaehnle commented Nov 3, 2023

I've been puzzling over the rationale for looking at MD_unpredictable. How can a branch that is heavily biased to one side be unpredictable?

@vpykhtin
Copy link
Contributor Author

vpykhtin commented Nov 6, 2023

I've been puzzling over the rationale for looking at MD_unpredictable. How can a branch that is heavily biased to one side be unpredictable?

I'm curious either, I just decided to follow a 'standard' pattern in this source.

@vpykhtin
Copy link
Contributor Author

vpykhtin commented Nov 9, 2023

Guys, I really need this change submitted, I'm going to submit it on monday if there're no objections.

@vpykhtin vpykhtin merged commit f054947 into llvm:main Nov 13, 2023
3 checks passed
@DaniilSuchkov
Copy link
Contributor

Hi @vpykhtin! It looks like this change has caused a functional regression #72323

zahiraam pushed a commit to zahiraam/llvm-project that referenced this pull request Nov 20, 2023
…ility from the first to second is too low. (llvm#69375)

AMDGPU target has faced the situation which can be illustrated with the
following testcase:

define void @dont_merge_cbranches(i32 %V) {
  %divergent_cond = icmp ne i32 %V, 0
  %uniform_cond = call i1 @uniform_result(i1 %divergent_cond)
  br i1 %uniform_cond, label %bb2, label %exit, !prof !0
bb2:
  br i1 %divergent_cond, label %bb3, label %exit
bb3:
  call void @bar( )
  br label %exit
exit:
  ret void
}
!0 = !{!"branch_weights", i32 1, i32 100000}

SimplifyCFG merges branches on %uniform_cond and %divergent_cond which is undesirable because the first branch to bb2 is taken extremely rare and the second branch is expensive. The merged branch becomes as expensive as the second.

This patch prevents such merging if the branch to the second branch is unlikely to happen.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants