Add option for two-way branch optimization. #161419
@llvm/pr-subscribers-backend-x86
Author: Rahman Lavaee (rlavaee)
Changes
Our internal experiments show that in highly optimized code, reversing the current compiler strategy for two-way branches can be beneficial (neutral to 0.2% win). Specifically, forming a fallthrough (through the subsequent jmp) to the most likely successor can benefit static branch prediction, since most modern processors initially assume branches are not taken. This is especially important for binaries with split functions, where a function is split into multiple code regions and speculative mispredictions can incur high iTLB and icache misses.
Though our experiments are still ongoing (specifically to analyze the impact on ARM and different types of PGO), we want to support controlling the optimization via a flag.
Full diff: https://github.com/llvm/llvm-project/pull/161419.diff
2 Files Affected:
diff --git a/llvm/lib/CodeGen/MachineBlockPlacement.cpp b/llvm/lib/CodeGen/MachineBlockPlacement.cpp
index e9c75f0753f89..d0b5f5145f384 100644
--- a/llvm/lib/CodeGen/MachineBlockPlacement.cpp
+++ b/llvm/lib/CodeGen/MachineBlockPlacement.cpp
@@ -153,6 +153,36 @@ static cl::opt<unsigned> MisfetchCost(
static cl::opt<unsigned> JumpInstCost("jump-inst-cost",
cl::desc("Cost of jump instructions."),
cl::init(1), cl::Hidden);
+
+// This enum controls how to optimize two-way branches (a conditional branch
+// immediately followed by an unconditional one). The goal is to optimize for
+// branch prediction and instruction cache efficiency.
+enum class TwoWayBranchOptStrategy {
+ // Do not reverse the condition. Leave the branch code as is.
+ None,
+ // For a two-way branch, make the hot path the fallthrough path. This is more
+ // friendly to static branch prediction (predict not-taken).
+ HotPathFallthrough,
+ // For a two-way branch, make the cold path the fallthrough path. This
+ // improves i-cache efficiency as the unconditional branch is fetched less
+ // often.
+ ColdPathFallthrough
+};
+
+static cl::opt<TwoWayBranchOptStrategy> TwoWayBranchOpt(
+ "two-way-branch-opt", cl::Hidden,
+ cl::desc("Select the optimization strategy for two-way conditional branches:"),
+ cl::values(
+ clEnumValN(TwoWayBranchOptStrategy::None, "none",
+ "Avoid optimizing the two-way branches."),
+ clEnumValN(
+ TwoWayBranchOptStrategy::HotPathFallthrough, "hot-fallthrough",
+ "Make the hot path the fallthrough path for two-way branches"),
+ clEnumValN(
+ TwoWayBranchOptStrategy::ColdPathFallthrough, "cold-fallthrough",
+ "Make the cold path the fallthrough path for two-way branches")),
+ cl::init(TwoWayBranchOptStrategy::ColdPathFallthrough));
+
static cl::opt<bool>
TailDupPlacement("tail-dup-placement",
cl::desc("Perform tail duplication during placement. "
@@ -2979,10 +3009,17 @@ void MachineBlockPlacement::optimizeBranches() {
// instructions which will benefit ICF.
if (llvm::shouldOptimizeForSize(ChainBB, PSI, MBFI.get()))
continue;
- // If ChainBB has a two-way branch, try to re-order the branches
- // such that we branch to the successor with higher probability first.
- if (MBPI->getEdgeProbability(ChainBB, TBB) >=
- MBPI->getEdgeProbability(ChainBB, FBB))
+ // ChainBB has a two-way branch. Reorder the branch based on
+ // `-two-way-branch-opt`.
+ auto TBBProb = MBPI->getEdgeProbability(ChainBB, TBB);
+ auto FBBProb = MBPI->getEdgeProbability(ChainBB, FBB);
+ bool ReverseBranch =
+ (TwoWayBranchOpt ==
+ TwoWayBranchOptStrategy::ColdPathFallthrough &&
+ (FBBProb > TBBProb)) ||
+ (TwoWayBranchOpt == TwoWayBranchOptStrategy::HotPathFallthrough &&
+ (TBBProb > FBBProb));
+ if (!ReverseBranch)
continue;
if (TII->reverseBranchCondition(Cond))
continue;
diff --git a/llvm/test/CodeGen/X86/code_placement_2_way_branch.ll b/llvm/test/CodeGen/X86/code_placement_2_way_branch.ll
new file mode 100644
index 0000000000000..3afa793e71ec7
--- /dev/null
+++ b/llvm/test/CodeGen/X86/code_placement_2_way_branch.ll
@@ -0,0 +1,70 @@
+; RUN: llc -mtriple=x86_64-linux -verify-machineinstrs -two-way-branch-opt=cold-fallthrough < %s | FileCheck %s --check-prefixes=CHECK,COLD-FT
+; RUN: llc -mtriple=x86_64-linux -verify-machineinstrs -two-way-branch-opt=none < %s | FileCheck %s --check-prefixes=CHECK,COLD-FT
+; RUN: llc -mtriple=x86_64-linux -verify-machineinstrs -two-way-branch-opt=hot-fallthrough < %s | FileCheck %s --check-prefixes=CHECK,HOT-FT
+
+define void @foo() !prof !1 {
+; Test that two-way branches are optimized based on `-two-way-branch-opt`.
+;
+; +--------+ 5 +--------+
+; | if.then| <---- | entry |
+; +--------+ +--------+
+; | | |
+; | | | 10
+; | | v
+; | | +--------+
+; | | | if.else|
+; | | +--------+
+; | | |
+; | | | 10
+; | | v
+; | | 4 +--------+
+; | +---------> | if.end |
+; | +--------+
+; | |
+; | | 14
+; | v
+; | 1 +--------+
+; +------------> | end |
+; +--------+
+;
+; CHECK-LABEL: foo:
+; CHECK: if.else
+; CHECK: .LBB0_3: # %if.end
+; CHECK: .LBB0_4: # %end
+; CHECK: if.then
+; COLD-FT: jne .LBB0_3
+; HOT-FT: je .LBB0_4
+; COLD-FT: jmp .LBB0_4
+; HOT-FT: jmp .LBB0_3
+
+entry:
+ call void @e()
+ %call1 = call zeroext i1 @a()
+ br i1 %call1, label %if.then, label %if.else, !prof !2
+
+if.then:
+ call void @f()
+ %call2 = call zeroext i1 @a()
+ br i1 %call2, label %if.end, label %end, !prof !3
+
+if.else:
+ call void @g()
+ br label %if.end
+
+if.end:
+ call void @h()
+ br label %end
+
+end:
+ ret void
+}
+
+declare zeroext i1 @a()
+declare void @e()
+declare void @g()
+declare void @f()
+declare void @h()
+
+!1 = !{!"function_entry_count", i64 15}
+!2 = !{!"branch_weights", i32 5, i32 10}
+!3 = !{!"branch_weights", i32 4, i32 1}
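For readers checking the test's expectations by hand, the edge probabilities follow from normalizing the `branch_weights` metadata. The sketch below is a simplified illustration with plain doubles, not LLVM's `BranchProbability` arithmetic; `edgeProbability` is a hypothetical helper, not an LLVM API.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of LLVM): the probability of taking the
// successor at index Idx, obtained by dividing its weight by the sum of all
// weights on the branch. Returns 0.0 when every weight is zero.
double edgeProbability(const std::vector<std::uint32_t> &Weights,
                       std::size_t Idx) {
  std::uint64_t Sum = 0;
  for (std::uint32_t W : Weights)
    Sum += W;
  if (Sum == 0)
    return 0.0;
  return static_cast<double>(Weights[Idx]) / static_cast<double>(Sum);
}
```

With `!3` (weights 4 and 1), the `if.then` branch reaches `if.end` with probability 4/5 and `end` with probability 1/5, so `if.end` is the hot successor. That is consistent with the check lines: `COLD-FT` expects the conditional jump to target `.LBB0_3` (`%if.end`), while `HOT-FT` expects `.LBB0_3` to be reached through the `jmp`.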
With hot fallthrough, existing code-placement tests may behave differently. Can you run those and add the hot-fallthrough tests for those cases too after examining the results?
Our internal experiments show that in highly optimized code, reversing the current compiler strategy for two-way branches can be beneficial (neutral to 0.2% win). Specifically, forming a fallthrough (through the subsequent jmp) to the most likely successor can benefit static branch prediction, since most modern processors initially assume branches are not taken. This is especially important for binaries with split functions, where a function is split into multiple code regions and speculative mispredictions can incur high iTLB and icache misses.
Though our experiments are still ongoing (specifically to analyze the impact on ARM and different types of PGO), we want to support controlling the optimization via a flag, two-way-branch-opt, which takes one of three values: none, hot-fallthrough, and cold-fallthrough. The current compiler strategy is cold-fallthrough, and it remains the default.
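The effect of the three values can be sketched outside the compiler. The snippet below mirrors the `ReverseBranch` predicate from the diff but is only an illustration: `Strategy` and `shouldReverse` are hypothetical names, and plain doubles stand in for the `BranchProbability` values the pass actually compares. `TProb` is assumed to be the probability of the conditional branch's target (TBB) and `FProb` that of the following `jmp`'s target (FBB).

```cpp
// Hypothetical stand-in for the TwoWayBranchOptStrategy enum in the patch.
enum class Strategy { None, HotPathFallthrough, ColdPathFallthrough };

// Returns true when the conditional branch should be reversed, i.e. when the
// targets of the conditional branch and of the following jmp are swapped.
//  - ColdPathFallthrough: reverse if the jmp target is hotter, so the
//    conditional branch ends up jumping to the hot successor.
//  - HotPathFallthrough: reverse if the conditional target is hotter, so the
//    hot successor ends up being reached by falling through to the jmp.
//  - None: never reverse.
constexpr bool shouldReverse(Strategy S, double TProb, double FProb) {
  return (S == Strategy::ColdPathFallthrough && FProb > TProb) ||
         (S == Strategy::HotPathFallthrough && TProb > FProb);
}

// Compile-time spot checks of the three strategies.
static_assert(shouldReverse(Strategy::ColdPathFallthrough, 0.2, 0.8), "");
static_assert(shouldReverse(Strategy::HotPathFallthrough, 0.8, 0.2), "");
static_assert(!shouldReverse(Strategy::None, 0.2, 0.8), "");
```

Note that in this sketch, as in the patch, equal probabilities never trigger a reversal under any strategy, because both comparisons are strict.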