-
Notifications
You must be signed in to change notification settings - Fork 15.5k
[BOLT][ICP] Propagate parent hash to newly inserted BBs for BAT #171044
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
[BOLT][ICP] Propagate parent hash to newly inserted BBs for BAT #171044
Conversation
|
@llvm/pr-subscribers-bolt Author: Jinjie Huang (Jinjie-Huang) ChangesCurrently, in BAT mode, if new BBs are added during the optimization pipeline (e.g., via ICP), their index and hash in the BAT section default to 0. The reason is that saveMetadata() executes before runOptimizationPasses()(seem to get the unoptimized binary layout), and this results in incorrect information in the YAML profile generated by continuous profiling (specifically duplicated zero BB index/hashes), which prevents the use of the infer-stale-profile option(code ). So this patch tries to resolve this by updating these newly inserted BBs using the index and hash of the original BB at the insertion point within ICP. This helps to correctly map the new blocks back to the original binary information. Some background & related PRs: Full diff: https://github.com/llvm/llvm-project/pull/171044.diff 6 Files Affected:
diff --git a/bolt/include/bolt/Core/BinaryContext.h b/bolt/include/bolt/Core/BinaryContext.h
index 4872ccc0f5d7b..300238ed71b12 100644
--- a/bolt/include/bolt/Core/BinaryContext.h
+++ b/bolt/include/bolt/Core/BinaryContext.h
@@ -64,6 +64,7 @@ using namespace object;
namespace bolt {
class BinaryFunction;
+class BoltAddressTranslation;
/// Information on loadable part of the file.
struct SegmentInfo {
@@ -280,6 +281,8 @@ class BinaryContext {
/// Internal helper for removing section name from a lookup table.
void deregisterSectionName(const BinarySection &Section);
+ BoltAddressTranslation *BAT = nullptr;
+
public:
static Expected<std::unique_ptr<BinaryContext>> createBinaryContext(
Triple TheTriple, std::shared_ptr<orc::SymbolStringPool> SSP,
@@ -1559,6 +1562,10 @@ class BinaryContext {
return *IOAddressMap;
}
+ void setBAT(BoltAddressTranslation *B) { BAT = B; }
+
+ BoltAddressTranslation *getBAT() const { return BAT; }
+
raw_ostream &outs() const { return Logger.Out; }
raw_ostream &errs() const { return Logger.Err; }
diff --git a/bolt/include/bolt/Profile/BoltAddressTranslation.h b/bolt/include/bolt/Profile/BoltAddressTranslation.h
index fcc578f35e322..54420951fb6a8 100644
--- a/bolt/include/bolt/Profile/BoltAddressTranslation.h
+++ b/bolt/include/bolt/Profile/BoltAddressTranslation.h
@@ -131,6 +131,15 @@ class BoltAddressTranslation {
translateSymbol(const BinaryContext &BC, const MCSymbol &Symbol,
uint32_t InputOffset) const;
+ // Updates the BAT BBHashMap for a list of newly inserted \p BBs by
+ // mapping the InputOffset of each new block to the Index and Hash of the
+ // \p OriginBB. This ensures that new blocks generated by optimizations (like
+ // ICP) are treated as part of the original block for address translation
+ // purposes.
+ void updateInsertedBBHashMap(
+ const BinaryBasicBlock &OriginBB,
+ const std::vector<std::unique_ptr<BinaryBasicBlock>> &BBs);
+
private:
/// Helper to update \p Map by inserting one or more BAT entries reflecting
/// \p BB for function located at \p FuncAddress. At least one entry will be
diff --git a/bolt/lib/Passes/IndirectCallPromotion.cpp b/bolt/lib/Passes/IndirectCallPromotion.cpp
index 8a01cb974c5da..48e85ade3e8a3 100644
--- a/bolt/lib/Passes/IndirectCallPromotion.cpp
+++ b/bolt/lib/Passes/IndirectCallPromotion.cpp
@@ -34,6 +34,7 @@ extern cl::OptionCategory BoltOptCategory;
extern cl::opt<IndirectCallPromotionType> ICP;
extern cl::opt<unsigned> Verbosity;
extern cl::opt<unsigned> ExecutionCountThreshold;
+extern cl::opt<bool> EnableBAT;
static cl::opt<unsigned> ICPJTRemainingPercentThreshold(
"icp-jt-remaining-percent-threshold",
@@ -1401,6 +1402,8 @@ Error IndirectCallPromotion::runOnFunctions(BinaryContext &BC) {
std::vector<std::unique_ptr<BinaryBasicBlock>> NewBBs =
rewriteCall(*BB, Inst, std::move(ICPcode), MethodInfo.second);
+ if (opts::EnableBAT)
+ BC.getBAT()->updateInsertedBBHashMap(*BB, NewBBs);
// Fix the CFG after inserting the new basic blocks.
BinaryBasicBlock *MergeBlock =
fixCFG(*BB, IsTailCall, IsJumpTable, std::move(NewBBs), Targets);
diff --git a/bolt/lib/Profile/BoltAddressTranslation.cpp b/bolt/lib/Profile/BoltAddressTranslation.cpp
index 7ad4e6a2e1411..a917a220e61a7 100644
--- a/bolt/lib/Profile/BoltAddressTranslation.cpp
+++ b/bolt/lib/Profile/BoltAddressTranslation.cpp
@@ -8,6 +8,7 @@
#include "bolt/Profile/BoltAddressTranslation.h"
#include "bolt/Core/BinaryFunction.h"
+#include "bolt/Utils/CommandLineOpts.h"
#include "llvm/ADT/APInt.h"
#include "llvm/Support/Errc.h"
#include "llvm/Support/Error.h"
@@ -637,5 +638,22 @@ BoltAddressTranslation::translateSymbol(const BinaryContext &BC,
return std::pair(ParentBF, SecondaryEntryId);
}
+void BoltAddressTranslation::updateInsertedBBHashMap(
+ const BinaryBasicBlock &OriginBB,
+ const std::vector<std::unique_ptr<BinaryBasicBlock>> &BBs) {
+ if (!opts::EnableBAT || !OriginBB.getFunction() || !OriginBB.getHash())
+ return;
+
+ auto &BBHashMap = getBBHashMap(OriginBB.getFunction()->getAddress());
+
+ for (const auto &BB : BBs) {
+ if (!BB)
+ continue;
+ // Update BBHashMap using the original BB's index and hash.
+ BBHashMap.addEntry(BB->getInputOffset(), OriginBB.getIndex(),
+ OriginBB.getHash());
+ }
+}
+
} // namespace bolt
} // namespace llvm
diff --git a/bolt/lib/Rewrite/RewriteInstance.cpp b/bolt/lib/Rewrite/RewriteInstance.cpp
index 067a3e6636f0b..67620f03a9c8e 100644
--- a/bolt/lib/Rewrite/RewriteInstance.cpp
+++ b/bolt/lib/Rewrite/RewriteInstance.cpp
@@ -447,6 +447,7 @@ RewriteInstance::RewriteInstance(ELFObjectFileBase *File, const int Argc,
BC->MII.get(), BC->MRI.get(), BC->STI.get())));
BAT = std::make_unique<BoltAddressTranslation>();
+ BC->setBAT(BAT.get());
if (opts::UpdateDebugSections)
DebugInfoRewriter = std::make_unique<DWARFRewriter>(*BC);
diff --git a/bolt/test/X86/icp-inline.s b/bolt/test/X86/icp-inline.s
index c5106db5a5389..9c2802ef9730c 100644
--- a/bolt/test/X86/icp-inline.s
+++ b/bolt/test/X86/icp-inline.s
@@ -36,6 +36,17 @@
# CHECK-ICP-INLINE: callq *(%rcx,%rax,8)
# CHECK-ICP-INLINE-NOT: callq bar
# CHECK-ICP-INLINE: End of Function "main"
+
+# Checks BB hash has been updated correctly after ICP optimization
+# RUN: llvm-bolt %t.exe --icp=calls --icp-calls-topn=1 --inline-small-functions\
+# RUN: -o %t.null --lite=0 \
+# RUN: --inline-small-functions-bytes=4 --icp-inline --print-icp \
+# RUN: --data %t.fdata --enable-bat
+# RUN: llvm-bat-dump --dump-all %t.null | FileCheck %s --check-prefix=CHECK-BB-HASH
+# CHECK-BB-HASH: 0x17 -> 0xd hash: 0x5b24d9a60000
+# CHECK-BB-HASH: 0x1d -> 0xd hash: 0x5b24d9a60000
+# CHECK-BB-HASH: 0x1f -> 0xd hash: 0x5b24d9a60000
+
.globl bar
bar:
.cfi_startproc
|
|
Some background & related PRs: I believe the root cause behind both of these issues is actually the reason mentioned in this PR. In scenarios where new BBs are inserted, in addition to this PR, the first PR mentioned above is also required to fix the issue. |
cabf537 to
b6c62b6
Compare
| if (!BB) | ||
| continue; | ||
| // Update BB and BBHashMap using the original BB's index and hash. | ||
| BB->setHash(OriginBB.getHash()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Update inserted BBs' information as well, since they may be used as OriginBB later.
aaupov
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for investigating the issue. Generally I agree we should be associating the profile of ICP-created blocks with the original block containing the indirect call.
In terms of implementation, I think explicitly inserting the blocks into BBHashMap is not the right approach, as we'd need to do that in any optimization that copies or inserts blocks. Instead, since we associate the input address of the original indirect call with created blocks, let's use it to look up the block hash.
|
@aaupov HI, thanks for the suggestion! There's a subtle detail here: in ICP, the |
Currently, in BAT mode, if new BBs are added during the optimization pipeline (e.g., via ICP), their index and hash in the BAT section default to 0. The reason is that saveMetadata() executes before runOptimizationPasses() in order to get the unoptimized binary layout, and this results in incorrect information in the YAML profile (specifically duplicated zero BB index/hashes generated by first-round bat writer and continuous profile writer), which prevents the use of the infer-stale-profile option(code).
So this patch tries to resolve this by updating these newly inserted BBs using the index and hash from the original BB at the insertion point (during ICP). This helps to correctly map the new blocks back to the original binary information.