Skip to content

Conversation

@Jinjie-Huang
Copy link
Contributor

@Jinjie-Huang Jinjie-Huang commented Dec 7, 2025

Currently, in BAT mode, if new BBs are added during the optimization pipeline (e.g., via ICP), their index and hash in the BAT section default to 0. The reason is that saveMetadata() executes before runOptimizationPasses() in order to get the unoptimized binary layout, and this results in incorrect information in the YAML profile (specifically duplicated zero BB index/hashes generated by first-round bat writer and continuous profile writer), which prevents the use of the infer-stale-profile option(code).

So this patch tries to resolve this by updating these newly inserted BBs using the index and hash from the original BB at the insertion point (during ICP). This helps to correctly map the new blocks back to the original binary information.

@Jinjie-Huang Jinjie-Huang marked this pull request as ready for review December 7, 2025 15:28
@llvmbot llvmbot added the BOLT label Dec 7, 2025
@llvmbot
Copy link
Member

llvmbot commented Dec 7, 2025

@llvm/pr-subscribers-bolt

Author: Jinjie Huang (Jinjie-Huang)

Changes

Currently, in BAT mode, if new BBs are added during the optimization pipeline (e.g., via ICP), their index and hash in the BAT section default to 0. The reason is that saveMetadata() executes before runOptimizationPasses()(seem to get the unoptimized binary layout), and this results in incorrect information in the YAML profile generated by continuous profiling (specifically duplicated zero BB index/hashes), which prevents the use of the infer-stale-profile option(code ).

So this patch tries to resolve this by updating these newly inserted BBs using the index and hash of the original BB at the insertion point within ICP. This helps to correctly map the new blocks back to the original binary information.

Some background & related PRs:
Fix incorrect BB index mapping: PR (WIP)
Skip mapping zero-hash BBs in BAT: PR (WIP)
I believe the root cause behind both of these issues is actually the reason mentioned above.


Full diff: https://github.com/llvm/llvm-project/pull/171044.diff

6 Files Affected:

  • (modified) bolt/include/bolt/Core/BinaryContext.h (+7)
  • (modified) bolt/include/bolt/Profile/BoltAddressTranslation.h (+9)
  • (modified) bolt/lib/Passes/IndirectCallPromotion.cpp (+3)
  • (modified) bolt/lib/Profile/BoltAddressTranslation.cpp (+18)
  • (modified) bolt/lib/Rewrite/RewriteInstance.cpp (+1)
  • (modified) bolt/test/X86/icp-inline.s (+11)
diff --git a/bolt/include/bolt/Core/BinaryContext.h b/bolt/include/bolt/Core/BinaryContext.h
index 4872ccc0f5d7b..300238ed71b12 100644
--- a/bolt/include/bolt/Core/BinaryContext.h
+++ b/bolt/include/bolt/Core/BinaryContext.h
@@ -64,6 +64,7 @@ using namespace object;
 namespace bolt {
 
 class BinaryFunction;
+class BoltAddressTranslation;
 
 /// Information on loadable part of the file.
 struct SegmentInfo {
@@ -280,6 +281,8 @@ class BinaryContext {
   /// Internal helper for removing section name from a lookup table.
   void deregisterSectionName(const BinarySection &Section);
 
+  BoltAddressTranslation *BAT = nullptr;
+
 public:
   static Expected<std::unique_ptr<BinaryContext>> createBinaryContext(
       Triple TheTriple, std::shared_ptr<orc::SymbolStringPool> SSP,
@@ -1559,6 +1562,10 @@ class BinaryContext {
     return *IOAddressMap;
   }
 
+  void setBAT(BoltAddressTranslation *B) { BAT = B; }
+
+  BoltAddressTranslation *getBAT() const { return BAT; }
+
   raw_ostream &outs() const { return Logger.Out; }
 
   raw_ostream &errs() const { return Logger.Err; }
diff --git a/bolt/include/bolt/Profile/BoltAddressTranslation.h b/bolt/include/bolt/Profile/BoltAddressTranslation.h
index fcc578f35e322..54420951fb6a8 100644
--- a/bolt/include/bolt/Profile/BoltAddressTranslation.h
+++ b/bolt/include/bolt/Profile/BoltAddressTranslation.h
@@ -131,6 +131,15 @@ class BoltAddressTranslation {
   translateSymbol(const BinaryContext &BC, const MCSymbol &Symbol,
                   uint32_t InputOffset) const;
 
+  // Updates the BAT BBHashMap for a list of newly inserted \p BBs by
+  // mapping the InputOffset of each new block to the Index and Hash of the
+  // \p OriginBB. This ensures that new blocks generated by optimizations (like
+  // ICP) are treated as part of the original block for address translation
+  // purposes.
+  void updateInsertedBBHashMap(
+      const BinaryBasicBlock &OriginBB,
+      const std::vector<std::unique_ptr<BinaryBasicBlock>> &BBs);
+
 private:
   /// Helper to update \p Map by inserting one or more BAT entries reflecting
   /// \p BB for function located at \p FuncAddress. At least one entry will be
diff --git a/bolt/lib/Passes/IndirectCallPromotion.cpp b/bolt/lib/Passes/IndirectCallPromotion.cpp
index 8a01cb974c5da..48e85ade3e8a3 100644
--- a/bolt/lib/Passes/IndirectCallPromotion.cpp
+++ b/bolt/lib/Passes/IndirectCallPromotion.cpp
@@ -34,6 +34,7 @@ extern cl::OptionCategory BoltOptCategory;
 extern cl::opt<IndirectCallPromotionType> ICP;
 extern cl::opt<unsigned> Verbosity;
 extern cl::opt<unsigned> ExecutionCountThreshold;
+extern cl::opt<bool> EnableBAT;
 
 static cl::opt<unsigned> ICPJTRemainingPercentThreshold(
     "icp-jt-remaining-percent-threshold",
@@ -1401,6 +1402,8 @@ Error IndirectCallPromotion::runOnFunctions(BinaryContext &BC) {
         std::vector<std::unique_ptr<BinaryBasicBlock>> NewBBs =
             rewriteCall(*BB, Inst, std::move(ICPcode), MethodInfo.second);
 
+        if (opts::EnableBAT)
+          BC.getBAT()->updateInsertedBBHashMap(*BB, NewBBs);
         // Fix the CFG after inserting the new basic blocks.
         BinaryBasicBlock *MergeBlock =
             fixCFG(*BB, IsTailCall, IsJumpTable, std::move(NewBBs), Targets);
diff --git a/bolt/lib/Profile/BoltAddressTranslation.cpp b/bolt/lib/Profile/BoltAddressTranslation.cpp
index 7ad4e6a2e1411..a917a220e61a7 100644
--- a/bolt/lib/Profile/BoltAddressTranslation.cpp
+++ b/bolt/lib/Profile/BoltAddressTranslation.cpp
@@ -8,6 +8,7 @@
 
 #include "bolt/Profile/BoltAddressTranslation.h"
 #include "bolt/Core/BinaryFunction.h"
+#include "bolt/Utils/CommandLineOpts.h"
 #include "llvm/ADT/APInt.h"
 #include "llvm/Support/Errc.h"
 #include "llvm/Support/Error.h"
@@ -637,5 +638,22 @@ BoltAddressTranslation::translateSymbol(const BinaryContext &BC,
   return std::pair(ParentBF, SecondaryEntryId);
 }
 
+void BoltAddressTranslation::updateInsertedBBHashMap(
+    const BinaryBasicBlock &OriginBB,
+    const std::vector<std::unique_ptr<BinaryBasicBlock>> &BBs) {
+  if (!opts::EnableBAT || !OriginBB.getFunction() || !OriginBB.getHash())
+    return;
+
+  auto &BBHashMap = getBBHashMap(OriginBB.getFunction()->getAddress());
+
+  for (const auto &BB : BBs) {
+    if (!BB)
+      continue;
+    // Update BBHashMap using the original BB's index and hash.
+    BBHashMap.addEntry(BB->getInputOffset(), OriginBB.getIndex(),
+                       OriginBB.getHash());
+  }
+}
+
 } // namespace bolt
 } // namespace llvm
diff --git a/bolt/lib/Rewrite/RewriteInstance.cpp b/bolt/lib/Rewrite/RewriteInstance.cpp
index 067a3e6636f0b..67620f03a9c8e 100644
--- a/bolt/lib/Rewrite/RewriteInstance.cpp
+++ b/bolt/lib/Rewrite/RewriteInstance.cpp
@@ -447,6 +447,7 @@ RewriteInstance::RewriteInstance(ELFObjectFileBase *File, const int Argc,
                           BC->MII.get(), BC->MRI.get(), BC->STI.get())));
 
   BAT = std::make_unique<BoltAddressTranslation>();
+  BC->setBAT(BAT.get());
 
   if (opts::UpdateDebugSections)
     DebugInfoRewriter = std::make_unique<DWARFRewriter>(*BC);
diff --git a/bolt/test/X86/icp-inline.s b/bolt/test/X86/icp-inline.s
index c5106db5a5389..9c2802ef9730c 100644
--- a/bolt/test/X86/icp-inline.s
+++ b/bolt/test/X86/icp-inline.s
@@ -36,6 +36,17 @@
 # CHECK-ICP-INLINE:     callq *(%rcx,%rax,8)
 # CHECK-ICP-INLINE-NOT: callq bar
 # CHECK-ICP-INLINE:     End of Function "main"
+
+# Checks BB hash has been updated correctly after ICP optimization
+# RUN: llvm-bolt %t.exe --icp=calls --icp-calls-topn=1 --inline-small-functions\
+# RUN:   -o %t.null --lite=0 \
+# RUN:   --inline-small-functions-bytes=4 --icp-inline --print-icp \
+# RUN:   --data %t.fdata --enable-bat
+# RUN: llvm-bat-dump --dump-all %t.null | FileCheck %s --check-prefix=CHECK-BB-HASH
+# CHECK-BB-HASH: 0x17 -> 0xd hash: 0x5b24d9a60000
+# CHECK-BB-HASH: 0x1d -> 0xd hash: 0x5b24d9a60000
+# CHECK-BB-HASH: 0x1f -> 0xd hash: 0x5b24d9a60000
+
   .globl bar
 bar:
 	.cfi_startproc

@Jinjie-Huang
Copy link
Contributor Author

Jinjie-Huang commented Dec 8, 2025

Some background & related PRs:

  1. Fix incorrect BB index mapping: PR (WIP)
  2. Skip mapping inserted BBs in BAT as a fallback: PR (WIP)

I believe the root cause behind both of these issues is actually the reason mentioned in this PR. In scenarios where new BBs are inserted, in addition to this PR, the first PR mentioned above is also required to fix the issue.

@Jinjie-Huang Jinjie-Huang force-pushed the bolt_icp_update_hash_for_new_bbs branch from cabf537 to b6c62b6 Compare December 10, 2025 12:47
if (!BB)
continue;
// Update BB and BBHashMap using the original BB's index and hash.
BB->setHash(OriginBB.getHash());
Copy link
Contributor Author

@Jinjie-Huang Jinjie-Huang Dec 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update inserted BBs' information as well, since they may be used as OriginBB later.

Copy link
Contributor

@aaupov aaupov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for investigating the issue. Generally I agree we should be associating the profile of ICP-created blocks with the original block containing the indirect call.
In terms of implementation, I think explicitly inserting the blocks into BBHashMap is not the right approach, as we'd need to do that in any optimization that copies or inserts blocks. Instead, since we associate the input address of the original indirect call with created blocks, let's use it to look up the block hash.

@Jinjie-Huang
Copy link
Contributor Author

Jinjie-Huang commented Dec 13, 2025

@aaupov HI, thanks for the suggestion!

There's a subtle detail here: in ICP, the InputOffset for inserted BBs is derived from the original call instruction by default rather than the entry(code). This seems to make it difficult in BoltAddressTranslation::write() to look up the original BB's hash/index using the inserted/copied BB's InputOffset.
So, I'd like to propose that we could directly reuse the InputOffset of the original BB for any inserted/copied BBs(not sure how other passes handle this, but perhaps they should all be consistent). This would be also natively compatible with the final BAT writing process, and the hash/index of the original BB will then be automatically looked up in the BBHashMap, WDYT?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants