wdx727

wdx727 commented Sep 25, 2025

We have reworked the implementation that brings the "matching and inference" technique to Propeller. The new implementation avoids introducing new compilation parameters wherever possible and reuses Propeller's existing profile format instead of defining a new one. As a result, it remains fully compatible with how Propeller is used today and requires fewer code changes. For details, see the RFC: https://discourse.llvm.org/t/rfc-adding-matching-and-inference-functionality-to-propeller/86238.
We plan to land the changes in several pull requests (PRs). This is the first one; it adds the basic block hash to the SHT_LLVM_BB_ADDR_MAP section.


Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

@llvmbot
Member

llvmbot commented Sep 25, 2025

@llvm/pr-subscribers-llvm-mc
@llvm/pr-subscribers-backend-x86
@llvm/pr-subscribers-llvm-transforms
@llvm/pr-subscribers-objectyaml

@llvm/pr-subscribers-pgo

Author: None (wdx727)

Changes

We have reworked the implementation that brings the "matching and inference" technique to Propeller. The new implementation avoids introducing new compilation parameters wherever possible and reuses Propeller's existing profile format instead of defining a new one. As a result, it remains fully compatible with how Propeller is used today and requires fewer code changes. For details, see the RFC: https://discourse.llvm.org/t/rfc-adding-matching-and-inference-functionality-to-propeller/86238.


Patch is 59.29 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/160706.diff

24 Files Affected:

  • (added) llvm/include/llvm/CodeGen/BasicBlockMatchingAndInference.h (+50)
  • (modified) llvm/include/llvm/CodeGen/BasicBlockSectionsProfileReader.h (+37)
  • (added) llvm/include/llvm/CodeGen/MachineBlockHashInfo.h (+106)
  • (modified) llvm/include/llvm/CodeGen/Passes.h (+7)
  • (modified) llvm/include/llvm/InitializePasses.h (+2)
  • (modified) llvm/include/llvm/Object/ELFTypes.h (+9-6)
  • (modified) llvm/include/llvm/ObjectYAML/ELFYAML.h (+1)
  • (modified) llvm/include/llvm/Transforms/Utils/SampleProfileInference.h (+16)
  • (modified) llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp (+13-1)
  • (added) llvm/lib/CodeGen/BasicBlockMatchingAndInference.cpp (+168)
  • (modified) llvm/lib/CodeGen/BasicBlockSections.cpp (+82-3)
  • (modified) llvm/lib/CodeGen/BasicBlockSectionsProfileReader.cpp (+91)
  • (modified) llvm/lib/CodeGen/CMakeLists.txt (+2)
  • (added) llvm/lib/CodeGen/MachineBlockHashInfo.cpp (+111)
  • (modified) llvm/lib/CodeGen/TargetPassConfig.cpp (+16-1)
  • (modified) llvm/lib/Object/ELF.cpp (+2-1)
  • (modified) llvm/lib/ObjectYAML/ELFEmitter.cpp (+3)
  • (modified) llvm/lib/ObjectYAML/ELFYAML.cpp (+1)
  • (added) llvm/test/CodeGen/X86/basic-block-address-map-with-bb-hash.ll (+86)
  • (added) llvm/test/CodeGen/X86/basic-block-sections-clusters-with-match-infer.ll (+90)
  • (modified) llvm/test/tools/obj2yaml/ELF/bb-addr-map-pgo-analysis-map.yaml (+5)
  • (modified) llvm/test/tools/obj2yaml/ELF/bb-addr-map.yaml (+5)
  • (modified) llvm/unittests/Object/ELFObjectFileTest.cpp (+118)
  • (modified) llvm/unittests/Object/ELFTypesTest.cpp (+2-2)
diff --git a/llvm/include/llvm/CodeGen/BasicBlockMatchingAndInference.h b/llvm/include/llvm/CodeGen/BasicBlockMatchingAndInference.h
new file mode 100644
index 0000000000000..66209d7685ecc
--- /dev/null
+++ b/llvm/include/llvm/CodeGen/BasicBlockMatchingAndInference.h
@@ -0,0 +1,50 @@
+#ifndef LLVM_CODEGEN_BASIC_BLOCK_AND_INFERENCE_H
+#define LLVM_CODEGEN_BASIC_BLOCK_AND_INFERENCE_H
+
+#include "llvm/CodeGen/BasicBlockSectionsProfileReader.h"
+#include "llvm/CodeGen/MachineFunctionPass.h"
+#include "llvm/Transforms/Utils/SampleProfileInference.h"
+
+namespace llvm {
+
+class BasicBlockMatchingAndInference : public MachineFunctionPass {
+private:
+  using Edge = std::pair<const MachineBasicBlock *, const MachineBasicBlock *>;
+  using BlockWeightMap = DenseMap<const MachineBasicBlock *, uint64_t>;
+  using EdgeWeightMap = DenseMap<Edge, uint64_t>;
+  using BlockEdgeMap = DenseMap<const MachineBasicBlock *,
+                                SmallVector<const MachineBasicBlock *, 8>>;
+
+  struct WeightInfo {
+    // Weight of basic blocks.
+    BlockWeightMap BlockWeights;
+    // Weight of edges.
+    EdgeWeightMap EdgeWeights;
+  };
+
+public:
+  static char ID;
+  BasicBlockMatchingAndInference();
+
+  StringRef getPassName() const override {
+    return "Basic Block Matching and Inference";
+  }
+
+  void getAnalysisUsage(AnalysisUsage &AU) const override;
+
+  bool runOnMachineFunction(MachineFunction &F) override;
+
+  std::optional<WeightInfo> getWeightInfo(StringRef FuncName) const;
+
+private:
+  StringMap<WeightInfo> ProgramWeightInfo;
+
+  WeightInfo initWeightInfoByMatching(MachineFunction &MF);
+
+  void generateWeightInfoByInference(MachineFunction &MF,
+                                     WeightInfo &MatchWeight);
+};
+
+} // end namespace llvm
+
+#endif // LLVM_CODEGEN_BASIC_BLOCK_AND_INFERENCE_H
diff --git a/llvm/include/llvm/CodeGen/BasicBlockSectionsProfileReader.h b/llvm/include/llvm/CodeGen/BasicBlockSectionsProfileReader.h
index 08e6a0e3ef629..a27b921fb1205 100644
--- a/llvm/include/llvm/CodeGen/BasicBlockSectionsProfileReader.h
+++ b/llvm/include/llvm/CodeGen/BasicBlockSectionsProfileReader.h
@@ -31,6 +31,22 @@
 
 namespace llvm {
 
+using Edge = std::pair<uint64_t, uint64_t>;
+using BlockWeightMap = DenseMap<uint64_t, uint64_t>;
+using EdgeWeightMap = DenseMap<Edge, uint64_t>;
+using BlockHashMap = DenseMap<uint64_t, uint64_t>;
+
+// This represents the weights of basic blocks and edges, and the hashes of
+// basic blocks for one function.
+struct WeightAndHashInfo {
+  // Weight of basic blocks.
+  BlockWeightMap BlockWeights;
+  // Weight of edges.
+  EdgeWeightMap EdgeWeights;
+  // Hashes of basic blocks.
+  BlockHashMap BlockHashes;
+};
+
 // This struct represents the cluster information for a machine basic block,
 // which is specifed by a unique ID (`MachineBasicBlock::BBID`).
 struct BBClusterInfo {
@@ -98,6 +114,10 @@ class BasicBlockSectionsProfileReader {
   SmallVector<SmallVector<unsigned>>
   getClonePathsForFunction(StringRef FuncName) const;
 
+  // Returns the weight and hash info for the given function.
+  std::pair<bool, WeightAndHashInfo>
+  getWeightAndHashInfoForFunction(StringRef FuncName) const;
+
 private:
   StringRef getAliasName(StringRef FuncName) const {
     auto R = FuncAliasMap.find(FuncName);
@@ -118,6 +138,16 @@ class BasicBlockSectionsProfileReader {
   // positive integer.
   Expected<UniqueBBID> parseUniqueBBID(StringRef S) const;
 
+  // Parses the weights of basic blocks and edges.
+  Error parseWight(StringRef S, BlockWeightMap &BlockWeights,
+                   EdgeWeightMap &EdgeWeights);
+
+  // Parses the hash of basic block.
+  Error parseBBHash(StringRef S, BlockHashMap &BlockHashes);
+
+  // Parse a pair in the form of "xxx:xxx"
+  Expected<std::pair<uint64_t, uint64_t>> parsePairItem(StringRef S) const;
+
   // Reads the basic block sections profile for functions in this module.
   Error ReadProfile();
 
@@ -146,6 +176,10 @@ class BasicBlockSectionsProfileReader {
   // block in that cluster.
   StringMap<FunctionPathAndClusterInfo> ProgramPathAndClusterInfo;
 
+  // This contains the weights of basic blocks and edges, and the hashes of
+  // basic blocks of the whole program.
+  StringMap<WeightAndHashInfo> ProgramWeightAndHashInfo;
+
   // Some functions have alias names. We use this map to find the main alias
   // name which appears in ProgramPathAndClusterInfo as a key.
   StringMap<StringRef> FuncAliasMap;
@@ -204,6 +238,9 @@ class BasicBlockSectionsProfileReaderWrapperPass : public ImmutablePass {
   SmallVector<SmallVector<unsigned>>
   getClonePathsForFunction(StringRef FuncName) const;
 
+  std::pair<bool, WeightAndHashInfo>
+  getWeightAndHashInfoForFunction(StringRef FuncName) const;
+
   // Initializes the FunctionNameToDIFilename map for the current module and
   // then reads the profile for the matching functions.
   bool doInitialization(Module &M) override;
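The header above only declares `parsePairItem` for strings of the form "xxx:xxx"; its body is in the truncated part of the patch. As a rough, standalone illustration (not the reader's actual code), the sketch below splits such an item at the first `:` and requires both halves to be complete unsigned integers. `parseU64` and `parsePair` are hypothetical names, and base-10 input is an assumption; the real reader may use a different radix, particularly for hashes.

```cpp
#include <cassert>
#include <charconv>
#include <cstdint>
#include <optional>
#include <string_view>
#include <system_error>
#include <utility>

// Parses one unsigned base-10 integer (an assumption) that must consume the
// whole view; empty input, signs and trailing characters are rejected.
std::optional<uint64_t> parseU64(std::string_view S) {
  if (S.empty())
    return std::nullopt;
  uint64_t V = 0;
  auto [Ptr, Ec] = std::from_chars(S.data(), S.data() + S.size(), V);
  if (Ec != std::errc() || Ptr != S.data() + S.size())
    return std::nullopt;
  return V;
}

// Hypothetical counterpart of parsePairItem: splits "A:B" at the first ':'.
// "1:2:3" fails because the second half "2:3" is not a complete integer.
std::optional<std::pair<uint64_t, uint64_t>> parsePair(std::string_view S) {
  size_t Colon = S.find(':');
  if (Colon == std::string_view::npos)
    return std::nullopt;
  std::optional<uint64_t> First = parseU64(S.substr(0, Colon));
  std::optional<uint64_t> Second = parseU64(S.substr(Colon + 1));
  if (!First || !Second)
    return std::nullopt;
  return std::make_pair(*First, *Second);
}
```

Returning `std::optional` keeps the sketch dependency-free; the LLVM reader itself returns `Expected<std::pair<uint64_t, uint64_t>>` so that malformed lines can carry an error message.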
diff --git a/llvm/include/llvm/CodeGen/MachineBlockHashInfo.h b/llvm/include/llvm/CodeGen/MachineBlockHashInfo.h
new file mode 100644
index 0000000000000..5de1b567e0309
--- /dev/null
+++ b/llvm/include/llvm/CodeGen/MachineBlockHashInfo.h
@@ -0,0 +1,106 @@
+#ifndef LLVM_CODEGEN_MACHINEBLOCKHASHINFO_H
+#define LLVM_CODEGEN_MACHINEBLOCKHASHINFO_H
+
+#include "llvm/CodeGen/MachineFunctionPass.h"
+
+namespace llvm {
+
+/// An object wrapping several components of a basic block hash. The combined
+/// (blended) hash is represented and stored as one uint64_t, while individual
+/// components are of smaller size (e.g., uint16_t or uint8_t).
+struct BlendedBlockHash {
+private:
+  static uint64_t combineHashes(uint16_t Hash1, uint16_t Hash2, uint16_t Hash3,
+                                uint16_t Hash4) {
+    uint64_t Hash = 0;
+
+    Hash |= uint64_t(Hash4);
+    Hash <<= 16;
+
+    Hash |= uint64_t(Hash3);
+    Hash <<= 16;
+
+    Hash |= uint64_t(Hash2);
+    Hash <<= 16;
+
+    Hash |= uint64_t(Hash1);
+
+    return Hash;
+  }
+
+  static void parseHashes(uint64_t Hash, uint16_t &Hash1, uint16_t &Hash2,
+                          uint16_t &Hash3, uint16_t &Hash4) {
+    Hash1 = Hash & 0xffff;
+    Hash >>= 16;
+
+    Hash2 = Hash & 0xffff;
+    Hash >>= 16;
+
+    Hash3 = Hash & 0xffff;
+    Hash >>= 16;
+
+    Hash4 = Hash & 0xffff;
+    Hash >>= 16;
+  }
+
+public:
+  explicit BlendedBlockHash() {}
+
+  explicit BlendedBlockHash(uint64_t CombinedHash) {
+    parseHashes(CombinedHash, Offset, OpcodeHash, InstrHash, NeighborHash);
+  }
+
+  /// Combine the blended hash into uint64_t.
+  uint64_t combine() const {
+    return combineHashes(Offset, OpcodeHash, InstrHash, NeighborHash);
+  }
+
+  /// Compute a distance between two given blended hashes. The smaller the
+  /// distance, the more similar two blocks are. For identical basic blocks,
+  /// the distance is zero.
+  uint64_t distance(const BlendedBlockHash &BBH) const {
+    assert(OpcodeHash == BBH.OpcodeHash &&
+           "incorrect blended hash distance computation");
+    uint64_t Dist = 0;
+    // Account for NeighborHash
+    Dist += NeighborHash == BBH.NeighborHash ? 0 : 1;
+    Dist <<= 16;
+    // Account for InstrHash
+    Dist += InstrHash == BBH.InstrHash ? 0 : 1;
+    Dist <<= 16;
+    // Account for Offset
+    Dist += (Offset >= BBH.Offset ? Offset - BBH.Offset : BBH.Offset - Offset);
+    return Dist;
+  }
+
+  /// The offset of the basic block from the function start.
+  uint16_t Offset{0};
+  /// (Loose) Hash of the basic block instructions, excluding operands.
+  uint16_t OpcodeHash{0};
+  /// (Strong) Hash of the basic block instructions, including opcodes and
+  /// operands.
+  uint16_t InstrHash{0};
+  /// Hash of the (loose) basic block together with (loose) hashes of its
+  /// successors and predecessors.
+  uint16_t NeighborHash{0};
+};
+
+class MachineBlockHashInfo : public MachineFunctionPass {
+  DenseMap<unsigned, uint64_t> MBBHashInfo;
+
+public:
+  static char ID;
+  MachineBlockHashInfo();
+
+  StringRef getPassName() const override { return "Basic Block Hash Compute"; }
+
+  void getAnalysisUsage(AnalysisUsage &AU) const override;
+
+  bool runOnMachineFunction(MachineFunction &F) override;
+
+  uint64_t getMBBHash(const MachineBasicBlock &MBB);
+};
+
+} // end namespace llvm
+
+#endif // LLVM_CODEGEN_MACHINEBLOCKHASHINFO_H
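`BlendedBlockHash` packs four 16-bit components into one `uint64_t`, with `Offset` in the lowest 16 bits and `NeighborHash` in the highest, and `distance()` orders mismatches so that a differing `NeighborHash` outweighs a differing `InstrHash`, which in turn outweighs any offset difference. The standalone mirror below (`PackedBlockHash` is a hypothetical name, written without LLVM headers) reproduces that layout so the round trip and the distance ordering can be checked in isolation.

```cpp
#include <cassert>
#include <cstdint>

// Mirrors the layout used by BlendedBlockHash: Offset in bits 0-15,
// OpcodeHash in 16-31, InstrHash in 32-47, NeighborHash in 48-63.
struct PackedBlockHash {
  uint16_t Offset = 0;
  uint16_t OpcodeHash = 0;
  uint16_t InstrHash = 0;
  uint16_t NeighborHash = 0;

  uint64_t combine() const {
    return uint64_t(Offset) | (uint64_t(OpcodeHash) << 16) |
           (uint64_t(InstrHash) << 32) | (uint64_t(NeighborHash) << 48);
  }

  static PackedBlockHash parse(uint64_t H) {
    PackedBlockHash B;
    B.Offset = uint16_t(H & 0xffff);
    B.OpcodeHash = uint16_t((H >> 16) & 0xffff);
    B.InstrHash = uint16_t((H >> 32) & 0xffff);
    B.NeighborHash = uint16_t((H >> 48) & 0xffff);
    return B;
  }

  // Same weighting as BlendedBlockHash::distance: a NeighborHash mismatch
  // contributes 1 << 32, an InstrHash mismatch 1 << 16, and the absolute
  // offset difference (always < 1 << 16) fills the low bits.
  uint64_t distance(const PackedBlockHash &O) const {
    assert(OpcodeHash == O.OpcodeHash && "only comparable within a bucket");
    uint64_t Dist = NeighborHash == O.NeighborHash ? 0 : 1;
    Dist <<= 16;
    Dist += InstrHash == O.InstrHash ? 0 : 1;
    Dist <<= 16;
    Dist += Offset >= O.Offset ? Offset - O.Offset : O.Offset - Offset;
    return Dist;
  }
};
```

Because the offset delta always fits in 16 bits and the two mismatch flags sit at bits 16 and 32, distances compare lexicographically as (NeighborHash mismatch, InstrHash mismatch, offset delta).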
diff --git a/llvm/include/llvm/CodeGen/Passes.h b/llvm/include/llvm/CodeGen/Passes.h
index d214ab9306c2f..063dd43e80638 100644
--- a/llvm/include/llvm/CodeGen/Passes.h
+++ b/llvm/include/llvm/CodeGen/Passes.h
@@ -67,6 +67,13 @@ namespace llvm {
 
   MachineFunctionPass *createBasicBlockPathCloningPass();
 
+  /// createBasicBlockMatchingAndInferencePass - This pass enables matching
+  /// and inference when using Propeller.
+  MachineFunctionPass *createBasicBlockMatchingAndInferencePass();
+
+  /// createMachineBlockHashInfoPass - This pass computes basic block hashes.
+  MachineFunctionPass *createMachineBlockHashInfoPass();
+
   /// createMachineFunctionSplitterPass - This pass splits machine functions
   /// using profile information.
   MachineFunctionPass *createMachineFunctionSplitterPass();
diff --git a/llvm/include/llvm/InitializePasses.h b/llvm/include/llvm/InitializePasses.h
index 1ce36a95317b4..3172b135426f6 100644
--- a/llvm/include/llvm/InitializePasses.h
+++ b/llvm/include/llvm/InitializePasses.h
@@ -53,6 +53,7 @@ void initializeAlwaysInlinerLegacyPassPass(PassRegistry &);
 void initializeAssignmentTrackingAnalysisPass(PassRegistry &);
 void initializeAssumptionCacheTrackerPass(PassRegistry &);
 void initializeAtomicExpandLegacyPass(PassRegistry &);
+void initializeBasicBlockMatchingAndInferencePass(PassRegistry &);
 void initializeBasicBlockPathCloningPass(PassRegistry &);
 void initializeBasicBlockSectionsProfileReaderWrapperPassPass(PassRegistry &);
 void initializeBasicBlockSectionsPass(PassRegistry &);
@@ -185,6 +186,7 @@ void initializeMIRCanonicalizerPass(PassRegistry &);
 void initializeMIRNamerPass(PassRegistry &);
 void initializeMIRPrintingPassPass(PassRegistry &);
 void initializeMachineBlockFrequencyInfoWrapperPassPass(PassRegistry &);
+void initializeMachineBlockHashInfoPass(PassRegistry &);
 void initializeMachineBlockPlacementLegacyPass(PassRegistry &);
 void initializeMachineBlockPlacementStatsLegacyPass(PassRegistry &);
 void initializeMachineBranchProbabilityInfoWrapperPassPass(PassRegistry &);
diff --git a/llvm/include/llvm/Object/ELFTypes.h b/llvm/include/llvm/Object/ELFTypes.h
index 87e4dbe448091..bbf07d87bb318 100644
--- a/llvm/include/llvm/Object/ELFTypes.h
+++ b/llvm/include/llvm/Object/ELFTypes.h
@@ -831,6 +831,7 @@ struct BBAddrMap {
     bool BrProb : 1;
     bool MultiBBRange : 1;
     bool OmitBBEntries : 1;
+    bool BBHash : 1;
 
     bool hasPGOAnalysis() const { return FuncEntryCount || BBFreq || BrProb; }
 
@@ -842,7 +843,8 @@ struct BBAddrMap {
              (static_cast<uint8_t>(BBFreq) << 1) |
              (static_cast<uint8_t>(BrProb) << 2) |
              (static_cast<uint8_t>(MultiBBRange) << 3) |
-             (static_cast<uint8_t>(OmitBBEntries) << 4);
+             (static_cast<uint8_t>(OmitBBEntries) << 4) |
+             (static_cast<uint8_t>(BBHash) << 5);
     }
 
     // Decodes from minimum bit width representation and validates no
@@ -851,7 +853,7 @@ struct BBAddrMap {
       Features Feat{
           static_cast<bool>(Val & (1 << 0)), static_cast<bool>(Val & (1 << 1)),
           static_cast<bool>(Val & (1 << 2)), static_cast<bool>(Val & (1 << 3)),
-          static_cast<bool>(Val & (1 << 4))};
+          static_cast<bool>(Val & (1 << 4)), static_cast<bool>(Val & (1 << 5))};
       if (Feat.encode() != Val)
         return createStringError(
             std::error_code(), "invalid encoding for BBAddrMap::Features: 0x%x",
@@ -861,9 +863,9 @@ struct BBAddrMap {
 
     bool operator==(const Features &Other) const {
       return std::tie(FuncEntryCount, BBFreq, BrProb, MultiBBRange,
-                      OmitBBEntries) ==
+                      OmitBBEntries, BBHash) ==
              std::tie(Other.FuncEntryCount, Other.BBFreq, Other.BrProb,
-                      Other.MultiBBRange, Other.OmitBBEntries);
+                      Other.MultiBBRange, Other.OmitBBEntries, Other.BBHash);
     }
   };
 
@@ -914,9 +916,10 @@ struct BBAddrMap {
     uint32_t Size = 0;   // Size of the basic block.
     Metadata MD = {false, false, false, false,
                    false}; // Metdata for this basic block.
+    uint64_t Hash = 0;     // Hash for this basic block.
 
-    BBEntry(uint32_t ID, uint32_t Offset, uint32_t Size, Metadata MD)
-        : ID(ID), Offset(Offset), Size(Size), MD(MD){};
+    BBEntry(uint32_t ID, uint32_t Offset, uint32_t Size, Metadata MD, uint64_t Hash = 0)
+        : ID(ID), Offset(Offset), Size(Size), MD(MD), Hash(Hash){};
 
     bool operator==(const BBEntry &Other) const {
       return ID == Other.ID && Offset == Other.Offset && Size == Other.Size &&
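With this patch, `BBAddrMap::Features` gains a sixth flag, `BBHash`, at bit 5. The sketch below (a standalone stand-in named `FeatureBits`, not the LLVM type) shows the encoding and the round-trip validation that `decode` performs: any bit that does not survive decode followed by encode makes the value invalid.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Standalone stand-in for BBAddrMap::Features after this patch.
struct FeatureBits {
  bool FuncEntryCount = false; // bit 0
  bool BBFreq = false;         // bit 1
  bool BrProb = false;         // bit 2
  bool MultiBBRange = false;   // bit 3
  bool OmitBBEntries = false;  // bit 4
  bool BBHash = false;         // bit 5, added by this patch

  uint8_t encode() const {
    return static_cast<uint8_t>(static_cast<uint8_t>(FuncEntryCount) |
                                (static_cast<uint8_t>(BBFreq) << 1) |
                                (static_cast<uint8_t>(BrProb) << 2) |
                                (static_cast<uint8_t>(MultiBBRange) << 3) |
                                (static_cast<uint8_t>(OmitBBEntries) << 4) |
                                (static_cast<uint8_t>(BBHash) << 5));
  }

  // Decodes the known bits, then rejects the value if re-encoding does not
  // reproduce it, i.e. if an unknown bit (6 or 7) was set.
  static std::optional<FeatureBits> decode(uint8_t Val) {
    FeatureBits F;
    F.FuncEntryCount = (Val & (1 << 0)) != 0;
    F.BBFreq = (Val & (1 << 1)) != 0;
    F.BrProb = (Val & (1 << 2)) != 0;
    F.MultiBBRange = (Val & (1 << 3)) != 0;
    F.OmitBBEntries = (Val & (1 << 4)) != 0;
    F.BBHash = (Val & (1 << 5)) != 0;
    if (F.encode() != Val)
      return std::nullopt;
    return F;
  }
};
```

One consequence of this validation scheme, assuming readers keep the round-trip check: a decoder that only knows bits 0 to 4 rejects a feature byte with bit 5 set instead of silently ignoring the extra hash field.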
diff --git a/llvm/include/llvm/ObjectYAML/ELFYAML.h b/llvm/include/llvm/ObjectYAML/ELFYAML.h
index dfdfa055d65fa..9427042db4303 100644
--- a/llvm/include/llvm/ObjectYAML/ELFYAML.h
+++ b/llvm/include/llvm/ObjectYAML/ELFYAML.h
@@ -162,6 +162,7 @@ struct BBAddrMapEntry {
     llvm::yaml::Hex64 AddressOffset;
     llvm::yaml::Hex64 Size;
     llvm::yaml::Hex64 Metadata;
+    llvm::yaml::Hex64 Hash;
   };
   uint8_t Version;
   llvm::yaml::Hex8 Feature;
diff --git a/llvm/include/llvm/Transforms/Utils/SampleProfileInference.h b/llvm/include/llvm/Transforms/Utils/SampleProfileInference.h
index 7231e45fe8eb7..2b4db171bfdfb 100644
--- a/llvm/include/llvm/Transforms/Utils/SampleProfileInference.h
+++ b/llvm/include/llvm/Transforms/Utils/SampleProfileInference.h
@@ -130,6 +130,11 @@ template <typename FT> class SampleProfileInference {
   SampleProfileInference(FunctionT &F, BlockEdgeMap &Successors,
                          BlockWeightMap &SampleBlockWeights)
       : F(F), Successors(Successors), SampleBlockWeights(SampleBlockWeights) {}
+  SampleProfileInference(FunctionT &F, BlockEdgeMap &Successors,
+                         BlockWeightMap &SampleBlockWeights,
+                         EdgeWeightMap &SampleEdgeWeights)
+      : F(F), Successors(Successors), SampleBlockWeights(SampleBlockWeights),
+        SampleEdgeWeights(SampleEdgeWeights) {}
 
   /// Apply the profile inference algorithm for a given function
   void apply(BlockWeightMap &BlockWeights, EdgeWeightMap &EdgeWeights);
@@ -157,6 +162,9 @@ template <typename FT> class SampleProfileInference {
 
   /// Map basic blocks to their sampled weights.
   BlockWeightMap &SampleBlockWeights;
+
+  /// Map edges to their sampled weights.
+  EdgeWeightMap SampleEdgeWeights;
 };
 
 template <typename BT>
@@ -266,6 +274,14 @@ FlowFunction SampleProfileInference<BT>::createFlowFunction(
       FlowJump Jump;
       Jump.Source = BlockIndex[BB];
       Jump.Target = BlockIndex[Succ];
+      auto It = SampleEdgeWeights.find(std::make_pair(BB, Succ));
+      if (It != SampleEdgeWeights.end()) {
+        Jump.HasUnknownWeight = false;
+        Jump.Weight = It->second;
+      } else {
+        Jump.HasUnknownWeight = true;
+        Jump.Weight = 0;
+      }
       Func.Jumps.push_back(Jump);
     }
   }
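The `createFlowFunction` change copies a sampled edge weight into the corresponding `FlowJump` when one exists and otherwise marks the jump as unknown, leaving its flow to the inference algorithm. The sketch below reproduces just that lookup with simplified, hypothetical types (`Jump`, `EdgeKey`, `buildJumps`) and integer block indices instead of `MachineBasicBlock` pointers.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Simplified stand-in for FlowJump: only the fields touched by the patch.
struct Jump {
  unsigned Source = 0;
  unsigned Target = 0;
  bool HasUnknownWeight = true;
  uint64_t Weight = 0;
};

using EdgeKey = std::pair<unsigned, unsigned>;

// For each CFG edge, use the sampled weight when one exists; otherwise leave
// the jump marked unknown (weight 0) so inference can pick its flow.
std::vector<Jump> buildJumps(const std::vector<EdgeKey> &Edges,
                             const std::map<EdgeKey, uint64_t> &SampleEdges) {
  std::vector<Jump> Jumps;
  for (const EdgeKey &E : Edges) {
    Jump J;
    J.Source = E.first;
    J.Target = E.second;
    auto It = SampleEdges.find(E);
    if (It != SampleEdges.end()) {
      J.HasUnknownWeight = false;
      J.Weight = It->second;
    }
    Jumps.push_back(J);
  }
  return Jumps;
}
```

Note that the new `SampleEdgeWeights` member is declared by value (`EdgeWeightMap SampleEdgeWeights;`) while `SampleBlockWeights` is a reference, so the edge map passed to the new constructor is copied.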
diff --git a/llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp b/llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp
index bdcd54a135da9..41c084a4e4e49 100644
--- a/llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp
+++ b/llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp
@@ -40,6 +40,7 @@
 #include "llvm/CodeGen/GCMetadataPrinter.h"
 #include "llvm/CodeGen/LazyMachineBlockFrequencyInfo.h"
 #include "llvm/CodeGen/MachineBasicBlock.h"
+#include "llvm/CodeGen/MachineBlockHashInfo.h"
 #include "llvm/CodeGen/MachineBranchProbabilityInfo.h"
 #include "llvm/CodeGen/MachineConstantPool.h"
 #include "llvm/CodeGen/MachineDominators.h"
@@ -180,6 +181,8 @@ static cl::opt<bool> PrintLatency(
     cl::desc("Print instruction latencies as verbose asm comments"), cl::Hidden,
     cl::init(false));
 
+extern cl::opt<bool> EmitBBHash;
+
 STATISTIC(EmittedInsts, "Number of machine instrs printed");
 
 char AsmPrinter::ID = 0;
@@ -454,6 +457,8 @@ void AsmPrinter::getAnalysisUsage(AnalysisUsage &AU) const {
   AU.addRequired<GCModuleInfo>();
   AU.addRequired<LazyMachineBlockFrequencyInfoPass>();
   AU.addRequired<MachineBranchProbabilityInfoWrapperPass>();
+  if (EmitBBHash)
+    AU.addRequired<MachineBlockHashInfo>();
 }
 
 bool AsmPrinter::doInitialization(Module &M) {
@@ -1419,7 +1424,8 @@ getBBAddrMapFeature(const MachineFunction &MF, int NumMBBSectionRanges) {
   }
   return {FuncEntryCountEnabled, BBFreqEnabled, BrProbEnabled,
           MF.hasBBSections() && NumMBBSectionRanges > 1,
-          static_cast<bool>(BBAddrMapSkipEmitBBEntries)};
+          static_cast<bool>(BBAddrMapSkipEmitBBEntries),
+          static_cast<bool>(EmitBBHash)};
 }
 
 void AsmPrinter::emitBBAddrMapSection(const MachineFunction &MF) {
@@ -1477,6 +1483,8 @@ void AsmPrinter::emitBBAddrMapSection(const MachineFunction &MF) {
       PrevMBBEndSymbol = MBBSymbol;
     }
 
+    auto MBHI = Features.BBHash ? &getAnalysis<MachineBlockHashInfo>() : nullptr;
+
     if (!Features.OmitBBEntries) {
       // TODO: Remove this check when version 1 is deprecated.
       if (BBAddrMapVersion > 1) {
@@ -1496,6 +1504,10 @@ void AsmPrinter::emitBBAddrMapSection(const MachineFunction &MF) {
       emitLabelDifferenceAsULEB128(MBB.getEndSymbol(), MBBSymbol);
       // Emit the Metadata.
       OutStreamer->emitULEB128IntValue(getBBAddrMapMetadata(MBB));
+      // Emit the Hash.
+      if (MBHI) {
+        OutStreamer->emitULEB128IntValue(MBHI->getMBBHash(MBB));
+      }
     }
 
     PrevMBBEndSymbol = MBB.getEndSymbol();
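When `Features.BBHash` is set, `emitBBAddrMapSection` appends one extra ULEB128 value, the block hash from `MachineBlockHashInfo`, after each block's metadata. The sketch below serializes a single entry in the field order visible in this hunk (ID, offset, size, metadata, optional hash) into a byte vector. `appendULEB128` and `appendBBEntry` are hypothetical helpers; the sketch assumes a map version that emits the ID field and takes the offset as an already-computed value rather than a label difference.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Unsigned LEB128: 7 payload bits per byte, high bit set on all but the last.
void appendULEB128(std::vector<uint8_t> &Out, uint64_t Value) {
  do {
    uint8_t Byte = static_cast<uint8_t>(Value & 0x7f);
    Value >>= 7;
    if (Value != 0)
      Byte |= 0x80;
    Out.push_back(Byte);
  } while (Value != 0);
}

// Hypothetical helper: one BB entry, with the hash appended only when the
// BBHash feature is enabled (represented here by a non-empty optional).
void appendBBEntry(std::vector<uint8_t> &Out, uint32_t ID, uint32_t Offset,
                   uint32_t Size, uint32_t Metadata,
                   std::optional<uint64_t> Hash) {
  appendULEB128(Out, ID);
  appendULEB128(Out, Offset);
  appendULEB128(Out, Size);
  appendULEB128(Out, Metadata);
  if (Hash)
    appendULEB128(Out, *Hash);
}
```

Since the blended hash stores `NeighborHash` in its top 16 bits, most hashes will take 9 or 10 ULEB128 bytes (assuming the 16-bit components are roughly uniformly distributed), so enabling the feature adds about that much per basic block to the section.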
diff --git a/llvm/lib/CodeGen/BasicBlockMatchingAndInference.cpp b/llvm/lib/CodeGen/BasicBlockMatchingAndInference.cpp
new file mode 100644
index 0000000000000..e2776162043ff
--- /dev/null
+++ b/llvm/lib/CodeGen/BasicBlockMatchingAndInference.cpp
@@ -0,0 +1,168 @@
+#include "llvm/CodeGen/BasicBlockMatchingAndInference.h"
+#include "llvm/CodeGen/BasicBlockSectionsProfileReader.h"
+#include "llvm/CodeGen/MachineBlockHashInfo.h"
+#include "llvm/CodeGen/Passes.h"
+#include "llvm/InitializePasses.h"
+#include "llvm/Support/CommandLine.h"
+
+using namespace llvm;
+
+static cl::opt<float>
+    PropellerInferThreshold("propeller-infer-threshold",
+                            cl::desc("Threshold for inferring a stale profile"),
+                            cl::init(0.6), cl::Optional);
+
+/// The object is used to identify and match basic blocks given their hashes.
+class StaleMatcher {
+public:
+  /// Initialize stale matcher.
+  void init(const std::vector<MachineBasicBlock *> &Blocks,
+            const std::vector<BlendedBlockHash> &Hashes) {
+    assert(Blocks.size() == Hashes.size() &&
+           "incorrect matcher initialization");
+    for (size_t I = 0; I < Blocks.size(); I++) {
+      MachineBasicBlock *Block = Blocks[I];
+      uint16_t OpHash = Hashes[I].OpcodeHash;
+      OpHashToBlocks[OpHash].push_back(std::make_pair(Hashes[I], Block));
+    }
+  }
+
+  /// Find the most similar block for a given hash.
+  MachineBasicBlock *matchBlock(BlendedBlockHash BlendedHash) const {
+    auto BlockIt = OpHashToBlocks.find(BlendedHash.OpcodeHash);
+    if (BlockIt == OpHashToBlocks.end()) {
+      return nullptr;
+    }
+    MachineBasicBlock *BestBlock = nullptr;
+    uint64_t BestDist = std::numeric_limits<uint64_t>::max();
+    for (const auto &It : BlockIt->second) {
+      MachineBasicBlock *Block = It.second;
+      BlendedBlockHash Hash = It.first;
+      uint64_t Dist = Hash.distance(BlendedHash);
+      if (BestBlock == nullptr || Dist < BestDist) {
+        BestDist = Dist;
+        BestBlock = Block;
+      }
+    }
+    return BestBlock;
+  }
+
+private:
+  using HashBlockPairType = std::pair<BlendedBlockHash, MachineBasicBlock *>;
+  std::unordered_map<uint16_t, std::vector<HashBlockPairType>> OpHashToBlocks;
+};
+
+INITIALIZE_PASS_BEGIN(BasicBlockMatchingAndInference,
+                      "machine-block-match-infer",
+                      "Machine Block Matching and Inference Analysis", true,
+                      true)
+INITIALIZE_PASS_DEPENDENCY(MachineBlockHashInfo)
+INITIALIZE_PASS_DEPENDENCY(BasicBlockSectionsProfileReaderWrapperPass)
+INITIALIZE_PASS_END(BasicBlockMatchingAndInference, "machine-block-match-infer",
+                    "Machine Block Matching and Inference Analysis", true, true)
+
+char BasicBlockMatch...
[truncated]
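The (truncated) `StaleMatcher` above buckets the current function's blocks by `OpcodeHash` and, for a profiled hash, returns the block in the same bucket with the smallest `BlendedBlockHash::distance`. The standalone sketch below mirrors that lookup with plain structs; `BlockHash`, `hashDistance` and `SketchMatcher` are hypothetical names, and blocks are identified by index rather than by `MachineBasicBlock *`.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <unordered_map>
#include <utility>
#include <vector>

// Minimal hash record holding the four blended-hash components.
struct BlockHash {
  uint16_t Offset, OpcodeHash, InstrHash, NeighborHash;
};

// Same weighting as BlendedBlockHash::distance (opcode hashes assumed equal).
uint64_t hashDistance(const BlockHash &A, const BlockHash &B) {
  uint64_t Dist = A.NeighborHash == B.NeighborHash ? 0 : 1;
  Dist <<= 16;
  Dist += A.InstrHash == B.InstrHash ? 0 : 1;
  Dist <<= 16;
  Dist += A.Offset >= B.Offset ? A.Offset - B.Offset : B.Offset - A.Offset;
  return Dist;
}

class SketchMatcher {
public:
  // Buckets the current function's blocks by their loose opcode hash.
  void init(const std::vector<BlockHash> &Hashes) {
    for (int I = 0, E = static_cast<int>(Hashes.size()); I < E; ++I)
      Buckets[Hashes[I].OpcodeHash].push_back({Hashes[I], I});
  }

  // Returns the index of the closest block with the same opcode hash, or -1
  // when no block shares it. Ties keep the first block inserted.
  int match(const BlockHash &Profiled) const {
    auto It = Buckets.find(Profiled.OpcodeHash);
    if (It == Buckets.end())
      return -1;
    int Best = -1;
    uint64_t BestDist = std::numeric_limits<uint64_t>::max();
    for (const auto &[Hash, Index] : It->second) {
      uint64_t Dist = hashDistance(Hash, Profiled);
      if (Best == -1 || Dist < BestDist) {
        BestDist = Dist;
        Best = Index;
      }
    }
    return Best;
  }

private:
  std::unordered_map<uint16_t, std::vector<std::pair<BlockHash, int>>> Buckets;
};
```

As in the original, a profiled block whose opcode hash has no counterpart in the current function stays unmatched (here `-1`), presumably leaving its weight to the inference step.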

@llvmbot
Member

llvmbot commented Sep 25, 2025

@llvm/pr-subscribers-llvm-binary-utilities
+  }
+
+  /// Compute a distance between two given blended hashes. The smaller the
+  /// distance, the more similar two blocks are. For identical basic blocks,
+  /// the distance is zero.
+  uint64_t distance(const BlendedBlockHash &BBH) const {
+    assert(OpcodeHash == BBH.OpcodeHash &&
+           "incorrect blended hash distance computation");
+    uint64_t Dist = 0;
+    // Account for NeighborHash
+    Dist += NeighborHash == BBH.NeighborHash ? 0 : 1;
+    Dist <<= 16;
+    // Account for InstrHash
+    Dist += InstrHash == BBH.InstrHash ? 0 : 1;
+    Dist <<= 16;
+    // Account for Offset
+    Dist += (Offset >= BBH.Offset ? Offset - BBH.Offset : BBH.Offset - Offset);
+    return Dist;
+  }
+
+  /// The offset of the basic block from the function start.
+  uint16_t Offset{0};
+  /// (Loose) Hash of the basic block instructions, excluding operands.
+  uint16_t OpcodeHash{0};
+  /// (Strong) Hash of the basic block instructions, including opcodes and
+  /// operands.
+  uint16_t InstrHash{0};
+  /// Hash of the (loose) basic block together with (loose) hashes of its
+  /// successors and predecessors.
+  uint16_t NeighborHash{0};
+};
+
+class MachineBlockHashInfo : public MachineFunctionPass {
+  DenseMap<unsigned, uint64_t> MBBHashInfo;
+
+public:
+  static char ID;
+  MachineBlockHashInfo();
+
+  StringRef getPassName() const override { return "Basic Block Hash Compute"; }
+
+  void getAnalysisUsage(AnalysisUsage &AU) const override;
+
+  bool runOnMachineFunction(MachineFunction &F) override;
+
+  uint64_t getMBBHash(const MachineBasicBlock &MBB);
+};
+
+} // end namespace llvm
+
+#endif // LLVM_CODEGEN_MACHINEBLOCKHASHINFO_H
diff --git a/llvm/include/llvm/CodeGen/Passes.h b/llvm/include/llvm/CodeGen/Passes.h
index d214ab9306c2f..063dd43e80638 100644
--- a/llvm/include/llvm/CodeGen/Passes.h
+++ b/llvm/include/llvm/CodeGen/Passes.h
@@ -67,6 +67,13 @@ namespace llvm {
 
   MachineFunctionPass *createBasicBlockPathCloningPass();
 
+  /// createBasicBlockMatchingAndInferencePass - This pass enables matching
+  /// and inference when using Propeller.
+  MachineFunctionPass *createBasicBlockMatchingAndInferencePass();
+
+  /// createMachineBlockHashInfoPass - This pass computes basic block hashes.
+  MachineFunctionPass *createMachineBlockHashInfoPass();
+
   /// createMachineFunctionSplitterPass - This pass splits machine functions
   /// using profile information.
   MachineFunctionPass *createMachineFunctionSplitterPass();
diff --git a/llvm/include/llvm/InitializePasses.h b/llvm/include/llvm/InitializePasses.h
index 1ce36a95317b4..3172b135426f6 100644
--- a/llvm/include/llvm/InitializePasses.h
+++ b/llvm/include/llvm/InitializePasses.h
@@ -53,6 +53,7 @@ void initializeAlwaysInlinerLegacyPassPass(PassRegistry &);
 void initializeAssignmentTrackingAnalysisPass(PassRegistry &);
 void initializeAssumptionCacheTrackerPass(PassRegistry &);
 void initializeAtomicExpandLegacyPass(PassRegistry &);
+void initializeBasicBlockMatchingAndInferencePass(PassRegistry &);
 void initializeBasicBlockPathCloningPass(PassRegistry &);
 void initializeBasicBlockSectionsProfileReaderWrapperPassPass(PassRegistry &);
 void initializeBasicBlockSectionsPass(PassRegistry &);
@@ -185,6 +186,7 @@ void initializeMIRCanonicalizerPass(PassRegistry &);
 void initializeMIRNamerPass(PassRegistry &);
 void initializeMIRPrintingPassPass(PassRegistry &);
 void initializeMachineBlockFrequencyInfoWrapperPassPass(PassRegistry &);
+void initializeMachineBlockHashInfoPass(PassRegistry &);
 void initializeMachineBlockPlacementLegacyPass(PassRegistry &);
 void initializeMachineBlockPlacementStatsLegacyPass(PassRegistry &);
 void initializeMachineBranchProbabilityInfoWrapperPassPass(PassRegistry &);
diff --git a/llvm/include/llvm/Object/ELFTypes.h b/llvm/include/llvm/Object/ELFTypes.h
index 87e4dbe448091..bbf07d87bb318 100644
--- a/llvm/include/llvm/Object/ELFTypes.h
+++ b/llvm/include/llvm/Object/ELFTypes.h
@@ -831,6 +831,7 @@ struct BBAddrMap {
     bool BrProb : 1;
     bool MultiBBRange : 1;
     bool OmitBBEntries : 1;
+    bool BBHash : 1;
 
     bool hasPGOAnalysis() const { return FuncEntryCount || BBFreq || BrProb; }
 
@@ -842,7 +843,8 @@ struct BBAddrMap {
              (static_cast<uint8_t>(BBFreq) << 1) |
              (static_cast<uint8_t>(BrProb) << 2) |
              (static_cast<uint8_t>(MultiBBRange) << 3) |
-             (static_cast<uint8_t>(OmitBBEntries) << 4);
+             (static_cast<uint8_t>(OmitBBEntries) << 4) | 
+             (static_cast<uint8_t>(BBHash) << 5);
     }
 
     // Decodes from minimum bit width representation and validates no
@@ -851,7 +853,7 @@ struct BBAddrMap {
       Features Feat{
           static_cast<bool>(Val & (1 << 0)), static_cast<bool>(Val & (1 << 1)),
           static_cast<bool>(Val & (1 << 2)), static_cast<bool>(Val & (1 << 3)),
-          static_cast<bool>(Val & (1 << 4))};
+          static_cast<bool>(Val & (1 << 4)), static_cast<bool>(Val & (1 << 5))};
       if (Feat.encode() != Val)
         return createStringError(
             std::error_code(), "invalid encoding for BBAddrMap::Features: 0x%x",
@@ -861,9 +863,9 @@ struct BBAddrMap {
 
     bool operator==(const Features &Other) const {
       return std::tie(FuncEntryCount, BBFreq, BrProb, MultiBBRange,
-                      OmitBBEntries) ==
+                      OmitBBEntries, BBHash) ==
              std::tie(Other.FuncEntryCount, Other.BBFreq, Other.BrProb,
-                      Other.MultiBBRange, Other.OmitBBEntries);
+                      Other.MultiBBRange, Other.OmitBBEntries, Other.BBHash);
     }
   };
 
@@ -914,9 +916,10 @@ struct BBAddrMap {
     uint32_t Size = 0;   // Size of the basic block.
     Metadata MD = {false, false, false, false,
                    false}; // Metdata for this basic block.
+    uint64_t Hash = 0;     // Hash for this basic block.
 
-    BBEntry(uint32_t ID, uint32_t Offset, uint32_t Size, Metadata MD)
-        : ID(ID), Offset(Offset), Size(Size), MD(MD){};
+    BBEntry(uint32_t ID, uint32_t Offset, uint32_t Size, Metadata MD, uint64_t Hash = 0)
+        : ID(ID), Offset(Offset), Size(Size), MD(MD), Hash(Hash){};
 
     bool operator==(const BBEntry &Other) const {
       return ID == Other.ID && Offset == Other.Offset && Size == Other.Size &&
diff --git a/llvm/include/llvm/ObjectYAML/ELFYAML.h b/llvm/include/llvm/ObjectYAML/ELFYAML.h
index dfdfa055d65fa..9427042db4303 100644
--- a/llvm/include/llvm/ObjectYAML/ELFYAML.h
+++ b/llvm/include/llvm/ObjectYAML/ELFYAML.h
@@ -162,6 +162,7 @@ struct BBAddrMapEntry {
     llvm::yaml::Hex64 AddressOffset;
     llvm::yaml::Hex64 Size;
     llvm::yaml::Hex64 Metadata;
+    llvm::yaml::Hex64 Hash;
   };
   uint8_t Version;
   llvm::yaml::Hex8 Feature;
diff --git a/llvm/include/llvm/Transforms/Utils/SampleProfileInference.h b/llvm/include/llvm/Transforms/Utils/SampleProfileInference.h
index 7231e45fe8eb7..2b4db171bfdfb 100644
--- a/llvm/include/llvm/Transforms/Utils/SampleProfileInference.h
+++ b/llvm/include/llvm/Transforms/Utils/SampleProfileInference.h
@@ -130,6 +130,11 @@ template <typename FT> class SampleProfileInference {
   SampleProfileInference(FunctionT &F, BlockEdgeMap &Successors,
                          BlockWeightMap &SampleBlockWeights)
       : F(F), Successors(Successors), SampleBlockWeights(SampleBlockWeights) {}
+  SampleProfileInference(FunctionT &F, BlockEdgeMap &Successors,
+                         BlockWeightMap &SampleBlockWeights,
+                         EdgeWeightMap &SampleEdgeWeights)
+      : F(F), Successors(Successors), SampleBlockWeights(SampleBlockWeights), 
+        SampleEdgeWeights(SampleEdgeWeights) {}
 
   /// Apply the profile inference algorithm for a given function
   void apply(BlockWeightMap &BlockWeights, EdgeWeightMap &EdgeWeights);
@@ -157,6 +162,9 @@ template <typename FT> class SampleProfileInference {
 
   /// Map basic blocks to their sampled weights.
   BlockWeightMap &SampleBlockWeights;
+
+  /// Map edges to their sampled weights.
+  EdgeWeightMap SampleEdgeWeights;
 };
 
 template <typename BT>
@@ -266,6 +274,14 @@ FlowFunction SampleProfileInference<BT>::createFlowFunction(
       FlowJump Jump;
       Jump.Source = BlockIndex[BB];
       Jump.Target = BlockIndex[Succ];
+      auto It = SampleEdgeWeights.find(std::make_pair(BB, Succ));
+      if (It != SampleEdgeWeights.end()) {
+        Jump.HasUnknownWeight = false;
+        Jump.Weight = It->second;
+      } else {
+        Jump.HasUnknownWeight = true;
+        Jump.Weight = 0;
+      }
       Func.Jumps.push_back(Jump);
     }
   }
diff --git a/llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp b/llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp
index bdcd54a135da9..41c084a4e4e49 100644
--- a/llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp
+++ b/llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp
@@ -40,6 +40,7 @@
 #include "llvm/CodeGen/GCMetadataPrinter.h"
 #include "llvm/CodeGen/LazyMachineBlockFrequencyInfo.h"
 #include "llvm/CodeGen/MachineBasicBlock.h"
+#include "llvm/CodeGen/MachineBlockHashInfo.h"
 #include "llvm/CodeGen/MachineBranchProbabilityInfo.h"
 #include "llvm/CodeGen/MachineConstantPool.h"
 #include "llvm/CodeGen/MachineDominators.h"
@@ -180,6 +181,8 @@ static cl::opt<bool> PrintLatency(
     cl::desc("Print instruction latencies as verbose asm comments"), cl::Hidden,
     cl::init(false));
 
+extern cl::opt<bool> EmitBBHash;
+
 STATISTIC(EmittedInsts, "Number of machine instrs printed");
 
 char AsmPrinter::ID = 0;
@@ -454,6 +457,8 @@ void AsmPrinter::getAnalysisUsage(AnalysisUsage &AU) const {
   AU.addRequired<GCModuleInfo>();
   AU.addRequired<LazyMachineBlockFrequencyInfoPass>();
   AU.addRequired<MachineBranchProbabilityInfoWrapperPass>();
+  if (EmitBBHash)
+    AU.addRequired<MachineBlockHashInfo>();
 }
 
 bool AsmPrinter::doInitialization(Module &M) {
@@ -1419,7 +1424,8 @@ getBBAddrMapFeature(const MachineFunction &MF, int NumMBBSectionRanges) {
   }
   return {FuncEntryCountEnabled, BBFreqEnabled, BrProbEnabled,
           MF.hasBBSections() && NumMBBSectionRanges > 1,
-          static_cast<bool>(BBAddrMapSkipEmitBBEntries)};
+          static_cast<bool>(BBAddrMapSkipEmitBBEntries),
+          static_cast<bool>(EmitBBHash)};
 }
 
 void AsmPrinter::emitBBAddrMapSection(const MachineFunction &MF) {
@@ -1477,6 +1483,8 @@ void AsmPrinter::emitBBAddrMapSection(const MachineFunction &MF) {
       PrevMBBEndSymbol = MBBSymbol;
     }
 
+    auto MBHI = Features.BBHash ? &getAnalysis<MachineBlockHashInfo>() : nullptr;
+
     if (!Features.OmitBBEntries) {
       // TODO: Remove this check when version 1 is deprecated.
       if (BBAddrMapVersion > 1) {
@@ -1496,6 +1504,10 @@ void AsmPrinter::emitBBAddrMapSection(const MachineFunction &MF) {
       emitLabelDifferenceAsULEB128(MBB.getEndSymbol(), MBBSymbol);
       // Emit the Metadata.
       OutStreamer->emitULEB128IntValue(getBBAddrMapMetadata(MBB));
+      // Emit the Hash.
+      if (MBHI) {
+        OutStreamer->emitULEB128IntValue(MBHI->getMBBHash(MBB));
+      }
     }
 
     PrevMBBEndSymbol = MBB.getEndSymbol();
diff --git a/llvm/lib/CodeGen/BasicBlockMatchingAndInference.cpp b/llvm/lib/CodeGen/BasicBlockMatchingAndInference.cpp
new file mode 100644
index 0000000000000..e2776162043ff
--- /dev/null
+++ b/llvm/lib/CodeGen/BasicBlockMatchingAndInference.cpp
@@ -0,0 +1,168 @@
+#include "llvm/CodeGen/BasicBlockMatchingAndInference.h"
+#include "llvm/CodeGen/BasicBlockSectionsProfileReader.h"
+#include "llvm/CodeGen/MachineBlockHashInfo.h"
+#include "llvm/CodeGen/Passes.h"
+#include "llvm/InitializePasses.h"
+#include "llvm/Support/CommandLine.h"
+
+using namespace llvm;
+
+static cl::opt<float>
+    PropellerInferThreshold("propeller-infer-threshold",
+                            cl::desc("Threshold for inferring stale profiles"),
+                            cl::init(0.6), cl::Optional);
+
+/// The object is used to identify and match basic blocks given their hashes.
+class StaleMatcher {
+public:
+  /// Initialize stale matcher.
+  void init(const std::vector<MachineBasicBlock *> &Blocks,
+            const std::vector<BlendedBlockHash> &Hashes) {
+    assert(Blocks.size() == Hashes.size() &&
+           "incorrect matcher initialization");
+    for (size_t I = 0; I < Blocks.size(); I++) {
+      MachineBasicBlock *Block = Blocks[I];
+      uint16_t OpHash = Hashes[I].OpcodeHash;
+      OpHashToBlocks[OpHash].push_back(std::make_pair(Hashes[I], Block));
+    }
+  }
+
+  /// Find the most similar block for a given hash.
+  MachineBasicBlock *matchBlock(BlendedBlockHash BlendedHash) const {
+    auto BlockIt = OpHashToBlocks.find(BlendedHash.OpcodeHash);
+    if (BlockIt == OpHashToBlocks.end()) {
+      return nullptr;
+    }
+    MachineBasicBlock *BestBlock = nullptr;
+    uint64_t BestDist = std::numeric_limits<uint64_t>::max();
+    for (auto It : BlockIt->second) {
+      MachineBasicBlock *Block = It.second;
+      BlendedBlockHash Hash = It.first;
+      uint64_t Dist = Hash.distance(BlendedHash);
+      if (BestBlock == nullptr || Dist < BestDist) {
+        BestDist = Dist;
+        BestBlock = Block;
+      }
+    }
+    return BestBlock;
+  }
+
+private:
+  using HashBlockPairType = std::pair<BlendedBlockHash, MachineBasicBlock *>;
+  std::unordered_map<uint16_t, std::vector<HashBlockPairType>> OpHashToBlocks;
+};
+
+INITIALIZE_PASS_BEGIN(BasicBlockMatchingAndInference,
+                      "machine-block-match-infer",
+                      "Machine Block Matching and Inference Analysis", true,
+                      true)
+INITIALIZE_PASS_DEPENDENCY(MachineBlockHashInfo)
+INITIALIZE_PASS_DEPENDENCY(BasicBlockSectionsProfileReaderWrapperPass)
+INITIALIZE_PASS_END(BasicBlockMatchingAndInference, "machine-block-match-infer",
+                    "Machine Block Matching and Inference Analysis", true, true)
+
+char BasicBlockMatch...
[truncated]
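For readers skimming the truncated `BasicBlockMatchingAndInference.cpp` hunk: `StaleMatcher` buckets the current blocks by `OpcodeHash` and, for a profiled hash, returns the block in the same bucket with the smallest `BlendedBlockHash::distance`. The following is a minimal standalone sketch of that idea, not the PR's code: `BlockHash` and `StaleMatcherSketch` are hypothetical names, and blocks are identified by index instead of `MachineBasicBlock *` so that it compiles without LLVM.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <optional>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for BlendedBlockHash with the same four fields.
struct BlockHash {
  uint16_t Offset = 0;
  uint16_t OpcodeHash = 0;
  uint16_t InstrHash = 0;
  uint16_t NeighborHash = 0;

  // Same ordering as distance() in the diff: a NeighborHash mismatch
  // dominates, then an InstrHash mismatch, then the offset difference.
  uint64_t distance(const BlockHash &O) const {
    uint64_t Dist = NeighborHash == O.NeighborHash ? 0 : 1;
    Dist <<= 16;
    Dist += InstrHash == O.InstrHash ? 0 : 1;
    Dist <<= 16;
    Dist += Offset >= O.Offset ? Offset - O.Offset : O.Offset - Offset;
    return Dist;
  }
};

// Buckets candidate blocks by OpcodeHash; only blocks in the same bucket
// are compared, and the one with the smallest distance wins.
class StaleMatcherSketch {
public:
  void init(const std::vector<BlockHash> &Hashes) {
    for (unsigned I = 0; I < Hashes.size(); ++I)
      Buckets[Hashes[I].OpcodeHash].push_back({Hashes[I], I});
  }

  // Returns the index of the best-matching block, or nullopt if no
  // current block shares the profiled block's OpcodeHash.
  std::optional<unsigned> match(const BlockHash &Profiled) const {
    auto It = Buckets.find(Profiled.OpcodeHash);
    if (It == Buckets.end())
      return std::nullopt;
    std::optional<unsigned> Best;
    uint64_t BestDist = std::numeric_limits<uint64_t>::max();
    for (const auto &[Hash, Index] : It->second) {
      uint64_t Dist = Hash.distance(Profiled);
      if (!Best || Dist < BestDist) {
        BestDist = Dist;
        Best = Index;
      }
    }
    return Best;
  }

private:
  std::unordered_map<uint16_t, std::vector<std::pair<BlockHash, unsigned>>>
      Buckets;
};
```

Because `distance` shifts mismatches into higher bit ranges, a `NeighborHash` mismatch always outweighs an `InstrHash` mismatch, which in turn outweighs any 16-bit offset difference.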

@rlavaee
Contributor

rlavaee commented Sep 26, 2025

This needs to be split into at least two PRs, with the first one enabling the decoding (changes in ELF.cpp, etc.) and the second one enabling the encoding (changes in CodeGen). Please also add a new version number for this (where the feature is only supported for this version and up) like the other features: https://github.com/llvm/llvm-project/blob/main/llvm/lib/Object/ELF.cpp#L850-854.
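To illustrate the version gating requested here, a minimal sketch follows. It mirrors the `Features` bit layout from the `ELFTypes.h` hunk (bit 5 = `BBHash`) and rejects that bit on older section versions. `FeaturesSketch`, `decodeFeatures`, and `kMinVersionForBBHash` are hypothetical, and the value 4 is only a placeholder for whatever version number the PR actually assigns; the real decoder in `ELF.cpp` also gates the other feature bits, which this sketch omits.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Bit layout mirrors BBAddrMap::Features in the diff; bit 5 is BBHash.
struct FeaturesSketch {
  bool FuncEntryCount = false;
  bool BBFreq = false;
  bool BrProb = false;
  bool MultiBBRange = false;
  bool OmitBBEntries = false;
  bool BBHash = false;

  uint8_t encode() const {
    return static_cast<uint8_t>((FuncEntryCount << 0) | (BBFreq << 1) |
                                (BrProb << 2) | (MultiBBRange << 3) |
                                (OmitBBEntries << 4) | (BBHash << 5));
  }
};

// HYPOTHETICAL: first section version allowed to carry BB hashes.
// 4 is a placeholder, not the number chosen in the PR.
constexpr uint8_t kMinVersionForBBHash = 4;

// Decodes Val; returns nullopt if undefined bits are set or if BBHash
// appears in a section older than kMinVersionForBBHash.
std::optional<FeaturesSketch> decodeFeatures(uint8_t Val, uint8_t Version) {
  FeaturesSketch F;
  F.FuncEntryCount = Val & (1 << 0);
  F.BBFreq = Val & (1 << 1);
  F.BrProb = Val & (1 << 2);
  F.MultiBBRange = Val & (1 << 3);
  F.OmitBBEntries = Val & (1 << 4);
  F.BBHash = Val & (1 << 5);
  if (F.encode() != Val)
    return std::nullopt; // bits 6 and 7 are not defined
  if (F.BBHash && Version < kMinVersionForBBHash)
    return std::nullopt; // feature not supported by this version
  return F;
}
```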

@llvmbot llvmbot added the llvm:mc Machine (object) code label Sep 27, 2025
@wdx727
Author

wdx727 commented Sep 27, 2025

Thanks. We have split the original PR into two separate ones. The current PR only contains modifications related to ELF and basic block hash calculation. Additionally, we have updated the version number of SHT_LLVM_BB_ADDR_MAP.

@rlavaee
Contributor

rlavaee commented Sep 30, 2025

This is still pending. All the code related to emitting the BB hash as part of codegen must be split into a separate PR.

@wdx727 wdx727 force-pushed the propeller_bb_hash branch from 2c908bd to 9a7b663 October 1, 2025 02:47
@wdx727
Author

wdx727 commented Oct 1, 2025

Done. Currently, this PR only relates to ELF modifications. The calculation of the basic block hash will be submitted in the next PR.

Contributor

@rlavaee rlavaee left a comment

Please also add obj2yaml and yaml2obj tests (like tools/yaml2obj/ELF/bb-addr-map.yaml and ./tools/obj2yaml/ELF/bb-addr-map.yaml). These should ideally be done in a separate PR and they will test the ELFEmitter.cpp and obj2yaml/elf2yaml.cpp changes. If you want to include them here, it would be fine too.

Contributor

This is also a codegen change and must be deferred.

Author

How can we increment the version of SHT_LLVM_BB_ADDR_MAP without modifying MCContext.h? Should I refrain from changing the version of SHT_LLVM_BB_ADDR_MAP for the time being?

Contributor

Correct. We don't want to change the emitted BBAddrMap just yet. The reason for this complexity is that the Propeller tooling is released independently. So if we change the version right away, there is a chance that the compiler gets released before the Propeller tooling. I understand this is somewhat inconvenient. Once we move Propeller into LLVM, this will be resolved. Nonetheless, splitting the change into smaller parts helps the code review, and this PR would definitely benefit from it.

Author

Got it. Done.

@@ -1526,6 +1526,9 @@ void ELFState<ELFT>::writeSectionContent(
}
SHeader.sh_size += CBA.writeULEB128(BBE.Size);
SHeader.sh_size += CBA.writeULEB128(BBE.Metadata);
if (FeatureOrErr->BBHash && BBE.Hash) {
Contributor

This doesn't seem right.
We should write the hash if the feature is enabled.

Author

Done.

Contributor

Actually, this is not what I thought because BBE.Hash is std::optional. Since this is YAML, you should use || instead. So we would emit the hash if either the feature is enabled (even if BBE.Hash is zero) or BBE.Hash has value (even if feature is disabled). In the latter case, we don't need to enable the feature value. Please also use BBE.Hash.has_value() to disambiguate against value comparison.
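The rule proposed here can be written as a small standalone function. `hashToEmit` is a hypothetical helper, with `std::optional<uint64_t>` standing in for the YAML `Hash` field. It also makes explicit that an entry with `Hash: 0` is still written, because the test is on presence rather than on the value.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Returns the hash yaml2obj would write for one BB entry, or nullopt if
// nothing should be written: write when the feature bit is set OR the
// YAML entry spells out a Hash (even if the feature bit is clear).
std::optional<uint64_t> hashToEmit(bool FeatureBBHash,
                                   const std::optional<uint64_t> &YamlHash) {
  if (FeatureBBHash || YamlHash.has_value())
    return YamlHash.value_or(0); // feature on but no Hash key: emit 0
  return std::nullopt;
}
```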

Author

Got it. Done.

@wdx727 wdx727 force-pushed the propeller_bb_hash branch from 9a7b663 to ef6c87d October 2, 2025 14:05
@wdx727
Author

wdx727 commented Oct 2, 2025

Done.

@wdx727 wdx727 force-pushed the propeller_bb_hash branch 2 times, most recently from f954e83 to f79dbb9 October 3, 2025 01:13

BBEntry(uint32_t ID, uint32_t Offset, uint32_t Size, Metadata MD,
SmallVector<uint32_t, 1> CallsiteEndOffsets)
SmallVector<uint32_t, 1> CallsiteEndOffsets, uint64_t Hash = 0)
Contributor

We don't need the default value here.

Author

Done.

if (FeatureOrErr->BBHash || BBE.Hash.has_value()) {
auto Hash = BBE.Hash.has_value() ?
BBE.Hash.value() : llvm::yaml::Hex64(0);;
SHeader.sh_size += CBA.writeULEB128(Hash);
Contributor

A couple of points.

  1. ULEB128 gives you size savings for smaller values (which could fit within a smaller number of bytes), but your random hashes can have their most significant bits set. So there wouldn't be any size savings (and you may even add a single extra byte because of the ULEB encoding).
  2. This gets me thinking about whether we actually need 64-bit values for the hash. Since we're storing a hash for every basic block, we might be able to use fewer bytes if our inference algorithm is smart enough. Even with a single byte, the collision chance is 1/(2^8), which could be acceptable. We need to remember that this is about performance and a best-effort solution is fine.
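To put numbers on point 1: ULEB128 carries 7 payload bits per byte, so any value of 2^56 or more needs at least 9 bytes, and a value with bit 63 set needs ceil(64/7) = 10. A uniformly random 64-bit hash therefore takes 10 bytes half the time and 9 bytes almost always otherwise, against 8 bytes for a raw fixed-width field. The encoder below is a plain illustration of the standard ULEB128 scheme, written independently of LLVM's own helpers; `ulebEncode` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Encodes Value as ULEB128: 7 payload bits per byte, low bits first, with
// the high (continuation) bit set on every byte except the last.
std::vector<uint8_t> ulebEncode(uint64_t Value) {
  std::vector<uint8_t> Out;
  do {
    uint8_t Byte = static_cast<uint8_t>(Value & 0x7f);
    Value >>= 7;
    if (Value != 0)
      Byte |= 0x80; // more bytes follow
    Out.push_back(Byte);
  } while (Value != 0);
  return Out;
}
```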

Author

It should be noted that using ULEB encoding does not help reduce data size. In fact, a 64-bit hash value consists of 4 hashes at different levels, with each hash value being 16 bits. This structure allows us to perform matching at different levels during the matching process, thereby improving matching accuracy.
If a smaller number of bits is used to represent the hash, the matching accuracy will decrease. Nevertheless, this approach may still be feasible. To determine the extent to which the reduced matching accuracy ultimately affects the inference results, further experiments need to be conducted for evaluation.
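The layering described here follows from how `BlendedBlockHash` packs its components: per `combineHashes`, each 16-bit hash occupies a fixed slice of the `uint64_t` (`Offset` in bits 0-15, `OpcodeHash` in 16-31, `InstrHash` in 32-47, `NeighborHash` in 48-63), so any level can be read back on its own. A minimal standalone sketch of that round trip, using hypothetical `pack`/`unpack` helpers in place of the struct's private methods:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical mirror of the four BlendedBlockHash components.
struct Components {
  uint16_t Offset, OpcodeHash, InstrHash, NeighborHash;
};

// Same layout as combineHashes(): Offset lowest, NeighborHash highest.
uint64_t pack(const Components &C) {
  return uint64_t(C.Offset) | (uint64_t(C.OpcodeHash) << 16) |
         (uint64_t(C.InstrHash) << 32) | (uint64_t(C.NeighborHash) << 48);
}

// Inverse of pack(), as parseHashes() does: each 16-bit slice is independent.
Components unpack(uint64_t H) {
  return {uint16_t(H), uint16_t(H >> 16), uint16_t(H >> 32),
          uint16_t(H >> 48)};
}
```

Bucketing by `OpcodeHash` then only needs bits 16-31 of the stored value.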

Contributor

If your hashes are always 64-bit, then you don't really need ULEB.
I'd like us to consider the size implications a little bit, since we're planning to include the section in our production binaries. 8 extra bytes per basic block could be a huge overhead (could almost double the total section size).

Do you need the 4 hashes to be separate or can you combine them to form a single hash? If we can combine them, then it's possible to define a more flexible encoding. Every function stores the number of hashing bytes once. Then we read the hash values for the specified number of bytes. Different functions can have varying number of hash bytes. So for larger functions we can utilize more bytes (even more than 8). WDYT?
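One way to picture the per-function encoding proposed here is the sketch below. It is a hypothetical layout, not the SHT_LLVM_BB_ADDR_MAP format: one width byte per function, then each block's hash truncated to that many low-order bytes, little-endian. For simplicity the width is capped at 8 bytes (one `uint64_t`), although the proposal allows more. Note that with the current `BlendedBlockHash` layout, truncating to 4 bytes or fewer would drop `InstrHash` and `NeighborHash` entirely, which is why the question of combining the components matters for this scheme.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// HYPOTHETICAL layout: [width W][hash0: W bytes][hash1: W bytes]...
void writeFunctionHashes(std::vector<uint8_t> &Out,
                         const std::vector<uint64_t> &Hashes, unsigned Width) {
  assert(Width >= 1 && Width <= 8 && "width must fit in a uint64_t");
  Out.push_back(static_cast<uint8_t>(Width));
  for (uint64_t H : Hashes)
    for (unsigned I = 0; I < Width; ++I)
      Out.push_back(static_cast<uint8_t>(H >> (8 * I))); // little-endian
}

// Reads NumBlocks hashes starting at Pos and advances Pos past them.
std::vector<uint64_t> readFunctionHashes(const std::vector<uint8_t> &In,
                                         size_t &Pos, size_t NumBlocks) {
  unsigned Width = In[Pos++];
  std::vector<uint64_t> Hashes;
  for (size_t B = 0; B < NumBlocks; ++B) {
    uint64_t H = 0;
    for (unsigned I = 0; I < Width; ++I)
      H |= uint64_t(In[Pos++]) << (8 * I);
    Hashes.push_back(H);
  }
  return Hashes;
}
```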

Author

@wdx727 wdx727 Oct 10, 2025

We have encoded the hashes in their original 64-bit form instead of using ULEB encoding.
Since employing different hash widths for different functions would introduce additional complexity, we plan to maintain the current 64-bit hash format for now. This will allow the matching and inference features to become available in propeller first. Further research and experimentation will be conducted before we proceed with hash bit compression. Would you be comfortable with this approach?

Contributor

Sounds good.

@wdx727 wdx727 force-pushed the propeller_bb_hash branch 3 times, most recently from c27d1f2 to 88b80ba October 10, 2025 12:38
Contributor

We don't need to do this. Hash should be defined as optional and set if (FeatureOrErr->BBHash). Then we always push_back {ID, Offset, Size, Metadata, std::move(CallsiteEndOffsets), Hash}.
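A standalone sketch of the suggested decoding shape: the hash is an `std::optional` that is filled only when the `BBHash` feature bit is set, and every entry goes through the same construction. `EntrySketch`, `readULEB128`, and `readEntry` are hypothetical and omit fields such as callsite end offsets. The sketch also assumes a little-endian object, the fixed 8-byte hash field described elsewhere in this thread, and well-formed input (no bounds or overflow checks).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical, simplified BB entry.
struct EntrySketch {
  uint32_t ID, Offset, Size, Metadata;
  std::optional<uint64_t> Hash; // engaged only when the BBHash bit is set
};

// Minimal ULEB128 reader; assumes well-formed input.
uint64_t readULEB128(const std::vector<uint8_t> &In, size_t &Pos) {
  uint64_t Value = 0;
  unsigned Shift = 0;
  uint8_t Byte;
  do {
    Byte = In[Pos++];
    Value |= uint64_t(Byte & 0x7f) << Shift;
    Shift += 7;
  } while (Byte & 0x80);
  return Value;
}

EntrySketch readEntry(const std::vector<uint8_t> &In, size_t &Pos,
                      bool HasBBHash) {
  uint32_t ID = static_cast<uint32_t>(readULEB128(In, Pos));
  uint32_t Offset = static_cast<uint32_t>(readULEB128(In, Pos));
  uint32_t Size = static_cast<uint32_t>(readULEB128(In, Pos));
  uint32_t Metadata = static_cast<uint32_t>(readULEB128(In, Pos));
  std::optional<uint64_t> Hash;
  if (HasBBHash) { // raw 8-byte little-endian hash
    uint64_t H = 0;
    for (unsigned I = 0; I < 8; ++I)
      H |= uint64_t(In[Pos++]) << (8 * I);
    Hash = H;
  }
  // Single construction path whether or not the hash is present.
  return {ID, Offset, Size, Metadata, Hash};
}
```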

Author

Done.

Contributor

This is YAML code so the check here should be checking whether BBE.Hash.has_value() instead of the feature.

Author

The BBE is not YAML.

@@ -1526,6 +1526,12 @@ void ELFState<ELFT>::writeSectionContent(
}
SHeader.sh_size += CBA.writeULEB128(BBE.Size);
SHeader.sh_size += CBA.writeULEB128(BBE.Metadata);
if (FeatureOrErr->BBHash || BBE.Hash.has_value()) {
auto Hash = BBE.Hash.has_value() ?
Contributor

Spell out the type explicitly here: uint64_t

Author

Done.

@rlavaee
Contributor

rlavaee commented Oct 10, 2025

BTW, if you have the Codegen PR, we can start reviewing that one too. We just need to have about a 1 week delay between pushing the PRs upstream.

…ic block hash to the SHT_LLVM_BB_ADDR_MAP section.
@wdx727
Author

wdx727 commented Oct 11, 2025

The PR related to CodeGen is ready. #162963
