Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SHT_LLVM_BB_ADDR_MAP][AsmPrinter] Implements PGOAnalysisMap emitting in AsmPrinter with tests. #75202

Merged
merged 4 commits into from
Jan 4, 2024

Conversation

red1bluelost
Copy link
Member

Uses machine analyses to emit PGOAnalysisMap into the bb-addr-map ELF section. Implements filecheck tests to verify emitting new fields.

PR Series

The current code for PGOBBAddrMap to be upstreamed is split into five PRs:

  1. Object and ObjectYAML - Implements PGOBBAddrMap in Object and ObjectYAML with tests [1/5] #71750
  2. AsmPrinter - (this one) https://github.com/red1bluelost/llvm-project/tree/pgo-bb-addr-map--asm-printer
  3. llvm-readobj - Implements llvm-readobj with tests. [3/5] red1bluelost/llvm-project#2 or https://github.com/red1bluelost/llvm-project/tree/pgo-bb-addr-map--llvm-readobj
  4. llvm obj2yaml - https://github.com/red1bluelost/llvm-project/tree/pgo-bb-addr-map--llvm-obj2yaml

@llvmbot
Copy link
Collaborator

llvmbot commented Dec 12, 2023

@llvm/pr-subscribers-backend-x86

Author: Micah Weston (red1bluelost)

Changes

Uses machine analyses to emit PGOAnalysisMap into the bb-addr-map ELF section. Implements filecheck tests to verify emitting new fields.

PR Series

The current code for PGOBBAddrMap to be upstreamed is split into five PRs:

  1. Object and ObjectYAML - Implements PGOBBAddrMap in Object and ObjectYAML with tests [1/5] #71750
  2. AsmPrinter - (this one) https://github.com/red1bluelost/llvm-project/tree/pgo-bb-addr-map--asm-printer
  3. llvm-readobj - Implements llvm-readobj with tests. [3/5] red1bluelost/llvm-project#2 or https://github.com/red1bluelost/llvm-project/tree/pgo-bb-addr-map--llvm-readobj
  4. llvm obj2yaml - https://github.com/red1bluelost/llvm-project/tree/pgo-bb-addr-map--llvm-obj2yaml

Patch is 27.17 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/75202.diff

8 Files Affected:

  • (modified) llvm/docs/Extensions.rst (+84)
  • (modified) llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp (+77-2)
  • (modified) llvm/test/CodeGen/X86/basic-block-labels-mir-parse.mir (+6)
  • (modified) llvm/test/CodeGen/X86/basic-block-sections-labels-empty-block.ll (+2)
  • (modified) llvm/test/CodeGen/X86/basic-block-sections-labels-empty-function.ll (+4-2)
  • (modified) llvm/test/CodeGen/X86/basic-block-sections-labels-functions-sections.ll (+26-4)
  • (added) llvm/test/CodeGen/X86/basic-block-sections-labels-pgo-features.ll (+120)
  • (modified) llvm/test/CodeGen/X86/basic-block-sections-labels.ll (+41-4)
diff --git a/llvm/docs/Extensions.rst b/llvm/docs/Extensions.rst
index 6e94840897832a..5dadc85b2c5341 100644
--- a/llvm/docs/Extensions.rst
+++ b/llvm/docs/Extensions.rst
@@ -451,6 +451,90 @@ Example:
    .uleb128  .LBB_END0_1-.LBB0_1          # BB_1 size
    .byte     y                            # BB_1 metadata
 
+PGO Analysis Map Extra Data
+"""""""""""""""""""""""""""
+
+PGO related analysis data can be emitted after each function within the
+BBAddrMap through the optional ``pgo-analysis-map`` flag. Supported analyses
+currently are Function Entry Count, Basic Block Frequencies, and Branch
+Probabilities.
+
+Each analysis is enabled or disabled via a bit in the feature byte. Currently
+those bits are:
+
+#. Function Entry Count - Number of times the function was called as taken
+   from a PGO profile. This will always be zero if PGO was not used or the
+   function was not encountered in the profile.
+
+#. Basic Block Frequencies - Encoded as raw block frequency value taken from
+   MBFI analysis. This value is an integer that encodes the relative frequency
+   compared to the entry block. More information can be found in
+   'llvm/Support/BlockFrequency.h'.
+
+#. Branch Probabilities - Encoded as raw numerator for branch probability
+   taken from MBPI analysis. This value is the numerator for a fixed point ratio
+   defined in 'llvm/Support/BranchProbability.h'. It indicates the probability
+   that the block is followed by a given successor block during execution.
+
+This extra data requires version 2 or above. This is necessary since successors
+of basic blocks won't know their index but will know their BB ID.
+
+Example of BBAddrMap with PGO data:
+
+.. code-block:: gas
+
+  .section  ".llvm_bb_addr_map","",@llvm_bb_addr_map
+  .byte     2                             # version number
+  .byte     7                             # feature byte - PGO analyses enabled mask
+  .quad     .Lfunc_begin0                 # address of the function
+  .uleb128  4                             # number of basic blocks
+  # BB record for BB_0
+   .uleb128  0                            # BB_0 BB ID
+   .uleb128  .Lfunc_begin0-.Lfunc_begin0  # BB_0 offset relative to function entry (always zero)
+   .uleb128  .LBB_END0_0-.Lfunc_begin0    # BB_0 size
+   .byte     0x18                         # BB_0 metadata (multiple successors)
+  # BB record for BB_1
+   .uleb128  1                            # BB_1 BB ID
+   .uleb128  .LBB0_1-.LBB_END0_0          # BB_1 offset relative to the end of last block (BB_0).
+   .uleb128  .LBB_END0_1-.LBB0_1          # BB_1 size
+   .byte     0x0                          # BB_1 metadata (two successors)
+  # BB record for BB_2
+   .uleb128  2                            # BB_2 BB ID
+   .uleb128  .LBB0_2-.LBB_END1_0          # BB_2 offset relative to the end of last block (BB_1).
+   .uleb128  .LBB_END0_2-.LBB0_2          # BB_2 size
+   .byte     0x0                          # BB_2 metadata (one successor)
+  # BB record for BB_3
+   .uleb128  3                            # BB_3 BB ID
+   .uleb128  .LBB0_3-.LBB_END0_2          # BB_3 offset relative to the end of last block (BB_2).
+   .uleb128  .LBB_END0_3-.LBB0_3          # BB_3 size
+   .byte     0x0                          # BB_3 metadata (zero successors)
+  # PGO Analysis Map
+  .uleb128  1000                          # function entry count (only when enabled)
+  # PGO data record for BB_0
+   .uleb128  1000                         # BB_0 basic block frequency (only when enabled)
+   .uleb128  3                            # BB_0 successors count (only enabled with branch probabilities)
+   .uleb128  1                            # BB_0 successor 1 BB ID (only enabled with branch probabilities)
+   .uleb128  0x22222222                   # BB_0 successor 1 branch probability (only enabled with branch probabilities)
+   .uleb128  2                            # BB_0 successor 2 BB ID (only enabled with branch probabilities)
+   .uleb128  0x33333333                   # BB_0 successor 2 branch probability (only enabled with branch probabilities)
+   .uleb128  3                            # BB_0 successor 3 BB ID (only enabled with branch probabilities)
+   .uleb128  0xaaaaaaaa                   # BB_0 successor 3 branch probability (only enabled with branch probabilities)
+  # PGO data record for BB_1
+   .uleb128  133                          # BB_1 basic block frequency (only when enabled)
+   .uleb128  2                            # BB_1 successors count (only enabled with branch probabilities)
+   .uleb128  2                            # BB_1 successor 1 BB ID (only enabled with branch probabilities)
+   .uleb128  0x11111111                   # BB_1 successor 1 branch probability (only enabled with branch probabilities)
+   .uleb128  3                            # BB_1 successor 2 BB ID (only enabled with branch probabilities)
+   .uleb128  0x11111111                   # BB_1 successor 2 branch probability (only enabled with branch probabilities)
+  # PGO data record for BB_2
+   .uleb128  18                           # BB_2 basic block frequency (only when enabled)
+   .uleb128  1                            # BB_2 successors count (only enabled with branch probabilities)
+   .uleb128  3                            # BB_2 successor 1 BB ID (only enabled with branch probabilities)
+   .uleb128  0xffffffff                   # BB_2 successor 1 branch probability (only enabled with branch probabilities)
+  # PGO data record for BB_3
+   .uleb128  1000                         # BB_3 basic block frequency (only when enabled)
+   .uleb128  0                            # BB_3 successors count (only enabled with branch probabilities)
+
 ``SHT_LLVM_OFFLOADING`` Section (offloading data)
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 This section stores the binary data used to perform offloading device linking
diff --git a/llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp b/llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp
index 3a679f1576b7b6..78878d30597ab6 100644
--- a/llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp
+++ b/llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp
@@ -40,6 +40,7 @@
 #include "llvm/CodeGen/GCMetadataPrinter.h"
 #include "llvm/CodeGen/LazyMachineBlockFrequencyInfo.h"
 #include "llvm/CodeGen/MachineBasicBlock.h"
+#include "llvm/CodeGen/MachineBranchProbabilityInfo.h"
 #include "llvm/CodeGen/MachineConstantPool.h"
 #include "llvm/CodeGen/MachineDominators.h"
 #include "llvm/CodeGen/MachineFrameInfo.h"
@@ -139,6 +140,26 @@ static cl::opt<std::string> BasicBlockProfileDump(
              "performed with -basic-block-sections=labels. Enabling this "
              "flag during in-process ThinLTO is not supported."));
 
+// This is a replication of fields of object::PGOAnalysisMap::Features. It
+// should match the order of the fields so that
+// `object::PGOAnalysisMap::Features::decode(PgoAnalysisMapFeatures.getBits())`
+// succeeds.
+enum class PGOMapFeaturesEnum {
+  FuncEntryCount,
+  BBFreq,
+  BrProb,
+};
+static cl::bits<PGOMapFeaturesEnum> PgoAnalysisMapFeatures(
+    "pgo-analysis-map", cl::Hidden, cl::CommaSeparated,
+    cl::values(clEnumValN(PGOMapFeaturesEnum::FuncEntryCount, "func-entry-count",
+                          "Function Entry Count"),
+               clEnumValN(PGOMapFeaturesEnum::BBFreq, "bb-freq",
+                          "Basic Block Frequency"),
+               clEnumValN(PGOMapFeaturesEnum::BrProb, "br-prob",
+                          "Branch Probability")),
+    cl::desc("Enable extended information within the BBAddrMap that is "
+             "extracted from PGO related analysis."));
+
 const char DWARFGroupName[] = "dwarf";
 const char DWARFGroupDescription[] = "DWARF Emission";
 const char DbgTimerName[] = "emit";
@@ -427,6 +448,7 @@ void AsmPrinter::getAnalysisUsage(AnalysisUsage &AU) const {
   AU.addRequired<MachineOptimizationRemarkEmitterPass>();
   AU.addRequired<GCModuleInfo>();
   AU.addRequired<LazyMachineBlockFrequencyInfoPass>();
+  AU.addRequired<MachineBranchProbabilityInfo>();
 }
 
 bool AsmPrinter::doInitialization(Module &M) {
@@ -1377,7 +1399,8 @@ void AsmPrinter::emitBBAddrMapSection(const MachineFunction &MF) {
   uint8_t BBAddrMapVersion = OutStreamer->getContext().getBBAddrMapVersion();
   OutStreamer->emitInt8(BBAddrMapVersion);
   OutStreamer->AddComment("feature");
-  OutStreamer->emitInt8(0);
+  auto FeaturesBits = static_cast<uint8_t>(PgoAnalysisMapFeatures.getBits());
+  OutStreamer->emitInt8(FeaturesBits);
   OutStreamer->AddComment("function address");
   OutStreamer->emitSymbolValue(FunctionSymbol, getPointerSize());
   OutStreamer->AddComment("number of basic blocks");
@@ -1407,6 +1430,51 @@ void AsmPrinter::emitBBAddrMapSection(const MachineFunction &MF) {
     OutStreamer->emitULEB128IntValue(getBBAddrMapMetadata(MBB));
     PrevMBBEndSymbol = MBB.getEndSymbol();
   }
+
+  if (FeaturesBits != 0) {
+    assert(BBAddrMapVersion >= 2 &&
+           "PGOAnalysisMap only supports version 2 or later");
+
+    auto FeatEnable =
+        cantFail(object::PGOAnalysisMap::Features::decode(FeaturesBits));
+
+    if (FeatEnable.FuncEntryCount) {
+      OutStreamer->AddComment("function entry count");
+      auto MaybeEntryCount = MF.getFunction().getEntryCount();
+      OutStreamer->emitULEB128IntValue(
+          MaybeEntryCount ? MaybeEntryCount->getCount() : 0);
+    }
+    const MachineBlockFrequencyInfo *MBFI =
+        FeatEnable.BBFreq
+            ? &getAnalysis<LazyMachineBlockFrequencyInfoPass>().getBFI()
+            : nullptr;
+    const MachineBranchProbabilityInfo *MBPI =
+        FeatEnable.BrProb ? &getAnalysis<MachineBranchProbabilityInfo>()
+                          : nullptr;
+
+    if (FeatEnable.BBFreq || FeatEnable.BrProb) {
+      for (const MachineBasicBlock &MBB : MF) {
+        if (FeatEnable.BBFreq) {
+          OutStreamer->AddComment("basic block frequency");
+          OutStreamer->emitULEB128IntValue(
+              MBFI->getBlockFreq(&MBB).getFrequency());
+        }
+        if (FeatEnable.BrProb) {
+          unsigned SuccCount = MBB.succ_size();
+          OutStreamer->AddComment("basic block successor count");
+          OutStreamer->emitULEB128IntValue(SuccCount);
+          for (const MachineBasicBlock *SuccMBB : MBB.successors()) {
+            OutStreamer->AddComment("successor BB ID");
+            OutStreamer->emitULEB128IntValue(SuccMBB->getBBID()->BaseID);
+            OutStreamer->AddComment("successor branch probability");
+            OutStreamer->emitULEB128IntValue(
+                MBPI->getEdgeProbability(&MBB, SuccMBB).getNumerator());
+          }
+        }
+      }
+    }
+  }
+
   OutStreamer->popSection();
 }
 
@@ -1932,8 +2000,15 @@ void AsmPrinter::emitFunctionBody() {
 
   // Emit section containing BB address offsets and their metadata, when
   // BB labels are requested for this function. Skip empty functions.
-  if (MF->hasBBLabels() && HasAnyRealCode)
+  bool HasLabels = MF->hasBBLabels();
+  if (HasLabels && HasAnyRealCode)
     emitBBAddrMapSection(*MF);
+  else if (!HasLabels && HasAnyRealCode &&
+           PgoAnalysisMapFeatures.getBits() != 0)
+    MF->getContext().reportWarning(
+        SMLoc(), "pgo-analysis-map is enabled but the following machine "
+                 "function was does not have labels: " +
+                     MF->getName());
 
   // Emit sections containing instruction and function PCs.
   emitPCSections(*MF);
diff --git a/llvm/test/CodeGen/X86/basic-block-labels-mir-parse.mir b/llvm/test/CodeGen/X86/basic-block-labels-mir-parse.mir
index f11707c719895d..5f124c071801db 100644
--- a/llvm/test/CodeGen/X86/basic-block-labels-mir-parse.mir
+++ b/llvm/test/CodeGen/X86/basic-block-labels-mir-parse.mir
@@ -1,6 +1,12 @@
 # Start after bbsections0-prepare and check that the BB address map is generated.
 # RUN: llc -mtriple x86_64-unknown-linux-gnu -start-after=bbsections-prepare  %s -o - | FileCheck %s -check-prefix=CHECK
 
+# Also verify that this holds for PGO extension
+# RUN: llc -mtriple x86_64-unknown-linux-gnu -start-after=bbsections-prepare -pgo-analysis-map=func-entry-count,bb-freq,br-prob %s -o - | FileCheck %s -check-prefix=CHECK
+# RUN: llc -mtriple x86_64-unknown-linux-gnu -start-after=bbsections-prepare -pgo-analysis-map=func-entry-count %s -o - | FileCheck %s -check-prefix=CHECK
+# RUN: llc -mtriple x86_64-unknown-linux-gnu -start-after=bbsections-prepare -pgo-analysis-map=bb-freq %s -o - | FileCheck %s -check-prefix=CHECK
+# RUN: llc -mtriple x86_64-unknown-linux-gnu -start-after=bbsections-prepare -pgo-analysis-map=br-prob %s -o - | FileCheck %s -check-prefix=CHECK
+
 # How to generate the input:
 # foo.cc
 # int foo(bool k) {
diff --git a/llvm/test/CodeGen/X86/basic-block-sections-labels-empty-block.ll b/llvm/test/CodeGen/X86/basic-block-sections-labels-empty-block.ll
index 8e0f4fa7bc928e..f4b3bc1beb0e2e 100644
--- a/llvm/test/CodeGen/X86/basic-block-sections-labels-empty-block.ll
+++ b/llvm/test/CodeGen/X86/basic-block-sections-labels-empty-block.ll
@@ -1,5 +1,7 @@
 ;; This test verifies that with -gc-empty-basic-blocks SHT_LLVM_BB_ADDR_MAP will not include entries for empty blocks.
 ; RUN: llc < %s -mtriple=x86_64 -O0 -basic-block-sections=labels -gc-empty-basic-blocks | FileCheck --check-prefix=CHECK %s
+;; Additionally test that this holds for pgo extension
+; RUN: llc < %s -mtriple=x86_64 -O0 -basic-block-sections=labels -pgo-analysis-map=bb-freq -gc-empty-basic-blocks | FileCheck --check-prefixes=CHECK %s
 
 define void @foo(i1 zeroext %0) nounwind {
   br i1 %0, label %2, label %empty_block
diff --git a/llvm/test/CodeGen/X86/basic-block-sections-labels-empty-function.ll b/llvm/test/CodeGen/X86/basic-block-sections-labels-empty-function.ll
index 7b7bbb95fb4e21..d2e09884ff1299 100644
--- a/llvm/test/CodeGen/X86/basic-block-sections-labels-empty-function.ll
+++ b/llvm/test/CodeGen/X86/basic-block-sections-labels-empty-function.ll
@@ -1,5 +1,6 @@
 ;; Verify that the BB address map is not emitted for empty functions.
-; RUN: llc < %s -mtriple=x86_64 -basic-block-sections=labels | FileCheck %s
+; RUN: llc < %s -mtriple=x86_64 -basic-block-sections=labels | FileCheck %s --check-prefixes=CHECK,BASIC
+; RUN: llc < %s -mtriple=x86_64 -basic-block-sections=labels -pgo-analysis-map=bb-freq | FileCheck %s --check-prefixes=CHECK,PGO
 
 define void @empty_func() {
 entry:
@@ -19,5 +20,6 @@ entry:
 ; CHECK:	.Lfunc_begin1:
 ; CHECK:		.section	.llvm_bb_addr_map,"o",@llvm_bb_addr_map,.text{{$}}
 ; CHECK-NEXT:		.byte 2			# version
-; CHECK-NEXT:		.byte 0			# feature
+; BASIC-NEXT:		.byte 0			# feature
+; PGO-NEXT:		.byte 2			# feature
 ; CHECK-NEXT:		.quad	.Lfunc_begin1	# function address
diff --git a/llvm/test/CodeGen/X86/basic-block-sections-labels-functions-sections.ll b/llvm/test/CodeGen/X86/basic-block-sections-labels-functions-sections.ll
index b8217bbc00760f..1b31069b67113c 100644
--- a/llvm/test/CodeGen/X86/basic-block-sections-labels-functions-sections.ll
+++ b/llvm/test/CodeGen/X86/basic-block-sections-labels-functions-sections.ll
@@ -1,4 +1,5 @@
-; RUN: llc < %s -mtriple=x86_64 -function-sections -basic-block-sections=labels | FileCheck %s
+; RUN: llc < %s -mtriple=x86_64 -function-sections -basic-block-sections=labels | FileCheck %s --check-prefixes=CHECK,BASIC
+; RUN: llc < %s -mtriple=x86_64 -function-sections -basic-block-sections=labels -pgo-analysis-map=func-entry-count,bb-freq | FileCheck %s --check-prefixes=CHECK,PGO
 
 $_Z4fooTIiET_v = comdat any
 
@@ -11,8 +12,15 @@ define dso_local i32 @_Z3barv() {
 ; CHECK-NEXT:	[[BAR_BEGIN:.Lfunc_begin[0-9]+]]:
 ; CHECK:		.section .llvm_bb_addr_map,"o",@llvm_bb_addr_map,.text._Z3barv{{$}}
 ; CHECK-NEXT:		.byte 2			# version
-; CHECK-NEXT:		.byte 0			# feature
+; BASIC-NEXT:		.byte 0			# feature
+; PGO-NEXT:		.byte 3			# feature
 ; CHECK-NEXT:		.quad [[BAR_BEGIN]]	# function address
+; CHECK-NEXT:		.byte 1			# number of basic blocks
+; CHECK-NEXT:	      	.byte 0			# BB id
+; CHECK-NEXT:	      	.uleb128 .Lfunc_begin0-.Lfunc_begin0
+; CHECK-NEXT:	      	.uleb128 .LBB_END0_0-.Lfunc_begin0
+; CHECK-NEXT:	      	.byte 1
+; PGO-NEXT:		.byte 0			# function entry count
 
 
 define dso_local i32 @_Z3foov() {
@@ -24,8 +32,15 @@ define dso_local i32 @_Z3foov() {
 ; CHECK-NEXT:	[[FOO_BEGIN:.Lfunc_begin[0-9]+]]:
 ; CHECK:		.section  .llvm_bb_addr_map,"o",@llvm_bb_addr_map,.text._Z3foov{{$}}
 ; CHECK-NEXT:		.byte 2			# version
-; CHECK-NEXT:		.byte 0			# feature
+; BASIC-NEXT:		.byte 0			# feature
+; PGO-NEXT:		.byte 3			# feature
 ; CHECK-NEXT:		.quad [[FOO_BEGIN]]	# function address
+; CHECK-NEXT:		.byte 1			# number of basic blocks
+; CHECK-NEXT:	      	.byte 0			# BB id
+; CHECK-NEXT:	      	.uleb128 .Lfunc_begin1-.Lfunc_begin1
+; CHECK-NEXT:	      	.uleb128 .LBB_END1_0-.Lfunc_begin1
+; CHECK-NEXT:	      	.byte 1
+; PGO-NEXT:		.byte 0			# function entry count
 
 
 define linkonce_odr dso_local i32 @_Z4fooTIiET_v() comdat {
@@ -37,5 +52,12 @@ define linkonce_odr dso_local i32 @_Z4fooTIiET_v() comdat {
 ; CHECK-NEXT:	[[FOOCOMDAT_BEGIN:.Lfunc_begin[0-9]+]]:
 ; CHECK:		.section .llvm_bb_addr_map,"Go",@llvm_bb_addr_map,_Z4fooTIiET_v,comdat,.text._Z4fooTIiET_v{{$}}
 ; CHECK-NEXT:		.byte 2				# version
-; CHECK-NEXT:		.byte 0				# feature
+; BASIC-NEXT:		.byte 0				# feature
+; PGO-NEXT:		.byte 3				# feature
 ; CHECK-NEXT:		.quad [[FOOCOMDAT_BEGIN]]	# function address
+; CHECK-NEXT:		.byte 1				# number of basic blocks
+; CHECK-NEXT:	      	.byte 0			# BB id
+; CHECK-NEXT:	      	.uleb128 .Lfunc_begin2-.Lfunc_begin2
+; CHECK-NEXT:	      	.uleb128 .LBB_END2_0-.Lfunc_begin2
+; CHECK-NEXT:	      	.byte 1
+; PGO-NEXT:		.byte 0				# function entry count
diff --git a/llvm/test/CodeGen/X86/basic-block-sections-labels-pgo-features.ll b/llvm/test/CodeGen/X86/basic-block-sections-labels-pgo-features.ll
new file mode 100644
index 00000000000000..4c4cb615e593c7
--- /dev/null
+++ b/llvm/test/CodeGen/X86/basic-block-sections-labels-pgo-features.ll
@@ -0,0 +1,120 @@
+; Check the basic block sections labels option
+; RUN: llc < %s -mtriple=x86_64 -function-sections -unique-section-names=true -basic-block-sections=labels | FileCheck %s --check-prefixes=CHECK,BASIC
+
+;; Also verify this holds for all PGO features enabled
+; RUN: llc < %s -mtriple=x86_64 -function-sections -unique-section-names=true -basic-block-sections=labels -pgo-analysis-map=func-entry-count,bb-freq,br-prob | FileCheck %s --check-prefixes=CHECK,PGO-ALL,PGO-FEC,PGO-BBF,PGO-BRP
+
+;; Also verify that pgo extension only includes the enabled feature
+; RUN: llc < %s -mtriple=x86_64 -function-sections -unique-section-names=true -basic-block-sections=labels -pgo-analysis-map=func-entry-count | FileCheck %s --check-prefixes=CHECK,PGO-FEC,FEC-ONLY
+; RUN: llc < %s -mtriple=x86_64 -function-sections -unique-section-names=true -basic-block-sections=labels -pgo-analysis-map=bb-freq | FileCheck %s --check-prefixes=CHECK,PGO-BBF,BBF-ONLY
+; RUN: llc < %s -mtriple=x86_64 -function-sections -unique-section-names=true -basic-block-sections=labels -pgo-analysis-map=br-prob | FileCheck %s --check-prefixes=CHECK,PGO-BRP,BRP-ONLY
+
+
+define void @_Z3bazb(i1 zeroext, i1 zeroext) personality ptr @__gxx_personality_v0 {
+  br i1 %0, label %3, label %8
+
+3:
+  %4 = invoke i32 @_Z3barv()
+          to label %8 unwind label %6
+  br label %10
+
+6:
+  landingpad { ptr, i32 }
+          catch ptr null
+  br label %12
+
+8:
+  %9 = call i32 @_Z3foov()
+  br i1 %1, label %12, label %10
+
+10:
+  %11 = select i1 %1, ptr blockaddress(@_Z3bazb, %3), ptr blockaddress(@_Z3bazb, %12) ; <ptr> [#uses=1]
+  indirectbr ptr %11, [label %3, label %12]
+
+12:
+  ret void
+}
+
+declare i32 @_Z3barv() #1
+
+declare i32 @_Z3foov() #1
+
+declare i32 @__gxx_personality_v0(...)
+
+; CHECK:	.section .text._Z3bazb,"ax",@progbits{{$}}
+; CHECK-LABEL:	_Z3bazb:
+; CHECK-LABEL:	.Lfunc_begin0:
+; CHECK-LABEL:	.LBB_END0_0:
+; CHECK-LABEL:	.LBB0_1:
+; CHECK-LABEL:	.LBB_END0_1:
+; CHECK-LABEL:	.LBB0_2:
+; CHECK-LABEL:	.LBB_END0_2:
+; CHECK-LABEL:	.LBB0_3:
+; CHECK-LABEL:	.LBB_END0_3:
+; CHECK-LABEL:	.Lfunc_end0:
+
+; CHECK: 	.section	.llvm_bb_addr_map,"o",@llvm_bb_addr_map,.text._Z3bazb{{$}}
+; CHECK-NEXT:   .byte   2		# version
+; BASIC-NEXT:   .byte   0		# feature
+; PGO-ALL-NEXT:	.byte   7		# feature
+; FEC-ONLY-NEXT:.byte   1		# feature
+; BBF-ONLY-NEXT:.byte   2		# feature
+; BRP-ONLY-NEXT:.byte   4		# feature
+; CHECK-NEXT:	.quad	.Lfunc_begin0	# function address
+; CHECK-NEXT:	.byte	6		# number of basic blocks
+; CHECK-NEXT: ...
[truncated]

Copy link

github-actions bot commented Dec 12, 2023

✅ With the latest revision this PR passed the C/C++ code formatter.

@@ -1377,7 +1399,8 @@ void AsmPrinter::emitBBAddrMapSection(const MachineFunction &MF) {
uint8_t BBAddrMapVersion = OutStreamer->getContext().getBBAddrMapVersion();
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Linking to @MatzeB comment on the draft PR: red1bluelost#1 (comment).

FWIW as far as I can tell OutStreamer->getContext().getBBAddrMapVersion() is never set anywhere in the whole LLVM codebase so is always 2 in practice. @rlavaee would it make sense to remove this field and accessor and just put a constant somewhere instead? (out of scope for this diff of course, should do that separately)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Sounds good to me.

Implements remaining relevant tests for asm-printer.

Adds comma separation to the option.

Decouples tests from block frequency value.

Adds a check if the option is set without labels.

Addresses initial feedback.

Updates AsmPrinter code after rework to PGOAnalysisMap.

Adds minor optimizations to avoid unnecessary analysis passes.

Allows direct usage of immutable MBPI analysis.

Updates after moving PGO analysis map after each function.

Updates asm printer with encoded feature type.
llvm/docs/Extensions.rst Outdated Show resolved Hide resolved
llvm/docs/Extensions.rst Outdated Show resolved Hide resolved
@@ -1377,7 +1399,8 @@ void AsmPrinter::emitBBAddrMapSection(const MachineFunction &MF) {
uint8_t BBAddrMapVersion = OutStreamer->getContext().getBBAddrMapVersion();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Sounds good to me.

llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp Outdated Show resolved Hide resolved
; PGO-BBF-NEXT: .{{.*}} {{.*}} # basic block frequency
; PGO-BRP-NEXT: .byte 2 # basic block successor count
; PGO-BRP-NEXT: .byte 1 # successor BB ID
; PGO-BRP-NEXT: .ascii "\200\200\200\200\004" # successor branch probability
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where do these values come from? Isn't it better to add real profile data (similar to machine-function-splitter.ll)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the time of that commit, it comes from BFI with not prof data. I agree putting in the prof would be better and should avoid the inconsistency I saw when running the tests on different machines. I updated the test to contain profile data (i.e. prof metadata).

; RUN: llc < %s -mtriple=x86_64 -function-sections -unique-section-names=false -basic-block-sections=labels | FileCheck %s --check-prefixes=CHECK,NOUNIQ,BASIC
; RUN: llc < %s -mtriple=x86_64 -function-sections -unique-section-names=true -basic-block-sections=labels -split-machine-functions | FileCheck %s --check-prefixes=CHECK,UNIQ,BASIC

;; Also verify this holds for PGO extension
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we reuse the tests for this feature? Can we separate them out into their own tests. If the test is not really necessary, we can drop them too. I think one or two separate tests should be sufficient for this PR.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some of them are likely overkill, especially now that the analysis map is a separate optional entry rather than embedded in the function addr map.

I've reduced it to the *-pgo-features.ll which is the primary test and *-empty-function.ll to validate that the analysis map won't exist when bb addr map does not.


;; PGO Analysis Map
; PGO-FEC-NEXT: .byte 100 # function entry count
; PGO-BBF-NEXT: .ascii "\271\235\376\332\245\200\356\017" # basic block frequency
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am scratching my head around why function entry count is encoded as simple byte but basic block frequencies are encoded as ascii. Shouldn't we at least have 20 and 80 as block frequencies of two blocks?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those (20 and 80) would likely be the relative block frequencies. The entry block frequency and function entry are not necessarily the same. Especially if Function Entry Count does not exist, they the block frequencies are synthesized. @mtrofin has a better comment about it here:

MachineBlockFrequencyInfo &MBFI =
getAnalysis<LazyMachineBlockFrequencyInfoPass>().getBFI();
// The entry count and the entry basic block frequency aren't the same. We
// want to capture "absolute" frequencies, i.e. the frequency with which a
// MBB is executed when the program is executed. From there, we can derive
// Function-relative frequencies (divide by the value for the first MBB).
// We also have the information about frequency with which functions
// were called. This helps, for example, in a type of integration tests
// where we want to cross-validate the compiler's profile with a real
// profile.
// Using double precision because uint64 values used to encode mbb
// "frequencies" may be quite large.
const double EntryCount =
static_cast<double>(MF->getFunction().getEntryCount()->getCount());
for (const auto &MBB : *MF) {
const double MBBRelFreq = MBFI.getBlockFreqRelativeToEntryBlock(&MBB);
const double AbsMBBFreq = MBBRelFreq * EntryCount;
*MBBProfileDumpFileOutput.get()
<< MF->getName() << "," << MBB.getBBID()->BaseID << ","
<< AbsMBBFreq << "\n";
}
}
.

Given that the PGOAnalysisMap supports enabling Function Entry Count and Block Frequencies separately, I feel that keeping the raw frequencies in the encoding.

I think ascii is here just because it's a big number.

Addresses typo and improves condition.
@red1bluelost red1bluelost merged commit 7df28fd into llvm:main Jan 4, 2024
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants