kaadam
Contributor

@kaadam kaadam commented Sep 22, 2025

When the SPE previous branch target address (PBT) feature is available, an SPE sample combined with PBT has two entries.
Arm SPE records the SRC/DEST addresses of the latest sampled branch operation and stores them in the first entry. PBT records the target address of the most recently taken branch in program order before the sampled operation and places it in the second entry. Together they form a chain of two consecutive branches.

Where:

  • The previous branch operation (PBT) is always taken.
  • In the SPE entry, the current source branch (SRC) may be either fall-through or taken, and the target address (DEST) of the recorded branch operation is always what was architecturally executed.

However, PBT doesn't provide as much information as SPE does. It lacks information such as the source branch address, the branch type, and the prediction bit. These fields are always filled with zero in the PBT entry. Therefore BOLT cannot evaluate the prediction or the source branch fields, and it leaves them zero during the aggregation process.

Consider the following example to see what an SPE profile looks like when combined with PBT:

<PID> <SRC>/<DEST>/PN/-/-/10/COND/- <NULL>/<PBT>/-/-/-/0//-
0xffff8000807216b4/0xffff800080721704/P/-/-/1/COND/- 0x0/0xffff8000807216ac/-/-/-/0//-
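
To make the two-entry layout concrete, here is a minimal standalone sketch that splits one `perf script -F pid,brstack` line into its entries. It is not BOLT's parser: `BrstackEntry`, `SpeSampleLine`, `parseEntry`, and `parseSample` are hypothetical names, and treating a zero source address as the marker of the PBT entry is an assumption based on the description above.

```cpp
#include <cassert>
#include <cstdint>
#include <exception>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical holder for one brstack entry; this is not a BOLT type.
struct BrstackEntry {
  uint64_t Src = 0;     // 0x0 in a PBT entry
  uint64_t Dest = 0;    // branch target (the PBT address in a PBT entry)
  bool Mispred = false; // true if the flags field contains 'M'
  std::string Type;     // e.g. "COND" or "RET"; empty in a PBT entry

  // Assumption: a zero source address marks the PBT entry.
  bool isPbt() const { return Src == 0; }
};

// Hypothetical holder for one sample line: a PID followed by its entries.
struct SpeSampleLine {
  uint64_t Pid = 0;
  std::vector<BrstackEntry> Entries;
};

// Parse "SRC/DEST/FLAGS/-/-/CYCLES/TYPE/-"; returns false if malformed.
inline bool parseEntry(const std::string &Text, BrstackEntry &Out) {
  std::vector<std::string> Fields;
  std::istringstream In(Text);
  for (std::string Field; std::getline(In, Field, '/');)
    Fields.push_back(Field); // empty fields (as in "0//-") are kept
  if (Fields.size() != 8)
    return false;
  try {
    Out.Src = std::stoull(Fields[0], nullptr, 16);
    Out.Dest = std::stoull(Fields[1], nullptr, 16);
  } catch (const std::exception &) {
    return false; // address field was not a hexadecimal number
  }
  Out.Mispred = Fields[2].find('M') != std::string::npos;
  Out.Type = Fields[6];
  return true;
}

// Parse "<PID> <entry> <entry> ..."; returns false if any part is malformed.
inline bool parseSample(const std::string &Line, SpeSampleLine &Out) {
  std::istringstream In(Line);
  if (!(In >> Out.Pid))
    return false;
  for (std::string Token; In >> Token;) {
    BrstackEntry Entry;
    if (!parseEntry(Token, Entry))
      return false;
    Out.Entries.push_back(Entry);
  }
  return !Out.Entries.empty();
}
```

With PBT enabled, a well-formed line yields two entries: the sampled branch first, then the PBT entry whose only meaningful field is `Dest`.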

@llvmbot
Member

llvmbot commented Sep 22, 2025

@llvm/pr-subscribers-bolt

Author: Ádám Kallai (kaadam)

Changes

When the SPE previous branch target address (PBT) feature is available, it records the target of the previous taken branch before the sampled operation.

An SPE sample combined with this feature has two entries, which form a chain of two consecutive branches. Arm SPE stores the latest branch in the first entry, and the previous branch target address in the second entry. However, PBT doesn't provide as much information as SPE does. It lacks information such as the source branch address, the branch type, and the prediction bit. These fields are filled with zero.

Consider the following example:

FROM/TO/P/-/-/1/COND/- FROM/TO/-/-/-/0//-
0xffff8000807216b4/0xffff800080721704/P/-/-/1/COND/- 0x0/0xffff8000807216ac/-/-/-/0//-

Here the first entry is the newest pair and the second entry is the oldest pair.


Full diff: https://github.com/llvm/llvm-project/pull/160095.diff

1 Files Affected:

  • (modified) bolt/unittests/Profile/PerfSpeEvents.cpp (+47)
diff --git a/bolt/unittests/Profile/PerfSpeEvents.cpp b/bolt/unittests/Profile/PerfSpeEvents.cpp
index 8d023cd7b7e74..0ec4651a5ae17 100644
--- a/bolt/unittests/Profile/PerfSpeEvents.cpp
+++ b/bolt/unittests/Profile/PerfSpeEvents.cpp
@@ -161,4 +161,51 @@ TEST_F(PerfSpeEventsTestHelper, SpeBranchesWithBrstack) {
   parseAndCheckBrstackEvents(1234, ExpectedSamples);
 }
 
+TEST_F(PerfSpeEventsTestHelper, SpeBranchesWithBrstackAndPbt) {
+  // Check perf input with SPE branch events in brstack format,
+  // combined with the previous branch target address (PBT).
+  // Example collection command:
+  // ```
+  // perf record -e 'arm_spe_0/branch_filter=1/u' -- BINARY
+  // ```
+  // How BOLT extracts the branch events:
+  // ```
+  // perf script -F pid,brstack --itrace=bl
+  // ```
+
+  opts::ArmSPE = true;
+  opts::ReadPerfEvents = "  4567  0xa002/0xa003/PN/-/-/10/COND/- 0x0/0xa001/-/-/-/0//-\n"
+                         "  4567  0xb002/0xb003/P/-/-/4/RET/- 0x0/0xb001/-/-/-/0//-\n"
+                         "  4567  0xc456/0xc789/P/-/-/13/-/- 0x0/0xc123/-/-/-/0//-\n"
+                         "  4567  0xd456/0xd789/M/-/-/7/RET/- 0x0/0xd123/-/-/-/0//-\n"
+                         "  4567  0xe005/0xe009/P/-/-/14/RET/- 0x0/0xe001/-/-/-/0//-\n"
+                         "  4567  0xd456/0xd789/M/-/-/7/RET/- 0x0/0xd123/-/-/-/0//-\n"
+                         "  4567  0xf002/0xf003/MN/-/-/8/COND/- 0x0/0xf001/-/-/-/0//-\n"
+                         "  4567  0xc456/0xc789/P/-/-/13/-/- 0x0/0xc123/-/-/-/0//-\n";
+
+  // ExpectedSamples contains the aggregated information about
+  // a branch {{From, To, TraceTo}, {TakenCount, MispredCount}}.
+  // If the PBT feature is available, an SPE sample has two entries.
+  // These two entries form a chain of two consecutive branches.
+  // However, PBT lacks associated information such as the branch
+  // source address, branch type, and prediction bit.
+  // For the first branch stack, please see the description above.
+  // Consider this example for a PBT trace: {{0x0, 0xc123, 0xc456}, {2, 0}}.
+  // This entry has TakenCount = 2, as we have two samples for
+  // this entry (0x0, 0xc123) in our input. It has MispredCount = 0,
+  // as it lacks prediction information.
+  // It also has no information about the source branch address, therefore
+  // the 'From' field is filled with zero (0x0).
+  // TraceTo = 0xc456 means the execution ran from 0xc123 up to 0xc456.
+  std::vector<std::pair<Trace, TakenBranchInfo>> ExpectedSamples = {
+      {{0xa002, 0xa003, Trace::BR_ONLY}, {1, 0}}, {{0x0, 0xa001, 0xa002}, {1, 0}},
+      {{0xb002, 0xb003, Trace::BR_ONLY}, {1, 0}}, {{0x0, 0xb001, 0xb002}, {1, 0}},
+      {{0xc456, 0xc789, Trace::BR_ONLY}, {2, 0}}, {{0x0, 0xc123, 0xc456}, {2, 0}},
+      {{0xd456, 0xd789, Trace::BR_ONLY}, {2, 2}}, {{0x0, 0xd123, 0xd456}, {2, 0}},
+      {{0xe005, 0xe009, Trace::BR_ONLY}, {1, 0}}, {{0x0, 0xe001, 0xe005}, {1, 0}},
+      {{0xf002, 0xf003, Trace::BR_ONLY}, {1, 1}}, {{0x0, 0xf001, 0xf002}, {1, 0}}};
+
+  parseAndCheckBrstackEvents(4567, ExpectedSamples);
+}
+
 #endif
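
For readers who want to see where the expected counts come from, the following standalone sketch reproduces the aggregation that `ExpectedSamples` describes. It is an illustration under stated assumptions, not BOLT's aggregation code: `SpeSample`, `Counts`, `TraceKey`, `aggregate`, and `BrOnly` are hypothetical names, `BrOnly` is only a stand-in for `Trace::BR_ONLY`, and counting a misprediction whenever the flags contain `M` is inferred from the test data.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <tuple>
#include <vector>

// Stand-in for Trace::BR_ONLY; the concrete value here is an assumption.
constexpr uint64_t BrOnly = ~0ULL;

// One SPE sample with PBT: the sampled branch plus the previous branch target.
struct SpeSample {
  uint64_t Src;
  uint64_t Dest;
  uint64_t Pbt;
  bool Mispred; // true if the flags field contained 'M'
};

struct Counts {
  uint64_t Taken = 0;
  uint64_t Mispred = 0;
};

// Key layout mirrors the test comment: {From, To, TraceTo}.
using TraceKey = std::tuple<uint64_t, uint64_t, uint64_t>;

std::map<TraceKey, Counts> aggregate(const std::vector<SpeSample> &Samples) {
  std::map<TraceKey, Counts> Result;
  for (const SpeSample &S : Samples) {
    // First entry: the sampled branch, with its prediction information.
    Counts &Branch = Result[TraceKey(S.Src, S.Dest, BrOnly)];
    ++Branch.Taken;
    if (S.Mispred)
      ++Branch.Mispred;
    // Second entry: PBT has no source or prediction bit, so From stays 0x0
    // and the mispredict count is never incremented.
    ++Result[TraceKey(uint64_t{0}, S.Pbt, S.Src)].Taken;
  }
  return Result;
}
```

Feeding in the eight samples from the test input gives twelve aggregated entries: six sampled branches and six PBT traces, with the repeated `0xc456` and `0xd456` samples each counted twice.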


github-actions bot commented Sep 22, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

Member

@paschalis-mpeis paschalis-mpeis left a comment

Hey Adam,

Thanks for the PBT test. Regarding this comment:

// If the PBT feature is available, an SPE sample has two entries.
// These two entries form a chain of two consecutive branches.

With regular SPE, we only see consecutive branch pairs in code. If we try to infer any FT branches in between, we'll always get an empty set.

To my understanding, with the optional FEAT_SPE_PBT we get a BRBE-like stack with a depth of 1?
@mate-stodulka, can you confirm that and expand with some reference?

Also, just to double-check with @aaupov: since we use BOLT's LBR format (branch stacks), fall-throughs should be inferred automatically (which PBT can benefit from), so no extra flags are needed, correct?

@mate-stodulka

Hi,
The Arm Architecture Reference Manual, version L.b, defines the Previous Branch Target as follows:

FEAT_SPE_PBT provides support for generating a packet that provides the target address for the previous taken branch.

Section D17.6.3.1 provides more detail, but the relevant part is that PBT records only taken branches. That is the same stipulation that BRBE has, so the PBT should behave as the target field of a single branch record. This means fall-throughs between the PBT and the current SPE sample's source are possible. Hope that helps clarify things!

@paschalis-mpeis
Member

Thanks for confirming and the added info, @mate-stodulka.

So we can reword the test and clarify that:

  1. The PBT entry is always taken, acting as a branch-stack-like structure of depth 1.
  2. The current source branch may be either fall-through or taken.
  3. The recorded dest branch is always what was architecturally executed.

We have opportunities to infer FTs from (1) up to (2), regardless of what (2) is, but none between (2) and (3).
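
The point about where fall-throughs can be inferred can be sketched as a small helper. This is a hedged illustration rather than BOLT code: `AddrRange` and `fallThroughBeforeSample` are hypothetical names, and rejecting a source address below the PBT address is an assumption that the fall-through region is a straight-line run from (1) to (2).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical result type: a candidate fall-through region [Start, End].
struct AddrRange {
  uint64_t Start = 0;
  uint64_t End = 0;
  bool Valid = false;
};

// Execution landed at PbtTarget after the previous taken branch (1) and ran
// until the sampled branch at Src (2). No range is inferred between Src and
// Dest, because Dest is what was architecturally executed right after Src.
inline AddrRange fallThroughBeforeSample(uint64_t PbtTarget, uint64_t Src) {
  AddrRange Range;
  // Assumption: a missing PBT (0x0) or a source below the PBT target
  // cannot describe a straight-line run, so no range is reported.
  if (PbtTarget == 0 || Src < PbtTarget)
    return Range;
  Range.Start = PbtTarget;
  Range.End = Src;
  Range.Valid = true;
  return Range;
}
```

For the sample in the PR description, the PBT target `0xffff8000807216ac` and the source `0xffff8000807216b4` give an 8-byte region in which any branches must have fallen through.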

When the SPE previous branch target address (PBT) feature is
available, an SPE sample combined with PBT has two entries.
Arm SPE records the SRC/DEST addresses of the latest sampled branch operation
and stores them in the first entry.
PBT records the target address of the most recently taken branch in program order
before the sampled operation and places it in the second entry.
Together they form a chain of two consecutive branches.

Where:
- The previous branch operation (PBT) is always taken.
- In the SPE entry, the current source branch (SRC) may be either fall-through or taken.
- The target address (DEST) of the recorded branch operation is always
  what was architecturally executed.

However, PBT doesn't provide as much information as SPE does. It lacks information
such as the source branch address, the branch type, and the prediction bit.
These fields are always filled with zero in the PBT entry.
Therefore BOLT cannot evaluate the prediction or the source branch fields,
and it leaves them zero during the aggregation process.

Consider the following example to see what an SPE profile looks like when combined with PBT:

`<PID> <SRC>/<DEST>/PN/-/-/10/COND/- <NULL>/<PBT>/-/-/-/0//-
0xffff8000807216b4/0xffff800080721704/P/-/-/1/COND/-  0x0/0xffff8000807216ac/-/-/-/0//-`
@kaadam
Contributor Author

kaadam commented Sep 30, 2025

@paschalis-mpeis @mate-stodulka Thanks for the additional information. Updated the test description.


4 participants