[BOLT] Adding a unittest that covers Arm SPE PBT aggregation #160095
@llvm/pr-subscribers-bolt

Author: Ádám Kallai (kaadam)

Changes

When the SPE previous branch target address (PBT) feature is available, it records the previous branch target before the sampled operation. An SPE sample combined with this feature has two entries, which form a chain of two consecutive branches. Arm SPE stores the latest branch in the first entry and the previous branch target address in the second entry.

However, PBT does not provide as much information as SPE does. It lacks the source branch address, the branch type, and the prediction bit; these fields are filled with zero.

Consider the following example:

`FROM/TO/P/-/-/1/COND/- FROM/TO/-/-/-/0//-`

The first entry is the newest pair and the second entry is the oldest pair.

Full diff: https://github.com/llvm/llvm-project/pull/160095.diff

1 Files Affected:
diff --git a/bolt/unittests/Profile/PerfSpeEvents.cpp b/bolt/unittests/Profile/PerfSpeEvents.cpp
index 8d023cd7b7e74..0ec4651a5ae17 100644
--- a/bolt/unittests/Profile/PerfSpeEvents.cpp
+++ b/bolt/unittests/Profile/PerfSpeEvents.cpp
@@ -161,4 +161,51 @@ TEST_F(PerfSpeEventsTestHelper, SpeBranchesWithBrstack) {
parseAndCheckBrstackEvents(1234, ExpectedSamples);
}
+TEST_F(PerfSpeEventsTestHelper, SpeBranchesWithBrstackAndPbt) {
+ // Check perf input with SPE branch events in brstack format,
+ // combined with the previous branch target address (PBT).
+ // Example collection command:
+ // ```
+ // perf record -e 'arm_spe_0/branch_filter=1/u' -- BINARY
+ // ```
+ // How BOLT extracts the branch events:
+ // ```
+ // perf script -F pid,brstack --itrace=bl
+ // ```
+
+ opts::ArmSPE = true;
+ opts::ReadPerfEvents = " 4567 0xa002/0xa003/PN/-/-/10/COND/- 0x0/0xa001/-/-/-/0//-\n"
+ " 4567 0xb002/0xb003/P/-/-/4/RET/- 0x0/0xb001/-/-/-/0//-\n"
+ " 4567 0xc456/0xc789/P/-/-/13/-/- 0x0/0xc123/-/-/-/0//-\n"
+ " 4567 0xd456/0xd789/M/-/-/7/RET/- 0x0/0xd123/-/-/-/0//-\n"
+ " 4567 0xe005/0xe009/P/-/-/14/RET/- 0x0/0xe001/-/-/-/0//-\n"
+ " 4567 0xd456/0xd789/M/-/-/7/RET/- 0x0/0xd123/-/-/-/0//-\n"
+ " 4567 0xf002/0xf003/MN/-/-/8/COND/- 0x0/0xf001/-/-/-/0//-\n"
+ " 4567 0xc456/0xc789/P/-/-/13/-/- 0x0/0xc123/-/-/-/0//-\n";
+
+ // ExpectedSamples contains the aggregated information about
+ // a branch {{From, To, TraceTo}, {TakenCount, MispredCount}}.
+ // If the PBT feature is available, an SPE sample has two entries.
+ // These two entries form a chain of two consecutive branches.
+ // However, PBT lacks associated information such as the branch
+ // source address, branch type, and prediction bit.
+ // For the first branch stack, please see the description above.
+ // Consider this example for PBT trace: {{0x0, 0xc123, 0xc456}, {2, 0}}.
+ // This entry has TakenCount = 2, as we have two samples for
+ // this entry (0x0, 0xc123) in our input. It has MispredCount = 0,
+ // as it lacks prediction information.
+ // It also has no information about the source branch address, so
+ // the 'From' field is filled with zero (0x0).
+ // TraceTo = 0xc456 means the execution continued from 0xc123 to 0xc456.
+ std::vector<std::pair<Trace, TakenBranchInfo>> ExpectedSamples = {
+ {{0xa002, 0xa003, Trace::BR_ONLY}, {1, 0}}, {{0x0, 0xa001, 0xa002}, {1, 0}},
+ {{0xb002, 0xb003, Trace::BR_ONLY}, {1, 0}}, {{0x0, 0xb001, 0xb002}, {1, 0}},
+ {{0xc456, 0xc789, Trace::BR_ONLY}, {2, 0}}, {{0x0, 0xc123, 0xc456}, {2, 0}},
+ {{0xd456, 0xd789, Trace::BR_ONLY}, {2, 2}}, {{0x0, 0xd123, 0xd456}, {2, 0}},
+ {{0xe005, 0xe009, Trace::BR_ONLY}, {1, 0}}, {{0x0, 0xe001, 0xe005}, {1, 0}},
+ {{0xf002, 0xf003, Trace::BR_ONLY}, {1, 1}}, {{0x0, 0xf001, 0xf002}, {1, 0}}};
+
+ parseAndCheckBrstackEvents(4567, ExpectedSamples);
+}
+
#endif
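The aggregation that `ExpectedSamples` encodes can be illustrated with a small standalone sketch. It is not BOLT's aggregator: `SpeSample`, `BranchCounts`, `TraceKey`, `aggregateSamples`, and the `BrOnly` placeholder for `Trace::BR_ONLY` are hypothetical names introduced only for this illustration. Each sample is assumed to contribute one SPE trace `{Src, Dest, BR_ONLY}` and one PBT trace `{0x0, Pbt, Src}`, and only the SPE trace can carry a misprediction.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <tuple>
#include <vector>

// Hypothetical stand-in for Trace::BR_ONLY; the real constant lives in BOLT.
constexpr uint64_t BrOnly = ~0ULL;

// One SPE sample with PBT: the sampled branch plus the previous branch target.
struct SpeSample {
  uint64_t Src;  // source address of the sampled branch
  uint64_t Dest; // target address of the sampled branch
  bool Mispred;  // prediction bit of the sampled branch
  uint64_t Pbt;  // previous branch target address
};

struct BranchCounts {
  uint64_t Taken = 0;
  uint64_t Mispred = 0;
};

// Mirrors the {From, To, TraceTo} triple used in the test comment.
using TraceKey = std::tuple<uint64_t, uint64_t, uint64_t>;

std::map<TraceKey, BranchCounts>
aggregateSamples(const std::vector<SpeSample> &Samples) {
  std::map<TraceKey, BranchCounts> Result;
  for (const SpeSample &S : Samples) {
    assert(S.Src != 0 && "an SPE entry is expected to carry a source address");
    // SPE entry: full branch information, including the prediction bit.
    BranchCounts &Spe = Result[TraceKey(S.Src, S.Dest, BrOnly)];
    ++Spe.Taken;
    if (S.Mispred)
      ++Spe.Mispred;
    // PBT entry: unknown source (0x0) and no prediction information,
    // so only the taken count is incremented.
    BranchCounts &Pbt = Result[TraceKey(uint64_t{0}, S.Pbt, S.Src)];
    ++Pbt.Taken;
  }
  return Result;
}
```

Feeding the eight samples from the test input into `aggregateSamples` yields twelve distinct traces, with the repeated `0xc456` and `0xd456` samples each counted twice.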
✅ With the latest revision this PR passed the C/C++ code formatter.
Hey Adam,
Thanks for the PBT test. Regarding this comment:
// If the PBT feture is availabe, an SPE sample has two entries.
// These two entries form a chain of two consecutive branches.
With regular SPE, we only see consecutive branch pairs in code. If we try to infer any FT branches in between, we'll always get an empty set.
TMU, with the optional FEAT_SPE_PBT, we get a BRBE-like stack with a depth of 1?
@mate-stodulka, can you confirm that and expand with some reference?
Also, just to double-check with @aaupov: since we use BOLT's LBR format (branch stacks), fall-throughs should be inferred automatically (which PBT can benefit from), so no extra flags are needed, correct?
Hi,
Section D17.6.3.1 provides more detail, but the relevant part is that PBT records only taken branches. That is the same stipulation as BRBE has, so the PBT should behave as the target field of a single branch record. This means fall-throughs between the PBT and the current SPE sample's source are possible. Hope that helps clarify things!
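To make that point concrete, here is a minimal sketch of how a fall-through range could be derived from one sample. It is not BOLT code: `FallThrough` and `inferFallThrough` are hypothetical names, and the sketch assumes that, because no taken branch occurs between the PBT and the sampled branch, the source address is never below the PBT address. Exceptions and similar control transfers are ignored.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Straight-line code executed between two taken branches.
struct FallThrough {
  uint64_t Start; // previous branch target (PBT)
  uint64_t End;   // source address of the sampled branch (SRC)
};

// Returns the range [PbtTarget, SpeSource] when it looks plausible.
// Assumption: a zero PBT means the feature recorded nothing for this sample.
std::optional<FallThrough> inferFallThrough(uint64_t PbtTarget,
                                            uint64_t SpeSource) {
  if (PbtTarget == 0)
    return std::nullopt; // no previous branch target available
  if (SpeSource < PbtTarget)
    return std::nullopt; // not sequential, so no fall-through is inferred
  return FallThrough{PbtTarget, SpeSource};
}
```

For the `0xc123`/`0xc456` pair from the test, the sketch reports the range `0xc123` to `0xc456`. Nothing is inferred between the sampled branch's source and its target, since that step is the branch itself.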
Thanks for confirming and the added info, @mate-stodulka. So we can reword the test and clarify that:
We have opportunities to infer FTs from (1) up to (2), regardless of what (2) is, but none between (2) and (3).
When the SPE previous branch target address (PBT) feature is available, an SPE sample combined with PBT has two entries.

Arm SPE records the SRC/DEST addresses of the latest sampled branch operation and stores them in the first entry. PBT records the target address of the most recently taken branch in program order before the sampled operation and places it in the second entry. Together they form a chain of two consecutive branches, where:

- The previous branch operation (PBT) is always taken.
- In the SPE entry, the current source branch (SRC) may be either fall-through or taken.
- The target address (DEST) of the recorded branch operation is always what was architecturally executed.

However, PBT does not provide as much information as SPE does. It lacks the source branch address, the branch type, and the prediction bit; these fields are always filled with zero in the PBT entry. BOLT therefore cannot evaluate the prediction and source branch fields, and it leaves them zero during the aggregation process.

Consider the following example of an SPE profile combined with PBT:

`<PID> <SRC>/<DEST>/PN/-/-/10/COND/- <NULL>/<PBT>/-/-/-/0//-`
`0xffff8000807216b4/0xffff800080721704/P/-/-/1/COND/- 0x0/0xffff8000807216ac/-/-/-/0//-`
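The entry layout described above can be read with a short standalone sketch. This is not BOLT's parser: `BrstackEntry`, `SampleLine`, `parseEntry`, and `parseSampleLine` are hypothetical names, and the sketch assumes eight `/`-separated fields per entry and treats a leading `M` in the third field as "mispredicted", which matches the test input in this PR.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical holder for one brstack entry; not a BOLT type.
struct BrstackEntry {
  uint64_t From = 0;    // branch source (0x0 in a PBT entry)
  uint64_t To = 0;      // branch target (the PBT address in a PBT entry)
  bool Mispred = false; // true when the flags field starts with 'M'
  std::string Type;     // e.g. "COND" or "RET"; empty in a PBT entry
};

// Parses one entry such as "0xa002/0xa003/PN/-/-/10/COND/-".
BrstackEntry parseEntry(const std::string &Text) {
  std::vector<std::string> Fields;
  std::istringstream Stream(Text);
  std::string Field;
  while (std::getline(Stream, Field, '/'))
    Fields.push_back(Field);
  assert(Fields.size() == 8 && "expected eight '/'-separated fields");

  BrstackEntry Entry;
  Entry.From = std::stoull(Fields[0], nullptr, 16);
  Entry.To = std::stoull(Fields[1], nullptr, 16);
  Entry.Mispred = !Fields[2].empty() && Fields[2][0] == 'M';
  Entry.Type = Fields[6];
  return Entry;
}

// One line of input: "<PID> <SPE entry> <PBT entry>".
struct SampleLine {
  int Pid = 0;
  BrstackEntry Spe; // first entry: the sampled (newest) branch
  BrstackEntry Pbt; // second entry: the previous branch target
};

SampleLine parseSampleLine(const std::string &Line) {
  std::istringstream Stream(Line);
  SampleLine Sample;
  std::string First, Second;
  Stream >> Sample.Pid >> First >> Second;
  assert(!Stream.fail() && "expected a PID followed by two entries");
  Sample.Spe = parseEntry(First);
  Sample.Pbt = parseEntry(Second);
  return Sample;
}
```

Parsing a line from the test input gives an SPE entry with both addresses and a type, and a PBT entry whose `From` is zero and whose `Type` is empty, which is why the aggregated PBT traces carry no source address or misprediction count.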
@paschalis-mpeis @mate-stodulka Thanks for the additional information. Updated the test description.