[BOLT] Improve ICP activation policy and hot jt processing
Summary:
Previously, ICP worked with a budget of N targets to convert to
direct calls. If the combined frequency of the hottest targets (at most
N of them) exceeded a given fraction (threshold) of the call site's
total frequency, say 90%, the optimization would convert those targets
(up to N) to direct calls. Otherwise, it would abandon the call site
entirely. The intent was to convert a given fraction of the indirect
call site frequency to direct calls, but in practice this is an
"all or nothing" strategy.
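For illustration, the previous policy can be sketched as a small standalone
C++ function. This is a hypothetical reconstruction from the description
above, not BOLT's code: the function name and parameters are assumptions,
target counts are assumed to be sorted hottest first, and it assumes the
pass promoted the smallest prefix of hot targets that reached the threshold.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch of the previous "all or nothing" ICP policy.
// Counts holds the execution count of each target of one indirect call
// site, sorted in decreasing order. Returns how many of the hottest targets
// would be promoted to direct calls (0 means the call site is skipped).
std::size_t oldPolicyTargetsToPromote(const std::vector<uint64_t> &Counts,
                                      std::size_t Budget,
                                      unsigned ThresholdPercent) {
  assert(ThresholdPercent <= 100 && "threshold is a percentage");
  uint64_t Total = 0;
  for (uint64_t C : Counts)
    Total += C;
  if (Total == 0)
    return 0;

  // Accumulate the hottest targets until their share reaches the
  // threshold or the budget of N targets is exhausted.
  uint64_t Covered = 0;
  std::size_t N = 0;
  while (N < Budget && N < Counts.size()) {
    Covered += Counts[N++];
    if (Covered * 100 >= Total * ThresholdPercent)
      return N; // enough of the frequency is covered: promote N targets
  }
  // The hottest Budget targets do not reach the threshold: abandon the
  // whole call site, even if its hottest target alone is very hot.
  return 0;
}
```

With counts {50, 30, 10, 10}, a budget of 2 and a 90% threshold, the two
hottest targets cover only 80% of the calls, so the function returns 0 and
the call site is left untouched, even though half of its calls go to a
single target.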

This patch changes ICP to use the same strategy seen in LLVM's ICP,
with two thresholds. The hottest target of an indirect call site is
compared against both thresholds: one checks its frequency relative to
the total frequency of the original indirect call site, and the other
checks its frequency relative to the remaining, unconverted targets
(excluding the hottest targets already converted to direct calls). The
remaining threshold is typically set higher than the total threshold.
This gives us finer control over ICP.
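A minimal sketch of the dual-threshold check follows, modeled on the
description above (and in the spirit of LLVM's ICP) rather than taken from
BOLT. The function name, the percentage parameters `TotalPercent` and
`RemainingPercent`, and the choice to stop at the first target that fails
either check are assumptions for illustration; counts are assumed sorted
hottest first.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch of a dual-threshold promotion policy.
// Counts: execution counts of the targets of one indirect call site,
// sorted in decreasing order. Budget: maximum number of targets (N).
// Returns the number of hottest targets that pass both checks.
std::size_t dualThresholdTargetsToPromote(const std::vector<uint64_t> &Counts,
                                          std::size_t Budget,
                                          unsigned TotalPercent,
                                          unsigned RemainingPercent) {
  assert(TotalPercent <= 100 && RemainingPercent <= 100);
  uint64_t Total = 0;
  for (uint64_t C : Counts)
    Total += C;

  uint64_t Remaining = Total; // frequency not yet converted to direct calls
  std::size_t N = 0;
  while (N < Budget && N < Counts.size()) {
    const uint64_t C = Counts[N];
    // Check 1: share of the original call site's total frequency.
    const bool PassesTotal = C * 100 >= Total * TotalPercent;
    // Check 2: share of the targets that remain unconverted.
    const bool PassesRemaining = C * 100 >= Remaining * RemainingPercent;
    if (C == 0 || !PassesTotal || !PassesRemaining)
      break; // stop at the first target that fails; keep what we have
    Remaining -= C;
    ++N;
  }
  return N;
}
```

For counts {50, 30, 10, 10} with a budget of 2, a 30% total threshold and a
50% remaining threshold, both of the two hottest targets pass (the second
has 30% of the total and 60% of the remaining 50 calls), so two targets are
promoted. A single 90% cutoff on the combined share of the top two targets
(80%) would have rejected this call site.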

I expose two pairs of knobs (a total and a remaining threshold), one
pair for jump tables and another for indirect calls.

To improve the promotion of hot jump table indices when a memory
profile is available, I also fix a bug that could cause us to promote
extra indices besides the hottest ones recorded in the memory profile.
When we have a memory profile, I reapply the dual-threshold checks to
it, since it specifies exactly which indices are hot. I then update N,
the number of targets to promote, based on this information, and
update the frequency information accordingly.
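The reapplication step can be pictured as follows. This is a hypothetical
sketch, not BOLT's implementation: it assumes the memory profile has been
reduced to a map from jump table index to load count, applies the same two
percentage checks to those counts, and returns exactly the indices that
pass; the caller would then set N to the size of the result.
`selectHotIndices` and its parameters are illustrative names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Hypothetical sketch: pick the hot jump table indices from a memory
// profile. IndexCounts maps a jump table index to the number of times the
// memory profile saw that entry being loaded. Returns the indices to
// promote, hottest first; the caller would then set N = result.size().
std::vector<uint64_t>
selectHotIndices(const std::map<uint64_t, uint64_t> &IndexCounts,
                 std::size_t Budget, unsigned TotalPercent,
                 unsigned RemainingPercent) {
  assert(TotalPercent <= 100 && RemainingPercent <= 100);
  std::vector<std::pair<uint64_t, uint64_t>> Sorted(IndexCounts.begin(),
                                                    IndexCounts.end());
  // Hottest first; ties broken by lower index for determinism.
  std::sort(Sorted.begin(), Sorted.end(), [](const auto &A, const auto &B) {
    return A.second != B.second ? A.second > B.second : A.first < B.first;
  });

  uint64_t Total = 0;
  for (const auto &P : Sorted)
    Total += P.second;

  std::vector<uint64_t> Hot;
  uint64_t Remaining = Total; // loads not yet covered by promoted indices
  for (const auto &[Index, Count] : Sorted) {
    if (Hot.size() == Budget || Count == 0)
      break;
    // Same two checks as for call targets: share of total, share of rest.
    if (Count * 100 < Total * TotalPercent ||
        Count * 100 < Remaining * RemainingPercent)
      break;
    Hot.push_back(Index);
    Remaining -= Count;
  }
  return Hot;
}
```

With index counts {3: 70, 7: 20, 1: 10}, a budget of 3, a 20% total
threshold and a 60% remaining threshold, indices 3 and 7 are selected and
index 1 is not, so N becomes 2 even though the budget allowed three targets.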

To allow us to work with smaller profiles, I also added a perf2bolt
option, -filter-mem-profile, that filters out memory samples outside
the statically allocated area of the binary (heap/stack accesses) as
well as null addresses. This option is on by default.
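The classification performed by the new filter (see the processMemEvents()
hunk in the diff) can be approximated in isolation as below. The `Region`
and `MemLocation` types and the `classify` function are hypothetical
stand-ins for BOLT's function and binary-data lookups, assumed here to be
simple address ranges.

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for a named region of the binary (a function or a
// data object such as a jump table), identified by its address range.
struct Region {
  std::string Name;
  uint64_t Start;
  uint64_t Size;
};

// Hypothetical result: a symbol name plus offset, or a raw address when
// Name is empty.
struct MemLocation {
  std::string Name;
  uint64_t Offset;
};

// Mirrors the filtering order of the diff: addresses inside a known region
// become (name, offset) pairs; with filtering enabled, nonzero addresses
// outside all regions (heap/stack) and null addresses are dropped.
std::optional<MemLocation> classify(uint64_t Addr,
                                    const std::vector<Region> &Regions,
                                    bool FilterMemProfile) {
  if (Addr != 0) {
    for (const Region &R : Regions)
      if (Addr >= R.Start && Addr - R.Start < R.Size)
        return MemLocation{R.Name, Addr - R.Start};
  }
  if (FilterMemProfile)
    return std::nullopt; // heap/stack access (nonzero) or null (zero)
  return MemLocation{"", Addr}; // unfiltered: keep the raw address
}
```

An address inside a known region yields a (name, offset) pair; with
filtering on, heap/stack and null addresses yield `std::nullopt`, while with
filtering off they are kept as raw addresses, matching the unfiltered path
of the original code.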

(cherry picked from FBD15187832)
rafaelauler authored and maksfb committed May 2, 2019
1 parent fee6123 commit f1fde44
Showing 3 changed files with 220 additions and 88 deletions.
21 changes: 16 additions & 5 deletions bolt/src/DataAggregator.cpp
@@ -54,6 +54,13 @@ IgnoreBuildID("ignore-build-id",
   cl::init(false),
   cl::cat(AggregatorCategory));

+static cl::opt<bool>
+FilterMemProfile("filter-mem-profile",
+  cl::desc("if processing a memory profile, filter out stack or heap accesses that "
+           "won't be useful for BOLT to reduce profile file size"),
+  cl::init(true),
+  cl::cat(AggregatorCategory));
+
 static cl::opt<unsigned>
 HeatmapBlock("block-size",
   cl::desc("size of a heat map block in bytes (default 64)"),
@@ -1163,14 +1170,12 @@ std::error_code DataAggregator::parseBranchEvents() {
          << NumEntries << " LBR entries\n";
   if (NumTotalSamples) {
     if (NumSamples && NumSamplesNoLBR == NumSamples) {
-      if (errs().has_colors())
-        errs().changeColor(raw_ostream::RED);
+      // Note: we don't know if perf2bolt is being used to parse memory samples
+      // at this point. In this case, it is OK to parse zero LBRs.
       errs() << "PERF2BOLT-WARNING: all recorded samples for this binary lack "
                 "LBR. Record profile with perf record -j any or run perf2bolt "
                 "in no-LBR mode with -nl (the performance improvement in -nl "
                 "mode may be limited)\n";
-      if (errs().has_colors())
-        errs().resetColor();
     } else {
       const auto IgnoredSamples = NumTotalSamples - NumSamples;
       const auto PercentIgnored = 100.0f * IgnoredSamples / NumTotalSamples;
@@ -1344,11 +1349,17 @@ void DataAggregator::processMemEvents() {
     if (MemFunc) {
       MemName = MemFunc->getNames()[0];
       Addr -= MemFunc->getAddress();
-    } else if (Addr) { // TODO: filter heap/stack/nulls here?
+    } else if (Addr) {
       if (auto *BD = BC->getBinaryDataContainingAddress(Addr)) {
         MemName = BD->getName();
         Addr -= BD->getAddress();
+      } else if (opts::FilterMemProfile) {
+        // Filter out heap/stack accesses
+        continue;
       }
+    } else if (opts::FilterMemProfile) {
+      // Filter out nulls
+      continue;
     }

     const Location FuncLoc(!FuncName.empty(), FuncName, PC);