[CSSPGO][llvm-profgen] Context-sensitive profile data generation
This stack of changes introduces the `llvm-profgen` utility, which generates a profile data file from given perf script data files for sample-based PGO. It is part of (but not limited to) the CSSPGO work. Specifically, to support context-sensitive profiles with or without pseudo probes, it implements a series of functionalities including perf trace parsing, instruction symbolization, LBR stack/call frame stack unwinding, pseudo probe decoding, etc. High throughput is achieved through multiple levels of sample aggregation, and a compiler-compatible profile format is generated at the end in one stop. Please refer to https://groups.google.com/g/llvm-dev/c/1p1rdYbL93s for the CSSPGO RFC.

This change adds context-sensitive profile data generation to llvm-profgen. With simultaneous sampling of the LBR and the call stack, we can identify the leaf of an LBR sample with the calling context from the stack sample. While deriving fall-through paths from LBR entries, we unwind the LBR by replaying all the calls and returns (including implicit calls/returns due to inlining) backwards on top of the sampled call stack. The state of the call stack as we unwind through the LBR then always represents the calling context of the current fall-through path.

We have two types of virtual unwinding: 1) LBR unwinding and 2) linear range unwinding.
Specifically, each LBR entry can be classified as a call, a return, or a regular branch, and LBR unwinding replays the operation by pushing a frame onto, popping a frame from, or switching the leaf frame of the call stack. Since the initial call stack is the most recently sampled state, the replay must run in anti-execution order: in the regular case, pop the call stack when the LBR entry is a call, and push a frame onto the call stack when it is a return. After each LBR entry is processed, the unwinder also needs to align with the next LBR entry by going through the instructions from the previous LBR's target to the current LBR's source, which we call linear unwinding. Since instructions in a linear range can come from different functions because of inlining, linear unwinding splits the range and records counters for each sub-range that shares the same inline context.
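The replay described above can be illustrated with a minimal sketch. It is not the code from this change: `LBREntry`, `BranchKind`, `RangeKey` and `unwindSample` are hypothetical names, contexts are kept as raw frame addresses instead of symbolized names, and inlining is ignored. It only shows the anti-execution-order idea: record the linear range ending at the current leaf, then undo the branch.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <tuple>
#include <vector>

// Hypothetical, simplified types; the classes in this change are richer.
enum class BranchKind { Call, Return, Regular };

struct LBREntry {
  uint64_t Source;
  uint64_t Target;
  BranchKind Kind;
};

// (caller frames, outermost first; range begin; range end) -> sample count.
using RangeKey = std::tuple<std::vector<uint64_t>, uint64_t, uint64_t>;
using RangeCounter = std::map<RangeKey, uint64_t>;

// Stack holds frame addresses, outermost first; Stack.back() is the leaf.
// LBR holds entries most recent first, in the order perf script prints them.
RangeCounter unwindSample(std::vector<uint64_t> Stack,
                          const std::vector<LBREntry> &LBR) {
  assert(!Stack.empty() && "a sample needs at least a leaf frame");
  RangeCounter Ranges;
  for (const LBREntry &E : LBR) {
    if (Stack.empty())
      break; // Ran out of calling context; stop rather than guess.

    // Linear step: everything from this branch's target up to the current
    // leaf address executed sequentially under the current caller frames.
    uint64_t Leaf = Stack.back();
    if (E.Target <= Leaf) {
      std::vector<uint64_t> Callers(Stack.begin(), Stack.end() - 1);
      ++Ranges[RangeKey(Callers, E.Target, Leaf)];
    }

    // LBR step: undo the branch, since we walk backwards in time.
    switch (E.Kind) {
    case BranchKind::Call:
      // Undo a call: drop the callee; the caller is back at the call site.
      Stack.pop_back();
      if (!Stack.empty())
        Stack.back() = E.Source;
      break;
    case BranchKind::Return:
      // Undo a return: the current leaf becomes a caller frame again and
      // the callee is pushed back at its return instruction.
      Stack.back() = E.Target;
      Stack.push_back(E.Source);
      break;
    case BranchKind::Regular:
      // Same frame; only the leaf address moves to the branch source.
      Stack.back() = E.Source;
      break;
    }
  }
  return Ranges;
}
```

Feeding it a stack of `{0x400684, 0x400634, 0x4005dc}` and the first few LBR entries of the noinline test input yields, among others, the range `(0x4005b0, 0x4005c8)` under the two caller frames, which corresponds to the `(5b0, 5c8)` range the test expects under `main:1 @ foo:3 @ bar`.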

With each fall-through path from LBR unwinding, we aggregate each sample into counters keyed by the calling context and eventually generate a fully context-sensitive profile (without relying on inlining) to drive the compiler's PGO/FDO.
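The range splitting that linear unwinding performs can be sketched in a few lines. This is an assumption-laden illustration, not the implementation in this change: `Inst`, `SubRange` and `splitByInlineContext` are hypothetical names, and the per-instruction inline context is assumed to come from a symbolizer that is not shown.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical per-instruction record: the address plus the inline context
// string a symbolizer would report for it.
struct Inst {
  uint64_t Addr;
  std::string InlineContext;
};

struct SubRange {
  uint64_t Begin;
  uint64_t End;
  std::string InlineContext;
};

// Split one linear range (instructions sorted by address) into maximal
// sub-ranges whose instructions all share the same inline context.
std::vector<SubRange> splitByInlineContext(const std::vector<Inst> &Insts) {
  std::vector<SubRange> Out;
  for (const Inst &I : Insts) {
    assert((Out.empty() || Out.back().End <= I.Addr) &&
           "instructions must be sorted by address");
    if (!Out.empty() && Out.back().InlineContext == I.InlineContext)
      Out.back().End = I.Addr; // Same context: extend the open sub-range.
    else
      Out.push_back({I.Addr, I.Addr, I.InlineContext}); // Context changed.
  }
  return Out;
}
```

With instructions at `0x67e` and `0x6ad` in `main:1 @ foo`, `0x6af` and `0x6bb` in `main:1 @ foo:3.2 @ bar`, and `0x6bd` and `0x6c8` back in `main:1 @ foo`, the sketch produces three sub-ranges, so each one can be counted under its own context. The instruction list here is illustrative; only the range boundaries are taken from the inline test's expected output.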

A breakdown of noteworthy changes:
- Added the `HybridSample` class as the abstraction of a perf sample, including the LBR stack and the call stack.
- Extended `PerfReader` to auto-detect whether the input perf script output contains a CS profile and then parse it. Multiple `HybridSample`s are extracted.
- Sped up processing by aggregating `HybridSample`s into `AggregatedSamples`.
- Added `VirtualUnwinder`, which consumes aggregated `HybridSample`s and implements unwinding of calls, returns, and linear paths that contain implicit calls/returns from inlining. Range and branch counters are aggregated by calling context. Here the calling context is a string; each element is a pair of function name and callsite location info, and the whole context looks like `main:1 @ foo:2 @ bar`.
- Added `ProfileGenerator`, which accumulates counters by range unfolding or branch target mapping, then generates a context-sensitive function profile including the function body, inferred callee head samples, and callsite target samples, and eventually records it into `ProfileMap`.
- Leveraged LLVM's built-in writer (`SampleProfWriter`) to support different serialization formats with no intermediate stop.
- Used `getCanonicalFnName` for callee names and names from the ELF section.
- Added regression tests for both unwinding and profile generation.
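As a small illustration of the context string described in the list above, the sketch below joins frames into the `main:1 @ foo:2 @ bar` form and wraps a context in the bracketed header that the text writer emits for CS profiles. `Frame`, `makeContext` and `formatHeader` are hypothetical helpers written for this example and are not part of the change.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical frame record: function name plus the callsite location
// (line offset, optionally ".discriminator"). The leaf has no callsite.
struct Frame {
  std::string Func;
  std::string CallSite;
};

// Join frames, outermost first, as "caller:callsite @ ... @ leaf".
std::string makeContext(const std::vector<Frame> &Frames) {
  assert(!Frames.empty() && "a context needs at least the leaf function");
  std::string Ctx;
  for (size_t I = 0; I < Frames.size(); ++I) {
    if (I != 0)
      Ctx += " @ ";
    Ctx += Frames[I].Func;
    if (I + 1 < Frames.size()) // Only caller frames carry a callsite.
      Ctx += ":" + Frames[I].CallSite;
  }
  return Ctx;
}

// Top-level text header for a CS profile entry: "[context]:total:head".
std::string formatHeader(const std::string &Context, uint64_t Total,
                         uint64_t Head) {
  return "[" + Context + "]:" + std::to_string(Total) + ":" +
         std::to_string(Head);
}
```

For example, the frames `main` (callsite `1`), `foo` (callsite `3.2`) and leaf `bar` give `main:1 @ foo:3.2 @ bar`, and `formatHeader("main:1 @ foo", 44, 0)` gives `[main:1 @ foo]:44:0`, the shape of the header lines checked in the regression tests.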

Test Plan:
ninja & ninja check-llvm

Reviewed By: hoy, wenlei, wmi

Differential Revision: https://reviews.llvm.org/D89723
wlei-llvm committed Dec 7, 2020
1 parent 3e1cb0d commit 1f05b1a
Showing 17 changed files with 1,463 additions and 45 deletions.
6 changes: 6 additions & 0 deletions llvm/docs/CommandGuide/llvm-profgen.rst
@@ -36,6 +36,12 @@ OPTIONS
-------
:program:`llvm-profgen` supports the following options:

.. option:: --format=[text|binary|extbinary|compbinary|gcc]

Specify the format of the generated profile. Supported <format> are `text`,
`binary`, `extbinary`, `compbinary`, `gcc`, see `llvm-profdata` for more
descriptions of the format.

.. option:: --show-mmap-events

Print mmap events.
28 changes: 23 additions & 5 deletions llvm/include/llvm/ProfileData/SampleProf.h
@@ -246,6 +246,10 @@ struct LineLocation {
return LineOffset == O.LineOffset && Discriminator == O.Discriminator;
}

bool operator!=(const LineLocation &O) const {
return LineOffset != O.LineOffset || Discriminator != O.Discriminator;
}

uint32_t LineOffset;
uint32_t Discriminator;
};
@@ -585,6 +589,11 @@ class FunctionSamples {
/// Return the sample count of the first instruction of the function.
/// The function can be either a standalone symbol or an inlined function.
uint64_t getEntrySamples() const {
if (FunctionSamples::ProfileIsCS && getHeadSamples()) {
// For CS profile, if we already have more accurate head samples
// counted by branch sample from caller, use them as entry samples.
return getHeadSamples();
}
uint64_t Count = 0;
// Use either BodySamples or CallsiteSamples which ever has the smaller
// lineno.
@@ -680,19 +689,28 @@ class FunctionSamples {
/// Return the function name.
StringRef getName() const { return Name; }

/// Return function name with context.
StringRef getNameWithContext() const {
return FunctionSamples::ProfileIsCS ? Context.getNameWithContext() : Name;
}

/// Return the original function name.
StringRef getFuncName() const { return getFuncName(Name); }

/// Return the canonical name for a function, taking into account
/// suffix elision policy attributes.
static StringRef getCanonicalFnName(const Function &F) {
- static const char *knownSuffixes[] = { ".llvm.", ".part." };
auto AttrName = "sample-profile-suffix-elision-policy";
auto Attr = F.getFnAttribute(AttrName).getValueAsString();
+ return getCanonicalFnName(F.getName(), Attr);
+ }
+
+ static StringRef getCanonicalFnName(StringRef FnName, StringRef Attr = "") {
+ static const char *knownSuffixes[] = { ".llvm.", ".part." };
if (Attr == "" || Attr == "all") {
- return F.getName().split('.').first;
+ return FnName.split('.').first;
} else if (Attr == "selected") {
- StringRef Cand(F.getName());
+ StringRef Cand(FnName);
for (const auto &Suf : knownSuffixes) {
StringRef Suffix(Suf);
auto It = Cand.rfind(Suffix);
@@ -704,11 +722,11 @@ class FunctionSamples {
}
return Cand;
} else if (Attr == "none") {
- return F.getName();
+ return FnName;
} else {
assert(false && "internal error: unknown suffix elision policy");
}
- return F.getName();
+ return FnName;
}

/// Translate \p Name into its original name.
5 changes: 4 additions & 1 deletion llvm/lib/ProfileData/SampleProfWriter.cpp
@@ -276,7 +276,10 @@ std::error_code SampleProfileWriterCompactBinary::write(
/// it needs to be parsed by the SampleProfileReaderText class.
std::error_code SampleProfileWriterText::writeSample(const FunctionSamples &S) {
auto &OS = *OutputStream;
- OS << S.getName() << ":" << S.getTotalSamples();
+ if (FunctionSamples::ProfileIsCS)
+ OS << "[" << S.getNameWithContext() << "]:" << S.getTotalSamples();
+ else
+ OS << S.getName() << ":" << S.getTotalSamples();
if (Indent == 0)
OS << ":" << S.getHeadSamples();
OS << "\n";
Binary file not shown.
@@ -0,0 +1,7 @@
Using perf wrapper that supports hot-text. Try perf.real if you encounter any issues.
PERF_RECORD_MMAP2 2854748/2854748: [0x400000(0x1000) @ 0 00:1d 123291722 526021]: r-xp /home/inline-cs-noprobe.perfbin


40067e
5541f689495641d7
0x4006c8/0x40067e/P/-/-/0 0x4006c8/0x40067e/P/-/-/0 0x4006c8/0x40067e/P/-/-/0 0x4006c8/0x40067e/P/-/-/0 0x4006c8/0x40067e/P/-/-/0 0x4006c8/0x40067e/P/-/-/0 0x4006c8/0x40067e/P/-/-/0 0x4006c8/0x40067e/P/-/-/0 0x4006c8/0x40067e/P/-/-/0 0x4006c8/0x40067e/P/-/-/0 0x4006c8/0x40067e/P/-/-/0 0x4006c8/0x40067e/P/-/-/0 0x40069b/0x400670/M/-/-/0 0x4006c8/0x40067e/P/-/-/0 0x4006c8/0x40067e/P/-/-/0 0x4006c8/0x40067e/P/-/-/0
Binary file not shown.
24 changes: 24 additions & 0 deletions llvm/test/tools/llvm-profgen/Inputs/noinline-cs-noprobe.perfscript
@@ -0,0 +1,24 @@
Using perf wrapper that supports hot-text. Try perf.real if you encounter any issues.
PERF_RECORD_MMAP2 2854748/2854748: [0x400000(0x1000) @ 0 00:1d 123291722 526021]: r-xp /home/noinline-cs-noprobe.perfbin

4005dc
400634
400684
7f68c5788793
0x4005c8/0x4005dc/P/-/-/0 0x40062f/0x4005b0/P/-/-/0 0x400645/0x4005ff/P/-/-/0 0x400637/0x400645/P/-/-/0 0x4005e9/0x400634/P/-/-/0 0x4005d7/0x4005e5/P/-/-/0 0x40062f/0x4005b0/P/-/-/0 0x400645/0x4005ff/P/-/-/0 0x400637/0x400645/P/-/-/0 0x4005e9/0x400634/P/-/-/0 0x4005d7/0x4005e5/P/-/-/0 0x40062f/0x4005b0/P/-/-/0 0x400645/0x4005ff/P/-/-/0 0x400637/0x400645/P/-/-/0 0x4005e9/0x400634/P/-/-/0 0x4005c8/0x4005dc/P/-/-/0

// Test for leaf frame ending up in prolog
4005b0
400684
7f68c5788793
0x40062f/0x4005b0/P/-/-/0 0x400645/0x4005ff/P/-/-/0 0x400637/0x400645/P/-/-/0 0x4005e9/0x400634/P/-/-/0 0x4005c8/0x4005dc/P/-/-/0 0x40062f/0x4005b0/P/-/-/0 0x400645/0x4005ff/P/-/-/0 0x400637/0x400645/P/-/-/0 0x4005e9/0x400634/P/-/-/0 0x4005d7/0x4005e5/P/-/-/0 0x40062f/0x4005b0/P/-/-/0 0x400645/0x4005ff/P/-/-/0 0x400637/0x400645/P/-/-/0 0x4005e9/0x400634/P/-/-/0 0x4005d7/0x4005e5/P/-/-/0 0x40062f/0x4005b0/P/-/-/0

// Call stack:
// 4005b0 -> start addr of bar
// 400684 -> address in main
// LBR Entry: | Source | Target
// 0x40062f/0x4005b0/P/-/-/0 | callq -132 <bar> | start addr of bar
// 0x400645/0x4005ff/P/-/-/0 | jmp -75 <foo+0xf> | movl -8(%rbp), %eax
// 0x400637/0x400645/P/-/-/0 | jmp 9 <foo+0x55> | jmp -75 <foo+0xf>
// 0x4005e9/0x400634/P/-/-/0 | (bar)retq | next addr of [callq -132 <bar>]
// 0x4005d7/0x4005e5/P/-/-/0 | jmp 9 <bar+0x35> | movl -4(%rbp), %eax
47 changes: 47 additions & 0 deletions llvm/test/tools/llvm-profgen/inline-cs-noprobe.test
@@ -0,0 +1,47 @@
; RUN: llvm-profgen --perfscript=%S/Inputs/inline-cs-noprobe.perfscript --binary=%S/Inputs/inline-cs-noprobe.perfbin --output=%t --show-unwinder-output | FileCheck %s --check-prefix=CHECK-UNWINDER
; RUN: FileCheck %s --input-file %t

; CHECK:[main:1 @ foo]:44:0
; CHECK: 2.2: 14
; CHECK: 3: 15
; CHECK: 3.2: 14 bar:14
; CHECK: 3.4: 1
; CHECK:[main:1 @ foo:3.2 @ bar]:14:0
; CHECK: 1: 14

; CHECK-UNWINDER: Binary(inline-cs-noprobe.perfbin)'s Range Counter:
; CHECK-UNWINDER: main:1 @ foo:3.2 @ bar
; CHECK-UNWINDER: (6af, 6bb): 14
; CHECK-UNWINDER: main:1 @ foo
; CHECK-UNWINDER: (670, 6ad): 1
; CHECK-UNWINDER: (67e, 69b): 1
; CHECK-UNWINDER: (67e, 6ad): 13
; CHECK-UNWINDER: (6bd, 6c8): 14

; CHECK-UNWINDER: Binary(inline-cs-noprobe.perfbin)'s Branch Counter:
; CHECK-UNWINDER: main:1 @ foo
; CHECK-UNWINDER: (69b, 670): 1
; CHECK-UNWINDER: (6c8, 67e): 15

; original code:
; clang -O3 -g test.c -o a.out
#include <stdio.h>

int bar(int x, int y) {
if (x % 3) {
return x - y;
}
return x + y;
}

void foo() {
int s, i = 0;
while (i++ < 4000 * 4000)
if (i % 91) s = bar(i, s); else s += 30;
printf("sum is %d\n", s);
}

int main() {
foo();
return 0;
}
60 changes: 60 additions & 0 deletions llvm/test/tools/llvm-profgen/noinline-cs-noprobe.test
@@ -0,0 +1,60 @@
; RUN: llvm-profgen --perfscript=%S/Inputs/noinline-cs-noprobe.perfscript --binary=%S/Inputs/noinline-cs-noprobe.perfbin --output=%t --show-unwinder-output | FileCheck %s --check-prefix=CHECK-UNWINDER
; RUN: FileCheck %s --input-file %t

; CHECK:[main:1 @ foo:3 @ bar]:12:3
; CHECK: 0: 3
; CHECK: 1: 3
; CHECK: 2: 2
; CHECK: 4: 1
; CHECK: 5: 3
; CHECK:[main:1 @ foo]:9:0
; CHECK: 2: 3
; CHECK: 3: 3 bar:3

; CHECK-UNWINDER: Binary(noinline-cs-noprobe.perfbin)'s Range Counter:
; CHECK-UNWINDER: main:1 @ foo
; CHECK-UNWINDER: (5ff, 62f): 3
; CHECK-UNWINDER: (634, 637): 3
; CHECK-UNWINDER: (645, 645): 3
; CHECK-UNWINDER: main:1 @ foo:3 @ bar
; CHECK-UNWINDER: (5b0, 5c8): 1
; CHECK-UNWINDER: (5b0, 5d7): 2
; CHECK-UNWINDER: (5dc, 5e9): 1
; CHECK-UNWINDER: (5e5, 5e9): 2

; CHECK-UNWINDER: Binary(noinline-cs-noprobe.perfbin)'s Branch Counter:
; CHECK-UNWINDER: main:1 @ foo
; CHECK-UNWINDER: (62f, 5b0): 3
; CHECK-UNWINDER: (637, 645): 3
; CHECK-UNWINDER: (645, 5ff): 3
; CHECK-UNWINDER: main:1 @ foo:3 @ bar
; CHECK-UNWINDER: (5c8, 5dc): 2
; CHECK-UNWINDER: (5d7, 5e5): 2
; CHECK-UNWINDER: (5e9, 634): 3





; original code:
; clang -O0 -g test.c -o a.out
#include <stdio.h>

int bar(int x, int y) {
if (x % 3) {
return x - y;
}
return x + y;
}

void foo() {
int s, i = 0;
while (i++ < 4000 * 4000)
if (i % 91) s = bar(i, s); else s += 30;
printf("sum is %d\n", s);
}

int main() {
foo();
return 0;
}
2 changes: 2 additions & 0 deletions llvm/tools/llvm-profgen/CMakeLists.txt
@@ -7,6 +7,7 @@ set(LLVM_LINK_COMPONENTS
MC
MCDisassembler
Object
ProfileData
Support
Symbolize
)
@@ -15,4 +16,5 @@ add_llvm_tool(llvm-profgen
llvm-profgen.cpp
PerfReader.cpp
ProfiledBinary.cpp
ProfileGenerator.cpp
)
