[llvm-profgen] [SPGO] Samples incorrectly discarded when multiple processes of target binary run concurrently #164581

When interpreting a perf profile, profgen only stores a single `BaseAddress` value in the `ProfiledBinary` class:

https://github.com/llvm/llvm-project/blob/15d11ebc84886e06127750ef5bea60ba1d36798a/llvm/tools/llvm-profgen/ProfiledBinary.h#L200-L201

However, address space layout randomisation means the same binary can be loaded at several different addresses at once, one per process ID. As a result, when multiple instances of the same binary run concurrently, each new process overwrites `BaseAddress` before all the samples from the previous process have been processed.

For example, using this program:
```c
#include <stdint.h>

// Loop for a while so we see samples in the function (compiled at -O0)
void loop1() {
    for (uint64_t i = 0; i < 10000000000; i++) {}
}

// Slightly modified to make sure there isn't any kind of linker merging
void loop2() {
    for (uint64_t i = 1; i < 10000000001; i++) {}
}

int main(int argc, char *argv[]) {
    // Use CLI argument to choose which loop to run, so we can distinguish which process samples were collected from

    if (argc >= 2 && argv[1][0] == '1') {
        loop1();
    }
    if (argc >= 2 && argv[1][0] == '2') {
        loop2();
    }
}
```
compiled at `-O0` so the loop doesn't get optimised out:
```bash
gcc -O0 -g test.c -o test # I happened to use GCC but clang should work the same
```
then if we run one after the other like this (saved as `run_with_delay.sh`):
```bash
./test 1
sleep 1
./test 2
```
and collect a perf profile:
```bash
taskset -c 1 perf record -o perf.data --freq=max -b -e BR_INST_RETIRED.NEAR_TAKEN:uppp bash run_with_delay.sh
```
then convert to proftext format:
```bash
llvm-profgen --binary=./test --format=text --output=output.proftext --perfdata=perf.data
```
then, as expected, we get samples for both functions, one from each process:
```
loop1:16274349:0
 0: 0
 1: 774969
 2: 0
loop2:16274349:0
 0: 0
 1: 774969
 2: 0
```
This works because the first process has already exited by the time profgen interprets the mmap event for the second process and updates the base address, so no samples are missed.

However, if we instead run both processes concurrently:
```bash
./test 1 & ./test 2
```
then we get this (note that the order of functions is now reversed):
```
loop2:16276302:0
 0: 0
 1: 775062
 2: 0
loop1:3906:0
 0: 0
 1: 186
 2: 0
```

The first process loads; profgen reads its mmap event, sets the base address, and begins collecting samples. However, after only 186 samples have been collected, the second process loads and overwrites the base address. From then on, any samples from the first process appear to fall outside the range of the binary and are discarded. For the rest of the execution, profgen only counts samples from the second process, so while the count for the second process is correct, the count for the first process is far too low.

This matters when profiling builds, for example running `perf record` on `ninja` or `make`, where many instances of the same binary run in parallel: the profile appears to work but silently undercounts blocks.

As far as I can tell, the solution is to keep a map from PID to base address and use the PID of each sample to select which base address applies. Doing so appears to fix the problem: samples from both processes are included in the final profile.

Before submitting a PR, it might be useful to get some clarification on whether this is the intended behaviour: should samples from multiple processes be merged together as if they were collected from one process, or is profgen intended to produce a single profile per process to be merged later? Since there already seems to be partial support for generating profiles with data aggregated from multiple processes, I'm leaning towards this being a supported use case, but the documentation is sparse.

For reference, the `BaseAddress` declaration at the `ProfiledBinary.h` link above:

	// The runtime base address that the first executable segment is loaded at.
	uint64_t BaseAddress = 0;
