traced_perf - support setting period/frequency for follower events. #5526

Open
abhmen01 wants to merge 1 commit into google:main from pan-:support_follower_event_period_in_traced_perf
@abhmen01 abhmen01 commented Apr 21, 2026

This change adds support for configuring the sampling period or frequency for follower events.

Previously, only the group leader event could be configured to trigger samples. With this change, follower events can also independently trigger samples by specifying the 'period' or 'frequency' field in the config.

This aligns Perfetto's behavior more closely with `perf` semantics, e.g., `perf record -e '{context-switches,cpu-clock/period=<SAMPLE_PERIOD>/}'`, and allows more flexible PMU event sampling.

@abhmen01 abhmen01 requested a review from a team as a code owner April 21, 2026 18:26
@primiano primiano requested a review from rsavitski April 22, 2026 21:10
Member

@rsavitski rsavitski left a comment

I feel like this patch is motivated by a misunderstanding of the current config, and it lacks a big chunk of the implementation for the buffer-sharing part.

Currently, each "linux.perf" data source is one recording group that uses leader sampling (https://man7.org/linux/man-pages/man1/perf-list.1.html#LEADER_SAMPLING). So sampling only on the leader is by design.

If you need to independently record several counters, you can specify several "linux.perf" data sources in your config. That's the equivalent of "perf record -e A,B,C".


  // redirect the follower events to the leader's buffer.
  for (size_t i = 0; i < follower_fds.size(); ++i) {
    ioctl(follower_fds[i].get(), PERF_EVENT_IOC_SET_OUTPUT, timebase_fd.get());
Member

This seems very incomplete. It redirects events, but the patch doesn't have any code for adjusting how events are read when using either the sampling or the polling interface.

I'd expect a lot more code for any buffer sharing. In particular, I'd expect extra IDs in the samples (see PERF_SAMPLE_STREAM_ID) and code demultiplexing them on the reading side.

What kind of config were you testing with?
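
To make the demultiplexing concern concrete, here is a minimal sketch of reading one sample from a shared buffer. It assumes the events are opened with `sample_type = PERF_SAMPLE_TIME | PERF_SAMPLE_STREAM_ID | PERF_SAMPLE_READ` and `read_format = PERF_FORMAT_GROUP`, which may not match what the patch or traced_perf actually configures. `GroupSample` and `ParseGroupSample` are hypothetical names, not Perfetto code.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Decoded view of one PERF_RECORD_SAMPLE payload under the assumed layout:
//   u64 time; u64 stream_id; u64 nr; u64 values[nr];
struct GroupSample {
  uint64_t timestamp = 0;
  uint64_t stream_id = 0;          // ID of the event whose overflow produced the sample
  std::vector<uint64_t> counters;  // group values as reported by the kernel
};

// `words` is the record payload after the perf_event_header, viewed as u64s.
// Returns false if the payload is shorter than the assumed layout requires.
bool ParseGroupSample(const std::vector<uint64_t>& words, GroupSample* out) {
  constexpr size_t kFixedWords = 3;  // time, stream_id, nr
  if (words.size() < kFixedWords) return false;
  const uint64_t nr = words[2];
  if (nr > words.size() - kFixedWords) return false;
  out->timestamp = words[0];
  out->stream_id = words[1];
  const auto first = words.begin() + static_cast<std::ptrdiff_t>(kFixedWords);
  out->counters.assign(first, first + static_cast<std::ptrdiff_t>(nr));
  return true;
}
```

With this layout, the reader can route each sample by `stream_id` without changing how the counter values themselves are decoded.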

Author

@abhmen01 abhmen01 Apr 28, 2026

For the grouped case, I did not think extra demultiplexing was needed, because we always use PERF_FORMAT_GROUP, so the whole group is read together regardless of which event triggered the overflow. In that path, PERF_SAMPLE_STREAM_ID should not be required for decoding, which is why I did not add it.

On further review, I think the non-grouped case still needs to be handled explicitly, even though PERF_FORMAT_GROUP is always enabled during configuration. There we would need to maintain a mapping from event index to stream ID and use that during decoding.
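
The mapping mentioned above could look roughly like the following sketch. `StreamIdIndex` is a hypothetical helper, and the IDs are assumed to be collected when the events are opened (for example via the `PERF_EVENT_IOC_ID` ioctl, assuming that value matches the stream ID reported in samples).

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Maps a kernel-assigned stream ID back to the event's position in the
// config (index 0 = timebase, 1..N = followers). Hypothetical helper.
class StreamIdIndex {
 public:
  // ids[i] is the ID of the i-th event, in config order.
  explicit StreamIdIndex(const std::vector<uint64_t>& ids) {
    for (size_t i = 0; i < ids.size(); ++i) index_by_id_[ids[i]] = i;
  }

  // Returns true and sets *index if the stream ID belongs to a known event.
  bool Lookup(uint64_t stream_id, size_t* index) const {
    auto it = index_by_id_.find(stream_id);
    if (it == index_by_id_.end()) return false;
    *index = it->second;
    return true;
  }

 private:
  std::unordered_map<uint64_t, size_t> index_by_id_;
};
```

A sample carrying an unknown stream ID is reported as a lookup failure here, so the caller can decide whether to drop it or treat it as an error.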

Author

Modified the event-reading code to use the stream ID to identify which event produced the sample.

@abhmen01
Author

The current behavior is indeed leader sampling, and I agree that this is how perfetto works today.
The intent of this patch is to extend that behavior, not to make events independent. In particular, this is not trying to model perf record -e A,B, where each event is recorded separately. The goal is to keep the events in a group and read them together, but allow a follower event to trigger sampling when its own period overflows.

So today the behavior is effectively equivalent to: perf record -e '{A,B}':S
where only the leader can trigger samples, I presume.

With this patch, if a period is specified for follower B, the intended behavior becomes: perf record -e '{A,B/period=100/}'

That is, the group is still read as a unit regardless of which event triggered the sample, but sampling is no longer restricted to the leader only. So I think the key difference is:

- current behavior: grouped read, leader-only sample trigger
- intended behavior with this patch: grouped read, leader or follower can trigger if that follower has its own sampling period configured.
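
At the `perf_event_open` level, the difference between the two behaviors comes down to whether a follower's `perf_event_attr` has a non-zero `sample_period`. The sketch below illustrates that with a hypothetical helper, `MakeGroupMemberAttr`; the chosen `sample_type` and `read_format` flags are assumptions for illustration, not what traced_perf necessarily sets.

```cpp
#include <linux/perf_event.h>

#include <cstdint>

// Builds the attr for one member of an event group (hypothetical helper).
// period == 0 means the event only counts and never triggers a sample;
// period > 0 means an overflow of this event also triggers a sample.
perf_event_attr MakeGroupMemberAttr(uint32_t type,
                                    uint64_t config,
                                    uint64_t period) {
  perf_event_attr attr{};  // zero-initialize every field
  attr.size = sizeof(attr);
  attr.type = type;
  attr.config = config;
  attr.sample_period = period;
  // Assumed flags: timestamp, the ID of the overflowing event, and a read of
  // the whole group's counter values attached to every sample.
  attr.sample_type =
      PERF_SAMPLE_TIME | PERF_SAMPLE_STREAM_ID | PERF_SAMPLE_READ;
  attr.read_format = PERF_FORMAT_GROUP;
  return attr;
}
```

The leader would then be opened with `group_fd = -1` and each follower with the leader's file descriptor as `group_fd`, so that all of them belong to one group.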

@abhmen01 abhmen01 force-pushed the support_follower_event_period_in_traced_perf branch from 93a02bd to 6849261 Compare April 29, 2026 19:09
@abhmen01
Author

@rsavitski
A concrete use case is periodic IPC measurement for a process. For example:

perf record -e '{sched:sched_switch/period=1/,cpu-cycles/period=10000/,inst_retired}'

This keeps the given events in one group so they are read together as a consistent snapshot. sched:sched_switch/period=1/ can trigger a sample on every context switch, while cpu-cycles/period=10000/ can also trigger a sample every 10,000 cycles. At each sample point, both the cycle count and retired-instruction count are captured together, which allows IPC to be computed periodically.

The equivalent perfetto configuration will be:

data_sources: {
    config {
        name: "linux.perf"
        perf_event_config {
            timebase {
                period: 1
                tracepoint {
                    name: "sched/sched_switch"
                }
                name: "sched_switch"
            }
            followers: {
                period: 10000
                raw_event {
                    type: 8
                    config: 17
                }
                name: "CPU_CYCLES"
            }
            followers: {
                raw_event {
                    type: 8
                    config: 8
                }
                name: "INST_RETIRED"
            }
            ignore_open_failure: true

        }
    }
}
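
For illustration, the periodic IPC computation described above could be done on the consumer side as in this sketch. It assumes each sample carries cumulative counts for both counters, and `CounterSnapshot` and `IpcBetween` are hypothetical names.

```cpp
#include <cstdint>

// Cumulative counter values captured together at one sample point.
struct CounterSnapshot {
  uint64_t cycles;
  uint64_t instructions;
};

// Instructions per cycle over the interval between two snapshots.
// Returns 0.0 when no cycles elapsed, to avoid dividing by zero.
double IpcBetween(const CounterSnapshot& prev, const CounterSnapshot& cur) {
  const uint64_t delta_cycles = cur.cycles - prev.cycles;
  const uint64_t delta_instructions = cur.instructions - prev.instructions;
  if (delta_cycles == 0) return 0.0;
  return static_cast<double>(delta_instructions) /
         static_cast<double>(delta_cycles);
}
```

Because both counters come from the same group read, the two deltas cover the same interval, which is what makes the ratio meaningful.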
