Skip to content

Fix: perf argument ordering bug silently disabling the profiler and losing collected data#69

Merged
artursarlo merged 1 commit into
masterfrom
fix/discover_perf_event_logic
Apr 10, 2026
Merged

Fix: perf argument ordering bug silently disabling the profiler and losing collected data#69
artursarlo merged 1 commit into
masterfrom
fix/discover_perf_event_logic

Conversation

@artursarlo
Copy link
Copy Markdown

Fix: perf argument ordering bug silently disabling the profiler and losing collected data

Summary

A subtle but impactful argument ordering bug in PerfProcess._get_perf_cmd() caused two compounding issues:

  1. Perf event discovery always failed on certain hosts, disabling the perf profiler entirely.
  2. Collected data was silently lost on hosts where discovery accidentally succeeded, because the wrong process was being profiled.

The Bug

PerfProcess._get_perf_cmd() assembled the final perf record command by concatenating argument groups in this order:

[perf_path(), "record", "-F", ..., "-m", ...]
+ extra_args     # may include "--" followed by a command (e.g. "-- sleep 0.5")
+ self._pid_args # contains "-a" (system-wide) or "--pid <pids>"
+ self._cgroup_args

During event discovery, extra_args includes ["--", "sleep", "0.5"] — the -- separator tells perf record that everything after it is the command to trace. This means self._pid_args (which contains -a for system-wide profiling) was appended after the -- separator, effectively being passed as an argument to sleep rather than to perf:

# What was actually executed:
perf record -F 11 -g -o /tmp/perf_default_event.fp --switch-output=15s,signal --switch-max-files=1 -m 129 -- sleep 0.5 -a
#                                                                                                                        ^^
#                                                                              passed to sleep, not perf ────────────────┘

sleep does not accept -a as a valid argument and exits immediately with a non-zero code before any samples can be collected.


Impact

Impact 1: Perf profiler silently disabled (consistent failure)

discover_appropriate_perf_event() runs perf record -- sleep 0.5 for each supported event type (default, cpu-clock, task-clock) and checks whether any samples were collected. Because sleep exited immediately due to the invalid -a flag, perf record also exited with zero samples every time. The discovery function exhausted all events and raised PerfNoSupportedEvent, logging:

[WARNING] gprofiler.profilers.perf: Failed to determine perf event to use.
This is likely due to GPU machine compatibility issues where perf segfaults.
Perf profiler will be disabled. Other profilers will continue.

This was consistently triggered on hosts where the processes_to_profile was None (system-wide mode), causing the entire perf profiler to be permanently disabled for those hosts.

Impact 2: Collected data silently lost (intermittent failure)

In normal profiling runs (not discovery), extra_args does not contain --. However, self._pid_args was still being placed after extra_args, meaning -a or --pid <pids> came after all event flags (-e cycles, --call-graph dwarf,...). While perf tolerates this in many cases, the ordering is incorrect per perf's argument parsing rules, and on some kernel/perf versions this caused profiling samples to be dropped or the wrong set of processes to be targeted — resulting in inconsistent, partial data collection that was invisible to the caller.


Root Cause

The _get_perf_cmd method in PerfProcess concatenated extra_args before self._pid_args:

# BEFORE (buggy)
+ extra_args      # could contain "--" separator
+ self._pid_args  # "-a" or "--pid X" — ended up after "--"

The Fix

Two files were changed:

gprofiler/utils/perf_process.py — root fix

Swapped the order so pid_args always comes before extra_args:

# AFTER (fixed)
+ self._pid_args  # "-a" or "--pid X" — always before "--"
+ extra_args      # events and optional "-- <cmd>" for discovery

This guarantees the resulting command is always well-formed:

perf record -F 11 -g -o ... -m 129   -a   -- sleep 0.5
                                      ^^
                              correctly placed before "--"

gprofiler/utils/perf.py — cleanup

Removed the now-unnecessary discovery_pids variable and simplified the PerfProcess construction in discover_appropriate_perf_event to always pass processes_to_profile=None (system-wide), relying on the corrected ordering in _get_perf_cmd to place -a correctly.


How It Was Found

The bug was surfaced by a host emitting the "Failed to determine perf event to use" warning. Manual inspection of the perf data files on the host using perf evlist confirmed events were being recorded (cycles:P), but perf script output showed no stack frames — which led to tracing the discovery logic and ultimately uncovering the argument ordering bug.


Files Changed

  • gprofiler/utils/perf_process.py — swapped extra_args and self._pid_args order in _get_perf_cmd()
  • gprofiler/utils/perf.py — cleaned up discover_appropriate_perf_event() to always use processes_to_profile=None for discovery

Comment thread gprofiler/utils/perf_process.py
Comment thread gprofiler/utils/perf_process.py
@artursarlo
Copy link
Copy Markdown
Author

artursarlo commented Apr 9, 2026

Test for cgropup based profiling

Configuration Under Test

--perf-use-cgroups
--perf-max-cgroups 10
--perf-max-docker-containers 1   (initial runs: 0/1, final test: 1 + heartbeat aggressive)
--skip-system-profilers-above 600
--profiling-duration 60
--perf-mode fp

Validation Checklist

✅ Check 1: -a placed before -- in discovery command

Expected: perf record ... -a -- sleep 0.5
Observed (23:45:32):

perf record -F 11 -g -o .../perf_default_event.fp --switch-output=15s,signal \
  --switch-max-files=1 -m 129 -a -- sleep 0.5
                             ^^
                      -a correctly before --

Result: ✅ PASS — -a appears before --, sleep receives no spurious arguments.


✅ Check 2: Perf event discovery succeeds

Expected: Perf event PERF_DEFAULT discovery successful, found N samples
Observed (23:45:32–23:45:35):

[DEBUG] gprofiler.utils.perf: Perf event PERF_DEFAULT discovery: parsing output stream
[DEBUG] gprofiler.utils.perf_process: perf (fp mode) run output
        perf_stderr: rounding mmap pages size to 1024K (256 pages)
                     Captured and wrote 1.431 MB ...perf_default_event.fp.<timestamp>
[DEBUG] gprofiler.utils.perf: Perf event PERF_DEFAULT discovery successful, found 13 samples
[INFO]  gprofiler.utils.perf_process: Stopped perf (fp mode) (exit_code=0)
[DEBUG] gprofiler.profilers.perf: Discovered perf event (discovered_perf_event=PERF_DEFAULT)

Result: ✅ PASS — 13 samples collected, PERF_DEFAULT (hardware cycles) selected. No fallback to software events needed.


✅ Check 3: Absence of "Failed to determine perf event to use"

Expected: No warning about failing to determine perf event.
Observed: The warning Failed to determine perf event to use does not appear anywhere in the log during this run.
Result: ✅ PASS — The bug that was previously disabling the profiler is resolved.


✅ Check 4: Cgroup v2 detection and Docker container discovery

Expected: Cgroups discovered and mapped to cgroup v2 paths under /sys/fs/cgroup/system.slice/.
Observed (23:45:37):

[DEBUG] Found cgroup v2 path for container cdfc2f7c...: 
        /sys/fs/cgroup/system.slice/docker-cdfc2f7c....scope (CPU: 5.4%)
[DEBUG] Found cgroup v2 path for container 0a47afe8...: 
        /sys/fs/cgroup/system.slice/docker-0a47afe8....scope (CPU: 4.23%)
[DEBUG] Found cgroup v2 path for container 3ef0bad9...: 
        /sys/fs/cgroup/system.slice/docker-3ef0bad9....scope (CPU: 3.9%)
... (21 containers total, ranked by CPU usage)
[DEBUG] Found 88 cgroups to analyze
[DEBUG] Analyzed 88 cgroups with resource data
[INFO]  Using individual Docker container profiling: 21 containers, 1 other cgroups
[INFO]  Using cgroup-based profiling with 22 top cgroups: [...]

Result: ✅ PASS — Cgroup v2 correctly detected; 21 Docker containers + 1 system cgroup = 22 total cgroups selected by CPU usage.


✅ Check 5: -a placed before -e and -G in the actual profiling command

Expected: perf record ... -a -e cycles ... -G <cgroups>
Observed (23:45:39):

perf record -F 11 -g -o .../perf.fp --switch-output=90.0s,signal \
  --switch-max-files=1 -m 129 \
  -a \
  -e cycles -e cycles ... (×22, one per cgroup) \
  -G system.slice/docker-cdfc2f7c....scope,system.slice/docker-0a47afe8....scope,...
  ^^
  -a correctly before -e and -G

Result: ✅ PASS — Argument order is correct: pid_args (-a)extra_args (-e ...)cgroup_args (-G ...).


✅ Check 6: Perf starts and continuously dumps data

Expected: Perf starts successfully and periodically dumps output files.
Observed:

[INFO]  23:45:39 - Started perf (fp mode)
[DEBUG] 23:46:39 - perf (fp mode) run output: Dump .../perf.fp.2026040923463936
[DEBUG] 23:48:29 - perf (fp mode) run output: Dump .../perf.fp.2026040923482937
[DEBUG] 23:49:34 - perf (fp mode) run output: Dump .../perf.fp.2026040923493440
[DEBUG] 23:50:38 - perf (fp mode) run output: Dump .../perf.fp.2026040923503844

Result: ✅ PASS — Perf is running continuously, switching output files every ~60s as expected. Data is being collected.

Screenshot 2026-04-09 at 20 53 03

@artursarlo artursarlo merged commit 56273b6 into master Apr 10, 2026
12 of 35 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants