Fix: perf argument ordering bug silently disabling the profiler and losing collected data#69
Merged
Merged
Conversation
Author
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.

Fix: perf argument ordering bug silently disabling the profiler and losing collected data
Summary
A subtle but impactful argument ordering bug in
PerfProcess._get_perf_cmd()caused two compounding issues:The Bug
PerfProcess._get_perf_cmd()assembled the finalperf recordcommand by concatenating argument groups in this order:During event discovery,
extra_argsincludes["--", "sleep", "0.5"]— the--separator tellsperf recordthat everything after it is the command to trace. This meansself._pid_args(which contains-afor system-wide profiling) was appended after the--separator, effectively being passed as an argument tosleeprather than toperf:sleepdoes not accept-aas a valid argument and exits immediately with a non-zero code before any samples can be collected.Impact
Impact 1: Perf profiler silently disabled (consistent failure)
discover_appropriate_perf_event()runsperf record -- sleep 0.5for each supported event type (default,cpu-clock,task-clock) and checks whether any samples were collected. Becausesleepexited immediately due to the invalid-aflag,perf recordalso exited with zero samples every time. The discovery function exhausted all events and raisedPerfNoSupportedEvent, logging:This was consistently triggered on hosts where the
processes_to_profilewasNone(system-wide mode), causing the entire perf profiler to be permanently disabled for those hosts.Impact 2: Collected data silently lost (intermittent failure)
In normal profiling runs (not discovery),
extra_argsdoes not contain--. However,self._pid_argswas still being placed afterextra_args, meaning-aor--pid <pids>came after all event flags (-e cycles,--call-graph dwarf,...). While perf tolerates this in many cases, the ordering is incorrect per perf's argument parsing rules, and on some kernel/perf versions this caused profiling samples to be dropped or the wrong set of processes to be targeted — resulting in inconsistent, partial data collection that was invisible to the caller.Root Cause
The
_get_perf_cmdmethod inPerfProcessconcatenatedextra_argsbeforeself._pid_args:The Fix
Two files were changed:
gprofiler/utils/perf_process.py— root fixSwapped the order so
pid_argsalways comes beforeextra_args:This guarantees the resulting command is always well-formed:
gprofiler/utils/perf.py— cleanupRemoved the now-unnecessary
discovery_pidsvariable and simplified thePerfProcessconstruction indiscover_appropriate_perf_eventto always passprocesses_to_profile=None(system-wide), relying on the corrected ordering in_get_perf_cmdto place-acorrectly.How It Was Found
The bug was surfaced by a host emitting the
"Failed to determine perf event to use"warning. Manual inspection of the perf data files on the host usingperf evlistconfirmed events were being recorded (cycles:P), butperf scriptoutput showed no stack frames — which led to tracing the discovery logic and ultimately uncovering the argument ordering bug.Files Changed
gprofiler/utils/perf_process.py— swappedextra_argsandself._pid_argsorder in_get_perf_cmd()gprofiler/utils/perf.py— cleaned updiscover_appropriate_perf_event()to always useprocesses_to_profile=Nonefor discovery