Feature or enhancement
Proposal:
Feature
Add a Perfetto trace output backend to profiling.sampling (target: 3.16), alongside the existing output formats. This would let users open Python sampling profiles directly in the Perfetto UI and analyse them with PerfettoSQL in the trace processor.
Opening this for a design sanity check before sending a PR, as suggested by @pablogsal in offline discussion. The most load-bearing question (how to emit protobuf without a runtime dependency) is covered below.
Motivation
- Perfetto is a trace viewer oriented towards systems-performance work. Being able to view Python samples there means a Python profile can sit on the same timeline as scheduler events, native CPU samples, and application instrumentation, rather than living in a separate tool.
- Once the trace processor can read the output, users get SQL-driven analysis of Python samples (top-N functions, per-thread breakdowns, custom aggregations).
Proposed design
Output format
Emit a Perfetto trace (perfetto.protos.Trace) containing:
- A ProcessDescriptor / ThreadDescriptor per observed process/thread.
- InternedData carrying Frame, Mapping, and Callstack entries.
- One TracePacket per sample containing a StackSample message that references the interned callstack.
StackSample is part of a new set of public profiling protos I'm landing in Perfetto specifically so that producers like this one have a stable, transport-neutral surface to target (rather than reusing PerfSample, which is shaped by perf_event_open and leaks producer diagnostics into the data). The RFC is at google/perfetto#6027.
Concretely, each Python sample maps to roughly:
TracePacket {
  timestamp: <ns>
  trusted_packet_sequence_id: <seq>
  interned_data { ... }  // first packet only, or as new frames appear
  stack_sample {
    task_context_iid: <thread>
    execution_context_iid: <cpu/mode>  // optional
    callstack_iid: <interned callstack>
    primary_descriptor_iid: <"wall_time_ns" counter>
    primary_weight: <ns since last sample>
  }
}
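To make the "first packet only, or as new frames appear" behaviour concrete, here is a minimal sketch of the interning bookkeeping a producer could keep. The `InternTable` class, the `intern_callstack` helper, and the `(function, filename, lineno)` frame tuple are hypothetical names for illustration, not part of the proposal or of Perfetto. The sketch only tracks ids; it does not serialise anything.

```python
class InternTable:
    """Assigns a stable integer id (iid) to each distinct hashable value."""

    def __init__(self):
        self._iids = {}     # value -> iid
        self._pending = []  # (iid, value) pairs not yet written to a packet

    def intern(self, value):
        iid = self._iids.get(value)
        if iid is None:
            # Start at 1 so that 0 can never be confused with an unset field
            # (a choice made for this sketch).
            iid = len(self._iids) + 1
            self._iids[value] = iid
            self._pending.append((iid, value))
        return iid

    def take_pending(self):
        """Return entries that still need emitting, then forget them."""
        pending, self._pending = self._pending, []
        return pending


def intern_callstack(frames, callstacks, stack):
    """Intern each frame, then intern the tuple of frame iids as a callstack.

    `stack` is an iterable of hashable frame keys, assumed here to be
    (function_name, filename, lineno) tuples.
    """
    frame_iids = tuple(frames.intern(frame) for frame in stack)
    return callstacks.intern(frame_iids)
```

With this shape, a sample's packet would carry whatever `take_pending()` returns for each table in its interned_data, and only the integer callstack iid inside stack_sample. Repeated stacks then cost a single varint per sample.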
Proto serialisation without a runtime dependency
The stdlib can't depend on protobuf, so I'd hand-roll the wire format for the specific message types we emit. This is tractable because:
- Proto wire format is tiny. It's varints + length-delimited + fixed32/64, with a tag (field number plus wire type, itself a varint) in front of each field. The whole encoder for the messages we need is on the order of a couple hundred lines of Python.
- We only encode, never decode. Decoding protobuf is significantly harder than encoding it.
- The set of message types is small and stable. I've specified in the upstream RFC that the protos I'm adding are to be "eternally stable": their wire format and semantics won't change in a backwards-incompatible way.
- Precedent. Perfetto itself ships an ad-hoc proto encoder/decoder (protozero) in C++ for similar reasons.
Sketch of the shape:
# Hand-rolled, no runtime deps.
def _varint(buf, n): ...
def _tag(buf, field_no, wire_type): ...
def _string(buf, field_no, s): ...
def _message(buf, field_no, payload): ...

def encode_stack_sample(buf, sample):
    _tag(buf, 1, WIRE_VARINT); _varint(buf, sample.task_context_iid)
    _tag(buf, 6, WIRE_VARINT); _varint(buf, sample.callstack_iid)
    # ...
Field numbers and wire types come straight from the .proto definitions in the Perfetto RFC. Any wire type constants would be inlined as Python integers.
CLI surface
Add a new --format perfetto CLI flag to the profiling.sampling module, accepted everywhere --gecko is allowed today.
Target
Python 3.16.
Prerequisites
- Perfetto RFC-0027: public stack-sampling and heap-profiling protos. Required before this lands so we're targeting the stable protos.
cc @pablogsal
Has this already been discussed elsewhere?
This is a minor feature, which does not need previous discussion elsewhere
Links to previous discussion of this feature:
No response