runtime/pprof: support efficient accumulation of custom event count profiles #18454
I want to be able to gather pprof-esque information about instantaneous events that have occurred over the lifetime of my program, possibly with sampling for performance reasons.
This is similar to what the heap profile does, at least when used with alloc_space and alloc_objects: it tracks memory allocations over a long period.
The existing runtime/pprof custom profile API seems ill-suited to this. (Insofar as I understand it. See #18453.) One could accomplish it by inventing a unique key for each Add call and never Removing anything. However, that results in a giant, ever-growing map. It would be far more efficient to just keep a counter per pc, as many of the runtime-provided profiles do. It might be worth considering adding a different kind of custom profile geared more towards this use case.
I don't have a concrete proposal, since I haven't thought about this deeply. This issue is just intended to open discussion, particularly since pprof labels are coming for 1.9, and it might be worth considering how they interact with custom profiles--hopefully productively.
@josharian Something like
The comment text is bad but you get the idea. This would capture the kinds of things that we do for the mutex profile as well as basic counters.
I'd like to make sure we get the other pprof changes through first, but this seems like a reasonable followup.
@josharian Sorry, it looks like I dropped a bunch of GitHub notifications about two weeks ago.
You have more context here than I do, but I don't think a new data structure is required for the API I sketched. There's nothing about "top N" in the usual profiles; it's supposed to be a representative sample of the overall behavior, not just the "weighty" behavior.
I think all that is needed is a
I just had a use case where I wished the
This is an old proposal that was accepted but seems to have gone stale. Is there still an appetite for this?
It also seems this could be tackled in two parts: the
I have some working code for this, but there's a lot in the details of the interface that isn't clear to me. (I don't think there's enough time in the current cycle to nail down good answers for these.)
The Add method takes
The built-in profiles that accumulate value over time (heap, block, mutex) give pairs of values for each record: the total weight, and the number of events sampled. I think that this type of custom profile should give that same pair, especially because the internal sampling (when enabled) will make it hard for users to track consistent counts on their own.
The heap profile uses a Poisson process for sampling. That gives small events a fair chance of being sampled, even when they consistently come after huge events (which would reset the clock). Each event is sampled 0 or 1 times. That seems like an appropriate approach to use here. But the smallest heap event has weight 1 (with an argument for it being 4 or 8), and the typical rate is 512*1024. There's special handling when the rate is 0 (sample nothing) or 1 (sample everything).
But accepting a float64 weight in custom Event profiles means the weight can be less than 1. If a user makes 100 calls to Event with weight 0.1, I'd expect a profile with rate=10 to collect 1 sample, and rate=2 to collect 5 samples. But if the profile has rate=1, should it collect 10 samples, or should that be a special case that collects every sample (all 100)? Collecting every sample would create a big discontinuity between rate=1 and rate=2, which isn't an issue for the built-in profiles (their weights are integers, often several orders of magnitude larger than 1).
If rate=1, should an Event with weight=1 always be sampled? Using a Poisson process means the average spacing between samples will be rate, but Event calls are discrete and will be sampled either 0 or 1 times, leading to a lower average spacing (because the sampled Events can only be spaced by 1, 2, 3, etc). And how different should the sampling be between Event calls with weight=1 vs weight=0.999? Maybe it's appropriate to roll any excess sampling weight over to the next counter, though that could give an artificial bump to tiny samples that come right after large ones.
Here are the docs I wrote, which describe what I think are decent answers to those questions (a default of rate=0 means collect everything; rate>=1 means use a Poisson process):