
runtime/pprof: support efficient accumulation of custom event count profiles #18454

josharian opened this issue Dec 28, 2016 · 11 comments
Labels: NeedsFix, Proposal-Accepted
josharian commented Dec 28, 2016

I want to be able to gather pprof-esque information about instantaneous events that have occurred over the lifetime of my program, possibly with sampling for performance reasons.

This is similar to what the heap profile does, at least when used with alloc_space and alloc_objects--it tracks memory allocations over a long period.

The existing runtime/pprof custom profile API seems ill-suited to this. (Insofar as I understand it. See #18453.) One could accomplish it by inventing a unique key for Add and never Remove anything. However, this will result in a giant, ever-growing map. It would be far more efficient to just keep a counter per pc, as many of the runtime-provided profiles do. It might be worth considering adding a different kind of custom profile geared more towards this use case.
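To make the "counter per pc" idea concrete, here is a minimal sketch (not pprof code; the names are illustrative) of accumulating event counts keyed by the caller's program counter, so memory stays bounded by the number of call sites rather than the number of events:

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// pcCounts accumulates a count per call site (program counter),
// rather than storing a unique key per event as Profile.Add would.
var (
	mu       sync.Mutex
	pcCounts = map[uintptr]int64{}
)

// countEvent records one event against the caller's pc.
func countEvent() {
	var pcs [1]uintptr
	// skip=2 skips runtime.Callers and countEvent itself.
	if runtime.Callers(2, pcs[:]) == 0 {
		return
	}
	mu.Lock()
	pcCounts[pcs[0]]++
	mu.Unlock()
}

func main() {
	for i := 0; i < 3; i++ {
		countEvent()
	}
	total := int64(0)
	for _, n := range pcCounts {
		total += n
	}
	fmt.Println(total) // 3: one call site, three events, one map entry
}
```

A real implementation would record a short stack of pcs rather than a single frame, but the memory bound works the same way.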

I don't have a concrete proposal, since I haven't thought about this deeply. This issue is just intended to open discussion, particularly since pprof labels are coming for 1.9, and it might be worth considering how they interact with custom profiles--hopefully productively.

cc @matloob

josharian added this to the Proposal milestone Dec 28, 2016
josharian changed the title from "proposal: runtime/pprof: support efficient accumulative custom profiles" to "proposal: runtime/pprof: support efficient accumulation custom profiles" Dec 28, 2016
rsc commented Jan 9, 2017

@josharian Something like

// SetSampleRate sets the profile to record samples randomly with probability 1/n.
// It must be called before any calls to Add, Remove, or Event.
func (*Profile) SetSampleRate(n int)

// Event records that 'weight' events happened.
// For a simple event counter, weight should be 1.
// If events have different expenses, weight can vary according to the expense.
// A given Profile should be populated using Add/Remove or Event, but not both.
func (*Profile) Event(weight float64)

?

The comment text is bad but you get the idea. This would capture the kinds of things that we do for the mutex profile as well as basic counters.

I'd like to make sure we get the other pprof changes through first, but this seems like a reasonable followup.
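Purely to illustrate the intended call pattern, here is a toy stand-in for the sketched API. Note the real pprof.Profile has no SetSampleRate or Event methods; this stub only accumulates, where a real implementation would sample with probability 1/rate and record the caller's stack:

```go
package main

import "fmt"

// Profile is a hypothetical stand-in for the proposed API,
// not the actual runtime/pprof type.
type Profile struct {
	rate  int
	count int     // number of Event calls recorded
	total float64 // sum of their weights
}

// SetSampleRate sets the sampling rate; it is unused in this stub.
func (p *Profile) SetSampleRate(n int) { p.rate = n }

// Event records an event of the given weight. A real implementation
// would sample and capture a stack trace here.
func (p *Profile) Event(weight float64) {
	p.count++
	p.total += weight
}

func main() {
	var p Profile
	p.SetSampleRate(1) // record every event
	for i := 0; i < 4; i++ {
		p.Event(2.5) // e.g. 2.5 units of work per event
	}
	fmt.Println(p.count, p.total) // 4 10
}
```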

josharian commented Jan 9, 2017

@rsc yes! SGTM.

josharian commented Jan 30, 2017

Opinions about implementing this using a probabilistic data structure like count-min-sketch-based top-k?
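For context on the data structure mentioned here (not part of any pprof API), a count-min sketch bounds memory by hashing keys into d rows of w counters; a query takes the minimum across rows, which may over-count on collisions but never under-counts. A minimal sketch, with illustrative names:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// cms is a minimal count-min sketch: d rows of w counters.
type cms struct {
	w    int
	rows [][]uint64
}

func newCMS(d, w int) *cms {
	rows := make([][]uint64, d)
	for i := range rows {
		rows[i] = make([]uint64, w)
	}
	return &cms{w: w, rows: rows}
}

// index derives a per-row bucket by salting the hash with the row number.
func (s *cms) index(row int, key string) int {
	h := fnv.New64a()
	fmt.Fprintf(h, "%d:%s", row, key)
	return int(h.Sum64() % uint64(s.w))
}

// Add increments the key's counter in every row.
func (s *cms) Add(key string, n uint64) {
	for i := range s.rows {
		s.rows[i][s.index(i, key)] += n
	}
}

// Count returns the minimum counter across rows: an upper bound
// on the true count that is exact absent collisions.
func (s *cms) Count(key string) uint64 {
	min := ^uint64(0)
	for i := range s.rows {
		if c := s.rows[i][s.index(i, key)]; c < min {
			min = c
		}
	}
	return min
}

func main() {
	s := newCMS(4, 1024)
	s.Add("alloc", 3)
	s.Add("alloc", 2)
	s.Add("free", 1)
	fmt.Println(s.Count("alloc")) // 5 (exact here; few keys, no collisions)
}
```

A top-k layer would keep a small heap of heavy keys alongside the sketch; as rsc notes below, the built-in profiles instead aim for a representative sample, not a heavy-hitters summary.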

josharian commented Feb 13, 2017

Ping for opinions. (Or if someone else is going to implement this, that's great, I'll just wait patiently.)

rsc commented Feb 13, 2017

@josharian Sorry, looks like I dropped a bunch of Github notifications about two weeks ago.

You have more context here than I do, but I don't think a new data structure is required for the API I sketched. There's nothing about "top N" in the usual profiles; it's supposed to be a representative sample of the overall behavior, not just the "weighty" behavior.

I think all that is needed is a func shouldSample(rate int, weight float64) bool that determines whether the event is added to the profile (and then never removed). I believe that code (if not that precise function) already exists for deciding whether to sample a memory allocation. Then the profile itself is a map[stack]float64. I don't know that we have one of those in runtime/pprof right now, but we will once my pending work is done (moving that piece from the runtime to runtime/pprof).
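A sketch of what such a shouldSample helper might look like (illustrative, not runtime code), mirroring the memory profiler's approach: draw an exponentially distributed threshold with mean rate, subtract each event's weight from it, and sample whenever it crosses zero, which yields Poisson-spaced samples:

```go
package main

import (
	"fmt"
	"math"
	"math/rand"
)

// sampler decides whether each weighted event is recorded,
// targeting an average spacing of `rate` weight between samples.
type sampler struct {
	rate float64
	next float64 // remaining weight until the next sample
}

func newSampler(rate int) *sampler {
	s := &sampler{rate: float64(rate)}
	s.reset()
	return s
}

func (s *sampler) reset() {
	// Exponential threshold with mean rate; 1-Float64() avoids log(0).
	s.next = s.rate * -math.Log(1-rand.Float64())
}

// shouldSample reports whether an event of the given weight
// should be added to the profile.
func (s *sampler) shouldSample(weight float64) bool {
	s.next -= weight
	if s.next <= 0 {
		s.reset()
		return true
	}
	return false
}

func main() {
	s := newSampler(10)
	sampled := 0
	const events = 100000
	for i := 0; i < events; i++ {
		if s.shouldSample(1) { // each event has weight 1
			sampled++
		}
	}
	// Expect roughly events/rate = 10000 samples on average.
	fmt.Println(sampled > events/20 && sampled < events/5)
}
```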

rsc changed the title from "proposal: runtime/pprof: support efficient accumulation custom profiles" to "runtime/pprof: support efficient accumulation of custom event count profiles" Feb 13, 2017
rsc added this to the Go1.9 milestone Feb 13, 2017
rsc removed this from the Proposal milestone Feb 13, 2017
josharian commented Feb 13, 2017

Hmm. I'll look again once your pending work is done.

bradfitz added this to the Go1.10 milestone Jun 7, 2017
bradfitz removed this from the Go1.9 milestone Jun 7, 2017
bradfitz removed this from the Go1.10 milestone Nov 28, 2017
bradfitz added this to the Go1.11 milestone Nov 28, 2017
ianlancetaylor removed this from the Go1.11 milestone Jun 28, 2018
ianlancetaylor added this to the Unplanned milestone Jun 28, 2018
ianlancetaylor added the NeedsFix label Jun 28, 2018
Sajmani commented Jun 29, 2018

@josharian Perhaps the code in x/time/rate might be useful for this:
https://godoc.org/golang.org/x/time/rate#NewLimiter

mmcloughlin commented May 13, 2021

I just had a use case where I wished the Event() API described above existed.

This is an old proposal that was accepted but seems to have gone stale. Is there still an appetite for this?

It also seems this could be tackled in two parts: the Event() API could be implemented independently of sampling, I think?

josharian commented May 21, 2021

I’m still interested in using it, but don’t have a need urgent enough to implement myself.

gopherbot commented May 7, 2022

Change https://go.dev/cl/404697 mentions this issue: runtime/pprof: add counting profile and sampling

rhysh commented May 11, 2022

I have some working code for this, but there's a lot in the details of the interface that isn't clear to me. (I don't think there's enough time in the current cycle to nail down good answers for these.)

The Add method takes skip int as its second argument, so I included that in the Event signature too.

The built-in profiles that accumulate value over time (heap, block, mutex) give pairs of values for each record: the total weight, and the number of events sampled. I think that this type of custom profile should give that same pair, especially because the internal sampling (when enabled) will make it hard for users to track consistent counts on their own.
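That value pair might look like the following sketch (illustrative names, not actual pprof internals): one record per unique call stack, holding the count of sampled events plus their total weight.

```go
package main

import "fmt"

// eventRecord sketches the per-record value pair described above:
// the number of sampled Event calls and the sum of their weights.
type eventRecord struct {
	count  int64
	weight float64
}

// recordEvent accumulates into a map keyed by call stack; a string
// stands in here for the real slice of pcs.
func recordEvent(records map[string]*eventRecord, stack string, w float64) {
	r := records[stack]
	if r == nil {
		r = &eventRecord{}
		records[stack] = r
	}
	r.count++
	r.weight += w
}

func main() {
	records := map[string]*eventRecord{}
	recordEvent(records, "main.work", 1.5)
	recordEvent(records, "main.work", 0.5)
	r := records["main.work"]
	fmt.Println(r.count, r.weight) // 2 2: two events totaling weight 2
}
```

With sampling enabled, count would hold the number of *sampled* events, and each contribution would be scaled up by the sampling probability, which is exactly why users cannot reconstruct the pair on their own.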

The heap profile uses a Poisson process for sampling. That gives small events a fair chance of being sampled, even when they consistently come after huge events (which would reset the clock). Each event is sampled 0 or 1 times. That seems like an appropriate approach to use here. But the smallest heap event has weight 1 (with an argument for it being 4 or 8), and the typical rate is 512*1024. There's special handling when the rate is 0 (sample nothing) or 1 (sample everything).

But accepting a float64 for weight in custom Event profiles means the weight can be less than 1. If a user makes 100 calls to Event with weight 0.1, I'd expect a profile with rate=10 to collect 1 sample. Setting rate=2 would mean 5 samples. But if the profile has rate=1, should it collect 10 samples or be a special case to collect every (all 100) samples? Collecting every sample means a big discontinuity, which isn't the case for the built-in profiles (which use integers for weight, often several orders of magnitude larger than 1).

If rate=1, should an Event with weight=1 always be sampled? Using a Poisson process means the average spacing between samples will be rate, but Event calls are discrete and will be sampled either 0 or 1 times, leading to a lower average spacing (because the sampled Events can only be spaced by 1, 2, 3, etc). And how different should the sampling be between Event calls with weight=1 vs weight=0.999? Maybe it's appropriate to roll any excess sampling weight over to the next counter, though that could give an artificial bump to tiny samples that come right after large ones.

Here are the docs I wrote, which describe what I think are decent answers to those (default of rate=0 means collect everything, rate>=1 means use Poisson):

// SetSampleRate sets the profile to record a fraction of samples based on rate.
// For a Profile populated using Add/Remove, each call to Add has weight 1. For
// a Profile populated using Event, the sampling algorithm uses the weight
// provided in each of those calls.
//
// When rate is 0, the Profile stores every sample. When rate is 1 or greater,
// the Profile will collect samples based on their weight, with an average
// spacing between samples of rate.
//
// It must be called before any calls to Add, Remove, or Event.
func (p *Profile) SetSampleRate(rate int)
// Event records an event with the given weight.
//
// For a simple event counter, weight should be 1. If events have different
// expenses, weight can vary according to the expense.
//
// The skip parameter has the same meaning as Add's skip and controls where the
// stack trace begins.
//
// A given Profile should be populated using Add/Remove or Event, but not both.
func (p *Profile) Event(weight float64, skip int)
