Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add approximate tracking quantile summaries. #859

Closed

Conversation

andrewmcgr
Copy link

This PR implements approximate tracking quantile summaries.

These can be thought of as an analog to the classic exponential weighted moving average (EWMA), except they track an estimate of a quantile of the sampled data, rather than an estimate of the mean.

The algorithms are based on the following two papers:

Quantile Tracking in Dynamically Varying Data Streams Using a Generalized Exponentially Weighted Average of Observations
Hammer, Yazidi, and Rue
https://arxiv.org/pdf/1901.04681.pdf

Joint Tracking of Multiple Quantiles Through Conditional Quantiles
Hammer, Yazidi, and Rue
https://arxiv.org/pdf/1902.05428.pdf

ApproxSummary cheaply maintains approximate quantiles and outputs them
as a Summary.

Signed-off-by: Andrew McGregor <amcgregor@fastly.com>
Signed-off-by: Andrew McGregor <amcgregor@fastly.com>
Signed-off-by: Andrew McGregor <amcgregor@fastly.com>
Signed-off-by: Andrew McGregor <amcgregor@fastly.com>
Signed-off-by: Andrew McGregor <amcgregor@fastly.com>
Signed-off-by: Andrew McGregor <amcgregor@fastly.com>
@beorn7
Copy link
Member

beorn7 commented Apr 26, 2021

Looks cool. I'll have a closer look ASAP.

@beorn7 beorn7 self-requested a review April 26, 2021 17:24
Copy link
Member

@beorn7 beorn7 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few random comments here.

// absolute error. If Objectives[q] = e, then the value reported
// for q will be the φ-quantile value for some φ between q-e and q+e.
// The default value is an empty map, resulting in a summary without
// quantiles.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error part of the objectives isn't used at all, if I got that right. So this doc comment explaining the error is moot, isn't it? And I guess, you don't really need a map here but could just use a slice.


// ApproxSummaryOpts bundles the options for creating a Summary metric. It is
// mandatory to set Name and Help to a non-empty string. All other fields are
// optional and can safely be left at their zero value.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is that still true for Lam, Gam, and Rho? I haven't studied the algorithm in detail yet, but does leaving them all at 0 leads to good default behavior?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would also be good to give some instructions what values to use here. If any user of the ApproxSummary needs to read the paper first, we'll have very little adoption. (o:

}

// NewMQEMATracker constructs a MQTracker instance using QEMA inner trackers.
func NewMQEMATracker(q []float64, lam, gam, rho float64) *MQTracker {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one is only used in the tests and benchmarks. Do you plan to use this one as an option in the actual Summary?

@beorn7
Copy link
Member

beorn7 commented Apr 27, 2021

Thanks for doing this. I haven't read the papers yet (and it probably will take a while until I'll find the time), so some of my questions and comments might be quite inappropriate. I still hope asking them now gets us faster along the way than waiting for me to read and understand all the details.

Part 1: Questions about the algorithm

  • You told me a few things privately, but it would be good for the general public if you could summarize the advantages of the algorithm used here over the one used in the existing Summary, e.g. is it more accurate, does it need less memory, is it faster, … In short: Why should a user of the current Summary use the ApproxSummary instead?
  • As said, I didn't read the details yet, but it appears the algorithm has no guaranteed limit of the maximum error. Is that correct?
  • How does the algorithm handle the decay over time? From a first glance, it feels like it does, but it would be good to know the behavior, at least approximately, and document it in doc comments.
  • As already commented above, it would be good to get an idea of the meaning of Lam, Gam, and Rho in the doc comments so that, as a user, I can set meaningful values without reading the papers first. (Perhaps that will also answer the previous question.)

Part 2: Code structure

As said privately, I would love to just replace the existing algorithm, but I guess your ApproxSummary is too different (in terms of error guarantees and how it handles decay over time, see questions above) to serve as a drop-in replacement. Still, I would very much prefer to not create a whole new suite of types (ApproxSummary, ApproxSummaryOpts, ApproxSummaryVec) with all their methods and constructors. (This library has way too many top-level types already, which confuses users a lot.) So my idea would be to make ApproxSummary a part of the regular Summary, and depending on the options configured in the SummaryOpts, the one or the other algorithm is used. We can flesh out the details, it would probably lead to mutually exclusive fields in the SummaryOpts, but it would then be very easy for users of the existing Summary to switch over to the new algorithm. (I have taken that approach for the new SparseHistograms, see the beorn7/histogram branch.) Do you think that would be possible here?

Another aspect of “please do not add more top-level types” is that the whole machinery for the quantile estimation is in exported types. Is there a reason for users of this library to ever directly interact with them (as part of instrumenting code for Prometheus)? If not, I would just un-export those. If you prefer to have them exported for clarity of our (internal) usage, you could move them into a separate tracker package in the internal directory. And finally, if you would like to make your implementation reusable, even outside of the context of instrumenting code for Prometheus, then you should probably maintain them in a separate package, which prometheus/client_golang would import as a dependency (similar to github.com/beorn7/perks/quantile now).

Part 3: Tests

The test still appear flaky. Do you have an idea what's happening here? Is that expected? Can it be fixed? I see that it uses random numbers right now, which kind of guarantees a certain flakiness. Do you think we could use some fixtures where we know the exact quantiles so that we get deterministic results? I'm worried that accommodating the randomness of the input might force us to be so tolerant that we wouldn't detect real errors. (The existing Summary has the advantage of guaranteed limits on the error, so tests can use that as a hard criteria for failure.)

@bwplotka
Copy link
Member

Agree with all @beorn7 said. @andrewmcgr are you still interested to contribute this special Summary implementation using SummaryOpts? (:

If not - that's fine, we can close this PR and someone can continue on your work at some point 🤗

@stale
Copy link

stale bot commented Mar 19, 2022

Is this still relevant? If so, what is blocking it? Is there anything you can do to help move it forward?

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants