Add approximate tracking quantile summaries. #859

andrewmcgr · 2021-04-23T01:34:16Z

This PR implements approximate tracking quantile summaries.

These can be thought of as an analog to the classic exponentially weighted moving average (EWMA), except that they track an estimate of a quantile of the sampled data rather than an estimate of the mean.
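The analogy can be sketched in Go. This is a minimal illustration under stated assumptions: `ewma` and `trackQuantile` are hypothetical helpers, and `trackQuantile` uses the simple fixed-step (stochastic-approximation) quantile update, not the generalized EWMA estimator from the Hammer, Yazidi, and Rue papers. The EWMA moves toward each sample in proportion to the difference. The quantile tracker only looks at which side of the estimate the sample falls on.

```go
package main

// ewma updates an exponentially weighted moving average of the mean.
// alpha in (0, 1] is the weight given to the newest sample.
func ewma(avg, x, alpha float64) float64 {
	return avg + alpha*(x-avg)
}

// trackQuantile nudges a running estimate toward the tau-quantile of the
// stream using a fixed step size. Assumption: this is the textbook
// incremental quantile update, NOT the QEMA/GEWA estimator used in the PR.
func trackQuantile(est, x, tau, step float64) float64 {
	if x > est {
		// Sample above the estimate: move up by tau*step.
		return est + step*tau
	}
	// Sample at or below the estimate: move down by (1-tau)*step.
	return est - step*(1-tau)
}
```

At equilibrium the upward steps (size τ·step, taken with probability 1−F(est)) balance the downward steps (size (1−τ)·step, taken with probability F(est)). That happens when F(est) = τ, so the estimate settles around the τ-quantile, with jitter proportional to the step size.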

The algorithms are based on the following two papers:

Quantile Tracking in Dynamically Varying Data Streams Using a Generalized Exponentially Weighted Average of Observations
Hammer, Yazidi, and Rue
https://arxiv.org/pdf/1901.04681.pdf

Joint Tracking of Multiple Quantiles Through Conditional Quantiles
Hammer, Yazidi, and Rue
https://arxiv.org/pdf/1902.05428.pdf

ApproxSummary cheaply maintains approximate quantiles and outputs them as a Summary.

Signed-off-by: Andrew McGregor <amcgregor@fastly.com>

beorn7 · 2021-04-26T17:23:45Z

Looks cool. I'll have a closer look ASAP.

beorn7

Just a few random comments here.

beorn7 · 2021-04-27T21:44:32Z

prometheus/approx_summary.go

+	// absolute error. If Objectives[q] = e, then the value reported
+	// for q will be the φ-quantile value for some φ between q-e and q+e.
+	// The default value is an empty map, resulting in a summary without
+	// quantiles.


The error part of the objectives isn't used at all, if I got that right. So this doc comment explaining the error is moot, isn't it? And I guess you don't really need a map here; a slice would do.
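To make the suggestion concrete, here is a sketch of what dropping the error values amounts to. `quantilesFromObjectives` is a hypothetical helper, not part of client_golang. It keeps only the quantile keys of an `Objectives`-style map and sorts them, because Go map iteration order is unspecified.

```go
package main

import "sort"

// quantilesFromObjectives keeps only the quantile keys of an
// Objectives-style map (quantile -> allowed error) and returns them sorted.
// The error values are discarded, since the tracker never reads them.
func quantilesFromObjectives(objectives map[float64]float64) []float64 {
	qs := make([]float64, 0, len(objectives))
	for q := range objectives {
		qs = append(qs, q)
	}
	sort.Float64s(qs) // map iteration order is random; sort for stable output
	return qs
}
```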

beorn7 · 2021-04-27T21:45:24Z

prometheus/approx_summary.go

+
+// ApproxSummaryOpts bundles the options for creating a Summary metric. It is
+// mandatory to set Name and Help to a non-empty string. All other fields are
+// optional and can safely be left at their zero value.


Is that still true for Lam, Gam, and Rho? I haven't studied the algorithm in detail yet, but does leaving them all at 0 lead to good default behavior?

It would also be good to give some guidance on what values to use here. If every user of the ApproxSummary needs to read the paper first, we'll have very little adoption. (o:

beorn7 · 2021-04-27T22:03:01Z

prometheus/quantile_tracker.go

+}
+
+// NewMQEMATracker constructs a MQTracker instance using QEMA inner trackers.
+func NewMQEMATracker(q []float64, lam, gam, rho float64) *MQTracker {


This one is only used in the tests and benchmarks. Do you plan to use this one as an option in the actual Summary?

beorn7 · 2021-04-27T22:35:55Z

Thanks for doing this. I haven't read the papers yet (and it probably will take a while until I'll find the time), so some of my questions and comments might be quite inappropriate. I still hope asking them now gets us faster along the way than waiting for me to read and understand all the details.

Part 1: Questions about the algorithm

You told me a few things privately, but it would be good for the general public if you could summarize the advantages of the algorithm used here over the one used in the existing Summary, e.g. is it more accurate, does it need less memory, is it faster, … In short: Why should a user of the current Summary use the ApproxSummary instead?
As said, I didn't read the details yet, but it appears the algorithm has no guaranteed limit on the maximum error. Is that correct?
How does the algorithm handle decay over time? At first glance, it seems that it does decay old observations, but it would be good to know the behavior, at least approximately, and document it in doc comments.
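One way to state decay in a doc comment, borrowing from the EWMA analogy in the PR description: with smoothing factor α, every later observation multiplies an old sample's weight by (1−α), so its half-life is ln(0.5)/ln(1−α) observations. Whether any of `Lam`, `Gam`, or `Rho` plays the role of α is an assumption the PR does not confirm. `halfLife` is a hypothetical helper. This kind of decay is counted in observations, not wall-clock time, unlike the `MaxAge`/`AgeBuckets` window of the existing Summary.

```go
package main

import "math"

// halfLife returns the number of further observations after which a
// sample's weight in an EWMA with smoothing factor alpha (0 < alpha < 1)
// has dropped to half. Each new observation multiplies old weights by
// (1 - alpha), so this solves (1 - alpha)^k = 0.5 for k.
func halfLife(alpha float64) float64 {
	return math.Log(0.5) / math.Log(1-alpha)
}
```

For example, α = 0.01 gives a half-life of roughly 69 observations, so at high request rates the effective window can be much shorter than a time-based MaxAge.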
As already commented above, it would be good to get an idea of the meaning of Lam, Gam, and Rho in the doc comments so that, as a user, I can set meaningful values without reading the papers first. (Perhaps that will also answer the previous question.)

Part 2: Code structure

As said privately, I would love to just replace the existing algorithm, but I guess your ApproxSummary is too different (in terms of error guarantees and how it handles decay over time, see questions above) to serve as a drop-in replacement. Still, I would very much prefer not to create a whole new suite of types (ApproxSummary, ApproxSummaryOpts, ApproxSummaryVec) with all their methods and constructors. (This library has way too many top-level types already, which confuses users a lot.) So my idea would be to make ApproxSummary a part of the regular Summary, and depending on the options configured in the SummaryOpts, one or the other algorithm is used. We can flesh out the details; it would probably lead to mutually exclusive fields in the SummaryOpts, but it would then be very easy for users of the existing Summary to switch over to the new algorithm. (I have taken that approach for the new SparseHistograms, see the beorn7/histogram branch.) Do you think that would be possible here?
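A minimal sketch of the mutually-exclusive-fields idea. It assumes a hypothetical `TrackedQuantiles` field. `summaryOpts` and `chooseAlgorithm` are illustrative stand-ins, not the real `SummaryOpts` or any existing constructor logic.

```go
package main

import "errors"

// summaryOpts is a hypothetical, trimmed-down stand-in for SummaryOpts.
type summaryOpts struct {
	// Objectives selects the existing algorithm (quantile -> allowed error).
	Objectives map[float64]float64
	// TrackedQuantiles (hypothetical) selects the tracking algorithm.
	TrackedQuantiles []float64
}

var errConflictingQuantiles = errors.New("summary: Objectives and TrackedQuantiles are mutually exclusive")

// chooseAlgorithm reports whether the tracking algorithm should be used,
// rejecting options that configure both algorithms at once.
func chooseAlgorithm(o summaryOpts) (useTracker bool, err error) {
	if len(o.Objectives) > 0 && len(o.TrackedQuantiles) > 0 {
		return false, errConflictingQuantiles
	}
	return len(o.TrackedQuantiles) > 0, nil
}
```

Switching algorithms then means changing one field in existing SummaryOpts literals, with no new top-level types.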

Another aspect of “please do not add more top-level types” is that the whole machinery for the quantile estimation is in exported types. Is there a reason for users of this library to ever directly interact with them (as part of instrumenting code for Prometheus)? If not, I would just un-export those. If you prefer to have them exported for clarity of our (internal) usage, you could move them into a separate tracker package in the internal directory. And finally, if you would like to make your implementation reusable, even outside of the context of instrumenting code for Prometheus, then you should probably maintain them in a separate package, which prometheus/client_golang would import as a dependency (similar to github.com/beorn7/perks/quantile now).
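If the trackers were un-exported, the Summary would only need a small internal interface. This is a hypothetical sketch: `quantileTracker` and `stepTracker` are invented names, and `stepTracker` reuses the simple fixed-step update rather than the QEMA trackers in the PR.

```go
package main

// quantileTracker is a hypothetical unexported interface the Summary could
// depend on, keeping tracker types out of the public API.
type quantileTracker interface {
	observe(v float64)
	quantile() float64
}

// stepTracker tracks a single tau-quantile with a fixed step size.
type stepTracker struct {
	tau, step, est float64
}

func (t *stepTracker) observe(v float64) {
	if v > t.est {
		t.est += t.step * t.tau
	} else {
		t.est -= t.step * (1 - t.tau)
	}
}

func (t *stepTracker) quantile() float64 { return t.est }

// Compile-time check that stepTracker satisfies quantileTracker.
var _ quantileTracker = (*stepTracker)(nil)
```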

Part 3: Tests

The tests still appear flaky. Do you have an idea what's happening here? Is that expected? Can it be fixed? I see that they use random numbers right now, which more or less guarantees a certain flakiness. Do you think we could use some fixtures where we know the exact quantiles so that we get deterministic results? I'm worried that accommodating the randomness of the input might force us to be so tolerant that we wouldn't detect real errors. (The existing Summary has the advantage of guaranteed limits on the error, so tests can use that as a hard criterion for failure.)
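A sketch of the deterministic-fixture approach. It assumes a seeded `math/rand` source for reproducible input and a nearest-rank exact quantile as the reference value. `fixture` and `exactQuantile` are hypothetical helpers. A test would compare the tracker's output against `exactQuantile` with an explicit tolerance. A failure would then reproduce on every run instead of appearing intermittently.

```go
package main

import (
	"math"
	"math/rand"
	"sort"
)

// fixture returns n normally distributed samples from a fixed seed, so
// every run of the test sees exactly the same input.
func fixture(n int) []float64 {
	r := rand.New(rand.NewSource(42))
	xs := make([]float64, n)
	for i := range xs {
		xs[i] = r.NormFloat64()
	}
	return xs
}

// exactQuantile returns the nearest-rank tau-quantile of xs (len(xs) > 0).
func exactQuantile(xs []float64, tau float64) float64 {
	s := append([]float64(nil), xs...) // copy so the caller's slice is not reordered
	sort.Float64s(s)
	idx := int(math.Ceil(tau*float64(len(s)))) - 1
	if idx < 0 {
		idx = 0
	}
	if idx >= len(s) {
		idx = len(s) - 1
	}
	return s[idx]
}
```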

bwplotka · 2021-07-24T09:39:14Z

Agree with everything @beorn7 said. @andrewmcgr, are you still interested in contributing this special Summary implementation via SummaryOpts? (:

If not - that's fine, we can close this PR and someone can continue on your work at some point 🤗

stale · 2022-03-19T05:24:21Z

Is this still relevant? If so, what is blocking it? Is there anything you can do to help move it forward?

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

andrewmcgr added 4 commits April 23, 2021 15:27

Add ApproxSummary and quantile trackers.

558d0ca

ApproxSummary cheaply maintains approximate quantiles and outputs them as a Summary. Signed-off-by: Andrew McGregor <amcgregor@fastly.com>

Fixup ApproxSummary tests.

938af5d

Signed-off-by: Andrew McGregor <amcgregor@fastly.com>

Fix some test issues.

be7b659

Signed-off-by: Andrew McGregor <amcgregor@fastly.com>

Add copyright to quantile tracker sources

5e036d6

Signed-off-by: Andrew McGregor <amcgregor@fastly.com>

andrewmcgr force-pushed the amcgregor/quantiles_rebase branch from 7f78e19 to 5e036d6 April 23, 2021 05:27

Deflake tests.

fd0640c

Signed-off-by: Andrew McGregor <amcgregor@fastly.com>

andrewmcgr force-pushed the amcgregor/quantiles_rebase branch from d1e45f0 to fd0640c April 23, 2021 06:47

Deflake more tests.

d34f2cf

Signed-off-by: Andrew McGregor <amcgregor@fastly.com>

beorn7 self-requested a review April 26, 2021 17:24

beorn7 reviewed Apr 27, 2021

stale bot added the stale label Mar 19, 2022

stale bot closed this Mar 27, 2022

beorn7 mentioned this pull request Feb 26, 2023

Add Summary metric type prometheus/client_rust#67

Open
