New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add approximate tracking quantile summaries. #859
Conversation
ApproxSummary cheaply maintains approximate quantiles and outputs them as a Summary. Signed-off-by: Andrew McGregor <amcgregor@fastly.com>
Signed-off-by: Andrew McGregor <amcgregor@fastly.com>
Signed-off-by: Andrew McGregor <amcgregor@fastly.com>
Signed-off-by: Andrew McGregor <amcgregor@fastly.com>
7f78e19
to
5e036d6
Compare
Signed-off-by: Andrew McGregor <amcgregor@fastly.com>
d1e45f0
to
fd0640c
Compare
Signed-off-by: Andrew McGregor <amcgregor@fastly.com>
Looks cool. I'll have a closer look ASAP. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a few random comments here.
// absolute error. If Objectives[q] = e, then the value reported | ||
// for q will be the φ-quantile value for some φ between q-e and q+e. | ||
// The default value is an empty map, resulting in a summary without | ||
// quantiles. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The error part of the objectives isn't used at all, if I got that right. So this doc comment explaining the error is moot, isn't it? And I guess, you don't really need a map here but could just use a slice.
|
||
// ApproxSummaryOpts bundles the options for creating a Summary metric. It is | ||
// mandatory to set Name and Help to a non-empty string. All other fields are | ||
// optional and can safely be left at their zero value. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is that still true for Lam
, Gam
, and Rho
? I haven't studied the algorithm in detail yet, but does leaving them all at 0 leads to good default behavior?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would also be good to give some instructions what values to use here. If any user of the ApproxSummary needs to read the paper first, we'll have very little adoption. (o:
} | ||
|
||
// NewMQEMATracker constructs a MQTracker instance using QEMA inner trackers. | ||
func NewMQEMATracker(q []float64, lam, gam, rho float64) *MQTracker { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one is only used in the tests and benchmarks. Do you plan to use this one as an option in the actual Summary?
Thanks for doing this. I haven't read the papers yet (and it probably will take a while until I'll find the time), so some of my questions and comments might be quite inappropriate. I still hope asking them now gets us faster along the way than waiting for me to read and understand all the details. Part 1: Questions about the algorithm
Part 2: Code structureAs said privately, I would love to just replace the existing algorithm, but I guess your ApproxSummary is too different (in terms of error guarantees and how it handles decay over time, see questions above) to serve as a drop-in replacement. Still, I would very much prefer to not create a whole new suite of types ( Another aspect of “please do not add more top-level types” is that the whole machinery for the quantile estimation is in exported types. Is there a reason for users of this library to ever directly interact with them (as part of instrumenting code for Prometheus)? If not, I would just un-export those. If you prefer to have them exported for clarity of our (internal) usage, you could move them into a separate Part 3: TestsThe test still appear flaky. Do you have an idea what's happening here? Is that expected? Can it be fixed? I see that it uses random numbers right now, which kind of guarantees a certain flakiness. Do you think we could use some fixtures where we know the exact quantiles so that we get deterministic results? I'm worried that accommodating the randomness of the input might force us to be so tolerant that we wouldn't detect real errors. (The existing Summary has the advantage of guaranteed limits on the error, so tests can use that as a hard criteria for failure.) |
Agree with all @beorn7 said. @andrewmcgr are you still interested to contribute this special Summary implementation using SummaryOpts? (: If not - that's fine, we can close this PR and someone can continue on your work at some point 🤗 |
Is this still relevant? If so, what is blocking it? Is there anything you can do to help move it forward? This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. |
This PR implements approximate tracking quantile summaries.
These can be thought of as an analog to the classic exponential weighted moving average (EWMA), except they track an estimate of a quantile of the sampled data, rather than an estimate of the mean.
The algorithms are based on the following two papers:
Quantile Tracking in Dynamically Varying Data Streams Using a Generalized Exponentially Weighted Average of Observations
Hammer, Yazidi, and Rue
https://arxiv.org/pdf/1901.04681.pdf
Joint Tracking of Multiple Quantiles Through Conditional Quantiles
Hammer, Yazidi, and Rue
https://arxiv.org/pdf/1902.05428.pdf