
Proposal: Client-side metric aggregation #4557

Closed
Wenzil opened this Issue Aug 28, 2018 · 10 comments


Wenzil commented Aug 28, 2018

Proposal

Prometheus seems to be missing a way to report the maximum observed value in a sample of observations (instant vector) where the cardinality would be too high for each observation to have a unique label set.

The same applies if you replace "maximum" in the previous sentence with any other aggregation (e.g. minimum, average, median).

My first thought was to use a Summary metric with only, for example, the 0, 0.5 and 1 quantiles to get the minimum, median, and maximum. The problem is that some Prometheus client libraries don't support sliding windows for summary metrics. In those clients the quantiles are cumulative over time and don't represent the state of the system at the time of the scrape.

What I'm proposing is a type of metric that would keep track of all these aggregations over a scrape interval (client-side) and expose them with a _<aggregation_type> metric name suffix.
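To make the proposal concrete, here is a minimal Python sketch of what such a metric type might look like. It is not an existing Prometheus client API: the class name `ScrapeIntervalAggregator`, the `scrape()` method, the `_avg` suffix, and the metric name `request_latency_seconds` are all hypothetical. It aggregates the observations made since the previous scrape and resets on every scrape.

```python
import math


class ScrapeIntervalAggregator:
    """Hypothetical metric type from the proposal: aggregates the observations
    made since the previous scrape and resets itself on every scrape."""

    def __init__(self, name):
        self.name = name
        self._reset()

    def _reset(self):
        self._count = 0
        self._sum = 0.0
        self._min = math.inf
        self._max = -math.inf

    def observe(self, value):
        self._count += 1
        self._sum += value
        self._min = min(self._min, value)
        self._max = max(self._max, value)

    def scrape(self):
        """Return {<name>_<aggregation_type>: value} for the interval since
        the last scrape, then start a new interval."""
        samples = {}
        if self._count > 0:
            samples = {
                f"{self.name}_min": self._min,
                f"{self.name}_max": self._max,
                f"{self.name}_avg": self._sum / self._count,
            }
        self._reset()
        return samples


agg = ScrapeIntervalAggregator("request_latency_seconds")
for v in (1.0, 4.0, 2.5):
    agg.observe(v)
print(agg.scrape())
# {'request_latency_seconds_min': 1.0, 'request_latency_seconds_max': 4.0, 'request_latency_seconds_avg': 2.5}
print(agg.scrape())  # {} (nothing observed since the previous scrape)
```

Because the state is reset by the act of scraping, the aggregation window is tied to the scrape interval rather than chosen at query time, which is the crux of the replies that follow.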

Forgive me if there is already a way to achieve this with existing Prometheus functionality.

Member

brian-brazil commented Aug 28, 2018

Thank you for your proposal. One of the principles of the Prometheus evaluation model is that you can calculate a given statistic over an arbitrary time window, and there is no efficient way to do that with a min or max. If you need this, you should probably look at a log-based solution.

Author

Wenzil commented Aug 28, 2018

Could you clarify why it wouldn't be efficient to calculate the min or max over an arbitrary time window? Intuition tells me it would be at least as efficient as estimating quantiles over a time window, which is already supported per the docs.

Then again, maybe we don't need a new metric type for this and can just use the 0 and 1 quantiles of a Summary metric, as suggested in the proposal. Of course, this requires that the Prometheus client library supports sliding time windows.

Member

brian-brazil commented Aug 28, 2018

For an arbitrary time window you'd basically need to remember every single point. Client-side quantiles aren't exactly a good place to start if you care about efficiency.

Ultimately this is not something we can do within the Prometheus evaluation and data models.
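The point about arbitrary windows can be made concrete with a small Python sketch (made-up data, hypothetical helper names, and not a model of how Prometheus stores samples). If a client exposes a running maximum, two different observation streams can produce identical scraped values while having different maxima in the window between the scrapes, so the window max cannot be recovered from the scrapes alone.

```python
def cumulative_max_snapshots(observations, scrape_times):
    """Values a client would expose at each scrape if it reported the running
    (cumulative) maximum. observations: (timestamp, value) pairs sorted by time."""
    snapshots = []
    running = float("-inf")
    i = 0
    for t in scrape_times:
        # Fold in every observation made up to and including this scrape.
        while i < len(observations) and observations[i][0] <= t:
            running = max(running, observations[i][1])
            i += 1
        snapshots.append(running)
    return snapshots


def true_window_max(observations, start, end):
    """Exact max over (start, end]; requires every raw observation."""
    return max(v for t, v in observations if start < t <= end)


# Two made-up observation streams, both scraped at t=10 and t=20.
a = [(5, 9.0), (15, 3.0)]
b = [(5, 9.0), (15, 8.0)]
scrapes = [10, 20]

print(cumulative_max_snapshots(a, scrapes))  # [9.0, 9.0]
print(cumulative_max_snapshots(b, scrapes))  # [9.0, 9.0]
print(true_window_max(a, 10, 20))  # 3.0
print(true_window_max(b, 10, 20))  # 8.0
```

The scraped series are indistinguishable, yet the answers differ; only keeping the raw points (as `true_window_max` does) resolves it.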

Author

Wenzil commented Aug 28, 2018

> For an arbitrary time window you'd basically need to remember every single point.

Do you? It seems like you only need to maintain the min and the max for the time window. To implement a sliding window you can split it into n buckets, keep a min/max for each bucket, and evict the oldest bucket as time goes on.

For example, a sliding window of 10 minutes could be split into 5 buckets of 2 minutes. As new observations come in, they are compared against the max of the current bucket. If they are higher, they replace the max for that bucket. On scrapes, we return the value of the bucket with the highest max. Every two minutes, the oldest bucket is discarded and a new one is rotated in front as the new current bucket.

The same could be done with the min. Average would be rather easy too: just keep a count and the current average for each bucket.
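The bucketed sliding window described in this comment can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not code from any Prometheus client: the class `SlidingWindowAggregator` and its methods are hypothetical, and timestamps are passed in explicitly (and must not go backwards) so the behavior is easy to check. The comment suggests a count and a current average per bucket; the sketch keeps a count and a sum instead, which carries the same information but lets buckets be combined by simple addition.

```python
import math
from collections import deque


class SlidingWindowAggregator:
    """Sketch of the bucketed sliding window described above.

    The window is split into num_buckets buckets of equal width; each bucket
    keeps [bucket_index, min, max, count, sum]. Timestamps must not go
    backwards; they are passed in explicitly to keep the sketch testable.
    """

    def __init__(self, window_seconds=600.0, num_buckets=5):
        self.bucket_width = window_seconds / num_buckets
        self.num_buckets = num_buckets
        self.buckets = deque()

    def _current_bucket(self, now):
        current = int(now // self.bucket_width)
        # Discard buckets that have slid out of the window.
        while self.buckets and self.buckets[0][0] <= current - self.num_buckets:
            self.buckets.popleft()
        # Rotate in a fresh bucket when time has moved past the newest one.
        if not self.buckets or self.buckets[-1][0] != current:
            self.buckets.append([current, math.inf, -math.inf, 0, 0.0])
        return self.buckets[-1]

    def observe(self, value, now):
        b = self._current_bucket(now)
        b[1] = min(b[1], value)
        b[2] = max(b[2], value)
        b[3] += 1
        b[4] += value

    def snapshot(self, now):
        """Aggregates over the buckets still in the window, or None if empty."""
        self._current_bucket(now)
        count = sum(b[3] for b in self.buckets)
        if count == 0:
            return None
        return {
            "min": min(b[1] for b in self.buckets),
            "max": max(b[2] for b in self.buckets),
            "avg": sum(b[4] for b in self.buckets) / count,
        }


# 10-minute window split into 5 buckets of 2 minutes, as in the example above.
w = SlidingWindowAggregator(window_seconds=600, num_buckets=5)
w.observe(7.0, now=30)      # bucket 0
w.observe(2.0, now=150)     # bucket 1
print(w.snapshot(now=150))  # {'min': 2.0, 'max': 7.0, 'avg': 4.5}
w.observe(3.0, now=610)     # bucket 5: bucket 0 (and its 7.0) is evicted
print(w.snapshot(now=610))  # {'min': 2.0, 'max': 3.0, 'avg': 2.5}
```

Because eviction happens a whole bucket at a time, the effective window in this sketch covers between 8 and 10 minutes, depending on how far into the current bucket the scrape falls.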

Seems pretty efficient. In fact I could see this as being an extension to the Summary metric type rather than a completely new metric type.

Member

brian-brazil commented Aug 29, 2018

> Do you? Seems like you only need to maintain the min and the max for the time window.

Yes, I said any arbitrary time window, not bucketed/sliding time windows. The person using PromQL needs to be able to select the window on the fly, with no coordination with the client library. We actually recommend dropping min/max from any system that exposes them: https://prometheus.io/docs/instrumenting/writing_exporters/#drop-less-useful-statistics

> Average would be rather easy too: just keep a count and current average for each bucket.

We already provide average via _sum and _count, and they work over arbitrary windows.
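The remark about _sum and _count can be illustrated with a small Python sketch (hypothetical function name, made-up sample values). Because both series are cumulative, the average over any window can be computed afterwards from just the two samples at its edges, with no coordination with the client. This is the idea behind the common PromQL pattern `rate(x_sum[5m]) / rate(x_count[5m])`, where `x` is a placeholder metric name.

```python
def window_average(samples, start, end):
    """Average of the observations made between the scrapes at `start` and `end`.

    samples maps scrape time -> (cumulative _sum, cumulative _count). Only the
    two samples at the edges of the window are used, so any window the query
    picks works. Counter resets are not handled in this sketch.
    """
    sum_start, count_start = samples[start]
    sum_end, count_end = samples[end]
    observed = count_end - count_start
    if observed == 0:
        return None  # no observations in the window
    return (sum_end - sum_start) / observed


# Made-up scrapes of a hypothetical x_sum / x_count pair every 15 seconds.
samples = {
    0: (0.0, 0),
    15: (6.0, 3),
    30: (10.0, 4),
    45: (20.0, 8),
    60: (20.0, 8),  # nothing observed between 45 and 60
}

print(window_average(samples, 0, 15))   # 2.0
print(window_average(samples, 15, 45))  # 2.8
print(window_average(samples, 0, 45))   # 2.5
print(window_average(samples, 45, 60))  # None
```

Unlike the min/max case, nothing about the window has to be decided on the client side.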

Author

Wenzil commented Aug 29, 2018

> You don’t know what time the min or max were calculated over

Fair enough. Still, I would prefer to have them reported to Prometheus rather than to a logging system, so that we can have nice visualization and alerting based on the approximate state of the system over the sliding window.

Member

brian-brazil commented Aug 29, 2018

The best we can offer there are quantiles, which are approximate by their nature when dealing with metrics.

Author

Wenzil commented Aug 30, 2018

Yeah, the 0 and 1 quantiles should work for min/max estimation over a sliding window, assuming the client library supports sliding windows. That's the actual problem here: some don't.

But that's outside the scope of this repo, so feel free to close this issue.

Thanks!

Member

brian-brazil commented Aug 30, 2018

You can always request that client libraries add support for summary quantiles.


lock bot commented Mar 22, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked and limited conversation to collaborators Mar 22, 2019
