Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Includes exemplars for count and sum aggregate metrics for histograms #3460

Conversation

HaloFour
Copy link

@HaloFour HaloFour commented Oct 6, 2022

Currently exemplars are not included in the counter-like metrics recorded with histograms (e.g. foo_seconds_count and foo_seconds_sum). This change grabs the last non-null exemplar from any of the histogram buckets and applies it to those sampled values.

The lack of exemplars in these cases was surprising to our team as we do use the http_server_requests_seconds_count metrics to aggregate all Spring requests regardless of their duration and drive alerting from those metrics, we feel there would be value in following exemplars to example traced requests at that granularity.

As for which metrics include exemplars or which exemplar to use, that is up for debate. I am treating _seconds_count and _seconds_sum as they feel like part of a counter, which does support exemplars in Prometheus, whereas _seconds_max is a gauge. I also am grabbing the last non-null exemplar which may favor the larger buckets but I wanted it to be predictable as well as not requiring any additional iterations of the exemplars array.

@sonatype-lift
Copy link
Contributor

sonatype-lift bot commented Oct 6, 2022

⚠️ 10 God Classes were detected by Lift in this project. Visit the Lift web console for more details.

@jonatan-ivanov
Copy link
Member

jonatan-ivanov commented Oct 6, 2022

There is a discussion about this on Micrometer Slack, let me copy here some interesting bits, in separate comments.

This is missing right now because OpenMetrics does not support Exemplars for Summary, I’m not sure why (counter and sum). And it seems Histogram only supports it on the buckets.

Other than this, I’m with you, I’m also missing this, e.g.: what if I have a Timer, histogram is off and I want to use an exemplar for the counter? I think this should be a common use-case but the spec does not support it.
Also, TBH, I’m not sure what would be the consequences doing this 🙂 but:

  • The Java client does not support Exemplars on the API level for Summary
  • The Java client supports Exemplars on the API level for Histogram but on the output, only the buckets will have exemplars the count and sum does not

as it is defined in the specs.

@jonatan-ivanov
Copy link
Member

I’m not sure that this implementation will work, it seems it will not give you the latest exemplar but it will give you the exemplar to the highest bucket.
E.g.: what happens if you switch the recording order in your tests, i.e.: instead of this:

slos.record(10);
slos.record(250);
slos.record(1_000);

do this:

slos.record(1_000);
slos.record(250);
slos.record(10);

After you fix the buckets, I think the test will be still broken because the counter will get the exemplar to the first recording because it is the highest. This behavior can result in the counter exemplar being initialized and never updated again.

Maybe it’s just me but I think the behavior matters (assuming that OpenMetrics will support exemplars there) because of two reasons:

  • To me, this would be the expected behavior but I’m not sure where would I get unexpected data or get into trouble with this assumption and with the current implementation
  • If the first request can take significantly slower (which isn’t rare): lazy init, reading files, populating caches, GC getting excited because of all of this, etc. the exemplar of the count might never get updated ever again

@jonatan-ivanov
Copy link
Member

@HaloFour asked questions to get feedback on the exemplars support of the OpenMetrics specs on CNCF slack: https://cloud-native.slack.com/archives/C01NP3BV26R/p1665087946929039

@jonatan-ivanov jonatan-ivanov added the waiting for feedback We need additional information before we can continue label Oct 6, 2022
@HaloFour
Copy link
Author

HaloFour commented Oct 6, 2022

Sounds good, we can revisit once we get clarity about the OpenMetrics spec. If we would require a "better" approach to determine which observation to use would that require waiting on a solution from the Prometheus client?

@jonatan-ivanov
Copy link
Member

I think this depends on what feedback we will get back and how we want to implement it. I think it is likely that we don't need to wait for the Prometheus Client but let's see.

@jonatan-ivanov
Copy link
Member

For the sake of documentation/being transparent: it seems Prometheus will enable exemplars for all time series: https://groups.google.com/g/prometheus-developers/c/zgu5hwV_2oo/m/5VfUiOfmAgAJ

@shakuzen shakuzen added blocked An issue that's blocked on an external project change and removed waiting for feedback We need additional information before we can continue labels Jan 31, 2023
@shakuzen
Copy link
Member

@jonatan-ivanov do you know if there is a GitHub issue open tracking the work on the Prometheus server side to accept exemplars on all time series?

@jonatan-ivanov
Copy link
Member

fyi: prometheus/prometheus#11982

@shakuzen shakuzen added internal An issue that needs input from a member on another Team and removed blocked An issue that's blocked on an external project change labels Mar 6, 2023
@shakuzen
Copy link
Member

Superseded by #3996

@shakuzen shakuzen closed this Aug 15, 2023
@shakuzen shakuzen added superseded An issue that has been superseded by another and removed internal An issue that needs input from a member on another Team labels Aug 15, 2023
@HaloFour HaloFour deleted the exemplars_histogram_counters branch August 15, 2023 10:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
superseded An issue that has been superseded by another
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants