Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add warning when monotonicity is forced in the input to histogram_quantile #12931

Merged

Conversation

zenador
Copy link
Contributor

@zenador zenador commented Oct 4, 2023

Inputs to histogram_quantile must be monotonic for the results to be accurate. However, due to various reasons, sometimes it is not monotonic, so we apply a hack to make it monotonic, but then the results may not be accurate, so we should inform the user with a warning.

…ntile

Signed-off-by: Jeanette Tan <jeanette.tan@grafana.com>
@zenador
Copy link
Contributor Author

zenador commented Oct 4, 2023

@beorn7 As discussed on Slack

Copy link
Member

@beorn7 beorn7 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice usage of the new warnings framework. Just a few suggestions for tweaks.

@@ -370,18 +372,22 @@ func coalesceBuckets(buckets buckets) buckets {
//
// As a somewhat hacky solution until ingestion is atomic per scrape, we
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, ingestion is atomic per scrape now. But there are still other ways of encountering inconsistent buckets (data is already inconsistent at the target's side, histograms are ingested via the remote write receiver that Prometheus implements).

So this doc comment needs an update, otherwise future readers will be tempted to remove the whole thing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated the doc comment, but not sure I understood the change correctly, please check.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's fine. Most important was that we don't state a reason that isn't true anymore. I think your explanation is even more than needed, but it also cannot harm to provide this context here.

promql/quantile.go Outdated Show resolved Hide resolved
BadBucketLabelWarning = fmt.Errorf("%w: bucket label %q is missing or has a malformed value", PromQLWarning, model.BucketLabel)
MixedFloatsHistogramsWarning = fmt.Errorf("%w: encountered a mix of histograms and floats for metric name", PromQLWarning)
MixedClassicNativeHistogramsWarning = fmt.Errorf("%w: vector contains a mix of classic and native histograms for metric name", PromQLWarning)
HistogramQuantileForcedMonotonicityWarning = fmt.Errorf("%w: input to histogram_quantile needed to be fixed for monotonicity (and may give inaccurate results) for metric name", PromQLWarning)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe make this info level?

My understanding of the line between warning and info so far is that a warning is never a "false positive", while info might be. Technically, if we have to adjust for monotonicity, it was always a real issue. Which would be a point for "warning". However, all the other warnings point towards something the user can fix, or in other words, they are actionable, while this histogram thing is really just happening, without any immediate action the user could take (besides re-designing the remote write protocol ;-).

So maybe define "warning" as "it's always a real issue that is also actionable", while "info" could be a false positive and/or non-actionable (but still good to know). WDYT?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another way at looking at this would be the following: An organization could mandate an internal practice to always fix queries that create warnings, but to tolerate infos. If some warnings are un-actionable, you couldn't stay faithful to the practice, similar to false positives.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've made this info level. I think this is somewhat actionable though, in that the user might want to try to turn off query sharding for it if possible, or look into other causes why it might be non-monotonic and fix (though this may not always be possible). But I agree it could be a false positive, so it makes sense as Info.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But sources for non-monotonic histograms might be different from sharding, e.g. non-atomic ingestion via remote-write, which is not actionable. I'm not saying that Infos will never be actionable, I'm just saying that Infos might not be actionable or might be false positives.

Arguably, this particular annotation will never be a false positive, because it only ever fires if the histogram is non monotonic, which is always a real issue.

Anyway, I'll stop splitting hairs now, as we agree that this should be Info.

Signed-off-by: Jeanette Tan <jeanette.tan@grafana.com>
@beorn7 beorn7 merged commit b787e5f into prometheus:main Oct 8, 2023
24 checks passed
@zenador zenador deleted the warning-non-monotonic-classic-histogram branch October 9, 2023 10:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants