New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add warning when monotonicity is forced in the input to histogram_quantile #12931
Add warning when monotonicity is forced in the input to histogram_quantile #12931
Conversation
…ntile Signed-off-by: Jeanette Tan <jeanette.tan@grafana.com>
@beorn7 As discussed on Slack |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice usage of the new warnings framework. Just a few suggestions for tweaks.
promql/quantile.go
Outdated
@@ -370,18 +372,22 @@ func coalesceBuckets(buckets buckets) buckets { | |||
// | |||
// As a somewhat hacky solution until ingestion is atomic per scrape, we |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, ingestion is atomic per scrape now. But there are still other ways of encountering inconsistent buckets (data is already inconsistent at the target's side, histograms are ingested via the remote write receiver that Prometheus implements).
So this doc comment needs an update, otherwise future readers will be tempted to remove the whole thing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've updated the doc comment, but not sure I understood the change correctly, please check.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's fine. Most important was that we don't state a reason that isn't true anymore. I think your explanation is even more than needed, but it also cannot harm to provide this context here.
util/annotations/annotations.go
Outdated
BadBucketLabelWarning = fmt.Errorf("%w: bucket label %q is missing or has a malformed value", PromQLWarning, model.BucketLabel) | ||
MixedFloatsHistogramsWarning = fmt.Errorf("%w: encountered a mix of histograms and floats for metric name", PromQLWarning) | ||
MixedClassicNativeHistogramsWarning = fmt.Errorf("%w: vector contains a mix of classic and native histograms for metric name", PromQLWarning) | ||
HistogramQuantileForcedMonotonicityWarning = fmt.Errorf("%w: input to histogram_quantile needed to be fixed for monotonicity (and may give inaccurate results) for metric name", PromQLWarning) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe make this info level?
My understanding of the line between warning and info so far is that a warning is never a "false positive", while info might be. Technically, if we have to adjust for monotonicity, it was always a real issue. Which would be a point for "warning". However, all the other warnings point towards something the user can fix, or in other words, they are actionable, while this histogram thing is really just happening, without any immediate action the user could take (besides re-designing the remote write protocol ;-).
So maybe define "warning" as "it's always a real issue that is also actionable", while "info" could be a false positive and/or non-actionable (but still good to know). WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another way at looking at this would be the following: An organization could mandate an internal practice to always fix queries that create warnings, but to tolerate infos. If some warnings are un-actionable, you couldn't stay faithful to the practice, similar to false positives.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've made this info level. I think this is somewhat actionable though, in that the user might want to try to turn off query sharding for it if possible, or look into other causes why it might be non-monotonic and fix (though this may not always be possible). But I agree it could be a false positive, so it makes sense as Info.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But sources for non-monotonic histograms might be different from sharding, e.g. non-atomic ingestion via remote-write, which is not actionable. I'm not saying that Infos will never be actionable, I'm just saying that Infos might not be actionable or might be false positives.
Arguably, this particular annotation will never be a false positive, because it only ever fires if the histogram is non monotonic, which is always a real issue.
Anyway, I'll stop splitting hairs now, as we agree that this should be Info.
Signed-off-by: Jeanette Tan <jeanette.tan@grafana.com>
Inputs to histogram_quantile must be monotonic for the results to be accurate. However, due to various reasons, sometimes it is not monotonic, so we apply a hack to make it monotonic, but then the results may not be accurate, so we should inform the user with a warning.