Add warning when monotonicity is forced in the input to histogram_quantile #12931

zenador · 2023-10-04T10:57:41Z

Inputs to histogram_quantile must be monotonic for the results to be accurate. However, due to various reasons, sometimes it is not monotonic, so we apply a hack to make it monotonic, but then the results may not be accurate, so we should inform the user with a warning.

…ntile Signed-off-by: Jeanette Tan <jeanette.tan@grafana.com>

zenador · 2023-10-04T11:43:56Z

@beorn7 As discussed on Slack

beorn7

Nice usage of the new warnings framework. Just a few suggestions for tweaks.

beorn7 · 2023-10-04T14:30:21Z

promql/quantile.go

@@ -370,18 +372,22 @@ func coalesceBuckets(buckets buckets) buckets {
 //
 // As a somewhat hacky solution until ingestion is atomic per scrape, we


Actually, ingestion is atomic per scrape now. But there are still other ways of encountering inconsistent buckets (data is already inconsistent at the target's side, histograms are ingested via the remote write receiver that Prometheus implements).

So this doc comment needs an update, otherwise future readers will be tempted to remove the whole thing.

I've updated the doc comment, but not sure I understood the change correctly, please check.

I think it's fine. Most important was that we don't state a reason that isn't true anymore. I think your explanation is even more than needed, but it also cannot harm to provide this context here.

promql/quantile.go

beorn7 · 2023-10-04T14:36:48Z

util/annotations/annotations.go

+	BadBucketLabelWarning                      = fmt.Errorf("%w: bucket label %q is missing or has a malformed value", PromQLWarning, model.BucketLabel)
+	MixedFloatsHistogramsWarning               = fmt.Errorf("%w: encountered a mix of histograms and floats for metric name", PromQLWarning)
+	MixedClassicNativeHistogramsWarning        = fmt.Errorf("%w: vector contains a mix of classic and native histograms for metric name", PromQLWarning)
+	HistogramQuantileForcedMonotonicityWarning = fmt.Errorf("%w: input to histogram_quantile needed to be fixed for monotonicity (and may give inaccurate results) for metric name", PromQLWarning)


Maybe make this info level?

My understanding of the line between warning and info so far is that a warning is never a "false positive", while info might be. Technically, if we have to adjust for monotonicity, it was always a real issue. Which would be a point for "warning". However, all the other warnings point towards something the user can fix, or in other words, they are actionable, while this histogram thing is really just happening, without any immediate action the user could take (besides re-designing the remote write protocol ;-).

So maybe define "warning" as "it's always a real issue that is also actionable", while "info" could be a false positive and/or non-actionable (but still good to know). WDYT?

Another way at looking at this would be the following: An organization could mandate an internal practice to always fix queries that create warnings, but to tolerate infos. If some warnings are un-actionable, you couldn't stay faithful to the practice, similar to false positives.

I've made this info level. I think this is somewhat actionable though, in that the user might want to try to turn off query sharding for it if possible, or look into other causes why it might be non-monotonic and fix (though this may not always be possible). But I agree it could be a false positive, so it makes sense as Info.

But sources for non-monotonic histograms might be different from sharding, e.g. non-atomic ingestion via remote-write, which is not actionable. I'm not saying that Infos will never be actionable, I'm just saying that Infos might not be actionable or might be false positives.

Arguably, this particular annotation will never be a false positive, because it only ever fires if the histogram is non monotonic, which is always a real issue.

Anyway, I'll stop splitting hairs now, as we agree that this should be Info.

Signed-off-by: Jeanette Tan <jeanette.tan@grafana.com>

Add warning when monotonicity is forced in the input to histogram_qua…

feaa93d

…ntile Signed-off-by: Jeanette Tan <jeanette.tan@grafana.com>

zenador requested a review from roidelapluie as a code owner October 4, 2023 10:57

beorn7 requested changes Oct 4, 2023

View reviewed changes

Revise according to code review

0cbf0c1

Signed-off-by: Jeanette Tan <jeanette.tan@grafana.com>

beorn7 approved these changes Oct 8, 2023

View reviewed changes

beorn7 merged commit b787e5f into prometheus:main Oct 8, 2023
24 checks passed

zenador deleted the warning-non-monotonic-classic-histogram branch October 9, 2023 10:49

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add warning when monotonicity is forced in the input to histogram_quantile #12931

Add warning when monotonicity is forced in the input to histogram_quantile #12931

zenador commented Oct 4, 2023

zenador commented Oct 4, 2023

beorn7 left a comment

beorn7 Oct 4, 2023

zenador Oct 6, 2023

beorn7 Oct 8, 2023

beorn7 Oct 4, 2023

beorn7 Oct 4, 2023

zenador Oct 6, 2023

beorn7 Oct 8, 2023

		@@ -370,18 +372,22 @@ func coalesceBuckets(buckets buckets) buckets {
		//
		// As a somewhat hacky solution until ingestion is atomic per scrape, we

Add warning when monotonicity is forced in the input to histogram_quantile #12931

Add warning when monotonicity is forced in the input to histogram_quantile #12931

Conversation

zenador commented Oct 4, 2023

zenador commented Oct 4, 2023

beorn7 left a comment

Choose a reason for hiding this comment

beorn7 Oct 4, 2023

Choose a reason for hiding this comment

zenador Oct 6, 2023

Choose a reason for hiding this comment

beorn7 Oct 8, 2023

Choose a reason for hiding this comment

beorn7 Oct 4, 2023

Choose a reason for hiding this comment

beorn7 Oct 4, 2023

Choose a reason for hiding this comment

zenador Oct 6, 2023

Choose a reason for hiding this comment

beorn7 Oct 8, 2023

Choose a reason for hiding this comment