
Expose histogram_quantile() target bucket lower/upper bounds as series? #5706

Closed
juliusv opened this issue Jun 25, 2019 · 16 comments

@juliusv
Member

juliusv commented Jun 25, 2019

The idea was born out of a conversation at https://twitter.com/juliusvolz/status/1142364117661036544.

To give users a better idea of the quantile error margin of histogram_quantile(), it would be useful to expose the target bucket's lower and upper bounds as two extra series in the same graph as the approximated quantile.

Something like this (using a hacked-up version of histogram_quantile()):

[Screenshot "quantile_buckets": graph showing the approximated quantile together with the target bucket's lower and upper bounds]

  1. Do we want this feature?
  2. Should it be a separate function for just the bucket boundaries, or integrated into histogram_quantile() as an option? E.g. histogram_quantile(φ float, b instant-vector, include_bounds bool, type_label string)

Doing it in the same function would have the advantage that we don't need to calculate the target bucket twice.
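
For concreteness, a rough sketch of how option 2 could look from the user's side (everything below is hypothetical: the extra parameters, the bound label values, and the metric name):

# Hypothetical extended call as per question 2 above (not an existing API):
histogram_quantile(0.9, rate(http_request_duration_seconds_bucket[5m]), true, "bound")
# could return three series per input histogram instead of one, e.g.:
#   {bound="lower"}    0.25   (lower bound of the target bucket)
#   {bound="quantile"} 0.37   (the usual interpolated estimate)
#   {bound="upper"}    0.5    (upper bound of the target bucket)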

@brian-brazil
Contributor

Do we want this feature?

I'm not clear that this is something that belongs in PromQL. If someone wants to do this sort of deeper analysis, the data is already there.

Should it be a separate function for just the bucket boundaries, or integrated into histogram_quantile() as an option?

We presently have only one function (absent) that can return more series than were passed to it, and I'd like to keep it that way. I'm also against overloading functions.

@juliusv
Member Author

juliusv commented Jun 25, 2019

I'm not clear that this is something that belongs in PromQL. If someone wants to do this sort of deeper analysis, the data is already there.

It's hard for the average user to do by themselves, though. If this turned out to be useful for enough users, it would be a shame if they had to implement it themselves (which I think nobody would do), when it's a readily available by-product of calculations already happening in histogram_quantile().
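
To illustrate, here is roughly what a user would have to do today just to see which bucket the quantile falls into (a sketch, with an illustrative metric name; the target bucket then has to be read off by eye, since le is only a string label):

# Cumulative fraction of observations at or below each bucket's upper bound
# (metric name illustrative); the target bucket for the 90th percentile is
# the first le whose fraction reaches 0.9:
  sum by (le) (rate(http_request_duration_seconds_bucket[5m]))
/ on() group_left()
  sum(rate(http_request_duration_seconds_bucket{le="+Inf"}[5m]))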

But yeah, that's a question of how popular and useful it would be. I do see that it would be somewhat quirky.

@brian-brazil
Contributor

This feels more like a sanity check thing to me, which would be better in a linter that's looking for alerts on series that don't exist, rates of sums etc.

@juliusv
Member Author

juliusv commented Jun 25, 2019

I'd see it as giving a user an ongoing idea of the possible quantile error during operation, though it'd also be useful for the tuning/linting use case you mention.

@brian-brazil
Contributor

That's determinable by inspection, no need for runtime information.

@beorn7
Member

beorn7 commented Jun 25, 2019

That's determinable by inspection

If you want to present a 99th-perc-latency graph or a current 99th-perc-latency value as part of a dashboard, and you also want to show the lower and upper bound of the currently relevant bucket, how would you solve that by "inspection" in practice? I assume users wouldn't like to hardcode the bucketing scheme in their dashboard builder. With the current tooling, you would probably need to export the bucket boundaries as separate time series. Which is pretty crazy.

It would be much easier if Prometheus represented the boundaries as floating point numbers rather than strings. You could do math on them, and use cases like this, even if they are a bit niche, would be solvable in a natural way as a byproduct.
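
To make that concrete (a sketch, with an illustrative metric name): because le is a string, numerically identical boundaries in different spellings are distinct to label matchers, and no numeric comparison on them is possible at all:

# These matchers select different series even though the boundary is numerically the same:
http_request_duration_seconds_bucket{le="0.5"}
http_request_duration_seconds_bucket{le="0.500"}
# And "all buckets with an upper bound below 1s" needs a fragile regex instead of a numeric comparison:
http_request_duration_seconds_bucket{le=~"0\\.[0-9]+"}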

Back in the day, we considered the representation of the boundaries as strings a temporary work-around to get an MVP going ASAP. It's sad that we somehow got stuck with it. Even worse, our MVP is now leaking into OpenMetrics, with the consequences already haunting us.

@brian-brazil
Contributor

If you want to present a 99th-perc-latency graph or a current 99th-perc-latency value as part of a dashboard, and you also want to show the lower and upper bound of the currently relevant bucket, how would you solve that by "inspection" in practice?

My point was more that you can determine the accuracy in general by knowing the histogram buckets, without having to know which bucket is in use. Inconsistently sized buckets (e.g. not arithmetic or geometric growth) in particular could be caught.

@beorn7
Member

beorn7 commented Jun 25, 2019

With the current cost of histograms (one series per bucket), spreading buckets evenly over the range of interest will overwhelm even the beefiest Prometheus servers in many scenarios. (At SoundCloud, teams had to resort to putting buckets around interesting values like the latency mentioned in the SLO. If the current value was far away from those buckets, the larger error range that came with that was just something you had to know about and couldn't visualize in dashboards.)

But even with geometric spacing, you still just need to know whether you are on a ×1.5, ×2, or even ×10 bucketing scheme.
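
To put rough numbers on that (illustrative arithmetic): when the estimate falls into a bucket (l, f·l] of a geometric scheme with factor f, the true quantile lies somewhere in that same bucket, so estimate and truth can be up to a factor of f apart:

# f = 1.5  ->  up to 1.5x apart (roughly +50% / -33%)
# f = 2    ->  up to 2x apart
# f = 10   ->  up to 10x apart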

@harsha070

Hi. I am using Prometheus along with k8s. Response time for my service is usually 10-15s. When I use Prometheus' histogram_quantile(), my P90 is capped at 10.0s. I noticed in my Prometheus config (kubectl describe handler prometheus) that the largest bucket is 10s. I added 25 to the Metrics.Buckets.explicit_buckets.Bounds list, yet my P90 still seems to be capped at 10.0s.

@harsha070

Can someone help me with this?

@beorn7
Member

beorn7 commented Nov 11, 2019

@harsha070 a GitHub issue is not a good place to get support for an only vaguely related configuration problem. It makes more sense to ask questions like this on the prometheus-users mailing list. On the mailing list, more people are available to potentially respond to your question, and the whole community can benefit from the answers provided.

valyala added a commit to VictoriaMetrics/VictoriaMetrics that referenced this issue Dec 11, 2019
…ted percentile from `histogram_quantile` if third arg is passed

Updates prometheus/prometheus#5706
@valyala

valyala commented Dec 11, 2019

FYI, I just added the ability to pass a third arg to histogram_quantile in VictoriaMetrics, so it can be evaluated before deciding whether it is needed in Prometheus.
Try using histogram_quantile(phi, q, boundsLabel) at http://play-grafana.victoriametrics.com:3000/d/4ome8yJmz/node-exporter-on-victoriametrics-demo (graphs can be edited there). Example query:

histogram_quantile(0.9, histogram(process_resident_memory_bytes), "bound")

The query uses the histogram aggregate function from extended PromQL. The histogram function is described in this article.

@juliusv
Member Author

juliusv commented Dec 11, 2019

Hi @valyala, I/we are really happy about anyone building cool products/projects around Prometheus, and I think VictoriaMetrics is making some interesting technical contributions. But at the same time we're seeing a pattern with VM that we see as increasingly problematic:

  • It seems like every GitHub issue, mailing list thread, etc. of the Prometheus OSS community that can be connected thematically in any way to VM is used to push VictoriaMetrics. We might not have direct written-down rules about this (yet) for our channels, because sometimes it's difficult to draw the line and pointing out products can also be helpful, but I don't think it is good etiquette to use OSS community channels to push one's own (incompatible) products at every possible chance. The fact that VM initially marketed itself as simply "The best Prometheus storage" without any nuance wasn't helpful in my perception.

  • IMO more problematically, VM introduces its own implementation of PromQL that is incompatible in multiple ways with native PromQL, while calling itself "Extended PromQL". This implies that it is a compatible superset. I think this is dangerous in that it confuses users about the capabilities of each PromQL variant and their compatibility. I did read https://medium.com/@valyala/evaluating-performance-and-correctness-victoriametrics-response-e27315627e87 and am worried by the post downplaying these compatibility problems as well. But even if "Extended PromQL" were a compatible superset of PromQL, it would still cause fragmentation and interoperability problems in the greater Prometheus space (see your histogram_quantile() example above: anyone on VM now cannot go "back"). This may not be your intention, but it does have echoes of the "embrace, extend, extinguish" strategy that Microsoft used in the 90s (https://en.wikipedia.org/wiki/Embrace,_extend,_and_extinguish). Finally: I am not a lawyer, but do note that "PromQL" is a pending trademark of the Linux Foundation (https://www.linuxfoundation.org/trademark-list/), and my non-lawyerly reading of the LF's trademark usage terms suggests that "Extended PromQL" is a problem, especially when the name is used to communicate compatibility.

So from my perspective: I'm happy to see what comes out of VictoriaMetrics, but maybe be a bit more sensitive about using Prometheus channels to push it at every available opportunity. Secondly, and more importantly, I think the right thing to do would be to at least make "Extended PromQL" a fully compatible superset of PromQL, or rename it completely so it doesn't suggest compatibility. Even if it were compatible, however, that wouldn't completely avoid the interoperability/fragmentation issues, nor the connected trademark issues (without an appropriate rename).

@vearutop

Dear @juliusv, thank you for a very detailed comment that hooked me on this topic.

I was wondering: when you say "It seems like every GitHub issue", are you talking about these 8 issues?

@juliusv
Member Author

juliusv commented Dec 14, 2019

@vearutop Most of those qualify, yes. The mailing list at https://groups.google.com/forum/#!searchin/prometheus-users/from$3Avalyala@gmail.com$20victoriametrics%7Csort:date contains many more examples though. And again, it's not necessarily about each individual comment or post, it's about the pattern, and about pushing users onto a system that is deliberately incompatible, both in its PromQL subset, and in its "Extended PromQL" superset. This means that users can never go back to PromQL-compatible systems without breakage. @valyala has the right to build an incompatible system (naming issues aside), but then the Prometheus OSS community also has the right to discourage the above-mentioned communication behavior on our OSS channels.

@juliusv
Member Author

juliusv commented Dec 14, 2019

Closing and locking this issue as this is not the right place for this discussion (we're in the process of finding it), and the original issue probably won't be implemented anyway.

juliusv closed this as completed Dec 14, 2019
prometheus locked this issue as off-topic and limited conversation to collaborators Dec 14, 2019