Allows by/without to be empty and available for max/min_over_time #3030

cyriltovena · 2020-12-03T17:17:05Z

This essentially allows to aggregate over all dimensions when using by () while without() is a noop.
This also makes it possible for max and min range vector aggregation to use grouping, which is simpler than doing max by (foo) max_over_time(...)

Examples:

max_over_time(...) by () gives your the max over time accross of all series.
min_over_time(...) by (namespace) gives you the min over time per namespace.
max_over_time(...) without (namespace) gives you the max over time removing the namespace dimension.

PS: I've also refactored a bit how we optimized grouping to make it more clear.

Fixes #2884

Signed-off-by: Cyril Tovena cyril.tovena@gmail.com

…rations. This essentially allows to aggregate over all dimensions when using by () while without() is a noop. This also makes it possible for max and min range vector aggregation to use grouping, which is simpler than doing max by (foo) max_over_time(...) Examples: - `max_over_time(...) by ()` gives your the max over time accross of all series. - `min_over_time(...) by (namespace)` gives you the min over time per namespace. - `max_over_time(...) without (namespace)` gives you the max over time removing the namespace dimension. PS: I've also refactored a bit how we optimized grouping to make it more clear. Fixes grafana#2884 Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>

Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>

owen-d

One question then LGTM

owen-d · 2020-12-04T18:32:28Z

docs/sources/logql/_index.md

@@ -431,7 +431,7 @@ Supported function for operating over unwrapped ranges are:
 - `stddev_over_time(unwrapped-range)`: the population standard deviation of the values in the specified interval.
 - `quantile_over_time(scalar,unwrapped-range)`: the φ-quantile (0 ≤ φ ≤ 1) of the values in the specified interval.

-Except for `sum_over_time`, `min_over_time` and `max_over_time` unwrapped range aggregations support grouping.
+Except for `sum_over_time`, unwrapped range aggregations support grouping.


Is there a reason we don't allow this on sum_over_time? I think sum_over_time(x) by (y) could be more effectively done than sum by (y) (sum_over_time(x)).

Sorry I have the same question, I'm trying to visualize Netflow statistics using Loki, I have the data coming in like this:

My query looks like this to calculate the rate on a line chart: sum_over_time({job="netflow", SrcAddr=~"$srcaddr", DstAddr=~"$dstaddr"} | json | unwrap Bytes[5m]). This works, but I'd like to be able to group by SrcAddr,DstAddr, or maybe some other combo like DstAddr,DstPort. I don't see a way to do this while calculating the rate.

I should add, because it appears there is a limit to 500 series per query, and Netflow can generate a very large number of unique series, the ability to use group by with sum_over_time is needed to get the series number down to a reasonable level when calculating the rate.

@owen-d @loganmc10

This is something I discussed with @slim-bean , the reasoning behind is that for sum, count and rate you can sum by the whole range vector which would produce the same result. This is not true for other operation since they are not associate when you reduce down labels before.

Since this way of doing ..._over_time() by/without() is specific to Loki and doesn't exist in Prometheus we wanted to have it only where there's no escape.

@loganmc10 You can do this:

sum by (SrcAddr,DstAddr) ( sum_over_time({job="netflow", SrcAddr=~"$srcaddr", DstAddr=~"$dstaddr"} | json | unwrap Bytes[5m]) )

This is the same as if

sum_over_time({job="netflow", SrcAddr=~"$srcaddr", DstAddr=~"$dstaddr"} | json | unwrap Bytes[5m]) by (SrcAddr,DstAddr)

Was existing, except that this is more Prometheus like, and if we can keep it Prometheus like, I think we prefer to do it.

@loganmc10 Also check out the max_query_series limit, which is configurable per deployment and per tenant:

# Limit the maximum of unique series that is returned by a metric query. # When the limit is reached an error is returned. # CLI flag: -querier.max-query-series [max_query_series: <int> | default = 500]

cyriltovena added 2 commits December 3, 2020 18:07

nit doc.

e52d1f1

Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>

pull-request-size bot added the size/L label Dec 3, 2020

owen-d approved these changes Dec 4, 2020

View reviewed changes

cyriltovena mentioned this pull request Dec 7, 2020

maximum of series (500) reached for a single query #3045

Open

cyriltovena merged commit 2406ea7 into grafana:master Dec 7, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Allows by/without to be empty and available for max/min_over_time #3030

Allows by/without to be empty and available for max/min_over_time #3030

cyriltovena commented Dec 3, 2020

owen-d left a comment

owen-d Dec 4, 2020

loganmc10 Dec 6, 2020

loganmc10 Dec 6, 2020

cyriltovena Dec 7, 2020

owen-d Dec 7, 2020

Allows by/without to be empty and available for max/min_over_time #3030

Allows by/without to be empty and available for max/min_over_time #3030

Conversation

cyriltovena commented Dec 3, 2020

owen-d left a comment

Choose a reason for hiding this comment

owen-d Dec 4, 2020

Choose a reason for hiding this comment

loganmc10 Dec 6, 2020

Choose a reason for hiding this comment

loganmc10 Dec 6, 2020

Choose a reason for hiding this comment

cyriltovena Dec 7, 2020

Choose a reason for hiding this comment

owen-d Dec 7, 2020

Choose a reason for hiding this comment