quantile sharding; fix bug related to # of steps (instant query mem usage) and use CollapsingLowestDense store for ddsketch #11905

cstyan · 2024-02-09T02:11:01Z

EDIT: I also found the bug that was causing us to OOM when executing instant queries, see the 2nd commit

This should reduce the memory allocated by the vector iteration by at least 25% but as much as ~80%. The quantile vector iteration accounts for at least 15% of all in use space last time I tested (and during that time queriers we're OOM'ing as a result of quantile sharding).

However, with the usage of this alternate store type we give up some amount of relative accuracy guarantees for lower quantiles. From their readme:

We also provide implementations, returned by LogCollapsingLowestDenseDDSketch(relativeAccuracy, maxNumBins) and LogCollapsingHighestDenseDDSketch(relativeAccuracy, maxNumBins), where the q-quantile will be accurate up to the specified relative error for q that is not too small (or large). Concretely, the q-quantile will be accurate up to the specified relative error as long as it belongs to one of the m bins kept by the sketch. For instance, If the values are time in seconds, maxNumBins = 2048 covers a time range from 80 microseconds to 1 year.

Benchmark:

goos: linux
goarch: amd64
pkg: github.com/grafana/loki/pkg/logql
cpu: AMD Ryzen 9 5950X 16-Core Processor
                                                      │    base     │                new2                │
                                                      │   sec/op    │   sec/op     vs base               │
QuantileBatchRangeVectorIteratorAt/1-samples-32         153.4n ± 1%   169.8n ± 1%  +10.76% (p=0.002 n=6)
QuantileBatchRangeVectorIteratorAt/1000-samples-32      30.17µ ± 1%   10.63µ ± 1%  -64.76% (p=0.002 n=6)
QuantileBatchRangeVectorIteratorAt/10000-samples-32     281.1µ ± 1%   101.5µ ± 2%  -63.88% (p=0.002 n=6)
QuantileBatchRangeVectorIteratorAt/100000-samples-32    2.355m ± 2%   1.006m ± 2%  -57.26% (p=0.002 n=6)
QuantileBatchRangeVectorIteratorAt/1000000-samples-32   23.50m ± 2%   10.11m ± 1%  -56.99% (p=0.002 n=6)
geomean                                                 148.4µ        71.47µ       -51.84%

                                                      │     base     │                 new2                 │
                                                      │     B/op     │    B/op      vs base                 │
QuantileBatchRangeVectorIteratorAt/1-samples-32          80.00 ±  0%   80.00 ±  0%        ~ (p=1.000 n=6) ¹
QuantileBatchRangeVectorIteratorAt/1000-samples-32      157.00 ±  1%   80.00 ±  0%  -49.04% (p=0.002 n=6)
QuantileBatchRangeVectorIteratorAt/10000-samples-32     155.00 ±  3%   80.00 ±  1%  -48.39% (p=0.002 n=6)
QuantileBatchRangeVectorIteratorAt/100000-samples-32    109.00 ± 76%   83.00 ±  0%  -23.85% (p=0.002 n=6)
QuantileBatchRangeVectorIteratorAt/1000000-samples-32   1082.5 ±  4%   222.0 ± 11%  -79.49% (p=0.002 n=6)
geomean                                                  187.2         98.84        -47.19%
¹ all samples are equal

                                                      │    base    │                new2                 │
                                                      │ allocs/op  │ allocs/op   vs base                 │
QuantileBatchRangeVectorIteratorAt/1-samples-32         3.000 ± 0%   3.000 ± 0%        ~ (p=1.000 n=6) ¹
QuantileBatchRangeVectorIteratorAt/1000-samples-32      6.000 ± 0%   3.000 ± 0%  -50.00% (p=0.002 n=6)
QuantileBatchRangeVectorIteratorAt/10000-samples-32     6.000 ± 0%   3.000 ± 0%  -50.00% (p=0.002 n=6)
QuantileBatchRangeVectorIteratorAt/100000-samples-32    4.000 ± 0%   3.000 ± 0%  -25.00% (p=0.002 n=6)
QuantileBatchRangeVectorIteratorAt/1000000-samples-32   7.000 ± 0%   3.000 ± 0%  -57.14% (p=0.002 n=6)
geomean                                                 4.967        3.000       -39.60%
¹ all samples are equal

And profiling

Signed-off-by: Callum Styan <callumstyan@gmail.com>

evaluate more steps than we needed to, especially in the case of instant queries! Signed-off-by: Callum Styan <callumstyan@gmail.com>

jeschkies

Nice find. We altos take the chance and rename the join methods to merge.

jeschkies · 2024-02-17T09:17:54Z

pkg/logql/downstream_test.go

@@ -181,6 +181,70 @@ func TestMappingEquivalenceSketches(t *testing.T) {
 	}
 }

+func TestMappingEquivalenceSketches_Instant(t *testing.T) {


What makes this an instant query? What do you think about song another loop to the original equivalence tests?

The query params right after this line, in the case of an instant query we set start and end to the same value.

jeschkies · 2024-02-17T09:18:36Z

pkg/logql/quantile_over_time_sketch.go

@@ -283,6 +283,11 @@ func JoinQuantileSketchVector(next bool, r StepResult, stepEvaluator StepEvaluat
 		if stepEvaluator.Error() != nil {
 			return nil, stepEvaluator.Error()
 		}
+		// Don't continue to evaluate more steps if we already filled the
+		// # of steps we should have based on start/end and step params.
+		if len(result) == stepCount {


How's that done in the original join?

Found it

loki/pkg/logql/engine.go

Line 388 in 918e736

if GetRangeType(q.params) == InstantType {

Signed-off-by: Callum Styan <callumstyan@gmail.com>

jeschkies

🎉

Signed-off-by: Callum Styan <callumstyan@gmail.com>

…sage) and use CollapsingLowestDense store for ddsketch (#11905) Signed-off-by: Callum Styan <callumstyan@gmail.com>

…uery mem usage) and use CollapsingLowestDense store for ddsketch (#11905) Signed-off-by: Callum Styan <callumstyan@gmail.com>

Signed-off-by: Callum Styan <callumstyan@gmail.com>

…sage) and use CollapsingLowestDense store for ddsketch (grafana#11905) Signed-off-by: Callum Styan <callumstyan@gmail.com>

cstyan requested a review from a team as a code owner February 9, 2024 02:11

pull-request-size bot added the size/XS label Feb 9, 2024

jeschkies approved these changes Feb 9, 2024

View reviewed changes

cstyan added 2 commits February 16, 2024 17:38

use CollapsingLowestDense store for ddsketch

23b1e38

Signed-off-by: Callum Styan <callumstyan@gmail.com>

fix bug in quantile over time sharding that could cause us to

fbdb52a

evaluate more steps than we needed to, especially in the case of instant queries! Signed-off-by: Callum Styan <callumstyan@gmail.com>

cstyan force-pushed the callum-quantile-collapsing-lowest branch from dd682ab to fbdb52a Compare February 17, 2024 01:48

pull-request-size bot added size/M and removed size/XS labels Feb 17, 2024

cstyan changed the title ~~quantile sharding; use CollapsingLowestDense store for ddsketch~~ quantile sharding; fix bug related to # of steps (instant query mem usage) and use CollapsingLowestDense store for ddsketch Feb 17, 2024

jeschkies requested changes Feb 17, 2024

View reviewed changes

cstyan added 2 commits February 19, 2024 09:51

Move and rework the instant query check

0a56c2e

Signed-off-by: Callum Styan <callumstyan@gmail.com>

simplify test for sharded quantile instant query

d4407d3

Signed-off-by: Callum Styan <callumstyan@gmail.com>

jeschkies approved these changes Feb 20, 2024

View reviewed changes

rename join to merge

1a3e0b2

Signed-off-by: Callum Styan <callumstyan@gmail.com>

cstyan merged commit b91bcec into main Feb 20, 2024
8 checks passed

cstyan deleted the callum-quantile-collapsing-lowest branch February 20, 2024 20:40

cstyan added a commit that referenced this pull request Feb 20, 2024

quantile sharding; fix bug related to # of steps (instant query mem u…

0c625f7

…sage) and use CollapsingLowestDense store for ddsketch (#11905) Signed-off-by: Callum Styan <callumstyan@gmail.com>

cstyan added a commit that referenced this pull request Feb 20, 2024

backport! quantile sharding; fix bug related to # of steps (instant q…

c10eedf

…uery mem usage) and use CollapsingLowestDense store for ddsketch (#11905) Signed-off-by: Callum Styan <callumstyan@gmail.com>

cstyan added a commit that referenced this pull request Feb 20, 2024

backport #11905 to k190 (#12013)

8bbb892

Signed-off-by: Callum Styan <callumstyan@gmail.com>

rhnasc pushed a commit to inloco/loki that referenced this pull request Apr 12, 2024

quantile sharding; fix bug related to # of steps (instant query mem u…

4e49226

…sage) and use CollapsingLowestDense store for ddsketch (grafana#11905) Signed-off-by: Callum Styan <callumstyan@gmail.com>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

quantile sharding; fix bug related to # of steps (instant query mem usage) and use CollapsingLowestDense store for ddsketch #11905

quantile sharding; fix bug related to # of steps (instant query mem usage) and use CollapsingLowestDense store for ddsketch #11905

cstyan commented Feb 9, 2024 •

edited

jeschkies left a comment

jeschkies Feb 17, 2024

cstyan Feb 19, 2024

jeschkies Feb 17, 2024

jeschkies Feb 17, 2024

jeschkies left a comment

quantile sharding; fix bug related to # of steps (instant query mem usage) and use CollapsingLowestDense store for ddsketch #11905

quantile sharding; fix bug related to # of steps (instant query mem usage) and use CollapsingLowestDense store for ddsketch #11905

Conversation

cstyan commented Feb 9, 2024 • edited

jeschkies left a comment

Choose a reason for hiding this comment

jeschkies Feb 17, 2024

Choose a reason for hiding this comment

cstyan Feb 19, 2024

Choose a reason for hiding this comment

jeschkies Feb 17, 2024

Choose a reason for hiding this comment

jeschkies Feb 17, 2024

Choose a reason for hiding this comment

jeschkies left a comment

Choose a reason for hiding this comment

cstyan commented Feb 9, 2024 •

edited