rate() query performance regression in 2.0.0 #2827

scoof · 2020-10-27T21:40:38Z

Describe the bug
It looks like logqlv2 has introduced a performance regression when querying rates on a largish amount of streams. This query: rate({namespace_name=""}[5m]) (for a 1 hour period) takes 3-4 times longer with 2.0.0, than with grafana/loki:k34-8ea6c38 (provided by @slim-bean on Slack)

Expected behavior
Same or better performance than 1.6.1

Environment:

Infrastructure: Kubernetes
Deployment tool: helm

Screenshots, Promtail config, or terminal output
Stats from 2.0.0:

   "stats" : {
      "ingester" : {
         "compressedBytes" : 0,
         "decompressedBytes" : 0,
         "decompressedLines" : 0,
         "headChunkBytes" : 0,
         "headChunkLines" : 0,
         "totalBatches" : 0,
         "totalChunksMatched" : 0,
         "totalDuplicates" : 0,
         "totalLinesSent" : 0,
         "totalReached" : 0
      },
      "store" : {
         "chunksDownloadTime" : 1.745696272,
         "compressedBytes" : 17368345,
         "decompressedBytes" : 216576036,
         "decompressedLines" : 1019558,
         "headChunkBytes" : 0,
         "headChunkLines" : 0,
         "totalChunksDownloaded" : 2732,
         "totalChunksRef" : 2732,
         "totalDuplicates" : 229773
      },
      "summary" : {
         "bytesProcessedPerSecond" : 9750244,
         "execTime" : 22.212370672,
         "linesProcessedPerSecond" : 45900,
         "totalBytesProcessed" : 216576036,
         "totalLinesProcessed" : 1019558
      }
   }
}

Stats from k34-8ea6c38:

         "ingester" : {
            "compressedBytes" : 3796297,
            "decompressedBytes" : 34697528,
            "decompressedLines" : 160127,
            "headChunkBytes" : 26062200,
            "headChunkLines" : 110580,
            "totalBatches" : 0,
            "totalChunksMatched" : 413,
            "totalDuplicates" : 0,
            "totalLinesSent" : 0,
            "totalReached" : 5
         },
         "store" : {
            "chunksDownloadTime" : 0.074917783,
            "compressedBytes" : 7128290,
            "decompressedBytes" : 55934946,
            "decompressedLines" : 249630,
            "headChunkBytes" : 0,
            "headChunkLines" : 0,
            "totalChunksDownloaded" : 1878,
            "totalChunksRef" : 1878,
            "totalDuplicates" : 166809
         },
         "summary" : {
            "bytesProcessedPerSecond" : 36028231,
            "execTime" : 3.238978645,
            "linesProcessedPerSecond" : 160648,
            "totalBytesProcessed" : 116694674,
            "totalLinesProcessed" : 520337
         }
      }

The text was updated successfully, but these errors were encountered:

cyriltovena · 2020-10-28T08:02:46Z

Dropping my notes:

Now in LogQL v2 labels can different from one log line to another, this means we have to stringify labels for each lines whereas before we would only need to do that once per chunk.

We need to implement a cache, with shortcut if no modifications has been done using,

see

loki/pkg/chunkenc/memchunk.go

Line 870 in 081cc02

    
           func (e *sampleBufferedIterator) Labels() string { return e.currLabels.String() }

cyriltovena · 2020-10-28T10:40:58Z

Got a flame graph.

Going to work from this benchmark:

❯ go test -benchmem -run=^$ -mod=vendor  ./pkg/chunkenc  -bench=BenchmarkBufferedIteratorLabels  -memprofile memprofile.out -cpuprofile cpuprofile.out
goos: darwin
goarch: amd64
pkg: github.com/grafana/loki/pkg/chunkenc
BenchmarkBufferedIteratorLabels/{app="foo"}-16         	24785552	        42.1 ns/op	       6 B/op	       0 allocs/op
BenchmarkBufferedIteratorLabels/{app="foo"}_!=_"foo"-16         	25299765	        42.3 ns/op	       6 B/op	       0 allocs/op
BenchmarkBufferedIteratorLabels/{app="foo"}_!=_"foo"_|_logfmt_-16         	11777482	        85.6 ns/op	      32 B/op	       0 allocs/op
BenchmarkBufferedIteratorLabels/{app="foo"}_!=_"foo"_|_logfmt_|_duration_>_10ms-16         	24603439	        40.7 ns/op	       3 B/op	       0 allocs/op
BenchmarkBufferedIteratorLabels/{app="foo"}_!=_"foo"_|_logfmt_|_duration_>_10ms_and_component="tsdb"-16         	25011400	        40.3 ns/op	       3 B/op	       0 allocs/op
BenchmarkBufferedIteratorLabels/rate({app="foo"}[1m])-16                                                        	34682602	        30.9 ns/op	       4 B/op	       0 allocs/op
BenchmarkBufferedIteratorLabels/sum_by_(cluster)_(rate({app="foo"}[10s]))-16                                    	32869472	        33.4 ns/op	       4 B/op	       0 allocs/op
BenchmarkBufferedIteratorLabels/sum_by_(cluster)_(rate({app="foo"}_!=_"foo"_|_logfmt[10s]))-16                  	32186278	        32.2 ns/op	       4 B/op	       0 allocs/op
BenchmarkBufferedIteratorLabels/sum_by_(caller)_(rate({app="foo"}_!=_"foo"_|_logfmt[10s]))-16                   	30648024	        33.4 ns/op	       4 B/op	       0 allocs/op
BenchmarkBufferedIteratorLabels/sum_by_(cluster)_(rate({app="foo"}_!=_"foo"_|_logfmt_|_duration_>_10ms[10s]))-16         	40132675	        28.3 ns/op	       1 B/op	       0 allocs/op
BenchmarkBufferedIteratorLabels/sum_by_(cluster)_(rate({app="foo"}_!=_"foo"_|_logfmt_|_duration_>_10ms_and_component="tsdb"[1m]))-16         	40285348	        28.9 ns/op	       1 B/op	       0 allocs/op
PASS
ok  	github.com/grafana/loki/pkg/chunkenc	61.066s

Working in this branch: https://github.com/cyriltovena/loki/tree/labels-logqlv2

cyriltovena · 2020-11-03T10:30:04Z

current status:

go test -benchmem -run=^$ -mod=vendor  ./pkg/chunkenc  -bench=BenchmarkBufferedIteratorLabels
goos: darwin
goarch: amd64
pkg: github.com/grafana/loki/pkg/chunkenc
BenchmarkBufferedIteratorLabels/{app="foo"}-16         	33238780	        35.9 ns/op	       0 B/op	       0 allocs/op
BenchmarkBufferedIteratorLabels/{app="foo"}_!=_"foo"-16         	33595916	        35.3 ns/op	       0 B/op	       0 allocs/op
BenchmarkBufferedIteratorLabels/{app="foo"}_!=_"foo"_|_logfmt_-16         	27097609	        42.7 ns/op	       4 B/op	       0 allocs/op
BenchmarkBufferedIteratorLabels/{app="foo"}_!=_"foo"_|_logfmt_|_duration_>_10ms-16         	29798445	        39.6 ns/op	       1 B/op	       0 allocs/op
BenchmarkBufferedIteratorLabels/{app="foo"}_!=_"foo"_|_logfmt_|_duration_>_10ms_and_component="tsdb"-16         	30767776	        37.9 ns/op	       1 B/op	       0 allocs/op
BenchmarkBufferedIteratorLabels/rate({app="foo"}[1m])-16                                                        	47617257	        24.3 ns/op	       0 B/op	       0 allocs/op
BenchmarkBufferedIteratorLabels/sum_by_(cluster)_(rate({app="foo"}[10s]))-16                                    	45781831	        24.3 ns/op	       0 B/op	       0 allocs/op
BenchmarkBufferedIteratorLabels/sum_by_(cluster)_(rate({app="foo"}_!=_"foo"_|_logfmt[10s]))-16                  	41351338	        28.0 ns/op	       2 B/op	       0 allocs/op
BenchmarkBufferedIteratorLabels/sum_by_(caller)_(rate({app="foo"}_!=_"foo"_|_logfmt[10s]))-16                   	40465006	        28.2 ns/op	       2 B/op	       0 allocs/op
BenchmarkBufferedIteratorLabels/sum_by_(cluster)_(rate({app="foo"}_!=_"foo"_|_logfmt_|_duration_>_10ms[10s]))-16         	44210521	        26.6 ns/op	       0 B/op	       0 allocs/op
BenchmarkBufferedIteratorLabels/sum_by_(cluster)_(rate({app="foo"}_!=_"foo"_|_logfmt_|_duration_>_10ms_and_component="tsdb"[1m]))-16         	43935969	        26.5 ns/op	       0 B/op	       0 allocs/op
PASS
ok  	github.com/grafana/loki/pkg/chunkenc	35.506s

diff

benchcmp before.txt after.txt
benchmark                                                                                                                                old ns/op     new ns/op     delta
BenchmarkBufferedIteratorLabels/{app="foo"}-16                                                                                           42.1          35.9          -14.73%
BenchmarkBufferedIteratorLabels/{app="foo"}_!=_"foo"-16                                                                                  42.3          35.3          -16.55%
BenchmarkBufferedIteratorLabels/{app="foo"}_!=_"foo"_|_logfmt_-16                                                                        85.6          42.7          -50.12%
BenchmarkBufferedIteratorLabels/{app="foo"}_!=_"foo"_|_logfmt_|_duration_>_10ms-16                                                       40.7          39.6          -2.70%
BenchmarkBufferedIteratorLabels/{app="foo"}_!=_"foo"_|_logfmt_|_duration_>_10ms_and_component="tsdb"-16                                  40.3          37.9          -5.96%
BenchmarkBufferedIteratorLabels/rate({app="foo"}[1m])-16                                                                                 30.9          24.3          -21.36%
BenchmarkBufferedIteratorLabels/sum_by_(cluster)_(rate({app="foo"}[10s]))-16                                                             33.4          24.3          -27.25%
BenchmarkBufferedIteratorLabels/sum_by_(cluster)_(rate({app="foo"}_!=_"foo"_|_logfmt[10s]))-16                                           32.2          28.0          -13.04%
BenchmarkBufferedIteratorLabels/sum_by_(caller)_(rate({app="foo"}_!=_"foo"_|_logfmt[10s]))-16                                            33.4          28.2          -15.57%
BenchmarkBufferedIteratorLabels/sum_by_(cluster)_(rate({app="foo"}_!=_"foo"_|_logfmt_|_duration_>_10ms[10s]))-16                         28.3          26.6          -6.01%
BenchmarkBufferedIteratorLabels/sum_by_(cluster)_(rate({app="foo"}_!=_"foo"_|_logfmt_|_duration_>_10ms_and_component="tsdb"[1m]))-16     28.9          26.5          -8.30%

benchmark                                                                                                                                old allocs     new allocs     delta
BenchmarkBufferedIteratorLabels/{app="foo"}-16                                                                                           0              0              +0.00%
BenchmarkBufferedIteratorLabels/{app="foo"}_!=_"foo"-16                                                                                  0              0              +0.00%
BenchmarkBufferedIteratorLabels/{app="foo"}_!=_"foo"_|_logfmt_-16                                                                        0              0              +0.00%
BenchmarkBufferedIteratorLabels/{app="foo"}_!=_"foo"_|_logfmt_|_duration_>_10ms-16                                                       0              0              +0.00%
BenchmarkBufferedIteratorLabels/{app="foo"}_!=_"foo"_|_logfmt_|_duration_>_10ms_and_component="tsdb"-16                                  0              0              +0.00%
BenchmarkBufferedIteratorLabels/rate({app="foo"}[1m])-16                                                                                 0              0              +0.00%
BenchmarkBufferedIteratorLabels/sum_by_(cluster)_(rate({app="foo"}[10s]))-16                                                             0              0              +0.00%
BenchmarkBufferedIteratorLabels/sum_by_(cluster)_(rate({app="foo"}_!=_"foo"_|_logfmt[10s]))-16                                           0              0              +0.00%
BenchmarkBufferedIteratorLabels/sum_by_(caller)_(rate({app="foo"}_!=_"foo"_|_logfmt[10s]))-16                                            0              0              +0.00%
BenchmarkBufferedIteratorLabels/sum_by_(cluster)_(rate({app="foo"}_!=_"foo"_|_logfmt_|_duration_>_10ms[10s]))-16                         0              0              +0.00%
BenchmarkBufferedIteratorLabels/sum_by_(cluster)_(rate({app="foo"}_!=_"foo"_|_logfmt_|_duration_>_10ms_and_component="tsdb"[1m]))-16     0              0              +0.00%

benchmark                                                                                                                                old bytes     new bytes     delta
BenchmarkBufferedIteratorLabels/{app="foo"}-16                                                                                           6             0             -100.00%
BenchmarkBufferedIteratorLabels/{app="foo"}_!=_"foo"-16                                                                                  6             0             -100.00%
BenchmarkBufferedIteratorLabels/{app="foo"}_!=_"foo"_|_logfmt_-16                                                                        32            4             -87.50%
BenchmarkBufferedIteratorLabels/{app="foo"}_!=_"foo"_|_logfmt_|_duration_>_10ms-16                                                       3             1             -66.67%
BenchmarkBufferedIteratorLabels/{app="foo"}_!=_"foo"_|_logfmt_|_duration_>_10ms_and_component="tsdb"-16                                  3             1             -66.67%
BenchmarkBufferedIteratorLabels/rate({app="foo"}[1m])-16                                                                                 4             0             -100.00%
BenchmarkBufferedIteratorLabels/sum_by_(cluster)_(rate({app="foo"}[10s]))-16                                                             4             0             -100.00%
BenchmarkBufferedIteratorLabels/sum_by_(cluster)_(rate({app="foo"}_!=_"foo"_|_logfmt[10s]))-16                                           4             2             -50.00%
BenchmarkBufferedIteratorLabels/sum_by_(caller)_(rate({app="foo"}_!=_"foo"_|_logfmt[10s]))-16                                            4             2             -50.00%
BenchmarkBufferedIteratorLabels/sum_by_(cluster)_(rate({app="foo"}_!=_"foo"_|_logfmt_|_duration_>_10ms[10s]))-16                         1             0             -100.00%
BenchmarkBufferedIteratorLabels/sum_by_(cluster)_(rate({app="foo"}_!=_"foo"_|_logfmt_|_duration_>_10ms_and_component="tsdb"[1m]))-16     1             0             -100.00%

cyriltovena added this to the 2.1 milestone Oct 28, 2020

cyriltovena self-assigned this Oct 28, 2020

cyriltovena mentioned this issue Nov 4, 2020

Labels computation LogQLv2 #2875

Merged

cyriltovena closed this as completed in #2875 Nov 5, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

rate() query performance regression in 2.0.0 #2827

rate() query performance regression in 2.0.0 #2827

scoof commented Oct 27, 2020 •

edited

Loading

cyriltovena commented Oct 28, 2020

cyriltovena commented Oct 28, 2020 •

edited

Loading

cyriltovena commented Nov 3, 2020 •

edited

Loading

rate() query performance regression in 2.0.0 #2827

rate() query performance regression in 2.0.0 #2827

Comments

scoof commented Oct 27, 2020 • edited Loading

cyriltovena commented Oct 28, 2020

cyriltovena commented Oct 28, 2020 • edited Loading

cyriltovena commented Nov 3, 2020 • edited Loading

scoof commented Oct 27, 2020 •

edited

Loading

cyriltovena commented Oct 28, 2020 •

edited

Loading

cyriltovena commented Nov 3, 2020 •

edited

Loading