Cache overlapping blocks #2239
Conversation
…ssing. Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>
Codecov Report
@@            Coverage Diff             @@
##           master    #2239      +/-   ##
==========================================
+ Coverage   62.07%   62.27%   +0.19%
==========================================
  Files         156      157       +1
  Lines       12531    12650     +119
==========================================
+ Hits         7779     7878      +99
- Misses       4145     4161      +16
- Partials      607      611       +4
for _, b := range blocks {
	// if we have already processed and cached this block, let's use it.
	if cache, ok := c.overlappingBlocks[b.Offset()]; ok {
		clone := *cache
nice
		blocks = append(blocks, b)
	}
}
return blocks
Could we gain anything by reslicing the existing c.blocks instead of allocating a new slice? Also curious whether, if c.blocks were a slice of pointers, we could save a copy of the block here.
I’ll check whether it helps. There are other places where I intentionally don’t reslice, because reslicing keeps references to the underlying array alive.
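As context for the reslicing remark: in Go, a subslice shares its parent's backing array, so holding onto a small reslice keeps the entire array reachable. A tiny standalone sketch of the trade-off (names invented for illustration; this is not code from the PR):

package main

import "fmt"

func firstTwoReslice(all []byte) []byte {
	// Shares the backing array: the full buffer stays reachable
	// for as long as the returned slice is held.
	return all[:2]
}

func firstTwoCopy(all []byte) []byte {
	// Costs an allocation and a copy, but the large buffer can be
	// garbage-collected once the caller drops `all`.
	out := make([]byte, 2)
	copy(out, all)
	return out
}

func main() {
	big := make([]byte, 1<<20) // 1 MiB buffer
	fmt.Println(len(firstTwoReslice(big)), len(firstTwoCopy(big)))
}

That retention is presumably the reason a fresh slice is allocated here rather than reslicing c.blocks.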
Just one thought/question; aside from that, this looks great!
We (@slim-bean and I) realized that the batchIterator in Loki may re-process the same data over and over when more chunks overlap than the batch size. The side effect is that some users may end up processing 30 GiB of logs when the real data is only 300 MB. This affects a lot of queries.
This PR introduces a cache for overlapping blocks, so that we avoid the costly decompression when a block has to be re-used because it overlaps with the next chunk. Re-processing overlapping blocks is required for correct deduplication.
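To make the idea concrete, here is a minimal, self-contained sketch of the caching technique described above. It is not the PR's actual code: block, compressedBlock, decompress, and blocksFor are hypothetical stand-ins; the real change lives inside Loki's chunk iterator and, as the diff snippet above shows, keys its cache by block offset.

package main

import "fmt"

type block struct {
	offset  int
	entries []string // decompressed log lines
}

type compressedBlock struct {
	offset int
	raw    []byte
}

// decompress stands in for the expensive decompression step.
func decompress(cb compressedBlock) block {
	return block{offset: cb.offset, entries: []string{string(cb.raw)}}
}

type iterator struct {
	// overlappingBlocks caches already-decompressed blocks,
	// keyed by their offset within the chunk.
	overlappingBlocks map[int]*block
}

func (it *iterator) blocksFor(cbs []compressedBlock) []block {
	blocks := make([]block, 0, len(cbs))
	for _, cb := range cbs {
		// If we already processed and cached this block, reuse it
		// instead of decompressing it again.
		if cached, ok := it.overlappingBlocks[cb.offset]; ok {
			clone := *cached // copy so callers can't mutate the cache
			blocks = append(blocks, clone)
			continue
		}
		b := decompress(cb)
		it.overlappingBlocks[cb.offset] = &b
		blocks = append(blocks, b)
	}
	return blocks
}

func main() {
	it := &iterator{overlappingBlocks: map[int]*block{}}
	cbs := []compressedBlock{{offset: 0, raw: []byte("a")}, {offset: 42, raw: []byte("b")}}
	fmt.Println(len(it.blocksFor(cbs))) // first pass decompresses both blocks
	fmt.Println(len(it.blocksFor(cbs))) // second pass is served entirely from cache
}

The clone (clone := *cache) mirrors the diff above: handing out a copy keeps callers from mutating the cached block.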
I wrote a benchmark and ran it before and after the change.
I've also run this in our ops cluster and observed a 25% speed-up for filter queries.
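For reference, a before/after comparison like the one described would typically be driven by a Go benchmark. The following is only a sketch of that shape, with hypothetical helpers (buildOverlappingChunks, iterateAll) standing in for the real Loki code; the PR's actual benchmark and numbers are not reproduced here.

package cache

import "testing"

// Hypothetical stand-ins for the real Loki iterator setup.
func buildOverlappingChunks() [][]byte { return make([][]byte, 10) }

func iterateAll(chunks [][]byte) {
	for range chunks {
		// Iterate every entry; with the cache, overlapping blocks are
		// decompressed once, while without it each overlap pays again.
	}
}

func BenchmarkOverlappingIterate(b *testing.B) {
	chunks := buildOverlappingChunks()
	b.ResetTimer()
	for n := 0; n < b.N; n++ {
		iterateAll(chunks)
	}
}

Running go test -bench=Overlapping -benchmem on both commits and comparing the outputs (for example with benchstat) is the usual way to obtain the before/after numbers.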