Query storage by iterating through chunks by batches. #782

Merged
merged 4 commits into grafana:master on Aug 2, 2019

Conversation

cyriltovena
Contributor

@cyriltovena commented Jul 18, 2019

This changes how the store retrieves chunks. Previously, one chunk per stream was retrieved first, and for a large query (high cardinality), even a single chunk per stream could mean 3k chunks or more, which can easily OOM Loki (3k * 2MiB ≈ 6GiB).

Now chunks are retrieved in batches of a predefined size (default 50). For each batch, we first fetch one chunk per stream to filter out streams using the matchers, then we load the full batch (we don't need the lazy iterator anymore) and create an iterator out of it. When that iterator is exhausted we pull the next batch, until there are no more chunks to fetch. A sketch of the loop is shown below.
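To make the shape of this concrete, here is a minimal, self-contained sketch of the batching loop. All names (`chunkRef`, `batchIterator`, `next`) are hypothetical stand-ins for illustration, not Loki's actual types, and the per-batch stream filtering is reduced to a comment:

```go
package main

import "fmt"

// chunkRef is a hypothetical stand-in for a reference to a stored chunk;
// in Loki it would carry stream labels and time bounds.
type chunkRef struct{ id int }

// batchIterator walks chunk refs in fixed-size batches so that only one
// batch's worth of chunk data is loaded at a time.
type batchIterator struct {
	refs      []chunkRef // refs not yet handed out
	batchSize int
	batch     []chunkRef // the currently loaded batch
	pos       int
}

// next returns the following chunk, loading a new batch when the current
// one is exhausted. It returns false once there are no more chunks.
func (it *batchIterator) next() (chunkRef, bool) {
	if it.pos >= len(it.batch) {
		if len(it.refs) == 0 {
			return chunkRef{}, false
		}
		n := it.batchSize
		if n > len(it.refs) {
			n = len(it.refs)
		}
		// In the real store this is where one chunk per stream would be
		// fetched to filter streams against the matchers, and then the
		// full batch would be downloaded.
		it.batch = append([]chunkRef(nil), it.refs[:n]...) // copy, don't re-slice
		it.refs = it.refs[n:]
		it.pos = 0
	}
	c := it.batch[it.pos]
	it.pos++
	return c, true
}

func main() {
	refs := make([]chunkRef, 7)
	for i := range refs {
		refs[i] = chunkRef{id: i}
	}
	it := &batchIterator{refs: refs, batchSize: 3}
	for c, ok := it.next(); ok; c, ok = it.next() {
		fmt.Println("chunk", c.id)
	}
}
```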

I'd also like to mention that the slice of chunk refs within the batch iterator is split by copying (not re-slicing) when retrieving a batch, to avoid keeping references to chunks that have already been loaded and used; see the sketch below.
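The point of copying is garbage collection: a sub-slice shares its parent's backing array, so a re-sliced batch would keep every earlier chunk pointer reachable. A minimal sketch, with a hypothetical `lazyChunk` stand-in:

```go
package main

// lazyChunk is a hypothetical stand-in; pretend each one holds ~2MiB.
type lazyChunk struct{ data [2 << 20]byte }

// nextBatchReslice shares refs' backing array: even after the caller moves
// on, every chunk pointer in the original array stays reachable, so the GC
// cannot free chunks from earlier batches.
func nextBatchReslice(refs []*lazyChunk, n int) (batch, rest []*lazyChunk) {
	return refs[:n], refs[n:]
}

// nextBatchCopy gives the batch its own backing array: once the caller
// drops the batch, the GC can reclaim the chunks it pointed to.
func nextBatchCopy(refs []*lazyChunk, n int) (batch, rest []*lazyChunk) {
	batch = make([]*lazyChunk, n)
	copy(batch, refs[:n])
	return batch, refs[n:]
}

func main() {
	refs := make([]*lazyChunk, 6)
	for i := range refs {
		refs[i] = &lazyChunk{}
	}
	for len(refs) > 0 {
		n := 2
		if n > len(refs) {
			n = len(refs)
		}
		var batch []*lazyChunk
		batch, refs = nextBatchCopy(refs, n)
		_ = batch // process the batch, then let it become unreachable
	}
}
```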

I've added tests to make sure direction and overlapping chunks are handled correctly, and I've also taken the time to add all the missing tests in the storage package, which brings it to 90% coverage.

/cc @gouthamve I believe this is the continuation of your work, so it should be fairly simple for you to review.

This should put an end to any memory issues related to queries, except for the labels query, which is also on my to-do list.

@cyriltovena
Contributor Author

WDYT @gouthamve ?

Member

@gouthamve left a comment


LGTM. Just curious if we can optimise by filtering in the beginning rather than in every batch.

```go
	through = time.Unix(0, nextChunk.Chunk.From.UnixNano())
}
// we save all overlapping chunks as they are also needed in the next batch to properly order entries.
it.lastOverlapping = []*chunkenc.LazyChunk{}
```
@gouthamve
Member


`it.lastOverlapping = it.lastOverlapping[:0]`?

@cyriltovena
Contributor Author


Yep, I think this would work.
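The difference between the two forms is allocation: `s[:0]` keeps the backing array for reuse, while a fresh slice literal drops it. A minimal, runnable sketch (plain `int` elements stand in for `*chunkenc.LazyChunk`):

```go
package main

import "fmt"

func main() {
	s := make([]int, 5, 8)
	fmt.Println(len(s), cap(s)) // 5 8

	// Re-slicing to zero length empties the slice but keeps the backing
	// array, so subsequent appends reuse it instead of allocating:
	s = s[:0]
	fmt.Println(len(s), cap(s)) // 0 8

	// A fresh empty slice drops the old array; the next append allocates:
	s = []int{}
	fmt.Println(len(s), cap(s)) // 0 0
}
```

One trade-off worth noting: with pointer elements, re-slicing to `[:0]` keeps the old pointers reachable in the backing array until new appends overwrite them.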

@cyriltovena cyriltovena merged commit 2b40392 into grafana:master Aug 2, 2019