storage: Refactor block compaction to allow shard-splitting #2366

cyriltovena · 2023-09-06T08:39:02Z

This refactors the current compaction function, Compact, and adds a new CompactWithSplitting(..., shardCount uint64) function.

The new function compacts and deduplicates data from the input blocks while splitting it into shardCount new output blocks, one per shard. Some output blocks might be empty, in which case they are not returned.

Right now the shard key is the labels fingerprint (a hash of all label pairs). I suspect we might want to add different sharding strategies in the future, after some experiments.
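To make the shard key concrete, here is a minimal sketch of fingerprint-based sharding. The Label type, the fingerprint and shardFor helpers, and the choice of FNV-1a are all assumptions made for illustration; the actual code may use a different hash and label model.

```go
package main

import (
	"hash/fnv"
	"sort"
)

// Label is a hypothetical name/value pair, not the real model type.
type Label struct {
	Name, Value string
}

// fingerprint hashes all label pairs in sorted order, so the result does
// not depend on the order in which labels were supplied.
// FNV-1a is an assumption, picked because it ships with the standard library.
func fingerprint(labels []Label) uint64 {
	sorted := make([]Label, len(labels))
	copy(sorted, labels)
	sort.Slice(sorted, func(i, j int) bool {
		if sorted[i].Name != sorted[j].Name {
			return sorted[i].Name < sorted[j].Name
		}
		return sorted[i].Value < sorted[j].Value
	})
	h := fnv.New64a()
	for _, l := range sorted {
		h.Write([]byte(l.Name))
		h.Write([]byte{0xff}) // separator, so ("ab","c") and ("a","bc") differ
		h.Write([]byte(l.Value))
		h.Write([]byte{0xff})
	}
	return h.Sum64()
}

// shardFor maps a series to one of shardCount output blocks.
func shardFor(labels []Label, shardCount uint64) uint64 {
	if shardCount == 0 {
		return 0 // guard against division by zero
	}
	return fingerprint(labels) % shardCount
}
```

Because the key depends only on the label set, all rows of one series land in the same output block.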

To split into multiple blocks we no longer hook into the reader. Instead we read row by row and push each row to the correct output block, which turns out to be much cleaner, especially since we don't have to write row groups ourselves anymore: I discovered that if you configure the parquet writer correctly, it handles flushing row groups automatically.

I'm still working on adding more tests.

PS: I suggest reviewing this in split view.
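The row-by-row routing described above could look roughly like the sketch below. Row, shardWriter and splitRows are hypothetical stand-ins: the real rows carry full profile data, and the real writers are parquet writers that flush row groups on their own rather than in-memory slices.

```go
package main

// Row is a hypothetical, simplified profile row; only the fields needed
// for routing are shown.
type Row struct {
	SeriesFingerprint uint64
	TimeNanos         int64
}

// shardWriter stands in for one output block writer. Here rows are only
// buffered in memory.
type shardWriter struct {
	rows []Row
}

// WriteRow appends a row to this shard's buffer.
func (w *shardWriter) WriteRow(r Row) {
	w.rows = append(w.rows, r)
}

// splitRows pushes every input row to the writer selected by its series
// fingerprint. Writers are created on first use, so shards that receive
// no rows never appear in the result.
func splitRows(rows []Row, shardCount uint64) map[uint64]*shardWriter {
	out := make(map[uint64]*shardWriter)
	if shardCount == 0 {
		return out
	}
	for _, r := range rows {
		shard := r.SeriesFingerprint % shardCount
		w, ok := out[shard]
		if !ok {
			w = &shardWriter{}
			out[shard] = w
		}
		w.WriteRow(r)
	}
	return out
}
```

Creating writers on first use is one simple way to get the "empty blocks are not returned" behaviour without a separate cleanup pass.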

cyriltovena · 2023-09-06T08:44:33Z

pkg/phlaredb/compact.go

-	sr := symbolsRewriter{
-		profiles:  it,
-		rewriters: make(map[BlockReader]*symdb.Rewriter, len(blocks)),
+func newSymbolsRewriter(path string) *symbolsRewriter {


@kolesnikovae

We now create rewriters lazily, as soon as we see the first row for a given block. Rewriters are not shared across multiple output blocks.

WDYT?

I like it :) Each block has to have its own symdb
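A minimal sketch of the lazy, per-output-block rewriter idea discussed in this thread. Only the newSymbolsRewriter(path string) signature comes from the diff; the blockID and rewriter types, the struct fields and the rewriterFor method are assumptions standing in for BlockReader and symdb.Rewriter.

```go
package main

// blockID identifies a source block. It is a hypothetical stand-in for
// the BlockReader map key used in the diff.
type blockID string

// rewriter is a hypothetical stand-in for symdb.Rewriter.
type rewriter struct {
	source blockID
}

// symbolsRewriter belongs to exactly one output block and keeps one
// rewriter per source block it has seen so far.
type symbolsRewriter struct {
	path      string
	rewriters map[blockID]*rewriter
}

// newSymbolsRewriter starts with no rewriters; they are added on demand.
func newSymbolsRewriter(path string) *symbolsRewriter {
	return &symbolsRewriter{
		path:      path,
		rewriters: make(map[blockID]*rewriter),
	}
}

// rewriterFor returns the rewriter for a source block, creating it when
// the first row from that block arrives.
func (s *symbolsRewriter) rewriterFor(src blockID) *rewriter {
	if r, ok := s.rewriters[src]; ok {
		return r
	}
	r := &rewriter{source: src}
	s.rewriters[src] = r
	return r
}
```

Each output block would own its own symbolsRewriter, so two output blocks never share a rewriter even when they read from the same source block.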

kolesnikovae

LGTM

cyriltovena · 2023-09-06T15:33:12Z

I just need to finish writing some more robust tests, then we should be good. I spotted a bug where the start/end time comes from the original block and not from the actual profiles, so I'll have to fix this.
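One way to derive the block's time bounds from the profiles actually written to it, rather than copying them from the source blocks, is sketched below. The timeRange type and its methods are hypothetical names for illustration and are not taken from the PR.

```go
package main

import "math"

// timeRange tracks the smallest and largest timestamp written to one
// output block (hypothetical helper).
type timeRange struct {
	MinTime, MaxTime int64
}

// newTimeRange returns a range that is empty until observe is called.
func newTimeRange() timeRange {
	return timeRange{MinTime: math.MaxInt64, MaxTime: math.MinInt64}
}

// observe widens the range to include the timestamp of a written row.
func (t *timeRange) observe(ts int64) {
	if ts < t.MinTime {
		t.MinTime = ts
	}
	if ts > t.MaxTime {
		t.MaxTime = ts
	}
}

// empty reports whether no row has been observed yet.
func (t timeRange) empty() bool {
	return t.MinTime > t.MaxTime
}
```

Calling observe for every row pushed to a shard keeps each output block's metadata in line with its own contents after the split.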

pkg/phlaredb/compact_test.go

pkg/phlaredb/compact.go

storage: Refactor block compaction to allow shard-splitting

6020102

cyriltovena requested a review from a team as a code owner September 6, 2023 08:39

cyriltovena commented Sep 6, 2023


Rename to CompactWithSplitting for consistency

63d1beb

kolesnikovae approved these changes Sep 6, 2023


cyriltovena added 2 commits September 6, 2023 11:29

Introduce back compaction series testing.

8129c06

Add sharding compaction level

cb310f8

cyriltovena marked this pull request as draft September 6, 2023 13:00

Add some tests for CompactWithSplitting

64d22fd

cyriltovena commented Sep 6, 2023

pkg/phlaredb/compact_test.go

cyriltovena commented Sep 6, 2023

pkg/phlaredb/compact.go

kolesnikovae mentioned this pull request Sep 7, 2023

Fix symbols split-compaction #2371

Merged

Fix symbols split-compaction (#2371)

22d6271

cyriltovena added the storage Low level storage matters label Sep 7, 2023

cyriltovena added 2 commits September 7, 2023 11:26

Merge remote-tracking branch 'origin/main' into feat/split-compaction

6a56550

Fixes a race that was actually surfacing another real issue.

816eaaa

cyriltovena marked this pull request as ready for review September 7, 2023 09:52

cyriltovena added 3 commits September 7, 2023 11:59

Add a tests for meta min/max time

d100b30

Fixes meta min/max after split

bd4f5e7

Merge branch 'main' into feat/split-compaction

aefb35f

cyriltovena enabled auto-merge (squash) September 7, 2023 12:37

cyriltovena merged commit 63561d7 into main Sep 7, 2023
15 checks passed

cyriltovena deleted the feat/split-compaction branch September 7, 2023 12:47

cyriltovena mentioned this pull request Sep 29, 2023

feat: Scalable Compactor #2466

Merged

7 tasks

jdbaldry mentioned this pull request Nov 3, 2023

jdb/2023 11 use github action jdbaldry/pyroscope#1

Closed
