
storage: Split chunks if more than 120 samples #8582

Merged
merged 5 commits into prometheus:main on May 18, 2021

Conversation

metalmatze
Member

This is my attempt at fixing #5862.

It's based on the work that @bwplotka recently did and uses NewSeriesSetToChunkSet.
While iterating over the samples, we keep track of how many we've appended; if that's more than 120, we append the current chunk to the chunk slice and create a new one to keep appending to.

It's probably not perfect yet but I'd rather get feedback early.
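As a rough, hypothetical sketch of the idea (simplified types and names, not the actual seriesToChunkEncoder code), the split boils down to cutting a new chunk once the current one holds 120 samples:

```go
package main

import "fmt"

// sample is a simplified (t, v) pair standing in for Prometheus samples.
type sample struct {
	t int64
	v float64
}

// splitThreshold mirrors the 120-sample limit introduced in this PR.
const splitThreshold = 120

// splitIntoChunks groups samples into chunks of at most splitThreshold
// samples each, the way the encoder cuts a new chunk once the current
// one is full. Illustrative only; the real code appends to chunkenc
// chunks and tracks mint/maxt per chunk.
func splitIntoChunks(samples []sample) [][]sample {
	var chunks [][]sample
	var cur []sample
	for _, s := range samples {
		if len(cur) >= splitThreshold {
			chunks = append(chunks, cur)
			cur = nil
		}
		cur = append(cur, s)
	}
	if len(cur) > 0 {
		chunks = append(chunks, cur)
	}
	return chunks
}

func main() {
	samples := make([]sample, 150)
	for i := range samples {
		samples[i] = sample{t: int64(i), v: float64(i)}
	}
	chunks := splitIntoChunks(samples)
	fmt.Println(len(chunks), len(chunks[0]), len(chunks[1])) // prints: 2 120 30
}
```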

/cc @codesome @hdost

Signed-off-by: Matthias Loibl <mail@matthiasloibl.com>
@metalmatze
Member Author

A friendly ping and reminder for @codesome 😊

@roidelapluie
Member

Did you benchmark this?

Member

@codesome codesome left a comment


LGTM, only nits. And as Julien said, do you have any benchmark results to show the difference in size and in the time taken to do compaction?

Comment on lines 481 to 482
NewListChunkSeriesFromSamples(labels.FromStrings("bar", "baz"), tsdbutil.GenerateSamples(0, 90)), // 0 - 90
NewListChunkSeriesFromSamples(labels.FromStrings("bar", "baz"), tsdbutil.GenerateSamples(60, 90)), // 90 - 150
Member

Suggested change
NewListChunkSeriesFromSamples(labels.FromStrings("bar", "baz"), tsdbutil.GenerateSamples(0, 90)), // 0 - 90
NewListChunkSeriesFromSamples(labels.FromStrings("bar", "baz"), tsdbutil.GenerateSamples(60, 90)), // 90 - 150
NewListChunkSeriesFromSamples(labels.FromStrings("bar", "baz"), tsdbutil.GenerateSamples(0, 90)), // [0 - 90)
NewListChunkSeriesFromSamples(labels.FromStrings("bar", "baz"), tsdbutil.GenerateSamples(60, 90)), // [60 - 150)

),
},
{
name: "150 overlapping split chunk",
Member

Suggested change
name: "150 overlapping split chunk",
name: "150 overlapping samples, split chunk",

Comment on lines 471 to 472
NewListChunkSeriesFromSamples(labels.FromStrings("bar", "baz"), tsdbutil.GenerateSamples(0, 110)), // 0 - 110
NewListChunkSeriesFromSamples(labels.FromStrings("bar", "baz"), tsdbutil.GenerateSamples(60, 50)), // 60 - 110
Member

Suggested change
NewListChunkSeriesFromSamples(labels.FromStrings("bar", "baz"), tsdbutil.GenerateSamples(0, 110)), // 0 - 110
NewListChunkSeriesFromSamples(labels.FromStrings("bar", "baz"), tsdbutil.GenerateSamples(60, 50)), // 60 - 110
NewListChunkSeriesFromSamples(labels.FromStrings("bar", "baz"), tsdbutil.GenerateSamples(0, 110)), // [0 - 110)
NewListChunkSeriesFromSamples(labels.FromStrings("bar", "baz"), tsdbutil.GenerateSamples(60, 50)), // [60 - 110)

MaxTime: maxt,
Chunk: chk,
})
// TODO: There's probably a nicer way than doing this here.
Member

I would remove this TODO

Member

yea, there is not (:

seriesIter := s.Series.Iterator()
for seriesIter.Next() {
// Create a new chunk if too many samples in the current one
Member

Also in other places where this was missed.

Suggested change
// Create a new chunk if too many samples in the current one
// Create a new chunk if too many samples in the current one.

@roidelapluie
Member

/prombench main

@prombot
Contributor

prombot commented Mar 29, 2021

Incorrect prombench syntax, please find correct syntax here.

@roidelapluie
Member

/prombench v2.26.0-rc.0

@prombot
Contributor

prombot commented Mar 29, 2021

⏱️ Welcome to Prometheus Benchmarking Tool. ⏱️

Compared versions: PR-8582 and v2.26.0-rc.0

After successful deployment, the benchmarking metrics can be viewed at:

Other Commands:
To stop benchmark: /prombench cancel
To restart benchmark: /prombench restart v2.26.0-rc.0

@codesome
Member

Not much difference with prombench, as expected, since the split won't come into the picture there. It's mostly relevant for vertical compaction and for compacting blocks not written by Prometheus.

@roidelapluie
Member

/prombench cancel

@prombot
Contributor

prombot commented Mar 30, 2021

Benchmark cancel is in progress.

Member

@bwplotka bwplotka left a comment


Nice, some comments from my side.

I think it's great to finally see this: it will bring more consistent results on vertical compactions.

@@ -217,8 +217,11 @@ type seriesToChunkEncoder struct {
Series
}

// TODO(bwplotka): Currently encoder will just naively build one chunk, without limit. Split it: https://github.com/prometheus/tsdb/issues/670
const seriesToChunkEncoderSplit = 120
Member

hm, I wonder if we could use the existing constant:

const samplesPerChunk = 120

🤔 We only need to make sure we don't create a cyclic package import.

Member Author

I guess we could try to figure something out. As you suspected, right now there would be an import cycle, and we would need to move the constant elsewhere.
In the end it might not make such a big difference. I don't expect this change before Prometheus v3.0.
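As a hypothetical sketch of how the cycle could be avoided (this is not what the PR does; the PR keeps a separate unexported constant), the shared limit could live as one exported constant in a leaf package, for example tsdb/chunkenc, that both tsdb and storage already import:

```go
package main

import "fmt"

// Hypothetical layout: instead of tsdb's unexported samplesPerChunk and
// storage's unexported seriesToChunkEncoderSplit, a single exported
// constant in a leaf package would be imported by both sides, and no
// import cycle arises because the leaf package imports neither of them:
//
//	package chunkenc
//	const SamplesPerChunk = 120
//
// Mimicked here in package main so the sketch is runnable.
const SamplesPerChunk = 120

func main() {
	fmt.Println(SamplesPerChunk)
}
```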

func (s *seriesToChunkEncoder) Iterator() chunks.Iterator {
chks := []chunks.Meta{}
Member

Can we create this closer to the usage?

MaxTime: maxt,
Chunk: chk,
})
// TODO: There's probably a nicer way than doing this here.
Member

yea, there is not (:

return errChunksIterator{err: err}
}
mint = int64(math.MaxInt64)
// maxt is immediately overwritten below
Member

Can we make this comment a full sentence?
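The pattern under review, sketched with hypothetical names: mint is reset to MaxInt64 so the first appended sample always lowers it, while maxt is simply overwritten by each sample since timestamps arrive in order.

```go
package main

import (
	"fmt"
	"math"
)

// minMax tracks chunk time bounds the way the encoder does: mint starts
// at MaxInt64 so any first timestamp replaces it, and maxt is overwritten
// by every timestamp, leaving it at the last (largest) one for ordered
// input. Illustrative sketch only.
func minMax(ts []int64) (mint, maxt int64) {
	mint = math.MaxInt64
	for _, t := range ts {
		if t < mint {
			mint = t
		}
		maxt = t
	}
	return mint, maxt
}

func main() {
	fmt.Println(minMax([]int64{5, 7, 9})) // prints: 5 9
}
```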

tsdb/tsdbutil/chunks.go (resolved)
@metalmatze
Member Author

Addressed the comments. Thanks for the good feedback 👍

@yeya24
Contributor

yeya24 commented May 17, 2021

It would be nice to have this feature merged!

@bwplotka bwplotka merged commit 7e7efab into prometheus:main May 18, 2021
@bwplotka
Member

Thanks!
