
storage: Split chunks if more than 120 samples #8582

Merged
merged 5 commits into prometheus:main on May 18, 2021

Conversation

metalmatze
Member

This is my attempt at fixing #5862.

It's based on the work that @bwplotka recently did and uses NewSeriesSetToChunkSet.
While iterating over the samples, we keep track of how many we've appended; if that's more than 120, we append the current chunk to the chunk slice and create a new one to keep appending to.

It's probably not perfect yet but I'd rather get feedback early.
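As a rough, hypothetical sketch of the idea (simplified types and names, not the actual seriesToChunkEncoder code), the split boils down to cutting a new chunk once the current one holds 120 samples:

```go
package main

import "fmt"

// sample is a simplified (t, v) pair standing in for Prometheus samples.
type sample struct {
	t int64
	v float64
}

// splitThreshold mirrors the 120-sample limit introduced in this PR.
const splitThreshold = 120

// splitIntoChunks groups samples into chunks of at most splitThreshold
// samples each, the way the encoder cuts a new chunk once the current
// one is full. Illustrative only; the real code appends to chunkenc
// chunks and tracks mint/maxt per chunk.
func splitIntoChunks(samples []sample) [][]sample {
	var chunks [][]sample
	var cur []sample
	for _, s := range samples {
		if len(cur) >= splitThreshold {
			chunks = append(chunks, cur)
			cur = nil
		}
		cur = append(cur, s)
	}
	if len(cur) > 0 {
		chunks = append(chunks, cur)
	}
	return chunks
}

func main() {
	samples := make([]sample, 150)
	for i := range samples {
		samples[i] = sample{t: int64(i), v: float64(i)}
	}
	chunks := splitIntoChunks(samples)
	fmt.Println(len(chunks), len(chunks[0]), len(chunks[1])) // prints: 2 120 30
}
```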

/cc @codesome @hdost

Signed-off-by: Matthias Loibl <mail@matthiasloibl.com>
@metalmatze
Member Author

A friendly ping and reminder for @codesome 😊

@roidelapluie
Member

Did you benchmark this?

Member

@codesome codesome left a comment


LGTM, only nits. And as Julien said, do you have any benchmark results to show the difference in size and in the time taken to do compaction?

Comment on lines 481 to 482
NewListChunkSeriesFromSamples(labels.FromStrings("bar", "baz"), tsdbutil.GenerateSamples(0, 90)), // 0 - 90
NewListChunkSeriesFromSamples(labels.FromStrings("bar", "baz"), tsdbutil.GenerateSamples(60, 90)), // 90 - 150
Member

Suggested change
NewListChunkSeriesFromSamples(labels.FromStrings("bar", "baz"), tsdbutil.GenerateSamples(0, 90)), // 0 - 90
NewListChunkSeriesFromSamples(labels.FromStrings("bar", "baz"), tsdbutil.GenerateSamples(60, 90)), // 90 - 150
NewListChunkSeriesFromSamples(labels.FromStrings("bar", "baz"), tsdbutil.GenerateSamples(0, 90)), // [0 - 90)
NewListChunkSeriesFromSamples(labels.FromStrings("bar", "baz"), tsdbutil.GenerateSamples(60, 90)), // [60 - 150)

),
},
{
name: "150 overlapping split chunk",
Member

Suggested change
name: "150 overlapping split chunk",
name: "150 overlapping samples, split chunk",

Comment on lines 471 to 472
NewListChunkSeriesFromSamples(labels.FromStrings("bar", "baz"), tsdbutil.GenerateSamples(0, 110)), // 0 - 110
NewListChunkSeriesFromSamples(labels.FromStrings("bar", "baz"), tsdbutil.GenerateSamples(60, 50)), // 60 - 110
Member

Suggested change
NewListChunkSeriesFromSamples(labels.FromStrings("bar", "baz"), tsdbutil.GenerateSamples(0, 110)), // 0 - 110
NewListChunkSeriesFromSamples(labels.FromStrings("bar", "baz"), tsdbutil.GenerateSamples(60, 50)), // 60 - 110
NewListChunkSeriesFromSamples(labels.FromStrings("bar", "baz"), tsdbutil.GenerateSamples(0, 110)), // [0 - 110)
NewListChunkSeriesFromSamples(labels.FromStrings("bar", "baz"), tsdbutil.GenerateSamples(60, 50)), // [60 - 110)

MaxTime: maxt,
Chunk: chk,
})
// TODO: There's probably a nicer way than doing this here.
Member

I would remove this TODO

Member

yea, there is not (:

seriesIter := s.Series.Iterator()
for seriesIter.Next() {
// Create a new chunk if too many samples in the current one
Member

Also in other places where this was missed.

Suggested change
// Create a new chunk if too many samples in the current one
// Create a new chunk if too many samples in the current one.

@roidelapluie
Member

/prombench main

@prombot
Contributor

prombot commented Mar 29, 2021

Incorrect prombench syntax, please find correct syntax here.

@roidelapluie
Member

/prombench v2.26.0-rc.0

@prombot
Contributor

prombot commented Mar 29, 2021

⏱️ Welcome to Prometheus Benchmarking Tool. ⏱️

Compared versions: PR-8582 and v2.26.0-rc.0

After successful deployment, the benchmarking metrics can be viewed at:

Other Commands:
To stop benchmark: /prombench cancel
To restart benchmark: /prombench restart v2.26.0-rc.0

@codesome
Member

Not much difference with prombench, as expected, since the split won't come into the picture there. It's mostly relevant for vertical compaction and for compacting blocks not written by Prometheus.

@roidelapluie
Member

/prombench cancel

@prombot
Contributor

prombot commented Mar 30, 2021

Benchmark cancel is in progress.

Member

@bwplotka bwplotka left a comment


Nice, some comments from my side.

I think it's great to finally see this: it will bring more consistent results on vertical compactions.

@@ -217,8 +217,11 @@ type seriesToChunkEncoder struct {
Series
}

// TODO(bwplotka): Currently encoder will just naively build one chunk, without limit. Split it: https://github.com/prometheus/tsdb/issues/670
const seriesToChunkEncoderSplit = 120
Member

hm, I wonder if we could use the existing constant:

const samplesPerChunk = 120

🤔 We only need to make sure we don't create a cyclic package import.

Member Author

I guess we could try to figure something out. As you suspected, right now there would be an import cycle, and we would need to move the constant elsewhere.
In the end it might not make such a big difference. I don't expect this change before Prometheus v3.0.
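As a hypothetical sketch of how the cycle could be avoided (this is not what the PR does; the PR keeps a separate unexported constant), the shared limit could live as one exported constant in a leaf package, for example tsdb/chunkenc, that both tsdb and storage already import:

```go
package main

import "fmt"

// Hypothetical layout: instead of tsdb's unexported samplesPerChunk and
// storage's unexported seriesToChunkEncoderSplit, a single exported
// constant in a leaf package would be imported by both sides, and no
// import cycle arises because the leaf package imports neither of them:
//
//	package chunkenc
//	const SamplesPerChunk = 120
//
// Mimicked here in package main so the sketch is runnable.
const SamplesPerChunk = 120

func main() {
	fmt.Println(SamplesPerChunk)
}
```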

func (s *seriesToChunkEncoder) Iterator() chunks.Iterator {
chks := []chunks.Meta{}
Member

Can we create this closer to the usage?

MaxTime: maxt,
Chunk: chk,
})
// TODO: There's probably a nicer way than doing this here.
Member

yea, there is not (:

return errChunksIterator{err: err}
}
mint = int64(math.MaxInt64)
// maxt is immediately overwritten below
Member

Can we make this comment a full sentence?
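The pattern under review, sketched with hypothetical names: mint is reset to MaxInt64 so the first appended sample always lowers it, while maxt is simply overwritten by each sample since timestamps arrive in order.

```go
package main

import (
	"fmt"
	"math"
)

// minMax tracks chunk time bounds the way the encoder does: mint starts
// at MaxInt64 so any first timestamp replaces it, and maxt is overwritten
// by every timestamp, leaving it at the last (largest) one for ordered
// input. Illustrative sketch only.
func minMax(ts []int64) (mint, maxt int64) {
	mint = math.MaxInt64
	for _, t := range ts {
		if t < mint {
			mint = t
		}
		maxt = t
	}
	return mint, maxt
}

func main() {
	fmt.Println(minMax([]int64{5, 7, 9})) // prints: 5 9
}
```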

tsdb/tsdbutil/chunks.go (resolved)
@metalmatze
Member Author

Addressed the comments. Thanks for the good feedback 👍

@yeya24
Contributor

yeya24 commented May 17, 2021

It would be nice to have this feature merged!

@bwplotka bwplotka merged commit 7e7efab into prometheus:main May 18, 2021
@bwplotka
Member

Thanks!
