storage: Split chunks if more than 120 samples #8582
Conversation
Signed-off-by: Matthias Loibl <mail@matthiasloibl.com>
A friendly ping and reminder for @codesome 😊
Did you benchmark this?
LGTM, only nits. And as Julien said, do you have any benchmark results to show difference in size and time taken to do compaction?
storage/merge_test.go
Outdated
NewListChunkSeriesFromSamples(labels.FromStrings("bar", "baz"), tsdbutil.GenerateSamples(0, 90)), // 0 - 90
NewListChunkSeriesFromSamples(labels.FromStrings("bar", "baz"), tsdbutil.GenerateSamples(60, 90)), // 90 - 150
Suggested change:
- NewListChunkSeriesFromSamples(labels.FromStrings("bar", "baz"), tsdbutil.GenerateSamples(0, 90)), // 0 - 90
- NewListChunkSeriesFromSamples(labels.FromStrings("bar", "baz"), tsdbutil.GenerateSamples(60, 90)), // 90 - 150
+ NewListChunkSeriesFromSamples(labels.FromStrings("bar", "baz"), tsdbutil.GenerateSamples(0, 90)), // [0 - 90)
+ NewListChunkSeriesFromSamples(labels.FromStrings("bar", "baz"), tsdbutil.GenerateSamples(60, 90)), // [60 - 150)
storage/merge_test.go
Outdated
),
},
{
	name: "150 overlapping split chunk",
Suggested change:
- name: "150 overlapping split chunk",
+ name: "150 overlapping samples, split chunk",
storage/merge_test.go
Outdated
NewListChunkSeriesFromSamples(labels.FromStrings("bar", "baz"), tsdbutil.GenerateSamples(0, 110)), // 0 - 110
NewListChunkSeriesFromSamples(labels.FromStrings("bar", "baz"), tsdbutil.GenerateSamples(60, 50)), // 60 - 110
Suggested change:
- NewListChunkSeriesFromSamples(labels.FromStrings("bar", "baz"), tsdbutil.GenerateSamples(0, 110)), // 0 - 110
- NewListChunkSeriesFromSamples(labels.FromStrings("bar", "baz"), tsdbutil.GenerateSamples(60, 50)), // 60 - 110
+ NewListChunkSeriesFromSamples(labels.FromStrings("bar", "baz"), tsdbutil.GenerateSamples(0, 110)), // [0 - 110)
+ NewListChunkSeriesFromSamples(labels.FromStrings("bar", "baz"), tsdbutil.GenerateSamples(60, 50)), // [60 - 110)
storage/series.go
Outdated
	MaxTime: maxt,
	Chunk:   chk,
})
// TODO: There's probably a nicer way than doing this here.
I would remove this TODO
yea, there is not (:
storage/series.go
Outdated
seriesIter := s.Series.Iterator()
for seriesIter.Next() {
	// Create a new chunk if too many samples in the current one
Also fix this in the other places where it was missed.
Suggested change:
- // Create a new chunk if too many samples in the current one
+ // Create a new chunk if too many samples in the current one.
/prombench main
Incorrect prombench syntax, please find the correct syntax here.
/prombench v2.26.0-rc.0
⏱️ Welcome to Prometheus Benchmarking Tool. ⏱️
Compared versions:
After successful deployment, the benchmarking metrics can be viewed at:
Other Commands:
Not much difference with prombench, as expected, since the split won't come into the picture there. It is mostly relevant for vertical compaction and for compacting blocks not written by Prometheus.
/prombench cancel
Benchmark cancel is in progress.
Nice, some comments from my side.
I think it's great to finally see this; it will bring more consistent results on vertical compactions.
@@ -217,8 +217,11 @@ type seriesToChunkEncoder struct {
	Series
}

// TODO(bwplotka): Currently encoder will just naively build one chunk, without limit. Split it: https://github.com/prometheus/tsdb/issues/670
const seriesToChunkEncoderSplit = 120
hm, I wonder if we could use the existing constant:
(Line 2334 in d614ae9)
const samplesPerChunk = 120
I guess we could try to figure something out. As you suspected, right now there would be an import cycle, and we would need to move the constant elsewhere.
In the end it might not make such a big difference. I don't expect this change before Prometheus v3.0.
storage/series.go
Outdated
func (s *seriesToChunkEncoder) Iterator() chunks.Iterator {
	chks := []chunks.Meta{}
Can we create this closer to the usage?
storage/series.go
Outdated
	return errChunksIterator{err: err}
}
mint = int64(math.MaxInt64)
// maxt is immediately overwritten below
Can we have this comment full sentence?
Signed-off-by: Matthias Loibl <mail@matthiasloibl.com>
Addressed the comments. Thanks for the good feedback 👍
It would be nice to have this feature merged!
Thanks!
This is my attempt at fixing #5862.
It's based on the work that @bwplotka recently did and uses NewSeriesSetToChunkSet. While iterating over the samples, we keep track of how many we've appended; if that's more than 120, we append the current chunk to the chunk slice and create a new one to keep appending to.
It's probably not perfect yet but I'd rather get feedback early.
/cc @codesome @hdost