-
Notifications
You must be signed in to change notification settings - Fork 8.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
remote_write: reduce blocking from garbage-collect of series #9109
Conversation
To allow a different implementation to be used when garbage-collecting. Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
Add a method `UpdateSeriesSegment()` which is used together with `SeriesReset()` to garbage-collect old series. This allows us to split the lock around queueManager series data and avoid blocking `Append()` while reading series from the last checkpoint. Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome, thanks @bboreham
Overall this LGTM, I like the implementation via passing of a function.
Few comments that aren't really blockers:
- There are a few comments that aren't full sentences, doesn't really bother me but that's the convention we've gone with.
- Personally sometimes I find it hard to track down implementations of function types, I'm curious what you think about having
implements segmentReadFn
in the comments forreadSegment
andreadSegmentForGC
. In this case the naming is pretty self explanatory so I don't think it's necessary to make a change here. - Given we're making this and other changes to improve samples throughput while reading a checkpoint, I think it's worth adding a benchmark. If we just open an issue for now with a note to compare main/the most recent release at the time of the benchmark vs the release prior to your changes then that's okay.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
Reduce the total number of samples per iteration from 5000*5000 (25 million) which is too big for my laptop, to 1*10000. Extend `createTimeseries()` to add additional labels, so that the queue manager is doing more realistic work. Move the Append() call to a background goroutine - this works because TestWriteClient uses a WaitGroup to signal completion. Call `StoreSeries()` and `SeriesReset()` while adding samples, to simulate the garbage-collection that wal.Watcher does. Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
This matches what Watcher.garbageCollectSeries() is doing now Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
I fixed the comments, and updated the existing This benchmark crashed on my machine before I changed it, running out of memory. It attempted to enqueue 25 million samples in one go, which is not very realistic. I reduced the number of samples and extended the label set on each series to give Results, comparing calling
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Personally I think the current benchmark should still call the StoreSeries
function and we should have a separate benchmark for checkpoint reading to differentiate between StoreSeries
and UpdateSeriesSegment
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
Separate benchmarks could be nice, I am curious how much of the current benchmark is spent processing samples vs running the gc? If they are similar then maybe one benchmark is OK.
It does, before the timing loop, just as it did before this PR. Can you describe the real-life scenario you want the benchmark to imitate? |
Removing the two lines calling
|
Thanks, I am happy with this as is. I will let Callum respond and merge if he is happy as well. |
I must have misread something then, LGTM
I think it would be worth exploring a benchmark between the two, but that's not necessary for this PR. |
Fixes #9082
As suggested at #9082 (comment),
Add a method
UpdateSeriesSegment()
which is used together withSeriesReset()
to garbage-collect old series.This allows us to split the lock around queueManager series data and avoid blocking
Append()
while reading series from the last checkpoint.Also
UpdateSeriesSegment()
is massively less expensive thanStoreSeries()
when called with existing series, unless something like #9084 is merged.