remote_write: reduce blocking from garbage-collect of series #9109

Merged (5 commits) on Jul 27, 2021

Conversation

@bboreham (Member) commented on Jul 22, 2021

Fixes #9082

As suggested in #9082 (comment), add a method UpdateSeriesSegment() which is used together with SeriesReset() to garbage-collect old series.
This allows us to split the lock around queueManager series data and avoid blocking Append() while reading series from the last checkpoint.

Also, UpdateSeriesSegment() is far less expensive than StoreSeries() when called with existing series, unless something like #9084 is merged.
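The trio of operations described above can be sketched as follows. This is a minimal, hypothetical Go illustration of the StoreSeries() / UpdateSeriesSegment() / SeriesReset() pattern, not the actual Prometheus implementation; all types and fields here are made-up stand-ins.

```go
package main

import (
	"fmt"
	"sync"
)

// Illustrative stand-ins only, not the real Prometheus types.
type seriesEntry struct {
	labels  string // stand-in for a full label set
	segment int    // WAL segment the series was last seen in
}

type queueManager struct {
	mtx    sync.Mutex
	series map[uint64]seriesEntry
}

// StoreSeries records labels and the current segment for (possibly new)
// series; this is the expensive path.
func (q *queueManager) StoreSeries(ids []uint64, labels []string, segment int) {
	q.mtx.Lock()
	defer q.mtx.Unlock()
	for i, id := range ids {
		q.series[id] = seriesEntry{labels: labels[i], segment: segment}
	}
}

// UpdateSeriesSegment only bumps the segment number for series we already
// track, which is far cheaper than re-storing their labels.
func (q *queueManager) UpdateSeriesSegment(ids []uint64, segment int) {
	q.mtx.Lock()
	defer q.mtx.Unlock()
	for _, id := range ids {
		if e, ok := q.series[id]; ok {
			e.segment = segment
			q.series[id] = e
		}
	}
}

// SeriesReset drops every series last seen before the given segment,
// completing the garbage-collection pass.
func (q *queueManager) SeriesReset(segment int) {
	q.mtx.Lock()
	defer q.mtx.Unlock()
	for id, e := range q.series {
		if e.segment < segment {
			delete(q.series, id)
		}
	}
}

// gcDemo runs one GC cycle and returns how many series survive.
func gcDemo() int {
	q := &queueManager{series: map[uint64]seriesEntry{}}
	q.StoreSeries([]uint64{1, 2}, []string{"a", "b"}, 1)
	q.UpdateSeriesSegment([]uint64{2}, 2) // series 2 reappears in the checkpoint
	q.SeriesReset(2)                      // series 1 was not seen again: dropped
	return len(q.series)
}

func main() {
	fmt.Println(gcDemo()) // prints 1
}
```

In this sketch a series survives garbage collection only if UpdateSeriesSegment() saw it again before SeriesReset() ran, which is the behaviour the PR relies on.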

This also allows a different segment-reading implementation to be used when garbage-collecting.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
@cstyan (Member) left a comment

Awesome, thanks @bboreham

Overall this LGTM, I like the implementation via passing of a function.

Few comments that aren't really blockers:

  • There are a few code comments that aren't full sentences; it doesn't really bother me, but full sentences are the convention we've gone with.
  • Personally, I sometimes find it hard to track down implementations of function types; I'm curious what you think about noting implements segmentReadFn in the comments for readSegment and readSegmentForGC. In this case the naming is pretty self-explanatory, so I don't think a change is necessary here.
  • Given that we're making this and other changes to improve sample throughput while reading a checkpoint, I think it's worth adding a benchmark. If we just open an issue for now, with a note to compare main (or the most recent release at the time of the benchmark) against the release prior to your changes, then that's okay.
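The function-type pattern the review refers to can be sketched like this. It is a hypothetical illustration: the names segmentReadFn, readSegment, and readSegmentForGC echo the PR, but the bodies and the watcher type are made up for the demo.

```go
package main

import "fmt"

// segmentReadFn lets the caller choose how a WAL segment is read:
// the normal tailing path processes everything, while the GC path
// only needs series records.
type segmentReadFn func(w *watcher, segmentNum int) error

type watcher struct {
	lastRead string // records what the last read did, for the demo
}

// readSegment implements segmentReadFn: the full tailing path,
// processing both series records and samples.
func readSegment(w *watcher, n int) error {
	w.lastRead = fmt.Sprintf("full read of segment %d", n)
	return nil
}

// readSegmentForGC implements segmentReadFn: when garbage-collecting
// we only need series records, so samples can be skipped.
func readSegmentForGC(w *watcher, n int) error {
	w.lastRead = fmt.Sprintf("series-only read of segment %d", n)
	return nil
}

// process dispatches through whichever segmentReadFn it is given.
func (w *watcher) process(read segmentReadFn, n int) error {
	return read(w, n)
}

func main() {
	w := &watcher{}
	w.process(readSegmentForGC, 3)
	fmt.Println(w.lastRead) // prints "series-only read of segment 3"
}
```

Adding an "implements segmentReadFn" note to each implementation's comment, as the review suggests, makes these call sites easier to find with a plain text search.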

Review threads on tsdb/wal/watcher.go and storage/remote/queue_manager.go (outdated; resolved).
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
Reduce the total number of samples per iteration from 5000*5000
(25 million), which is too big for my laptop, to 1*10000.

Extend `createTimeseries()` to add additional labels, so that the
queue manager is doing more realistic work.

Move the Append() call to a background goroutine; this works because
TestWriteClient uses a WaitGroup to signal completion.

Call `StoreSeries()` and `SeriesReset()` while adding samples, to
simulate the garbage-collection that wal.Watcher does.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
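The goroutine-plus-WaitGroup restructuring described in the commit message above can be sketched roughly as follows. Everything here is a made-up stand-in for the real benchmark code: delivery runs in a background goroutine while the caller could concurrently simulate the Watcher's garbage collection, and a WaitGroup signals when every sample has arrived.

```go
package main

import (
	"fmt"
	"sync"
)

// fakeClient is a stand-in for TestWriteClient: it counts delivered
// samples and uses a WaitGroup to signal completion.
type fakeClient struct {
	wg       sync.WaitGroup
	mtx      sync.Mutex
	received int
}

// deliverWithGC appends samples in a background goroutine, waits for
// all of them to be delivered, and returns the delivered count.
func deliverWithGC(samples int) int {
	c := &fakeClient{}
	c.wg.Add(samples)
	go func() { // background Append, as in the updated benchmark
		for i := 0; i < samples; i++ {
			c.mtx.Lock()
			c.received++
			c.mtx.Unlock()
			c.wg.Done()
		}
	}()
	// Meanwhile the foreground goroutine is free to simulate GC, e.g.
	// by calling StoreSeries()/UpdateSeriesSegment()/SeriesReset().
	c.wg.Wait() // the WaitGroup signals completion
	return c.received
}

func main() {
	fmt.Println(deliverWithGC(10000)) // prints 10000
}
```

Because every increment happens before its wg.Done() and wg.Wait() returns only after all Done() calls, reading the counter after Wait() is race-free.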
This matches what Watcher.garbageCollectSeries() is doing now.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
@bboreham (Member, Author)

I fixed the comments, and updated the existing BenchmarkSampleDelivery to demonstrate the effect.

This benchmark crashed on my machine before I changed it, running out of memory. It attempted to enqueue 25 million samples in one go, which is not very realistic. I reduced the number of samples and extended the label set on each series to give StoreSeries() more work to do.

Results, comparing calling StoreSeries() against UpdateSeriesSegment():

name              old time/op    new time/op    delta
SampleDelivery-4    88.3ms ± 2%    61.8ms ± 2%  -29.94%  (p=0.008 n=5+5)

name              old alloc/op   new alloc/op   delta
SampleDelivery-4    52.0MB ± 0%    47.2MB ± 0%   -9.29%  (p=0.008 n=5+5)

name              old allocs/op  new allocs/op  delta
SampleDelivery-4      425k ± 0%      414k ± 0%   -2.56%  (p=0.008 n=5+5)

@cstyan (Member) left a comment

LGTM

Personally, I think the current benchmark should still call the StoreSeries function, and we should have a separate benchmark for checkpoint reading to differentiate between StoreSeries and UpdateSeriesSegment.

@csmarchbanks (Member) left a comment

👍

Separate benchmarks could be nice. I am curious how much of the current benchmark is spent processing samples vs. running the GC? If they are similar, then maybe one benchmark is OK.

@bboreham (Member, Author)

> the current benchmark should still call the StoreSeries function

It does, before the timing loop, just as it did before this PR.
Are you suggesting that StoreSeries should be called inside the timing loop?

Can you describe the real-life scenario you want the benchmark to imitate?

@bboreham (Member, Author)

> I am curious how much of the current benchmark is spent processing samples vs running the gc?

Removing the two lines calling UpdateSeriesSegment and SeriesReset makes negligible difference.

$ benchstat nogc.txt withgc.txt 
name              old time/op    new time/op    delta
SampleDelivery-4    61.1ms ± 4%    60.7ms ± 3%   ~     (p=0.841 n=5+5)

name              old alloc/op   new alloc/op   delta
SampleDelivery-4    47.1MB ± 0%    47.2MB ± 0%   ~     (p=0.286 n=4+5)

name              old allocs/op  new allocs/op  delta
SampleDelivery-4      414k ± 0%      414k ± 0%   ~     (p=0.333 n=4+5)

@csmarchbanks (Member)

Thanks, I am happy with this as is. I will let Callum respond and merge if he is happy as well.

@cstyan (Member) commented on Jul 27, 2021

> It does, before the timing loop, just as it did before this PR.

I must have misread something then; LGTM.

> Are you suggesting that StoreSeries should be called inside the timing loop?

I think it would be worth exploring a benchmark between the two, but that's not necessary for this PR.


Successfully merging this pull request may close these issues.

remote-write: don't stall on garbageCollectSeries()
3 participants