GC is slower with badger (with syncwrites enabled) #4298

Open
Stebalien opened this issue Oct 11, 2017 · 6 comments
Labels
topic/badger Topic badger topic/blockstore Topic blockstore topic/perf Performance

Comments

@Stebalien
Member

Stebalien commented Oct 11, 2017

GC is 4x slower with the badger datastore as it actually has to write data, not just delete files. We may need to better batch/parallelize deletes (probably want a DeleteBlocks method).
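A minimal sketch of what a `DeleteBlocks` method could look like, assuming the datastore offers a batch that is flushed once on commit. The `Batch` and `Batching` interfaces, `memStore`, and the `DeleteBlocks` signature are all hypothetical, simplified stand-ins for illustration and not the real go-datastore or blockstore API.

```go
package main

// Batch and Batching are hypothetical, simplified stand-ins for a
// datastore batch API. They are assumptions, not the real interfaces.
type Batch interface {
	Delete(key string) error
	Commit() error
}

type Batching interface {
	Batch() (Batch, error)
}

// DeleteBlocks (hypothetical) queues every delete into one batch and
// commits once. Assuming the backend syncs once per commit, this pays for
// one synced write instead of one per block.
func DeleteBlocks(ds Batching, keys []string) error {
	b, err := ds.Batch()
	if err != nil {
		return err
	}
	for _, k := range keys {
		if err := b.Delete(k); err != nil {
			return err
		}
	}
	return b.Commit()
}

// memStore is a toy in-memory store that counts commits so the effect of
// batching is visible.
type memStore struct {
	data    map[string]bool
	commits int
}

func newMemStore(keys ...string) *memStore {
	s := &memStore{data: make(map[string]bool)}
	for _, k := range keys {
		s.data[k] = true
	}
	return s
}

func (s *memStore) Batch() (Batch, error) {
	return &memBatch{store: s}, nil
}

// memBatch records pending deletes and applies them all in Commit.
type memBatch struct {
	store   *memStore
	pending []string
}

func (b *memBatch) Delete(key string) error {
	b.pending = append(b.pending, key)
	return nil
}

func (b *memBatch) Commit() error {
	for _, k := range b.pending {
		delete(b.store.data, k)
	}
	b.pending = nil
	b.store.commits++
	return nil
}
```

With this toy store, deleting any number of keys through `DeleteBlocks` increments `commits` exactly once, which is the property that would matter when every commit is synced to disk.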

Stebalien added the topic/blockstore and topic/perf labels Oct 11, 2017
@magik6k
Member

magik6k commented Oct 11, 2017

GC also needs to be called on the badger instance itself; we might want to expose this too.
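One way this could be exposed is through an optional interface that only some backends implement. The sketch below assumes hypothetical `Datastore` and `GCer` interfaces and two fake backends. In a real badger-backed datastore, `CollectGarbage` would presumably wrap Badger's own value log GC call (e.g. `RunValueLogGC`).

```go
package main

// Datastore is a hypothetical minimal datastore interface.
type Datastore interface {
	Delete(key string) error
}

// GCer is a hypothetical optional interface for backends that need their
// own garbage collection pass after blocks have been deleted.
type GCer interface {
	CollectGarbage() error
}

// CollectStoreGarbage runs the backend's own GC if the backend exposes
// one. It reports whether a backend GC was actually run.
func CollectStoreGarbage(ds Datastore) (bool, error) {
	gc, ok := ds.(GCer)
	if !ok {
		return false, nil // e.g. flatfs: deleting the file is all there is
	}
	return true, gc.CollectGarbage()
}

// fakeBadger stands in for a backend that needs an explicit GC call.
type fakeBadger struct{ gcRuns int }

func (f *fakeBadger) Delete(key string) error { return nil }
func (f *fakeBadger) CollectGarbage() error   { f.gcRuns++; return nil }

// fakeFlatfs stands in for a backend with no separate GC step.
type fakeFlatfs struct{}

func (f *fakeFlatfs) Delete(key string) error { return nil }
```

The repo-level GC could call `CollectStoreGarbage` once after all block deletions, so backends without a separate GC step are unaffected.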

@Kubuxu
Member

Kubuxu commented Oct 12, 2017

I can do batching as part of #4149. go-ds-flatfs doesn't have real delete batching (it just queued the deletes up and did them all at the end), which is why it was never used there.

@Stebalien
Member Author

@Kubuxu Sounds like a good idea (although it can be a separate PR if it adds too much code; large PRs are a pain to review and/or rebase).

@Kubuxu
Member

Kubuxu commented Oct 12, 2017

It shouldn't, but making it a separate PR is a good idea either way.

@schomatis
Contributor

@Stebalien

> GC is 4x slower with the badger datastore as it actually has to write data, not just delete files.

Yes, and even more expensive than the rewrite operations during GC are its lookups of every key in the value log file being checked, which decide whether the file exceeds the threshold that triggers the rewrite. Do you have a test that would point to that 4x performance impact?
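The per-key check described here can be sketched roughly as follows. This is a simplified model, not Badger's actual implementation: the `entry` type, the `index` map standing in for the key index, and the `threshold` parameter are all assumptions made for illustration.

```go
package main

// entry is a hypothetical record in a value log file.
type entry struct {
	Key    string
	Offset int
}

// shouldRewrite reports whether the fraction of stale entries in a value
// log file reaches threshold. index maps each key to the offset of its
// live version; an entry is stale if the key is gone or points elsewhere.
func shouldRewrite(entries []entry, index map[string]int, threshold float64) bool {
	if len(entries) == 0 {
		return false
	}
	stale := 0
	for _, e := range entries {
		// One index lookup per entry: in this model, this is the
		// expensive part, since it happens even if no rewrite follows.
		live, ok := index[e.Key]
		if !ok || live != e.Offset {
			stale++
		}
	}
	return float64(stale)/float64(len(entries)) >= threshold
}
```

In this model the lookup cost is paid for every entry examined, whether or not the file ends up being rewritten.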

> We may need to better batch/parallelize deletes (probably want a DeleteBlocks method).

I don't understand how parallelization would help; is GC called after every block deletion?

@schomatis
Contributor

So, trying to reduce my own noise: GC is indeed slower with syncWrites enabled (4x sounds about right). Badger's creator suggested turning it off during GC (not sure if that is possible), and also parallelizing deletes as mentioned here (or, alternatively, running bs.DeleteBlock(k) concurrently in multiple goroutines).
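Running `bs.DeleteBlock(k)` from several goroutines could look roughly like the worker pool below. The `Blockstore` interface, `memBlockstore`, and `DeleteBlocksConcurrently` are hypothetical names for this sketch. It assumes `DeleteBlock` is safe for concurrent use and that the backend can overlap or group synced writes issued by concurrent callers.

```go
package main

import "sync"

// Blockstore is a hypothetical minimal interface; DeleteBlock is assumed
// to be safe for concurrent use.
type Blockstore interface {
	DeleteBlock(key string) error
}

// DeleteBlocksConcurrently deletes keys using a fixed number of worker
// goroutines and returns the first error encountered, if any.
func DeleteBlocksConcurrently(bs Blockstore, keys []string, workers int) error {
	if workers < 1 {
		workers = 1
	}
	jobs := make(chan string)
	var (
		wg       sync.WaitGroup
		once     sync.Once
		firstErr error
	)
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for k := range jobs {
				if err := bs.DeleteBlock(k); err != nil {
					once.Do(func() { firstErr = err })
				}
			}
		}()
	}
	for _, k := range keys {
		jobs <- k
	}
	close(jobs)
	wg.Wait() // all workers are done, so reading firstErr is safe
	return firstErr
}

// memBlockstore is a toy, mutex-protected in-memory blockstore.
type memBlockstore struct {
	mu   sync.Mutex
	data map[string]bool
}

func newMemBlockstore(keys ...string) *memBlockstore {
	bs := &memBlockstore{data: make(map[string]bool)}
	for _, k := range keys {
		bs.data[k] = true
	}
	return bs
}

func (bs *memBlockstore) DeleteBlock(key string) error {
	bs.mu.Lock()
	defer bs.mu.Unlock()
	delete(bs.data, key)
	return nil
}

// Len returns the number of blocks still stored.
func (bs *memBlockstore) Len() int {
	bs.mu.Lock()
	defer bs.mu.Unlock()
	return len(bs.data)
}
```

Workers keep draining the channel after an error so the sender never blocks; a real implementation might prefer to cancel early instead.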

I can confirm (from simple tests) that GC with syncWrites disabled performs about the same as with flatfs, and that Badger's own GC (triggered when there is more than one value log file, i.e., more than 1GB of data in the repo) takes only slightly longer than flatfs (1.25-1.5x). More tests are needed.


4 participants