Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Iterator to reference SuperVersion from a pool, just like the thread-local super version technique used in Get() #4765

Open
siying opened this issue Dec 10, 2018 · 10 comments
Labels
up-for-grabs Up for grabs

Comments

@siying
Copy link
Contributor

siying commented Dec 10, 2018

Right now, we reduce the mutex contention of grabbing and referencing count super version using thread-local super version. See https://github.com/facebook/rocksdb/blob/v5.17.2/db/column_family.cc#L1027-L1070

This works well for Get(), but we can't directly use it in iterators. Instead, when creating a new iterator we still increase the reference count while getting it from the thread local cached super version. The reference count is still a potential scalability bottleneck. It isn't usually exposed as a bottleneck but it's a good idea to clean the design to get rid of it.

To achieve it, we can turn the thread-local cache to a pool of cached super version. While creating an iterator, we can fetch a random one (or a core-local one) from the pool and remember where we should return it to. In the future we can consider whether it makes sense to replace Get() path can migrate to the same solution too.

Anyone who is interested is welcome to contribute.

@siying siying added the up-for-grabs Up for grabs label Dec 10, 2018
@jwasinger
Copy link

Hi @siying , I'd like to try and take this on.

@siying
Copy link
Contributor Author

siying commented Dec 11, 2018

@jwasinger awesome! Let me know if you need any help. CC @ajkr

@siying
Copy link
Contributor Author

siying commented Dec 11, 2018

I can't assign it to you. I don't know much about github system, but you are more than welcome to work on it!

@jwasinger
Copy link

Thanks :) . I will reach out with questions as they arise.

@siying
Copy link
Contributor Author

siying commented Dec 11, 2018

Btw this is @nbronson 's suggestion.

@ajkr
Copy link
Contributor

ajkr commented Dec 11, 2018

I think it's a good idea to have a pool of cached super versions for iterators. A couple thoughts:

(1) Before extending this change to Get, let's make sure it doesn't regress performance. I suspect it might not be as fast as thread local.
(2) I think random might make more sense than core-local for this case. If we did core-local, then a long-held iterator could prevent all other iterators created by the same core to not have a superversion available to them.

@siying
Copy link
Contributor Author

siying commented Dec 11, 2018

@ajkr how about core ID as a hint and can fall back to random?

@siying siying changed the title Turn thread-local super version to from a pool of super versions Iterator to reference SuperVersion from a pool, just like the thread-local super version technique used in Get() Dec 11, 2018
@siying
Copy link
Contributor Author

siying commented Dec 11, 2018

@ajkr I updated the original description according to your comment.

@ajkr
Copy link
Contributor

ajkr commented Dec 11, 2018

sure, that sounds good. then a thread serially creating short-lived iterators might get some benefit.

@jwasinger
Copy link

jwasinger commented Dec 16, 2018

@siying @ajkr , I've done a bit of a dive through the docs and codebase to get myself up to speed. I still have a lot of reading to do to get a full idea of what is going on.

One question: Is the bottleneck here due to lifetime of the SuperVersion reference being held longer for iterators than for Get? Presumably preventing the referenced resources from being garbage collected/compacted as they go stale. Ah whoops... I had misunderstood the code.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
up-for-grabs Up for grabs
Projects
None yet
Development

No branches or pull requests

3 participants