Snapshot management scales poorly #5083
The overhead is not zero, but it is also not the major bottleneck in the benchmarks I work with, such as sysbench read-write, read-only, and update-noindex. Like probably many others, I tried a couple of approaches in the past but did not see a tangible improvement in the benchmark results.
RocksDB internally has another way to get a consistent view for a Get() or range scan when the user doesn't provide a snapshot: it holds a sequence number and a `SuperVersion`.
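Roughly, that pattern looks like this (a minimal, simplified sketch with hypothetical names, not the actual RocksDB internals):

```cpp
#include <atomic>
#include <cstdint>

// Simplified sketch of the pattern described above (hypothetical names, not
// the actual RocksDB code): a read without a user snapshot pins the current
// sequence number plus a reference-counted view of the memtables and SST
// files, so nothing has to be inserted into the global snapshot list.
struct SuperVersion {
  std::atomic<int> refs{1};
  // ... references to the memtable, immutable memtables and SST files ...
  void Ref() { refs.fetch_add(1, std::memory_order_relaxed); }
  bool Unref() { return refs.fetch_sub(1, std::memory_order_acq_rel) == 1; }
};

void ReadWithoutUserSnapshot(SuperVersion* sv,
                             const std::atomic<uint64_t>& last_sequence) {
  sv->Ref();  // pin a consistent view of the data
  const uint64_t read_seq = last_sequence.load(std::memory_order_acquire);
  // ... perform the Get()/range scan, ignoring entries newer than read_seq ...
  if (sv->Unref()) {
    delete sv;  // last reference: the view can be reclaimed
  }
}
```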
Thanks @yiwu-arbug, but according to the comment in the code ("accessing members of this class is not thread-safe and requires external synchronization (ie db mutex held or on write thread)"), I don't think that would help in this case. After a few iterations I now have a solution that scales quite well. I ran the following benchmark with a varying number of threads:
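The exact benchmark code isn't preserved in this thread; a minimal stress loop of the shape described, using only the public `rocksdb::DB` API, could look like this (a sketch, not necessarily the harness that produced the numbers below):

```cpp
#include <thread>
#include <vector>

#include "rocksdb/db.h"

// Hypothetical stress loop: every thread acquires and releases snapshots in
// a tight loop, so GetSnapshot()/ReleaseSnapshot() contention dominates.
void SnapshotStress(rocksdb::DB* db, int num_threads, int iterations) {
  std::vector<std::thread> threads;
  for (int t = 0; t < num_threads; ++t) {
    threads.emplace_back([db, iterations] {
      for (int i = 0; i < iterations; ++i) {
        const rocksdb::Snapshot* s = db->GetSnapshot();
        db->ReleaseSnapshot(s);
      }
    });
  }
  for (auto& th : threads) {
    th.join();
  }
}
```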
The original code produced the following results (the first number is the number of threads).
My new snapshot list produced the following results:
There are still a few minor issues to solve (e.g., at the moment a long-running snapshot blocks reclamation of older `SnapshotImpl` instances), but IMHO this approach seems promising.
@mpoeter may I know what your change is?
I reimplemented the `SnapshotList`. I will try to improve the code and fix the remaining issues. Once they are sorted out I'll create a PR where we can discuss the changes in more detail.
Making …
@maysamyabandeh MyRocks uses 2PC and has a binlog dependency; it's likely that another application doesn't depend on those functionalities. So I would be careful about using sysbench to rule out a performance bottleneck.
@yiwu-arbug my implementation is only partially lock-free: removing entries from the list still requires a lock. That's why most threads only mark their snapshot as "deleted" and leave removal of that entry to the owner of the oldest active snapshot. The main reason is that I want to avoid the additional complexity of introducing a reclamation scheme like EBR (epoch-based reclamation) or DEBRA. Also, a lock-free remove operation would be more expensive, resulting in unnecessary overhead for uncontended remove ops.
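To illustrate the scheme (hypothetical names and structure, not the actual patch), the release path could look roughly like this:

```cpp
#include <atomic>
#include <cstdint>
#include <mutex>

// Rough sketch of the scheme described above: ReleaseSnapshot() normally
// just flags the entry, and the physical unlinking of flagged entries is
// batched behind a lock that is only taken when the oldest snapshot is
// released.
struct SnapshotNode {
  uint64_t seq = 0;
  std::atomic<bool> deleted{false};
  std::atomic<SnapshotNode*> next{nullptr};
};

class SnapshotList {
 public:
  void Release(SnapshotNode* n) {
    n->deleted.store(true, std::memory_order_release);
    if (n != oldest_.load(std::memory_order_acquire)) {
      return;  // common case: no lock, no list traversal
    }
    // Owner of the oldest snapshot: reclaim the prefix of deleted nodes.
    std::lock_guard<std::mutex> guard(remove_mutex_);
    SnapshotNode* node = oldest_.load(std::memory_order_relaxed);
    while (node != nullptr && node->deleted.load(std::memory_order_acquire)) {
      SnapshotNode* next = node->next.load(std::memory_order_acquire);
      oldest_.store(next, std::memory_order_release);
      delete node;  // assumes no reader can still reach `node`; this is
      node = next;  // exactly the reclamation subtlety mentioned above
    }
  }

 private:
  std::atomic<SnapshotNode*> oldest_{nullptr};
  std::mutex remove_mutex_;
};
```

This also makes the earlier caveat concrete: as long as the oldest snapshot stays alive, nothing behind it in the list gets reclaimed.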
@yiwu-arbug are you still planning to look into this? |
No plan on our side. We identified that this is not a bottleneck for us.
@maysamyabandeh @yiwu-arbug Unfortunately I had to put my experiments on ice for the last few months. I still want to keep working on this at some point, I just have to find the time for it. |
@mpoeter We're running into this issue a bit (via TiKV's use of RocksDB snapshots). Did you ever end up being able to get back to this?
@frew Yes, but only for a short time before I was again sucked into a different project. However, I switched jobs in the meantime and am now working even more with RocksDB, so there is a good chance I might look into this again, but probably not in the next weeks. |
@riversand963 Are you working on this issue? No worries either way - just trying to figure out logistics on our end. |
@frew I haven't started this yet, but I am interested in this. Self-assigning to either work on it or follow up.
`GetSnapshotImpl` and `ReleaseSnapshot` acquire the global `mutex_` to perform their operations. This can result in high contention and therefore bad scalability for workloads that make heavy use of snapshots. This can be seen by running the "randomwithverify" benchmark (this seems to be the only one that uses snapshots).

This is a screenshot from a VTune analysis of the "randomwithverify" benchmark with 4 threads. It clearly shows how the snapshot operations are serialized (the yellow lines indicate transitions where a mutex unlock wakes another thread that was waiting for it).
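To make the pattern concrete, here is a simplified, self-contained analogue of the current behavior (hypothetical names, not the actual RocksDB source):

```cpp
#include <cstdint>
#include <mutex>

// Both acquiring and releasing a snapshot take the one DB-wide mutex, so
// all snapshot traffic is serialized on it, together with everything else
// that needs the mutex.
struct SnapshotImpl {
  uint64_t seq;
  SnapshotImpl* prev;
  SnapshotImpl* next;
};

class Db {
 public:
  SnapshotImpl* GetSnapshot() {
    std::lock_guard<std::mutex> l(mutex_);  // global DB mutex
    auto* s = new SnapshotImpl{last_sequence_, tail_, nullptr};
    if (tail_ != nullptr) tail_->next = s;
    tail_ = s;
    return s;
  }

  void ReleaseSnapshot(SnapshotImpl* s) {
    std::lock_guard<std::mutex> l(mutex_);  // same global mutex
    if (s->prev != nullptr) s->prev->next = s->next;
    if (s->next != nullptr) s->next->prev = s->prev;
    if (tail_ == s) tail_ = s->prev;
    delete s;
  }

 private:
  std::mutex mutex_;
  uint64_t last_sequence_ = 0;
  SnapshotImpl* tail_ = nullptr;
};
```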
Obviously, a higher number of threads intensifies this problem...
I have considered different ways to improve the situation, but obviously they all have their own pros and cons, so I would like to get some input from the core team.
I suppose you are aware of this potential bottleneck. Do you have other ideas, or even plans, to improve the scalability of snapshots? If we can agree on how to approach this, I am happy to make the corresponding changes and create a PR.