Optimize upgrade method of alloc::sync::Weak
#123148
Closed
This is my first time contributing to Rust, so apologies if I’ve broken any rules or tagged the wrong people.
Background
The upgrade method on Weak is currently implemented with a CAS loop, so that the strong count is never incremented once it has reached zero. This quickly becomes inefficient when other threads upgrade the same Weak, clone or drop the Arc, or otherwise modify the strong count: the CAS fails every time the strong count changes, even when the count is nowhere near zero. In theory, it may never succeed.

Consider a basic doubly linked list with strong next pointers and weak prev pointers. Given N nodes, M threads, and just 1 update to the strong count per thread, a simple backward iteration could take N*M operations in the worst case.
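For illustration, here is a simplified model of the CAS-loop approach described above, reduced to a bare atomic strong count. It is a sketch rather than the actual std source: `StrongCount` and `try_increment_cas` are hypothetical names, and the retry counter exists only to make the cost of contention visible.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::thread;

/// Hypothetical stand-in for the strong count stored next to the data.
struct StrongCount {
    strong: AtomicUsize,
}

impl StrongCount {
    fn new(initial: usize) -> Self {
        StrongCount { strong: AtomicUsize::new(initial) }
    }

    /// Increments the count unless it is zero, using a CAS loop.
    /// Returns `Some(retries)` on success, or `None` if the count was zero.
    fn try_increment_cas(&self) -> Option<usize> {
        let mut retries = 0;
        let mut n = self.strong.load(Ordering::Relaxed);
        loop {
            if n == 0 {
                // The value has already been dropped; upgrading must fail.
                return None;
            }
            match self.strong.compare_exchange_weak(
                n,
                n + 1,
                Ordering::Acquire,
                Ordering::Relaxed,
            ) {
                Ok(_) => return Some(retries),
                // Any concurrent change to the count, even one nowhere near
                // zero (or a spurious failure), sends us around the loop again.
                Err(actual) => {
                    n = actual;
                    retries += 1;
                }
            }
        }
    }
}

fn main() {
    let count = StrongCount::new(1);
    let stop = AtomicBool::new(false);

    thread::scope(|s| {
        // Background threads that keep "cloning" and "dropping" the Arc.
        // They never bring the count to zero, yet they still disturb the CAS.
        for _ in 0..3 {
            s.spawn(|| {
                while !stop.load(Ordering::Relaxed) {
                    count.strong.fetch_add(1, Ordering::Relaxed);
                    count.strong.fetch_sub(1, Ordering::Release);
                }
            });
        }

        let mut total_retries = 0;
        for _ in 0..10_000 {
            // The count is always >= 1 here, so the upgrade can be delayed but not fail.
            total_retries += count.try_increment_cas().expect("count never hits zero");
            count.strong.fetch_sub(1, Ordering::Release);
        }
        stop.store(true, Ordering::Relaxed);
        println!("CAS retries under contention: {total_retries}");
    });
}
```

The background threads never bring the count anywhere near zero, yet each of their updates can invalidate an in-flight compare-exchange and force another iteration; the printed retry count varies from run to run.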
Solution
We can use a “sticky counter”, which I encountered in section 4.3 of this paper. It replaces the CAS loop with an “increment-if-not-zero” operation that does not need to retry.
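A minimal sketch of such an increment-if-not-zero counter, assuming the single-high-bit variant proposed below (no help bit) and SeqCst everywhere, as in the proof of concept. `StickyCounter`, `ZERO_FLAG` and the method names are hypothetical, and this is a standalone counter, not a patch to Arc/Weak.

```rust
use std::sync::atomic::{AtomicUsize, Ordering::SeqCst};
use std::thread;

/// High bit that marks the counter as permanently zero (hypothetical name).
const ZERO_FLAG: usize = 1 << (usize::BITS - 1);

/// Hypothetical standalone sticky counter with a single flag bit.
struct StickyCounter {
    value: AtomicUsize,
}

impl StickyCounter {
    /// Creates a live counter; the initial count must be non-zero.
    fn new(initial: usize) -> Self {
        assert!(initial > 0 && initial < ZERO_FLAG);
        StickyCounter { value: AtomicUsize::new(initial) }
    }

    /// Increments unless the counter is stuck at zero: one `fetch_add`, no loop.
    fn increment_if_not_zero(&self) -> bool {
        // If the flag is already set, this stray increment only touches the
        // low bits of a dead counter, so it does no harm.
        let prev = self.value.fetch_add(1, SeqCst);
        prev & ZERO_FLAG == 0
    }

    /// Decrements; returns `true` only for the one call that brings the
    /// counter to zero for good (and would therefore run cleanup).
    fn decrement(&self) -> bool {
        if self.value.fetch_sub(1, SeqCst) == 1 {
            // The count just hit 0, but a concurrent increment may still revive
            // it. Whoever installs the flag with this CAS owns the zero; if an
            // increment slipped in first, the CAS fails and the counter lives on.
            self.value.compare_exchange(0, ZERO_FLAG, SeqCst, SeqCst).is_ok()
        } else {
            false
        }
    }

    /// Current count, reported as 0 once the flag is set.
    fn load(&self) -> usize {
        let v = self.value.load(SeqCst);
        if v & ZERO_FLAG != 0 { 0 } else { v }
    }
}

fn main() {
    // Single-threaded walk through the state transitions.
    let solo = StickyCounter::new(1);
    assert!(solo.increment_if_not_zero()); // 1 -> 2
    assert!(!solo.decrement()); // 2 -> 1
    assert!(solo.decrement()); // 1 -> 0, and this call seals it
    assert!(!solo.increment_if_not_zero()); // stays stuck at zero
    assert_eq!(solo.load(), 0);

    // Concurrent check: exactly one decrement ever reports reaching zero.
    let counter = StickyCounter::new(1);
    let c = &counter;
    let zero_claims: usize = thread::scope(|s| {
        let workers: Vec<_> = (0..4)
            .map(move |_| {
                s.spawn(move || {
                    let mut claims: usize = 0;
                    for _ in 0..1000 {
                        // Only a successful "upgrade" is paired with a "drop".
                        if c.increment_if_not_zero() && c.decrement() {
                            claims += 1;
                        }
                    }
                    claims
                })
            })
            .collect();
        // Drop the initial reference while the workers are running.
        let mine = usize::from(c.decrement());
        mine + workers.into_iter().map(|h| h.join().unwrap()).sum::<usize>()
    });
    assert_eq!(zero_claims, 1);
    println!("zero claimed exactly once");
}
```

A failed increment_if_not_zero deliberately leaves its stray +1 in place instead of undoing it; once the flag is set, the low bits no longer matter. The sketch does not model the memory reclamation requirement mentioned in Edit 2, so it says nothing about whether the technique is sound for Arc/Weak.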
I propose a modified version of the counter described in the paper, which steals only a single high bit (indicating whether the count has reached zero). That fits nicely with the current MAX_REFCOUNT, as the weak count is already effectively using its high bit as a spinlock flag. The algorithm has no additional overhead: every increment and decrement succeeds or fails in a single atomic operation (until the counter reaches zero). This would make the list iteration example O(N) in all cases.

Considerations
The authors of the paper also used a second “help bit” in their formulation, to allow a single decrement operation to “take credit” for bringing the counter to zero in some edge cases. I could not think of a reason why we would need that, so I omitted it. If it turns out that I made a mistake, then 2 bits would be required, and that may or may not be a dealbreaker.

The correct memory orderings still need to be worked out; for now, many of them are set to SeqCst as a proof of concept.

Arc/Weak.

Edit 1:
Edit 2:
Closing the PR: it turns out that the counter cannot be used in this context (it only works if there is also a memory reclamation mechanism, which Arc/Weak does not have).