Reimplement AtomicRefCell with pure atomics #14828

Merged
merged 2 commits into from Jan 3, 2017

Projects

None yet

9 participants

@bholley
Contributor
bholley commented Jan 2, 2017 edited

While reviewing @bzbarsky's patches in [1], I started typing out some lore about how mutable AtomicRefCell borrows are actually cheaper than immutable ones, so we should prefer them where possible. But then I decided that this was a really dumb state of affairs and that we should just fix AtomicRefCell instead, and implement a proper AtomicRef{,Mut}::map while we were at it. So here we are.

This PR adds a from-scratch implementation of AtomicRefCell that aims to be 100% sound, even in unrealistic overflow scenarios. We should probably get this on crates.io eventually, but I want to land it landed in-tree first.

With this implementation, each operation (borrow or release) is one atomic instruction, and all borrow/release pairs (mutable or immutable) take 12 ns on my machine, which is what I'd expect. This is a 50% improvement over the previous implementation in the immutable case.

There may be some places where we could get away with Ordering::Release instead of Ordering::AcqRel, but it didn't seem worth it to try to reason it out.

r? @Manishearth

CC @emilio @SimonSapin @heycam @upsuper

[1] https://bugzilla.mozilla.org/show_bug.cgi?id=1298588


This change is Reviewable

@Manishearth Manishearth was assigned by highfive Jan 2, 2017
@highfive
highfive commented Jan 2, 2017

Heads up! This PR modifies the following files:

  • @emilio: components/style/atomic_refcell.rs
@highfive
highfive commented Jan 2, 2017

warning Warning warning

  • These commits modify unsafe code. Please review it carefully!
@bholley
Contributor
bholley commented Jan 3, 2017
@bors-servo
Contributor

⌛️ Trying commit e62b9af with merge 76b54e6...

@bors-servo bors-servo added a commit that referenced this pull request Jan 3, 2017
@bors-servo bors-servo Auto merge of #14828 - bholley:faster_atomic_refcell, r=<try>
Reimplement AtomicRefCell with pure atomics

While reviewing @bzbarsky's patches in [1], I started typing out some lore about how mutable AtomicRefCell borrows are actually cheaper than immutable ones, so we should prefer them where possible. But then I decided that this was a really dumb state of affairs and that we should just fix AtomicRefCell instead, and implement a proper AtomicRef{,Mut}::map while we were at it. So here we are.

This PR adds a from-scratch implementation of AtomicRefCell that aims to be 100% sound, even in unrealistic overflow scenarios. We should probably get this on crates.io eventually, but I want to land it landed in-tree first.

With this implementation, each operation (borrow or release) is one atomic instruction, and all borrow/release pairs (mutable or immutable) take 12 ns on my machine, which is what I'd expect. This is a 50% improvement over the previous implementation in the mutable case.

There may be some places where we could get away with Ordering::Release instead of Ordering::AcqRel, but it didn't seem worth it to try to reason it out.

r? @Manishearth

CC @emilio @SimonSapin @heycam @upsuper

[1] https://bugzilla.mozilla.org/show_bug.cgi?id=1298588
76b54e6
#![allow(unsafe_code)]
+#![deny(missing_docs)]
@emilio
emilio Jan 3, 2017 Member

Please remove the allow() in components/style/lib.rs.

@bholley
bholley Jan 3, 2017 Contributor

Done.

components/style/atomic_refcell.rs
+ #[inline]
+ pub fn borrow(&self) -> AtomicRef<T> {
+ AtomicRef {
+ value: unsafe { &*self.value.get() },
@emilio
emilio Jan 3, 2017 Member

nit: Indentation is off, here and below.

@bholley
bholley Jan 3, 2017 Contributor

Fixed.

components/style/atomic_refcell.rs
+impl<'b> AtomicBorrowRef<'b> {
+ #[inline]
+ fn new(borrow: &'b AtomicUsize) -> Self {
+ let new = borrow.fetch_add(1, atomic::Ordering::AcqRel) + 1;
@emilio
emilio Jan 3, 2017 Member

Similarly, I'd expect Acquire to be enough here?

@Amanieu
Amanieu Jan 3, 2017

Yes, Acquire is sufficient here. You only need to prevent memory operations after this line from being reordered before acquiring the guard.

@bholley
bholley Jan 3, 2017 Contributor

Fixed.

components/style/atomic_refcell.rs
+impl<'b> Drop for AtomicBorrowRef<'b> {
+ #[inline]
+ fn drop(&mut self) {
+ let old = self.borrow.fetch_sub(1, atomic::Ordering::AcqRel);
@emilio
emilio Jan 3, 2017 Member

Does this need to be AcqRel? I'd expect Release to be enough.

@Manishearth
Manishearth Jan 3, 2017 Member

Yeah, all of the drop impls can use release and all of the ::new() functions can use acquire.

@bholley
bholley Jan 3, 2017 Contributor

Fixed.

components/style/atomic_refcell.rs
+ // Use compare-and-swap to avoid corrupting the immutable borrow count
+ // on illegal mutable borrows.
+ let old = borrow.compare_and_swap(0, HIGH_BIT, atomic::Ordering::AcqRel);
+ assert!(old == 0, "already borrowed");
@emilio
emilio Jan 3, 2017 Member

nit: use assert_eq! to see the value?

@Amanieu
Amanieu Jan 3, 2017

I don't think the value is relevant to what a user would expect. However you could use 2 different panic messages depending on whether the AtomicRefCell is currently mutably or immutably borrowed.

@bholley
bholley Jan 3, 2017 Contributor

I was originally aiming for parity with the messages in RefCell, but I think it's true that more information here can actually make debugging easier. So I've gone ahead and changed this to "already {mutably,immutably} borrowed".

components/style/atomic_refcell.rs
+
+impl<'b, T: ?Sized + Debug + 'b> Debug for AtomicRefMut<'b, T> {
+ fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
+ write!(f, "{:?}", self.value)
@emilio
emilio Jan 3, 2017 Member

self.value.fmt(f) should be enough I think, here and above.

@bholley
bholley Jan 3, 2017 Contributor

Fixed.

+ Bar { f: Foo { u: 42 } }
+ }
+}
+
@emilio
emilio Jan 3, 2017 Member

We should probably test this in parallel conditions? I guess the style code is enough, but still...

@bholley
bholley Jan 3, 2017 Contributor

Would be good yeah, but I don't quite have the cycles to come up with an elegant way to actually exercise the concurrent cases. I'll add a FIXME.

@Manishearth

Looks okay to me. Bit wary about the lack of fences but I discussed it and it's not a problem in this case.

components/style/atomic_refcell.rs
+//!
+//! As such, we re-implement RefCell semantics from scratch with a single atomic
+//! reference count. The primary complication of this scheme relates to keeping
+//! things in a consistent state when one thread performs and illegal borrow and
components/style/atomic_refcell.rs
+//! reference count. The primary complication of this scheme relates to keeping
+//! things in a consistent state when one thread performs and illegal borrow and
+//! panics. Since an AtomicRefCell can be accessed by multiple threads, and since
+//! panics are recoverable, we need to ensure that an illegal (pancking) access by
@Manishearth
Manishearth Jan 3, 2017 Member

*panicking

@bholley
bholley Jan 3, 2017 Contributor

Fixed.

components/style/atomic_refcell.rs
+ //
+ // FIXME(bholley): We can't currently whole-process abort in
+ // stable Rust, but it's coming soon. We should fix this when
+ // https://github.com/rust-lang/rust/issues/37838 hits release.
@Manishearth
Manishearth Jan 3, 2017 Member

If you really want an abort here you can trigger one by creating a struct which panics in its destructor. Double panics in Rust abort by default.

@Ms2ger
Ms2ger Jan 3, 2017 Contributor

std::process::exit()?

@Amanieu
Amanieu Jan 3, 2017

libc::abort generally does the right thing, but requires a dependency on libc.

@bholley
bholley Jan 3, 2017 Contributor

Oh nice, thanks @Ms2ger. I've added a call to ::std::process::exit().

components/style/atomic_refcell.rs
+impl<'b> Drop for AtomicBorrowRef<'b> {
+ #[inline]
+ fn drop(&mut self) {
+ let old = self.borrow.fetch_sub(1, atomic::Ordering::AcqRel);
@Manishearth
Manishearth Jan 3, 2017 Member

Yeah, all of the drop impls can use release and all of the ::new() functions can use acquire.

@nox
Member
nox commented Jan 3, 2017

This has value outside of Servo too, right? I'm wary of fancy types containing lots of unsafe code to live in Servo indefinitely, where less eyes can check for soundness in the long term.

components/style/atomic_refcell.rs
+ #[inline]
+ fn ge(&self, other: &AtomicRefCell<T>) -> bool {
+ *self.borrow() >= *other.borrow()
+ }
@nox
nox Jan 3, 2017 Member

All these methods don't need to be defined, only partial_cmp.

@bholley
bholley Jan 3, 2017 Contributor

Oh ok, I was just cribbing from RefCell.

@SimonSapin

@nox While I agree that more eyes would be good, past experience shows that moving some code to another repo or publishing it to crates.io is absolutely not enough for that to happen. So I think this is moot until we can get other people or projects interested. (Maybe this has its place in Crossbeam or some other umbrella project?)

components/style/atomic_refcell.rs
+ }
+ }
+
+ /// Returns a raw pointer to the underlying data in this cell.
@SimonSapin
SimonSapin Jan 3, 2017 Member

Although that may be "obvious" please add to this doc-comment that the pointer returned by this method must only be used with some external synchronization, since nothing in AtomicRefCell itself will stop other thread from accessing or mutating the data at the same time.

@bholley
bholley Jan 3, 2017 Contributor

Fixed.

components/style/atomic_refcell.rs
+ //
+ // FIXME(bholley): We can't currently whole-process abort in
+ // stable Rust, but it's coming soon. We should fix this when
+ // https://github.com/rust-lang/rust/issues/37838 hits release.
@Ms2ger
Ms2ger Jan 3, 2017 Contributor

std::process::exit()?

- where F: FnOnce(&Curr) -> &New, M: Map<'a, Base, Curr>
+impl<'b, T: ?Sized> AtomicRefMut<'b, T> {
+ /// Make a new `AtomicRefMut` for a component of the borrowed data, e.g. an enum
+ /// variant.
@Ms2ger
Ms2ger Jan 3, 2017 Contributor

I think an enum variant would want an and_then or filter_map function; map seems more useful for struct fields.

@bholley
bholley Jan 3, 2017 Contributor

The comment is just copy-pasted from RefCell.

+//! parking_lot) performs at least two atomic operations during immutable
+//! borrows. This makes mutable borrows significantly cheaper than immutable
+//! borrows, leading to weird incentives when writing performance-critical
+//! code.
@Amanieu
Amanieu Jan 3, 2017

A relaxed load/store isn't generally considered to be an atomic operation since it always translates directly to a normal load/store instruction. A better argument would be that parking_lot uses a compare-exchange for the read path while AtomicRefCell uses an atomic increment, which is slightly cheaper.

@bholley
bholley Jan 3, 2017 Contributor

I don't think that's the issue - immutable borrows are ~twice as a slow as mutable borrows using parking_lot::RwLock. I haven't looked at the source closely, but I'm pretty sure there are twice as many atomic operations happening on read acquires.

components/style/atomic_refcell.rs
+impl<'b> AtomicBorrowRef<'b> {
+ #[inline]
+ fn new(borrow: &'b AtomicUsize) -> Self {
+ let new = borrow.fetch_add(1, atomic::Ordering::AcqRel) + 1;
@Amanieu
Amanieu Jan 3, 2017

Yes, Acquire is sufficient here. You only need to prevent memory operations after this line from being reordered before acquiring the guard.

components/style/atomic_refcell.rs
+ let new = borrow.fetch_add(1, atomic::Ordering::AcqRel) + 1;
+
+ // If the new count has the high bit set, panic. The specifics of how
+ // we panic is interesting for soundness, but irrelevant for real programs.
@Amanieu
Amanieu Jan 3, 2017

It might be worth moving all the panic logic into a separate function and out of the inlined code path.

@bholley
bholley Jan 3, 2017 Contributor

Good idea, I'll do that.

@bholley
bholley Jan 3, 2017 Contributor

Hm, doing this actually costs ~2ns on the benchmarks. Maybe we're predicting the wrong branch and the out-of-line call is costing us?

@Amanieu
Amanieu Jan 3, 2017

Did you mark the out-of-line function with #[cold]? This should tell LLVM that any paths leading to it are unlikely.

components/style/atomic_refcell.rs
+ //
+ // This will never happen in a real program.
+ borrow.fetch_sub(1, atomic::Ordering::AcqRel);
+ panic!("too many immutable borrows");
@Amanieu
Amanieu Jan 3, 2017

Since this will never happen in a real program, it would be better to just abort here, like in the second case.

@bholley
bholley Jan 3, 2017 Contributor

Yeah, but it's slightly more plausible than the other case (you just need 2 billion borrows rather than 2 billion panicked threads on 32-bit). We can handle it so we might as well.

@bholley
bholley Jan 3, 2017 Contributor

(I've updated the comments to explain this better)

@Amanieu
Amanieu Jan 3, 2017

Well, if you're keeping this code then you should change the decrement to Relaxed instead of AcqRel.

@bholley
bholley Jan 3, 2017 Contributor

Yep, done.

components/style/atomic_refcell.rs
+ //
+ // FIXME(bholley): We can't currently whole-process abort in
+ // stable Rust, but it's coming soon. We should fix this when
+ // https://github.com/rust-lang/rust/issues/37838 hits release.
@Amanieu
Amanieu Jan 3, 2017

libc::abort generally does the right thing, but requires a dependency on libc.

components/style/atomic_refcell.rs
+impl<'b> Drop for AtomicBorrowRefMut<'b> {
+ #[inline]
+ fn drop(&mut self) {
+ let old = self.borrow.swap(0, atomic::Ordering::AcqRel);
@Amanieu
Amanieu Jan 3, 2017

Only Release ordering is needed here.

@Amanieu
Amanieu Jan 3, 2017

You should use store instead of swap since the former will only generate a single store instruction instead of an atomic xchg instruction.

@bholley
bholley Jan 3, 2017 Contributor

Good point.

components/style/atomic_refcell.rs
+ fn new(borrow: &'b AtomicUsize) -> AtomicBorrowRefMut<'b> {
+ // Use compare-and-swap to avoid corrupting the immutable borrow count
+ // on illegal mutable borrows.
+ let old = borrow.compare_and_swap(0, HIGH_BIT, atomic::Ordering::AcqRel);
@Amanieu
Amanieu Jan 3, 2017

Only Acquire ordering is needed here.

@Amanieu
Amanieu Jan 3, 2017

Additionally, you should use the compare_exchange_strong method, which allows you to specify a different memory ordering on failure (Relaxed is enough in this case).

@bholley
bholley Jan 3, 2017 Contributor

Nice idea.

components/style/atomic_refcell.rs
+ // Use compare-and-swap to avoid corrupting the immutable borrow count
+ // on illegal mutable borrows.
+ let old = borrow.compare_and_swap(0, HIGH_BIT, atomic::Ordering::AcqRel);
+ assert!(old == 0, "already borrowed");
@Amanieu
Amanieu Jan 3, 2017

I don't think the value is relevant to what a user would expect. However you could use 2 different panic messages depending on whether the AtomicRefCell is currently mutably or immutably borrowed.

@Amanieu
Amanieu commented Jan 3, 2017

Note that this still doesn't solve the initial problem, which is that immutable borrows are slower than mutable borrows:

  • Mutable borrow: 1 compare-exchange (the store at the end doesn't count as an atomic operation)
  • Immutable borrow: 1 atomic increment, 1 atomic decrement

The "proper" solution is to use something like RCU, which basically consists of an AtomicPtr<T> that is atomically replaced by a writers. The main advantage is that readers don't need any synchronization to access the data (multiple writers still need to be synchronized using a mutex).

The main downside of RCU is that you need some form of garbage collection to free any Ts once we are sure that no readers are using them any more. crossbeam has an implementation of epoch-based reclamation, but I haven't looked into its performance in detail.

@SimonSapin
Member

an AtomicPtr<T> that is atomically replaced by a writers.

That doesn’t give access to &mut T, does it? Creating a whole new T and throwing away the previous one is sometimes much more costly than mutating.

@bholley
Contributor
bholley commented Jan 3, 2017

Yeah, something that requires a GC is a pretty different use-case.

Also, @Amanieu, I don't quite follow your argument about the performance issue. The motivating issue here is that parking_lot::RwLock does more than two atomic operations for an immutable require/release pair. This implementation does a single fetch_add/fetch_sub for an immutable acquire/release, and single CAS/swap for a mutable acquire/release.

As noted above, both pairs cost 12ns on my machines per the benchmarks in these tests.

@bholley bholley Reimplement AtomicRefCell as a fork of RefCell.
c26b4be
@bholley
Contributor
bholley commented Jan 3, 2017

Switching swap() to store() and compare_swap to compare_exchange() shaved a few ns off the mutable borrow, so now it's down to ~10ns.

@bholley
Contributor
bholley commented Jan 3, 2017

@bors-servo r=Manishearth

@bors-servo
Contributor

📌 Commit c26b4be has been approved by Manishearth

@bors-servo
Contributor

⌛️ Testing commit c26b4be with merge 57b2c26...

@bors-servo bors-servo added a commit that referenced this pull request Jan 3, 2017
@bors-servo bors-servo Auto merge of #14828 - bholley:faster_atomic_refcell, r=Manishearth
Reimplement AtomicRefCell with pure atomics

While reviewing @bzbarsky's patches in [1], I started typing out some lore about how mutable AtomicRefCell borrows are actually cheaper than immutable ones, so we should prefer them where possible. But then I decided that this was a really dumb state of affairs and that we should just fix AtomicRefCell instead, and implement a proper AtomicRef{,Mut}::map while we were at it. So here we are.

This PR adds a from-scratch implementation of AtomicRefCell that aims to be 100% sound, even in unrealistic overflow scenarios. We should probably get this on crates.io eventually, but I want to land it landed in-tree first.

With this implementation, each operation (borrow or release) is one atomic instruction, and all borrow/release pairs (mutable or immutable) take 12 ns on my machine, which is what I'd expect. This is a 50% improvement over the previous implementation in the immutable case.

There may be some places where we could get away with Ordering::Release instead of Ordering::AcqRel, but it didn't seem worth it to try to reason it out.

r? @Manishearth

CC @emilio @SimonSapin @heycam @upsuper

[1] https://bugzilla.mozilla.org/show_bug.cgi?id=1298588

<!-- Reviewable:start -->
---
This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/servo/servo/14828)
<!-- Reviewable:end -->
57b2c26
@bors-servo bors-servo merged commit c26b4be into servo:master Jan 3, 2017

2 of 3 checks passed

continuous-integration/travis-ci/pr The Travis CI build failed
Details
continuous-integration/appveyor/pr AppVeyor build succeeded
Details
homu Test successful
Details
@bholley
Contributor
bholley commented Jan 3, 2017

@Amanieu Hm. Oddly, when breaking out atomic_refcell into its own crate, the benchmark numbers seem better and more stable, and I don't see the difference with the inlining anymore. I've gone ahead and broken the panic section out, and marked it with [cold] and [inline(never)].

The time for an immutable borrow/release now holds steady at 12ns. The time for an immutable borrow/release seems to have gone down to 6ns, probably from the switch from swap to store and compare_swap to compare_exchange. This puts us right back where we started (with mutable borrows being twice as fast as immutable ones), but with everything being twice as fast as it was before. Oh well. :-)

@bholley
Contributor
bholley commented Jan 3, 2017

Hoisted this to crates.io, switching servo to use that in #14835.

@Manishearth
Member

I don't see what's wrong with mutable borrows being twice as fast, the point is that immutable borrows can still coincide so on the whole it's still faster 😄

@bholley
Contributor
bholley commented Jan 3, 2017

The point is that, at callsites where we know we have single-threaded access to the AtomicRefCell (i.e. most of the stuff in glue.rs that deals with PerDocumentStyleData or ElementData), there are conflicting incentives as to whether to .borrow() or .borrow_mut() when mutation is not required. The latter is faster, but the former is more idiomatic and won't cause trouble if some helper function also tries to borrow the same thing.

@Manishearth
Member

Ah, I see.

@bors-servo bors-servo added a commit that referenced this pull request Jan 3, 2017
@bors-servo bors-servo Auto merge of #14835 - bholley:external_atomic_refcell, r=Manishearth
Switch to crates.io for atomic_refcell

r? @Manishearth

See #14828 for backstory.

<!-- Reviewable:start -->
---
This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/servo/servo/14835)
<!-- Reviewable:end -->
ac6bfc1
@bors-servo bors-servo added a commit that referenced this pull request Jan 4, 2017
@bors-servo bors-servo Auto merge of #14835 - bholley:external_atomic_refcell, r=Manishearth
Switch to crates.io for atomic_refcell

r? @Manishearth

See #14828 for backstory.

<!-- Reviewable:start -->
---
This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/servo/servo/14835)
<!-- Reviewable:end -->
6f3876c
@bors-servo bors-servo added a commit that referenced this pull request Jan 4, 2017
@bors-servo bors-servo Auto merge of #14835 - bholley:external_atomic_refcell, r=Manishearth
Switch to crates.io for atomic_refcell

r? @Manishearth

See #14828 for backstory.

<!-- Reviewable:start -->
---
This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/servo/servo/14835)
<!-- Reviewable:end -->
6e1aeaa
@bors-servo bors-servo added a commit that referenced this pull request Jan 4, 2017
@bors-servo bors-servo Auto merge of #14835 - bholley:external_atomic_refcell, r=Manishearth
Switch to crates.io for atomic_refcell

r? @Manishearth

See #14828 for backstory.

<!-- Reviewable:start -->
---
This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/servo/servo/14835)
<!-- Reviewable:end -->
1e927ca
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment