
Use a load rather than a fence when dropping the contents of an Arc. #41714

Closed
wants to merge 2 commits

Conversation

@bholley (Contributor) commented May 2, 2017

This is what Gecko does [1]. If I understand correctly, an Acquire load on
an address just needs to synchronize with other Release operations on that
specific address, whereas an Acquire fence needs to synchronize globally
with all other Release operations in the program.

[1] http://searchfox.org/mozilla-central/rev/ae8c2e2354db652950fe0ec16983360c21857f2a/xpcom/base/nsISupportsImpl.h#337
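The change under discussion can be sketched roughly as follows. This is a hypothetical simplification (names like `ArcInner` and `drop_strong` stand in for the real liballoc internals), not the actual patch:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Simplified stand-in for Arc's heap allocation.
struct ArcInner {
    strong: AtomicUsize,
}

// Returns true if the caller performed the final drop and may now
// run the destructor and free the allocation.
fn drop_strong(inner: &ArcInner) -> bool {
    // The Release decrement publishes all writes made through this
    // Arc to whichever thread ends up performing the final drop.
    if inner.strong.fetch_sub(1, Ordering::Release) != 1 {
        return false; // other strong references remain
    }
    // Before (libstd at the time): atomic::fence(Ordering::Acquire);
    // After (this PR): an Acquire load on the same address, which only
    // needs to synchronize with Release operations on `strong` rather
    // than acting as a global fence.
    inner.strong.load(Ordering::Acquire);
    true
}
```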

@rust-highfive (Collaborator)

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @alexcrichton (or someone else) soon.

If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes.

Please see the contribution instructions for more information.

@bholley (Contributor, Author) commented May 2, 2017

It's entirely possible that I have this wrong somehow, but if so I'd love to understand why.

r? @aturon

@rust-highfive rust-highfive assigned aturon and unassigned alexcrichton May 2, 2017
@bholley (Contributor, Author) commented May 2, 2017

Also added a long comment explaining why we need the Acquire/Release handshake.

CC @eddyb @golddranks @Manishearth @emilio @SimonSapin

@alexcrichton (Member)

Note that there's another fence for the weak count which may matter here as well? Could you benchmark to see what the difference is?

@bholley (Contributor, Author) commented May 2, 2017

Note that there's another fence for the weak count which may matter here as well?

Yeah, that should probably be a load as well I think.

Could you benchmark to see what the difference is?

I think a good benchmark will be hard to construct here, because it depends entirely on the cache line breakdown and what the other CPUs happen to be doing at the same time. This patch just gives the CPU more degrees of freedom in contended situations. If there's no contention, I'd expect there to be no difference.

@aturon (Member) commented May 2, 2017

@bholley I'm curious, though, if this change is motivated by some direct performance problems or benchmarking you've done? I ask just because this kind of change will entail some pretty intensive reviewing, and the code right now at least has solid lineage back through clang, IIRC.

@bholley (Contributor, Author) commented May 2, 2017

Nope - I was just forking Arc for other reasons and came across this.

If the answer is that we think the risk/reward here isn't worth it for now, that's fine. We should at least check in the comment though.

@golddranks (Contributor)

I don't think you meant to tag me? (Which means someone may be missing?)

@bholley (Contributor, Author) commented May 3, 2017

Yes, I meant @gankro, sorry.

@@ -767,8 +767,39 @@ unsafe impl<#[may_dangle] T: ?Sized> Drop for Arc<T> {
// > through this reference must obviously happened before), and an
// > "acquire" operation before deleting the object.
//
// Note that Rust is not C++, and so it's valid to ask whether we could
// do without the Acquire/Release handshake here. The contents of an Arc
// are immutable, and thus we could hypothetically assume that there are
Member (review comment on the diff):

The contents aren't immutable, due to the potential for interior mutability (through UnsafeCell or the atomics you mention below). When thinking about this fence in the past, I'd always considered it as ensuring that any outstanding interior non-Relaxed writes are visible prior to the destructor running.

I do agree that allocators in general are virtually guaranteed to have enough fencing to at least avoid relaxed writes landing after the memory is transferred to another thread. But I worry about people relying on ordering with respect to the destructor code running.

@bholley (Contributor, Author) replied:

When thinking about this fence in the past, I'd always considered it as ensuring that
any outstanding interior non-Relaxed writes are visible prior to the destructor running.

Oh I see. I'd been thinking of such writes as being either atomic (in which case they're governed by their own memory ordering) or protected by something like a Mutex, which has its own synchronization. But I realize now that the destructor doesn't actually acquire the mutex before dropping the contents, and so we can't rely on its internal synchronization to flush the writes.
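The hazard being discussed can be illustrated with a small sketch (the `Logger` type here is hypothetical, invented for illustration): a `Drop` impl has `&mut self` and therefore reads interior-mutable state without taking the lock, so visibility of other threads' writes rests entirely on Arc's Release decrement / Acquire handshake, not on the mutex.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

struct Logger {
    buf: Mutex<Vec<u8>>,
}

impl Drop for Logger {
    fn drop(&mut self) {
        // `&mut self` proves unique ownership, so Drop reads the buffer
        // WITHOUT locking the mutex (get_mut bypasses it). Seeing the
        // other thread's write here depends on Arc's drop handshake.
        let buf = self.buf.get_mut().unwrap();
        assert!(buf.len() <= 1);
    }
}

fn demo() -> usize {
    let a = Arc::new(Logger { buf: Mutex::new(Vec::new()) });
    let b = Arc::clone(&a);
    let t = thread::spawn(move || {
        b.buf.lock().unwrap().push(1); // interior write under the lock
        // dropping `b` here performs a Release decrement of the count
    });
    t.join().unwrap();
    let n = a.buf.lock().unwrap().len();
    n // final drop of `a` then runs Logger::drop on this thread
}
```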

@bholley (Contributor, Author) commented May 3, 2017

Let me know what you want to do about the fence. If it's not worth the risk of fixing (at least right now), I can clean up the comment to take the above discussion into account and land it separately.

In general though, I do think it's worth fixing these kinds of potential performance issues when we find them. An overly conservative fence in Arc is exactly the sort of thing that can cause bad performance under high system load, but would never be discovered by profiling because (a) it's hard to reproduce those conditions reliably and (b) there are very few people, even in the Rust community, who are comfortable reasoning about memory orderings.

@nagisa (Member) commented May 3, 2017

cc @Amanieu

@aturon (Member) commented May 3, 2017

@bholley

I agree that it's worth exploring these kinds of fixes, but they are relatively high risk and cost, so I'd like to have some evidence that they make a difference (beyond purely hypothetical performance reasoning).

That's a standard we've been trying to apply across std, since we receive a lot of PRs with "implementation speedups" that don't come with any benchmarks, and make the code more complex.

In short, my feeling is that, absent concrete evidence of a performance improvement here, I'd prefer not to pursue it at this time.

(Thanks, though, for the PR!)

@Amanieu (Member) commented May 3, 2017

I think the comments about the allocator are misleading. The purpose of the fence is to ensure that any operations performed by other threads on the shared data happen before the Drop impl of the shared data.

I don't think there is much performance difference between an acquire fence and an acquire load:

  • ARM:
    • Acquire fence: dmb ishld (ARMv8) dmb ish (ARMv7)
    • Acquire load: ldaex (ARMv8) ldr; dmb ish (ARMv7)
  • AArch64:
    • Acquire fence: dmb ishld
    • Acquire load: ldar
  • x86/x86-64:
    • Acquire fence: (none)
    • Acquire load: mov

So basically, with a fence you avoid an extra load. Also, as far as I know, it is perfectly valid to have a fence synchronize with an atomic operation.
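The fence-synchronizes-with-an-atomic point can be sketched with the standard message-passing pattern (a minimal illustration, not code from this PR): an Acquire fence placed after a Relaxed load synchronizes with the Release store that the load observed, just as an Acquire load would.

```rust
use std::sync::atomic::{fence, AtomicBool, AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

fn demo() -> usize {
    let data = Arc::new(AtomicUsize::new(0));
    let ready = Arc::new(AtomicBool::new(false));
    let (d, r) = (Arc::clone(&data), Arc::clone(&ready));

    let producer = thread::spawn(move || {
        d.store(42, Ordering::Relaxed);   // payload write
        r.store(true, Ordering::Release); // publish
    });

    // Consumer: spin with Relaxed loads, then issue an Acquire fence.
    while !ready.load(Ordering::Relaxed) {
        std::hint::spin_loop();
    }
    // The fence pairs with the Release store observed above, making the
    // payload write visible even though the load itself was Relaxed.
    fence(Ordering::Acquire);
    producer.join().unwrap();
    data.load(Ordering::Relaxed) // guaranteed to see 42
}
```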

@aidanhs aidanhs added the S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. label May 3, 2017
@aturon (Member) commented May 3, 2017

I've approved the pure-docs PR, and am going to close this one. Thanks again @bholley!

@aturon aturon closed this May 3, 2017
frewsxcv added a commit to frewsxcv/rust that referenced this pull request May 3, 2017
Document the reasoning for the Acquire/Release handshake when dropping Arcs.

Split out from rust-lang#41714. r? @aturon
@bholley (Contributor, Author) commented May 5, 2017

@Amanieu interesting! I'd always thought that acquire/release fences imposed a greater restriction on the compiler and CPU than acquire/release loads/stores, since the latter are only enforced when the two threads are operating on the same memory address. So, for example, a CPU would need to synchronously flush the cache lines if core X released mutex A and then core Y acquired mutex A, but would not need to do so if core Y acquired mutex B instead.

Do I have that wrong, or does the C++11 memory model allow for a wider range of optimization than current compilers and CPUs actually use?

@Gankra (Contributor) commented May 5, 2017

The C11 model is engineered to support things like the DEC Alpha, so yeah modern hardware doesn't take full advantage of it. Making sloppy code work and fast is kinda the job of hardware devs.

@nagisa (Member) commented May 5, 2017

Making sloppy code work and fast is kinda the job of hardware devs.

Cough… compiler devs come before that.

@mzabaluev (Contributor) commented Jun 29, 2019

Comment withdrawn, I misunderstood the spec.
