Insufficient synchronization in servo_arc can lead to data race #21186

Closed
RalfJung opened this issue Jul 16, 2018 · 20 comments

@RalfJung RalfJung commented Jul 16, 2018

This is Servo's version of rust-lang/rust#51780, after @Manishearth pointed out there is a copy here: Arc::is_unique only performs a relaxed read, which is not enough to establish a happens-before relation between anything that happened prior to dropping the second-to-last Arc, and returning from get_mut.

So, here's an example of safe code exhibiting a read-write race:

// Assumes `Arc` is servo_arc::Arc and `join` runs both closures in
// parallel (e.g. rayon::join).
fn main() {
    let a1 = Arc::new(0);
    let mut a2 = a1.clone();
    join(
        || {
            let _ : u32 = *a1; // non-atomic read
            drop(a1); // release write
        },
        || {
            loop {
                match Arc::get_mut(&mut a2) { // relaxed read
                    None => {}
                    Some(m) => {
                        *m = 1u32; // non-atomic write
                        return;
                    }
                }
            }
        }
    );
}

A relaxed read does not induce a happens-before relationship, so there is no happens-before between the non-atomic read and the non-atomic write. This is a data race according to C11, and hence undefined behavior in Rust.

The fix is to use Acquire in is_unique.

Cc @Manishearth @jhjourdan

@SimonSapin SimonSapin commented Jul 16, 2018

Thanks for the report @RalfJung.

Firefox has another copy of this code. CC @emilio

@RalfJung RalfJung commented Jul 16, 2018

Firefox has another copy of this code.

Is that this? Is that not auto-synced with the Servo repo?
https://searchfox.org/mozilla-central/source/servo/components/servo_arc/lib.rs#351

@jdm jdm added the I-safety label Jul 16, 2018
@jdm jdm commented Jul 16, 2018

It is not auto-synced, no.

@Manishearth Manishearth commented Jul 16, 2018

cc @bholley

the synchronization is documented in is_unique, but perhaps it's flawed https://github.com/servo/servo/blob/master/components/servo_arc/lib.rs#L352-L359

@bholley bholley commented Jul 16, 2018

Hm. I think I see the issue here, though I'm less sure whether the proposed Acquire actually fixes it.

If we switch thread B to load the refcount with Acquire, and it reads refcount 1, that ensures that any writes made by thread A before it dropped the refcount are observable by thread B. But thread A isn't doing any writing at all, and we're actually concerned about reads on thread A of writes by thread B.

My understanding of the C++ memory model is that an acquire on thread B does not impact reads on thread A. Do I have that wrong?

@Amanieu Amanieu commented Jul 16, 2018

@bholley

The problem here is that, according to the C++ memory model, writes made by thread B may be reordered before the load of the ref count. This would cause a data race since thread A may still be reading the data when the write occurs.

Now, in practice, there is a control dependency (scroll down to the section on control dependencies) between the ref count load and later writes (but not reads). This happens because, even if the CPU predicts a branch as taken, it will not make writes visible to other cores unless it is sure that the store instruction is really executed.

However, the C++ memory model doesn't have a concept of control dependencies, so you really are relying on the compiler to not perform too much magic on your control flow so that the dependency is preserved.

@RalfJung RalfJung commented Jul 16, 2018

To add to what @Amanieu said: The comment in the code here argues in terms of reorderings. The argument is correct (as far as reads of the other thread are concerned, at least), but unfortunately it's the wrong argument -- what we really have to ensure is getting a happens-before edge between reads in thread A (the one that dropped) and writes in thread B (the one that did get_mut). That's how a data race is defined. Relaxed will not add a happens-before edge, so we have a data race.

Now, another question is whether this should be a data race. Maybe the standard should be changed to make this legal code, because maybe there is no optimization that relies on this. The fact that the second operation is a write and we have a control dependency makes it very hard for me to conjure an example for a misoptimization -- I have nothing reasonable to offer.

Things become even more interesting when you look at the LLVM model. It defines data races like C does, but says that a read-write race is not UB -- instead, the read returns undef. However, in this case, there is no reasonable causal way to define the read in thread A to return undef -- there's no way for this read to "know" that it will race with a write later! Only the write can "detect" that it is in conflict with a read it was not synchronized with. To me, that's a strong argument that this should not be a race, but both C11 and LLVM currently clearly define it as such (in my reading).


There's another fact that the comment neglects: Thread A could have been writing! For example (this is another example from rust-lang/rust#51780), consider the following code:

// Assumes `Arc` is servo_arc::Arc and `join` runs both closures in parallel.
fn main() {
    let a1 = Arc::new(Mutex::new(0));
    let mut a2 = a1.clone();
    join(
        || {
            { let mut guard = a1.lock().unwrap();
              *guard = 1; // non-atomic write to data
              drop(guard); } // release write to lock flag
            drop(a1); // release write to refcount
        },
        || {
            loop {
                match Arc::get_mut(&mut a2) { // relaxed read from refcount
                    None => (),
                    Some(m) =>
                    { let _v = *m.get_mut().unwrap(); // non-atomic read from data
                      return; }
                }
            }
        }
    );
}

Now, we have a read in thread B racing with a write in thread A. The write has not been made properly visible to thread B. This time, it seems reasonable to make this a race in LLVM's model because the read, happening later, has a chance of changing its behavior based on the unsynchronized write that previously happened in another thread, and hence the read will return undef.

It's also easier to imagine a miscompilation: For example, if LLVM realizes that a2 is a dereferenceable pointer and get_mut performs only reads (no writes), it is allowed to do the read for _v speculatively before get_mut.

@bholley bholley commented Jul 17, 2018

@Amanieu @RalfJung

I understand the potential race here, and that the comment in the code is wrong. My question is whether an Acquire load on B fixes the problem. My mental model is that an Acquire load in B ensures that B observes any writes made by A that occurred before A stored the value (via Release). But IIUC it technically has no impact on what A observes. Do I have that wrong?

Regarding @RalfJung's second point - if there's a mutex involved for internal mutability, then the mutex has its own atomic synchronization, and the Arc refcounts are irrelevant, right?

@Amanieu Amanieu commented Jul 17, 2018

@bholley Yes, an acquire load in get_mut will fix this problem.

@bholley bholley commented Jul 17, 2018

I believe you, but I'm trying to understand why. Can you explain where my mental model differs from the C++ memory model?

@RalfJung RalfJung commented Jul 17, 2018

The relevant part in the C++ model is getting a happens-before relationship between any two accesses (if at least one is non-atomic and at least one is a write) to the same location. It doesn't matter which of the two accesses is a load or a store.

An acquire load in B ensures that any subsequent event in B is "logically ordered after" (in the happens-before relation) any event in A that happened before the release write in A. It doesn't matter whether those events are loads or stores. So, it's not about what A observes. It's, in some funny sense, about B "observing" A's read (though I prefer to think about it in terms of X being ordered after Y, not about X observing Y).

Does that answer the question? (I am guessing a bit here which point exactly I should focus on.)

@bholley bholley commented Jul 17, 2018

Yes, that's helpful.

So to summarize, my mental model was based on the note in the spec that says:

Informally, performing a release operation on A forces prior side effects on other memory
locations to become visible to other threads that later perform a consume or an acquire
operation on A.

But that's actually imprecise in an important way here. If I slog through the prose in the relevant sections, it's all defined in terms of happens-before, and doesn't specify who reads and who writes:

An atomic operation A that performs a release operation on an atomic object M synchronizes
with an atomic operation B that performs an acquire operation on M and takes its value from
any side effect in the release sequence headed by A.

And then:

An evaluation A inter-thread happens before an evaluation B if A synchronizes with B...

And then:

An evaluation A happens before an evaluation B if: A is sequenced before B, or A
inter-thread happens before B

And then:

The execution of a program contains a data race if it contains two conflicting actions
in different threads, at least one of which is not atomic, and neither happens before the other.

Where the definition of "conflict" is:

Two expression evaluations conflict if one of them modifies a memory location
and the other one accesses or modifies the same memory location

So all of that applies equally to (read in A, write in B) as it does to (write in A, read in B). The happens-before relationship is always in the order of Release-then-Acquire, which generally means store-then-load on the atomic. But the non-atomic operations that get entrained into the transitive happens-before chain can be either loads or stores. Do I have that right?

Also, @RalfJung, am I correct about the Mutex in my comment above?

@bholley bholley commented Jul 17, 2018

Filed [1] to fix this for the m-c copy.

[1] https://bugzilla.mozilla.org/show_bug.cgi?id=1476445

@bholley bholley commented Jul 17, 2018

(And @emilio will presumably want to cherry-pick that one)

@emilio emilio commented Jul 17, 2018

Yeah, I'll sync that over on the next sync.

@RalfJung RalfJung commented Jul 18, 2018

@bholley

So all of that applies equally to (read in A, write in B) as it does to (write in A, read in B). The happens-before relationship is always in the order of Release-then-Acquire, which generally means store-then-load on the atomic. But the non-atomic operations that get entrained into the transitive happens-before chain can be either loads or stores. Do I have that right?

Yes.

One (strange?) way to see this is to consider that reads, in some sense, also have a side-effect. (The first paragraph you quoted talks about "making side effects visible"; it does not mention writes specifically.) They don't change memory, but they depend on memory. Also, due to per-location coherence, a read can change the thread-local state: once some event is observed in a read, it is impossible for that thread to read older values later (that's the coherence constraint on the location's modification order).


if there's a mutex involved for internal mutability, then the mutex has its own atomic synchronization, and the Arc refcounts are irrelevant, right?

No. The Mutex does have synchronization, but if you look at my example, you can see that it does not help. Thread A releases the Mutex with a release write, but thread B "acquires" the Mutex through get_mut, which does not do any synchronization. The "release write to lock flag" is never loaded from, so it never induces a happens-before edge. (And, in fact, we could mem::forget the Mutex in thread A to release it without any synchronizing write.) Mutex relies on the rest of the ecosystem "getting &mut right", so when you have &mut Mutex<T>, you are already synchronized. Arc, however, breaks that promise, leading to a "more traditional" synchronization violation, where thread A writes and thread B reads but the write has not been made properly visible.

@bholley bholley commented Jul 18, 2018

Ah, right - I'd forgotten that get_mut relies on &mut and doesn't do anything similar to is_unique.

That makes sense - thanks for taking the time to help me debug my mental model!

@sweihub sweihub commented Jun 1, 2020

Arc, however, breaks that promise, leading to a "more traditional" synchronization violation...

This data race issue persists in the latest nightly build. Is the Arc data race by design? If so, should it be documented somewhere?

@SimonSapin SimonSapin commented Jun 1, 2020

@weisunding This is a two year old bug marked as fixed by a code change. If there is still a related issue today, could you please file a new bug with a more detailed explanation? I don’t fully understand the message you’re quoting…

@RalfJung RalfJung commented Jun 1, 2020

@SimonSapin they might be referring to #26358: the issue is fixed in git but not on crates.io.
