
Atomics and multi core support #5

Closed · japaric opened this issue Oct 31, 2017 · 19 comments

@japaric (Member) commented Oct 31, 2017

The current implementation of ring_buffer::{Consumer,Producer} only works on single core systems because it's missing memory barriers / fences. The proper way to add those barriers is to change the types of the head and tail fields in RingBuffer from usize to AtomicUsize.

We are not doing that right now because that change would make us drop support for ARMv6-M, which is one of the main use cases (single core microcontrollers). If rust-lang/rust#45085 were implemented, we could make the change to AtomicUsize while still supporting ARMv6-M.

I think the options are: (a) wait for rust-lang/rust#45085, or (b) provide a Cargo feature that switches the implementation to atomics to enable multi-core support. Option (b) seems error prone: you could forget to enable the feature. The alternative is to make multi-core support a default feature, but default features are hard, and sometimes impossible, to disable, which would hinder ARMv6-M support.

@pftbest commented Oct 31, 2017

provide a Cargo feature that switches the implementation to atomics

Or you can just do #[cfg(target_has_atomic = "ptr")] instead of a cargo feature.
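
For illustration, the gate could look something like this (a sketch, not the heapless source; note that cfg(target_has_atomic) was still nightly-only at the time):

#[cfg(target_has_atomic = "ptr")]
use core::sync::atomic::{AtomicUsize, Ordering};
#[cfg(not(target_has_atomic = "ptr"))]
use core::cell::UnsafeCell;

// On targets with pointer-sized atomics, use a real AtomicUsize.
#[cfg(target_has_atomic = "ptr")]
pub struct Index(AtomicUsize);

#[cfg(target_has_atomic = "ptr")]
impl Index {
    pub fn load(&self) -> usize {
        self.0.load(Ordering::Acquire)
    }
    pub fn store(&self, val: usize) {
        self.0.store(val, Ordering::Release)
    }
}

// On targets without them (e.g. ARMv6-M), fall back to a plain usize.
// No fences here, so this variant is only sound on single core systems.
#[cfg(not(target_has_atomic = "ptr"))]
pub struct Index(UnsafeCell<usize>);

#[cfg(not(target_has_atomic = "ptr"))]
impl Index {
    pub fn load(&self) -> usize {
        unsafe { *self.0.get() }
    }
    pub fn store(&self, val: usize) {
        unsafe { *self.0.get() = val }
    }
}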

@japaric (Member, Author) commented Oct 31, 2017

That's a very good idea. We still can't claim full multi-core support with that implementation, though, because multi-core Cortex-M microcontrollers are a thing: a Cortex-M0 + Cortex-Mx combination probably already exists out there, and the cfg implementation would use a plain usize on the Cortex-M0 core. Nonetheless, a cfg implementation is as good as we can do for now.

japaric added a commit that referenced this issue Oct 31, 2017
@pftbest commented Oct 31, 2017

I've looked again at your implementation, and it looks like it's not correct even for single core CPUs.

There is no guaranteed ordering between these two operations:

            unsafe { ptr::write(buffer.get_unchecked_mut(rb.tail), item) }
            rb.tail = next_tail;

So the compiler has the right to generate this code instead:

            rb.tail = next_tail;
            unsafe { ptr::write(buffer.get_unchecked_mut(rb.tail), item) }

To prevent that you need to insert a compiler fence between these two operations, so they won't be reordered. Something like the sketch below should be sufficient, I think. (But I would have also made rb.tail = next_tail; a volatile operation, because there might be some other reordering lurking around.)
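
A sketch of the idea (hypothetical function name, not the exact heapless code; compiler_fence from core::sync::atomic emits no machine instruction, it only constrains the compiler):

use core::ptr;
use core::sync::atomic::{compiler_fence, Ordering};

fn enqueue_sketch<T>(buffer: &mut [T], tail: &mut usize, next_tail: usize, item: T) {
    // 1. write the element into the slot
    unsafe { ptr::write(buffer.get_unchecked_mut(*tail), item) }
    // 2. compiler-only barrier: the element write may not be sunk below
    //    this point, and the index update may not be hoisted above it
    compiler_fence(Ordering::Release);
    // 3. only now make the slot visible by bumping the index
    *tail = next_tail;
}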

@japaric (Member, Author) commented Oct 31, 2017

Hmm, I don't think the compiler can do that kind of reordering, since it changes the value passed to get_unchecked_mut; at least not in this context, where a &mut rb.tail (i.e. no aliasing) implicitly exists due to the rb.tail = assignment.

japaric pushed a commit that referenced this issue Oct 31, 2017: "use atomics where available" (cc #5, cc @pftbest)
@pftbest commented Oct 31, 2017

Oh, and a similar thing happens with reads:

            let item = unsafe { ptr::read(buffer.get_unchecked(*head)) };
            *head = (*head + 1) % n;

It might increment the head first and then do the read. Atomic operations guarantee ordering, but volatile (and regular) operations do not.

since it changes the value passed to get_unchecked_mut

I think this doesn't matter. It can save the old value in a register, do the assignment, and only then do the write to the location from the register.

But you are right, my example code is not valid; it should have looked like this:

            eax = rb.tail;
            rb.tail = next_tail;
            unsafe { ptr::write(buffer.get_unchecked_mut(eax), item) }

which is what I meant by my original comment.

japaric pushed a commit that referenced this issue Oct 31, 2017: "use atomics where available" (cc #5, cc @pftbest)
@japaric (Member, Author) commented Oct 31, 2017

I don't follow. Basically you are saying that the compiler can do this kind of reordering (swap A and B) in this code as well:

fn foo(slice: &[u32], head: &mut usize) {
    let n = slice.len();
    let item = slice[*head]; // A
    *head = (*head + 1) % n;  // B
}

(Which is equivalent to the last code you quoted.)

But that doesn't occur in practice; if it did, a bunch of safe Rust would break. Adding an unsafe block doesn't make this reordering possible, AFAIK.

@pftbest commented Oct 31, 2017

Why not? I can just swap them, like this:

fn foo(slice: &[u32], head: &mut usize) {
    let n = slice.len();
    let tmp = *head;
    *head = (*head + 1) % n;  // B
    let item = slice[tmp]; // A
}

Oh, I see: it can panic, so someone who catches the panic can observe that head changed. But `get_unchecked_mut` doesn't panic.

japaric pushed a commit that referenced this issue Oct 31, 2017: "use atomics where available" (cc #5, cc @pftbest)
@pftbest commented Oct 31, 2017

But I can still get around the panic like this:

fn foo(slice: &[u32], head: &mut usize) {
    let n = slice.len();
    let tmp = *head;
    if tmp >= n {
        panic!("...");
    }
    *head = (*head + 1) % n;  // B
    let item = slice[tmp]; // A
}

Now they should be identical.

@japaric (Member, Author) commented Oct 31, 2017

OK, I saw the modified read example and I see what you mean now. Sorry, I didn't realize earlier -- I'm already tired. Indeed, if head gets updated before the data load, then an interrupt between the two ops can overwrite the memory slot; I expect something similar can happen in the write case. A compiler barrier should do the trick, I think. I'm off for today; I'll probably review this again in two days.
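
For the read case, the fix might look like this (again a sketch with made-up names, not the actual heapless code):

use core::ptr;
use core::sync::atomic::{compiler_fence, Ordering};

fn dequeue_sketch<T>(buffer: &[T], head: &mut usize, n: usize) -> T {
    // 1. read the element out of the slot
    let item = unsafe { ptr::read(buffer.get_unchecked(*head)) };
    // 2. keep the compiler from publishing the new head before the read
    //    above has happened; otherwise an interrupt firing in between
    //    could overwrite the slot while we are still using it
    compiler_fence(Ordering::Release);
    // 3. only now free the slot
    *head = (*head + 1) % n;
    item
}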

@pftbest commented Oct 31, 2017

then an interrupt between the two ops can overwrite the memory slot

Yes, exactly. Sorry if I was unclear. I've been burned a lot by writing similar code in C.

japaric pushed commits that referenced this issue Nov 8 and Nov 9, 2017: "use atomics where available" (cc #5, cc @pftbest)
@pftbest commented Nov 9, 2017

Just tested the latest code from git using tsan, and it complains about a data race:
https://gist.github.com/pftbest/183d7850a3f4535b7e5f9343929f46ad

Here is the code that I used:
https://gist.github.com/pftbest/3642ffba88e7117094bb2053d27637a1

I think the reason it fails is your recent commit 9faea68.

If you look at some other popular implementations, like Boost:
https://github.com/boostorg/lockfree/blob/develop/include/boost/lockfree/spsc_queue.hpp#L267

And the Linux kernel:
https://github.com/torvalds/linux/blob/master/Documentation/circular-buffers.txt#L200

They both do an acquire, at least on the consumer side.

I think it may be OK to lift the acquire on the producer side, like they did in Linux, because the producer doesn't read any data so it doesn't need it. But removing the acquire on the consumer side is not good, because it is the one thing that provides the actual synchronization: one thread (core) releases the data in the buffer, and the other thread must acquire it before reading, or else it can read the old value.
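
In code, the pairing being described is roughly this (a sketch using the stable core atomics; the heapless implementation itself goes through unstable intrinsics, as the diff further down shows):

use core::sync::atomic::{AtomicUsize, Ordering};

struct Indices {
    tail: AtomicUsize,
}

impl Indices {
    // Producer, after writing the element into the buffer: the Release
    // store guarantees the element write happens-before this store.
    fn publish(&self, next_tail: usize) {
        self.tail.store(next_tail, Ordering::Release);
    }

    // Consumer, before reading the element: the Acquire load pairs with
    // the Release store, so everything the producer wrote before
    // publishing is visible on this core.
    fn observe(&self) -> usize {
        self.tail.load(Ordering::Acquire)
    }
}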

@pftbest commented Nov 9, 2017

I tested without this commit and I still get the data race, so there might be something else that I don't see.

@japaric (Member, Author) commented Nov 9, 2017

I tested locally and I can repro. I changed both loads to acquire and the problem remains. What I find strange is that the report mentions that some backtrace (?) operation is contending with the consumer / producer:

  Previous write of size 4 at 0x56253944a018 by thread T1:
    #0 std::sys_common::backtrace::__rust_begin_short_backtrace hello0-1a838b0df7296149462a69d816ab3a3b.rs:? (hello+0xa58e)
    #1 std::panicking::try::do_call hello0-1a838b0df7296149462a69d816ab3a3b.rs:? (hello+0xa803)
    #2 std::panicking::try::do_call hello0-1a838b0df7296149462a69d816ab3a3b.rs:? (hello+0xa803)
    #3 std::panicking::try::do_call hello0-1a838b0df7296149462a69d816ab3a3b.rs:? (hello+0xa803)
    #4 std::panicking::try::do_call hello0-1a838b0df7296149462a69d816ab3a3b.rs:? (hello+0xa803)

But maybe the backtrace is just pointing to the wrong place.

@pftbest commented Nov 9, 2017

Found out why it fails even if I revert the commit: you had the head and tail mixed up:

https://github.com/japaric/heapless/blob/9533e27f44b21eb331188074771610eb9dae661b/src/ring_buffer/spsc.rs#L46

You release tail in the producer, but then acquire head in the consumer.

@pftbest commented Nov 9, 2017

Just tested this change against master:

diff --git a/src/ring_buffer/mod.rs b/src/ring_buffer/mod.rs
index 3f53646..abee440 100644
--- a/src/ring_buffer/mod.rs
+++ b/src/ring_buffer/mod.rs
@@ -36,6 +36,10 @@ impl AtomicUsize {
     pub fn store_release(&self, val: usize) {
         unsafe { intrinsics::atomic_store_rel(self.v.get(), val) }
     }
+
+    pub fn load_acquire(&self) -> usize {
+        unsafe { intrinsics::atomic_load_acq(self.v.get()) }
+    }
 }
 
 /// An statically allocated ring buffer backed by an array `A`
diff --git a/src/ring_buffer/spsc.rs b/src/ring_buffer/spsc.rs
index 488c07a..082aab7 100644
--- a/src/ring_buffer/spsc.rs
+++ b/src/ring_buffer/spsc.rs
@@ -45,7 +45,7 @@ where
         let n = rb.capacity() + 1;
         let buffer: &[T] = unsafe { rb.buffer.as_ref() };
 
-        let tail = rb.tail.load_relaxed();
+        let tail = rb.tail.load_acquire();
         let head = rb.head.load_relaxed();
         if head != tail {
             let item = unsafe { ptr::read(buffer.get_unchecked(head)) };

It doesn't report any races when N is smaller than the buffer size, and reports a write-after-read when N > buffer size. But a write-after-read should be OK: it's a circular buffer after all, so we can allow overwriting data.

@fluffysquirrels commented

For now I think it would be nice to add to the module-level documentation that this only works on single-core CPUs.

@pftbest commented Dec 18, 2017

@fluffysquirrels I think this issue was fixed in v0.2.0. No need to update the documentation.

@fluffysquirrels commented

@pftbest, OK, nice. In that case this issue can be closed. I think it would still be nice to add a line to the ring_buffer module documentation saying what guarantees are provided (i.e. single producer and single consumer, as enforced by implementing the Send trait but not the Sync trait).
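
Concretely, that type-level guarantee could be spelled like this (stand-in types, not the actual heapless impls):

use core::marker::PhantomData;

// The raw-pointer PhantomData makes these types neither Send nor Sync
// by default, so we opt back in to exactly what is safe.
pub struct Producer<'a, T: 'a> {
    _marker: PhantomData<(&'a (), *mut T)>,
}

pub struct Consumer<'a, T: 'a> {
    _marker: PhantomData<(&'a (), *mut T)>,
}

// Moving an endpoint to another thread (or interrupt handler) is fine...
unsafe impl<'a, T> Send for Producer<'a, T> where T: Send {}
unsafe impl<'a, T> Send for Consumer<'a, T> where T: Send {}
// ...but there are deliberately no Sync impls: a shared &Producer cannot
// cross threads, which is what "single producer, single consumer" means
// at the type level.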

@japaric (Member, Author) commented Apr 8, 2018

This was fixed in v0.2.0.
