
Ringbuffers are not thread safe #388

Closed

nicholashaydensmith opened this issue Oct 28, 2018 · 15 comments

@nicholashaydensmith

The ringbuffer code is not thread safe because the variables write_ptr and read_ptr are read from and written to by different threads. Not only are the size_t variables themselves subject to data races, the compiler and processor are also free to reorder any of the operations inside jack_ringbuffer_read and jack_ringbuffer_write because there is zero fencing. Consider the following scenario:

Producer thread: calls jack_ringbuffer_write, updates write_ptr before any data has actually been memcpy'd, and gets context-switched out.
Consumer thread: calls jack_ringbuffer_read and thinks there is data to be consumed, when in fact the producer thread hasn't actually written anything to memory.

The volatile keyword does not make anything thread-safe and shouldn't be used in this context. volatile is really only useful for memory-mapped IO and is never appropriate as a means of inter-thread communication.

Instead of using volatile, the write_ptr and read_ptr variables should be made atomic, with proper memory-order semantics to ensure thread safety: https://en.cppreference.com/w/cpp/atomic/memory_order.
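
As a rough illustration of that suggestion, a single-producer/single-consumer ring with atomic indices could look like the sketch below. The names (spsc_ring and friends) are illustrative, not JACK's actual API, and wrap-around is handled with a power-of-two mask:

#include <atomic>
#include <cstddef>

// Illustrative single-producer/single-consumer ring, not JACK's implementation.
// The indices run freely and are reduced modulo the (power-of-two) size.
struct spsc_ring {
    static constexpr size_t size = 1024;      // power of two
    static constexpr size_t mask = size - 1;
    char buf[size];
    std::atomic<size_t> write_ptr{0};
    std::atomic<size_t> read_ptr{0};

    // producer thread only
    bool write(const char *src, size_t n) {
        size_t w = write_ptr.load(std::memory_order_relaxed);
        size_t r = read_ptr.load(std::memory_order_acquire);
        if (size - (w - r) < n)               // not enough free space
            return false;
        for (size_t i = 0; i < n; ++i)
            buf[(w + i) & mask] = src[i];
        // release: the bytes copied above become visible before the new index
        write_ptr.store(w + n, std::memory_order_release);
        return true;
    }

    // consumer thread only
    bool read(char *dst, size_t n) {
        size_t w = write_ptr.load(std::memory_order_acquire);
        size_t r = read_ptr.load(std::memory_order_relaxed);
        if (w - r < n)                        // not enough data available yet
            return false;
        // the acquire load above guarantees the producer's bytes are visible here
        for (size_t i = 0; i < n; ++i)
            dst[i] = buf[(r + i) & mask];
        read_ptr.store(r + n, std::memory_order_release);
        return true;
    }
};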

99.9% of the time I bet the code, as it stands, works perfectly fine, so I understand this is probably not a super high priority issue. I just thought I'd raise it for correctness purposes. I would submit a pull request, but it would violate the terms of the agreement with my current employer.

@falkTX
Member

falkTX commented Oct 28, 2018

I am aware of this "issue", but to be honest it all seems to be just theory.
Before trying to fix this, we need to have a test case that proves the bad behavior.

@kmatheussen
Contributor

Hopefully it's only in theory that it would misbehave, at least on Intel CPUs. But it's undoubtedly a real problem if you use tsan, so this must be fixed.

Here's a discussion about it from a couple of years ago: http://linux-audio.4202.n7.nabble.com/Realtime-inter-thread-communication-td99157.html

I did supply a patch for the ringbuffer in that thread, but I've lost access to the account where it was uploaded. Anyway, it's an easy fix.

@kmatheussen
Contributor

The volatile keyword does not make anything thread-safe and shouldn't be used in this context. volatile is really only useful for memory-mapped IO and is never appropriate as a means of inter-thread communication.

Actually, the Microsoft C compiler guarantees that operations on volatile variables are not reordered: https://preshing.com/20141024/my-multicore-talk-at-cppcon-2014/
https://msdn.microsoft.com/en-us/library/12a04hfd.aspx
Seems like Microsoft did the right thing there.

@jcelerier

jcelerier commented Nov 2, 2018

Before trying to fix this, we need to have a test case that proves the bad behavior.

I had similar code in my codebase that worked fine and then failed when compiling at high optimisation levels with LTO. Since there is no actual protection, the compiler was able to "assume" that writing from two threads would be undefined behaviour, and since undefined behaviour can never happen from the compiler's point of view, the only logical alternative was to assume that everything happens on a single thread.

Actual writes were hence eschewed and everything would stay in CPU caches for who knows how long (in practice write synchronization didn't happen at all from what I remember). Using atomic<T>* instead of volatile T* solved it.

@falkTX
Member

falkTX commented Nov 2, 2018

Can you create a similar case in some small code?

In any case, multiple writers to a single ringbuffer does not sound like a good idea to me, even with memory fences and other techniques.

@jcelerier

jcelerier commented Nov 2, 2018

The simplest example code that reproduces this kind of problem would be this:

#include <thread>
#include <iostream>
int main()
{
    int i = 0;
    volatile int* p = &i;
    
    std::thread t([p] { 
        (*p)++;
    });
    
    while(i == 0) ;
    
    std::cerr << i << std::endl;   
    
    t.join();
}

For me, with GCC 8, it works fine at -O0 / -O1 but stays in an infinite loop at -O2 / -O3.
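
For comparison, a variant of the snippet above that replaces volatile with std::atomic (a sketch, not code from any project) terminates at every optimisation level, because the atomic load in the loop cannot be hoisted:

#include <atomic>
#include <thread>
#include <iostream>
int main()
{
    std::atomic<int> i{0};

    std::thread t([&i] {
        i.fetch_add(1);          // atomic increment, guaranteed to become visible
    });

    while (i.load() == 0)        // an atomic load must be re-read on every iteration
        ;

    std::cerr << i.load() << std::endl;

    t.join();
}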

@falkTX
Member

falkTX commented Nov 2, 2018

I know about that case, the optimization makes the final binary skip the while loop.
The "issue" there is the local variable.

A ringbuffer is quite different; the member variables in there are not local-scope variables like in the example.

I have yet to see working test code against it (i.e. proof of a mis-write or mis-read).

@jpcima

jpcima commented Jan 9, 2019

I've tried porting the robustness check from my own ring-buffer library as a test.
I send a sequence of integers and check them on the receiving side.

Apparently, I am able to reproduce a data race problem.
Without a forced delay following each read or write, I get a failure and a report from thread sanitizer.

test_case.cc.gz
test_case.txt.gz

Hope this helps.

jackdmp version 1.9.12 tmpdir /dev/shm protocol 8

@7890
Contributor

7890 commented Jan 10, 2019

Hi, I think you are using the ringbuffer in the wrong way.

When jack_ringbuffer_write() succeeds, there's no need to call jack_ringbuffer_write_advance(). The latter is only needed when the write pointers were retrieved via jack_ringbuffer_get_write_vector().

Now if you call jack_ringbuffer_write() and the returned count isn't what you requested, you'll already have written a portion of the data to your buffer and messed up the event boundaries from there on.
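
For reference, the two write patterns being contrasted here look roughly like the sketch below; the helper names are hypothetical and error handling is trimmed:

#include <jack/ringbuffer.h>
#include <string.h>

// Hypothetical helpers for illustration only.

// Pattern 1: all-or-nothing copy with jack_ringbuffer_write().
// The free space is checked first so a message is either written whole or not
// at all, and no jack_ringbuffer_write_advance() call is needed afterwards.
static int write_whole_message(jack_ringbuffer_t *rb, const char *msg, size_t len)
{
    if (jack_ringbuffer_write_space(rb) < len)
        return 0;                               // try again later
    return jack_ringbuffer_write(rb, msg, len) == len;
}

// Pattern 2: copy in place through the vector API; here the write pointer must
// be advanced explicitly once the data has been placed in the buffer.
static int write_whole_message_vec(jack_ringbuffer_t *rb, const char *msg, size_t len)
{
    jack_ringbuffer_data_t vec[2];
    jack_ringbuffer_get_write_vector(rb, vec);
    if (vec[0].len + vec[1].len < len)
        return 0;
    size_t n1 = len < vec[0].len ? len : vec[0].len;
    memcpy(vec[0].buf, msg, n1);
    if (n1 < len)
        memcpy(vec[1].buf, msg + n1, len - n1);
    jack_ringbuffer_write_advance(rb, len);
    return 1;
}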

I've slightly modified your testcase:

test_case.cc.txt

It runs without errors with delay=0.
Greetings

@jpcima

jpcima commented Jan 10, 2019

Tried it and you're right, @7890: it was a misuse, and the test case seems to work robustly now.
Thread sanitizer still emits its complaint, but it ran without error on my side.

@7890
Contributor

7890 commented Jan 10, 2019

Excellent :)

@7890
Contributor

7890 commented Jan 14, 2019

Please re-open if this poses a "real" issue with a reproducible case.

@Krasjet
Contributor

Krasjet commented Jan 23, 2021

Hey, I want to give an update on this issue. I am able to consistently reproduce it on a Raspberry Pi 4 using the test case from #388 (comment). It appears that the relaxed memory ordering on ARM platforms does pose a problem for the current implementation.

I wonder if anyone is able to reproduce this issue on Apple Silicon?

@Krasjet
Contributor

Krasjet commented Jan 23, 2021

Here is a sample output:

...
---------------- 418756/1000000 ----------------
---------------- 418757/1000000 ----------------
message (0) != expected (82)

I have deleted all the print statements between each iteration to speed up the testing.

@Krasjet
Contributor

Krasjet commented Jan 23, 2021

Here is the test program I'm using, with some minor cleanups:

//          Copyright Jean Pierre Cimalando 2018.
// Distributed under the Boost Software License, Version 1.0.
//    (See accompanying file LICENSE or copy at
//          http://www.boost.org/LICENSE_1_0.txt)

// g++ -O2 -o test_case test_case.cc -ljack -lpthread

#include <jack/ringbuffer.h>
#include <memory>
#include <thread>
#include <stdint.h> // for uint8_t
#include <stdio.h>
#include <stdlib.h>
#include <sched.h>

jack_ringbuffer_t *ring_buffer;
size_t capacity = 1024;
size_t message_count = 100;
size_t message_size = 100;

void thread1()
{
  jack_ringbuffer_t *rb = ::ring_buffer;
  const size_t message_size = ::message_size;
  const size_t message_count = ::message_count;
  std::unique_ptr<uint8_t[]> message_buffer(new uint8_t[message_size]);

  for (size_t i = 0; i < message_count;) {
    *(size_t *)message_buffer.get() = i;

    size_t can_write=jack_ringbuffer_write_space(rb);
    size_t count=0;
    if (can_write>=message_size) {
      count = jack_ringbuffer_write(rb, (const char *)message_buffer.get(), message_size);
    }
    if (count == message_size)
      ++i;
    else
      sched_yield();
  }
}

void thread2()
{
  jack_ringbuffer_t *rb = ::ring_buffer;
  const size_t message_size = ::message_size;
  const size_t message_count = ::message_count;
  std::unique_ptr<uint8_t[]> message_buffer(new uint8_t[message_size]);

  for (size_t i = 0; i < message_count;) {
    size_t can_read=jack_ringbuffer_read_space(rb);
    size_t count=0;
    if (can_read>=message_size) {
      count = jack_ringbuffer_read(rb, (char *)message_buffer.get(), message_size);
    }

    if (count == message_size) {
      size_t msg = *(size_t *)message_buffer.get();
      if (msg != i) {
        printf("message (%zu) != expected (%zu)\n", msg, i);
        exit(1);
      }
      ++i;
    }
  }
}

int main()
{
  size_t ntimes = 1000000;
  for (size_t i = 0; i < ntimes; ++i)
  {
    printf("---------------- %4zu/%4zu ----------------\n", i + 1, ntimes);
    jack_ringbuffer_t *rb = ::ring_buffer = jack_ringbuffer_create(::capacity);
    std::thread t1(thread1);
    std::thread t2(thread2);
    t1.join();
    t2.join();
    ::ring_buffer = nullptr;
    jack_ringbuffer_free(rb); // created with jack_ringbuffer_create, so free with jack_ringbuffer_free, not delete
  }
  printf("-------------------------------------------\n");
  printf("success!\n");

  return 0;
}

Krasjet added a commit to Krasjet/jack2 that referenced this issue Jul 21, 2022
This patch addresses the thread safety problem of `jack_ringbuffer_t`
mentioned in jackaudio#715 and jackaudio#388. The overbound read bug caused by this problem
is impossible to reproduce on x86 due to its strong memory ordering, but
it is a problem on ARM and other weakly ordered architectures.

Basically, the main problem is that, on a weakly ordered architecture,
it is possible that the pointer increment after `memcpy` becomes visible
to the other thread before `memcpy` finishes:

	memcpy (&(rb->buf[rb->write_ptr]), src, n1);
	// vvv can be visible to reading thread before memcpy finishes
	rb->write_ptr = (rb->write_ptr + n1) & rb->size_mask;

If this happens, the other thread can read garbage values in `rb->buf`
that have not yet been overwritten by the still-unfinished `memcpy`.

To fix this, an explicit pair of release/acquire memory fences [1] is
used to ensure the copy on the other thread *happens after* the `memcpy`
finishes so no garbage values can be read.

[1]: https://preshing.com/20130922/acquire-and-release-fences/
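
In a stripped-down sketch, the fence pairing described above looks roughly like this. The field names mirror the snippet in the commit message, but this is only an illustration, not the actual patch, and wrap-around of the data copy itself is omitted:

#include <atomic>
#include <cstddef>
#include <cstring>

// Illustrative sketch, not the actual JACK patch.
struct ring {
    char  *buf;
    size_t size_mask;    // size is a power of two, so mask = size - 1
    size_t write_ptr;
    size_t read_ptr;
};

// writer thread: copy the data first, then publish the new index
void ring_write_chunk(ring *rb, const char *src, size_t n1)
{
    std::memcpy(&rb->buf[rb->write_ptr], src, n1);
    std::atomic_thread_fence(std::memory_order_release);  // data must be visible before the index
    rb->write_ptr = (rb->write_ptr + n1) & rb->size_mask;
}

// reader thread: read the index first, then it is safe to copy that many bytes
size_t ring_read_chunk(ring *rb, char *dst, size_t n1)
{
    size_t avail = (rb->write_ptr - rb->read_ptr) & rb->size_mask;
    std::atomic_thread_fence(std::memory_order_acquire);  // index is observed before the bytes it covers
    if (avail < n1)
        return 0;
    std::memcpy(dst, &rb->buf[rb->read_ptr], n1);
    rb->read_ptr = (rb->read_ptr + n1) & rb->size_mask;
    return n1;
}
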
falkTX pushed a commit to moddevices/jack2 that referenced this issue Nov 14, 2022
falkTX pushed a commit that referenced this issue Jan 28, 2023
* fix ringbuffer thread safety on ARM. fix #715 #388


* remove volatile qualifier on ringbuf r/w pointers

The volatile qualifiers are superfluous when compiler barriers are present.
They only generate unnecessary `mov` instructions when the pointers aren't
going to be updated.

* simplify read/write space calculations

This optimization is possible because the buffer size is always a power
of 2. See [1] for details.

[1]: drobilla/zix#1 (comment)

* move acq fences to separate lines
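
For context, the power-of-two simplification referred to in the commit above boils down to a single masked subtraction for each space calculation. A sketch, assuming the usual convention that one byte is kept free to distinguish a full buffer from an empty one:

#include <cstddef>

// Sketch only: with a power-of-two size, unsigned wrap-around plus the mask does all the work.
size_t read_space(size_t write_ptr, size_t read_ptr, size_t size_mask)
{
    return (write_ptr - read_ptr) & size_mask;       // bytes available to read
}

size_t write_space(size_t write_ptr, size_t read_ptr, size_t size_mask)
{
    return (read_ptr - write_ptr - 1) & size_mask;   // one byte kept free to tell full from empty
}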