
TSan: possible false positive with atomics #1009

Closed
mpoeter opened this issue Oct 18, 2018 · 13 comments

Comments


mpoeter commented Oct 18, 2018

The following program:

#include <thread>
#include <atomic>

struct node { std::atomic<int> value; };
std::atomic<node*> _node{nullptr};

void f1() {
  auto n = new node();
  n->value.store(1, std::memory_order_release);
  _node.store(n, std::memory_order_relaxed);
}

void f2() {
  auto n = _node.load(std::memory_order_relaxed);
  while (n == nullptr)
    n = _node.load(std::memory_order_relaxed);
  n->value.fetch_add(1, std::memory_order_acquire);
}

int main() {
    std::thread t1(f1);
    std::thread t2(f2);

    t1.join();
    t2.join();

    return 0;
}

produces the following report:

WARNING: ThreadSanitizer: data race (pid=6648)
  Atomic write of size 4 at 0x7b0400000800 by thread T2:
    #0 __tsan_atomic32_fetch_add <null> (libtsan.so.0+0x67e25)
    #1 std::__atomic_base<int>::fetch_add(int, std::memory_order) /usr/include/c++/7/bits/atomic_base.h:514 (main+0x1098)
    #2 f2() /home/mpoeter/dev/tsan_test/main.cpp:17 (main+0x1098)
    #3 void std::__invoke_impl<void, void (*)()>(std::__invoke_other, void (*&&)()) /usr/include/c++/7/bits/invoke.h:60 (main+0x11b9)
    #4 std::__invoke_result<void (*)()>::type std::__invoke<void (*)()>(void (*&&)()) /usr/include/c++/7/bits/invoke.h:95 (main+0x11b9)
    #5 decltype (__invoke((_S_declval<0ul>)())) std::thread::_Invoker<std::tuple<void (*)()> >::_M_invoke<0ul>(std::_Index_tuple<0ul>) /usr/include/c++/7/thread:234 (main+0x11b9)
    #6 std::thread::_Invoker<std::tuple<void (*)()> >::operator()() /usr/include/c++/7/thread:243 (main+0x11b9)
    #7 std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (*)()> > >::_M_run() /usr/include/c++/7/thread:186 (main+0x11b9)
    #8 <null> <null> (libstdc++.so.6+0xbd57e)

  Previous write of size 8 at 0x7b0400000800 by thread T1:
    #0 operator new(unsigned long) <null> (libtsan.so.0+0x73f5a)
    #1 f1() /home/mpoeter/dev/tsan_test/main.cpp:8 (main+0xffe)
    #2 void std::__invoke_impl<void, void (*)()>(std::__invoke_other, void (*&&)()) /usr/include/c++/7/bits/invoke.h:60 (main+0x11b9)
    #3 std::__invoke_result<void (*)()>::type std::__invoke<void (*)()>(void (*&&)()) /usr/include/c++/7/bits/invoke.h:95 (main+0x11b9)
    #4 decltype (__invoke((_S_declval<0ul>)())) std::thread::_Invoker<std::tuple<void (*)()> >::_M_invoke<0ul>(std::_Index_tuple<0ul>) /usr/include/c++/7/thread:234 (main+0x11b9)
    #5 std::thread::_Invoker<std::tuple<void (*)()> >::operator()() /usr/include/c++/7/thread:243 (main+0x11b9)
    #6 std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (*)()> > >::_M_run() /usr/include/c++/7/thread:186 (main+0x11b9)
    #7 <null> <null> (libstdc++.so.6+0xbd57e)

  Location is heap block of size 4 at 0x7b0400000800 allocated by thread T1:
    #0 operator new(unsigned long) <null> (libtsan.so.0+0x73f5a)
    #1 f1() /home/mpoeter/dev/tsan_test/main.cpp:8 (main+0xffe)
    #2 void std::__invoke_impl<void, void (*)()>(std::__invoke_other, void (*&&)()) /usr/include/c++/7/bits/invoke.h:60 (main+0x11b9)
    #3 std::__invoke_result<void (*)()>::type std::__invoke<void (*)()>(void (*&&)()) /usr/include/c++/7/bits/invoke.h:95 (main+0x11b9)
    #4 decltype (__invoke((_S_declval<0ul>)())) std::thread::_Invoker<std::tuple<void (*)()> >::_M_invoke<0ul>(std::_Index_tuple<0ul>) /usr/include/c++/7/thread:234 (main+0x11b9)
    #5 std::thread::_Invoker<std::tuple<void (*)()> >::operator()() /usr/include/c++/7/thread:243 (main+0x11b9)
    #6 std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (*)()> > >::_M_run() /usr/include/c++/7/thread:186 (main+0x11b9)
    #7 <null> <null> (libstdc++.so.6+0xbd57e)

  Thread T2 (tid=6651, running) created by main thread at:
    #0 pthread_create <null> (libtsan.so.0+0x2bd4e)
    #1 std::thread::_M_start_thread(std::unique_ptr<std::thread::_State, std::default_delete<std::thread::_State> >, void (*)()) <null> (libstdc++.so.6+0xbd834)
    #2 main /home/mpoeter/dev/tsan_test/main.cpp:22 (main+0x10e5)

  Thread T1 (tid=6650, finished) created by main thread at:
    #0 pthread_create <null> (libtsan.so.0+0x2bd4e)
    #1 std::thread::_M_start_thread(std::unique_ptr<std::thread::_State, std::default_delete<std::thread::_State> >, void (*)()) <null> (libstdc++.so.6+0xbd834)
    #2 main /home/mpoeter/dev/tsan_test/main.cpp:21 (main+0x10d4)

SUMMARY: ThreadSanitizer: data race (/usr/lib/x86_64-linux-gnu/libtsan.so.0+0x67e25) in __tsan_atomic32_fetch_add
==================
ThreadSanitizer: reported 1 warnings

Essentially, TSan reports a data race between the fetch_add in f2 and the implicit initialization of value by operator new in f1.

Obviously this could be fixed by using acquire/release order for the operations on _node, but IMHO this should not be necessary. f1 performs a store-release on value before it "publishes" the node. The fetch_add uses memory_order_acquire. In the modification order of _node the fetch_add is ordered after the store, so it reads that value. Therefore, the fetch_add synchronizes-with the store, and the store is sequenced after the initialization in operator new. So IMHO this is no data race as there is a proper happens-before relation between the initialization and the fetch_add.

What do you think?


dvyukov commented Oct 18, 2018

In the modification order of _node the fetch_add is ordered after the store

This part is not true.
For it to be true, there would have to be another happens-before relation that orders the store and the fetch_add. There is no such relation.

@dvyukov dvyukov closed this as completed Oct 18, 2018

mpoeter commented Oct 19, 2018

I tend to disagree...

All modifications to a particular atomic object M occur in some particular total order, called the modification order of M. If A and B are modifications of an atomic object M and A happens before B, then A shall precede B in the modification order of M.

But not necessarily the other way round.

In my example there is no happens before relation between the store and the fetch_add, but of course they are ordered in some way in the modification order. Granted, my example provides no guarantee that the store is ordered before the fetch_add in the modification order (even though I would assume that this is the case on virtually all architectures), but I still stand by my original point: If the fetch_add reads the value written by the store, it synchronizes-with that store. Since I am working on an Intel architecture I know that this is the case.

Let's extend the example:

#include <thread>
#include <atomic>

struct node { std::atomic<int> value; };
std::atomic<node*> _node{nullptr};
int y;

void f1() {
  auto n = new node();
  y=42;
  n->value.store(1, std::memory_order_release);
  _node.store(n, std::memory_order_relaxed);
}

void f2() {
  auto n = _node.load(std::memory_order_relaxed);
  while (n == nullptr)
    n = _node.load(std::memory_order_relaxed);
  n->value.fetch_add(1, std::memory_order_acquire);
  y=43;
}

int main() {
    std::thread t1(f1);
    std::thread t2(f2);

    t1.join();
    t2.join();

    return 0;
}

So we have the following relations (sb = sequenced-before, rf = reads-from):

      Thread 1                               Thread 2: 

    init(value)
         | sb
   write(y, 42)
         | sb
store(n->value, 1, release)--\
         | sb                |
store(_node, n, relaxed) --- | -- rf ---> read(_node, relaxed)
                             |                 | sb
                             \--- rf -> fetch_add(n->value, acquire)
                                               | sb
                                           write(y, 43)

If I make the fetch_add relaxed, TSan correctly reports a race on y. But if I use acquire it correctly recognizes the synchronize-with relation that orders the two writes on y. However, even though the initialization of value is sequenced-before the first write to y, TSan still reports a race on value.

Without knowing any details of TSan's inner workings, I would assume that this is because the putative race happens on the same object that is used to establish the synchronization.

Interestingly, TSan always reports a race on value, even when I make both operations sequentially consistent. Equally interesting: when I make y a member of node, TSan fails to detect the race on y when I relax the fetch_add.


dvyukov commented Oct 19, 2018

The race happens in the executions that don't have rf between the fetch_add and the store. There are such executions because, by the time the fetch_add starts executing, no ordering relations have been established between the threads.
An acquire/release pair can't synchronize itself. It only synchronizes satellite data after the hb relation is established.


dvyukov commented Oct 19, 2018

Try to use this tool:
http://svr-pes20-cppmem.cl.cam.ac.uk/cppmem/
It should draw all relations in all executions and show if it's racy in some executions or not.
You can replace the loop with a readsvalue predicate, e.g. y.load(memory_order_acquire).readsvalue(1).
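For reference, cppmem accepts a small C-like input language and cannot model operator new or loops, so the pointer-publication example has to be reduced to a flag/data pair. A rough, untested sketch of that reduction using the readsvalue predicate (variable names are mine) might look like:

```
int main() {
  atomic_int flag = 0;
  int data = 0;
  {{{ { data = 1;
        flag.store(1, memory_order_release); }
  ||| { r1 = flag.load(memory_order_acquire).readsvalue(1);
        r2 = data; }
  }}}
  return 0;
}
```

The tool then enumerates the candidate executions and marks which ones are consistent and which contain a data race.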


mpoeter commented Oct 21, 2018

Interesting... I was under the impression that in case of an acquire-fetch-add the read-part of the read-modify-write operation establishes the synchronize-with relation. Can you point me to some sources where I can find more information about this synchronization of satellite data? I did not find anything in the C++ standard...

True, there is no enforced order between the store and the fetch-add, so there can be executions where the fetch-add does not read the value from the store. In my specific case I know that it does, so I expected it to establish the happens-before relation at that point, but you are probably right to treat it as a data race.

Thanks for the link, I will take a look at it!


dvyukov commented Oct 29, 2018

Interesting... I was under the impression that in case of an acquire-fetch-add the read-part of the read-modify-write operation establishes the synchronize-with relation.

This is true. But it only affects subsequent memory operations, not the RMW itself. A memory operation can't synchronize itself. Memory ordering is only about dependent/satellite data.

Can you point me to some sources where I can find more information about this synchronization of satellite data?

This is an informal term.
The C/C++ standards say things along the lines of "a memory access A will see this and that if there is a happens-before relation already in place (by the time A starts)". So something that established the happens-before relation previously can be thought of as the "primary" synchronization. Access A is then only dependent/satellite data for that primary synchronizing memory access.

In my specific case I know that it does

If here you mean that I am working on an Intel architecture I know that this is the case, then this is irrelevant here. Intel rules are relevant only if you program in assembly. If you program in C/C++ you need to follow C/C++ rules. For one, the compiler can mess things up even on Intel in ways you don't expect.


mpoeter commented Oct 30, 2018

I have studied the C++ standard and its definitions quite extensively, but unfortunately even though it tries to provide formal definitions it still leaves some room for interpretation. Due to the lack of more specific information I was assuming that if the read-part of an RMW establishes the synchronize-with relation, then the write-part is already synchronized. I take your word for it that this is not the case. 🙂

If you program in C/C++ you need to follow C/C++ rules. For one, the compiler can mess things up even on Intel in ways you don't expect.

I am well aware. I just wanted to point out that in this specific case I knew that the fetch-add would read the value written by the store-release, and I was therefore expecting that the write-part would not cause a data race. Obviously the example code does not provide any guarantee that the fetch-add would always see that value, so it can race. But since TSan only reports races that actually occur I did not expect a race report in this specific case (due to my assumption of an implicit synchronization in the RMW operation).


dvyukov commented Oct 30, 2018

Due to the lack of more specific information I was assuming that if the read-part of an RMW establishes the synchronize-with relation, then the write-part is already synchronized.

True. But what you are assuming is that the write part then also synchronizes the read part in hindsight. That is, the read synchronizes the write, which then synchronizes the read and justifies its own synchronization. In other words, a circular argument that justifies itself. It is this part that is not true.
What happens in practice: by the time the read happens, no synchronization has been established yet. So the read reads uninitialized garbage and does not actually synchronize anything.

I knew that the fetch-add would read the value written by the store-release

How? Why? Nobody provides any guarantees for this.

Well, if you printed the read value later and concluded that the read did in fact read the stored value, then you may think of this in 3 ways:

1. The read observed uninitialized garbage, but that garbage happened to look like the stored value.
2. The race happened; tsan detected and reported it, and then fixed it up.
3. A data race renders the behavior of the program undefined, so anything you observe about the program's execution is undefined, does not matter, and does not allow you to draw any conclusions.


mpoeter commented Oct 30, 2018

But what you are assuming is that the write part then also synchronizes the read part in hindsight. That is, the read synchronizes the write, which then synchronizes the read and justifies its own synchronization.

I can't follow. Why would the write-part synchronize the read-part?

How? Why? Nobody provides any guarantees for this.

The C++ code does not provide any such guarantee, but the underlying x86 memory model that my code was running on does. I know that this cannot be generalized and I never claimed that. But assuming that this is the case (again, only in this very specific scenario), I expected TSan to recognize the synchronize-with relation. If I replace the fetch-add with separate load and store operations I have the same potential race, but TSan does not report anything. I expected the same for the fetch-add.

But I am already convinced that it is correct to report a race in this scenario! 🙂


dvyukov commented Oct 30, 2018

I can't follow. Why would the write-part synchronize the read-part?

It does not. But that's what I gathered from your argumentation, e.g.: "If the fetch_add reads the value written by the store, it synchronizes-with that store".

The C++ code does not provide any such guarantee, but the underlying x86 memory model that my code was running on does.

The underlying architecture does not matter here in any way, shape or form. It only matters if you program in assembly. If you program in C/C++, C/C++ rules are in play.
To take it to an extreme: a conforming compiler could detect that the program triggers a race in every execution, and instead emit code that collects all credit card numbers from the machine and sends them to a public web page. Strictly speaking, this would be legal behavior of the program. x86 won't save you in any way here.

If I replace the fetch-add with separate load and store operations I have the same potential race, but TSan does not report anything.

Looks like something to improve in tsan; I filed #1014 for this.


dvyukov commented Oct 30, 2018

Just to clarify for future readers, the proper way to express this in C++ is as follows (acquire/release should be on _node, and value does not need to be atomic at all):

struct node { int value; };
void f1() {
  auto n = new node();
  n->value = 1;
  _node.store(n, std::memory_order_release);
}

void f2() {
  auto n = _node.load(std::memory_order_acquire);
  while (n == nullptr)
    n = _node.load(std::memory_order_acquire);
  n->value++;
}


mpoeter commented Oct 30, 2018

The underlying architecture does not matter here in any way, shape or form. It only matters if you program in assembly. If you program in C/C++, C/C++ rules are in play.

Absolutely agree! I was just expecting that TSan would not be able to detect this race simply because the underlying memory model provides this guarantee - similar to the case when I use separate load/store operations instead of the fetch-add. But as I said, I am already convinced that TSan is right to report a race, and it is a good idea to improve TSan to also recognize a race when using a load operation.

But for what it's worth, I think the race report for the original program should be adjusted. It reports a race between the two write operations, but it should probably report the read part of the RMW as the conflicting operation. Would you agree?


dvyukov commented Oct 30, 2018

Agree.
