-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix TSan false positives due to volatile write handling #12681
Conversation
I half expected my proof to be false, but glad to have a definitive counter-example, thanks! I didn’t know (or had forgotten) about I will therefore close this, as I do not see a way to repair it for now. |
The counter-example of @maranget does not necessarily mean that we must give up on this approach, it only means that volatile reads (after the transformation) can have behaviors not allowed for relaxed reads in the C memory model. My mental model of relaxed reads (which, I learned in @maranget's office this afternoon, is weaker than the spec in the current memory model) is that we can read any previous value. (This is what we get with non-atomic reads in the OCaml memory model for example.) In that weaker model, the transformation may be correct, at least the counter-example does not introduce a new behavior. For me that would be a good-enough model of volatile reads to program the runtime with. |
Intuitively, I am also tempted to say that only unreasonable programs would rely on this aspect of the spec. And I doubt that the runtime does. For what it’s worth, I have the testsuite passing with |
I would like to suggest another attempt for this volatile write: CAMLreally_no_tsan void __tsan_volatile_write8(void *ptr)
{
const bool is_atomic = size <= sizeof(long long) && is_aligned(ptr, 8);
if (is_atomic) {
/* Signal a relaxed atomic store to TSan. In order not to change the
semantics of the program, write the last seen value. As these are
relaxed operations, no synchronization is introduced and the semantics
over multiple threads should not change. */
uint64_t value = atomic_load_explicit((_Atomic uint64_t *)ptr, memory_order_relaxed);
while (!__tsan_atomic64_compare_exchange_weak((volatile uint64_t *)ptr, &value, value,
memory_order_relaxed, memory_order_relaxed)) {}
} else
__tsan_write8(ptr);
} We've tried to convince ourselves (with @OlivierNicole) that this is correct and tried to prove it with Herd7, but we couldn't (probably because our litmus wasn't correct: it never terminates). What do you think of this proposal @maranget? |
Since the test was causing some issues with herd, I tried it with another similar tool.
The result is the same as the variant without the extra loads and whiles, so at least for the program that @maranget sent earlier, the transformation seems to be correct. |
Thank you @hernanponcedeleon for running this! I can’t find another counterexample; but then, I have a poor track record of making predictions in C11. 😄 |
As a non-expert I find the code bewildering. It is much more complex than the original proposal, and my uninformed guess would be that the complexity hides the potential issues rather than removing them. uint64_t value = atomic_load_explicit((_Atomic uint64_t *)ptr, memory_order_relaxed);
while (!__tsan_atomic64_compare_exchange_weak((volatile uint64_t *)ptr, &value, value,
memory_order_relaxed, memory_order_relaxed)) {} Naive questions:
|
Sorry for barging in the middle of the discussion, but I think translating a volatile write as a As far as I understand from the discussion, we want to signal a relaxed atomic write to TSAN. The implementation for an AtomicRMW is at - https://github.com/llvm/llvm-project/blob/b14d3441a67939533df1b10cc456516c85a3386a/compiler-rt/lib/tsan/rtl/tsan_interface_atomic.cpp#L286
As we can see, the library only signals a write access to the location if memory order is
This has the advantage of keeping the memory write as a single event in the TSAN memory order, thus not requiring any complex arguments about how splitting the operation still guarantees the same semantics. |
It's actually exactly the same principle as @OlivierNicole suggested, but it correctly handles the case where another thread performs a write in the meantime, as reported by @maranget.
For more information on CAS in C my go-to is https://en.cppreference.com/w/c/atomic/atomic_compare_exchange
The
It would be possible for the |
Yes, I believe that was @fabbing’s reasoning too, but it doesn’t hurt that you made it explicit. Just a remark, to make sure we are on the same page: you showed that the TSan view of events is what we want it to be. That is a part of the correctness theorem that we want to ascertain, but not all of it. The other part is that the semantics of the instrumented program does not change (disregarding TSan reports). That is what @maranget disproved about my initial proposition. |
Not sure we're on the same page. I'm saying that instead of translating a volatile write as a relaxed read followed by a CAS loop, we translate it to an atomic_fetch_add operation. This preserves the semantics - an atomic write gets mapped to a single operation that performs a relaxed reads and then writes the same value back to the memory location. Unlike the CAS loop version, this does not require reasoning across multiple memory events - its a 1:1 mapping which preserves TSAN semantics and program semantics. |
I might be missing something trivial here, but I think what you propose would not "write the same value", but rather the read value plus one. This is my understanding of what
|
std::atomic_fetch_add takes an argument to add to the memory location. Passing that as zero guarantees its the same value - |
It was indeed something trivial I missed. I still do not understand how what you propose gives same instruction semantics.
It should not write the same value back, but rather the value that was specified. If we transform |
Now i think i'm missing something, I was trying to replicate this code -
this code isn't writing the provided value but instead just reads the pointer and writes it. So i was suggesting replacing this code with an atomic_fetch_add |
2e46759
to
7697bea
Compare
Ah, I understand now. That is simpler, and I think correct. I implemented this solution: it passes the testsuite, and @fabbing verified that it does not seem significantly slower than the current state of trunk. |
7697bea
to
36f0806
Compare
If I correctly understand the current solution, it transforms this litmus test
into this
I wrote a simple litmus transformation pass implementing this and simulated the 137 tests from here under this c11 model and this rc11 model. In all cases, the results with and without the transformation match. |
If I understand, the key difference with the earlier read-then-write proposal is that the read-write is atomic with those operators (exchange and fetch_add), and this atomicity helps avoid unpleasant orderings. If we atomically read something and write it again, we do have a no-op. @maranget, do you have an opinion on this new proposal using a relaxed fetch-add? |
Hi all, I'd tend to trust this last solution, especially after @hernanponcedeleon tests. Nitpicking, I'd rather see atomic_fetch_or as the "read-dont-modify-write" atomic primitive. |
Nitpicking^2: for real lock-free code, fetch-add is a hardware instruction in x86, while fetch-or is not and must be compiled as a CAS loop. But maybe it doesn't matter here, as the atomic operation is handled/simulated by TSAN anyway? (not sure). |
The TSan function does perform the atomic operation in all cases. For some reason, it is done using legacy builtins like |
A quick test on https://godbolt.org suggests that on x86 they are both compiled to
by a recent Clang, while GCC compiles
|
I tested on goldbolt, thank you very much. The All this is getting seriously out of topic. |
No problem! I wanted to check with the I’m leaving the |
36f0806
to
f726a42
Compare
Replace with finer silencing where needed, although ocaml#12681 should alleviate the need for this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems reasonable, I agree that the code does what it claims, and I am reassured by @maranget's intuition that the memory-model stuff works out. See minor nitpick comments/questions.
runtime/tsan.c
Outdated
@@ -395,7 +390,15 @@ CAMLreally_no_tsan void __tsan_unaligned_volatile_read##size(void *ptr) \ | |||
} \ | |||
CAMLreally_no_tsan void __tsan_volatile_write##size(void *ptr) \ | |||
{ \ | |||
__tsan_write##size(ptr); \ | |||
const bool is_atomic = size <= sizeof(long long) && \ | |||
is_aligned(ptr, 8); \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpick: I find the indentation choice here confusing. I would expect you to either 2-align, or align on size
, but the visual alignment of is_
looks intentional and carries no meaning that I can see.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, it didn’t make sense. Fixed in 4fd66b1.
runtime/tsan.c
Outdated
DEFINE_TSAN_VOLATILE_READ_WRITE(16, 128); | ||
|
||
/* We do not treat accesses to 128-bit values as atomic, since it is dubious | ||
that they can be treated as such. */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are there any volatile accesses to 128-bit values in the runtime? Where do they come from?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, according to my grep
ping there aren’t any. The 128-bit versions of these functions are there because, without them, building a C library for OCaml with TSan enabled will fail at linking with an unresolved symbol error if it contains volatile
accesses to 128-bit values. So I figured it would be better to have 128-bit volatile
s behave silently like plain 128-bit values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add this explanation to the comment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good work as always, thanks to everyone involved (@OlivierNicole, @maranget, @fabbing, @hernanponcedeleon, @anmolsahoo25, @xavierleroy). I'm not a memory-model person and I could follow the change and the resulting code. This is good to merge when the CI agree. |
For reasons related to the C FFI, it has been decided that in the runtime and in C libraries, we will consider
volatile
accesses as equivalent to relaxed atomics (see #12473).It is not trivial to explain that to TSan. Fortunately, both GCC and Clang have an option to distinguish
volatile
writes in a custom way, by instrumenting them with a call to a function that we must implement. This option has been introduced to support KCSan, the kernel concurrency sanitizer of the Linux kernel. However, as KCSan is very different from TSan,volatile
accesses are instrumented in a way that makes it difficult to handlevolatile
writes the way we would like. To explain it I will draw from an explanation by @fabbing from a while ago. Whilegcc
/clang
replaceatomic
read/write operations with TSan calls that update TSan's internal state and perform the actual memory operation themselves,volatile
read/write operations are merely decorated with a TSan call, and then the actual operation is performed.An example using an atomic read operation
Reading
a
is replaced with a call to__tsan_atomic64_load
that notifies TSan of the read operation and returns the actual value.An example using a volatile read operation
When reading
v
a call to__tsan_volatile_read8
(at11b5
) precedes the actual memory read operation (at11ba
).Similarly, atomic stores are instrumented by replacing the memory access, whereas
volatile
stores are only decorated.Our current make-do solution is that
__tsan_volatile_readN
performs a dummy call to__tsan_atomic64_load
, which is sufficient for correctness; and__tsan_volatile_writeN
simply calls__tsan_write8
, i.e., is treated as a plain write. The consequence is that (rare) false positives can arise, such as #12282.Proposed change
I propose a new make-do solution: signal the write to TSan as a relaxed atomic write. Because TSan insists on actually performing the write, we first read the location using a relaxed load and use that value in the write, thus making sure that we merely write the current value (from the point of view of the thread). Example for 64-bit words:
This should remove the existing false positives caused by
volatile
writes being seen by TSan as plain stores. And indeed it makes the false positive noticed inweaktest_par_load
in the CI (#12644 (comment)) disappear.Correctness arguments
The three questions that arise upon such a change are: does it change the actual semantics of the program, defined as the set of possible execution traces (modulo TSan reports)? Does it introduce new false positives? Does it introduce false negatives, i.e., does it hide races?
Relaxed atomics do not imply any synchronizations between threads, so no races should be hidden by this. And atomics are not racy, so the change does not create new possibilities of false positives, either.
Finally, is the semantics of the program preserved? I believe so.
Attempt at a proof sketch
Disclaimer: I am no memory model expert, so this is only an amateur’s poor attempt at manipulating the tricky concepts of C11.
I think this fact can be derived from the following, more general statement: in a C program, inserting a relaxed load that yields
v
followed by a relaxed store ofv
into the same locationl
preserves the semantics. It is obviously the case inside the thread where the insertion occurs. Other threads do not see any “new” values for that location, because we wrote one of the previous values for that location.More formally, let us show that the instructions inserted into thread A do not modify the semantics by showing that any trace in thread B that reads the value
v
inl
is, in fact, already present in the initial program.There are two possibilities: either there exists a synchronizes-with relation from A to B after the initial write of
v
in A, or not. If there isn’t, then nothing prevents B from readingv
inl
.If there are such synchronizes-with relation, then the departing point of the first such synchronizes-with relation in A—let’s call it S— is either sequenced-before our inserted instructions in A, or after (indeed, our inserted instructions are relaxed atomics, and cannot be themselves synchronizing).
v
inl
even in the initial program.l
to containv'
≠v
from the point of view of A; or there isn’t. If there is no such event, then B can readv
inl
even in the initial program. If there is one, then our inserted instructions read and re-write the valuev'
≠v
inl
, such that B will seev'
or a newer value. But this was already the case in the initial program.