Fix TSan false positives due to volatile write handling #12681

Merged
3 commits merged into ocaml:trunk on Nov 10, 2023

Conversation

Contributor

@OlivierNicole OlivierNicole commented Oct 20, 2023

For reasons related to the C FFI, it has been decided that in the runtime and in C libraries, we will consider volatile accesses as equivalent to relaxed atomics (see #12473).

It is not trivial to explain that to TSan. Fortunately, both GCC and Clang have an option to distinguish volatile accesses, instrumenting them with a call to a function that we must implement. This option was introduced to support KCSAN, the Linux Kernel Concurrency Sanitizer. However, as KCSAN is very different from TSan, volatile accesses are instrumented in a way that makes it difficult to handle volatile writes the way we would like. To explain this, I will draw from an explanation given by @fabbing a while ago. While GCC and Clang replace atomic read/write operations with TSan calls that both update TSan's internal state and perform the actual memory operation, volatile read/write operations are merely decorated with a TSan call, after which the actual operation is performed.

An example using an atomic read operation
_Atomic uint64_t a = 41;

__attribute__((noinline))
uint64_t an_atomic_read(void)
{
  return a;
}

Reading a is replaced with a call to __tsan_atomic64_load that notifies TSan of the read operation and returns the actual value.

__attribute__((noinline))
uint64_t an_atomic_read(void)
{
    117a:       53                      push   %rbx
    117b:       48 8b 7c 24 08          mov    0x8(%rsp),%rdi
    1180:       e8 bb fe ff ff          call   1040 <__tsan_func_entry@plt>
  return a;
    1185:       be 05 00 00 00          mov    $0x5,%esi
    118a:       48 8d 3d af 2e 00 00    lea    0x2eaf(%rip),%rdi        # 4040 <a>
    1191:       e8 ba fe ff ff          call   1050 <__tsan_atomic64_load@plt>
    1196:       48 89 c3                mov    %rax,%rbx
    1199:       e8 c2 fe ff ff          call   1060 <__tsan_func_exit@plt>
}
    119e:       48 89 d8                mov    %rbx,%rax
    11a1:       5b                      pop    %rbx
    11a2:       c3                      ret
An example using a volatile read operation
uint64_t v = 42;

__attribute__((noinline))
uint64_t a_volatile_read(void)
{
  return *(volatile uint64_t*)&v;
}

When reading v, a call to __tsan_volatile_read8 (at 11b5) precedes the actual memory read operation (at 11ba).

__attribute__((noinline))
uint64_t a_volatile_read(void)
{
    11a3:       53                      push   %rbx
    11a4:       48 8b 7c 24 08          mov    0x8(%rsp),%rdi
    11a9:       e8 92 fe ff ff          call   1040 <__tsan_func_entry@plt>
  return *(volatile uint64_t*)&v;
    11ae:       48 8d 3d 83 2e 00 00    lea    0x2e83(%rip),%rdi        # 4038 <v>
    11b5:       e8 bf ff ff ff          call   1179 <__tsan_volatile_read8>
    11ba:       48 8b 1d 77 2e 00 00    mov    0x2e77(%rip),%rbx        # 4038 <v>
    11c1:       e8 9a fe ff ff          call   1060 <__tsan_func_exit@plt>
}
    11c6:       48 89 d8                mov    %rbx,%rax
    11c9:       5b                      pop    %rbx
    11ca:       c3                      ret

Similarly, atomic stores are instrumented by replacing the memory access, whereas volatile stores are only decorated.

Our current makeshift solution is that __tsan_volatile_readN performs a dummy call to __tsan_atomic64_load, which is sufficient for correctness; and __tsan_volatile_writeN simply calls __tsan_writeN, i.e., the access is treated as a plain write. The consequence is that (rare) false positives can arise, such as #12282.
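
For the 8-byte case, these current handlers amount to roughly the following (an illustrative sketch based on the description above, not the exact macro-generated code in runtime/tsan.c; the __tsan_* entry points are the ones already mentioned in this thread):

/* Sketch only: the real handlers are generated by a macro over several sizes. */
CAMLreally_no_tsan void __tsan_volatile_read8(void *ptr)
{
  /* Dummy relaxed atomic load: records an atomic read in TSan's state,
     which is sufficient for correctness on the read side. */
  (void)__tsan_atomic64_load((volatile uint64_t *)ptr, memory_order_relaxed);
}

CAMLreally_no_tsan void __tsan_volatile_write8(void *ptr)
{
  /* Recorded as a plain (non-atomic) write, which is what allows the rare
     false positives such as #12282. */
  __tsan_write8(ptr);
}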

Proposed change

I propose a new makeshift solution: signal the write to TSan as a relaxed atomic write. Because TSan insists on actually performing the write, we first read the location with a relaxed load and use that value in the write, thus making sure that we merely write back the current value (from the point of view of the thread). Example for 64-bit words:

CAMLreally_no_tsan void __tsan_volatile_write8(void *ptr)
{
  if (is_aligned(ptr, 8)) {
    uint64_t value =
      atomic_load_explicit((_Atomic uint64_t *)ptr, memory_order_relaxed);
    __tsan_atomic64_store(ptr, value, memory_order_relaxed);
  } else
    __tsan_write8(ptr);
}

This should remove the existing false positives caused by volatile writes being seen by TSan as plain stores. And indeed it makes the false positive noticed in weaktest_par_load in the CI (#12644 (comment)) disappear.

Correctness arguments

The three questions that arise upon such a change are: does it change the actual semantics of the program, defined as the set of possible execution traces (modulo TSan reports)? Does it introduce new false positives? Does it introduce false negatives, i.e., does it hide races?

Relaxed atomics do not imply any synchronization between threads, so no races should be hidden by this. And atomics are not racy, so the change does not create new possibilities of false positives either.

Finally, is the semantics of the program preserved? I believe so.

Attempt at a proof sketch

Disclaimer: I am no memory model expert, so this is only an amateur’s poor attempt at manipulating the tricky concepts of C11.

I think this fact can be derived from the following, more general statement: in a C program, inserting a relaxed load that yields v, followed by a relaxed store of v to the same location l, preserves the semantics. This is obviously the case inside the thread where the insertion occurs. Other threads do not see any “new” values for that location, because we only wrote back one of its previous values.

More formally, let us show that the instructions inserted into thread A do not modify the semantics by showing that any trace in thread B that reads the value v in l is, in fact, already present in the initial program.

There are two possibilities: either there exists a synchronizes-with relation from A to B after the initial write of v in A, or not. If there isn’t, then nothing prevents B from reading v in l.

If there is such a synchronizes-with relation, then the starting point in A of the first such relation (let’s call it S) is either sequenced before our inserted instructions in A, or after them (indeed, our inserted instructions are relaxed atomics and cannot themselves be synchronizing).

  • If S is sequenced-before the insertion point in A: then B can read v in l even in the initial program.
  • If S is sequenced-after the insertion point in A: then either there is an event E in A that is sequenced after the inserted instructions and before S, and that causes l to contain some value v' ≠ v from the point of view of A; or there isn’t. If there is no such event, then B can read v in l even in the initial program. If there is one, then our inserted instructions read and re-write a value older than v' in l, such that B will see v' or a newer value. But this was already the case in the initial program.

@maranget
Contributor

maranget commented Oct 20, 2023

Hi, I may be wrong, but the suggested transformation may add behaviours. Consider the following test, consisting of three threads (x is a pointer to an int whose initial value is zero; all reads and writes are volatile reads and writes, whose semantics we consider to be the same as relaxed atomics):

C X

{}

P0 (atomic_int* x) {
  int r0 = atomic_load_explicit(x,memory_order_relaxed);
  int r1 = atomic_load_explicit(x,memory_order_relaxed);
}

P1 (atomic_int* x) {
  atomic_store_explicit(x,1,memory_order_relaxed);
}

P2 (atomic_int* x) {
  atomic_store_explicit(x,2,memory_order_relaxed);
}

exists (0:r0=2 /\ 0:r1=0)

The C memory model (at least the repaired C11 memory model) forbids the final state 0:r0=2; r1=0;, as can be checked with herd:

% herd7 -c11 -cat rc11.cat X.litmus
...
Observation X Never 0 12
...

Now, here is the transformed test:

C Y

{}

P0 (atomic_int* x) {
  int r0 = atomic_load_explicit(x,memory_order_relaxed);
  int r1 = atomic_load_explicit(x,memory_order_relaxed);
}

P1 (atomic_int* x) {
  int t1 = atomic_load_explicit(x,memory_order_relaxed);
  atomic_store_explicit(x,t1,memory_order_relaxed);
  atomic_store_explicit(x,1,memory_order_relaxed);
}

P2 (atomic_int* x) {
  int t2 = atomic_load_explicit(x,memory_order_relaxed);
  atomic_store_explicit(x,t2,memory_order_relaxed);
  atomic_store_explicit(x,2,memory_order_relaxed);
}

exists (0:r0=2 /\ 0:r1=0)

The final state 0:r0=2; r1=0; is now allowed, as can be checked with herd:

%  herd7 -c11  -cat rc11.cat Y.litmus
...
Observation Y Sometimes 2 208
..

Here is the diagram that justifies the behaviour:
[execution diagram for test Y]
One can observe that Thread 0 performs two reads ordered by the sb relation. The first read reads 2; the second read is more interesting: it reads zero. However, this value does not come from the initial write (top of the image, label ix), but from Thread 1. This read is consistent with the ordering of writes (co, in blue) and is thus allowed by the model.

@OlivierNicole
Contributor Author

I half expected my proof to be false, but glad to have a definitive counter-example, thanks! I didn’t know (or had forgotten) about co.

I will therefore close this, as I do not see a way to repair it for now.

@gasche
Member

gasche commented Oct 20, 2023

The counter-example of @maranget does not necessarily mean that we must give up on this approach; it only means that volatile reads (after the transformation) can have behaviors not allowed for relaxed reads in the C memory model.

My mental model of relaxed reads (which, I learned in @maranget's office this afternoon, is weaker than the spec in the current memory model) is that we can read any previous value. (This is what we get with non-atomic reads in the OCaml memory model, for example.) In that weaker model, the transformation may be correct; at least the counter-example does not introduce a new behavior. For me that would be a good-enough model of volatile reads to program the runtime with.

@OlivierNicole
Contributor Author

Intuitively, I am also tempted to say that only unreasonable programs would rely on this aspect of the spec. And I doubt that the runtime does.

For what it’s worth, I have the testsuite passing with --enable-tsan and this change.

@fabbing
Contributor

fabbing commented Oct 24, 2023

I would like to suggest another attempt for this volatile write:

CAMLreally_no_tsan void __tsan_volatile_write8(void *ptr)
{
  const bool is_atomic = size <= sizeof(long long) && is_aligned(ptr, 8);
  if (is_atomic) {
    /* Signal a relaxed atomic store to TSan. In order not to change the
       semantics of the program, write the last seen value. As these are
       relaxed operations, no synchronization is introduced and the semantics
       over multiple threads should not change. */
    uint64_t value = atomic_load_explicit((_Atomic uint64_t *)ptr, memory_order_relaxed);
    while (!__tsan_atomic64_compare_exchange_weak((volatile uint64_t *)ptr, &value, value,
      memory_order_relaxed, memory_order_relaxed)) {}
  } else
    __tsan_write8(ptr);
}

We've tried to convince ourselves (with @OlivierNicole) that this is correct and tried to prove it with Herd7, but we couldn't (probably because our litmus wasn't correct: it never terminates).

What do you think of this proposal @maranget?

@hernanponcedeleon

Since the test was causing some issues with herd, I tried it with another similar tool.

> cat litmus/C11/manual/TSan.litmus 
C TSan

{}

P0 (atomic_int* x) {
  int r0 = atomic_load_explicit(x, memory_order_relaxed);
  int r1 = atomic_load_explicit(x, memory_order_relaxed);
}

P1 (atomic_int* x, int* y) {
  *y = atomic_load_explicit(x, memory_order_relaxed);
  while(atomic_compare_exchange_weak_explicit(x, y, *y, memory_order_relaxed, memory_order_relaxed) == 0) {}
  atomic_store_explicit(x, 1, memory_order_relaxed);
}

P2 (atomic_int* x, int* a) {
  *a = atomic_load_explicit(x, memory_order_relaxed);
  while(atomic_compare_exchange_weak_explicit(x, a, *a, memory_order_relaxed, memory_order_relaxed) == 0) {}
  atomic_store_explicit(x, 2, memory_order_relaxed);
}

exists (0:r0=2 /\ 0:r1=0)
> java -jar dartagnan/target/dartagnan.jar --method=assume cat/rc11.cat litmus/C11/manual/TSan.litmus 
Condition exists (0:bv64 r0==bv64(2) && 0:bv64 r1==bv64(0))
No
Total verification time(ms): 393

The result is the same as for the variant without the extra loads and while loops, so at least for the program that @maranget sent earlier, the transformation seems to be correct.

@OlivierNicole
Contributor Author

Thank you @hernanponcedeleon for running this!

I can’t find another counterexample; but then, I have a poor track record of making predictions in C11. 😄

@gasche
Member

gasche commented Nov 3, 2023

As a non-expert I find the code bewildering. It is much more complex than the original proposal, and my uninformed guess would be that the complexity hides the potential issues rather than removing them.

    uint64_t value = atomic_load_explicit((_Atomic uint64_t *)ptr, memory_order_relaxed);
    while (!__tsan_atomic64_compare_exchange_weak((volatile uint64_t *)ptr, &value, value,
      memory_order_relaxed, memory_order_relaxed)) {}

Naive questions:

  • For people who are not C atomics experts, what does compare_exchange_weak(ptr, &value, value) do? My guess is that it uses value as the expected value, and replaces it with value, returning false if the value at ptr has changed in the meantime.
  • Why do we have a while loop instead of a single exchange call?
  • Is it possible that the while loop would not terminate? In particular, if the value of ptr is different (because some other thread changed it and the relaxed load observed a stale value), when does the loop stop?

@anmolsahoo25

Sorry for barging in the middle of the discussion, but I think translating a volatile write as a __tsan_atomic8_fetch_add should work.

As far as I understand from the discussion, we want to signal a relaxed atomic write to TSAN. The implementation for an AtomicRMW is at - https://github.com/llvm/llvm-project/blob/b14d3441a67939533df1b10cc456516c85a3386a/compiler-rt/lib/tsan/rtl/tsan_interface_atomic.cpp#L286

template <typename T, T (*F)(volatile T *v, T op)>
static T AtomicRMW(ThreadState *thr, uptr pc, volatile T *a, T v, morder mo) {
  MemoryAccess(thr, pc, (uptr)a, AccessSize<T>(), kAccessWrite | kAccessAtomic);
  if (LIKELY(mo == mo_relaxed))
    return F(a, v);

As we can see, the library signals an atomic write access to the location and, when the memory order is mo_relaxed, simply performs the operation. Here is the implementation for fetch_add at https://github.com/llvm/llvm-project/blob/b14d3441a67939533df1b10cc456516c85a3386a/compiler-rt/lib/tsan/rtl/tsan_interface_atomic.cpp#L347:

template<typename T>
static T AtomicFetchAdd(ThreadState *thr, uptr pc, volatile T *a, T v,
    morder mo) {
  return AtomicRMW<T, func_add>(thr, pc, a, v, mo);
}

This has the advantage of keeping the memory write as a single event in the TSAN memory order, thus not requiring any complex arguments about how splitting the operation still guarantees the same semantics.
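
For the 8-byte handler discussed in this thread, that suggestion would look roughly like this (an illustrative sketch only, reusing the is_aligned helper and CAMLreally_no_tsan annotation from the snippets quoted above and the __tsan_atomic64_fetch_add entry point of the TSan interface; this is not the final patch):

CAMLreally_no_tsan void __tsan_volatile_write8(void *ptr)
{
  if (is_aligned(ptr, 8)) {
    /* A relaxed fetch-add of 0 is a single atomic read-modify-write event
       that leaves the value unchanged: TSan records an atomic write, while
       the program semantics are untouched. */
    (void)__tsan_atomic64_fetch_add((volatile uint64_t *)ptr, 0,
                                    memory_order_relaxed);
  } else
    __tsan_write8(ptr);
}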

@fabbing
Contributor

fabbing commented Nov 6, 2023

As a non-expert I find the code bewildering. It is much more complex than the original proposal, and my uninformed guess would be that the complexity hides the potential issues rather than removing them.

It's actually exactly the same principle as @OlivierNicole suggested, but it correctly handles the case where another thread performs a write in the meantime, as reported by @maranget.

  • For people who are not C atomics experts, what does compare_exchange_weak(ptr, &value, value) do? My guess is that it uses value as the expected value, and replaces it with value, returning false if the value at ptr has changed in the meantime.

For more information on CAS in C my go-to is https://en.cppreference.com/w/c/atomic/atomic_compare_exchange

__tsan_atomic64_compare_exchange_weak tries to replace the value at ptr with the desired value value (the 3rd argument). If the current value at ptr equals the expected value pointed to by &value (the 2nd argument), it replaces it atomically and returns true.
If it fails to replace the value at ptr because it is no longer the expected one, it updates *value with the current value at ptr. So another explicit atomic_load to get the latest value of *ptr isn't necessary, but the 2nd argument must be a pointer.

  • Why do we have a while loop instead of a single exchange call?

The while loop is there to handle the case pointed out by @maranget, where a store to ptr by another thread would be lost because __tsan_volatile_write would overwrite it with a stale value.
We want __tsan_volatile_write to be a no-op for other threads and not to introduce any new synchronisation. So the CAS is relaxed, and we make sure to overwrite *ptr with its latest value by checking whether __tsan_atomic64_compare_exchange_weak succeeded. If it did not, another thread has modified *ptr in the meantime, so we try again with the last known value (returned by the CAS).

  • Is it possible that the while loop would not terminate? In particular, if the value of ptr is different (because some other thread changed it and the relaxed load observed a stale value), when does the loop stop?

It would be possible for the while loop never to terminate if another thread kept updating *ptr and always won the race to write it before the CAS completed. This is very unlikely.

@OlivierNicole
Contributor Author

Sorry for barging in the middle of the discussion, but I think translating a volatile write as a __tsan_atomic8_fetch_add should work.
[…]
This has the advantage of keeping the memory write as a single event in the TSAN memory order, thus not requiring any complex arguments about how splitting the operation still guarantees the same semantics.

Yes, I believe that was @fabbing’s reasoning too, but it doesn’t hurt that you made it explicit.

Just a remark, to make sure we are on the same page: you showed that the TSan view of events is what we want it to be. That is part of the correctness property that we want to establish, but not all of it. The other part is that the semantics of the instrumented program does not change (disregarding TSan reports). That is the part that @maranget disproved for my initial proposal.

@anmolsahoo25

Not sure we're on the same page. I'm saying that instead of translating a volatile write as a relaxed read followed by a CAS loop, we translate it to an atomic_fetch_add operation.

This preserves the semantics: the write gets mapped to a single operation that performs a relaxed read and then writes the same value back to the memory location. Unlike the CAS-loop version, this does not require reasoning across multiple memory events; it is a 1:1 mapping which preserves both the TSan semantics and the program semantics.

@hernanponcedeleon

This preserves the semantics: the write gets mapped to a single operation that performs a relaxed read and then writes the same value back to the memory location.

I might be missing something trivial here, but I think what you propose would not "write the same value", but rather the read value plus one. This is my understanding of what atomic_fetch_add does

atomic {
  r = load(x);
  store(x,r+1);
  return r;
}

@anmolsahoo25

std::atomic_fetch_add takes an argument to add to the memory location. Passing that argument as zero guarantees it's the same value:

https://en.cppreference.com/w/cpp/atomic/atomic_fetch_add

@hernanponcedeleon

It was indeed something trivial I missed.

I still do not understand how what you propose gives the same instruction semantics.

... performs a relaxed read and then writes the same value back

It should not write the same value back, but rather the value that was specified. If we transform volatile x=1 to atomic_fetch_add(&x,0), then the code would be wrong in any state where x!=1.

@anmolsahoo25

Now I think I'm missing something. I was trying to replicate this code:

CAMLreally_no_tsan void __tsan_volatile_write8(void *ptr)
{
  const bool is_atomic = size <= sizeof(long long) && is_aligned(ptr, 8);
  if (is_atomic) {
    /* Signal a relaxed atomic store to TSan. In order not to change the
       semantics of the program, write the last seen value. As these are
       relaxed operations, no synchronization is introduced and the semantics
       over multiple threads should not change. */
    uint64_t value = atomic_load_explicit((_Atomic uint64_t *)ptr, memory_order_relaxed);
    while (!__tsan_atomic64_compare_exchange_weak((volatile uint64_t *)ptr, &value, value,
      memory_order_relaxed, memory_order_relaxed)) {}
  } else
    __tsan_write8(ptr);
}

This code isn't writing a provided value; it just reads the location and writes the same value back. So I was suggesting replacing this code with an atomic_fetch_add.

@OlivierNicole
Contributor Author

Ah, I understand now. That is simpler, and I think correct.

I implemented this solution: it passes the testsuite, and @fabbing verified that it does not seem significantly slower than the current state of trunk.

@hernanponcedeleon

If I correctly understand the current solution, it transforms this litmus test

C Original

{}

P0 (atomic_int* x) {
  int r0 = atomic_load_explicit(x, memory_order_relaxed);
  int r1 = atomic_load_explicit(x, memory_order_relaxed);
}

P1 (atomic_int* x) {
  atomic_store_explicit(x, 1, memory_order_relaxed);
}

P2 (atomic_int* x) {
  atomic_store_explicit(x, 2, memory_order_relaxed);
}

exists (0:r0=2 /\ 0:r1=0)

into this

C Transformed

{}

P0 (atomic_int* x) {
  int r0 = atomic_load_explicit(x, memory_order_relaxed);
  int r1 = atomic_load_explicit(x, memory_order_relaxed);
}

P1 (atomic_int* x) {
  int r0 = atomic_fetch_add_explicit(x, 0, memory_order_relaxed);
  atomic_store_explicit(x, 1, memory_order_relaxed);
}

P2 (atomic_int* x) {
  int r0 = atomic_fetch_add_explicit(x, 0, memory_order_relaxed);
  atomic_store_explicit(x, 2, memory_order_relaxed);
}

exists (0:r0=2 /\ 0:r1=0)

I wrote a simple litmus transformation pass implementing this and simulated the 137 tests from here under this c11 model and this rc11 model. In all cases, the results with and without the transformation match.

@gasche
Member

gasche commented Nov 7, 2023

If I understand correctly, the key difference with the earlier read-then-write proposal is that the read-write is atomic with those operators (exchange and fetch_add), and this atomicity helps avoid unpleasant orderings. If we atomically read something and write it again, we do have a no-op.

@maranget, do you have an opinion on this new proposal using a relaxed fetch-add?

@maranget
Contributor

maranget commented Nov 8, 2023

Hi all, I'd tend to trust this last solution, especially after @hernanponcedeleon's tests. Nitpicking, I'd rather see atomic_fetch_or as the "read-don't-modify-write" atomic primitive.
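
For illustration, the two candidate "read-don't-modify-write" primitives in plain C11 (a sketch only, not code from the patch; the rdmw_* names are made up for this illustration):

#include <stdatomic.h>
#include <stdint.h>

/* Both leave *x unchanged and return its old value; only the identity
   element of the operation differs (v + 0 == v versus v | 0 == v). */
static inline uint64_t rdmw_fetch_add(_Atomic uint64_t *x)
{
  return atomic_fetch_add_explicit(x, 0, memory_order_relaxed);
}

static inline uint64_t rdmw_fetch_or(_Atomic uint64_t *x)
{
  return atomic_fetch_or_explicit(x, 0, memory_order_relaxed);
}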

@xavierleroy
Contributor

Nitpicking, I'd rather see atomic_fetch_or as the "read-don't-modify-write" atomic primitive.

Nitpicking^2: for real lock-free code, fetch-add is a hardware instruction on x86, while fetch-or is not and must be compiled as a CAS loop. But maybe it doesn't matter here, as the atomic operation is handled/simulated by TSan anyway? (Not sure.)

@OlivierNicole
Contributor Author

The TSan function does perform the atomic operation in all cases. For some reason, it is done using legacy builtins like __sync_fetch_and_add or __sync_fetch_and_or.

@OlivierNicole
Contributor Author

A quick test on https://godbolt.org suggests that on x86 they are both compiled to

        lock            or      dword ptr [rsp - 64], 0

by a recent Clang, while GCC compiles __sync_fetch_and_or like Clang and __sync_fetch_and_add to

        lock            add      dword ptr [rsp - 64], 0

@xavierleroy
Contributor

xavierleroy commented Nov 8, 2023

I tested on godbolt, thank you very much. The lock or / lock add idiom doesn't provide you with the old value. (I suspect you discarded the return values.) That's why x86 has a special lock xadd instruction to handle fetch-and-add, but it doesn't have a matching instruction for fetch-and-or.

All this is getting seriously off topic.

@OlivierNicole
Contributor Author

I tested on godbolt, thank you very much.

No problem! I wanted to check with the __sync builtins that I hadn't encountered before. Apologies for the lock / xadd confusion.

I’m leaving the fetch_and_add for now.

OlivierNicole added a commit to OlivierNicole/ocaml that referenced this pull request Nov 9, 2023
Replace with finer silencing where needed, although ocaml#12681 should alleviate the need for this.
Member

@gasche gasche left a comment

This seems reasonable: I agree that the code does what it claims, and I am reassured by @maranget's intuition that the memory-model aspects work out. See minor nitpick comments/questions.

runtime/tsan.c Outdated
@@ -395,7 +390,15 @@ CAMLreally_no_tsan void __tsan_unaligned_volatile_read##size(void *ptr) \
} \
CAMLreally_no_tsan void __tsan_volatile_write##size(void *ptr) \
{ \
__tsan_write##size(ptr); \
const bool is_atomic = size <= sizeof(long long) && \
is_aligned(ptr, 8); \
Member

Nitpick: I find the indentation choice here confusing. I would expect you to either 2-align, or align on size, but the visual alignment of is_ looks intentional and carries no meaning that I can see.

Contributor Author

Indeed, it didn’t make sense. Fixed in 4fd66b1.

runtime/tsan.c Outdated
DEFINE_TSAN_VOLATILE_READ_WRITE(16, 128);

/* We do not treat accesses to 128-bit values as atomic, since it is dubious
that they can be treated as such. */
Member

Are there any volatile accesses to 128-bit values in the runtime? Where do they come from?

Contributor Author

No, according to my grepping there aren’t any. The 128-bit versions of these functions are there because, without them, building a C library for OCaml with TSan enabled will fail at linking with an unresolved symbol error if it contains volatile accesses to 128-bit values. So I figured it would be better to have 128-bit volatiles behave silently like plain 128-bit values.

Member

Could you add this explanation to the comment?


@gasche
Member

gasche commented Nov 10, 2023

Good work as always, thanks to everyone involved (@OlivierNicole, @maranget, @fabbing, @hernanponcedeleon, @anmolsahoo25, @xavierleroy). I'm not a memory-model person, yet I could follow the change and the resulting code. This is good to merge when the CI agrees.

@gasche gasche merged commit 560216c into ocaml:trunk Nov 10, 2023
10 checks passed
OlivierNicole added a commit to OlivierNicole/ocaml that referenced this pull request Nov 13, 2023
OlivierNicole added a commit to OlivierNicole/ocaml that referenced this pull request Nov 15, 2023
OlivierNicole added a commit to OlivierNicole/ocaml that referenced this pull request Nov 15, 2023
@OlivierNicole OlivierNicole deleted the tsan_fix_volatile_writes branch November 15, 2023 11:19
OlivierNicole added a commit to OlivierNicole/ocaml that referenced this pull request Nov 30, 2023
OlivierNicole added a commit to OlivierNicole/ocaml that referenced this pull request Dec 13, 2023
OlivierNicole added a commit to OlivierNicole/ocaml that referenced this pull request Dec 15, 2023
OlivierNicole added a commit to OlivierNicole/ocaml that referenced this pull request Jan 8, 2024
OlivierNicole added a commit to OlivierNicole/ocaml that referenced this pull request Jan 8, 2024