New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix datarace between oldify and allocation #12894
Conversation
a71d2f1
to
b968e7e
Compare
Noob question: for me happens-before is a binary relation (a memory event happens before/after another). What does |
TSAN simulates a release barrier on encountering |
You are pointing at the TSAN runtime source code. Are those operations documented anywhere (else)? |
There is a paper on TSAN which mentions about the annotions. |
For context, these two race reports are from #11040 (in fact, the last remaining) and @Johan511 has worked with Tarides to help us understand their origin and fix them. This is one of the cases where there is a race according to the C11 memory model, but not according to a more practical one like, e.g., the Linux kernel memory model (LKMM). The situation is similar to publication safety of OCaml values, which is justified by arguments outside of the scope of C11.
TSan’s API has close to no documentation at all, unfortunately. But, having practiced reading its source code to understand what it does, I can confirm what @Johan511 says: these two annotations behave like a release store – acquire load pair. |
I trust @Johan511's expertise and his description of the situation, but I would have liked to have some documentation to reference in the PR source code itself for these obscure operations. (Also, from just the source we have no information about whether these operations will remain available in the future. There is a possibility that the TSan mode of past OCaml releases would become unusable with more recent clang versions due to a change to this undocumented stuff.) |
That is true, but since nothing is documented, the same can be said of all the functions in the TSan API; including the ones we use at the assembly level to instrument memory accesses. |
That being said, I share your concern. On the other hand I am reassured by the fact that |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The change looks good to me, however I think this PR should also remove the “no tsan” attribute on pool_initialize
and caml_shared_try_alloc
, since they are no longer needed:
diff --git a/runtime/shared_heap.c b/runtime/shared_heap.c
index 7d13dcd287..6031a98a01 100644
--- a/runtime/shared_heap.c
+++ b/runtime/shared_heap.c
@@ -253,7 +253,6 @@ static void calc_pool_stats(pool* a, sizeclass sz, struct heap_stats* s)
}
/* Initialize a pool and its object freelist */
-CAMLno_tsan /* Disable TSan reports from this function (see #11040) */
Caml_inline void pool_initialize(pool* r,
sizeclass sz,
caml_domain_state* owner)
@@ -426,7 +425,6 @@ static void* large_allocate(struct caml_heap_state* local, mlsize_t sz) {
return (char*)a + LARGE_ALLOC_HEADER_SZ;
}
-CAMLno_tsan /* Disable TSan reports from this function (see #11040) */
value* caml_shared_try_alloc(struct caml_heap_state* local, mlsize_t wosize,
tag_t tag, reserved_t reserved)
{
@Johan511 could you clean up the history a bit? In principle I don't mind the "change to easily trigger the bug then revert that" trick, but in practice your change is rather invasive (it's a full revert of a PR, not just tweaking the default size constant). I think you could either drop these commits or simplify them to make them smaller. |
@Johan511 I happen to have done this work of cleaning the history in order to test your PR together with some CI changes I am working on. The cleaned history is on this branch https://github.com/OlivierNicole/ocaml/commits/pr12894-rebased/ if that saves you time. |
509064e
to
2a977aa
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The AnnotateHappensXXX
names are currently in use by llvm's own sources, so one can reasonably expect that they won't disappear anytime soon (see llvm/include/llvm/Support/{Compiler,Threading}.h
for their use)
With apologies for being dense, now that I sort of understand what those annotations are doing, I would like to better understand why we are using them in this PR. The explanation of @Johan511 is that the code in As a non-memory model expert, it is not clear why we want to teach TSAN that there is a barrier / happens-before relashionship here without inserting such a barrier. Is this a case of address dependency, on what? Could this reasoning be explained in documentation comments around the place where the annotations are inserted? For the code of OCaml mutators (the code that corresponds to source code, not to the concurrent functioning of the GC), our reasoning is that not having a barrier on initializing writes is fine because we have a barrier on non-initializing writes ( Remark: how can it happen that two hardware threads allocate and mark the same shared heap? I thought that shared heaps are domain-local. The reports suggest a race between the domain domain thread and its backup thread, isn't there enough synchronization between those to rule this out? Remark: the explanation of @Johan511 mentions a race between oldify and allocation, but the traces that I see in his post correspond to a race between marking the shared heap and initializing the first header word of a new shared heap pool. Are these considered the same thing? |
The same reasoning applies here:
|
Followup naive question: if the barrier on non-initializing mutations is enough to avoid races here, why does TSan not notice this? Is it missing an annotation or TSan function call on the write that publishes the value in this example? (I feel that we are adding annotations here that have a delicate justification, to composate for writes that use barriers in a way that TSan does not recognize. It would be much easier to justify adding an annotation on those writes, right?) |
@gasche Your questions are perfectly legitimate and with @Johan511, it took us a while to arrive to a satisfactory answer. Sorry if this post is a bit lengthy. Allow me to answer first to:
Short version: the major heap pool is owned by a domain but can be marked by other domains. The backup thread is a red herring. There are two distinct causes to the lack of synchronization seen by TSan, as I explain below. To be more concrete, here is what happened during that execution:
I agree with you that the title of the PR should be changed: the “race” does not happen between oldify and allocation but rather between allocation and marking. As @stedolan says the race is not a race in practice thanks to address dependencies. I would add another ingredient, namely the CAS / release store (pick any) that happens after between the allocation of the promoted value and its publication. Here is an idealized view of what happens. To simplify things, I’m pretending that the promoted value is itself a global root. In reality, it is only reachable from the global roots, but I believe that the reasoning still holds (just replace “address dependency” by “chain of address dependencies”). /* These values are inintially published to all threads */
value v_in_minor0 = Val_op(caml_alloc_small(6, Custom_tag));
value *global_root = Op_val(v_in_minor0);
P0() { /* Domain 0 */
/* Beginning of do_some_marking */
/* Follow the global root... */
value block =
Val_op(atomic_load_explicit(&global_root, memory_order_relaxed));
header_t hd = Hd_val(block); /* <-- race? */
/* ... rest of do_some_marking */
}
P1() { /* Domain 1 */
/* Beginning of oldify_one */
/* Beginning of caml_shared_try_alloc */
header_t *new_v_hp = &pool_start[count_allocated];
Hd_hp(new_v_hp) = 0; /* <-- race? */
/* and then later... */
Hd_hp(new_v_hp) = Make_header(...); /* <-- race? */
value *new_v = Val_hp(new_v_hp);
/* End of caml_shared_try_alloc */
for(int i = 0; i < 6; i++) {
Field(new_v, i) = Field(v_in_minor0, i);
}
/* Beginning of try_update_object_header */
header_t hd = atomic_load(Hp_atomic_val(v_in_minor0));
header_t desired_hd = In_progress_update_val;
if( atomic_compare_exchange_strong(Hp_atomic_val(v_in_minor0), &hd, desired_hd) ) {
/* Success. Now we can write the forwarding pointer. */
atomic_store_relaxed(Op_atomic_val(v_in_minor0), new_v);
/* And update header ('release' ensures after update of fwd pointer) */
atomic_store_release(Hp_atomic_val(v_in_minor0), 0);
} else {
/* Updated by another domain. Spin for that update to complete and
then throw away the result and use the one from the other domain. */
spin_on_header(v_in_minor0);
new_v = Field(v_in_minor0, 0);
}
atomic_store_explicit(&global_root, new_v, memory_order_relaxed);
/* End of try_update_object_header */
/* End of oldify_one */
} TSan reports a race between the two initializing stores in P0 and the header read in P1. There is no race in practice by the reasoning that:
Of those three happens-before edges, TSan does not see nos. 2 and 3.
I also think that a comment should be added near the annotations with a justification.
Because we purposefully did not instrument initializing writes as we anticipated precisely this kind of spurious reports. Only, we missed this spot.
We could de-instrument the two initializing writes involved. But it’s not a strong opinion. And I admit that de-instrumenting all initializing writes would be more coherent. |
Sorry for being unclear, I was not suggesting to de-instrument the initializing writes, but rather to ensure that TSan sees the barrier in the modifying writes. (Possibly by using one of those Happens{Before,After} annotation there.) |
If you suggest to use annotations to make TSan take into account the happens-before relation no. 2 in my description, that’s possible, but with that alone, no. 3 (the hb stemming from address dependencies) would be missing and the ordering between the accesses would not be established, I fear. |
fe24ea2
to
b447e93
Compare
At this point I am convinced that this PR is a good approach to fix the false report by TSan, but I would enjoy seeing a bit of documentation (as source comments), at some or all of the places where an annotation is added, on why the annotation is helpful/necessary and correct there. Surely the very detailed explanations of @OlivierNicole (thanks!) can be condensed in a few sentences that would help code readers have not all the details, but at least a sense of what they are looking at. (You should also feel free to include a link to the present PR for the full details.) |
Changes
Outdated
- #11040: Silences false data race observed between caml_shared_try_alloc | ||
and oldify. Introduces Macros to call tsan annotations which help annotate | ||
a happens before relatioship |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- #11040: Silences false data race observed between caml_shared_try_alloc | |
and oldify. Introduces Macros to call tsan annotations which help annotate | |
a happens before relatioship | |
- #11040: Silences false data race observed between caml_shared_try_alloc | |
and oldify. Introduces macros to call tsan annotations which help annotate | |
a ``happens before'' relationship. |
b447e93
to
b6cf6e0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the update. I am overall happy with what the comments say, but below are suggestions regarding typos, and other small things.
runtime/caml/tsan.h
Outdated
/* TSAN annotates a release operation on encountering ANNOTATE_HAPPENS_BEFORE | ||
* and similarly an acquire operation on encountering ANNOTATE_HAPPENS_AFTER */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/* TSAN annotates a release operation on encountering ANNOTATE_HAPPENS_BEFORE | |
* and similarly an acquire operation on encountering ANNOTATE_HAPPENS_AFTER */ | |
/* TSan records a release operation on encountering ANNOTATE_HAPPENS_BEFORE | |
* and similarly an acquire operation on encountering ANNOTATE_HAPPENS_AFTER. | |
These annotations are used to eliminate false positives. */ |
“TSan records an operation” seems more accurate to me than “TSan annotates an operation”. Also minor: TSAN -> TSan.
runtime/major_gc.c
Outdated
* Changes here should probably be reflected in do_some_marking. */ | ||
/* Annotating a acquire barrier on `p` because TSAN does not realise | ||
* a happens-before relatioship established due to address dependencies | ||
* with the read in shared_heap.c allocation (#12894) */ | ||
CAML_TSAN_ANNOTATE_HAPPENS_AFTER(Hp_val(child)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* Changes here should probably be reflected in do_some_marking. */ | |
/* Annotating a acquire barrier on `p` because TSAN does not realise | |
* a happens-before relatioship established due to address dependencies | |
* with the read in shared_heap.c allocation (#12894) */ | |
CAML_TSAN_ANNOTATE_HAPPENS_AFTER(Hp_val(child)); | |
* Changes here should probably be reflected in do_some_marking. */ | |
/* Annotating an acquire barrier on the header because TSan does not see the | |
* happens-before relationship established by address dependencies with | |
* initializing writes in shared_heap.c allocation (#12894) */ | |
CAML_TSAN_ANNOTATE_HAPPENS_AFTER(Hp_val(child)); |
The comment refers to p
which is probably an unintentional leftover from copy-pasting.
I suggest to add a blank line to separate from the previous unrelated comment. The above also suggests a few minor word tweaks that make the sentence clearer IMO. (And it’s not a read in shared_heap.c, it’s two writes)
runtime/major_gc.c
Outdated
/* Annotating a acquire barrier on `p` because TSAN does not realise | ||
* a happens-before relatioship established due to address dependencies | ||
* with the read in shared_heap.c allocation (#12894) */ | ||
CAML_TSAN_ANNOTATE_HAPPENS_AFTER(Hp_val(block)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same remarks here.
runtime/shared_heap.c
Outdated
/* Annotating a release barrier on `p` because TSAN does not realise | ||
* a happens-before relatioship established due to address dependencies | ||
* with the read in major_gc.c marking (#12894) */ | ||
CAML_TSAN_ANNOTATE_HAPPENS_BEFORE(p); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/* Annotating a release barrier on `p` because TSAN does not realise | |
* a happens-before relatioship established due to address dependencies | |
* with the read in major_gc.c marking (#12894) */ | |
CAML_TSAN_ANNOTATE_HAPPENS_BEFORE(p); | |
/* Annotating a release barrier on `p` because TSan does not see the | |
* happens-before relationship established by address dependencies | |
* between the initializing writes here and the read in major_gc.c | |
* marking (#12894) */ | |
CAML_TSAN_ANNOTATE_HAPPENS_BEFORE(p); |
@Johan511 By the way @dustanddreams’s suggested changes were applied only partially and not quite faithfully—the quotes around |
b6cf6e0
to
83fe7e7
Compare
I am planning to merge this (if the CI agrees) and also include in 5.2 -- which we want to have as TSan-clean as easily possible. |
@gasche Could you add the run-thread-sanitizer label? |
83fe7e7
to
388eb25
Compare
Changes
Outdated
- #11040: Silences false data race observed between caml_shared_try_alloc | ||
and oldify. Introduces macros to call tsan annotations which help annotate | ||
a ``happens before'' relationship. | ||
(Olivier Nicole, Hari Hara Naveen S) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(If you are still amending the PR, I think you could list me as a reviewer in the Changes entry.)
@gasche I fixed some code hygiene changes which were reported, can you please approve the workflows once more? |
388eb25
to
a284d62
Compare
a284d62
to
0202a53
Compare
Fix datarace between oldify and allocation (cherry picked from commit 6f5fe0e)
For the record, you might remember me mentioning that TSan on riscv64 becoming noticeably slower recently. I have tracked this slowdown to this PR, which is not a surprise since it causes I am nevertheless surprised by the particular slowdown on risvc64. Measuring the time used to compile OCaml itself (thus a bit of C compile and a lot of OCaml compile), it causes an increase of the total time of about 10% on arm64, but 65% (!) on riscv64. It would be interesting to figure out why such a large difference. |
|
Intriguing. @dustanddreams a simple way to test your hypothesis would be to un-instrument these functions again and see if the overhead goes away. For the record, we now have |
Interestingly, the overhead does not come from the instrumentation of these two function, but from the |
There had been 2 dataraces reported in
parallel/multicore_systhreads .ml
andtest_issue_11094.ml
between oldify and allocation of the object being oldified.TSAN logs reported are :
parallel/multicore_systhreads.ml
test_issue_11094.ml
Code can be fetched using
To replicate the datarace
(datarace is much harder to trigger in
test_issue_11094.ml
)The datarace is reported because TSAN is unable to realise a happens before (hb) relatioship between the initialising write to a header and the read from it.
The fix uses TSAN's AnnotateHappensBefore and AnnotateHappensAfter functions is annotate the hb relatioship