Fix data races in minor_gc.c and caml_natdynlink_open #12737

Merged 4 commits on Dec 14, 2023

OlivierNicole
Contributor

@OlivierNicole OlivierNicole commented Nov 13, 2023

This proposes to fix two data races reported in #11040.

The first race happens during minor GC, when promoting the values that are in the remembered set. This is a task done in parallel by all existing domains, with a static work sharing policy, to avoid situations where a domain has much more work to do, e.g. if it continually updates some globals with newly allocated data. In the section of caml_empty_minor_heap_promote that performs this task, a thread can read a value

oldify_one (&st, **r, *r);

(the problematic operation is the second plain read resulting from **r)

… that is being promoted by another thread (through a volatile write in oldify_one).

*p = v;

It suffices to make the read volatile as well: promotion to the major heap can only happen once, so even if the first thread reads an old value, when attempting to promote it oldify_one will detect that it has already been done and will simply update the value.
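A minimal, self-contained sketch of the pattern described above, under stated assumptions: `value` is modeled as a plain machine word, and `toy_oldify_one` and `toy_promote_slice` are hypothetical stand-ins, not the runtime's functions. It only illustrates reading each remembered-set entry through a `volatile` pointer so that the read is of the same kind as the volatile write performed on the other side.

```c
#include <stdint.h>
#include <stddef.h>

/* Assumption: `value` is a machine word, as a stand-in for the runtime type. */
typedef intptr_t value;

/* Hypothetical stand-in for oldify_one: counts calls and writes the value
   back through a volatile pointer, like the `*p = v;` line quoted above. */
static int promote_calls = 0;

static void toy_oldify_one(value v, volatile value *p)
{
  promote_calls++;
  *p = v; /* volatile write */
}

/* Walk a slice of a toy remembered set. Each entry is the address of a
   field; the field is read through a volatile pointer instead of a plain
   `**r` dereference. */
static void toy_promote_slice(value **ref_start, value **ref_end)
{
  for (value **r = ref_start; r < ref_end; r++) {
    volatile value *pr = *r; /* one read of the table entry */
    toy_oldify_one(*pr, pr); /* volatile read of the field */
  }
}
```

In this toy version every entry is simply written back unchanged; the point is only the shape of the accesses, not the promotion logic.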

The second race is in caml_natdynlink_open: this primitive is accessing one of its CAMLlocal variables after calling caml_enter_blocking_section (it’s against the rules!) allowing this to happen even during a stop-the-world section. As a result, the GC can concurrently promote the value (by a direct write on the stack). To get rid of the race, one just needs to copy the value (which is cast using Int_val) to a local int.
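The fix described above can be sketched as follows. This is a toy model, not the runtime code: the macros are simplified stand-ins for those in `mlvalues.h`, and `toy_enter_blocking_section`, `toy_leave_blocking_section`, `toy_blocking_open` and `toy_natdynlink_open` are hypothetical names introduced for illustration.

```c
#include <stdint.h>

/* Simplified stand-ins (assumptions) for the mlvalues.h definitions. */
typedef intptr_t value;
#define Val_int(i) ((value)(((uintptr_t)(intptr_t)(i) << 1) | 1))
#define Int_val(v) ((int)((v) >> 1))

/* Toy model of the runtime lock. */
static int runtime_lock_held = 1;
static void toy_enter_blocking_section(void) { runtime_lock_held = 0; }
static void toy_leave_blocking_section(void) { runtime_lock_held = 1; }

/* Hypothetical blocking operation that only sees C data. */
static int toy_blocking_open(const char *path, int global_flag)
{
  (void)path;
  return global_flag ? 2 : 1;
}

static int toy_natdynlink_open(const char *filename, value global)
{
  /* Extract the C integer while the runtime lock is still held. */
  int global_dup = Int_val(global);
  int result;

  toy_enter_blocking_section();
  /* From here on, only C data is touched: `global` is not read again. */
  result = toy_blocking_open(filename, global_dup);
  toy_leave_blocking_section();

  return result;
}
```

The design choice is the one stated in the description: the OCaml `value` is converted once, before the lock is released, so nothing inside the blocking section can race with a GC write to the root.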

This PR is best reviewed commit by commit.

(Edit: the rest of this description no longer applies and was moved to a separate PR: #12746)

It also tidies up a few things regarding TSan annotations:

  • Initially, we introduced CAMLreally_no_tsan as a complement to CAMLno_tsan. The idea was that CAMLno_tsan should be used to de-instrument functions that we don’t want instrumented with --enable-tsan, while CAMLreally_no_tsan de-instruments them in all cases, including when, e.g., -fsanitize=thread is passed through the CFLAGS. However, experience shows that it is vastly more convenient to chase data races in the runtime using --enable-tsan than by modifying CFLAGS. As a consequence, CAMLreally_no_tsan is not really relevant anymore. I replace the pair CAMLno_tsan / CAMLreally_no_tsan with CAMLno_tsan / CAMLno_tsan_for_perf. Functions marked CAMLno_tsan are never instrumented, whereas functions marked CAMLno_tsan_for_perf are not instrumented to save performance, except when TSAN_INSTRUMENT_ALL is defined. Defining TSAN_INSTRUMENT_ALL ensures maximum coverage when looking for data races in the runtime.
  • A number of TSan false positives related to volatile writes were removed by the merge of #12681 ("Fix TSan false positives due to volatile write handling"), and we no longer need to silence the corresponding reports.
  • Outdated TSan silencing annotations are removed, and 2 added, to match the state of the to-do list in #11040 ("ThreadSanitizer issues").

@gasche
Member

gasche commented Nov 13, 2023

I don't mind reviewing such changes, but I don't have the energy to review a medium-sized PR right now. I would prefer to receive smaller PRs for the changes that are independent from each other. Of course anyone else is welcome to review the PR as is.

@smuenzel
Contributor

The second race is in caml_natdynlink_open: this primitive is accessing one of its CAMLlocal variables after calling caml_enter_blocking_section (it’s against the rules!) allowing this to happen even during a stop-the-world section. As a result, the GC can concurrently promote the value (by a direct write on the stack). To get rid of the race, one just needs to copy the value (which is cast using Int_val) to a local int.

I'm a little confused by this one. Is global actually an int? If so, the GC doesn't affect it, so the original read is legal. If it is actually a block, then Int_val is not the right macro to use. Am I missing something?

@OlivierNicole
Contributor Author

I would prefer to receive smaller PRs for the changes that are independent from each other.

Sure, I will extract the annotations changes in a separate PR.

I'm a little confused by this one. Is global actually an int? If so, the GC doesn't affect it, so the original read is legal. If it is actually a block, then Int_val is not the right macro to use. Am I missing something?

In fact, even if the value is an integer, the GC does perform a write on it. It writes… the same value. See the line in minor_gc.c I was mentioning:

*p = v;

This line is executed when the value is not a minor block, which includes the case when it is an integer. I don’t know what the reason is for this, but well, it does result in a data race in this case.
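To make the point concrete, here is a small toy model of that early path. It is a sketch under assumptions: `Is_block` is a simplified stand-in for the runtime macro, `toy_is_young` is a hypothetical predicate that treats nothing as young, and `write_count` exists only to make the store observable.

```c
#include <stdint.h>

/* Simplified stand-ins (assumptions) for the runtime definitions. */
typedef intptr_t value;
#define Is_block(v) (((v) & 1) == 0)

static int write_count = 0;

/* Toy predicate: pretend no block lives in the minor heap. */
static int toy_is_young(value v) { (void)v; return 0; }

static void toy_oldify_one(value v, volatile value *p)
{
  if (!(Is_block(v) && toy_is_young(v))) {
    *p = v;        /* stores v even when *p already holds v */
    write_count++;
    return;
  }
  /* Promotion of young blocks is elided in this sketch. */
}
```

Calling `toy_oldify_one(slot, &slot)` on a slot holding a tagged integer leaves the slot unchanged but still performs a store, which is what a concurrent plain read of the same slot races with.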

@OlivierNicole
Contributor Author

I removed the commits that don’t remove data races.

@smuenzel
Contributor

smuenzel commented Nov 15, 2023

I don’t know what the reason is for this, but well, it does result in a data race in this case.

Interesting, I took a look. It looks like oldify_one is also used as a kind of set_field when oldifying a value recursively. So this line is necessary to oldify the contents of a block. A different fix would be not adding global to the local roots by omitting it from CAMLparam.

Edit: Not doing the write when it is unnecessary seems to be difficult, because of the many spots that oldify_one is called from, and the lack of information about the context of the call inside the function.

@@ -546,7 +545,7 @@ void caml_empty_minor_heap_promote(caml_domain_state* domain,

   for( r = ref_start ; r < foreign_major_ref->ptr && r < ref_end ; r++ )
   {
-    oldify_one (&st, **r, *r);
+    oldify_one (&st, *((volatile value *) *r), *r);
Contributor

Am I understanding correctly that the problem is that the reads of the second and third arguments could end up being different due to the race?
If that's right, then maybe it's clearer to use an intermediate value, e.g.:

value * pr = *r;
oldify_one (&st, *pr,pr);

Contributor Author

The problem is not that the reads can end up being different: if *pr is a pointer and another thread has promoted it, reading the old value here is not an issue as it now points to a forwarding block, which will be detected by oldify_one. The problem is that the race is between a volatile write and a plain load, namely, *pr in your proposal. To avoid undefined behaviour, in principle, we would need to use relaxed atomic accesses. But we have decided that volatile is to be used as an equivalent of relaxed accesses in some cases in the runtime; hence my proposal of making the read volatile.

I agree that using an intermediate value is more legible. How about:

volatile value * pr = *r;
oldify_one (&st, *pr, pr);
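
For comparison, the two access styles mentioned in this comment can be sketched side by side. This is only an illustration: `load_relaxed`, `store_relaxed` and `load_volatile` are hypothetical helper names, `value` is modeled as a machine word, and the first two helpers use the standard C11 `<stdatomic.h>` interface rather than anything from the runtime.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Assumption: `value` is a machine word, as a stand-in for the runtime type. */
typedef intptr_t value;

/* What the C11 memory model asks for in principle: relaxed atomic accesses. */
static value load_relaxed(_Atomic value *p)
{
  return atomic_load_explicit(p, memory_order_relaxed);
}

static void store_relaxed(_Atomic value *p, value v)
{
  atomic_store_explicit(p, v, memory_order_relaxed);
}

/* The convention described in this comment: a volatile access used in
   place of a relaxed atomic access. */
static value load_volatile(volatile value *p)
{
  return *p;
}
```

The volatile form does not require changing the declared type of the field, which is presumably part of its appeal here; whether it is an acceptable substitute is a project-level convention, as the comment explains.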

Contributor

Looks good! Thanks for the explanation.


/* TODO: dlclose in case of error... */

p = caml_stat_strdup_to_os(String_val(filename));
global_dup = Int_val(global);
Contributor

How about this instead? It has the advantage of not introducing a dup variable, which we aren't doing elsewhere.

Contributor Author

It goes against Rule 1, but if it’s done elsewhere, or if people are fine with not strictly applying the rules in compiler-internal libraries, I won’t fight it.

Contributor

This sounds fine indeed as the runtime follows relaxed rules. This could be strengthened with a CAMLassert(Is_long(global)) for readability in place of the comment.

Contributor Author

Done in 42bcd53

@gasche
Member

gasche commented Nov 16, 2023

I haven't had the time to look at this yet (and given the nice feedback, there is a chance that I don't have to!) but a meta-comment: whenever you are doing something non-obvious that could easily be undone by someone meaning well, please have a comment explaining that you are doing it that way for a good reason -- and giving the reason, obviously. I suspect that both change sites meet this requires-an-explanation criterion.

@gadmm
Contributor

gadmm commented Nov 16, 2023

I had a similar train of thought; in fact, the documentation itself might not warn explicitly enough against accessing GC roots during blocking sections even if the value is known to be an immediate. This looks innocuous and the danger is counterintuitive. Here is the relevant part of the documentation that could perhaps be made clearer:

After caml_release_runtime_system() was called and until caml_acquire_runtime_system() is called, the C code must not access any OCaml data, nor call any function of the run-time system, nor call back into OCaml code. Consequently, arguments provided by OCaml to the C primitive must be copied into C data structures before calling caml_release_runtime_system(), and results to be returned to OCaml must be encoded as OCaml values after caml_acquire_runtime_system() returns.

Similarly, mlvalues.h now contains typedef volatile value * value_ptr, but few people know about it and it does not appear in the documentation yet.

Not saying that it is the role of anyone in particular to do this improvement; just thinking out loud.

@@ -546,7 +545,8 @@ void caml_empty_minor_heap_promote(caml_domain_state* domain,

   for( r = ref_start ; r < foreign_major_ref->ptr && r < ref_end ; r++ )
   {
-    oldify_one (&st, **r, *r);
+    volatile value* pr = *r;
+    oldify_one (&st, *pr, pr);
Contributor

There is value_ptr which could be used here to make the replacement more obvious.

In fact, I wonder if it is not a case where value_ptr should replace value * further in the code base.

e.g. could it be that in minor_gc.h, the following:

struct caml_ref_table CAML_TABLE_STRUCT(value *);

should be:

struct caml_ref_table CAML_TABLE_STRUCT(value_ptr);

or something like that (with all the changes throughout the codebase that this would require)?

Contributor

(And, in this case, would it be a sign that we should inspect all the uses of value* in the runtime to see if they should be replaced with value_ptr? :)

Contributor Author

Often the elements of a caml_ref_table are manipulated without any risk of conflicting access, e.g., outside of minor collections, remembered sets are domain-local. Making all these elements volatile would remove opportunities for compiler optimization for no good reason.

(And, in this case, would it be a sign that we should inspect all the uses of value* in the runtime to see if they should be replaced with value_ptr? :)

I haven’t had a response on #12512, except from @gasche kindly trying to unstall the PR, so this auditing process is paused as far as I’m concerned.

@gasche
Member

gasche commented Nov 16, 2023

The proposed fix for caml_natdynlink_open is perplexing to me. The current fix (suggested by @smuenzel) is written as if it was a mistake to register an immediate as a root and then use it in a blocking section. But if this is the case, we are in trouble, because there is probably a lot of code around doing that, and it all needs to be fixed, right? Can we think of another fix for this category of race that preserves this idiom? (Maybe we cannot; really, people should either skip root-registration or not access the value in blocking sections. But then we have to document this carefully and think about how to detect the issue in existing code.)

I think that this is what @gadmm was saying above, but I would say it more explicitly: I think that the mental model of people before was that non-immediate roots could be moved around during blocking sections and were thus unsafe, and that this PR is in the process of strengthening the specification (not just clarifying it). I would argue that "not access any OCaml data" in the previous paragraph was previously understood to mean any OCaml block, not all values of value type.

If we must change the specification, we must. The new, stronger specification can be justified. But must we really?

@OlivierNicole
Contributor Author

We can modify oldify_one to return early without writing if the value passed is not a block. Although I have to mention, I haven’t seen this kind of race anywhere other than in caml_natdynlink_open when running the test suite. For code in the wild, it’s hard to say. I agree that it’s tempting for expert users to reason “it’s an immediate, I can use it as I please”.

@xavierleroy
Contributor

The current fix (suggested by @smuenzel) is written as if it was a mistake to register an immediate as a root and then use it in a blocking section. But if this is the case, we are in trouble, because there is probably a lot of code around doing that, and it all needs to be fixed, right?

Registering (as a root) a value that is known to be a tagged integer is a no-op: the tagged integer will not change the set of blocks traversed by the GC, and the GC will not change the value of the tagged integer. So, it's not wrong to do it, but it's not wrong to not do it.

At any rate, the only recommended idiom around blocking sections is to extract C values from OCaml values before entering the blocking section. (That's what the manual tries to say.) Whether these values are tagged integers or not is better ignored. (Why make confusing special cases?)

@gasche
Member

gasche commented Nov 17, 2023

Notes:

  • I am hesitant about an is_immediate check in oldify_one: should we worry about a performance impact?
  • I wondered if we should add an is_immediate check at local root registration time (and simply ignore immediates), but this would not be correct, as the value variable could be mutated later to become a block.

@OlivierNicole
Contributor Author

  • I am hesitant about an is_immediate check in oldify_one: should we worry about a performance impact?

Well, technically there is already one:

if (!(Is_block(v) && Is_young(v))) {

It would only be a matter of taking it out into a standalone early-return case. I will not venture to make a prediction about performance impacts.

@gadmm
Contributor

gadmm commented Nov 17, 2023

AFAIU this was already a data race in OCaml 4 with systhreads, which is why I was being less assertive than @gasche. But if such code exists in the wild, some could fall on the theoretically-UB-but-accidentally-correct side of the spectrum.

The code does not look like it has been micro-optimised enough for performance to be a factor (e.g. with my limited experience, it is wasteful to optimise for branch prediction if it has not even been optimised for data accesses).

Is it clear though that the assignment *p = v has no useful effect in the few cases where oldify_one is not called with redundant arguments *p and p?

OlivierNicole added a commit to OlivierNicole/ocaml that referenced this pull request Nov 17, 2023
@OlivierNicole
Contributor Author

I checked, and there are no such cases.

@OlivierNicole
Contributor Author

Is it clear though that the assignment *p = v has no useful effect in the few cases where oldify_one is not called with redundant arguments *p and p?

I checked, and there are no such cases.

I was wrong, actually: I hadn’t considered the recursive tail calls in oldify_one (implemented through goto) that are performed on blocks of size 1 or when short-circuiting a lazy block. In those cases, the write instruction does perform useful work, namely copying the unique field of the block from the minor heap to its new location in the major heap, or, in the case of the lazy block, performing the actual short-circuiting.

I see two solutions:

  • Moving this conditional write-and-return from the beginning of oldify_one to before each of the three tail calls. It adds a few lines of code but gets rid of the data race.
  • Documenting that accessing any values, even immediates, when the runtime lock is released is a programming error.

To make the discussion more concrete, I implemented the first solution in the last two commits 2238a6f and 38f8ee7.
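
A toy model of the first solution is sketched below, under heavy assumptions: blocks have exactly one field, there are no forwarding pointers, headers or lazy values, and `toy_block`, `toy_major`, `toy_is_young_block` and `toy_oldify_one` are hypothetical names. It is not the code from the commits mentioned above; it only illustrates moving the write from the entry of the function to the tail-call site.

```c
#include <stdint.h>

/* Simplified stand-ins (assumptions) for the runtime definitions. */
typedef intptr_t value;
#define Is_block(v) (((v) & 1) == 0)

/* Toy block with a single field; `young` marks minor-heap residency. */
typedef struct toy_block { int young; value field; } toy_block;

/* Toy major heap: a bump allocator over a small static array. */
static toy_block toy_major[8];
static int toy_major_used = 0;

static int toy_is_young_block(value v)
{
  return Is_block(v) && ((toy_block *)v)->young;
}

static void toy_oldify_one(value v, value *p)
{
  for (;;) { /* the loop plays the role of the goto-based tail call */
    /* Entry: nothing to promote, and no write to *p. In this toy model
       the caller always passes a slot that already holds v. */
    if (!toy_is_young_block(v)) return;

    toy_block *src = (toy_block *)v;
    toy_block *dst = &toy_major[toy_major_used++];
    dst->young = 0;
    *p = (value)dst; /* the slot now points to the promoted copy */

    value f = src->field;
    if (!toy_is_young_block(f)) {
      dst->field = f; /* the write, moved to just before the tail call */
      return;
    }
    v = f;            /* tail call on the unique field */
    p = &dst->field;
  }
}
```

With this shape, a root holding an immediate is never written to, while the copy of a block's field still happens where it is needed.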

@gadmm
Contributor

gadmm commented Nov 28, 2023

One would also have to adapt in the same way all other scanning actions (beyond oldify_one) to make the immediate special-case sound, and document the constraint for maintainers and users of the scanning hooks… Or am I missing something?

Unless it is clear that some programs will be fixed in this way, the second option sounds simpler, doesn't it?

@OlivierNicole
Contributor Author

Well, other scanning actions need to be checked, but they may be sound in this regard. Regarding scanning hooks, I’m missing some context. In the compiler code, they only seem to be used by systhreads.

@gadmm
Contributor

gadmm commented Nov 29, 2023

Sorry, there is nothing to do for users of scanning hooks.

@OlivierNicole
Contributor Author

Well, I do agree that they can be used in racy ways. I was just wondering whether they were exposed and/or ever used outside of systhreads.

@gadmm
Contributor

gadmm commented Nov 29, 2023

Yes, for instance by coq and boxroot. But I think here we only need to check the various scanning actions; third-party code does not define new scanning actions.

@OlivierNicole
Contributor Author

Could you point me to where these hooks are defined?

@OlivierNicole
Contributor Author

Thanks! I thus reverted to the initial fix for the Dynlink “race”.

Member

@gasche gasche left a comment

With this decision on the design, I looked at the implementation again and it is still fine. The extra implementation comments are warmly welcome.

@OlivierNicole could you squash together the two commits that are only about the intermediate variable? (Otherwise the history is fine.)

@gasche
Member

gasche commented Dec 14, 2023

(Are you intending to write a Changes entry?)

@OlivierNicole
Contributor Author

Thanks, updated Changes and squashed.

@gasche
Member

gasche commented Dec 14, 2023

(The PR complains about conflicts.)

Remove a data race between a `volatile` write in `oldify_one` and a
plain read in `caml_empty_minor_heap_promote`. (The racing read was the
second dereference that is performed when `**r` is evaluated.) This is a
real race but it suffices to make the read `volatile` in this case.
This race was due to the C primitive `caml_natdynlink_open` accessing a
local OCaml `value` after calling `caml_enter_blocking_section()`.
@gasche gasche merged commit 784fe56 into ocaml:trunk Dec 14, 2023
9 checks passed
OlivierNicole added a commit to OlivierNicole/ocaml that referenced this pull request Dec 15, 2023
@OlivierNicole OlivierNicole mentioned this pull request Dec 15, 2023
20 tasks
@OlivierNicole OlivierNicole deleted the fix_data_races branch December 20, 2023 16:06
@OlivierNicole
Contributor Author

Not that it changes anything, but for the record, @jmid just encountered the “code smell” of using an immediate in a “blocking section” without a copy outside of the compiler repo. And it’s in Dune:

https://github.com/ocaml/dune/blob/d19d92823c15c5c61f663076f0b50826a7d4e8ef/otherlibs/stdune/src/wait4_stubs.c#L61-L65

(TSan was triggered)

emillon added a commit to emillon/dune that referenced this pull request May 21, 2024
Fixes ocaml#10553

Quoting @jmid, using a local variable without the runtime lock in place,
is against the rules. For integer values, sometimes the rules are bent,
but this is not a good idea. See ocaml/ocaml#12737.

Signed-off-by: Etienne Millon <me@emillon.org>
emillon added a commit to emillon/ocaml that referenced this pull request May 22, 2024
As discussed in ocaml#12737, using `Int_val` inside blocking sections can
cause data races and is now seen as a bad idea.

(this causes a TSAN warning when using Dune, see ocaml/dune#10554)
emillon added a commit to ocaml/dune that referenced this pull request May 23, 2024
Fixes #10553

Quoting @jmid, using a local variable without the runtime lock in place,
is against the rules. For integer values, sometimes the rules are bent,
but this is not a good idea. See ocaml/ocaml#12737.

Signed-off-by: Etienne Millon <me@emillon.org>
emillon added a commit to emillon/dune that referenced this pull request May 24, 2024
Fixes ocaml#10553

Quoting @jmid, using a local variable without the runtime lock in place,
is against the rules. For integer values, sometimes the rules are bent,
but this is not a good idea. See ocaml/ocaml#12737.

Signed-off-by: Etienne Millon <me@emillon.org>
emillon added a commit to ocaml/dune that referenced this pull request May 24, 2024
Fixes #10553

Quoting @jmid, using a local variable without the runtime lock in place,
is against the rules. For integer values, sometimes the rules are bent,
but this is not a good idea. See ocaml/ocaml#12737.

Signed-off-by: Etienne Millon <me@emillon.org>