Skip to content

Conversation

@OlivierNicole
Copy link
Contributor

This fixes #12916. As reminded by @stedolan, a sweeping thread is allowed to read a header while a marking concurrently writes to it: the marking thread can only make UNMARKED objects MARKED, both being treated identically by the sweeping code. All it takes is therefore to mark the header read as a relaxed atomic load.

@OlivierNicole OlivierNicole force-pushed the fix-race-marking-sweeping branch from 62fd1af to 0ae2802 Compare January 24, 2024 11:33
@gasche
Copy link
Member

gasche commented Jan 24, 2024

At the risk of sounding like a broken record :-), I would enjoy seeing more comments in the code to explain the subtle reasoning that is going on.

Naive questions:

  • Should we also mark the write relaxed on the marking end?
  • Who else reads or writes object readers? (I suppose: OCaml code (through direct reads?), and allocation code in the runtime.) Should the other readers be changed in the same way?
  • In the memory-model recommendation we recommend using C11 atomics for memory that is only touched by the runtime, not the mutators. If I understand correctly, we are following a different approach here. Would using volatile (which iirc would be closer to the recommendation) not work? Is there something interesting to explain about the difference?

@OlivierNicole OlivierNicole force-pushed the fix-race-marking-sweeping branch 2 times, most recently from 75ad442 to af1519d Compare January 24, 2024 13:42
@OlivierNicole
Copy link
Contributor Author

I added a justification comment and used the Hd_hp macro (which uses (volatile *)) rather than C11 atomics.

  • Should we also mark the write relaxed on the marking end?

No need, it already is:

ocaml/runtime/major_gc.c

Lines 1038 to 1040 in a682d51

atomic_store_relaxed(
Hp_atomic_val(block),
With_status_hd(hd, caml_global_heap_state.MARKED));

  • Who else reads or writes object readers? (I suppose: OCaml code (through direct reads?), and allocation code in the runtime.) Should the other readers be changed in the same way?

Plain reads from C without synchronization are a bug, this is one of them but there shouldn’t be many left. Header reads from OCaml are compiled to assembly in a way that should not cause problems. (They are intended to behave at the very least like relaxed loads, see #10995)

  • In the memory-model recommendation we recommend using C11 atomics for memory that is only touched by the runtime, not the mutators. If I understand correctly, we are following a different approach here. Would using volatile (which iirc would be closer to the recommendation) not work? Is there something interesting to explain about the difference?

I changed to use volatile through a macro.

I remark that our current mixed memory model guidelines are in fact not respected by bits of the runtime (e.g. GC marking) that predate it. Although it is probably a separate discussion, this makes me a bit unsatisfied about these guidelines, as:

  • it would feel absurd to me to spend time porting existing runtime code from relaxed atomics to volatile;
  • In the justifications given for the guidelines there is no reason to favor volatile over relaxed atomics, except when using a legacy macro.

For this reason I am tempted to defend that we should amend these guidelines and either: use volatile and relaxed atomics indifferently in the runtime, or rather favor relaxed atomics for newly written code, except when there is a macro for what we want to do and that macro uses volatile.

@gasche
Copy link
Member

gasche commented Jan 24, 2024

I like when there is a way for the authors of the runtime to express their intent: "I know that all accessors of this location respect the C11 memory model" vs. "This is a mixed-model location where different reasoning applies".

Currently we use volatile for compatibility reasons, and also to signal that we are in the second category. We could change to a different style, but I wish that it would still allow expressing this intent. Maybe we could have a macro or function for the second case that is defined in terms of relaxed (if we want)? But wouldn't using relaxed atomics there create incompatibility issues in practice when using the volatile macros? (Or maybe the second-case functions should take a volatile input and cast it into an atomic to perform a relaxed access? The worst of both worlds, happily.)

Copy link
Member

@gasche gasche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy with the new change -- the code remains simple, and there is an explanatory comment about the subtle thing going on.

I approve to mark this, but of course I'm not competent to vet the memory model questions, so I think it would be helpful if for example @stedolan or @gadmm could say whether they agree that the code change goes in the right direction.

@gasche
Copy link
Member

gasche commented Jan 24, 2024

(If no one has time to comment then I'll just merge, because this is a not-incorrect change that silences an incorrect TSan alarm.)

@OlivierNicole OlivierNicole force-pushed the fix-race-marking-sweeping branch from af1519d to 543b253 Compare January 24, 2024 15:27
@OlivierNicole
Copy link
Contributor Author

The advantage of expressing programmer intent is a good point. It looks to me like relaxed atomics present some tiny advantages (more guarantees to not break in a distant future on a new compiler that doesn’t behave like the existing ones wrt volatile, probably some slight performance gains by not inhibiting hoisting out of loops etc.) so it would make sense to favor them when possible, but I agree that it should be behind opaque macros (like Linux’s READ_ONCE and WRITE_ONCE) that can be updated if need be. (Like, if a new crazy architecture like the old Alpha appears.) Maybe I will think more about this in the future.

I have updated the changelog.

@OlivierNicole OlivierNicole force-pushed the fix-race-marking-sweeping branch from 543b253 to 60e1644 Compare January 25, 2024 13:04
@OlivierNicole
Copy link
Contributor Author

I included the changes from the very similar #12939 into this PR.

@OlivierNicole
Copy link
Contributor Author

I approve to mark this, but of course I'm not competent to vet the memory model questions, so I think it would be helpful if for example @stedolan or @gadmm could say whether they agree that the code change goes in the right direction.

I am not against asking for an expert opinion, of course, but I am curious: which part of the change are you did not convince you in terms of correctness?

@gasche
Copy link
Member

gasche commented Jan 25, 2024

I am not worried about correctness, but rather about the fact that the code is or isn't "right": principled, justifiable, and thus easier to understand and evolve. @stedolan recommended relaxed atomics and we end up using volatile instead, my feeling is that this is "right" but I would prefer an approval from someone who actually understands the underlying questions. If no one is available, I'll fly blind and merge.

@damiendoligez
Copy link
Member

I'm not nearly as kowledgeable as @stedolan on these matters, but the change to Hd_hp looks nice: this macro is the canonical way of reading a header when marking may change it concurrently (mostly: all the time).

A sweeping thread is allowed to read a header while a marking
concurrently writes to it: the marking thread can only make UNMARKED
objects MARKED, both being treated identically by the sweeping code.
`verify_pool` is a function used to check the invariant that a major
pool should contain no garbage after sweeping. It is obviously allowed
to race with marking (which can only turn UNMARKED objects into MARKED
or NOT_MARKABLE, but not into GARBAGE), we only need to mark the read as
a relaxed atomic read.
The debug runtime should be exercised by the TSan CI as it can be more
efficient at revealing some data races. Since the test
`regression/pr9853/compaction_corner_case.ml` takes more than 30 minutes
with the debug runtime, this commit also skips it when TSan is enabled.
@OlivierNicole OlivierNicole force-pushed the fix-race-marking-sweeping branch from 60e1644 to 7bb5cae Compare February 2, 2024 17:46
@OlivierNicole
Copy link
Contributor Author

I applied @damiendoligez's suggestion, and also took the occasion to fix the test regression/pr9853/compaction_corner_case.ml, which due to a small mistake in #12907 was no longer running anything.

@OlivierNicole
Copy link
Contributor Author

The macOS job was cancelled without any error message. I don’t really know how to fix that 🤔

@gasche gasche merged commit 33eae42 into ocaml:trunk Feb 2, 2024
gasche added a commit that referenced this pull request Feb 2, 2024
Fix data race between marking and sweeping

(cherry picked from commit 33eae42)
@OlivierNicole
Copy link
Contributor Author

@Octachron Should this be considered for cherry-picking into 5.2?

@gasche
Copy link
Member

gasche commented Feb 5, 2024

I did, it's listed above ( 4b92d4a ), sorry for not being explicit about it. At this point in the freeze period I don't bother pinging @Octachron for every small bugfix.

@OlivierNicole
Copy link
Contributor Author

Oh, my bad, I missed the notification item. Thanks!

@OlivierNicole OlivierNicole deleted the fix-race-marking-sweeping branch February 6, 2024 00:19
@NickBarnes
Copy link
Contributor

I'm surprised that this change silences TSAN here. Surely what we actually want is a relaxed atomic read, and I don't think *(volatile header_t *)p is guaranteed to do that; it's just guaranteed to do a read (rather than some cunning compiler optimisation producing the value without going to memory). I suspect that just about every header read in the runtime should be a relaxed atomic read, for the same reason.

@OlivierNicole
Copy link
Contributor Author

I understand your suprise, the subject relaxed atomic vs volatile has been the subject of much discussion. See

/* Note [MMMOC]: Mixing the Memory Models of OCaml and C.
and
3.3. volatile accesses
. (These comments have been written after discussion in a few places such as #10992 (comment)).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

TSan-reported race between GC marking and sweeping

4 participants