Make signals safe for multicore #630

sadiqj · 2021-07-29T13:11:45Z

This is the first of three PRs that overhaul Multicore's signals implementation to:

Take components that have diverged from trunk back to trunk with minimal changes (where trunk is ocaml/ocaml 4.12 branch)
Provide some clear semantics around how signals should work in the presence of multiple Domains and a correct implementation for that

The following PRs deal with asynchronous exceptions and changes to IO to make it safe in their presence.

Behaviour of signals in the presence of multiple domains

The intended behaviour for this PR is that:

Signals should behave no differently from trunk OCaml if there is only one domain
If there is more than one domain, any domain may execute the OCaml signal handler, i.e. there is no guarantee that the thread that receives a signal is the one that executes its handler

This is achieved by changing the existing flags implementation into one that uses atomic counters and a CAS when executing a signal to handle races. I would appreciate someone checking the memory orderings.
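As a rough illustration of the counter-plus-CAS idea described above, here is a minimal sketch using C11 atomics. It is not the PR's actual code: `NSIG_SKETCH`, `pending_sketch`, `total_pending_sketch`, `record_signal_sketch` and `try_claim_signal_sketch` are hypothetical names, and the sketch assumes `atomic_int` is lock-free on the target platform so that the recording side is usable from a signal handler.

```c
#include <assert.h>
#include <stdatomic.h>

#define NSIG_SKETCH 65 /* hypothetical table size, not the runtime's value */

/* One counter per signal number plus a running total (both hypothetical).
   Objects with static storage duration start out zeroed. */
static atomic_int pending_sketch[NSIG_SKETCH];
static atomic_int total_pending_sketch;

/* Recording side: would run in signal-handler context, so it only
   performs atomic read-modify-write operations. */
static void record_signal_sketch(int signo)
{
  atomic_fetch_add_explicit(&pending_sketch[signo], 1, memory_order_seq_cst);
  atomic_fetch_add_explicit(&total_pending_sketch, 1, memory_order_seq_cst);
}

/* Executing side: any domain may try to claim one pending occurrence.
   The CAS ensures that two racing domains never claim the same occurrence.
   Returns 1 if this caller won and should run the handler, 0 otherwise. */
static int try_claim_signal_sketch(int signo)
{
  assert(signo >= 0 && signo < NSIG_SKETCH);
  int seen = atomic_load_explicit(&pending_sketch[signo],
                                  memory_order_seq_cst);
  while (seen > 0) {
    if (atomic_compare_exchange_weak_explicit(&pending_sketch[signo],
                                              &seen, seen - 1,
                                              memory_order_seq_cst,
                                              memory_order_seq_cst)) {
      atomic_fetch_sub_explicit(&total_pending_sketch, 1,
                                memory_order_seq_cst);
      return 1;
    }
    /* On failure the CAS reloads `seen`, so the loop retries with the
       current value and exits once the counter reaches zero. */
  }
  return 0;
}
```

In this sketch, recording the same signal twice lets exactly two later claims succeed, after which the total returns to zero. All operations use `memory_order_seq_cst`, the conservative choice.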

Code motion

There is a little code motion around signals/domains, which attempts to line things up as they are in 4.12 trunk.

ctk21

I'm not a deep signals expert, but I think this refactor does what it says.

I spotted one behaviour change in caml_leave_blocking_section and wonder whether it is intended. There are also a couple of nits on the cleanliness of header files.

Regarding the atomics, I think it is ok; but this stuff can be hard to reason about. However I did wonder why we don't try to just play safe and use seq-cst for all the atomic fetch add/sub and writes. I guess the place where the memory ordering optimization might be worth it is the fast path of enter/leave of blocking sections. Happy to play it safe, until profiling on ARM64 says otherwise, or take it as it is.

runtime/caml/platform.h

runtime/caml/signals.h

runtime/domain.c

ctk21 · 2021-07-30T12:18:52Z

runtime/signals.c

+  caml_enter_blocking_section_hook ();
+}
+
+CAMLexport void caml_leave_blocking_section(void)


I'm not sure if there is a deliberate change here. Previously we had:

CAMLexport void caml_leave_blocking_section()
{
  caml_leave_blocking_section_hook();
  caml_process_pending_signals();
}

I see upstream having:

CAMLexport void caml_leave_blocking_section(void)
{
  int saved_errno;
  /* Save the value of errno (PR#5982). */
  saved_errno = errno;
  caml_leave_blocking_section_hook ();
  ...
  if (check_for_pending_signals()) {
    signals_are_pending = 1;
    caml_set_action_pending();
  }
  errno = saved_errno;
}

Is it deliberate to lose the processing of pending signals in caml_leave_blocking_section or the setting of a flag (or other)?
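The upstream snippet quoted above saves and restores errno around the hook. A minimal sketch of that pattern in isolation, with hypothetical names (`leave_hook_sketch`, `clobbering_hook_sketch`, `leave_blocking_section_sketch`) standing in for the runtime's own:

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical hook pointer, standing in for the runtime's leave hook. */
static void (*leave_hook_sketch)(void) = 0;

/* Example hook that overwrites errno, as a hook making system calls
   (for instance to reacquire a lock) might do. */
static void clobbering_hook_sketch(void)
{
  errno = EAGAIN;
}

/* Leave a blocking section without disturbing the errno value left
   behind by the blocking call that has just returned. */
static void leave_blocking_section_sketch(void)
{
  assert(leave_hook_sketch != 0); /* a hook must have been installed */
  int saved_errno = errno;        /* remember the caller-visible errno */
  leave_hook_sketch();            /* may change errno internally */
  errno = saved_errno;            /* restore it for the caller */
}
```

With `clobbering_hook_sketch` installed, a caller that sees `EINTR` from a blocking system call still sees `EINTR` after leaving the blocking section.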

sadiqj · 2021-08-05

It's worth pointing out that the version you quote from trunk (which the EINTR PR pretty much matches, other than check_for_pending_signals) doesn't actually process pending signals. It only checks whether any are pending, because it's possible for signals_are_pending to be cleared even though a signal is still set in the per-signal flags.

We don't have that problem in this PR because we're using counters.

abbysmal · 2021-08-02T09:23:16Z

This is great work, thank you very much for this!

I had a first shot at reading this, I will submit a proper review later if I find anything more than what @ctk21 did.

I feel like we are still missing some processing in the bytecode runtime. Something bugs me about the way it is done in interp.c, where process_signals and CHECK_SIGNALS do no signal-related work and only check for GC interrupts.
I do know that this codepath specifically is what makes testpreempt in #603 fail (a test that is disabled in your branch because there's no tick thread here).

sadiqj · 2021-08-05T08:51:42Z

Regarding the atomics, I think it is ok; but this stuff can be hard to reason about. However I did wonder why we don't try to just play safe and use seq-cst for all the atomic fetch add/sub and writes. I guess the place where the memory ordering optimization might be worth it is the fast path of enter/leave of blocking sections. Happy to play it safe, until profiling on ARM64 says otherwise, or take it as it is.

You're right, we should just play it safe here until we know otherwise. I've moved the relaxed orderings to their appropriate stronger equivalents.

abbysmal · 2021-08-06T09:08:03Z

I took a look at the changes and it looks good to me.
However, while trying it with testpreempt on the tick thread branch, it seems there's still an issue with how signals are processed in the bytecode version.
I did not investigate it further. My take on that specific issue is that we should merge this and figure it out (if it's still a thing) after the EINTR fix is merged. I do not think it is worthwhile pursuing it further now, since we will be in a better situation further down the path anyway.

sadiqj · 2021-08-08T08:07:24Z

Thanks @Engil

@kayceesrk are you planning to review or are we good to go on this?

Once we've got #631 merged too I can put the EINTR PR up.

kayceesrk · 2021-09-06T04:06:25Z

runtime/signals.c

-  caml_pending_signals[signal_number] = 1;
-  caml_signals_are_pending = 1;
+  atomic_fetch_add_explicit(&caml_pending_signals[signal_number], 1, memory_order_seq_cst);
+  atomic_fetch_add_explicit(&total_signals_pending, 1, memory_order_seq_cst);


I am not an expert on memory ordering. Can you explain why we do memory_order_acq_rel at line 98 [1] but memory_order_seq_cst here? I wonder whether comments on the choice of memory ordering should be attached to the declarations of atomic variables when they use an interesting memory ordering. For total_signals_pending, we may mention that we use acquire for loads and sequential consistency for CASes?

[1] https://github.com/ocaml-multicore/ocaml-multicore/pull/630/files#diff-93ee9a0b40685be35da1909f57cba82cfe51d5a8de495947e5068a8b8d8172cbR98

sadiqj · 2021-09-06

The other ordering being acquire-release is an oversight. I went with Tom's suggestion of going for the strongest possible ordering, and we can look at loosening the orderings if they prove to be a problem.

I've moved the other signals.c operations to seq_cst now.

kayceesrk · 2021-09-06T04:13:50Z

I've left a minor comment about memory ordering. Otherwise, looks fine to me. Feel free to merge.

Meta comment: I wonder how best to document the choice of memory ordering within the runtime. The memory ordering may not trigger bugs on x86, as the hardware memory model is quite strong. But we may end up seeing hard-to-find bugs on Arm64, which cannot be replicated under rr. I wonder whether it is worth going for weaker memory semantics for things like signal handling, which are inherently slow. Would it be a reasonable idea to use sequential consistency everywhere for atomic variables such as these, document that we're erring on the side of correctness rather than efficiency, and include some notes on what weaker operations could be used? For the upstreaming process, I would like a C++ memory model expert to review the use of atomics in the runtime system.
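One way the documentation idea above could look in practice is to state the ordering policy once, at the declaration, and route every access through small helpers so the ordering lives in a single place. This is only a sketch: `doc_total_pending` and the `doc_pending_*` helpers are hypothetical names, and the weaker orderings mentioned in the comment are candidates to evaluate, not verified alternatives.

```c
#include <assert.h>
#include <stdatomic.h>

/* doc_total_pending: number of recorded-but-unhandled signals
 * (hypothetical variable, for illustration only).
 *
 * Ordering policy: every access uses memory_order_seq_cst. Signal
 * handling is not a hot path, so this errs on the side of correctness
 * rather than efficiency.
 *
 * Candidate relaxations, NOT verified: acquire for loads, release for
 * the increment. Revisit only with profiling data and expert review.
 */
static atomic_int doc_total_pending;

static inline void doc_pending_incr(void)
{
  atomic_fetch_add_explicit(&doc_total_pending, 1, memory_order_seq_cst);
}

static inline void doc_pending_decr(void)
{
  int prev = atomic_fetch_sub_explicit(&doc_total_pending, 1,
                                       memory_order_seq_cst);
  assert(prev > 0); /* decrementing below zero would be a logic error */
}

static inline int doc_pending_load(void)
{
  return atomic_load_explicit(&doc_total_pending, memory_order_seq_cst);
}
```

Keeping the ordering inside the helpers means a later relaxation touches three functions and one comment rather than every call site.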

kayceesrk · 2021-09-06T07:42:27Z

Looks good. Merging now.

sadiqj added 3 commits July 29, 2021 13:43

make signals safe for multicore

199dbd1

remove this unnecessary variable (which, due to a renaming, clashes)

832e8b3

sigh, whitespace in disabled tests

9aa4df5

sadiqj mentioned this pull request Jul 30, 2021

Don't deliver signals to threads that have blocked them #598

Closed

ctk21 reviewed Jul 30, 2021

ctk21 added this to In progress in Prepare multicore to enable 5.0 patchset Jul 30, 2021

sadiqj added 7 commits August 5, 2021 08:27

remove unnecessary header

4ab286d

nuke old comment

c73934d

remove prototypes from domain.h that are no longer in domain.c

c5e10d6

explicitly check for signals in the bytecode intrepreter

c8cae43

add signals header to major_gc.c

fcf988d

Revert "remove unnecessary header"

4add8d9

This reverts commit 4ab286d.

tighten up the relaxed memory orders

1f9a8ff

kayceesrk reviewed Sep 6, 2021

and tighten the rest of the orderings

d93b14e

kayceesrk merged commit 2a8b802 into ocaml-multicore:4.12+domains+effects Sep 6, 2021

ctk21 moved this from In progress to Done in Prepare multicore to enable 5.0 patchset Sep 6, 2021

sadiqj mentioned this pull request Sep 15, 2021

Integrate all of trunk's EINTR fixes #649

Merged

This was referenced Sep 30, 2021

Signals and Domains #334

Closed

Review io.c for thread-safety and add parallel tests #618

Closed

sadiqj pushed a commit to sadiqj/ocaml that referenced this pull request Jan 10, 2022

Merge pull request ocaml-multicore/ocaml-multicore#630 from sadiqj/si…

8e3d529

…gnals_multicore Make signals safe for multicore

ctk21 pushed a commit to ctk21/ocaml that referenced this pull request Jan 11, 2022

Merge pull request ocaml-multicore/ocaml-multicore#630 from sadiqj/si…

3911db0

…gnals_multicore Make signals safe for multicore
