Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make signals safe for multicore #630

Merged

Conversation

sadiqj
Copy link
Collaborator

@sadiqj sadiqj commented Jul 29, 2021

This is the first of three PRs that overhaul Multicore's signals implementation to:

  1. Take components that have diverged from trunk back to trunk with minimal changes (where trunk is ocaml/ocaml 4.12 branch)
  2. Provide some clear semantics around how signals should work in the presence of multiple Domains and a correct implementation for that

The following PRs deal with asynchronous exceptions and changes to IO to make it safe in their presence.


Behaviour of signals in the presence of multiple domains

The intended behaviour for this PR is that:

  1. Signal behaviour should behave no differently from trunk OCaml if there is only one domain
  2. If there is more than one domain, any domain may execute the OCaml signal handler i.e there is no guarantee that the thread that receives a signal is the one that executes it

This is achieved by changing the existing flags implementation into one that uses atomic counters and a CAS when executing a signal to handle races. I would appreciate someone checking the memory orderings.

Code motion

There is a little code motion around signals/domains, which attempts to line things up as they are in 4.12 trunk.

Copy link
Collaborator

@ctk21 ctk21 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a deep signals expert, but I think this refactor does what it says.

I had one behaviour change for caml_leave_blocking_section where I wonder if it is intended. A couple of nits on cleanliness of header files.

Regarding the atomics, I think it is ok; but this stuff can be hard to reason about. However I did wonder why we don't try to just play safe and use seq-cst for all the atomic fetch add/sub and writes. I guess the place where the memory ordering optimization might be worth it is the fast path of enter/leave of blocking sections. Happy to play it safe, until profiling on ARM64 says otherwise, or take it as it is.

runtime/caml/platform.h Show resolved Hide resolved
runtime/caml/signals.h Outdated Show resolved Hide resolved
runtime/domain.c Show resolved Hide resolved
caml_enter_blocking_section_hook ();
}

CAMLexport void caml_leave_blocking_section(void)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if there is a deliberate change here. Previously we had:

CAMLexport void caml_leave_blocking_section() {
  caml_leave_blocking_section_hook();
  caml_process_pending_signals();
}

I see upstream having:

CAMLexport void caml_leave_blocking_section(void)
{
  int saved_errno;
  /* Save the value of errno (PR#5982). */
  saved_errno = errno;
  caml_leave_blocking_section_hook ();

   ...
  if (check_for_pending_signals()) {
    signals_are_pending = 1;
    caml_set_action_pending();
  }

  errno = saved_errno;
}

Is it deliberate to lose the processing of pending signals in caml_leave_blocking_section or the setting of a flag (or other)?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's worth pointing out that the one you have from trunk (which the EINTR pretty much matches other than check_for_pending_signals) doesn't actually process pending signals but just checks if they're pending due because it's possible for signals_are_pending to be flagged off even if there is one available in the specific signal flags.

We don't have that problem in this PR because we're using counters.

@abbysmal
Copy link
Collaborator

abbysmal commented Aug 2, 2021

This is great work, thank you very much for this!

I had a first shot at reading this, I will submit a proper review later if I find anything more than what @ctk21 did.

I feel like we have some missing processing still in the bytecode runtime, there's something that bugs me with the way it is done in interp.c, where process_signals and CHECK_SIGNALS are not doing any signal related work. (rather just checking GC interrupts.)
I do know that this codepath specifically is what makes testprempt in #603 fail. (test which is disabled in your branch because there's no tick thread here.)

@sadiqj
Copy link
Collaborator Author

sadiqj commented Aug 5, 2021

Regarding the atomics, I think it is ok; but this stuff can be hard to reason about. However I did wonder why we don't try to just play safe and use seq-cst for all the atomic fetch add/sub and writes. I guess the place where the memory ordering optimization might be worth it is the fast path of enter/leave of blocking sections. Happy to play it safe, until profiling on ARM64 says otherwise, or take it as it is.

You're right, we should just play it safe here until we know otherwise. I've moved the relaxed orderings to their appropriate stronger equivalents.

@abbysmal
Copy link
Collaborator

abbysmal commented Aug 6, 2021

I took a look at the changes and it looks good to me.
However while giving it a try with testpreempt on the tick thread branch, it seems there's still an issue with how signals are processed in the bytecode version.
I did not investigate it further, however my take on that specific one is that we should merge this and figure out this issue (if it's still a thing) after the EINTR fix is merged, I do not think it is worthwhile pursuing this further since we have a better situation down the path anyway.

@sadiqj
Copy link
Collaborator Author

sadiqj commented Aug 8, 2021

Thanks @Engil

@kayceesrk are you planning to review or are we good to go on this?

Once we've got #631 merged too I can put the EINTR PR up.

caml_pending_signals[signal_number] = 1;
caml_signals_are_pending = 1;
atomic_fetch_add_explicit(&caml_pending_signals[signal_number], 1, memory_order_seq_cst);
atomic_fetch_add_explicit(&total_signals_pending, 1, memory_order_seq_cst);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not an expert on memory ordering. Can you explain why we do memory_order_acq_rel at line 98 [1] but memory_order_seq_cst here? I wonder whether adding comments on the choice of memory ordering should be attached to the declaration of atomic variables, if they use interesting memory ordering. For total_signals_pending, we may mention that the we use acquire for loads and sequential consistency for CASes?

[1] https://github.com/ocaml-multicore/ocaml-multicore/pull/630/files#diff-93ee9a0b40685be35da1909f57cba82cfe51d5a8de495947e5068a8b8d8172cbR98

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That the other ordering is acquire release is an oversight. I went with Tom's suggestion of going for the strongest possible ordering and we can look at loosening them up if they proved to be a problem.

I've moved the other signals.c operations to seq_cst now.

@kayceesrk
Copy link
Contributor

kayceesrk commented Sep 6, 2021

I've left a minor comment about memory ordering. Otherwise, looks fine to me. Feel free to merge.

Meta comment: I wonder how best to document the choice of memory ordering within the runtime. The memory ordering may not trigger bugs on x86 as the hardware memory model is quite strong. But we may end up seeing hard to find bugs on Arm64, which cannot be replicated under rr. I wonder whether it is worth going for weaker memory semantics for things like signal handling which are inherently slow. Would it be a reasonable idea to use sequential consistency everywhere for the atomic variables such as these and document that we're erring on the side of correctness rather than efficiency and include some notes on what weaker operations could be used. For the upstreaming process, I would like an C++ memory model expert review the use of atomics in the runtime system.

@kayceesrk
Copy link
Contributor

Looks good. Merging now.

@kayceesrk kayceesrk merged commit 2a8b802 into ocaml-multicore:4.12+domains+effects Sep 6, 2021
@ctk21 ctk21 moved this from In progress to Done in Prepare multicore to enable 5.0 patchset Sep 6, 2021
sadiqj pushed a commit to sadiqj/ocaml that referenced this pull request Jan 10, 2022
…gnals_multicore

Make signals safe for multicore
ctk21 pushed a commit to ctk21/ocaml that referenced this pull request Jan 11, 2022
…gnals_multicore

Make signals safe for multicore
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants