Make signals safe for multicore #630
Conversation
I'm not a deep signals expert, but I think this refactor does what it says.
There is one behaviour change, in caml_leave_blocking_section, where I wonder whether it is intended. I also have a couple of nits on the cleanliness of the header files.
Regarding the atomics, I think they are OK, but this stuff can be hard to reason about. I did wonder why we don't just play it safe and use seq-cst for all the atomic fetch add/sub operations and writes. I guess the place where the memory-ordering optimization might be worth it is the fast path of entering/leaving blocking sections. I'm happy to play it safe until profiling on ARM64 says otherwise, or to take it as it is.
```c
  caml_enter_blocking_section_hook ();
}

CAMLexport void caml_leave_blocking_section(void)
```
I'm not sure if there is a deliberate change here. Previously we had:

```c
CAMLexport void caml_leave_blocking_section() {
  caml_leave_blocking_section_hook();
  caml_process_pending_signals();
}
```

I see upstream having:

```c
CAMLexport void caml_leave_blocking_section(void)
{
  int saved_errno;
  /* Save the value of errno (PR#5982). */
  saved_errno = errno;
  caml_leave_blocking_section_hook ();
  ...
  if (check_for_pending_signals()) {
    signals_are_pending = 1;
    caml_set_action_pending();
  }
  errno = saved_errno;
}
```
Is it deliberate that caml_leave_blocking_section no longer processes pending signals, or sets a flag (or something similar)?
It's worth pointing out that the version you have from trunk (which the EINTR one pretty much matches, other than check_for_pending_signals) doesn't actually process pending signals; it just checks whether any are pending, because it's possible for signals_are_pending
to be flagged off even when one is recorded in the specific signal flags.
We don't have that problem in this PR because we're using counters.
This is great work, thank you very much for this! I had a first pass at reading it, and I will submit a proper review later if I find anything beyond what @ctk21 did. I feel like we still have some missing processing in the bytecode runtime; there's something that bugs me about the way it is done in
This reverts commit 4ab286d.
You're right, we should just play it safe here until we know otherwise. I've moved the relaxed orderings to their appropriate stronger equivalents.
I took a look at the changes and it looks good to me.
Thanks @Engil. @kayceesrk, are you planning to review, or are we good to go on this? Once we've got #631 merged too, I can put the EINTR PR up.
```diff
-  caml_pending_signals[signal_number] = 1;
-  caml_signals_are_pending = 1;
+  atomic_fetch_add_explicit(&caml_pending_signals[signal_number], 1, memory_order_seq_cst);
+  atomic_fetch_add_explicit(&total_signals_pending, 1, memory_order_seq_cst);
```
I am not an expert on memory ordering. Can you explain why we use memory_order_acq_rel
at line 98 [1] but memory_order_seq_cst
here? I wonder whether comments on the choice of memory ordering should be attached to the declarations of the atomic variables that use interesting orderings. For total_signals_pending,
we might mention that we use acquire for loads and sequential consistency for CASes?
The other ordering being acquire-release is an oversight. I went with Tom's suggestion of using the strongest possible ordering; we can look at loosening them if they prove to be a problem. I've moved the other signals.c operations to seq_cst now.
I've left a minor comment about memory ordering. Otherwise, this looks fine to me. Feel free to merge.

Meta comment: I wonder how best to document the choice of memory ordering within the runtime. The memory ordering may not trigger bugs on x86, as its hardware memory model is quite strong, but we may end up seeing hard-to-find bugs on Arm64, which cannot be replicated under rr. I wonder whether it is worth going for weaker memory semantics for things like signal handling, which are inherently slow. Would it be reasonable to use sequential consistency everywhere for atomic variables such as these, document that we're erring on the side of correctness rather than efficiency, and include some notes on what weaker operations could be used?

For the upstreaming process, I would like a C++ memory model expert to review the use of atomics in the runtime system.
Looks good. Merging now.
…gnals_multicore Make signals safe for multicore
This is the first of three PRs that overhaul Multicore's signals implementation to:
The following PRs deal with asynchronous exceptions and changes to IO to make it safe in their presence.
Behaviour of signals in the presence of multiple domains
The intended behaviour for this PR is that:
This is achieved by changing the existing flags implementation into one that uses atomic counters and a CAS when executing a signal to handle races. I would appreciate someone checking the memory orderings.
Code motion
There is a little code motion around signals/domains, which attempts to line things up as they are in 4.12 trunk.