Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Thread.sigmask by checking signal masks when starting the handler of a signal #2104

Merged
merged 1 commit into from Dec 19, 2018

Conversation

Projects
None yet
5 participants
@jhjourdan
Copy link
Contributor

commented Oct 12, 2018

The handler of a signal, if run from a suspended thread, will only
register the signal for the handling thread instead of the currently
executing thread.

This fixes MPR#4127 and MPR#7709.

@jhjourdan jhjourdan force-pushed the jhjourdan:fixthreadsigmask branch 2 times, most recently from 92995ee to afcc649 Oct 12, 2018

@jhjourdan

This comment has been minimized.

Copy link
Contributor Author

commented Oct 29, 2018

Would anyone be interested in reviewing this PR ?

@diml diml self-requested a review Oct 29, 2018

@diml

This comment has been minimized.

Copy link
Member

commented Oct 29, 2018

I'm going to review it

@diml diml self-assigned this Oct 29, 2018

@diml

This comment has been minimized.

Copy link
Member

commented Oct 29, 2018

I'm trying to convince myself that there is no race condition in this code. Wouldn't it be simpler to just send all the signals to a dedicated thread that would execute all the handlers synchronously?

@jhjourdan

This comment has been minimized.

Copy link
Contributor Author

commented Oct 29, 2018

I think the only real race is between the handlers and the Thread.sigmask function. There is a possibility that a thread receives a signal before it gets masked, but handles it just after.

However, this race already exists with Unix.sigmaks, so I am not sure this is a bug or not.

@jhjourdan

This comment has been minimized.

Copy link
Contributor Author

commented Oct 29, 2018

Wouldn't it be simpler to just send all the signals to a dedicated thread that would execute all the handlers synchronously?

Note that this thread would not be able to handle the signals itself. All it would be able to do is set the appropriate bits of the appropriate threads. And since one thread could be currently running, it may have to set the global bits instead. This would also lead to a risk of race.

@diml

This comment has been minimized.

Copy link
Member

commented Oct 29, 2018

Note that this thread would not be able to handle the signals itself.

Why not?

@jhjourdan

This comment has been minimized.

Copy link
Contributor Author

commented Oct 29, 2018

That is the whole point of this PR: make sure that signals are handled by threads that are allowed to do so. This is important for several reasons:

  • Exceptions raised in handlers have to be catched in the corresponding threads.
  • Not handling signals in the correct thread can also make the handler be seen as non-atomic by the thread that should handle it. I.e., if the signal is executed in a different thread, the thread that should handle it could see the various steps of execution of the handler, even though it should appear as atomic for that thread.
  • The handling thread can also be observed through Thread.self and the backtrace.

The exception mechanism I described is used in Coq, for example, where there is one worker thread and one I/O thread. The worker thread can setup a signal to interrupt computation after some timeout occurs. If the I/O thread handles the signal, then no interruption occurs.

If what you are proposing is to handle signals in a dedicated thread and then dispatch exceptions when they occur in the thread that should have handled it, then you have several other issues:

  • You have to make sure the thread that should receive the signal is paused during the execution of the handler
  • You have to give Thread.self and backtraces appropriate behaviors
  • You have to make sure there is no race in the code that dispatches exceptions. This does seems to be as difficult as making sure there is no race in the code that dispatches signals.
@diml

This comment has been minimized.

Copy link
Member

commented Oct 29, 2018

I see. I simply didn't know that signal handlers where used this way by applications other than the toplevel.

To go back to this PR, the possible race condition I was thinking about was between setting curr_thread and a signal handler. I need to read the code again.

@jhjourdan

This comment has been minimized.

Copy link
Contributor Author

commented Oct 29, 2018

To go back to this PR, the possible race condition I was thinking about was between setting curr_thread and a signal handler. I need to read the code again.

First, note that curr_thread and pending_signals are made volatile so that we are sure the compiler does not reorder memory accesses, and hence the handler and the main code see the memory accesses in the same order (both the handler and the main code execute in the same thread, so there is no CPU reordering involved).

Also, note that, while saving or restoring pending_signals, we only copy one of these bits when it is set. We never copy a 0 bit, but instead we make sure that the destination buffer is 0 initially. This makes sure that if a signal occurs while doing the copy, the new pending_signals bit is not overridden by the copy.

Now, here is why I think there is no race:

  • When leaving a blocking section (i.e., tacking the lock), if a handler executes after setting curr_thread, then it will set the global pending_signals, which is fine because this is the thread that is going to be executed.
  • When entering a blocking section (i.e., releasing the lock), we first set curr_thread to NULL to make sure that the handler writes to the local pending_signals.
@diml

This comment has been minimized.

Copy link
Member

commented Oct 30, 2018

BTW, why do we still need a global pending_signals? Wouldn't it be simpler to make it a pointer to the pending_signals of the current running thread?

@jhjourdan

This comment has been minimized.

Copy link
Contributor Author

commented Oct 30, 2018

BTW, why do we still need a global pending_signals? Wouldn't it be simpler to make it a pointer to the pending_signals of the current running thread?

That's a valid point. Currently, pending_signals is a global array. What you are proposing would be to make it a global pointer that points to a malloced array. I would have to use a different protocol to make sure there is no race between the handler and the code that sets this pointer, but that does not seem difficult.

I guess the reason for this choice is that I wanted to follow what seems to be the current convention in st_stubs.c: change as little as possible the OCaml runtime and have a copy of the runtime's data structure for each thread in caml_thread_struct. If you think using a pointer as you describe is cleaner, I can change the code.

@diml

This comment has been minimized.

Copy link
Member

commented Oct 30, 2018

That seems simpler to me yes. In particular, the caml_thread_record_signal_hook could then be written this way:

static void caml_thread_record_signal_hook(int signal_number) {
  /* The signal can be received by a C thread which does not
     correspond to the OCaml thread currently running.

     In this case, we might not be allowed to handle the signal in the
     currently running thread, since it could be masked. Hence, if the
     receiving thread is not currently running, we simply record it to
     be handled when the thread will be scheduled. This way, signals
     are always handled by the right thread, even if triggered via
     pthread_kill.
  */
  caml_thread_t handler_thread;

  handler_thread = st_tls_get(thread_descriptor_key);

  if(handler_thread == NULL)
    /* This should not happen, except if threads are manually launched
       in C by the user. Let's ignore the signal in this case. */
    return;

  handler_thread->pending_signals[signal_number] = 1;

  caml_record_signal_received(); /* do the caml_young_limit = ... trick */
}

which looks a bit more straightforward to me as all the logic for recording the signal is at the same place.

@jhjourdan jhjourdan force-pushed the jhjourdan:fixthreadsigmask branch from afcc649 to 6b953c3 Nov 1, 2018

@jhjourdan

This comment has been minimized.

Copy link
Contributor Author

commented Nov 1, 2018

Alright, I tried to do what you proposed. I had some issues, because of the way signals are handled on Windows and of the way the tick signal is delivered. The final result is something rather more complicated than the previous proposal.

Anyway, I will not enter into the details, because I figured out that there is a fundamental flaw in the approach I am currently following: The pthread_getspecific function is not async-signal-safe and therefore cannot be used safely in the signal handler.

I'll come up with a different proposal soon.

@diml

This comment has been minimized.

Copy link
Member

commented Nov 5, 2018

Ah, I wasn't aware of that. How does that work for errno then? That limits a lot what you can use from a signal handler.

@jhjourdan

This comment has been minimized.

Copy link
Contributor Author

commented Nov 5, 2018

How does that work for errno then?

On GNU x86_64, it seems like they are essentially using the gcc-specific __thread keyword, whihc allows declaring a thread-local variable. However, this is relatively recent, and not present on all architectures. Not sure we want to use this for OCaml (although perhaps there is enough support on architectures that are of interest ?).

That limits a lot what you can use from a signal handler.

Indeed :( What's really frustrating here is that it seems that, at least on Linux, pthread_getspecific is async-signal-safe after the first access to a thread-specific variable, because the memory is lazily allocated. In our case, the thread handler is only reading a value that has already been written outside, so there does not seem to be risk. But this is undocumented and probably non-portable.

@diml

This comment has been minimized.

Copy link
Member

commented Nov 6, 2018

For now, I suggest that we assume that our use of pthread_getspecific is valid and portable. We can always change the method to __thread later.

I had a look at the new code. You mentioned issues with the tick thread and Windows, how difficult do you think it is to make it work? Personally, i find it simpler to follow the overall logic this way.

@jhjourdan jhjourdan force-pushed the jhjourdan:fixthreadsigmask branch from 6b953c3 to 4382020 Nov 6, 2018

@jhjourdan

This comment has been minimized.

Copy link
Contributor Author

commented Nov 6, 2018

For now, I suggest that we assume that our use of pthread_getspecific is valid and portable. We can always change the method to __thread later.

I could also try my other approach where masking of signal is emulated by the OCaml runtime.

I had a look at the new code. You mentioned issues with the tick thread and Windows, how difficult do you think it is to make it work? Personally, i find it simpler to follow the overall logic this way.

Both problems were related to the fact that neither signals on Windows nor the tick thread were using POSIX signals. They were simply calling the caml_record_signal directly. This is made impossible with the current proposal, since in both cases, the caml_record_signal is called from a non-OCaml thread, and therefore the TLS cannot be used.

That should work in the current state (I had to fix a silly mistake in the Windows code, I hope there is no other such errors). For the tick thread, I reused caml_async_action_hook which was already used by the other (emulated) thread library. For Windows, I simply deactivated the thread-local pending_signal array, since masking signals is anyway not possible on Windows.

@diml

This comment has been minimized.

Copy link
Member

commented Nov 6, 2018

Ok, do we need to do something similar when the signal is received by a C thread that is not registered with the OCaml runtime?

@jhjourdan

This comment has been minimized.

Copy link
Contributor Author

commented Nov 6, 2018

Ok, do we need to do something similar when the signal is received by a C thread that is not registered with the OCaml runtime?

Well, in that case, you should either mask it in that thread or register a different handler.

But this is to be expected: If you are using the OCaml handler, then you should arrange that the signal is handled by an OCaml thread.

@diml

This comment has been minimized.

Copy link
Member

commented Nov 6, 2018

The programmer might not always have control over such threads though.

Just to clarify, if an OCaml signal handler is installed for signal N and a C thread receive signal N, then:

  • with master: the handler will be executed by a random OCaml thread, possibly one that has signal N blocked
  • with this PR: the signal is ignored

Do you agree?

@jhjourdan

This comment has been minimized.

Copy link
Contributor Author

commented Nov 6, 2018

The programmer might not always have control over such threads though.

Just to clarify, if an OCaml signal handler is installed for signal N and a C thread receive signal N, then:

with master: the handler will be executed by a random OCaml thread, possibly one that has signal N blocked
with this PR: the signal is ignored

Indeed. Plus, if I understand well the code for pthread_getspecific in glibc, there is a slight chance that pthread_getspecific calls malloc in a async-signal-unsafe fashion.

@jhjourdan jhjourdan force-pushed the jhjourdan:fixthreadsigmask branch 2 times, most recently from 27372f8 to 756ce1c Nov 7, 2018

@jhjourdan

This comment has been minimized.

Copy link
Contributor Author

commented Nov 7, 2018

After several iterations, I have finally been able to make it work under Windows. This should be ready for reviewing now.

@diml

This comment has been minimized.

Copy link
Member

commented Nov 8, 2018

I feel uneasy at the idea of dropping signals. It basically means that as soon as you start using a C library that uses threads under the hood, signals sent to your applications might be silently ignored. Isn't there something that can be done for this case?

This also makes me think that there currently is a race condition in the non-threaded runtime, i.e. you could end up with two threads running OCaml code. (Edit: after looking again at the code, the worst that can happen is an assertion failure).

@mshinwell

This comment has been minimized.

Copy link
Contributor

commented Nov 8, 2018

@diml and I discussed this. I'm also not in favour of dropping signals, so we thought perhaps if there is a per-thread queue for signals now, there could be a "whole program" queue as well. Signals delivered to non-OCaml threads would be put on the whole program queue; every OCaml thread would check that queue as well as its own queue when processing pending signals.

However there seems to be a serious obstacle to this: in the "real" C signal handler you'd need to find out what thread you were currently in. You cannot use pthread_getspecific because it's not async signal safe as per the POSIX spec. You can use pthread_self, but you cannot use pthread_equal would you believe! So any scheme using pthread_self would have to record the pthread_t values for later comparison. However you cannot dynamically allocate memory in any reasonable fashion in a signal handler, and because we don't know the total number of threads in the program, we might not have anywhere to store the pthread_t.

Alternatively we could maybe use a "__thread volatile sigatomic_t" TLS variable to get a per-thread pointer to note down the signal. We probably have adequate TLS support on our first-tier platforms now. However on Linux there used to be a bug which means that this access can call malloc on the first occasion (c.f. https://gcc.gnu.org/ml/gcc-help/2013-09/msg00187.html and the link on that page). I would not be surprised if this bug still exists. I don't see how we can produce a workaround because, whilst we can force the allocation on the OCaml threads, we would not appear to be able to do so for arbitrary C threads.

@diml suggested that maybe we shouldn't proceed and consider completely overhauling the signal handling mechanism into some more abstract form. This would also potentially solve problems such as immediately-executed OCaml signal handlers (e.g. when not using the systhreads library) doing all sorts of non-async-signal-safe things. I wonder if this should be thought about more in the context of the upcoming changes for multicore support.

@diml

This comment has been minimized.

Copy link
Member

commented Nov 8, 2018

Yh, personally I do think that allowing arbitrary OCaml code inside signal handler is just to complicated to support correctly.

@diml

This comment has been minimized.

Copy link
Member

commented Nov 8, 2018

BTW, looking at the original issue again, one way to improve the situation without complex changes is to check the current signal mask before calling the OCaml handler; when the signal is masked, we simply don't clear the pending signal and don't execute the handler.

@jhjourdan

This comment has been minimized.

Copy link
Contributor Author

commented Nov 8, 2018

Signals delivered to non-OCaml threads would be put on the whole program queue; every OCaml thread would check that queue as well as its own queue when processing pending signals.

BTW, looking at the original issue again, one way to improve the situation without complex changes is to check the current signal mask before calling the OCaml handler; when the signal is masked, we simply don't clear the pending signal and don't execute the handler.

This is what I was proposing here:

I could also try my other approach where masking of signal is emulated by the OCaml runtime.

I'll try do code that. I thought about one issue: we would need to read the current mask of the current thread using pthread_sigmask. This function (on Linux) is just a thin wrapper around a syscall, which turns out to be relatively slow (~250ns on my computer). I was afraid that calling this at each minor collection would cause some performance issues. But now that I have done some experiments, I see that this is not much of a problem, even for heavily allocating programs.

@jhjourdan

This comment has been minimized.

Copy link
Contributor Author

commented Nov 8, 2018

handling mechanism into some more abstract form.

Could you be more precise? I don't see what kind of abstraction could help here.

@diml

This comment has been minimized.

Copy link
Member

commented Nov 12, 2018

Could you be more precise? I don't see what kind of abstraction could help here.

One possibility I had in mind was to provide a small DSL for the action to run inside the signal handler, such as:

module Action : sig
  val nop : t
  val write_to_fd : Unix.file_descr -> string -> t

  module External_ref : sig
    type t
    val create : int -> t
    val get : t -> int
  end

  val write_ref : External_ref.t -> int -> t
end

The idea would be to provide enough to encode the common signal handlers. It would then be safe to execute this DSL directly from the C signal handler.

@jhjourdan jhjourdan force-pushed the jhjourdan:fixthreadsigmask branch from 756ce1c to 63b2ff0 Dec 14, 2018

@jhjourdan

This comment has been minimized.

Copy link
Contributor Author

commented Dec 14, 2018

I'll try do code that. I thought about one issue: we would need to read the current mask of the current thread using pthread_sigmask. This function (on Linux) is just a thin wrapper around a syscall, which turns out to be relatively slow (~250ns on my computer). I was afraid that calling this at each minor collection would cause some performance issues. But now that I have done some experiments, I see that this is not much of a problem, even for heavily allocating programs.

I'm done implementing this. This should be ready to review (if CI passes). I also managed not keep the use of [caml_signals_are_pending] (renamed to [caml_signal_might_be_pending]), so that [pthread_sigmask] is not called at every minor GC pass.

Show resolved Hide resolved runtime/signals.c Outdated
Show resolved Hide resolved runtime/signals.c Outdated

@jhjourdan jhjourdan changed the title Fix Thread.sigmask by having a separate pending_signals array for each thread Fix Thread.sigmask by checking signal masks when starting the handler of a signal Dec 19, 2018

Fix Thread.sigmaks, by checking whether a signal is masked before han…
…dling it.

We use [sigprocmask] (if available) to check whether a signal is blocked when
the systhread library is not loaded. When it gets loaded, we use
[pthread_sigmask] instead.

Also, include [caml_pending_signals] in signals returned by
[Unix.sigpending]. Indeed, some signals might have been handled in the
POSIX sense by the C handler in the OCaml runtime, but not been
handled in the OCaml sense (for example, because they are blocked).

@jhjourdan jhjourdan force-pushed the jhjourdan:fixthreadsigmask branch from ae6b842 to e7918d4 Dec 19, 2018

@diml

diml approved these changes Dec 19, 2018

@diml diml merged commit 1c82c48 into ocaml:trunk Dec 19, 2018

2 checks passed

continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@jhjourdan

This comment has been minimized.

Copy link
Contributor Author

commented Dec 19, 2018

Great, thanks.

How, I'm sorry, but I noticed a (small, without important consequence) mistake in the patch: the prototype of caml_process_pending_signals should be caml_process_pending_signals(void) instead of caml_process_pending_signals(). Perhaps could you fix that directly on trunk?

@gasche

This comment has been minimized.

Copy link
Member

commented Dec 20, 2018

Merging this PR broke the CI on OSX:


 ... testing 'threadsigmask.ml' with 1.1 (bytecode) => failed (program output /Users/jenkins/builds/workspace/main/flambda/false/label/ocaml-macos-10.13/testsuite/_ocamltest/tests/lib-systhreads/threadsigmask/ocamlc.byte/threadsigmask.byte.output differs from reference /Users/jenkins/builds/workspace/main/flambda/false/label/ocaml-macos-10.13/testsuite/tests/lib-systhreads/threadsigmask.reference: 
--- /Users/jenkins/builds/workspace/main/flambda/false/label/ocaml-macos-10.13/testsuite/tests/lib-systhreads/threadsigmask.reference	2018-12-19 11:38:20.000000000 +0100
+++ /Users/jenkins/builds/workspace/main/flambda/false/label/ocaml-macos-10.13/testsuite/_ocamltest/tests/lib-systhreads/threadsigmask/ocamlc.byte/threadsigmask.byte.output	2018-12-20 19:09:37.000000000 +0100
@@ -1 +1,3 @@
-Signal handled in thread 0
+Long computation result: 100000
+Signal handled in thread Long computation result: 0100000
+

)
@dra27

This comment has been minimized.

Copy link
Contributor

commented Dec 21, 2018

It's also affecting FreeBSD, and apparently racily - precheck 172 shows native code failing in both flambda and non-flambda byte bytecode only in flambda (I don't expect flambda is to blame here).

We can't have systemically failing CI this close to a freeze - can I suggest we revert and re-open this GPR?

@jhjourdan

This comment has been minimized.

Copy link
Contributor Author

commented Dec 21, 2018

There seems to be a bug in the test. I am working on it.

Moreover, this PR broke Coq, because it is renaming signals_are_pending which was supposed to be private (but Coq VM hijacks it...).

So yes, I agree that this PR should be temporarily reverted so that I can prepare fixes without disturbing everyone.

@dra27

This comment has been minimized.

Copy link
Contributor

commented Dec 21, 2018

Thanks @jhjourdan!

@dra27

This comment has been minimized.

Copy link
Contributor

commented Dec 21, 2018

Reverted in 79eb572

@gasche

This comment has been minimized.

Copy link
Member

commented Dec 21, 2018

Note: a Mantis issue MP#7883 was opened for the Coq issue. (I assigned it to @jhjourdan.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.