Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Thread.sigmask, take 2 #2211

Merged
merged 2 commits into from Mar 1, 2019

Conversation

Projects
None yet
7 participants
@jhjourdan
Copy link
Contributor

commented Dec 21, 2018

This is a follow-up of #2104.

  • renamed caml_signals_might_be_pending back into caml_signals_are_pending
  • wrote a more robust and more sensitive test. The previous one was both not sensitive enough (I was sometimes still succeeding when removing the fix) and bogus on some platforms (e.g., FreeBSD, MacOSX).
  • fixed a small bug in test lib-threads/signal.ml
@jhjourdan

This comment has been minimized.

Copy link
Contributor Author

commented Dec 21, 2018

@gasche Could you please confirm that the new test is suceeding on INRIA's MacOS CI?

@diml Could you do a quick review again? The new commits are separated.

@dra27

This comment has been minimized.

Copy link
Contributor

commented Dec 22, 2018

I've scheduled precheck#173. @diml is on holiday until January - is it just the test which has significantly altered?

@gasche

This comment has been minimized.

Copy link
Member

commented Dec 22, 2018

Unfortunately the build still fails on macos machines -- with a timeout, it looks like:

 ... testing 'threadsigmask.ml' with 1.1.1.1.1.1.1 (run) => Build timed out (after 20 minutes). Marking the build as failed.
/Users/jenkins/builds/workspace/precheck/flambda/false/label/ocaml-macos-10.13/testsuite/tests/lib-systhreads/threadsigmask.run: line 4: 80833 Terminated: 15          ${program} > ${output}

And here is the failure log on freebsd-64 (was also failing with the previous PR):

 ... testing 'threadsigmask.ml' with 1.1.2.1.1.1.1 (run) => Fatal error: exception Unix.Unix_error(Unix.ESRCH, "kill", "")
passed
@shindere

This comment has been minimized.

Copy link
Contributor

commented Dec 22, 2018

@jhjourdan

This comment has been minimized.

Copy link
Contributor Author

commented Dec 23, 2018

If you create one (visit ci.inria.fr) and ask to join the OCaml project

Done, thanks.

@jhjourdan

This comment has been minimized.

Copy link
Contributor Author

commented Dec 23, 2018

I've scheduled precheck#173. @diml is on holiday until January - is it just the test which has significantly altered?

Mainly, yes. Other changes include minor refactoring (renaming) so that Coq can still hijack caml_signals_are_pending, and and small fix in test lib-thread/signal.ml, which @shindere can review.

@jhjourdan jhjourdan force-pushed the jhjourdan:fixthreadsigmask branch from 2937d05 to 18c7ef2 Dec 23, 2018

@jhjourdan

This comment has been minimized.

Copy link
Contributor Author

commented Dec 23, 2018

I pushed a new version of the test, which hopefully fixes both issues.

then you can run precheck on your branch

However, it does not seem I have the necessary permissions to run a new job on Jenkins. Or, at least, I don't know how to trigger it. Could you help me?

@jhjourdan

This comment has been minimized.

Copy link
Contributor Author

commented Dec 23, 2018

Another remark on the MacOS test: it seems like on this platform, Pervasives.exit does not end the whole process if there are other threads running. Is this a known issue?

@shindere

This comment has been minimized.

Copy link
Contributor

commented Dec 23, 2018

@shindere

This comment has been minimized.

Copy link
Contributor

commented Dec 23, 2018

@shindere

This comment has been minimized.

Copy link
Contributor

commented Dec 23, 2018

@jhjourdan

This comment has been minimized.

Copy link
Contributor Author

commented Dec 23, 2018

Another remark on the MacOS test: it seems like on this platform,
Pervasives.exit does not end the whole process if there are other
threads running. Is this a known issue?
@mshinwell?

False alarm. The test is still doing a timeout despite my changes, so this seems to indicate the issue was not in exit... I still don't understand what is happening on MacOS.

@jhjourdan jhjourdan force-pushed the jhjourdan:fixthreadsigmask branch from 18c7ef2 to fd23529 Dec 23, 2018

@jhjourdan

This comment has been minimized.

Copy link
Contributor Author

commented Dec 23, 2018

I have been able to make the test not doing a timeout by only using 2 threads.... However the test i still failing: the process does not seem to receive any signal.

This may be a bug in MacOS's implementation of pthread_sigmask, but I have no test machine to check.

@diml

This comment has been minimized.

Copy link
Member

commented Jan 7, 2019

@jhjourdan what about the following test: we have three threads:

  • main thread: blocks USR1 and USR2
  • thread1: blocks only USR2
  • thread2: blocks only USR1

thread1 and thread2 run the infinite allocation loop. The main thread sends 1000 signals to Unix.getpid (), randomly alternating between USR1 and USR2. If at the end of the test all the USR1 signals have been received by thread1 and all the USR2 signals have been received by thread2 then we conclude that the behaviour is correct. That seems a bit more robust to me.

@jhjourdan jhjourdan force-pushed the jhjourdan:fixthreadsigmask branch from fd23529 to 2dadbea Jan 15, 2019

Show resolved Hide resolved testsuite/tests/lib-systhreads/threadsigmask.ml Outdated
@jhjourdan

This comment has been minimized.

Copy link
Contributor Author

commented Jan 15, 2019

I updated the test again: It still fails on MacOSX. It seems like on this platform, Thread.sigmask fails to block signals.

I can't see why my patch could behave differently on MacOS. I'm trying to get access to a MacOS machine to do some test in order to better understand what is happening there.

@jhjourdan jhjourdan force-pushed the jhjourdan:fixthreadsigmask branch 2 times, most recently from 0e9459e to 66bf213 Jan 28, 2019

@jhjourdan

This comment has been minimized.

Copy link
Contributor Author

commented Jan 28, 2019

I found the problem on MacOS: The signal handling mechanism in caml_execute_signal used to call sigprocmask, which had the effect of setting signal masks globally for all threads.

I fixed the bug by replacing all the calls to sigprocmask with a call to pthread_sigmaks. This will change the semantics of Unix.sigprocmask on MacOS, since it used to set the masks for all processes, but I think this is OK, since this will make the behavior consistent with Linux.

@jhjourdan

This comment has been minimized.

Copy link
Contributor Author

commented Jan 28, 2019

Note that the build is passing on INRIA CI on MacOSX:

https://ci.inria.fr/ocaml/job/precheck/186/

(Some builds are failing on Cygwin, but this seems unrelated.)

@diml

This comment has been minimized.

Copy link
Member

commented Jan 29, 2019

I fixed the bug by replacing all the calls to sigprocmask with a call to pthread_sigmaks. This will change the semantics of Unix.sigprocmask on MacOS, since it used to set the masks for all processes, but I think this is OK, since this will make the behavior consistent with Linux.

That seems fine to me, however it seems worth mentioning this in the documentation of Unix.sigprocmask in unix.mli.

@jhjourdan

This comment has been minimized.

Copy link
Contributor Author

commented Jan 29, 2019

Done.

@jhjourdan

This comment has been minimized.

Copy link
Contributor Author

commented Jan 29, 2019

(I also rebased on trunk, and updated Changes.)

@shindere

This comment has been minimized.

Copy link
Contributor

commented Jan 29, 2019

@jhjourdan

This comment has been minimized.

Copy link
Contributor Author

commented Jan 29, 2019

When merging, you may want to say whether that should go in 4.08 or not.

Well, I don't know, I have the feeling this is not my decision. I think that, as a bugfix, this can indeed go to 4.08. However, this is quite subtle, so I could understand waiting 4.09.

@shindere

This comment has been minimized.

Copy link
Contributor

commented Jan 29, 2019

@@ -0,0 +1,22 @@
(* Program used together with the threadsigmask.ml test for sending 10 time the

This comment has been minimized.

Copy link
@diml

diml Jan 29, 2019

Member

Do we still need this file?

This comment has been minimized.

Copy link
@jhjourdan

jhjourdan Jan 30, 2019

Author Contributor

Well spotted. Removed.

@jhjourdan jhjourdan force-pushed the jhjourdan:fixthreadsigmask branch from 71765ca to 398431b Jan 30, 2019

@jhjourdan

This comment has been minimized.

Copy link
Contributor Author

commented Jan 30, 2019

I discussed a bit with @xavierleroy today. He said he wanted to have a quick look to this. Perhaps we could wait for him before merging?

@diml

This comment has been minimized.

Copy link
Member

commented Jan 30, 2019

Seems good to me, let's wait for @xavierleroy's review

@jhjourdan

This comment has been minimized.

Copy link
Contributor Author

commented Jan 30, 2019

Appveyor build seems to fail but actually passes.

@xavierleroy

This comment has been minimized.

Copy link
Contributor

commented Jan 30, 2019

I was just curious how @jhjourdan implemented the choice between sigprocmask and pthread_sigmask. I read the code again and it looks fine to me.

@diml

diml approved these changes Jan 31, 2019

@diml

This comment has been minimized.

Copy link
Member

commented Jan 31, 2019

@damiendoligez, should this go in 4.08 or 4.09?

@jhjourdan

This comment has been minimized.

Copy link
Contributor Author

commented Feb 26, 2019

Do we have any progress on this ?

AFAIU, we are waiting for a decision about 4.08 vs. 4.09 and for the final green light for merging.

@damiendoligez ?

@damiendoligez

This comment has been minimized.

Copy link
Member

commented Feb 27, 2019

OK for cherry-picking to 4.08.

@diml

This comment has been minimized.

Copy link
Member

commented Feb 27, 2019

@jhjourdan could you rebase/squash the commits? I'll merge and cherry-pick after that

jhjourdan added some commits Dec 21, 2018

Fix Thread.sigmaks, by checking whether a signal is masked before han…
…dling it.

We use [sigprocmask] (if available) to check whether a signal is
blocked when the systhread library is not loaded. As soon as the
[Thread] module gets loaded, we use [pthread_sigmask] instead, and
redirect all the calls to [sigprocmask] to [pthread_sigmask].  Indeed,
the latter has unspecified behavior in a multi-threaded context
anyway.  In practice, this should not change the semantics of
[Unix.sigprocmask] on Linux, since on this platform, [pthread_sigmask]
is actually an alias for [sigprocmask]. On MacOSX, the semantics will
change, since [sigprocmask] changes the masks of the whole process on
this platform.

Also, include [caml_pending_signals] in signals returned by
[Unix.sigpending]. Indeed, some signals might have been handled in the
POSIX sense by the C handler in the OCaml runtime, but not been
handled in the OCaml sense (for example, because they are blocked).

This commit un-reverts 1c82c48, which has been reverted in
79eb572. The issues of the original commit are corrected in this commit.

@jhjourdan jhjourdan force-pushed the jhjourdan:fixthreadsigmask branch from 398431b to 854d1c8 Feb 28, 2019

@jhjourdan

This comment has been minimized.

Copy link
Contributor Author

commented Feb 28, 2019

I rebased and squashed the commits.

Note that I also made a slight change: it turns out sigprocmask and pthread_sigmask do not exactly have the same interface, since the former uses errno but latter doesn't. I wrote a thin wrapper around sigprocmask to accommodate this.

@jhjourdan

This comment has been minimized.

Copy link
Contributor Author

commented Mar 1, 2019

@diml, I think everything is ready !

@damiendoligez damiendoligez merged commit 08ed1e8 into ocaml:trunk Mar 1, 2019

2 checks passed

continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details

damiendoligez added a commit that referenced this pull request Mar 1, 2019

Fix Thread.sigmask, take 2 (#2211)
Fix Thread.sigmaks, by checking whether a signal is masked before handling it.

We use [sigprocmask] (if available) to check whether a signal is
blocked when the systhread library is not loaded. As soon as the
[Thread] module gets loaded, we use [pthread_sigmask] instead, and
redirect all the calls to [sigprocmask] to [pthread_sigmask].  Indeed,
the latter has unspecified behavior in a multi-threaded context
anyway.  In practice, this should not change the semantics of
[Unix.sigprocmask] on Linux, since on this platform, [pthread_sigmask]
is actually an alias for [sigprocmask]. On MacOSX, the semantics will
change, since [sigprocmask] changes the masks of the whole process on
this platform.

Also, include [caml_pending_signals] in signals returned by
[Unix.sigpending]. Indeed, some signals might have been handled in the
POSIX sense by the C handler in the OCaml runtime, but not been
handled in the OCaml sense (for example, because they are blocked).

This commit un-reverts 1c82c48, which has been reverted in
79eb572. The issues of the original commit are corrected in this commit.
@damiendoligez

This comment has been minimized.

Copy link
Member

commented Mar 1, 2019

Merged to trunk and cherry-picked to 4.08 as commit 2515285.

@jhjourdan

This comment has been minimized.

Copy link
Contributor Author

commented Mar 1, 2019

Wonderful, thanks :)

@jhjourdan jhjourdan deleted the jhjourdan:fixthreadsigmask branch Mar 1, 2019

@diml

This comment has been minimized.

Copy link
Member

commented Mar 4, 2019

Thanks for merging @damiendoligez, I was out Friday and AFK most of Thursday.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.