New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make systhread mutexes errorcheck #9757
Conversation
56fdcb7
to
243346c
Compare
This means that an exception is raised when attempting to lock a mutex locked from the same thread, e.g. from an asynchronous callback. This changes the behaviour on Windows where mutexes were recursive. Add test for deadlock inside asynchronous callbacks.
243346c
to
650283a
Compare
I have not reviewed the code, but I see two issues with this PR: 1- Performance : pthread's errorcheck mutexes have a cost, and some people do (did ?) care about the performance of systhread (cf #4351). So, at the very least, we should do a benchmarck for this PR. 2- Releasing a mutex owned by another thread is now explicitly forbidden and checked by pthread. In the previous code, this was theoretically an undefined behavior inherited from pthread, but my gut feeling is that this was harmless because there has always been an HB relationship between the locker and the unlocker thanks to the master lock. So, this PR changes the behavior of mutexes in this case (if we assume that the UB I just described is benign). On the other hand, this PR fixes a potential UB for well-typed programs, which is good. |
@@ -0,0 +1,3 @@ | |||
start | |||
Sys_error("Mutex.lock: Resource deadlock avoided") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The exact string reported here is libc- and locale-dependent. This test shouldn't fail if it changes.
I think this is a good change, independent of the discussion at #9722. The Posix code looks good to me, but I haven't read the Windows side. @jhjourdan releasing a mutex from the wrong thread seems like it could break on many libcs? Are there programs that do this? |
Really? I doubt pthread_mutex implementations use thread-local variables in practice. I don't see why they would do that, and TLS is rather costly. |
TLS is cheap, especially from within libc (there's generally a register reserved), and many implementations use it to determine the current thread ID. See e.g. glibc's implementation here. It's convenient to have thread IDs around when you're writing a queued sleeping lock. (They're not needed for a simple spinlock, though). |
Before we argue on the relative merits of the three flavors of mutexes, let's keep in mind that there are at least three different uses of mutexes in OCaml's systhreads: 1- the masterlock, 2- implementing the abstract type To me, it could make sense to have reentrant mutexes protecting I/O buffers, for instance if we want to support functionality equivalent to I could agree with Finally, for the masterlock, speed is crucial, so maybe we don't want any checking there. |
Thanks for the discussion, this is interesting. I agree that statu quo is the safest bet in terms of compatibility. It is important to care about emergent properties, so @jhjourdan and @xavierleroy's points tend to convince me. However, unlocking from the another thread is unsupported at least on Windows, so this will have to remain unspecified. Please decide for me whether the current behaviour has to be preserved. If so, before closing the PR we can try to recycle it. @xavierleroy suggests to make channel locks recursive so that the user may be able to lock channels. I can look into this, but I see the difficulty that it does not work against interference with asynchronous callbacks. It might be possible to work around this if we are careful to maintain only 3 states, and adapt the unlocking code in the loops at #9722 accordingly if it has already been locked twice. Did you have something else in mind? It is also possible to add an optional argument to |
Another issue is that |
Is this expected to work? If so, should we implement Mutex.unlock using something other than |
I am tempted to say "no". In my mind, a mutex must be used in a well-bracketed way, thus be unlocked by the thread that locked it. Otherwise, it's a 0-1 semaphore :-) But it is troubling that we never got to document this assumption!
We can implement Mutex.t as a mutex + a condition variable + a Boolean, effectively implementing a 0-1 semaphore... But this is not good for performance, and feels overkill. |
The Windows implementation is not strong enough - calling let msg s = print_endline s; flush stdout in
let m = Mutex.create () in
Mutex.lock m;
let thread1 = Thread.create (fun m -> Mutex.lock m; msg "thread1 locked m") m in
msg "master release m";
Mutex.unlock m;
Thread.join thread1;
msg "thread1 is dead";
let thread2 = Thread.create (fun m -> Mutex.lock m; msg "thread2 locked m") m in
msg "master release m";
Mutex.unlock m;
Thread.join thread2;
msg "thread2 is dead" succeeds on trunk Windows 10 and Ubuntu 18.04. With this PR it still incorrectly succeeds with Windows 10, but on Ubuntu has the expected new behaviour:
Instead of using |
Right, but |
Then I'd rather use errorcheck mutexes, as proposed in this PR, at least to implement |
@dra27, thanks. It seems also that the mutex implementation can be very different between Windows versions, so we cannot really conclude that this used to accidentally work in the past. I'll have a look at the mingw implementation if we go in this direction. |
I think it should be fine. This function was part of SUSv2 in 1997, and was included in POSIX.1-2001 as part of the XSI extension. The recent change is that it was made mandatory rather than part of an extension, but I'm not aware of any pthread implementations that lack it. |
FTR, https://discuss.ocaml.org/t/mutex-lock-resource-deadlock-avoided-on-freebsd-12-1-ocaml-4-09-1-lwt-4-2-1/6206 provides an example where 1) unlock is called from a different thread, and 2) this fails on some platform. |
I made a different Win32 implementation, see PR #9846. Review welcome. I also have a variant where channels get recursive mutexes, but decided to keep it my freezer until we want to add the equivalent of |
The other version looks better (in the direction of what I had in mind). Therefore I am closing this one. Thanks for taking this off my plate. |
To preserve behaviour, explicit polls are added: - in caml_raise, to raise the right exception when as system call is interrupted by a signal. - in sigprocmask, to ensure that signals are handled as soon as they are unmasked.
From the commit log:
This means that an exception is raised when attempting to lock a mutex
locked from the same thread, e.g. from an asynchronous callback.
This changes the behaviour on Windows where mutexes were recursive.
Add test for deadlock inside asynchronous callbacks.
The main motivation is the detection of deadlocks caused by locking from an asynchronous callback, since this is a "bug at a distance" that can surprise users and be hard to debug. This has occurred many times: #5141, #5299, #7503, #8794...
This PR changes both
Mutex
and the channel locks. This is orthogonal to #9722 which aims to unlock channels before running asynchronous callbacks, and can be considered separately.For POSIX, this converts a deadlock into a
Sys_error
exception, so this preserves compatibility.There is a danger of breaking code for people who wrote win32-only programs and relied on the recursive behaviour of win32 mutexes. Unfortunately, the documentation of
Mutex
is ambiguous about the non-recursive nature of mutexes. If there is a risk to break programs, it may be preferable to leave the windows behaviour unchanged; otherwise to clarify the documentation.Optionally, I can make the runtime lock errorcheck too. At #5299, it was previously discussed to make it recursive, and it was decided against it, and in particular the current windows behaviour is incorrect. This is easy to fix in the same way if desired.
For obvious reasons the Win32 code has not yet been tested. I will rely on CI for compilation and further help is welcome.
(I believe this can interest @xavierleroy and @stedolan.)