Mozart2 hangs #305
Comments
vm tests hang on travis, not sure why. See mozart#305 for follow-up discussion.
That's odd, I remember the I'd try just commenting out the hanging test if you can reproduce, maybe it's just one of them having buggy termination.
Ozwish is indeed not involved. And it is most probably one VM that does not close properly. If the ozemulator can hang randomly, then I guess that N VMs will hang N times more randomly-often ;-).
@eregon The issue is quite easy to reproduce with With gdb, you get
Does it mean something to you?
So that sounds like it's waiting on an open IO, maybe simply stdin?
To me it looks more like a bug in, or misuse of, Boost.Asio. Only one blocked thread remains, with nothing to wake it up. It looks like a worker thread waiting on an empty work queue. There are several similar issues, like https://stackoverflow.com/questions/41804866/asio-on-linux-stalls-in-epoll and https://svn.boost.org/trac10/ticket/11069
Funny enough, running vm.ozf like I did above does nothing except declaring a big record. In fact I am able to reproduce the issue with
if I run enough instances in parallel. Apparently there needs to be some heavy load on the machine. Looks more and more like a race condition around asio usage.
@layus Which Boost version are you using?
@eregon Boost 1.67. clang's ThreadSanitizer (-fsanitize=thread) to the rescue! Does it mean something to you?
(just a note for my future self) This may be of interest: https://stackoverflow.com/questions/43168199/cancelling-boost-asio-deadline-timer-safely#43169596
Bottom line: timers are not thread-safe. Because we restart the timer within the handler (which runs in the io thread), we must only access the timer from the io thread. See #307 for details. Moreover, there is a small race condition when cancelling a timer: cancel() may appear to do nothing if it is called right after the timer expires but before its restart in the onPreemptionTimerXXX handler. I had to use the trick described in http://stackoverflow.com/questions/43168199/cancelling-boost-asio-deadline-timer-safely#43169596 to achieve a safe cancellation. Deleting the timer was also a puzzle, but I think I got it right.
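For readers who want to see the shape of that fix, here is a minimal sketch of the idea. It uses only the standard library, not Boost.Asio, and it is not the actual Mozart2 code. `MiniLoop`, `PeriodicTimer` and `cancel_stops_timer` are hypothetical names invented for this illustration. The assumption is that `MiniLoop` plays the role of the io thread's work queue, and that a "tick" re-posting itself plays the role of a handler restarting the timer. The cancellation is posted to the loop instead of touching the timer state directly, and the handler checks a flag before re-arming.

```cpp
#include <chrono>
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical stand-in for an io loop: one thread runs queued tasks in FIFO order.
class MiniLoop {
public:
    void post(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            tasks_.push_back(std::move(task));
        }
        cv_.notify_one();
    }

    // Runs tasks until stop() was requested AND the queue is drained.
    void run() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                cv_.wait(lock, [this] { return stopped_ || !tasks_.empty(); });
                if (tasks_.empty()) return;  // stopped and nothing left to do
                task = std::move(tasks_.front());
                tasks_.pop_front();
            }
            task();  // executed outside the lock
        }
    }

    void stop() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopped_ = true;
        }
        cv_.notify_one();
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    std::deque<std::function<void()>> tasks_;
    bool stopped_ = false;
};

// Hypothetical self-restarting timer. Its state is only touched on the loop thread.
class PeriodicTimer {
public:
    explicit PeriodicTimer(MiniLoop& loop) : loop_(loop) {}

    void start() { loop_.post([this] { onTick(); }); }

    // Callable from any thread: the flag is set by a task on the loop thread,
    // so it cannot interleave with onTick().
    void cancel() { loop_.post([this] { cancelled_ = true; }); }

    int ticks() const { return ticks_; }  // read only after the loop thread is joined

private:
    void onTick() {
        if (cancelled_) return;  // checked before re-arming, so the cancel is never lost
        ++ticks_;
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
        loop_.post([this] { onTick(); });  // "restart the timer"
    }

    MiniLoop& loop_;
    bool cancelled_ = false;  // plain bool: loop thread only
    int ticks_ = 0;
};

// Returns true if the timer ticked at least once and the loop thread terminated.
bool cancel_stops_timer() {
    MiniLoop loop;
    PeriodicTimer timer(loop);
    timer.start();
    std::thread io([&loop] { loop.run(); });
    std::this_thread::sleep_for(std::chrono::milliseconds(20));
    timer.cancel();  // posted, not applied directly
    loop.stop();
    io.join();       // would block forever if the timer kept re-arming
    return timer.ticks() > 0;
}
```

Calling `cancel_stops_timer()` from a `main` returns once the loop thread has drained its queue. If `cancel()` were left out, the tick would keep re-posting itself and `io.join()` would never return, which is roughly the kind of hang at exit described in this issue. With real asio timers the same two ingredients apply, under the assumptions of the linked answer: post the cancellation to the io thread, and have the handler check for it before restarting.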
Mozart2 sometimes hangs instead of exiting when all Oz code is finished.
This occurs with GUI applications (because the ozwish pipe does not close properly) and also randomly with other programs.
It was already reported for "vm" tests.