-
Notifications
You must be signed in to change notification settings - Fork 3.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Handle thread crash/abort even when main thread is blocking (busy-waiting). NFC #16174
Conversation
f4dc6e8
to
ceeb252
Compare
Still working a on test .. |
// mechanism to send this message since it can allocate (or otherwise itself | ||
// crash) so use a low level atomic primitive for this signal. | ||
if (emscripten_is_main_runtime_thread() && thread_crashed) | ||
abort(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is a thread crashing causing the main thread to abort the desired behavior? Is this equivalent to a thread doing something like a segfault and bringing down the entire process?
What if the main thread never calls emscripten_futex_wait
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this is like one of the threads segfaulting.
If the main thread is returning to the even loop there is no issue. This patch is only for the case when the main thread is blocking/looping and not returning (e.g. pthread_join).
ceeb252
to
b910c50
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aha, that makes sense now. Checking for crashes in futex_wait is sufficient because we're assuming that the main thread will call pthread_join at some point.
b910c50
to
fcbe96e
Compare
All our blocking primitives (not just pthread_join) bottom out here in futex_wait. If the thread is busy in a compute loop we can't do anything.. but such a program has worse problems to deal with. |
fcbe96e
to
22f0d22
Compare
…ting). NFC Previously if a thread crashed when the main thread was blocked on something (such as pthread_join) it would never get the `onerror` message because it would never return the event loop.
22f0d22
to
d072e29
Compare
// Return the event loop so we can handle the message from the crashed | ||
// thread. | ||
emscripten_unwind_to_js_event_loop(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what about the call to futex_wait_busy
below? should we not check this in the loop there as well? (it is in JS so it's harder - not sure if worth it)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It recently got moved to native code just above!
That reason I didn't bother with that is that all the call sites that we care about already time slice calls to emscripten_futex_wait anyway.
Its a little odd that we have the time slice happening both outside of emscripten_futex_wait
and also inside of emscripten_futex_wait
(but we only do that for if we choose the futex_wait_busy path). I've been thinking about trying to eliminate one or the other since two it at two levels seems wrong.. but its not obvious which place is that right place to to it.
Separely, I think there is a different way we might want to factor this stuff. Create a new API called _emscripten_thread_tick
which calls both emscripten_main_thread_process_queued_calls
and runs checks like this. There are possibly other things we might want to do here too. Then change basically all the emscripten_main_thread_process_queued_calls
calls to _emscripten_thread_tick
.. that would automatically mean doing this new check in all the right places (i.e. whenever a thread is idle).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, great! Very nice.
Previously if a thread crashed when the main thread was blocked on
something (such as pthread_join) it would never get the
onerror
message because it would never return the event loop.
Without this change the main thread will simple loop forever after the
secondary thread crashes.