-
Notifications
You must be signed in to change notification settings - Fork 3.4k
Make Node.js pthreads behavior in line with native #19073
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
FWIW I'm definitely in favour of this change, but, to be clear, one concern was/is that it's potentially a breaking one: someone who isn't even using Hopefully no one is really doing this in the wild, and there isn't really that many Node.js users of Emscripten anyway, but, unlike with previous linked iterations, there is some small risk involved. |
dac5e90
to
fbbc36d
Compare
This partially reverts commit 9cdd7b3.
test/test_core.py
Outdated
@@ -2884,7 +2884,7 @@ def test_pthread_abort(self): | |||
|
|||
@node_pthreads | |||
def test_pthread_abort_interrupt(self): | |||
self.set_setting('PTHREAD_POOL_SIZE', 1) | |||
self.set_setting('PROXY_TO_PTHREAD') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we would want to investigate why this is needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's indeed a bit curious that this is needed. Linking with -sPTHREAD_POOL_SIZE=1
and -sEXIT_RUNTIME
doesn't work either.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This particular test case is calling abort()
in its pthread entry point, which causes a WebAssembly.RuntimeError
to be thrown that is supposed to be propagated back to the main thread via the 'error'
event. However, the worker is calling _emscripten_thread_crashed()
just before throwing this exception:
Lines 278 to 281 in 574f4f4
if (Module['__emscripten_thread_crashed']) { | |
Module['__emscripten_thread_crashed'](); | |
} | |
throw ex; |
As a result, the main thread immediately exits while the runtime is kept alive:
emscripten/system/lib/pthread/emscripten_yield.c
Lines 31 to 35 in 574f4f4
if (thread_crashed) { | |
// Return the event loop so we can handle the message from the crashed | |
// thread. | |
emscripten_exit_with_live_runtime(); | |
} |
Since the worker is weakly referenced, the program exits too early, causing the test to fail.
A possible solution to this is to make the thread strongly referenced during pthread_join()
:
emscripten/system/lib/libc/musl/src/thread/pthread_join.c
Lines 21 to 23 in 42df09e
// Mark the thread as strongly referenced so that Node.js doesn't exit | |
// while the pthread is running. | |
_emscripten_thread_set_strongref(t, 1); |
This ensures that the program does not exit too early when a joinable thread crashes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Commit ca5f97c improves this.
@@ -49,5 +49,8 @@ EMSCRIPTEN_KEEPALIVE int _emscripten_proxy_main(int argc, char** argv) { | |||
pthread_t thread; | |||
int rc = pthread_create(&thread, &attr, _main_thread, NULL); | |||
pthread_attr_destroy(&attr); | |||
if (rc == 0) { | |||
emscripten_thread_set_strongref(thread, 1); | |||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We already have a call to runtimeKeepalivePop
in exitOnMainThread
that is supposed to take are of bringing down the app once the proxy'd main ends.
I have an stalled change that tried to recfactor that: #18770
The corresponding push is in callMain:
#if PROXY_TO_PTHREAD
// With PROXY_TO_PTHREAD make sure we keep the runtime alive until the
// proxied main calls exit (see exitOnMainThread() for where Pop is called).
{{{ runtimeKeepalivePush() }}}
#endif
On other words aren't already forcing the runtime to stay alive as long as the proxied main is running?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On other words aren't already forcing the runtime to stay alive as long as the proxied main is running?
IIUC, the runtimeKeepalivePush()
/ runtimeKeepalivePop()
mechanism would only keep the Emscripten runtime alive, but still allows Node.js to exit (i.e. it doesn't keep the event loop of Node.js alive). For example:
$ cat <<EOT > test.c
#include <emscripten/emscripten.h>
int main() {
emscripten_exit_with_live_runtime();
// Should never get here
__builtin_trap();
}
EOT
$ emcc -O3 test.c -o test.js -sEXIT_RUNTIME
$ node test.js
$ echo $?
0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not in scope of this PR, but I think one should imply the other. I was pretty surprised when I ran into that behaviour for the first time and noticed that runtimeKeepAlivePush didn't keep Node from exiting.
worker.ref(); | ||
// Once we start executing a pthread, we need to mark the worker as weakly | ||
// referenced, this ensures that Node.js may exit while a pthread is running. | ||
worker.unref(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't we just set them to unreferenced on first creation then? Or are you saying that an unused worker is strongly referenced and used one only weakly? That seems a odd to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't we just set them to unreferenced on first creation then?
I considered making the worker weakly referenced during the 'loaded'
event, i.e.:
Patch
--- a/src/library_pthread.js
+++ b/src/library_pthread.js
@@ -287,11 +287,9 @@ var LibraryPThread = {
} else if (cmd === 'loaded') {
worker.loaded = true;
#if ENVIRONMENT_MAY_BE_NODE
- // Check that this worker doesn't have an associated pthread.
- if (ENVIRONMENT_IS_NODE && !worker.pthread_ptr) {
- // Once worker is loaded & idle, mark it as weakly referenced,
- // so that mere existence of a Worker in the pool does not prevent
- // Node.js from exiting the app.
+ if (ENVIRONMENT_IS_NODE) {
+ // Once the worker is loaded, mark it as weakly referenced, so
+ // that its existence does not prevent Node.js from exiting.
worker.unref();
}
#endif
@@ -686,13 +684,6 @@ var LibraryPThread = {
// in this file, and not from the external worker.js.
msg.moduleCanvasId = threadParams.moduleCanvasId;
msg.offscreenCanvases = threadParams.offscreenCanvases;
-#endif
-#if ENVIRONMENT_MAY_BE_NODE
- if (ENVIRONMENT_IS_NODE) {
- // Mark worker as weakly referenced once we start executing a pthread,
- // so that its existence does not prevent Node.js from exiting.
- worker.unref();
- }
#endif
// Ask the worker to start executing its pthread entry point function.
worker.postMessage(msg, threadParams.transferList);
But this 'loaded'
event could occur after pthread_create()
returns, which would mean it could break such constructions:
pthread_create(&thread, NULL, worker_thread, NULL);
_emscripten_thread_set_strongref(thread, 1);
(i.e. the same construction that PROXY_TO_PTHREAD
currently uses).
Or are you saying that an unused worker is strongly referenced and used one only weakly?
I'm not sure if I understand this question, but idle/unused workers (i.e. workers not hosting a pthread) are already weakly referenced within the pool.
@@ -18,6 +18,9 @@ static int __pthread_timedjoin_np(pthread_t t, void **res, const struct timespec | |||
// TODO: The detached check here is just to satisfy the | |||
// `other.test_{proxy,main}_pthread_join_detach` tests. | |||
if (t->detach_state != DT_DETACHED && __pthread_self() == t) return EDEADLK; | |||
// Mark the thread as strongly referenced so that Node.js doesn't exit | |||
// while the pthread is running. | |||
_emscripten_thread_set_strongref(t, 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does one thread joining with another thread effect whether we want to keep node alive or not?
Sure if the main thread returns in brings down all threads even if they are waiting on each other?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was a workaround for the failing test_pthread_abort_interrupt
testcase. Commit ca5f97c improves this and removes the need to do this in pthread_join()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Context: #19073 (comment)
src/library_pthread.js
Outdated
@@ -575,6 +575,15 @@ var LibraryPThread = { | |||
else postMessage({ 'cmd': 'cleanupThread', 'thread': thread }); | |||
}, | |||
|
|||
_emscripten_thread_set_strongref: function(thread, strongRef) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be marked as __proxy: sync
since I don't see how it could work on anything but the main runtime thread?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed this __proxy: 'sync'
annotation with commit 059448a as this function is now only called on the main runtime thread.
// Mark worker as strongly referenced once we start executing a pthread, | ||
// so that Node.js doesn't exit while the pthread is running. | ||
worker.ref(); | ||
// Mark worker as weakly referenced once we start executing a pthread, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I don't understand about this comment "Mark worker as weakly referenced once we start executing a pthread" is that workers are already weakly referenced when they are in the pool, so why does anything need to change once we start executing a pthread. They are weakly referenced both before and after they start executing a thread, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC, only idle/unused workers are weakly referenced within the pool (i.e. when -sPTHREAD_POOL_SIZE=x
is specified).
emscripten/src/library_pthread.js
Lines 292 to 295 in 22893f5
// Once worker is loaded & idle, mark it as weakly referenced, | |
// so that mere existence of a Worker in the pool does not prevent | |
// Node.js from exiting the app. | |
worker.unref(); |
emscripten/src/library_pthread.js
Lines 216 to 219 in 22893f5
// Once a pthread has finished and the worker becomes idle, mark it | |
// as weakly referenced so that its existence does not prevent Node.js | |
// from exiting. | |
worker.unref(); |
But when -sPTHREAD_POOL_SIZE=x
is omitted (which should be safe on Node.js after commit 73eaf81) or when an application spawns more threads than specified within that setting (i.e. thread pool exhaustion), workers are strongly referenced by default, so this .unref()
is still necessary for those cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we instread fix the workers are strongly referenced by default
thing?
IRC the intent prior this change is that they should be weakly referenced until they are running a pthread, and then strongly referenced one a pthread is running.
After this change they should never be strongly referenced so should be .unref()
as soon as we create them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried to simplify this with commit 059448a.
After this change they should never be strongly referenced so should be
.unref()
as soon as we create them?
AFAIK it's only safe to .unref()
the worker during the 'loaded'
event or just before executing the pthread entry point function (i.e. only after the .on('message')
listeners are attached), it will not work if you .unref()
too early. See for example this program:
Details
const { Worker, isMainThread, parentPort } = require('worker_threads');
if (isMainThread) {
const worker = new Worker(__filename);
worker.on('error', (err) => { throw err; });
worker.on('exit', () => console.log('exit'));
worker.on('message', (msg) => console.log(msg))
worker.postMessage('From main thread!')
/* setTimeout(() => */worker.unref()/* , 100); */
} else {
parentPort.on('message', (msg) => console.log(msg));
parentPort.postMessage('From worker!');
}
As a possible enhancement, perhaps we could make PROXY_TO_PTHREAD
no-op on Node.js (i.e. just call the original main)? This would be safe after commit 73eaf81, and would allow further simplification.
a3c75cb
to
22893f5
Compare
|
||
void _emscripten_thread_crashed() { | ||
thread_crashed = true; | ||
crashed_thread_id = pthread_self(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we not just call _emscripten_thread_set_strongref(pthread_self())
here and avoid the change to _emscripten_yield
below?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I considered this, but then I noticed this comment:
emscripten/system/lib/pthread/emscripten_yield.c
Lines 25 to 29 in 96d3452
// When a secondary thread crashes, we need to be able to interrupt the main | |
// thread even if it's in a blocking/looping on a mutex. We want to avoid | |
// using the normal proxying mechanism to send this message since it can | |
// allocate (or otherwise itself crash) so use a low level atomic primitive | |
// for this signal. |
So since this function is called from a crashed thread, I don't think we can use the normal proxying mechanism here (i.e. __proxy: 'sync'
) .
src/library_pthread.js
Outdated
// as weakly referenced so that its existence does not prevent Node.js | ||
// from exiting. | ||
#if ENVIRONMENT_MAY_BE_NODE && PROXY_TO_PTHREAD | ||
if (ENVIRONMENT_IS_NODE && worker.proxied_main) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about if this read more like:
// Even threads that are strongly referenced, such as the proxied main thread
// or threads that are being kept alive long enough to proxy their error state
// should be unref() once that are returned to the poll
if (worker.strongly_referenced) {
worker.unref();
worker.is_strongly_referenced = false;
}
Then _emscripten_thread_set_strongref
could set is_strongly_referenced
to true and PROZY_TO_PTHREAD
could explictly call _emscripten_thread_set_strongref
on main's pthread instead of adding the magic extra argument to __pthread_create_js
. ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even better I think we can just unconditionally call worker.unref();
here! Since according the docs "If the worker is already unref()ed calling unref() again has no effect." (https://nodejs.org/api/worker_threads.html#workerunref)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, that extra argument added to __pthread_create_js
was a bit of an weird way to identify the proxied main. I reverted those commits and instead added a comment to clarify that .ref()
/.unref()
is backed by a boolean.
This change was generated using: $ ./test/runner other.*code_size* other.*metadce* --rebase
8025fdd
to
96d3452
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've very excited to see this change land!
src/library_pthread.js
Outdated
_emscripten_thread_set_strongref: function(thread) { | ||
#if ENVIRONMENT_MAY_BE_NODE | ||
if (!ENVIRONMENT_IS_NODE) return; | ||
PThread.pthreads[thread].ref(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For a two line function like this I think maybe the positive check is more readable:
if (ENVIRONMENT_IS_NODE) {
PThread.pthreads[thread].ref();
}
(and maybe fewer bytes of code?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed with 8f2bc84, there were no changes in the code size tests.
if (rc == 0) { | ||
// Mark the thread as strongly referenced, so that Node.js doesn't exit | ||
// while the pthread is running. | ||
_emscripten_thread_set_strongref(thread); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess one alternative to adding this new API would be to add a custom pthread attr to signal this? However, since this is an internal API we can always experiment with changing that later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I experimented with a custom pthread attr (similar to emscripten_pthread_attr_settransferredcanvases()
), but that was more complicated than doing this. _emscripten_thread_set_strongref()
is also used for crashed threads, so it can't be removed in favor of a custom pthread attr.
// Once a pthread has finished and the worker becomes idle, mark it | ||
// as weakly referenced so that its existence does not prevent Node.js | ||
// from exiting. | ||
// Once the proxied main thread has finished, mark it as weakly |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment here is specific to the proxied main but the _emscripten_thread_set_strongref
seems like a generic function.
Perhaps the comment could say
// In case a thread was marked as strongly referenced using
// _emscripten_thread_set_strongref (for example, the proxied main thread)
// we need to ensure that its un-referenced when we the thread exits and
// the works is returned to the pool
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This .unref()
is conditional guarded by PROXY_TO_PTHREAD
, so I think the comment still makes sense.
In other words, we can be sure that all threads are already weakly referenced when the worker is returned to the pool, expect for the "main" thread in PROXY_TO_PTHREAD
mode; since that one is strongly referenced.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair enough. I hadn't noticed that this block was now only PROXY_TO_PTHREAD
.
@@ -571,6 +572,13 @@ var LibraryPThread = { | |||
else postMessage({ 'cmd': 'cleanupThread', 'thread': thread }); | |||
}, | |||
|
|||
_emscripten_thread_set_strongref: function(thread) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps a comment here about how the sole current use case for this function is for keeping the "main" thread alive in PROXY_TO_PTHREAD mode?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed with 8f2bc84.
// Mark the crashed thread as strongly referenced so that Node.js doesn't | ||
// exit while the pthread is propagating the uncaught exception back to | ||
// the main thread. | ||
_emscripten_thread_set_strongref(crashed_thread_id); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually think this change might help with the race conditions we've been seeing in certain pthread tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is is possible that by the time we get here, the crashed_thread_id
has already be cleaned up and that crashed_thread_id
is no longer a valid thread pointer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually think this change might help with the race conditions we've been seeing in certain pthread tests.
Did you mean issue #15014? I think that race condition is still there, I just responded to that issue.
Is is possible that by the time we get here, the
crashed_thread_id
has already be cleaned up and thatcrashed_thread_id
is no longer a valid thread pointer?
Given that crashed threads are never cleaned up (i.e. they do not call _emscripten_thread_exit
), I think we can be sure that the pthread and corresponding worker are still alive up once it arrives here.
There could be a possibility that the uncaught exception will be re-thrown on the main thread before it ever arrives here, but since the program will be finished by then, I don't think that would be an issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this worth adding a ChangeLog entry for?
Feel free to merge either way, we can always add one later. I'm excited to see this land.
By marking workers as weakly referenced. This corresponds to the ideal scenario as mentioned in commit a9cbf47.
(originally based on commit e6f86a8 and 4b4faf9)
This passes the test suite locally, except forFixed with commit ca5f97c.test_pthread_abort_interrupt
./cc @RReverser