
Usage of pthread in a Node.js app never lets it exit #12801

Closed
RReverser opened this issue Nov 17, 2020 · 35 comments · Fixed by #18227

@RReverser
Collaborator

RReverser commented Nov 17, 2020

There is something wrong with the way Emscripten uses threads on Node.js. It looks like any usage prevents the Node app from ever exiting.

Just tried the simplest example:

#include <thread>
#include <iostream>

int main() {
	std::thread t([] {
		std::cout << "Hello from another thread\n";
	});
	t.join();
	return 0;
}

Compiled with:

> emcc temp.cc -o temp.js -pthread -s PTHREAD_POOL_SIZE=4

This results in:

> node --experimental-wasm-threads --experimental-wasm-bulk-memory temp
Hello from another thread
Pthread 0x705d10 exited.
[stuck here]

Looking at https://nodejs.org/api/worker_threads.html, I suspect Emscripten needs to call .unref() to indicate that it's okay to exit as soon as the reference is unreachable, but it probably doesn't do that yet?
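
For context, a minimal standalone sketch of the .unref() behaviour being suggested here (illustrative only, not Emscripten output; the inline worker body is just for the demo):

// unref-sketch.js - illustrative only, not what Emscripten emits.
const { Worker } = require('worker_threads');

// `eval: true` lets us inline the worker body for this demo.
const w = new Worker(`
  const { parentPort } = require('worker_threads');
  parentPort.on('message', (msg) => console.log('worker got', msg));
`, { eval: true });

// Without this call, the idle Worker keeps the Node event loop alive forever.
// With it, the process exits naturally once the main script finishes.
w.unref();

console.log('main script done');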

@emaxx-google

Is this supposed to work without PROXY_TO_PTHREAD? See https://emscripten.org/docs/porting/pthreads.html#additional-flags

@RReverser
Collaborator Author

@emaxx-google Yes. PROXY_TO_PTHREAD is a separate feature that is built on top of PThread emulation and allows moving main to a separate thread. This bug is about regular PThread emulation.

@emaxx-google

emaxx-google commented Nov 18, 2020

I'm not an Emscripten expert, but the same page describes in a bit more detail:

The Emscripten implementation for the pthreads API should follow the POSIX standard closely, but some behavioral differences do exist:

When pthread_create() is called, if we need to create a new Web Worker, then that requires returning the main event loop. That is, you cannot call pthread_create and then keep running code synchronously that expects the worker to start running - it will only run after you return to the event loop. This is a violation of POSIX behavior and will break common code which creates a thread and immediately joins it or otherwise synchronously waits to observe an effect such as a memory write.

@RReverser
Collaborator Author

@emaxx-google That's not the issue here. I already know how pthreads work in Emscripten :) What that paragraph describes is the reason you need PTHREAD_POOL_SIZE, which I'm already passing.

@juj
Collaborator

juj commented Nov 18, 2020

Try building with -s PTHREAD_POOL_SIZE=1 -s EXIT_RUNTIME=1 linker flags. Does that help?

@RReverser
Collaborator Author

@juj Hmm, it does, but that's not necessary in non-pthread builds. Why does adding pthreads make a difference?

@RReverser
Collaborator Author

[semi-answering my own question] IIRC the difference is that EXIT_RUNTIME forcibly does process.exit() or something like that, which is not exactly what we want here - what we really want is for Node to run the event loop to completion and exit naturally, but right now it seems it can't due to hanging Workers.

@juj
Collaborator

juj commented Nov 18, 2020

The behavior is likely caused by this line:

noExitRuntime = true;

If you remove that line, it should also cause the code to exit, even if you don't specify -s EXIT_RUNTIME=1. Does that also do a forcible process.exit()?

You can also try running the PThread.terminateAllThreads() JS function if you think the issue is caused by hanging Workers. That will tear down all Workers.
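
For illustration, calling that from Node might look roughly like this (a sketch only; createModule is the assumed -sMODULARIZE factory name, and depending on the Emscripten version PThread may need to be explicitly exported onto the Module object):

// Assumes the build was linked with -sMODULARIZE.
const createModule = require('./temp.js');

createModule().then((Module) => {
  // ... call into the module as needed ...
  // Tear down all pthread Workers so Node's event loop can drain and exit.
  Module.PThread.terminateAllThreads();
});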

@RReverser
Collaborator Author

If you remove that line, it should also cause the code to exit, even if you don't specify -s EXIT_RUNTIME=1. Does that also do a forcible process.exit()?

Yeah, they're equivalent AFAIK:

exit(ret, /* implicit = */ true);

@juj
Collaborator

juj commented Nov 18, 2020

That should be the only difference with respect to process teardown in single-threaded and multithreaded builds. If in that mode there is a forcible shutdown with process.exit() happening, then it should also be happening in non-multithreaded builds. I'm not sure where such a forcible shutdown is happening, though Emscripten at least does not emit a call to the process.exit() API into the generated build.

@RReverser
Collaborator Author

RReverser commented Nov 18, 2020

I actually now wonder if this is "expected behaviour"... if the issue is caused by hanging Workers, then there are only two choices:

  1. Emscripten tears them down at the end of main. Node.js exits gracefully when the JS is used as a process, but if a user imports this JS as a module and tries to call other functions, they can no longer [easily] create threads because the Worker pool is empty.
  2. Emscripten doesn't tear them down, the Workers remain hanging and can be used by further calls into the module, but when used as a process, Node.js can't shut down - this is what happens now.

I do think it's possible to get a middle ground by looking into that .unref() - Node.js should be smart enough to figure out that, when the main JS has finished execution, Workers can no longer receive messages and don't need to wait anymore. Right now it seems like they hang around in case something sends them a message and they need to do more work.

@RReverser
Collaborator Author

So yeah, I can reproduce by only loading the .worker.js part from a custom JS instead of Emscripten glue:

const {Worker} = require('worker_threads');
let w = new Worker('./temp.worker.js');

This also hangs forever, probably because Worker still waits for messages.

@RReverser
Collaborator Author

RReverser commented Nov 18, 2020

@addaleax I admit I don't fully understand the .ref() / .unref() mechanics of worker_threads, maybe you can help?

Basically, the question is - is there a way to make sure that Workers are torn down once the Worker object is GC'd?

Right now it looks like there's a reference cycle keeping both the main thread and the Worker alive.

@addaleax

Basically, the question is - is there a way to make sure that Workers are torn down once the Worker object is GC'd?

No, that never happens. Worker objects cannot be GC'ed before they are terminated, because they are GC roots: they receive events from the event loop.

I admit I don't fully understand the .ref() / .unref() mechanics of worker_threads, maybe you can help?

If you call .unref() on a Worker, it will not keep the event loop alive on its own, i.e. this works the same as .ref()/.unref() on network sockets, timers, etc.

This also hangs forever, probably because Worker still waits for messages.

That sounds very likely, yes, but I wouldn’t know what we could do about this on the Node.js side.
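
As an aside, a tiny illustrative sketch of the .ref()/.unref() semantics described above, using a timer for comparison:

// ref-unref-sketch.js - illustrative only.
const timer = setInterval(() => console.log('tick'), 1000);

// A referenced handle keeps the event loop alive; unref'ing it means the
// process may exit even though the handle is still active.
timer.unref();

// Calling timer.ref() again would restore the default behaviour.
console.log('process exits immediately despite the pending interval');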

@RReverser
Collaborator Author

Worker objects cannot be GC’ed before they are terminated because they are GC roots because they receive events from the event loop.

Huh. But that's not how they work in browsers? Can't Workers be tied to the parentPort instead, so that when that side is collected, the Worker stops waiting for messages too?

@addaleax

@RReverser What makes you think that browsers behave differently? I think the behavior you want would require cross-thread heap reference tracking, which doesn’t exist.

Can't Workers be tied to the parentPort instead so that, when that side is collected, Worker stops waiting for messages too?

parentPort also can't be GC'ed because it's also on the event loop: the parent thread might send a message at any point.

@RReverser
Collaborator Author

What makes you think that browsers behave differently?

I think I saw it somewhere, but I'll need to dig to find the reference.

the parent thread might send a message at any point

Right, but if the parent thread has exited (which it can, e.g. in my last example without an onmessage handler on the main thread), surely the Worker can be notified about that and get unreferenced?

Or, more generally, when the Worker object is collected, it could send a notification to the actual worker thread to stop listening?

@addaleax

@RReverser If the parent thread exits, that also forcibly stops all child threads, there’s no unreferencing possible or necessary.

Or, more generally, when Worker object is collected, it can send a notification to the actual worker thread to stop listening?

Just to reiterate, Worker objects do not get collected until they are actually stopped because they can generate events at any point, just like network sockets and other event loop objects.

@RReverser
Collaborator Author

If the parent thread exits, that also forcibly stops all child threads, there’s no unreferencing possible or necessary.

Hmm, then why is Node still hanging in the example above?

@addaleax

@RReverser Because the parent doesn’t exit, because the Worker is still there and can emit events.

@RReverser
Collaborator Author

But the parent is not subscribed to Worker events in this case? To clarify, I'm referring to:

So yeah, I can reproduce by only loading the .worker.js part from a custom JS instead of Emscripten glue:

const {Worker} = require('worker_threads');
let w = new Worker('./temp.worker.js');

This also hangs forever, probably because Worker still waits for messages.

@addaleax

Ahh, I see. I guess that’s a fair point then … in Node.js, what currently happens when there’s an uncaught exception inside a Worker is that an 'error' event is emitted, which is always visible behavior in the parent thread, so we’re always subscribed to that by default.

Would it be acceptable to explicitly use .unref() here?

@RReverser
Collaborator Author

Would it be acceptable to explicitly use .unref() here?

I don't know... I mean, it's certainly possible in some simple cases like above to try and find places where explicit .unref() is necessary, but it feels like manual memory management and kinda un-JavaScript-y.

In cases like Emscripten's it's a lot harder, because there is an actual reference cycle (main thread and a Worker waiting for each other) which traditionally can only be fixed with a GC (or hard termination).

@stale

stale bot commented Apr 17, 2022

This issue has been automatically marked as stale because there has been no activity in the past year. It will be closed automatically if no further activity occurs in the next 30 days. Feel free to re-open at any time if this issue is still relevant.

@stale stale bot added the wontfix label Apr 17, 2022
@wheresthecode

I'm pretty new to Emscripten and JavaScript altogether, but I also ran into this problem. I am compiling a library with no main function, and then I have a JavaScript file with some Jest tests in it. Once all my tests have run and I am certain all my threads have executed, I call PThread.terminateAllThreads, which allows Node to exit. The code looks like this:
// afterAll is a Jest function that is called after all tests have run
afterAll(() => {
createModule().then((m) => m.PThread.terminateAllThreads());
});

I'm not sure about the safety of doing this, but it has worked for me so far.

@stale stale bot removed the wontfix label Oct 15, 2022
@sbc100
Collaborator

sbc100 commented Oct 17, 2022

Does adding -sEXIT_RUNTIME to the command line also work for you @wheresthecode ?

@RReverser
Collaborator Author

I'm working on yet another project where this bites me and also had to apply a Mocha workaround like @wheresthecode's. @sbc100 Unfortunately, in a library case like @wheresthecode's, EXIT_RUNTIME doesn't help, because rather than using a main function, we want to stop the pthreads when the library is gone.

However, I no longer think Emscripten can fix this on its side, since it has no way of knowing whether the library will be called into again, and, thus, whether it will need to use threads from the pthread pool again :(

@sbc100
Collaborator

sbc100 commented Oct 27, 2022

I'm working on yet another project where this bites me and also had to apply a Mocha workaround like @wheresthecode's. @sbc100 Unfortunately, in a library case like @wheresthecode's, EXIT_RUNTIME doesn't help, because rather than using a main function, we want to stop the pthreads when the library is gone.

In that case perhaps the library could require exit() to be called explicitly, at which point all the threads would be killed. The normal C exit function could be exported for this purpose.
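
For example, a hedged sketch of that approach (the file and function names are hypothetical, the exact flags may vary by Emscripten version, and -sEXIT_RUNTIME=1 may also be needed so exit() actually runs the runtime teardown):

// Link step (assumed): export the C exit() so JS can call it explicitly, e.g.
//   emcc lib.cc -o lib.js -pthread -sPTHREAD_POOL_SIZE=4 -sMODULARIZE -sEXPORTED_FUNCTIONS=_exit,_my_library_entry
const createModule = require('./lib.js');

createModule().then((Module) => {
  // ... use the library ...
  // Explicit shutdown: exit() tears the runtime down, killing the pthread
  // Workers so Node can exit.
  Module._exit(0);
});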

However, I no longer think Emscripten can fix this on its side, since it has no way of knowing whether the library will be called into again, and, thus, whether it will need to use threads from the pthread pool again :(

@RReverser
Collaborator Author

However, I no longer think Emscripten can fix this on its side

I think I might have been convinced otherwise, and this is actually something that can be fixed on the Emscripten side; I'm not 100% sure yet, but I'm going to experiment over the next couple of days.

@kleisauke
Collaborator

While working on PR #18201, I noticed that using Wasm Workers (the -sWASM_WORKERS compile/link flag) didn't have this issue.

Somehow (perhaps due to the use of Atomics.waitAsync()?), it can also spawn two workers on main() without causing a deadlock (AFAIK, -pthread needs to link with either -sPTHREAD_POOL_SIZE=2 or -sPROXY_TO_PTHREAD for that).

worker[0] = emscripten_malloc_wasm_worker(/*stack size: */1024);
worker[1] = emscripten_malloc_wasm_worker(/*stack size: */1024);
emscripten_wasm_worker_post_function_v(worker[0], (void (*)(void))thread_main);
emscripten_wasm_worker_post_function_v(worker[1], (void (*)(void))thread_main);
// Terminate both workers after a small delay
emscripten_set_timeout(terminate_worker, 1000, 0);

So, an alternative route is to implement the pthread API on top of Wasm Workers as discussed in #12833 (comment).

@RReverser
Collaborator Author

RReverser commented Nov 17, 2022

Somehow (perhaps due to the use of Atomics.waitAsync()?), it can also spawn two workers on main() without causing a deadlock

Right, it doesn't deadlock because the API requirements are different from pthreads. Pthreads depend on being able to spawn threads synchronously and block on shared memory, so the Emscripten implementation has to make that possible too, but Wasm Workers don't have that compatibility restriction: it's a brand-new API and can require spawning to be async. #9910 would have a similar effect if implemented.

Anyway, I've been working on this and am going to send a PR fixing this issue for the most common use cases today, with some others either being fixed later or left to end users for now.

@kleisauke
Collaborator

Great, thanks for doing this!

It would indeed be great if issue #9910 is also addressed in the future. There's now a -sASYNCIFY=2 experimental mode which depends upon the https://github.com/WebAssembly/stack-switching proposal, so perhaps the Asyncify overhead would be minimal in the future. :)

@sbc100
Collaborator

sbc100 commented Nov 17, 2022

@brendandahl, another use case of -sASYNCIFY=2 here: the ability to start new threads/workers without returning to the event loop.

@RReverser
Collaborator Author

To be precise, #9910 is; this issue itself is not relevant to Asyncify.

RReverser added a commit that referenced this issue Nov 18, 2022
Fixes #12801 for the majority of cases.

This is a relatively simple change, but it took embarrassingly many attempts to get it in the right places for all the obscure tests to pass, to figure out which tests can make use of it instead of doing a manual exit, and, as a bonus, to debug some apparent differences in Node Worker GC behaviour between Windows and Linux.

I tried two approaches in parallel, a conservative one in this PR and one that brings Emscripten behaviour closer to native in a separate branch.

In the ideal scenario, I wanted to make Node.js apps behave just like native ones, where background threads themselves don't keep an app open, and instead the app lives as long as it explicitly blocks on `pthread_join` or other blocking APIs. However, that's a more disruptive change that still requires more work and testing, as some Emscripten use-cases implicitly depend on the app running despite not having any more foreground work to do - one notable example is `PROXY_TO_PTHREAD`, which spawns a detached thread but obviously wants the app to continue running. All those cases are fixable but, as said above, require more work, so I'm setting that aside for now.

Instead, in this PR I'm adding a .ref/.unref "dance" (h/t @d3lm for the original idea) that keeps the app alive as long as _any_ pthreads are running, whether joinable or detached, and whether you explicitly block on them or not. It works as follows:

- Upon creation, all pool workers are strongly referenced, as we need to wait for them to be properly loaded.
- Once a worker is loaded, it's marked as weakly referenced, as we don't want idle workers to prevent the app from exiting.
- Once a worker is associated with, and starts running, a pthread, it's marked as strongly referenced so that the app stays alive as long as it's doing some work.
- Once a worker is done and returned to the idle worker pool, it's weakly referenced again.

This ensures maximum compatibility while fixing the majority of common cases.

One use case it doesn't fix is when a C/C++ app itself has an internal singleton threadpool (like one created by glib) - in this case there's no way for Emscripten to know that those "running" threads are actually semantically idle. This would be fixed by the more rigorous alternative implementation mentioned above but, for now, such use cases can be relatively easily worked around with a bit of custom `--pre-js` that goes over all `PThread.runningWorkers` and marks them as `.unref()`'d. That's what I did in an app I'm currently working on, and it works pretty well. To avoid reaching into JS internals, we might consider adding an `emscripten_`-prefixed API to allow referencing/unreferencing a Worker via a `pthread_t` instance from C code, but for now I'm leaving it out of scope of this PR.
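
For reference, that `--pre-js` workaround might look roughly like this (a sketch only: `unrefPthreadWorkers` is a hypothetical helper name, and `PThread.runningWorkers` is an internal detail, not a stable public API):

// pre.js, passed via `emcc ... --pre-js pre.js`.
// Call this after the app has spawned its internal thread pool, so that
// idle-but-"running" Workers stop keeping the Node.js event loop alive.
Module['unrefPthreadWorkers'] = function () {
  PThread.runningWorkers.forEach((worker) => worker.unref());
};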