Prevent idle Workers from keeping Node.js app alive #18227
I haven't had a chance to look at the code but thank you for working on this! I'm excited to get it fixed. One question that popped into my mind: Would simply removing the thread pooling on node also fix the issue? Do we need thread pooling on node? If our node thread implementation is mostly for testing, perhaps we don't need to care about the startup code of new threads and we can create a new worker each time? I guess the downside of doing that is that we have less parity with the browser tests, so we might catch fewer bugs in node tests? Also, for the case of glib-based apps that use thread pools, wouldn't the easiest thing for them be to build with |
Nice work! Surprisingly simple fix in the end
For apps - yes - but as mentioned in the original issue, the biggest problem is porting libraries, where you can't just exit the runtime because there is no "main" function and, therefore, no "end" point of execution, but rather a bunch of exports that the user might call at any time.
Well, yes, because they exhibit the same issue as in browsers when you block the event loop but a Worker is not created yet. E.g. if I run my example code in Node.js without worker pool, I'll see the typical
and the app will deadlock.
I'm not sure what you mean by saying "mostly for testing". There are pretty real use cases for Wasm in the Node.js environment, including for multithreaded Wasm. |
I guess I meant to ask that as a question. I'm not aware of any folks using emscripten-built modules under node in production, but that might simply be because they don't tend to file bugs here. I would love to support this use case, I just don't know how common it is today. Do you have some specific examples? |
I guess there are two reasons for having the worker pool:
It is normally pretty obvious when (1) is the issue, but it's less clear when (2) is the issue. Assuming we some day fix (1) in some other way (e.g. via If we remove reason (1) do we still want to do pooling for reason (2)? I would guess the answer might be different for node vs browser but I don't know. The other downside to never removing workers is that applications that don't use threads except for certain tasks will have those resources locked up for the lifetime of the application (i.e. the number of OS threads can go up, but never come down). |
The Squoosh Node.js use case would definitely be one, and there were a couple of others I encountered over time. The one I'm working with right now - StackBlitz - might be a bit unusual yet pretty popular. It provides a full Node.js environment in the browser, so you can use arbitrary Node.js APIs, but you don't have access to browser APIs and you can't run native code, so that's where the Wasm Node.js target steps in and fills the gap.
I saw that expressed in some issue before, but I'm sceptical it would be very different tbh. In both cases the cost is not negligible, because whether Node.js or browser, they both need to first load JS from external source, create new context, evaluate & potentially JIT compile the JS code etc. Sure, in browser if the JS is not cached (first visit), you might need to do the more expensive HTTP call too, but besides that they both need to do the ~same amount of extra work on top of the native But, without having an alternative and doing measurements it's all just guesswork. Once we do have a working alternative and it proves fast enough, I'm as happy to get rid of the pthread pool as you are - it caused way too many problems over time :) |
This is the workaround I mentioned in emscripten-core/emscripten#18227. Since we know all the threads in the app are part of the thread pool, they can just be weakly referenced, so that the existence of the Worker alone doesn't prevent Node.js from exiting, and instead it's the blocking that waits for results of specific ops that keeps the event loop alive. This allows getting rid of the non-JS-esque shutdown helper.
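To illustrate what "weakly referenced" means on the Node.js side, here is a minimal sketch using the standard worker_threads API. The helper name spawnIdleWorker and the worker body are assumptions made for illustration only; they are not taken from the app or from Emscripten.

```javascript
const { Worker } = require('node:worker_threads');

// Hypothetical helper (not an Emscripten API): spawn a pool worker that
// does not, by its mere existence, keep the Node.js process alive.
function spawnIdleWorker(source) {
  const worker = new Worker(source, { eval: true });
  // unref() tells Node.js not to count this worker as pending work.
  worker.unref();
  return worker;
}

// The worker body sets up an interval, so it would never exit on its own.
const idleWorker = spawnIdleWorker('setInterval(() => {}, 1000);');
```

Without the unref() call, a script like this would be held open by the idle worker; with it, the process can exit once the main thread has nothing left to do.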
Nice!
Why did |
Ok at least it's failing on |
I think(?) it's caused by commit llvm/llvm-project@e1b88c8 which was auto-rolled with https://chromium.googlesource.com/emscripten-releases/+/4e2ffe94b04dbadfbca1687ab458d306b3414d13. |
... #18231 :) |
Good job @RReverser, and thanks for the shoutout here. Really glad that we can fix this in Emscripten directly 🙌 |
LGTM 👏
This fixes a couple more tests.
dabe2e8 to 07fd976
Probably worth a highlight, but forgot to add it in the original PR.
Fixes #12801 for the majority of cases.
This is a relatively simple change, but it took embarrassingly many attempts to get it in the right places for all obscure tests to pass + to figure out which tests can make use of it instead of doing manual exit + to debug some apparent differences in Node Worker GC behaviour between Windows/Linux as a bonus.
I tried two approaches in parallel, a conservative one in this PR and one that brings Emscripten behaviour closer to native in a separate branch.
In an ideal scenario, I wanted to make Node.js apps behave just like native, where background threads themselves don't keep an app open, and instead the app lives as long as it explicitly blocks on pthread_join or other blocking APIs. However, it's a more disruptive change that still requires more work and testing, as some Emscripten use cases implicitly depend on the app running despite not having any more foreground work to do. One notable example is PROXY_TO_PTHREAD, which spawns a detached thread but obviously wants the app to continue running. All those cases are fixable, but, as said above, they require more work, so I'm keeping it aside for now.
Instead, in this PR I'm adding a .ref/.unref "dance" (h/t @d3lm for the original idea) that keeps the app alive as long as any pthreads are running, whether joinable or detached, and whether you have explicit blocking on them or not. It works as follows:
This ensures maximum compatibility, while fixing the majority of common cases.
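The exact steps of the PR are not reproduced here, so the following is only a rough sketch of what a .ref/.unref "dance" could look like, based on the stated goal of keeping the app alive while any pthread is running. The class WeakWorkerPool and its methods are hypothetical and are not Emscripten's internals; the only real API assumed is that each pooled worker exposes ref() and unref(), as Node.js worker_threads Workers do.

```javascript
// Hypothetical pool, for illustration only.
class WeakWorkerPool {
  constructor(workers) {
    this.idle = [...workers];
    this.running = new Set();
    // Idle workers are weakly referenced so they cannot hold the process open.
    for (const worker of this.idle) worker.unref();
  }

  // Called when a pthread starts running on a pooled worker.
  acquire() {
    const worker = this.idle.pop();
    if (!worker) throw new Error('pool exhausted');
    worker.ref(); // a running thread keeps the event loop alive
    this.running.add(worker);
    return worker;
  }

  // Called when the thread finishes and the worker goes back to the pool.
  release(worker) {
    if (!this.running.delete(worker)) return;
    worker.unref(); // idle again: stop holding the process open
    this.idle.push(worker);
  }
}
```

In this sketch the process stays alive exactly as long as at least one worker sits in the running set, regardless of whether the thread is joinable or detached.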
One use case it doesn't fix is when a C/C++ app itself has an internal singleton thread pool (like one created by glib). In this case there's no way for Emscripten to know that those "running" threads are actually semantically idle. This would be fixed by the more rigorous alternative implementation mentioned above, but, for now, such use cases can be worked around relatively easily with a bit of custom --pre-js that goes over all PThread.runningWorkers and marks them as .unref'd. That's what I did in an app I'm currently working on, and it works pretty well. To avoid reaching into JS internals, we might consider adding an emscripten_-prefixed API to allow referencing/unreferencing a Worker via a pthread_t instance from the C code, but for now I'm leaving it out of scope of this PR.
Let me know if you have any questions.
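The --pre-js workaround described above might look roughly like the sketch below. The function name unrefRunningWorkers is hypothetical, and its parameter stands in for Emscripten's internal PThread object; the only property assumed is the runningWorkers list named in the description. When to call it (for example, after the app's internal thread pool has been started) is also an assumption that depends on the app.

```javascript
// Hypothetical --pre-js helper: weakly reference every currently running worker.
// `pthreadState` stands in for Emscripten's internal PThread object.
function unrefRunningWorkers(pthreadState) {
  let count = 0;
  for (const worker of pthreadState.runningWorkers) {
    // Only Node.js worker_threads Workers have unref(); browser Workers do not.
    if (typeof worker.unref === 'function') {
      worker.unref();
      count++;
    }
  }
  return count; // number of workers that were unref'd
}
```

Because this reaches into JS internals, it can break if those internals change, which is the motivation for the emscripten_-prefixed API suggested above.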