Node+pthreads+wasm EH surprisingly slow #15727
Comments
Interesting. I wonder if building with [...] could help here (perhaps in favor of [...]).
Regarding locking, I also wonder if using atomic builtins might help on Node.js (see e.g. commit 981ba25). That won't work on the web, though, which is why I dropped that commit for now.
Hmm, using builtins is a good idea. Maybe we call these a lot and the overhead of going to JS is hitting us. But yeah, limitations on the web make it hard to do. What do you think about opening a PR with your builtin work, though, keeping it behind a flag for experimentation? About [...]
I just opened draft PR #15740 to experiment with the use of atomic builtins on non-web environments.
Could it be that there are similar issues with Chrome? We have typical "parallel for loop" computations and noticed that Chrome does not really benefit from multiple threads, whereas Firefox performs as expected (compared to a native build). That is, the same binary runs much faster in Firefox than in Chrome on the same machine. Edit: At first we thought this may be a [...]
Data here shows that malloc/free become 2-3x slower with atomics than without. That is not a big enough difference to explain this issue, but it might be worth looking at too. Perhaps relaxed atomics could be used someday.
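To make the "relaxed atomics" remark concrete, here is a minimal C11 sketch (not code from Emscripten or from this issue) that increments one shared counter from several threads, once with sequentially consistent ordering and once with relaxed ordering. The function names, thread count, and iteration count are hypothetical. As far as I understand, the current wasm threads feature only provides sequentially consistent atomics, so both variants would likely compile to the same wasm instruction; the difference should only show up in a native build.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define NUM_THREADS 4                  /* hypothetical, chosen for illustration */
#define INCREMENTS_PER_THREAD 100000L  /* hypothetical, chosen for illustration */

static atomic_long shared_counter;

/* Each increment is a sequentially consistent read-modify-write. */
static void *bump_seq_cst(void *arg) {
    (void)arg;
    for (long i = 0; i < INCREMENTS_PER_THREAD; i++) {
        atomic_fetch_add_explicit(&shared_counter, 1, memory_order_seq_cst);
    }
    return NULL;
}

/* Same increment, but only atomicity is requested, not ordering. */
static void *bump_relaxed(void *arg) {
    (void)arg;
    for (long i = 0; i < INCREMENTS_PER_THREAD; i++) {
        atomic_fetch_add_explicit(&shared_counter, 1, memory_order_relaxed);
    }
    return NULL;
}

/* Reset the counter, run the worker on NUM_THREADS threads, return the total. */
static long run_workers(void *(*worker)(void *)) {
    pthread_t threads[NUM_THREADS];
    atomic_store(&shared_counter, 0);
    for (int i = 0; i < NUM_THREADS; i++) {
        int rc = pthread_create(&threads[i], NULL, worker, NULL);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < NUM_THREADS; i++) {
        pthread_join(threads[i], NULL);
    }
    return atomic_load(&shared_counter);
}

long count_with_seq_cst(void) { return run_workers(bump_seq_cst); }
long count_with_relaxed(void) { return run_workers(bump_relaxed); }
```

Both functions return the same total, because relaxed ordering still guarantees that each increment is atomic. Timing the two calls separately in a native build gives a rough idea of how much the stronger ordering costs on a given machine.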
Good news, it looks like most of this slowdown is fixed by mimalloc. Building a wasm EH+pthreads build now, I get the following numbers (running [...]
So with all modern features + mimalloc we are around 2x slower than a native build, and both builds seem to use all CPU cores properly in my OS CPU monitor. In comparison, dlmalloc is still 10x slower, as in the first measurement in this issue. Closing as mimalloc basically resolved this.
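For anyone who wants to check allocator contention in their own build, the following is a rough sketch of a stress test, not the wasm-opt workload measured above. It has several threads call malloc/free in a tight loop, which is where an allocator behind a single lock would be expected to suffer. `run_alloc_stress`, `alloc_job`, and all parameters are hypothetical names made up for this example.

```c
#include <pthread.h>
#include <stdlib.h>

/* Per-thread work description (hypothetical helper type). */
typedef struct {
    size_t iterations;  /* malloc/free pairs to perform */
    size_t block_size;  /* bytes per allocation */
    size_t completed;   /* pairs actually completed */
} alloc_job;

static void *alloc_worker(void *arg) {
    alloc_job *job = arg;
    for (size_t i = 0; i < job->iterations; i++) {
        unsigned char *p = malloc(job->block_size);
        if (p == NULL) {
            break;
        }
        /* Volatile write so the compiler is less likely to elide the pair. */
        ((volatile unsigned char *)p)[0] = 1;
        free(p);
        job->completed++;
    }
    return NULL;
}

/* Runs num_threads workers and returns the total number of completed
   malloc/free pairs, or 0 if the bookkeeping arrays cannot be allocated. */
size_t run_alloc_stress(size_t num_threads, size_t iterations, size_t block_size) {
    pthread_t *threads = malloc(num_threads * sizeof *threads);
    alloc_job *jobs = malloc(num_threads * sizeof *jobs);
    size_t started = 0;
    size_t total = 0;

    if (threads == NULL || jobs == NULL) {
        free(threads);
        free(jobs);
        return 0;
    }
    for (size_t i = 0; i < num_threads; i++) {
        jobs[i].iterations = iterations;
        jobs[i].block_size = block_size;
        jobs[i].completed = 0;
        if (pthread_create(&threads[i], NULL, alloc_worker, &jobs[i]) != 0) {
            break;  /* only join the threads that actually started */
        }
        started++;
    }
    for (size_t i = 0; i < started; i++) {
        pthread_join(threads[i], NULL);
        total += jobs[i].completed;
    }
    free(threads);
    free(jobs);
    return total;
}
```

Calling this from a `main` and timing it, once built with `-sMALLOC=dlmalloc` and once with `-sMALLOC=mimalloc` (both alongside `-pthread`), should give a rough comparison of the two allocators under contention. An Emscripten build will probably also need a thread pool or proxying option such as `-sPROXY_TO_PTHREAD` so that the worker threads can start before the main thread blocks in `pthread_join`.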
We also decided to switch to [...]
See WebAssembly/binaryen#4334 (comment)
That builds wasm-opt with EH and pthreads, then optimizes a large file. Despite using multiple cores, the wasm version running in node is over 10x slower.
Node profiler shows this:
That's a lot of time spent in pthreads helper code for locking. I wonder if it's related to wasm atomics being always sequentially consistent or something like that? Just a random guess though.