Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Node+pthreads+wasm EH surprisingly slow #15727

Closed
kripken opened this issue Dec 7, 2021 · 7 comments
Closed

Node+pthreads+wasm EH surprisingly slow #15727

kripken opened this issue Dec 7, 2021 · 7 comments

Comments

@kripken
Copy link
Member

kripken commented Dec 7, 2021

See

WebAssembly/binaryen#4334 (comment)

That builds wasm-opt with EH and pthreads, then optimizes a large file. Despite using multiple cores the wasm version running in node is over 10x slower.

Node profiler shows this:

 [C++]:
   ticks  total  nonlib   name
  31085   28.2%   28.4%  __lll_lock_wait
  14723   13.4%   13.4%  __pthread_mutex_unlock_usercnt
   5617    5.1%    5.1%  __pthread_cond_wait
   5279    4.8%    4.8%  __pthread_cond_timedwait

That's a lot of time spent in pthreads helper code for locking. I wonder if it's related to wasm atomics being always sequentially consistent or something like that? Just a random guess though.

@kleisauke
Copy link
Collaborator

Interesting. I wonder if building with

-sPTHREAD_POOL_SIZE="_emscripten_num_logical_cores()"
-sDEFAULT_LIBRARY_FUNCS_TO_INCLUDE=emscripten_num_logical_cores

could help here (perhaps in favor of -sPROXY_TO_PTHREAD(?)).

Regarding locking, I also wonder if using atomic builtins might help on Node.js (see e.g. commit 981ba25). Although, that won't work on the web (that's why I dropped that commit for now).

@kripken
Copy link
Member Author

kripken commented Dec 9, 2021

Hmm, using builtins is a good idea - maybe we call these a lot and the overhead of going to JS is hitting us. But yeah, limitations on the web make it hard to do. What do you think about opening a PR though with your builtin work that keeps it behind a flag, for experimentation?

About PTHREAD_POOL_SIZE, I'll try that, maybe it will turn up something surprising.

@kleisauke
Copy link
Collaborator

I just opened draft PR #15740 to experiment with the use of atomic builtins on non-web environments.

@svenpilz
Copy link

svenpilz commented Dec 17, 2021

Could it be possible that they are similar issues with Chrome? We have typical "parallel for loop"-kind of computations and noticed that Chrome does not really benefit from multiple threads, whereas Firefox performs as expected (compared to a native build). Meaning the same binary runs much faster on the same machine in Firefox compared to Chrome.

Edit: At first we thought this may be a ALLOW_MEMORY_GROWTH issue, but it seems there is not much difference if any.

@kripken
Copy link
Member Author

kripken commented Jan 11, 2022

Data here shows that malloc/free become 2-3x slower when using atomics compared to without:

#15914 (comment)

Not a big enough difference to explain this issue, but that might be something worth looking at too. Perhaps relaxed atomics could be used someday.

@kripken
Copy link
Member Author

kripken commented Feb 22, 2024

Good news, it looks like most of this slowdown is fixed by mimalloc.

Building a wasm EH+pthreads build now, I get the following numbers (running -O3 on poppler.wasm):

time (seconds)
Native (x86_64) 2.1
Emscripten/mimalloc 4.3
Emscripten/dlmalloc 20.6

So with all modern features + mimalloc we are around 2x slower than a native build, and both builds seem to use all CPU cores properly in my OS CPU monitor. In comparison, dlmalloc is still 10x slower as in the first measurement in this issue.

Closing as mimalloc basically resolved this.

@kripken kripken closed this as completed Feb 22, 2024
@svenpilz
Copy link

We also decided to switch to mimalloc today, interesting coincidence :). Thank you for integrating it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants