Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JSPI asynchronous calls into WASM are much slower than Asyncify #21081

Open
rhashimoto opened this issue Jan 15, 2024 · 6 comments
Open

JSPI asynchronous calls into WASM are much slower than Asyncify #21081

rhashimoto opened this issue Jan 15, 2024 · 6 comments

Comments

@rhashimoto
Copy link

I think this is more likely to be a Chromium bug and I have filed a bug there, but I don't understand the details well enough to know for sure.

Asynchronous calls from JavaScript into WASM are 2+ orders of magnitude slower with JSPI over Asyncify. I'm using Chrome 122.0.6248.0 (Official Build) canary (arm64) on an M2 Mac with chrome://flags/#enable-experimental-webassembly-stack-switching enabled.

I have a test page here (source) that times 10,000 calls to a C function that calls an asynchronous JavaScript function (use Chrome Canary as some older versions just crash the page). It loops over the calls once from C and once from Javascript, for both Asyncify and JavaScript builds. The output of the page looks like this:

asyncify 10000 iterations in C 0.01 seconds
asyncify 10000 iterations in JS 0.01 seconds
jspi 10000 iterations in C 0.003 seconds
jspi 10000 iterations in JS 3.554 seconds

This shows that with Asyncify, calling the JS function from C and calling it from JavaScript to C to JavaScript, is the same 0.01 seconds. JSPI is faster than Asyncify at calling the JS function from C, but is 350 times slower at calling from JS.

Version of emscripten/emsdk:
$ emcc -v
emcc (Emscripten gcc/clang-like replacement + linker emulating GNU ld) 3.1.47 (431685f)
clang version 18.0.0 (https://github.com/llvm/llvm-project 21030b9ab4487d845e29792063f5666d8c4b8e09)
Target: wasm32-unknown-emscripten
Thread model: posix
InstalledDir: /home/dev/emsdk/upstream/bin

Full link command and output with -v appended:
$ make (both Asyncify and JSPI builds shown here)
emcc -O2 -I. -s ALLOW_MEMORY_GROWTH=1 -s ASYNCIFY_EXPORTS='["cfunc", "cloop"]' -s ASYNCIFY_IMPORTS='["jfunc"]' -s ENVIRONMENT="web,worker" -s EXPORTED_RUNTIME_METHODS='["ccall", "cwrap"]' -s INVOKE_RUN -s WASM=1 -v -s ASYNCIFY -o cfunc-asyncify.mjs cfunc.c --js-library libfoo.js
"/home/dev/emsdk/upstream/bin/clang" --version
"/home/dev/emsdk/upstream/bin/clang" -target wasm32-unknown-emscripten -fignore-exceptions -fvisibility=default -mllvm -combiner-global-alias-analysis=false -mllvm -enable-emscripten-sjlj -mllvm -disable-lsr -DEMSCRIPTEN -Werror=implicit-function-declaration --sysroot=/var/tmp/emsdk/upstream/emscripten/cache/sysroot -Xclang -iwithsysroot/include/fakesdl -Xclang -iwithsysroot/include/compat -O2 -I. -v cfunc.c -c -o /tmp/emscripten_temp_8v67r6_8/cfunc_0.o
clang version 18.0.0 (https://github.com/llvm/llvm-project 21030b9ab4487d845e29792063f5666d8c4b8e09)
Target: wasm32-unknown-emscripten
Thread model: posix
InstalledDir: /home/dev/emsdk/upstream/bin
(in-process)
"/var/tmp/emsdk/upstream/bin/clang-18" -cc1 -triple wasm32-unknown-emscripten -emit-obj -disable-free -clear-ast-before-backend -disable-llvm-verifier -discard-value-names -main-file-name cfunc.c -mrelocation-model static -mframe-pointer=none -ffp-contract=on -fno-rounding-math -mconstructor-aliases -target-cpu generic -debugger-tuning=gdb -fdebug-compilation-dir=/home/dev/browser-test-cases/jspi -v -fcoverage-compilation-dir=/home/dev/browser-test-cases/jspi -resource-dir /var/tmp/emsdk/upstream/lib/clang/18 -D EMSCRIPTEN -I . -isysroot /var/tmp/emsdk/upstream/emscripten/cache/sysroot -internal-isystem /var/tmp/emsdk/upstream/lib/clang/18/include -internal-isystem /var/tmp/emsdk/upstream/emscripten/cache/sysroot/include/wasm32-emscripten -internal-isystem /var/tmp/emsdk/upstream/emscripten/cache/sysroot/include -O2 -Werror=implicit-function-declaration -ferror-limit 19 -fvisibility=default -fgnuc-version=4.2.1 -fignore-exceptions -fcolor-diagnostics -vectorize-loops -vectorize-slp -iwithsysroot/include/fakesdl -iwithsysroot/include/compat -mllvm -combiner-global-alias-analysis=false -mllvm -enable-emscripten-sjlj -mllvm -disable-lsr -o /tmp/emscripten_temp_8v67r6_8/cfunc_0.o -x c cfunc.c
clang -cc1 version 18.0.0 based upon LLVM 18.0.0git default target aarch64-unknown-linux-gnu
ignoring nonexistent directory "/var/tmp/emsdk/upstream/emscripten/cache/sysroot/include/wasm32-emscripten"
#include "..." search starts here:
#include <...> search starts here:
.
/var/tmp/emsdk/upstream/emscripten/cache/sysroot/include/fakesdl
/var/tmp/emsdk/upstream/emscripten/cache/sysroot/include/compat
/var/tmp/emsdk/upstream/lib/clang/18/include
/var/tmp/emsdk/upstream/emscripten/cache/sysroot/include
End of search list.
"/home/dev/emsdk/upstream/bin/wasm-ld" -o cfunc-asyncify.wasm /tmp/emscripten_temp_8v67r6_8/cfunc_0.o -L/var/tmp/emsdk/upstream/emscripten/cache/sysroot/lib/wasm32-emscripten -lGL -lal -lhtml5 -lstubs -lnoexit -lc -ldlmalloc -lcompiler_rt -lc++-noexcept -lc++abi-noexcept -lsockets -mllvm -combiner-global-alias-analysis=false -mllvm -enable-emscripten-sjlj -mllvm -disable-lsr /tmp/tmpnogg_6b0libemscripten_js_symbols.so --export-if-defined=main --export-if-defined=__start_em_asm --export-if-defined=__stop_em_asm --export-if-defined=__start_em_lib_deps --export-if-defined=__stop_em_lib_deps --export-if-defined=__start_em_js --export-if-defined=__stop_em_js --export-if-defined=__main_argc_argv --export=stackSave --export=stackRestore --export=stackAlloc --export=__errno_location --export=__get_temp_ret --export=__set_temp_ret --export=malloc --export=free --export=wasm_call_ctors --export-table -z stack-size=65536 --initial-memory=16777216 --max-memory=2147483648 --no-entry --table-base=1 --global-base=1024
"/home/dev/emsdk/upstream/bin/llvm-objcopy" cfunc-asyncify.wasm cfunc-asyncify.wasm --remove-section=.debug* --remove-section=producers
"/home/dev/emsdk/upstream/bin/wasm-emscripten-finalize" --pass-arg=legalize-js-interface-exported-helpers cfunc-asyncify.wasm -o cfunc-asyncify.wasm --detect-features
"/home/dev/emsdk/node/16.20.0_64bit/bin/node" /var/tmp/emsdk/upstream/emscripten/src/compiler.js /tmp/tmp_6pmyt7
.json
"/home/dev/emsdk/upstream/bin/wasm-opt" --post-emscripten -O2 --low-memory-unused --asyncify --pass-arg=asyncify-imports@env.jfunc,env.invoke
,env.asyncjs,.fd_sync,.emscripten_promise_await,.emscripten_idb_load,.emscripten_idb_store,.emscripten_idb_delete,.emscripten_idb_exists,.emscripten_idb_clear,.emscripten_idb_load_blob,.emscripten_idb_store_blob,.emscripten_sleep,.emscripten_wget_data,.emscripten_scan_registers,.emscripten_lazy_load_code,.load_secondary_module,.emscripten_fiber_swap,.SDL_Delay,.jfunc --zero-filled-memory --pass-arg=directize-initial-contents-immutable cfunc-asyncify.wasm -o cfunc-asyncify.wasm --mvp-features --enable-mutable-globals --enable-sign-ext
"/home/dev/emsdk/node/16.20.0_64bit/bin/node" /var/tmp/emsdk/upstream/emscripten/tools/acorn-optimizer.js /tmp/emscripten_temp_8v67r6_8/cfunc-asyncify.js JSDCE minifyWhitespace --exportES6 -o /tmp/emscripten_temp_8v67r6_8/cfunc-asyncify.jso1.js
emcc -O2 -I. -s ALLOW_MEMORY_GROWTH=1 -s ASYNCIFY_EXPORTS='["cfunc", "cloop"]' -s ASYNCIFY_IMPORTS='["jfunc"]' -s ENVIRONMENT="web,worker" -s EXPORTED_RUNTIME_METHODS='["ccall", "cwrap"]' -s INVOKE_RUN -s WASM=1 -v -s ASYNCIFY=2 -o cfunc-jspi.mjs cfunc.c --js-library libfoo.js
"/home/dev/emsdk/upstream/bin/clang" --version
"/home/dev/emsdk/upstream/bin/clang" -target wasm32-unknown-emscripten -fignore-exceptions -fvisibility=default -mllvm -combiner-global-alias-analysis=false -mllvm -enable-emscripten-sjlj -mllvm -disable-lsr -DEMSCRIPTEN -Werror=implicit-function-declaration --sysroot=/var/tmp/emsdk/upstream/emscripten/cache/sysroot -Xclang -iwithsysroot/include/fakesdl -Xclang -iwithsysroot/include/compat -O2 -I. -v cfunc.c -c -o /tmp/emscripten_temp_jls2o26n/cfunc_0.o
clang version 18.0.0 (https://github.com/llvm/llvm-project 21030b9ab4487d845e29792063f5666d8c4b8e09)
Target: wasm32-unknown-emscripten
Thread model: posix
InstalledDir: /home/dev/emsdk/upstream/bin
(in-process)
"/var/tmp/emsdk/upstream/bin/clang-18" -cc1 -triple wasm32-unknown-emscripten -emit-obj -disable-free -clear-ast-before-backend -disable-llvm-verifier -discard-value-names -main-file-name cfunc.c -mrelocation-model static -mframe-pointer=none -ffp-contract=on -fno-rounding-math -mconstructor-aliases -target-cpu generic -debugger-tuning=gdb -fdebug-compilation-dir=/home/dev/browser-test-cases/jspi -v -fcoverage-compilation-dir=/home/dev/browser-test-cases/jspi -resource-dir /var/tmp/emsdk/upstream/lib/clang/18 -D EMSCRIPTEN -I . -isysroot /var/tmp/emsdk/upstream/emscripten/cache/sysroot -internal-isystem /var/tmp/emsdk/upstream/lib/clang/18/include -internal-isystem /var/tmp/emsdk/upstream/emscripten/cache/sysroot/include/wasm32-emscripten -internal-isystem /var/tmp/emsdk/upstream/emscripten/cache/sysroot/include -O2 -Werror=implicit-function-declaration -ferror-limit 19 -fvisibility=default -fgnuc-version=4.2.1 -fignore-exceptions -fcolor-diagnostics -vectorize-loops -vectorize-slp -iwithsysroot/include/fakesdl -iwithsysroot/include/compat -mllvm -combiner-global-alias-analysis=false -mllvm -enable-emscripten-sjlj -mllvm -disable-lsr -o /tmp/emscripten_temp_jls2o26n/cfunc_0.o -x c cfunc.c
clang -cc1 version 18.0.0 based upon LLVM 18.0.0git default target aarch64-unknown-linux-gnu
ignoring nonexistent directory "/var/tmp/emsdk/upstream/emscripten/cache/sysroot/include/wasm32-emscripten"
#include "..." search starts here:
#include <...> search starts here:
.
/var/tmp/emsdk/upstream/emscripten/cache/sysroot/include/fakesdl
/var/tmp/emsdk/upstream/emscripten/cache/sysroot/include/compat
/var/tmp/emsdk/upstream/lib/clang/18/include
/var/tmp/emsdk/upstream/emscripten/cache/sysroot/include
End of search list.
emcc: warning: -sASYNCIFY=2 (JSPI) is still experimental [-Wexperimental]
"/home/dev/emsdk/upstream/bin/wasm-ld" -o cfunc-jspi.wasm /tmp/emscripten_temp_jls2o26n/cfunc_0.o -L/var/tmp/emsdk/upstream/emscripten/cache/sysroot/lib/wasm32-emscripten -lGL -lal -lhtml5 -lstubs -lnoexit -lc -ldlmalloc -lcompiler_rt -lc++-noexcept -lc++abi-noexcept -lsockets -mllvm -combiner-global-alias-analysis=false -mllvm -enable-emscripten-sjlj -mllvm -disable-lsr /tmp/tmpddcpx76vlibemscripten_js_symbols.so --export-if-defined=main --export-if-defined=__start_em_asm --export-if-defined=__stop_em_asm --export-if-defined=__start_em_lib_deps --export-if-defined=__stop_em_lib_deps --export-if-defined=__start_em_js --export-if-defined=__stop_em_js --export-if-defined=__main_argc_argv --export=stackSave --export=stackRestore --export=stackAlloc --export=__errno_location --export=__get_temp_ret --export=__set_temp_ret --export=malloc --export=free --export=__wasm_call_ctors --export-table -z stack-size=65536 --initial-memory=16777216 --max-memory=2147483648 --no-entry --table-base=1 --global-base=1024
"/home/dev/emsdk/upstream/bin/llvm-objcopy" cfunc-jspi.wasm cfunc-jspi.wasm --remove-section=.debug
--remove-section=producers
"/home/dev/emsdk/upstream/bin/wasm-emscripten-finalize" --dyncalls-i64 --pass-arg=legalize-js-interface-exported-helpers cfunc-jspi.wasm -o cfunc-jspi.wasm --detect-features
"/home/dev/emsdk/node/16.20.0_64bit/bin/node" /var/tmp/emsdk/upstream/emscripten/src/compiler.js /tmp/tmprhq8bvm5.json
"/home/dev/emsdk/upstream/bin/wasm-opt" --post-emscripten -O2 --low-memory-unused --jspi --pass-arg=jspi-imports@env.jfunc,env.invoke
,env.asyncjs,.fd_sync,.emscripten_promise_await,.emscripten_idb_load,.emscripten_idb_store,.emscripten_idb_delete,.emscripten_idb_exists,.emscripten_idb_clear,.emscripten_idb_load_blob,.emscripten_idb_store_blob,.emscripten_sleep,.emscripten_wget_data,.emscripten_scan_registers,.emscripten_lazy_load_code,._load_secondary_module,.emscripten_fiber_swap,.SDL_Delay,.jfunc --pass-arg=jspi-exports@cfunc,cloop,main,__main_argc_argv,_ZN10emscripten8internal5async --zero-filled-memory --pass-arg=directize-initial-contents-immutable cfunc-jspi.wasm -o cfunc-jspi.wasm --mvp-features --enable-mutable-globals --enable-sign-ext --enable-reference-types
"/home/dev/emsdk/node/16.20.0_64bit/bin/node" /var/tmp/emsdk/upstream/emscripten/tools/acorn-optimizer.js /tmp/emscripten_temp_jls2o26n/cfunc-jspi.js JSDCE minifyWhitespace --exportES6 -o /tmp/emscripten_temp_jls2o26n/cfunc-jspi.jso1.js

@kripken
Copy link
Member

kripken commented Jan 16, 2024

One thing to worry about here is on-stack replacement. If you call the function time() many times, do the results change? (Without OSR, the issue is that once inside a loop, the VM must do on-stack replacement to replace the baseline or interpreter code with fully-optimized code, and if that loop only happens once, there is no such chance and it runs slowly. That is the case in V8. Calling time() again would give a chance to run the optimized version.)

also cc @brendandahl

@rhashimoto
Copy link
Author

Thanks for your insight @kripken! Sorry, but I may not be understanding quite what you're suggesting.

My C code is here (it's very short). jfunc() is the asynchronous JavaScript function, cfunc() just calls jfunc(), and cloop() iterates cfunc(). I'm comparing a call to cloop() from JavaScript with iterating over calls to cfunc() from JavaScript.

I changed my code locally to replace the JavaScript loop timing with this:

  async function timeJStoCtoJS() {
    const start = performance.now();
    for (let i = 0; i < ITERATIONS; ++i) {
      await module.ccall('cfunc', 'number', ['number'], [2.0], { async: true });
    }
    const end = performance.now();
    const duration = Math.trunc(end - start) / 1000;
    postMessage(`${build} ${ITERATIONS} iterations in JS ${duration} seconds`);
  }

  await timeJStoCtoJS();
  await timeJStoCtoJS();
  await timeJStoCtoJS();
  await timeJStoCtoJS();
  await timeJStoCtoJS();

Does that accomplish what you are suggesting?

The results of this on my machine are:

asyncify 10000 iterations in C 0.01 seconds
asyncify 10000 iterations in JS 0.01 seconds
asyncify 10000 iterations in JS 0.006 seconds
asyncify 10000 iterations in JS 0.005 seconds
asyncify 10000 iterations in JS 0.005 seconds
asyncify 10000 iterations in JS 0.004 seconds
jspi 10000 iterations in C 0.004 seconds
jspi 10000 iterations in JS 12.56 seconds
jspi 10000 iterations in JS 12.542 seconds
jspi 10000 iterations in JS 13.139 seconds
jspi 10000 iterations in JS 12.551 seconds
jspi 10000 iterations in JS 12.57 seconds

It does look like the timings improve for Asyncify (that progression is pretty consistent over repeated runs), but are little changed for JSPI. I'm currently running 122.0.6251.0 (Official Build) canary (arm64) on a M2 Mac.

@kripken
Copy link
Member

kripken commented Jan 16, 2024

Thanks, yes, that's what I meant. Ok, I think that confirms that OSR is not a factor here, which is good to rule out.

@kripken
Copy link
Member

kripken commented Jan 16, 2024

@brendandahl do you know who would be good to cc on that chromium bug?

@brendandahl
Copy link
Collaborator

I'm guessing the performance difference is coming from the current cost of entering a JSPI'd module many times. IIRC a new stack is allocated each time. Performance within the module should have no overhead, but entering/exiting many times is currently expensive.

As for cc, I think Ilya or Thibaud would know more.

@kripken
Copy link
Member

kripken commented Jan 16, 2024

Interesting, thanks. Ok, I think we can cc them here actually: @thibaudmichaud @SPY - last comment may already answer the question, but please confirm if you can, thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants