wasm-only memset #5245
Conversation
…mple to add both --passname and -O2 etc. also fix a bug in debug mode when in the test runner, we need to be careful about copying those debug files
…ns in memset when in wasm-only mode, for a nearly 2x speedup on big memsets
src/library.js (outdated)

```
@@ -824,8 +824,12 @@ LibraryManager.library = {
#if SIMD
196608
#else
#if WASM_ONLY
```
This pattern seems to come up a fair bit. Might be worth extending the preprocessor to handle something like

```
#if SIMD
...
#elif WASM_ONLY
...
#endif
```

Wouldn't belong in this PR, but might be worth doing.
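For illustration, here is a toy sketch of what such an `#elif`-aware preprocessor could look like. This is not Emscripten's actual preprocessor; the `preprocess` function and its defines map are made up for this example, and it only handles bare `#if NAME` conditions.

```javascript
// Toy #if / #elif / #else / #endif preprocessor (illustrative only).
// `defines` maps names to truthy/falsy values; conditions are bare names.
function preprocess(text, defines) {
  const out = [];
  // Each stack frame tracks whether some branch at this level was already
  // taken, and whether the current branch is being emitted.
  const stack = [];
  const emitting = () => stack.every(f => f.emitting);
  for (const line of text.split('\n')) {
    const t = line.trim();
    let m;
    if ((m = t.match(/^#if\s+(\w+)$/))) {
      const cond = !!defines[m[1]];
      stack.push({ taken: cond, emitting: cond });
    } else if ((m = t.match(/^#elif\s+(\w+)$/))) {
      const top = stack[stack.length - 1];
      top.emitting = !top.taken && !!defines[m[1]];
      top.taken = top.taken || top.emitting;
    } else if (t === '#else') {
      const top = stack[stack.length - 1];
      top.emitting = !top.taken;
      top.taken = true;
    } else if (t === '#endif') {
      stack.pop();
    } else if (emitting()) {
      out.push(line);
    }
  }
  return out.join('\n');
}
```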
emcc.py (outdated)

```
if shared.Settings.BINARYEN:
  # determine wasm-only mode, now that all settings are settled on
  shared.Settings.WASM_ONLY = shared.Building.is_wasm_only()
```
Wait, so we add a `WASM_ONLY` flag that we assert gets set when we unset `LEGALIZE_JS_FFI`, but then we clobber it with our autodetected `shared.Building.is_wasm_only()` regardless?

We've promoted `WASM_ONLY` to a Setting so it's available to the js compiler, not because it's something users should be setting. Because of that, I think we should move this to before the assert, and just always have it autodetected. Maybe add a note to settings.js saying we ignore this if you try to set it.
emcc.py order was a little wrong, thanks, fixed.
And clarified in settings.js which settings are internal use only.
This seems like a win for memcpy and memset; however, the 128-byte memset/memcpy benchmarks show us slightly slower than before. Not sure why.
lgtm
```
// In the unaligned copy case, unroll a bit as well.
aligned_dest_end = (dest_end - 4)|0;
while ((dest|0) < (aligned_dest_end|0)) {
  store4(dest, load4(src, 1), 1); // unaligned
```
Hmm, why is this a `store4` and a `load4`? Is that alright, since this is doing one-byte loads and stores?
The `, 1` tells wasm it is alignment 1. So this should be at least as efficient as four 1-byte stores; if it isn't, the engine can break it up itself, since it has all the info to do so.
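A rough JS analogy: `DataView` allows 4-byte accesses at any offset, much like a wasm load/store annotated with alignment 1, which the engine may execute directly or lower to byte accesses itself. A minimal sketch (buffer layout here is made up for illustration):

```javascript
// Unaligned 4-byte copy via DataView, analogous to
// store4(dest, load4(src, 1), 1): one 4-byte access at an unaligned offset.
const buf = new ArrayBuffer(16);
const bytes = new Uint8Array(buf);
const view = new DataView(buf);
bytes.set([1, 2, 3, 4], 3);                        // "source" at unaligned offset 3
view.setUint32(9, view.getUint32(3, true), true);  // 4-byte copy to unaligned offset 9
```

Reading and writing little-endian through `DataView` preserves the byte order, so the four bytes arrive at offsets 9..12 unchanged.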
Ah right, that makes sense. Does this currently emit unaligned loads and stores on wasm, or does it break them up into 1-byte writes in binaryen?
asm2wasm will turn it into a single 4-byte load/store, marked with alignment 1.
```
#if WASM_ONLY
ret = dest|0;
dest_end = (dest + num)|0;
if ((dest&7) == (src&7)) {
```
In wasm-only mode, this raises an interesting question: if src or dest are unaligned, is it faster to do unaligned 8-byte copies, or should one resort to aligned 1-byte copies? On x86, unaligned 8-byte copies will definitely still be faster than 1-byte copies, since x86 practically doesn't care (IIRC it was a 1-cycle penalty on either loads or stores on Intel, and a 1-cycle penalty on loads and stores on AMD). However, on ARM this might be a different story. Anyone have a super efficient ARM memcpy at hand?
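The `(dest&7) == (src&7)` check in the diff enables the fast path only when both pointers share the same residue mod 8, so a single byte prologue aligns them both at once. A plain-JS sketch of that structure (the `memcpyChunked` helper is hypothetical, not the PR's asm.js; `BigUint64Array` stands in for i64 loads/stores):

```javascript
// Sketch of the same-residue memcpy fast path: byte prologue to an 8-byte
// boundary, 8-byte main loop, byte tail. Assumes heap is a Uint8Array.
function memcpyChunked(heap, dest, src, num) {
  const ret = dest;
  const end = dest + num;
  if ((dest & 7) === (src & 7)) {
    // Same residue mod 8: aligning dest also aligns src.
    while ((dest & 7) && dest < end) heap[dest++] = heap[src++];
    const u64 = new BigUint64Array(heap.buffer);
    const alignedEnd = end & ~7;
    while (dest < alignedEnd) {
      u64[dest >> 3] = u64[src >> 3]; // 8 bytes per iteration
      dest += 8; src += 8;
    }
  }
  // Tail (and the mismatched-residue case) copies byte by byte.
  while (dest < end) heap[dest++] = heap[src++];
  return ret;
}
```

When the residues differ, this sketch falls all the way back to byte copies, which is exactly the trade-off being debated above: unaligned wide accesses versus aligned narrow ones.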
This is perfect! Btw, in Unreal Engine 4, memcpy is currently the single biggest hot wasm function in profiles, so I'm curious to see how much this improves things. lgtm; that one case of the unaligned tail looks a bit odd to me though, but I'm unsure if I interpreted it correctly.
(builds on #5244)
This passes WASM_ONLY into the js compiler, and uses it to emit i64 operations in memset when in wasm-only mode. It's basically a step between using i32s and SIMD in the main loop: 8 bytes per write instead of 4 or 16. This is almost 2x faster than using i32s (on very large memsets; 16K in the test suite).
cc @juj who benchmarked those things a lot.
If this makes sense to do, we should also do memcpy probably.
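The 8-bytes-per-write idea from the description can be sketched in plain JS. This `memsetChunked` helper is hypothetical (the PR's actual code is asm.js in library.js); `BigUint64Array` stands in for wasm i64 stores:

```javascript
// Sketch of an i64-style memset: align to 8 bytes, replicate the fill byte
// across 64 bits, then store 8 bytes per iteration. Assumes heap is a
// Uint8Array backed by an 8-byte-multiple buffer.
function memsetChunked(heap, dest, value, num) {
  const ret = dest;
  const end = dest + num;
  value &= 0xff;
  // Byte-wise prologue until dest is 8-aligned (or we run out).
  while ((dest & 7) && dest < end) heap[dest++] = value;
  // Main loop: the fill byte replicated into all 8 byte lanes.
  const u64 = new BigUint64Array(heap.buffer);
  const v64 = BigInt(value) * 0x0101010101010101n;
  const alignedEnd = end & ~7;
  while (dest < alignedEnd) { u64[dest >> 3] = v64; dest += 8; }
  // Byte-wise tail.
  while (dest < end) heap[dest++] = value;
  return ret;
}
```

The prologue/tail byte loops are what the 128-byte benchmark regression discussed above would stress most, since small sizes spend proportionally more time outside the wide main loop.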