
Add implementation of emscripten_memcpy_big based on bulk memory. #19128

Merged
merged 1 commit into main from bulk_memory on Apr 11, 2023

Conversation

sbc100
Collaborator

@sbc100 sbc100 commented Apr 3, 2023

These new functions live in `libbulkmemory`, which only gets included if bulk memory is enabled (either via `-mbulk-memory` directly or indirectly via `-pthread`).

benchmark results for benchmark.test_memcpy_1mb:

```
 v8:                       mean: 1.666
 v8-bulkmemory:            mean: 1.598
 v8-standalone-bulkmemory: mean: 1.576
 v8-standalone:            mean: 3.197
```

Here we can see that when bulk memory is enabled it is at least as fast as, if not faster than, the JS version.

v8-standalone doesn't have `emscripten_memcpy_big` at all and is much slower, as expected. By adding `-mbulk-memory` the standalone version becomes just as fast as the non-standalone one.
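For reference, the bulk-memory path can be sketched as below. This is an illustrative sketch only, not the actual `libbulkmemory` source; the function name and signature here are assumptions. The key point is that with `-mbulk-memory`, clang can lower `__builtin_memcpy` to a single wasm `memory.copy` instruction, so no call out to JS is needed for big copies.

```c
#include <stddef.h>

// Illustrative sketch (NOT the actual libbulkmemory source): with the
// bulk-memory feature enabled, the compiler lowers this builtin call
// to a single wasm `memory.copy` instruction, replacing the old
// JS-implemented emscripten_memcpy_big fallback.
void *emscripten_memcpy_big_sketch(void *restrict dest,
                                   const void *restrict src, size_t n) {
  return __builtin_memcpy(dest, src, n);
}
```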

@sbc100 sbc100 requested review from kripken and tlively April 3, 2023 23:31
@sbc100 sbc100 force-pushed the bulk_memory branch 2 times, most recently from b6ea0f2 to 3a26933 Compare April 3, 2023 23:35
@sbc100 sbc100 force-pushed the bulk_memory branch 5 times, most recently from 00f7e9b to 06d45a4 Compare April 4, 2023 00:45
@sbc100 sbc100 requested review from tlively and kripken April 4, 2023 01:09
@sbc100 sbc100 force-pushed the bulk_memory branch 2 times, most recently from b850366 to ddd7446 Compare April 4, 2023 05:05
@sbc100 sbc100 force-pushed the bulk_memory branch 2 times, most recently from bee195a to c8023e8 Compare April 10, 2023 16:02
@sbc100 sbc100 merged commit 6f3cfe3 into main Apr 11, 2023
2 checks passed
@sbc100 sbc100 deleted the bulk_memory branch April 11, 2023 04:51
@haberman

If bulk memory is available, why can't `memcpy()` and `memset()` use `memory.copy` and `memory.fill` exclusively? Why does `emscripten_memcpy_big()` need to exist at all in that case?

This code seems to be written with the assumption that the bulk memory operations have some kind of overhead that makes them only worthwhile for copies/fills of greater than 512 bytes. Is this true?
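The pattern being questioned looks roughly like the following. This is a simplified sketch, not the actual `emscripten_memcpy.c`; the function name, the 512-byte cutoff placement, and the byte-loop small path are illustrative assumptions based on the discussion.

```c
#include <stddef.h>

// Simplified sketch of the threshold pattern under discussion
// (hypothetical name, not the real emscripten_memcpy.c contents).
void *memcpy_threshold_sketch(void *dst, const void *src, size_t n) {
  if (n >= 512) {
    // Under -mbulk-memory this builtin call can lower to a single
    // wasm `memory.copy` instruction.
    return __builtin_memcpy(dst, src, n);
  }
  // Small copies: a plain byte loop, reflecting the assumption that
  // the bulk operation only pays off above the threshold.
  unsigned char *d = dst;
  const unsigned char *s = src;
  while (n--) {
    *d++ = *s++;
  }
  return dst;
}
```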

@sbc100
Collaborator Author

sbc100 commented Aug 27, 2023

> If bulk memory is available, why can't `memcpy()` and `memset()` use `memory.copy` and `memory.fill` exclusively? Why does `emscripten_memcpy_big()` need to exist at all in that case?
>
> This code seems to be written with the assumption that the bulk memory operations have some kind of overhead that makes them only worthwhile for copies/fills of greater than 512 bytes. Is this true?

I think you could well be correct, and we might just want to completely remove the traditional `__musl_memcpy` when bulk memory is available.

The code that this was replacing involved calling out to JavaScript, which certainly had/has a non-zero overhead. We would probably want to measure empirically whether `memory.copy` and `memory.fill` are faster or not for small copies, but it seems likely to me that they would have the same or better performance.

Do you have a particular reason to want this change? For example, is the cost of including `__musl_memcpy` concerning you?

@haberman

> Do you have a particular reason to want this change? For example, is the cost of including `__musl_memcpy` concerning you?

I am looking at a profile where `__memcpy()` (and `__musl_memset()` to a lesser extent) are significant costs, and I am hoping that using `memory.copy` and `memory.fill` would speed up my benchmark.

I don't know of a way to test this theory. Do you know of any way to force `memcpy()` to compile to `memory.copy`? I tried using `__builtin_memcpy()`, but oddly that still seems to call `__memcpy()`.

@sbc100
Collaborator Author

sbc100 commented Aug 28, 2023

The simplest way to do that would probably be to just remove the `if (n >= 512)` check from `emscripten_memcpy.c`, and then rebuild libc, either via `./embuilder build libc --force` or `./emcc --clear-cache` (to force all libraries to be rebuilt).
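The resulting experiment would look roughly like this: with the check gone, every call defers to the builtin regardless of size. This is a hedged sketch with hypothetical names, not the literal contents of the libc sources after the edit.

```c
#include <stddef.h>

// Sketch of the experiment suggested above (hypothetical names):
// with the `if (n >= 512)` check removed, every copy defers to the
// builtin, which lowers to wasm `memory.copy` when bulk memory is
// enabled.
void *memcpy_unconditional(void *dst, const void *src, size_t n) {
  return __builtin_memcpy(dst, src, n);
}

// Likewise for fills: this lowers to wasm `memory.fill`.
void *memset_unconditional(void *dst, int c, size_t n) {
  return __builtin_memset(dst, c, n);
}
```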

sbc100 added a commit that referenced this pull request Aug 28, 2023
Unlike the JS versions of these functions, there is no need to avoid
using these for small inputs.

Results of running the test_memcpy_128b test before and after this
change:

```
v8-bulk: mean: 1.536 (+-0.071) secs  median: 1.495  range: 1.471-1.650  (noise: 4.630%)  (5 runs)
        size:   149291, compressed:    54249
```

->

```
v8-bulk: mean: 1.489 (+-0.117) secs  median: 1.535  range: 1.268-1.606  (noise: 7.871%)  (5 runs)
        size:   148387, compressed:    53813
```

See comments in #19128