Add implementation of emscripten_memcpy_big based on bulk memory. #19128
Conversation
These new functions live in `libbulkmemory`, which only gets included if bulk memory is enabled (either via `-mbulk-memory` directly, or indirectly via `-pthread`).

Benchmark results for benchmark.test_memcpy_1mb:

```
v8:                        mean: 1.666
v8-bulkmemory:             mean: 1.598
v8-standalone-bulkmemory:  mean: 1.576
v8-standalone:             mean: 3.197
```

Here we can see that when bulk memory is enabled it is at least as fast as, if not faster than, the JS version. v8-standalone doesn't have emscripten_memcpy_big at all and is much slower, as expected. By adding `-mbulk-memory` the standalone version becomes just as fast as the non-standalone one.
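With `-mbulk-memory`, clang can lower a plain `memcpy`/`__builtin_memcpy` call of runtime length to the single wasm `memory.copy` instruction, so the "big copy" helper no longer needs to be a JS import. A minimal sketch of what such a helper could look like (the signature and body here are assumptions for illustration, not the exact code from this PR):

```c
#include <stddef.h>

// Sketch only: under -mbulk-memory the compiler may lower this
// __builtin_memcpy of runtime length to a wasm `memory.copy`
// instruction, instead of calling out to a JS implementation.
void *emscripten_memcpy_big(void *restrict dest, const void *restrict src,
                            size_t n) {
  return __builtin_memcpy(dest, src, n);
}
```

Building this in a separate library (`libbulkmemory`) keeps the bulk-memory feature requirement out of binaries that don't enable it.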
If bulk memory is available, why not use it for all sizes? This code seems to be written with an assumption that the bulk memory operations have some kind of overhead that makes them only worthwhile for copies/fills greater than 512 bytes. Is this true? |
I think you could well be correct, and we might just want to completely remove the traditional implementation. The code that this was replacing involved calling out to JavaScript, which certainly had/has a non-zero overhead. We would probably want to measure empirically whether that threshold still makes sense. Do you have a particular reason to want this change? For example, is the cost of including the extra code significant for you? |
I am looking at a profile where this shows up, but I don't know of a way to test this theory. Do you know of any way to force it? |
The simplest way to do that would probably be to just remove the size check. |
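The size-threshold dispatch under discussion can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual emscripten source: `MEMCPY_BIG_THRESHOLD` and `copy_big` are hypothetical names standing in for the 512-byte check and the "big" helper (historically a JS import, in this PR a bulk-memory implementation):

```c
#include <stddef.h>

#define MEMCPY_BIG_THRESHOLD 512  // assumed threshold, per the discussion

// Stand-in for the "big" helper; in the PR this would be the
// bulk-memory-backed emscripten_memcpy_big.
static void copy_big(void *dest, const void *src, size_t n) {
  unsigned char *d = dest;
  const unsigned char *s = src;
  while (n--) *d++ = *s++;
}

// Sketch of the dispatch: small copies take the inline loop,
// large copies are forwarded to the big helper. Removing the
// size check would force the big path for every copy.
static void *my_memcpy(void *dest, const void *src, size_t n) {
  if (n >= MEMCPY_BIG_THRESHOLD) {
    copy_big(dest, src, n);
    return dest;
  }
  unsigned char *d = dest;
  const unsigned char *s = src;
  while (n--) *d++ = *s++;
  return dest;
}
```

Deleting the `n >= MEMCPY_BIG_THRESHOLD` branch (and always calling the big helper) is the kind of one-line experiment suggested above for profiling the bulk-memory path on small copies.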
Unlike the JS versions of these functions, there is no need to reserve them for large inputs only. Results of running the test_memcpy_128b test before and after this change:

```
v8-bulk: mean: 1.536 (+-0.071) secs  median: 1.495  range: 1.471-1.650 (noise: 4.630%) (5 runs)
size: 149291, compressed: 54249
```
->
```
v8-bulk: mean: 1.489 (+-0.117) secs  median: 1.535  range: 1.268-1.606 (noise: 7.871%) (5 runs)
size: 148387, compressed: 53813
```

See comments in #19128