PERF: Memoize lookups from SIDE_MODULE into MAIN_MODULE #6437

Merged
merged 1 commit on Apr 11, 2018

Conversation

3 participants
@mdboom
Contributor

mdboom commented Apr 10, 2018

While running NumPy benchmarks in WebAssembly, I noticed that, particularly for the benchmarks that are much slower than native, a large fraction of the time was spent looking up symbols in the main module from the side module (at least I think that's what this code does):

for (var named in NAMED_GLOBALS) {
  (function(named) {
    Module['g$_' + named] = function() {
      return Module['_' + named]; // <- 50% of runtime
    };
  })(named);
}

By memoizing the lookup, that significantly speeds things up.
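For illustration, the memoized version might look roughly like this (a minimal sketch using stand-in `Module` and `NAMED_GLOBALS` objects; the actual patch may differ in its details):

```javascript
// Stand-in objects for illustration only; in emscripten these would be the
// runtime Module object and the side module's named-globals map.
var Module = { '_my_global': 42 };
var NAMED_GLOBALS = { 'my_global': 0 };

for (var named in NAMED_GLOBALS) {
  (function(named) {
    var cached; // memoized result of the expensive property lookup
    Module['g$_' + named] = function() {
      if (cached === undefined) {
        cached = Module['_' + named]; // lookup runs at most once per symbol
      }
      return cached;
    };
  })(named);
}
```

The first call to `Module['g$_my_global']` performs the property lookup; every subsequent call returns the cached value.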

(x scale is slowdown factor of WebAssembly vs. native. Grey is before this change, blue is after).

[Image: benchmark_improvement]

@kripken

Owner

kripken commented Apr 11, 2018

Wow, incredible that this speeds things up so much! Aside from being a huge speedup, it also brings us into the roughly 4x slowdown range on average, which is surprisingly good given that native can use SIMD.

About this code: the reason it tries to be dynamic is that it originated from code that lets a module load symbols from a module loaded after it. However, this code here is literally just for the main module itself, so that concern is moot and this optimization is valid. We could even simplify it a little more (probably with no speedup, though); I'll comment in the code.

@kripken

Owner

kripken commented Apr 11, 2018

Oh, actually we can't simplify this any more; I was mistaken. lgtm!

@kripken kripken merged commit 2e3ac30 into kripken:incoming Apr 11, 2018

19 checks passed
@lukewagner

Contributor

lukewagner commented Apr 11, 2018

Is it even necessary to have these JS thunks at all? It seems like wasm Table.prototype.set() could be used to have direct wasm→wasm calls that would be much faster still.
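As a toy illustration of the `Table.prototype.set` mechanism lukewagner is describing (hand-assembled module bytes written for this sketch; this is not emscripten's actual linking code): a wasm export can be written directly into a table slot from JS, after which a call_indirect through that slot stays wasm→wasm, with no JS thunk in between.

```javascript
// A minimal wasm module, assembled by hand for this sketch: it exports a
// function "f" (which returns the constant 7) and a one-slot funcref table "t".
var bytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00,       // magic + version
  0x01, 0x05, 0x01, 0x60, 0x00, 0x01, 0x7f,             // type: () -> i32
  0x03, 0x02, 0x01, 0x00,                                // func 0 uses type 0
  0x04, 0x04, 0x01, 0x70, 0x00, 0x01,                    // table: 1 funcref slot
  0x07, 0x09, 0x02, 0x01, 0x66, 0x00, 0x00,              // export "f" (func 0)
              0x01, 0x74, 0x01, 0x00,                    // export "t" (table 0)
  0x0a, 0x06, 0x01, 0x04, 0x00, 0x41, 0x07, 0x0b,       // body: i32.const 7
]);
var instance = new WebAssembly.Instance(new WebAssembly.Module(bytes));
var f = instance.exports.f;
var t = instance.exports.t;

// Patch table slot 0 to point at f directly. A call_indirect through this
// slot from another wasm module would now reach f without any JS stub.
t.set(0, f);
console.log(t.get(0)()); // → 7
```

In a real dynamic-linking setup the function written into the slot would come from the main module's exports rather than from the same module, but the table-patching step is the same.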

@kripken

Owner

kripken commented Apr 11, 2018

@lukewagner Do you mean using function pointers and indirect calls? It's possible that would be faster than the current thunks. But I thought the long-term plan for dynamic linking was to have cross-module calls be fast direct ones? The current code tries to be close to that.

@lukewagner

Contributor

lukewagner commented Apr 11, 2018

Yeah, a call_indirect with a constant index. That should be significantly faster than a wasm→js→wasm call, no matter how slim the JS stub is.

@lukewagner

Contributor

lukewagner commented Apr 11, 2018

(Of course, if you could manage to use a wasm→wasm import call, that'd be a bit faster still.)

@kripken

Owner

kripken commented Apr 11, 2018

Makes sense.

Yeah, in the case of the code here, the hope is to improve it into a direct call, which should be possible since it's a call from a shared module into the main module, and the main module instantiates the shared module. However, the code we emit for the shared module doesn't know the call will land in the main module (it might target another shared module), so there's something to figure out there that I haven't had time to look into.
