Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add reference output tests for JS operations #1894

Merged
merged 7 commits into from
Dec 4, 2019

Conversation

alexcrichton
Copy link
Contributor

This commit starts adding a test suite which checks in, to the
repository, test assertions for both the JS and wasm file outputs of a
Rust crate compiled with #[wasm_bindgen]. These aren't intended to be
exhaustive or large scale tests, but rather micro-tests to help observe
the changes in wasm-bindgen's output over time.

The motivation for this commit is basically overhauling how all the GC
passes work in wasm-bindgen today. The reorganization is also included
in this commit as well.

Previously wasm-bindgen would, in an ad-hoc fashion, run the GC passes
of walrus in a bunch of places to ensure that less "garbage" was seen
by future passes. This not only was a source of slowdown but it also was
pretty brittle since wasm-bindgen kept breaking if extra iteams leaked
through.

The strategy taken in this commit is to have one precise location for a
GC pass, and everything goes through there. This is achieved by:

  • All internal exports are removed immediately when generating the
    nonstandard wasm interface types section. Internal exports,
    intrinsics, and runtime support are all referenced by the various
    instructions and/or sections that use them. This means that we now
    have precise tracking of what an adapter uses.

  • This in turn enables us to implement the add_gc_roots function for
    walrus custom sections, which in turn allows walrus GC passes to do
    what unexport_unused_intrinsics did before. That function is now no
    longer necessary, but effectively works the same way. All intrinsics
    are unexported at the beginning and then they're selectively
    re-imported and re-exported through the JS glue generation pass as
    necessary and defined by the bindings.

  • Passes like the anyref pass are now much more precise about the
    intrinsics that they work with. The anyref pass also deletes any
    internal intrinsics found and also does some rewriting of the adapters
    aftewards now to hook up calls to the heap count import to the heap
    count intrinsic in the wasm module.

This commit starts adding a test suite which checks in, to the
repository, test assertions for both the JS and wasm file outputs of a
Rust crate compiled with `#[wasm_bindgen]`. These aren't intended to be
exhaustive or large scale tests, but rather micro-tests to help observe
the changes in `wasm-bindgen`'s output over time.

The motivation for this commit is basically overhauling how all the GC
passes work in `wasm-bindgen` today. The reorganization is also included
in this commit as well.

Previously `wasm-bindgen` would, in an ad-hoc fashion, run the GC passes
of `walrus` in a bunch of places to ensure that less "garbage" was seen
by future passes. This not only was a source of slowdown but it also was
pretty brittle since `wasm-bindgen` kept breaking if extra iteams leaked
through.

The strategy taken in this commit is to have one precise location for a
GC pass, and everything goes through there. This is achieved by:

* All internal exports are removed immediately when generating the
  nonstandard wasm interface types section. Internal exports,
  intrinsics, and runtime support are all referenced by the various
  instructions and/or sections that use them. This means that we now
  have precise tracking of what an adapter uses.

* This in turn enables us to implement the `add_gc_roots` function for
  `walrus` custom sections, which in turn allows walrus GC passes to do
  what `unexport_unused_intrinsics` did before. That function is now no
  longer necessary, but effectively works the same way. All intrinsics
  are unexported at the beginning and then they're selectively
  re-imported and re-exported through the JS glue generation pass as
  necessary and defined by the bindings.

* Passes like the `anyref` pass are now much more precise about the
  intrinsics that they work with. The `anyref` pass also deletes any
  internal intrinsics found and also does some rewriting of the adapters
  aftewards now to hook up calls to the heap count import to the heap
  count intrinsic in the wasm module.
The final user of the `require_internal_export` function was
`__wbindgen_realloc`. This usage has now been removed by updating how we
handle usage of the `realloc` function.

The wasm interface types standard doesn't have a `realloc` function
slot, nor do I think it ever will. This means that as a polyfill for
wasm interface types we'll always have to support the lack of `realloc`.
For direct Rust to JS, however, we can still optionally handle
`realloc`. This is all handled with a few internal changes.

* Custom `StringToMemory` instructions now exist. These have an extra
  `realloc` slot to store an intrinsic, if found.
* Our custom instructions are lowered to the standard instructions when
  generating an interface types section.
* The `realloc` function, if present, is passed as an argument like the
  malloc function when passing strings to wasm. If it's not present we
  use a slower fallback, but if it's present we use the faster
  implementation.

This should mean that there's little-to-no impact on existing users of
`wasm-bindgen`, but this should continue to still work for wasm
interface types polyfills and such. Additionally the GC passes now work
in that they don't delete `__wbindgen_realloc` which we later try to
reference.
This depends on the anyref table and a function to allocate an index if
the anyref pass is running, so be sure to track that in the instruction
itself for GC rooting.
Or if you're otherwise not using anyref slices, don't force some
intrinsics to exist.
@alexcrichton
Copy link
Contributor Author

For some background on this, I've started working on some simple anyref/wasm-interface-types tests. These two passes are pretty 'early days' at this point and need a fair amount of improvement. One of the first things I identified is that, as usual, it's always been a nightmare managing the GC passes in wasm-bindgen. With the recent refactoring and embedding walrus ids in instructions I wanted to try to fix this once-and-for all. We should now have proper rooting that's not "whatever the JS output happens to do" but rather the wasm module itself can be properly GC'd at any time.

This is not intended to change any user-facing functionality of wasm-bindgen today. In doing all this though I got paranoid that I was regressing small outputs and including extra items in the JS/wasm by accident. To assuage my fear of this I started to add a new test suite to wasm bindgen which checks the exhaustive output of wasm-bindgen itself, both the wasm and the JS file. The wasm is sanitized to not be so dependent on the compiler revision, but the idea here is that in a PR we can see the effect that the change has on the generated wasm and/or the generated JS file. This will also help prevent regressing situations where wasm-bindgen includes an intrinsic in a module by accident.

So overall this is in general a test suite being added, but lots of improvements as well to get the anyref pass working better with GC and such.

Looks like these values adjust in slight but insignificant ways over
time
Copy link
Member

@fitzgen fitzgen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

None => continue,
};
match import {
AuxImport::Intrinsic(Intrinsic::AnyrefHeapLiveCount) => {}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These nested loops are really calling for some kind of traversal / walker or something with filtering support for adapter instructions...

@@ -280,15 +279,6 @@ impl Bindgen {
}
};

// Our threads and multi-value xforms rely on the presence of the stack
// pointer, so temporarily export it so that our many GC's don't remove
// it before the xform runs.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎉

@alexcrichton alexcrichton merged commit d7a4a77 into rustwasm:master Dec 4, 2019
@alexcrichton alexcrichton deleted the reference-tests branch December 4, 2019 18:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants