Adds support for WASM 128-bit SIMD #56

Merged
merged 11 commits into from
Jan 10, 2022

almann
Contributor

@almann almann commented Jan 2, 2022

There are some TODOs (see below), but the implementation and test suite integration have been done.

  • Adds feature wasm32_simd128 consistent with aarch64_neon as there is no
    auto-detection for SIMD in WASM.
  • Updates README.md with WASM build instructions.
  • Integrates with tests--though the should_panic test case will be ignored
    due to limitations of the test driver with wasm32-wasi.
  • Adds ignore for IntelliJ based IDEs (e.g., CLion).

TODO

  • Integrate with benchmarks. Naively targeting wasm32-wasi for the benchmarks has issues with both wasmer and wasmtime. I suspect this is partially because criterion expects wasm32 targets to run with wasm-bindgen. One option would be to compile a shim of the library as a staticlib crate targeting wasm32-unknown-unknown and use wasmer or wasmtime to compile/link the function and benchmark it in the current framework.
  • Validate inlining.
  • Integrate with CI workflow scripts.
  • Run the fuzzer.

Implementation for #36

This adds a small `cdylib` shim library around simdutf8 with cargo
configuration to target `wasm32-unknown-unknown`.

The benchmarks are augmented to optionally embed Wasmer. At compile
time they build the WASM shim and embed it in the benchmark binary; at
run time they compile the WASM module, copy in the test slice, and
invoke the shim routine to benchmark it.

A limitation of this current approach is that it measures the
overhead of calling across the WASM runtime boundary for each iteration.

Another limitation is that it currently uses the default Cranelift
backend for Wasmer (Cranelift is also the backend for wasmtime), but the
LLVM backend is more performant according to Wasmer's documentation.
The benchmarks still give a reasonable expectation of relative
performance (for example, numbers against std vs basic).
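
The per-call overhead limitation mentioned in this commit message can be illustrated with a little arithmetic. The sketch below is not part of the PR; the function name and all parameter values are hypothetical, and it simply assumes each benchmark iteration pays a fixed boundary-crossing cost on top of the validation time.

```rust
/// Effective throughput (bytes per nanosecond) when every call pays a fixed
/// boundary-crossing cost in addition to the actual validation time.
///
/// All parameters are hypothetical inputs for illustration; none of them
/// are measurements from this PR.
pub fn effective_throughput(
    input_len_bytes: f64,
    raw_bytes_per_ns: f64,
    call_overhead_ns: f64,
) -> f64 {
    // Time spent doing the actual validation work.
    let validation_ns = input_len_bytes / raw_bytes_per_ns;
    // The benchmark observes validation time plus the fixed per-call cost.
    input_len_bytes / (call_overhead_ns + validation_ns)
}
```

Under this simple model the fixed cost matters most for short inputs and becomes negligible for long ones, which is why the relative numbers remain informative.
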
@almann
Contributor Author

almann commented Jan 3, 2022

I've added preliminary benchmark integrations in this commit.

The approach is basically what I posited in the TODO above--I cross compile a small C-ABI shim crate targeting wasm32-unknown-unknown into a cdylib and then embed the compiled WASM module into the executable. At runtime, the benchmark compiles the WASM module and copies the benchmark string into the instance's linear memory, and then proceeds to benchmark calling into the appropriate shim function through the WASM runtime's interface (which also has the minor downside of benchmarking the overhead of calling a function through the WASM runtime). This is good enough to make sure that the relative usage of SIMD is actually doing something and to measure the relative speed up on a given WASM runtime (mileage may vary in different VMs on different host architectures).
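
A minimal sketch of what such a C-ABI shim entry point could look like is shown below. It is an illustration only, not the code in this PR: the function name `shim_validate_utf8` is hypothetical, and `std::str::from_utf8` stands in for the simdutf8 call so that the snippet is self-contained. A real shim built as a `cdylib` would also need to export the symbol (for example with `#[no_mangle]`).

```rust
use std::slice;

/// Hypothetical shim entry point: the host copies the benchmark input into
/// the instance's linear memory and passes its address and length.
///
/// `std::str::from_utf8` is a stand-in for the real validator here (an
/// assumption made to keep the sketch self-contained).
///
/// # Safety
/// `ptr` must be null (with `len == 0`) or point to `len` readable bytes.
pub unsafe extern "C" fn shim_validate_utf8(ptr: *const u8, len: usize) -> bool {
    if ptr.is_null() {
        // A null pointer is only acceptable for an empty input.
        return len == 0;
    }
    // Reconstruct the byte slice the host wrote into memory.
    let bytes = unsafe { slice::from_raw_parts(ptr, len) };
    std::str::from_utf8(bytes).is_ok()
}
```

Returning a plain `bool` keeps the interface to a single scalar, so the host only has to pass two integers and read one result per call.
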

On my NUC (Intel i7-10710U running Ubuntu 20.04 with performance governor on) I got some promising benchmarks that demonstrate that the WASM SIMD is definitely performing better.

$ for X in std compat basic; do (cargo bench --features=simdutf8_wasm --bench="throughput_wasm_$X" -- --save-baseline wasm-$X); done

image

Do you want me to add this feature to this PR, or do you want me to break them up (e.g., have this PR stand on its own, have follow-up PRs, and make the task list above into issues)? I am very sympathetic to the problem of large PRs, so please let me know how you'd like me to proceed.

@hkratz
Contributor

hkratz commented Jan 3, 2022

First of all: Great work, thank you so much! I will do a first review shortly.

Do you want me to add this feature to this PR, or do you want me to break them up (e.g., have this PR stand on its own, have follow-up PRs, and make the task list above into issues)? I am very sympathetic to the problem of large PRs, so please let me know how you'd like me to proceed.

You can put it in this PR, just don't rebase for now, so I know which commits I have already looked at.

Contributor

@hkratz hkratz left a comment

Thanks again!

What I am not 100% clear on is whether we want to provide a knob so that the library user can turn on the SIMD implementation at runtime without having to compile with the +simd128 target feature, because for browsers there is wasm-feature-detect, which could be used to check for SIMD availability. What do you think?

@almann
Contributor Author

almann commented Jan 3, 2022

You can put it in this PR, just don't rebase for now, so I know which commits I have already looked at.

Excellent, I have pushed that commit to the PR.

@almann
Contributor Author

almann commented Jan 3, 2022

What I am not 100% clear on is whether we want to provide a knob so that the library user can turn on the SIMD implementation at runtime without having to compile with the +simd128 target feature, because for browsers there is wasm-feature-detect, which could be used to check for SIMD availability. What do you think?

My current understanding (which could be wrong) of WASM ISA extensions such as SIMD is that there is no way within WASM for us to detect this, so the code targeting WASM itself cannot detect whether SIMD is available. In the host system (e.g., a JavaScript host) there may or may not be such a facility, but it is not clear to me how we could reliably interact with it from the library in a portable way (i.e., one that also works in non-JS engines).

I do think from the library's perspective, we should definitely allow static flexibility at a minimum because ultimately we want to be able to target any WASM runtime irrespective of its native capabilities.
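
To make the "static flexibility" point concrete, here is a small sketch of compile-time selection driven purely by target configuration. It is an illustration, not the library's actual code: the names `HAS_WASM_SIMD128` and `implementation_name` are hypothetical. The assumption is that the crate is built with the `simd128` target feature enabled (for example via `RUSTFLAGS="-C target-feature=+simd128"`) when the SIMD path is wanted.

```rust
/// True only when compiling for wasm32 with the `simd128` target feature
/// enabled. This is decided entirely at compile time; nothing is detected
/// at run time.
pub const HAS_WASM_SIMD128: bool =
    cfg!(all(target_arch = "wasm32", target_feature = "simd128"));

/// Hypothetical helper reporting which code path this build selected.
pub fn implementation_name() -> &'static str {
    if HAS_WASM_SIMD128 {
        "wasm32 simd128"
    } else {
        "fallback"
    }
}
```

Because the choice is baked into the binary, supporting runtimes with and without SIMD would mean shipping two builds and letting the host pick between them.
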

almann and others added 4 commits January 3, 2022 11:08
Co-authored-by: Hans Kratz <hans@appfour.com>
Removes the `wasm32_simd128` feature in favor of controlling this
behavior with just `target_feature = simd128` which is consistent with
x86 `no_std`.

Updates the documentation to explain the requirement of the target
feature to drive selection and moves test/dev docs for WASM into its own
file so as not to clutter the README.

Also makes one minor change to the bench build script to make clippy
happy.
This adds two end-user features: `simdutf8_wasm_cranelift` and
`simdutf8_wasm_llvm`.  These features enable the WASM benchmarking and
select the appropriate Wasmer backend.

The benchmarking doc has been updated with more detailed instructions
for the WASM setup as the LLVM backend requires more setup and it makes
clearer how to run all the API benchmarks under WASM.

Note that the Singlepass backend was not included as it failed to load
our SIMD code even though Wasmer docs indicate that Singlepass supports
simd128.
Also adds inlining tests for WASM builds.
@almann
Contributor Author

almann commented Jan 4, 2022

Added the CI, including inlining tests. I had it configured in another branch to run with that branch; here is a run of the workflow:

https://github.com/almann/simdutf8/actions/runs/1655415697

Comment on lines +470 to +474
#[cfg(all(
    feature = "public_imp",
    target_arch = "wasm32",
    target_feature = "simd128"
))]
Contributor Author

The CI integration exposed this bug--the test should only be exposed if WASM SIMD is enabled.

@almann almann requested a review from hkratz January 4, 2022 06:30
Updated `wasm-runner` fixes the issue. Adds verbosity to CI WASM run
to make it clearer as to what is being run.
@hkratz
Contributor

hkratz commented Jan 5, 2022

Looking good! It will be a few days until I have time to review the rest though.

@hkratz
Contributor

hkratz commented Jan 7, 2022

@almann Are you sure there is no way around the shim for benchmarking? It looks like there is some basic support for wasi benchmarking in the criterion master branch: bheisler/criterion.rs#461 (comment).

@almann
Contributor Author

almann commented Jan 7, 2022

@almann Are you sure there is no way around the shim for benchmarking? It looks like there is some basic support for wasi benchmarking in the criterion master branch: bheisler/criterion.rs#461 (comment).

Diving into that issue and the associated PR in the downstream dependency (plotters), both are still open (and I think I saw this when I was trying to target the benchmarks to wasm32-wasi). The problem I ran into against released criterion.rs was that the downstream dependency's code for wasm32 targets assumes wasm-pack, which leads to a bunch of undefined symbols. Looking more closely at the PR files, the patch looks pretty trivial and explains my experience with the build.

@hkratz, did I maybe miss something here?

Refactors underlying wasm code to be generic with respect to wasmer
and wasmtime.  Renames feature flags to be more in line with the
different WASM backends to benchmark.

Updates `BENCHMARKING.md` with the changes.
@almann
Contributor Author

almann commented Jan 8, 2022

I also added Wasmtime support to the benchmarks in 48eacc4.

I actually found the benchmarks on the different WASM runtimes to be of interest, so assuming we keep the shim approach, I think these benchmarks are useful, and a user can select whichever one (or all of them) they see fit.

For example here is the comparison of Wasmer Cranelift/LLVM back-ends vs. Wasmtime (which only has a Cranelift backend) for the basic validator (on Ubuntu 20.04, Ryzen 3950X [Zen2]):

image

image

The caveat of the shim benchmarks applies, but it does suggest interesting differences between runtimes running this code.

@almann
Contributor Author

almann commented Jan 9, 2022

Diving into that issue and the associated PR in the downstream dependency (plotters), both are still open (and I think I saw this when I was trying to target the benchmarks to wasm32-wasi). The problem I ran into against released criterion.rs was that the downstream dependency's code for wasm32 targets assumes wasm-pack, which leads to a bunch of undefined symbols. Looking more closely at the PR files, the patch looks pretty trivial and explains my experience with the build.

@hkratz, did I maybe miss something here?

FWIW, I tried playing around with this. I can definitely patch around plotters (check it out in a submodule, patch it with the PR above and make sure the version lines up), but I run into problems with rayon needing a thread-pool. A comment in the PR above alludes to rayon/plotters being optional, but I see no such evidence of this in criterion's TOML file:

https://github.com/bheisler/criterion.rs/blob/4e773a3b8523a73e7105e11f0b2d4b545827712e/Cargo.toml#L35

https://github.com/bheisler/criterion.rs/blob/4e773a3b8523a73e7105e11f0b2d4b545827712e/Cargo.toml#L42-L45

RUSTFLAGS="-C target-feature=+simd128" CARGO_TARGET_WASM32_WASI_RUNNER="wasm-runner wasmer" cargo bench --no-default-features --target=wasm32-wasi --bench=throughput_basic

...

thread 'main' panicked at 'The global thread pool has not been initialized.: ThreadPoolBuildError { kind: IOError(Error { kind: Unsupported, message: "operation not supported on this platform" }) }', /home/almann/.cargo/registry/src/github.com-1ecc6299db9ec823/rayon-core-1.9.1/src/registry.rs:170:10
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
error: failed to run `/home/almann/CLionProjects/simdutf8/bench/target/wasm32-wasi/release/deps/throughput_basic-716ee0bc7dd8261c.wasm`
│   1: RuntimeError: unreachable
           at __rust_start_panic (throughput_basic-716ee0bc7dd8261c.wasm[2750]:0x1dd651)
           at rust_panic (throughput_basic-716ee0bc7dd8261c.wasm[2728]:0x1dcd76)
           at std::panicking::rust_panic_with_hook::heefe90fce7240a03 (throughput_basic-716ee0bc7dd8261c.wasm[2721]:0x1dc9e3)
           at std::panicking::begin_panic_handler::{{closure}}::h4f5bcef500c8e46d (throughput_basic-716ee0bc7dd8261c.wasm[2702]:0x1dbab7)
           at std::sys_common::backtrace::__rust_end_short_backtrace::h53dbd65c57a9d0e6 (throughput_basic-716ee0bc7dd8261c.wasm[2701]:0x1dba06)
           at rust_begin_unwind (throughput_basic-716ee0bc7dd8261c.wasm[2720]:0x1dc531)
           at core::panicking::panic_fmt::h92e73467e6a2c091 (throughput_basic-716ee0bc7dd8261c.wasm[2894]:0x1eb224)
           at core::result::unwrap_failed::h03dbf875e0072fe2 (throughput_basic-716ee0bc7dd8261c.wasm[2921]:0x1ec477)
           at rayon_core::registry::global_registry::hb1f293f2c8dfd05b (throughput_basic-716ee0bc7dd8261c.wasm[1292]:0x12d0e8)
           at rayon_core::current_num_threads::h35bc7a6b8dbf8c63 (throughput_basic-716ee0bc7dd8261c.wasm[1303]:0x12e058)
           at rayon::iter::plumbing::bridge::ha8b8b23fcd9b38a6 (throughput_basic-716ee0bc7dd8261c.wasm[1021]:0x1058ae)
           at criterion::analysis::estimates::h6521ee9def717bc3 (throughput_basic-716ee0bc7dd8261c.wasm[659]:0xb4089)
           at criterion::analysis::common::h567d2c8475d0f44a (throughput_basic-716ee0bc7dd8261c.wasm[244]:0x22eb7)
           at criterion::benchmark_group::BenchmarkGroup<M>::bench_with_input::hb0a4d9a4d0aa4552 (throughput_basic-716ee0bc7dd8261c.wasm[254]:0x258b5)
           at simdutf8_bench::bench_input::hd027083377a798b4 (throughput_basic-716ee0bc7dd8261c.wasm[115]:0x11047)
           at simdutf8_bench::criterion_benchmark::hf2c8a1c63f585fec (throughput_basic-716ee0bc7dd8261c.wasm[116]:0x1143d)
           at throughput_basic::main::h9dc93b5f5255a651 (throughput_basic-716ee0bc7dd8261c.wasm[245]:0x24030)
           at std::sys_common::backtrace::__rust_begin_short_backtrace::h2c5b7aaf52faf128 (throughput_basic-716ee0bc7dd8261c.wasm[127]:0x1234b)
           at std::rt::lang_start::{{closure}}::h557e866ad9f65275 (throughput_basic-716ee0bc7dd8261c.wasm[147]:0x13ecd)
           at std::rt::lang_start_internal::h44883fd0d3691d00 (throughput_basic-716ee0bc7dd8261c.wasm[2697]:0x1db731)
           at __original_main (throughput_basic-716ee0bc7dd8261c.wasm[246]:0x240dd)
           at _start (throughput_basic-716ee0bc7dd8261c.wasm[20]:0x18e8)
           at _start.command_export (throughput_basic-716ee0bc7dd8261c.wasm[3016]:0x1f37d1)

Contributor

@hkratz hkratz left a comment

You are right, criterion unconditionally uses rayon for its analysis. I was misled by the comment I linked. The WASM benchmarking is a bit more complex than I would like but I don't see a way around it and it is contained in the bench.

So once you fix the nits from this review I will merge this PR. Fuzz testing can be done independently.

README.md Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
src/lib.rs Outdated Show resolved Hide resolved
@almann
Contributor Author

almann commented Jan 9, 2022

FWIW, the CI failure appears to be transient; I pushed to main on my fork and it ran okay.

https://github.com/almann/simdutf8/actions/runs/1674697191

Looking at the error--there was a problem downloading Wasmer, which didn't fail that step (but failed when attempting to run the tests as the runtime was missing):

   (node:1899) UnhandledPromiseRejectionWarning: RequestError: connect ECONNREFUSED 52.204.121.99:443
      at ClientRequest.<anonymous> (/Users/runner/work/_actions/wasmerio/setup-wasmer/v1/dist/index.js:1:104424)
      at Object.onceWrapper (events.js:300:26)
      at ClientRequest.emit (events.js:215:7)
      at ClientRequest.e.emit (/Users/runner/work/_actions/wasmerio/setup-wasmer/v1/dist/index.js:1:38640)
      at TLSSocket.socketErrorListener (_http_client.js:406:9)
      at TLSSocket.emit (events.js:210:5)
      at emitErrorNT (internal/streams/destroy.js:92:8)
      at emitErrorAndCloseNT (internal/streams/destroy.js:60:3)
      at processTicksAndRejections (internal/process/task_queues.js:80:21)
      at TCPConnectWrap.afterConnect [as oncomplete] (net.js:1129:14)
  (node:1899) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). (rejection id: 1)
  (node:1899) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.

@almann almann requested a review from hkratz January 9, 2022 22:43
Contributor

@hkratz hkratz left a comment

Thanks again for implementing this!

@hkratz hkratz merged commit 8145b58 into rusticstuff:main Jan 10, 2022
almann added a commit to almann/simdutf8 that referenced this pull request Jan 10, 2022
In rusticstuff#56, some code review suggestions got a bit mangled, this was fixed
in `lib.rs`, but not in the corresponding README.
@almann almann deleted the wasm-stage branch January 10, 2022 20:19
hkratz pushed a commit that referenced this pull request Jan 11, 2022
In #56, some code review suggestions got a bit mangled, this was fixed
in `lib.rs`, but not in the corresponding README.