Adds support for WASM 128-bit SIMD #56

almann · 2022-01-02T00:15:10Z

There are some TODOs (see below), but the implementation and test suite integration has been done.

Adds feature wasm32_simd128 consistent with aarch64_neon as there is no
auto-detection for SIMD in WASM.
Updates README.md with WASM build instructions.
Integrates with tests--though the should_panic test case will be ignored
due to limitations of the test driver with wasm32-wasi.
Adds ignore for IntelliJ based IDEs (e.g., CLion).

TODO

Integrate with benchmarks. Naively targeting wasm32-wasi for the benchmarks have issues with both wasmer and wasmtime. I suspect this is partially because of criterion expecting wasm32 targets to run to have wasm-bindgen. One option would be to compile a shim of the library as a staticlib crate targeting wasm32-unknown-unknown and using wasmer or wasmtime to compile/link the function and benchmark it in the current framework.
Validate inlining.
Integrate with CI workflow scripts.
Run the fuzzer.

Implementation for #36

* Adds feature `wasm32_simd128` consistent with `aarch64_neon` as there is no auto-detection for SIMD in WASM. * Updates `README.md` with WASM build instructions. * Integrates with tests--though the `should_panic` test case will be ignored due to limitations of the test driver with `wasm32-wasi`. * Adds ignore for IntelliJ based IDEs (e.g., CLion). Implementation for rusticstuff#36

This adds a small `cdylib` shim library around simdutf8 with cargo configuration to target `wasm32-unknown-unknown`. The benchmarks are augmented to optionally embed Wasmer and at compile time build the WASM shim and embed it in the benchmarks, compile the WASM module copy in the test slice and invoke the shim routine to benchmark it. A limitation of this current approach is that it measures the overhead of calling across the WASM runtime boundary for each iteration. Another limitation is that it is currently using the default cranelift backend for Wasmer (which is the backend for wasmtime), but the LLVM backend is more performant according to Wasmer's documentation. The benchmarks still allow you to get a reasonable set of expectation of performance (relative numbers against std vs basic for example).

almann · 2022-01-03T06:19:27Z

I've added preliminary benchmark integrations in this commit.

The approach is basically what I posited in the TODO above--I cross compile a small C-ABI shim crate targeting wasm32-unknown-unknown into a cdylib and then embed the compiled WASM module into the executable. At runtime, the benchmark compiles the WASM module and copies the benchmark string into the instance's linear memory, and then proceeds to benchmark calling into the appropriate shim function through the WASM runtime's interface (which also has the minor downside of benchmarking the overhead of calling a function through the WASM runtime). This is good enough to make sure that the relative usage of SIMD is actually doing something and to measure the relative speed up on a given WASM runtime (mileage may vary in different VMs on different host architectures).

On my NUC (Intel i7-10710U running Ubuntu 20.04 with performance governor on) I got some promising benchmarks that demonstrate that the WASM SIMD is definitely performing better.

$ for X in std compat basic; do (cargo bench --features=simdutf8_wasm --bench="throughput_wasm_$X" -- --save-baseline wasm-$X); done

Do you want me to add this feature to this PR, or do you want me to break them up (e.g., have this PR stand on its own, and have follow up PRs and make the task list above issues)? I am very sympathetic large PRs so please let me know how you'd like me to proceed.

hkratz · 2022-01-03T10:43:39Z

First of all: Great work, thank you so much! I will do a first review shortly.

Do you want me to add this feature to this PR, or do you want me to break them up (e.g., have this PR stand on its own, and have follow up PRs and make the task list above issues)? I am very sympathetic large PRs so please let me know how you'd like me to proceed.

You can put it in this PR, just don't rebase for now, so I know which commits I have already looked at.

hkratz

Thanks again!

What I am not 100% clear on is, if we want to provide a knob so that the library user can turn on the simd implementation at runtime without having to compile with the +simd128 target feature, because for browsers there is wasm-feature-detect which could be used to check for simd availability. What do you think?

src/implementation/wasm32/simd128.rs

src/implementation/algorithm.rs

src/implementation/wasm32/simd128.rs

README.md

src/implementation/wasm32/simd128.rs

tests/tests.rs

almann · 2022-01-03T17:17:44Z

You can put it in this PR, just don't rebase for now, so I know which commits I have already looked at.

Excellent, I have pushed that commit to the PR.

almann · 2022-01-03T17:28:32Z

What I am not 100% clear on is, if we want to provide a knob so that the library user can turn on the simd implementation at runtime without having to compile with the +simd128 target feature, because for browsers there is wasm-feature-detect which could be used to check for simd availability. What do you think?

My current understanding (which could be wrong) of WASM ISA extensions such as SIMD is is that there is no way within WASM for us to detect this, so the code targeting WASM itself cannot detect SIMD or not--in the host system (e.g., a Javascript host) there may or may not be such a facility, but it is not clear to me how we could reliably interact with it from the library in a portable (as in non-JS engines) way.

I do think from the library's perspective, we should definitely allow static flexibility at a minimum because ultimately we want to be able target any WASM runtime irrespective of its native capabilities.

Co-authored-by: Hans Kratz <hans@appfour.com>

Removes the `wasm32_simd128` feature in favor of controlling this behavior with just `target_feature = simd128` which is consistent with x86 `no_std`. Updates the documentation to explain the requirement of the target feature to drive selection and moves test/dev docs for WASM into its own file as to not clutter the README. Also makes one minor change to the bench build script to make clippy happy.

This adds two end-user features: `simdutf8_wasm_cranelift` and `simdutf8_wasm_llvm`. These features enable the WASM benchmarking and select the appropriate Wasmer backend. The benchmarking doc has been updated with more detailed instructions for the WASM setup as the LLVM backend requires more setup and it makes clearer how to run all the API benchmarks under WASM. Note that the Singlepass backend was not included as it failed to load our SIMD code even though Wasmer docs indicate that Singlepass supports simd128.

Also adds inlining tests for WASM builds.

almann · 2022-01-04T06:20:50Z

Added the CI including inlining tests, I had it configured in another branch to run with that branch, here was a run of the workflow:

https://github.com/almann/simdutf8/actions/runs/1655415697

almann · 2022-01-04T06:28:34Z

tests/tests.rs

+#[cfg(all(
+    feature = "public_imp",
+    target_arch = "wasm32",
+    target_feature = "simd128"
+))]


The CI integration exposed this bug--the test should only be exposed if WASM SIMD is enabled.

Updated `wasm-runner` fixes the issue. Adds verbosity to CI WASM run to make it clearer as to what is being run.

hkratz · 2022-01-05T14:19:21Z

Looking good! It will be a few days until I have time to review the rest though.

hkratz · 2022-01-07T09:47:58Z

@almann Are you sure there is no way around the shim for benchmarking? It looks like there is some basic support for wasi benchmarking in the criterion master branch: bheisler/criterion.rs#461 (comment).

almann · 2022-01-07T19:22:07Z

@almann Are you sure there is no way around the shim for benchmarking? It looks like there is some basic support for wasi benchmarking in the criterion master branch: bheisler/criterion.rs#461 (comment).

Diving into that issue and the associated PR in the downstream dependency (plotters), is that both are still open (and I think I saw this when I was trying to target the benchmarks to wasm32-wasi). The problem I had run into against released criterion.rs was that the downstream dependency for wasm32 targetted code assumes wasm-pack which leads to a bunch of undefined symbols. Looking more closely at the PR files, the patch looks pretty trivial and explains my experience, with the build.

@hkratz, did I maybe miss something here?

Refactors underlying wasm code to be generic with respect to wasmer and wasmtime. Renames feature flags to be more in line with the different WASM backends to benchmark. Updates `BENCHMARKING.md` with the changes.

almann · 2022-01-08T06:43:03Z

I also added Wasmtime support to the benchmarks in 48eacc4.

I found actually benchmarks on the different WASM runtimes to be of interest, so assuming we keep the shim approach, I think these benchmarks are useful and a user can select whichever one (or all of them) as they see fit.

For example here is the comparison of Wasmer Cranelift/LLVM back-ends vs. Wasmtime (which only has a Cranelift backend) for the basic validator (on Ubuntu 20.04, Ryzen 3950X [Zen2]):

The caveat of the shim benchmarks apply, but it does suggest interesting differences between runtime and this code.

almann · 2022-01-09T06:34:25Z

Diving into that issue and the associated PR in the downstream dependency (plotters), is that both are still open (and I think I saw this when I was trying to target the benchmarks to wasm32-wasi). The problem I had run into against released criterion.rs was that the downstream dependency for wasm32 targetted code assumes wasm-pack which leads to a bunch of undefined symbols. Looking more closely at the PR files, the patch looks pretty trivial and explains my experience, with the build.

@hkratz, did I maybe miss something here?

FWIW, I tried playing around with this. I can definitely patch around plotters (check it out in a submodule, patch it with the PR above and make sure the version lines up), but I run into problems with rayon needing a thread-pool. A comment in the PR above alludes to rayon/plotters being optional, but I see no such evidence of this in criterion's TOML file:

https://github.com/bheisler/criterion.rs/blob/4e773a3b8523a73e7105e11f0b2d4b545827712e/Cargo.toml#L35

https://github.com/bheisler/criterion.rs/blob/4e773a3b8523a73e7105e11f0b2d4b545827712e/Cargo.toml#L42-L45

RUSTFLAGS="-C target-feature=+simd128" CARGO_TARGET_WASM32_WASI_RUNNER="wasm-runner wasmer" cargo bench --no-default-features --target=wasm32-wasi --bench=throughput_basic

...

thread 'main' panicked at 'The global thread pool has not been initialized.: ThreadPoolBuildError { kind: IOError(Error { kind: Unsupported, message: "operation not supported on this platform" }) }', /home/almann/.cargo/registry/src/github.com-1ecc6299db9ec823/rayon-core-1.9.1/src/registry.rs:170:10
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
error: failed to run `/home/almann/CLionProjects/simdutf8/bench/target/wasm32-wasi/release/deps/throughput_basic-716ee0bc7dd8261c.wasm`
│   1: RuntimeError: unreachable
           at __rust_start_panic (throughput_basic-716ee0bc7dd8261c.wasm[2750]:0x1dd651)
           at rust_panic (throughput_basic-716ee0bc7dd8261c.wasm[2728]:0x1dcd76)
           at std::panicking::rust_panic_with_hook::heefe90fce7240a03 (throughput_basic-716ee0bc7dd8261c.wasm[2721]:0x1dc9e3)
           at std::panicking::begin_panic_handler::{{closure}}::h4f5bcef500c8e46d (throughput_basic-716ee0bc7dd8261c.wasm[2702]:0x1dbab7)
           at std::sys_common::backtrace::__rust_end_short_backtrace::h53dbd65c57a9d0e6 (throughput_basic-716ee0bc7dd8261c.wasm[2701]:0x1dba06)
           at rust_begin_unwind (throughput_basic-716ee0bc7dd8261c.wasm[2720]:0x1dc531)
           at core::panicking::panic_fmt::h92e73467e6a2c091 (throughput_basic-716ee0bc7dd8261c.wasm[2894]:0x1eb224)
           at core::result::unwrap_failed::h03dbf875e0072fe2 (throughput_basic-716ee0bc7dd8261c.wasm[2921]:0x1ec477)
           at rayon_core::registry::global_registry::hb1f293f2c8dfd05b (throughput_basic-716ee0bc7dd8261c.wasm[1292]:0x12d0e8)
           at rayon_core::current_num_threads::h35bc7a6b8dbf8c63 (throughput_basic-716ee0bc7dd8261c.wasm[1303]:0x12e058)
           at rayon::iter::plumbing::bridge::ha8b8b23fcd9b38a6 (throughput_basic-716ee0bc7dd8261c.wasm[1021]:0x1058ae)
           at criterion::analysis::estimates::h6521ee9def717bc3 (throughput_basic-716ee0bc7dd8261c.wasm[659]:0xb4089)
           at criterion::analysis::common::h567d2c8475d0f44a (throughput_basic-716ee0bc7dd8261c.wasm[244]:0x22eb7)
           at criterion::benchmark_group::BenchmarkGroup<M>::bench_with_input::hb0a4d9a4d0aa4552 (throughput_basic-716ee0bc7dd8261c.wasm[254]:0x258b5)
           at simdutf8_bench::bench_input::hd027083377a798b4 (throughput_basic-716ee0bc7dd8261c.wasm[115]:0x11047)
           at simdutf8_bench::criterion_benchmark::hf2c8a1c63f585fec (throughput_basic-716ee0bc7dd8261c.wasm[116]:0x1143d)
           at throughput_basic::main::h9dc93b5f5255a651 (throughput_basic-716ee0bc7dd8261c.wasm[245]:0x24030)
           at std::sys_common::backtrace::__rust_begin_short_backtrace::h2c5b7aaf52faf128 (throughput_basic-716ee0bc7dd8261c.wasm[127]:0x1234b)
           at std::rt::lang_start::{{closure}}::h557e866ad9f65275 (throughput_basic-716ee0bc7dd8261c.wasm[147]:0x13ecd)
           at std::rt::lang_start_internal::h44883fd0d3691d00 (throughput_basic-716ee0bc7dd8261c.wasm[2697]:0x1db731)
           at __original_main (throughput_basic-716ee0bc7dd8261c.wasm[246]:0x240dd)
           at _start (throughput_basic-716ee0bc7dd8261c.wasm[20]:0x18e8)
           at _start.command_export (throughput_basic-716ee0bc7dd8261c.wasm[3016]:0x1f37d1)

hkratz

You are right, criterion unconditionally uses rayon for its analysis. I was misled by the comment I linked. The WASM benchmarking is a bit more complex than I would like but I don't see a way around it and it is contained in the bench.

So once you fix the nits from this review I will merge this PR. Fuzz testing can be done independently.

README.md

src/lib.rs

Co-authored-by: Hans Kratz <hans@appfour.com>

almann · 2022-01-09T20:51:28Z

FWIW, The CI failure appears to be transient, I pushed to main on my fork and it ran okay.

https://github.com/almann/simdutf8/actions/runs/1674697191

Looking at the error--there was a problem downloading Wasmer, which didn't fail that step (but failed when attempting to run the tests as the runtime was missing):

   (node:1899) UnhandledPromiseRejectionWarning: RequestError: connect ECONNREFUSED 52.204.121.99:443
      at ClientRequest.<anonymous> (/Users/runner/work/_actions/wasmerio/setup-wasmer/v1/dist/index.js:1:104424)
      at Object.onceWrapper (events.js:300:26)
      at ClientRequest.emit (events.js:215:7)
      at ClientRequest.e.emit (/Users/runner/work/_actions/wasmerio/setup-wasmer/v1/dist/index.js:1:38640)
      at TLSSocket.socketErrorListener (_http_client.js:406:9)
      at TLSSocket.emit (events.js:210:5)
      at emitErrorNT (internal/streams/destroy.js:92:8)
      at emitErrorAndCloseNT (internal/streams/destroy.js:60:3)
      at processTicksAndRejections (internal/process/task_queues.js:80:21)
      at TCPConnectWrap.afterConnect [as oncomplete] (net.js:1129:14)
  (node:1899) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). (rejection id: 1)
  (node:1899) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.

hkratz

Thanks again for implementing this!

In rusticstuff#56, some code review suggestions got a bit mangled, this was fixed in `lib.rs`, but not in the corresponding README.

In #56, some code review suggestions got a bit mangled, this was fixed in `lib.rs`, but not in the corresponding README.

almann added 3 commits January 1, 2022 15:44

Adds missing rustdoc documentation from README

a61bf27

hkratz requested changes Jan 3, 2022

View reviewed changes

almann and others added 4 commits January 3, 2022 11:08

Applies code review suggestions

313d2d6

Co-authored-by: Hans Kratz <hans@appfour.com>

Adds CI configuration for WASM

87d354c

Also adds inlining tests for WASM builds.

almann commented Jan 4, 2022

View reviewed changes

almann requested a review from hkratz January 4, 2022 06:30

Adds 1.38.0 back to wasm32-wasi test CI

2237429

Updated `wasm-runner` fixes the issue. Adds verbosity to CI WASM run to make it clearer as to what is being run.

almann mentioned this pull request Jan 5, 2022

Makes the panic test conditional on aarch64_neon #57

Merged

Implements benchmark support for wasmtime

48eacc4

Refactors underlying wasm code to be generic with respect to wasmer and wasmtime. Renames feature flags to be more in line with the different WASM backends to benchmark. Updates `BENCHMARKING.md` with the changes.

hkratz requested changes Jan 9, 2022

View reviewed changes

README.md Outdated Show resolved Hide resolved

README.md Outdated Show resolved Hide resolved

src/lib.rs Outdated Show resolved Hide resolved

almann and others added 2 commits January 9, 2022 09:42

Applies code review suggestions

40402cc

Co-authored-by: Hans Kratz <hans@appfour.com>

Fixed errant backtick from CR feedback suggestions

5d744c0

almann requested a review from hkratz January 9, 2022 22:43

hkratz approved these changes Jan 10, 2022

View reviewed changes

hkratz merged commit 8145b58 into rusticstuff:main Jan 10, 2022

almann added a commit to almann/simdutf8 that referenced this pull request Jan 10, 2022

Fixes mangled markdown in README.md

a5c9a42

In rusticstuff#56, some code review suggestions got a bit mangled, this was fixed in `lib.rs`, but not in the corresponding README.

almann mentioned this pull request Jan 10, 2022

Fixes mangled markdown in README.md #58

Merged

almann deleted the wasm-stage branch January 10, 2022 20:19

almann mentioned this pull request Jan 10, 2022

Run Fuzzer on wasm32 Targeted Code #59

Open

hkratz pushed a commit that referenced this pull request Jan 11, 2022

Fixes mangled markdown in README.md

18d9fab

In #56, some code review suggestions got a bit mangled, this was fixed in `lib.rs`, but not in the corresponding README.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Adds support for WASM 128-bit SIMD #56

Adds support for WASM 128-bit SIMD #56

almann commented Jan 2, 2022 •

edited

Loading

almann commented Jan 3, 2022 •

edited

Loading

hkratz commented Jan 3, 2022 •

edited

Loading

hkratz left a comment

almann commented Jan 3, 2022

almann commented Jan 3, 2022

almann commented Jan 4, 2022 •

edited

Loading

almann Jan 4, 2022

hkratz commented Jan 5, 2022 •

edited

Loading

hkratz commented Jan 7, 2022

almann commented Jan 7, 2022 •

edited

Loading

almann commented Jan 8, 2022

almann commented Jan 9, 2022

hkratz left a comment •

edited

Loading

almann commented Jan 9, 2022

hkratz left a comment

Adds support for WASM 128-bit SIMD #56

Adds support for WASM 128-bit SIMD #56

Conversation

almann commented Jan 2, 2022 • edited Loading

TODO

almann commented Jan 3, 2022 • edited Loading

hkratz commented Jan 3, 2022 • edited Loading

hkratz left a comment

Choose a reason for hiding this comment

almann commented Jan 3, 2022

almann commented Jan 3, 2022

almann commented Jan 4, 2022 • edited Loading

almann Jan 4, 2022

Choose a reason for hiding this comment

hkratz commented Jan 5, 2022 • edited Loading

hkratz commented Jan 7, 2022

almann commented Jan 7, 2022 • edited Loading

almann commented Jan 8, 2022

almann commented Jan 9, 2022

hkratz left a comment • edited Loading

Choose a reason for hiding this comment

almann commented Jan 9, 2022

hkratz left a comment

Choose a reason for hiding this comment

almann commented Jan 2, 2022 •

edited

Loading

almann commented Jan 3, 2022 •

edited

Loading

hkratz commented Jan 3, 2022 •

edited

Loading

almann commented Jan 4, 2022 •

edited

Loading

hkratz commented Jan 5, 2022 •

edited

Loading

almann commented Jan 7, 2022 •

edited

Loading

hkratz left a comment •

edited

Loading