
Add wasm32 simd128 intrinsics #549

Merged

merged 18 commits from gnzlbg:wasm_simd128 into rust-lang:master on Aug 15, 2018

Conversation

@gnzlbg (Contributor) commented Aug 8, 2018

This adds all wasm32 simd128 intrinsics with the exception of integer saturating arithmetic.

Once rust-lang/rust#53179 lands we can enable the target feature and see what happens.

In the meantime, early reviews are welcome.

Basically, the WASM SIMD extension introduces a single SIMD vector type: v128. The intrinsics mostly [0] follow a {interpretation}.{operation} naming scheme, where:

  • {interpretation} specifies how the bytes of the SIMD vector are to be interpreted, e.g., as f32x4, i64x2, etc.
  • {operation} is the operation to be performed, e.g., add. The semantics of the operation depend on the interpretation of the vector type being used, e.g., floating-point addition vs. wrapping integer addition.

In Rust, these are translated to {interpretation}::{operation}, where:

  • interpretations are unit structs, e.g., struct i32x4
  • operations are inherent methods on the unit structs, e.g., i32x4::add(v128, v128) -> v128 (recall that the only vector type in the ISA is v128). A sketch of this mapping follows the list.
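
To make the mapping concrete, here is a minimal, self-contained sketch (illustrative only: v128 and i32x4 are scalar stand-ins, and the real definitions in this PR use repr(simd) vectors and rustc's platform intrinsics rather than a hand-written loop):

// Illustrative sketch of the {interpretation}::{operation} mapping.
#![allow(non_camel_case_types)]

/// The single WASM SIMD vector type, modeled here as 16 raw bytes.
#[derive(Copy, Clone)]
pub struct v128([u8; 16]);

/// Unit struct naming the `i32x4` interpretation.
pub struct i32x4;

impl i32x4 {
    /// Models `i32x4.add`: lane-wise wrapping addition of four 32-bit lanes.
    pub fn add(a: v128, b: v128) -> v128 {
        let mut out = [0u8; 16];
        for lane in 0..4 {
            let i = lane * 4;
            let x = i32::from_le_bytes([a.0[i], a.0[i + 1], a.0[i + 2], a.0[i + 3]]);
            let y = i32::from_le_bytes([b.0[i], b.0[i + 1], b.0[i + 2], b.0[i + 3]]);
            out[i..i + 4].copy_from_slice(&x.wrapping_add(y).to_le_bytes());
        }
        v128(out)
    }
}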

And that's pretty much it. I'd say 90% of the ISA is trivial to implement. The problematic bits and pieces are:

  • v8x16.shuffle, which takes a 128-bit wide immediate argument. Each of its 16 bytes takes values between 0 and 31, so... we can't constify! our way out of this one. Because simd_shuffle32 requires a constant argument, we cannot really use it, so I decided to just use inline assembly and call it a day.

  • saturating arithmetic: rustc doesn't have the intrinsics for this yet. These should be trivial to add, and packed_simd needs these as well. So this isn't a big deal, it is just not part of this PR.

  • bitselect: bitselect interprets a v128 as an i1x128 mask, but we can't specify this type using repr(simd). The simd_select intrinsic is also not what we want here, because it takes a repr(simd) type like i8x16 and internally truncates it to i1x16 when creating a mask, but if we pass it an i128x1 it would produce an i1x1. I think we should probably add a simd_bitselect intrinsic to rustc that just does not perform the truncation - it is necessary for AVX-512 anyway to use an i64 mask as i1x64. (A scalar sketch of the bitselect semantics follows this list.)

  • saturating conversions: I don't think I have implemented these correctly, but since we can't really test WASM much yet, it is hard to tell.

  • floating-point min/max: we are currently using minnum/maxnum here, which behave differently depending on which IEEE 754 revision you follow (2008 vs 2018), but in general I think that they do not propagate NaNs properly: if one of the two arguments is a NaN, they always return the other argument, while these intrinsics should always return the NaN argument. I don't know what the best way to implement the correct behavior is yet. We could use select and/or fcmp directly or through a rustc intrinsic. Having these other implementations of min/max would be useful: packed_simd should probably provide both min/max and minnum/maxnum anyways.

  • floating-point neg: I've hacked my way around neg; we should probably use an LLVM intrinsic that just flips the sign bit instead of multiplying by -1.
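
As a reference point for the bitselect item above, here is a scalar model of the semantics (assuming the usual definition: each result bit comes from v1 where the corresponding mask bit is 1 and from v2 where it is 0); it is a test oracle, not the intrinsic itself:

/// Scalar model of `v128.bitselect` over a single 128-bit value.
fn bitselect_scalar(v1: u128, v2: u128, c: u128) -> u128 {
    (v1 & c) | (v2 & !c)
}

fn main() {
    let v1 = 0xFFFF_0000_u128;
    let v2 = 0x1234_5678_u128;
    let c = 0xFF00_FF00_u128;
    // Mask bits set to 1 pick from `v1`; mask bits set to 0 pick from `v2`.
    assert_eq!(bitselect_scalar(v1, v2, c), 0xFF34_0078);
}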

The most important issue:

  • testing: how do we test this? We need either a test runner for cargo, or a setup in which we put the tests in a wasm binary and use cretonne or something similar to compile them to x86/AArch64 and execute that in qemu.

[0] there are some exceptions at the end of the file when it comes to conversions, where {dst_interpretation}.{operation}/{src_interpretation}{:modifier} is used.

@gnzlbg (Contributor Author) commented Aug 8, 2018

cc @sunfish

@alexcrichton (Member) left a comment

This looks awesome to me, thanks so much for taking this on @gnzlbg!

I haven't thoroughly audited the implementations yet, but here's some thoughts I've got:

  • In general wasm is very different from x86 in that there's not really well-established prior art about how these intrinsics should be exposed or things like C signatures/names we should apply. We do, however, have a very well defined instruction set in the wasm standard. These instructions are well typed and easily map to functions, so I think it's worthwhile (and stabilizable!) to map directly to the instructions in the standard, using straightforward mappings to Rust idioms (as you've done here) where possible.

  • Instead of pub struct i64x2; and such, perhaps it could be pub mod i64x2? I don't think we need i64x2 to be a type, right? I really like the idea though of having the :: correspond to the . in wasm. It would also mean that we may want to consider renaming the memory intrinsics to their new names which have mem.* in the name (food for thought)

  • I like how all the functions are named precisely after the name of the instruction, that works well.

  • The #[assert_instr] macro isn't working here but I don't think it'll be too too hard to get it working with wasm-dis, I believe that's got all the necessary support for the SIMD instructions. I don't think that should block the initial implementation though, I can work on getting that done after this lands (when I get some time)

  • I'm surprised actually that asm! works for the wasm target!

  • Unlike x86, I don't think we can do runtime detection at all on wasm. Entire wasm modules are either valid or invalid, so there's no way to have some functions with SIMD not get used if the browser doesn't support SIMD. In that sense runtime detection must be done prior to instantiation in JS and then an appropriate module must be loaded, and projects which want to target browsers that do/don't use SIMD will need multiple builds of their crate

    Now this is super tricky with the standard library because the standard library won't enable this feature by default. That being said, I think it's very likely that we ship a second wasm standard library target which enables this (if it becomes popular) or we make it super easy to work with xargo making this a moot point.

    tl;dr; the #[cfg] on all functions is unfortunate but I believe that the wasm32 architecture simply requires it, we have no other choice

/// from `v1` when `1` and `v2` when `0`.
#[inline]
// #[target_feature(enable = "simd128")]
// FIXME: #[cfg_attr(test, assert_instr($id.bitselectnot))]

Member:

s/bitselectnot/bitselect/

// #[target_feature(enable = "simd128")]
// FIXME: #[cfg_attr(test, assert_instr($id.bitselectnot))]
pub unsafe fn bitselect(v1: v128, v2: v128, c: v128) -> v128 {
// FIXME: use llvm.select instead - we need to add a `simd_bitselect`

Member:

Does this FIXME mean that the implementation here is incorrect? If so, should this method perhaps be commented out for now?

Contributor Author:

The implementation is correct, but I haven't checked that it generates the appropriate v128.bitselect instruction. I only expect LLVM to do that if this gets optimized into an llvm.select, so I think we should just be generating that in the first place.

union U {
    vec: self::sealed::$ivec_ty,
    a: v128,
}

Member:

Out of curiosity, is there a particular reason to use unions instead of transmute?

Contributor Author:

I thought / think that we can make these const_fn eventually, so unions here are forward compatible with that.
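
For context, here is a standalone sketch of the pattern in question, with hypothetical stand-in types (the real code casts between v128 and the sealed $ivec_ty vectors): the union-based bit-cast is what keeps the door open for eventually making these const fn, as mentioned above.

#[derive(Copy, Clone)]
struct V128([u8; 16]); // stand-in for the real v128 type

#[derive(Copy, Clone)]
struct I8x16([i8; 16]); // stand-in for one of the sealed vector types

// Bit-cast via a union field read instead of mem::transmute.
const unsafe fn to_v128(vec: I8x16) -> V128 {
    union U {
        vec: I8x16,
        a: V128,
    }
    U { vec }.a
}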

// #[target_feature(enable = "simd128")]
// FIXME: #[cfg_attr(test, assert_instr(v128.const, imm = [ImmByte::new(42); 16]))]
#[rustc_args_required_const(0)]
pub const unsafe fn const_(imm: [ImmByte; 16]) -> v128 {

Member:

I'm a little confused, how come this requires ImmByte? Couldn't this be [u8; 16]?

Contributor Author:

That's just what the type in the WASM spec is called. I didn't want to start adding rules like "ImmByte is always translated to u8", but given that that's what I actually ended up doing while testing (ImmByte::new() is a pain), we might just as well add that rule. Maybe the same for the LaneIdx types.

Should we keep these as type aliases for consistency with the spec? E.g.

pub type ImmByte = u8;
pub type LaneIdx... = u8;
....

?

}
};
}
impl_laneidx!(LaneIdx2(u8): [0, 1] | /// A byte with values in the range 0–1 identifying a lane.

Member:

I think it'd be really hard for us to stabilize these LaneIdx* types unfortunately; would it be possible though to say it's UB to specify an out-of-bounds index and just take usize for indices?

Contributor Author:

I think it'd be really hard for us to stabilize these LaneIdx* types unfortunately,

Why do you say so?

would it be possible though to say it's UB to specify an out of bounds index and just take usize for indices?

Duh, the docs don't say it but stating that if the value is out-of-bounds the behavior is undefined was the intent.

I went with the new types because it felt more conservative, and it should allow us to insert debug asserts in the future (undefined behavior here means we can do anything, including properly diagnosing the error at run-time).

Having been using these a bit while running the tests, I always just ended up transmuting u8s anyways, so I am myself unconvinced that this is worth it. (the spec says these are byte sized, so I think it makes sense to just make these u8 instead of usize here)

If we make them newtypes, these should all be #[repr(transparent)] though, as @cramertj pointed out.
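
For illustration, a hypothetical sketch of the newtype route being weighed here (the name and the assertion are made up, not what the PR defines): #[repr(transparent)] over u8, with a debug assertion standing in for the run-time diagnostics mentioned above.

/// A lane index for a 16-lane vector; layout-compatible with `u8`.
#[repr(transparent)]
#[derive(Copy, Clone)]
pub struct LaneIdx16(u8);

impl LaneIdx16 {
    /// Out-of-range values are the caller's responsibility (UB in the real
    /// intrinsics); the debug assertion at least catches them in debug builds.
    pub fn new(i: u8) -> Self {
        debug_assert!(i < 16, "lane index out of range: {}", i);
        LaneIdx16(i)
    }
}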

@gnzlbg (Contributor Author) commented Aug 9, 2018

in that there's not really well-established prior art about how these intrinsics should be exposed or things like C signatures/names we should apply.

Yes, typically the platform vendor includes headers for this for their compilers, and those become the C API. I don't think Clang and GCC are there yet beyond having some __builtin_*s in place, so this is all super unstable.

I think it's worthwhile (and stabilizable!) to map directly to the instructions in the standard,

Stabilizable, maybe some day. Right now I mostly wanted to be sure that our toolchain can handle these, that these are actually implementable (the spec was a bit ambiguous in some places), and that we can use these in packed_simd in case we need to workaround LLVM bugs. As the spec evolves, these should evolve as well.

Instead of pub struct i64x2; and such, perhaps it could be pub mod i64x2?

So this is a "chronological bug". I originally wanted the syntax to be the same as in the WebAssembly spec, which would mean i64x2.add(a, b), with i64x2 being a pub const i64x2: ... = ...; but then I arrived at some of the intrinsics like f32x4.convert_../i32x4:sat and just completely gave up on the idea of using the same syntax as the spec.

I tried modules, but a minor inconvenience is that I cannot reopen a module from the one it is defined in to add more stuff to it, like this:

// mod parent
mod child { ...stuff... }
mod child { ...more stuff... }  // ERROR

This makes it hard to incrementally grow the contents of the module with macros that are decoupled from one another (is it worth it to file an issue for this or open an internals post?). Types make this easy:

pub struct Foo;
impl Foo { ...stuff... }
impl Foo { ...more stuff... } // OK

So while I agree that modules would be better, and that we shouldn't put implementor convenience over user convenience, refactoring from types to modules once everything is implemented is easy, while incrementally growing the module version is hard. So I'd prefer to do this as a final cleanup step once we have resolved all the other issues.

Thinking about C APIs, and using the same names as them, I think we should use i64x2_... instead of i64x2:: because :: would require C++ namespaces, but C does not have them, so it is pretty much impossible for these intrinsics to use :: in their C names, whenever they appear.

Sadly concatenating identifiers with macros is even messier than using modules :/ packed_simd heavily uses the interpolate_idents!(https://github.com/SkylerLipthay/interpolate_idents) proc macro for it, but since it is not in rustc, I'd be wary of using that here. TBH, I was expecting that macro to break every week, but I've been using it for months, and even though the crate is unmaintained, it hasn't broken a single time...

It would also mean that we may want to consider renaming the memory intrinsics to their new names which have mem.* in the name (food for thought)

We could do that while keeping the old names around as deprecated aliases, to avoid breaking code for the time being.
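
Roughly along these lines (a hypothetical sketch; the function names are placeholders, not the actual stdsimd intrinsics):

/// New name, matching the `mem.size` instruction.
pub fn size() -> i32 {
    // the real implementation would forward to the memory-size intrinsic
    0
}

/// Old name kept as a deprecated alias so existing callers keep compiling.
#[deprecated(note = "renamed to `size` to match the `mem.*` instruction names")]
pub fn current_memory() -> i32 {
    size()
}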

I'm surprised actually that asm! works for the wasm target!

By works you mean that the syntax parses? I am adding a wasm-bindgen-test crate to stdsimd that actually exercises all of this and have found a lot of monomorphization time errors. Haven't arrived at shuffle yet, but it would surprise me if this really worked :D I only left it uncommented to see if it passed CI, but it seems that as long as these are not used, nothing really happens.

Unlike x86, I don't think we can do runtime detection at all on wasm. Entire wasm modules are either valid or invalid, so there's no way to have some functions with SIMD not get used if the browser doesn't support SIMD. In that sense runtime detection must be done prior to instantiation in JS and then an appropriate module must be loaded, and projects which want to target browsers that do/don't use SIMD will need multiple builds of their crate

So I think this is very unfortunate. As we have learned from the ARM ecosystem, this means that every target-feature permutation ends up being its own "architecture", independent from the others, so with the dozen WASM extensions in flight we are easily going to land in the 100s or 1000s of WASM architectures that we will have to support as soon as these start releasing 1.1 and 2.0 versions of each ISA extension.

Do you know people that could raise these concerns with the WebAssembly working group?

WASM should have a way to state that "The following N bytes of the binary blob require feature F version x.y.z". This would allow code generators that do not support that to:

  • skip those
  • replace every jump to any code in there with a trap

That combined with an instruction that detects whether the feature is supported or not by the WASM "machine" would be enough.

@gnzlbg (Contributor Author) commented Aug 9, 2018

So the last commit adds run-time tests for all intrinsics using the wasm-bindgen-test crate (we still don't test that the disassembly is correct though).

The commit changes a couple of things:

  • the ImmByte, LaneIdx, ... types are now just type aliases to u8
  • v8x16::shuffle is now a macro - this means that we won't be able to stabilize this intrinsic, but without const function arguments or similar we cannot really make it anything else and I got tired of fighting against weird LLVM inline assembly errors
  • fixes the implementation of the min/max intrinsics - they used to use minnum/maxnum, which was incorrect; now they do a vertical compare and a bitselect and at least work correctly (a scalar model of the intended lane semantics follows this list)
  • there are still some weird errors in the saturating truncations where they fail to saturate properly, they work fine in all other cases
  • the float vertical comparisons failed to monomorphize before because they were returning a floating-point mask, so I've switched that to an integer mask (wasm doesn't differentiate so this shouldn't really matter)
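
For reference, here is a scalar model of the NaN-propagating min lane semantics the intrinsics now aim for (a test oracle that glosses over signed-zero handling; it is not the compare-and-bitselect implementation itself):

/// Scalar model of one `f32x4.min` lane: a NaN in either input propagates,
/// unlike `minnum`, which returns the non-NaN operand.
fn min_lane(a: f32, b: f32) -> f32 {
    if a.is_nan() || b.is_nan() {
        f32::NAN
    } else if a < b {
        a
    } else {
        b
    }
}

fn main() {
    assert!(min_lane(f32::NAN, 1.0).is_nan());
    assert!(min_lane(1.0, f32::NAN).is_nan());
    assert_eq!(min_lane(1.0, 2.0), 1.0);
}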

@gnzlbg (Contributor Author) commented Aug 9, 2018

Summarizing the TODOS:

  • migrate the intrinsics to modules, e.g., mod i64x2, instead of types like pub struct i64x2;
  • consider renaming the memory intrinsics to mem.*
  • use wasm-dis to enable assert_instr
  • saturating arithmetic (we need simd_{add,mul,sub}_with_overflow in rustc for these)
  • improve bitselect to use llvm.select
  • cfg() the whole simd128 module once target_feature = "simd128" lands in rustc

I am going to focus on getting CI green, and once that is done, the only blocker for this PR is the last point above (cfg() the whole simd128 module) since if we don't do that we might break rust upstream. All other points can be worked on incrementally afterwards.

@gnzlbg force-pushed the wasm_simd128 branch 4 times, most recently from 608bfe5 to 0bd5460 on August 9, 2018 at 14:38
@alexcrichton (Member):

@gnzlbg I've pushed up a commit which runs wasm tests like all the other tests (via docker and ci/run.sh). While it looks like most of them are currently passing, I think that's because the simd128 feature isn't enabled in LLVM; once it's turned on I'm seeing LLVM codegen errors all over the place :(

I've commented it out currently here -- https://github.com/gnzlbg/stdsimd/blob/608bfe588bd8b4b258f4bef50d7122489d05ac4d/ci/run.sh#L43

@alexcrichton (Member):

Oh oops looks like I should have told you first! In any case the commit gnzlbg@608bfe5 should put wasm on track to look like all the other targets (and be a bit faster to build with precompiled binaries)

@alexcrichton (Member):

All that makes sense to me, thanks @gnzlbg! For modules vs. impls, another point for modules is that #[assert_instr] doesn't work with impls :(

Do you know people that could raise these concerns with the WebAssembly working group?

I'm not sure! I can try to ask around though. @sunfish, you may know more about this?

@gnzlbg (Contributor Author) commented Aug 9, 2018

noooo, the build was finally green! :D

I think it's because the simd128 feature isn't enabled in LLVM

Yes, I haven't tried compiling with it enabled yet. I was waiting for it to become properly whitelisted.

We would have to get that working for the assert_instr tests to run, but now at least we have got most tests already in place.

For modules vs impls another point for modules is that #[assert_instr] doesn't work with impls :(

That's a good point. I'll do that next then to unblock work on assert_instr.

@gnzlbg force-pushed the wasm_simd128 branch 2 times, most recently from 371e779 to 2c0ccc3 on August 9, 2018 at 15:41
@gnzlbg (Contributor Author) commented Aug 9, 2018

One problem with using modules instead of types is that the module v128 would clash with the type v128.

@gnzlbg (Contributor Author) commented Aug 9, 2018

So the last commit uses modules for all interpretations except for v128, which is a type with inherent (without self) methods.

I think that for v128 we could add extra #[cfg(test)] functions that call the inherent methods and just use assert_instr instead of directly using assert_instr on the methods, right?

@gnzlbg (Contributor Author) commented Aug 9, 2018

cc @tlively

@gnzlbg (Contributor Author) commented Aug 9, 2018

cc @sbc100


There seem to be some issues when lowering many of the intrinsics when simd128 is enabled. I'd expect the WASM backend to fall back to scalar code generation if an instruction is not supported, but operations like splat fail to lower. I'm trying to generate LLVM IR to further debug the issue, but rustc/LLVM chokes even in debug mode.

@sbc100 commented Aug 9, 2018

I think @tlively will be more useful than me here. I don't know much about the simd support in the backend. Are you using some flag to enable the simd wasm instructions? Are you generating the simd with intrinsics or via explicit simd types, or via auto-vectorization?

@gnzlbg (Contributor Author) commented Aug 9, 2018

@sbc100

Are you using some flag to enable the simd wasm instructions?

simd128 is the target feature flag that I am trying

Are you generating the simd with intrinsics or via explicit simd types, or via auto-vectorization?

Currently most of the intrinsics use explicit simd types + generic operations on those; e.g., splat appears to lower to a shufflevector:

define hidden void @i8x16_splat(<1 x i128>* noalias nocapture sret dereferenceable(16), i8 %v) {
start:
  %_3.0.vec.insert.i = insertelement <16 x i8> undef, i8 %v, i32 0
  %_3.15.vec.insert.i = shufflevector <16 x i8> %_3.0.vec.insert.i, <16 x i8> undef, <16 x i32> zeroinitializer
  %1 = bitcast <16 x i8> %_3.15.vec.insert.i to i128
  %_2.sroa.0.0.vec.insert.i = insertelement <1 x i128> undef, i128 %1, i32 0
  store <1 x i128> %_2.sroa.0.0.vec.insert.i, <1 x i128>* %0, align 16
  ret void
}

but I could use an llvm.wasm.splat intrinsic if there is one and that's the preferred approach (e.g. sqrt explicitly uses llvm.sqrt...).

Looking at the IR, we see that I am currently representing v128 as <1 x i128>, and using insertelement, shufflevector, etc. on <16 x i8> for the splat operation. I can tune all this to whichever types or intrinsics the backend likes best. But I think the backend should be able to polyfill shufflevector and lower it to the appropriate instructions when possible.

All intrinsics are exercised at run-time and pass when simd128 is not enabled, so I guess the backend can already do this, but for some reason enabling simd128 makes it stop doing that.

@tlively commented Aug 9, 2018

We've really only just started working on SIMD128 support in the wasm LLVM backend despite how long the proposal has been around, so I wouldn't expect anything to work for a while under the simd128 feature flag. Once we have it in a more workable state, having feedback from the Rust side will be very helpful.

Cargo.toml Outdated
@@ -3,6 +3,9 @@ members = [
"crates/stdsimd-verify",
"crates/stdsimd",
]
exclude = [
"crates/wasm-test"
]

Member:

I think this can probably be removed now

Contributor Author:

done

@alexcrichton (Member):

It's impressive that LLVM's backend is getting these tests passing as is already! (even without simd128 enabled)

@gnzlbg when I get a chance I'll start working on #[assert_instr] support for wasm, and then we should be able to properly see why this can't land just yet :)

@tlively commented Aug 10, 2018

It's impressive that LLVM's backend is getting these tests passing as is already! (even without simd128 enabled)

@alexcrichton I would expect that SIMD intrinsics and types are currently being scalarized by LLVM, so performance may be worse than expected, but it's nice to hear that at least it compiles!

@alexcrichton (Member):

Ok I was itching a bit to get this done, so I've now gotten it done!

I've pushed up support for the #[assert_instr] macro on wasm. It's a bit wonky right now because the released version on crates.io doesn't work, but the bug has been fixed in git. Right now no #[assert_instr] annotations are uncommented, so tests should still pass.

To see failures locally, you'll need a patch like this (ish), to generate: https://gist.github.com/alexcrichton/4809bfb8a818621684307ad368b71da4#file-out-log-L324-L621

Progress!

Although rust-lang/rust#53179 just landed, if you use -C target-feature=+simd128 to compile the whole crate today (which already works thanks to our own bugs) you still get tons of codegen errors. The good news is that we're confirming that tests fail if the feature isn't enabled, and if it's enabled we're hitting expected LLVM bugs!

@gnzlbg (Contributor Author) commented Aug 10, 2018

@alexcrichton I've added a crate to crates/ with instructions on how to run the assert_instr tests. I couldn't manage to get it to compile properly with -C target-feature=+simd128. But this way, hopefully the information in your patch about how to try it won't get lost.

@alexcrichton (Member):

I've pushed up a commit which uses #[assert_instr] for the memory related intrinsics and adds a few hacks for working today without precompiled binaries from wasm-bindgen. Still not landable, but hopefully a good proof-of-concept showing that #[assert_instr] works!

@alexcrichton (Member):

If some intrinsics work with -C target-feature=+simd128 then we may want to just comment out everything that doesn't work as well. That way we can incrementally add support as LLVM gains support for the various instructions and operations.

@gnzlbg (Contributor Author) commented Aug 14, 2018

@alexcrichton We should put the simd128 module behind a #[cfg(wasm32_simd128)] --cfg private flag for the time being, and test it in CI (that is, test the "scalarized" versions), so that we can merge this.

Otherwise we are going to keep this PR alive on the side. Once simd128 starts working for one or more intrinsics, we can reconsider making only those that work available under #[cfg(target_feature = "simd128")], while the rest remain hidden behind the private feature flag.
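
Concretely, the gating could look something like this sketch (the cfg name comes from the comment above; the module layout is illustrative):

// Only compile the simd128 intrinsics when the private cfg is passed,
// e.g. by building with RUSTFLAGS="--cfg wasm32_simd128" in CI.
#[cfg(wasm32_simd128)]
pub mod simd128 {
    // ... the simd128 intrinsics ...
}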

@alexcrichton (Member):

@gnzlbg sounds good to me! If you want to add that in and configure CI to test it I'd be cool merging

@gnzlbg (Contributor Author) commented Aug 14, 2018

will do

@alexcrichton (Member):

Looks great! I think the wasm-assert-instr-tests crate is no longer needed now, right?

@gnzlbg (Contributor Author) commented Aug 15, 2018

I just removed it, we'll see.

@alexcrichton merged commit 9467185 into rust-lang:master on Aug 15, 2018
@alexcrichton (Member):

👍

@gnzlbg deleted the wasm_simd128 branch on August 15, 2018 at 16:23