Make mask types opaque#218
Conversation
daee6e0 to
8dead9c
Compare
|
The test failure from a completely unrelated part is spooky. I checked it for UB under miri with various CPU feature combinations, but it passes there, as well as on real hardware (locally and in CI) and on Intel's emulator on CI. I'm pinning it on a Rosetta bug, since real hardwware and 2 other emulators all pass. When running x86 code on the macos-latest image, it runs through Rosetta. We need to switch over to |
|
Porting vello was trivial: https://github.com/linebender/vello/compare/main...Shnatsel:opaque-masks?expand=1 So this is ready for review. |
LaurenzV
left a comment
There was a problem hiding this comment.
Haven't finished looking through it, just some first comments.
| fn as_array_mask8x16(self, a: mask8x16<Self>) -> [i8; 16usize] { | ||
| unsafe { core::mem::transmute::<__m128i, [i8; 16usize]>(a.val.0) } | ||
| } |
There was a problem hiding this comment.
Why can we still keep this method? Wouldn't this also expect that the inner representation is backed by actual integers?
There was a problem hiding this comment.
And similarly, how would load_array_mask8x16 be implemented for AVX512 if it takes the array as an integer array?
There was a problem hiding this comment.
We can't keep Deref into an integer slice/array, but explicit conversion into an owned instance is perfectly fine and still useful. So conversions to/from the integer array representation are left intact in the API on purpose. They're the cheapest way to create a mask right now, and are decently cheap even on AVX-512 (a single instruction).
| @@ -392,36 +396,12 @@ pub trait Simd: | |||
| fn widen_u8x16(self, a: u8x16<Self>) -> u16x16<Self>; | |||
| #[doc = "Reinterpret the bits of this vector as a vector of `u32` elements.\n\nThe total bit width is preserved; the number of elements changes accordingly."] | |||
| fn reinterpret_u32_u8x16(self, a: u8x16<Self>) -> u32x4<Self>; | |||
| #[doc = "Create a SIMD vector with all elements set to the given value."] | |||
| #[doc = "Create a SIMD mask with all lanes set from the given signed integer mask value."] | |||
| fn splat_mask8x16(self, val: i8) -> mask8x16<Self>; | |||
There was a problem hiding this comment.
How would splat_mask be implemented for AVX-512 since the representation of a single entry is always just a single bit?
I guess more fundamentally, the question is how we fundamentally want to allow construction of masks while preventing hidden footguns due to the exact internals of the SIMD level. For example, I imagine a from_bitmask method would be fast for AVX-512 since its the native representation but slow on NEON since you need to expand the bitmask manually, while a from_array, as it is now, is the best for NEON but would require manually encoding the mask into a single bitmask. I'm not sure what the best thing to do is here, but probably we can take inspiration from portable SIMD?
There was a problem hiding this comment.
It would check if the value is 0, and treat any other value as 1.
I've actually started this work by taking inspiration from std::simd, and I have a local branch that is a much more complete mirror of std::simd APIs. This PR is a stripped down, MVP version of it.
std::simd only implements construction of masks from bits represented as a u64 and from arrays of booleans; not from [i32; 4]. Their trade-off is greater portability but worse performance. I implemented that at first, then rolled back the removal of integer-based APIs because the assembly for bool arrays looked gnarly.
Now that you point it out, I agree it's probably better to mirror std::simd and make splat operate on booleans. I guess writing == 0 is not too much to ask of the users. I'll make that change to better align with std::simd in this case.
There was a problem hiding this comment.
I've experimented with this and looked at the assembly; it does add a neg instruction in the hot path for non-constant inputs on x86, so it's technically less efficient, but that's probably worth it for the nicer API.
…g std::simd API. Adds a `neg` on the hot path on x86 for non-constant inputs, but seems cheap enough and worth it for the nicer API.
… actual integer values involved, now that we have boolean APIs and the conversions are relevant
…re not clashing with std::simd API of the same name
|
I'd like to add to_bitmask()/from_bitmask() as well as |
This is necessary for eventual AVX-512 support. Part of #179.
This does not add any AVX-512 stuff yet, just lays the groundwork by abstracting away the internal representation.
Contrary to what the description of #196 said, we don't actually need i64 vectors in the public API so long as we're not providing direct conversions between mask types and integer vector types, which aren't available even on main.
Summary of changes:
SimdMask<S>trait independent fromSimdBase<S>.DerefBytesSimdSplit/SimdCombineslide/slide_within_blocks& | ^ !but notmask & -1.This was extricated from a larger changeset I had locally that also added some std::simd API compatibility functions, but that was getting too complicated to review. I'm happy to add more APIs if there's desire and review capacity for them.
A port of vello to this API can be found here.