@valadaptive
Contributor

Resolves #139.

These are implemented using whatever intrinsics seem to be fastest.

On x86, I use the _mm_movemask family of intrinsics, which should be the fastest option, at least for floating-point operations. With AVX2, LLVM can optimize this to vtestps/vtestpd. For 8-bit, 32-bit, and 64-bit types, this checks the high bit of each mask element. For 16-bit types, there's no _mm_movemask_epi16, so the results will be inconsistent if any 16-bit mask value is not all zeroes or all ones.
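To make the 16-bit caveat concrete, here is a portable scalar model of what the movemask intrinsics compute. It is only a sketch: the function names are hypothetical, and routing 16-bit masks through a byte-wise movemask is an assumption about one plausible approach, not a description of the code in this PR.

```rust
/// Scalar model of `_mm_movemask_ps`: gather the high bit of each 32-bit lane.
fn movemask_ps_model(lanes: [i32; 4]) -> u32 {
    lanes
        .iter()
        .enumerate()
        .fold(0, |bits, (i, &lane)| bits | (((lane as u32) >> 31) << i))
}

/// Scalar model of `_mm_movemask_epi8`: gather the high bit of each byte.
fn movemask_epi8_model(bytes: [u8; 16]) -> u32 {
    bytes
        .iter()
        .enumerate()
        .fold(0, |bits, (i, &b)| bits | (((b >> 7) as u32) << i))
}

fn any_true_32(lanes: [i32; 4]) -> bool {
    movemask_ps_model(lanes) != 0
}

fn all_true_32(lanes: [i32; 4]) -> bool {
    movemask_ps_model(lanes) == 0b1111
}

/// ASSUMPTION: 16-bit lanes are reinterpreted as bytes and reduced with the
/// byte-wise movemask, so each lane contributes two independent bits.
fn mask16_bits(lanes: [i16; 8]) -> u32 {
    let mut bytes = [0u8; 16];
    for (i, lane) in lanes.iter().enumerate() {
        bytes[2 * i..2 * i + 2].copy_from_slice(&lane.to_le_bytes());
    }
    movemask_epi8_model(bytes)
}

fn any_true_16(lanes: [i16; 8]) -> bool {
    mask16_bits(lanes) != 0
}

fn all_true_16(lanes: [i16; 8]) -> bool {
    mask16_bits(lanes) == 0xFFFF
}
```

Under this model, a 16-bit element such as 0x0001 is nonzero but sets neither byte's high bit, so it counts as false. An element such as 0x00FF sets only one of its two bits, so it satisfies "any true" without satisfying "all true" even when every element holds that value.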

On AArch64, there are varying opinions on the fastest way to implement this operation. I went with the "vmaxvq/vminvq over 32-bit chunks" approach since it's nicely symmetric.
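The symmetry is easier to see in a scalar model. The sketch below stands in for the horizontal max/min reductions with plain iterator calls; the function names are hypothetical, and the exact comparisons are an assumption about how such a reduction would typically be used, not a copy of the PR's code.

```rust
/// Scalar stand-in for `vmaxvq_u32`: horizontal maximum of four 32-bit chunks.
fn vmaxvq_model(chunks: [u32; 4]) -> u32 {
    chunks.iter().copied().max().unwrap_or(0)
}

/// Scalar stand-in for `vminvq_u32`: horizontal minimum of four 32-bit chunks.
fn vminvq_model(chunks: [u32; 4]) -> u32 {
    chunks.iter().copied().min().unwrap_or(0)
}

// The 128-bit mask is viewed as four 32-bit chunks regardless of lane width.

/// Some bit is set somewhere, so the maximum chunk is nonzero.
fn neon_any_true(chunks: [u32; 4]) -> bool {
    vmaxvq_model(chunks) != 0
}

/// No bit is set anywhere, so the maximum chunk is zero.
fn neon_all_false(chunks: [u32; 4]) -> bool {
    vmaxvq_model(chunks) == 0
}

/// Every bit is set, so the minimum chunk is all ones.
fn neon_all_true(chunks: [u32; 4]) -> bool {
    vminvq_model(chunks) == u32::MAX
}

/// Some bit is clear, so the minimum chunk is not all ones.
fn neon_any_false(chunks: [u32; 4]) -> bool {
    vminvq_model(chunks) != u32::MAX
}
```

The "true" pair and the "false" pair each reduce to one horizontal operation followed by one comparison, which is presumably what makes the approach symmetric.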

On WebAssembly, I use the v128_any_true and i[N]x[M]_all_true intrinsics, assuming that they'll be easiest for runtimes to optimize, especially if they directly follow the comparison operation that produced the mask.

The fallback implementation checks if any bit in the mask lane is nonzero.
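A minimal sketch of that fallback rule, assuming the mask is stored as a slice of i32 lanes (the names and the slice representation are illustrative, not the library's actual types):

```rust
/// A lane counts as true if any of its bits is set.
fn lane_is_true(lane: i32) -> bool {
    lane != 0
}

fn fallback_any_true(mask: &[i32]) -> bool {
    mask.iter().any(|&lane| lane_is_true(lane))
}

fn fallback_all_true(mask: &[i32]) -> bool {
    mask.iter().all(|&lane| lane_is_true(lane))
}

fn fallback_any_false(mask: &[i32]) -> bool {
    mask.iter().any(|&lane| !lane_is_true(lane))
}

fn fallback_all_false(mask: &[i32]) -> bool {
    mask.iter().all(|&lane| !lane_is_true(lane))
}
```

Note that this rule treats a lane such as 1 as true, whereas a high-bit check would treat it as false, so the two strategies agree only on lanes that are all zeroes or all ones.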


There's no way to attach documentation to these methods now (#129), but once that's implemented, we should document their behavior as follows:

Returns true if [any/all] elements in this mask are [true (all ones)/false (all zeroes)].

Behavior on mask elements that are not all zeroes or all ones is unspecified. It may vary depending on architecture, feature level, the mask elements' width, the mask vector's width, or library version.
The behavior is also not guaranteed to be logically consistent if mask elements are not all zeroes or all ones. any_true may not return the same result as !all_false, and all_true may not return the same result as !any_false.
The select operation also has unspecified behavior for mask elements that are not all zeroes or all ones. That behavior may not match the behavior of this operation.
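Callers who build masks by hand rather than from comparison operations may want to check them before relying on these reductions. The helpers below are a hypothetical sketch over plain i32 lanes, not part of the library's API. Treating every nonzero lane as true when canonicalizing is just one possible convention.

```rust
/// Returns true if every lane is either all zeroes (0) or all ones (-1).
fn is_canonical(mask: &[i32]) -> bool {
    mask.iter().all(|&lane| lane == 0 || lane == -1)
}

/// Rewrites each lane to all ones if it is nonzero, and to all zeroes otherwise.
/// ASSUMPTION: "nonzero means true" is the convention the caller wants.
fn canonicalize(mask: &mut [i32]) {
    for lane in mask.iter_mut() {
        *lane = if *lane != 0 { -1 } else { 0 };
    }
}
```

A debug-only check such as `debug_assert!(is_canonical(&lanes))` before a reduction would catch hand-built masks that fall into the unspecified cases.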

Member

@DJMcNab DJMcNab left a comment

As with #134, I'd be fairly happy approving this based on tests and trust, but it might be worth seeing whether someone will actually review it on substance.

Collaborator

@LaurenzV LaurenzV left a comment

Very sorry about the delay, thanks a lot, LGTM!

@valadaptive valadaptive added this pull request to the merge queue Dec 5, 2025
Merged via the queue into linebender:main with commit 7e1355d Dec 5, 2025
18 checks passed
@valadaptive valadaptive deleted the mask-reduce branch December 5, 2025 13:44

Development

Successfully merging this pull request may close these issues.

"Any/all true/false" operations for mask types

3 participants