Add mask reduction operations #141
Merged
Resolves #139.
These are implemented using whatever intrinsics seem to be fastest.
On x86, I use `_mm_movemask`, which should be fastest for floating-point operations at least. For AVX2, LLVM can optimize this to `vtestps`/`vtestpd`. This checks the high bits for 8-bit, 32-bit, and 64-bit types. For 16-bit types, there's no `_mm_movemask_epi16`, so there will be strange behavior if each 16-bit mask value is not all zeroes or all ones.
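As a rough sketch of the movemask strategy (not the PR's actual code; the 4-lane `f32` mask shape and function names are illustrative assumptions):

```rust
#[cfg(target_arch = "x86_64")]
use core::arch::x86_64::{__m128, _mm_movemask_ps};

// `_mm_movemask_ps` packs the high (sign) bit of each of the four lanes into
// the low 4 bits of an i32, so a comparison mask (all-ones or all-zeroes per
// lane) reduces to a plain integer test.
#[cfg(target_arch = "x86_64")]
fn mask_any_f32x4(mask: __m128) -> bool {
    unsafe { _mm_movemask_ps(mask) != 0 }
}

#[cfg(target_arch = "x86_64")]
fn mask_all_f32x4(mask: __m128) -> bool {
    // A 4-lane mask sets exactly the low four bits.
    unsafe { _mm_movemask_ps(mask) == 0b1111 }
}
```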
On AArch64, there are varying opinions on the fastest way to implement this operation. I went with the "`vmaxvq`/`vminvq` over 32-bit chunks" approach since it's nicely symmetric.
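A minimal sketch of that approach, assuming the mask is viewed as a `uint32x4_t` (illustrative names, not the PR's code):

```rust
#[cfg(target_arch = "aarch64")]
use core::arch::aarch64::{uint32x4_t, vmaxvq_u32, vminvq_u32};

// Horizontal max/min across the 32-bit chunks: if every lane is all-ones or
// all-zeroes, the max is nonzero iff any lane is set, and the min is all-ones
// iff every lane is set.
#[cfg(target_arch = "aarch64")]
fn mask_any_u32x4(mask: uint32x4_t) -> bool {
    unsafe { vmaxvq_u32(mask) != 0 }
}

#[cfg(target_arch = "aarch64")]
fn mask_all_u32x4(mask: uint32x4_t) -> bool {
    unsafe { vminvq_u32(mask) == u32::MAX }
}
```

The symmetry is visible here: `any` and `all` differ only in swapping the max reduction for the min reduction.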
On WebAssembly, I use the `v128_any_true` and `i[N]x[M]_all_true` intrinsics, assuming that they'll be easiest for runtimes to optimize, especially if they directly follow the comparison operation that produced the mask.

The fallback implementation checks if any bit in the mask lane is nonzero.
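A combined sketch of those last two strategies (the 4-lane shapes and function names are assumptions for illustration):

```rust
// WebAssembly: the named intrinsics map directly onto `any`/`all`.
#[cfg(target_arch = "wasm32")]
use core::arch::wasm32::{i32x4_all_true, v128, v128_any_true};

#[cfg(target_arch = "wasm32")]
fn mask_any_i32x4(mask: v128) -> bool {
    v128_any_true(mask)
}

#[cfg(target_arch = "wasm32")]
fn mask_all_i32x4(mask: v128) -> bool {
    i32x4_all_true(mask)
}

// Fallback: treat the mask as lane-sized integers and count a lane as true
// when any of its bits is nonzero.
fn fallback_any(mask: [i32; 4]) -> bool {
    mask.iter().any(|&lane| lane != 0)
}

fn fallback_all(mask: [i32; 4]) -> bool {
    mask.iter().all(|&lane| lane != 0)
}
```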
There's no way to attach documentation to these methods now (#129), but once that's implemented, we should document their behavior as follows:
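The quoted documentation doesn't survive in this excerpt, but judging from the 16-bit caveat above, the contract presumably centers on each lane being all zeroes or all ones. A hypothetical sketch:

```rust
// Hypothetical trait sketch of the documented contract; all names here are
// assumptions, not the crate's actual API.
trait MaskReduce {
    /// Returns true if any lane is true.
    ///
    /// Each lane must be either all zeroes (false) or all ones (true);
    /// otherwise the result is unspecified.
    fn any(self) -> bool;

    /// Returns true if all lanes are true, under the same lane requirement.
    fn all(self) -> bool;
}
```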