This repository has been archived by the owner on Mar 20, 2024. It is now read-only.

Rotate and shift vector mask bits #919

Closed
camel-cdr opened this issue Sep 6, 2023 · 2 comments

@camel-cdr

I ran into the problem of needing to shift mask bits up or down by one for quite a few vectorized algorithms I've been working on.

My current approach is to expand every mask bit to a full byte, use vslide*up/down, and convert back to a mask register, but this is very wasteful.
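As a rough illustration of that round trip, here is a minimal Python sketch that models the mask as an integer and the byte vector as a list. The function name `shift_mask_up_via_bytes` is made up for this example, and the instruction names in the comments are only my guess at what each step would map to, not a claim about the exact sequence used.

```python
def shift_mask_up_via_bytes(mask: int, vl: int) -> int:
    """Model of: expand mask bits to bytes, slide up by one, compare back."""
    # Expand: one 0/1 element per mask bit (roughly a vmerge of 0 and 1).
    elems = [(mask >> i) & 1 for i in range(vl)]
    # Slide up by one element, filling element 0 with zero
    # (assumption: roughly what a vslide1up with x0 would do).
    slid = [0] + elems[:-1]
    # Compare back: every nonzero element becomes a set mask bit.
    return sum(bit << i for i, bit in enumerate(slid))


# The whole round trip is equivalent to a one-bit left shift of the mask.
mask = 0b0110_1001
assert shift_mask_up_via_bytes(mask, 8) == (mask << 1) & 0xFF
```

The point of the sketch is only that three vector operations and a full byte vector are being spent on what is logically a single one-bit shift.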

I was wondering whether it would be possible to add bit-shift and bit-rotate instructions for mask registers, maybe for v0 only. That is, the operation would work across element boundaries, but only on mask registers.

This would help with the problem I described above, but it would also allow reducing register pressure, as you could e.g. store 8 masks for LMUL=1 registers in a single mask register, and just use a potential vmrotl by 8*idx to rotate the bits to the mask you want to use.
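To make the packing idea concrete, here is a small Python model. Everything in it is an assumption for illustration: the register is 64 bits wide, each packed mask takes 8 bits, and `rotr` and `select_packed_mask` are hypothetical helpers, not proposed instruction semantics. The rotation direction is also just a modelling choice.

```python
def rotr(reg: int, amount: int, width: int) -> int:
    """Rotate a width-bit value right by amount bits."""
    amount %= width
    full = (1 << width) - 1
    return ((reg >> amount) | (reg << (width - amount))) & full


def select_packed_mask(reg: int, idx: int, bits_per_mask: int, width: int) -> int:
    """Rotate packed mask number idx down to bit 0 and return it."""
    rotated = rotr(reg, idx * bits_per_mask, width)
    # Hardware would simply consume the low vl bits; the AND is only
    # here so the model returns a clean value.
    return rotated & ((1 << bits_per_mask) - 1)


# Eight 8-bit masks packed into one (assumed) 64-bit mask register.
masks = [0x01, 0x23, 0x45, 0x67, 0x89, 0xAB, 0xCD, 0xEF]
reg = sum(m << (8 * i) for i, m in enumerate(masks))
assert select_packed_mask(reg, 3, 8, 64) == 0x67
```

With these sizes the rotation amount is 8*idx; for other register widths the stride between packed masks would differ.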

This can already be done by using vslideup + vslidedown + vmerge, but that's a lot more expensive (a possible future rotate elements instruction could also help with this).
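For reference, the slide-and-merge rotation can be modelled on plain Python lists as below. This is a simplified sketch: the helper names are hypothetical, and the model fills vacated positions with zero, whereas a real vslideup leaves the destination elements below the offset untouched.

```python
def slide_up(v, k, fill=0):
    """Move elements toward higher indices by k, filling the bottom."""
    return [fill] * k + v[:len(v) - k]


def slide_down(v, k, fill=0):
    """Move elements toward lower indices by k, filling the top."""
    return v[k:] + [fill] * k


def rotate_up(v, k):
    """Rotate elements up by k using two slides and a merge."""
    n = len(v)
    if n == 0:
        return []
    k %= n
    up = slide_up(v, k)          # supplies elements k .. n-1
    down = slide_down(v, n - k)  # supplies the wrapped elements 0 .. k-1
    # Merge: the first k positions come from `down`, the rest from `up`.
    return [down[i] if i < k else up[i] for i in range(n)]


assert rotate_up(["a", "b", "c", "d", "e"], 2) == ["d", "e", "a", "b", "c"]
```

That is three full-width vector operations for one rotation, which is the cost being compared against a dedicated rotate.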

I just want to put this idea out there, and see if other people could benefit from such an addition to future versions of the spec. I also might have missed a better way than described above to already do this in the current spec.

@aswaterman
Member

aswaterman commented Sep 6, 2023

I've heard other potential use cases for mask shifts. (Mask rotates are more expensive if the rotation is taken with respect to vl rather than VLMAX, since the shifted-off bits need to land in an arbitrary bit position.)
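The difference between the two rotate definitions can be seen in a small Python model. The function names are hypothetical, and leaving the bits at positions vl and above unchanged is an assumption made only to keep the model simple.

```python
def rotl_vlmax(mask: int, k: int, vlmax: int) -> int:
    """Rotate left over the full vlmax-bit register."""
    k %= vlmax
    full = (1 << vlmax) - 1
    return ((mask << k) | (mask >> (vlmax - k))) & full


def rotl_vl(mask: int, k: int, vl: int, vlmax: int) -> int:
    """Rotate left over only the low vl bits (tail bits kept as-is)."""
    if vl == 0:
        return mask
    k %= vl
    body_bits = (1 << vl) - 1
    body = mask & body_bits
    tail = mask & ~body_bits & ((1 << vlmax) - 1)
    # Bits shifted off the top of the body re-enter at bit 0, so the wrap
    # point depends on the run-time value of vl.
    rotated = ((body << k) | (body >> (vl - k))) & body_bits
    return tail | rotated


# Same input, vl = 5 out of vlmax = 16: bit 4 wraps to bit 0 only in the
# vl-relative version.
assert rotl_vl(0b10001, 1, vl=5, vlmax=16) == 0b00011
assert rotl_vlmax(0b10001, 1, vlmax=16) == 0b100010
```

In the VLMAX version the wrap position is fixed by the register width, while in the vl version it moves with vl, which is the extra cost mentioned above.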

If you can constrain VL <= 64 (which in portable code might require an extra min instruction to constrain AVL at the head of the strip-mine loop), then you can type-pun the mask as a single 64-bit element and right-shift it:

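vsetivli x0, 1, e64, m1, tu, ma   # vl=1, SEW=64: view v0 as one 64-bit element
vsrl.vi v0, v0, 1                 # logical right shift moves every mask bit down by one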

The extra setvls aren't ideal, but on most implementations they're cheap. Constraining VL to 64 also isn't ideal, but for apps processors with relatively small VLEN, it won't hurt.
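A minimal Python model of this approach is sketched below, under a few assumptions: `strips` and `vsrl_e64` are hypothetical names, and the strip lengths use a plain min(remaining, 64), whereas a real vsetvli is allowed to pick vl differently near the end of the loop.

```python
WORD = (1 << 64) - 1


def strips(avl: int, cap: int = 64):
    """Yield (start, vl) pairs for a strip-mine loop with vl capped at cap."""
    start = 0
    while start < avl:
        vl = min(avl - start, cap)  # the extra min that constrains AVL
        yield start, vl
        start += vl


def vsrl_e64(mask: int, shamt: int) -> int:
    """Model of a logical right shift on the mask viewed as one 64-bit element."""
    return (mask & WORD) >> shamt


# 150 elements are processed as two full strips and one partial strip.
assert list(strips(150)) == [(0, 64), (64, 64), (128, 22)]
# Within a strip, every mask bit moves down by one position.
assert vsrl_e64(0b1011, 1) == 0b0101
```

Note that in this model a bit shifted out of one strip is simply dropped; carrying it into the neighbouring strip would need extra handling that is not shown here.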

@nick-knight
Contributor

nick-knight commented Sep 8, 2023

I think we agree that anything is possible using left/right shifts, up/down slides, and bitwise logic.

Additionally, special cases, like shifting by one bit, could leverage add-with-carry or subtract-with-borrow (to avoid slides).
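One way to read the add-with-carry idea is the identity x + x = x << 1, with the carry out of each word supplying the bit that moves into the next word. The Python sketch below models only that arithmetic on 64-bit limbs; the function name is hypothetical, and how this would map onto specific vector carry instructions is not shown and is left as an open assumption.

```python
LIMB_BITS = 64
LIMB_MASK = (1 << LIMB_BITS) - 1


def shl1_add_carry(limbs):
    """Shift a little-endian list of 64-bit limbs left by one bit.

    Each limb is doubled (x + x) and the carry out of the previous limb
    is added in, so no slide or explicit shift is used.
    """
    out = []
    carry = 0
    for x in limbs:
        total = x + x + carry
        out.append(total & LIMB_MASK)
        carry = total >> LIMB_BITS  # the limb's old top bit
    return out, carry


# The top bit of limb 0 carries into bit 0 of limb 1.
result, carry_out = shl1_add_carry([0x8000000000000001, 0x1])
assert result == [0x2, 0x3]
assert carry_out == 0
```

For a mask that fits in a single 64-bit limb this reduces to one addition of the value to itself.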

I think it would be helpful to provide concrete use cases, so that these existing possibilities can be evaluated against them.
