This repository has been archived by the owner on Mar 20, 2024. It is now read-only.
I ran into the problem of needing to shift mask bits up or down by one for quite a few vectorized algorithms I've been working on.
My current approach is to expand every mask bit to a full byte, use vslide*up/down, and then convert back to a mask register, but this is very wasteful.
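To make the cost of that workaround concrete, here is a minimal scalar C model of it, not real RVV code. It assumes at most 64 elements, and the function name `mask_shift_up_via_bytes` is made up for illustration. The three loops roughly stand in for a mask-to-byte expansion, a slide up by one with a zero shifted in, and a compare back to a mask.

```c
#include <stddef.h>
#include <stdint.h>

enum { MAX_ELEMS = 64 }; /* assumption: the model handles at most 64 mask bits */

/* Scalar model (hypothetical helper): shift a mask up by one element by
 * going through one byte per mask bit. */
static uint64_t mask_shift_up_via_bytes(uint64_t mask, size_t vl) {
    uint8_t bytes[MAX_ELEMS] = {0};
    uint8_t slid[MAX_ELEMS] = {0};
    uint64_t out = 0;

    if (vl > MAX_ELEMS) vl = MAX_ELEMS;

    /* Step 1: expand every mask bit to a full byte (0 or 1). */
    for (size_t i = 0; i < vl; i++)
        bytes[i] = (uint8_t)((mask >> i) & 1u);

    /* Step 2: slide the bytes up by one element; element 0 receives 0. */
    for (size_t i = 1; i < vl; i++)
        slid[i] = bytes[i - 1];

    /* Step 3: compare against zero to get back to a mask. */
    for (size_t i = 0; i < vl; i++)
        if (slid[i] != 0)
            out |= UINT64_C(1) << i;

    return out;
}
```

The model touches vl bytes of intermediate state to move vl bits, which is the overhead a dedicated mask shift would avoid.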
I was wondering if it could be possible to add bit shift and bit rotate instructions for the mask register, maybe v0 only? That is, it works across element boundaries, but only for mask registers.
This would help with the problem I described above, but also allow reducing register pressure, as you could e.g. store 8 masks for LMUL=1 registers in a single mask register, and just use a hypothetical vmrotl by 8*idx to rotate the bits to the mask you want to use.
This can already be done by using vslideup + vslidedown + vmerge, but that's a lot more expensive (a possible future rotate elements instruction could also help with this).
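As a rough sketch of the packing idea, the following scalar C model treats a mask register as a single 64-bit word (so it assumes VLEN=64 and eight 8-bit sub-masks). The helper names `mask_rotr`, `pack_masks` and `select_mask` are hypothetical, and no such mask rotate instruction exists in the current spec. The model stores slot idx at bits [8*idx, 8*idx+8), so it rotates right by 8*idx; a left rotate by 64 - 8*idx would be equivalent, and the direction in a real instruction would depend on the packing convention.

```c
#include <stdint.h>

/* Model of a mask rotate: rotate a 64-bit "mask register" right by n bits. */
static uint64_t mask_rotr(uint64_t m, unsigned n) {
    n &= 63u;
    return n ? (m >> n) | (m << (64u - n)) : m;
}

/* Pack eight 8-bit masks into one register; slot idx lands at bits
 * [8*idx, 8*idx + 8). This layout is an assumption of the sketch. */
static uint64_t pack_masks(const uint8_t slots[8]) {
    uint64_t packed = 0;
    for (unsigned idx = 0; idx < 8; idx++)
        packed |= (uint64_t)slots[idx] << (8u * idx);
    return packed;
}

/* Bring slot idx down to the low bits, where a masked instruction would
 * look for it, and return those 8 bits. */
static uint8_t select_mask(uint64_t packed, unsigned idx) {
    return (uint8_t)(mask_rotr(packed, 8u * idx) & 0xFFu);
}
```

Because the operation is a rotate rather than a shift, the other seven sub-masks stay in the register and can be selected later without reloading.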
I just want to put this idea out there, and see if other people could benefit from such an addition to future versions of the spec. I also might have missed a better way than described above to already do this in the current spec.
I've heard other potential use cases for mask shifts. (Mask rotates are more expensive if the rotation is taken with respect to vl rather than VLMAX, since the shifted-off bits need to land in an arbitrary bit position.)
If you can constrain VL <= 64 (which in portable code might require an extra min instruction to constrain AVL at the head of the strip-mine loop), then you can type-pun the mask as a single 64-bit element and right-shift it:
vsetivli x0, 1, e64, m1, tu, ma
vsrl.vi v0, v0, 1
The extra setvls aren't ideal, but on most implementations they're cheap. Constraining VL to 64 also isn't ideal, but for apps processors with relatively small VLEN, it won't hurt.
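For illustration, the effect of that sequence can be modelled in scalar C as below. This is a sketch under assumptions, not RVV code: `clamp_vl` and `mask_shift_down` are made-up names, and the mask is held in a plain `uint64_t`. One difference is deliberate: the model clears bits at or above vl before shifting, an extra step the two-instruction sequence does not perform, so on real hardware whatever sits in the mask tail would be shifted into position vl-1.

```c
#include <stddef.h>
#include <stdint.h>

/* Model of the extra "min" at the head of the strip-mine loop:
 * never request more than 64 elements per iteration. */
static size_t clamp_vl(size_t avl) {
    return avl < 64 ? avl : 64;
}

/* Model of type-punning the mask as one 64-bit element and shifting it
 * right by one: mask bit i+1 moves to bit i. Bits at or above vl are
 * cleared first (an addition in this sketch, see the note above). */
static uint64_t mask_shift_down(uint64_t mask, size_t vl) {
    uint64_t active = vl >= 64 ? ~UINT64_C(0) : (UINT64_C(1) << vl) - 1;
    return (mask & active) >> 1;
}
```

A strip-mine loop would call `clamp_vl` on the remaining element count each iteration and then apply the shift to that iteration's mask.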