The new-ish Apple computers use ARM (AArch64) processors and don't have x86 instructions. Make sure to write a fast implementation for those.
Edit: A few notes:
Instead of the vpshufb instruction, use tbl. For vec_generic, tbl can take two LUTs directly in one instruction, which makes the code significantly simpler than the x86 version. The entire function could be:
(1 << (x >>> 5)) & tbl(x & 0b00011111, lut1, lut2)
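To make the one-liner concrete, here is a minimal scalar sketch in Python of what it computes. It is a reference model for checking a NEON implementation, not the implementation itself. It assumes each of the 32 table entries is an 8-bit mask in which bit k is set when byte `(k << 5) | i` belongs to the set. The names `build_luts`, `tbl2` and `in_set_mask` are hypothetical helpers introduced here for illustration.

```python
def build_luts(byteset):
    """Pack a set of byte values into two 16-byte tables (hypothetical helper).

    Assumed layout: entry i (0..31) is a bitmask whose bit k is set
    when the byte (k << 5) | i is a member of the set.
    """
    table = [0] * 32
    for b in byteset:
        table[b & 0b00011111] |= 1 << (b >> 5)
    return bytes(table[:16]), bytes(table[16:])


def tbl2(indices, lut1, lut2):
    """Scalar model of a two-register tbl: out-of-range indices yield 0."""
    table = lut1 + lut2
    return bytes(table[i] if i < 32 else 0 for i in indices)


def in_set_mask(chunk, lut1, lut2):
    """Per input byte: nonzero when the byte is in the set, 0 otherwise."""
    # Low 5 bits select the table entry, high 3 bits select the bit to test.
    looked_up = tbl2(bytes(x & 0b00011111 for x in chunk), lut1, lut2)
    return bytes((1 << (x >> 5)) & t for x, t in zip(chunk, looked_up))
```

Because the index is masked to 5 bits, it always falls inside the 32-byte table, so the out-of-range behaviour of tbl never triggers in this use. It is modelled only for completeness.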
There is no equivalent to pmovmskb. Perhaps the fastest option is to cast the vector to uint128 and count leading zeros directly. For trailing zeros, do rbit + clz. LLVM probably already does this for you, but check.
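As a rough illustration of the uint128 idea, the following Python sketch models a 16-byte vector as one 128-bit integer and derives lane positions from zero-bit counts. It assumes a little-endian layout, where lane 0 is the least significant byte, so trailing zeros locate the first nonzero lane and leading zeros locate the last. The function names are hypothetical.

```python
def first_nonzero_lane(vec16):
    """Index of the first nonzero byte in a 16-byte vector, or None."""
    assert len(vec16) == 16
    n = int.from_bytes(vec16, "little")  # lane 0 becomes the lowest 8 bits
    if n == 0:
        return None
    trailing_zeros = (n & -n).bit_length() - 1  # isolate lowest set bit
    return trailing_zeros // 8  # 8 bits per lane


def last_nonzero_lane(vec16):
    """Index of the last nonzero byte in a 16-byte vector, or None."""
    assert len(vec16) == 16
    n = int.from_bytes(vec16, "little")
    if n == 0:
        return None
    leading_zeros = 128 - n.bit_length()
    return 15 - leading_zeros // 8
```

Dividing the bit count by 8 means the comparison result does not need to be compressed to one bit per lane first, which is the step pmovmskb performs on x86.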