vclmul.vv vd, vs2, vs1, vm // Vector-vector lo
vclmul.vs vd, vs2, vs1, vm // Vector-scalar lo
vclmulh.vv vd, vs2, vs1, vm // Vector-vector hi
vclmulh.vs vd, vs2, vs1, vm // Vector-scalar hi
Where, for the Crypto extension, SEW=128. Other extensions might want to implement other SEW values, but for Crypto, is SEW=128 all we need?
On the reduction operation: again, based on Markku's analysis for the scalar case, this can be done efficiently (at least in terms of instruction count) using vclmul[h].vs instructions and vxor, so a dedicated instruction might not be needed?
I'm not sure how the trade-off between reduction by multiplication and reduction by shift/xor carries over to the vector world. Intuition says the same trade-off as in the scalar world applies to vectors, in which case it's up to implementers to optimise for themselves. The background is that the fastest method depends on the speed of the carry-less multiply operation relative to shifts and xors.
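To make the intended lo/hi semantics concrete, here is a minimal Python reference model, assuming SEW=128 and modelling each vector as a list of integers. The function names are hypothetical, and taking the scalar operand of the .vs form from element 0 of vs1 is an assumption for illustration, not something this proposal fixes. Masking (vm) is left out.

```python
MASK128 = (1 << 128) - 1  # SEW=128 element mask


def clmul_full(a: int, b: int) -> int:
    """Carry-less (polynomial over GF(2)) product of two 128-bit values."""
    a &= MASK128
    b &= MASK128
    result = 0
    while b:
        if b & 1:
            result ^= a  # XOR instead of add: no carries propagate
        a <<= 1
        b >>= 1
    return result  # up to 255 bits wide


def clmul_lo(a: int, b: int) -> int:
    """Low 128 bits of the product (what vclmul would return per element)."""
    return clmul_full(a, b) & MASK128


def clmul_hi(a: int, b: int) -> int:
    """High 128 bits of the product (what vclmulh would return per element)."""
    return clmul_full(a, b) >> 128


def vclmul_vv(vs2, vs1):
    """Element-wise low halves, modelling vclmul.vv without masking."""
    return [clmul_lo(x, y) for x, y in zip(vs2, vs1)]


def vclmulh_vv(vs2, vs1):
    """Element-wise high halves, modelling vclmulh.vv without masking."""
    return [clmul_hi(x, y) for x, y in zip(vs2, vs1)]


def vclmul_vs(vs2, vs1):
    """ASSUMPTION: the scalar operand is element 0 of vs1."""
    return [clmul_lo(x, vs1[0]) for x in vs2]


def vclmulh_vs(vs2, vs1):
    """ASSUMPTION: the scalar operand is element 0 of vs1."""
    return [clmul_hi(x, vs1[0]) for x in vs2]
```

For example, clmul_lo(3, 3) is 5, since (x + 1)^2 = x^2 + 1 over GF(2), and multiplying x^127 by x puts a single bit into the high half.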
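The two reduction strategies can be compared with a small sketch. It assumes the GCM field polynomial x^128 + x^7 + x^2 + x + 1 and a plain (non-reflected) bit order, so it deliberately ignores the bit-reversal question that was split out into #18. All function names are hypothetical. Each clmul_full call against R stands in for a vclmul/vclmulh pair, and each `^` for a vxor.

```python
MASK128 = (1 << 128) - 1
R = 0x87                 # x^7 + x^2 + x + 1, the low terms of the polynomial
POLY = (1 << 128) | R    # x^128 + x^7 + x^2 + x + 1


def clmul_full(a: int, b: int) -> int:
    """Carry-less product; models a vclmulh/vclmul pair as one wide value."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def reduce_by_clmul(hi: int, lo: int) -> int:
    """Fold the high half down using multiplications by R, then XOR."""
    t = clmul_full(hi, R)            # hi * x^128 is congruent to hi * R
    t_lo, t_hi = t & MASK128, t >> 128
    u = clmul_full(t_hi, R)          # t_hi is at most 7 bits, so u fits
    return lo ^ t_lo ^ u


def fold_shift_xor(v: int) -> int:
    """Multiply by R using only shifts and XORs."""
    return v ^ (v << 1) ^ (v << 2) ^ (v << 7)


def reduce_by_shift_xor(hi: int, lo: int) -> int:
    """Same two-step fold, with each multiply replaced by shifts and XORs."""
    t = fold_shift_xor(hi)
    t_lo, t_hi = t & MASK128, t >> 128
    return lo ^ t_lo ^ fold_shift_xor(t_hi)


def reduce_reference(value: int) -> int:
    """Bit-by-bit polynomial long division, used only as a cross-check."""
    for bit in range(value.bit_length() - 1, 127, -1):
        if (value >> bit) & 1:
            value ^= POLY << (bit - 128)
    return value
```

Both variants compute the same result. Which one is faster on a given implementation depends, as noted above, on the cost of the carry-less multiply relative to shifts and xors.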
Sure, I've split out the bit-reversal stuff into #18. I left the reduction discussion in because I'm not sure how to disentangle it. I hope that's alright.
A discussion / proposal for a carry-less multiply instruction to be added to the vector cryptography extensions.
This is motivated by the GHASH part of GCM.
The instructions can work similarly to the existing vector integer multiply instructions, along the lines of the four vclmul/vclmulh forms listed above.