At the moment (with #96 merged):

- `__builtin_bswap` is avoided, since together with `-mssse3` it leads to SSSE3 instructions that won't work on e.g. AMD Phenom II.
- The ChaCha source is copied so that it is compiled twice, once with and once without `-mssse3`.
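For illustration, byte-order conversion can be written without `__builtin_bswap` by assembling the word from individual bytes with shifts. This is a sketch assuming a big-endian 32-bit load/store is what the hash code needs; `load32_be` and `store32_be` are hypothetical helper names, not functions taken from the project. An optimizing compiler may still recognize this pattern and emit a byte-swap or, with `-mssse3`, a shuffle instruction, so on its own it is not a guarantee.

```c
#include <stdint.h>

/* Hypothetical helper: read a 32-bit big-endian value from a byte buffer
   using only shifts and ORs, without calling __builtin_bswap32. */
static inline uint32_t load32_be(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) |
           ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |
           ((uint32_t)p[3]);
}

/* Hypothetical helper: write a 32-bit value to a byte buffer in
   big-endian order, again without the builtin. */
static inline void store32_be(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}
```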
Newer compilers with more analysis may optimize even more code (in other hashes, ...) into SSSE3 instructions.
To avoid this potential miscompilation, "all" C code should be compiled twice:

- once with the machine intrinsics flags that are useful (at the moment SSSE3/AES/PCLMUL)
- once without any machine-dependent intrinsics flags
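The proposal is to build the same C source twice as separate objects with different flags. As a single-file stand-in, this sketch uses the GCC/Clang `target` function attribute to emit two copies of one function body, one of which the compiler is allowed to compile with SSSE3 instructions. The macro `DEFINE_XOR_BLOCK` and the `xor_block_*` functions are hypothetical, chosen only for illustration; the attribute is applied only on x86.

```c
#include <stddef.h>
#include <stdint.h>

/* On x86, mark one copy as allowed to use SSSE3; elsewhere both copies
   are plain C. */
#if defined(__x86_64__) || defined(__i386__)
#define ATTR_SSSE3 __attribute__((target("ssse3")))
#else
#define ATTR_SSSE3
#endif

/* Expand the same body once per feature set. The body itself must not
   depend on which copy it ends up in. */
#define DEFINE_XOR_BLOCK(suffix, attr)                                   \
    attr void xor_block_##suffix(uint8_t *dst, const uint8_t *src,       \
                                 size_t len)                             \
    {                                                                    \
        for (size_t i = 0; i < len; i++)                                 \
            dst[i] ^= src[i];                                            \
    }

/* Generic copy: default flags, runs on any CPU. */
DEFINE_XOR_BLOCK(generic, )

/* SSSE3 copy: the compiler may use SSSE3 instructions here, so it must
   only be called after a runtime feature check. */
DEFINE_XOR_BLOCK(ssse3, ATTR_SSSE3)
```

With two real object files the effect is the same, but each copy would need a distinct symbol name (here produced by the `suffix` parameter) so they can be linked together.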
Then all entry points (called from OCaml) should be dispatched at runtime based on the specific CPU feature flags.
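A minimal sketch of such a dispatched entry point, assuming GCC or Clang on x86, where the builtin `__builtin_cpu_supports("ssse3")` reports whether the running CPU has SSSE3. The functions `sum_bytes_generic`, `sum_bytes_ssse3` and `sum_bytes` are hypothetical; an OCaml stub would presumably call the dispatching function rather than either variant directly.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#define HAVE_X86 1
#define ATTR_SSSE3 __attribute__((target("ssse3")))
#else
#define HAVE_X86 0
#define ATTR_SSSE3
#endif

/* Portable variant: safe on every CPU. */
static uint32_t sum_bytes_generic(const uint8_t *p, size_t len)
{
    uint32_t acc = 0;
    for (size_t i = 0; i < len; i++)
        acc += p[i];
    return acc;
}

/* Same computation, but the compiler may use SSSE3 instructions here. */
static ATTR_SSSE3 uint32_t sum_bytes_ssse3(const uint8_t *p, size_t len)
{
    uint32_t acc = 0;
    for (size_t i = 0; i < len; i++)
        acc += p[i];
    return acc;
}

/* Nonzero if the CPU this process runs on supports SSSE3. */
static int cpu_has_ssse3(void)
{
#if HAVE_X86
    return __builtin_cpu_supports("ssse3");
#else
    return 0;
#endif
}

/* Entry point: pick the variant at runtime, so the SSSE3 copy never runs
   on CPUs without SSSE3 (such as the AMD Phenom II). */
uint32_t sum_bytes(const uint8_t *p, size_t len)
{
    if (cpu_has_ssse3())
        return sum_bytes_ssse3(p, len);
    return sum_bytes_generic(p, len);
}
```

Checking the flag on every call keeps the sketch free of shared mutable state; caching the choice in a function pointer is a common alternative but needs care if entry points can be called from several threads.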
The issue with this approach is that it is not yet clear which CPU features can be used in which settings, and which of them are actually useful (thinking of SSE4 etc. as well). To avoid a huge build matrix, time should be spent researching what is useful.
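As a possible starting point for that research, the sketch below reports which of the candidate features the current x86 CPU advertises, using the `__get_cpuid` helper from GCC's and Clang's `<cpuid.h>`. The bit positions (CPUID leaf 1, register ECX: bit 1 PCLMULQDQ, bit 9 SSSE3, bit 19 SSE4.1, bit 20 SSE4.2, bit 25 AES) follow Intel's documentation; the struct and function names are hypothetical.

```c
#include <stdio.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* Hypothetical summary of the features discussed in the issue. */
struct cpu_features {
    int ssse3, sse41, sse42, aes, pclmul;
};

/* Fill *f from CPUID leaf 1 (ECX). All fields stay 0 on non-x86 targets
   or if leaf 1 is unavailable. */
static void detect_cpu_features(struct cpu_features *f)
{
    f->ssse3 = f->sse41 = f->sse42 = f->aes = f->pclmul = 0;
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx)) {
        f->pclmul = (ecx >> 1) & 1;
        f->ssse3  = (ecx >> 9) & 1;
        f->sse41  = (ecx >> 19) & 1;
        f->sse42  = (ecx >> 20) & 1;
        f->aes    = (ecx >> 25) & 1;
    }
#endif
}

/* Print one line per machine, e.g. to collect data on which features
   the machines in question actually have. */
static void print_cpu_features(void)
{
    struct cpu_features f;
    detect_cpu_features(&f);
    printf("ssse3=%d sse4.1=%d sse4.2=%d aes=%d pclmul=%d\n",
           f.ssse3, f.sse41, f.sse42, f.aes, f.pclmul);
}
```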