Unify SHA-1's H() a.k.a F3() a.k.a SHA-2's Maj() implementations #1727
Comments
That was a good catch. This is why I cringe at people writing all the inline stuff just to gain a percent or two, thus HIDING the things that can easily yield better gains (such as an improved algorithm or other simplification tricks). I know we have recently done many things that unified code (the pbkdf2_*.h stuff is a great example).
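For readers following along, here is a minimal sketch of why SHA-1's H() (a.k.a. F3()) and SHA-2's Maj() can share one implementation: both are the bitwise majority of three words. The function names below are hypothetical and are not taken from the project's source; `select_bits()` is a plain C stand-in for OpenCL's `bitselect()`.

```c
#include <stdint.h>

/* Textbook majority as written in the specs: 5 bitwise operations. */
static uint32_t maj_ref(uint32_t x, uint32_t y, uint32_t z)
{
    return (x & y) | (x & z) | (y & z);
}

/* Same function with one operation factored out: 4 operations. */
static uint32_t maj_4op(uint32_t x, uint32_t y, uint32_t z)
{
    return (x & y) | (z & (x | y));
}

/* Hypothetical helper emulating OpenCL bitselect(a, b, c):
 * each result bit comes from a where c is 0, from b where c is 1. */
static uint32_t select_bits(uint32_t a, uint32_t b, uint32_t c)
{
    return (a & ~c) | (b & c);
}

/* Where x and z agree they are the majority, so take x;
 * where they differ, y breaks the tie. On hardware with a
 * native select instruction this costs 2 operations. */
static uint32_t maj_select(uint32_t x, uint32_t y, uint32_t z)
{
    return select_bits(x, y, z ^ x);
}

/* These three inputs cover all 8 combinations of (x, y, z) bits,
 * which is sufficient because every form is purely bitwise. */
static int maj_forms_agree(void)
{
    const uint32_t x = 0xF0F0F0F0u, y = 0xCCCCCCCCu, z = 0xAAAAAAAAu;
    const uint32_t ref = maj_ref(x, y, z);

    return maj_4op(x, y, z) == ref && maj_select(x, y, z) == ref;
}
```

Since all variants compute the same truth table, a single shared macro or inline function can serve both the SHA-1 and SHA-2 code paths, with the cheapest form chosen per target.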
Note to self:
SHA-2's Ch() implementations, using better optimized ones. OpenCL and CUDA formats. See #1727.
SHA-2's Ch() implementations, using better optimized ones. Intrinsics formats. See #1727.
All done. Re-assigning to @zzlei, please test/benchmark on NEON and Altivec if/when you can. I will test for regressions in OpenCL and Intel CPU.
I just tried it on Power. The only access I have to Power is through the GCC farm, and the results fluctuate so badly (too many users, perhaps). Here are just 3 consecutive runs:
I don't think I can get any useful benchmark results on this machine.
At least we know it's working 😄 What if you run a lot fewer threads, like 4 or 8?
Yes, that works! I'll post the result on john-dev.
http://www.openwall.com/lists/john-dev/2015/09/02/5
Ch() for CUDA and (non-bitselect) OpenCL to 3-op.
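As an illustration of what "3-op" means for Ch(), here is a small sketch comparing the textbook 4-operation form with the well-known 3-operation rewrite. The function names are hypothetical, chosen for this example only, and are not the project's actual macros.

```c
#include <stdint.h>

/* Textbook SHA-2 Ch(): AND, NOT, AND, XOR = 4 operations. */
static uint32_t ch_ref(uint32_t x, uint32_t y, uint32_t z)
{
    return (x & y) ^ (~x & z);
}

/* 3-operation form: XOR, AND, XOR.
 * Where x is 1: z ^ (y ^ z) = y. Where x is 0: z ^ 0 = z. */
static uint32_t ch_3op(uint32_t x, uint32_t y, uint32_t z)
{
    return z ^ (x & (y ^ z));
}

/* These three inputs cover all 8 combinations of (x, y, z) bits,
 * which is sufficient because both forms are purely bitwise. */
static int ch_forms_agree(void)
{
    const uint32_t x = 0xF0F0F0F0u, y = 0xCCCCCCCCu, z = 0xAAAAAAAAu;

    return ch_ref(x, y, z) == ch_3op(x, y, z);
}
```

On OpenCL devices that provide `bitselect()`, the same function can be written as `bitselect(z, y, x)`, which is presumably why the change above is limited to the non-bitselect path.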