Unify SHA-1's H() a.k.a F3() a.k.a SHA-2's Maj() implementations #1727

magnumripper · 2015-09-02T18:49:50Z

http://www.openwall.com/lists/john-dev/2015/09/02/5

Change all OpenCL definitions using bitselect to 2-op version.
Change all OpenCL non bitselect fallbacks, and CUDA versions, to 4-op version.
Change all Ch() for CUDA and (non-bitselect) OpenCL to 3-op.
Same for SIMD intrinsics.
Have a look at the scalar plain C stuff while at it.
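For reference, a minimal scalar C sketch of formulations that match those op counts. These are well-known identities, not necessarily the exact macros in the tree. `bitselect32` is a hypothetical stand-in for OpenCL's `bitselect()`, and the 2-op and 1-op counts only hold where the select is a single instruction.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of OpenCL bitselect(a, b, c):
 * take the bit from b where c is 1, otherwise from a. */
static inline uint32_t bitselect32(uint32_t a, uint32_t b, uint32_t c)
{
    return (a & ~c) | (b & c);
}

/* Textbook definitions, used as the reference. */
static inline uint32_t maj_ref(uint32_t x, uint32_t y, uint32_t z)
{
    return (x & y) ^ (x & z) ^ (y & z);
}

static inline uint32_t ch_ref(uint32_t x, uint32_t y, uint32_t z)
{
    return (x & y) ^ (~x & z);
}

/* Maj with a select: one XOR plus one select (2 ops). */
static inline uint32_t maj_2op(uint32_t x, uint32_t y, uint32_t z)
{
    return bitselect32(x, y, z ^ x);
}

/* Maj without a select: AND, OR, AND, OR (4 ops). */
static inline uint32_t maj_4op(uint32_t x, uint32_t y, uint32_t z)
{
    return (x & y) | (z & (x | y));
}

/* Ch without a select: XOR, AND, XOR (3 ops). */
static inline uint32_t ch_3op(uint32_t x, uint32_t y, uint32_t z)
{
    return z ^ (x & (y ^ z));
}

/* Ch with a select: a single select. */
static inline uint32_t ch_sel(uint32_t x, uint32_t y, uint32_t z)
{
    return bitselect32(z, y, x);
}

/* These three constants cover all 8 (x, y, z) bit combinations,
 * which is enough to prove a bitwise identity. Returns the number
 * of variants that disagree with the reference. */
static int count_mismatches(void)
{
    const uint32_t x = 0xF0F0F0F0u, y = 0xCCCCCCCCu, z = 0xAAAAAAAAu;
    int bad = 0;

    bad += maj_2op(x, y, z) != maj_ref(x, y, z);
    bad += maj_4op(x, y, z) != maj_ref(x, y, z);
    bad += ch_3op(x, y, z) != ch_ref(x, y, z);
    bad += ch_sel(x, y, z) != ch_ref(x, y, z);
    return bad;
}
```

Calling `count_mismatches()` from a test harness should give 0 if all variants agree with the reference.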

jfoug · 2015-09-02T19:01:02Z

That was a good catch. It is why I cringe at people writing all the inline stuff just to gain a percent or two, thus HIDING the things that can easily make bigger gains (such as an improved algorithm or other simplification tricks). I know we have done many things recently that unified code (the pbkdf2_*.h stuff is a great example).

magnumripper · 2015-09-02T21:45:59Z

Note to self: bitselect(x, y, z) in XOP is _mm_cmov_si128(y, x, z) (mind the order). z is inverted.
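A scalar sketch of that operand-order mapping, assuming the documented semantics of the two operations. `bitselect32` and `cmov32` are hypothetical 32-bit models, not the real OpenCL built-in or XOP intrinsic.

```c
#include <assert.h>
#include <stdint.h>

/* Model of OpenCL bitselect(a, b, c):
 * bit from b where c is 1, otherwise bit from a. */
static inline uint32_t bitselect32(uint32_t a, uint32_t b, uint32_t c)
{
    return (a & ~c) | (b & c);
}

/* Model of XOP VPCMOV as exposed by _mm_cmov_si128(a, b, sel):
 * bit from a where sel is 1, otherwise bit from b. */
static inline uint32_t cmov32(uint32_t a, uint32_t b, uint32_t sel)
{
    return (a & sel) | (b & ~sel);
}

/* Returns 1 if bitselect(x, y, z) equals cmov(y, x, z) for inputs
 * that cover all 8 per-bit combinations, 0 otherwise. */
static int operand_order_matches(void)
{
    const uint32_t x = 0xF0F0F0F0u, y = 0xCCCCCCCCu, z = 0xAAAAAAAAu;

    return bitselect32(x, y, z) == cmov32(y, x, z);
}
```

In this model the selector stays in the last position and only the two data operands swap places.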

magnumripper · 2015-09-02T23:06:47Z

All done.

Re-assigning to @zzlei, please test/benchmark on NEON and Altivec if/when you can. I will test for regressions in OpenCL and Intel CPU.

magnumripper · 2015-09-03T17:47:23Z

Added e8703bb and 957a538 too, after realizing MD4/5 F() is also the same as Ch().

magnumripper · 2015-09-05T17:54:48Z

Oh, and MD4 G() is the same as SHA-2 Maj(): 7071b4a. And Solar found a new way of doing MD5 I() using one fewer op: 382a961.
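The F() and G() identities can be checked against the RFC 1320/1321 textbook definitions with a few lines of scalar C. The select-based I() below is only one possible way to save an op where a select is a single instruction; it is an assumption for illustration and not necessarily what 382a961 does. `bitselect32` is again a hypothetical model of OpenCL's `bitselect()`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of OpenCL bitselect(a, b, c). */
static inline uint32_t bitselect32(uint32_t a, uint32_t b, uint32_t c)
{
    return (a & ~c) | (b & c);
}

/* Textbook MD4/MD5 round functions. */
static inline uint32_t md_f(uint32_t x, uint32_t y, uint32_t z)
{
    return (x & y) | (~x & z);
}

static inline uint32_t md4_g(uint32_t x, uint32_t y, uint32_t z)
{
    return (x & y) | (x & z) | (y & z);
}

static inline uint32_t md5_i(uint32_t x, uint32_t y, uint32_t z)
{
    return y ^ (x | ~z);
}

/* Optimized SHA-2 style forms: 3-op Ch and 4-op Maj. */
static inline uint32_t ch_3op(uint32_t x, uint32_t y, uint32_t z)
{
    return z ^ (x & (y ^ z));
}

static inline uint32_t maj_4op(uint32_t x, uint32_t y, uint32_t z)
{
    return (x & y) | (z & (x | y));
}

/* Assumed select-based I(): selecting between all-ones and x by z
 * yields (x | ~z), so I() becomes one select plus one XOR. */
static inline uint32_t md5_i_sel(uint32_t x, uint32_t y, uint32_t z)
{
    return y ^ bitselect32(0xFFFFFFFFu, x, z);
}

/* Returns 1 if all three identities hold for inputs that cover
 * all 8 per-bit combinations, 0 otherwise. */
static int md_identities_hold(void)
{
    const uint32_t x = 0xF0F0F0F0u, y = 0xCCCCCCCCu, z = 0xAAAAAAAAu;

    return md_f(x, y, z) == ch_3op(x, y, z)
        && md4_g(x, y, z) == maj_4op(x, y, z)
        && md5_i(x, y, z) == md5_i_sel(x, y, z);
}
```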

lei-april · 2015-09-07T15:17:56Z

I just tried it on Power. The only access I have to Power is through the GCC farm, and the results fluctuate badly (too many users, perhaps).

Here are just 3 consecutive runs:

[zlei@gcc2-power8 src]$ ../run/john --test --format=pbkdf2-hmac-sha1
Will run 152 OpenMP threads
Benchmarking: PBKDF2-HMAC-SHA1 [PBKDF2-SHA1 128/128 AltiVec 4x]... (152xOMP) DONE
Speed for cost 1 (iteration count) of 1000
Raw:    29257 c/s real, 2307 c/s virtual

[zlei@gcc2-power8 src]$ ../run/john --test --format=pbkdf2-hmac-sha1
Will run 152 OpenMP threads
Benchmarking: PBKDF2-HMAC-SHA1 [PBKDF2-SHA1 128/128 AltiVec 4x]... (152xOMP) DONE
Speed for cost 1 (iteration count) of 1000
Raw:    133032 c/s real, 1706 c/s virtual

[zlei@gcc2-power8 src]$ ../run/john --test --format=pbkdf2-hmac-sha1
Will run 152 OpenMP threads
Benchmarking: PBKDF2-HMAC-SHA1 [PBKDF2-SHA1 128/128 AltiVec 4x]... (152xOMP) DONE
Speed for cost 1 (iteration count) of 1000
Raw:    93388 c/s real, 1609 c/s virtual

I don't think I can get any useful benchmark results on this machine.

magnumripper · 2015-09-07T21:36:48Z

At least we know it's working 😄

What if you run a lot fewer threads, like 4 or 8?

lei-april · 2015-09-08T01:40:42Z

> What if you run a lot fewer threads, like 4 or 8?

Yes, that works! I'll post the result on john-dev.

magnumripper added the enhancement label Sep 2, 2015

magnumripper self-assigned this Sep 2, 2015

magnumripper added this to the 1.8.0-jumbo-2 milestone Sep 2, 2015

magnumripper added a commit that referenced this issue Sep 2, 2015

c5f50a9: Unify SHA-1's H() a.k.a F3() a.k.a SHA-2's Maj() as well as SHA-2's Ch() implementations, using better optimized ones. OpenCL and CUDA formats. See #1727.

magnumripper added a commit that referenced this issue Sep 2, 2015

5916a57: Unify SHA-1's H() a.k.a F3(), a.k.a SHA-2's Maj() as well as SHA-2's Ch() implementations, using better optimized ones. Intrinsics formats. See #1727.

magnumripper assigned lei-april and unassigned magnumripper Sep 2, 2015

magnumripper closed this as completed Sep 14, 2015

magnumripper mentioned this issue May 17, 2021

SHA-2 Maj, SHA-1 H, MD4 G trick #4727 (Open)
