Reverse 4 rounds more in SHA-256 and SHA-512 OpenCL kernels #2227

solardiz · 2016-08-20T20:36:21Z

Just off Twitter:

<hashcat> New SHA2 meet-in-the-middle optimization quick writeup: https://hashcat.net/forum/thread-5778.html
<@solardiz> @hashcat So it's 57 steps for SHA-256, 73 for SHA-512, 61 for SHA-224, 77 for SHA-384. That's what we have in simd-intrinsics.c.
<@hashcat> @solardiz I made sure it's not in JtR, but i was looking in kernels/sha256_kernel.cl -- not there
<@solardiz> @hashcat Yeah. We also lack on-GPU mask mode for raw SHA-512, but we do have it for raw SHA-256, so makes sense to add this opt.
<@hashcat> @solardiz Good, it then explains why I couldn't see it
<@solardiz> @hashcat Thanks for reminding us to add it for OpenCL. Looks like Aleksey Cherepanov was first to find it http://www.openwall.com/lists/john-dev/2015/09/23/2

We also have these in CUDA. Perhaps we no longer care to optimize the CUDA code (it also lacks mask mode), but we could add comments there about possible, not yet implemented optimizations, along with a suggestion to look at and use the OpenCL kernels instead.


claudioandre-br · 2016-08-22T23:46:08Z

When magnum did this I remember I had problems altering the kernel in order to achieve the desired fit on NVIDIA. In fact, I remember I poked him and asked about 'numbers'.
AMD gave me good results.

I can try it again sooner or later.

magnumripper · 2016-08-22T23:49:55Z

You should have it depend on some macro with #ifdef's. Then you can disable it for certain drivers, vendors or devices if needed. It also makes it easier to test/verify the boost on any particular gear.
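
A minimal sketch of the macro approach suggested above, written as plain C so it also applies to OpenCL C. The macro name `SHA256_REVERSE_STEPS` and the helper `sha256_steps_to_run()` are hypothetical and not taken from the JtR sources. The value 57 is the SHA-256 step count quoted earlier in this thread, and 64 is the full SHA-256 round count.

```c
/* Hypothetical switch: build with -DSHA256_REVERSE_STEPS=0 to disable the
 * optimization for a given driver, vendor or device. */
#ifndef SHA256_REVERSE_STEPS
#define SHA256_REVERSE_STEPS 1
#endif

#define SHA256_FULL_STEPS 64    /* full SHA-256 compression function */

#if SHA256_REVERSE_STEPS
#define SHA256_STEPS 57         /* step count quoted in this thread */
#else
#define SHA256_STEPS SHA256_FULL_STEPS
#endif

/* Number of steps the kernel loop would execute in this build. */
static int sha256_steps_to_run(void)
{
    return SHA256_STEPS;
}
```

Building the same source twice, once with the macro set to 1 and once with 0, would give the two kernels needed to compare speed on a particular device.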

claudioandre-br · 2017-03-24T20:09:21Z

To avoid data transfers from GPU to CPU, I'm using a Bloom filter.
Scenario:
- Loaded 4000002 password hashes with no different salts (Raw-SHA256-opencl [SHA256 OpenCL])
- Using mask mode on super, each GPU crypt_all() call hashes (and discards) more than 110 million keys.
- Of course, the GPU can produce false positives.
- In this case, some checks are done on the CPU: 1515 (0.0014%)
- So, as it is now, we copy 1 hash (3 bytes) for every 70,000 discarded on the GPU.
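
For readers unfamiliar with the technique, here is a minimal Bloom filter sketch in C. It is an illustration only and not the filter used in the JtR kernels: the filter size, the type `bloom_t`, the function names, and the choice of two hash words as bit indexes are all assumptions.

```c
#include <stdint.h>
#include <string.h>

#define BLOOM_BITS (1u << 16)   /* hypothetical filter size, power of two */

typedef struct {
    uint32_t bits[BLOOM_BITS / 32];
} bloom_t;

static void bloom_init(bloom_t *b)
{
    memset(b->bits, 0, sizeof b->bits);
}

/* Set two bits, indexed by the low bits of the first two hash words. */
static void bloom_add(bloom_t *b, const uint32_t *hash)
{
    uint32_t i = hash[0] & (BLOOM_BITS - 1);
    uint32_t j = hash[1] & (BLOOM_BITS - 1);

    b->bits[i >> 5] |= 1u << (i & 31);
    b->bits[j >> 5] |= 1u << (j & 31);
}

/* Returns 0 if the hash is definitely not loaded (discard on the device),
 * 1 if it may be loaded (a false positive is possible, so verify on the host). */
static int bloom_maybe_contains(const bloom_t *b, const uint32_t *hash)
{
    uint32_t i = hash[0] & (BLOOM_BITS - 1);
    uint32_t j = hash[1] & (BLOOM_BITS - 1);

    return ((b->bits[i >> 5] >> (i & 31)) & 1u) &&
           ((b->bits[j >> 5] >> (j & 31)) & 1u);
}
```

Only candidates for which `bloom_maybe_contains()` returns 1 would need to be copied back for an exact comparison, which is why the false positive rate drives the transfer volume.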

But when I reverse steps, I have fewer bytes to use for filtering, and the data transfers increase. In the same scenario:

Some checks are done on the CPU: 208320 (0.0939%)

Since GPU->CPU transfers are slow, the result is a 300000 Kp/s penalty.

That said, based on how many hashes have been loaded to crack, I can:

- on reset(), set fmt->binary to its regular or reversed version (it is NOT possible);
- on crypt_all(), call the "right" kernel.

- Small set of loaded hashes -> reversed version
- Big set of loaded hashes -> full version.
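
The selection rule above could be sketched as follows. This is a hypothetical illustration: the enum, the function `choose_kernel()`, and the threshold value are assumptions, and the thread does not state where the cutoff between a "small" and a "big" set should be.

```c
#include <stddef.h>

enum kernel_variant {
    KERNEL_FULL,        /* all steps computed, more bytes available for filtering */
    KERNEL_REVERSED     /* fewer steps computed, fewer bytes available for filtering */
};

/* Hypothetical cutoff; a real value would have to be found by benchmarking. */
#define REVERSED_MAX_LOADED_HASHES 1000u

/* Pick the kernel variant from the number of loaded hashes. */
static enum kernel_variant choose_kernel(size_t loaded_hashes)
{
    return loaded_hashes <= REVERSED_MAX_LOADED_HASHES ? KERNEL_REVERSED
                                                       : KERNEL_FULL;
}
```

With the 4000002 hashes from the scenario above, this sketch would pick the full version.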

The speed gain while reversing is > 7%.

claudioandre-br · 2018-12-15T17:51:26Z

Bug closed because it is:

- Low Priority - These have a tendency to become high priority if enough users complain.
- WontFix - Covers all the reasons we chose to close the bug without taking action.
  - Keeping it as is reduces false positives.

solardiz · 2018-12-15T18:08:06Z

Maybe rather than work on this issue directly, we can add a source code comment briefly explaining that rounds reversing is potentially possible and why it's not done - similar to Claudio's comments here from 2017. Otherwise that useful info will rot in the GitHub comment and won't be recalled when needed. Claudio, would you do that? I'd appreciate it. Thanks!

solardiz · 2018-12-15T20:14:16Z

Re-opening for my suggestion above (add a source code comment).

magnumripper assigned claudioandre-br Aug 21, 2016

magnumripper added the enhancement label Aug 21, 2016

claudioandre-br closed this as completed Dec 15, 2018

solardiz reopened this Dec 15, 2018

claudioandre-br mentioned this issue Dec 17, 2018

Comment about rawSHA256/512 rounds reversing #3519

Merged

magnumripper closed this as completed in #3519 Dec 19, 2018
