Reverse 4 rounds more in SHA-256 and SHA-512 OpenCL kernels #2227

Closed
solardiz opened this issue Aug 20, 2016 · 6 comments

@solardiz
Member

Just off Twitter:

<hashcat> New SHA2 meet-in-the-middle optimization quick writeup: https://hashcat.net/forum/thread-5778.html
<@solardiz> @hashcat So it's 57 steps for SHA-256, 73 for SHA-512, 61 for SHA-224, 77 for SHA-384. That's what we have in simd-intrinsics.c.
<@hashcat> @solardiz I made sure it's not in JtR, but i was looking in kernels/sha256_kernel.cl -- not there
<@solardiz> @hashcat Yeah. We also lack on-GPU mask mode for raw SHA-512, but we do have it for raw SHA-256, so makes sense to add this opt.
<@hashcat> @solardiz Good, it then explains why I couldn't see it
<@solardiz> @hashcat Thanks for reminding us to add it for OpenCL. Looks like Aleksey Cherepanov was first to find it http://www.openwall.com/lists/john-dev/2015/09/23/2

We also have these in CUDA - perhaps we don't care to optimize the CUDA code anymore (and it also lacks mask), but maybe add comments about possible yet unimplemented optimizations in there, along with a suggestion to look at and use the OpenCL kernels instead.
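As a rough illustration of where a figure like 57 steps for SHA-256 can come from, here is a minimal sketch in C. It is not code from JtR or hashcat, and the function names (`sha256_one_block`, `recover_a56`) are made up for this example. It relies only on the standard SHA-256 round structure: after subtracting the initial values from a target hash, the last four `a` and `e` values are known, and four more `a` values can be derived from them without knowing any message word. The earliest of these is produced by round 56 (counting from 0), so a candidate can be rejected after computing 57 steps instead of 64. SHA-512 has the same round structure with 80 rounds, which is presumably where 73 comes from.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static const uint32_t K[64] = {
    0x428a2f98,0x71374491,0xb5c0fbcf,0xe9b5dba5,0x3956c25b,0x59f111f1,0x923f82a4,0xab1c5ed5,
    0xd807aa98,0x12835b01,0x243185be,0x550c7dc3,0x72be5d74,0x80deb1fe,0x9bdc06a7,0xc19bf174,
    0xe49b69c1,0xefbe4786,0x0fc19dc6,0x240ca1cc,0x2de92c6f,0x4a7484aa,0x5cb0a9dc,0x76f988da,
    0x983e5152,0xa831c66d,0xb00327c8,0xbf597fc7,0xc6e00bf3,0xd5a79147,0x06ca6351,0x14292967,
    0x27b70a85,0x2e1b2138,0x4d2c6dfc,0x53380d13,0x650a7354,0x766a0abb,0x81c2c92e,0x92722c85,
    0xa2bfe8a1,0xa81a664b,0xc24b8b70,0xc76c51a3,0xd192e819,0xd6990624,0xf40e3585,0x106aa070,
    0x19a4c116,0x1e376c08,0x2748774c,0x34b0bcb5,0x391c0cb3,0x4ed8aa4a,0x5b9cca4f,0x682e6ff3,
    0x748f82ee,0x78a5636f,0x84c87814,0x8cc70208,0x90befffa,0xa4506ceb,0xbef9a3f7,0xc67178f2
};
static const uint32_t IV[8] = {
    0x6a09e667,0xbb67ae85,0x3c6ef372,0xa54ff53a,0x510e527f,0x9b05688c,0x1f83d9ab,0x5be0cd19
};

static uint32_t rotr(uint32_t x, int n) { return (x >> n) | (x << (32 - n)); }
static uint32_t S0(uint32_t x) { return rotr(x, 2) ^ rotr(x, 13) ^ rotr(x, 22); }
static uint32_t S1(uint32_t x) { return rotr(x, 6) ^ rotr(x, 11) ^ rotr(x, 25); }
static uint32_t s0(uint32_t x) { return rotr(x, 7) ^ rotr(x, 18) ^ (x >> 3); }
static uint32_t s1(uint32_t x) { return rotr(x, 17) ^ rotr(x, 19) ^ (x >> 10); }
static uint32_t Ch(uint32_t e, uint32_t f, uint32_t g) { return (e & f) ^ (~e & g); }
static uint32_t Maj(uint32_t a, uint32_t b, uint32_t c) { return (a & b) ^ (a & c) ^ (b & c); }

/* Plain SHA-256 of a message shorter than 56 bytes (a single block).
 * a_trace[t] records the value of 'a' produced by round t (0..63). */
static void sha256_one_block(const char *msg, uint32_t digest[8], uint32_t a_trace[64])
{
    uint8_t block[64] = {0};
    uint32_t w[64], s[8];
    size_t len = strlen(msg);

    assert(len < 56);
    memcpy(block, msg, len);
    block[len] = 0x80;                      /* padding bit */
    block[62] = (uint8_t)((len * 8) >> 8);  /* bit length, big-endian */
    block[63] = (uint8_t)(len * 8);
    for (int i = 0; i < 16; i++)
        w[i] = (uint32_t)block[4 * i] << 24 | (uint32_t)block[4 * i + 1] << 16 |
               (uint32_t)block[4 * i + 2] << 8 | block[4 * i + 3];
    for (int i = 16; i < 64; i++)
        w[i] = s1(w[i - 2]) + w[i - 7] + s0(w[i - 15]) + w[i - 16];

    memcpy(s, IV, sizeof s);
    for (int t = 0; t < 64; t++) {
        uint32_t t1 = s[7] + S1(s[4]) + Ch(s[4], s[5], s[6]) + K[t] + w[t];
        uint32_t t2 = S0(s[0]) + Maj(s[0], s[1], s[2]);
        s[7] = s[6]; s[6] = s[5]; s[5] = s[4]; s[4] = s[3] + t1;
        s[3] = s[2]; s[2] = s[1]; s[1] = s[0]; s[0] = t1 + t2;
        a_trace[t] = s[0];
    }
    for (int i = 0; i < 8; i++)
        digest[i] = s[i] + IV[i];
}

/* From the target digest alone, recover the 'a' value produced by round 56.
 * Uses a[t-4] = e[t] - a[t] + S0(a[t-1]) + Maj(a[t-1], a[t-2], a[t-3]). */
static uint32_t recover_a56(const uint32_t digest[8])
{
    uint32_t s[8];
    for (int i = 0; i < 8; i++)
        s[i] = digest[i] - IV[i];           /* undo the final feed-forward */

    uint32_t a63 = s[0], a62 = s[1], a61 = s[2], a60 = s[3];
    uint32_t a59 = s[4] - a63 + S0(a62) + Maj(a62, a61, a60);
    uint32_t a58 = s[5] - a62 + S0(a61) + Maj(a61, a60, a59);
    uint32_t a57 = s[6] - a61 + S0(a60) + Maj(a60, a59, a58);
    return s[7] - a60 + S0(a59) + Maj(a59, a58, a57);
}
```

In a cracker, `recover_a56()` would be applied once per loaded hash at load time, and the per-candidate code would stop after round 56 and compare a single 32-bit word. That leaves only one word for early rejection, which matters for the false-positive discussion later in this thread.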

@claudioandre-br
Member

When magnum did this, I remember having problems altering the kernel to achieve the desired fit on NVIDIA. In fact, I remember poking him to ask about 'numbers'.
AMD gave me good results.

I can try it again sooner or later.

@magnumripper
Member

You should have it depend on some macro with #ifdef's. Then you can disable it for certain drivers, vendors or devices if needed. It also makes it easier to test/verify the boost on any particular gear.
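A minimal sketch of that idea, assuming a hypothetical macro name `SHA256_REVERSE_ROUNDS` and a made-up vendor policy (neither is taken from the JtR sources). The kernel side picks the step count from the macro, and the host side turns the decision into a `-D` build option of the kind OpenCL compilers accept. The 57 and 64 values are for illustration only.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Kernel side: hypothetical switch, enabled by default and
 * overridable at build time with -DSHA256_REVERSE_ROUNDS=0. */
#ifndef SHA256_REVERSE_ROUNDS
#define SHA256_REVERSE_ROUNDS 1
#endif

#if SHA256_REVERSE_ROUNDS
#define SHA256_STEPS 57   /* stop early, compare against a reversed target */
#else
#define SHA256_STEPS 64   /* compute the full hash */
#endif

static int sha256_steps(void) { return SHA256_STEPS; }

/* Host side: example policy only. Disables the optimization when the
 * vendor string contains "NVIDIA"; a real policy would be based on
 * measured speeds for each driver or device. */
static int want_reversed_rounds(const char *vendor)
{
    assert(vendor != NULL);
    return strstr(vendor, "NVIDIA") == NULL;
}

/* Format the build option that would be passed to the kernel compiler. */
static void build_options(const char *vendor, char *out, size_t n)
{
    snprintf(out, n, "-DSHA256_REVERSE_ROUNDS=%d", want_reversed_rounds(vendor));
}
```

Building the same kernel twice, once with the macro set to 0 and once with 1, also gives a direct way to measure the boost on a given device.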

@claudioandre-br
Member

claudioandre-br commented Mar 24, 2017

  • To avoid data transfers from GPU to CPU, I'm using a Bloom filter.
  • Scenario:
    • Loaded 4000002 password hashes with no different salts (Raw-SHA256-opencl [SHA256 OpenCL])
    • Using mask on super, each GPU crypt_all() will hash (and discard) more than 110 million keys.
    • Of course, the GPU can produce false positives.
    • In this case: Some checks are going to be done on CPU: 1515: 0,0014%
    • So, as it is now, we copy 1 hash (3 bytes) for every 70,000 discarded on the GPU.
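For readers unfamiliar with the technique, a Bloom filter check of this kind can be sketched as follows. This is a generic illustration, not the filter used in the Raw-SHA256-opencl format: the filter size, the number of probes, and the names `bloom_add` and `bloom_maybe_contains` are all assumptions. Each loaded hash sets a few bits; a computed hash is copied back to the CPU only if all of its bits are set, so there are no false negatives but some false positives.

```c
#include <stdint.h>

#define BLOOM_BITS (1u << 20)              /* filter size in bits (assumed) */

typedef struct { uint32_t bits[BLOOM_BITS / 32]; } bloom_t;

static void bloom_set(bloom_t *b, uint32_t i)
{
    i &= BLOOM_BITS - 1;
    b->bits[i >> 5] |= 1u << (i & 31);
}

static int bloom_get(const bloom_t *b, uint32_t i)
{
    i &= BLOOM_BITS - 1;
    return (b->bits[i >> 5] >> (i & 31)) & 1;
}

/* Load time: register two 32-bit words of a loaded hash. */
static void bloom_add(bloom_t *b, const uint32_t hash[2])
{
    bloom_set(b, hash[0]);
    bloom_set(b, hash[1]);
}

/* Per candidate: returns 0 if the hash is definitely not loaded,
 * 1 if it may be loaded and needs an exact check on the CPU. */
static int bloom_maybe_contains(const bloom_t *b, const uint32_t hash[2])
{
    return bloom_get(b, hash[0]) && bloom_get(b, hash[1]);
}
```

The more independent hash words that can be probed, the lower the false-positive rate for a given filter size.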

But when I reverse steps, I have fewer bytes to use for filtering, and the data transfers increase. In the same scenario:

  • Some checks are going to be done on CPU: 208320: 0,0939%

Since GPU->CPU transfers are slow, the result is a 300000Kp/s penalty.


That said, depending on how many hashes have been loaded to crack, I could:

  • on reset(), set fmt->binary to its regular or reversed version (this is NOT possible);
  • on crypt_all(), call the "right" kernel.

Small set of loaded hashes -> reversed version
Big set of loaded hashes -> full version.
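A minimal sketch of the crypt_all() side of that choice. The threshold value and the names `kernel_choice` and `pick_kernel` are hypothetical; the right cut-off would have to be found by benchmarking, and the sketch does not address the fmt->binary problem noted above.

```c
#include <stddef.h>

typedef enum { KERNEL_REVERSED, KERNEL_FULL } kernel_choice;

/* Hypothetical cut-off: above this many loaded hashes, the extra
 * false positives of the reversed kernel are assumed to cost more
 * than the rounds it saves. */
#define REVERSED_MAX_HASHES 100000u

static kernel_choice pick_kernel(size_t loaded_hashes)
{
    return loaded_hashes <= REVERSED_MAX_HASHES ? KERNEL_REVERSED : KERNEL_FULL;
}
```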


The speed gain from reversing is > 7%.

@claudioandre-br
Member

claudioandre-br commented Dec 15, 2018

Bug closed as:

  • Low Priority - These have a tendency to become high priority if enough users complain.

  • WontFix - Covers all the reasons we chose to close the bug without taking action:

    • keeping it as is reduces false positives.

@solardiz
Member Author

Maybe rather than work on this issue directly, we can add a source code comment briefly explaining that reversing rounds is possible and why it's not done - similar to Claudio's comments here from 2017. Otherwise that useful info will rot in the GitHub comment and won't be recalled when needed. Claudio, would you do that? I'd appreciate it. Thanks!

@solardiz
Member Author

Re-opening for my suggestion above (add a source code comment).
