Reverse 4 rounds more in SHA-256 and SHA-512 OpenCL kernels #2227
Comments
When magnum did this, I remember having problems altering the kernel to achieve the desired fit on NVIDIA. In fact, I remember poking him and asking about 'numbers'. I can try it again sooner or later.
You should have it depend on some macro with #ifdefs. Then you can disable it for certain drivers, vendors, or devices if needed. It also makes it easier to test and verify the boost on any particular gear.
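A minimal sketch of that kind of gating, written as plain C rather than the actual kernel code. The macro names `NO_EXTRA_REVERSE` and `EXTRA_REVERSED_ROUNDS` and the helper `forward_rounds` are hypothetical and not taken from the project source; the idea is only that one build option switches the extra reversal off for a given driver, vendor, or device.

```c
#include <stdint.h>

/*
 * Hypothetical opt-out switch: pass -DNO_EXTRA_REVERSE in the kernel build
 * options for a driver, vendor, or device where the extra reversal does not
 * pay off. Both macro names are assumptions made for this illustration.
 */
#ifdef NO_EXTRA_REVERSE
#define EXTRA_REVERSED_ROUNDS 0
#else
#define EXTRA_REVERSED_ROUNDS 4
#endif

/*
 * Rounds left to compute in the forward direction, given the total round
 * count and how many rounds the caller already reverses.
 */
static uint32_t forward_rounds(uint32_t total_rounds, uint32_t already_reversed)
{
    return total_rounds - already_reversed - EXTRA_REVERSED_ROUNDS;
}
```

Building the same source once with and once without the opt-out macro gives two variants to benchmark against each other on a given device.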
But when I reverse steps, I have fewer bytes to use for filtering, and the data transfers increase. In the same scenario:
Since GPU->CPU transfers are slow, the result is a 300000Kp/s penalty. That said, based on how many hashes have been loaded to crack, I can:
Small set of loaded hashes -> reversed version
The speed gain while reversing is > 7%.
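The choice described here could be made on the host side before building the kernel. The sketch below is only an illustration: the cutoff value, the function names, and the `-DNO_EXTRA_REVERSE` build option are all hypothetical, since the thread gives no threshold; a real cutoff would have to be measured per device.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical cutoff: the thread gives no number, so this is a placeholder. */
#define REVERSE_MAX_LOADED_HASHES 1000u

/* Small set of loaded hashes -> use the variant that reverses extra rounds. */
static bool use_reversed_kernel(size_t loaded_hashes)
{
    return loaded_hashes <= REVERSE_MAX_LOADED_HASHES;
}

/*
 * Build option to hand to the kernel compiler. The option name is an
 * assumption for this sketch; an empty string keeps the reversed variant.
 */
static const char *kernel_build_option(size_t loaded_hashes)
{
    return use_reversed_kernel(loaded_hashes) ? "" : "-DNO_EXTRA_REVERSE";
}
```

With many loaded hashes the weaker filter lets more candidates through to the slow GPU->CPU transfer, so the sketch falls back to the non-reversed build in that case.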
Bug closed because it:
Maybe rather than work on this issue directly, we can add a source code comment briefly explaining that rounds reversing is potentially possible and why it's not done - similar to Claudio's comments here from 2017. Otherwise that useful info will rot in the GitHub comment and won't be recalled when needed. Claudio, would you do that? I'd appreciate it. Thanks!
Re-opening for my suggestion above (add a source code comment).
Just off Twitter:
We also have these in CUDA - perhaps we don't care to optimize the CUDA code anymore (and it also lacks mask), but maybe add comments about possible yet unimplemented optimizations in there, along with a suggestion to look at and use the OpenCL kernels instead.