-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
descrypt-opencl: Compile per-salt kernels using OpenMP #1618
Comments
It's doable but I don't expect speed up of 32x or anywhere near it due to disk IO limitations. I'll move the code around anyway because this format doesn't support LWS autotune. Also, if we are to build all kernels, we'd do it in reset before starting actual cracking. So, it should be easy. |
Even if it's not 32x I'm sure it will be several hundred percents faster. Building JtR with -j32 or not is a huge difference and it's nearly the same thing. You could add a macro to opencl_DES_hst_dev_shared.h, for having it optional (in case there are problems on some systems, as Solar imagined).
|
Bump! Now even more kernels seem to be used just for -test, and they don't even seem to be cached to disk? I had to cancel it after it compiled 45 kernels. All formats except yours does cache binaries, except for nvidia devices which has caching in the driver. Regardless, I really think parallelizing builds with OpenMP would be a good idea. |
Hmm I see on super's Tahiti, only 9 kernels were built for -test. And on Titan X, 12 were built. On my macbook using Intel Graphics HD5000, like I said I aborted it after 45... How come this difference? |
Oh, maybe it's because of the "autotune fail"...
|
Are clCreateProgramWithSource and clGetProgramBuildInfo thread safe ? Even if they are, they operate on one variable 'program[sequential_id]', making it unsafe. |
In OpenCL 1.1. all functions are thread-safe except for clSetKernelArg(). But that one is quick so we'd just call it sequentially. |
I think program object needs to be thread safe. i.e 'program[sequential_id][thread_id]'. Otherwise we can't call clBuildProgram in parallel. |
You may be right. We could probably work around it if we want to but it might be more complex than I hoped. |
Apart from this, include_source() function is not thread safe. Also we must ensure kernel_source is not being modified while building kernel and no instance of opencl_read_source() is running in parallel. |
Our own functions are no problem. I'll just make them thread-safe. |
I just made some changes required for parallel build and thread safety. Please review 0dbaf3a. |
I'll try to digest it. I was thinking we could pass a (thread local) buffer (or rather a pointer) to opencl_read_source() and then pass the same pointer to opencl_build() (or vice versa, something along the lines of that). The fact we had hard-coded program[0] was kind of funny :fail: |
program[0] is a local object. Not the global one. We'll be passing a thread local program buffer to opencl_build() and build_from_binary(). |
As far as I can see it's a global, declared as |
Unfortunately, we have same name for global program object and the pointer to program object(as function argument). I plan on using program object supplied by the format for descrypt-opencl. All other formats would be using global program object declared in common-opencl. |
@magnumripper I have implemented parallel build 830f6a0. You may turn it on by setting PARALLEL_BUILD to 1 in opencl_DES_hst_dev_shared.h. However, we should really make path_expand() function thread safe in order to reduce the number of critcal sections and speed up build process. |
Cool, I'll try it out. |
I used Solar's way of pre-compiling all 4096 salts to benchmark this. While the format works fine with other test files, this one make it segfault:
|
|
Hmm @Sayantan2048 is the problem that we have 4096 unique salts but only 1 unique binary? |
No, that doesn't seem to be it. |
BTW note that I did not even enable OpenMP builds yet! I was going to make a baseline first. |
On Wed, Oct 7, 2015 at 3:13 AM, magnum notifications@github.com wrote:
Another edge case issue!! Binaries are all zero!! These tables use zero to |
LOL. OK, the following works better
|
It really shouldn't segfault though, that's nasty. |
This is for HARDCODE_SALT of course.
@Sayantan2048 this should basically be really trivial and would speed up initial building of all salts' kernels a whole lot on multicore systems (eg. super would do it 32x faster). However, you'd probably need to move some code around.
What do you think?
The text was updated successfully, but these errors were encountered: