1.9.0-jumbo-1+bleeding-84a4aeb20 - wpapsk-opencl broken after driver/cuda update #5205
Comments
This time it might be a driver bug. I can't find any problem like the ones in #4667 - we aren't explicitly saying |
The problem seems related to using arrays in function arguments. I've seen problems [as in driver bugs] with that before, in OpenCL. We could do things like this as a workaround:
- void _sha256_process(SHA256_CTX *ctx, const uchar data[64]) {
+ void _sha256_process(SHA256_CTX *ctx, const uchar *data) {
But this one is trickier: sha256_vector(uint num_elem, const uchar *addr[], const uint *len, uchar *mac) |
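For reference, here is a minimal plain-C sketch of why the pointer form is a drop-in replacement: C adjusts an array parameter to a pointer, so both declarations below have the same type and accept the same argument. The function names and the byte-sum body are hypothetical, chosen only for illustration, and the sketch says nothing about how a particular OpenCL driver treats the array form.

```c
typedef unsigned char uchar;

/* Array-style parameter: the type is adjusted to "const uchar *";
 * the 64 only documents the expected length. */
unsigned sum_array_param(const uchar data[64])
{
    unsigned total = 0;

    for (int i = 0; i < 64; i++)
        total += data[i];
    return total;
}

/* Pointer-style parameter: same body, same call sites. */
unsigned sum_pointer_param(const uchar *data)
{
    unsigned total = 0;

    for (int i = 0; i < 64; i++)
        total += data[i];
    return total;
}
```

Any caller that passes a 64-byte buffer can use either function unchanged, which is why the workaround only touches the declaration.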
That looks like a plan. |
Latest production driver is 515.76. The version you run (520.56.06) is the current latest from "New Feature Branch", which is kinda unstable I guess. Perhaps we just wait for nvidia to fix the issue. I'm not sure how to report this to nvidia but a bug like this should surface in many places. EDIT: I filed a case with customer support. |
Correct. Update was done by Arch Linux in combination with kernel 6.0.0 and CUDA 11.8 |
As I recall, CUDA typically includes a certain driver version bundled with it. What driver version is that for CUDA 11.8? |
You're right. The compatibility is described in Table 3. CUDA Application Compatibility Support Matrix, here: |
So what nvidia version do you see from |
The same version as expected after pacman -Q output:
It (520.56.06) is not the version listed in the table (520.61.05), but it is working fine with some of my own tools. |
Strange. It would be nice to know if the bug is still present in 520.61.05. |
I think 520.61.05 is the Windows driver. |
Oh, that makes sense. I'm installing CUDA 11.8 now, so I can reproduce this problem. |
If the CUDA version doesn't match the driver, you'll get a toolchain warning like this: |
No - when I installed CUDA 11.8 on Ubuntu 20.04, I actually got the 520.61.05 driver. However, that driver shows the same errors from the WPAPSK-opencl format. I'm nearly 100% sure this is a driver bug but I'll see if I can work around it. |
OK, so obviously other formats are affected - not sure how many. Here's a workaround for you though: Edit john.conf and find
IIRC, a problem is that not all OpenCL compilers will accept that option - they will bug out 😢. So we might not be able to simply throw that in there in a PR. I will test that though. |
Strange:
https://forums.developer.nvidia.com/c/gpu-graphics/announcements-and-news/146 |
I think I recall this has happened before - last time I updated CUDA on this machine I also got a driver version "slightly newer than latest". It doesn't matter for this issue though: The problem is there with any of them. |
I agree, but it is interesting to see that the driver is packaged by RHEL and Ubuntu, but it is not officially present on nvidia.com. |
The CUDA .deb packages for Ubuntu are actually sourced from https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/ so they are not "packaged by Ubuntu" at all. |
Ok, thanks for the info. |
Workaround
Thanks for your effort. |
Shall we close this issue report or wait for another solution? |
Let's keep it open please. |
This fixes most errors:
That's 100% things like this:
diff --git a/run/opencl/opencl_des.h b/run/opencl/opencl_des.h
index acbcac5ad..698654c25 100644
--- a/run/opencl/opencl_des.h
+++ b/run/opencl/opencl_des.h
@@ -296,7 +296,7 @@ __constant uchar odd_parity_table[128] = { 1, 2, 4, 7, 8,
227, 229, 230, 233, 234, 236, 239, 241, 242, 244, 247, 248, 251, 253,
254 };
-inline void des_key_set_parity(uchar key[DES_KEY_SIZE])
+inline void des_key_set_parity(uchar *key)
{
int i;
However, now I'm hitting more complex issues:
Also, I'm not glad to push the current fixes - they make the code slightly worse. This is a driver problem. |
Again I fully agree. Driver problems are ugly. |
@magnumripper Rather than revise the kernels, would it possibly be cleaner to conditionally add Also, how does/will hashcat approach the same problem, or do their kernels build fine with that driver as-is? |
That just hit me as well. That patch and PR wasn't correct - it would only kick in at max. verbosity. My current idea is to implement two more configuration keys Meanwhile, using that compile option "All 91 formats passed self-tests!". |
Array arguments in kernel functions would trigger bugs unless explicitly reverting to OpenCL 1.2. Closes openwall#5205
Apparently (try googling it) some older drivers don't recognize the "-cl-std=CL1.2" option, including nvidia's, even though it's in the standard. Or perhaps such drivers were only 1.1 compliant (only supporting We should probably go for the device or platform version string (as listed with On another note, the CUDA 11.8 device version says "OpenCL 3.0 CUDA" - I never even heard of OpenCL 3.0. The platform version is "OpenCL 3.0 CUDA 11.8.88". I guess we should look at the device version for this logic but it wouldn't matter for my machine.
HPC village:
My Linux gear:
Macbook:
|
Here is another one (also running Arch Linux):
|
Array arguments in kernel functions would trigger bugs unless explicitly reverting to OpenCL 1.2. We're now adding -cl-std=CL1.2 when applicable. Closes openwall#5205
Array arguments in kernel functions would trigger bugs unless explicitly reverting to OpenCL 1.2. We're now adding -cl-std=CL1.2 when applicable. Closes #5205
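A rough sketch of what a version-string check for "when applicable" could look like, in plain C without any OpenCL calls. This is not the code from the actual commit: the function names, the encoding of the version as major * 10 + minor, and the threshold (add the option only when the device reports OpenCL 1.2 or later) are all assumptions made for illustration. The only thing taken from the thread is that version strings look like "OpenCL 3.0 CUDA".

```c
#include <stdio.h>
#include <string.h>

/*
 * Parse a string of the form "OpenCL <major>.<minor> <vendor info>".
 * Returns major * 10 + minor (12 for "OpenCL 1.2 ..."), or 0 if the
 * string does not match.
 */
int parse_opencl_version(const char *version_string)
{
    int major = 0, minor = 0;

    if (sscanf(version_string, "OpenCL %d.%d", &major, &minor) != 2)
        return 0;
    return major * 10 + minor;
}

/*
 * Hypothetical policy: append " -cl-std=CL1.2" to the build options
 * only when the device reports OpenCL 1.2 or later, because older
 * drivers may reject the option.
 * Returns 0 on success, -1 if the buffer is too small.
 */
int add_cl_std_option(char *options, size_t size, const char *device_version)
{
    static const char flag[] = " -cl-std=CL1.2";

    if (parse_opencl_version(device_version) < 12)
        return 0; /* leave the options untouched */
    /* sizeof(flag) counts the terminating NUL byte */
    if (strlen(options) + sizeof(flag) > size)
        return -1;
    strcat(options, flag);
    return 0;
}
```

With the device version quoted above ("OpenCL 3.0 CUDA") the option is appended; with a string reporting 1.1, or one that cannot be parsed, the options are left as they were.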
Everything is working fine. Again, thanks for your effort. Unfortunately, all Linux distributions that recently updated to the latest NVIDIA driver are hit by this issue and have to wait for the upcoming JtR release. |
I think I'm starting to understand this "driver bug" now (if it's even a bug?). Here's the deal: OpenCL doesn't really allow arrays as function parameters unless they are in private memory (see #4946 for an issue with So this driver (when in OpenCL 2.0 mode) regards array parameters as implicitly specified with
This ends up similar to #4667 where we explicitly used |
I got the same problem with So I read up on the generic address space (looking at the OpenCL 2.0 spec, not the later ones). First of all, constant memory is disjoint from the generic address space - I wasn't aware of that (or had forgotten about it). So you still can't write a single memcpy() function from/to any memory, you'd need one for "generic to generic" and one for "from constant to generic". Still, it's just two functions instead of twelve for handling any combo... Also:
So a pointer to generic CAN NOT be implicitly converted to a pointer to global, local or private. Perhaps that's our problem - but since we're pretty far from being able to require OpenCL 2.0, we should just as well simply do what we do now - enforce OpenCL 1.2. The fact that the option sometimes fails is very annoying though. While at this, I tried making an experimental fork of jumbo that requires OpenCL 2.0. I dropped all redundant memcpy/memset/memchr/memmem functions in favor of ones that use generic memory as well as some other tricks for handling different memory spaces. This worked just fine (all formats passed self-test) on nvidia provided I never used the Oh BTW I found this interesting bit of information: https://stackoverflow.com/a/22757591 |
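To illustrate the "two functions instead of twelve" point, here is a minimal sketch of the pair of copy helpers. It is not taken from the experimental fork: the function names and the CONST_AS macro are hypothetical. The assumption is that when built as OpenCL C 2.0 the unqualified pointers land in the generic address space, while the constant-source variant needs an explicit qualifier; on a host C compiler the macro expands to nothing so the same source can be compiled and tested.

```c
#ifdef __OPENCL_VERSION__
/* Kernel build: constant memory needs an explicit qualifier. */
#define CONST_AS __constant
#else
/* Host build: no address spaces, and uchar/size_t must be provided. */
#include <stddef.h>
typedef unsigned char uchar;
#define CONST_AS
#endif

/* Unqualified pointers: generic address space under OpenCL C 2.0. */
void memcpy_generic(void *dst, const void *src, size_t n)
{
    uchar *d = (uchar *)dst;
    const uchar *s = (const uchar *)src;

    while (n--)
        *d++ = *s++;
}

/* Constant memory is disjoint from generic, so it gets its own variant. */
void memcpy_from_constant(void *dst, CONST_AS const void *src, size_t n)
{
    uchar *d = (uchar *)dst;
    CONST_AS const uchar *s = (CONST_AS const uchar *)src;

    while (n--)
        *d++ = *s++;
}
```

Under OpenCL 1.2 there is no generic address space, so a separate variant is needed for each combination of source and destination memory.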
Excellent explanation and good work. |
@magnumripper, a little bit out of scope. I've never seen people posting JtR WPA hash lines, but I have often seen PMKID-EAPOL hash lines: |
Yeah that's #4183. Unfortunately it doesn't line up well with JtR's core so it'll take some effort to implement. But I am planning to do it, some day... |
I am pleased to hear this, because it removes the entire internal (ancient) hccap structure. BTW: |
@magnumripper What about adding casts from whatever (generic or private depending on OpenCL version, which we won't need to care about?) to private - wouldn't that make our source code compatible with both OpenCL 1.2 and 2.0? |
I'm not sure I understand the concept of casting between memory types at all: If you have a generic pointer pointing to data in global memory, it obviously can't be cast to private. What would that even mean? What I do understand is this: If you write eg. a memcpy function using generic pointer parameters (actually by using unnamed memory because apparently you can't explicitly say Beyond that, I don't understand this at all. |
@magnumripper I don't really know, but my guess is that by casting from |
Apparently the concept of a generic address space comes from Embedded C, whatever that is. Perhaps I should google that and see if there are any more mature descriptions and examples. |
@magnumripper Reading that Intel article you referenced, I get the impression that we'd also avoid the problem by explicitly specifying |
I think that would be an awful lot of places unless we add it on a case-by-case basis as problems are seen. Since we seem to be good right now, let's just leave everything alone until new problems emerge, if ever. |
I agree. |
OpenCL broken (after driver/cuda update) on some hash modes:
word.list content:
hashcat!
pmkid.john content:
2582a8281bf9d4308d6f5731d0e61c61*4604ba734d4e*89acf0e761f4*ed487162465a774bfba60eb603a39f3a
both taken from https://hashcat.net/wiki/doku.php?id=example_hashes
Additional information about john:
$ john --list=build-info
Version: 1.9.0-jumbo-1+bleeding-84a4aeb20 2022-10-17 14:03:56 +0200
Build: linux-gnu 64-bit x86_64 AVX AC MPI + OMP OPENCL
SIMD: AVX, interleaving: MD4:3 MD5:3 SHA1:1 SHA256:1 SHA512:1
System-wide exec: /usr/bin
System-wide home: /usr/share/john
Private home: ~/.john
CPU tests: AVX
CPU fallback binary: john-non-avx
$JOHN is /usr/share/john/
Format interface version: 14
Max. number of reported tunable costs: 4
Rec file version: REC4
Charset file version: CHR3
CHARSET_MIN: 1 (0x01)
CHARSET_MAX: 255 (0xff)
CHARSET_LENGTH: 24
SALT_HASH_SIZE: 1048576
SINGLE_IDX_MAX: 2147483648
SINGLE_BUF_MAX: 4294967295
Effective limit: Number of salts vs. SingleMaxBufferSize
Max. Markov mode level: 400
Max. Markov mode password length: 30
gcc version: 12.2.0
GNU libc version: 2.36 (loaded: 2.36)
OpenCL headers version: 1.2
Crypto library: OpenSSL
OpenSSL library version: 01010111f
OpenSSL 1.1.1q 5 Jul 2022
GMP library version: 6.2.1
File locking: fcntl()
fseek(): fseek
ftell(): ftell
fopen(): fopen
memmem(): System's
times(2) sysconf(_SC_CLK_TCK) is 100
Using times(2) for timers, resolution 10 ms
HR timer: clock_gettime(), latency 29 ns
Total physical host memory: 15944 MiB
Available physical host memory: 12610 MiB
Terminal locale string: de_DE.utf8
Parsed terminal locale: UTF-8
Additional information about distribution and driver :
$ uname -a
Linux tux1 6.0.2-arch1-1 #1 SMP PREEMPT_DYNAMIC Sat, 15 Oct 2022 14:00:49 +0000 x86_64 GNU/Linux
$ pacman -Q | grep nvidia
nvidia 520.56.06-4
nvidia-settings 520.56.06-1
nvidia-utils 520.56.06-2
opencl-nvidia 520.56.06-2
$ pacman -Q | grep cuda
cuda 11.8.0-1
Similar to this (fixed)
#4667
Looks like NVIDIA changes some API calls from time to time.