1.9.0-jumbo-1+bleeding-84a4aeb20 - wpapsk-opencl broken after driver/cuda update #5205

Closed
ZerBea opened this issue Oct 20, 2022 · 43 comments · Fixed by #5207

Comments

@ZerBea

ZerBea commented Oct 20, 2022

OpenCL broken (after driver/cuda update) on some hash modes:

word.list content:
hashcat!

pmkid.john content:
2582a8281bf9d4308d6f5731d0e61c61*4604ba734d4e*89acf0e761f4*ed487162465a774bfba60eb603a39f3a

both taken from https://hashcat.net/wiki/doku.php?id=example_hashes

$ john -w:word.list --format=wpapsk-opencl pmkid.john
Device 1@tux1: NVIDIA GeForce GTX 1080 Ti
Using default input encoding: UTF-8
Loaded 1 password hash (wpapsk-opencl, WPA/WPA2/PMF/PMKID PSK [PBKDF2-SHA1 OpenCL])
5 errors generated.
Options used: -I opencl -cl-mad-enable -DSM_MAJOR=6 -DSM_MINOR=1 -D__GPU__ -DDEVICE_INFO=524306 -D__SIZEOF_HOST_SIZE_T__=8 -DDEV_VER_MAJOR=520 -DDEV_VER_MINOR=56 -D_OPENCL_COMPILER -DHASH_LOOPS=105 -DITERATIONS=4095 -DPLAINTEXT_LENGTH=63 -DV_WIDTH=1 /usr/share/john/opencl/wpapsk_kernel.cl
Build log: In file included from <kernel>:12:
opencl/opencl_sha2_ctx.h:122:24: error: passing '__generic uchar *' (aka '__generic unsigned char *') to parameter of type 'const uchar *' (aka 'const unsigned char *') changes address space of pointer
                _sha256_process(ctx, ctx->buffer);
                                     ^~~~~~~~~~~
opencl/opencl_sha2_ctx.h:40:51: note: passing argument to parameter 'data' here
void _sha256_process(SHA256_CTX *ctx, const uchar data[64]) {
                                                  ^
opencl/opencl_sha2_ctx.h:130:24: error: passing 'const __generic uchar *' (aka 'const __generic unsigned char *') to parameter of type 'const uchar *' (aka 'const unsigned char *') changes address space of pointer
                _sha256_process(ctx, input);
                                     ^~~~~
opencl/opencl_sha2_ctx.h:40:51: note: passing argument to parameter 'data' here
void _sha256_process(SHA256_CTX *ctx, const uchar data[64]) {
                                                  ^
opencl/opencl_sha2_ctx.h:291:24: error: passing '__generic uchar *' (aka '__generic unsigned char *') to parameter of type 'const uchar *' (aka 'const unsigned char *') changes address space of pointer
                _sha512_process(ctx, ctx->buffer);
                                     ^~~~~~~~~~~
opencl/opencl_sha2_ctx.h:209:51: note: passing argument to parameter 'data' here
void _sha512_process(SHA512_CTX *ctx, const uchar data[128]) {
                                                  ^
opencl/opencl_sha2_ctx.h:299:24: error: passing 'const __generic uchar *' (aka 'const __generic unsigned char *') to parameter of type 'const uchar *' (aka 'const unsigned char *') changes address space of pointer
                _sha512_process(ctx, input);
                                     ^~~~~
opencl/opencl_sha2_ctx.h:209:51: note: passing argument to parameter 'data' here
void _sha512_process(SHA512_CTX *ctx, const uchar data[128]) {
                                                  ^
<kernel>:549:15: error: passing '__generic uchar *' (aka '__generic unsigned char *') to parameter of type 'uchar *' (aka 'unsigned char *') changes address space of pointer
        SHA256_Final(mac, &ctx);
                     ^~~
opencl/opencl_sha2_ctx.h:145:25: note: passing argument to parameter 'output' here
void SHA256_Final(uchar output[32], SHA256_CTX *ctx) {
                        ^

Error building kernel /usr/share/john/opencl/wpapsk_kernel.cl. DEVICE_INFO=524306
0: OpenCL CL_BUILD_PROGRAM_FAILURE (-11) error in opencl_common.c:1296 - clBuildProgram

Additional information about john:
$ john --list=build-info
Version: 1.9.0-jumbo-1+bleeding-84a4aeb20 2022-10-17 14:03:56 +0200
Build: linux-gnu 64-bit x86_64 AVX AC MPI + OMP OPENCL
SIMD: AVX, interleaving: MD4:3 MD5:3 SHA1:1 SHA256:1 SHA512:1
System-wide exec: /usr/bin
System-wide home: /usr/share/john
Private home: ~/.john
CPU tests: AVX
CPU fallback binary: john-non-avx
$JOHN is /usr/share/john/
Format interface version: 14
Max. number of reported tunable costs: 4
Rec file version: REC4
Charset file version: CHR3
CHARSET_MIN: 1 (0x01)
CHARSET_MAX: 255 (0xff)
CHARSET_LENGTH: 24
SALT_HASH_SIZE: 1048576
SINGLE_IDX_MAX: 2147483648
SINGLE_BUF_MAX: 4294967295
Effective limit: Number of salts vs. SingleMaxBufferSize
Max. Markov mode level: 400
Max. Markov mode password length: 30
gcc version: 12.2.0
GNU libc version: 2.36 (loaded: 2.36)
OpenCL headers version: 1.2
Crypto library: OpenSSL
OpenSSL library version: 01010111f
OpenSSL 1.1.1q 5 Jul 2022
GMP library version: 6.2.1
File locking: fcntl()
fseek(): fseek
ftell(): ftell
fopen(): fopen
memmem(): System's
times(2) sysconf(_SC_CLK_TCK) is 100
Using times(2) for timers, resolution 10 ms
HR timer: clock_gettime(), latency 29 ns
Total physical host memory: 15944 MiB
Available physical host memory: 12610 MiB
Terminal locale string: de_DE.utf8
Parsed terminal locale: UTF-8

Additional information about distribution and driver :
$ uname -a
Linux tux1 6.0.2-arch1-1 #1 SMP PREEMPT_DYNAMIC Sat, 15 Oct 2022 14:00:49 +0000 x86_64 GNU/Linux

$ pacman -Q | grep nvidia
nvidia 520.56.06-4
nvidia-settings 520.56.06-1
nvidia-utils 520.56.06-2
opencl-nvidia 520.56.06-2

$ pacman -Q | grep cuda
cuda 11.8.0-1

Similar to #4667 (fixed).

Looks like NVIDIA changes some API calls from time to time.

@solardiz solardiz added this to the Definitely 2.0.0 milestone Oct 20, 2022
@magnumripper
Member

This time it might be a driver bug. I can't find any problem like the ones in #4667 - we aren't explicitly saying __private (or __generic) anywhere in any kernel code.

@magnumripper
Member

magnumripper commented Oct 21, 2022

The problem seems related to using arrays in function arguments. I've seen problems [as in driver bugs] with that before, in OpenCL.

We could do things like this as a workaround:

- void _sha256_process(SHA256_CTX *ctx, const uchar data[64]) {
+ void _sha256_process(SHA256_CTX *ctx, const uchar *data) {

But this one is trickier:

sha256_vector(uint num_elem, const uchar *addr[], const uint *len, uchar *mac)
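For context on why such a change is source-compatible: C adjusts an array-typed parameter to a pointer, so both spellings declare the same function type, and an array of pointers becomes a pointer to pointer. The sketch below is plain host C, not OpenCL C, and every function name in it is made up for illustration. It says nothing about how a particular OpenCL compiler assigns address spaces to the two spellings, which is the part that goes wrong here.

```c
#include <stddef.h>

typedef unsigned char uchar;

/* Array-style parameter, like `const uchar data[64]` in the headers. */
static unsigned int sum_array_style(const uchar data[4])
{
    unsigned int sum = 0;
    for (size_t i = 0; i < 4; i++)
        sum += data[i];
    return sum;
}

/* Pointer-style parameter: the spelling proposed as a workaround. */
static unsigned int sum_pointer_style(const uchar *data)
{
    unsigned int sum = 0;
    for (size_t i = 0; i < 4; i++)
        sum += data[i];
    return sum;
}

/* The trickier shape: an array of pointers, as in `const uchar *addr[]`. */
static unsigned int first_bytes_array_style(unsigned int num_elem,
                                            const uchar *addr[])
{
    unsigned int sum = 0;
    for (unsigned int i = 0; i < num_elem; i++)
        sum += addr[i][0];
    return sum;
}

/* The same function with the parameter spelled as pointer to pointer. */
static unsigned int first_bytes_pointer_style(unsigned int num_elem,
                                              const uchar **addr)
{
    unsigned int sum = 0;
    for (unsigned int i = 0; i < num_elem; i++)
        sum += addr[i][0];
    return sum;
}

/* In C, one function pointer type accepts either spelling. */
typedef unsigned int (*sum_fn)(const uchar *);
typedef unsigned int (*first_bytes_fn)(unsigned int, const uchar **);
```

Assigning `sum_array_style` and `sum_pointer_style` to the same `sum_fn` variable compiles without a cast, which is the host-side analogue of the one-line diff above.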

@ZerBea
Author

ZerBea commented Oct 21, 2022

That looks like a plan.
But a driver issue could be possible, too. That is the reason why I added a bug report on Arch Linux:
https://bugs.archlinux.org/task/76252

@magnumripper
Member

magnumripper commented Oct 21, 2022

Latest production driver is 515.76. The version you run (520.56.06) is the current latest from "New Feature Branch", which is kinda unstable I guess. Perhaps we just wait for nvidia to fix the issue.

I'm not sure how to report this to nvidia but a bug like this should surface in many places. EDIT: I filed a case with customer support.

@ZerBea
Author

ZerBea commented Oct 21, 2022

Correct.
Unfortunately I don't expect a fix from NVIDIA coming soon.

Update was done by Arch Linux in combination with kernel 6.0.0 and CUDA 11.8

@solardiz
Member

As I recall, CUDA typically includes a certain driver version bundled with it. What driver version is that for CUDA 11.8?

@ZerBea
Author

ZerBea commented Oct 21, 2022

You're right. The compatibility is described in Table 3, "CUDA Application Compatibility Support Matrix", here:
https://docs.nvidia.com/deploy/cuda-compatibility/
CUDA 11.8 = nvidia 520.61.05+

@magnumripper
Member

So what nvidia version do you see from john --list=opencl-devices?

@ZerBea
Author

ZerBea commented Oct 21, 2022

$ john --list=opencl-devices
Platform #0 name: NVIDIA CUDA, version: OpenCL 3.0 CUDA 11.8.87
    Device #0 (1) name:     NVIDIA GeForce GTX 1080 Ti
    Device vendor:          NVIDIA Corporation
    Device type:            GPU (LE)
    Device version:         OpenCL 3.0 CUDA
    Driver version:         520.56.06 [recommended]
    Native vector widths:   char 1, short 1, int 1, long 1
    Preferred vector width: char 1, short 1, int 1, long 1
    Global Memory:          11175 MiB
    Global Memory Cache:    1344 KiB
    Local Memory:           48 KiB (Local)
    Constant Buffer size:   64 KiB
    Max memory alloc. size: 2793 MiB
    Max clock (MHz):        1620
    Profiling timer res.:   1000 ns
    Max Work Group Size:    1024
    Parallel compute cores: 28
    CUDA INT32 cores:       3584  (28 x 128)
    Speed index:            5806080
    Warp size:              32
    Max. GPRs/work-group:   65536
    Compute capability:     6.1 (sm_61)
    Kernel exec. timeout:   no
    NVML id:                0
    PCI device topology:    26:00.0
    PCI lanes:              16/16
    Fan speed:              29%
    Temperature:            27°C
    Utilization:            3%

The same version, as expected from the pacman -Q output:

$ pacman -Q | grep nvidia
nvidia 520.56.06-4
nvidia-settings 520.56.06-1
nvidia-utils 520.56.06-2
opencl-nvidia 520.56.06-2

It (520.56.06) is not the version listed in the table (520.61.05), but it works fine with some of my own tools.

@magnumripper
Member

Strange. It would be nice to know if the bug is still present in 520.61.05.

@ZerBea
Author

ZerBea commented Oct 21, 2022

I think 520.61.05 is the Windows driver.

@magnumripper
Member

Oh, that makes sense. I'm installing CUDA 11.8 now, so I can reproduce this problem.

@ZerBea
Author

ZerBea commented Oct 21, 2022

If the CUDA version doesn't match the driver, you'll get a toolchain warning like this:
Unsupported .version 7.3; current version is '7.2'

@magnumripper
Member

magnumripper commented Oct 21, 2022

> I think 520.61.05 is the Windows driver.

No - when I installed CUDA 11.8 on Ubuntu 20.04, I actually got the 520.61.05 driver. However, that driver shows the same errors from the WPAPSK-opencl format. I'm nearly 100% sure this is a driver bug but I'll see if I can work around it.

@magnumripper
Member

OK, so obviously other formats are affected - not sure how many.

Here's a workaround for you though: Edit john.conf and find GlobalBuildOpts. Append -cl-std=CL1.2 to what is already there:

GlobalBuildOpts = -cl-mad-enable -cl-std=CL1.2

IIRC, a problem is that not all OpenCL compilers will accept that option - they will bug out 😢. So we might not be able to simply throw that in there in a PR. I will test that though.

@ZerBea
Author

ZerBea commented Oct 21, 2022

Strange:
https://www.nvidia.com/Download/driverResults.aspx/193764/en-us/

Version: 520.56.06
Release Date: 2022.10.12
Operating System: Linux 64-bit
Language: English (US)
File Size: 387.36 MB

https://forums.developer.nvidia.com/c/gpu-graphics/announcements-and-news/146
Linux, Solaris, and FreeBSD driver 520.56.06

@magnumripper
Member

I think I recall this has happened before - last time I updated CUDA on this machine I also got a driver version "slightly newer than latest". It doesn't matter for this issue though: The problem is there with any of them.

@ZerBea
Author

ZerBea commented Oct 21, 2022

I agree, but it is interesting to see that the driver is packaged by RHEL and Ubuntu, but it is not officially present on nvidia.com.

@magnumripper
Member

magnumripper commented Oct 21, 2022

The CUDA .deb packages for Ubuntu are actually sourced from https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/ so they are not "packaged by Ubuntu" at all.

@ZerBea
Author

ZerBea commented Oct 21, 2022

Ok, thanks for the info.
Now trying to modify john.conf as suggested.

@ZerBea
Author

ZerBea commented Oct 21, 2022

The workaround in #5205 (comment) is working as expected:

$ john -w:word.list --format=wpapsk-opencl pmkid.john
Device 1@tux1: NVIDIA GeForce GTX 1080 Ti
Using default input encoding: UTF-8
Loaded 1 password hash (wpapsk-opencl, WPA/WPA2/PMF/PMKID PSK [PBKDF2-SHA1 OpenCL])
Note: Minimum length forced to 8 by format
LWS=256 GWS=1048576 (4096 blocks) 
Press 'q' or Ctrl-C to abort, 'h' for help, almost any other key for status
Warning: Only 1 candidate buffered, minimum 1048576 needed for performance.
hashcat!         (?)     
1g 0:00:00:00 DONE (2022-10-21 17:50) 20.00g/s 20.00p/s 20.00c/s 20.00C/s Dev#1:62°C hashcat!
Use the "--show" option to display all of the cracked passwords reliably
Session completed. 

Thanks for your effort.

@ZerBea
Author

ZerBea commented Oct 21, 2022

Shall we close this issue report or wait for another solution?

@magnumripper
Member

Let's keep it open please.

@magnumripper
Member

This fixes most errors:

$ git diff --stat
 run/opencl/ed25519-donna/curve25519-donna-32bit.h |  6 +++---
 run/opencl/opencl_des.h                           | 38 ++++++++++++++------------------------
 run/opencl/opencl_md4_ctx.h                       |  4 ++--
 run/opencl/opencl_md5_ctx.h                       |  4 ++--
 run/opencl/opencl_sha1_ctx.h                      |  4 ++--
 run/opencl/opencl_sha2_ctx.h                      |  8 ++++----
 run/opencl/opencl_twofish.h                       |  8 ++++----
 7 files changed, 31 insertions(+), 41 deletions(-)

That's 100% things like this:

diff --git a/run/opencl/opencl_des.h b/run/opencl/opencl_des.h
index acbcac5ad..698654c25 100644
--- a/run/opencl/opencl_des.h
+++ b/run/opencl/opencl_des.h
@@ -296,7 +296,7 @@ __constant uchar odd_parity_table[128] = { 1,  2,  4,  7,  8,
        227, 229, 230, 233, 234, 236, 239, 241, 242, 244, 247, 248, 251, 253,
        254 };
 
-inline void des_key_set_parity(uchar key[DES_KEY_SIZE])
+inline void des_key_set_parity(uchar *key)
 {
        int i;
 

However, now I'm hitting more complex issues:

In file included from opencl/ed25519-donna/ed25519-donna.c:7:
In file included from opencl/ed25519-donna/ed25519-donna.h:42:
opencl/ed25519-donna/ed25519-donna-impl-base.h:7:17: error: passing '__generic uint32_t *' (aka '__generic unsigned int *') to parameter of type 'uint32_t *' (aka 'unsigned int *') changes address space of pointer
        curve25519_mul(r->x, p->x, p->t);
                       ^~~~
opencl/ed25519-donna/curve25519-donna-32bit.h:154:28: note: passing argument to parameter 'out' here
curve25519_mul(bignum25519 out, const bignum25519 a, const bignum25519 b) {
                           ^

Also, I'm not glad to push the current fixes - they make the code slightly worse. This is a driver problem.

@ZerBea
Author

ZerBea commented Oct 22, 2022

Again I fully agree. Driver problems are ugly.
While coding hcxdumptool/hcxlabtool I noticed several driver issues. Only two drivers (mt76 and rt2800usb) are flawless, and I gave up trying to work around the other ones. In contrast to JtR, hcxdumptool works on the fly and must respond within a specified time window. Otherwise the attack will fail. A driver problem prevents this completely.
For me the best solution is to report it to the driver developer/maintainer. That worked fine for mt76 and rt2800usb.

@solardiz
Member

@magnumripper Rather than revise the kernels, would it possibly be cleaner to conditionally add -cl-std=CL1.2 only on NVIDIA?

Also, how does/will hashcat approach the same problem, or do their kernels build fine with that driver as-is?

@magnumripper
Member

magnumripper commented Oct 24, 2022

That just hit me as well. The following patch should do the trick.

That patch and PR weren't correct - they would only kick in at max. verbosity. My current idea is to implement two more configuration keys, nvidiaBuildOpts and AMDbuildOpts, and put the mentioned option in the former.

Meanwhile, using that compile option "All 91 formats passed self-tests!".

magnumripper added a commit to magnumripper/john that referenced this issue Oct 24, 2022
Array arguments in kernel functions would trigger bugs unless explicitly
reverting to OpenCL 1.2.

Closes openwall#5205
@magnumripper
Member

Apparently (try googling it) some older drivers don't recognize the "-cl-std=CL1.2" option, including NVIDIA's, even though it's in the standard. Or perhaps such drivers were only 1.1 compliant (only supporting -cl-std=CL1.1).

We should probably go for the device or platform version string (as listed with --list=opencl-devices) instead. I hope/believe they always start with OpenCL x.y so we could add this option whenever it does and x is larger than '1' (and assume such drivers will accept the option - they really should).
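The version-string check described above could look roughly like the following. This is only a sketch under stated assumptions: `wants_cl12` and `make_build_opts` are hypothetical names, not functions from the JtR tree, and the rule "add the option whenever x > 1" is taken from the paragraph above rather than from the committed fix. The extra check for an existing `-cl-std=` in the base options is also an assumption, added so that a user who already edited GlobalBuildOpts does not get the option twice.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns 1 if `version` looks like
 * "OpenCL x.y ..." with x greater than 1, otherwise 0. */
static int wants_cl12(const char *version)
{
    int major, minor;

    if (version == NULL)
        return 0;
    if (sscanf(version, "OpenCL %d.%d", &major, &minor) != 2)
        return 0;
    return major > 1;
}

/* Hypothetical helper: writes the build options to `out`, appending
 * -cl-std=CL1.2 when the version string calls for it and the base
 * options do not already pick a -cl-std value.
 * Returns 0 on success, -1 if `out` is too small. */
static int make_build_opts(const char *base, const char *version,
                           char *out, size_t out_size)
{
    const char *extra = "";
    int n;

    if (wants_cl12(version) && strstr(base, "-cl-std=") == NULL)
        extra = " -cl-std=CL1.2";

    n = snprintf(out, out_size, "%s%s", base, extra);
    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}
```

With the device version strings listed in this thread, "OpenCL 3.0 CUDA" and "OpenCL 2.0 AMD-APP (2766.4)" would get the option, while "OpenCL 1.2 CUDA" would not.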

On another note, the CUDA 11.8 device version says "OpenCL 3.0 CUDA" - I never even heard of OpenCL 3.0. The platform version is "OpenCL 3.0 CUDA 11.8.88". I guess we should look at the device version for this logic but it wouldn't matter for my machine.

./john --list=opencl-devices | grep -E "(Device|Platform).*(name|version.*OpenCL .)"

HPC village:

Platform #0 name: AMD Accelerated Parallel Processing, version: OpenCL 2.1 AMD-APP (2766.4)
    Device #0 (1) name:     gfx900
    Device version:         OpenCL 2.0 AMD-APP (2766.4)
Platform #1 name: Intel(R) OpenCL, version: OpenCL 1.2 LINUX
    Device #0 (2) name:     Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz
    Device version:         OpenCL 1.2 (Build 76921)
    Device #1 (3) name:     Intel(R) Many Integrated Core Acceleration Card
    Device version:         OpenCL 1.2 (Build 76921)
Platform #2 name: NVIDIA CUDA, version: OpenCL 1.2 CUDA 10.1.105
    Device #0 (4) name:     GeForce GTX 1080
    Device version:         OpenCL 1.2 CUDA
    Device #1 (5) name:     GeForce GTX TITAN X
    Device version:         OpenCL 1.2 CUDA
    Device #2 (6) name:     GeForce GTX TITAN
    Device version:         OpenCL 1.2 CUDA

My Linux gear:

Platform #0 name: Portable Computing Language, version: OpenCL 1.2 pocl 1.4, None+Asserts, LLVM 9.0.1, RELOC, SLEEF, DISTRO, POCL_DEBUG
    Device #0 (1) name:     pthread-Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz
    Device version:         OpenCL 1.2 pocl HSTR: pthread-x86_64-pc-linux-gnu-haswell
Platform #1 name: NVIDIA CUDA, version: OpenCL 3.0 CUDA 11.8.88
    Device #0 (2) name:     NVIDIA GeForce RTX 2080 Ti
    Device version:         OpenCL 3.0 CUDA

Macbook:

Platform #0 name: Apple, version: OpenCL 1.2 (Aug  8 2022 21:29:33)
    Device #0 (1) name:     Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
    Device version:         OpenCL 1.2 
    Device #1 (2) name:     Intel(R) UHD Graphics 630
    Device version:         OpenCL 1.2 
    Device #2 (3) name:     AMD Radeon Pro Vega 20 Compute Engine
    Device version:         OpenCL 1.2 

@ZerBea
Author

ZerBea commented Oct 25, 2022

Here is another one (also running Arch Linux):

$ john --list=opencl-devices
Platform #0 name: NVIDIA CUDA, version: OpenCL 3.0 CUDA 11.8.87
    Device #0 (1) name:     NVIDIA GeForce GTX 1650
    Device vendor:          NVIDIA Corporation
    Device type:            GPU (LE)
    Device version:         OpenCL 3.0 CUDA
    Driver version:         520.56.06 [recommended]
    Native vector widths:   char 1, short 1, int 1, long 1
    Preferred vector width: char 1, short 1, int 1, long 1
    Global Memory:          3912 MiB
    Global Memory Cache:    512 KiB
    Local Memory:           48 KiB (Local)
    Constant Buffer size:   64 KiB
    Max memory alloc. size: 978 MiB
    Max clock (MHz):        1560
    Profiling timer res.:   1000 ns
    Max Work Group Size:    1024
    Parallel compute cores: 16
    CUDA INT32 cores:       1024  (16 x 64)
    Speed index:            1597440
    Warp size:              32
    Max. GPRs/work-group:   65536
    Compute capability:     7.5 (sm_75)
    Kernel exec. timeout:   no
    NVML id:                0
    PCI device topology:    01:00.0
    PCI lanes:              8/16
    Temperature:            63°C
    Utilization:            89%

magnumripper added a commit to magnumripper/john that referenced this issue Oct 25, 2022
Array arguments in kernel functions would trigger bugs unless explicitly
reverting to OpenCL 1.2.  We're now adding -cl-std=CL1.2 when applicable.

Closes openwall#5205
magnumripper added a commit that referenced this issue Oct 31, 2022
Array arguments in kernel functions would trigger bugs unless explicitly
reverting to OpenCL 1.2.  We're now adding -cl-std=CL1.2 when applicable.

Closes #5205
@ZerBea
Author

ZerBea commented Oct 31, 2022

Everything is working fine. Again thanks for your effort.

Unfortunately, all Linux distributions that recently updated to the latest NVIDIA driver are hit by this issue and have to wait for the upcoming JtR release.

@magnumripper
Member

magnumripper commented Nov 2, 2022

I think I start to understand this "driver bug" now (if it's even a bug?). Here's the deal: OpenCL doesn't really allow arrays as function parameters unless they are in private memory (see #4946 for an issue with __constant arrays).

So this driver (when in OpenCL 2.0 mode) regards array parameters as implicitly specified with __private while the caller is passing arguments that are implicitly __generic:

In file included from <kernel>:12:
opencl/opencl_sha2_ctx.h:122:24: error: passing '__generic uchar *' (aka '__generic unsigned char *') to parameter of type 'const uchar *' (aka 'const unsigned char *') changes address space of pointer
                _sha256_process(ctx, ctx->buffer);
                                     ^~~~~~~~~~~
opencl/opencl_sha2_ctx.h:40:51: note: passing argument to parameter 'data' here
void _sha256_process(SHA256_CTX *ctx, const uchar data[64]) {
                                                  ^

This ends up similar to #4667 where we explicitly used __private: In the end it is private memory, so it could (should?) work just fine, but alas it doesn't. It's stupid, but I'm not sure whether it's caused by a gap in the standard or a poor implementation in the driver. Anyway, for now the committed workaround is probably the best way to tackle it.

@magnumripper
Member

magnumripper commented Nov 14, 2022

I got the same problem with -cl-std=CL1.2 on nvidia 510.47.03 although that wasn't with JtR code. That is kinda hopeless. Oh, I got it on 470.141.03 as well. And it gets worse: When I dropped the -cl-std=CL1.2 option, problem went away in that case. Sometimes you'd think they make these quirks just to make our lives miserable. Changing array parameters to pointers fixed it (now works fine with or without -cl-std=CL1.2).

So I read up on the generic address space (looking at the OpenCL 2.0 spec, not the later ones). First of all, constant memory is disjoint from the generic address space - I wasn't aware of that (or had forgotten about it). So you still can't write a single memcpy() function from/to any memory, you'd need one for "generic to generic" and one for "from constant to generic". Still, it's just two functions instead of twelve for handling any combo...
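One way to arrive at "two instead of twelve" is to count (destination, source) pairs. The sketch below is host C that only does that counting, under the assumption that the twelve refers to three writable destination spaces (global, local, private) times four source spaces (those three plus constant). The enum and function names are illustrative, not anything from the JtR headers.

```c
enum space { SP_GLOBAL, SP_LOCAL, SP_PRIVATE, SP_CONSTANT, SP_COUNT };

/* __constant memory is read-only, so it can never be a destination. */
static int is_writable(enum space s)
{
    return s != SP_CONSTANT;
}

/* Class seen by a generic-aware function:
 * 0 = reachable through a generic pointer, 1 = __constant (disjoint). */
static int pointer_class(enum space s)
{
    return s == SP_CONSTANT ? 1 : 0;
}

/* Without generic pointers: one memcpy variant per
 * (destination, source) pair of named address spaces. */
static int variants_with_named_spaces(void)
{
    int count = 0;

    for (int dst = 0; dst < SP_COUNT; dst++)
        for (int src = 0; src < SP_COUNT; src++)
            if (is_writable((enum space)dst))
                count++;
    return count;
}

/* With generic pointers: one variant per distinct
 * (destination class, source class) pair. */
static int variants_with_generic(void)
{
    int seen[2][2] = { { 0, 0 }, { 0, 0 } };
    int count = 0;

    for (int dst = 0; dst < SP_COUNT; dst++) {
        if (!is_writable((enum space)dst))
            continue;
        for (int src = 0; src < SP_COUNT; src++) {
            int d = pointer_class((enum space)dst);
            int s = pointer_class((enum space)src);

            if (!seen[d][s]) {
                seen[d][s] = 1;
                count++;
            }
        }
    }
    return count;
}
```

Under those assumptions `variants_with_named_spaces()` yields 12 and `variants_with_generic()` yields 2, matching the figures in the comment.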

Also:
https://registry.khronos.org/OpenCL/specs/opencl-2.0.pdf

  • A pointer to generic can be cast to a pointer to global, local or private.
  • A pointer to global, local or private can be cast to a pointer to generic.
  • A pointer to global, local or private can be implicitly converted to a pointer to generic but the converse is not allowed.

So a pointer to generic CAN NOT be implicitly converted to a pointer to global, local or private. Perhaps that's our problem - but since we're pretty far from being able to require OpenCL 2.0, we might just as well do what we do now - enforce OpenCL 1.2. The fact that the option sometimes fails is very annoying though.
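The three quoted rules, together with the note that constant memory is disjoint from the generic address space, can be written down as a small lookup. This is host C that merely models the rules as quoted here; it is not compiler code, the names are hypothetical, and pairs that the quoted bullets do not mention (for example global to local) are conservatively reported as not allowed, which is an assumption of this sketch.

```c
enum aspace { AS_GENERIC, AS_GLOBAL, AS_LOCAL, AS_PRIVATE, AS_CONSTANT };

/* global, local and private are the named spaces covered by generic. */
static int covered_by_generic(enum aspace s)
{
    return s == AS_GLOBAL || s == AS_LOCAL || s == AS_PRIVATE;
}

/* Implicit conversion: identical spaces, or
 * global/local/private -> generic (never the converse). */
static int implicit_ok(enum aspace from, enum aspace to)
{
    if (from == to)
        return 1;
    return covered_by_generic(from) && to == AS_GENERIC;
}

/* Explicit cast: everything that converts implicitly, plus
 * generic -> global/local/private. */
static int cast_ok(enum aspace from, enum aspace to)
{
    if (implicit_ok(from, to))
        return 1;
    return from == AS_GENERIC && covered_by_generic(to);
}
```

If the reading in this comment is right, the build errors correspond to `implicit_ok(AS_GENERIC, AS_PRIVATE)` being 0, while `cast_ok(AS_GENERIC, AS_PRIVATE)` is 1.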

While at this, I tried making an experimental fork of jumbo that requires OpenCL 2.0. I dropped all redundant memcpy/memset/memchr/memmem functions in favor of ones that use generic memory as well as some other tricks for handling different memory spaces. This worked just fine (all formats passed self-test) on nvidia provided I never used the __generic keyword(!).

Oh BTW I found this interesting bit of information: https://stackoverflow.com/a/22757591

@ZerBea
Author

ZerBea commented Nov 15, 2022

Excellent explanation and good work.
I know this "problem" well, because I'm running a rolling release distribution (Arch). After a system update, I have to check my entire code.

@ZerBea
Author

ZerBea commented Nov 15, 2022

@magnumripper , a little bit out of scope.
How about the new EAPOL-PMKID hash line?
hashcat/hashcat#1816
which is a huge improvement, especially on large hash files / pot files:
ZerBea/hcxtools#227 (comment)

I've never seen people posting JtR WPA hash lines, but I often have seen PMKID-EAPOL hash lines:
https://forum.hashkiller.io/index.php?threads/fastweb-cap.62484/#post-316855

@magnumripper
Member

Yeah that's #4183. Unfortunately it doesn't line up well with JtR's core so it'll take some effort to implement. But I am planning to do it, some day...

@ZerBea
Author

ZerBea commented Nov 15, 2022

I am pleased to hear this, because it removes the entire internal (ancient) hccap structure.
During conversion, I have to use it, and JtR has to use it again to convert back:
WiFi packet -> hccap -> JtR hash line <-> JtR hash line -> hccap -> GPU

BTW:
It would be a huge step forward for services like this one:
https://wpa-sec.stanev.org

@solardiz
Member

@magnumripper What about adding casts from whatever (generic or private depending on OpenCL version, which we won't need to care about?) to private - wouldn't that make our source code compatible with both OpenCL 1.2 and 2.0?

@magnumripper
Member

I'm not sure I understand the concept of casting between memory types at all: If you have a generic pointer pointing to data in global memory, it obviously can't be cast to private. What would that even mean?

What I do understand is this: If you write eg. a memcpy function using generic pointer parameters (actually by using unnamed memory because apparently you can't explicitly say __generic), the OpenCL 2.0 compiler will just "do the right thing". So if that function is called once using pointers-to-global and another time using pointers-to-private, I assume two different versions of the function will be produced (or inlined) accordingly. That's convenient but I can't understand why they left __constant out. Also it's a pity it wasn't in the standard from the beginning - it's a simple concept.

Beyond that, I don't understand this at all.

@solardiz
Member

@magnumripper I don't really know, but my guess is that by casting from __generic to __private, you guarantee the pointer was actually to private memory anyway.

@magnumripper
Member

https://www.intel.com/content/www/us/en/developer/articles/technical/the-generic-address-space-in-opencl-20.html

Apparently the concept of a generic address space comes from Embedded C, whatever that is. Perhaps I should google that and see if there are any more mature descriptions and examples.

@solardiz
Member

@magnumripper Reading that Intel article you referenced, I get the impression that we'd also avoid the problem by explicitly specifying __private on all pointers that are currently not explicitly address space qualified. Can we do that easily enough?

@magnumripper
Member

I think that would be an awful lot of places unless we add it on a case-by-case basis as problems are seen.

Since we seem to be good right now, let's just leave everything alone until new problems emerge, if ever.

@solardiz
Member

> Since we seem to be good right now, let's just leave everything alone until new problems emerge, if ever.

I agree.
