Nvidia busy-wait workaround #4781
If you are using NVIDIA (it seems to be the case), this is a known issue. Examples:
- https://forums.developer.nvidia.com/t/opencl-busy-wait-still-not-fixed/46441
- openmm/openmm#1541
BTW: the workaround looks ugly. |
@solardiz this sounds bad:
We are old and tired, but not dead. BTW: I think we will close this issue shortly; it is known, and it is "by design" (?) from NVIDIA itself. |
Hello,
I know about the Nvidia OpenCL problem. My question (assuming JtR is still maintained) is: why not move to CUDA, since OpenCL is currently deprecated?
BTW, there is a piece of code that can even fix the OpenCL problem, if anyone cares about it.
> On Mon, Aug 9, 2021, 12:49 Claudio André wrote:
> If you are using NVIDIA (it seems to be the case), this is a known issue (examples):
> - https://forums.developer.nvidia.com/t/opencl-busy-wait-still-not-fixed/46441
> - openmm/openmm#1541
> BTW: the workaround looks ugly.
|
Note: in one of the filed bugs, OpenCL takes over one core at 100%. In my case, I have 4 cores ALL at 100%.
|
> why not moving to CUDA (since currently OpenCL is deprecated)

If you are willing to spend a couple thousand hours coding CUDA for us, you are very welcome to do so! 😉 We had CUDA along with OpenCL years ago, but since CUDA is proprietary and we lacked volunteers, we ended up with OpenCL only. Not sure why you say it's deprecated?

> there is a piece of code that can even fix the OpenCL problem

I might experiment with that some time. I never cared a lot though.

> In my case I have 4 cores ALL at 100%

Sure, one per GPU. It's stupid, but it's not the end of the world; you have a multitasking OS. |
Here you can find a discussion about the topic. |
In my case, using 4 cores at 100% = 4 × 35 W = 140 W, which is about 3.36 kWh per 24 h. A long run is usually 8 days, i.e. roughly 27 kWh of WASTED energy, plus the power required by the GPUs.
So, if development of JtR is closed, I need to find another solution, since JtR is too expensive to run.
Do you know whom I should contact on the development team?
> On Mon, Aug 9, 2021 at 1:52 PM magnum wrote:
> > why not moving to CUDA (since currently OpenCL is deprecated
> If you are willing to spend a couple thousand hours coding CUDA for us, you are very welcome to do so! 😉
> We had CUDA along with OpenCL years ago but since CUDA is proprietary and we lacked volunteers we ended up with OpenCL only. Not sure why you say it's deprecated?
> > there is a piece of code that can even fix the OpenCL problem
> I might experiment with that some time. I never cared a lot though.
> > In my case I have 4 cores ALL at 100%
> Sure, one per GPU. It's stupid but it's not the end of the world, you have a multitasking OS.
|
Solar is going to complain, but that is too much.
Are you asking magnum this question? Is that a joke? A Millennial thing? If you need paid support, you can contact Openwall; see the third line below.
|
Just to reinforce (9 years ago, on john-dev, the complaint was about CUDA):
Something we prefer to avoid. |
Hi @asaidac. John the Ripper is maintained; the fact that we might not fix this one issue doesn't mean the project as a whole is unmaintained. You don't need to contact anyone in particular - you've already contacted us here. As to fixing this issue, ideally NVIDIA would. As far as we're currently aware, we can only do one of these things:
We don't currently intend to go and reintroduce CUDA support. Definitely not before the next release. Maybe later. However, I don't mind us experimenting with the workaround. BTW, another workaround, and one a user can use (with |
I guess you can also lower that by forcing those cores into a lower power state (lower clock rate). Something like Of course, ideally the issue wouldn't exist in NVIDIA OpenCL, or would be worked around in JtR, so I am not posting the above suggestions about |
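The exact command solardiz suggested was lost in this scrape and is not recoverable. Purely as an illustration of the idea (an assumption on my part, not necessarily what he posted), on Linux the clock cap he describes can be applied via the `cpupower` tool or the cpufreq sysfs interface:

```shell
# Illustrative only: the command in the original comment was elided.
# Cap the maximum frequency of CPUs 0-3 so busy-spinning threads draw less power.
sudo cpupower --cpu 0-3 frequency-set --max 800MHz

# Equivalent via sysfs (value is in kHz):
echo 800000 | sudo tee /sys/devices/system/cpu/cpu[0-3]/cpufreq/scaling_max_freq
```

This does not remove the spinning, it only makes each spinning core cheaper; it also slows down any real CPU-side work.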
I hear you; fixing this, in any way, would be "timely" and better for our battered environment. The best (and easiest, and most effective for the world) would be for Nvidia to fix their shit.
I wasn't aware of such dual-use kernels, that's conceptually very interesting. But it only solves kernel side - I think hashcat has a whole lot more shared code for formats whereas we'd need to write full CPU-side code for CUDA for each of our ~89 GPU formats. Obviously we would add an abstraction layer instead, but that would still be a rewrite of 89 formats (OTOH it would pay off for every future format we add). If we could afford pulling off a GSoC again (provided they still run it?) it would be an excellent task: Primarily, write all shared code and move at least one format. Secondary, move more/all formats. |
Hello,
I consider this issue closed.
I implemented a solution that alleviates the problem I WAS experiencing. Since the CPU only creates spin cycles (no actual work being done), and since the CPU supports OpenCL, I've added the CPU to the device list:
Node numbers 1-5 of 5 (fork)
Device 4: GeForce GTX 1080 Ti
*Device 5: pthread-Intel(R) Core(TM) i5-4440 CPU @ 3.10GHz*
Device 3: GeForce GTX 1660
Device 2: GeForce GTX 1060 6GB
Device 1: GeForce GTX 1080
In this configuration, the CPU does not make a big contribution to the overall performance, but at least it no longer burns power doing nothing.
1 0g 0:00:04:41 0.01% (8) (ETA: 2021-09-01 09:34) 0g/s 295710p/s 295710c/s 295710C/s Dev#1:57°C kt4jheaa..zi61heaa
3 0g 0:00:04:41 0.01% (8) (ETA: 2021-09-08 15:39) 0g/s 222116p/s 222116c/s 222116C/s Dev#3:60°C wy1xbbuu..q8z1bbuu
2 0g 0:00:04:42 0.01% (8) (ETA: 2021-09-18 15:39) 0g/s 165752p/s 165752c/s 165752C/s Dev#2:52°C 7tjo7sss..lovc7sss
*5 0g 0:00:04:43 (8) 0g/s 567.9p/s 567.9c/s 567.9C/s v9t52222..b8t52222*
4 0g 0:00:04:43 0.02% (8) (ETA: 2021-08-26 14:17) 0g/s 401689p/s 401689c/s 401689C/s Dev#4:56°C rgoygzff..tk4fgzff
I am tuning the GPUs for power control, hence the performance is 20-30% lower than the maximum.
20210810 08:35:40 0 159 57 - 89 4 0 0 4513 1822
20210810 08:35:40 1 101 51 - 100 2 0 0 3802 1708
20210810 08:35:40 2 100 60 - 79 3 0 0 4001 2115
20210810 08:35:40 3 174 55 - 88 3 0 0 5005 1708
John the Ripper 1.9.0-jumbo-1+bleeding-2e6eba49f 2021-07-07 17:16:06 +0200 OMP [linux-gnu 64-bit x86_64 AVX2 AC]
Copyright (c) 1996-2021 by Solar Designer and others
NVIDIA-SMI 460.73.01 Driver Version: 460.73.01 CUDA Version: 11.2
Thank you for your time
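For reference, the scheme described above (four GPUs plus the CPU's OpenCL device, one forked process per device) would presumably be invoked roughly as follows. The format name and hash file are placeholders of my choosing, not taken from this thread; device numbers are whatever this particular rig reports:

```shell
# List OpenCL devices to find their numbers (here, 1-4 are the GPUs,
# 5 is the pthread CPU device):
./john --list=opencl-devices

# Hypothetical run across all five devices, one forked process each:
./john --format=sha512crypt-opencl --devices=1,2,3,4,5 --fork=5 hashes.txt
```

The CPU process soaks up the spin cycles that would otherwise be pure waste, at the cost of tying each core to real (if slow) work.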
> On Tue, Aug 10, 2021 at 8:18 AM magnum wrote:
> > Most likely, in a way similar to the way hashcat reintroduced it - avoiding having to maintain separate kernel sources for OpenCL and CUDA, instead writing them in a language that fits both (luckily, OpenCL and CUDA aren't too dissimilar, making this practical). As I understand, only OpenCL vs. CUDA specific include files would need to be separate.
> I wasn't aware of such dual-use kernels, that's conceptually very interesting. But it only solves the kernel side - I think hashcat has a whole lot more shared code for formats, whereas we'd need to write full CPU-side code for CUDA for each of our ~89 GPU formats. Obviously we would add an abstraction layer instead, but that would still be a rewrite of 89 formats (OTOH it would pay off for every future format we add).
> If we could afford pulling off a GSoC again (provided they still run it?) it would be an excellent task: primarily, write all shared code and move at least one format; secondarily, move more/all formats.
|
OK I'm closing this, but we still have #4363 for possibly doing something about it. |
Hello,
Not sure if JtR is still maintained, but I'll try anyway.
I have a headless rig with 4 GPUs and a CPU with 4 physical cores.
I am using OpenCL with "-dev=1,2,3,4", where -dev takes the GPU numbers assigned by JtR.
The GPUs are working well; the problem is the CPU.
During a run, all 4 cores are loaded to approximately 100%.
I know that part of the job is done by the CPU, but not that kind of load.
I suspect that cudaThreadSynchronize() is effectively a spin lock which polls the GPU at a rather high frequency, waiting until the GPU kernel is finished. Because the CPU thread is just sitting in a polling loop, it actually isn't doing much work - just using unnecessary power.
Any ideas how to fix this?
Thanks
Checklist
IMPORTANT
We expect only reports of issues with the latest revision found in this GitHub repository. We do not expect in here, and have little use for, reports of issues only seen in a release or in a distro package.
Attach details about your OS and about john, including:
./john --list=build-info