
RFE: opensource OpenCL support #3

Closed
ignatenkobrain opened this Issue Dec 5, 2015 · 38 comments

ignatenkobrain commented Dec 5, 2015

We now have open-source OpenCL support for:

  • CPU -> POCL
  • Intel GPU -> Beignet
  • AMD GPU -> mesa

It would be great to get open-source OpenCL support.

Member

jsteube commented Dec 5, 2015

Sorry, I don't understand what you mean, could you please rephrase?

ignatenkobrain commented Dec 5, 2015

Currently oclHashcat supports only the AMD OpenCL SDK and CUDA, but there are OpenCL implementations that are not part of those. I want support for them.

Member

jsteube commented Dec 5, 2015

I see, that is indeed interesting, especially for the CPU part, but definitely a long-term suggestion.

ignatenkobrain commented Dec 5, 2015

I am not interested in the CPU part; I'm mostly interested in the Beignet part, because it works on Intel GPUs, which are nearly everywhere.

Member

epixoip commented Dec 5, 2015

Intel GPUs are not likely to be much, if at all, faster than CPUs, which is one of the reasons we never bothered to support them.

ignatenkobrain commented Dec 5, 2015

It is faster (at least with Beignet): https://01.org/beignet

Member

epixoip commented Dec 5, 2015

That link doesn't really say anything of value.

Let's look at it this way: current top-of-the-line Intel GPUs have up to 384 cores which run at up to 1100 MHz, and they lack instructions that help reduce instruction count for many hash functions (nothing similar to bitselect, bitalign, LOP3.LUT, etc.). So basically these will be almost identical in speed to very low-end pre-Maxwell Nvidia GPUs.

For example, the Iris Pro 6200 would be about as fast as a GT 640, which yields a staggering 450 MH/s on raw MD5. And by "staggering," I mean that's 23 times slower than a GTX 970. And that's their top-of-the-line GPU; if you have something like an HD 4200, it would be more like 50 times slower than a GTX 970.

So this is why we've never cared about adding Intel GPU support. Even their fastest GPUs are about the same speed as high-end CPUs.

Contributor

magnumripper commented Dec 5, 2015

IMHO, regardless of Intel GPUs' current state there's really no reason for Hashcat not to "work" with just any OpenCL device. Instead you could just state that only AMD is actively "supported" and/or that the OpenCL code is optimized for AMD and YMMV.

So I think we should look into wrapping AMD-specific stuff (just guessing there is some) in #ifdef blocks and providing a "generic" alternative. Once it builds and runs on Nvidia (as an alternative to CUDA) it should also run fine on Iris or crappy AMD APUs - as well as on some really cool FPGA board with an OpenCL abstraction layer...

I haven't looked much at the code base yet so I'm mostly assuming things here.

Member

jsteube commented Dec 5, 2015

You're right. Now that we're no longer forced to ship binary kernels, we could allow platforms other than AMD to run the OpenCL kernels. Sure, some of the optimizations wouldn't work. However, I'm absolutely positive about this, especially as it would give us an easy start into supporting CPUs as well. I'll look into this in some time, but would also be happy about any help with it.

Contributor

magnumripper commented Dec 5, 2015

In fact it might save you a lot of time and work if you (eventually) ditch all CUDA stuff and go all-in for OpenCL. Unless there are some clear cases where CUDA can be significantly faster?

Member

jsteube commented Dec 6, 2015

Yeah, I'd like to do so, but the most important reason for using CUDA is that only with CUDA is it possible to cross-compile GPU code using nvcc's -arch option. So we'd have a strange situation in which we'd use OpenCL on Nvidia if the user chooses to use the JIT, but for binary distribution we'd still need CUDA. That sounds very complicated. BTW, there was such an option in earlier versions of the Nvidia OpenCL runtime, but it was dropped without any reason.

And yes, there's also the performance angle:

a) While this is just speculation, it seems NV isn't really happy with OpenCL. It's obvious that they try to push CUDA instead of OpenCL, for business reasons.
b) I remember that with a driver change somewhere between 150.x and 200.x, OpenCL speeds began to drop by 15% without any reason or any change within the PTX. It's also speculation, but I think they tried to artificially slow down OpenCL to make CUDA more interesting.

@jsteube jsteube added the new feature label Dec 9, 2015

Member

jsteube commented Dec 14, 2015

It seems we have a lot of good reasons to fully switch to OpenCL. I tried that yesterday, but it's more complicated than one would think at first. We need to get rid of all of our C++ code (which is mostly just overloaded functions) because Nvidia's OpenCL runtime does not support C++. But there's no way around it, since so many other tasks depend on this change.

Contributor

magnumripper commented Dec 14, 2015

In JtR we use function-like macros for "overloading". Too bad the typeof keyword is not mandatory in OpenCL; that would have been perfect. Most drivers support it, though (including AMD, Nvidia, and Apple), but Intel does not (well, some of their drivers do).

Member

jsteube commented Dec 15, 2015

For those interested in development: https://github.com/hashcat/oclHashcat/tree/GetRidOfCUDA

shellster commented Dec 20, 2015

While I'm all for OpenCL, there is potentially at least one reason not to switch. I recently bought an Nvidia Jetson TK1 with the hope of eventually getting oclHashcat running on it. There is currently no OpenCL support, and there likely never will be; we are stuck with CUDA. There are other issues with running on this platform (ARM, and no NVML support), but going OpenCL-only would add yet another hurdle. This may or may not be enough of a reason not to go full OpenCL. I predict we will see more of these specialty boards in the near future, and they offer some really nice performance-per-watt for cracking.

Contributor

magnumripper commented Dec 20, 2015

@shellster is that thing usable in real life? That is a sincere question, I'm not bashing. What's the performance?

There's no need to actually ditch the working CUDA code. Sure, the tree would be much easier to maintain without CUDA but you could keep the kernels for a while and just separate the builds: Stop calling them "nvidia" and "amd" and instead call them what they really are: "cuda" and "opencl". That is what we do in JtR although in that case, the CUDA stuff is mostly laughable.

shellster commented Dec 20, 2015

@magnumripper That's a good question. It's hard to know without being able to test it. This guy is getting some pretty decent WPA cracking speeds with the board: https://devtalk.nvidia.com/default/topic/785084/jetson-for-pyrit-cpyrit-cuda-stats/

The killer feature is that the board only uses about 5W of power, so the potential power comes from clustering them.

Member

epixoip commented Dec 20, 2015

525 H/s @ 5W. Meanwhile a GTX 970 gets 180000 H/s @ 135W. That's 105 H/W vs 1333 H/W. So that's both poor performance AND poor perf/Watt.

shellster commented Dec 20, 2015

@epixoip: I believe you are misreading the post. The first benchmark at the top is after adjusting the clock speed: 6203.2 PMKs/s

Also, I don't really want to get too tied up about this particular board. I'm just wondering if the trend of single purpose compute boards is likely to continue, and if that's enough of a reason to keep CUDA specific support around. I wanted to bring it up if no one had considered it yet.

Member

epixoip commented Dec 20, 2015

Sorry, all I saw was "#1: 'CUDA-Device #1 'GK20A'': 525.1 PMKs/s (RTT 2.8)". But even 6203 H/s falls short in the perf/W category, and well short in perf/$ as well. You'd need 29 of these boards to match the speed of one GTX 970 for a cost of $5600 vs $320.

Historically we've had zero interest in underpowered / poor-performing hardware. I suppose that's because in our view, Hashcat is for serious professionals, and serious professionals have real hardware. I know JtR aims to be compatible with pretty much anything and everything, but I personally do not see our focus shifting away from serious use cases.

ignatenkobrain commented Dec 20, 2015

I am very happy that you are working on OpenCL support, but I have a question: what is JtR?

Member

epixoip commented Dec 20, 2015

JtR is the acronym we frequently use to refer to John the Ripper.

Member

jsteube commented Dec 20, 2015

@shellster The whole Jetson TK1 discussion makes no sense since they dropped the support for the board with CUDA 7.0: https://devtalk.nvidia.com/default/topic/805540/no-cuda-7-0-support-for-jetson-tk1-board/

@magnumripper I'll drop CUDA completely, not just because it's easier to maintain afterwards but also because it makes the host code easier to read. If I remove the code from the host, there's no sense in keeping the NV folder. If anyone really wants it, simply check out an older version. PS: The CUDA branch already uses an OpenCL/ folder

Member

jsteube commented Dec 20, 2015

So far the GetRidOfCUDA branch works; all unit tests passed. Now we have to focus on the performance, and there's a lot to do! I just started a sheet for comparison:

https://docs.google.com/spreadsheets/d/1B1S_t1Z0KsqByH3pNkYUM-RCFMu860nlfSsYEqOoqco/edit#gid=0

Don't be shocked by those early numbers; I'm confident we'll stabilize them after optimizing each single kernel

Member

jsteube commented Jan 4, 2016

FYI, the first version is working with pocl now

ignatenkobrain commented Jan 4, 2016

Very cool! How can I compile it without having the AMD SDK, CUDA, etc.?

Member

jsteube commented Jan 5, 2016

You cannot; however, the dependencies were reduced to only a few, which you can install automatically using tools/deps.sh -- see building.md

@jsteube jsteube referenced this issue Jan 5, 2016

Closed

oclHashcat API #9

pjaaskel commented Jan 6, 2016

FYI, the latest pocl versions support HSA, thus e.g. the AMD Kaveri GPU and, in the future, hopefully more. So it's no longer for CPU-only devices; any HSA-supported device should in principle work.

We'll try to take a look at pocl/pocl#290 as soon as possible.

@01BTC10 01BTC10 referenced this issue Feb 17, 2017

Closed

Scrypt #393

magimix123 referenced this issue in DoZ10/hashcat Jun 8, 2017

- Added hash-mode 15600 = Ethereum Wallet, PBKDF2-HMAC-SHA256
- Added hash-mode 15700 = Ethereum Wallet, PBKDF2-SCRYPT
Resolves hashcat#1227
Resolves hashcat#1228

@vendforce vendforce referenced this issue Jan 31, 2018

Closed

Nvidia error #1509
