Bindings for clEnqueueNDRangeKernel different to c++ version #21
As already discussed in the issue that you linked to: It's hard to figure out where exactly in the stack of
this error is introduced. As a first, very basic step, I created a (quick and dirty) test based on one of the JOCL Samples, and plugged in your kernel, and it basically seems to work with larger "local work sizes":
The output is probably not the desired one, but I think the kernel is still preliminary, and in any case, this is unrelated to the original issue. Sorry, this may not immediately help you, but it may be a first step to narrow down the search space.
@gpu: Wow, thanks for the example. The output is a little weird, but it shows that it's working, and it's way better than all zeros. Let me verify this on my end and get back to you.
I'm running this example on a MacBook Pro with macOS 10.13.2. To get the code working, I had to disable:
otherwise this happens:
In the example, for a LOCAL_WORK_SIZE of {1, 1} I'm getting this:
In the example, for a LOCAL_WORK_SIZE of {10, 10} I'm getting this:
I just tried it out on my...
and also encountered a crash for the last one. However, this crash might in fact be unrelated to the local work size: Some write operations are not supported in older OpenCL versions. See https://www.khronos.org/registry/OpenCL/sdk/1.0/docs/man/xhtml/restrictions.html :
And you are writing to such a pointer. So if you're running this on an "old" platform, arbitrary things may go wrong. Can you run a device query on your platform? (Note: This is still a guess. If you edit your kernel to omit all write operations, then I'm pretty sure it would "work". That would not confirm it, but it would at least be a hint that the cause might not be the binding, the launch configuration, or the work sizes, but indeed the fact that you're doing invalid write operations.)
@gpu: okay, I've found the mistake. I'd always thought that the GPU was the first device on the platform and so I was making the call:
which, as I realised only after running the device query, was selecting the CPU.
Selecting the GPU works fine, although for an image of 500x500, an [8 8] work-group size still does not work, while [10 10] does.
Is there a way to determine the type of each device that is available to be selected? For example, to select the GPU first, or else fall back to the CPU.
Good to hear that it is (basically) resolved. For the device type, there are two options:
For the latter, you may have a look at the
in the example above. (This could be changed to
The fact that it does not work for (500,500)/(8,8) is then more likely due to the divisibility issue. I think your kernel is already prepared to handle that, basically: You can add "padding" for the image, i.e. use a global work size that is larger than the image, and just do a bounds check at the beginning of the kernel. (This may be particularly relevant when the size of the image is a prime number, and no sensible local work size except for (1,1) can be found.)
BTW, it may be worth mentioning here: When passing in
@gpu: Thanks so much for your help and patience. I'll close this issue and will experiment a bit more with the above info.
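The "padding" idea for the global work size can be sketched in plain Java as follows. This is only an illustration under stated assumptions: the class and method names (`WorkSizePadding`, `roundUp`, `paddedGlobalWorkSize`) are hypothetical and not part of JOCL, and the 500x500 image with an (8,8) local work size is taken from the discussion in this thread.

```java
import java.util.Arrays;

// Hypothetical helper (not part of JOCL): computes a global work size that is
// padded up to the next multiple of the local work size in every dimension.
final class WorkSizePadding {

    // Rounds 'value' up to the nearest multiple of 'multiple'.
    static long roundUp(long value, long multiple) {
        if (value <= 0 || multiple <= 0) {
            throw new IllegalArgumentException("Sizes must be positive");
        }
        return ((value + multiple - 1) / multiple) * multiple;
    }

    // Pads each dimension of the image size so that it is evenly divisible
    // by the corresponding local work size.
    static long[] paddedGlobalWorkSize(long[] imageSize, long[] localWorkSize) {
        if (imageSize.length != localWorkSize.length) {
            throw new IllegalArgumentException("Dimension mismatch");
        }
        long[] global = new long[imageSize.length];
        for (int i = 0; i < imageSize.length; i++) {
            global[i] = roundUp(imageSize[i], localWorkSize[i]);
        }
        return global;
    }

    public static void main(String[] args) {
        long[] global = paddedGlobalWorkSize(new long[] {500, 500}, new long[] {8, 8});
        System.out.println(Arrays.toString(global)); // prints [504, 504]
    }
}
```

The resulting array would be passed as the global work size, and the kernel would then need to skip the extra work items, for instance with a check along the lines of `if (x >= width || y >= height) return;` at its beginning, where `width` and `height` are assumed to be kernel arguments holding the real image size.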
I'm looking to reuse OpenCL code for a Sobel filter that I got working with OpenCV.
I'm getting a CL_INVALID_WORK_GROUP_SIZE when using a local work size of anything other than [1,1]. The project is here and I'm also asking on the clojurecl repo. I'm hoping to figure out why this is happening.
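One condition under which OpenCL 1.x reports CL_INVALID_WORK_GROUP_SIZE is a global work size that is not evenly divisible by the local work size. Other conditions exist as well, such as a local work size that exceeds what the selected device supports. The following plain-Java sketch checks only the divisibility condition before a kernel launch. The class and method names are hypothetical and not part of JOCL or ClojureCL, and the 500x500 size is the image size mentioned in this thread.

```java
// Hypothetical pre-launch check (not part of JOCL): verifies only that each
// global work size is evenly divisible by the corresponding local work size.
final class LocalWorkSizeCheck {

    static boolean dividesEvenly(long[] globalWorkSize, long[] localWorkSize) {
        if (globalWorkSize.length != localWorkSize.length) {
            throw new IllegalArgumentException("Dimension mismatch");
        }
        for (int i = 0; i < globalWorkSize.length; i++) {
            if (localWorkSize[i] <= 0 || globalWorkSize[i] % localWorkSize[i] != 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        long[] global = {500, 500};
        System.out.println(dividesEvenly(global, new long[] {1, 1}));   // prints true
        System.out.println(dividesEvenly(global, new long[] {8, 8}));   // prints false
        System.out.println(dividesEvenly(global, new long[] {10, 10})); // prints true
    }
}
```

A `true` result here does not guarantee a successful launch, since the device limits still apply; it only rules out the divisibility problem.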