New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Miscounts number of arguments when structs present #197
Comments
|
Current pyopencl test suite stats: 12 of the failures are this issue. |
|
Check issue #1. It is unportable to use structs to pass data between the host and the device in OpenCL due to the possible alignment and padding differences between the host and device. So, if possible, passing structs should be avoided, at least as value arguments. However, this is clearly a (very long standing) bug in pocl which could be now perhaps fixed more easily thanks to the introduction of the kernel argument metadata in the bitcode from the frontend. But I'm not pushing this to very high in my personal priority list due to the above reason. |
|
In general, I agree that struct alignment is an issue. But for simple stuff like "two doubles", I doubt there's going to be much disagreement over the layout. (That said, PyOpenCL has machinery to adapt struct layouts to the target machine, but that's sort of beside the point.) I'll take a look. Can you point me in vaguely the right direction? |
|
I agree that simple cases like this should be rather portable. In this case I doubt it's only the getting number of arguments that's failing, but the real issue will reveal itself when calling the kernel. The struct is probably split to two doubles in the finger print of the kernel to adhere to the calling convention of the target. The problem here must be that pocl looks at the kernel finger print when it returns the number of arguments. Anyways, tracking this problem can be started from pocl_llvm_api.cc around line 701 where it gets the number of kernel arguments by looking at the LLVM representation of the kernel (which is now in the target's CC with the struct split to multiple function args). If possible, one could check here if the kernel has metadata of the arguments and produce them correctly from the metadata instead of looking at the LLVM IR of the kernel function. Another issue (which I haven't looked/tested/worked on for a while so what I write in the following might be outdated) is to pass the arguments correctly. The idea here is that Workgroup.cc generates a wrapper to the kernel function produced by Clang (and de-SPMD'd by the kernel compiler for MIMD machines). The wrapper takes in the arguments in an array where there's one slot per argument and assumes that either the value argument fits into the slot or the slot contains a pointer to the actual data. Then the Workgroup.cc generated wrapper loads the arg data from the array (let's call this an argument space) and calls the actual kernel function using its CC. Now I'm not sure what happens in this case where the data doesn't fit into the 64b slot. This is the problem with #1 and the long standing failing test cases in the suite: I'm glad if you can improve this part. I also hang in #pocl channel (oftc) where I can help more interactively. |
|
Oh, and the argument passing code for the 'pthread' CPU driver is in lib/CL/devices/pthread/pthread.c:638. |
|
FWIW, Apple CPU (on OS X 10.10.4 even) has the same issue. |
|
Hmm, I looked into fixing this, and I couldn't figure out where to even get source-level information about what the arguments were before the calling convention was applied. Since Apple has the same bug, I've (somewhat yuckily) worked around this in PyOpenCL for now. That's good in principle, because it means I should be able to use pocl. That's also bad because it means I no longer have a high-priority motivation to fix this... |
|
...and here I am, psychoanalyzing LLVM's calling conventions from Python. Well-deserved, too. I'm slightly more motivated now to get this fixed. |
|
pocl_llvm_get_kernel_arg_metadata() has some code that could be useful. I see it has the type info unimplemented, but in principle you should get that info (only) by looking into the metadata, unless wanting to parse the kernel sources. |
|
On a build (of 75d64c6) with assertions, this also appears to cause Here's the associated backtrace: |
|
Fixed along with #1. |
Compiling this snippet (let's call it
axpb.cl):and checking the number arguments to the kernel yields 2 on pocl 438ce41. Sample driver script:
The text was updated successfully, but these errors were encountered: