Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Miscounts number of arguments when structs present #197

Closed
inducer opened this issue Jun 30, 2015 · 11 comments
Closed

Miscounts number of arguments when structs present #197

inducer opened this issue Jun 30, 2015 · 11 comments

Comments

@inducer
Copy link
Contributor

inducer commented Jun 30, 2015

Compiling this snippet (let's call it axpb.cl):

typedef struct {
  double real, imag;
} cdouble_t;

__kernel void axpb(cdouble_t a)
{
}

and checking the number arguments to the kernel yields 2 on pocl 438ce41. Sample driver script:

import pyopencl as cl

ctx = cl.create_some_context()
with open("axpb.cl") as inf:
    src = inf.read()
prg = cl.Program(ctx, src).build()

print prg.axpb.num_args
@inducer
Copy link
Contributor Author

inducer commented Jun 30, 2015

Current pyopencl test suite stats:

17 failed, 83 passed, 9 skipped

12 of the failures are this issue.

@pjaaskel
Copy link
Member

pjaaskel commented Jul 1, 2015

Check issue #1. It is unportable to use structs to pass data between the host and the device in OpenCL due to the possible alignment and padding differences between the host and device. So, if possible, passing structs should be avoided, at least as value arguments.

However, this is clearly a (very long standing) bug in pocl which could be now perhaps fixed more easily thanks to the introduction of the kernel argument metadata in the bitcode from the frontend. But I'm not pushing this to very high in my personal priority list due to the above reason.

@inducer
Copy link
Contributor Author

inducer commented Jul 1, 2015

In general, I agree that struct alignment is an issue. But for simple stuff like "two doubles", I doubt there's going to be much disagreement over the layout. (That said, PyOpenCL has machinery to adapt struct layouts to the target machine, but that's sort of beside the point.)

I'll take a look. Can you point me in vaguely the right direction?

@pjaaskel
Copy link
Member

pjaaskel commented Jul 1, 2015

I agree that simple cases like this should be rather portable.

In this case I doubt it's only the getting number of arguments that's failing, but the real issue will reveal itself when calling the kernel. The struct is probably split to two doubles in the finger print of the kernel to adhere to the calling convention of the target. The problem here must be that pocl looks at the kernel finger print when it returns the number of arguments.

Anyways, tracking this problem can be started from pocl_llvm_api.cc around line 701 where it gets the number of kernel arguments by looking at the LLVM representation of the kernel (which is now in the target's CC with the struct split to multiple function args). If possible, one could check here if the kernel has metadata of the arguments and produce them correctly from the metadata instead of looking at the LLVM IR of the kernel function.

Another issue (which I haven't looked/tested/worked on for a while so what I write in the following might be outdated) is to pass the arguments correctly. The idea here is that Workgroup.cc generates a wrapper to the kernel function produced by Clang (and de-SPMD'd by the kernel compiler for MIMD machines). The wrapper takes in the arguments in an array where there's one slot per argument and assumes that either the value argument fits into the slot or the slot contains a pointer to the actual data. Then the Workgroup.cc generated wrapper loads the arg data from the array (let's call this an argument space) and calls the actual kernel function using its CC.

Now I'm not sure what happens in this case where the data doesn't fit into the 64b slot. This is the problem with #1 and the long standing failing test cases in the suite:

72: struct kernel arguments - lp:987905             skipped (testsuite-regression.at:107)
73: vector kernel arguments - lp:987905             skipped (testsuite-regression.at:114)

I'm glad if you can improve this part. I also hang in #pocl channel (oftc) where I can help more interactively.

@pjaaskel
Copy link
Member

pjaaskel commented Jul 1, 2015

Oh, and the argument passing code for the 'pthread' CPU driver is in lib/CL/devices/pthread/pthread.c:638.

@inducer
Copy link
Contributor Author

inducer commented Jul 2, 2015

FWIW, Apple CPU (on OS X 10.10.4 even) has the same issue.

@inducer
Copy link
Contributor Author

inducer commented Jul 5, 2015

Hmm, I looked into fixing this, and I couldn't figure out where to even get source-level information about what the arguments were before the calling convention was applied.

Since Apple has the same bug, I've (somewhat yuckily) worked around this in PyOpenCL for now. That's good in principle, because it means I should be able to use pocl. That's also bad because it means I no longer have a high-priority motivation to fix this...

@inducer
Copy link
Contributor Author

inducer commented Jul 5, 2015

...and here I am, psychoanalyzing LLVM's calling conventions from Python. Well-deserved, too. I'm slightly more motivated now to get this fixed.

@pjaaskel
Copy link
Member

pjaaskel commented Jul 6, 2015

pocl_llvm_get_kernel_arg_metadata() has some code that could be useful. I see it has the type info unimplemented, but in principle you should get that info (only) by looking into the metadata, unless wanting to parse the kernel sources.

@inducer
Copy link
Contributor Author

inducer commented Aug 21, 2015

On a build (of 75d64c6) with assertions, this also appears to cause

python: pocl_llvm_api.cc:570: int pocl_llvm_get_kernel_arg_metadata(const char*, llvm::Module*, cl_kernel): Assertion `has_meta_for_every_arg && "kernel_arg_addr_space meta incomplete"' failed.

Here's the associated backtrace:

#0  0x00007ffff6f28107 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x00007ffff6f294e8 in __GI_abort () at abort.c:89
#2  0x00007ffff6f21226 in __assert_fail_base (fmt=0x7ffff7057d08 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", 
    assertion=assertion@entry=0x7fffe25c24e0 "has_meta_for_every_arg && \"kernel_arg_addr_space meta incomplete\"", file=file@entry=0x7fffe25c2e1d "pocl_llvm_api.cc", 
    line=line@entry=570, 
    function=function@entry=0x7fffe25c45e0 <pocl_llvm_get_kernel_arg_metadata(char const*, llvm::Module*, _cl_kernel*)::__PRETTY_FUNCTION__> "int pocl_llvm_get_kernel_arg_metadata(const char*, llvm::Module*, cl_kernel)") at assert.c:92
#3  0x00007ffff6f212d2 in __GI___assert_fail (assertion=assertion@entry=0x7fffe25c24e0 "has_meta_for_every_arg && \"kernel_arg_addr_space meta incomplete\"", 
    file=file@entry=0x7fffe25c2e1d "pocl_llvm_api.cc", line=line@entry=570, 
    function=function@entry=0x7fffe25c45e0 <pocl_llvm_get_kernel_arg_metadata(char const*, llvm::Module*, _cl_kernel*)::__PRETTY_FUNCTION__> "int pocl_llvm_get_kernel_arg_metadata(const char*, llvm::Module*, cl_kernel)") at assert.c:101
#4  0x00007fffe12d960e in pocl_llvm_get_kernel_arg_metadata (kernel_name=kernel_name@entry=0x7fffe8c93654 "axpb", input=input@entry=0x2f9e280, kernel=kernel@entry=0x320abb0)
    at pocl_llvm_api.cc:570
#5  0x00007fffe12da914 in pocl_llvm_get_kernel_metadata (program=program@entry=0x2b349a0, kernel=kernel@entry=0x320abb0, device_i=device_i@entry=0, 
    kernel_name=kernel_name@entry=0x7fffe8c93654 "axpb", errcode=errcode@entry=0x7fffffffb49c) at pocl_llvm_api.cc:850
#6  0x00007fffe12c488a in POclCreateKernel (program=0x2b349a0, kernel_name=0x7fffe8c93654 "axpb", errcode_ret=0x7fffffffb4e4) at clCreateKernel.c:91
#7  0x00007fffe97d8023 in _call_func<_cl_kernel* (*)(_cl_program*, char const*, int*), 0, 1, 2, _cl_program* const&, char const*&, int*> (args=..., func=<optimized out>)
    at src/c_wrapper/function.h:38
#8  call_tuple<_cl_kernel* (*&)(_cl_program*, char const*, int*), std::tuple<_cl_program* const&, char const*&, int*> > (args=<optimized out>, func=<synthetic pointer>)
    at src/c_wrapper/function.h:49
#9  call<__CLArgGetter, _cl_kernel* (*)(_cl_program*, char const*, int*)> (func=<optimized out>, this=<synthetic pointer>) at src/c_wrapper/function.h:108
#10 clcall<_cl_kernel* (*)(_cl_program*, char const*, int*)> (name=0x7fffe97de4dc "clCreateKernel", func=<optimized out>, this=<synthetic pointer>) at src/c_wrapper/error.h:186
#11 call_guarded<_cl_kernel*, _cl_program*, char const*, int*, program const*&, char const*&> (name=0x7fffe97de4dc "clCreateKernel", func=<optimized out>)
    at src/c_wrapper/error.h:232
#12 operator() (__closure=<synthetic pointer>) at src/c_wrapper/kernel.cpp:97
#13 c_handle_error<create_kernel(clbase**, clobj_t, char const*)::<lambda()> > (func=...) at src/c_wrapper/error.h:271
#14 create_kernel (knl=0x7fffcadb5648, _prog=0x2b34910, name=0x7fffe8c93654 "axpb") at src/c_wrapper/kernel.cpp:98
#15 0x00007fffe9794034 in _cffi_f_create_kernel (self=<optimized out>, args=<optimized out>) at build/temp.linux-x86_64-2.7/pyopencl._cffi.cpp:2338
#16 0x00000000004c9ae5 in PyEval_EvalFrameEx ()
#17 0x00000000004e42d0 in ?? ()
#18 0x0000000000502608 in ?? ()
#19 0x00000000004ca803 in PyEval_EvalFrameEx ()
#20 0x00000000004e42d0 in ?? ()
#21 0x0000000000502608 in ?? ()
#22 0x000000000050124c in ?? ()
#23 0x00000000004ba30b in ?? ()
#24 0x00000000004ca803 in PyEval_EvalFrameEx ()
#25 0x00000000004e42d0 in ?? ()
#26 0x0000000000502608 in ?? ()
#27 0x00000000004be506 in PyObject_CallFunctionObjArgs ()
#28 0x000000000059d8b6 in ?? ()
#29 0x00000000005780a6 in ?? ()
#30 0x00000000004fe278 in ?? ()
#31 0x00000000004c9ae5 in PyEval_EvalFrameEx ()
#32 0x00000000004e42d0 in ?? ()
#33 0x00000000004cd352 in PyEval_EvalFrameEx ()
#34 0x00000000004c8329 in PyEval_EvalCodeEx ()
#35 0x00000000004c9ff6 in PyEval_EvalFrameEx ()
#36 0x00000000004e42d0 in ?? ()
#37 0x00000000004cd352 in PyEval_EvalFrameEx ()
#38 0x00000000004c8329 in PyEval_EvalCodeEx ()
#39 0x00000000004cb577 in PyEval_EvalFrameEx ()
#40 0x00000000004c8329 in PyEval_EvalCodeEx ()
#41 0x00000000004cb577 in PyEval_EvalFrameEx ()
#42 0x00000000004e42d0 in ?? ()
#43 0x00000000004cd352 in PyEval_EvalFrameEx ()
#44 0x00000000004c8329 in PyEval_EvalCodeEx ()
#45 0x00000000004cb577 in PyEval_EvalFrameEx ()
#46 0x00000000004e42d0 in ?? ()
#47 0x0000000000502608 in ?? ()
#48 0x00000000004b34ce in PyObject_Call ()
#49 0x000000000057e2bd in ?? ()
#50 0x00000000005192f9 in PyNumber_Multiply ()
#51 0x00000000004cb07d in PyEval_EvalFrameEx ()
#52 0x00000000004c8329 in PyEval_EvalCodeEx ()
#53 0x00000000004cb577 in PyEval_EvalFrameEx ()
#54 0x00000000004c8329 in PyEval_EvalCodeEx ()
#55 0x00000000004cb577 in PyEval_EvalFrameEx ()
#56 0x00000000004c8329 in PyEval_EvalCodeEx ()
#57 0x000000000053627a in PyRun_StringFlags ()
#58 0x00000000004cf567 in PyEval_EvalFrameEx ()
#59 0x00000000004c8329 in PyEval_EvalCodeEx ()
#60 0x000000000050111f in ?? ()
#61 0x00000000004f6e22 in PyRun_FileExFlags ()
#62 0x00000000004f5f17 in PyRun_SimpleFileExFlags ()
#63 0x0000000000497afd in Py_Main ()
#64 0x00007ffff6f14b45 in __libc_start_main (main=0x497590 <main>, argc=3, argv=0x7fffffffe3e8, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, 
    stack_end=0x7fffffffe3d8) at libc-start.c:287
#65 0x00000000004974b8 in _start ()

@inducer
Copy link
Contributor Author

inducer commented Feb 21, 2017

Fixed along with #1.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants