Description
There is a bug in the Neo driver that prevents the OpenCL runtime from working in applications/languages that link against a different version of LLVM than the one supported by the OpenCL compiler. This bug shows up in domain-specific languages like Halide (http://halide-lang.org) and TVM (https://github.com/dmlc/tvm) that use LLVM for JIT compilation, as well as tools like OCLGrind (https://github.com/jrprice/Oclgrind). It’s possible that there are also co-existence problems when multiple OpenCL vendors have conflicting LLVM versions under the hood, but this has not been observed.
Reproducing the problem:
Install the latest Neo driver, download Halide, and run the GPU sample from http://halide-lang.org/tutorials/tutorial_lesson_12_using_the_gpu.html using an OpenCL target.
You will see an error similar to this:
CommandLine Error: Option 'disable-inlined-alloca-merging' registered more than once!
Root Cause:
Halide is built against LLVM 6 and IGC uses LLVM 4. LLVM uses static initializers to register options. When Halide is loaded, it executes the static initializers in LLVM 6, and when Neo->IGC->LLVM 4 are dynamically loaded, the LLVM 4 static initializers are called as well, causing the above error. Commenting out the error trap still results in runtime errors, which are likely caused by symbols being preempted by the later version of LLVM (e.g. same symbol names but with LLVM 6 bits instead of LLVM 4 bits).
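To make the failure mode concrete, here is a simplified model in C of what happens when two copies of a library register the same option name into one shared table. This is not LLVM's actual code: the `register_option` function and the fixed-size table are hypothetical, and it assumes both LLVM copies end up sharing a single registry, as the error message suggests.

```c
#include <stdio.h>
#include <string.h>

#define MAX_OPTIONS 16

/* Hypothetical process-wide registry, standing in for a shared option table. */
static const char *registered[MAX_OPTIONS];
static int registered_count = 0;

/* Registers an option name.
 * Returns 0 on success, -1 if the name already exists or the table is full. */
static int register_option(const char *name)
{
    for (int i = 0; i < registered_count; i++) {
        if (strcmp(registered[i], name) == 0) {
            fprintf(stderr, "Option '%s' registered more than once!\n", name);
            return -1;
        }
    }
    if (registered_count == MAX_OPTIONS) {
        return -1;
    }
    registered[registered_count++] = name;
    return 0;
}
```

In this model, the first call for a given name (the initializer from the first LLVM copy) succeeds, and a second call with the same name (the initializer from the second copy) is rejected.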
Possible fix:
Both the OpenCL runtime and IGC are late-bound using dlopen. Adding the RTLD_DEEPBIND flag to the dlopen calls in Neo (which loads IGC) and in IGC (where it loads LLVM) should fix the problem. A deep bind makes the loaded module resolve symbols against its own definitions and dependencies before the global scope, which separates the LLVM versions and their initializers. I’m not sure whether a deep bind has any problematic side effects.
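A minimal sketch of the proposed change, assuming a glibc-based Linux system (RTLD_DEEPBIND is a glibc extension). The helper name `load_isolated` is hypothetical and is not taken from the Neo or IGC sources; the real dlopen call sites may use different flags.

```c
#define _GNU_SOURCE   /* RTLD_DEEPBIND is a glibc extension */
#include <dlfcn.h>
#include <stdio.h>

#ifndef RTLD_DEEPBIND
#define RTLD_DEEPBIND 0   /* flag unavailable on this libc: behaves like a plain dlopen */
#endif

/* Hypothetical helper: load a shared library so that it prefers its own
 * symbols (and those of its dependencies) over same-named symbols already
 * present in the global scope. Returns NULL on failure. */
static void *load_isolated(const char *path)
{
    void *handle = dlopen(path, RTLD_NOW | RTLD_LOCAL | RTLD_DEEPBIND);
    if (handle == NULL) {
        fprintf(stderr, "dlopen(%s) failed: %s\n", path, dlerror());
    }
    return handle;
}
```

The only difference from an ordinary dlopen call is the extra RTLD_DEEPBIND flag. Whether this alone is enough to keep the two sets of LLVM static initializers apart would need to be verified against the actual driver.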