[WiP] OpenCL support #582
ctypes wrapping seems to be working on Mac OS X. It is now possible to call a selected set of functions through the wrapper. Automatic handling of error codes is working.
Platform extracts its devices. Devices export their parameters.
It is now possible to build very simple OpenCL programs with this driver.
A new sample that mimics Apple's "OpenCL hello world" example. Added whatever was needed for the example to work :)
Better to use "array.nbytes" than to rely on len(array.data), because array.data is a memoryview in Python 3, whose length is the number of elements rather than the byte length (as it is on 2.7, where data is a buffer).
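The element-count versus byte-count difference described in this commit can be reproduced without NumPy. The following is a minimal sketch, assuming the standard-library `array` module as a stand-in for a NumPy array's `.data`; it is not code from this PR.

```python
import array

# A typed buffer of four C floats; stands in for a NumPy array's data here.
values = array.array("f", [1.0, 2.0, 3.0, 4.0])
view = memoryview(values)

# On Python 3, len() of a memoryview is the number of elements ...
element_count = len(view)
# ... while nbytes is the total size in bytes, which is what a device
# buffer allocation or copy needs.
byte_count = view.nbytes

print(element_count, byte_count)
```

Passing `element_count` where a byte length is expected would under-allocate by a factor of the item size, which is the kind of bug the commit avoids by using `nbytes`.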
- "b" prefix for OpenCL program string - tweaks to allow for a different image format in skimage.data.moon(). Made more robust.
There is a framework now to handle the get info for the different OpenCL objects. They are exported as (read-only) properties and they don't cache results (caching support could be added if needed, on a per-property basis).
Now it is possible to get devices by type using type-specific accessors. No state is preserved, so queries will be repeated each time the property is accessed. That shouldn't be an issue though, and caching can be implemented on an as-needed basis. Also adds error code enums
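The read-only, non-caching properties described in the two commits above could be sketched roughly as follows. Every name here (`InfoProperty`, `FakeDevice`, `_get_info`, the parameter strings) is hypothetical and chosen for illustration; the dictionary lookup stands in for the real OpenCL get-info call, and this is not the PR's implementation.

```python
class InfoProperty:
    """Read-only descriptor that repeats the query on every access."""

    def __init__(self, param_name):
        self.param_name = param_name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        # No caching: each attribute access triggers a fresh query.
        return obj._get_info(self.param_name)

    def __set__(self, obj, value):
        raise AttributeError("info properties are read-only")


class FakeDevice:
    """Hypothetical device object; a dict replaces the driver query."""

    name = InfoProperty("NAME")
    max_compute_units = InfoProperty("MAX_COMPUTE_UNITS")

    def __init__(self, table):
        self._table = table
        self.query_count = 0

    def _get_info(self, param_name):
        self.query_count += 1
        return self._table[param_name]


device = FakeDevice({"NAME": "Example GPU", "MAX_COMPUTE_UNITS": 8})
first = device.name
second = device.name  # queried again, nothing is stored on the object
```

If caching were wanted for a particular property, it could be added inside `__get__` for that property alone, which matches the "on a per-property basis" remark.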
They will be returned as a Python list, with the length of the array queried in the same way as for string arguments.
Instead of dumping a numeric error code, it gets translated to a string containing the CL_WHATEVER symbol that describes the error. That symbol is what appears in API descriptions. A function, opencl_strerror(code), handles the conversion and is available.
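A minimal sketch of this kind of translation is shown below. The numeric values are a small subset of the error codes defined in the OpenCL headers; the names `strerror_sketch`, `OpenCLErrorSketch` and `check` are hypothetical and only illustrate the idea behind `opencl_strerror(code)`, not its actual implementation.

```python
# A small subset of the error codes from the OpenCL headers.
CL_ERROR_NAMES = {
    0: "CL_SUCCESS",
    -1: "CL_DEVICE_NOT_FOUND",
    -5: "CL_OUT_OF_RESOURCES",
    -6: "CL_OUT_OF_HOST_MEMORY",
    -30: "CL_INVALID_VALUE",
    -34: "CL_INVALID_CONTEXT",
}


def strerror_sketch(code):
    """Translate a numeric OpenCL status into its symbolic name."""
    return CL_ERROR_NAMES.get(code, "unknown OpenCL error code {}".format(code))


class OpenCLErrorSketch(Exception):
    """Exception whose message is the symbolic error name."""

    def __init__(self, code):
        self.code = code
        super().__init__(strerror_sketch(code))


def check(code):
    """Raise on any status other than CL_SUCCESS, as a wrapper might do."""
    if code != 0:
        raise OpenCLErrorSketch(code)
```

A ctypes wrapper could route every returned status through something like `check`, so that failures surface with the same symbol used in the API documentation.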
This makes creation of a call to a kernel in a program very compact.
This allows device memory copying.
Add .size attribute to oclarray.
this looks exciting @sklam
This work is blocked pending support of the SPIR 2.0 standard (which is not yet finalized) in OpenCL runtimes. We really need the support for generic pointer address spaces in order for this Numba target to integrate into the general compiler pipeline.
I think it provides a path in the future for OpenCL support in Numba, but it will require the OpenCL 2.1 standard to be approved and implemented. (Historically, it has taken a long time for OpenCL standards to be implemented broadly, even if we ignore NVIDIA, who is typically the slowest since they already have CUDA.) We also require LLVM to add the necessary SPIR-V support, but it sounds like that is already actively being investigated by developers on the LLVM mailing list.
Can we pull the ocl-compiler branch from Siu's repo into Numba (to preserve this bit of history), but then close this PR? We won't be coming back to OpenCL until we see how SPIR-V is implemented in OpenCL 2.1.
The work is preserved as https://github.com/numba/numba/tree/ocl-compiler
@seibert I stumbled across this old issue. In various posts from 2015 you mentioned several blocking requirements. Do you have the time to write a short follow-up on how available these requirements are now (4 years later)? I imagine other Numba users searching for "numba" and "OpenCL" will land here as well and would benefit from a short update. Though I understand if you do not have the time...
What has changed most with OpenCL is that the current standard has moved from SPIR (which is an LLVM-based IR) to SPIR-V (which is not LLVM at all). Someone could write a Numba target for OpenCL, but would need to confirm the following:
Basically, we're not actively looking at OpenCL support (with all the other things on our todo list), but if someone wants to work on this, we would try to figure out a clear target extension API so that such an extension could develop asynchronously from the core Numba repository.
@seibert Thank you very much for taking the time to explain this so clearly.
Hello, it's been 2 years since the last post, but I think there might now be a way to convert LLVM IR code to SPIR-V using KhronosGroup/SPIRV-LLVM-Translator. As for SPIR-V support, I think it might be possible now, since multiple vendors (both NVIDIA and AMD) also support Vulkan. Moreover, for Apple there is the KhronosGroup/MoltenVK project, which translates Vulkan to Metal. So maybe it is now possible to target modern GPUs? I could be wrong though, because I'm not sure about the technical details. If so, please correct me.
A quick 2022 update: the target extension API is pretty much complete and the CUDA target uses it, so it should be possible to add out-of-tree support for an OpenCL backend to Numba from 0.56 onwards. Unfortunately it's not really documented, but one could follow the pattern adopted by the CUDA target - the main classes to follow the implementation of would be |
OpenCL support
numba.ocl namespace
Only tested on Linux + AMD hardware
The lack of generic address space support makes high-level programming difficult. The current implementation assumes all arrays are in the global memory space. The lack of inttoptr/ptrtoint also makes arrays with a non-contiguous layout (NumPy order 'A') impossible. These limitations pose a barrier to further implementation.
We can provide this as experimental support. OpenCL 2.0 will have a generic address space and may make inttoptr/ptrtoint possible.
TODO: