PoCL is a conformant implementation (for CPU and Level Zero GPU targets) of the OpenCL 3.0 standard which can be easily adapted for new targets.
This section contains instructions for building PoCL in its default configuration and with a subset of the driver backends. You can find the full build instructions, including a list of available options, in the install guide.
In order to build PoCL, you need the following support libraries and tools:
- Latest released version of LLVM & Clang
- Development files for LLVM & Clang + their transitive dependencies (e.g. libclang-dev, libclang-cpp-dev, libllvm-dev, zlib1g-dev, libtinfo-dev...)
- CMake 3.15 or newer
- GNU make or ninja
- Optional: pkg-config
- Optional: hwloc v1.0 or newer (e.g. libhwloc-dev)
- Optional (but enabled by default): python3 (for support of LLVM bitcode with SPIR target)
- Optional: llvm-spirv (version-compatible with LLVM) and spirv-tools (required for SPIR-V support in CPU / CUDA; Vulkan driver supports SPIR-V through clspv)
For more details, consult the install guide.
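Before configuring, it can help to confirm that the CMake on `PATH` meets the 3.15 minimum listed above. The sketch below assumes a POSIX shell and a `sort` that supports `-V` version ordering (GNU coreutils does); `version_ge` and `check_cmake` are hypothetical helpers, not part of PoCL.

```sh
#!/bin/sh
# Hypothetical helper: succeed if dotted version $1 >= $2.
# `sort -V` orders version strings component by component, so the
# smaller of the two comes first; if that is $2, then $1 >= $2.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Hypothetical helper: report whether the cmake on PATH meets the 3.15
# minimum. Returns non-zero if cmake is missing or too old.
check_cmake() {
  local v
  if ! command -v cmake >/dev/null 2>&1; then
    echo "cmake: not found" >&2
    return 1
  fi
  # The first line of `cmake --version` reads "cmake version X.Y.Z".
  v=$(cmake --version | head -n 1 | awk '{print $3}')
  if version_ge "$v" 3.15; then
    echo "cmake $v: OK"
  else
    echo "cmake $v: too old (need >= 3.15)" >&2
    return 1
  fi
}
```

Calling `check_cmake` prints the detected version and returns non-zero if CMake is missing or older than 3.15. Version ordering is used instead of a plain string comparison because, lexically, "3.9" would sort after "3.15".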
Building PoCL follows the usual CMake build steps. Note, however, that PoCL can be used from the build directory (without installing it system-wide).
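As a sketch of those usual steps, the function below configures and builds PoCL out of tree. It assumes a PoCL source checkout (for example a clone of https://github.com/pocl/pocl) with the prerequisites above installed; `pocl_build` is a hypothetical wrapper, and `-DCMAKE_BUILD_TYPE=Release` is a generic CMake choice rather than a documented PoCL requirement.

```sh
#!/bin/sh
# Hypothetical wrapper: pocl_build SRC_DIR BUILD_DIR [extra CMake args...]
# Configures PoCL out of tree and builds it; extra arguments go to CMake.
pocl_build() {
  if [ "$#" -lt 2 ]; then
    echo "usage: pocl_build SRC_DIR BUILD_DIR [cmake args...]" >&2
    return 2
  fi
  local src_dir=$1 build_dir=$2
  shift 2
  # Configure step (-S/-B need CMake >= 3.13, covered by the 3.15 minimum).
  # On failure, return cmake's own exit status.
  cmake -S "$src_dir" -B "$build_dir" -DCMAKE_BUILD_TYPE=Release "$@" || return
  # Build step, using whichever generator (make or ninja) was configured.
  cmake --build "$build_dir" --parallel
}
```

For example, `pocl_build pocl pocl/build -DENABLE_CONFORMANCE=ON` would configure the conformance build that the CTS pass rates in this document refer to, and adding `-G Ninja` selects ninja instead of CMake's default generator. Consult the install guide for the PoCL-specific options.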
PoCL supports several backend drivers, with different levels of maturity in terms of received testing, reliability and available features.
The CPU driver on x86_64 is continuously tested to pass the CTS, and it also passes >99% of all CTS tests when built with the Thread or Address sanitizer. The CPU driver is also tested on RISC-V and ARM64; the CPU driver on ARM32, i386, PPC and S390x is neither tested nor supported. We won't prevent building on these architectures, but we don't currently support them actively.
The CTS pass rate column reflects the expected pass rate of the OpenCL-CTS tests when PoCL is compiled with the ENABLE_CONFORMANCE=ON setting.
| Driver | Maturity | CTS pass rate | Dev. OpenCL ver. | input SPIR-V |
|---|---|---|---|---|
| CPU/x86_64 | very high | 100% | 3.0 | 1.4 |
| CPU/ARM64 | high | >95% | 3.0 | 1.4 |
| CPU/RISCV | high | >99% | 3.0 | 1.4 |
| LevelZero | high | >99% | 3.0 | as LZ runtime |
| CUDA | low | | 3.0 | 1.2 |
| OpenASIP | low | | 1.2 | none |
| Vulkan | low | | 3.0 | ExecModel=Shader only |

🟢 : Supported with all hardware & LLVM versions
🟡 : Partially supported, see notes
🔴 : Unsupported
empty cell : Unknown status
Some extensions are available at the platform level:
- cl_khr_icd (if compiled with ENABLE_ICD=1)
- cl_khr_create_command_queue
- cl_pocl_content_size
- cl_ext_buffer_device_address
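Applications usually test for an extension by looking it up in the space-separated string returned by `clGetPlatformInfo(CL_PLATFORM_EXTENSIONS)` (or `CL_DEVICE_EXTENSIONS` for a device), which tools such as clinfo also print. A plain substring search is error-prone because some names are prefixes of others, e.g. cl_khr_command_buffer and cl_khr_command_buffer_mutable_dispatch. A minimal exact-match sketch in POSIX shell, where `has_extension` is a hypothetical helper:

```sh
#!/bin/sh
# Hypothetical helper: has_extension "EXT_LIST" NAME
# Succeeds only if NAME is a whole entry of the whitespace-separated
# EXT_LIST, so a prefix such as cl_khr_command_buffer does not match
# cl_khr_command_buffer_mutable_dispatch.
has_extension() {
  local ext
  # Unquoted $1 on purpose: split the list on spaces/tabs/newlines.
  # Extension names contain no glob characters, so globbing is harmless.
  for ext in $1; do
    if [ "$ext" = "$2" ]; then
      return 0
    fi
  done
  return 1
}
```

For instance, `has_extension "$platform_exts" cl_khr_icd` (with `platform_exts` a hypothetical variable holding the platform extension string) indicates whether this PoCL build was configured with ENABLE_ICD=1.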
Note that Remote devices pass through most of their extensions, with a few exceptions; these are marked Unsupported in the table.
| Extension | CPU device | Level Zero | CUDA | OpenASIP | Remote |
|---|---|---|---|---|---|
| cl_exp_tensor | 🟡 2️⃣ | 🟡 1️⃣ | | | |
| cl_exp_defined_builtin_kernels | 🟡 2️⃣ | 🟡 1️⃣ | | | |
| cl_ext_buffer_device_address | 🟢 | 🟡 1️⃣ | 🔴 | | |
| cl_ext_float_atomics | 🟢 | 🟡 1️⃣ | 🟢 | | |
| cl_intel_command_queue_families | 🔴 | | | | |
| cl_intel_device_attribute_query | 🔴 | 🟡 1️⃣ | | | |
| cl_intel_required_subgroup_size | 🟡 2️⃣ | | | | |
| cl_intel_split_work_group_barrier | 🔴 | 🟡 2️⃣ | | | |
| cl_intel_spirv_subgroups | 🔴 | 🟡 1️⃣ | | | |
| cl_intel_subgroups | 🟡 2️⃣ | 🟡 2️⃣ | | | |
| cl_intel_subgroups_short | 🟡 2️⃣ | 🟡 2️⃣ | | | |
| cl_intel_subgroups_char | 🟡 2️⃣ | 🟡 2️⃣ | | | |
| cl_intel_subgroups_long | 🔴 | 🟡 2️⃣ | | | |
| cl_intel_subgroup_local_block_io | 🔴 | 🟡 2️⃣ | | | |
| cl_intel_unified_shared_memory | 🟢 | 🟡 1️⃣ | 🔴 | | |
| cl_khr_3d_image_writes | 🟢 | 🟡 2️⃣ | | | |
| cl_khr_byte_addressable_store | 🟢 | 🟢 | 🟢 | | |
| cl_khr_device_uuid | 🟢 | 🟢 | 🔴 | | |
| cl_khr_extended_bit_ops | 🟡 6️⃣ | | | | |
| cl_khr_global_int32_base_atomics | 🟢 | 🟢 | 🟢 | | |
| cl_khr_global_int32_extended_atomics | 🟢 | 🟢 | 🟢 | | |
| cl_khr_local_int32_base_atomics | 🟢 | 🟢 | 🟢 | | |
| cl_khr_local_int32_extended_atomics | 🟢 | 🟢 | 🟢 | | |
| cl_khr_int64_base_atomics | 🟢 | 🟡 2️⃣ 1️⃣ | 🟢 | | |
| cl_khr_int64_extended_atomics | 🟢 | 🟡 2️⃣ 1️⃣ | 🟢 | | |
| cl_khr_suggested_local_work_size | 🟢 | | | | |
| cl_khr_pci_bus_info | 🔴 | 🟡 1️⃣ | | | |
| cl_khr_depth_images | 🔴 | 🟡 2️⃣ | | | |
| cl_khr_integer_dot_product | 🟢 | 🟡 1️⃣ | | | |
| cl_khr_command_buffer | 🟡 2️⃣ | 🟡 2️⃣ | | | |
| cl_khr_command_buffer_multi_device | 🟡 2️⃣ | | | | |
| cl_khr_command_buffer_mutable_dispatch | 🟡 2️⃣ | | | | |
| cl_khr_subgroups | 🟡 2️⃣ | 🟡 1️⃣ | 🟡 2️⃣ | | |
| cl_khr_subgroup_ballot | 🟡 2️⃣ | 🟡 2️⃣ | | | |
| cl_khr_subgroup_shuffle | 🟡 2️⃣ | 🟡 2️⃣ | | | |
| cl_khr_subgroup_shuffle_relative | 🔴 | 🟡 2️⃣ | | | |
| cl_khr_subgroup_extended_types | 🔴 | 🟡 2️⃣ | | | |
| cl_khr_subgroup_non_uniform_arithmetic | 🔴 | 🟡 2️⃣ | | | |
| cl_khr_subgroup_non_uniform_vote | 🔴 | 🟡 2️⃣ | | | |
| cl_khr_subgroup_clustered_reduce | 🔴 | 🟡 2️⃣ | | | |
| cl_khr_il_program | 🟡 3️⃣ | 🟡 3️⃣ | 🟡 3️⃣ | | |
| cl_khr_spir | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 |
| cl_khr_spirv_queries | 🟡 3️⃣ | 🟡 3️⃣ | 🟡 3️⃣ | | |
| cl_khr_spirv_no_integer_wrap_decoration | 🟢 | 🟢 | | | |
| cl_khr_spirv_linkonce_odr | 🟢 | 🟡 1️⃣ | | | |
| cl_khr_fp16 | 🟡 4️⃣ | 🟡 2️⃣ 1️⃣ | 🟡 7️⃣ | | |
| cl_khr_fp64 | 🟡 5️⃣ | 🟡 2️⃣ 1️⃣ | 🟢 | | |
| cl_nv_device_attribute_query | 🔴 | 🔴 | 🟢 | 🔴 | |
| cl_pocl_svm_rect | 🟡 2️⃣ | | | | |
| cl_pocl_command_buffer_svm | 🟡 2️⃣ | | | | |
| cl_pocl_command_buffer_host_buffer | 🟡 2️⃣ | | | | |

Some of the features below have prerequisites (e.g. __opencl_c_ext_fp64_local_atomic_add requires cl_khr_fp64 & cl_ext_float_atomics); these must additionally be supported by the device.
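A prerequisite rule like this can be checked against a device's extension string before relying on a feature. A minimal sketch, assuming the extensions are available as a whitespace-separated string (as with CL_DEVICE_EXTENSIONS); `requires_all` is a hypothetical helper that checks extensions only, not the __opencl_c_* feature macros themselves:

```sh
#!/bin/sh
# Hypothetical helper: requires_all "EXT_LIST" EXT...
# Succeeds if every EXT is a whole entry of EXT_LIST; otherwise reports
# the first missing prerequisite on stderr and fails.
requires_all() {
  local list req ext found
  list=$1
  shift
  for req in "$@"; do
    found=no
    # Unquoted $list splits on whitespace; extension names contain no
    # glob characters, so pathname expansion is not a concern here.
    for ext in $list; do
      if [ "$ext" = "$req" ]; then
        found=yes
        break
      fi
    done
    if [ "$found" = no ]; then
      echo "missing prerequisite: $req" >&2
      return 1
    fi
  done
  return 0
}
```

For example, `requires_all "$dev_exts" cl_khr_fp64 cl_ext_float_atomics` checks the prerequisites of __opencl_c_ext_fp64_local_atomic_add, with `dev_exts` a hypothetical variable holding the device extension string.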
| Features | CPU device | Level Zero | CUDA | OpenASIP | Remote |
|---|---|---|---|---|---|
| __opencl_c_images | 🟢 | 🟡 1️⃣ | | | |
| __opencl_c_3d_image_writes | 🟢 | 🟡 1️⃣ | | | |
| __opencl_c_atomic_order_acq_rel | 🟢 | 🟡 1️⃣ | 🟢 | | |
| __opencl_c_atomic_order_seq_cst | 🟢 | 🟡 1️⃣ | 🟢 | | |
| __opencl_c_atomic_scope_device | 🟢 | 🟡 1️⃣ | 🟢 | | |
| __opencl_c_atomic_scope_all_devices | 🟢 | 🟡 1️⃣ | | | |
| __opencl_c_generic_address_space | 🟢 | 🟢 | 🟢 | | |
| __opencl_c_work_group_collective_functions | 🟢 | 🟢 | | | |
| __opencl_c_integer_dot_product_input_4x8bit | 🟢 | 🟡 2️⃣ 1️⃣ | | | |
| __opencl_c_integer_dot_product_input_4x8bit_packed | 🟢 | 🟡 2️⃣ 1️⃣ | | | |
| __opencl_c_subgroups | 🟡 2️⃣ | 🟡 2️⃣ 1️⃣ | 🟡 2️⃣ | | |
| __opencl_c_read_write_images | 🟡 2️⃣ | 🟡 1️⃣ | | | |
| __opencl_c_program_scope_global_variables | 🟡 2️⃣ | 🟡 2️⃣ | 🟢 | | |
| __opencl_c_ext_fp32_global_atomic_add | 🟢 | 🟡 1️⃣ | 🟢 | | |
| __opencl_c_ext_fp32_local_atomic_add | 🟢 | 🟡 1️⃣ | 🟢 | | |
| __opencl_c_ext_fp32_global_atomic_min_max | 🟢 | 🟡 1️⃣ | 🟢 | | |
| __opencl_c_ext_fp32_local_atomic_min_max | 🟢 | 🟡 1️⃣ | 🟢 | | |
| __opencl_c_ext_fp64_global_atomic_add | 🟢 | 🟡 1️⃣ | 🟢 | | |
| __opencl_c_ext_fp64_local_atomic_add | 🟢 | 🟡 1️⃣ | 🟢 | | |
| __opencl_c_ext_fp64_global_atomic_min_max | 🟢 | 🟡 1️⃣ | 🟢 | | |
| __opencl_c_ext_fp64_local_atomic_min_max | 🟢 | 🟡 1️⃣ | 🟢 | | |
| __opencl_c_work_group_collective_functions | 🔴 | 🟢 | | | |
1. Availability depends on hardware and runtime (LevelZero, CUDA) support; if both are available, the extensions/features are enabled by default.
2. These extensions are only enabled when ENABLE_CONFORMANCE=OFF, because they are incomplete, fail some corner cases, or similar.
3. These extensions are supported when PoCL is compiled with SPIR-V support.
4. The `cl_khr_fp16` extension is enabled on CPU if all of these are met:
   - both the host & device compilers support the required type (`_Float16`) and can emulate / execute operations on the type
     - Note: GCC only supports `_Float16` since version 12
   - LLVM >= 19, ENABLE_CONFORMANCE=OFF, Linux, CpuArch != i386
5. The `cl_khr_fp64` extension is enabled by default on all CPU architectures, unless explicitly disabled.
6. The `cl_khr_extended_bit_ops` extension is only supported with LLVM 20+.
7. The `cl_khr_fp16` extension is supported on CUDA devices with Compute Capability >= 6.0 only.
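The cl_khr_fp16 conditions listed above (for the CPU device, and Compute Capability >= 6.0 on CUDA) are plain checks on the build configuration and hardware, so they can be written as shell predicates. The sketch below encodes only what those conditions state; it assumes that "CpuArch != i386" covers every 32-bit x86 name reported by `uname -m` (i386 to i686), and it does not test whether the compilers support `_Float16`. `cpu_fp16_expected` and `cuda_fp16_expected` are hypothetical helpers.

```sh
#!/bin/sh
# Hypothetical helper: cpu_fp16_expected LLVM_MAJOR CONFORMANCE OS ARCH
# Encodes the build conditions listed for cl_khr_fp16 on CPU. Compiler
# _Float16 support is NOT checked. OS/ARCH as printed by uname -s / -m.
cpu_fp16_expected() {
  local llvm_major=$1 conformance=$2 os=$3 arch=$4
  [ "$llvm_major" -ge 19 ] 2>/dev/null || return 1   # LLVM >= 19
  [ "$conformance" = OFF ] || return 1               # ENABLE_CONFORMANCE=OFF
  [ "$os" = Linux ] || return 1                      # Linux only
  case "$arch" in
    i[3-6]86) return 1 ;;   # assumption: any 32-bit x86 counts as i386
  esac
  return 0
}

# Hypothetical helper: cuda_fp16_expected CC
# cl_khr_fp16 on CUDA needs Compute Capability >= 6.0.
# CC is given as "MAJOR.MINOR", e.g. "8.6"; only MAJOR matters here.
cuda_fp16_expected() {
  local major=${1%%.*}
  [ "$major" -ge 6 ] 2>/dev/null
}
```

For example, `cpu_fp16_expected 20 OFF "$(uname -s)" "$(uname -m)"` evaluates the CPU conditions for an LLVM 20, non-conformance build on the current machine.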
🔷 : Achieved status of OpenCL conformant implementation
🔶 : Tested in CI extensively, including OpenCL-CTS tests
🟢 : Tested in CI
🟡 : Should work, but is untested
🔴 : Unsupported

| CPU device | LLVM 18 | LLVM 19 | LLVM 20 | LLVM 21 | LLVM 22 |
|---|---|---|---|---|---|
| x86-64 | 🔷 | 🟢 | 🟢 | 🔶 | 🔶 |
| ARM64 | 🟡 | 🟡 | 🟡 | 🟡 | 🟢 |
| i686 | 🟡 | 🟡 | 🟡 | 🟡 | 🟡 |
| ARM32 | 🟡 | 🟡 | 🟡 | 🟡 | 🟡 |
| RISC-V | 🟡 | 🟡 | 🟡 | 🟡 | 🟡 |
| PowerPC | 🟡 | 🟡 | 🟡 | 🟡 | 🟡 |

| GPU device | LLVM 18 | LLVM 19 | LLVM 20 | LLVM 21 | LLVM 22 |
|---|---|---|---|---|---|
| CUDA SM5.0 | 🟡 | 🟡 | 🟢 | 🔴 | 🟢 |
| CUDA SM other than 5.0 | 🟡 | 🟡 | 🟡 | 🔴 | 🟡 |
| Level Zero | 🟡 | 🟡 | 🟢 | 🔶 | 🟢 |
| Vulkan | 🟢 | 🔴 | 🔴 | 🔴 | 🔴 |

Note: CUDA with LLVM 21 is broken due to a bug in Clang (llvm/llvm-project#154772).

| Special device | LLVM 18 | LLVM 19 | LLVM 20 | LLVM 21 | LLVM 22 |
|---|---|---|---|---|---|
| OpenASIP | 🔴 | 🔴 | 🔴 | 🟢 | 🔴 |
| Remote | 🟢 | 🟢 | 🟢 | 🟢 | 🟡 |

| CPU device | LLVM 18 | LLVM 19 | LLVM 20 | LLVM 21 | LLVM 22 |
|---|---|---|---|---|---|
| Apple Silicon | 🟡 | 🟡 | 🟢 | 🟢 | 🟡 |
| Intel CPU | 🟡 | 🔴 | 🔴 | 🔴 | 🔴 |

| CPU device | LLVM 18 | LLVM 19 | LLVM 20 | LLVM 21 | LLVM 22 |
|---|---|---|---|---|---|
| MinGW / x86-64 | 🟡 | 🟢 | 🟡 | 🟡 | 🟡 |
| MSVC / x86-64 | 🟡 | 🟢 | 🟢 | 🟡 | 🟡 |
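To see which column of these tables applies to a given toolchain, the LLVM major version can be read from `llvm-config --version`. A minimal sketch; `llvm_major` and `llvm_in_matrix` are hypothetical helpers, and on some distributions the tool is installed under a versioned name such as `llvm-config-20`.

```sh
#!/bin/sh
# Hypothetical helper: llvm_major VERSION
# Prints the major component of an LLVM version string such as the
# output of `llvm-config --version` (e.g. "20.1.8" or "22.0.0git").
llvm_major() {
  local v=${1%%.*}     # keep everything before the first dot
  v=${v%%[!0-9]*}      # drop any non-numeric suffix
  printf '%s\n' "$v"
}

# Hypothetical helper: llvm_in_matrix MAJOR
# Succeeds if MAJOR is one of the LLVM versions (18 to 22) that have a
# column in the tables above; it says nothing about the status shown there.
llvm_in_matrix() {
  [ "$1" -ge 18 ] 2>/dev/null && [ "$1" -le 22 ]
}
```

Combined, `llvm_in_matrix "$(llvm_major "$(llvm-config --version)")"` tells whether the installed LLVM appears in the tables at all; the cell for your platform then gives its tested status.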
PoCL with CPU device support can be found in the package repositories of many Linux distributions.
PoCL with CUDA driver support for Linux x86_64, aarch64 and ppc64le can be found on the conda-forge distribution and can be installed with:

```sh
wget "https://github.com/conda-forge/miniforge/releases/latest/download/Mambaforge-$(uname)-$(uname -m).sh"
bash Mambaforge-$(uname)-$(uname -m).sh # install mambaforge
```

To install PoCL with the CUDA driver:

```sh
mamba install pocl-cuda
```

To install all drivers:

```sh
mamba install pocl
```
PoCL with CPU driver support for Intel and Apple Silicon chips can be found on Homebrew and can be installed with:

```sh
brew install pocl
```

Note that this installs an ICD loader from KhronosGroup, and the built-in OpenCL implementation will be invisible when your application is linked to this loader.
PoCL with CPU driver support for Intel and Apple Silicon chips can also be found on the conda-forge distribution and can be installed with:

```sh
curl -L -O "https://github.com/conda-forge/miniforge/releases/latest/download/Mambaforge-$(uname)-$(uname -m).sh"
bash Mambaforge-$(uname)-$(uname -m).sh
```

To install the CPU driver:

```sh
mamba install pocl
```

Note that this installs an ICD loader from KhronosGroup, and the built-in OpenCL implementation will be invisible when your application is linked to this loader. To make both PoCL and the built-in OpenCL implementation visible, run:

```sh
mamba install pocl ocl_icd_wrapper_apple
```
PoCL is distributed under the terms of the MIT license. Contributions are expected to be made under the same terms.