Description
Describe the bug
Launching a kernel via PyOpenCL is roughly 13 to 29 times slower (about 20 times on average across the two machines below) than launching the same kernel with the corresponding OpenCL functions via ctypes.
To Reproduce
I discovered this problem while trying to implement bitonic sort. Here is a demonstration that first invokes the kernel with PyOpenCL's call syntax and then switches to calling the same kernel directly through the OpenCL functions loaded from libOpenCL.so using ctypes.
Output with Intel Core i5-2400 CPU:
231.444 ms with PyOpenCL dispatch method
221.348 ms with PyOpenCL dispatch method
221.905 ms with PyOpenCL dispatch method
220.082 ms with PyOpenCL dispatch method
219.206 ms with PyOpenCL dispatch method
223.196 ms with PyOpenCL dispatch method
246.064 ms with PyOpenCL dispatch method
252.709 ms with PyOpenCL dispatch method
255.384 ms with PyOpenCL dispatch method
249.199 ms with PyOpenCL dispatch method
8.495 ms with ctypes dispatch method
8.350 ms with ctypes dispatch method
8.540 ms with ctypes dispatch method
7.898 ms with ctypes dispatch method
7.979 ms with ctypes dispatch method
7.343 ms with ctypes dispatch method
7.300 ms with ctypes dispatch method
9.271 ms with ctypes dispatch method
7.459 ms with ctypes dispatch method
7.111 ms with ctypes dispatch method
Output with NVIDIA V100 GPU:
25.175 ms with PyOpenCL dispatch method
24.560 ms with PyOpenCL dispatch method
24.169 ms with PyOpenCL dispatch method
23.987 ms with PyOpenCL dispatch method
24.396 ms with PyOpenCL dispatch method
23.612 ms with PyOpenCL dispatch method
23.937 ms with PyOpenCL dispatch method
24.184 ms with PyOpenCL dispatch method
23.541 ms with PyOpenCL dispatch method
23.887 ms with PyOpenCL dispatch method
1.951 ms with ctypes dispatch method
1.872 ms with ctypes dispatch method
1.876 ms with ctypes dispatch method
1.885 ms with ctypes dispatch method
1.879 ms with ctypes dispatch method
1.872 ms with ctypes dispatch method
1.914 ms with ctypes dispatch method
1.962 ms with ctypes dispatch method
1.869 ms with ctypes dispatch method
1.863 ms with ctypes dispatch method
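To put a number on the slowdown, here is a small sketch that averages the timings listed above and computes the ratio per machine. The variable names are mine; the values are copied from the two output listings.

```python
from statistics import mean

# Timings in milliseconds, copied from the output listings above.
pyopencl_cpu = [231.444, 221.348, 221.905, 220.082, 219.206,
                223.196, 246.064, 252.709, 255.384, 249.199]
ctypes_cpu = [8.495, 8.350, 8.540, 7.898, 7.979,
              7.343, 7.300, 9.271, 7.459, 7.111]
pyopencl_gpu = [25.175, 24.560, 24.169, 23.987, 24.396,
                23.612, 23.937, 24.184, 23.541, 23.887]
ctypes_gpu = [1.951, 1.872, 1.876, 1.885, 1.879,
              1.872, 1.914, 1.962, 1.869, 1.863]

# Ratio of mean run times: how many times slower the PyOpenCL path is.
ratio_cpu = mean(pyopencl_cpu) / mean(ctypes_cpu)
ratio_gpu = mean(pyopencl_gpu) / mean(ctypes_gpu)

print(f"Intel Core i5-2400 CPU: {ratio_cpu:.1f}x slower via PyOpenCL")
print(f"NVIDIA V100 GPU:        {ratio_gpu:.1f}x slower via PyOpenCL")
```

The ratio comes out at a bit over 29 on the CPU and a bit under 13 on the GPU, which is where the "about 20 times" figure comes from.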
Expected behavior
Launching a kernel through PyOpenCL should take roughly as long as calling the OpenCL functions directly via ctypes. 🚀
Environment (please complete the following information):
Computer 1:
- OS: Debian 12
- ICD Loader and version: 2.3.1-1
- ICD and version: pocl 1.5
- CPU/GPU: Intel Core i5-2400 CPU
- Python version: 3.10.12
- PyOpenCL version: 2024.2.7
Computer 2:
- OS: Debian 12
- ICD Loader and version: 2.3.1-1
- ICD and version: NVIDIA driver 535.247.01
- CPU/GPU: NVIDIA V100 GPU
- Python version: 3.12.3
- PyOpenCL version: 2025.1
Additional context
A similar issue was created in 2016, but the answer links to a mailing list that no longer exists. Unfortunately, the mailing list was not archived on https://web.archive.org.
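As background on why launch overhead matters so much for the bitonic sort mentioned under "To Reproduce": a bitonic network over n = 2^m keys consists of m(m+1)/2 compare-exchange passes, and a straightforward OpenCL implementation typically enqueues one kernel per pass. The pure-Python sketch below models that structure. It is not the demonstration code from this report, it does not use OpenCL, and the function names are purely illustrative.

```python
def bitonic_passes(n):
    """Yield (k, j) for each compare-exchange pass of a bitonic network on n keys."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    k = 2
    while k <= n:          # k: size of the bitonic sequences being merged
        j = k // 2
        while j >= 1:      # j: distance between compared elements
            yield k, j
            j //= 2
        k *= 2


def bitonic_sort(values):
    """Sort a power-of-two-length sequence; each (k, j) pass models one kernel launch."""
    a = list(values)
    n = len(a)
    for k, j in bitonic_passes(n):
        # In an OpenCL version, this loop over i would be the body of one
        # kernel launch with n work-items (assumption: one launch per pass).
        for i in range(n):
            partner = i ^ j
            if partner > i:
                ascending = (i & k) == 0
                if (a[i] > a[partner]) == ascending:
                    a[i], a[partner] = a[partner], a[i]
    return a


# Number of passes (and hence launches, under the assumption above) for 2**20 keys.
launches = sum(1 for _ in bitonic_passes(1 << 20))
```

For 2^20 keys this gives 20 * 21 / 2 = 210 passes, so any fixed per-launch cost on the host side is paid a few hundred times per sort.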