New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix dynamic loading of clBLAS and clFFT (formerly, clAmdBlas and clAmdFft) #20203
Conversation
* Update filenames and function names for clBLAS (formerly, clAmdBlas) * Update filenames and function names for clFFT (formerly, clAmdFft) * Uncomment teardown of clFFT; tear down clFFT in same way as clBLAS
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for contribution!
Did you measure performance changes after enabling of clFFT?
modules/core/src/opencl/runtime/autogenerated/opencl_clamdblas_impl.hpp
Outdated
Show resolved
Hide resolved
* Update generators to parse recent clBLAS and clFFT library headers * Update generators to be compatible with Python 3 * Re-generate OpenCV's clBLAS and clFFT headers * Update function calls to match names in newly generated headers * Disable (and comment on) teardown code for clBLAS and clFFT
Thanks for the review. I have now pushed changes to fix the generators, re-generate the headers, and disable the teardown code. I have successfully re-tested the changes on Windows 10 and Manjaro Linux. Please review the pull request again. I have run performance comparisons (with v. without the clMath libraries) using perf_opencv_core. As far as I understand:
Thus, I looked at the results from The use of the clMath libraries yields mixed results, ranging from a significant speed-up to a significant slow-down, depending on the device, data size, and data type. Basically, I would say that these optimizations will be valuable in some applications. I have pasted the total times below and I am attaching the full output for various cases. Later (not as part of this pull request), I plan to do tests with additional hardware and with real-world applications. Running on Vega 8 + Ryzen V1605B + Windows 10Without clMath libraries
opencv_perf_core_vega8_win10_without_clmath.txt With clMath libraries
opencv_perf_core_vega8_win10_with_clmath.txt Running on RTX 3080 + Ryzen 5950X + Manjaro LinuxWithout clMath libraries
opencv_perf_core_rtx3080_manjaro_without_clmath.txt With clMath libraries
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well done!
@@ -94,15 +94,15 @@ | |||
numEnabled = readFunctionFilter(fns, filterFileName) | |||
|
|||
functionsFilter = generateFilterNames(fns) | |||
filter_file = open(filterFileName, 'wb') | |||
filter_file = open(filterFileName, 'w') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What kind of problem is observed with 'wb'?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With Python 3, 'wb'
gives TypeError: a bytes-like object is required
when we try to write a string to the file.
'w'
works with both Python 3 and Python 2 strings, and (in our use case) it produces an ASCII-encoded text file.
'wb'
is (in my view) simply wrong here because it implies that we are writing a binary file. We are not; we are writing a text file. Despite being conceptually wrong, 'wb'
may work for writing text files in Python 2 because Python 2 represents strings as byte arrays.
* Renaming *clamdblas* files to *clblas* * Renaming *clamdfft* files to *clfft*
* Update generator to be compatible with Python 3
Thanks for the second review. I have made the changes more consistent by:
I have re-tested successfully. If you see any other issues, please let me know. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome 👍
Fix dynamic loading of clBLAS and clFFT (formerly, clAmdBlas and clAmdFft) * Fix dynamic loading of clBLAS and clFFT * Update filenames and function names for clBLAS (formerly, clAmdBlas) * Update filenames and function names for clFFT (formerly, clAmdFft) * Uncomment teardown of clFFT; tear down clFFT in same way as clBLAS * Fix generators for clBLAS and clFFT headers * Update generators to parse recent clBLAS and clFFT library headers * Update generators to be compatible with Python 3 * Re-generate OpenCV's clBLAS and clFFT headers * Update function calls to match names in newly generated headers * Disable (and comment on) teardown code for clBLAS and clFFT * Renaming *clamd* files * Renaming *clamdblas* files to *clblas* * Renaming *clamdfft* files to *clfft* * Update generator for CL headers * Update generator to be compatible with Python 3
resolves #5444
Problem
OpenCV attempts to dynamically load the clAmdBlas and clAmdFft libraries using filenames and function names that became outdated in 2013-2014. The issue and the necessary name changes are described in detail by Dr. Fredrik C. Bruhn in his blog post at https://bruhnspace.com/en/bruhnspace-opencv-optimization-project/.
Also, OpenCV's teardown code for clAmdFft is commented out for no apparent reason.
Solution
This pull request updates the dynamic loading code for the clAmdBlas and clAmdFft libraries (now called clBLAS and clFFT) and also uncomments the clAmdFft teardown code.
Tests
I have tested the changes on the following systems:
Without the changes,
opencv_version --opencl
andopencv_perf_core
both report that the libraries are missing:With the changes,
opencv_version --opencl
andopencv_perf_core
both report that the libraries are found:With or without the changes,
opencv_perf_core
runs successfully.