Fix dynamic loading of clBLAS and clFFT (formerly, clAmdBlas and clAmdFft) #20203

Merged
alalek merged 6 commits into opencv:master from the clMath-patches branch on Jun 7, 2021

Contributor

@JoeHowse JoeHowse commented Jun 2, 2021

resolves #5444

Problem

OpenCV attempts to dynamically load the clAmdBlas and clAmdFft libraries using filenames and function names that became outdated in 2013-2014. The issue and the necessary name changes are described in detail by Dr. Fredrik C. Bruhn in his blog post at https://bruhnspace.com/en/bruhnspace-opencv-optimization-project/.

Also, OpenCV's teardown code for clAmdFft is commented out for no apparent reason.

Solution

This pull request updates the dynamic loading code for the clAmdBlas and clAmdFft libraries (now called clBLAS and clFFT) and also uncomments the clAmdFft teardown code.
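The renaming can be illustrated outside OpenCV with a minimal Python sketch that looks for the libraries under their current names first and the legacy AMD names second. This is not the code in this pull request (which changes OpenCV's C++ loader and simply switches to the new names); the fallback order, the filename patterns, and the helper names `shared_library_filename` and `try_load` are assumptions made for illustration.

```python
import ctypes
import sys

# Assumed names, for illustration only: current names first, legacy AMD names second.
LIBRARIES = {
    "blas": {"bases": ["clBLAS", "clAmdBlas"],
             "setup": ["clblasSetup", "clAmdBlasSetup"]},
    "fft": {"bases": ["clFFT", "clAmdFft"],
            "setup": ["clfftSetup", "clAmdFftSetup"]},
}


def shared_library_filename(base, platform=sys.platform):
    """Map a library base name to a typical shared-library filename."""
    if platform.startswith("win"):
        return base + ".dll"
    if platform == "darwin":
        return "lib" + base + ".dylib"
    return "lib" + base + ".so"


def try_load(kind, loader=ctypes.CDLL, platform=sys.platform):
    """Return (handle, setup_symbol_name) for the first usable library, else None."""
    spec = LIBRARIES[kind]
    for base in spec["bases"]:
        try:
            handle = loader(shared_library_filename(base, platform))
        except OSError:
            continue  # no library installed under this filename
        for symbol in spec["setup"]:
            # Looking up a missing symbol raises AttributeError, so hasattr is False.
            if hasattr(handle, symbol):
                return handle, symbol
    return None


if __name__ == "__main__":
    for kind in ("blas", "fft"):
        found = try_load(kind)
        print(kind, "->", found[1] if found else "not found")
```

A loader that only knows the legacy filenames and symbols never reaches a successful lookup on a system with current clBLAS and clFFT installed, which matches the "No" results reported under Tests.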

Tests

I have tested the changes on the following systems:

  • An up-to-date Manjaro Linux (x64) system with clBLAS 2.12.0 and clFFT 2.12.2 from the AUR repository
  • An up-to-date Windows 10 (x64) system with pre-built clBLAS 2.12.0 and clFFT 2.12.2 releases from the clMathLibraries GitHub repositories (https://github.com/clMathLibraries)

Without the changes, opencv_version --opencl and opencv_perf_core both report that the libraries are missing:

    Has AMD Blas = No
    Has AMD Fft = No

With the changes, opencv_version --opencl and opencv_perf_core both report that the libraries are found:

    Has AMD Blas = Yes
    Has AMD Fft = Yes

With or without the changes, opencv_perf_core runs successfully.
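For scripted checks, the two report lines can be parsed with a few lines of Python. This is only a sketch: it assumes the report text has already been captured (for example from the output of `opencv_version --opencl`) and that the lines look exactly like the ones quoted above; the function name `parse_clmath_report` is hypothetical.

```python
import re


def parse_clmath_report(text):
    """Extract the 'Has AMD Blas' / 'Has AMD Fft' flags from report text."""
    flags = {}
    for match in re.finditer(r"Has AMD (Blas|Fft)\s*=\s*(Yes|No)", text):
        flags[match.group(1).lower()] = match.group(2) == "Yes"
    return flags


if __name__ == "__main__":
    report = """
        Has AMD Blas = Yes
        Has AMD Fft = Yes
    """
    print(parse_clmath_report(report))
```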

* Update filenames and function names for clBLAS (formerly, clAmdBlas)

* Update filenames and function names for clFFT (formerly, clAmdFft)

* Uncomment teardown of clFFT; tear down clFFT in same way as clBLAS
Member

@alalek alalek left a comment

Thank you for the contribution!

Did you measure the performance change after enabling clFFT?

Review thread on modules/core/src/ocl.cpp (outdated, resolved)
* Update generators to parse recent clBLAS and clFFT library headers

* Update generators to be compatible with Python 3

* Re-generate OpenCV's clBLAS and clFFT headers

* Update function calls to match names in newly generated headers

* Disable (and comment on) teardown code for clBLAS and clFFT
Contributor Author

JoeHowse commented Jun 4, 2021

Thanks for the review. I have now pushed changes to fix the generators, re-generate the headers, and disable the teardown code. I have successfully re-tested the changes on Windows 10 and Manjaro Linux. Please review the pull request again.

I have run performance comparisons (with vs. without the clMath libraries) using opencv_perf_core. As far as I understand:

  • OpenCV uses clBLAS for GEMM
  • OpenCV uses clFFT for DFT

Thus, I looked at the results from OCL_GemmFixture_Gemm and OCL_DftFixture_Dft.

The use of the clMath libraries yields mixed results, ranging from a significant speed-up to a significant slow-down, depending on the device, data size, and data type. Basically, I would say that these optimizations will be valuable in some applications. I have pasted the total times below and I am attaching the full output for various cases. Later (not as part of this pull request), I plan to do tests with additional hardware and with real-world applications.

Running on Vega 8 + Ryzen V1605B + Windows 10

Without clMath libraries

[----------] 144 tests from OCL_DftFixture_Dft (33134 ms total)
[----------] 24 tests from OCL_GemmFixture_Gemm (39603 ms total)

opencv_perf_core_vega8_win10_without_clmath.txt

With clMath libraries

[----------] 144 tests from OCL_DftFixture_Dft (80982 ms total)
[----------] 24 tests from OCL_GemmFixture_Gemm (25820 ms total)

opencv_perf_core_vega8_win10_with_clmath.txt

Running on RTX 3080 + Ryzen 5950X + Manjaro Linux

Without clMath libraries

[----------] 144 tests from OCL_DftFixture_Dft (4526 ms total)
[----------] 24 tests from OCL_GemmFixture_Gemm (1700 ms total)

opencv_perf_core_rtx3080_manjaro_without_clmath.txt

With clMath libraries

[----------] 144 tests from OCL_DftFixture_Dft (4074 ms total)
[----------] 24 tests from OCL_GemmFixture_Gemm (2279 ms total)

opencv_perf_core_rtx3080_manjaro_with_clmath.txt
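The totals above can be turned into speed-up ratios with a short script. The sketch below parses the summary lines quoted in this comment and divides the time without clMath by the time with clMath, so a ratio above 1 means the clMath run was faster. The helper names `total_ms` and `speedups` are hypothetical.

```python
import re

# Matches summary lines such as:
# [----------] 144 tests from OCL_DftFixture_Dft (33134 ms total)
SUMMARY = re.compile(r"\[-+\]\s+\d+ tests from (\S+) \((\d+) ms total\)")


def total_ms(log_text):
    """Map each fixture name to its total time in milliseconds."""
    return {name: int(ms) for name, ms in SUMMARY.findall(log_text)}


def speedups(without_log, with_log):
    """Ratio of time without clMath to time with clMath, per fixture."""
    base = total_ms(without_log)
    new = total_ms(with_log)
    return {name: base[name] / new[name] for name in base if name in new}


VEGA8_WITHOUT = """
[----------] 144 tests from OCL_DftFixture_Dft (33134 ms total)
[----------] 24 tests from OCL_GemmFixture_Gemm (39603 ms total)
"""
VEGA8_WITH = """
[----------] 144 tests from OCL_DftFixture_Dft (80982 ms total)
[----------] 24 tests from OCL_GemmFixture_Gemm (25820 ms total)
"""
RTX3080_WITHOUT = """
[----------] 144 tests from OCL_DftFixture_Dft (4526 ms total)
[----------] 24 tests from OCL_GemmFixture_Gemm (1700 ms total)
"""
RTX3080_WITH = """
[----------] 144 tests from OCL_DftFixture_Dft (4074 ms total)
[----------] 24 tests from OCL_GemmFixture_Gemm (2279 ms total)
"""

if __name__ == "__main__":
    for label, before, after in [("Vega 8", VEGA8_WITHOUT, VEGA8_WITH),
                                 ("RTX 3080", RTX3080_WITHOUT, RTX3080_WITH)]:
        for name, ratio in speedups(before, after).items():
            print(f"{label:9s} {name:22s} {ratio:.2f}x")
```

On these totals the ratios are roughly 0.41x (DFT) and 1.53x (GEMM) on the Vega 8 system, and 1.11x (DFT) and 0.75x (GEMM) on the RTX 3080 system. Totals aggregate many test cases, so the attached per-test output is the better guide for a specific data size or type.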

Member

@alalek alalek left a comment

Well done!

@@ -94,15 +94,15 @@
 numEnabled = readFunctionFilter(fns, filterFileName)

 functionsFilter = generateFilterNames(fns)
-filter_file = open(filterFileName, 'wb')
+filter_file = open(filterFileName, 'w')
Member

What kind of problem is observed with 'wb'?

Contributor Author

With Python 3, 'wb' gives TypeError: a bytes-like object is required when we try to write a string to the file.

'w' works with both Python 3 and Python 2 strings, and (in our use case) it produces an ASCII-encoded text file.

'wb' is (in my view) simply wrong here because it implies that we are writing a binary file. We are not; we are writing a text file. Despite being conceptually wrong, 'wb' may work for writing text files in Python 2 because Python 2 represents strings as byte arrays.
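The difference is easy to reproduce in isolation. The following sketch writes a `str` to a temporary file in each mode under Python 3; the helper name `try_write` and the sample text are made up for the demonstration and are not part of the generators.

```python
import os
import tempfile


def try_write(mode, text="#define EXAMPLE 1\n"):
    """Write a str using the given mode; return None on success, else the TypeError."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, mode) as f:
            f.write(text)
        return None
    except TypeError as exc:
        return exc
    finally:
        os.remove(path)


if __name__ == "__main__":
    print("wb:", try_write("wb"))  # binary mode rejects str in Python 3
    print("w: ", try_write("w"))   # text mode accepts str
```

In Python 3, a file opened in binary mode accepts only bytes-like objects, so the generator would need either `'w'` or an explicit `.encode()` on every write; `'w'` is the smaller change and keeps Python 2 working.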

* Renaming *clamdblas* files to *clblas*

* Renaming *clamdfft* files to *clfft*
* Update generator to be compatible with Python 3
Contributor Author

JoeHowse commented Jun 7, 2021

Thanks for the second review. I have made the changes more consistent by:

  • Renaming *clamdblas* files to *clblas*
  • Renaming *clamdfft* files to *clfft*
  • Changing 'wb' to 'w' in parser_cl.py as well. This change makes the parsers compatible with Python 3, and does not break Python 2 compatibility.

I have re-tested successfully. If you see any other issues, please let me know.

Member

@alalek alalek left a comment

Awesome 👍

@alalek alalek merged commit 3418323 into opencv:master Jun 7, 2021
@alalek alalek mentioned this pull request Jun 13, 2021
@JoeHowse JoeHowse deleted the clMath-patches branch June 20, 2021 13:38
@alalek alalek mentioned this pull request Oct 15, 2021
a-sajjad72 pushed a commit to a-sajjad72/opencv that referenced this pull request Mar 30, 2023
Fix dynamic loading of clBLAS and clFFT (formerly, clAmdBlas and clAmdFft)

* Fix dynamic loading of clBLAS and clFFT

* Update filenames and function names for clBLAS (formerly, clAmdBlas)

* Update filenames and function names for clFFT (formerly, clAmdFft)

* Uncomment teardown of clFFT; tear down clFFT in same way as clBLAS

* Fix generators for clBLAS and clFFT headers

* Update generators to parse recent clBLAS and clFFT library headers

* Update generators to be compatible with Python 3

* Re-generate OpenCV's clBLAS and clFFT headers

* Update function calls to match names in newly generated headers

* Disable (and comment on) teardown code for clBLAS and clFFT

* Renaming *clamd* files

* Renaming *clamdblas* files to *clblas*

* Renaming *clamdfft* files to *clfft*

* Update generator for CL headers

* Update generator to be compatible with Python 3
Successfully merging this pull request may close these issues.

clFFT support instead of clAmdFft request
3 participants