core: ARM64 loop unrolling in kmeans to improve Weighted Filter performance #27596
modules/core/src/kmeans.cpp
    {
        double p = (double)rng*sum0;
        int ci = 0;
    #if defined(_M_ARM64)
#if CV_ENABLE_UNROLLED should be checked too.
Hi @opencv-alalek, just following up on this PR. This loop unrolling has resulted in significant performance improvements in the WeightedMedianFilterTest function from the ximgproc module.
The patch shows speedup on other platforms too.
Mac M1 (single thread):
Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
I agree to contribute to the project under Apache 2 License.
To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
The PR is proposed to the proper branch
This PR improves the performance of the Weighted Filter function from the ximgproc module on Windows on ARM64.
The optimization is achieved by unrolling two performance-critical loops in the generateCentersPP function in modules/core/src/kmeans.cpp, which is internally used by the Weighted Filter function.
The unrolling is enabled only for ARM64 builds using #if defined(_M_ARM64) guards to preserve compatibility and maintain performance on other architectures.
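For readers unfamiliar with the pattern, here is a minimal sketch of what a guarded, manually unrolled version of such a loop can look like. It is not the code from this PR: the function names, the unroll factor of four, and the SKETCH_ENABLE_UNROLLED macro (standing in for the CV_ENABLE_UNROLLED check requested in review) are assumptions made for illustration. The loop shape follows the snippet under review, where a running value p is decremented until it drops to zero or below.

```cpp
#include <cassert>

// Hypothetical switch for this sketch only; it stands in for the
// CV_ENABLE_UNROLLED check suggested in the review comment.
#ifndef SKETCH_ENABLE_UNROLLED
#define SKETCH_ENABLE_UNROLLED 1
#endif

// Reference loop: subtract dist2[ci] from p until p <= 0, and return
// the index where that happens (or n - 1 if it never does).
int pickIndexScalar(const float* dist2, int n, double p)
{
    assert(n >= 1);
    int ci = 0;
    for (; ci < n - 1; ci++)
    {
        p -= dist2[ci];
        if (p <= 0)
            break;
    }
    return ci;
}

// Same logic unrolled by four (the factor is an assumption). The
// subtractions happen in the same order, so the result is identical.
int pickIndexUnrolled(const float* dist2, int n, double p)
{
    assert(n >= 1);
    int ci = 0;
    for (; ci + 4 <= n - 1; ci += 4)
    {
        p -= dist2[ci];     if (p <= 0) return ci;
        p -= dist2[ci + 1]; if (p <= 0) return ci + 1;
        p -= dist2[ci + 2]; if (p <= 0) return ci + 2;
        p -= dist2[ci + 3]; if (p <= 0) return ci + 3;
    }
    // Tail: handle the remaining 0..3 elements one at a time.
    for (; ci < n - 1; ci++)
    {
        p -= dist2[ci];
        if (p <= 0)
            break;
    }
    return ci;
}

// Architecture guard in the style described above: the unrolled path is
// used only on Windows on ARM64 builds and only when unrolling is enabled.
int pickIndex(const float* dist2, int n, double p)
{
#if defined(_M_ARM64) && SKETCH_ENABLE_UNROLLED
    return pickIndexUnrolled(dist2, n, p);
#else
    return pickIndexScalar(dist2, n, p);
#endif
}
```

Because both paths perform the same floating-point operations in the same order, they should pick the same index for any input, which makes the scalar version a convenient reference when testing the unrolled one.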
Performance Improvements: