Improve SIFT for arm64/Apple silicon #20204
Investigating build failures |
TIPS 1 : rather filtering with
TIPS 2: use
Will give you something like this
Which is WAY MORE READABLE on Github Median
Also, the commit message
|
Thanks for the tips, will reformat.
We tested the full vector version on Coffee Lake; the results were marginal, both up and down. Since the benefit was clear on NEON, that's where we landed. |
@Developer-Ecosystem-Engineering, thank you for the contribution! This work on improving OpenCV@M1 performance is brilliant! Note, however, that except for the kernels in DNN module, which are few and really critical, we do not accept native optimizations any longer. It would be just impossible for our tiny team to maintain all those branches. Please, rewrite the native NEON code using our universal intrinsics. |
An updated patch is available with it rewritten. |
@Developer-Ecosystem-Engineering, thank you! the patch is almost ready to be merged. Please, fix the compile warnings on Windows (see pullrequest.opencv.org) and squash commits into one. |
Force-pushed from 3506d96 to cb12f86.
Thank you for the contribution!
- Reduce branch density by collapsing compares.
- Fix Windows build errors
- Use OpenCV universal intrinsics
- Use v_check_any and v_signmask as requested
Force-pushed from cb12f86 to 9557b9f.
Modifications requested by @alalek have been integrated and re-squashed. |
Looks good to me 👍
/cc @vpisarev
Reduce branch density by collapsing compares.
Speedups range from 1.03x to 1.53x on the existing tests.
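For readers unfamiliar with the idea, "collapsing compares" can be illustrated with a minimal scalar sketch. This is not the code from the patch: the function names, the 8-neighbour window, and the threshold parameter are hypothetical, and plain C++ stands in for OpenCV's universal intrinsics. The first function tests each neighbour with its own early exit; the second reduces the neighbours to a single maximum and performs one combined test at the end.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>

// Branchy form: every neighbour comparison can exit early, so the
// compiler typically emits one conditional branch per comparison.
bool is_local_max_branchy(float center, const std::array<float, 8>& nb, float threshold)
{
    if (center <= threshold) return false;
    for (std::size_t i = 0; i < nb.size(); ++i)
        if (center < nb[i]) return false;
    return true;
}

// Collapsed form: fold the neighbours into one maximum with no early
// exits, then combine the two remaining tests with a non-short-circuit '&'.
bool is_local_max_collapsed(float center, const std::array<float, 8>& nb, float threshold)
{
    float m = nb[0];
    for (std::size_t i = 1; i < nb.size(); ++i)
        m = std::max(m, nb[i]);
    return (center > threshold) & (center >= m);
}
```

Both functions return the same result for non-NaN inputs. The collapsed form always does the full reduction, which trades a few extra comparisons for fewer hard-to-predict branches; a max reduction of this shape also maps naturally onto SIMD lanes, where the final test becomes a single mask check.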
Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
Patch to opencv_extra has the same branch name.