Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve SIFT for arm64/Apple silicon #20204

Merged

Conversation

Developer-Ecosystem-Engineering
Copy link
Contributor

@Developer-Ecosystem-Engineering Developer-Ecosystem-Engineering commented Jun 2, 2021

Reduce branch density by collapsing compares.

Performance improvements from 1.03 to 1.53 with existing tests

% ./modules/ts/misc/summary.py -u us -m median ./build-{3.4,improv-sift}/*.xml | grep SIFT
(x-factor)
detect::OCL_feature2d::(SIFT_DEFAULT, "cv/detectors_descriptors_evaluation/images_datasets/leuven/img1.png") 50256.333 48789.417 1.03
detect::OCL_feature2d::(SIFT_DEFAULT, "stitching/a3.png") 64591.083 52250.542 1.24
detect::OCL_feature2d::(SIFT_DEFAULT, "stitching/s2.jpg") 132094.667 99417.667 1.33
detect::feature2d::(SIFT_DEFAULT, "cv/detectors_descriptors_evaluation/images_datasets/leuven/img1.png") 77521.167 60114.083 1.29
detect::feature2d::(SIFT_DEFAULT, "stitching/a3.png") 70179.083 55914.917 1.26
detect::feature2d::(SIFT_DEFAULT, "stitching/s2.jpg") 137359.208 101112.250 1.36
detectAndExtract::OCL_feature2d::(SIFT_DEFAULT, "cv/detectors_descriptors_evaluation/images_datasets/leuven/img1.png") 99694.083 67476.833 1.48
detectAndExtract::OCL_feature2d::(SIFT_DEFAULT, "stitching/a3.png") 101392.958 71199.708 1.42
detectAndExtract::OCL_feature2d::(SIFT_DEFAULT, "stitching/s2.jpg") 203133.500 133395.083 1.52
detectAndExtract::feature2d::(SIFT_DEFAULT, "cv/detectors_descriptors_evaluation/images_datasets/leuven/img1.png") 98465.083 66492.083 1.48
detectAndExtract::feature2d::(SIFT_DEFAULT, "stitching/a3.png") 100793.042 68811.333 1.46
detectAndExtract::feature2d::(SIFT_DEFAULT, "stitching/s2.jpg") 203328.083 132630.917 1.53
extract::OCL_feature2d::(SIFT_DEFAULT, "cv/detectors_descriptors_evaluation/images_datasets/leuven/img1.png") 26723.042 19980.250 1.34
extract::OCL_feature2d::(SIFT_DEFAULT, "stitching/a3.png") 19655.625 15591.500 1.26
extract::OCL_feature2d::(SIFT_DEFAULT, "stitching/s2.jpg") 59479.917 44881.208 1.33
extract::feature2d::(SIFT_DEFAULT, "cv/detectors_descriptors_evaluation/images_datasets/leuven/img1.png") 27984.542 19793.625 1.41
extract::feature2d::(SIFT_DEFAULT, "stitching/a3.png") 21164.250 16235.250 1.30
extract::feature2d::(SIFT_DEFAULT, "stitching/s2.jpg")
extract::feature2d::(SIFT_DEFAULT, "stitching/s2.jpg") 58240.000 44191.833 1.32

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

  • I agree to contribute to the project under Apache 2 License.
  • To the best of my knowledge, the proposed patch is not based on a code under GPL or other license that is incompatible with OpenCV
  • The PR is proposed to proper branch
  • There is accuracy test, performance test and test data in opencv_extra repository, if applicable
    Patch to opencv_extra has the same branch name.
  • The feature is well documented and sample code can be built with the project CMake
force_builders=ARMv7,ARMv8,Custom
buildworker:Custom=linux-3
build_image:Custom=ubuntu:18.04
CPU_BASELINE:Custom=AVX512_SKX
disable_ipp=ON

@Developer-Ecosystem-Engineering
Copy link
Contributor Author

Investigating build failures

@tomoaki0705
Copy link
Contributor

TIPS 1 : rather filtering with grep, I recommend to filter using --gtest_filter option that you pass to the gtest executable file

./opnecv_test_features2d --gtest_filter=*SIFT*

TIPS 2: use -o markdown option of summary.py

summary.py -o markdown -m median build-before.xml build-after.xml

Will give you something like this

Median
|Name of Test|build-before|build-after|build-before vs build-after(x-factor)|
|---|:-:|:-:|:-:|
|detect::OCL_feature2d::(SIFT_DEFAULT, "cv/detectors_descriptors_evaluation/images_datasets/leuven/img1.png")|50.256 ms|48.789417 ms|1.03|
|detect::OCL_feature2d::(SIFT_DEFAULT, "stitching/a3.png")|64.591083ms| 52.250542ms | 1.24 |

Which is WAY MORE READABLE on Github

Median

Name of Test build-before build-after build-before vs build-after(x-factor)
detect::OCL_feature2d::(SIFT_DEFAULT, "cv/detectors_descriptors_evaluation/images_datasets/leuven/img1.png") 50.256 ms 48.789 ms 1.03
detect::OCL_feature2d::(SIFT_DEFAULT, "stitching/a3.png") 64.591ms 52.250ms 1.24

Also, the commit message

only enable for NEON
is a red flag for me.

@Developer-Ecosystem-Engineering
Copy link
Contributor Author

Thanks for the tips, will reformat.

only enable for NEON
is a red flag for me.

We tested full vector on coffeelake, marginal up (and down).
Tested 128 bit vector and saw even lower benefits.

Since the benefit was clear on NEON, thats where we landed.

@asmorkalov asmorkalov added category: features2d optimization platform: arm ARM boards related issues: RPi, NVIDIA TK/TX, etc labels Jun 4, 2021
modules/features2d/src/sift.simd.hpp.orig Outdated Show resolved Hide resolved
modules/features2d/src/sift.simd.hpp Outdated Show resolved Hide resolved
@vpisarev
Copy link
Contributor

vpisarev commented Jun 4, 2021

@Developer-Ecosystem-Engineering, thank you for the contribution! This work on improving OpenCV@M1 performance is brilliant!

Note, however, that except for the kernels in DNN module, which are few and really critical, we do not accept native optimizations any longer. It would be just impossible for our tiny team to maintain all those branches. Please, rewrite the native NEON code using our universal intrinsics.

@Developer-Ecosystem-Engineering
Copy link
Contributor Author

@Developer-Ecosystem-Engineering, thank you for the contribution! This work on improving OpenCV@M1 performance is brilliant!

Note, however, that except for the kernels in DNN module, which are few and really critical, we do not accept native optimizations any longer. It would be just impossible for our tiny team to maintain all those branches. Please, rewrite the native NEON code using our universal intrinsics.

An updated patch is available with it rewritten.

@vpisarev
Copy link
Contributor

vpisarev commented Jun 8, 2021

@Developer-Ecosystem-Engineering, thank you! the patch is almost ready to be merged. Please, fix the compile warnings on Windows (see pullrequest.opencv.org) and squash commits into one.

@Developer-Ecosystem-Engineering
Copy link
Contributor Author

@Developer-Ecosystem-Engineering, thank you! the patch is almost ready to be merged. Please, fix the compile warnings on Windows (see pullrequest.opencv.org) and squash commits into one.

  • Squashed into a single commit
  • Fix windows build errors
  • Use OpenCV universal intrinsics
  • Remove CV_NEON/CV_SIMD guards

Copy link
Member

@alalek alalek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for contribution!

modules/features2d/src/sift.simd.hpp Outdated Show resolved Hide resolved
modules/features2d/src/sift.simd.hpp Outdated Show resolved Hide resolved
modules/features2d/src/sift.simd.hpp Outdated Show resolved Hide resolved
modules/features2d/src/sift.simd.hpp Outdated Show resolved Hide resolved
- Reduce branch density by collapsing compares.
- Fix windows build errors
- Use OpenCV universal intrinsics
- Use v_check_any and v_signmask as requested
@Developer-Ecosystem-Engineering
Copy link
Contributor Author

Modifications requested by @alalek have been integrated and re-squashed.

Copy link
Member

@alalek alalek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me 👍

/cc @vpisarev

@opencv-pushbot opencv-pushbot merged commit 5091e64 into opencv:master Jun 22, 2021
@alalek alalek mentioned this pull request Oct 15, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
category: features2d optimization platform: arm ARM boards related issues: RPi, NVIDIA TK/TX, etc
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants