Improve and refactor softmax layer #24466
The performance test results have been updated; the speedup is clearly visible. BTW, I am not sure why the Windows CI failed; it does not seem to be related to this PR. |
Please take a look at the failed log from
|
@asmorkalov This build actually failed, but somehow the workflow did not catch the failure signal and continued: https://github.com/opencv/opencv/actions/runs/6682987045/job/18158738007?pr=24466. It seems |
Windows:
|
Windows:
|
Thanks @asmorkalov. I found the code will throw |
I just tried the armv7 configuration locally. It produces the following warning (Ubuntu 16.04):
|
@asmorkalov That's because the operators |
Armv7 (Jetson-tk1) perf results with and without NEON:
|
Jetson TK1 with 2 GB of RAM:
|
The performance test has a large input with |
The error on Windows occurs because a macro was defined as |
@WanliZhong, excellent job, great acceleration numbers! As we discussed, please, refactor the code to reduce code duplication. Then we will gladly merge it. |
Update: As discussed with Vadim, I now use only the universal intrinsics to accelerate the softmax layer. The results show that this is even faster than implementing it individually for each platform. Note: I added performance tests on different axes. The test results show that some cases are slower than before, especially small softmax sizes with axis 0 or 1. |
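For readers unfamiliar with why the axis matters, here is a minimal scalar sketch of softmax along an arbitrary axis of a dense row-major tensor. It is not the code from this PR: the function name `softmaxAlongAxis` and the `outer`/`inner` decomposition are assumptions made for illustration. It shows that elements along the chosen axis are `inner` floats apart, so only the last axis is contiguous in memory, which is one plausible reason small or leading-axis cases behave differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not OpenCV's implementation): softmax over one axis of
// a dense row-major tensor whose dimensions are given by `shape`.
std::vector<float> softmaxAlongAxis(const std::vector<float>& src,
                                    const std::vector<std::size_t>& shape,
                                    std::size_t axis)
{
    assert(axis < shape.size());

    // Split the tensor into [outer, axisSize, inner].
    std::size_t outer = 1, inner = 1;
    for (std::size_t i = 0; i < axis; ++i) outer *= shape[i];
    for (std::size_t i = axis + 1; i < shape.size(); ++i) inner *= shape[i];
    const std::size_t axisSize = shape[axis];
    assert(src.size() == outer * axisSize * inner);

    std::vector<float> dst(src.size());
    if (src.empty()) return dst;

    for (std::size_t o = 0; o < outer; ++o) {
        for (std::size_t in = 0; in < inner; ++in) {
            // First element of this slice; consecutive elements along the
            // axis are `inner` floats apart.
            const std::size_t base = o * axisSize * inner + in;

            // 1) Maximum, subtracted before exp() for numerical stability.
            float maxVal = src[base];
            for (std::size_t k = 1; k < axisSize; ++k)
                maxVal = std::max(maxVal, src[base + k * inner]);

            // 2) Exponentials and their sum.
            float sum = 0.0f;
            for (std::size_t k = 0; k < axisSize; ++k) {
                const float e = std::exp(src[base + k * inner] - maxVal);
                dst[base + k * inner] = e;
                sum += e;
            }

            // 3) Normalize so the slice sums to 1.
            for (std::size_t k = 0; k < axisSize; ++k)
                dst[base + k * inner] /= sum;
        }
    }
    return dst;
}
```

With a `{2, 3}` shape, `axis = 1` normalizes each row of three contiguous values, while `axis = 0` normalizes pairs of values that are three floats apart.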
I have no idea why this error occurs on some platforms.
/home/ci/opencv/modules/dnn/src/layers/cpu_kernels/softmax.cpp:78:32: error: 'cv::hal_baseline::v_float32x4::<unnamed enum> cv::hal_baseline::v_float32x4::nlanes' is private within this context
78 | size_t nlanes = v_float32::nlanes;
| ^~~~~~
In file included from /home/ci/opencv/modules/core/include/opencv2/core/hal/intrin.hpp:221,
from /home/ci/opencv/modules/dnn/src/layers/cpu_kernels/softmax.hpp:15,
from /home/ci/opencv/modules/dnn/src/layers/cpu_kernels/softmax.cpp:13:
/home/ci/opencv/modules/core/include/opencv2/core/hal/intrin_neon.hpp:301:12: note: declared private here
301 | enum { nlanes = 4 };
| ^~~~~~ |
OpenCV has migrated to a new Universal Intrinsics approach to support scalable intrinsics such as RISC-V RVV. The vector size is not defined at compile time and may differ at runtime. You need to replace:
|
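The replacement snippet itself is not preserved here. As a general illustration only, the sketch below shows what it means to write a loop against a lane count that is known only at runtime. It deliberately uses plain C++ rather than OpenCV intrinsics; `blockedMax` and its `lanes` parameter are hypothetical names. In real OpenCV code the lane count would come from the library (the commit list mentions taking it from `VTraits`; to my knowledge the query is `VTraits<v_float32>::vlanes()`).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical sketch: maximum of `n` floats, processed in blocks of `lanes`
// elements, where `lanes` is only known at runtime.
float blockedMax(const float* data, std::size_t n, std::size_t lanes)
{
    assert(n > 0 && lanes > 0);

    // Per-lane accumulators. Because `lanes` is a runtime value, this cannot
    // be a fixed-size array such as `float acc[nlanes]`.
    std::vector<float> acc(lanes, std::numeric_limits<float>::lowest());

    // Main loop: full blocks of `lanes` elements (stands in for one SIMD op).
    std::size_t i = 0;
    for (; i + lanes <= n; i += lanes)
        for (std::size_t l = 0; l < lanes; ++l)
            acc[l] = std::max(acc[l], data[i + l]);

    // Horizontal reduction across the lanes.
    float result = std::numeric_limits<float>::lowest();
    for (std::size_t l = 0; l < lanes; ++l)
        result = std::max(result, acc[l]);

    // Scalar tail for the elements that do not fill a whole block.
    for (; i < n; ++i)
        result = std::max(result, data[i]);

    return result;
}
```

The same function gives the same answer for any lane count, which is the property a scalable-vector target such as RVV relies on.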
Enable softmax layer vectorization on RISC-V RVV #24510
Related: #24466
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
  Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake

* improve and refactor softmax layer
* fix building error
* compatible region layer
* fix axisStep when disable SIMD
* fix dynamic array
* try to fix error
* use nlanes from VTraits
* move axisBias to srcOffset
* fix bug caused by axisBias
* remove macro
* replace #ifdef with #if for CV_SIMD
This PR improves softmax from ficus nn.
Performance test results (using the minimum value, multi-threaded):
macOS M2
Ubuntu Intel Core i7-12700K: 8 Performance-cores (3.60 GHz, turbo up to 4.90 GHz), 4 Efficient-cores (2.70 GHz, turbo up to 3.80 GHz), 20 threads.
Ubuntu Loongnix