Improve and refactor softmax layer #24466

Merged
merged 11 commits into from
Nov 6, 2023

WanliZhong
Member

@WanliZhong WanliZhong commented Oct 28, 2023

This PR improves softmax from ficus nn.

Performance test results (minimum values, multi-threaded):

macOS M2

| Name of Test | before | after | after vs before (x-factor) |
|---|---|---|---|
| { 16, 50, 50 }, 0 | 0.047 | 0.048 | 0.98 |
| { 16, 50, 50 }, 1 | 0.052 | 0.075 | 0.69 |
| { 16, 50, 50 }, 2 | 0.367 | 0.045 | 8.19 |
| { 16, 197, 197 }, 0 | 0.700 | 0.256 | 2.73 |
| { 16, 197, 197 }, 1 | 0.602 | 0.368 | 1.64 |
| { 16, 197, 197 }, 2 | 5.706 | 0.230 | 24.81 |
| { 16, 1024, 1024 }, 0 | 17.143 | 18.464 | 0.93 |
| { 16, 1024, 1024 }, 1 | 16.001 | 30.027 | 0.53 |
| { 16, 1024, 1024 }, 2 | 162.174 | 3.120 | 51.99 |

Ubuntu Intel Core i7-12700K: 8 Performance-cores (3.60 GHz, turbo up to 4.90 GHz), 4 Efficient-cores (2.70 GHz, turbo up to 3.80 GHz), 20 threads.

| Name of Test | before | after | after vs before (x-factor) |
|---|---|---|---|
| { 16, 50, 50 }, 0 | 0.017 | 0.060 | 0.29 |
| { 16, 50, 50 }, 1 | 0.022 | 0.058 | 0.38 |
| { 16, 50, 50 }, 2 | 0.198 | 0.042 | 4.78 |
| { 16, 197, 197 }, 0 | 0.425 | 0.130 | 3.26 |
| { 16, 197, 197 }, 1 | 0.368 | 0.674 | 0.55 |
| { 16, 197, 197 }, 2 | 3.281 | 0.164 | 20.00 |
| { 16, 1024, 1024 }, 0 | 27.985 | 6.639 | 4.22 |
| { 16, 1024, 1024 }, 1 | 21.230 | 22.219 | 0.96 |
| { 16, 1024, 1024 }, 2 | 91.406 | 4.153 | 22.01 |

Ubuntu Loongnix

| Name of Test | before | after | after vs before (x-factor) |
|---|---|---|---|
| { 16, 50, 50 }, 0 | 0.198 | 0.158 | 1.25 |
| { 16, 50, 50 }, 1 | 0.239 | 0.259 | 0.92 |
| { 16, 50, 50 }, 2 | 1.036 | 0.263 | 3.93 |
| { 16, 197, 197 }, 0 | 3.178 | 0.309 | 10.27 |
| { 16, 197, 197 }, 1 | 3.152 | 1.032 | 3.05 |
| { 16, 197, 197 }, 2 | 15.053 | 0.961 | 15.66 |
| { 16, 1024, 1024 }, 0 | 127.870 | 50.779 | 2.52 |
| { 16, 1024, 1024 }, 1 | 116.085 | 37.200 | 3.12 |
| { 16, 1024, 1024 }, 2 | 405.589 | 19.363 | 20.95 |
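
For readers decoding the test names: each row gives an input shape followed by the softmax axis. The following is a minimal scalar sketch of softmax along one axis, assuming a contiguous row-major float tensor. The function name `softmaxAlongAxis` and the outer/inner decomposition are illustrative assumptions, not the kernel from this PR.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Scalar reference softmax over one axis of a contiguous row-major tensor.
// `shape` and `axis` mirror the test names, e.g. {16, 50, 50} with axis 2.
// Illustrative helper only; not the implementation from this PR.
std::vector<float> softmaxAlongAxis(const std::vector<float>& src,
                                    const std::vector<std::size_t>& shape,
                                    std::size_t axis)
{
    assert(axis < shape.size());
    std::size_t outer = 1, inner = 1;
    for (std::size_t i = 0; i < axis; i++) outer *= shape[i];
    for (std::size_t i = axis + 1; i < shape.size(); i++) inner *= shape[i];
    const std::size_t len = shape[axis];
    assert(src.size() == outer * len * inner);

    std::vector<float> dst(src.size());
    for (std::size_t o = 0; o < outer; o++) {
        for (std::size_t in = 0; in < inner; in++) {
            // Consecutive elements of one softmax row are `inner` floats apart.
            const std::size_t base = o * len * inner + in;

            // Subtract the row maximum before exp() for numerical stability.
            float maxVal = src[base];
            for (std::size_t k = 1; k < len; k++)
                maxVal = std::max(maxVal, src[base + k * inner]);

            float sum = 0.f;
            for (std::size_t k = 0; k < len; k++) {
                const float e = std::exp(src[base + k * inner] - maxVal);
                dst[base + k * inner] = e;
                sum += e;
            }
            for (std::size_t k = 0; k < len; k++)
                dst[base + k * inner] /= sum;
        }
    }
    return dst;
}
```

For the last axis the stride `inner` is 1, so each softmax row is contiguous in memory; for axis 0 or 1 the elements of a row are strided.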

@WanliZhong WanliZhong added this to the 4.9.0 milestone Oct 28, 2023
@WanliZhong
Member Author

WanliZhong commented Oct 29, 2023

The performance test results have been updated; the speedup is very clear. BTW, I am not sure why the Windows CI failed; it does not seem to be related to this PR.

@fengyuentau
Member

Please take a look at the failed log from default Win64:

C:/build/precommit_windows64/4.x/opencv/modules/dnn/src/layers/cpu_kernels/softmax_kernels.simd.hpp(276): error C2105: '--' needs l-value (compiling source file C:\build\precommit_windows64\build\modules\dnn\layers\cpu_kernels\softmax_kernels.avx.cpp) [C:\build\precommit_windows64\build\modules\dnn\opencv_dnn_AVX.vcxproj]
C:/build/precommit_windows64/4.x/opencv/modules/dnn/src/layers/cpu_kernels/softmax_kernels.simd.hpp(288): error C2676: binary '[': '__m256' does not define this operator or a conversion to a type acceptable to the predefined operator (compiling source file C:\build\precommit_windows64\build\modules\dnn\layers\cpu_kernels\softmax_kernels.avx.cpp) [C:\build\precommit_windows64\build\modules\dnn\opencv_dnn_AVX.vcxproj]
C:/build/precommit_windows64/4.x/opencv/modules/dnn/src/layers/cpu_kernels/softmax_kernels.simd.hpp(289): error C2676: binary '[': '__m256' does not define this operator or a conversion to a type acceptable to the predefined operator (compiling source file C:\build\precommit_windows64\build\modules\dnn\layers\cpu_kernels\softmax_kernels.avx.cpp) [C:\build\precommit_windows64\build\modules\dnn\opencv_dnn_AVX.vcxproj]
C:/build/precommit_windows64/4.x/opencv/modules/dnn/src/layers/cpu_kernels/softmax_kernels.simd.hpp(289): error C2088: '[': illegal for union (compiling source file C:\build\precommit_windows64\build\modules\dnn\layers\cpu_kernels\softmax_kernels.avx.cpp) [C:\build\precommit_windows64\build\modules\dnn\opencv_dnn_AVX.vcxproj]
C:/build/precommit_windows64/4.x/opencv/modules/dnn/src/layers/cpu_kernels/softmax_kernels.simd.hpp(290): error C2676: binary '[': '__m256' does not define this operator or a conversion to a type acceptable to the predefined operator (compiling source file C:\build\precommit_windows64\build\modules\dnn\layers\cpu_kernels\softmax_kernels.avx.cpp) [C:\build\precommit_windows64\build\modules\dnn\opencv_dnn_AVX.vcxproj]
C:/build/precommit_windows64/4.x/opencv/modules/dnn/src/layers/cpu_kernels/softmax_kernels.simd.hpp(290): error C2088: '[': illegal for union (compiling source file C:\build\precommit_windows64\build\modules\dnn\layers\cpu_kernels\softmax_kernels.avx.cpp) [C:\build\precommit_windows64\build\modules\dnn\opencv_dnn_AVX.vcxproj]
C:/build/precommit_windows64/4.x/opencv/modules/dnn/src/layers/cpu_kernels/softmax_kernels.simd.hpp(291): error C2676: binary '[': '__m256' does not define this operator or a conversion to a type acceptable to the predefined operator (compiling source file C:\build\precommit_windows64\build\modules\dnn\layers\cpu_kernels\softmax_kernels.avx.cpp) [C:\build\precommit_windows64\build\modules\dnn\opencv_dnn_AVX.vcxproj]
C:/build/precommit_windows64/4.x/opencv/modules/dnn/src/layers/cpu_kernels/softmax_kernels.simd.hpp(291): error C2088: '[': illegal for union (compiling source file C:\build\precommit_windows64\build\modules\dnn\layers\cpu_kernels\softmax_kernels.avx.cpp) [C:\build\precommit_windows64\build\modules\dnn\opencv_dnn_AVX.vcxproj]
C:/build/precommit_windows64/4.x/opencv/modules/dnn/src/layers/cpu_kernels/softmax_kernels.simd.hpp(292): error C2676: binary '[': '__m256' does not define this operator or a conversion to a type acceptable to the predefined operator (compiling source file C:\build\precommit_windows64\build\modules\dnn\layers\cpu_kernels\softmax_kernels.avx.cpp) [C:\build\precommit_windows64\build\modules\dnn\opencv_dnn_AVX.vcxproj]
C:/build/precommit_windows64/4.x/opencv/modules/dnn/src/layers/cpu_kernels/softmax_kernels.simd.hpp(292): error C2088: '[': illegal for union (compiling source file C:\build\precommit_windows64\build\modules\dnn\layers\cpu_kernels\softmax_kernels.avx.cpp) [C:\build\precommit_windows64\build\modules\dnn\opencv_dnn_AVX.vcxproj]
C:/build/precommit_windows64/4.x/opencv/modules/dnn/src/layers/cpu_kernels/softmax_kernels.simd.hpp(293): error C2676: binary '[': '__m256' does not define this operator or a conversion to a type acceptable to the predefined operator (compiling source file C:\build\precommit_windows64\build\modules\dnn\layers\cpu_kernels\softmax_kernels.avx.cpp) [C:\build\precommit_windows64\build\modules\dnn\opencv_dnn_AVX.vcxproj]
C:/build/precommit_windows64/4.x/opencv/modules/dnn/src/layers/cpu_kernels/softmax_kernels.simd.hpp(293): error C2088: '[': illegal for union (compiling source file C:\build\precommit_windows64\build\modules\dnn\layers\cpu_kernels\softmax_kernels.avx.cpp) [C:\build\precommit_windows64\build\modules\dnn\opencv_dnn_AVX.vcxproj]
C:/build/precommit_windows64/4.x/opencv/modules/dnn/src/layers/cpu_kernels/softmax_kernels.simd.hpp(294): error C2676: binary '[': '__m256' does not define this operator or a conversion to a type acceptable to the predefined operator (compiling source file C:\build\precommit_windows64\build\modules\dnn\layers\cpu_kernels\softmax_kernels.avx.cpp) [C:\build\precommit_windows64\build\modules\dnn\opencv_dnn_AVX.vcxproj]
C:/build/precommit_windows64/4.x/opencv/modules/dnn/src/layers/cpu_kernels/softmax_kernels.simd.hpp(294): error C2088: '[': illegal for union (compiling source file C:\build\precommit_windows64\build\modules\dnn\layers\cpu_kernels\softmax_kernels.avx.cpp) [C:\build\precommit_windows64\build\modules\dnn\opencv_dnn_AVX.vcxproj]
C:/build/precommit_windows64/4.x/opencv/modules/dnn/src/layers/cpu_kernels/softmax_kernels.simd.hpp(295): error C2676: binary '[': '__m256' does not define this operator or a conversion to a type acceptable to the predefined operator (compiling source file C:\build\precommit_windows64\build\modules\dnn\layers\cpu_kernels\softmax_kernels.avx.cpp) [C:\build\precommit_windows64\build\modules\dnn\opencv_dnn_AVX.vcxproj]
C:/build/precommit_windows64/4.x/opencv/modules/dnn/src/layers/cpu_kernels/softmax_kernels.simd.hpp(295): error C2088: '[': illegal for union (compiling source file C:\build\precommit_windows64\build\modules\dnn\layers\cpu_kernels\softmax_kernels.avx.cpp) [C:\build\precommit_windows64\build\modules\dnn\opencv_dnn_AVX.vcxproj]
C:/build/precommit_windows64/4.x/opencv/modules/dnn/src/layers/cpu_kernels/softmax_kernels.simd.hpp(312): error C2676: binary '[': '__m256' does not define this operator or a conversion to a type acceptable to the predefined operator (compiling source file C:\build\precommit_windows64\build\modules\dnn\layers\cpu_kernels\softmax_kernels.avx.cpp) [C:\build\precommit_windows64\build\modules\dnn\opencv_dnn_AVX.vcxproj]
C:/build/precommit_windows64/4.x/opencv/modules/dnn/src/layers/cpu_kernels/softmax_kernels.simd.hpp(312): error C2088: '[': illegal for union (compiling source file C:\build\precommit_windows64\build\modules\dnn\layers\cpu_kernels\softmax_kernels.avx.cpp) [C:\build\precommit_windows64\build\modules\dnn\opencv_dnn_AVX.vcxproj]
C:/build/precommit_windows64/4.x/opencv/modules/dnn/src/layers/cpu_kernels/softmax_kernels.simd.hpp(314): error C2676: binary '[': '__m256' does not define this operator or a conversion to a type acceptable to the predefined operator (compiling source file C:\build\precommit_windows64\build\modules\dnn\layers\cpu_kernels\softmax_kernels.avx.cpp) [C:\build\precommit_windows64\build\modules\dnn\opencv_dnn_AVX.vcxproj]
C:/build/precommit_windows64/4.x/opencv/modules/dnn/src/layers/cpu_kernels/softmax_kernels.simd.hpp(314): error C2088: '[': illegal for union (compiling source file C:\build\precommit_windows64\build\modules\dnn\layers\cpu_kernels\softmax_kernels.avx.cpp) [C:\build\precommit_windows64\build\modules\dnn\opencv_dnn_AVX.vcxproj]

@fengyuentau
Member

@asmorkalov This build actually failed, but somehow the workflow did not catch the failure signal and continued: https://github.com/opencv/opencv/actions/runs/6682987045/job/18158738007?pr=24466. It seems that if: ${{ always() && steps.build-opencv.outcome == 'success' }} from the workflow file does not always work?

@asmorkalov
Contributor

Windows:

C:/GHA-OCV-3/_work/opencv/opencv/opencv/modules/dnn/src/layers/cpu_kernels/softmax_kernels.simd.hpp(276): error C2105: '--' needs l-value
C:/GHA-OCV-3/_work/opencv/opencv/opencv/modules/dnn/src/layers/cpu_kernels/softmax_kernels.simd.hpp(288): error C2676: binary '[': '__m256' does not define this operator or a conversion to a type acceptable to the predefined operator
C:/GHA-OCV-3/_work/opencv/opencv/opencv/modules/dnn/src/layers/cpu_kernels/softmax_kernels.simd.hpp(289): error C2676: binary '[': '__m256' does not define this operator or a conversion to a type acceptable to the predefined operator

@asmorkalov
Contributor

Windows:

C:/build/precommit_windows64/4.x/opencv/modules/dnn/src/layers/cpu_kernels/softmax_kernels.simd.hpp(276): error C2105: '--' needs l-value (compiling source file C:\build\precommit_windows64\build\modules\dnn\layers\cpu_kernels\softmax_kernels.avx.cpp) [C:\build\precommit_windows64\build\modules\dnn\opencv_dnn_AVX.vcxproj]

@WanliZhong
Member Author

Thanks @asmorkalov. I found that the code throws error C2105: '--' needs l-value on Windows, but I don't think I use the -- operator. Let me try to fix it.

@asmorkalov
Contributor

I just tried the armv7 configuration locally. It produces the following warnings (Ubuntu 16.04):

In file included from /home/ubuntu/Projects/opencv-build/modules/dnn/layers/cpu_kernels/softmax_kernels.neon.cpp:3:0:
/home/ubuntu/Projects/opencv/modules/dnn/src/layers/cpu_kernels/softmax_kernels.simd.hpp: In function ‘float cv::dnn::opt_NEON::_calculate_axis(float*, size_t, size_t)’:
/home/ubuntu/Projects/opencv/modules/dnn/src/layers/cpu_kernels/softmax_kernels.simd.hpp:247:26: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing]
     float maxVal = vmax[0];
                          ^
/home/ubuntu/Projects/opencv/modules/dnn/src/layers/cpu_kernels/softmax_kernels.simd.hpp:269:19: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing]
     float s = vs[0] + vs[1] + vs[2] + vs[3];
                   ^
[ 95%] Linking CXX shared library ../../lib/libopencv_dnn.so

@WanliZhong
Member Author

WanliZhong commented Oct 30, 2023

@asmorkalov That's because the operators [idx], +, -, *, / are not overloaded for the vector types on some platforms. I can solve it by copying the result to an array and then doing the operation on the array.
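
The "copy to an array first" idea can be sketched as follows. To stay portable, this uses a hypothetical opaque 4-lane type `Vec4f` with `loadVec4f`/`storeVec4f` helpers as a stand-in for a real SIMD register; in real code the store would be the platform's own intrinsic, for example `_mm256_storeu_ps` for `__m256` or `vst1q_f32` for NEON's `float32x4_t`.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-in for an opaque 4-lane SIMD register. Like __m256 on
// MSVC, it deliberately offers no operator[] for reading individual lanes.
struct Vec4f {
    alignas(16) unsigned char bytes[4 * sizeof(float)];
};

// Stand-ins for the platform load/store intrinsics.
Vec4f loadVec4f(const float* p) {
    assert(p != nullptr);
    Vec4f v;
    std::memcpy(v.bytes, p, sizeof(v.bytes));
    return v;
}

void storeVec4f(float* p, const Vec4f& v) {
    assert(p != nullptr);
    std::memcpy(p, v.bytes, sizeof(v.bytes));
}

// Horizontal sum without indexing the register: store it to a plain float
// array, then use ordinary array indexing and scalar arithmetic.
float horizontalSum(const Vec4f& v) {
    float lanes[4];
    storeVec4f(lanes, v);
    return lanes[0] + lanes[1] + lanes[2] + lanes[3];
}
```

Going through an explicit store also avoids the type-punned pointer access that GCC warned about in the armv7 build.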

@asmorkalov
Contributor

Armv7 (Jetson-tk1) perf results with and without NEON:

Geometric mean (ms)

| Name of Test | dnn-baseline-1 | dnn-NEON-1 | dnn-NEON-1 vs dnn-baseline-1 (x-factor) |
|---|---|---|---|
| Softmax_large::Layer_Softmax::OCV/CPU | 4610.644 | 1452.926 | 3.17 |
| Softmax_middle::Layer_Softmax::OCV/CPU | 27.993 | 8.684 | 3.22 |
| Softmax_small::Layer_Softmax::OCV/CPU | 2.483 | 1.013 | 2.45 |

@asmorkalov
Contributor

Jetson Tk1 with 2 GBs of RAM:

Note: Google Test filter = Layer_Softmax*
[==========] Running 3 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 3 tests from Layer_Softmax
[ RUN      ] Layer_Softmax.Softmax_small/0, where GetParam() = OCV/CPU
[ PERFSTAT ]    (samples=13   mean=1.15   median=1.14   min=1.13   stddev=0.01 (1.1%))
[       OK ] Layer_Softmax.Softmax_small/0 (22 ms)
[ RUN      ] Layer_Softmax.Softmax_middle/0, where GetParam() = OCV/CPU
[ PERFSTAT ]    (samples=100   mean=8.56   median=8.43   min=8.32   stddev=0.52 (6.0%))
[       OK ] Layer_Softmax.Softmax_middle/0 (933 ms)
[ RUN      ] Layer_Softmax.Softmax_large/0, where GetParam() = OCV/CPU
/home/ubuntu/Projects/opencv/modules/ts/src/ts_perf.cpp:1965: Failure
Failed
Expected: PerfTestBody() doesn't throw an exception.
  Actual: it throws cv::Exception:
  OpenCV(4.8.0-dev) /home/ubuntu/Projects/opencv/modules/core/src/alloc.cpp:73: error: (-4:Insufficient memory) Failed to allocate 398131200 bytes in function 'OutOfMemoryError'

params    =     OCV/CPU
termination reason:  unhandled exception
bytesIn   =          0
bytesOut  =          0
samples   =          0 of 100
outliers  =          0
frequency =          0
[  FAILED  ] Layer_Softmax.Softmax_large/0, where GetParam() = OCV/CPU (1760 ms)
[----------] 3 tests from Layer_Softmax (2717 ms total)

[----------] Global test environment tear-down
[==========] 3 tests from 1 test case ran. (2719 ms total)
[  PASSED  ] 2 tests.
[  FAILED  ] 1 test, listed below:
[  FAILED  ] Layer_Softmax.Softmax_large/0, where GetParam() = OCV/CPU

@WanliZhong
Member Author

WanliZhong commented Oct 30, 2023

The performance test's large input has shape 16x1080x1920x3 and takes 398131200 bytes, which is too large. I think I need to create a smaller one for the "large" case.
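
The byte count in the failed allocation follows directly from the shape, assuming 4-byte float elements. A small sketch of the arithmetic (the helper name `tensorBytes` is hypothetical):

```cpp
#include <cassert>
#include <cstdint>
#include <initializer_list>

// Bytes needed for a dense float32 tensor of the given shape.
// Hypothetical helper for illustration only.
std::uint64_t tensorBytes(std::initializer_list<std::uint64_t> shape) {
    std::uint64_t count = 1;
    for (std::uint64_t dim : shape) {
        assert(dim > 0);
        count *= dim;
    }
    return count * sizeof(float);
}

// 16 * 1080 * 1920 * 3 = 99532800 elements; times 4 bytes = 398131200 bytes.
```

That is roughly 380 MiB for a single buffer, which is a lot to request at once on a board with 2 GB of RAM.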

@WanliZhong
Member Author

WanliZhong commented Oct 30, 2023

The error on Windows occurs because a macro was defined as -2.12194440e-4 and then used with a leading minus sign, which expands to --2.12194440e-4. Other compilers treat it as a positive number, but VS2019 on Windows treats it as --variable, so the error occurred. 😂
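
A common way to avoid this class of problem is to parenthesize negative macro values. The sketch below uses hypothetical macro names (`COEFF_BAD`, `COEFF_OK`); only the numeric value comes from the comment above.

```cpp
#include <cassert>

// Unparenthesized: writing `-COEFF_BAD` places two minus signs next to each
// other after expansion. Per the comment above, VS2019 read that as the
// decrement operator `--` and reported C2105. (Defined only for contrast.)
#define COEFF_BAD  -2.12194440e-4

// Parenthesized: `-COEFF_OK` expands to `-(-2.12194440e-4)`, which every
// compiler parses as unary minus applied to a negative literal.
#define COEFF_OK   (-2.12194440e-4)

double negatedCoeff() {
    const double r = -COEFF_OK;
    assert(r > 0.0);  // negating the negative constant gives a positive value
    return r;
}
```

The same habit applies to any macro whose body is an expression rather than a single token.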

@vpisarev
Contributor

vpisarev commented Nov 1, 2023

@WanliZhong, excellent job, great acceleration numbers! As we discussed, please, refactor the code to reduce code duplication. Then we will gladly merge it.

@WanliZhong
Member Author

Update: As discussed with Vadim, I now use only the universal intrinsics to accelerate the softmax layer. The results show that this is even faster than implementing it separately for each platform.

Note: I added performance tests for different axes. The results show that some cases are slower than before, especially small softmax sizes along axis 0 or 1.

@WanliZhong
Member Author

I have no idea why this error occurs on some platforms.

/home/ci/opencv/modules/dnn/src/layers/cpu_kernels/softmax.cpp:78:32: error: 'cv::hal_baseline::v_float32x4::<unnamed enum> cv::hal_baseline::v_float32x4::nlanes' is private within this context
   78 |     size_t nlanes = v_float32::nlanes;
      |                                ^~~~~~
In file included from /home/ci/opencv/modules/core/include/opencv2/core/hal/intrin.hpp:221,
                 from /home/ci/opencv/modules/dnn/src/layers/cpu_kernels/softmax.hpp:15,
                 from /home/ci/opencv/modules/dnn/src/layers/cpu_kernels/softmax.cpp:13:
/home/ci/opencv/modules/core/include/opencv2/core/hal/intrin_neon.hpp:301:12: note: declared private here
  301 |     enum { nlanes = 4 };
      |            ^~~~~~

@asmorkalov
Contributor

asmorkalov commented Nov 3, 2023

OpenCV has migrated to a new Universal Intrinsics approach to support scalable intrinsics such as RISC-V RVV. The vector size is not known at compile time and may differ at run time. You need to replace:

  • v_float32::nlanes -> VTraits<v_float32>::vlanes() for loops and other places, where it's applicable
  • v_float32::nlanes -> VTraits<v_float32>::max_nlanes for local arrays. It defines maximal possible vector size.
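
The reason for two separate values can be shown with a self-contained toy model. `ToyVectorTraits` below is a hypothetical stand-in, not OpenCV's `VTraits`; it only mimics the split between a compile-time upper bound (usable as an array size) and a run-time lane count (usable as a loop step).

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a scalable vector backend; NOT OpenCV's VTraits.
struct ToyVectorTraits {
    enum { max_nlanes = 16 };            // compile-time upper bound
    static std::size_t runtimeLanes;     // actual width, known only at run time
    static std::size_t vlanes() { return runtimeLanes; }
};
std::size_t ToyVectorTraits::runtimeLanes = 4;

float sumInChunks(const float* src, std::size_t n) {
    // A local array needs a constant size, so it uses the upper bound.
    float lanes[ToyVectorTraits::max_nlanes] = {0.f};

    // The loop step is the run-time lane count.
    const std::size_t step = ToyVectorTraits::vlanes();
    assert(step > 0 &&
           step <= static_cast<std::size_t>(ToyVectorTraits::max_nlanes));

    std::size_t i = 0;
    for (; i + step <= n; i += step)
        for (std::size_t k = 0; k < step; k++)
            lanes[k] += src[i + k];      // models one vector add per chunk

    float total = 0.f;
    for (std::size_t k = 0; k < step; k++)
        total += lanes[k];
    for (; i < n; i++)
        total += src[i];                 // scalar tail
    return total;
}
```

Changing `ToyVectorTraits::runtimeLanes` between calls changes the chunking but not the result, which is the property code written against a scalable backend has to preserve.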

@vpisarev vpisarev merged commit ed52f7f into opencv:4.x Nov 6, 2023
26 checks passed
asmorkalov added a commit that referenced this pull request Nov 11, 2023
Enable softmax layer vectorization on RISC-V RVV #24510 

Related: #24466

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
IskXCr pushed a commit to Haosonn/opencv that referenced this pull request Dec 20, 2023
* improve and refactor softmax layer

* fix building error

* compatible region layer

* fix axisStep when disable SIMD

* fix dynamic array

* try to fix error

* use nlanes from VTraits

* move axisBias to srcOffset

* fix bug caused by axisBias

* remove macro

* replace #ifdef with #if for CV_SIMD
@asmorkalov asmorkalov mentioned this pull request Jan 19, 2024