Improve and refactor softmax layer #24466

WanliZhong · 2023-10-28T17:16:43Z

This PR improves softmax from ficus nn.

Performance test results (minimum value, multi-threaded):

macOS M2

Name of Test	before	after	after vs before (x-factor)
{ 16, 50, 50 }, 0	0.047	0.048	0.98
{ 16, 50, 50 }, 1	0.052	0.075	0.69
{ 16, 50, 50 }, 2	0.367	0.045	8.19
{ 16, 197, 197 }, 0	0.700	0.256	2.73
{ 16, 197, 197 }, 1	0.602	0.368	1.64
{ 16, 197, 197 }, 2	5.706	0.230	24.81
{ 16, 1024, 1024 }, 0	17.143	18.464	0.93
{ 16, 1024, 1024 }, 1	16.001	30.027	0.53
{ 16, 1024, 1024 }, 2	162.174	3.120	51.99

Ubuntu Intel Core i7-12700K: 8 Performance-cores (3.60 GHz, turbo up to 4.90 GHz), 4 Efficient-cores (2.70 GHz, turbo up to 3.80 GHz), 20 threads.

Name of Test	before	after	after vs before (x-factor)
{ 16, 50, 50 }, 0	0.017	0.060	0.29
{ 16, 50, 50 }, 1	0.022	0.058	0.38
{ 16, 50, 50 }, 2	0.198	0.042	4.78
{ 16, 197, 197 }, 0	0.425	0.130	3.26
{ 16, 197, 197 }, 1	0.368	0.674	0.55
{ 16, 197, 197 }, 2	3.281	0.164	20.00
{ 16, 1024, 1024 }, 0	27.985	6.639	4.22
{ 16, 1024, 1024 }, 1	21.230	22.219	0.96
{ 16, 1024, 1024 }, 2	91.406	4.153	22.01

Ubuntu Loongnix

Name of Test	before	after	after vs before (x-factor)
{ 16, 50, 50 }, 0	0.198	0.158	1.25
{ 16, 50, 50 }, 1	0.239	0.259	0.92
{ 16, 50, 50 }, 2	1.036	0.263	3.93
{ 16, 197, 197 }, 0	3.178	0.309	10.27
{ 16, 197, 197 }, 1	3.152	1.032	3.05
{ 16, 197, 197 }, 2	15.053	0.961	15.66
{ 16, 1024, 1024 }, 0	127.870	50.779	2.52
{ 16, 1024, 1024 }, 1	116.085	37.200	3.12
{ 16, 1024, 1024 }, 2	405.589	19.363	20.95

WanliZhong · 2023-10-29T13:07:29Z

The performance test results have been updated; the speedup is very noticeable. BTW, I am not sure why the Windows CI failed. It does not seem to be related to this PR.

modules/dnn/src/layers/cpu_kernels/softmax.hpp

modules/dnn/src/layers/cpu_kernels/softmax_kernels.default.hpp

modules/dnn/src/layers/cpu_kernels/softmax_kernels.simd.hpp

modules/dnn/src/layers/softmax_layer.cpp

fengyuentau · 2023-10-30T02:58:23Z

Please take a look at the failed log from default Win64:

C:/build/precommit_windows64/4.x/opencv/modules/dnn/src/layers/cpu_kernels/softmax_kernels.simd.hpp(276): error C2105: '--' needs l-value (compiling source file C:\build\precommit_windows64\build\modules\dnn\layers\cpu_kernels\softmax_kernels.avx.cpp) [C:\build\precommit_windows64\build\modules\dnn\opencv_dnn_AVX.vcxproj]
C:/build/precommit_windows64/4.x/opencv/modules/dnn/src/layers/cpu_kernels/softmax_kernels.simd.hpp(288): error C2676: binary '[': '__m256' does not define this operator or a conversion to a type acceptable to the predefined operator (compiling source file C:\build\precommit_windows64\build\modules\dnn\layers\cpu_kernels\softmax_kernels.avx.cpp) [C:\build\precommit_windows64\build\modules\dnn\opencv_dnn_AVX.vcxproj]
C:/build/precommit_windows64/4.x/opencv/modules/dnn/src/layers/cpu_kernels/softmax_kernels.simd.hpp(289): error C2676: binary '[': '__m256' does not define this operator or a conversion to a type acceptable to the predefined operator (compiling source file C:\build\precommit_windows64\build\modules\dnn\layers\cpu_kernels\softmax_kernels.avx.cpp) [C:\build\precommit_windows64\build\modules\dnn\opencv_dnn_AVX.vcxproj]
C:/build/precommit_windows64/4.x/opencv/modules/dnn/src/layers/cpu_kernels/softmax_kernels.simd.hpp(289): error C2088: '[': illegal for union (compiling source file C:\build\precommit_windows64\build\modules\dnn\layers\cpu_kernels\softmax_kernels.avx.cpp) [C:\build\precommit_windows64\build\modules\dnn\opencv_dnn_AVX.vcxproj]
C:/build/precommit_windows64/4.x/opencv/modules/dnn/src/layers/cpu_kernels/softmax_kernels.simd.hpp(290): error C2676: binary '[': '__m256' does not define this operator or a conversion to a type acceptable to the predefined operator (compiling source file C:\build\precommit_windows64\build\modules\dnn\layers\cpu_kernels\softmax_kernels.avx.cpp) [C:\build\precommit_windows64\build\modules\dnn\opencv_dnn_AVX.vcxproj]
C:/build/precommit_windows64/4.x/opencv/modules/dnn/src/layers/cpu_kernels/softmax_kernels.simd.hpp(290): error C2088: '[': illegal for union (compiling source file C:\build\precommit_windows64\build\modules\dnn\layers\cpu_kernels\softmax_kernels.avx.cpp) [C:\build\precommit_windows64\build\modules\dnn\opencv_dnn_AVX.vcxproj]
C:/build/precommit_windows64/4.x/opencv/modules/dnn/src/layers/cpu_kernels/softmax_kernels.simd.hpp(291): error C2676: binary '[': '__m256' does not define this operator or a conversion to a type acceptable to the predefined operator (compiling source file C:\build\precommit_windows64\build\modules\dnn\layers\cpu_kernels\softmax_kernels.avx.cpp) [C:\build\precommit_windows64\build\modules\dnn\opencv_dnn_AVX.vcxproj]
C:/build/precommit_windows64/4.x/opencv/modules/dnn/src/layers/cpu_kernels/softmax_kernels.simd.hpp(291): error C2088: '[': illegal for union (compiling source file C:\build\precommit_windows64\build\modules\dnn\layers\cpu_kernels\softmax_kernels.avx.cpp) [C:\build\precommit_windows64\build\modules\dnn\opencv_dnn_AVX.vcxproj]
C:/build/precommit_windows64/4.x/opencv/modules/dnn/src/layers/cpu_kernels/softmax_kernels.simd.hpp(292): error C2676: binary '[': '__m256' does not define this operator or a conversion to a type acceptable to the predefined operator (compiling source file C:\build\precommit_windows64\build\modules\dnn\layers\cpu_kernels\softmax_kernels.avx.cpp) [C:\build\precommit_windows64\build\modules\dnn\opencv_dnn_AVX.vcxproj]
C:/build/precommit_windows64/4.x/opencv/modules/dnn/src/layers/cpu_kernels/softmax_kernels.simd.hpp(292): error C2088: '[': illegal for union (compiling source file C:\build\precommit_windows64\build\modules\dnn\layers\cpu_kernels\softmax_kernels.avx.cpp) [C:\build\precommit_windows64\build\modules\dnn\opencv_dnn_AVX.vcxproj]
C:/build/precommit_windows64/4.x/opencv/modules/dnn/src/layers/cpu_kernels/softmax_kernels.simd.hpp(293): error C2676: binary '[': '__m256' does not define this operator or a conversion to a type acceptable to the predefined operator (compiling source file C:\build\precommit_windows64\build\modules\dnn\layers\cpu_kernels\softmax_kernels.avx.cpp) [C:\build\precommit_windows64\build\modules\dnn\opencv_dnn_AVX.vcxproj]
C:/build/precommit_windows64/4.x/opencv/modules/dnn/src/layers/cpu_kernels/softmax_kernels.simd.hpp(293): error C2088: '[': illegal for union (compiling source file C:\build\precommit_windows64\build\modules\dnn\layers\cpu_kernels\softmax_kernels.avx.cpp) [C:\build\precommit_windows64\build\modules\dnn\opencv_dnn_AVX.vcxproj]
C:/build/precommit_windows64/4.x/opencv/modules/dnn/src/layers/cpu_kernels/softmax_kernels.simd.hpp(294): error C2676: binary '[': '__m256' does not define this operator or a conversion to a type acceptable to the predefined operator (compiling source file C:\build\precommit_windows64\build\modules\dnn\layers\cpu_kernels\softmax_kernels.avx.cpp) [C:\build\precommit_windows64\build\modules\dnn\opencv_dnn_AVX.vcxproj]
C:/build/precommit_windows64/4.x/opencv/modules/dnn/src/layers/cpu_kernels/softmax_kernels.simd.hpp(294): error C2088: '[': illegal for union (compiling source file C:\build\precommit_windows64\build\modules\dnn\layers\cpu_kernels\softmax_kernels.avx.cpp) [C:\build\precommit_windows64\build\modules\dnn\opencv_dnn_AVX.vcxproj]
C:/build/precommit_windows64/4.x/opencv/modules/dnn/src/layers/cpu_kernels/softmax_kernels.simd.hpp(295): error C2676: binary '[': '__m256' does not define this operator or a conversion to a type acceptable to the predefined operator (compiling source file C:\build\precommit_windows64\build\modules\dnn\layers\cpu_kernels\softmax_kernels.avx.cpp) [C:\build\precommit_windows64\build\modules\dnn\opencv_dnn_AVX.vcxproj]
C:/build/precommit_windows64/4.x/opencv/modules/dnn/src/layers/cpu_kernels/softmax_kernels.simd.hpp(295): error C2088: '[': illegal for union (compiling source file C:\build\precommit_windows64\build\modules\dnn\layers\cpu_kernels\softmax_kernels.avx.cpp) [C:\build\precommit_windows64\build\modules\dnn\opencv_dnn_AVX.vcxproj]
C:/build/precommit_windows64/4.x/opencv/modules/dnn/src/layers/cpu_kernels/softmax_kernels.simd.hpp(312): error C2676: binary '[': '__m256' does not define this operator or a conversion to a type acceptable to the predefined operator (compiling source file C:\build\precommit_windows64\build\modules\dnn\layers\cpu_kernels\softmax_kernels.avx.cpp) [C:\build\precommit_windows64\build\modules\dnn\opencv_dnn_AVX.vcxproj]
C:/build/precommit_windows64/4.x/opencv/modules/dnn/src/layers/cpu_kernels/softmax_kernels.simd.hpp(312): error C2088: '[': illegal for union (compiling source file C:\build\precommit_windows64\build\modules\dnn\layers\cpu_kernels\softmax_kernels.avx.cpp) [C:\build\precommit_windows64\build\modules\dnn\opencv_dnn_AVX.vcxproj]
C:/build/precommit_windows64/4.x/opencv/modules/dnn/src/layers/cpu_kernels/softmax_kernels.simd.hpp(314): error C2676: binary '[': '__m256' does not define this operator or a conversion to a type acceptable to the predefined operator (compiling source file C:\build\precommit_windows64\build\modules\dnn\layers\cpu_kernels\softmax_kernels.avx.cpp) [C:\build\precommit_windows64\build\modules\dnn\opencv_dnn_AVX.vcxproj]
C:/build/precommit_windows64/4.x/opencv/modules/dnn/src/layers/cpu_kernels/softmax_kernels.simd.hpp(314): error C2088: '[': illegal for union (compiling source file C:\build\precommit_windows64\build\modules\dnn\layers\cpu_kernels\softmax_kernels.avx.cpp) [C:\build\precommit_windows64\build\modules\dnn\opencv_dnn_AVX.vcxproj]

fengyuentau · 2023-10-30T03:03:07Z

@asmorkalov This build actually failed, but somehow the workflow did not catch the failure signal and continued: https://github.com/opencv/opencv/actions/runs/6682987045/job/18158738007?pr=24466. It seems that if: ${{ always() && steps.build-opencv.outcome == 'success' }} from the workflow file is not always working?

asmorkalov · 2023-10-30T05:57:48Z

Windows:

C:/GHA-OCV-3/_work/opencv/opencv/opencv/modules/dnn/src/layers/cpu_kernels/softmax_kernels.simd.hpp(276): error C2105: '--' needs l-value
C:/GHA-OCV-3/_work/opencv/opencv/opencv/modules/dnn/src/layers/cpu_kernels/softmax_kernels.simd.hpp(288): error C2676: binary '[': '__m256' does not define this operator or a conversion to a type acceptable to the predefined operator
C:/GHA-OCV-3/_work/opencv/opencv/opencv/modules/dnn/src/layers/cpu_kernels/softmax_kernels.simd.hpp(289): error C2676: binary '[': '__m256' does not define this operator or a conversion to a type acceptable to the predefined operator

modules/dnn/perf/perf_layer.cpp

modules/dnn/src/layers/softmax_layer.cpp

asmorkalov · 2023-10-30T08:53:44Z

Windows:

C:/build/precommit_windows64/4.x/opencv/modules/dnn/src/layers/cpu_kernels/softmax_kernels.simd.hpp(276): error C2105: '--' needs l-value (compiling source file C:\build\precommit_windows64\build\modules\dnn\layers\cpu_kernels\softmax_kernels.avx.cpp) [C:\build\precommit_windows64\build\modules\dnn\opencv_dnn_AVX.vcxproj]

WanliZhong · 2023-10-30T08:56:30Z

Thanks @asmorkalov. I found that the code throws error C2105: '--' needs l-value on Windows, but I don't think I use the -- operator. Let me try to solve it.

asmorkalov · 2023-10-30T09:12:07Z

I just tried armv7 configuration locally. It produces the following warning (ubuntu 16.04):

In file included from /home/ubuntu/Projects/opencv-build/modules/dnn/layers/cpu_kernels/softmax_kernels.neon.cpp:3:0:
/home/ubuntu/Projects/opencv/modules/dnn/src/layers/cpu_kernels/softmax_kernels.simd.hpp: In function ‘float cv::dnn::opt_NEON::_calculate_axis(float*, size_t, size_t)’:
/home/ubuntu/Projects/opencv/modules/dnn/src/layers/cpu_kernels/softmax_kernels.simd.hpp:247:26: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing]
     float maxVal = vmax[0];
                          ^
/home/ubuntu/Projects/opencv/modules/dnn/src/layers/cpu_kernels/softmax_kernels.simd.hpp:269:19: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing]
     float s = vs[0] + vs[1] + vs[2] + vs[3];
                   ^
[ 95%] Linking CXX shared library ../../lib/libopencv_dnn.so

WanliZhong · 2023-10-30T09:17:47Z

@asmorkalov That's because the operators [idx], +, -, *, / are not overloaded for the vector types on some platforms. I can solve it by copying the result to an array and then doing the operation on the array.

asmorkalov · 2023-10-30T09:42:35Z

Armv7 (Jetson-tk1) perf results with and without NEON:

Geometric mean (ms)

             Name of Test              dnn-baseline-1 dnn-NEON-1   dnn-NEON-1  
                                                                       vs      
                                                                 dnn-baseline-1
                                                                   (x-factor)  
Softmax_large::Layer_Softmax::OCV/CPU     4610.644     1452.926       3.17     
Softmax_middle::Layer_Softmax::OCV/CPU     27.993       8.684         3.22     
Softmax_small::Layer_Softmax::OCV/CPU      2.483        1.013         2.45

modules/dnn/src/layers/cpu_kernels/softmax.cpp

asmorkalov · 2023-10-30T10:59:45Z

Jetson Tk1 with 2 GBs of RAM:

Note: Google Test filter = Layer_Softmax*
[==========] Running 3 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 3 tests from Layer_Softmax
[ RUN      ] Layer_Softmax.Softmax_small/0, where GetParam() = OCV/CPU
[ PERFSTAT ]    (samples=13   mean=1.15   median=1.14   min=1.13   stddev=0.01 (1.1%))
[       OK ] Layer_Softmax.Softmax_small/0 (22 ms)
[ RUN      ] Layer_Softmax.Softmax_middle/0, where GetParam() = OCV/CPU
[ PERFSTAT ]    (samples=100   mean=8.56   median=8.43   min=8.32   stddev=0.52 (6.0%))
[       OK ] Layer_Softmax.Softmax_middle/0 (933 ms)
[ RUN      ] Layer_Softmax.Softmax_large/0, where GetParam() = OCV/CPU
/home/ubuntu/Projects/opencv/modules/ts/src/ts_perf.cpp:1965: Failure
Failed
Expected: PerfTestBody() doesn't throw an exception.
  Actual: it throws cv::Exception:
  OpenCV(4.8.0-dev) /home/ubuntu/Projects/opencv/modules/core/src/alloc.cpp:73: error: (-4:Insufficient memory) Failed to allocate 398131200 bytes in function 'OutOfMemoryError'

params    =     OCV/CPU
termination reason:  unhandled exception
bytesIn   =          0
bytesOut  =          0
samples   =          0 of 100
outliers  =          0
frequency =          0
[  FAILED  ] Layer_Softmax.Softmax_large/0, where GetParam() = OCV/CPU (1760 ms)
[----------] 3 tests from Layer_Softmax (2717 ms total)

[----------] Global test environment tear-down
[==========] 3 tests from 1 test case ran. (2719 ms total)
[  PASSED  ] 2 tests.
[  FAILED  ] 1 test, listed below:
[  FAILED  ] Layer_Softmax.Softmax_large/0, where GetParam() = OCV/CPU

WanliZhong · 2023-10-30T11:16:53Z

The performance test has a large input of 16x1080x1920x3, which takes 398131200 bytes. That is too large. I think I need to create a smaller input for the "large" case.
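The byte count in the log can be checked with a short calculation. This is only a sketch: the helper name tensorBytes is hypothetical, and the 4-byte element size is an assumption (it is the size that makes the product match the 398131200 bytes reported by the allocator).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: bytes needed for a dense N x H x W x C tensor
// whose elements are elemSize bytes each.
constexpr std::uint64_t tensorBytes(std::uint64_t n, std::uint64_t h,
                                    std::uint64_t w, std::uint64_t c,
                                    std::uint64_t elemSize)
{
    return n * h * w * c * elemSize;
}

// Assumption: 4-byte (float) elements.
// 16 * 1080 * 1920 * 3 = 99532800 elements, times 4 bytes = 398131200 bytes.
static_assert(tensorBytes(16, 1080, 1920, 3, 4) == 398131200ULL,
              "matches the allocation size in the failure log");
```

That is roughly 380 MiB for a single buffer, which is a large share of memory on a 2 GB board once several buffers of this size are needed.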

WanliZhong · 2023-10-30T14:56:39Z

The error on Windows occurs because a macro was defined as -2.12194440e-4 and then used with a leading minus sign, which expands to --2.12194440e-4. Other compilers treat it as a positive number, but VS2019 on Windows treats it as --variable, so the error occurred. 😂
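A common way to avoid this class of problem is to parenthesize negative macro values, or to use a typed constant instead of a macro. The sketch below is illustrative only: the names COEFF_SAFE, kCoeff, negatedMacro and negatedConstant are hypothetical, and only the numeric value is taken from the comment above.

```cpp
#include <cassert>

// Risky form (shown as a comment only): with
//   #define COEFF -2.12194440e-4f
// the expression `-COEFF` can be seen by some preprocessors as `--2.12194440e-4f`,
// which is a decrement applied to a literal and fails with "'--' needs l-value".

// Safer macro: the parentheses keep the sign attached to the literal.
#define COEFF_SAFE (-2.12194440e-4f)

// Alternative without a macro: a typed compile-time constant.
constexpr float kCoeff = -2.12194440e-4f;

// Both negations yield the positive value 2.12194440e-4f.
inline float negatedMacro()    { return -COEFF_SAFE; }
inline float negatedConstant() { return -kCoeff; }
```

Either form makes the negation unambiguous regardless of how the preprocessor joins adjacent tokens.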

modules/dnn/test/test_onnx_importer.cpp

modules/dnn/src/layers/softmax_layer.cpp

modules/dnn/src/layers/cpu_kernels/softmax.hpp

modules/dnn/perf/perf_layer.cpp

modules/dnn/src/layers/cpu_kernels/softmax.cpp

modules/dnn/src/layers/cpu_kernels/softmax_kernels.simd.hpp

vpisarev · 2023-11-01T07:04:17Z

@WanliZhong, excellent job, great acceleration numbers! As we discussed, please, refactor the code to reduce code duplication. Then we will gladly merge it.

WanliZhong · 2023-11-02T06:32:25Z

Update: As discussed with Vadim, I now use only the universal intrinsics to accelerate the softmax layer. The results show that this is even faster than implementing it individually for each platform.

Note: I added performance tests for different axes. The results show that some cases are slower than before, especially small softmax sizes with axis 0 or 1.
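For readers unfamiliar with what the axis parameter changes, here is a minimal scalar reference of softmax along an arbitrary axis of a dense row-major tensor. It is not the PR's kernel; the function name softmaxAxis and its signature are hypothetical. It shows that along the last axis the elements being normalized are contiguous (stride 1), while along axis 0 or 1 they are separated by `inner` elements, which is one plausible reason the axes behave differently in the tables above.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Reference softmax over `axis` of a dense row-major tensor (hypothetical helper).
void softmaxAxis(const std::vector<float>& src, std::vector<float>& dst,
                 const std::vector<std::size_t>& shape, std::size_t axis)
{
    assert(axis < shape.size());
    std::size_t outer = 1, inner = 1;
    for (std::size_t i = 0; i < axis; ++i) outer *= shape[i];
    for (std::size_t i = axis + 1; i < shape.size(); ++i) inner *= shape[i];
    const std::size_t n = shape[axis];   // number of elements normalized together
    assert(src.size() == outer * n * inner);
    dst.resize(src.size());

    for (std::size_t o = 0; o < outer; ++o) {
        for (std::size_t in = 0; in < inner; ++in) {
            const std::size_t base = o * n * inner + in;
            // Elements along the axis are `inner` apart (stride 1 only for the last axis).
            float maxVal = src[base];
            for (std::size_t k = 1; k < n; ++k)
                maxVal = std::max(maxVal, src[base + k * inner]);
            // Subtract the maximum before exp() for numerical stability.
            float sum = 0.f;
            for (std::size_t k = 0; k < n; ++k) {
                const float e = std::exp(src[base + k * inner] - maxVal);
                dst[base + k * inner] = e;
                sum += e;
            }
            for (std::size_t k = 0; k < n; ++k)
                dst[base + k * inner] /= sum;
        }
    }
}
```

For a shape such as { 16, 197, 197 }, axis 2 gives inner = 1, so each normalized run is one contiguous row; axis 0 gives inner = 197 * 197, so consecutive elements of a run are far apart in memory.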

modules/dnn/src/layers/softmax_layer.cpp

WanliZhong · 2023-11-02T14:16:53Z

I have no idea why this error occurs on some platforms.

/home/ci/opencv/modules/dnn/src/layers/cpu_kernels/softmax.cpp:78:32: error: 'cv::hal_baseline::v_float32x4::<unnamed enum> cv::hal_baseline::v_float32x4::nlanes' is private within this context
   78 |     size_t nlanes = v_float32::nlanes;
      |                                ^~~~~~
In file included from /home/ci/opencv/modules/core/include/opencv2/core/hal/intrin.hpp:221,
                 from /home/ci/opencv/modules/dnn/src/layers/cpu_kernels/softmax.hpp:15,
                 from /home/ci/opencv/modules/dnn/src/layers/cpu_kernels/softmax.cpp:13:
/home/ci/opencv/modules/core/include/opencv2/core/hal/intrin_neon.hpp:301:12: note: declared private here
  301 |     enum { nlanes = 4 };
      |            ^~~~~~

asmorkalov · 2023-11-03T06:20:28Z

OpenCV has migrated to a new Universal Intrinsics approach to support scalable intrinsics such as RISC-V RVV. The vector size is not defined at compile time and may differ at run time. You need to replace:

v_float32::nlanes -> VTraits<v_float32>::vlanes() for loops and other places where it's applicable
v_float32::nlanes -> VTraits<v_float32>::max_nlanes for local arrays. It defines the maximum possible vector size.
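The distinction between the two replacements can be illustrated without OpenCV. The sketch below uses a stand-in type, LaneTraits, that only mimics the split described above: a compile-time upper bound (max_nlanes) and a lane count known only at run time (vlanes()). LaneTraits, setLanes and sumInChunks are hypothetical names, the values 16 and 4 are arbitrary, and the inner loops merely emulate lane-wise vector operations.

```cpp
#include <cassert>
#include <cstddef>

// Stand-in for a scalable vector's traits; NOT OpenCV's VTraits.
struct LaneTraits {
    enum { max_nlanes = 16 };                          // compile-time upper bound
    static int& lanesRef() { static int lanes = 4; return lanes; }
    static int vlanes() { return lanesRef(); }         // actual width, known at run time
    static void setLanes(int n) { lanesRef() = n; }    // emulates different hardware
};

float sumInChunks(const float* data, std::size_t n)
{
    // Loop stride: must use the runtime lane count.
    const std::size_t step = static_cast<std::size_t>(LaneTraits::vlanes());
    assert(step > 0 && step <= static_cast<std::size_t>(LaneTraits::max_nlanes));

    // Local array: its size must be a compile-time constant, so use the upper bound.
    float lanes[LaneTraits::max_nlanes] = {0.f};

    std::size_t i = 0;
    for (; i + step <= n; i += step)
        for (std::size_t l = 0; l < step; ++l)
            lanes[l] += data[i + l];                   // emulates one lane-wise vector add

    // Reduce on a plain array instead of indexing a vector register directly.
    float s = 0.f;
    for (std::size_t l = 0; l < step; ++l) s += lanes[l];
    for (; i < n; ++i) s += data[i];                   // scalar tail
    return s;
}
```

Because only the first vlanes() entries of the buffer are ever touched, the same function works whether the runtime width is 4, 8, or any value up to max_nlanes.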

modules/dnn/src/layers/cpu_kernels/softmax.cpp

Enable softmax layer vectorization on RISC-V RVV #24510
Related: #24466

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake

* improve and refactor softmax layer
* fix building error
* compatible region layer
* fix axisStep when disable SIMD
* fix dynamic array
* try to fix error
* use nlanes from VTraits
* move axisBias to srcOffset
* fix bug caused by axisBias
* remove macro
* replace #ifdef with #if for CV_SIMD


WanliZhong added optimization category: dnn category: dnn (onnx) ONNX suport issues in DNN module labels Oct 28, 2023

WanliZhong added this to the 4.9.0 milestone Oct 28, 2023

WanliZhong requested review from vpisarev, fengyuentau and dkurt October 28, 2023 17:16

fengyuentau reviewed Oct 30, 2023

dkurt reviewed Oct 30, 2023

modules/dnn/perf/perf_layer.cpp (outdated, resolved)

modules/dnn/perf/perf_layer.cpp (outdated, resolved)

modules/dnn/perf/perf_layer.cpp (resolved)

modules/dnn/src/layers/softmax_layer.cpp (outdated, resolved)

dkurt reviewed Oct 30, 2023

modules/dnn/src/layers/cpu_kernels/softmax.cpp (outdated, resolved)

fengyuentau reviewed Oct 31, 2023

improve and refactor softmax layer

790da1b

WanliZhong force-pushed the refactor_softmax branch from cbf0474 to 790da1b Compare November 2, 2023 06:26

dkurt reviewed Nov 2, 2023

modules/dnn/src/layers/softmax_layer.cpp (outdated, resolved)

fix building error

4c729bd

WanliZhong added 3 commits November 2, 2023 20:44

compatible region layer

928b3f4

fix axisStep when disable SIMD

8ab200d

fix dynamic array

e4b37d3

try to fix error

c6a349a

WanliZhong added 3 commits November 3, 2023 15:05

use nlanes from VTraits

399e92a

move axisBias to srcOffset

e9a8b31

fix bug caused by axisBias

fc77182

fengyuentau reviewed Nov 3, 2023

modules/dnn/src/layers/cpu_kernels/softmax.cpp (outdated, resolved)

modules/dnn/src/layers/cpu_kernels/softmax.cpp (outdated, resolved)

WanliZhong added 2 commits November 4, 2023 15:00

remove macro

c49a332

replace #ifdef with #if for CV_SIMD

ac9e410

vpisarev approved these changes Nov 6, 2023

vpisarev merged commit ed52f7f into opencv:4.x Nov 6, 2023
26 checks passed

asmorkalov mentioned this pull request Nov 8, 2023

Enable softmax layer vectorization on RISC-V RVV #24510

Merged

6 tasks

asmorkalov mentioned this pull request Jan 19, 2024

5.x merge 4.x #24862

Merged
