DNN: optimize the speed of general Depth-wise #23952

zihaomu · 2023-07-09T16:21:51Z

This PR attempts to fix issue #23941.

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake

zihaomu · 2023-07-10T02:35:26Z

@WanliZhong, can you check whether this patch fixes your issue?

WanliZhong · 2023-07-10T06:47:08Z

I have tested this patch on ARM and x86, reporting the minimum measured time for each model. The palm detection model has many 5x5 depthwise layers, while the handpose and person detection models have only a few, so the effect is most evident on the palm detection model.

Intel chip:

Model	4.7	4.8	this patch
palm detection	5.91ms	12.1ms	6.41ms
handpose estimation	4.36ms	3.65ms	3.21ms
person detection	12.4ms	11.0ms	9.07ms

M2 chip:

Model	4.7	4.8	this patch
palm detection	8.35ms	18.76ms	9.44ms
handpose estimation	4.6ms	7.04ms	4.91ms
person detection	10.83ms	14.06ms	10.74ms
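
The "min value" protocol used for the numbers above can be sketched as follows. This is only an illustration of taking the minimum over repeated runs; it is not the script that produced the tables. The helper name `min_time_ms`, the `repeats` and `warmup` parameters, and the injectable `clock` are assumptions made for the example.

```python
import time


def min_time_ms(fn, repeats=10, warmup=2, clock=time.perf_counter):
    """Return the fastest of `repeats` timed calls to fn, in milliseconds.

    Hypothetical helper: the name and parameters are not from this PR.
    """
    for _ in range(warmup):
        fn()  # untimed warm-up runs (caches, lazy initialization)
    best = float("inf")
    for _ in range(repeats):
        start = clock()
        fn()
        elapsed = clock() - start
        best = min(best, elapsed)  # keep only the fastest run
    return best * 1000.0


if __name__ == "__main__":
    # Placeholder workload; in practice this would be one forward pass of a model.
    print(f"{min_time_ms(lambda: sum(range(100_000))):.3f} ms")
```

Taking the minimum rather than the mean reduces the influence of scheduler noise and thermal effects, which makes a comparison between library versions on the same machine more stable.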

vpisarev · 2023-07-10T07:10:57Z

@asmorkalov, I believe 4.8.1 should be released as soon as possible with this fix and the few other (Python-related) fixes that you mentioned.

opencv-alalek · 2023-07-10T08:55:44Z

Related performance tests should be updated to avoid similar incidents in the future.

zihaomu · 2023-07-10T14:36:10Z

@wanli will be assigned to add such a performance test.

WanliZhong · 2023-07-11T13:55:03Z

Perf test results (minimum values) on a MacBook Air M2:

Name of Test	480-1th	481-1th	481-1th vs 480-1th (x-factor)
conv::Conv::(GFLOPS=0.472, K=[5 x 5], IN={1, 32, 96, 96}, OCN=32, G=32, P=[2 x 2], BIAS, OCV/CPU)	3.721	1.921	1.94
conv::Conv::(GFLOPS=0.472, K=[5 x 5], IN={1, 32, 96, 96}, OCN=32, G=32, P=[2 x 2], BIAS, OCV/CPU_FP16)	2.358	1.869	1.26
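
For readers unfamiliar with the test configuration above: with G=32 equal to both the input channel count and OCN, every channel is filtered by its own 5x5 kernel. The following is a minimal reference sketch of that operation in plain Python, assuming stride 1 and no dilation (neither is listed in the test name) and dropping the batch dimension. It is not OpenCV's implementation, and the function name `depthwise_conv2d` is hypothetical.

```python
def depthwise_conv2d(inp, weights, bias, pad):
    """Naive depthwise convolution (assumed stride 1, no dilation).

    inp:     [C][H][W] nested lists
    weights: [C][KH][KW], one kernel per channel (groups == channels)
    bias:    [C]
    pad:     zero padding applied on every side
    """
    channels, height, width = len(inp), len(inp[0]), len(inp[0][0])
    kh, kw = len(weights[0]), len(weights[0][0])
    out_h = height + 2 * pad - kh + 1
    out_w = width + 2 * pad - kw + 1
    out = []
    for c in range(channels):  # each channel uses only its own kernel
        plane = []
        for y in range(out_h):
            row = []
            for x in range(out_w):
                acc = bias[c]
                for ky in range(kh):
                    iy = y + ky - pad
                    if iy < 0 or iy >= height:
                        continue  # row falls into the zero padding
                    for kx in range(kw):
                        ix = x + kx - pad
                        if 0 <= ix < width:
                            acc += inp[c][iy][ix] * weights[c][ky][kx]
                row.append(acc)
            plane.append(row)
        out.append(plane)
    return out
```

With a 5x5 kernel and a padding of 2, as in the perf test, the output has the same height and width as the input.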

zihaomu · 2023-07-12T03:15:08Z

It looks like the default CI has a failure unrelated to this PR.

asmorkalov · 2023-07-14T12:59:22Z

Perf results for i5-2500K CPU @ 3.30GHz (no AVX2):

Name of Test	4.x depthwise 5x5 1	fix depthwise 5x5 1	fix depthwise 5x5 1 vs 4.x depthwise 5x5 1
conv::Conv::(GFLOPS=0.472, K=[5 x 5], IN={1, 32, 96, 96}, OCN=32, G=32, P=[2 x 2], BIAS, OCV/CPU)	5.111	1.068	4.79

asmorkalov · 2023-07-14T14:32:47Z

Perf results for ARM v7 (Jetson TK1):

Name of Test	4.x depthwise 5x5 1	fix depthwise 5x5 1	fix depthwise 5x5 1 vs 4.x depthwise 5x5 1
conv::Conv::(GFLOPS=0.472, K=[5 x 5], IN={1, 32, 96, 96}, OCN=32, G=32, P=[2 x 2], BIAS, OCV/CPU)	64.849	18.178	3.57
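
The ratio column in the perf tables of this thread is the baseline time divided by the patched time. A small sketch of that arithmetic, using the timings reported in the thread (the helper name `x_factor` and the dictionary labels are made up for illustration), reproduces the reported factors of 1.94, 1.26, 4.79 and 3.57:

```python
def x_factor(baseline, patched):
    """Speedup of `patched` relative to `baseline`; values above 1 mean faster."""
    return round(baseline / patched, 2)


# (baseline, patched) timings copied from the perf tables in this thread
reported = {
    "MacBook Air M2, OCV/CPU": (3.721, 1.921),
    "MacBook Air M2, OCV/CPU_FP16": (2.358, 1.869),
    "i5-2500K, OCV/CPU": (5.111, 1.068),
    "Jetson TK1, OCV/CPU": (64.849, 18.178),
}

for name, (old, new) in reported.items():
    print(f"{name}: {x_factor(old, new)}")
```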

asmorkalov

👍

zihaomu linked an issue Jul 9, 2023 that may be closed by this pull request

Depthwise Convolution layer with 5x5 kernel much slower than 4.7.0 #23941

Closed

4 tasks

zihaomu added the category: dnn and optimization labels Jul 9, 2023

zihaomu requested a review from WanliZhong July 9, 2023 16:22

zihaomu requested a review from asmorkalov July 10, 2023 02:35

speed up depthwise 5x5

ff7db7f

zihaomu changed the title ~~DNN: optimize the speed of general Depth-wise on ARM platform~~ DNN: optimize the speed of general Depth-wise Jul 10, 2023

zihaomu force-pushed the fix_depth_conv_5x5 branch from 27c0760 to ff7db7f Compare July 10, 2023 06:05

WanliZhong mentioned this pull request Jul 10, 2023

bump version to 4.8.0 opencv/opencv_zoo#190

Merged

2 tasks

vpisarev self-requested a review July 10, 2023 07:08

vpisarev approved these changes Jul 10, 2023

opencv-alalek added the "pr: needs test" label (New functionality requires minimal tests set) Jul 10, 2023

add perf test for 5x depth wise convolution

8ccd5e5

asmorkalov removed the "pr: needs test" label (New functionality requires minimal tests set) Jul 11, 2023

zihaomu mentioned this pull request Jul 14, 2023

OpenCV4.8.0 DNN inference speed reduced by 50%. #23911

Open

4 tasks

asmorkalov approved these changes Jul 14, 2023

asmorkalov merged commit 1920993 into opencv:4.x Jul 14, 2023
21 checks passed

asmorkalov mentioned this pull request Jul 27, 2023

(5.x) Merge 4.x #24070

Merged
