DNN: add the Winograd fp16 support #23654
Conversation
@@ -1464,6 +1464,9 @@ CV__DNN_INLINE_NS_BEGIN
    /// @sa Net::setPreferableTarget
    CV_WRAP Model& setPreferableTarget(dnn::Target targetId);

    /// @sa Net::enableWinograd
    CV_WRAP Model& enableWinograd(bool useWinograd);
Adding a new function to dnn::Model will fail the default CI API check.
Force-pushed from 2402b6a to 9d1dccd
Meeting notes with Vadim @vpisarev:
code works well
> code works well

Hi @Samson-Mayeem, the code currently works well only on some ARM platforms. I'm working on supporting all ARM platforms.
Force-pushed from ad5aeb0 to 942a77a
excellent job, thank you, Zihao! 👍
cmake/checks/cpu_fp16.cpp
Outdated
return (int)dst[0];
#else
return 0;
Should we really just pass the compilation check in that case here?
I'm not very familiar with compilation details. The vfmaq_laneq_f16 intrinsic can only be used when __ARM_FEATURE_FP16_VECTOR_ARITHMETIC is defined. The idea is to set CV_FP16 only when vfmaq_laneq_f16 is usable on the ARM platform. What's your opinion on this part of the code?
It is a compilation check (a check of compiler features and compiler flags). If the compiler is old and doesn't provide __ARM_FEATURE_FP16_VECTOR_ARITHMETIC but does provide "basic" FP16 support, this check should still pass. Currently it doesn't try to compile any FP16 code (it just returns 0), so the build would only fail later, during compilation of OpenCV code, if __ARM_FEATURE_FP16_VECTOR_ARITHMETIC is missing.
Considering haveFP16_ARM(), perhaps we need a new FP16_ARMv82 compiler feature with a separate cmake/checks/cpu_fp16_armv82.cpp, besides NEON and FP16.
At least the check should verify the definition in reverse form:
#ifndef __ARM_FEATURE_FP16_VECTOR_ARITHMETIC
#error "__ARM_FEATURE_FP16_VECTOR_ARITHMETIC must be defined"
#endif
I'm not sure we need a runtime check for that feature at all (haveFP16_ARM()). In any case, runtime checks of CPU features should be performed in system.cpp/HWFeatures (without direct use of compiler intrinsics). Such detection could also be skipped (assuming the baseline is always available), similar to CV_FP16:
#if CV_FP16
CV_LOG_INFO(NULL, "- FP16 instructions is enabled via build flags");
have[CV_CPU_FP16] = true;
system.cpp doesn't support the longjmp() approach for proper detection (only baseline compiler flags are available there).
> perhaps we need a new FP16_ARMv82 compiler feature with separate cmake/checks/cpu_fp16_armv82.cpp besides of NEON and FP16.
I agree with this, but I don't have much time to spend on it at the moment. I think this patch does no harm, since the CV_FP16 macro was not used before this patch. How about leaving this optimization for the future?
> Not sure if we need runtime check for that feature at all (haveFP16_ARM()).
The runtime check matters for distributed OpenCV binaries: compile the OpenCV ARM binary once, then run it on every ARM chip.
BTW, I found a flag that may be useful: CV_TRY_NEON_DOTPROD. It is defined as follows:
ocv_update(CPU_NEON_DOTPROD_FLAGS_ON "-march=armv8.2-a+dotprod")
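Following that pattern, a hypothetical FP16_ARMv82 feature could presumably be declared the same way (the option name and flag below are illustrative assumptions, not an existing OpenCV build option):

```cmake
# Hypothetical, mirroring the DOTPROD pattern above:
ocv_update(CPU_FP16_ARMV82_FLAGS_ON "-march=armv8.2-a+fp16")
```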
@opencv-alalek, FP16 support without SIMD arithmetic support does not make any sense, since C/C++ cannot handle this type properly on its own. I suggest modifying the definition of CV_FP16 on ARM as proposed in this PR.
Hi @asmorkalov, it looks like the ARMv7 builder has an environment issue.
@opencv-alalek Could you take a look?
@zihaomu Please rebase and fix conflicts after the Winograd patch merge.
The decision is to introduce another symbol for perfect backward compatibility. I think it should be something platform-agnostic, like CV_FP16_ARITHM. I will try to do it with alalek's help.
I will update it later today.
* add Winograd FP16 implementation
* fixed dispatching of FP16 code paths in dnn; use dynamic dispatcher only when NEON_FP16 is enabled in the build and the feature is present in the host CPU at runtime
* fixed some warnings
* hopefully fixed winograd on x64 (and maybe other platforms)

Co-authored-by: Vadim Pisarevsky <vadim.pisarevsky@gmail.com>
This PR adds a Winograd FP16 compute branch for the convolution layer in the 3x3, stride-1 case.
Tested on an M1 chip with 4 threads.
Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
Patch to opencv_extra has the same branch name.