Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DNN: add the Winograd fp16 support #23654

Merged
merged 4 commits into from
Nov 20, 2023
Merged

DNN: add the Winograd fp16 support #23654

merged 4 commits into from
Nov 20, 2023

Conversation

zihaomu
Copy link
Member

@zihaomu zihaomu commented May 22, 2023

To add the winograd FP16 compute branch for convolution layer of 3x3 stride 1 case.

Test on M1 chip, 4 threads.

Model Name 4.x (Conv(FP16) + Wino(FP 32)) Conv(FP16) + Wino(FP 16)
ReseNet 50 18.5 ms 15.6 ms (18.5% speed up)

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

  • I agree to contribute to the project under Apache 2 License.
  • To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
  • The PR is proposed to the proper branch
  • There is a reference to the original bug report and related work
  • There is accuracy test, performance test and test data in opencv_extra repository, if applicable
    Patch to opencv_extra has the same branch name.
  • The feature is well documented and sample code can be built with the project CMake

@@ -1464,6 +1464,9 @@ CV__DNN_INLINE_NS_BEGIN
/// @sa Net::setPreferableTarget
CV_WRAP Model& setPreferableTarget(dnn::Target targetId);

/// @sa Net::enableWinograd
CV_WRAP Model& enableWinograd(bool useWinograd);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a new function for dnn::Model, this will fail at default CI API check.

@zihaomu
Copy link
Member Author

zihaomu commented May 24, 2023

Meeting notes with Vadim @vpisarev :

  • need to use the cv::checkHardwareSupport(CV_CPU_FP16);
  • need to figure out how to check if CPU supports the instruction of vfmaq_lane_f16.

@zihaomu zihaomu added this to the 4.8.0 milestone May 24, 2023
Copy link

@Samson-Mayeem Samson-Mayeem left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

code works well

Copy link

@Samson-Mayeem Samson-Mayeem left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

code works well

@zihaomu
Copy link
Member Author

zihaomu commented May 31, 2023

Hi @Samson-Mayeem , the code only works well on some partial ARM platforms. I'm working on it to support all ARM platforms.

@zihaomu zihaomu force-pushed the wino_fp16 branch 3 times, most recently from ad5aeb0 to 942a77a Compare June 2, 2023 05:15
@vpisarev
Copy link
Contributor

vpisarev commented Jun 2, 2023

excellent job, thank you, Zihao! 👍

return (int)dst[0];
#else
return 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really should just pass compilation check in that case here?

Copy link
Member Author

@zihaomu zihaomu Jun 2, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not very familiar with compiling.
The vfmaq_laneq_f16 can only be used when __ARM_FEATURE_FP16_VECTOR_ARITHMETIC is defined.

The idea is to set the CV_FP16 only when vfmaq_laneq_f16 is usable on ARM platform.

What's your opinion on this part code?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a compilation check (check of compiler features and compiler flags).

If compiler is old and doesn't provide __ARM_FEATURE_FP16_VECTOR_ARITHMETIC but provide "basic" FP16 support then this check should pass. Currently it doesn't try to compile any FP16 code (just return 0). Finally it would fail during compilation of OpenCV code if there is no __ARM_FEATURE_FP16_VECTOR_ARITHMETIC.

According to haveFP16_ARM(), perhaps we need a new FP16_ARMv82 compiler feature with separate cmake/checks/cpu_fp16_armv82.cpp besides of NEON and FP16.

At least it should include definition check in a reverse form:

#ifndef __ARM_FEATURE_FP16_VECTOR_ARITHMETIC
#error "__ARM_FEATURE_FP16_VECTOR_ARITHMETIC must be defined"
#endif

Not sure if we need runtime check for that feature at all (haveFP16_ARM()).
Anyway runtime checks of CPU features should be performed in system.cpp/HWFeatures (but without direct usage of compiler instructions). Also such detection could be skipped (assumed that baseline is always available) - similar to CV_FP16:

        #if CV_FP16
        CV_LOG_INFO(NULL, "- FP16 instructions is enabled via build flags");
        have[CV_CPU_FP16] = true;

system.cpp doesn't support longjmp() approach for proper detection (as there is baseline compiler flags only).

Copy link
Member Author

@zihaomu zihaomu Jun 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

perhaps we need a new FP16_ARMv82 compiler feature with separate cmake/checks/cpu_fp16_armv82.cpp besides of NEON and FP16.

I agree with this. But I don't have much time to spend on it recently. I think this patch will have no harm since the CV_FP16 macro is not used before this patch.
How about leaving this into the future to optimize?

Not sure if we need runtime check for that feature at all (haveFP16_ARM()).

The runtime check is supplied for opencv binary. Compiler opencv arm binary once, and run every arm chip.

BTW, I found a flag maybe useful, CV_TRY_NEON_DOTPROD. It was defined by following:

ocv_update(CPU_NEON_DOTPROD_FLAGS_ON "-march=armv8.2-a+dotprod")

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@opencv-alalek, FP16 support without SIMD arithmetic support does not make any sense, since C/C++ is not quite able to handle this type properly. I suggest to modify definition of CV_FP16 on ARM as suggested in this PR.

@asmorkalov asmorkalov changed the title DNN: add the Winograd fp16 supported. DNN: add the Winograd fp16 support Jun 2, 2023
@zihaomu
Copy link
Member Author

zihaomu commented Jun 2, 2023

Hi @asmorkalov, looks like the ARMv7 builder has an issue with the environment.

In file included from /build/precommit_armv7/4.x/opencv/modules/python/src2/cv2.cpp:5:0:
/build/precommit_armv7/4.x/opencv/modules/python/src2/cv2.hpp:20:20: fatal error: Python.h: No such file or directory
 #include <Python.h>
                    ^
compilation terminated.
make[2]: *** [modules/python2/CMakeFiles/opencv_python2.dir/__/src2/cv2.cpp.o] Error 1
make[2]: Leaving directory `/build/precommit_armv7/build'
make[1]: *** [modules/python2/CMakeFiles/opencv_python2.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs....

@asmorkalov
Copy link
Contributor

@opencv-alalek Could you take a look?

@asmorkalov
Copy link
Contributor

@zihaomu Please rebase and fix conflicts after Winograd patch merge.

@asmorkalov asmorkalov modified the milestones: 4.8.0, 4.9.0 Jun 9, 2023
@vpisarev
Copy link
Contributor

vpisarev commented Jun 9, 2023

the decision is to introduce another symbol for the perfect backward compatibility. I think, it should be something platform-agnostic, like CV_FP16_ARITHM. I will try to do it with alalek's help.

@zihaomu
Copy link
Member Author

zihaomu commented Jun 9, 2023

@zihaomu Please rebase and fix conflicts after Winograd patch merge.

I will update it today later.

zihaomu and others added 2 commits November 13, 2023 00:38
…nly when NEON_FP16 is enabled in the build and the feature is present in the host CPU at runtime
@vpisarev vpisarev merged commit b913e73 into opencv:4.x Nov 20, 2023
26 checks passed
IskXCr pushed a commit to Haosonn/opencv that referenced this pull request Dec 20, 2023
* add Winograd FP16 implementation

* fixed dispatching of FP16 code paths in dnn; use dynamic dispatcher only when NEON_FP16 is enabled in the build and the feature is present in the host CPU at runtime

* fixed some warnings

* hopefully fixed winograd on x64 (and maybe other platforms)

---------

Co-authored-by: Vadim Pisarevsky <vadim.pisarevsky@gmail.com>
@asmorkalov asmorkalov mentioned this pull request Jan 19, 2024
thewoz pushed a commit to thewoz/opencv that referenced this pull request May 29, 2024
* add Winograd FP16 implementation

* fixed dispatching of FP16 code paths in dnn; use dynamic dispatcher only when NEON_FP16 is enabled in the build and the feature is present in the host CPU at runtime

* fixed some warnings

* hopefully fixed winograd on x64 (and maybe other platforms)

---------

Co-authored-by: Vadim Pisarevsky <vadim.pisarevsky@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants