DNN: add the Winograd fp16 support #23654
Conversation
@@ -1464,6 +1464,9 @@ CV__DNN_INLINE_NS_BEGIN
    /// @sa Net::setPreferableTarget
    CV_WRAP Model& setPreferableTarget(dnn::Target targetId);

    /// @sa Net::enableWinograd
    CV_WRAP Model& enableWinograd(bool useWinograd);
Adding a new function to dnn::Model will fail the default CI API check.
Force-pushed from 2402b6a to 9d1dccd
Meeting notes with Vadim @vpisarev:
code works well
> code works well

Hi @Samson-Mayeem, the code currently works well only on some ARM platforms. I'm working on supporting all ARM platforms.
Force-pushed from ad5aeb0 to 942a77a
excellent job, thank you, Zihao! 👍
cmake/checks/cpu_fp16.cpp
Outdated
return (int)dst[0];
#else
return 0;
Should we really just pass the compilation check in that case here?
I'm not very familiar with compilation details. The vfmaq_laneq_f16 intrinsic can only be used when __ARM_FEATURE_FP16_VECTOR_ARITHMETIC is defined. The idea is to set CV_FP16 only when vfmaq_laneq_f16 is usable on the ARM platform. What's your opinion on this part of the code?
It is a compilation check (a check of compiler features and compiler flags). If the compiler is old and doesn't provide __ARM_FEATURE_FP16_VECTOR_ARITHMETIC but does provide "basic" FP16 support, this check should still pass. Currently it doesn't try to compile any FP16 code (it just returns 0), so the build would only fail later, during compilation of OpenCV code, if __ARM_FEATURE_FP16_VECTOR_ARITHMETIC is missing.
Considering haveFP16_ARM(), perhaps we need a new FP16_ARMv82 compiler feature with a separate cmake/checks/cpu_fp16_armv82.cpp, besides NEON and FP16.
At least the check should verify the definition in reverse form:
#ifndef __ARM_FEATURE_FP16_VECTOR_ARITHMETIC
#error "__ARM_FEATURE_FP16_VECTOR_ARITHMETIC must be defined"
#endif
I'm not sure we need a runtime check for that feature at all (haveFP16_ARM()). In any case, runtime checks of CPU features should be performed in system.cpp/HWFeatures (without direct use of compiler intrinsics). Such detection could also be skipped (assuming the baseline is always available), similar to CV_FP16:
#if CV_FP16
CV_LOG_INFO(NULL, "- FP16 instructions is enabled via build flags");
have[CV_CPU_FP16] = true;
system.cpp doesn't support the longjmp() approach for proper detection (only baseline compiler flags are available there).
> perhaps we need a new FP16_ARMv82 compiler feature with separate cmake/checks/cpu_fp16_armv82.cpp besides of NEON and FP16.
I agree with this, but I don't have much time to spend on it at the moment. I think this patch does no harm, since the CV_FP16 macro was not used before this patch. How about leaving this optimization for the future?
> Not sure if we need runtime check for that feature at all (haveFP16_ARM()).
The runtime check matters for distributed OpenCV binaries: compile the OpenCV ARM binary once, then run it on every ARM chip.
BTW, I found a flag that may be useful: CV_TRY_NEON_DOTPROD. It is defined as follows:
ocv_update(CPU_NEON_DOTPROD_FLAGS_ON "-march=armv8.2-a+dotprod")
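Following that pattern, a hypothetical FP16_ARMv82 feature could presumably be declared the same way (the option name and flag below are illustrative assumptions, not an existing OpenCV build option):

```cmake
# Hypothetical, mirroring the DOTPROD pattern above:
ocv_update(CPU_FP16_ARMV82_FLAGS_ON "-march=armv8.2-a+fp16")
```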
@opencv-alalek, FP16 support without SIMD arithmetic support does not make any sense, since C/C++ cannot handle this type properly on its own. I suggest modifying the definition of CV_FP16 on ARM as proposed in this PR.
Hi @asmorkalov, it looks like the ARMv7 builder has an environment issue.
@opencv-alalek Could you take a look?
@zihaomu Please rebase and fix conflicts after the Winograd patch merge.
The decision is to introduce another symbol for perfect backward compatibility. I think it should be something platform-agnostic, like CV_FP16_ARITHM. I will try to do it with alalek's help.
I will update it later today.
* add Winograd FP16 implementation
* fixed dispatching of FP16 code paths in dnn; use dynamic dispatcher only when NEON_FP16 is enabled in the build and the feature is present in the host CPU at runtime
* fixed some warnings
* hopefully fixed winograd on x64 (and maybe other platforms)

Co-authored-by: Vadim Pisarevsky <vadim.pisarevsky@gmail.com>
This PR adds a Winograd FP16 compute branch for the convolution layer in the 3x3, stride-1 case.
Tested on an M1 chip with 4 threads.
Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
Patch to opencv_extra has the same branch name.