-
-
Notifications
You must be signed in to change notification settings - Fork 55.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GAPI Fluid: SIMD Div kernel. #20914
GAPI Fluid: SIMD Div kernel. #20914
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM (besides assert for unsupported types comment in div_hal)
@@ -743,6 +782,7 @@ GAPI_FLUID_KERNEL(GFluidDiv, cv::gapi::core::GDiv, false) | |||
BINARY_(uchar , short, short, run_arithm, dst, src1, src2, ARITHM_DIVIDE, scale); | |||
BINARY_(uchar , float, float, run_arithm, dst, src1, src2, ARITHM_DIVIDE, scale); | |||
BINARY_( short, short, short, run_arithm, dst, src1, src2, ARITHM_DIVIDE, scale); | |||
BINARY_(ushort, ushort, ushort, run_arithm, dst, src1, src2, ARITHM_DIVIDE, scale); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment not for this PR:
maybe in future it would be nice to have something:
GAPI_KERNEL(...)
...
#define KERNEL_XXX_SUPPORTED_TYPES ushort, int, char,...
static void run(...) {
run_dispatcher<KERNEL_XXX_SUPPORTED_TYPES>(...);
}
template <class ....Types>
run_dispatcher(...) {
bool expander[] { (run_impl<Type>(), true)...}
}
template<class SupportedType>
run_impl(...) {
BINARY_(SupportedType, ...);
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably yes, however for this purpose it's need to rework the entire Fluid backend. It will take a certain amount of time. As you know the allocation of resources is handled by @dmatveev , so this issue should be discussed with him.
525c9a3
to
eace9c7
Compare
cv::divide(in_mat1, in_mat2, out_mat_ocv, dtype); | ||
|
||
cv::divide(in_mat1, in_mat2, out_mat_ocv, scale, dtype); | ||
//out_mat_ocv.size(). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
//out_mat_ocv.size().
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed.
@@ -0,0 +1,427 @@ | |||
// This file is part of OpenCV project. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
.simd.hpp
These files are used for dynamic dispatching between multiple instruction sets.
Check:
- documentation / tutorial here: https://github.com/opencv/opencv/wiki/CPU-optimizations-build-options
- some of these PRs
- how it is used
gfluidimgproc_func.simd.hpp
in G-API
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. Reworked. I've applied dynamic dispatching for gfluidcore_func.simd.hpp
the same as it has been applied for gfluidimgproc_func.simd.hpp
eace9c7
to
54e5744
Compare
6e7af53
to
fd6a2fe
Compare
fd6a2fe
to
d69ae64
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me! Thank you for update 👍
CV_ALWAYS_INLINE void v_store_select(short* dst, const v_int16&, const v_int16&, | ||
const v_int32& res1, const v_int32& res2) | ||
{ | ||
vx_store(dst, v_pack(res1, res2)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
v_store_select
v_pack
Do we need select somewhere here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. Reworked. This lines were done deliberately as a workaround for a known issue in OCV.
Since one more workaround has been added to the DivPerfTestFluid
test, this workaround can be reworked.
template<> struct vector_type_of<ushort> { using type = v_uint16; }; | ||
template<> struct vector_type_of<short> { using type = v_int16; }; | ||
|
||
CV_ALWAYS_INLINE v_float32 v_load_f32(const float* in) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
v_load_f32
It makes sense to add "gapi_v_" prefix to avoid confusion with OpenCV SIMD functions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. However "gapi_v_" too long. I replaced "v" prefix to "vg" prefix.
//This condition need to workaround bug in OpenCV. | ||
//It reinitializes divider matrix without zero values. | ||
if (dtype == CV_16S && type != CV_16S) | ||
cv::randu(in_mat2, cv::Scalar::all(1), cv::Scalar::all(255)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW, There is similar problem in cv::divide()
with dst=8S (e.g., src=8U).
Probably accurate condition should be dtype != type
( && dtype != CV_32F
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fortunately, fluid's Div kernel doesn't support 8S type at all.
For supported types:
Since bug observed only when DST_Type == 16S and SRC_Type != 16S
(both condition together), I think accurate condition is
(dtype == 16S) && (dtype != type)
This condition has already been applied in the last PR's update.
For all other combinations of DST and SRC types, we must check handling of division by zero cases with this test.
d69ae64
to
e54fafe
Compare
@@ -770,7 +770,10 @@ GAPI_EXPORTS GMat mulC(const GScalar& multiplier, const GMat& src, int ddepth = | |||
The function divides one matrix by another: | |||
\f[\texttt{dst(I) = saturate(src1(I)*scale/src2(I))}\f] | |||
|
|||
When src2(I) is zero, dst(I) will also be zero. Different channels of | |||
For integer types when src2(I) is zero, dst(I) will also be zero. | |||
Floating point case returns Inf/NaN (according to IEEE). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this PR resolve #13147 ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately, I haven't heard about this issue earlier. However, I can say with confidence that during testing with DivPerfTestFluid
test, behavior of the Div fluid's kernel is fully consistent with the behavior of cv::divide()
operation. All CI checks successfully passed. The exception is the behavior is caused with known issue in OpenCV (dtype== CV_16S && type != dtype
). For mentioned OCV's issue workaround has already added to test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since all CI checks successfully passed, I can assume that mentioned issue in PR #13147 was successfully resolved.
static inline | ||
typename std::enable_if<!std::is_same<DST, float>::value, DST>::type | ||
div(SRC1 x, SRC2 y, float scale=1) | ||
{ | ||
// like OpenCV: returns 0, if y=0 | ||
// like OpenCV: returns 0, if DST type=uchar/short/ushort and divider(y)=0 | ||
auto result = y? scale * x / y: 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW, We saw tests failures for "Linux Debug" builder only.
This is strange because SIMD optimized builders should fail too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
auto result = scale * x / y; | ||
return saturate<DST>(result, rintf); | ||
} | ||
|
||
template<typename DST, typename SRC1, typename SRC2> | ||
static inline DST divr(SRC1 x, SRC2 y, float scale=1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
divr
has the same flow, but it is still not updated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Within current task this operation wasn't and won't be updated. I've added SIMD for Div kernel only.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW, divr()
function is declared but not used anywhere in gfluidcore.cpp
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM but not sure if everything is ok about .simd.hpp
GAPI Fluid: SIMD Div kernel. * HAL implementation for Div kernel * Removed dbg lines * Applied comments. * Reworked * Final version
SIMD for Divide kernel.
Performance report:
FluidDivSIMD.xlsx