Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GAPI Fluid: SIMD Div kernel. #20914

Merged
merged 5 commits into from
Nov 15, 2021
Merged

GAPI Fluid: SIMD Div kernel. #20914

merged 5 commits into from
Nov 15, 2021

Conversation

anna-khakimova
Copy link
Member

@anna-khakimova anna-khakimova commented Oct 20, 2021

SIMD for Divide kernel.

Performance report:

FluidDivSIMD.xlsx

force_builders=Linux AVX2,Custom,Custom Win,Custom Mac
build_gapi_standalone:Linux x64=ade-0.1.1f
build_gapi_standalone:Win64=ade-0.1.1f
Xbuild_gapi_standalone:Mac=ade-0.1.1f
build_gapi_standalone:Linux x64 Debug=ade-0.1.1f

build_image:Custom=centos:7
buildworker:Custom=linux-1
build_gapi_standalone:Custom=ade-0.1.1f

Xbuild_image:Custom=ubuntu-openvino-2021.3.0:20.04
build_image:Custom Win=openvino-2021.4.1
build_image:Custom Mac=openvino-2021.2.0

buildworker:Custom Win=windows-3

test_modules:Custom=gapi,python2,python3,java
test_modules:Custom Win=gapi,python2,python3,java
test_modules:Custom Mac=gapi,python2,python3,java

buildworker:Custom=linux-1
# disabled due high memory usage: test_opencl:Custom=ON
Xtest_opencl:Custom=OFF
Xtest_bigdata:Custom=1
Xtest_filter:Custom=*

CPU_BASELINE:Custom Win=AVX512_SKX
CPU_BASELINE:Custom=SSE4_2

Copy link
Contributor

@sivanov-work sivanov-work left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM (besides assert for unsupported types comment in div_hal)

modules/gapi/perf/cpu/gapi_core_perf_tests_fluid.cpp Outdated Show resolved Hide resolved
modules/gapi/src/backends/fluid/gfluidcore.cpp Outdated Show resolved Hide resolved
@@ -743,6 +782,7 @@ GAPI_FLUID_KERNEL(GFluidDiv, cv::gapi::core::GDiv, false)
BINARY_(uchar , short, short, run_arithm, dst, src1, src2, ARITHM_DIVIDE, scale);
BINARY_(uchar , float, float, run_arithm, dst, src1, src2, ARITHM_DIVIDE, scale);
BINARY_( short, short, short, run_arithm, dst, src1, src2, ARITHM_DIVIDE, scale);
BINARY_(ushort, ushort, ushort, run_arithm, dst, src1, src2, ARITHM_DIVIDE, scale);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment not for this PR:
maybe in future it would be nice to have something:

GAPI_KERNEL(...)
...
#define KERNEL_XXX_SUPPORTED_TYPES     ushort, int, char,...

static void run(...) {
   run_dispatcher<KERNEL_XXX_SUPPORTED_TYPES>(...);
}

template <class ....Types>
run_dispatcher(...) {
bool expander[] { (run_impl<Type>(), true)...}
}

template<class SupportedType>
run_impl(...) {
BINARY_(SupportedType, ...);
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably yes, however for this purpose it's need to rework the entire Fluid backend. It will take a certain amount of time. As you know the allocation of resources is handled by @dmatveev , so this issue should be discussed with him.

@anna-khakimova anna-khakimova changed the title GAPI Fluid: Reworked Div kernel. GAPI Fluid: SIMD Div kernel. Oct 28, 2021
cv::divide(in_mat1, in_mat2, out_mat_ocv, dtype);

cv::divide(in_mat1, in_mat2, out_mat_ocv, scale, dtype);
//out_mat_ocv.size().
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

//out_mat_ocv.size().

?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed.

@@ -0,0 +1,427 @@
// This file is part of OpenCV project.
Copy link
Member

@alalek alalek Oct 28, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.simd.hpp

These files are used for dynamic dispatching between multiple instruction sets.


Check:

Copy link
Member Author

@anna-khakimova anna-khakimova Nov 6, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. Reworked. I've applied dynamic dispatching for gfluidcore_func.simd.hpp the same as it has been applied for gfluidimgproc_func.simd.hpp

Copy link
Member

@alalek alalek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me! Thank you for update 👍

Comment on lines 116 to 119
CV_ALWAYS_INLINE void v_store_select(short* dst, const v_int16&, const v_int16&,
const v_int32& res1, const v_int32& res2)
{
vx_store(dst, v_pack(res1, res2));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

v_store_select
v_pack

Do we need select somewhere here?

Copy link
Member Author

@anna-khakimova anna-khakimova Nov 10, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Reworked. This lines were done deliberately as a workaround for a known issue in OCV.
Since one more workaround has been added to the DivPerfTestFluid test, this workaround can be reworked.

template<> struct vector_type_of<ushort> { using type = v_uint16; };
template<> struct vector_type_of<short> { using type = v_int16; };

CV_ALWAYS_INLINE v_float32 v_load_f32(const float* in)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

v_load_f32

It makes sense to add "gapi_v_" prefix to avoid confusion with OpenCV SIMD functions.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. However "gapi_v_" too long. I replaced "v" prefix to "vg" prefix.

Comment on lines 332 to 335
//This condition need to workaround bug in OpenCV.
//It reinitializes divider matrix without zero values.
if (dtype == CV_16S && type != CV_16S)
cv::randu(in_mat2, cv::Scalar::all(1), cv::Scalar::all(255));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, There is similar problem in cv::divide() with dst=8S (e.g., src=8U).

Probably accurate condition should be dtype != type ( && dtype != CV_32F)

Copy link
Member Author

@anna-khakimova anna-khakimova Nov 10, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fortunately, fluid's Div kernel doesn't support 8S type at all.

For supported types:
Since bug observed only when DST_Type == 16S and SRC_Type != 16S (both condition together), I think accurate condition is
(dtype == 16S) && (dtype != type)

This condition has already been applied in the last PR's update.

For all other combinations of DST and SRC types, we must check handling of division by zero cases with this test.

@@ -770,7 +770,10 @@ GAPI_EXPORTS GMat mulC(const GScalar& multiplier, const GMat& src, int ddepth =
The function divides one matrix by another:
\f[\texttt{dst(I) = saturate(src1(I)*scale/src2(I))}\f]

When src2(I) is zero, dst(I) will also be zero. Different channels of
For integer types when src2(I) is zero, dst(I) will also be zero.
Floating point case returns Inf/NaN (according to IEEE).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this PR resolve #13147 ?

Copy link
Member Author

@anna-khakimova anna-khakimova Nov 12, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, I haven't heard about this issue earlier. However, I can say with confidence that during testing with DivPerfTestFluid test, behavior of the Div fluid's kernel is fully consistent with the behavior of cv::divide() operation. All CI checks successfully passed. The exception is the behavior is caused with known issue in OpenCV (dtype== CV_16S && type != dtype). For mentioned OCV's issue workaround has already added to test.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since all CI checks successfully passed, I can assume that mentioned issue in PR #13147 was successfully resolved.

Comment on lines +85 to 90
static inline
typename std::enable_if<!std::is_same<DST, float>::value, DST>::type
div(SRC1 x, SRC2 y, float scale=1)
{
// like OpenCV: returns 0, if y=0
// like OpenCV: returns 0, if DST type=uchar/short/ushort and divider(y)=0
auto result = y? scale * x / y: 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, We saw tests failures for "Linux Debug" builder only.
This is strange because SIMD optimized builders should fail too.

Copy link
Member Author

@anna-khakimova anna-khakimova Nov 12, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the beginning I saw test failures not for only "Linux Debug". I added workaround for SIMD part long time ago. We've already discussed this workaround here. When we have a call, I've already told you about this workaround. It's already removed with applying this comment.

auto result = scale * x / y;
return saturate<DST>(result, rintf);
}

template<typename DST, typename SRC1, typename SRC2>
static inline DST divr(SRC1 x, SRC2 y, float scale=1)
Copy link
Member

@alalek alalek Nov 11, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

divr

has the same flow, but it is still not updated.

Copy link
Member Author

@anna-khakimova anna-khakimova Nov 12, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Within current task this operation wasn't and won't be updated. I've added SIMD for Div kernel only.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, divr() function is declared but not used anywhere in gfluidcore.cpp

@dmatveev dmatveev self-assigned this Nov 12, 2021
@dmatveev dmatveev added this to the 4.5.5 milestone Nov 12, 2021
Copy link
Contributor

@dmatveev dmatveev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM but not sure if everything is ok about .simd.hpp

@alalek alalek merged commit b19697e into opencv:4.x Nov 15, 2021
@alalek alalek mentioned this pull request Dec 30, 2021
@alalek alalek mentioned this pull request Feb 22, 2022
a-sajjad72 pushed a commit to a-sajjad72/opencv that referenced this pull request Mar 30, 2023
GAPI Fluid: SIMD Div kernel.

* HAL implementation for Div kernel

* Removed dbg lines

* Applied comments.

* Reworked

* Final version
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants