[GSoC] New universal intrinsic backend for RVV #22179

hanliutong · 2022-07-01T01:05:25Z

This is a patch of my GSoC project whose goal is to make the existing Universal Intrinsic compatible with variable-length backends. Thereby improving the performance of the RISC-V Vector (RVV) backend.

In this patch,

We added a new rvv backend. Unlike the current implementation, we used the rvv type directly instead of the wrapper class, which will improve performance. Currently only the necessary functions are included, mainly v_setall, v_load, v_lut for compatibility with vx_XXX and v_min ,v_max for the MedianBlur example.
To be compatible with other backends that use wrapper classes, we had to modify the existing universal intrinsic interfaces, including Vtraits and function compatibility layers. The documentation has not been updated yet, we will update it when changes of API are finalized.
We modified some of the existing SIMD code to ensure compilation, mainly by adding the packaging of the CV_SIMD macro
We enabled the new rvv backend implementation for the Imgproc module's MedianBlur as an example of how to reuse existing UI-optimized code blocks.

Run the opencv_test_imgproc, the test passed and result are promising: 7040 ms by new implentation vs. 42094 ms by current one.
$ qemu-riscv64 -cpu rv64,x-v=true ./bin/opencv_test_imgproc --gtest_filter="Imgproc_MedianBlur*"

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

I agree to contribute to the project under Apache 2 License.
To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
The PR is proposed to the proper branch
There is a reference to the original bug report and related work
There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
The feature is well documented and sample code can be built with the project CMake

modules/core/include/opencv2/core/hal/intrin.hpp

modules/core/include/opencv2/core/hal/intrin_riscv_vector.hpp

vpisarev · 2022-07-05T14:40:45Z

modules/core/include/opencv2/core/hal/intrin_riscv_vector.hpp

+#define OPENCV_HAL_IMPL_RVV_INIT_INTEGER(_Tpvec, _Tp, suffix1, suffix2, vl) \
+inline v_##_Tpvec v_setzero_##suffix1() \
+{ \
+    return vmv_v_x_##suffix2##m1(0, vl); \


setting all lanes of a register to 0's or to specific constant are very common operations that are usually implemented very efficiently. E.g. on assembly level the first operation is done in the same way for all data types:

XOR DST, DST, DST

Is the vector size really needed for those intrinsics?

Maybe we should not use static inline int vlanes() { return vsetvlmax_e8m1(); } because according to my experiment, it will generate redundant ‘setvl’ instructions, the compiler faithfully translates intrinsic to instruction without any optimization.

At fisrt version, we use static variables static int nlanes, which doesn't produce redundant instructions, but needs to be initialized in some cpp file, where we encountered link time errors.

And if we can use static inline variables, things become easy, we can initialized the static variables in the class like static inline int nlanes = vsetvlmax_e8m1(); . However we need c++ 17 for inline variables.

You can see my experiment here: https://godbolt.org/z/Kjsea8x7f

@hanliutong,
we need to think about it. On the one hand, it's a good idea to use static inline variables. This way we will minimize the number of changes in SIMD loops that we have and, as you said, it will eliminate irrelevant calls to vsetvlmax_...().
On the other hand, OpenCV 4.x requires an earlier C++ version, C++ 11 actually. So we need to offer some alternative for C++ 11 users as well. Even users of modern versions of GCC and clang do not always set -std=c++17 in their projects.
Also, what still worries me is that we access nlanes/vlanes() in each single intrinsic implementation. See, for example, implementation of v_min/v_max intrinsics.

I wonder, if the following solution will work on RISC-V platform:

template<> struct VTraits<v_uint8> { static inline int vlanes() { static int nlanes = vsetvlmax_e8m1(); return nlanes; } };

I tried it briefly on my machine, M1 mac, where vsetvlmax_e8m1() was replaced with rand() and it works well. That is, the initialization runs once and then the value is just reused. Can you try it?

it would also be interesting to compare efficiency of the two approaches (always call vsetvlmax_...() or call it once and store in private static variable) on median blur and deep learning convolution loops

According to your latest comment, I have updated the source file, and yes, I also think using static variable in intrin_rvv_scalable.hpp file is the best solution

The experimental results are promising：

always call vsetvl static variable

median blur 7377 ms 7003 ms

convolution 3079 ms 2413 ms

modules/imgproc/src/median_blur.simd.hpp

vpisarev · 2022-07-16T12:29:34Z

@hanliutong, I realized that our possible solution with CV_SIMD_SCALABLE_INIT() can only work if this macro is defined before intrin_rvv_scalable.hpp is included. Probably, the best way is to define

static const int __cv_rvv_u8_nlanes = vlsetvlmax_e8m1();

in the beginning of intrin_rvv_scalable.hpp. In this case CV_SIMD_SCALABLE_INIT() is not required.

vpisarev · 2022-07-18T10:07:01Z

@alalek, I think, the patch is ready; the further work in this GSoC depends on it. Could you please merge it?

alalek · 2022-07-19T03:16:49Z

@alalek, I think, the patch is ready; the further work in this GSoC depends on it. Could you please merge it?

It is not ready.

Some checks were not successful
11 successful, 2 failing, and 1 skipped checks

hanliutong · 2022-07-19T07:33:27Z

Hi @alalek, I have modified the code according to the error message, could you please help me rerun the workflow.

1 workflow awaiting approval
First-time contributors need a maintainer to approve running workflows.
1 pending check

I also found that there are some unnecessary warnings here about initializing other data types with int.
What do we need to do about this warnings? Is there a macro in OpenCV to suppress warnings

alalek · 2022-07-19T21:47:11Z

This patch break "Linux Debug" builds.
Please fix it ASAP.

[GSoC] New universal intrinsic backend for RVV * Add new rvv backend (partially implemented). * Modify the framework of Universal Intrinsic. * Add CV_SIMD macro guards to current UI code. * Use vlanes() instead of nlanes. * Modify the UI test. * Enable the new RVV (scalable) backend. * Remove whitespace. * Rename and some others modify. * Update intrin.hpp but still not work on AVX/SSE * Update conditional compilation macros. * Use static variable for vlanes. * Use max_nlanes for array defining.

fengyuentau · 2024-04-23T07:50:40Z

modules/core/test/test_intrin_utils.hpp

-        EXPECT_NE((size_t)0, (size_t)&data.u.d % CV_SIMD_WIDTH);
-        EXPECT_EQ((size_t)0, (size_t)&out.a.d % CV_SIMD_WIDTH);
-        EXPECT_NE((size_t)0, (size_t)&out.u.d % CV_SIMD_WIDTH);
+        EXPECT_EQ((size_t)0, (size_t)&data.a.d % VTraits<R>::max_nlanes);


Looks like a wrong change. CV_SIMD_WIDTH was replaced with sizeof(typename VTraits<R>::lane_type) * VTraits<R>::vlanes() in other places but not these four lines. Details see #25196 (comment).

hanliutong added 7 commits June 30, 2022 18:59

Add new rvv backend (partially implemented).

4a7c465

Modify the framework of Universal Intrinsic.

f4884dd

Add CV_SIMD macro guards to current UI code.

7b67f3a

Use vlanes() instead of nlanes.

02b203b

Modify the UI test.

b58109c

Enable the new RVV (scalable) backend.

e9fc981

Remove whitespace.

d20bb8d

hanliutong force-pushed the new-rvv branch from df2a53c to d20bb8d Compare July 1, 2022 07:02

asmorkalov added feature platform: riscv labels Jul 4, 2022

vpisarev reviewed Jul 5, 2022

View reviewed changes

modules/core/include/opencv2/core/hal/intrin.hpp Outdated Show resolved Hide resolved

vpisarev reviewed Jul 5, 2022

View reviewed changes

modules/core/include/opencv2/core/hal/intrin_riscv_vector.hpp Outdated Show resolved Hide resolved

vpisarev reviewed Jul 5, 2022

View reviewed changes

vpisarev reviewed Jul 6, 2022

View reviewed changes

modules/imgproc/src/median_blur.simd.hpp Outdated Show resolved Hide resolved

hanliutong added 2 commits July 6, 2022 08:43

Rename and some others modify.

a8875ee

Update intrin.hpp but still not work on AVX/SSE

405d43b

hanliutong force-pushed the new-rvv branch from a3c62c1 to beee73b Compare July 18, 2022 04:09

hanliutong added 2 commits July 18, 2022 06:54

Update conditional compilation macros.

1af0e0f

Use static variable for vlanes.

8a8f1b4

hanliutong force-pushed the new-rvv branch from beee73b to 8a8f1b4 Compare July 18, 2022 06:55

vpisarev self-requested a review July 18, 2022 10:06

vpisarev approved these changes Jul 18, 2022

View reviewed changes

Use max_nlanes for array defining.

a59d211

alalek assigned vpisarev Jul 19, 2022

alalek merged commit 0ef8039 into opencv:4.x Jul 19, 2022

hanliutong mentioned this pull request Jul 21, 2022

[GSoC] Fix compile errors. #22278

Merged

6 tasks

hanliutong deleted the new-rvv branch July 21, 2022 09:18

hanliutong mentioned this pull request Aug 8, 2022

[GSoC] Add more universal intrinsic implementations for RVV. #22353

Merged

6 tasks

alalek mentioned this pull request Aug 21, 2022

(5.x) Merge 4.x #22408

Merged

hanliutong mentioned this pull request Aug 26, 2022

[GSoC] Add remaining universal intrinsic implementations for RVV. #22429

Merged

6 tasks

asmorkalov mentioned this pull request Sep 16, 2022

!!! NOT FOR REVIEW !!! [SVE] Example HAL-compatible SVE code for Linear Resize. #20640

Closed

6 tasks

asmorkalov mentioned this pull request Oct 11, 2022

WIP - Convert scalars-to-vector constructors in SIMD backend to lookup table constructors #20688

Closed

6 tasks

hanliutong mentioned this pull request Feb 28, 2023

Fix HAL compatibility layer #23310

Merged

6 tasks

hanliutong mentioned this pull request Jun 28, 2023

Rewrite SIMD code by using new Universal Intrinsic API : an example. #23885

Merged

6 tasks

hanliutong mentioned this pull request Oct 7, 2023

Clean up the obsolete API of Universal Intrinsic #24371

Merged

6 tasks

asmorkalov mentioned this pull request Oct 31, 2023

About the performance of opencv of the sizeless instruction(Riscv vector, SVE .etc) #21780

Closed

4 tasks

fengyuentau reviewed Apr 23, 2024

View reviewed changes

fengyuentau mentioned this pull request Apr 23, 2024

core: add universal intrinsics for fp16 #25196

Open

6 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[GSoC] New universal intrinsic backend for RVV #22179

[GSoC] New universal intrinsic backend for RVV #22179

hanliutong commented Jul 1, 2022

vpisarev Jul 5, 2022 •

edited

hanliutong Jul 6, 2022

vpisarev Jul 12, 2022 •

edited

vpisarev Jul 12, 2022

hanliutong Jul 18, 2022

vpisarev commented Jul 16, 2022

vpisarev commented Jul 18, 2022

alalek commented Jul 19, 2022

hanliutong commented Jul 19, 2022

alalek commented Jul 19, 2022

fengyuentau Apr 23, 2024 •

edited

[GSoC] New universal intrinsic backend for RVV #22179

[GSoC] New universal intrinsic backend for RVV #22179

Conversation

hanliutong commented Jul 1, 2022

Pull Request Readiness Checklist

vpisarev Jul 5, 2022 • edited

Choose a reason for hiding this comment

hanliutong Jul 6, 2022

Choose a reason for hiding this comment

vpisarev Jul 12, 2022 • edited

Choose a reason for hiding this comment

vpisarev Jul 12, 2022

Choose a reason for hiding this comment

hanliutong Jul 18, 2022

Choose a reason for hiding this comment

vpisarev commented Jul 16, 2022

vpisarev commented Jul 18, 2022

alalek commented Jul 19, 2022

hanliutong commented Jul 19, 2022

alalek commented Jul 19, 2022

fengyuentau Apr 23, 2024 • edited

Choose a reason for hiding this comment

vpisarev Jul 5, 2022 •

edited

vpisarev Jul 12, 2022 •

edited

fengyuentau Apr 23, 2024 •

edited