[GSoC] New universal intrinsic backend for RVV #22179
#define OPENCV_HAL_IMPL_RVV_INIT_INTEGER(_Tpvec, _Tp, suffix1, suffix2, vl) \
inline v_##_Tpvec v_setzero_##suffix1() \
{ \
    return vmv_v_x_##suffix2##m1(0, vl); \
Setting all lanes of a register to zero or to a specific constant is a very common operation that is usually implemented very efficiently. E.g. at the assembly level, zeroing is done the same way for all data types:
XOR DST, DST, DST
Is the vector size really needed for those intrinsics?
Maybe we should not use `static inline int vlanes() { return vsetvlmax_e8m1(); }`, because according to my experiment it generates redundant `setvl` instructions: the compiler faithfully translates each intrinsic into an instruction without any optimization.
In the first version, we used a static variable, `static int nlanes`, which doesn't produce redundant instructions but needs to be initialized in some .cpp file, where we encountered link-time errors.
If we could use static inline variables, things would become easy: we could initialize the static variable inside the class, like `static inline int nlanes = vsetvlmax_e8m1();`. However, inline variables require C++17.
You can see my experiment here: https://godbolt.org/z/Kjsea8x7f
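To make the two options above concrete, here is a minimal sketch that compiles on an ordinary host. It is not the PR's code: `query_vlmax_stub()` is a hypothetical stand-in for `vsetvlmax_e8m1()` that counts its calls, and `LanesU8Sketch` is an illustrative name. Under C++17 the member is defined in the class as an inline variable; under older standards it is only declared in the class, and exactly one translation unit has to provide the definition.

```cpp
#include <cassert>

// Hypothetical stand-in for vsetvlmax_e8m1(): returns an assumed lane count
// and records how many times the "hardware query" was issued.
static int g_init_calls = 0;
static int query_vlmax_stub() { ++g_init_calls; return 16; }

struct LanesU8Sketch {
#if __cplusplus >= 201703L
    // C++17: an inline variable can be defined and initialized in the class.
    static inline int nlanes = query_vlmax_stub();
#else
    // C++11: declaration only; the definition below must live in one .cpp file.
    static int nlanes;
#endif
};

#if __cplusplus < 201703L
// Out-of-class definition required before C++17.
int LanesU8Sketch::nlanes = query_vlmax_stub();
#endif

// Reads the cached lane count several times; no further queries are issued.
static int sum_lanes(int reads) {
    int total = 0;
    for (int i = 0; i < reads; ++i) total += LanesU8Sketch::nlanes;
    return total;
}
```

In both branches the stub runs once during static initialization, and every later read of `nlanes` is a plain load.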
@hanliutong,
we need to think about it. On the one hand, it's a good idea to use static inline variables. This way we will minimize the number of changes in the SIMD loops that we have and, as you said, it will eliminate unnecessary calls to `vsetvlmax_...()`.
On the other hand, OpenCV 4.x requires an earlier C++ version, C++11 actually. So we need to offer some alternative for C++11 users as well. Even users of modern versions of GCC and clang do not always set `-std=c++17` in their projects.
Also, what still worries me is that we access nlanes/vlanes() in every single intrinsic implementation. See, for example, the implementation of the v_min/v_max intrinsics.
I wonder whether the following solution would work on the RISC-V platform:
template<> struct VTraits<v_uint8> {
static inline int vlanes() { static int nlanes = vsetvlmax_e8m1(); return nlanes; }
};
I tried it briefly on my machine, an M1 Mac, where vsetvlmax_e8m1() was replaced with rand(), and it works well. That is, the initialization runs once and then the value is just reused. Can you try it?
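The host-side experiment described above can be sketched roughly as follows. This is an assumption-laden illustration, not the code from the PR: `fake_vsetvlmax_e8m1()` is a hypothetical replacement for the real intrinsic that counts its calls, and `VTraitsSketch` and `v_uint8_stub` are placeholder names. A function-local static is valid C++11, and its initializer runs only the first time control passes through the declaration.

```cpp
#include <cassert>

// Hypothetical replacement for vsetvlmax_e8m1(): counts calls and returns
// an assumed lane count of 16.
static int g_queries = 0;
static int fake_vsetvlmax_e8m1() { ++g_queries; return 16; }

struct v_uint8_stub {};  // placeholder for the real vector type

template <typename T> struct VTraitsSketch;

template <> struct VTraitsSketch<v_uint8_stub> {
    static inline int vlanes() {
        // Initialized on the first call only; later calls reuse the value.
        static int nlanes = fake_vsetvlmax_e8m1();
        return nlanes;
    }
};

// Walks a buffer of total_elems elements in full-vector steps, calling
// vlanes() on every iteration the way a SIMD loop would.
static int count_blocks(int total_elems) {
    int blocks = 0;
    for (int i = 0; i + VTraitsSketch<v_uint8_stub>::vlanes() <= total_elems;
         i += VTraitsSketch<v_uint8_stub>::vlanes())
        ++blocks;
    return blocks;
}
```

Even though `vlanes()` is evaluated many times inside the loop, the query counter stays at one. Whether the guard check that the compiler emits for the local static is cheaper than a repeated `vsetvl` on real RVV hardware is something only a measurement on the target can answer.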
It would also be interesting to compare the efficiency of the two approaches (always call vsetvlmax_...(), or call it once and store the result in a private static variable) on the median blur and deep learning convolution loops.
Following your latest comment, I have updated the source file, and yes, I also think that using a static variable in the intrin_rvv_scalable.hpp file is the best solution.
The experimental results are promising:
| | always call vsetvl | static variable |
|---|---|---|
| median blur | 7377 ms | 7003 ms |
| convolution | 3079 ms | 2413 ms |
@hanliutong, I realized that our possible solution with
in the beginning of intrin_rvv_scalable.hpp. In this case CV_SIMD_SCALABLE_INIT() is not required.
@alalek, I think, the patch is ready; the further work in this GSoC depends on it. Could you please merge it?
It is not ready.
Hi @alalek, I have modified the code according to the error message. Could you please help me rerun the workflow?
I also found that there are some unnecessary warnings here about initializing other data types with int.
This patch breaks "Linux Debug" builds.
[GSoC] New universal intrinsic backend for RVV
* Add new rvv backend (partially implemented).
* Modify the framework of Universal Intrinsic.
* Add CV_SIMD macro guards to current UI code.
* Use vlanes() instead of nlanes.
* Modify the UI test.
* Enable the new RVV (scalable) backend.
* Remove whitespace.
* Rename and some others modify.
* Update intrin.hpp but still not work on AVX/SSE
* Update conditional compilation macros.
* Use static variable for vlanes.
* Use max_nlanes for array defining.
EXPECT_NE((size_t)0, (size_t)&data.u.d % CV_SIMD_WIDTH);
EXPECT_EQ((size_t)0, (size_t)&out.a.d % CV_SIMD_WIDTH);
EXPECT_NE((size_t)0, (size_t)&out.u.d % CV_SIMD_WIDTH);
EXPECT_EQ((size_t)0, (size_t)&data.a.d % VTraits<R>::max_nlanes);
This looks like a wrong change. `CV_SIMD_WIDTH` was replaced with `sizeof(typename VTraits<R>::lane_type) * VTraits<R>::vlanes()` in other places, but not in these four lines. For details, see #25196 (comment).
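As a rough illustration of the expression named in this comment, the sketch below computes a vector width in bytes from a lane type and a runtime lane count, then uses it for an alignment check. Everything here is hypothetical: `VTraitsF32Sketch` is a mock, not OpenCV's `VTraits`, and the values 4 and 64 are made up for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical mock of a traits class for a 32-bit float vector.
struct VTraitsF32Sketch {
    typedef float lane_type;
    enum { max_nlanes = 64 };          // assumed compile-time bound, in lanes
    static int vlanes() { return 4; }  // assumed runtime lane count
};

// Vector width in bytes: size of one lane times the number of lanes.
template <typename Traits>
static std::size_t simd_width_bytes() {
    return sizeof(typename Traits::lane_type) *
           static_cast<std::size_t>(Traits::vlanes());
}

// True when the address is a multiple of the vector width in bytes.
template <typename Traits>
static bool is_simd_aligned(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % simd_width_bytes<Traits>() == 0;
}
```

With these mock values the byte width is 16 while `max_nlanes` is 64, so taking an address modulo `max_nlanes` tests a different quantity (a lane count rather than a width in bytes).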
This is a patch for my GSoC project, whose goal is to make the existing Universal Intrinsic compatible with variable-length backends, thereby improving the performance of the RISC-V Vector (RVV) backend.
In this patch,
Running opencv_test_imgproc, the tests pass and the results are promising: 7040 ms with the new implementation vs. 42094 ms with the current one.
$ qemu-riscv64 -cpu rv64,x-v=true ./bin/opencv_test_imgproc --gtest_filter="Imgproc_MedianBlur*"
Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
Patch to opencv_extra has the same branch name.