-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[BUGFIX] Fix neon2rvv failing tests #309
Conversation
neon2rvv.h
Outdated
@@ -3846,127 +3845,127 @@ FORCE_INLINE uint64x2_t vrsraq_n_u64(uint64x2_t __a, uint64x2_t __b, const int _ | |||
} | |||
|
|||
FORCE_INLINE int8x8_t vsri_n_s8(int8x8_t __a, int8x8_t __b, const int __c) { | |||
uint8_t mask = UINT8_MAX >> __c; | |||
uint8_t mask = (__c == 8) ? 0 : UINT8_MAX >> __c; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about using
+uint8_t mask = (uint64_t) UINT8_MAX >> __c;
-uint8_t mask = (__c == 8) ? 0 : UINT8_MAX >> __c;
I think in this way we may avoid branching?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm except the cast won't work for the 64-bit data types, better to keep things uniform between all implementations?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Umm I feel ok for this part, because there are already some exception happened in 64 bits cases. Therefore if the feedback is good enough I am willing to change this.
How do you think about it? If you think this is unnecessary then I will directly merge this PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, made the change for 8/16/32 versions.
Thank you so much for the contribution, and helping me to resolve my confusion! Except the question I raised above, all the rest of changes are amazing |
Hello, made the following changes to fix failing tests:
vcnt_u8
- correctly load entire look-up tablevsri
- right shifting an n-bit value by n bits is undefined behaviour, in such cases mask should be set to 0 to copy all bits from input __avst1_lane
- strided store causes errors,vmv_x_s
can be used to copy first vector element to scalar register insteadvdup
- Noticed possible confusion, the use ofvmv_s_x
is to modify the first element in the vector regiser,vmv_v_x
should be used to broadcast copy to all vector elements. Also fixed return-type in forward declaration forvdupq
, and corrected usage ofvdupq
tovdup
inside 64-bit vector intrinsics.Test suite executed with QEMU and Spike.
Using GCC(d6479050ecef10fd5e67b4da989229e4cfac53ee) all tests passing on optimization level 0 to 3.
Using Clang(6008cd40b7bbdc66555550c2e38648d5ce99cc78) all tests passing on optimization level 0 and 1. However on -O2 Clang initially crashed - raised an issue. After fix compilation no longer crashes, but almost half of the tests fail. Further investigation needed.