[BUGFIX] Fix neon2rvv failing tests #309

OMaghiarIMG · 2024-02-28T12:04:02Z

Hello, I made the following changes to fix the failing tests:

vcnt_u8 - correctly load the entire look-up table
vsri - right-shifting an n-bit value by n bits is undefined behaviour; in such cases the mask should be set to 0 so that all bits are copied from input __a
vst1_lane - the strided store causes errors; vmv_x_s can be used instead to copy the first vector element to a scalar register
vdup - noticed possible confusion: vmv_s_x modifies only the first element of the vector register, whereas vmv_v_x should be used to broadcast a value to all vector elements. Also fixed the return type in the forward declaration of vdupq, and corrected vdupq to vdup inside the 64-bit vector intrinsics.
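
The vmv_s_x / vmv_v_x distinction in the vdup item can be illustrated with a scalar model. This is only a sketch in plain C, not RVV intrinsics: the `model_` functions and the fixed `VL` are hypothetical names that mimic the element-wise effect of the two instructions, assuming the untouched elements keep their old values.

```c
#include <stddef.h>
#include <stdint.h>

#define VL 8 /* hypothetical vector length in elements, for illustration only */

/* Models vmv.s.x: only element 0 receives the scalar.
 * The remaining elements are not set to it (here they keep their old values). */
static void model_vmv_s_x(int8_t vec[VL], int8_t scalar) {
    vec[0] = scalar;
}

/* Models vmv.v.x: the scalar is broadcast to every element,
 * which is the behaviour a vdup-style intrinsic needs. */
static void model_vmv_v_x(int8_t vec[VL], int8_t scalar) {
    for (size_t i = 0; i < VL; i++) {
        vec[i] = scalar;
    }
}
```

Calling `model_vmv_s_x` on a vector holding 1..8 changes only the first lane, while `model_vmv_v_x` overwrites all eight, which is why the former cannot implement a broadcast.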

The test suite was executed with QEMU and Spike.
With GCC (d6479050ecef10fd5e67b4da989229e4cfac53ee), all tests pass at optimization levels 0 to 3.

With Clang (6008cd40b7bbdc66555550c2e38648d5ce99cc78), all tests pass at optimization levels 0 and 1. At -O2, however, Clang initially crashed, so I raised an issue. After the fix, compilation no longer crashes, but almost half of the tests fail. Further investigation is needed.
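
For the vcnt_u8 item in the description above, a scalar sketch may help show why the whole look-up table has to be loaded. The PR text does not show the table itself, so this assumes a 16-entry nibble table; `kNibblePopcount`, `popcount_u8` and `model_vcnt_u8` are hypothetical names, and the code is a plain C stand-in rather than the RVV implementation.

```c
#include <stddef.h>
#include <stdint.h>

/* Population count of every 4-bit value 0..15 (assumed table layout). */
static const uint8_t kNibblePopcount[16] = {0, 1, 1, 2, 1, 2, 2, 3,
                                            1, 2, 2, 3, 2, 3, 3, 4};

/* Counts the set bits of one byte by looking up its low and high nibbles. */
static uint8_t popcount_u8(uint8_t x) {
    return (uint8_t)(kNibblePopcount[x & 0x0F] + kNibblePopcount[x >> 4]);
}

/* Scalar stand-in for vcnt_u8: per-byte popcount over eight lanes. */
static void model_vcnt_u8(const uint8_t in[8], uint8_t out[8]) {
    for (size_t i = 0; i < 8; i++) {
        out[i] = popcount_u8(in[i]);
    }
}
```

Under this assumption, nibble values 8 to 15 index the upper half of the table, so a load that covers only part of it would miscount any byte containing such a nibble.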

howjmay · 2024-02-28T13:25:47Z

neon2rvv.h

@@ -3846,127 +3845,127 @@ FORCE_INLINE uint64x2_t vrsraq_n_u64(uint64x2_t __a, uint64x2_t __b, const int _
 }

 FORCE_INLINE int8x8_t vsri_n_s8(int8x8_t __a, int8x8_t __b, const int __c) {
-  uint8_t mask = UINT8_MAX >> __c;
+  uint8_t mask = (__c == 8) ? 0 : UINT8_MAX >> __c;


How about using

-  uint8_t mask = (__c == 8) ? 0 : UINT8_MAX >> __c;
+  uint8_t mask = (uint64_t) UINT8_MAX >> __c;

I think this way we may avoid branching?

OMaghiarIMG · 2024-02-28

Hmm, except the cast won't work for the 64-bit data types. Wouldn't it be better to keep things uniform across all implementations?

howjmay · 2024-02-28

I'm fine with that part, because the 64-bit cases already contain some exceptions. So if the feedback is good enough, I am willing to change this.
What do you think? If you think this is unnecessary, I will merge this PR directly.

OMaghiarIMG · 2024-02-29

Okay, made the change for the 8/16/32-bit versions.
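
The outcome of this thread can be sketched in scalar C as follows. The helper names (`sri_mask_u8`, `sri_mask_u32`, `sri_mask_u64`, `model_sri_u8`) are hypothetical and only illustrate the idea: narrower lanes widen the operand to 64 bits before shifting, while the 64-bit lane has no wider standard type to widen into and keeps an explicit check.

```c
#include <stdint.h>

/* Mask of the low (8 - n) bits, for n in 1..8.
 * Widening to 64 bits makes a shift by the full lane width yield 0
 * without a branch. */
static uint8_t sri_mask_u8(int n) {
    return (uint8_t)((uint64_t)UINT8_MAX >> n);
}

/* Same idea for 32-bit lanes, n in 1..32. */
static uint32_t sri_mask_u32(int n) {
    return (uint32_t)((uint64_t)UINT32_MAX >> n);
}

/* 64-bit lanes: shifting a 64-bit value by 64 is undefined in C,
 * so the full-width case is handled explicitly. */
static uint64_t sri_mask_u64(int n) {
    return (n == 64) ? 0 : (UINT64_MAX >> n);
}

/* Scalar model of one 8-bit shift-right-and-insert lane:
 * keep the top n bits of a, take the remaining bits from b >> n. */
static uint8_t model_sri_u8(uint8_t a, uint8_t b, int n) {
    uint8_t mask = sri_mask_u8(n);
    return (uint8_t)((a & (uint8_t)~mask) | ((uint8_t)(b >> n) & mask));
}
```

With n equal to the lane width the mask is 0, so every bit of the result comes from `a`, matching the behaviour described in the PR. On platforms where `int` is 32 bits, integer promotion already makes `UINT8_MAX >> 8` well defined, so the widening matters most for the 32-bit lanes, where `UINT32_MAX >> 32` would otherwise be undefined.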

howjmay · 2024-02-28T13:29:00Z

Thank you so much for the contribution, and for helping me resolve my confusion! Apart from the question I raised above, all the other changes are amazing.

OMaghiarIMG added 5 commits February 28, 2024 09:28

Fix vcnt_u8 lookup table load

9082b1e

Fix vsri undefined behaviour

7285a06

Fix vst1_lane store

913fbf7

Fix vdup vmv broadcast

751639d

Fix vdupq declaration and usage

d8bdc4e

OMaghiarIMG mentioned this pull request Feb 28, 2024

Support verifying test implementation on both ARM and x86 #31

Open

howjmay reviewed Feb 28, 2024


Set cast to uint64_t for vsri mask shift

8141666

howjmay merged commit c656929 into howjmay:main Feb 29, 2024
4 checks passed
