Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUGFIX] Fix neon2rvv failing tests #309

Merged
merged 6 commits into from
Feb 29, 2024

Conversation

OMaghiarIMG
Copy link
Contributor

Hello, made the following changes to fix failing tests:

  • vcnt_u8 - correctly load entire look-up table
  • vsri - right shifting an n-bit value by n bits is undefined behaviour, in such cases mask should be set to 0 to copy all bits from input __a
  • vst1_lane - strided store causes errors, vmv_x_s can be used to copy first vector element to scalar register instead
  • vdup - Noticed possible confusion, the use of vmv_s_x is to modify the first element in the vector regiser, vmv_v_x should be used to broadcast copy to all vector elements. Also fixed return-type in forward declaration for vdupq, and corrected usage of vdupq to vdup inside 64-bit vector intrinsics.

Test suite executed with QEMU and Spike.
Using GCC(d6479050ecef10fd5e67b4da989229e4cfac53ee) all tests passing on optimization level 0 to 3.

Using Clang(6008cd40b7bbdc66555550c2e38648d5ce99cc78) all tests passing on optimization level 0 and 1. However on -O2 Clang initially crashed - raised an issue. After fix compilation no longer crashes, but almost half of the tests fail. Further investigation needed.

neon2rvv.h Outdated
@@ -3846,127 +3845,127 @@ FORCE_INLINE uint64x2_t vrsraq_n_u64(uint64x2_t __a, uint64x2_t __b, const int _
}

FORCE_INLINE int8x8_t vsri_n_s8(int8x8_t __a, int8x8_t __b, const int __c) {
uint8_t mask = UINT8_MAX >> __c;
uint8_t mask = (__c == 8) ? 0 : UINT8_MAX >> __c;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about using

+uint8_t mask = (uint64_t) UINT8_MAX >> __c;
-uint8_t mask = (__c == 8) ? 0 : UINT8_MAX >> __c;

I think in this way we may avoid branching?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm except the cast won't work for the 64-bit data types, better to keep things uniform between all implementations?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Umm I feel ok for this part, because there are already some exception happened in 64 bits cases. Therefore if the feedback is good enough I am willing to change this.
How do you think about it? If you think this is unnecessary then I will directly merge this PR

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, made the change for 8/16/32 versions.

@howjmay
Copy link
Owner

howjmay commented Feb 28, 2024

Thank you so much for the contribution, and helping me to resolve my confusion! Except the question I raised above, all the rest of changes are amazing

@howjmay howjmay merged commit c656929 into howjmay:main Feb 29, 2024
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants