
Initial Support for the RISC-V Vector Extension in ARM NEON #1130

Merged
merged 25 commits into from
Mar 14, 2024

Conversation

eric900115
Collaborator

@eric900115 eric900115 commented Feb 14, 2024

Hi everyone,

This is Eric from National Tsing Hua University (NTHU) pllab. This PR contains the initial support in SIMDe for converting NEON intrinsics to the RISC-V Vector Extension (RVV).

NTHU pllab and Andes Technology have collaborated on converting NEON intrinsics to the RISC-V Vector Extension, and we have now converted all NEON intrinsics to RVV intrinsics. This PR marks the beginning of that effort; we will upstream the rest of our work soon.

We made a few changes in the SIMDe repo to suit our needs:

  • We modified types.h to enable RVV types.
  • We modified files related to memcpy due to memory pollution issues in our implementation.
  • We modified files related to memory load and store for RVV implementation.
  • The initial conversion from Neon to RVV can be viewed in add.h and mul.h.
  • For CI, we've added a clang-qemu-rvv job to the ci.yml file for RVV testing. The tests run in a Docker container with Ubuntu 23.10 because we need QEMU 8.0, which is not available in Ubuntu 22.04. Once Ubuntu 24.04 is available on GitHub Actions, the tests can be hosted directly on an Ubuntu 24.04 runner.

We have included clang-qemu-rvv testing for the following RISC-V V Extension architectures, both with and without ZVFH enabled:

  • vlen = 128 & elen = 64
  • vlen = 256 & elen = 64
  • vlen = 512 & elen = 64

To compile SIMDe with support for the conversion from NEON to the RISC-V Vector Extension, please use Clang-17 and include the flag -mrvv-vector-bits=<vector_length_of_vector_machine> during compilation. Replace <vector_length_of_vector_machine> with the actual vector length of the target RISC-V vector machine.
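
For example, a cross-compile for a machine with 128-bit vector registers might look like the following (the target triple and -march string here are illustrative only, not taken from this PR; adjust them for your toolchain and target):

clang-17 --target=riscv64-linux-gnu -march=rv64gcv -O3 -mrvv-vector-bits=128 \
    -I/path/to/simde my_neon_code.c -o my_neon_code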

mr-c and others added 17 commits February 8, 2024 14:46
* gh-actions macos: skip coverage report
fix : ci.yml

feat : modify types.h for risc-v vector extension

feat : modify simde utilities for rvv

fix : type.h copyright

fix : ci files name

feat : modify load & store for risc-v v extension

feat : modify load & store for risc-v vector

fix : ci file

fix : reinterpret (due to rvv mem pollution)

feat : add and mul neon to rvv

fix : copyright

feat : modify ci.yml

feat : remove TODO
Collaborator

@mr-c mr-c left a comment

Very exciting! Has any testing been done on real RISCV64 RVV 1.0 hardware?

Comment on lines 45 to 49
simde_memcpy(&r_, ptr, sizeof(r_));
#if defined(SIMDE_RISCV_V_NATIVE) && SIMDE_ARCH_RISCV_ZVFH
r_.sv64 = __riscv_vle16_v_f16m1((_Float16 *)ptr , 4);
#else
simde_memcpy(&r_, ptr, 8);
Collaborator

Can the sizeof version stay in the non-RISCV_V branches?

Is the sizeof version not working for the native RVV types because of a compiler bug?

Collaborator Author

Yes, sizeof can remain in the non-RVV branches. I will modify them.

Update: I've already reverted to sizeof in the SIMDe implementation.

Collaborator Author

There are no compiler bugs when compiling RVV types. However, the size of the simde_xxx_private unions in types.h may not be what you expect: the size of these unions varies with the vector length of the RVV machine.

To be more specific, simde_uint16x4_private is 64 bits on targets other than RVV. However, on RVV with a VLEN (vector length) of 512, the union may expand to 512 bits, because of how RVV vector types are sized.
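
For illustration, here is a minimal standalone sketch of that effect (the typedef and union names are hypothetical, not the actual SIMDe definitions; it assumes clang-17 invoked with -mrvv-vector-bits=<VLEN>, which defines __riscv_v_fixed_vlen):

#include <riscv_vector.h>
#include <stdint.h>
#include <stdio.h>

/* Fixed-length alias of an LMUL=1 RVV type; its size is VLEN/8 bytes. */
typedef vuint16m1_t fixed_vuint16m1_t
    __attribute__((riscv_rvv_vector_bits(__riscv_v_fixed_vlen)));

/* Simplified stand-in for simde_uint16x4_private: logically only 64 bits
 * of payload, but the RVV member forces the union up to VLEN bits. */
typedef union {
  uint16_t values[4];     /* 8 bytes */
  fixed_vuint16m1_t sv64; /* VLEN/8 bytes, e.g. 64 bytes when VLEN=512 */
} example_uint16x4_private;

int main(void) {
  printf("VLEN = %d bits, sizeof(union) = %zu bytes\n",
         __riscv_v_fixed_vlen, sizeof(example_uint16x4_private));
  return 0;
}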

@eric900115
Collaborator Author

Very exciting! Has any testing been done on real RISCV64 RVV 1.0 hardware?

For testing, we have so far only run the code under QEMU and the Spike simulator, not on real RISC-V RVV 1.0 hardware.

@camel-cdr

camel-cdr commented Feb 16, 2024

Amazing work!
I ran a quick benchmark on the kendryte k230 (thead C908) with this neon mandelbrot code and my handwritten rvv mandelbrot code (slightly adjusted to fit the neon, godbolt link):

rvv LMUL=2:  287907470 cycles
rvv LMUL=1:  419831245 cycles
neon:        536969360 cycles
scalar:     1695304921 cycles

(this was run with 256 iterations and generated a 1440x1080 image)

This is a 3.1x speedup, and close to the hand-optimized rvv LMUL=1 implementation!

rvv LMUL=2 is faster, but we can't really expect the SIMDe NEON conversion to go beyond the 128-bit NEON vector length. A future avx2 implementation might be able to generate such code. See C910 and C908 for a comparison of rvv implementations.

Edit: give me a minute, the numbers should be roughly correct, but I'm revising the neon code slightly. Done.

@mr-c
Collaborator

mr-c commented Feb 16, 2024

Thanks @camel-cdr ! Can you run all the SIMDe tests from this PR on the k230?

@camel-cdr

Thanks @camel-cdr ! Can you run all the SIMDe tests from this PR on the k230?

I'm currently working on that; however, I ran into problems with the glibc version on the k230. I used a freestanding build for the benchmark.

@camel-cdr

I couldn't figure out how to get the glibc versions to align.

Comment on lines +562 to +564
#elif defined(SIMDE_RISCV_V_NATIVE) && defined(__riscv_v_fixed_vlen)
//FIXME : SIMDE_NATURAL_VECTOR_SIZE == __riscv_v_fixed_vlen
#define SIMDE_NATURAL_VECTOR_SIZE (128)
Collaborator

Does this need fixing before merging? If not, then let's make an issue.

@mr-c
Collaborator

mr-c commented Feb 17, 2024

To compile SIMDe with support for the conversion from NEON to the RISC-V Vector Extension, please use Clang-17 and include the flag -mrvv-vector-bits=<vector_length_of_vector_machine> during compilation.

What's the plan for GCC support?

What about portable binaries, when will we not have to specify -mrvv-vector-bits?

@eric900115
Collaborator Author

Can you add some text to the README.md about using SIMDe on RISC-V?

Sure !

@eric900115
Collaborator Author

What about portable binaries, when will we not have to specify -mrvv-vector-bits?

Creating portable binaries for RVV (RISC-V Vector Extension) is not feasible, as explained in the discussion at https://news.ycombinator.com/item?id=37706070. To summarize, with -mrvv-vector-bits the vector size is fixed at compile time, making it impossible to create binaries that can be ported seamlessly between RVV machines with different vector lengths.

@mr-c
Collaborator

mr-c commented Feb 17, 2024

Would a binary for a smaller vector size work on a CPU with a larger vector size?

Maybe a future RISC-V profile will mandate a larger vector size.

I guess for Debian and others that want to maximize the performance of apps using SIMDe, we will have to compile multiple times based on the vector widths that are commercially available. Which is what we already do to support the various x86-64 SIMD intrinsics (https://wiki.debian.org/SIMDEverywhere and https://packages.debian.org/source/testing/subarch-select).

@camel-cdr

camel-cdr commented Feb 17, 2024

@eric900115 For NEON, RVV codegen can be 100% portable. If we require the standard V extension (VLEN>=128 and ELEN=64), then we can use LMUL=1 on all implementations, because a single vector register always contains at least 128 bits. We just need to vsetivli to the fixed element count properly.
I'm not sure if -mrvv-vector-bits=128 is guaranteed to work with VLEN>=128, but it would certainly be possible.
I played around with using fixed-element-count loads/stores to implement something like this back when the fixed-width support didn't exist, but at the time compilers couldn't do the load/store elimination, so it was kind of useless.

#include <riscv_vector.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { uint8_t arr[16]; } V128;

static
V128 vadd8(V128 a, V128 b)
{
    vuint8m1_t A = __riscv_vle8_v_u8m1((void*)&a, 16);
    vuint8m1_t B = __riscv_vle8_v_u8m1((void*)&b, 16);
    vuint8m1_t C = __riscv_vadd_vv_u8m1(A, B, 16);
    V128 c;
    __riscv_vse8_v_u8m1((void*)&c, C, 16);
    return c;
}


V128 test1(V128 a, V128 b, V128 c)
{
    return vadd8(vadd8(a, b), vadd8(c, c));
}

V128 test2(V128 a, V128 b, V128 c)
{
    vuint8m1_t A = __riscv_vle8_v_u8m1((void*)&a,16);
    vuint8m1_t B = __riscv_vle8_v_u8m1((void*)&b,16);
    vuint8m1_t C = __riscv_vle8_v_u8m1((void*)&c,16);
    V128 r;
    __riscv_vse8_v_u8m1((void*)&r, __riscv_vadd_vv_u8m1(__riscv_vadd_vv_u8m1(A, B, 16), __riscv_vadd_vv_u8m1(C, C, 16), 16), 16);
    return r;
}

@mr-c
Collaborator

mr-c commented Feb 17, 2024

I couldn't figure out how to get the glibc versions to align.

Here's my meson setup --cross ... config for using https://packages.debian.org/unstable/clang-18 and running on the official Debian image:

[binaries]
c = 'clang-18'
cpp = 'clang++-18'
ar = 'llvm-ar-18'
strip = 'llvm-strip-18'
objcopy = 'llvm-objcopy-18'
ld = 'riscv64-linux-gnu-ld'

[properties]
c_args   = ['--target=riscv64-linux-gnu', '-isystem=/usr/riscv64-linux-gnu/include', '-Wextra', '-Werror', '-march=rv64imafdcv_zihintpause_zfh_zba_zbb_zbc_zbs_zicsr_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b', '-O3', '-mrvv-vector-bits=128']
cpp_args = ['--target=riscv64-linux-gnu', '-isystem=/usr/riscv64-linux-gnu/include', '-Wextra', '-Werror', '-march=rv64imafdcv_zihintpause_zfh_zba_zbb_zbc_zbs_zicsr_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b', '-O3', '-mrvv-vector-bits=128']
c_link_args = ['--target=riscv64-linux-gnu', '-static', '-static-libgcc']
cpp_link_args = ['--target=riscv64-linux-gnu', '-static', '-static-libgcc', '-static-libstdc++']

[host_machine]
system = 'linux'
cpu_family = 'riscv64'
cpu = 'thead-c906'
endian = 'little'

@camel-cdr

Here's my meson setup --cross ... config for using https://packages.debian.org/unstable/clang-18 and running on the official Debian image:

Thanks, it worked. I didn't know about the Debian image; I was using the k230_sdk thingy.

Running for i in *native*; do ./$i; done in the arm/neon directory results in the following errors:

../test/arm/neon/fma_lane.c:1163: assertion failed: r1[0] ~= simde_vld1q_f32(test_vec[i].r1)[0] (-382857.250000 ~= 503169.843750)
test/arm/neon/fma_lane.cpp:1163: assertion failed: r1[0] ~= simde_vld1q_f32(test_vec[i].r1)[0] (-382857.250000 ~= 503169.843750)
../test/arm/neon/fms_lane.c:873: assertion failed: r1[0] ~= simde_vld1q_f32(test_vec[i].r1)[0] (-506717.906250 ~= 554470.187500)
test/arm/neon/fms_lane.cpp:873: assertion failed: r1[0] ~= simde_vld1q_f32(test_vec[i].r1)[0] (-506717.906250 ~= 554470.187500)
../test/arm/neon/mul_lane.c:865: assertion failed: r[0] ~= simde_vld1q_f32(test_vec[i].r)[0] (132874.984375 ~= 347499.312500)
test/arm/neon/mul_lane.cpp:865: assertion failed: r[0] ~= simde_vld1q_f32(test_vec[i].r)[0] (132874.984375 ~= 347499.312500)
../test/arm/neon/mulx_lane.c:315: assertion failed: r[0] ~= simde_vld1q_f32(test_vec[i].r)[0] (132874.984375 ~= 347499.312500)
test/arm/neon/mulx_lane.cpp:315: assertion failed: r[0] ~= simde_vld1q_f32(test_vec[i].r)[0] (132874.984375 ~= 347499.312500)
../test/arm/neon/qrdmlah.c:193: assertion failed: r[0] == simde_vld1_s16(test_vec[i].r)[0] (17480 == 3378)
test/arm/neon/qrdmlah.cpp:193: assertion failed: r[0] == simde_vld1_s16(test_vec[i].r)[0] (17480 == 3378)
../test/arm/neon/qrdmlah_lane.c:475: assertion failed: r[0] == simde_vld1_s16(test_vec[i].r)[0] (-18972 == -13752)
test/arm/neon/qrdmlah_lane.cpp:475: assertion failed: r[0] == simde_vld1_s16(test_vec[i].r)[0] (-18972 == -13752)
../test/arm/neon/qrdmlsh.c:197: assertion failed: r[0] == simde_vld1_s16(test_vec[i].r)[0] (9556 == -32768)
test/arm/neon/qrdmlsh.cpp:197: assertion failed: r[0] == simde_vld1_s16(test_vec[i].r)[0] (9556 == -32768)
../test/arm/neon/qrdmlsh_lane.c:475: assertion failed: r[2] == simde_vld1_s16(test_vec[i].r)[2] (30372 == -32768)
test/arm/neon/qrdmlsh_lane.cpp:475: assertion failed: r[2] == simde_vld1_s16(test_vec[i].r)[2] (30372 == -32768)
../test/arm/neon/qrdmulh_lane.c:264: assertion failed: r[0] == simde_vld1_s16(test_vec[i].r0)[0] (0 == 14610)
test/arm/neon/qrdmulh_lane.cpp:264: assertion failed: r[0] == simde_vld1_s16(test_vec[i].r0)[0] (0 == 14610)
../test/arm/neon/uqadd.c:339: assertion failed: r[0] == simde_vld1_s16(test_vec[i].r)[0] (181 == 32767)
test/arm/neon/uqadd.cpp:339: assertion failed: r[0] == simde_vld1_s16(test_vec[i].r)[0] (181 == 32767)

@mr-c
Collaborator

mr-c commented Feb 18, 2024

@camel-cdr Thanks!

Yeah, I'm now also seeing those errors.

I wonder why the qemu setup in this PR isn't reproducing them?

I hope it isn't a hardware error! :-)

@mr-c
Collaborator

mr-c commented Feb 18, 2024

Also, hello @eric900115 and @camel-cdr from the Debian Med Sprint in Berlin. Maybe you can join us in person next year? https://wiki.debian.org/Sprints/2023/DebianMed2024

@camel-cdr

Sounds interesting, Berlin is only 2-3 hours away from me. But I'm not really involved with Debian (except for running it).

Btw, do you know how Debian deals with compiler bugs? I just ran into a gcc-13.2.0 codegen bug that causes a valid program to not work: vsetvli a5,a1,e8,m8,ta,ma should be vsetvli a5,a1,e8,m8,tu,ma. It's been fixed on trunk, but is this the kind of thing that would be back-ported?

@mr-c
Collaborator

mr-c commented Feb 18, 2024

Btw, do you know how Debian deals with compiler bugs? I just ran into a gcc-13.2.0 codegen bug that causes a valid program to not work: vsetvli a5,a1,e8,m8,ta,ma should be vsetvli a5,a1,e8,m8,tu,ma. It's been fixed on trunk, but is this the kind of thing that would be back-ported?

I would personally respond positively to a reportbug gcc-13 with a link to the upstream fix, but I don't know that team so I can't make promises.

Sounds interesting, Berlin is only 2-3 hours away from me. But I'm not really involved with Debian (except for running it).

Anyone is welcome! We appreciate the user perspective!

@eric900115
Collaborator Author

@camel-cdr Thanks!

Yeah, I'm now also seeing those errors.

I wonder why the qemu setup in this PR isn't reproducing them?

I hope it isn't a hardware error! :-)

I am also wondering. I'll try testing with QEMU using the same configuration (thead-c906 CPU).

@mr-c
Collaborator

mr-c commented Feb 20, 2024

@camel-cdr Do you also get failures on the k230 with the current master branch of SIMDe?

I'm seeing failures in

  • arm/neon/{q,}abs/{emul,native}/{c,cpp}
  • arm/neon/qrdml{a,s}h_lane/{emul,native}/{c,cpp}
  • arm/neon/st3/{emul,native}/{c,cpp}

So I guess there are some clang and/or CPU bugs...?

@camel-cdr

@mr-c Yes, I get similar errors when testing master:

../test/arm/neon/abs.c:711: assertion failed: r[0] == simde_vld1q_s32(test_vec[i].r)[0] (0 == -2147483648)
test/arm/neon/abs.cpp:711: assertion failed: r[0] == simde_vld1q_s32(test_vec[i].r)[0] (0 == -2147483648)
timeout qabs-native-c
timeout qabs-native-cpp
../test/arm/neon/qrdmlah_lane.c:475: assertion failed: r[0] == simde_vld1_s16(test_vec[i].r)[0] (-18972 == -13752)
../test/arm/neon/qrdmlah_lane.c:678: assertion failed: r[0] == simde_vld1_s16(test_vec[i].r)[0] (26528 == 18592)
../test/arm/neon/qrdmlah_lane.c:901: assertion failed: r[0] == simde_vld1q_s16(test_vec[i].r)[0] (-1308 == 32767)
../test/arm/neon/qrdmlah_lane.c:1128: assertion failed: r[0] == simde_vld1q_s16(test_vec[i].r)[0] (-13600 == 25250)
test/arm/neon/qrdmlah_lane.cpp:475: assertion failed: r[0] == simde_vld1_s16(test_vec[i].r)[0] (-18972 == -13752)
test/arm/neon/qrdmlah_lane.cpp:678: assertion failed: r[0] == simde_vld1_s16(test_vec[i].r)[0] (26528 == 18592)
test/arm/neon/qrdmlah_lane.cpp:901: assertion failed: r[0] == simde_vld1q_s16(test_vec[i].r)[0] (-1308 == 32767)
test/arm/neon/qrdmlah_lane.cpp:1128: assertion failed: r[0] == simde_vld1q_s16(test_vec[i].r)[0] (-13600 == 25250)
../test/arm/neon/qrdmlsh_lane.c:475: assertion failed: r[2] == simde_vld1_s16(test_vec[i].r)[2] (30372 == -32768)
../test/arm/neon/qrdmlsh_lane.c:1128: assertion failed: r[3] == simde_vld1q_s16(test_vec[i].r)[3] (26847 == -32768)
test/arm/neon/qrdmlsh_lane.cpp:475: assertion failed: r[2] == simde_vld1_s16(test_vec[i].r)[2] (30372 == -32768)
test/arm/neon/qrdmlsh_lane.cpp:1128: assertion failed: r[3] == simde_vld1q_s16(test_vec[i].r)[3] (26847 == -32768)

I'm somewhat inclined to believe it's a clang miscompilation, because I ran into a gcc-13.2 miscompilation yesterday. I suppose we need to investigate this somehow.

@mr-c
Collaborator

mr-c commented Feb 22, 2024

Hey @camel-cdr ; in #1141 I fixed some of the NEON abs functions. Maybe you have time to re-run the tests?

@camel-cdr

@mr-c Here we go, looks like the abs errors are gone, great work.

../test/arm/neon/qrdmlah_lane.c:475: assertion failed: r[0] == simde_vld1_s16(test_vec[i].r)[0] (-18972 == -13752)
../test/arm/neon/qrdmlah_lane.c:678: assertion failed: r[0] == simde_vld1_s16(test_vec[i].r)[0] (26528 == 18592)
../test/arm/neon/qrdmlah_lane.c:901: assertion failed: r[0] == simde_vld1q_s16(test_vec[i].r)[0] (-1308 == 32767)
../test/arm/neon/qrdmlah_lane.c:1128: assertion failed: r[0] == simde_vld1q_s16(test_vec[i].r)[0] (-13600 == 25250)
test/arm/neon/qrdmlah_lane.cpp:475: assertion failed: r[0] == simde_vld1_s16(test_vec[i].r)[0] (-18972 == -13752)
test/arm/neon/qrdmlah_lane.cpp:678: assertion failed: r[0] == simde_vld1_s16(test_vec[i].r)[0] (26528 == 18592)
test/arm/neon/qrdmlah_lane.cpp:901: assertion failed: r[0] == simde_vld1q_s16(test_vec[i].r)[0] (-1308 == 32767)
test/arm/neon/qrdmlah_lane.cpp:1128: assertion failed: r[0] == simde_vld1q_s16(test_vec[i].r)[0] (-13600 == 25250)
../test/arm/neon/qrdmlsh_lane.c:475: assertion failed: r[2] == simde_vld1_s16(test_vec[i].r)[2] (30372 == -32768)
../test/arm/neon/qrdmlsh_lane.c:1128: assertion failed: r[3] == simde_vld1q_s16(test_vec[i].r)[3] (26847 == -32768)
test/arm/neon/qrdmlsh_lane.cpp:475: assertion failed: r[2] == simde_vld1_s16(test_vec[i].r)[2] (30372 == -32768)
test/arm/neon/qrdmlsh_lane.cpp:1128: assertion failed: r[3] == simde_vld1q_s16(test_vec[i].r)[3] (26847 == -32768)

@eric900115
Collaborator Author

I have modified mul_lane and mulx_lane. Hopefully the errors in fms_lane, fma_lane, mul_lane, and mulx_lane are now eliminated.

@OMaghiarIMG

Hello @eric900115, this is really good stuff.
I have a question: you mentioned you converted all NEON intrinsics to RVV. Does that exclude bf16 and cryptography instructions, which may not be easily replicated with base V? I think trunk LLVM contains experimental intrinsics for Zvfbfwma and vector crypto.

Is there anything you might need help with?

@mr-c
Collaborator

mr-c commented Mar 8, 2024

Thanks @camel-cdr ; can you retest the latest?

@camel-cdr

@mr-c the errors are still there, but the values are different now:

../test/arm/neon/qrdmlah.c:193: assertion failed: r[0] == simde_vld1_s16(test_vec[i].r)[0] (17480 == 3378)
test/arm/neon/qrdmlah.cpp:193: assertion failed: r[0] == simde_vld1_s16(test_vec[i].r)[0] (17480 == 3378)
../test/arm/neon/qrdmlah_lane.c:475: assertion failed: r[0] == simde_vld1_s16(test_vec[i].r)[0] (-18972 == -13752)
test/arm/neon/qrdmlah_lane.cpp:475: assertion failed: r[0] == simde_vld1_s16(test_vec[i].r)[0] (-18972 == -13752)
../test/arm/neon/qrdmlsh.c:197: assertion failed: r[0] == simde_vld1_s16(test_vec[i].r)[0] (9556 == -32768)
test/arm/neon/qrdmlsh.cpp:197: assertion failed: r[0] == simde_vld1_s16(test_vec[i].r)[0] (9556 == -32768)
../test/arm/neon/qrdmlsh_lane.c:475: assertion failed: r[2] == simde_vld1_s16(test_vec[i].r)[2] (30372 == -32768)
test/arm/neon/qrdmlsh_lane.cpp:475: assertion failed: r[2] == simde_vld1_s16(test_vec[i].r)[2] (30372 == -32768)
../test/arm/neon/qrdmulh_lane.c:264: assertion failed: r[0] == simde_vld1_s16(test_vec[i].r0)[0] (0 == 14610)
test/arm/neon/qrdmulh_lane.cpp:264: assertion failed: r[0] == simde_vld1_s16(test_vec[i].r0)[0] (0 == 14610)
../test/arm/neon/uqadd.c:339: assertion failed: r[0] == simde_vld1_s16(test_vec[i].r)[0] (181 == 32767)
test/arm/neon/uqadd.cpp:339: assertion failed: r[0] == simde_vld1_s16(test_vec[i].r)[0] (181 == 32767)

@mr-c
Collaborator

mr-c commented Mar 11, 2024

@camel-cdr are those errors from the emul or native tests?

@camel-cdr

@mr-c it was the native tests, I ran it via: for i in *native*; do ./$i; done > /dev/null

@eric900115
Collaborator Author

eric900115 commented Mar 13, 2024

@OMaghiarIMG

Hi! Yes, we excluded BF16 and cryptography from the conversion.

For the conversion from NEON to RVV, if using one or more RVV intrinsics performs better (in instruction count) than automatic vectorization, we implement the operation with RVV intrinsics. Otherwise, we rely on SIMDe's auto-vectorized loop.
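
As a rough sketch of that policy (hypothetical helper names, not the actual SIMDe source): an operation like vadd_u8 maps directly to a single RVV intrinsic, while an operation without a profitable RVV mapping keeps the portable element-wise loop and relies on the compiler's auto-vectorization.

#include <riscv_vector.h>
#include <stdint.h>

typedef struct { uint8_t values[8]; } u8x8; /* stand-in for a 64-bit NEON vector */

/* Direct mapping: a single RVV intrinsic beats the auto-vectorized loop. */
u8x8 example_vadd_u8(u8x8 a, u8x8 b) {
  u8x8 r;
  vuint8m1_t va = __riscv_vle8_v_u8m1(a.values, 8);
  vuint8m1_t vb = __riscv_vle8_v_u8m1(b.values, 8);
  __riscv_vse8_v_u8m1(r.values, __riscv_vadd_vv_u8m1(va, vb, 8), 8);
  return r;
}

/* Fallback: keep the portable loop and let the compiler auto-vectorize it. */
u8x8 example_mla_u8(u8x8 a, u8x8 b, u8x8 c) {
  u8x8 r;
  for (int i = 0; i < 8; i++) {
    r.values[i] = (uint8_t)(a.values[i] + b.values[i] * c.values[i]);
  }
  return r;
}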

@mr-c mr-c merged commit b4e805a into simd-everywhere:master Mar 14, 2024
81 of 85 checks passed
@mr-c
Collaborator

mr-c commented Mar 14, 2024

Thank you @eric900115 ! Now that SIMDe 0.8.0 is released we can focus the next development cycle on RVV 1.0 implementations.
