This issue was moved to a discussion.
You can continue the conversation there. Go to discussion →
Auto vectorized vs packed_simd in arrow #255
Comments
Yep, it has been a battle. I actually have not used
OK, so is it possible to change the non-null case branch to the auto-vectorized version?
It is faster; it is simpler => definitely :)
Hi @jorgecarleitao, I have a question about the compatibility of vectorization here. There are many SIMD instruction sets (e.g. SSE, AVX, FMA), and they are coupled to specific microarchitectures (e.g. Intel Skylake, AMD Zen 2). If we only do simple cross-compilation, that is, only specify the target architecture, we may not utilize SIMD well. AFAIK, this issue is usually solved by function multiversioning; in the C++ world there are approaches like GCC's, and I noticed that there is a Rust crate for this. Is it possible to support this in arrow2?
Hey @leiysky, yes, we do use that. See here for an example:
Nice! I only read the code here, and it seemed there was no special handling: https://github.com/jorgecarleitao/arrow2/blob/main/src/compute/arithmetics/basic/add.rs Sorry for my misunderstanding.
I had some doubts before; I thought it might not work on platforms without the relevant features.
To utilize
Multiversioning allows you to define targets (e.g. specific instruction-set extensions) for a function.
Rust can cross-compile to a different target architecture if you like, but this code is only generated when compiling for the specified target. E.g.
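As an illustration of compile-time target selection (my own example; the concrete one in the original comment was lost), SIMD features can be fixed at build time with `rustc` flags:

```shell
# Build for the host CPU, enabling every SIMD feature it supports:
RUSTFLAGS="-C target-cpu=native" cargo build --release

# Cross-compile for a generic x86_64 target with AVX2 enabled explicitly:
RUSTFLAGS="-C target-feature=+avx2" cargo build --release --target x86_64-unknown-linux-gnu
```

The resulting binary then assumes those features unconditionally, which is exactly why runtime multiversioning is needed for portable binaries.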
I found that if the primitive array has no null values, auto-vectorized code can outperform manual SIMD.
Code:
I have not yet used Godbolt to inspect the generated assembly...