I'm seeing non-trivial improvements from this, up to 8% end-to-end for my FFT implementation, so I'd really appreciate this getting reviewed and published as a point release.

Haven't forgotten about this, just haven't gotten to it yet. Will try to do so soon, if no one else gets to it before me.
interleave() matching std::simd API, faster than zip_low/zip_high on AVX2
interleave method
pub(crate) fn handle_deinterleave(
    &self,
    method_sig: TokenStream,
    vec_ty: &VecType,
) -> TokenStream {
    let unzip_low = generic_op_name("unzip_low", vec_ty);
    let unzip_high = generic_op_name("unzip_high", vec_ty);
    quote! {
        #method_sig {
            (self.#unzip_low(a, b), self.#unzip_high(a, b))
        }
    }
}
Why can we not apply the same optimization for deinterleave?
That's an oversight on my part. I was focused on interleaving, since that's the hot path in my code, and forgot to optimize deinterleave. I'll do so shortly.
It's not a big deal, I was just wondering.
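For readers following this thread, here is a minimal scalar sketch of the semantics being discussed. It assumes `unzip_low` and `unzip_high` pick the even-indexed and odd-indexed elements of `a` followed by `b` (which is what `std::simd`'s `deinterleave` documents), and it uses a 4-lane `i32` vector purely for illustration. The names `V`, `interleave_ref`, and `deinterleave_ref` are hypothetical and not part of fearless_simd.

```rust
// Hypothetical 4-lane vector type, used only for this illustration.
type V = [i32; 4];

// Assumed semantics: even-indexed elements of `a`, then of `b`.
fn unzip_low(a: V, b: V) -> V {
    [a[0], a[2], b[0], b[2]]
}

// Assumed semantics: odd-indexed elements of `a`, then of `b`.
fn unzip_high(a: V, b: V) -> V {
    [a[1], a[3], b[1], b[3]]
}

// Mirrors the generated fallback in the diff: a pair of unzip results.
fn deinterleave_ref(a: V, b: V) -> (V, V) {
    (unzip_low(a, b), unzip_high(a, b))
}

// Scalar model of interleave: alternate elements of `a` and `b`,
// returning the first and second half of the interleaved sequence.
fn interleave_ref(a: V, b: V) -> (V, V) {
    ([a[0], b[0], a[1], b[1]], [a[2], b[2], a[3], b[3]])
}
```

Under these assumptions `deinterleave_ref` undoes `interleave_ref`, which is why it is natural to ask for the same combined treatment in both directions.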
LaurenzV left a comment
Seems fine to me. Since no one else has commented, I presume no one has any objections. Thanks!
Adds a new method with API matching std::simd's `interleave` method.

The primary motivation is performance: on AVX2, `zip_low` followed by `zip_high` requires 6 instructions, while a combined `interleave` function only needs 4 instructions. (With AVX-512 we'd be able to do it in 2 instructions on 256-bit vectors, but that's not supported by fearless_simd yet and #201 seems stalled.)

This also improves API compatibility with `std::simd` as a nice bonus.

AI use disclosure: this work was assisted by Claude Opus 4.5 for the initial commit and 4.6 for the rest. I have manually reviewed the code and take full responsibility for it.
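To illustrate the instruction-count argument, here is one way a combined interleave could be written with four AVX2 shuffle instructions (not counting loads and stores). This is a sketch under assumptions, not the code from this PR: the `f32` x 8 element type, the function names, and the array-based signature are all chosen for illustration, and a scalar fallback is included so the example also runs on machines without AVX2.

```rust
// Scalar reference: alternate elements of `a` and `b`, then split in half.
fn interleave_scalar(a: [f32; 8], b: [f32; 8]) -> ([f32; 8], [f32; 8]) {
    let mut out = [0.0f32; 16];
    for i in 0..8 {
        out[2 * i] = a[i];
        out[2 * i + 1] = b[i];
    }
    let mut first = [0.0f32; 8];
    let mut second = [0.0f32; 8];
    first.copy_from_slice(&out[..8]);
    second.copy_from_slice(&out[8..]);
    (first, second)
}

// Hypothetical AVX2 version: two per-lane unpacks plus two lane permutes.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn interleave_avx2(a: [f32; 8], b: [f32; 8]) -> ([f32; 8], [f32; 8]) {
    use std::arch::x86_64::*;
    unsafe {
        let va = _mm256_loadu_ps(a.as_ptr());
        let vb = _mm256_loadu_ps(b.as_ptr());
        // Unpacks work within each 128-bit lane:
        let lo = _mm256_unpacklo_ps(va, vb); // a0 b0 a1 b1 | a4 b4 a5 b5
        let hi = _mm256_unpackhi_ps(va, vb); // a2 b2 a3 b3 | a6 b6 a7 b7
        // Recombine lanes: low halves form the first result, high halves the second.
        let first = _mm256_permute2f128_ps::<0x20>(lo, hi);
        let second = _mm256_permute2f128_ps::<0x31>(lo, hi);
        let mut r0 = [0.0f32; 8];
        let mut r1 = [0.0f32; 8];
        _mm256_storeu_ps(r0.as_mut_ptr(), first);
        _mm256_storeu_ps(r1.as_mut_ptr(), second);
        (r0, r1)
    }
}

// Dispatch at runtime; falls back to the scalar version when AVX2 is absent.
fn interleave_f32x8(a: [f32; 8], b: [f32; 8]) -> ([f32; 8], [f32; 8]) {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            // Safety: AVX2 support was checked on the line above.
            return unsafe { interleave_avx2(a, b) };
        }
    }
    interleave_scalar(a, b)
}
```

In this sketch the two unpack results are shared between both outputs. Separate `zip_low` and `zip_high` calls would each need their own unpacks and permute unless the compiler merges them, which is consistent with the 6 versus 4 count above.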