Description
Proposal
I propose adding `fma_mul` and `fma_div` methods to the `Complex` type. These methods would use fused multiply-add (FMA) operations to compute complex multiplication and division.
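A minimal sketch of what the two methods might look like, assuming the textbook formulas for complex multiplication and division. The `Complex` struct here is a hypothetical stand-in fixed to `f64`, not the crate's generic type, and the method bodies are only one possible way to place the FMA calls:

```rust
/// Hypothetical stand-in for the crate's complex type, fixed to f64 for brevity.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Complex {
    re: f64,
    im: f64,
}

impl Complex {
    /// (a + bi)(c + di) = (ac - bd) + (ad + bc)i, using one FMA per component.
    fn fma_mul(self, other: Complex) -> Complex {
        Complex {
            re: self.re.mul_add(other.re, -(self.im * other.im)),
            im: self.re.mul_add(other.im, self.im * other.re),
        }
    }

    /// (a + bi) / (c + di) = ((ac + bd) + (bc - ad)i) / (c^2 + d^2).
    fn fma_div(self, other: Complex) -> Complex {
        let norm_sqr = other.re.mul_add(other.re, other.im * other.im);
        let re = self.re.mul_add(other.re, self.im * other.im);
        let im = self.im.mul_add(other.re, -(self.re * other.im));
        Complex {
            re: re / norm_sqr,
            im: im / norm_sqr,
        }
    }
}

fn main() {
    let x = Complex { re: 1.0, im: 2.0 };
    let y = Complex { re: 3.0, im: 4.0 };
    let product = x.fma_mul(y);
    // Multiplying and then dividing by the same value recovers x for these inputs.
    println!("{:?} {:?}", product, product.fma_div(y));
}
```

`f64::mul_add` is the standard library's fused multiply-add; whether it compiles to a single instruction depends on the target and enabled features.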
Motivation
Using FMA can offer significant performance benefits on hardware with native support, but it comes with important trade-offs:
- Performance Variance: On targets with native FMA instructions (e.g., AArch64), these methods can be faster. However, without native hardware support, the compiler may fall back to a slow software library call (`fmaf`).
- Numerical Differences: FMA computes `a * b + c` with a single rounding operation. This means the results from an FMA-based method are not guaranteed to be bit-for-bit identical to the standard methods.
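The single-rounding point can be seen with plain `f64` values. The following is a small illustration, assuming IEEE 754 binary64 arithmetic with round-to-nearest; the helper names `separate` and `fused` are made up for this example:

```rust
/// Multiply, round, then add, round again (two rounding steps).
fn separate(a: f64, b: f64, c: f64) -> f64 {
    a * b + c
}

/// Fused multiply-add: the exact product a * b is added to c, then rounded once.
fn fused(a: f64, b: f64, c: f64) -> f64 {
    a.mul_add(b, c)
}

fn main() {
    let eps = 1.0 / (1u64 << 27) as f64; // 2^-27, exactly representable
    let a = 1.0 + eps;
    let b = 1.0 - eps;
    let c = -1.0;

    // The exact product is 1 - 2^-54, which does not fit in an f64 and rounds to 1.0,
    // so the separate version loses the small term before the addition.
    println!("separate: {:e}", separate(a, b, c));
    // The fused version keeps the exact product until the final rounding.
    println!("fused:    {:e}", fused(a, b, c));
}
```

Here the two results differ (zero versus a tiny negative value), which is the kind of discrepancy that makes FMA-based results not bit-identical to the standard methods.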
Implementation
This Compiler Explorer link clearly illustrates the performance dichotomy between architectures and compiler settings: https://godbolt.org/z/joW4eqvT9
If this approach is ok, I would be happy to implement it.