FFT vmulComplex optimization using multiply-add/subtract #48

The vmulComplex functions in XDSP.h could be further optimized like this:

// (r1, i1) * (r2, i2) = (r1r2 - i1i2, r1i2 + r2i1)
XMVECTOR vr1r2 = XMVectorMultiply(r1, r2);
XMVECTOR vr1i2 = XMVectorMultiply(r1, i2);
rResult = XMVectorNegativeMultiplySubtract(i1, i2, vr1r2); // real: (r1*r2 - i1*i2)
iResult = XMVectorMultiplyAdd(r2, i1, vr1i2); // imaginary: (r1*i2 + r2*i1)

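On SSE2 it makes no difference, because XMVectorMultiplyAdd and XMVectorNegativeMultiplySubtract expand to a separate multiply followed by an add or subtract there. When compiling for ARM it does, because the NEON code path can use multiply-accumulate and multiply-subtract instructions instead.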
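
For anyone who wants to check the arithmetic without DirectXMath, here is a minimal scalar sketch of one lane of the proposed version. The names `ComplexF` and `mulComplexFma` are hypothetical and not part of XDSP.h, and `std::fma` is used only to mirror the multiply-add shape of the two vector calls. Its rounding may differ from what the vector intrinsics produce on a given target.

```cpp
#include <cmath>

// Hypothetical scalar stand-in for one lane of vmulComplex; not part of XDSP.h.
struct ComplexF
{
    float re;
    float im;
};

inline ComplexF mulComplexFma(float r1, float i1, float r2, float i2)
{
    // (r1, i1) * (r2, i2) = (r1r2 - i1i2, r1i2 + r2i1)
    const float r1r2 = r1 * r2;
    const float r1i2 = r1 * i2;

    ComplexF out;
    // r1r2 - i1*i2, same shape as XMVectorNegativeMultiplySubtract(i1, i2, vr1r2)
    out.re = std::fma(-i1, i2, r1r2);
    // r2*i1 + r1i2, same shape as XMVectorMultiplyAdd(r2, i1, vr1i2)
    out.im = std::fma(r2, i1, r1i2);
    return out;
}
```

Each output component costs one plain multiply plus one multiply-add or multiply-subtract, which is the saving the proposal is after on targets that have such instructions.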
