FFT vmulComplex optimization using multiply-add/subtract #48

@ianier

The vmulComplex functions in XDSP.h could be further optimized like this:

// (r1, i1) * (r2, i2) = (r1r2 - i1i2, r1i2 + r2i1)
XMVECTOR vr1r2 = XMVectorMultiply(r1, r2);
XMVECTOR vr1i2 = XMVectorMultiply(r1, i2);
rResult = XMVectorNegativeMultiplySubtract(i1, i2, vr1r2); // real: (r1r2 - i1i2)
iResult = XMVectorMultiplyAdd(r2, i1, vr1i2); // imaginary: (r1i2 + r2i1)

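On SSE2 it makes no difference, since the multiply-add and multiply-subtract helpers still expand to a separate multiply followed by an add or subtract. When compiling for ARM it does, because there they can map to NEON multiply-accumulate instructions, so each result takes one multiply plus one fused operation.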
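
To check the arithmetic of the proposed form without DirectXMath, here is a minimal scalar sketch in plain C++. The names `Vec4`, `mul`, `mulAdd`, `negMulSub` and `vmulComplexModel` are hypothetical stand-ins introduced for illustration only; they model the lane-wise behaviour assumed here for `XMVectorMultiply`, `XMVectorMultiplyAdd` (a*b + c) and `XMVectorNegativeMultiplySubtract` (c - a*b), and are not the XDSP.h implementation.

```cpp
#include <array>
#include <cmath>
#include <cstddef>

// Hypothetical 4-lane stand-in for XMVECTOR.
using Vec4 = std::array<float, 4>;

// Lane-wise a * b (models XMVectorMultiply).
inline Vec4 mul(const Vec4& a, const Vec4& b) {
    Vec4 out{};
    for (std::size_t k = 0; k < out.size(); ++k) out[k] = a[k] * b[k];
    return out;
}

// Lane-wise a * b + c (models XMVectorMultiplyAdd).
inline Vec4 mulAdd(const Vec4& a, const Vec4& b, const Vec4& c) {
    Vec4 out{};
    for (std::size_t k = 0; k < out.size(); ++k) out[k] = std::fma(a[k], b[k], c[k]);
    return out;
}

// Lane-wise c - a * b (models XMVectorNegativeMultiplySubtract).
inline Vec4 negMulSub(const Vec4& a, const Vec4& b, const Vec4& c) {
    Vec4 out{};
    for (std::size_t k = 0; k < out.size(); ++k) out[k] = std::fma(-a[k], b[k], c[k]);
    return out;
}

// (r1, i1) * (r2, i2) = (r1r2 - i1i2, r1i2 + r2i1), four complex values at a time.
inline void vmulComplexModel(Vec4& rResult, Vec4& iResult,
                             const Vec4& r1, const Vec4& i1,
                             const Vec4& r2, const Vec4& i2) {
    const Vec4 vr1r2 = mul(r1, r2);
    const Vec4 vr1i2 = mul(r1, i2);
    rResult = negMulSub(i1, i2, vr1r2);  // real: r1r2 - i1i2
    iResult = mulAdd(r2, i1, vr1i2);     // imaginary: r1i2 + r2i1
}
```

Each output needs one plain multiply and one fused step, which is where a target with multiply-accumulate instructions can save work compared with two multiplies followed by a separate add or subtract.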
