volk_32fc_s32f_atan2_32f: Add NaN tests for avx2 and avx2_fma code #731
Signed-off-by: Kenji Rikitake <kenji.rikitake@acm.org>
volk_profile benchmark results:
With the old v3.1.0 code:
With the code proposed in this pull request:
Processing time increase rate: 204/175 ≈ 1.166
According to the Wikipedia article on atan2, one would expect a NaN for atan2(0, 0). However, before the new AVX2 and AVX2 FMA kernels, we basically always used the atan2 function from the C standard library, where atan2f(0.0, 0.0) == 0.0. This fix therefore makes sense: it aligns the outputs.
Normally VOLK strives for maximum performance, but I agree that we should avoid returning NaN, because that can be catastrophic in DSP contexts. (For instance, if an IIR filter receives a single NaN sample, then all future outputs become NaN.)
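To make the IIR point concrete, here is a minimal sketch using a hypothetical one-pole low-pass filter (not VOLK code). Because the filter state feeds back into every later output, one NaN input sample poisons the state permanently.

```c
#include <math.h>

/* Hypothetical one-pole IIR low-pass filter:
 *   y[n] = a * y[n-1] + (1 - a) * x[n]
 * The state variable feeds back into every later output. */
static void one_pole_iir(const float *x, float *y, int n, float a)
{
    float state = 0.0f;
    for (int i = 0; i < n; ++i) {
        /* Once x[i] is NaN, state becomes NaN, and a * NaN stays NaN. */
        state = a * state + (1.0f - a) * x[i];
        y[i] = state;
    }
}
```

Feeding it {1, 1, 1, NAN, 1, 1, 1, 1} with a = 0.9f gives finite values for y[0..2] and NaN for every output from y[3] onward, even though all later inputs are valid.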
I must admit I especially dislike the fallback to the generic implementation at the end of the loops here. This kernel is used for phase estimation, and minor implementation details (like whether to jump to -pi or +pi for zero x, positive or negative y) make it hard to debug and to use. For the "leftover" iteration, could we instead fill an MM register with the inputs that are available, run the regular SIMD code on it, and then, instead of storing the whole result, cherry-pick only those elements that are actually based on input?
It might be worth checking what SLEEF does here.
LGTM. This PR fixes a situation where a NaN may corrupt a system. We should really avoid that.
The other concerns, e.g. how to handle the tail case, are valid but out of scope.
@jdemel Thanks for merging!
volk_32fc_s32f_atan2_32f: Add NaN tests for avx2 and avx2_fma code
This adds a NaN test after the division operation in the avx2 and avx2_fma code of volk_32fc_s32f_atan2_32f.
This patch solves the issue in #730.
Using volk_profile, the processing time increases by about 17% in my environment (Intel NUC NUC10i7FNH with an Intel(R) Core(TM) i7-10710U CPU). I believe this is a necessary cost to maintain full compatibility with past implementations (i.e., VOLK v3.0.0 and earlier).