New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
FFT based M2L #73
FFT based M2L #73
Conversation
@inducer, this is ready. Note that I have disabled FFT tests for H2DLocal and Y2DLocal because they are inaccurate. The scaling here is not working properly. I'll try to fix that in a follow-up PR if that's okay. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🎉 This is getting close!
dtype and type are different things in numpy and even though they compare equal, they are hashed differently.
This looks good. Thanks for being patient with me and seeing this through to completion! |
Replaces #36
For FFT based M2L, we first need to take the FFT of the source coeffs. This is given by
sumpy.expansion.local.LocalExpansionBase.m2l_preprocess_exprs
and doneseparately in a new loopy kernel produced by
sumpy.e2e.E2EFromCSRWithFFTPreprocess
.This happens in a separate loopy kernel because the FFT of the inputs needs to happen
only once for each source box. Since the M2L is done for each target box, if we didn't do the
FFT separately, source boxes will be processed using FFT multiple times.
After that the derivatives which are precomputed for translation classes need to be
FFTed as well. This is done in the same loopy kernel that computes the derivatives.
M2L is then just the elementwise multiplication of the FFT of the source coeffs and
the FFT of the derivatives. After adding these for each target box, we do an inverse FFT
for each target box. This needs to happen outside the inner loop of iterating the source
boxes in M2L to avoid redundant calculations. The expressions involved are given by
sumpy.expansion.local.LocalExpansionBase.m2l_postprocess_exprs
.One other major change is that the loopy instructions depend on whether the input to
M2L is real or not. If it's real, FFT and then inverse FFT gives us complex expressions
with imaginary part close to zero. We need to truncate the imaginary part if the input was
real, but keep the imaginary part as it is if the input was complex. Therefore
E2EFromCSE.get_kernel
gained an additional parameterresult_dtype
.