Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FFT based M2L #73

Merged
merged 130 commits into from Aug 16, 2021
Merged

FFT based M2L #73

merged 130 commits into from Aug 16, 2021

Conversation

isuruf
Copy link
Collaborator

@isuruf isuruf commented Jul 2, 2021

Replaces #36

For FFT based M2L, we first need to take the FFT of the source coeffs. This is given by
sumpy.expansion.local.LocalExpansionBase.m2l_preprocess_exprs and done
separately in a new loopy kernel produced by sumpy.e2e.E2EFromCSRWithFFTPreprocess.
This happens in a separate loopy kernel because the FFT of the inputs needs to happen
only once for each source box. Since the M2L is done for each target box, if we didn't do the
FFT separately, source boxes will be processed using FFT multiple times.

After that the derivatives which are precomputed for translation classes need to be
FFTed as well. This is done in the same loopy kernel that computes the derivatives.

M2L is then just the elementwise multiplication of the FFT of the source coeffs and
the FFT of the derivatives. After adding these for each target box, we do an inverse FFT
for each target box. This needs to happen outside the inner loop of iterating the source
boxes in M2L to avoid redundant calculations. The expressions involved are given by
sumpy.expansion.local.LocalExpansionBase.m2l_postprocess_exprs.

One other major change is that the loopy instructions depend on whether the input to
M2L is real or not. If it's real, FFT and then inverse FFT gives us complex expressions
with imaginary part close to zero. We need to truncate the imaginary part if the input was
real, but keep the imaginary part as it is if the input was complex. Therefore
E2EFromCSE.get_kernel gained an additional parameter result_dtype.

@isuruf
Copy link
Collaborator Author

isuruf commented Jul 9, 2021

@inducer, this is ready.

Note that I have disabled FFT tests for H2DLocal and Y2DLocal because they are inaccurate. The scaling here is not working properly. I'll try to fix that in a follow-up PR if that's okay.

Copy link
Owner

@inducer inducer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎉 This is getting close!

sumpy/e2e.py Outdated Show resolved Hide resolved
sumpy/e2e.py Outdated Show resolved Hide resolved
sumpy/e2e.py Outdated Show resolved Hide resolved
sumpy/e2e.py Outdated Show resolved Hide resolved
sumpy/expansion/local.py Outdated Show resolved Hide resolved
sumpy/expansion/local.py Outdated Show resolved Hide resolved
sumpy/expansion/local.py Outdated Show resolved Hide resolved
sumpy/expansion/local.py Outdated Show resolved Hide resolved
sumpy/fmm.py Show resolved Hide resolved
sumpy/tools.py Show resolved Hide resolved
@isuruf isuruf requested a review from inducer August 14, 2021 04:37
@inducer
Copy link
Owner

inducer commented Aug 16, 2021

This looks good. Thanks for being patient with me and seeing this through to completion!

@inducer inducer merged commit a9a6806 into inducer:main Aug 16, 2021
@isuruf isuruf deleted the fft branch August 17, 2021 04:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants