Performance improvements for estimate_shift_2D() and other FFTs #2358
Conversation
Calculate optimal FFT size better
I think that if the input data is real, we should use …
What are the numpy and scipy versions on your machine? Do you have numpy linked against the MKL libraries?
@ericpre I do - as I write this, I think the difference is due to scipy being "smarter" with the input than numpy, and that's all. I'll confirm tomorrow. Either way, calculating the optimal size as I have here is the more obvious improvement according to my profiling. Will see what I can do tomorrow. Not sure why CI is failing, btw.

Looking at https://github.com/scipy/scipy/blob/v1.4.1/scipy/signal/signaltools.py#L377, this is almost certainly the reason for the discrepancy - scipy is smarter about rfftn vs fftn for real data, while we rely only on fftn. I'll test making this change tomorrow to see how it works.
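A minimal sketch of the kind of dispatch described here (not HyperSpy's actual code), assuming `a` and `b` are arrays of the same shape:

```python
import numpy as np

def fft_correlate(a, b):
    # Circular cross-correlation via FFT. For real inputs, rfftn
    # computes only the non-redundant half of the spectrum, roughly
    # halving the cost of each transform - essentially what
    # scipy.signal.fftconvolve does internally for real data.
    if np.isrealobj(a) and np.isrealobj(b):
        fa = np.fft.rfftn(a)
        fb = np.fft.rfftn(b)
        return np.fft.irfftn(fa * fb.conj(), s=a.shape)
    # Complex input: fall back to the full complex-to-complex transforms.
    return np.fft.ifftn(np.fft.fftn(a) * np.fft.fftn(b).conj())
```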
Yes, this sounds sensible: this recent benchmark (https://github.com/project-gemmi/benchmarking-fft/) shows the same thing - there is a consistent factor of 2 between real-to-complex and complex-to-complex.
Awesome thanks! I'll get it tidied up if it works then :-)
OK, I've tested this now, so I've added a check for this to use the faster version when we can:

```python
>>> %timeit s.estimate_shift2D(show_progressbar=False)
# Before
4.80 s ± 127 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
# After (using fftn())
1.92 s ± 40.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
# After (using rfftn())
1.44 s ± 40.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
```

Here is an example test error for sub-pixel alignment if we were to use the `rfftn()` version:

```
______________ TestSubPixelAlign.test_estimate_subpix[True-stat] _______________

self = <hyperspy.tests.signal.test_2D_tools.TestSubPixelAlign object at 0x7f3be0035a10>
normalize_corr = True, reference = 'stat'

    @pytest.mark.parametrize(("normalize_corr", "reference"),
                             _generate_parameters())
    def test_estimate_subpix(self, normalize_corr, reference):
        s = self.signal
        shifts = s.estimate_shift2D(sub_pixel_factor=200,
                                    normalize_corr=normalize_corr)
        np.testing.assert_allclose(shifts, self.shifts, rtol=0.2, atol=0.2,
>                                  verbose=True)
E       AssertionError:
E       Not equal to tolerance rtol=0.2, atol=0.2
E
E       Mismatched elements: 2 / 20 (10%)
E       Max absolute difference: 0.875
E       Max relative difference: 0.41079812
E       x: array([[-0.   , -0.   ],
E              [ 4.07 ,  1.255],
E              [ 1.93 ,  3.255],...
E       y: array([[ 0.  ,  0.  ],
E              [ 4.3 ,  2.13],
E              [ 1.65,  3.58],...

tests/signal/test_2D_tools.py:76: AssertionError
```

To be fair, the tests are created by applying shifts using … - @dnjohnstone, any view on this behaviour and which is "better"?
Indeed, this is a fairly biased ground truth...
I went through and looked at other places where we used the conservative power-of-two for FFT sizes, and replaced them with a wrapper function to … (see the sketch below).

CI failures are unrelated - it's only broken on MacOS.
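A hypothetical sketch of such a wrapper - the name `optimal_fft_size` is illustrative, not necessarily what the PR uses - padding to the next "fast" composite length via `scipy.fft.next_fast_len` rather than rounding up to the next power of two:

```python
from scipy.fft import next_fast_len

def optimal_fft_size(target, real=False):
    """Smallest 'fast' FFT length >= target.

    next_fast_len returns a composite of small primes that is usually
    much closer to target than the next power of two, which can nearly
    double the amount of data transformed.
    """
    return next_fast_len(int(target), real)
```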
Would it make sense to also use rfft (when suitable) here as well? (Line 3412 in 4be8885)

On a similar topic, one thing I found not great is that the …
@ericpre I think you're probably right, but I'm not too sure about the potential impact it could have more generally - at least inside …
Yes, good point, maybe add an option to choose one of the two?
@ericpre I added an option to return either just the real part of the IFFT (the default behaviour) or the full complex result.
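A hypothetical sketch of the shape of that option (the names here are illustrative, not the PR's actual API):

```python
import numpy as np

def inverse_fft(spectrum, real_only=True):
    out = np.fft.ifftn(spectrum)
    # For real-valued input data the imaginary part of the inverse
    # transform is just numerical noise, so dropping it is a sensible
    # default; callers who need the full complex result can opt out.
    return out.real if real_only else out
```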
Great thanks, this looks good to me. Is there any reason to revert the commit adding the option to use the …?

@ericpre …
To summarise: I don't really know why my changes have affected the memory usage in Python 3.8, but here are the steps I've taken: …
Point (1) seems to have helped it get past the first sticking point (1.0.4485), but I'll let AppVeyor catch up and test the other changes I've made to see if it gets all the way through. My worry is that there is actually a much deeper issue with parallel pools, since the correct way to use them is something like this, and we don't do that:
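(A minimal sketch of that pattern - just the standard-library `multiprocessing.Pool` used as a context manager, nothing samfire-specific:)

```python
from multiprocessing import Pool

def work(x):
    return x * x

if __name__ == "__main__":
    # Pool.__exit__ calls terminate(), so the worker processes are
    # reliably stopped when the with-block exits.
    with Pool(processes=4) as pool:
        results = pool.map(work, range(10))
    print(results)
```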
so that the context manager can properly collect and clean up the workers. An interesting Python 3.8 issue where this is discussed is here (39360).
It is fluctuating a lot, and the one I posted here was possibly the first I ran (which was quite surprising) - it is one of the worst in terms of memory usage.

Yeah, I'm still looking at it - might have a fix.
samfire tests work with float16 dtype also
@ericpre why does …

FYI it looks like switching the dtype to float16 and improving how we close the …

This is now ready for review. FWIW I believe the AppVeyor issues with samfire have nothing to do with the original changes in the PR to the FFTs. I actually think there is something inherently wrong with …
This looks very good to me! See comment about putting back a previous change.

It may be worth opening an issue about your observation on samfire to keep a record.

As it is currently, BaseSignal will import almost everything, so this doesn't sound unexpected. However, whether this is normal behaviour is a different question...
Thanks @ericpre - I've made the change you requested, so I think this is good. I'll open separate issue(s) about samfire and the BaseSignal import.
Great thanks! To be on the safe side, I will merge once AppVeyor has gone through the backlog... but at least now it doesn't hang for 1 h on a single build.
Sounds sensible, I'm happy with that!
Indeed, the last three commits that have run have worked fine.

Passed on all of the last 5 commits! Success :)
Description of the change
As part of this pyxem issue, I noticed that `s.align2D()` spends a long time estimating the shifts between images. I've tracked this down to two bottlenecks:

1. `scipy.signal.medfilt`, which I've swapped for `scipy.ndimage.median_filter` (see the sketch after this list). The scipy documentation reference for the old function itself says that "The more general function scipy.ndimage.median_filter has a more efficient implementation of a median filter and therefore runs much faster.", so this seems like a sensible change.
2. `scipy.signal.fftconvolve`. On a related note, I see that in #2295, the code was switched from the scipy FFT to the numpy FFT. On my machine at least, the numpy FFT correlation implementation is 2x slower than scipy's `fftconvolve` for the examples below. I know why this is now; it is fixed below by using `rfft()` where possible.
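For illustration, here is a minimal sketch of the first swap (not the PR's actual diff); the two calls agree away from the image borders, where the padding conventions differ:

```python
import numpy as np
from scipy.signal import medfilt
from scipy.ndimage import median_filter

image = np.random.random((512, 512))

old = medfilt(image, kernel_size=3)   # slower general-purpose call
new = median_filter(image, size=3)    # faster ndimage implementation

# Interior pixels match; only the edge handling differs
# (medfilt zero-pads, median_filter reflects by default).
np.testing.assert_allclose(old[1:-1, 1:-1], new[1:-1, 1:-1])
```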
Progress of the PR
Minimal example of the bug fix or the new feature
If we combine the two changes together: …
@dnjohnstone I think this solves most of the issue discussed in pyxem for `center_direct_beam()`, aside from the threading already mentioned.