Implement SEC-C style internal chunking of frequency domain correlations #285
Conversation
There is an indexing error with this now that needs to be resolved. This is readily apparent in the
So it looks like chunking is good: faster, more efficient at loading CPUs, and less memory intensive. It also passes tests and gives the same results. The gist here shows some of the profiling I ran. Across a range of other dataset sizes I found that an fft-length of 2**13 was always fastest on my machine. I am tempted to auto-set the fft-len to this.
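The kind of comparison behind that finding can be sketched with a toy micro-benchmark (illustrative only; this is not the profiling code from the gist, and `many_short` is a hypothetical helper name):

```python
# Illustrative micro-benchmark: time one long FFT against many chunked
# 2**13-point FFTs over the same total number of samples.
import timeit

import numpy as np

n_total = 2 ** 20
data = np.random.randn(n_total)

def many_short(fft_len):
    # FFT the data in consecutive chunks of fft_len samples.
    for start in range(0, n_total, fft_len):
        np.fft.rfft(data[start:start + fft_len])

t_long = timeit.timeit(lambda: np.fft.rfft(data), number=3)
t_short = timeit.timeit(lambda: many_short(2 ** 13), number=3)
print(f"one {n_total}-point FFT: {t_long:.4f} s; "
      f"chunked 2**13-point FFTs: {t_short:.4f} s")
```

Which side wins will vary with machine, FFT library and data size, which is why profiling across dataset sizes (as above) matters.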
I tested this on a cluster running Python 3.6 on Ubuntu 16.04 with gcc 5.4.0 and ran into libgomp errors for
What does this PR do?
This PR is inspired by the SEC-C paper. It implements internal chunking of the FFTs of the data. Initial testing shows this to be faster than running really large FFTs. It should also be much more memory efficient because the size of the FFTs can be reduced. At the moment this is controlled with an `fft_len` kwarg on the fftw correlation functions, which can be passed through from the matched-filter functions.
Why was it initiated? Any relevant Issues?
Senobari et al. (2018) make the point well that EQcorrscan can be very costly in memory. Currently the way around this is to use either shorter processing lengths, or to group templates; neither option is very efficient. They present a simple alternative whereby many shorter FFTs can be computed for the correlations. This does not affect accuracy (as using an incorrect/different processing length between template and data would), and allows the template FFTs to be cached while looping through chunks of continuous data. A side effect of this is that longer streams of data could be worked on efficiently.
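The chunked frequency-domain correlation described above can be sketched in NumPy (an illustrative sketch, not EQcorrscan's fftw/C implementation; the function names are hypothetical). The template spectrum is computed once and reused for every chunk of continuous data:

```python
import numpy as np

def xcorr_full(template, data):
    # Reference approach: one large FFT covering the whole correlation.
    n = len(data) + len(template) - 1
    nfft = 1 << (n - 1).bit_length()
    spec = np.fft.rfft(data, nfft) * np.conj(np.fft.rfft(template, nfft))
    return np.fft.irfft(spec, nfft)[:len(data) - len(template) + 1]

def xcorr_chunked(template, data, fft_len=2 ** 13):
    # SEC-C style: many short FFTs. The template spectrum is cached and the
    # data are processed in overlapping chunks of fft_len samples.
    assert fft_len >= 2 * len(template), "chunk must comfortably fit template"
    step = fft_len - len(template) + 1  # valid correlation lags per chunk
    t_fft = np.conj(np.fft.rfft(template, fft_len))  # computed once, reused
    out = np.empty(len(data) - len(template) + 1)
    for start in range(0, len(out), step):
        chunk = data[start:start + fft_len]  # rfft zero-pads a short tail
        cc = np.fft.irfft(np.fft.rfft(chunk, fft_len) * t_fft, fft_len)
        n_valid = min(step, len(out) - start)
        out[start:start + n_valid] = cc[:n_valid]
    return out
```

Both return the same un-normalised cross-correlation sums; the real correlation functions also normalise, which is omitted here for brevity.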
To do:
- [ ] Document this behaviour;
- [ ] Estimate the most efficient `fft_len` on the fly given the memory restrictions of the system;
- [ ] Further parallelism could be enabled, e.g. the current `outer_core` parallelism could be changed to work on the loop over chunks of continuous data;
- [ ] Testing using Add correlation speed-test #180 would be good; some graphs demonstrating the different memory and time requirements would be nice;
- [ ] Wait until Speed-up clustering #266 is merged, which makes changes to the C functions that will require some tweaking to merge with this.
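One possible shape for the "estimate `fft_len` on the fly" item is a memory-budget heuristic (entirely hypothetical, not code from this PR; the cost model and `estimate_fft_len` name are assumptions):

```python
def estimate_fft_len(template_len, n_templates, n_channels,
                     available_bytes, dtype_bytes=4, max_exp=22):
    """Pick the largest power-of-two FFT length that fits a memory budget.

    Rough cost model: (n_templates + 1) cached spectra per channel (the
    template spectra plus one data-chunk spectrum), each complex spectrum
    costing ~2 * dtype_bytes per FFT point.
    """
    # Smallest workable length: leave room for at least one valid lag
    # without circular wrap-around.
    min_len = 1 << (2 * template_len - 1).bit_length()
    for exp in range(max_exp, 0, -1):
        fft_len = 1 << exp
        if fft_len < min_len:
            break
        spectra = (n_templates + 1) * n_channels
        bytes_needed = spectra * fft_len * 2 * dtype_bytes
        if bytes_needed <= available_bytes:
            return fft_len
    # Nothing fits the budget: fall back to the minimum workable length.
    return min_len
```

This only bounds memory; since profiling suggests a sweet spot around 2**13 for speed, a real implementation would probably take the smaller of the memory-feasible length and the empirically fastest one.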
Long-term, this could be memory efficient enough to be ported to the GPU, which could allow for some serious desktop speed-ups.
PR Checklist
- [ ] `develop` base branch selected?
- [ ] Changes are documented in `CHANGES.md`.
- [ ] First time contributors have added your name to `CONTRIBUTORS.md`.