Update internals to vectorised SciPy #94

calum-chamberlain · 2017-05-22T02:58:27Z

This PR is a work-in-progress update to the internals of match-filter. It will:

Remove the dependency on OpenCV (for a simpler install);
Change to SciPy internals for more flexibility (fewer FFTs will be computed);
Compute cross-correlations in a vectorised routine, which is more efficient for multiple templates.
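A minimal NumPy sketch of the vectorised idea, for illustration only: the stream is Fourier-transformed once and that spectrum is shared by every template through broadcasting. It assumes single-channel data and a hypothetical function name, multi_normxcorr; it is not the PR's implementation (which moved to SciPy and later to C with FFTW).

```python
import numpy as np


def multi_normxcorr(templates, stream):
    """Normalised cross-correlation of several templates with one stream.

    templates: array of shape (n_templates, template_length)
    stream: 1-D array with len(stream) >= template_length
    Returns an array of shape (n_templates, len(stream) - template_length + 1).
    """
    templates = np.asarray(templates, dtype=np.float64)
    stream = np.asarray(stream, dtype=np.float64)
    template_length = templates.shape[-1]
    n_out = stream.size - template_length + 1

    # Zero-mean, unit-norm templates so each coefficient lies in [-1, 1]
    norm = templates - templates.mean(axis=-1, keepdims=True)
    norm /= np.sqrt((norm ** 2).sum(axis=-1, keepdims=True))

    # Pad to a power of two at least as long as the full linear correlation
    fftshape = 1 << int(np.ceil(np.log2(stream.size + template_length - 1)))
    stream_fft = np.fft.rfft(stream, fftshape)  # computed once for all templates
    # Reversing the templates turns convolution into correlation
    template_fft = np.fft.rfft(norm[:, ::-1], fftshape, axis=-1)
    # Broadcasting multiplies every template spectrum by the same stream spectrum
    xcorr = np.fft.irfft(template_fft * stream_fft, fftshape, axis=-1)
    xcorr = xcorr[:, template_length - 1:template_length - 1 + n_out]

    # Norm of each de-meaned stream window, from running sums
    csum = np.concatenate(([0.0], np.cumsum(stream)))
    csum2 = np.concatenate(([0.0], np.cumsum(stream ** 2)))
    win_sum = csum[template_length:] - csum[:-template_length]
    win_sum2 = csum2[template_length:] - csum2[:-template_length]
    win_norm = np.sqrt(
        np.clip(win_sum2 - win_sum ** 2 / template_length, 0.0, None))

    with np.errstate(divide="ignore", invalid="ignore"):
        # Flat (zero-variance) windows get a coefficient of zero
        return np.where(win_norm > 0, xcorr / win_norm, 0.0)
```

Cutting a template straight out of the stream should give a coefficient of 1 at its own position, which makes a quick sanity check.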

To be done:

Remove old functions, or allow them to be used with an optional argument to match_filter;
Work on memory constraints for large, parallel problems - Dask/temporary files?
Run more tests on large datasets.
~~Implement multi-threaded FFT? (pyfftw)~~
Only use float64 for move_std (use float32 for other calculations to conserve memory)
Remove dependency on OpenCV for other functions: rewrite normxcorr2 as a SciPy correlation?
Check whether _spike_test is still needed; if not, remove it because it's really slow!
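The float64-only move_std item can be sketched as a mixed-precision routine: running sums are accumulated in float64, and only the (large) output array is stored as float32. The body below is an assumption for illustration (population standard deviation, ddof=0), not the EQcorrscan move_std.

```python
import numpy as np


def move_std(data, window):
    """Population std of every length-`window` slice of `data`.

    Accumulates in float64 (running sums are prone to round-off) but returns
    float32 so the output array costs half the memory.
    """
    data = np.asarray(data, dtype=np.float64)
    # Removing the global mean keeps the running sums smaller
    data = data - data.mean()
    csum = np.concatenate(([0.0], np.cumsum(data)))
    csum2 = np.concatenate(([0.0], np.cumsum(data * data)))
    win_mean = (csum[window:] - csum[:-window]) / window
    win_mean2 = (csum2[window:] - csum2[:-window]) / window
    # Clip tiny negative values caused by round-off before the square root
    var = np.clip(win_mean2 - win_mean ** 2, 0.0, None)
    return np.sqrt(var).astype(np.float32)
```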

EDIT:

This PR has evolved into a complete overhaul of the internals of EQcorrscan. SciPy was tested and found to be memory-inefficient when run in parallel, and memory-profiler issues with SLURM have pushed us to move away from Python multiprocessing.
All this has led to the development of C internals using FFTW and OpenMP.

calum-chamberlain · 2017-05-22T02:59:40Z

Note to @cjhopp and @d-chambers: this is not yet ready for a full review, but @cjhopp may want to test this on PAN to see if it gets around the memory issues. So far I have run the tests on synthetic data, but not on real data (I haven't had the time at JpGU), so wait for CI etc.


cjhopp · 2017-05-22T03:05:42Z

Fingers crossed. I'll have a look and see in the next day or so.

calum-chamberlain · 2017-05-22T04:36:04Z

CI failures look like install issues, but the tests don't currently run on other machines either; I'm running into the bottleneck issue here: bottleneck: 164


d-chambers · 2017-06-03T16:19:47Z

eqcorrscan/core/match_filter.py

+        templates.std(axis=-1, keepdims=True) * template_length))
+    norm_sum = norm.sum(axis=-1, keepdims=True)
+    stream_fft = np.fft.rfft(stream, fftshape)
+    template_fft = np.fft.rfft(np.flip(norm, axis=-1), fftshape, axis=-1)


NumPy's flip was added in version 1.12.0, so we need to make sure to bump the minimum NumPy version in setup.py.
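If bumping the minimum NumPy version were undesirable, reversing along the last axis can also be written with basic slicing, which predates np.flip. This is a hedged alternative, not what the PR does; flip_last_axis is a hypothetical helper name.

```python
import numpy as np


def flip_last_axis(arr):
    """Reverse `arr` along its last axis.

    Same result as np.flip(arr, axis=-1) (NumPy >= 1.12.0), but uses basic
    slicing, which older NumPy releases also support.
    """
    return np.asarray(arr)[..., ::-1]
```

Like np.flip, the slice returns a view, so no data are copied.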

ObsPy 1.0.3 does not play nicely on Windows, but the current master does. I'm going to keep AppVeyor running the ObsPy master and Travis running the current release until the next ObsPy release, when AppVeyor should revert to the current ObsPy release. I think it comes down to their pinning of matplotlib...?

d-chambers · 2017-06-03T16:29:28Z

eqcorrscan/core/match_filter.py

+    #     cccsums = dask.delayed(np.sum)(xcorrs, axis=0)
+    #     if compute:
+    #         cccsums.compute()
+    if cores is None:


Would it make sense to abstract the multiprocessing further up? I know several functions use something similar, so maybe we could make a generic pool interface at the module level; then we could keep persistent processes/threads in the pool so we wouldn't need to spin them up every time.
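One way the suggested module-level pool could look, as a sketch: get_pool and parallel_map are hypothetical names, not existing EQcorrscan functions. The pool is created lazily on first use, reused by every later call, and shut down once at interpreter exit. ThreadPoolExecutor keeps the sketch simple; a ProcessPoolExecutor could be swapped in if the work needs separate processes (the mapped function must then be picklable).

```python
import atexit
from concurrent.futures import ThreadPoolExecutor

# Hypothetical module-level state: one pool shared by every caller
_POOL = None
_POOL_WORKERS = None


def get_pool(max_workers=None):
    """Return the shared pool, (re)creating it only when needed."""
    global _POOL, _POOL_WORKERS
    needs_new = _POOL is None or (
        max_workers is not None and max_workers != _POOL_WORKERS)
    if needs_new:
        if _POOL is not None:
            _POOL.shutdown(wait=True)
        _POOL = ThreadPoolExecutor(max_workers=max_workers)
        _POOL_WORKERS = max_workers
    return _POOL


@atexit.register
def _shutdown_pool():
    """Join the workers once, when the interpreter exits."""
    if _POOL is not None:
        _POOL.shutdown(wait=True)


def parallel_map(func, iterable, cores=None):
    """Apply func to each item on the shared pool, preserving input order."""
    return list(get_pool(cores).map(func, iterable))
```

Calls that request the same number of workers (or none) reuse the existing workers instead of spinning up new ones.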

calum-chamberlain · 2017-07-19T00:26:03Z

eqcorrscan/tests/match_filter_test.py

-            warnings.warn('The expected result was not achieved, ' +
-                          'but it has the same shape')
-
-    def test_perfect_template_loop(self):


This is now covered by the normxcorr2 tests, as _template_loop has been removed.

calum-chamberlain · 2017-07-19T00:27:13Z

eqcorrscan/tests/match_filter_test.py

@@ -368,6 +323,19 @@ def test_detection_extraction(self):
        self.assertEqual(len(detections), 4)
        self.assertEqual(len(detection_streams), len(detections))

+    def test_normxcorr(self):


These data have large variations in amplitude late in the continuous data, which leads to floating-point error accumulation when using float (32-bit) in the C routine. This is overcome by using double (64-bit float) internally within the C routine.
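The effect can be reproduced in a few lines of NumPy, assuming (for illustration) that windowed statistics are built from running sums, as in typical sliding-window normalisation. After a large-amplitude burst, the float32 running sum of squares is so large that a quiet window's small contribution falls below the float32 resolution and is lost, while float64 still resolves it. The data and the windowed_variance helper are synthetic and hypothetical.

```python
import numpy as np


def windowed_variance(data, window, dtype):
    """Variance of every length-`window` slice, from running sums held in `dtype`."""
    data = np.asarray(data, dtype=dtype)
    zero = np.zeros(1, dtype=dtype)
    csum = np.concatenate((zero, np.cumsum(data, dtype=dtype)))
    csum2 = np.concatenate((zero, np.cumsum(data * data, dtype=dtype)))
    mean = (csum[window:] - csum[:-window]) / window
    mean2 = (csum2[window:] - csum2[:-window]) / window
    return mean2 - mean ** 2


rng = np.random.default_rng(1)
quiet = rng.standard_normal(5000)
burst = 1e4 * rng.standard_normal(50)  # large amplitudes late in the record
data = np.concatenate((quiet, burst, rng.standard_normal(5000)))

window = 100
last = data.size - window  # a quiet window after the burst
reference = data[last:].var()
var64 = windowed_variance(data, window, np.float64)[last]
var32 = windowed_variance(data, window, np.float32)[last]
print(f"reference {reference:.4f}, float64 {var64:.4f}, float32 {var32:.4f}")
```

Here the float32 variance of the quiet window collapses to roughly zero (or slightly negative), whereas the float64 value matches the direct calculation.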

calum-chamberlain · 2017-07-19T00:27:50Z

eqcorrscan/tests/match_filter_test.py

@@ -788,14 +763,14 @@ def test_day_long_methods(self):
        # Aftershock sequence, with 1Hz data, lots of good correlations = high
        # MAD!
        day_party = daylong_tribe.detect(
-            stream=st, threshold=4.5, threshold_type='MAD', trig_int=6.0,


This threshold was too low to produce meaningful results.
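For context, a sketch of how a MAD-type threshold scales, assuming it is computed as the multiplier times the median absolute value of the summed cross-correlation (check match_filter for the exact definition). Because the threshold tracks the background level, a small multiplier lets ordinary background values through, which is why a low value stops giving meaningful detections. The toy trace is invented for illustration.

```python
import numpy as np


def mad_threshold(cccsum, multiplier):
    """Threshold = multiplier * median(|cccsum|) (assumed MAD-style definition)."""
    return multiplier * np.median(np.abs(cccsum))


# Toy summed-correlation trace: mostly background, one strong peak
cccsum = np.array([-2.0, -1.0, 0.0, 1.0, 2.0, 8.0])
for multiplier in (1.0, 4.5):
    threshold = mad_threshold(cccsum, multiplier)
    n_above = int(np.count_nonzero(cccsum > threshold))
    print(multiplier, threshold, n_above)
```

With a multiplier of 1.0 a background sample (2.0) also exceeds the threshold; at 4.5 only the peak does.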

calum-chamberlain · 2017-07-19T00:28:31Z

eqcorrscan/tests/tutorials_test.py

@@ -17,12 +18,19 @@
 from eqcorrscan.core.match_filter import read_detections


+slow = pytest.mark.skipif(


This is so that we don't always have to run the tutorial tests: they rely on long data downloads, which AppVeyor is not great at handling.
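The skipif condition is cut off in the diff, so this is only a sketch of one way such a marker can work: it assumes a hypothetical RUN_SLOW_TESTS environment variable as the opt-in switch, which may not be how the PR decides. Tests decorated with slow are skipped unless the variable is set to 1.

```python
import os

import pytest

# Hypothetical opt-in switch: set RUN_SLOW_TESTS=1 to run the download-heavy tests
RUN_SLOW = os.environ.get("RUN_SLOW_TESTS", "0") == "1"

slow = pytest.mark.skipif(
    not RUN_SLOW, reason="slow tutorial test; set RUN_SLOW_TESTS=1 to run")


@slow
def test_tutorial_download():
    """Placeholder for a tutorial test that needs long data downloads."""
```

On CI the variable can simply be left unset, so the slow tests are reported as skipped rather than failing on download timeouts.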

calum-chamberlain · 2017-07-19T00:41:31Z

A quick note: tests are passing (despite the apparent failure on AppVeyor, where something is wrong with the interface). I'm going to work on increasing test coverage, but could do with a review. A lot of the changes in here are small extra patches; the main things that need looking at are:

eqcorrscan/lib/multi_corr.c
eqcorrscan/utils/correlate.py

A further note: I have deliberately not written the C functions with Python bindings, because I would like them to be usable by other programs if wanted (e.g. MATLAB or other C functions).
@cjhopp fancy having a quick scan? I'm going to get on with testing on PAN soon, but I think any issues there (memory problems) will result in a further PR with an extra set of implementations.

cjhopp

All looks good to me. Two minor docstring things.

This PR represents a TON of work and you deserve a hearty THANK YOU, @calum-chamberlain. Applause.

cjhopp · 2017-07-19T14:05:18Z

README.md

-source.  The user should follow the instructions above for OpenCV install.
-
-We have also added subspace detection and correlation derived pick adjustment.
+and writing seismic data, as well as subspace detection, brightness source-scanning


comma at end of line

cjhopp · 2017-07-19T14:58:18Z

eqcorrscan/utils/correlate.py

+        A single Stream object to be correlated with the templates.
+    :type cores: int
+    :param cores:
+        Number of processed to use, if set to None, and dask==False, no



calum-chamberlain · 2017-07-20T06:58:58Z

Merged. Will work on the OpenMP loop implementation (for SLURM applications) elsewhere.

calum-chamberlain added 4 commits March 22, 2017 04:41

Start work on changing internals to not cv

577f114

Start working on internals

3b4cae0

Add links to scipy functions

abfd0a3

Tests on synth data work for scipy internals

2ccc6fe

calum-chamberlain requested review from d-chambers and cjhopp May 22, 2017 02:58

calum-chamberlain self-assigned this May 22, 2017

calum-chamberlain added the enhancement label May 22, 2017

calum-chamberlain added this to the 1.0.0 milestone May 22, 2017

calum-chamberlain added 2 commits May 22, 2017 12:00

Merge branch 'develop' into Xarray-internals

c8af6fc

Merge branch 'Xarray-internals' of https://github.com/EQcorrscan/EQco…

97ee390

…rrscan into Xarray-internals

Update CI requirements

4817ad7

calum-chamberlain added 13 commits May 23, 2017 10:49

enforce float64 in internals and float32 externally

bdc9a56

Hack to cope with padding template channels, should be adapted

bb6ba66

Fix some fails, expect more

ec7b524

Adjust download timestamps for NCEDC data

1e61f32

Bump obspy version number for py 3

211902e

Fix channel assignment

796df33

merge conflict

928c5f9

Merge branch 'develop' into Xarray-internals

3f9fd2a

Merge branch 'develop' into Xarray-internals

39bbe74

Remove un-used dependancies

e572c1d

Try install with pip without dependencies

9d7eb5d

Crashes when installing unnecessary packages...

Update appveyor.yml

4b221e0

Update appveyor.yml

77c27f3

d-chambers reviewed Jun 3, 2017


calum-chamberlain added 8 commits July 19, 2017 08:45

Try and get the threaded versions working on CI

a6a8271

pyflakes

819f634

Require libxml

6878ede

Typo

80ef087

Add define structures for N_THREADS

04dbbc9

Try pre-processor ifdef

91e8775

link errors

5318a31

link errors

9bcdcce

calum-chamberlain commented Jul 19, 2017


Minor textual changes

c5e2a12

calum-chamberlain added the in progress label Jul 19, 2017

calum-chamberlain added 4 commits July 19, 2017 04:22

Include spike_test in match_filter

a86c3e0

parallel fftw runs in 1D to avoid non-threadsafety

7c46ef7

Merge branch 'develop' into Xarray-internals

1492c09

time domain in double precision

8bddade

cjhopp approved these changes Jul 19, 2017


calum-chamberlain added 5 commits July 20, 2017 03:56

Cleaning up

e188aed

Merge branch 'develop' into Xarray-internals

17f3bc4

Import naming changes

6fc913e

Merge branch 'Xarray-internals' of https://github.com/eqcorrscan/EQco…

ba88ab1

…rrscan into Xarray-internals

pep8

c4bb36a

calum-chamberlain merged commit 66becdd into develop Jul 20, 2017

calum-chamberlain removed the in progress label Jul 20, 2017

calum-chamberlain deleted the Xarray-internals branch July 20, 2017 06:59

calum-chamberlain mentioned this pull request Jul 20, 2017

OpenCv vs scipy.fftconvolve, is it really worth it? #80

Closed
