Speedup 09: Use shared memory for hypo-dd write_correlations for 20 % speedup #529

flixha · 2022-12-12T11:22:56Z

What does this PR do?

Adds option to move trace data into shared memory for utils.catalog_to_dd.write_correlations for a ~20 % speedup.

Starting each worker for one reference event is slow because the event and stream for all neighbors of the reference event need to be pickled to the worker. Using some shared memory should be able to help here (PR: 20 % speedup by moving trace.data numpy-arrays into shared memory)

Why was it initiated? Any relevant Issues?

For a seismic swarm with ~1600 densely clustered events recorded by ~80 stations, I noticed that I was not able to make full use of all CPU cores when running utils.catalog_to_dd.write_correlations in parallel. I suspect that is due to the large amount of data that has to be shared with each worker process (all event , pick, and stream objects).

Moving numpy-arrays into shared memory is relatively simple, and that gave a ~20 % speedup.
In theory, we we're not changing the trace data from within a worker, so we should be able to just pass the pointer to the shared memory within the worker. But to avoid pool deadlock, unfortunately we still need to copy the data from shared memory into worker memory within the worker process. As that happens in parallel, we're at least saving some time.
Moving event and trace stats objects into shared memory should give more speedup, but that's not as simple as moving the numpy arrays for the data along.

This PR contributes to the summary issue in #522

PR Checklist

develop base branch selected?
This PR is not directly related to an existing issue (which has no PR yet).
All tests still pass.
Any new features or fixed regressions are be covered via new tests.
Any new or changed features have are fully documented.
Significant changes have been added to CHANGES.md.
~~- [ ] First time contributors have added your name to CONTRIBUTORS.md.~~

…ime correlation workers

…ed_memory

calum-chamberlain · 2023-01-01T07:38:53Z

I have always wanted to use shared memory for these kinds of jobs, but never seen the speed-ups to warrant the time spent puzzling over how to do it. Good to see that moving the data into shared memory helps, albeit not as much as it could given the copying. At some point it would be good to write some lower-level code to do this all properly (in C) so that we can actually access memory properly.

The function for moving data to shared memory looks really helpful though and should be a useful starting off point for moving more to shared memory. I wonder if locks can help with avoiding the need to copy data to the worker memory?

flixha · 2023-01-03T10:30:15Z

I'll have to look into how I can make locks work with this shared memory...
Also I see that MacOS and windows don't like the filenames for the shared memory which I'll have to fix.

calum-chamberlain · 2023-01-03T21:52:14Z

I had a look into locks and couldn't see an obvious documented way to solve this problem. It might well be a dead end.

…ed_memory

calum-chamberlain · 2023-03-16T02:04:47Z

Hmm, it seems odd that Windows doesn't like the shared memory filenames still. Any idea what could be going on there?

…t so Windows does not destroy them

flixha · 2023-03-16T10:16:54Z

Hmm, it seems odd that Windows doesn't like the shared memory filenames still. Any idea what could be going on there?

I think I found it finally :-)
It looks like this happened because I only kept the name of the memory reference attached to the traces, rather than a direct reference to the shared memory. Apparently Windows will destroy / unlink the shared memory immediately once there are no more references to it (see description here.

calum-chamberlain

Looks good to me @flixha - Thanks!

flixha added 4 commits December 12, 2022 11:47

add option to keep streams in shared memory for hypodd differential t…

36017b2

…ime correlation workers

fix typo

99fd05b

add more documentation for shared memory use

233e53d

add test for shared memory and changlog entry

17cdb6b

flixha mentioned this pull request Dec 12, 2022

WIP: Speed up a few slowdowns when handling large datasets #522

Open

calum-chamberlain added 2 commits January 1, 2023 17:22

Merge branch 'develop' into speedup_09_hypoDD_write_correlations_shar…

80de651

…ed_memory

Merge branch 'develop' into speedup_09_hypoDD_write_correlations_shar…

83cad25

…ed_memory

Felix and others added 3 commits January 4, 2023 09:28

let multiprocessing figure out shared memory filenames to work on MacOS

def7656

remove unused import

694f0af

Merge branch 'develop' into speedup_09_hypoDD_write_correlations_shar…

e6c5927

…ed_memory

Felix added 2 commits March 16, 2023 10:04

experimental commit: return references to shared memory arrays in lis…

a26d792

…t so Windows does not destroy them

fix merge conflict with develop

7ac9a57

calum-chamberlain approved these changes Mar 16, 2023

View reviewed changes

calum-chamberlain merged commit 82b4c9e into eqcorrscan:develop Mar 16, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Speedup 09: Use shared memory for hypo-dd write_correlations for 20 % speedup #529

Speedup 09: Use shared memory for hypo-dd write_correlations for 20 % speedup #529

flixha commented Dec 12, 2022

calum-chamberlain commented Jan 1, 2023

flixha commented Jan 3, 2023

calum-chamberlain commented Jan 3, 2023

calum-chamberlain commented Mar 16, 2023

flixha commented Mar 16, 2023

calum-chamberlain left a comment

Speedup 09: Use shared memory for hypo-dd write_correlations for 20 % speedup #529

Speedup 09: Use shared memory for hypo-dd write_correlations for 20 % speedup #529

Conversation

flixha commented Dec 12, 2022

What does this PR do?

Why was it initiated? Any relevant Issues?

PR Checklist

calum-chamberlain commented Jan 1, 2023

flixha commented Jan 3, 2023

calum-chamberlain commented Jan 3, 2023

calum-chamberlain commented Mar 16, 2023

flixha commented Mar 16, 2023

calum-chamberlain left a comment

Choose a reason for hiding this comment