Speedup 10: 25 % quicker _get_array_dicts and 10 % quicker _prep_data_for_correlation #536

flixha · 2023-01-10T11:38:04Z

What does this PR do?

Implements:

- 25 % speedup for utils.correlate._get_array_dicts (from 200 s to 154 s total time spent in my test)
- 10 % speedup for utils.pre_processing._prep_data_for_correlation (from 702 s to 640 s total time spent)

(For comparison with the numbers above: 494 s were spent correlating data in parallel on the GPU.)

Main speedups with:

- direct operations on UTCDateTime.__dict__['_UTCDateTime__ns'] (5x quicker for one +/- operation)
- index-based access to a trace in a stream with stream.traces[j] instead of stream[j] (4.5x quicker)
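
The two access patterns above can be sketched without ObsPy. The classes below (`StandInTime`, `StandInStream`) and the two helper functions are hypothetical stand-ins written only for illustration; they are not ObsPy's implementation and not the code changed in this PR. They only show why reading a name-mangled attribute from `__dict__` and indexing the underlying list skip some per-call overhead.

```python
# Stand-in classes only: they mimic the two access patterns named above
# without importing ObsPy. They are NOT ObsPy's implementation.


class StandInTime:
    """Hypothetical timestamp storing integer nanoseconds privately."""

    def __init__(self, ns):
        # Inside the class body, __ns is name-mangled to _StandInTime__ns.
        self.__ns = int(ns)

    def __sub__(self, other):
        # "Public" path: a type check and a float conversion on every call.
        if isinstance(other, StandInTime):
            return (self.__ns - other.__ns) / 1e9
        return NotImplemented


class StandInStream:
    """Hypothetical container wrapping a plain list called ``traces``."""

    def __init__(self, traces):
        self.traces = list(traces)

    def __getitem__(self, index):
        # Extra dispatch before the list is reached, as a wrapper may add.
        if isinstance(index, slice):
            return StandInStream(self.traces[index])
        return self.traces[index]


def offset_seconds_public(t1, t0):
    """Difference in seconds through the operator overload."""
    return t1 - t0


def offset_seconds_direct(t1, t0):
    """Difference in seconds by reading the mangled attribute directly."""
    key = "_StandInTime__ns"
    # Integer subtraction first, one conversion to seconds at the end.
    return (t1.__dict__[key] - t0.__dict__[key]) / 1e9


if __name__ == "__main__":
    t0 = StandInTime(1_000_000_000)
    t1 = StandInTime(3_500_000_000)
    print(offset_seconds_public(t1, t0), offset_seconds_direct(t1, t0))

    stream = StandInStream(["tr0", "tr1", "tr2"])
    # Both expressions return the same object; only the call path differs.
    print(stream[1] is stream.traces[1])
```

Both paths return the same values, so the choice is purely about call overhead. The 5x and 4.5x figures quoted above are the author's measurements against ObsPy; timing these stand-ins with `timeit` will give different ratios. The trade-off is that the direct path depends on a private attribute name, which may change between library versions.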

Why was it initiated? Any relevant Issues?

For big datasets run on powerful hardware, the cumulative time spent in utils.correlate._get_array_dicts and utils.pre_processing._prep_data_for_correlation makes up a considerable part (>50 %) of a detection run. This PR optimizes both functions with a few simple changes; I think further speedups would need less trivial changes.

This PR is part of the list of speedups in #522

PR Checklist

develop base branch selected?
This PR is not directly related to an existing issue (which has no PR yet).
[] All tests still pass.
~~- [ ] Any new features or fixed regressions are covered via new tests.~~
~~- [ ] Any new or changed features are fully documented.~~
Significant changes have been added to CHANGES.md.
~~- [ ] First-time contributors have added their name to CONTRIBUTORS.md.~~


flixha · 2023-01-10T11:42:10Z

Sorry for the mess with 2 commits with the same message; I thought I had reset / removed the first one locally before pushing. As it's so small I think it's easier to leave it than trying to completely remove that commit from remote; anyways it's fixed by the last commit.

calum-chamberlain

Tiny change that I will push through and merge if this goes green.

eqcorrscan/utils/pre_processing.py

very minor speed advantage

flixha added 5 commits January 9, 2023 17:17

- ebb6a81: speed up _get_array_dicts with direct access to UTCDateTime.__dict__['_UTCDateTime__ns']
- 13e696f: speed up trace preparation with direct access to stream.traces with index
- a38ab89: add changelog entry
- 25f8392: fix issue in setting endtime manually (had no effect on results)
- 41d4adc: fix issue in setting endtime manually (had no effect on results)

flixha mentioned this pull request Jan 10, 2023

WIP: Speed up a few slowdowns when handling large datasets #522

Open

Merge branch 'develop' into speedup_10_get_array_dicts

2adabad

calum-chamberlain approved these changes Mar 16, 2023

eqcorrscan/utils/pre_processing.py (outdated)

Use np.copy instead of deepcopy

d1afd5f

very minor speed advantage

calum-chamberlain merged commit cdde8c9 into eqcorrscan:develop Mar 16, 2023
