Change annotate logic from windowed to batched pre- and postprocessing #282

Merged: 3 commits into main on Mar 30, 2024

Conversation

@yetinam (Member) commented Mar 30, 2024

Pre- and postprocessing in the annotate function have so far been performed per window in NumPy. This PR changes the processing to a batched mode running in PyTorch, which also ensures that all processing can happen on the GPU if one is available. In addition, the three processing steps can now be wrapped in threads, releasing the GIL during the heavy numeric operations. Overall, this reduces run times, in particular on GPU.

As this change improves the efficiency of the asyncio implementation, the (suboptimal) process-based implementation is removed entirely.

To ensure compatibility with the previous implementation, a suite of tests has been added, and the former `annotate_window_pre`/`annotate_window_post` functions are retained. Both the tests and these functions should be removed in a future PR.
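A compatibility test of this kind can be sketched as follows. The function bodies and names below are illustrative, not SeisBench's actual implementation: the idea is simply that the new batched torch path must reproduce the retained per-window NumPy result.

```python
import numpy as np
import torch

def annotate_window_pre_np(window: np.ndarray) -> np.ndarray:
    # Legacy-style per-window preprocessing: demean and std-normalize.
    window = window - window.mean(axis=-1, keepdims=True)
    return window / (window.std(axis=-1, keepdims=True) + 1e-10)

def batched_pre_torch(batch: torch.Tensor) -> torch.Tensor:
    # The same operation applied to a whole batch of windows at once.
    # unbiased=False matches NumPy's default population std (ddof=0).
    batch = batch - batch.mean(dim=-1, keepdim=True)
    return batch / (batch.std(dim=-1, keepdim=True, unbiased=False) + 1e-10)

windows = np.random.randn(16, 3, 3001).astype(np.float32)
ref = np.stack([annotate_window_pre_np(w) for w in windows])
new = batched_pre_torch(torch.from_numpy(windows)).numpy()
assert np.allclose(ref, new, atol=1e-4)
```

Note the `unbiased=False`: torch defaults to the sample standard deviation while NumPy defaults to the population one, a subtle source of mismatches in exactly this kind of port.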

Additional small changes:

  • added a minimal benchmark for annotate performance (results to follow shortly)
  • increased default batch size from 64 to 256
  • refactored some functions in seisbench.util

Closes #269
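The threading aspect can be illustrated with a small sketch (names and the processing step are hypothetical, not SeisBench's actual API): because PyTorch releases the GIL inside its numeric kernels, batched processing can run in a worker thread while the main thread continues with Python-side work.

```python
import threading
import torch

def heavy_postprocess(batch: torch.Tensor) -> torch.Tensor:
    # Stand-in for batched postprocessing, e.g. activation plus smoothing.
    return torch.sigmoid(batch).cumsum(dim=-1) / batch.shape[-1]

def run_in_worker(fn, *args):
    # PyTorch ops release the GIL internally, so while the worker thread
    # executes, the main thread could e.g. assemble the next batch.
    result = []
    worker = threading.Thread(target=lambda: result.append(fn(*args)))
    worker.start()
    # ... main thread free for other work here ...
    worker.join()
    return result[0]

batch = torch.randn(256, 3, 3001)  # 256 windows of 3-component data
out = run_in_worker(heavy_postprocess, batch)
```

In NumPy most of this work would hold the GIL, so the per-window implementation gained little from threads; the batched torch version does.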

@yetinam yetinam added the enhancement New feature or request label Mar 30, 2024
@yetinam yetinam added this to the v0.7 milestone Mar 30, 2024
@yetinam (Member, Author) commented Mar 30, 2024

Here are the benchmark results. All experiments were run on a server using 12 CPU cores, with a V100 GPU for the GPU experiments. Times are in seconds to annotate 10 stations, each with 1 day of 3-component data at 100 Hz. The batch size was set to 4096 (even though the GPU could have handled more).

| Model | CPU (old) | GPU (old) | CPU (new) | GPU (new) |
| --- | ---: | ---: | ---: | ---: |
| PhaseNetLight | 27.5 | 18.1 | 19.0 | 9.6 |
| PhaseNet | 34.2 | 18.1 | 26.0 | 10.0 |
| EQTransformer | 62.8 | 17.9 | 55.0 | 8.5 |
| DeepDenoiser | 515.2 | 394.8 | 160.6 | 17.4 |

Overall, there are speed-ups on both CPU and GPU in all configurations. The magnitude of the speed-up varies considerably between models, depending on how heavy pre- and postprocessing are relative to the model itself. A few quick observations:

  • Speed-ups on CPU range from 1.15x (EQTransformer) through 1.32x (PhaseNet) to 3.2x (DeepDenoiser).
  • Speed-ups on GPU are larger, ranging from 1.81x (PhaseNet) through 2.09x (EQTransformer) to 22.6x (DeepDenoiser).
  • Consequently, the advantage of the GPU over the CPU implementation has grown: from 1.9x to 2.6x (PhaseNet), from 3.5x to 6.4x (EQTransformer), and from 1.3x to 9.2x (DeepDenoiser).

Notably, PhaseNet, PhaseNetLight and EQTransformer can now annotate a day of 100 Hz data for a single station on GPU in under a second, i.e., around 10^5 times faster than real time.
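For reference, the quoted speed-up factors follow directly from the timings above:

```python
# Speed-up factors recomputed from the benchmark times reported above
# (seconds to annotate 10 station-days of 3-component data at 100 Hz).
old_cpu = {"PhaseNet": 34.16, "EQTransformer": 62.85, "DeepDenoiser": 515.24}
new_cpu = {"PhaseNet": 25.99, "EQTransformer": 55.01, "DeepDenoiser": 160.62}
old_gpu = {"PhaseNet": 18.11, "EQTransformer": 17.85, "DeepDenoiser": 394.85}
new_gpu = {"PhaseNet": 9.98, "EQTransformer": 8.49, "DeepDenoiser": 17.37}

for model in old_cpu:
    print(f"{model}: CPU x{old_cpu[model] / new_cpu[model]:.2f}, "
          f"GPU x{old_gpu[model] / new_gpu[model]:.2f}")

# Real-time factor on GPU: ~1 s per station-day of 100 Hz data.
realtime_factor = 86400 / (new_gpu["PhaseNet"] / 10)
print(f"~{realtime_factor:.0f}x faster than real time")
```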

@yetinam yetinam merged commit 991e0c2 into main Mar 30, 2024
13 checks passed
@yetinam yetinam deleted the annotate/parallel_batching branch March 30, 2024 20:25
yetinam added a commit that referenced this pull request Mar 30, 2024
- remove all annotate_window_pre and annotate_window_post functions
- adjust tests still using these functions
- remove compatibility tests (compatibility only needed to be checked once on CI)
- remove further leftovers from the process-based interface
yetinam added a commit that referenced this pull request Mar 30, 2024
Clean-up legacy annotate code (Follow-up #282)
@yetinam (Member, Author) commented Mar 30, 2024

Remark: Dropping the process interface also made the tests much quicker.

JanisHe pushed a commit to JanisHe/seisbench that referenced this pull request Apr 2, 2024
Linked issue closed by this pull request: DeepDenoiser: Use torch for stft
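The linked issue asks for the DeepDenoiser STFT to run in torch. A minimal illustration of `torch.stft` on a batch of traces (the STFT parameters here are placeholders, not DeepDenoiser's actual configuration):

```python
import torch

# torch.stft runs on GPU if the input tensor lives there, avoiding the
# CPU round-trip that a SciPy/NumPy STFT would require.
waveforms = torch.randn(8, 3000)  # 8 traces of 30 s at 100 Hz
spec = torch.stft(
    waveforms,
    n_fft=60,
    hop_length=30,
    window=torch.hann_window(60),
    return_complex=True,
)
# One-sided output: (batch, n_fft // 2 + 1, n_frames)
print(spec.shape)
```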