Change annotate logic from windowed to batched pre- and postprocessing #282

Merged: 3 commits into main on Mar 30, 2024

Conversation

@yetinam (Member) commented Mar 30, 2024

Pre- and postprocessing in the annotate function have so far been performed per window in NumPy. This PR changes the processing to a batched mode running in PyTorch, which also ensures that all processing can happen on the GPU if one is available. In addition, the three processing steps can now be wrapped in threads, releasing the GIL during the heavy numeric operations. Overall, this reduces run times, in particular on GPU.

As this change improves the efficiency of the asyncio implementation, the (suboptimal) process-based implementation is removed entirely.

To ensure compatibility with the previous implementation, a suite of tests has been added, and the former `annotate_window_pre`/`annotate_window_post` functions are retained. Both the tests and these functions should be removed in a future PR.
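A compatibility test of this kind can be sketched as follows. The function bodies and names below are illustrative, not SeisBench's actual implementation: the idea is simply that the new batched torch path must reproduce the retained per-window NumPy result.

```python
import numpy as np
import torch

def annotate_window_pre_np(window: np.ndarray) -> np.ndarray:
    # Legacy-style per-window preprocessing: demean and std-normalize.
    window = window - window.mean(axis=-1, keepdims=True)
    return window / (window.std(axis=-1, keepdims=True) + 1e-10)

def batched_pre_torch(batch: torch.Tensor) -> torch.Tensor:
    # The same operation applied to a whole batch of windows at once.
    # unbiased=False matches NumPy's default population std (ddof=0).
    batch = batch - batch.mean(dim=-1, keepdim=True)
    return batch / (batch.std(dim=-1, keepdim=True, unbiased=False) + 1e-10)

windows = np.random.randn(16, 3, 3001).astype(np.float32)
ref = np.stack([annotate_window_pre_np(w) for w in windows])
new = batched_pre_torch(torch.from_numpy(windows)).numpy()
assert np.allclose(ref, new, atol=1e-4)
```

Note the `unbiased=False`: torch defaults to the sample standard deviation while NumPy defaults to the population one, a subtle source of mismatches in exactly this kind of port.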

Additional small changes:

  • added a minimal benchmark for annotate performance (results to follow shortly)
  • increased default batch size from 64 to 256
  • refactored some functions in seisbench.util

Closes #269
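The threading aspect can be illustrated with a small sketch (names and the processing step are hypothetical, not SeisBench's actual API): because PyTorch releases the GIL inside its numeric kernels, batched processing can run in a worker thread while the main thread continues with Python-side work.

```python
import threading
import torch

def heavy_postprocess(batch: torch.Tensor) -> torch.Tensor:
    # Stand-in for batched postprocessing, e.g. activation plus smoothing.
    return torch.sigmoid(batch).cumsum(dim=-1) / batch.shape[-1]

def run_in_worker(fn, *args):
    # PyTorch ops release the GIL internally, so while the worker thread
    # executes, the main thread could e.g. assemble the next batch.
    result = []
    worker = threading.Thread(target=lambda: result.append(fn(*args)))
    worker.start()
    # ... main thread free for other work here ...
    worker.join()
    return result[0]

batch = torch.randn(256, 3, 3001)  # 256 windows of 3-component data
out = run_in_worker(heavy_postprocess, batch)
```

In NumPy most of this work would hold the GIL, so the per-window implementation gained little from threads; the batched torch version does.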

@yetinam yetinam added the enhancement New feature or request label Mar 30, 2024
@yetinam yetinam added this to the v0.7 milestone Mar 30, 2024
@yetinam (Member, Author) commented Mar 30, 2024

Here are the benchmark results. All experiments were run on a server using 12 CPU cores, with a V100 GPU for the GPU experiments. Times are in seconds to annotate 10 stations, each with 1 day of 3-component data at 100 Hz. The batch size was set to 4096 (even though the GPU could have handled more).

| Model | CPU (old) | GPU (old) | CPU (new) | GPU (new) |
| --- | ---: | ---: | ---: | ---: |
| PhaseNetLight | 27.5 | 18.1 | 19.0 | 9.6 |
| PhaseNet | 34.2 | 18.1 | 26.0 | 10.0 |
| EQTransformer | 62.8 | 17.9 | 55.0 | 8.5 |
| DeepDenoiser | 515.2 | 394.8 | 160.6 | 17.4 |

Overall, there are speed-ups on both CPU and GPU in all configurations. The magnitude of the speed-up varies considerably between models, depending on how heavy pre- and postprocessing are relative to the model itself. A few quick observations:

  • Speed-ups on CPU range from 1.15x (EQTransformer) through 1.32x (PhaseNet) to 3.2x (DeepDenoiser).
  • Speed-ups on GPU are larger, ranging from 1.81x (PhaseNet) through 2.09x (EQTransformer) to 22.6x (DeepDenoiser).
  • Consequently, the advantage of the GPU over the CPU implementation has grown: from 1.9x to 2.6x (PhaseNet), from 3.5x to 6.4x (EQTransformer), and from 1.3x to 9.2x (DeepDenoiser).

Notably, PhaseNet, PhaseNetLight and EQTransformer can now annotate a day of 100 Hz data for a single station on GPU in under a second, i.e., around 10^5 times faster than real time.
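For reference, the quoted speed-up factors follow directly from the timings above:

```python
# Speed-up factors recomputed from the benchmark times reported above
# (seconds to annotate 10 station-days of 3-component data at 100 Hz).
old_cpu = {"PhaseNet": 34.16, "EQTransformer": 62.85, "DeepDenoiser": 515.24}
new_cpu = {"PhaseNet": 25.99, "EQTransformer": 55.01, "DeepDenoiser": 160.62}
old_gpu = {"PhaseNet": 18.11, "EQTransformer": 17.85, "DeepDenoiser": 394.85}
new_gpu = {"PhaseNet": 9.98, "EQTransformer": 8.49, "DeepDenoiser": 17.37}

for model in old_cpu:
    print(f"{model}: CPU x{old_cpu[model] / new_cpu[model]:.2f}, "
          f"GPU x{old_gpu[model] / new_gpu[model]:.2f}")

# Real-time factor on GPU: ~1 s per station-day of 100 Hz data.
realtime_factor = 86400 / (new_gpu["PhaseNet"] / 10)
print(f"~{realtime_factor:.0f}x faster than real time")
```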

@yetinam yetinam merged commit 991e0c2 into main Mar 30, 2024
13 checks passed
@yetinam yetinam deleted the annotate/parallel_batching branch March 30, 2024 20:25
yetinam added a commit that referenced this pull request Mar 30, 2024
- remove all annotate_window_pre and annotate_window_post functions
- adjust tests still using these functions
- remove compatibility tests (compatibility only needed to be checked once on CI)
- remove further leftovers from the process-based interface
yetinam added a commit that referenced this pull request Mar 30, 2024
Clean-up legacy annotate code (Follow-up #282)
@yetinam (Member, Author) commented Mar 30, 2024

Remark: Dropping the process interface also made the tests much quicker.

JanisHe pushed a commit to JanisHe/seisbench that referenced this pull request Apr 2, 2024
Linked issue closed by this pull request: DeepDenoiser: Use torch for stft
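The linked issue asks for the DeepDenoiser STFT to run in torch. A minimal illustration of `torch.stft` on a batch of traces (the STFT parameters here are placeholders, not DeepDenoiser's actual configuration):

```python
import torch

# torch.stft runs on GPU if the input tensor lives there, avoiding the
# CPU round-trip that a SciPy/NumPy STFT would require.
waveforms = torch.randn(8, 3000)  # 8 traces of 30 s at 100 Hz
spec = torch.stft(
    waveforms,
    n_fft=60,
    hop_length=30,
    window=torch.hann_window(60),
    return_complex=True,
)
# One-sided output: (batch, n_fft // 2 + 1, n_frames)
print(spec.shape)
```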