Parallel execution of Benchmark #124

juanmc2005 · 2023-03-09T16:52:18Z

This PR addresses issue #85.

Example usage

from diart.inference import Benchmark, Parallelize
from diart import OnlineSpeakerDiarization, PipelineConfig

config = PipelineConfig()
# Audio files are read from the first directory, reference RTTM files from the second
benchmark = Benchmark("/wav/dir", "/rttm/dir")
# Wrap the benchmark so it runs across 4 worker processes
p_benchmark = Parallelize(benchmark, num_workers=4)
if __name__ == "__main__":  # Needed for multiprocessing
    p_benchmark(OnlineSpeakerDiarization, config)

Changelog

Add --num-workers argument to diart.benchmark
Add diart.inference.Parallelize, a wrapper for Benchmark to replace sequential execution with multiprocessing
Expose some new fine-grained methods in Benchmark so that Parallelize can reuse them
diart.stream now uses rich progress bars
Add diart.progress package with ProgressBar, RichProgressBar and TQDMProgressBar as adapters for each library
Chronometer can now be aware of the progress bar used so that it can print reports with the correct formatting
BasePipeline objects must now expose their associated configuration class (through the get_config_class() static method)
PipelineConfig.from_namespace() is now PipelineConfig.from_dict() and receives an easily serializable configuration so that workers can instantiate their own pipelines (entire models cannot be sent to child processes)
- This dictionary needs to be documented and formalized, maybe as a data class. Otherwise its use can be confusing
Add parallelization example in README.md
Models are now lazy. They only load weights when required, making them lighter for inter-process communication
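
The serializable-configuration and lazy-model items above can be illustrated with a minimal, self-contained sketch. It does not use diart itself: ToyConfig, LazyModel, ToyPipeline and build_in_worker are hypothetical names invented for illustration, and the field names are assumptions. Only the get_config_class() and from_dict() method names come from the changelog.

```python
import pickle
from dataclasses import asdict, dataclass


@dataclass
class ToyConfig:
    """Hypothetical configuration; the field names are illustrative only."""
    step: float = 0.5
    latency: float = 0.5

    @staticmethod
    def from_dict(data: dict) -> "ToyConfig":
        # Rebuild the configuration from plain, picklable values.
        return ToyConfig(**data)


class LazyModel:
    """Hypothetical model that loads its weights on first use."""

    def __init__(self, scale: float):
        self.scale = scale
        self.weights = None  # nothing heavy is held until the first call

    def __call__(self, x: float) -> float:
        if self.weights is None:
            # Stand-in for an expensive weight-loading step.
            self.weights = [self.scale] * 3
        return sum(w * x for w in self.weights)


class ToyPipeline:
    """Hypothetical pipeline that knows its own configuration class."""

    def __init__(self, config: ToyConfig):
        self.config = config
        self.model = LazyModel(scale=config.step)

    @staticmethod
    def get_config_class():
        return ToyConfig


def build_in_worker(pipeline_class, config_dict: dict):
    # What a worker could do: recover the config class, then build locally.
    config = pipeline_class.get_config_class().from_dict(config_dict)
    return pipeline_class(config)


# Simulate the hop to a child process: only a plain dict is serialized.
payload = pickle.dumps(asdict(ToyConfig(step=0.25)))
pipeline = build_in_worker(ToyPipeline, pickle.loads(payload))
```

In this sketch the pipeline built on the receiving side holds no weights until its model is first called, which is the property that keeps objects light when they do have to travel between processes.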

Future improvements and limitations

Optimizer is still not compatible with Parallelize because some progress bars break
Replace tqdm with rich as progress bars in both Benchmark and Optimizer (when not running in parallel)
Spawn segmentation and embedding models as services in separate processes so the GPU memory requirements go down from O(num_workers * model_size) to O(model_size)
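
The last item (models as services) could look roughly like the request/response loop sketched below. This is not part of the PR: it uses threads and queue.Queue as a stand-in for the separate processes mentioned above, and model_service, worker and toy_model are hypothetical names. The point is only that a single model instance answers requests from every worker, so the model is held once rather than once per worker.

```python
import queue
import threading


def model_service(model, inbox: queue.Queue) -> None:
    """Serve inference requests with a single model instance."""
    while True:
        item = inbox.get()
        if item is None:  # sentinel: shut the service down
            break
        x, reply = item
        reply.put(model(x))


def worker(worker_id: int, inbox: queue.Queue, results: dict) -> None:
    """A worker holds no model; it asks the service for each prediction."""
    reply = queue.Queue(maxsize=1)
    inbox.put((worker_id, reply))
    results[worker_id] = reply.get()


def toy_model(x: int) -> int:
    # Stand-in for a segmentation or embedding model.
    return x * 10


inbox = queue.Queue()
service = threading.Thread(target=model_service, args=(toy_model, inbox))
service.start()

results = {}
workers = [
    threading.Thread(target=worker, args=(i, inbox, results)) for i in range(4)
]
for t in workers:
    t.start()
for t in workers:
    t.join()

inbox.put(None)  # stop the service once all workers are done
service.join()
```

A process-based version would need picklable requests and an inter-process queue, and would have to deal with the progress bar and start-up concerns already noted in this PR.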

juanmc2005 added 4 commits March 8, 2023 19:38

Initial multithreading implementation with rich in Benchmark

14d9fed

Remove print statements

d5aef25

Add working multiprocessing in Benchmark using tqdm locks

847a5b1

Add multiprocessing as a 'Parallelize' wrapper. Add docs. Add example in README

da570c9

juanmc2005 added the feature New feature or request label Mar 9, 2023

juanmc2005 added this to the Version 0.7 milestone Mar 9, 2023

juanmc2005 added 2 commits March 10, 2023 15:15

Send configuration object to Benchmark child processes instead of dictionary. Refactoring of diart.models

737c3ac

Refactoring of diart.models to ease custom model usage

cfb2a9f

juanmc2005 merged commit 4b744ed into develop Mar 10, 2023

juanmc2005 deleted the feat/multithread branch March 10, 2023 16:26

juanmc2005 mentioned this pull request Mar 10, 2023

Multithreading in diart.benchmark #85

Closed

juanmc2005 mentioned this pull request Mar 27, 2023

Version 0.7 #139

Merged
