Multithreading in `diart.benchmark` #85

juanmc2005 · 2022-08-31T09:07:42Z

Problem

Running a benchmark on a huge dataset can take a lot of time. One of the main bottlenecks is that files are processed sequentially.

Idea

Make `diart.benchmark` (and hence `diart.tune`) process many files concurrently, with a predefined number of workers.
It would be great if progress bars could be kept; otherwise we need to find another good way to show progress.

Another potential problem is having N copies of the segmentation and embedding models in memory, but since the models are stateless there should be a way to share them. However, I would accept a first version with N models in RAM anyway and think about improvements afterwards.
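
To make the idea concrete, here is a minimal sketch using only the standard library. It is not how diart implements this: `DummyModel`, `process_file` and `run_benchmark` are hypothetical names, and the "model" just counts characters. It only illustrates a fixed number of workers sharing a single stateless model instance instead of holding N copies.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


class DummyModel:
    """Hypothetical stand-in for a stateless segmentation/embedding model."""

    def __call__(self, chunk: str) -> int:
        # No attribute is mutated here, so one instance can serve all workers.
        return len(chunk)


def process_file(path: str, model: Callable[[str], int]) -> int:
    """Placeholder for running the pipeline on one file, chunk by chunk."""
    chunks = [path[i:i + 4] for i in range(0, len(path), 4)]
    return sum(model(chunk) for chunk in chunks)


def run_benchmark(paths: List[str], num_workers: int = 4) -> Dict[str, int]:
    """Process all files with a fixed-size pool of worker threads."""
    model = DummyModel()  # a single shared copy instead of N copies
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # map() preserves input order, so results line up with paths.
        results = pool.map(lambda p: process_file(p, model), paths)
        return dict(zip(paths, results))
```

Sharing one instance like this only works with threads. With processes, each worker would normally load or receive its own copy, which is the "N models in RAM" case mentioned above.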

See RxPY concurrency


juanmc2005 · 2022-09-13T09:24:47Z

For progress bars, see p_tqdm, tqdm with locks

hbredin · 2022-09-13T11:38:40Z

Alternative: rich

juanmc2005 · 2022-09-13T12:03:57Z

There are two options for progress bars:

1. A single bar where 1 iteration = 1 file (p_tqdm, rich)
2. Multiple bars where 1 bar = 1 file, and 1 iteration = 1 chunk/batch (tqdm with locks)

I would accept both but strongly prefer the second.
I'm sure there's also a workaround for rich.
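
The difference between the two options is mostly one of bookkeeping granularity. The sketch below is library-agnostic and uses only the standard library: `ProgressTracker` and `process` are hypothetical names, not part of diart, tqdm or rich. It keeps a per-file chunk counter behind a lock and derives both views from it. With tqdm, the second view would roughly correspond to one bar per file placed with the `position` argument.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


class ProgressTracker:
    """Hypothetical thread-safe counter: one entry per file, one tick per chunk."""

    def __init__(self, chunks_per_file: Dict[str, int]):
        self._lock = threading.Lock()
        self.total = dict(chunks_per_file)
        self.done = {name: 0 for name in chunks_per_file}

    def tick(self, name: str) -> None:
        # Called by a worker each time it finishes a chunk of `name`.
        with self._lock:
            self.done[name] += 1

    def files_finished(self) -> int:
        # Option 1: a single bar where 1 iteration = 1 file.
        with self._lock:
            return sum(self.done[n] == self.total[n] for n in self.total)

    def render(self) -> List[str]:
        # Option 2: one bar per file, where 1 iteration = 1 chunk.
        with self._lock:
            return [f"{n}: {self.done[n]}/{self.total[n]}" for n in self.total]


def process(name: str, tracker: ProgressTracker) -> None:
    """Placeholder worker: pretends to process every chunk of one file."""
    for _ in range(tracker.total[name]):
        tracker.tick(name)


tracker = ProgressTracker({"a.wav": 3, "b.wav": 5})
with ThreadPoolExecutor(max_workers=2) as pool:
    list(pool.map(lambda n: process(n, tracker), tracker.total))
```

The lock here plays the same role as the lock in "tqdm with locks": it keeps concurrent workers from interleaving updates to shared progress state.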

juanmc2005 · 2023-03-09T14:14:28Z

I've been working on this lately.

Rich works well with multithreading, but for some reason it's extremely slow to spawn new workers (maybe because of the GIL?).
When moving to multiprocessing, rich no longer works with multiple bars because the `Progress` instance can't be shared between processes. The only solution I found for this was to use tqdm with locks.

Whenever multiprocessing is not needed, rich is used by default. I'm also implementing it in a way that users can manually choose the progress bar they want.
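
As a rough illustration of that selection rule (not the actual code in #124), the sketch below picks a backend name. `choose_progress_backend` and its parameters are hypothetical, and treating "more than one worker" as "multiprocessing is needed" is an assumption made for the example.

```python
from typing import Optional

SUPPORTED_BACKENDS = ("rich", "tqdm")  # hypothetical identifiers


def choose_progress_backend(num_workers: int, requested: Optional[str] = None) -> str:
    """Return the name of the progress bar backend to use."""
    if requested is not None:
        # An explicit user choice wins; this sketch does not check whether
        # the chosen backend can actually draw multiple bars across processes.
        if requested not in SUPPORTED_BACKENDS:
            raise ValueError(f"Unknown progress bar backend: {requested!r}")
        return requested
    # Assumption: more than one worker means multiprocessing is in use.
    return "tqdm" if num_workers > 1 else "rich"
```

For example, `choose_progress_backend(1)` falls back to rich, while `choose_progress_backend(4)` switches to tqdm unless the user asks for something else.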

juanmc2005 · 2023-03-10T16:37:13Z

Implemented in #124

juanmc2005 added the `feature` label (New feature or request) on Aug 31, 2022

juanmc2005 mentioned this issue on Mar 9, 2023 in Parallel execution of Benchmark #124 (merged)

juanmc2005 added this to the Version 0.7 milestone on Mar 9, 2023

juanmc2005 closed this as completed on Mar 10, 2023

juanmc2005 mentioned this issue on Mar 27, 2023 in Version 0.7 #139 (merged)
