
Suggestion: change the default value of prefetch_factor, and add an argument to set it, to minimize the blocking bottleneck between fetching subjects and generating patches in Queue #1070

Closed
hsyang1222 opened this issue Apr 17, 2023 · 1 comment
Labels
enhancement New feature or request

Comments

@hsyang1222
Contributor

hsyang1222 commented Apr 17, 2023

🚀 Feature
Currently, the Queue generates training data through an internal dataloader that fetches subjects from disk and applies transforms asynchronously with respect to patch generation. The more subjects are prefetched, the fewer and less likely the blocking stalls between subject loading and patch extraction become, which in turn speeds up training. At the same time, however, prefetching more subjects also increases memory consumption. The user should therefore be able to decide how many subjects to preload.
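For context, here is a minimal sketch of how the Queue is typically driven (the path, patch size, and other numbers are placeholders, not values from this issue); note that this interface currently offers no way to control how many subjects the internal subjects loader keeps in flight:

```python
import torch
import torchio as tio

# Placeholder dataset; in practice this would be the user's own subjects.
subjects = [tio.Subject(image=tio.ScalarImage('subject_000.nii.gz'))]
dataset = tio.SubjectsDataset(subjects, transform=tio.RescaleIntensity((0, 1)))

queue = tio.Queue(
    dataset,
    max_length=256,          # maximum number of patches held in the queue
    samples_per_volume=64,   # patches extracted from each loaded subject
    sampler=tio.UniformSampler(patch_size=64),
    num_workers=4,           # workers of the internal subjects dataloader
)

# Outer loader that feeds the model; per the torchio docs it must keep num_workers=0.
patches_loader = torch.utils.data.DataLoader(queue, batch_size=16)
```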

Motivation
It is currently not possible to control the number of subjects to preload. The closest existing option is max_length, but that keyword determines the maximum number of patches stored in the queue, not how many subjects are prefetched.

Pitch
(1) I think the "number of preloaded subjects" should be much larger than the current default value, and (2) users should be able to set this number to their liking.
The 'number of preloaded subjects' is determined by the prefetch_factor of the dataloader created in _get_subjects_iterable; see https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader. It is currently not passed, so it falls back to the default value of 2, which means that only 2 * num_workers subjects are prefetched. Since num_workers is usually 8 or less, at most about 16 subjects are prefetched per GPU.
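To make the mechanism concrete, the snippet below shows the standard DataLoader parameter involved. It is only an illustrative sketch of roughly what the subjects loader built by _get_subjects_iterable looks like, not torchio's actual internals, and `subjects_dataset` stands for the SubjectsDataset wrapped by the Queue:

```python
from torch.utils.data import DataLoader

# prefetch_factor is only honoured when num_workers > 0. With PyTorch's
# default of 2, each worker keeps 2 items in flight, so a loader that
# yields one subject per item preloads 2 * num_workers subjects in total.
subjects_loader = DataLoader(
    subjects_dataset,
    num_workers=8,
    prefetch_factor=2,                  # PyTorch default: 2 * 8 = 16 subjects
    collate_fn=lambda batch: batch[0],  # yield subjects one at a time
    shuffle=True,
)
```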

Alternatives
The only way to achieve this without setting prefetch_factor explicitly is to greatly increase num_workers, so that enough subjects end up being prefetched. However, this is not feasible in low-CPU environments and can cause throttling in high-CPU environments.

Additional context
Let's think about how large we want prefetch_factor to be. First, recall that max_length already controls the memory consumed by the existing queue. Call the memory required to store max_length patches U_mp, and the memory required to store the prefetched subjects U_ps; the total held in memory is U_mp + U_ps. To fill the queue (i.e. to produce max_length patches), we need max_length / samples_per_volume subjects, so U_ps is roughly U_mp / samples_per_volume plus other negligibly small terms. Since samples_per_volume is typically a relatively large value, such as 64, U_ps works out to be small.
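As a quick worked count with illustrative numbers (max_length=256 is taken from the example later in this issue):

```python
max_length = 256          # patches the queue can hold
samples_per_volume = 64   # patches generated per subject

# Subjects whose patches are needed to fill the queue once: 256 / 64 = 4
subjects_to_fill_queue = max_length // samples_per_volume
```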

With these assumptions, prefetching enough subjects to produce max_length patches should not be too memory intensive. Therefore, I propose the following: set the default value of prefetch_factor to max_length / (samples_per_volume * num_workers) instead of the DataLoader default of 2. This prefetches exactly as many subjects as are needed to fill max_length patches.
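A sketch of how that default could be computed inside the queue; the helper name and the clamping to a minimum of 1 are my own additions, not an existing torchio implementation:

```python
import math

def default_prefetch_factor(max_length, samples_per_volume, num_workers):
    """Prefetch just enough subjects to be able to fill max_length patches."""
    if num_workers == 0:
        return None  # DataLoader rejects prefetch_factor when there are no workers
    subjects_needed = math.ceil(max_length / samples_per_volume)
    return max(1, math.ceil(subjects_needed / num_workers))

# e.g. max_length=256,  samples_per_volume=64, num_workers=4 -> 1
#      max_length=1024, samples_per_volume=2,  num_workers=4 -> 128
```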

There should also be a keyword for users who want to adjust the prefetching themselves. However, setting prefetch_factor directly would require reasoning about max_length, samples_per_volume, and num_workers and how they interact. Instead, I propose a prefetch_filling_subject_factor keyword, which sets how many times over the number of subjects needed to fill max_length should be prefetched. For example, with max_length=256 and prefetch_filling_subject_factor=2.0, the queue would prefetch twice the number of subjects needed to fill a 256-patch list. The actual prefetch_factor can then be computed inside the queue from max_length, samples_per_volume, and num_workers.
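A sketch of how the proposed keyword could be translated into the underlying prefetch_factor; the keyword and helper below are part of this proposal, not an existing API:

```python
import math

def prefetch_factor_from_filling_factor(
    prefetch_filling_subject_factor,  # proposed keyword, e.g. 2.0
    max_length,
    samples_per_volume,
    num_workers,
):
    subjects_to_fill = max_length / samples_per_volume
    subjects_to_prefetch = prefetch_filling_subject_factor * subjects_to_fill
    return max(1, math.ceil(subjects_to_prefetch / num_workers))

# e.g. max_length=256, samples_per_volume=64, num_workers=4,
#      prefetch_filling_subject_factor=2.0
#      -> prefetch enough for 8 subjects -> prefetch_factor = 2
```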

I tested this change in a setup where I load and train on 500 subjects of size 512x512x512, and observed roughly a 2x speed-up in training time. Since no subjects are prefetched at the start of the first epoch, the improvement becomes noticeable from the second epoch onward. I should mention, however, that my results may be somewhat atypical, as I only generate 2 patches per subject.

@hsyang1222 hsyang1222 added the enhancement New feature or request label Apr 17, 2023
@hsyang1222 hsyang1222 changed the title from "Suggestions the prefetch_factor in the queue to minimize the blocking-bottleneck between fetching the subject and transforming" to "Suggestions the modifying default value of prefetch_factor and the argument to set it for minimize the blocking-bottleneck between fetch subject and generate patch in Queue" on Apr 17, 2023
@hsyang1222
Contributor Author

The performance increase is inconsistent; I'll do more performance evaluation and then re-open the issue.
