
Suggestion: change the default value of prefetch_factor, and add an argument to set it, to minimize the blocking bottleneck between fetching subjects and generating patches in Queue #1070

Closed
hsyang1222 opened this issue Apr 17, 2023 · 1 comment
Labels
enhancement New feature or request

Comments

@hsyang1222
Contributor

hsyang1222 commented Apr 17, 2023

🚀 Feature
Currently, the Queue generates training data through an internal dataloader that fetches subjects from disk and applies transforms asynchronously with respect to patch generation. The more subjects are prefetched, the fewer and less likely the blocking stalls between subject loading and patch extraction become, which in turn speeds up training. At the same time, however, prefetching more subjects also increases memory consumption. The user should therefore be able to decide how many subjects to preload.
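For context, here is a minimal sketch of how the Queue is typically driven (the path, patch size, and other numbers are placeholders, not values from this issue); note that this interface currently offers no way to control how many subjects the internal subjects loader keeps in flight:

```python
import torch
import torchio as tio

# Placeholder dataset; in practice this would be the user's own subjects.
subjects = [tio.Subject(image=tio.ScalarImage('subject_000.nii.gz'))]
dataset = tio.SubjectsDataset(subjects, transform=tio.RescaleIntensity((0, 1)))

queue = tio.Queue(
    dataset,
    max_length=256,          # maximum number of patches held in the queue
    samples_per_volume=64,   # patches extracted from each loaded subject
    sampler=tio.UniformSampler(patch_size=64),
    num_workers=4,           # workers of the internal subjects dataloader
)

# Outer loader that feeds the model; per the torchio docs it must keep num_workers=0.
patches_loader = torch.utils.data.DataLoader(queue, batch_size=16)
```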

Motivation
It is currently not possible to control the number of subjects to preload. The closest existing option is max_length, but that keyword determines the maximum number of patches stored in the queue, not how many subjects are prefetched.

Pitch
(1) I think the "number of preloaded subjects" should be much larger than the current default value, and (2) users should be able to set this number to their liking.
The 'number of preloaded subjects' is determined by the prefetch_factor of the dataloader created in _get_subjects_iterable; see https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader. It is currently not passed, so it falls back to the default value of 2, which means that only 2 * num_workers subjects are prefetched. Since num_workers is usually 8 or less, at most about 16 subjects are prefetched per GPU.
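To make the mechanism concrete, the snippet below shows the standard DataLoader parameter involved. It is only an illustrative sketch of roughly what the subjects loader built by _get_subjects_iterable looks like, not torchio's actual internals, and `subjects_dataset` stands for the SubjectsDataset wrapped by the Queue:

```python
from torch.utils.data import DataLoader

# prefetch_factor is only honoured when num_workers > 0. With PyTorch's
# default of 2, each worker keeps 2 items in flight, so a loader that
# yields one subject per item preloads 2 * num_workers subjects in total.
subjects_loader = DataLoader(
    subjects_dataset,
    num_workers=8,
    prefetch_factor=2,                  # PyTorch default: 2 * 8 = 16 subjects
    collate_fn=lambda batch: batch[0],  # yield subjects one at a time
    shuffle=True,
)
```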

Alternatives
The only way to achieve this without setting prefetch_factor explicitly is to greatly increase num_workers, so that enough subjects end up being prefetched. However, this is not feasible in low-CPU environments and can cause throttling in high-CPU environments.

Additional context
Let's think about how large we want prefetch_factor to be. First, recall that max_length already controls the memory consumed by the existing queue. Call the memory required to store max_length patches U_mp, and the memory required to store the prefetched subjects U_ps; the total held in memory is U_mp + U_ps. To fill the queue (i.e. to produce max_length patches), we need max_length / samples_per_volume subjects, so U_ps is roughly U_mp / samples_per_volume plus other negligibly small terms. Since samples_per_volume is typically a relatively large value, such as 64, U_ps works out to be small.
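As a quick worked count with illustrative numbers (max_length=256 is taken from the example later in this issue):

```python
max_length = 256          # patches the queue can hold
samples_per_volume = 64   # patches generated per subject

# Subjects whose patches are needed to fill the queue once: 256 / 64 = 4
subjects_to_fill_queue = max_length // samples_per_volume
```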

With these assumptions, prefetching enough subjects to produce max_length patches should not be too memory intensive. Therefore, I propose the following: set the default value of prefetch_factor to max_length / (samples_per_volume * num_workers) instead of the DataLoader default of 2. This prefetches exactly as many subjects as are needed to fill max_length patches.
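A sketch of how that default could be computed inside the queue; the helper name and the clamping to a minimum of 1 are my own additions, not an existing torchio implementation:

```python
import math

def default_prefetch_factor(max_length, samples_per_volume, num_workers):
    """Prefetch just enough subjects to be able to fill max_length patches."""
    if num_workers == 0:
        return None  # DataLoader rejects prefetch_factor when there are no workers
    subjects_needed = math.ceil(max_length / samples_per_volume)
    return max(1, math.ceil(subjects_needed / num_workers))

# e.g. max_length=256,  samples_per_volume=64, num_workers=4 -> 1
#      max_length=1024, samples_per_volume=2,  num_workers=4 -> 128
```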

There should also be a keyword for users who want to adjust the prefetching themselves. However, setting prefetch_factor directly would require reasoning about max_length, samples_per_volume, and num_workers and how they interact. Instead, I propose a prefetch_filling_subject_factor keyword, which sets how many times over the number of subjects needed to fill max_length should be prefetched. For example, with max_length=256 and prefetch_filling_subject_factor=2.0, the queue would prefetch twice the number of subjects needed to fill a 256-patch list. The actual prefetch_factor can then be computed inside the queue from max_length, samples_per_volume, and num_workers.
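A sketch of how the proposed keyword could be translated into the underlying prefetch_factor; the keyword and helper below are part of this proposal, not an existing API:

```python
import math

def prefetch_factor_from_filling_factor(
    prefetch_filling_subject_factor,  # proposed keyword, e.g. 2.0
    max_length,
    samples_per_volume,
    num_workers,
):
    subjects_to_fill = max_length / samples_per_volume
    subjects_to_prefetch = prefetch_filling_subject_factor * subjects_to_fill
    return max(1, math.ceil(subjects_to_prefetch / num_workers))

# e.g. max_length=256, samples_per_volume=64, num_workers=4,
#      prefetch_filling_subject_factor=2.0
#      -> prefetch enough for 8 subjects -> prefetch_factor = 2
```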

I tested this change in a setup where I load and train on 500 subjects of size 512x512x512, and observed roughly a 2x speed-up in training time. Since no subjects are prefetched at the start of the first epoch, the improvement becomes noticeable from the second epoch onward. I should mention, however, that my results may be somewhat atypical, as I only generate 2 patches per subject.

@hsyang1222 hsyang1222 added the enhancement New feature or request label Apr 17, 2023
@hsyang1222 hsyang1222 changed the title from "Suggestions the prefetch_factor in the queue to minimize the blocking-bottleneck between fetching the subject and transforming" to "Suggestions the modifying default value of prefetch_factor and the argument to set it for minimize the blocking-bottleneck between fetch subject and generate patch in Queue" on Apr 17, 2023
@hsyang1222
Contributor Author

The performance increase is inconsistent; I'll do more performance evaluation and then re-open the issue.
