-
Notifications
You must be signed in to change notification settings - Fork 151
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement InProcessReadingService #1139
Conversation
@ejguan has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
worker_init_fn: Optional[Callable[[DataPipe, WorkerInfo], DataPipe]] = None, | ||
worker_reset_fn: Optional[Callable[[DataPipe, WorkerInfo, SeedGenerator], DataPipe]] = None, | ||
): | ||
if num_workers == 0: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm curious if num_workers == 1
, what is the benefit of using this instead of SingleProcessRS
? Will it actually be faster (probably slower)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using num_workers=1
would create a single worker process that is dedicated to loading data. And, the main process would work on the model training. The use case might be preventing the data part drain the CPU resource.
multiprocessing_context: Optional[str] = None, | ||
worker_prefetch_cnt: int = 10, | ||
main_prefetch_cnt: int = 10, | ||
worker_init_fn: Optional[Callable[[DataPipe, WorkerInfo], DataPipe]] = None, | ||
worker_reset_fn: Optional[Callable[[DataPipe, WorkerInfo, SeedGenerator], DataPipe]] = None, | ||
) -> None: | ||
assert num_workers > 0, "Please use `InProcessReadingService` for num_workers=0" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am making a BC-breaking change here. MPRS
won't accept num_workers=0
anymore.
In previous commits, I have been trying to add a __new__
to return InProcessReadingService
when num_workers=0
. However, it doesn't work well with pickling
due to clone and mp.
@ejguan has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
test/dataloader2/test_random.py
Outdated
@@ -26,15 +26,15 @@ def _random_fn(data): | |||
Used to validate the randomness of subprocess-local RNGs are set deterministically. | |||
""" | |||
py_random_num = random.randint(0, 2 ** 32) | |||
np_random_num = np.random.randint(0, 2 ** 32) | |||
np_random_num = np.random.randint(0, 2 ** 32 - 1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Somehow this problem wasn't revealed until this PR
@ejguan has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! LGTM! Let's also add it to this documentation table here:
https://pytorch.org/data/beta/dataloader2.html#readingservice
@ejguan has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
1 similar comment
@ejguan has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
if hasattr(dp, "resume") and callable(dp.resume): | ||
dp.resume() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I need to skip QueueWrapper
here as well.
@ejguan has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
@ejguan has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
Fixes #1107
Fixes #720
Fixes #616
Changes
InProcessReadingService
(Willing to take any suggestion on naming)MakeInProcessReadingService
as the defaultreading_service
toDataLoader2
.Then,reading_service
always has a value, and remove the logic ofreading_service
is None.MultiProcessingReadingService
num_workers=0
, raise a warning