[DataLoader] Fix Windows worker exit detection, fix test_proper_exit #15665
Currently, in `test_proper_exit`:

1. `pid` in the `kill_pid` function [...]
   pytorch/test/test_dataloader.py, lines 325 to 329 in fe15d6a
   pytorch/test/test_dataloader.py, lines 641 to 646 in fe15d6a
2. The `worker_error` and `worker_kill` cases (sometimes?) are not tested, because the workers may exit naturally due to the pre-fetching mechanism and a too-small `dataset size / batch size`.

In this PR, I, in separate commits:
1. Install `psutil` (a Python package specifically built for process monitoring) on some CI builds. (Installation on the Linux builds is done in "Install psutil for dataloader tests" pietern/pytorch-dockerfiles#29, "install psutil and librosa on non-conda builds" pietern/pytorch-dockerfiles#30, "Update pytorch docker version to 278" ossci-job-dsl#36, and "Bump CircleCI docker version to 278" #15795.)
2. Rewrite `test_proper_exit` with `psutil` so we [...] `is_process_alive` (pytorch/test/test_dataloader.py, lines 640 to 653 in fe15d6a) [...] `worker_error` and `worker_kill` properly trigger.
3. Fix Windows data loader not having any mechanism to detect worker failures.
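
To illustrate the kind of worker-failure detection the last item refers to, here is a minimal sketch of one portable approach: poll the data queue with a timeout and check worker liveness whenever the poll comes back empty. This is an assumption about the general technique, not the code in this PR; `get_batch_or_fail`, `data_queue`, `workers`, and `poll_interval` are hypothetical names chosen for the example.

```python
import queue


def get_batch_or_fail(data_queue, workers, poll_interval=0.1):
    """Return the next item from data_queue, or raise if a worker has died.

    Hypothetical helper, not part of the PyTorch API. `workers` is any
    sequence of objects exposing `is_alive()`, for example
    `multiprocessing.Process` instances. `data_queue` may be a
    `queue.Queue` or a `multiprocessing.Queue`; both raise `queue.Empty`
    when a timed `get` expires.
    """
    while True:
        try:
            # Wake up periodically instead of blocking forever, so a dead
            # worker cannot hang the consumer.
            return data_queue.get(timeout=poll_interval)
        except queue.Empty:
            dead = [w for w in workers if not w.is_alive()]
            if not dead:
                continue  # everyone is still running; keep waiting
            # A worker may have enqueued its last item just before
            # exiting, so drain once more before reporting a failure.
            try:
                return data_queue.get_nowait()
            except queue.Empty:
                raise RuntimeError(
                    "{} worker(s) exited unexpectedly".format(len(dead))
                ) from None
```

Because the check relies only on `is_alive()` and a timed `get`, it does not depend on POSIX signals, which is why a polling design of this shape can also work on Windows. Note that this sketch treats any exited worker as a failure once the queue is empty, so a real loader would also need to track workers that finished normally.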