ejguan commented Aug 23, 2021

Fixes #63657

Stack from ghstack:

Prevent the test from freezing in CI.

Confirmed that this test is flaky because of the limited number of CPUs per machine. I was able to periodically reproduce the hanging workers with an ASAN build of PyTorch.

After this PR, the number of workers will be limited based on the OS.
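
As a rough illustration of limiting the worker count by what the machine offers, here is a minimal sketch. It is not the code from this PR: the helper name `max_usable_workers` and its `default` parameter are hypothetical, and the only assumption about the platform is that `os.sched_getaffinity` exists on some systems (such as Linux) and not on others.

```python
import os


def max_usable_workers(default=1):
    """Hypothetical helper: estimate how many worker processes this machine can run."""
    # Where available (e.g. Linux), sched_getaffinity reports the CPUs this
    # process may actually use, which can be fewer than the machine total
    # (containers, taskset, restricted CI runners).
    if hasattr(os, "sched_getaffinity"):
        try:
            return max(default, len(os.sched_getaffinity(0)))
        except OSError:
            pass
    # Elsewhere, fall back to the total CPU count, which may be None.
    cpu_count = os.cpu_count()
    if cpu_count is None:
        return default
    return max(default, cpu_count)
```

A test could then cap its `num_workers` values at `max_usable_workers()` so that it never asks for more worker processes than the machine has CPUs.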

Differential Revision: D30494185

cc @ssnl @VitalyFedyunin @ejguan

facebook-github-bot commented Aug 23, 2021

💊 CI failures summary and remediations

As of commit 12a23e2 (more details on the Dr. CI page):


💚 💚 Looks good so far! There are no failures yet. 💚 💚


ejguan linked an issue Aug 23, 2021 that may be closed by this pull request
ejguan added the ci/master and module: dataloader (Related to torch.utils.data.DataLoader and Sampler) labels and removed the ci/master label Aug 23, 2021
ejguan added a commit that referenced this pull request Aug 23, 2021
ejguan commented Aug 23, 2021

@ejguan has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

ejguan added a commit that referenced this pull request Aug 24, 2021
ejguan commented Aug 24, 2021

@ejguan has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

ejguan added a commit that referenced this pull request Aug 25, 2021
ejguan commented Aug 25, 2021

@ejguan has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

…on system resource"

Fixes #63657


Prevent the test from freezing in CI.

Confirmed that this test is flaky because of the limited number of CPUs per machine. I was able to periodically reproduce the hanging workers with an ASAN build of PyTorch.

After this PR, the number of workers will be limited based on the OS.


Differential Revision: [D30494185](https://our.internmc.facebook.com/intern/diff/D30494185)

[ghstack-poisoned]
ejguan added a commit that referenced this pull request Aug 26, 2021
ejguan commented Aug 26, 2021

@ejguan has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

for batch_size in (8, 16, 32, 64):
    for num_workers in range(1, 6):
nit: min(6, max_num_workers)
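
One possible reading of this suggestion is sketched below. The helpers `worker_counts` and `iter_configs` are hypothetical names, `max_num_workers` is assumed to be a positive integer computed elsewhere, and the `+ 1` is an interpretation (not something stated in the review) that keeps the cap inclusive so a single-CPU machine still gets one configuration.

```python
def worker_counts(max_num_workers, hard_cap=6):
    """Hypothetical helper: worker counts to try, never exceeding max_num_workers."""
    # range() excludes its stop value, so stop at max_num_workers + 1
    # to include the cap itself; hard_cap keeps the original range(1, 6) limit.
    return list(range(1, min(hard_cap, max_num_workers + 1)))


def iter_configs(max_num_workers):
    """Yield (batch_size, num_workers) pairs in the same order as the nested loops."""
    for batch_size in (8, 16, 32, 64):
        for num_workers in worker_counts(max_num_workers):
            yield batch_size, num_workers
```

With five or more usable CPUs this covers the same 1 to 5 worker counts as the original `range(1, 6)`; with fewer, the inner loop stops early.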

ejguan added a commit that referenced this pull request Sep 2, 2021
ejguan commented Sep 2, 2021

@ejguan has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot

@ejguan merged this pull request in 3cd0a4a.

facebook-github-bot deleted the gh/ejguan/82/head branch September 6, 2021 14:19
Labels
cla signed, Merged, module: dataloader (Related to torch.utils.data.DataLoader and Sampler)
Development

Successfully merging this pull request may close these issues.

DISABLED test_ind_worker_queue (__main__.TestIndividualWorkerQueue)
3 participants