Multiprocessing support for NT #110292

Closed: jbschlosser wants to merge 6 commits

@jbschlosser jbschlosser commented Sep 29, 2023

Stack from ghstack (oldest at bottom):

Fixes #110161

Allows nested tensors (NTs) to be used in DataLoaders with `num_workers > 1`.

@pytorch-bot pytorch-bot bot added the release notes: dataloader release notes category label Sep 29, 2023
pytorch-bot bot commented Sep 29, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/110292

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (2 Unrelated Failures)

As of commit a514685 with merge base 46a5558:

UNSTABLE - The following jobs failed, but the failures were likely due to flakiness present on trunk, and the jobs have been marked as unstable:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

jbschlosser added a commit that referenced this pull request Sep 29, 2023
ghstack-source-id: 710581dcefd934ef187785e1fa396a0f0927e488
Pull Request resolved: #110292
@jbschlosser jbschlosser added topic: improvements topic category release notes: nested tensor Changes that have a direct impact on nested tensors and removed release notes: dataloader release notes category labels Sep 29, 2023
@jbschlosser
Contributor Author

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Sep 29, 2023
@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: Check the merge workflow status here

@pytorchmergebot
Collaborator

Merge failed

Reason: 1 jobs have failed, first few of them are: trunk / macos-12-py3-arm64 / test (default, 1, 3, macos-m1-12)

Details for Dev Infra team Raised by workflow job

@cpuhrsch
Contributor

Failures look real, unfortunately, @jbschlosser

jbschlosser added a commit that referenced this pull request Oct 2, 2023
ghstack-source-id: 8aae0acad796757c7f9f5aad6379c56ad901a1dd
Pull Request resolved: #110292
@jbschlosser
Contributor Author

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

pytorch-bot bot commented Oct 2, 2023

❌ 🤖 pytorchbot command failed:

@pytorchbot revert: error: the following arguments are required: -m/--message, -c/--classification

usage: @pytorchbot revert -m MESSAGE -c
                          {nosignal,ignoredsignal,landrace,weird,ghfirst}

Try @pytorchbot --help for more info.

pytorch-bot bot commented Oct 2, 2023

❌ 🤖 pytorchbot command failed:

@pytorchbot revert: error: the following arguments are required: -c/--classification

usage: @pytorchbot revert -m MESSAGE -c
                          {nosignal,ignoredsignal,landrace,weird,ghfirst}

Try @pytorchbot --help for more info.
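
The two failures above have the shape of a standard missing-required-argument error. As a rough illustration only (this is not pytorchbot's actual source; the names `build_revert_parser`, `parse_revert`, and `CommandError` are hypothetical), a minimal Python `argparse` sketch with the same two required flags and the same classification choices behaves like this:

```python
import argparse
import shlex


class CommandError(Exception):
    """Raised instead of exiting when a command fails to parse (hypothetical)."""


class _RaisingParser(argparse.ArgumentParser):
    def error(self, message):
        # argparse normally prints usage and exits the process;
        # raising lets a caller turn the message into a reply instead.
        raise CommandError(message)


def build_revert_parser():
    # Flag names and choices are copied from the usage text quoted above;
    # everything else here is an assumption made for illustration.
    parser = _RaisingParser(prog="@pytorchbot revert", add_help=False)
    parser.add_argument("-m", "--message", required=True)
    parser.add_argument(
        "-c",
        "--classification",
        required=True,
        choices=["nosignal", "ignoredsignal", "landrace", "weird", "ghfirst"],
    )
    return parser


def parse_revert(argument_text):
    # shlex.split keeps a quoted message such as "Address review comments"
    # together as a single argument.
    return build_revert_parser().parse_args(shlex.split(argument_text))
```

Under this sketch, `parse_revert("")` raises `CommandError` naming both `-m/--message` and `-c/--classification`, a call with only `-m` names just `-c/--classification`, and a call that supplies both flags parses cleanly. That is consistent with the two error comments here and with the revert command that eventually succeeded.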

@jbschlosser
Contributor Author

@pytorchbot revert -m "Address review comments" -c "weird"

@pytorchmergebot
Collaborator

@pytorchbot successfully started a revert job. Check the current status here.

@jbschlosser
Contributor Author

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

@pytorchmergebot
Collaborator

Merge failed

Reason: 1 jobs have failed, first few of them are: trunk / macos-12-py3-arm64 / build

@jbschlosser
Contributor Author

@pytorchbot merge -f "ignore spurious failure"

@pytorchmergebot
Collaborator

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

@kit1980
Member

kit1980 commented Oct 6, 2023

@pytorchbot revert -m "Causes CUDA memory leaks" -c nosignal

RuntimeError: CUDA driver API confirmed a leak in `__main__.TestDataLoaderDeviceTypeCUDA.test_nested_tensor_multiprocessing_context_forkserver_cuda`! Caching allocator allocated memory was 5120 and is now reported as 10240 on device 0. CUDA driver allocated memory was 340459520 and is now 342556672.

https://github.com/pytorch/pytorch/actions/runs/6425541384/job/17449001020

@pytorchmergebot
Collaborator

@pytorchbot successfully started a revert job. Check the current status here.

@pytorchmergebot
Collaborator

@jbschlosser your PR has been successfully reverted.

pytorchmergebot added a commit that referenced this pull request Oct 6, 2023
This reverts commit f17fe89.

Reverted #110292 on behalf of https://github.com/kit1980 due to "Causes CUDA memory leaks".
@jbschlosser jbschlosser reopened this Oct 6, 2023
@jbschlosser
Contributor Author

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

pytorchmergebot pushed a commit that referenced this pull request Oct 10, 2023
@facebook-github-bot facebook-github-bot deleted the gh/jbschlosser/91/head branch October 14, 2023 14:24
Labels: ciflow/trunk, Merged, release notes: nested tensor, Reverted, topic: improvements
5 participants