[DDP] unittest for when params arent used in backward pass #52384
Closed
Conversation
Adds a simple unit test (via `unittest`) that we can modify once we enable DDP backward passes that do not require all parameters to receive gradients.

Differential Revision: [D26482479](https://our.internmc.facebook.com/intern/diff/D26482479/)

[ghstack-poisoned]
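For reference, a minimal sketch of the kind of unit test described above, not the actual test added in this PR: `ToyModel`, the port number, and the single-process `gloo` group are illustrative assumptions, including the assumption that a single-process group is enough to exercise DDP's all-gradients-ready check.

```python
# A minimal sketch, not the test added in this PR. Assumes a single-process
# "gloo" group is enough to exercise DDP's check that every parameter
# received a gradient; ToyModel and the port number are illustrative.
import os
import unittest

import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP


class ToyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.net1 = nn.Linear(10, 10)
        self.net2 = nn.Linear(10, 10)

    def forward(self, x):
        # Both submodules run in forward, but the caller may only feed the
        # first output into the loss, so net2's parameters get no gradient.
        return self.net1(x), self.net2(x)


class DdpUnusedGradTest(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
        os.environ.setdefault("MASTER_PORT", "29500")
        dist.init_process_group("gloo", rank=0, world_size=1)

    @classmethod
    def tearDownClass(cls):
        dist.destroy_process_group()

    def test_not_all_params_get_grad_in_backward(self):
        model = DDP(ToyModel())
        inp = torch.randn(4, 10)
        out1, _unused_out = model(inp)
        out1.sum().backward()  # net2 never receives a gradient
        # Current behavior: the reducer notices the missing gradients when
        # the next iteration starts. Relax this once DDP backward no longer
        # requires all parameters to receive gradients.
        with self.assertRaises(RuntimeError):
            model(inp)


if __name__ == "__main__":
    unittest.main()
```

Once DDP backward no longer requires every parameter to receive a gradient, the `assertRaises` block can be replaced with an assertion that the second iteration succeeds.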
💊 CI failures summary and remediations: as of commit a427afe, 1 job failed on ci.pytorch.org (more details on the Dr. CI page). This comment was automatically generated by Dr. CI.
rohan-varma added a commit that referenced this pull request on Feb 17, 2021
Pull Request resolved: #52384

Adds a simple unit test (via `unittest`) that we can modify once we enable DDP backward passes that do not require all parameters to receive gradients.

ghstack-source-id: 121895039
Differential Revision: [D26482479](https://our.internmc.facebook.com/intern/diff/D26482479/)
zhaojuanmao approved these changes on Feb 17, 2021
rohan-varma added a commit that referenced this pull request on Feb 17, 2021
… and not all outputs used in loss computation"

Stack:
* #52385 [DDP] Enhance warning for find_unused_params
* #52384 [DDP] unittest for when params arent used in backward pass

There are two ways DDP can throw the exception refactored here:

1) Unused params in the forward pass. We provide `find_unused_parameters=True` for this.
2) All params are used in the forward pass, but not all outputs are used in the loss computation. There are a few workarounds for this, but we do not provide native support.

Previously these two issues were combined into one error message, which has historically caused confusion, with users reporting the error even when they enable `find_unused_parameters=True` (which they expect to fix it). As a result there is additional churn when debugging these issues, because the true cause, (1) vs. (2), is not known. This commit separates the two error messages depending on whether we ran with unused parameter detection, which should make the error much clearer and more actionable.

Error message with `find_unused_parameters=True`:

```
RuntimeError: Expected to have finished reduction in the prior iteration before starting a new one. This error indicates that your module has parameters that were not used in producing loss. Since `find_unused_parameters=True` is enabled, this likely means that not all `forward` outputs participate in computing loss. You can fix this by making sure all `forward` function outputs participate in calculating loss. If you already have done the above, then the distributed data parallel module wasn't able to locate the output tensors in the return value of your module's `forward` function. Please include the loss function and the structure of the return value of `forward` of your module when reporting this issue (e.g. list, dict, iterable).
```

Error message without `find_unused_parameters` specified:

```
RuntimeError: Expected to have finished reduction in the prior iteration before starting a new one. This error indicates that your module has parameters that were not used in producing loss. You can enable unused parameter detection by passing the keyword argument `find_unused_parameters=True` to `torch.nn.parallel.DistributedDataParallel`, and by making sure all `forward` function outputs participate in calculating loss. If you already have done the above, then the distributed data parallel module wasn't able to locate the output tensors in the return value of your module's `forward` function. Please include the loss function and the structure of the return value of `forward` of your module when reporting this issue (e.g. list, dict, iterable).
```

Differential Revision: [D26496688](https://our.internmc.facebook.com/intern/diff/D26496688/)

[ghstack-poisoned]
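For context, a minimal sketch (not code from this commit) of the two situations being distinguished; `ToyModel`, the tensor shapes, and the single-process `gloo` group are illustrative assumptions.

```python
# Illustrative sketch of the two failure modes described above; ToyModel and
# the single-process "gloo" group are assumptions, not code from this commit.
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP


class ToyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.used = nn.Linear(8, 8)
        self.maybe_unused = nn.Linear(8, 8)

    def forward(self, x, skip_second):
        if skip_second:
            # Case (1): maybe_unused never participates in forward at all.
            return self.used(x), None
        # Case (2): both parameters participate in forward, but the caller
        # may use only the first output when computing the loss.
        return self.used(x), self.maybe_unused(x)


def run_case_one():
    # find_unused_parameters=True is designed for case (1) and lets the
    # backward pass complete even though maybe_unused was skipped in forward.
    model = DDP(ToyModel(), find_unused_parameters=True)
    out, _ = model(torch.randn(2, 8), skip_second=True)
    out.sum().backward()


if __name__ == "__main__":
    dist.init_process_group(
        "gloo", init_method="tcp://127.0.0.1:29501", rank=0, world_size=1
    )
    run_case_one()
    dist.destroy_process_group()
```

In case (2) the parameter does participate in `forward`, so unused-parameter detection does not help; that is the scenario covered by the second, separate error message introduced here.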
Stack:
* #52385 [DDP] Enhance warning for find_unused_params
* **#52384 [DDP] unittest for when params arent used in backward pass**

Adds a simple unit test (via `unittest`) that we can modify once we enable DDP backward passes that do not require all parameters to receive gradients.

Differential Revision: [D26482479](https://our.internmc.facebook.com/intern/diff/D26482479/)

[ghstack-poisoned]
rohan-varma added a commit that referenced this pull request on Feb 18, 2021
Stack:
* **#52385 [DDP] Enhance warning for find_unused_params**
* #52384 [DDP] unittest for when params arent used in backward pass

This warning should specify that we did not find unused params in the _forward_ pass, which is when we log this warning. This avoids confusion with the error raised when not all outputs are used to compute the loss, which also complains about unused parameters (to be fixed in the next diff).

Differential Revision: [D26494136](https://our.internmc.facebook.com/intern/diff/D26494136/)

[ghstack-poisoned]
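As an illustration (not code from this commit), the clarified warning concerns the case where `find_unused_parameters=True` is passed even though every parameter is used in the forward pass; the single-process `gloo` setup below is only an assumption to make the sketch self-contained.

```python
# Sketch only, not code from this commit: every parameter of the wrapped
# module is used in forward, yet find_unused_parameters=True is passed, which
# is the situation the clarified warning is about (detection was enabled but
# nothing unused was found in the forward pass).
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

if __name__ == "__main__":
    dist.init_process_group(
        "gloo", init_method="tcp://127.0.0.1:29502", rank=0, world_size=1
    )
    model = DDP(nn.Linear(16, 4), find_unused_parameters=True)
    out = model(torch.randn(2, 16))
    out.sum().backward()  # every parameter gets a gradient
    dist.destroy_process_group()
```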
rohan-varma added a commit that referenced this pull request on Feb 19, 2021
… and not all outputs used in loss computation"

Differential Revision: [D26496688](https://our.internmc.facebook.com/intern/diff/D26496688/)
This pull request has been merged in c29e279.
rohan-varma added a commit that referenced this pull request on Feb 25, 2021
Stack:
* #52385 [DDP] Enhance warning for find_unused_params
* #52384 [DDP] unittest for when params arent used in backward pass

Adds an API `get_debug_mode` that can be used by the distributed package and by users to retrieve the debug mode. There are no functionality changes yet; this lands the bare-bones function so that relevant debug-mode logging can be added in follow-up diffs. Detailed docs will be added once the functionality is enabled.

Differential Revision: [D26508972](https://our.internmc.facebook.com/intern/diff/D26508972/)

[ghstack-poisoned]
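The exact shape of the `get_debug_mode` API is not shown in this thread, so the following is a purely hypothetical sketch of an environment-variable-driven getter; `DistDebugMode` and the `DIST_DEBUG` variable are invented names, not the PyTorch API.

```python
# Hypothetical illustration only: the commit adds a `get_debug_mode` API, but
# its exact name, values, and configuration mechanism are not shown here, so
# this sketch invents DistDebugMode and DIST_DEBUG to show the general shape.
import enum
import os


class DistDebugMode(enum.Enum):  # hypothetical enum of debug levels
    OFF = "OFF"
    INFO = "INFO"
    DETAIL = "DETAIL"


def get_debug_mode() -> DistDebugMode:
    """Read the debug level from an (assumed) environment variable."""
    value = os.environ.get("DIST_DEBUG", "OFF").upper()
    try:
        return DistDebugMode(value)
    except ValueError:
        return DistDebugMode.OFF


if __name__ == "__main__":
    if get_debug_mode() is not DistDebugMode.OFF:
        print("distributed debug logging enabled:", get_debug_mode().value)
```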
aocsa pushed a commit to Quansight/pytorch that referenced this pull request on Mar 15, 2021
…2384)

Summary:
Pull Request resolved: pytorch#52384

Adds a simple unit test (via `unittest`) that we can modify once we enable DDP backward passes that do not require all parameters to receive gradients.

ghstack-source-id: 122001930
Test Plan: CI
Reviewed By: zhaojuanmao
Differential Revision: D26482479
fbshipit-source-id: c80bdeea7cf9db35390e385084ef28d64ed239eb
xsacha pushed a commit to xsacha/pytorch that referenced this pull request on Mar 31, 2021
…2384)

Pull Request resolved: pytorch#52384
Differential Revision: D26482479
Stack from ghstack:
* #52385 [DDP] Enhance warning for find_unused_params
* **#52384 [DDP] unittest for when params arent used in backward pass**
Adds a simple unit test (via `unittest`) that we can modify once we enable DDP backward passes that do not require all parameters to receive gradients.
Differential Revision: D26482479