Conversation

rohan-varma (Contributor) commented Feb 17, 2021

Stack from ghstack:

Adds a simple unit test (using unittest) that we can extend once we enable DDP backward without requiring every parameter to receive a gradient.

Differential Revision: [D26482479](https://our.internmc.facebook.com/intern/diff/D26482479/)

[ghstack-poisoned]
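For reference, a minimal sketch (not taken from this diff) of the kind of unittest-based test described above: a DDP-wrapped model with a layer that is skipped in forward, so its parameters never receive gradients. The module, test name, port, and single-process gloo setup are illustrative assumptions.

```
# Hypothetical sketch only; names and the single-process setup are illustrative.
import os
import unittest

import torch
import torch.distributed as dist
import torch.nn as nn


class TestDDPUnusedGrads(unittest.TestCase):
    def setUp(self):
        # Single-process "gloo" group so the sketch runs standalone on CPU.
        os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
        os.environ.setdefault("MASTER_PORT", "29501")
        dist.init_process_group("gloo", rank=0, world_size=1)

    def tearDown(self):
        dist.destroy_process_group()

    def test_not_all_params_get_grad(self):
        class Net(nn.Module):
            def __init__(self):
                super().__init__()
                self.used = nn.Linear(4, 4)
                self.unused = nn.Linear(4, 4)  # never called in forward

            def forward(self, x):
                return self.used(x)

        model = nn.parallel.DistributedDataParallel(
            Net(), find_unused_parameters=True
        )
        model(torch.randn(2, 4)).sum().backward()

        # The used layer gets a gradient; the skipped layer gets none (or an
        # all-zero one, depending on the DDP version) -- the behavior such a
        # test would be updated for once DDP no longer requires every
        # parameter to receive a gradient.
        self.assertIsNotNone(model.module.used.weight.grad)
        skipped = model.module.unused.weight.grad
        self.assertTrue(skipped is None or skipped.abs().sum() == 0)


if __name__ == "__main__":
    unittest.main()
```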
facebook-github-bot (Contributor) commented Feb 17, 2021

💊 CI failures summary and remediations

As of commit a427afe (more details on the Dr. CI page):


  • 2/2 failures possibly* introduced in this PR
    • 2/2 non-scanned failure(s)

ci.pytorch.org: 1 failed


This comment was automatically generated by Dr. CI.

Please report bugs/suggestions to the (internal) Dr. CI Users group.

facebook-github-bot added the cla signed and oncall: distributed (Add this issue/PR to distributed oncall triage queue) labels Feb 17, 2021
rohan-varma added a commit that referenced this pull request Feb 17, 2021
Pull Request resolved: #52384

Adds a simple unit test (using unittest) that we can extend once we enable DDP backward without requiring every parameter to receive a gradient.
ghstack-source-id: 121895039

Differential Revision: [D26482479](https://our.internmc.facebook.com/intern/diff/D26482479/)
rohan-varma added a commit that referenced this pull request Feb 17, 2021
… and not all outputs used in loss computation

* #52385 [DDP] Enhance warning for find_unused_params
* #52384 [DDP] unittest for when params arent used in backward pass

There are 2 ways DDP can throw the exception refactored here -
1) Unused params in the forward pass. We provide `find_unused_parameters=True` for this.
2) All params used in fwd pass, but not all outputs used in loss computation. There are a few workarounds for this but we do not provide native support.

Previously, these 2 issues shared a single error message, which has historically caused confusion: users report hitting this error even after enabling `find_unused_parameters=True`, which they expect to fix it. This adds churn when debugging, because the true cause, (1) vs (2), is not known.

This commit addresses that by emitting separate error messages depending on whether we ran with unused parameter detection. This should make the error much clearer and more actionable.

error msg with `find_unused_params=True`:
```
RuntimeError: Expected to have finished reduction in the prior iteration before starting a new one. This error indicates that your module has parameters that were not used in producing loss. Since `find_unused_parameters=True` is enabled, this likely  means that not all `forward` outputs participate in computing loss. You can fix this by making sure all `forward` function outputs participate in calculating loss.
If you already have done the above, then the distributed data parallel module wasn't able to locate the output tensors in the return value of your module's `forward` function. Please include the loss function and the structure of the return value of `forward` of your module when reporting this issue (e.g. list, dict, iterable).
```
error msg without `find_unused_params` specified:
```
RuntimeError: Expected to have finished reduction in the prior iteration before starting a new one. This error indicates that your module has parameters that were not used in producing loss. You can enable unused parameter detection by passing the keyword argument `find_unused_parameters=True` to `torch.nn.parallel.DistributedDataParallel`, and by
making sure all `forward` function outputs participate in calculating loss.
If you already have done the above, then the distributed data parallel module wasn't able to locate the output tensors in the return value of your module's `forward` function. Please include the loss function and the structure of the return value of `forward` of your module when reporting this issue (e.g. list, dict, iterable).
```

Differential Revision: [D26496688](https://our.internmc.facebook.com/intern/diff/D26496688/)

[ghstack-poisoned]
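To make case (2) concrete, a small hypothetical sketch: every parameter runs in forward, but only one of the two outputs feeds the loss, so part of the model never receives gradients and the second error quoted above would be raised at the start of the next iteration. The module and single-process gloo setup are assumptions, not code from this stack.

```
# Illustrative sketch of case (2); module and setup are assumed, not from this stack.
import os
import torch
import torch.distributed as dist
import torch.nn as nn

os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29502")
dist.init_process_group("gloo", rank=0, world_size=1)


class TwoHeadNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = nn.Linear(8, 8)
        self.head_a = nn.Linear(8, 1)
        self.head_b = nn.Linear(8, 1)  # runs in forward, but its output is dropped below

    def forward(self, x):
        h = torch.relu(self.backbone(x))
        return self.head_a(h), self.head_b(h)


model = nn.parallel.DistributedDataParallel(TwoHeadNet())
out_a, _out_b = model(torch.randn(4, 8))

# Every parameter was used in forward, yet only out_a contributes to the loss,
# so head_b's parameters never get gradients and their gradient bucket never
# finishes reduction. DDP reports this when the next forward starts, and
# find_unused_parameters=True does not help here -- which is why the two error
# messages are being split.
loss = out_a.sum()
loss.backward()

dist.destroy_process_group()
```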
rohan-varma added a commit that referenced this pull request Feb 18, 2021
… used in loss computation
* **#52385 [DDP] Enhance warning for find_unused_params**
* #52384 [DDP] unittest for when params arent used in backward pass

This warning should specify that we did not find unused params in the
_forward_ pass, since that is when we log it. This avoids confusion when an
error is raised because not all outputs were used to compute the loss, which
also complains about unused parameters (to be fixed in the next diff).

Differential Revision: [D26494136](https://our.internmc.facebook.com/intern/diff/D26494136/)

[ghstack-poisoned]
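For context on when the flag applies, a hypothetical sketch of case (1), the situation `find_unused_parameters=True` is designed for: a submodule that is skipped in forward. Module names and the single-process gloo setup are assumptions.

```
# Illustrative sketch of case (1); names and setup are assumed.
import os
import torch
import torch.distributed as dist
import torch.nn as nn

os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29503")
dist.init_process_group("gloo", rank=0, world_size=1)


class BranchyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.trunk = nn.Linear(4, 4)
        self.extra = nn.Linear(4, 4)  # only runs when use_extra=True

    def forward(self, x, use_extra=False):
        h = self.trunk(x)
        return self.extra(h) if use_extra else h


model = nn.parallel.DistributedDataParallel(
    BranchyNet(), find_unused_parameters=True
)

# `extra` is skipped this iteration, so its parameters see no gradients.
# find_unused_parameters=True lets the reducer mark them as ready anyway;
# the warning discussed above fires only when the flag is set but no unused
# parameters are actually found in the forward pass.
model(torch.randn(2, 4), use_extra=False).sum().backward()

dist.destroy_process_group()
```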
facebook-github-bot (Contributor) commented:

This pull request has been merged in c29e279.

@facebook-github-bot facebook-github-bot deleted the gh/rohan-varma/237/head branch February 22, 2021 15:17
rohan-varma added a commit that referenced this pull request Feb 25, 2021
… used in loss computation
* #52385 [DDP] Enhance warning for find_unused_params
* #52384 [DDP] unittest for when params arent used in backward pass

Adds an API `get_debug_mode` that the distributed package and users can call to retrieve the debug mode. There are no functionality changes yet; the goal is to land the bare-bones function first and add the relevant debug-mode logging in follow-up diffs.

Detailed docs will be added once functionality is enabled.

Differential Revision: [D26508972](https://our.internmc.facebook.com/intern/diff/D26508972/)

[ghstack-poisoned]
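As a rough illustration only, a hedged sketch of what a debug-mode getter of this shape might look like; the environment-variable name, enum values, and function signature here are assumptions for the example and not necessarily what the linked diff implements.

```
# Hypothetical sketch; env-var name, values, and API shape are assumptions.
import os
from enum import Enum


class DistDebugMode(Enum):
    OFF = "OFF"
    INFO = "INFO"
    DETAIL = "DETAIL"


def get_debug_mode() -> DistDebugMode:
    """Return the distributed debug mode requested via the environment."""
    value = os.environ.get("TORCH_DISTRIBUTED_DEBUG", "OFF").strip().upper()
    try:
        return DistDebugMode(value)
    except ValueError:
        raise ValueError(
            f"Invalid TORCH_DISTRIBUTED_DEBUG value {value!r}; expected one of "
            f"{[m.value for m in DistDebugMode]}"
        ) from None


if __name__ == "__main__":
    # e.g. run with: TORCH_DISTRIBUTED_DEBUG=DETAIL python debug_mode_sketch.py
    print(get_debug_mode())
```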
aocsa pushed a commit to Quansight/pytorch that referenced this pull request Mar 15, 2021
…2384)

Summary:
Pull Request resolved: pytorch#52384

Adds a simple unit test (using unittest) that we can extend once we enable DDP backward without requiring every parameter to receive a gradient.
ghstack-source-id: 122001930

Test Plan: CI

Reviewed By: zhaojuanmao

Differential Revision: D26482479

fbshipit-source-id: c80bdeea7cf9db35390e385084ef28d64ed239eb
xsacha pushed a commit to xsacha/pytorch that referenced this pull request Mar 31, 2021
…2384)
