Conversation

rohan-varma (Contributor) commented Feb 17, 2021

Stack from ghstack:

Adds a simple unit test (using unittest) that we can extend once we enable DDP backward without requiring every parameter to receive a gradient.

Differential Revision: [D26482479](https://our.internmc.facebook.com/intern/diff/D26482479/)

[ghstack-poisoned]
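For reference, a minimal sketch (not taken from this diff) of the kind of unittest-based test described above: a DDP-wrapped model with a layer that is skipped in forward, so its parameters never receive gradients. The module, test name, port, and single-process gloo setup are illustrative assumptions.

```
# Hypothetical sketch only; names and the single-process setup are illustrative.
import os
import unittest

import torch
import torch.distributed as dist
import torch.nn as nn


class TestDDPUnusedGrads(unittest.TestCase):
    def setUp(self):
        # Single-process "gloo" group so the sketch runs standalone on CPU.
        os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
        os.environ.setdefault("MASTER_PORT", "29501")
        dist.init_process_group("gloo", rank=0, world_size=1)

    def tearDown(self):
        dist.destroy_process_group()

    def test_not_all_params_get_grad(self):
        class Net(nn.Module):
            def __init__(self):
                super().__init__()
                self.used = nn.Linear(4, 4)
                self.unused = nn.Linear(4, 4)  # never called in forward

            def forward(self, x):
                return self.used(x)

        model = nn.parallel.DistributedDataParallel(
            Net(), find_unused_parameters=True
        )
        model(torch.randn(2, 4)).sum().backward()

        # The used layer gets a gradient; the skipped layer gets none (or an
        # all-zero one, depending on the DDP version) -- the behavior such a
        # test would be updated for once DDP no longer requires every
        # parameter to receive a gradient.
        self.assertIsNotNone(model.module.used.weight.grad)
        skipped = model.module.unused.weight.grad
        self.assertTrue(skipped is None or skipped.abs().sum() == 0)


if __name__ == "__main__":
    unittest.main()
```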
facebook-github-bot (Contributor) commented Feb 17, 2021

💊 CI failures summary and remediations

As of commit a427afe (more details on the Dr. CI page):


  • 2/2 failures possibly* introduced in this PR
    • 2/2 non-scanned failure(s)

ci.pytorch.org: 1 failed


This comment was automatically generated by Dr. CI.

Please report bugs/suggestions to the (internal) Dr. CI Users group.

facebook-github-bot added the cla signed and oncall: distributed (Add this issue/PR to distributed oncall triage queue) labels Feb 17, 2021
rohan-varma added a commit that referenced this pull request Feb 17, 2021
Pull Request resolved: #52384

Adds a simple unit test (using unittest) that we can extend once we enable DDP backward without requiring every parameter to receive a gradient.
ghstack-source-id: 121895039

Differential Revision: [D26482479](https://our.internmc.facebook.com/intern/diff/D26482479/)
rohan-varma added a commit that referenced this pull request Feb 17, 2021
… and not all outputs used in loss computation

* #52385 [DDP] Enhance warning for find_unused_params
* #52384 [DDP] unittest for when params arent used in backward pass

There are 2 ways DDP can throw the exception refactored here -
1) Unused params in the forward pass. We provide `find_unused_parameters=True` for this.
2) All params used in fwd pass, but not all outputs used in loss computation. There are a few workarounds for this but we do not provide native support.

Previously, these 2 issues shared a single error message, which has historically caused confusion: users report hitting this error even after enabling `find_unused_parameters=True`, which they expect to fix it. This adds churn when debugging, because the true cause, (1) vs (2), is not known.

This commit addresses that by emitting separate error messages depending on whether we ran with unused parameter detection. This should make the error much clearer and more actionable.

error msg with `find_unused_params=True`:
```
RuntimeError: Expected to have finished reduction in the prior iteration before starting a new one. This error indicates that your module has parameters that were not used in producing loss. Since `find_unused_parameters=True` is enabled, this likely  means that not all `forward` outputs participate in computing loss. You can fix this by making sure all `forward` function outputs participate in calculating loss.
If you already have done the above, then the distributed data parallel module wasn't able to locate the output tensors in the return value of your module's `forward` function. Please include the loss function and the structure of the return value of `forward` of your module when reporting this issue (e.g. list, dict, iterable).
```
error msg without `find_unused_params` specified:
```
RuntimeError: Expected to have finished reduction in the prior iteration before starting a new one. This error indicates that your module has parameters that were not used in producing loss. You can enable unused parameter detection by passing the keyword argument `find_unused_parameters=True` to `torch.nn.parallel.DistributedDataParallel`, and by
making sure all `forward` function outputs participate in calculating loss.
If you already have done the above, then the distributed data parallel module wasn't able to locate the output tensors in the return value of your module's `forward` function. Please include the loss function and the structure of the return value of `forward` of your module when reporting this issue (e.g. list, dict, iterable).
```

Differential Revision: [D26496688](https://our.internmc.facebook.com/intern/diff/D26496688/)

[ghstack-poisoned]
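To make case (2) concrete, a small hypothetical sketch: every parameter runs in forward, but only one of the two outputs feeds the loss, so part of the model never receives gradients and the second error quoted above would be raised at the start of the next iteration. The module and single-process gloo setup are assumptions, not code from this stack.

```
# Illustrative sketch of case (2); module and setup are assumed, not from this stack.
import os
import torch
import torch.distributed as dist
import torch.nn as nn

os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29502")
dist.init_process_group("gloo", rank=0, world_size=1)


class TwoHeadNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = nn.Linear(8, 8)
        self.head_a = nn.Linear(8, 1)
        self.head_b = nn.Linear(8, 1)  # runs in forward, but its output is dropped below

    def forward(self, x):
        h = torch.relu(self.backbone(x))
        return self.head_a(h), self.head_b(h)


model = nn.parallel.DistributedDataParallel(TwoHeadNet())
out_a, _out_b = model(torch.randn(4, 8))

# Every parameter was used in forward, yet only out_a contributes to the loss,
# so head_b's parameters never get gradients and their gradient bucket never
# finishes reduction. DDP reports this when the next forward starts, and
# find_unused_parameters=True does not help here -- which is why the two error
# messages are being split.
loss = out_a.sum()
loss.backward()

dist.destroy_process_group()
```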
rohan-varma added a commit that referenced this pull request Feb 18, 2021
… used in loss computation
* **#52385 [DDP] Enhance warning for find_unused_params**
* #52384 [DDP] unittest for when params arent used in backward pass

This warning should specify that we did not find unused params in the
_forward_ pass, since that is when we log it. This avoids confusion when an
error is raised because not all outputs were used to compute the loss, which
also complains about unused parameters (to be fixed in the next diff).

Differential Revision: [D26494136](https://our.internmc.facebook.com/intern/diff/D26494136/)

[ghstack-poisoned]
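For context on when the flag applies, a hypothetical sketch of case (1), the situation `find_unused_parameters=True` is designed for: a submodule that is skipped in forward. Module names and the single-process gloo setup are assumptions.

```
# Illustrative sketch of case (1); names and setup are assumed.
import os
import torch
import torch.distributed as dist
import torch.nn as nn

os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29503")
dist.init_process_group("gloo", rank=0, world_size=1)


class BranchyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.trunk = nn.Linear(4, 4)
        self.extra = nn.Linear(4, 4)  # only runs when use_extra=True

    def forward(self, x, use_extra=False):
        h = self.trunk(x)
        return self.extra(h) if use_extra else h


model = nn.parallel.DistributedDataParallel(
    BranchyNet(), find_unused_parameters=True
)

# `extra` is skipped this iteration, so its parameters see no gradients.
# find_unused_parameters=True lets the reducer mark them as ready anyway;
# the warning discussed above fires only when the flag is set but no unused
# parameters are actually found in the forward pass.
model(torch.randn(2, 4), use_extra=False).sum().backward()

dist.destroy_process_group()
```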
facebook-github-bot (Contributor) commented:

This pull request has been merged in c29e279.

@facebook-github-bot facebook-github-bot deleted the gh/rohan-varma/237/head branch February 22, 2021 15:17
rohan-varma added a commit that referenced this pull request Feb 25, 2021
… used in loss computation
* #52385 [DDP] Enhance warning for find_unused_params
* #52384 [DDP] unittest for when params arent used in backward pass

Adds an API `get_debug_mode` that the distributed package and users can call to retrieve the debug mode. There are no functionality changes yet; the goal is to land the bare-bones function first and add the relevant debug-mode logging in follow-up diffs.

Detailed docs will be added once functionality is enabled.

Differential Revision: [D26508972](https://our.internmc.facebook.com/intern/diff/D26508972/)

[ghstack-poisoned]
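As a rough illustration only, a hedged sketch of what a debug-mode getter of this shape might look like; the environment-variable name, enum values, and function signature here are assumptions for the example and not necessarily what the linked diff implements.

```
# Hypothetical sketch; env-var name, values, and API shape are assumptions.
import os
from enum import Enum


class DistDebugMode(Enum):
    OFF = "OFF"
    INFO = "INFO"
    DETAIL = "DETAIL"


def get_debug_mode() -> DistDebugMode:
    """Return the distributed debug mode requested via the environment."""
    value = os.environ.get("TORCH_DISTRIBUTED_DEBUG", "OFF").strip().upper()
    try:
        return DistDebugMode(value)
    except ValueError:
        raise ValueError(
            f"Invalid TORCH_DISTRIBUTED_DEBUG value {value!r}; expected one of "
            f"{[m.value for m in DistDebugMode]}"
        ) from None


if __name__ == "__main__":
    # e.g. run with: TORCH_DISTRIBUTED_DEBUG=DETAIL python debug_mode_sketch.py
    print(get_debug_mode())
```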
aocsa pushed a commit to Quansight/pytorch that referenced this pull request Mar 15, 2021
…2384)

Summary:
Pull Request resolved: pytorch#52384

Adds a simple unit test (using unittest) that we can extend once we enable DDP backward without requiring every parameter to receive a gradient.
ghstack-source-id: 122001930

Test Plan: CI

Reviewed By: zhaojuanmao

Differential Revision: D26482479

fbshipit-source-id: c80bdeea7cf9db35390e385084ef28d64ed239eb
xsacha pushed a commit to xsacha/pytorch that referenced this pull request Mar 31, 2021
…2384)
