Check mismatched number of parameters in DDP _verify_params_across_processes #73547

Closed
zhaojuanmao opened this issue Feb 28, 2022 · 0 comments
Labels
- better-engineering (Relatively self-contained tasks for better engineering contributors)
- high priority
- module: ddp (Issues/PRs related distributed data parallel training)
- oncall: distributed (Add this issue/PR to distributed oncall triage queue)
- onnx-triaged (triaged by ONNX team)
- triage review
- triaged (This issue has been looked at a team member, and triaged and prioritized into an appropriate module)

zhaojuanmao commented Feb 28, 2022

🚀 The feature, motivation and pitch

Some use cases hit errors caused by a mismatched number of parameters across ranks in DDP. `_verify_params_across_processes` should check for this mismatch before it checks the shapes and sizes of the parameters.
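
To illustrate the requested ordering, here is a minimal pure-Python sketch. It is not the DDP implementation: `verify_against_reference`, the `Shape` alias, and the error messages are hypothetical names chosen for this example, and parameters are modeled as plain shape tuples. It only shows why comparing the parameter count first gives a clear error before any per-parameter shape comparison runs.

```python
from typing import Sequence, Tuple

# Hypothetical stand-in for a parameter: only its shape matters here.
Shape = Tuple[int, ...]


def verify_against_reference(
    reference: Sequence[Shape], local: Sequence[Shape], rank: int
) -> None:
    """Compare this rank's parameter shapes with the reference (rank 0) shapes."""
    # Step 1: check the number of parameters before looking at any shape.
    if len(local) != len(reference):
        raise RuntimeError(
            f"Rank {rank}: mismatched number of parameters "
            f"({len(local)} locally, {len(reference)} on rank 0)."
        )
    # Step 2: only now is a pairwise shape comparison meaningful.
    for index, (ref_shape, local_shape) in enumerate(zip(reference, local)):
        if ref_shape != local_shape:
            raise RuntimeError(
                f"Rank {rank}: parameter {index} has shape {local_shape}, "
                f"expected {ref_shape}."
            )
```

Without the first step, a pairwise comparison over lists of different lengths would either stop early without reporting the extra parameters or fail with a less informative error.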

Alternatives

No response

Additional context

No response

cc @ezyang @gchanan @zou3519 @pietern @mrshenli @pritamdamania87 @zhaojuanmao @satgera @rohan-varma @gqchen @aazzolini @osalpekar @jiayisuse @SciPioneer @H-Huang

zhaojuanmao added the "oncall: distributed" and "module: ddp" labels on Feb 28, 2022
rohan-varma added the "better-engineering" and "high priority" labels on Mar 1, 2022
rohan-varma added the "pt_distributed_rampup" label on Mar 1, 2022
rohan-varma added the "onnx-triaged" and "triaged" labels and removed the "pt_distributed_rampup" label on Mar 8, 2022
rohan-varma self-assigned this on Mar 9, 2022
rohan-varma added a commit that referenced this issue Mar 11, 2022
Check mismatch in # of parameters by broadcasting and verifying from rank 0. As a result, non-zero ranks raise an error when # of parameters are mismatched across ranks.

Closes #73547

Differential Revision: [D34772067](https://our.internmc.facebook.com/intern/diff/D34772067/)

[ghstack-poisoned]
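
The commit message describes broadcasting from rank 0 and verifying on the other ranks. A rough Python sketch of that idea follows. It is not the code from the referenced commit: `verify_param_count` is a hypothetical helper written for this example, and it assumes a default process group has already been initialized with `torch.distributed.init_process_group` and that CPU tensors are acceptable to the backend in use.

```python
import torch
import torch.distributed as dist
import torch.nn as nn


def verify_param_count(module: nn.Module) -> None:
    """Raise on any rank whose parameter count differs from rank 0's.

    Hypothetical helper for illustration; assumes the default process
    group is already initialized.
    """
    local_count = sum(1 for _ in module.parameters())
    # Every rank passes its own count; broadcast overwrites the tensor
    # in place with rank 0's value on all other ranks.
    reference = torch.tensor([local_count], dtype=torch.long)
    dist.broadcast(reference, src=0)
    expected = int(reference.item())
    if local_count != expected:
        raise RuntimeError(
            f"Rank {dist.get_rank()} has {local_count} parameters, "
            f"but rank 0 has {expected}."
        )
```

Because rank 0 always agrees with its own broadcast value, only non-zero ranks can raise in this sketch, which matches the behavior described in the commit message.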
rohan-varma added a commit that referenced this issue Mar 11, 2022
Pull Request resolved: #74113

Check mismatch in # of parameters by broadcasting and verifying from rank 0. As a result, non-zero ranks raise an error when # of parameters are mismatched across ranks.

Closes #73547
ghstack-source-id: 151159056

Differential Revision: [D34772067](https://our.internmc.facebook.com/intern/diff/D34772067/)
facebook-github-bot pushed a commit that referenced this issue Mar 15, 2022
Summary:
Pull Request resolved: #74113

Check mismatch in # of parameters by broadcasting and verifying from rank 0. As a result, non-zero ranks raise an error when # of parameters are mismatched across ranks.

Closes #73547
ghstack-source-id: 151319259

Test Plan: UT

Reviewed By: mrshenli

Differential Revision: D34772067

fbshipit-source-id: 456933111e9996823f1a220b474998e17fb74210