Check mismatched number of parameters in DDP _verify_params_across_processes #73547
Labels
better-engineering
Relatively self-contained tasks for better engineering contributors
high priority
module: ddp
Issues/PRs related distributed data parallel training
oncall: distributed
Add this issue/PR to distributed oncall triage queue
onnx-triaged
triaged by ONNX team
triage review
triaged
This issue has been looked at by a team member, and triaged and prioritized into an appropriate module
Comments
zhaojuanmao added the oncall: distributed and module: ddp labels on Feb 28, 2022
rohan-varma added the better-engineering and high priority labels on Mar 1, 2022
rohan-varma added the pt_distributed_rampup label (Ramp up tasks for new developers on PT distributed) on Mar 1, 2022
rohan-varma added the onnx-triaged and triaged labels and removed the pt_distributed_rampup label on Mar 8, 2022
rohan-varma added a commit that referenced this issue on Mar 11, 2022
Check mismatch in # of parameters by broadcasting and verifying from rank 0. As a result, non-zero ranks raise an error when # of parameters are mismatched across ranks. Closes #73547 Differential Revision: [D34772067](https://our.internmc.facebook.com/intern/diff/D34772067/) [ghstack-poisoned]
rohan-varma added a commit that referenced this issue on Mar 11, 2022
Pull Request resolved: #74113 Check mismatch in # of parameters by broadcasting and verifying from rank 0. As a result, non-zero ranks raise an error when # of parameters are mismatched across ranks. Closes #73547 ghstack-source-id: 151159056 Differential Revision: [D34772067](https://our.internmc.facebook.com/intern/diff/D34772067/)
rohan-varma added a commit that referenced this issue on Mar 11, 2022
Pull Request resolved: #74113 Check mismatch in # of parameters by broadcasting and verifying from rank 0. As a result, non-zero ranks raise an error when # of parameters are mismatched across ranks. Closes #73547 ghstack-source-id: 151191152 Differential Revision: [D34772067](https://our.internmc.facebook.com/intern/diff/D34772067/)
rohan-varma added a commit that referenced this issue on Mar 14, 2022
Pull Request resolved: #74113 Check mismatch in # of parameters by broadcasting and verifying from rank 0. As a result, non-zero ranks raise an error when # of parameters are mismatched across ranks. Closes #73547 ghstack-source-id: 151275647 Differential Revision: [D34772067](https://our.internmc.facebook.com/intern/diff/D34772067/)
rohan-varma added a commit that referenced this issue on Mar 14, 2022
Check mismatch in # of parameters by broadcasting and verifying from rank 0. As a result, non-zero ranks raise an error when # of parameters are mismatched across ranks. Closes #73547 Differential Revision: [D34772067](https://our.internmc.facebook.com/intern/diff/D34772067/) [ghstack-poisoned]
rohan-varma added a commit that referenced this issue on Mar 14, 2022
Pull Request resolved: #74113 Check mismatch in # of parameters by broadcasting and verifying from rank 0. As a result, non-zero ranks raise an error when # of parameters are mismatched across ranks. Closes #73547 ghstack-source-id: 151319259 Differential Revision: [D34772067](https://our.internmc.facebook.com/intern/diff/D34772067/)
facebook-github-bot pushed a commit that referenced this issue on Mar 15, 2022
Summary: Pull Request resolved: #74113 Check mismatch in # of parameters by broadcasting and verifying from rank 0. As a result, non-zero ranks raise an error when # of parameters are mismatched across ranks. Closes #73547 ghstack-source-id: 151319259 Test Plan: UT Reviewed By: mrshenli Differential Revision: D34772067 fbshipit-source-id: 456933111e9996823f1a220b474998e17fb74210
🚀 The feature, motivation and pitch
Some use cases hit errors caused by a mismatched number of parameters across processes in DDP. _verify_params_across_processes should check for this mismatch before it checks the shapes and sizes of the parameters.
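The requested ordering can be illustrated with a small single-process simulation. This is only a sketch under stated assumptions, not the actual DDP implementation: the function name verify_params_across_ranks and the Shape alias are hypothetical, no torch.distributed calls are made, and index 0 of the input simply stands in for the values that rank 0 would broadcast. It shows the parameter count being compared first, so a count mismatch is reported clearly before any per-parameter shape comparison runs.

```python
from typing import List, Sequence, Tuple

# Hypothetical alias: a parameter shape such as (2, 3).
Shape = Tuple[int, ...]


def verify_params_across_ranks(shapes_per_rank: Sequence[List[Shape]]) -> None:
    """Simulated check: shapes_per_rank[i] holds the parameter shapes on rank i.

    Assumption: rank 0 is the reference, standing in for a broadcast from rank 0.
    """
    if not shapes_per_rank:
        return
    reference = shapes_per_rank[0]

    # Step 1: compare the number of parameters before anything else.
    for rank, shapes in enumerate(shapes_per_rank):
        if len(shapes) != len(reference):
            raise RuntimeError(
                f"rank {rank} has {len(shapes)} parameters, "
                f"but rank 0 has {len(reference)}"
            )

    # Step 2: counts match, so per-parameter shapes can be compared safely.
    for rank, shapes in enumerate(shapes_per_rank):
        for idx, (mine, ref) in enumerate(zip(shapes, reference)):
            if mine != ref:
                raise RuntimeError(
                    f"rank {rank} parameter {idx} has shape {mine}, "
                    f"but rank 0 has shape {ref}"
                )
```

Checking the count first matters because a shape-by-shape comparison over lists of different lengths either stops early or compares unrelated parameters, which hides the real cause of the error.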
Alternatives
No response
Additional context
No response
cc @ezyang @gchanan @zou3519 @pietern @mrshenli @pritamdamania87 @zhaojuanmao @satgera @rohan-varma @gqchen @aazzolini @osalpekar @jiayisuse @SciPioneer @H-Huang