Add flag for dynamo+ddp optimizations #1221
Conversation
Add a flag that can be used to turn dynamo+ddp optimizations on. This will be used to compare how dynamo+ddp performs with and without the additional graph break strategy for improving dynamo+ddp compute/communication overlap. [ghstack-poisoned]
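For context, here is a minimal sketch of how such an opt-in flag might be exposed to the benchmark harness. The flag name follows the description above; the parser function and its placement are assumptions for illustration, not this PR's actual diff.

```python
# Hypothetical sketch: expose an opt-in dynamo+ddp flag via argparse.
# `parse_dynamo_args` is an illustrative name, not a function from this PR.
import argparse

def parse_dynamo_args(extra_args):
    parser = argparse.ArgumentParser()
    # Off by default, so existing benchmark runs are unaffected.
    parser.add_argument(
        "--optimize_dynamo_ddp",
        action="store_true",
        help="enable dynamo's extra graph breaks for DDP compute/communication overlap",
    )
    return parser.parse_args(extra_args)

args = parse_dynamo_args(["--optimize_dynamo_ddp"])
assert args.optimize_dynamo_ddp  # flag parses as True when passed
```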
@davidberard98 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
See nits in comments. Approved to unblock development.
torchbenchmark/util/extra_args.py (Outdated)
```python
precision = 'fp32' if model.dargs.precision == "fp32" else 'fp16'
model.set_module(enable_torchtrt(precision=precision, model=module, example_inputs=example_inputs))
# ... (intervening lines collapsed in the review snippet) ...
if args.optimize_dynamo_ddp:
```
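The body of the guard is collapsed in the snippet above. As a hedged illustration only, the branch could toggle dynamo's DDP handling through a config knob; `torchdynamo.config.optimize_ddp` is an assumption here, not code shown in this snippet.

```python
# Illustrative guess at the collapsed branch body; the config attribute
# `optimize_ddp` is an assumption, not confirmed by this PR's diff.
if args.optimize_dynamo_ddp:
    import torchdynamo
    torchdynamo.config.optimize_ddp = True  # hypothetical knob enabling the graph-break strategy
```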
Just curious, what kind of optimizations does torchdynamo do for DDP?
Also, to prevent code bloat in this file, how about we move this part into torchdynamo.py?
pytorch/torchdynamo#628 adds extra graph breaks in dynamo. The idea is that instead of DDP having to wait until the entire backward pass has completed, the extra graph breaks let autograd hooks fire earlier, so you get better overlap of communication (syncing gradients as soon as they are ready) with computation (the rest of the backward pass).
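To make the overlap concrete, here is a minimal single-process sketch (gloo backend, world_size=1, CPU; purely illustrative, not code from this PR) of where DDP's per-bucket autograd hooks fire during backward:

```python
# Single-process DDP toy example (gloo, world_size=1). DDP registers autograd
# hooks per parameter bucket; each hook can launch an allreduce as soon as that
# bucket's gradients are ready, overlapping communication with the remaining
# backward computation. One monolithic compiled backward graph would delay all
# hooks to the end, which is what the extra graph breaks are meant to avoid.
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group("gloo", rank=0, world_size=1)

model = DDP(torch.nn.Sequential(
    torch.nn.Linear(8, 8), torch.nn.ReLU(), torch.nn.Linear(8, 1)))

loss = model(torch.randn(4, 8)).sum()
loss.backward()  # gradient hooks fire bucket-by-bucket during this call
dist.destroy_process_group()
```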
Add a flag that can be used to turn dynamo+ddp optimizations on. This will be used to compare how dynamo+ddp performs with and without the additional graph break strategy for improving dynamo+ddp compute/communication overlap. Differential Revision: [D39976005](https://our.internmc.facebook.com/intern/diff/D39976005) [ghstack-poisoned]

@davidberard98 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
Stack from ghstack:
Add a flag that can be used to turn dynamo+ddp optimizations on. This will be used to compare how dynamo+ddp performs with and without the additional graph break strategy for improving dynamo+ddp compute/communication overlap.
Differential Revision: D39976005