@davidberard98 davidberard98 commented Sep 29, 2022

Stack from ghstack:

Add a flag that can be used to turn dynamo+ddp optimizations on. This
will be used to compare how dynamo+ddp performs with and without the
additional graph break strategy for improving dynamo+ddp
compute/communication overlap.

Differential Revision: D39976005
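
As a rough illustration of how such a flag could be wired up, here is a minimal sketch. Only the attribute name `args.optimize_dynamo_ddp` appears in the reviewed diff; the `--optimize_dynamo_ddp` spelling, the `store_true` behavior, and the `HypotheticalDynamoConfig` class with its `optimize_ddp` field are assumptions made for this example, not the actual change.

```python
import argparse
from dataclasses import dataclass


@dataclass
class HypotheticalDynamoConfig:
    # Stand-in for a real config object; the field name is an assumption.
    optimize_ddp: bool = False


def parse_args(argv):
    parser = argparse.ArgumentParser()
    # Assumed flag spelling; the diff only shows `args.optimize_dynamo_ddp`.
    parser.add_argument(
        "--optimize_dynamo_ddp",
        action="store_true",
        help="Enable the extra graph-break strategy for dynamo+ddp.",
    )
    return parser.parse_args(argv)


def apply_args(args, config):
    # Turn the optimization on only when the flag was passed.
    if args.optimize_dynamo_ddp:
        config.optimize_ddp = True
    return config
```

Running the benchmark once with the flag and once without it would then give the two configurations that the comparison needs.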

davidberard98 added a commit that referenced this pull request Sep 29, 2022
ghstack-source-id: 5d3c06a
Pull Request resolved: #1221
davidberard98 added a commit that referenced this pull request Sep 29, 2022
ghstack-source-id: d614859
Pull Request resolved: #1221
davidberard98 added a commit that referenced this pull request Sep 30, 2022
ghstack-source-id: b590e00
Pull Request resolved: #1221
@davidberard98 davidberard98 requested review from wconstab and xuzhao9 and removed request for wconstab September 30, 2022 17:35
@davidberard98 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@davidberard98 davidberard98 marked this pull request as ready for review September 30, 2022 17:45
@xuzhao9 xuzhao9 left a comment

See nits in comments. Approved to unblock development.

precision = 'fp16' if not model.dargs.precision == "fp32" else 'fp32'
model.set_module(enable_torchtrt(precision=precision, model=module, example_inputs=exmaple_inputs))

if args.optimize_dynamo_ddp:

Just curious: what kind of optimizations does torchdynamo do for DDP?
Also, to prevent code bloat in this file, how about we move this part into torchdynamo.py?

@davidberard98 davidberard98 Oct 3, 2022

pytorch/torchdynamo#628 adds extra graph breaks in dynamo. The idea is that, instead of DDP having to wait until the entire backward pass is completed, the extra graph breaks should allow autograd hooks to be called earlier. That gives better overlap of communication (syncing the gradients once they are ready) and computation (computing the rest of the backward pass).
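
To make the overlap argument concrete, here is a toy timing model. It is only a sketch under simplifying assumptions: each graph segment has a fixed backward compute cost and a fixed gradient-sync cost, a segment's sync can start only once its own backward has finished, and syncs run one at a time. The function name and the numbers are made up for illustration; this is not how DDP or dynamo measure anything.

```python
def backward_time(segments):
    """Estimate backward-pass time for a list of (compute, comm) segments.

    Each segment is one compiled graph. Its gradients become available
    (so its hooks can fire) only when the whole segment has finished.
    """
    compute_done = 0.0  # time at which backward compute has reached so far
    comm_done = 0.0     # time at which the last gradient sync finishes
    for compute, comm in segments:
        compute_done += compute
        # Sync starts once the gradients are ready and the link is free.
        comm_done = max(comm_done, compute_done) + comm
    return max(compute_done, comm_done)


# One graph: all communication waits for the entire backward pass.
single_graph = backward_time([(9.0, 6.0)])

# Same total work split by two graph breaks: syncs overlap later compute.
three_graphs = backward_time([(3.0, 2.0), (3.0, 2.0), (3.0, 2.0)])
```

With these made-up costs the single graph takes 15 time units and the split version takes 11, because two of the three syncs are hidden behind the remaining backward computation.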

davidberard98 added a commit that referenced this pull request Oct 3, 2022
ghstack-source-id: cd3d9cd
Pull Request resolved: #1221
davidberard98 added a commit that referenced this pull request Oct 3, 2022
ghstack-source-id: 74dfb49
Pull Request resolved: #1221
davidberard98 added a commit that referenced this pull request Oct 4, 2022
ghstack-source-id: ff80ac9
Pull Request resolved: #1221
@facebook-github-bot facebook-github-bot deleted the gh/davidberard98/10/head branch October 9, 2022 14:19