[Gradient Compression] Add logging for gradient compression stats. #54647
Conversation
Summary: Regularly log stats showing the effect of gradient compression when using the PowerSGD DDP communication hook.

Test Plan:
buck run mode/dev-nosan scripts/wayi/torch:power_sgd
Play with the layer sizes of the input model (linear layers are convenient) and check the log that shows the compression stats. For convenience, you can change `logging.info` to `print` locally. You can also create test diffs on top of this diff to show that the compression stats are correct in different cases.

Run with the power_sgd script: {F537381542}
Diff with an example using a simple linear model: D27299934; sample output: {F538486535}

Reviewed By: SciPioneer
Differential Revision: D27240254
fbshipit-source-id: 59efca40c2642bca8e91d56eb3cadb1b4894a7e5
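The kind of stat this hook logs can be illustrated with a small sketch. This is a hypothetical helper, not code from this PR: PowerSGD approximates each n × m gradient matrix with low-rank factors P (n × r) and Q (m × r), so an expected compression ratio follows directly from the shapes.

```python
# Sketch (hypothetical, not the PR's implementation): estimate the size
# reduction from PowerSGD's rank-r factorization of one gradient matrix.
def powersgd_compression_ratio(n: int, m: int, rank: int) -> float:
    """Return uncompressed elements / compressed elements for an n x m gradient.

    PowerSGD replaces the n x m matrix with factors P (n x r) and Q (m x r),
    so the compressed representation holds rank * (n + m) elements.
    """
    uncompressed = n * m
    compressed = rank * (n + m)
    return uncompressed / compressed

# Example: a 1024 x 1024 linear layer compressed at rank 4.
ratio = powersgd_compression_ratio(1024, 1024, 4)  # 1048576 / 8192 = 128.0
```

Playing with the layer sizes, as the test plan suggests, changes this ratio: large square layers compress well, while small or very skinny layers may gain little, which is exactly what periodic logging of the stats makes visible.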
💊 CI failures summary and remediations: as of commit 7867dd1 (more details on the Dr. CI page): 💚 Looks good so far! There are no failures yet. 💚 (This comment was automatically generated by Dr. CI.)
This pull request was exported from Phabricator. Differential Revision: D27240254
Codecov Report
@@            Coverage Diff             @@
##           master   #54647      +/-   ##
==========================================
- Coverage   77.47%   77.05%    -0.43%
==========================================
  Files        1893     1893
  Lines      185963   185980      +17
==========================================
- Hits       144074   143298     -776
- Misses      41889    42682     +793
Thanks for your help with PyTorch distributed training and for contributing to the PyTorch community!
This pull request has been merged in 4bf9055.