
[Gradient Compression] Refactor tensor grouping in PowerSGD #52981

Closed
wants to merge 5 commits

Conversation

wayi1
Contributor

@wayi1 wayi1 commented Feb 28, 2021

Stack from ghstack:

No need to create a hard boundary between rank-1 tensors and high-rank tensors, since some high-rank tensors will not be compressed if the compression cannot save enough bandwidth, according to the `_should_compress` function.

Therefore, refactor and simplify the tensor grouping logic, which addresses the comment in #52541 (comment).

Differential Revision: [D26713689](https://our.internmc.facebook.com/intern/diff/D26713689/)
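The idea behind the refactor can be sketched as follows. `should_compress` and `group_gradients` below are hypothetical simplifications for illustration, not the actual PyTorch implementation: a single bandwidth check replaces the rank-based split, because a rank-1 tensor matricized to shape `(n, 1)` naturally fails the check.

```python
from math import prod

def should_compress(num_rows, num_cols, rank, min_compression_rate=2.0):
    """Hypothetical bandwidth check: compress an (num_rows x num_cols) matrix
    into rank-r factors only if that shrinks the payload enough."""
    uncompressed_size = num_rows * num_cols
    compressed_size = rank * (num_rows + num_cols)  # P: n x r, Q: m x r
    return uncompressed_size >= min_compression_rate * compressed_size

def group_gradients(shapes, rank):
    """Single-pass grouping with no special rank-1 bucket: a tensor of shape
    (d0, d1, ...) is matricized to (d0, d1*d2*...), and rank-1 tensors become
    (d0, 1), which fails the bandwidth check on its own."""
    to_compress, uncompressed = [], []
    for shape in shapes:
        n = shape[0]
        m = prod(shape[1:]) if len(shape) > 1 else 1
        (to_compress if should_compress(n, m, rank) else uncompressed).append(shape)
    return to_compress, uncompressed
```

For example, with `rank=1` a `(1024, 1024)` weight and a `(256, 3, 3, 3)` convolution kernel land in the compressed group, while a `(512,)` bias stays uncompressed, with no explicit rank-1 boundary needed.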

@facebook-github-bot
Contributor

facebook-github-bot commented Feb 28, 2021

💊 CI failures summary and remediations

As of commit f997deb (more details on the Dr. CI page):



❄️ 1 failure tentatively classified as flaky

but reruns have not yet been triggered to confirm:

See CircleCI build pytorch_xla_linux_bionic_py3_6_clang9_test (1/1)

Step: "Run tests" (full log | diagnosis details | 🔁 rerun) ❄️

Mar 03 22:20:25 urllib.error.HTTPError: HTTP Error 403: Forbidden
Mar 03 22:20:25   File "/opt/conda/lib/python3.6/urllib/request.py", line 532, in open
Mar 03 22:20:25     response = meth(req, response)
Mar 03 22:20:25   File "/opt/conda/lib/python3.6/urllib/request.py", line 642, in http_response
Mar 03 22:20:25     'http', request, response, code, msg, hdrs)
Mar 03 22:20:25   File "/opt/conda/lib/python3.6/urllib/request.py", line 570, in error
Mar 03 22:20:25     return self._call_chain(*args)
Mar 03 22:20:25   File "/opt/conda/lib/python3.6/urllib/request.py", line 504, in _call_chain
Mar 03 22:20:25     result = func(*args)
Mar 03 22:20:25   File "/opt/conda/lib/python3.6/urllib/request.py", line 650, in http_error_default
Mar 03 22:20:25     raise HTTPError(req.full_url, code, msg, hdrs, fp)
Mar 03 22:20:25 urllib.error.HTTPError: HTTP Error 403: Forbidden
Mar 03 22:20:25 
Mar 03 22:20:25 ----------------------------------------------------------------------
Mar 03 22:20:25 Ran 1 test in 0.725s
Mar 03 22:20:25 
Mar 03 22:20:25 FAILED (errors=1)
Mar 03 22:20:25 Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz to /tmp/mnist-data/MNIST/raw/train-images-idx3-ubyte.gz
Mar 03 22:20:25 
0it [00:00, ?it/s]
Mar 03 22:20:25 =================== sccache compilation log ===================
Mar 03 22:20:25 + cleanup
Mar 03 22:20:25 + retcode=1

This comment was automatically generated by Dr. CI (expand for details). Follow this link to opt-out of these comments for your Pull Requests.

Please report bugs/suggestions to the (internal) Dr. CI Users group.

@wayi1
Contributor Author

wayi1 commented Feb 28, 2021

@tvogels Can you take a look? Thanks!

wayi1 pushed a commit that referenced this pull request Feb 28, 2021
Pull Request resolved: #52981

ghstack-source-id: 122703391

Differential Revision: [D26713689](https://our.internmc.facebook.com/intern/diff/D26713689/)
@tvogels

tvogels commented Feb 28, 2021

Looks good to me.

wayi1 pushed a commit that referenced this pull request Feb 28, 2021
Pull Request resolved: #52981

ghstack-source-id: 122712478

Differential Revision: [D26713689](https://our.internmc.facebook.com/intern/diff/D26713689/)
Member

@rohan-varma rohan-varma left a comment


LGTM! Thanks for refactoring this.

@facebook-github-bot
Contributor

This pull request has been merged in b59075e.

@facebook-github-bot facebook-github-bot deleted the gh/SciPioneer/66/head branch March 7, 2021 15:16
aocsa pushed a commit to Quansight/pytorch that referenced this pull request Mar 15, 2021
[Gradient Compression] Refactor tensor grouping in PowerSGD (#52981)

Summary:
Pull Request resolved: pytorch#52981

No need to create a hard boundary between rank-1 tensors and high-rank tensors, since some high-rank tensors will not be compressed if the compression cannot save enough bandwidth, according to the `_should_compress` function.

Therefore, refactor and simplify the tensor grouping logic, which addresses the comment in pytorch#52541 (comment)
ghstack-source-id: 122997032

Test Plan:
waitforbuildbot

Already LGTMed by PowerSGD paper author.

Ads1x (completed):
https://www.internalfb.com/intern/tupperware/details/job/?handle=priv3_global%2Fmast_hpc%2Ftsm_hpc-wayi_ads_10x_POWER_SGD_gpu8_2021-02-28_15-29.trainer&tatwTabs=tasks&task_id=0&task_tab=TASK_LOGS

Detectron2:
1) Before refactoring:
f254353864
Accuracy: 39.972
Overall training speed: 67498 iterations in 6:15:42 (0.3340 s / it)

2) After refactoring:
f254353380
Accuracy: 39.944
Overall training speed: 67498 iterations in 6:09:41 (0.3286 s / it)

Reviewed By: rohan-varma

Differential Revision: D26713689

fbshipit-source-id: 12cfcb65feaa2a2d94e3c7793073031f13828305
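As a quick sanity check on the Detectron2 numbers above (pure arithmetic, no PyTorch needed), the reported s/it figures match total wall time divided by iteration count, and the refactor corresponds to roughly a 1.6% per-iteration speedup:

```python
def seconds_per_iter(hours, minutes, seconds, iterations):
    # Convert "H:MM:SS for N iterations" into seconds per iteration.
    return (hours * 3600 + minutes * 60 + seconds) / iterations

before = seconds_per_iter(6, 15, 42, 67498)  # before refactoring
after = seconds_per_iter(6, 9, 41, 67498)    # after refactoring
print(f"{before:.4f} s/it -> {after:.4f} s/it")  # 0.3340 s/it -> 0.3286 s/it
```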
xsacha pushed a commit to xsacha/pytorch that referenced this pull request Mar 31, 2021
Labels: cla signed, Merged, oncall: distributed
4 participants