[PowerSGD] Add orthogonalization with QR factorization #72043
Conversation
Overall looks good to me. One question, though: as we discussed offline previously, Thijs's experiments showed a performance improvement with a rank of 2 as well. Do you know why you couldn't observe the same effect in your experiments?
Great question; I get results similar to Thijs's when timing only the _orthogonalize method. On the other hand, the results above come from timing the whole powerSGD_hook. I am not sure why this happens.
It should be there now.
@cbalioglu has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
Summary:

### 🚀 The feature, motivation and pitch

Following the discussion in #65813, I added QR factorization to powerSGD_hook.py. Gram-Schmidt orthogonalization can't be fully replaced because _torch.linalg.qr_ doesn't support half precision. Moreover, in my tests, Gram-Schmidt is faster when the rank is below 3. This is one sample experiment timing powerSGD_hook on ResNeXt-101 with the two methods:

![Screenshot from 2022-01-31 18-14-00](https://user-images.githubusercontent.com/42100908/151840929-270c67dd-9fe7-4f11-8e70-8bf2d0ba678d.png)

### Alternatives

Use _torch.orgqr(*torch.geqrf(matrix))_. In my tests, its performance is similar to _torch.linalg.qr_.

### Additional context

_No response_

Pull Request resolved: #72043
Reviewed By: albanD
Differential Revision: D34042781
Pulled By: cbalioglu
fbshipit-source-id: e331179d3b7ac40d445b651fc473b16ae4ead462
Hey @younik.
🚀 The feature, motivation and pitch
Following the discussion in #65813, I added QR factorization to powerSGD_hook.py.
Gram-Schmidt orthogonalization can't be fully replaced because torch.linalg.qr doesn't support half precision. Moreover, in my tests, Gram-Schmidt is faster when the rank is below 3.
This is one sample experiment timing powerSGD_hook on ResNeXt-101 with the two methods:
![Screenshot from 2022-01-31 18-14-00](https://user-images.githubusercontent.com/42100908/151840929-270c67dd-9fe7-4f11-8e70-8bf2d0ba678d.png)
Alternatives
Use torch.orgqr(*torch.geqrf(matrix)). In my tests, its performance is similar to torch.linalg.qr.
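The alternative above can be sketched as follows: torch.geqrf computes the QR factorization in compact Householder form, and torch.orgqr materializes the explicit orthonormal factor Q from it. The wrapper name is illustrative.

```python
import torch


def qr_q_via_householder(matrix: torch.Tensor) -> torch.Tensor:
    # geqrf returns the factorization as (a, tau): `a` packs R and the
    # Householder reflectors, `tau` holds the reflector scalar factors.
    a, tau = torch.geqrf(matrix)
    # orgqr expands the compact representation into the explicit Q.
    return torch.orgqr(a, tau)
```

For a tall matrix this yields the same reduced Q as torch.linalg.qr, while letting the two LAPACK steps be timed or fused separately.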
Additional context
No response