New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Leverage TensorPipe's automatic SHM address selection #63028
Conversation
🔗 Helpful links
💊 CI failures summary and remediationsAs of commit 45a6f46 (more details on the Dr. CI page):
🕵️ 1 new failure recognized by patternsThe following CI failures do not appear to be due to upstream breakages: win-vs2019-cuda10.1-py3 / test (default, 1, 1, windows.8xlarge.nvidia.gpu) (1/1)Step: "Run test scripts" (full log | diagnosis details | 🔁 rerun)
|
This pull request was exported from Phabricator. Differential Revision: D30220352 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oops, test failures are real:
RuntimeError: In addrImplFromLoop at tensorpipe/transport/shm/listener_impl.cc:97 "errorgetsockname: Bad file descriptor (this error originated at tensorpipe/common/socket.cc:129)"
Yep, I'm debugging that, I should have marked this as WIP sorry |
Differential Revision: D30573500 fbshipit-source-id: b0e400d6266ab0a282e13f0cd387198e8d908bc8
Summary: Pull Request resolved: pytorch#63028 TensorPipe until now required PyTorch to come up and provide a unique identifier to use as address for the UNIX domain socket used in the SHM transport. However the Linux kernel can automatically assign an available address (like it does with IP ports), and TensorPipe now supports it, so we can remove that useless PyTorch logic. Test Plan: CI Differential Revision: D30220352 fbshipit-source-id: 07642ebc9eb30386ed875530ceda4abdecf3cd23
This pull request was exported from Phabricator. Differential Revision: D30220352 |
Codecov Report
@@ Coverage Diff @@
## master #63028 +/- ##
==========================================
+ Coverage 59.23% 66.75% +7.52%
==========================================
Files 695 695
Lines 90739 90739
==========================================
+ Hits 53751 60577 +6826
+ Misses 36988 30162 -6826 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
This pull request has been merged in 48c57b9. |
Summary: TensorPipe until now required PyTorch to come up and provide a unique identifier to use as address for the UNIX domain socket used in the SHM transport. However the Linux kernel can automatically assign an available address (like it does with IP ports), and TensorPipe now supports it, so we can remove that useless PyTorch logic.
Test Plan: CI
Differential Revision: D30220352
cc @pietern @mrshenli @pritamdamania87 @zhaojuanmao @satgera @rohan-varma @gqchen @aazzolini @osalpekar @jiayisuse @agolynski @SciPioneer @H-Huang @mrzzd @cbalioglu @gcramer23