[ROCm] rccl performance improvement via env var #76985
❌ 12 new failures, 1 base failure as of commit 0eae330 (more details on the Dr. CI page).
🕵️ 12 new failures recognized by patterns. The following CI failures do not appear to be due to upstream breakages.
@pytorchbot rebase this
@pytorchbot merge on green
@pytorchbot rebase this please
@pytorchbot merge this please
Summary: The env var HSA_FORCE_FINE_GRAIN_PCIE=1 enables P2P communication in RCCL without intermediate buffers. This is necessary on hosts with only PCIe and no P2P high-speed interconnect.
Pull Request resolved: #76985
Approved by: https://github.com/ezyang
Test Plan: contbuild & OSS CI, see https://hud.pytorch.org/commit/pytorch/pytorch/614e0459215c677bf686680454520dfaa2867359
Reviewed By: malfet
Differential Revision: D36299700
Pulled By: malfet
fbshipit-source-id: 6ba81808ba5f787370805c3f125e66fb0458a261
The env var HSA_FORCE_FINE_GRAIN_PCIE=1 enables P2P communication in RCCL without intermediate buffers. This is necessary on hosts with only PCIe and no P2P high-speed interconnect.
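For anyone who wants to try the same setting outside CI, here is a minimal sketch of one way to pass the variable to a training process. It is not taken from this PR's diff: the helper names `rccl_env` and `launch` and the script name `train.py` are hypothetical, and the sketch assumes the variable is read when the ROCm runtime starts up, so it is set in the child's environment before the process launches.

```python
import os
import subprocess
import sys


def rccl_env(base=None):
    """Return a copy of the environment with HSA_FORCE_FINE_GRAIN_PCIE set.

    Hypothetical helper, not part of PyTorch or RCCL. setdefault() keeps
    any value the caller has already chosen, so an explicit override wins.
    """
    env = dict(os.environ if base is None else base)
    env.setdefault("HSA_FORCE_FINE_GRAIN_PCIE", "1")
    return env


def launch(cmd, base=None):
    """Run cmd in a child process whose environment carries the variable."""
    return subprocess.run(cmd, env=rccl_env(base), check=True)


# Example call (train.py is a placeholder for your own script):
# launch([sys.executable, "train.py"])
```

Using `setdefault` rather than a plain assignment is a design choice: on a host that does have a high-speed P2P interconnect, a user can still export a different value and the helper will leave it alone.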