Direct P2P GPU <-> GPU communication with torch.to does not seem to work #119638
Comments
Let me add a missing piece of information; here is the result of …
@morgangiraud I've seen reports of some versions of the NVIDIA 545 driver having broken P2P support. You may want to try upgrading your driver version to …
Thanks for helping me here. I've narrowed the problem down a bit: if I downgrade to driver 535.*, the driver no longer reports P2P capabilities, so the code works, but I pay a non-negligible throughput penalty. Then I started looking at NCCL to see if I could pinpoint the problem more precisely, using the following threads:
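As a side note (my own sketch, not from the thread): a quick way to ask the driver from Python what it claims about P2P is torch.cuda.can_device_access_peer, assuming at least two CUDA devices are visible:

```python
import torch

def driver_reports_p2p(src: int = 0, dst: int = 1) -> bool:
    # Ask the CUDA driver whether device `src` can directly access
    # device `dst`'s memory. A True here only reflects what the
    # driver *claims*, not whether P2P actually works correctly.
    return torch.cuda.can_device_access_peer(src, dst)

if torch.cuda.is_available() and torch.cuda.device_count() >= 2:
    print("driver reports p2p 0->1:", driver_reports_p2p(0, 1))
else:
    print("fewer than two CUDA devices available")
```

This mirrors what nvidia-smi topo reports, but from inside the Python process.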
In particular, I ended up on this exact problem: NVIDIA/nccl#606 (comment), where, if I run the NCCL tests with …. Finally, I duplicated my code in TensorFlow and the problem is the same, so this issue is not directly related to PyTorch. Anyway, do you know if there is an equivalent to …?
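For context, the kind of comparison discussed in that NCCL thread can be reproduced with NVIDIA's nccl-tests, toggling P2P via the NCCL_P2P_DISABLE environment variable (a sketch only; assumes CUDA, NCCL, and two GPUs are present):

```shell
# Build NVIDIA's nccl-tests.
git clone https://github.com/NVIDIA/nccl-tests.git
cd nccl-tests && make

# all_reduce across 2 GPUs with P2P enabled (the default)...
./build/all_reduce_perf -b 8 -e 128M -f 2 -g 2

# ...and again with P2P disabled, to compare bus bandwidth.
NCCL_P2P_DISABLE=1 ./build/all_reduce_perf -b 8 -e 128M -f 2 -g 2
```

Comparing the reported busbw between the two runs shows how much the transport actually benefits from P2P on a given machine.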
@morgangiraud
Oh, all right, so that's why! Do you see any link with the …?
I'm not quite sure how …
I see, thanks. I'll give the beta drivers a try.
Well:
So I'm left wondering whether I have P2P capabilities in the end. What is strange is that the NVIDIA script …
The driver reporting that it supports P2P doesn't always mean that it supports it correctly; that is what I mean by broken P2P support. In your case, it might be that P2P is not supported on your platform, but the 545 driver somehow thinks it is. Also, …
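One way to check for this "claims yes but broken" case (my own sketch, not from the thread) is to push known data across the link and verify it arrives intact, since a driver with broken P2P can silently corrupt or zero the copy:

```python
import torch

def p2p_copy_is_correct(src: int = 0, dst: int = 1, n: int = 1 << 20) -> bool:
    # Copy a known pattern from one GPU to the other and compare on the
    # host. With broken P2P the values can come back wrong even though
    # the copy itself raises no error.
    x = torch.arange(n, dtype=torch.int64, device=f"cuda:{src}")
    y = x.to(f"cuda:{dst}")
    return torch.equal(x.cpu(), y.cpu())

if torch.cuda.is_available() and torch.cuda.device_count() >= 2:
    print("p2p copy intact:", p2p_copy_is_correct())
else:
    print("fewer than two CUDA devices available")
```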
Yes, you were right. The end result is that the 40 series does not support P2P, and driver … Thanks a lot for your help!
No problem. You can link this issue. Also, there's a blog post about the 4090 lacking P2P support: https://www.tomshardware.com/news/nvidia-confirms-geforce-cards-lack-p2p-support
I'm adding the blog post with the post-mortem: https://morgangiraud.medium.com/multi-gpu-nvidia-p2p-capabilities-and-debugging-tips-fb7597b4e2b5
For anyone coming across this and the great write-up by @morgangiraud: tinygrad just published an experimental P2P driver for the 4090, https://github.com/tinygrad/open-gpu-kernel-modules; it should be compatible with torch AFAIK.
Yes! I do see some speed improvements, but they are not dramatic.
NCCL all-reduce test for reference: https://pastebin.com/ne4ipn6K. A 58% speed-up in the end.
🐛 Describe the bug
Hi,
I've been looking at direct GPU <-> GPU communication using the tensor.to PyTorch function, and I've found that it doesn't seem to be able to copy a tensor from one CUDA device to the other directly. I'm sorry if I've missed something obvious, but I didn't see it stated anywhere that this shouldn't work as expected.
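A minimal round-trip along those lines (my own sketch; the device strings are assumptions) would look like this:

```python
import torch

def copy_roundtrip(src: str, dst: str) -> bool:
    # tensor.to with another device performs a direct device-to-device
    # copy when the driver supports it; verify the values survive.
    x = torch.randn(4, device=src)
    y = x.to(dst)
    return torch.equal(x.cpu(), y.cpu())

if torch.cuda.is_available() and torch.cuda.device_count() >= 2:
    print("cuda:0 -> cuda:1 ok:", copy_roundtrip("cuda:0", "cuda:1"))
else:
    # Falls back to CPU so the sketch still runs without two GPUs.
    print("cpu -> cpu ok:", copy_roundtrip("cpu", "cpu"))
```

On a setup with broken P2P, the CUDA-to-CUDA comparison is where the mismatch shows up.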
Versions