Error: Peer-to-peer access is unsupported on this platform #39

Closed
Youhe-Jiang opened this issue Aug 12, 2024 · 2 comments

Comments

@Youhe-Jiang

Hi, the problem is:

(ParaWorker pid=1955026) Error: Peer-to-peer access is unsupported on this platform.
(ParaWorker pid=1955026) In the current version of distserve, it is necessary to use a platform that supports GPU P2P access.
(ParaWorker pid=1955026) Exiting...

I ran into this error even though I had checked the P2P connection between the two GPUs. I used the following code to test it:

import torch

# Allocate a small tensor on GPU 0
tensor_a = torch.randn(10, device="cuda:0")
try:
    # Attempt to directly copy tensor_a from GPU 0 to GPU 1
    tensor_b = tensor_a.to("cuda:1")
    print("Successfully copied tensor from GPU 0 to GPU 1 using P2P.")
except RuntimeError as e:
    print("Failed to copy tensor from GPU 0 to GPU 1 using P2P. Error:", e)

The output is:

Successfully copied tensor from GPU 0 to GPU 1 using P2P.

The GPU topology is:

[images: GPU topology]

Can you provide any suggestions?

Thank you!

@Youhe-Jiang

The GPUs I use are RTX 4090s.

@Youhe-Jiang

I solved the problem. It seems that RTX 4090 GPUs connected over PCIe cannot support DistServe; we ran DistServe successfully on A100 machines with NVLink.
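
Note for anyone hitting the same error: a successful `.to("cuda:1")` copy does not by itself show that P2P is available, because a cross-GPU copy can be staged through host memory when direct peer access is missing. A more direct check is to query peer access for each GPU pair. The sketch below does that with `torch.cuda.can_device_access_peer`; the helper name `peer_access_matrix` and its parameters are made up for illustration and are not part of DistServe, and this may not be the exact check DistServe performs internally.

```python
from typing import Callable, List, Optional


def peer_access_matrix(
    num_devices: Optional[int] = None,
    can_access: Optional[Callable[[int, int], bool]] = None,
) -> List[List[bool]]:
    """Return m[i][j] == True when GPU i can directly access GPU j's memory.

    Hypothetical helper: both arguments default to PyTorch's view of the
    machine, but can be overridden (for example with a stub) for testing.
    """
    if num_devices is None or can_access is None:
        # Imported lazily so the helper can be exercised without PyTorch or a GPU.
        import torch

        if num_devices is None:
            num_devices = torch.cuda.device_count()
        if can_access is None:
            can_access = torch.cuda.can_device_access_peer

    # A device is never reported as its own peer, so the diagonal stays False.
    return [
        [i != j and bool(can_access(i, j)) for j in range(num_devices)]
        for i in range(num_devices)
    ]


def format_matrix(matrix: List[List[bool]]) -> str:
    """Render the matrix as one line per GPU, e.g. 'GPU 0 -> GPU 1: no'."""
    lines = []
    for i, row in enumerate(matrix):
        for j, ok in enumerate(row):
            if i != j:
                lines.append(f"GPU {i} -> GPU {j}: {'yes' if ok else 'no'}")
    return "\n".join(lines)
```

Running `print(format_matrix(peer_access_matrix()))` on the target machine lists every GPU pair. If a pair that DistServe needs shows `no`, the tensor copy test above can still succeed while DistServe exits with the P2P error.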
