
Do we need to manually activate the peer2peer communication between GPUs? #68

Closed
nouiz opened this issue Feb 7, 2017 · 6 comments


nouiz commented Feb 7, 2017

Do we need to enable peer2peer communication between GPUs manually, or does NCCL do it automatically (or not need it at all)?

sjeaugey (Member) commented Feb 7, 2017

NCCL will enable P2P if needed, but will not fail if already enabled.
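For application code that also wants P2P enabled, the same "enable if needed, tolerate already enabled" behavior can be sketched with plain CUDA runtime calls (cudaDeviceCanAccessPeer, cudaDeviceEnablePeerAccess, cudaGetLastError). This is not NCCL's code; the helper name enableAllPeerAccess is hypothetical, and the sketch assumes every GPU visible to the process should get access to every peer it can reach.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper (not part of NCCL): enable peer access from every GPU to
// every other GPU that supports it. cudaErrorPeerAccessAlreadyEnabled is treated
// as success, so this can run before or after NCCL has enabled P2P itself.
// Returns the number of (device, peer) pairs with access enabled, or -1 on error.
static int enableAllPeerAccess(void) {
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        cudaGetLastError();  // reset the runtime's last-error state
        return -1;
    }
    int prev = 0;
    cudaGetDevice(&prev);  // remember the caller's current device
    int pairs = 0;
    for (int dev = 0; dev < count; ++dev) {
        cudaSetDevice(dev);  // access is enabled FROM the current device
        for (int peer = 0; peer < count; ++peer) {
            if (peer == dev) continue;
            int canAccess = 0;
            cudaDeviceCanAccessPeer(&canAccess, dev, peer);
            if (!canAccess) continue;  // no P2P path between these two GPUs
            err = cudaDeviceEnablePeerAccess(peer, 0);  // flags must be 0
            if (err == cudaErrorPeerAccessAlreadyEnabled) {
                cudaGetLastError();  // benign: clear it so later error checks don't see it
            } else if (err != cudaSuccess) {
                fprintf(stderr, "enable %d->%d failed: %s\n",
                        dev, peer, cudaGetErrorString(err));
                cudaSetDevice(prev);
                return -1;
            }
            ++pairs;
        }
    }
    cudaSetDevice(prev);  // restore the caller's current device
    return pairs;
}
```

Calling it twice returns the same count, since the second call only hits the already-enabled case. Note that the already-enabled case is detected by making the call and inspecting its return code, so the failing cudaDeviceEnablePeerAccess call still happens; a tool that reports every failing API call will still flag it, which matches the cuda-memcheck behavior reported later in this thread.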

nouiz (Author) commented Feb 7, 2017

thanks!

nouiz closed this as completed Feb 7, 2017

svnoesis commented

I am not seeing any P2P communication in nvprof when using BVLC Caffe with NCCL in the multi-GPU case. In the Caffe version without NCCL, I could see P2P transfers between GPUs. Is there a reason why NCCL is not using P2P?

sjeaugey (Member) commented

P2P is used, but through CUDA kernels. So you will not see explicit P2P cudaMemcpy operations; instead you will see CUDA kernels that do computation as well as remote P2P writes.
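To illustrate what "P2P through CUDA kernels" means, here is a minimal sketch, not NCCL's implementation, in which a kernel running on GPU 0 computes a value and stores it directly into a buffer allocated on GPU 1. The names scaleIntoPeer and peerWriteDemo are hypothetical. The sketch assumes at least two GPUs with a P2P path between devices 0 and 1, and returns 0 (skipped) otherwise.

```cuda
#include <cuda_runtime.h>

// Each thread computes locally and writes the result straight into memory that
// lives on another GPU. With peer access enabled, `remote` is an ordinary device
// pointer, and the stores travel over the P2P link instead of a cudaMemcpyPeer.
__global__ void scaleIntoPeer(const float* local, float* remote, float alpha, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) remote[i] = alpha * local[i];  // remote P2P write
}

// Hypothetical demo: returns 1 if GPU 1's buffer holds the expected values,
// 0 if skipped (fewer than two GPUs or no P2P path), -1 on failure.
static int peerWriteDemo(void) {
    const int n = 1024;
    const size_t bytes = n * sizeof(float);
    int count = 0, canAccess = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count < 2) {
        cudaGetLastError();  // clear "no device" style errors
        return 0;
    }
    cudaDeviceCanAccessPeer(&canAccess, 0, 1);
    if (!canAccess) return 0;

    float host[n];
    for (int i = 0; i < n; ++i) host[i] = (float)i;

    float *remote = nullptr, *local = nullptr;
    cudaSetDevice(1);
    if (cudaMalloc(&remote, bytes) != cudaSuccess) return -1;  // destination on GPU 1

    cudaSetDevice(0);
    cudaError_t err = cudaDeviceEnablePeerAccess(1, 0);  // let GPU 0 access GPU 1 memory
    if (err != cudaSuccess && err != cudaErrorPeerAccessAlreadyEnabled) return -1;
    cudaGetLastError();  // clear a possible "already enabled" error
    if (cudaMalloc(&local, bytes) != cudaSuccess) return -1;  // source on GPU 0
    cudaMemcpy(local, host, bytes, cudaMemcpyHostToDevice);

    // Kernel runs on GPU 0 but writes into GPU 1's buffer: no explicit P2P copy.
    scaleIntoPeer<<<(n + 255) / 256, 256>>>(local, remote, 2.0f, n);
    if (cudaDeviceSynchronize() != cudaSuccess) return -1;

    cudaSetDevice(1);
    cudaMemcpy(host, remote, bytes, cudaMemcpyDeviceToHost);  // read back from GPU 1
    int ok = 1;
    for (int i = 0; i < n; ++i)
        if (host[i] != 2.0f * i) ok = -1;

    cudaFree(remote);
    cudaSetDevice(0);
    cudaFree(local);
    return ok;
}
```

In a profiler, the GPU-to-GPU traffic here shows up only as time spent in scaleIntoPeer, not as a separate peer memcpy, which is the effect described above for NCCL's collective kernels.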

pseudotensor commented

The problem is that cuda-memcheck still complains that peer access is already enabled, which makes it hard to use when debugging NCCL applications. cuda-memcheck complains even when the application has no other problems, and it repeats this error message for every device communicator being initialized:

NCCL: Using devices
Rank 0 uses device 0 [0x01] GeForce GTX TITAN X
Rank 1 uses device 1 [0x02] GeForce GTX TITAN X
Rank 2 uses device 2 [0x03] GeForce GTX TITAN X

========= CUDA-MEMCHECK
========= Program hit cudaErrorPeerAccessAlreadyEnabled (error 50) due to "peer access is already enabled" on CUDA API call to cudaDeviceEnablePeerAccess.
========= Saved host backtrace up to driver entry point at error
========= Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so.1 [0x2eea03]
========= Host Frame:/usr/local/cuda/lib64/libcudart.so.8.0 (cudaDeviceEnablePeerAccess + 0x1a9) [0x38f29]
========= Host Frame:/usr/local/cuda/lib64/libnccl.so.1 [0x56c2]
========= Host Frame:/usr/local/cuda/lib64/libnccl.so.1 (ncclCommInitAll + 0x646) [0x7a66]

cliffwoolley (Collaborator) commented May 25, 2017 via email


5 participants