Do we need to manually activate peer-to-peer communication between GPUs? #68
Comments
NCCL will enable P2P if needed, but will not fail if it is already enabled.
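At the CUDA API level, "enable if needed, don't fail if already enabled" amounts to treating `cudaErrorPeerAccessAlreadyEnabled` as success. The sketch below is a hypothetical illustration of that pattern, not NCCL's actual source; the helper name `enablePeerAccessIdempotent` is invented for illustration.

```c
// Hypothetical sketch (not NCCL's internals): enable peer access
// idempotently, treating "already enabled" as success.
#include <cuda_runtime.h>

static cudaError_t enablePeerAccessIdempotent(int dev, int peerDev) {
  int canAccess = 0;
  cudaError_t err = cudaDeviceCanAccessPeer(&canAccess, dev, peerDev);
  if (err != cudaSuccess) return err;
  if (!canAccess) return cudaSuccess;  // no P2P path between these devices

  err = cudaSetDevice(dev);
  if (err != cudaSuccess) return err;

  err = cudaDeviceEnablePeerAccess(peerDev, 0 /* flags must be 0 */);
  if (err == cudaErrorPeerAccessAlreadyEnabled) {
    // The application (or another library) enabled it first; clear the
    // sticky error state and report success.
    cudaGetLastError();
    return cudaSuccess;
  }
  return err;
}
```

Note that this is exactly the call site cuda-memcheck flags later in this thread: the API call legitimately returns error 50, but the caller handles it.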
Thanks!
I am observing no P2P communication in nvprof when using BVLC Caffe with NCCL in the multi-GPU case. In the Caffe version without NCCL, I could see the P2P transfers between GPUs. Is there a reason why P2P is not being used by NCCL?
P2P is used, but through CUDA kernels. So you will not see explicit P2P cudaMemcpy operations; instead, CUDA kernels perform the computation as well as the remote P2P writes.
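To make the distinction concrete, here is a hypothetical sketch (kernel and variable names invented for illustration) of a remote P2P write: once peer access is enabled, a kernel on one GPU can dereference a pointer into another GPU's memory directly, so the profiler shows a kernel launch rather than a `cudaMemcpyPeer`.

```c
// Hypothetical sketch: a kernel on GPU 0 writes directly into GPU 1's
// memory. The store traverses NVLink/PCIe; no explicit memcpy appears
// in the profiler trace.
__global__ void pushToPeer(const float *local, float *remote, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) remote[i] = local[i];  // remote P2P write to the peer GPU
}

// Host-side setup (error checking omitted for brevity):
//   cudaSetDevice(1); cudaMalloc(&remoteBuf, bytes);
//   cudaSetDevice(0); cudaMalloc(&localBuf, bytes);
//   cudaDeviceEnablePeerAccess(1, 0);
//   pushToPeer<<<grid, block>>>(localBuf, remoteBuf, n);
```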
The problem is that cuda-memcheck will still complain about peer access already being enabled, which makes it hard to use when debugging NCCL applications. cuda-memcheck complains even when the application has no other problems, and it repeats this error message for every device communicator being initialized:

NCCL: Using devices
Rank 0 uses device 0 [0x01] GeForce GTX TITAN X
Rank 1 uses device 1 [0x02] GeForce GTX TITAN X
Rank 2 uses device 2 [0x03] GeForce GTX TITAN X
========= CUDA-MEMCHECK
========= Program hit cudaErrorPeerAccessAlreadyEnabled (error 50) due to "peer access is already enabled" on CUDA API call to cudaDeviceEnablePeerAccess.
========= Saved host backtrace up to driver entry point at error
========= Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so.1 [0x2eea03]
========= Host Frame:/usr/local/cuda/lib64/libcudart.so.8.0 (cudaDeviceEnablePeerAccess + 0x1a9) [0x38f29]
========= Host Frame:/usr/local/cuda/lib64/libnccl.so.1 [0x56c2]
========= Host Frame:/usr/local/cuda/lib64/libnccl.so.1 (ncclCommInitAll + 0x646) [0x7a66]
@pseudotensor - You can tell cuda-memcheck to ignore those API error return values with an extra command-line flag; see the --help output for details.
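The comment above does not name the flag. On CUDA toolkits of that era, the relevant cuda-memcheck option appears to be `--report-api-errors`; verify the exact name and accepted values with `cuda-memcheck --help` for your version. The application name below is a placeholder.

```shell
# Suppress reports for CUDA API calls that return an error the application
# handles itself (such as cudaErrorPeerAccessAlreadyEnabled). Confirm the
# option with `cuda-memcheck --help` for your CUDA version.
cuda-memcheck --report-api-errors no ./my_nccl_app
```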
Do we need to enable peer-to-peer communication between GPUs manually, or does NCCL do it automatically (or not need it at all)?