recommended NCCL versions? #40

Closed
QiaofengQin opened this issue Dec 9, 2021 · 1 comment

Comments

@QiaofengQin

Hi,

I am running the Collective-Comms benchmark on two single-GPU hosts that communicate over RoCEv2. The latency appears to be inconsistent across NCCL versions.

With NCCL 2.10 as the backend, there is always a constant additional latency of around 10 ms, no matter how large the tensor is. However, if I recompile PyTorch with NCCL 2.7 as the backend, the latency is much smaller. Are specific versions of NCCL required for this script?
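
When comparing builds like this, it can help to confirm which NCCL version each PyTorch build actually reports. The following is a minimal sketch, not part of the original report. It assumes `torch.cuda.nccl.version()` returns either a tuple or NCCL's integer version code, and that the integer encoding follows NCCL's `NCCL_VERSION` macro as I understand it (`major*1000 + minor*100 + patch` before 2.9, `major*10000 + minor*100 + patch` from 2.9 on). The helper names are hypothetical.

```python
def decode_nccl_version(code):
    """Turn an NCCL version (tuple or integer code) into (major, minor, patch)."""
    if isinstance(code, tuple):
        # Newer PyTorch builds already return a tuple; keep the first three fields.
        return tuple(code[:3])
    if code < 10000:
        # Assumed pre-2.9 encoding: major * 1000 + minor * 100 + patch
        return (code // 1000, (code % 1000) // 100, code % 100)
    # Assumed 2.9+ encoding: major * 10000 + minor * 100 + patch
    return (code // 10000, (code % 10000) // 100, code % 100)


def linked_nccl_version():
    """Return the NCCL version PyTorch reports, or None if it cannot be queried."""
    try:
        from torch.cuda import nccl
        return decode_nccl_version(nccl.version())
    except Exception:
        # PyTorch is missing or was built without NCCL support.
        return None


if __name__ == "__main__":
    print("NCCL version seen by PyTorch:", linked_nccl_version())
```

Running this once in each conda environment shows whether the 2.7 and 2.10 builds are really the ones being loaded. Setting the environment variable `NCCL_DEBUG=INFO` before launching the benchmark should also make NCCL log its version at initialization.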

I have attached the output when running the following command with NCCL 2.10:

~/conda_env/bin/mpirun -np 2 -N 1 --host 11.0.0.1,11.0.0.2 ~/conda_env/bin/python /data/param/train/comms/pt/comms.py --master-ip 11.0.0.1 --backend nccl --device cuda --b 8 --e 256M --n 20 --f 2 --z 1 --collective all_reduce
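
As a rough illustration of the sweep this command seems to request, the sketch below assumes that `--b`, `--e`, and `--f` mean begin size, end size, and multiplicative step factor. That reading is an assumption, not confirmed by the source, but it is consistent with the 26 message sizes in the results table. The function name `sweep_sizes` is hypothetical.

```python
def sweep_sizes(begin, end, factor):
    """List message sizes from begin to end (inclusive), multiplying by factor each step."""
    sizes = []
    size = begin
    while size <= end:
        sizes.append(size)
        size *= factor
    return sizes


# Assumed meaning of the flags: --b 8 --e 256M --f 2
sizes = sweep_sizes(8, 256 * 1024 * 1024, 2)
print(len(sizes), sizes[0], sizes[-1])  # prints: 26 8 268435456
```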

[attached image: output]

I also compared the latency (in microseconds) of nccl-tests, PARAM with NCCL 2.7, and PARAM with NCCL 2.10. Should we normally expect performance similar to nccl-tests?

| Size (B) | nccl-tests (µs) | PARAM, NCCL 2.7 (µs) | PARAM, NCCL 2.10 (µs) |
|---|---|---|---|
| 8 | 150 | 47.1 | 11991.1 |
| 16 | 158.3 | 46.1 | 10393.1 |
| 32 | 143.7 | 45.6 | 11396.5 |
| 64 | 113.6 | 45.5 | 12362.4 |
| 128 | 112 | 46.3 | 11889.3 |
| 256 | 114.7 | 47.3 | 9837.3 |
| 512 | 113.8 | 46.5 | 11046.1 |
| 1024 | 117.2 | 47.2 | 12022.6 |
| 2048 | 132.7 | 47.9 | 10952.3 |
| 4096 | 137 | 62.2 | 11646.1 |
| 8192 | 121.3 | 53.5 | 12466.5 |
| 16384 | 132 | 57.6 | 10948.6 |
| 32768 | 121.5 | 62.9 | 13337.2 |
| 65536 | 212.3 | 75 | 11786 |
| 131072 | 229.8 | 80.9 | 10927.1 |
| 262144 | 237.4 | 95.4 | 14178.9 |
| 524288 | 275.3 | 137.1 | 10649.8 |
| 1048576 | 352.5 | 212 | 11185.1 |
| 2097152 | 525.9 | 364.3 | 10624.6 |
| 4194304 | 780 | 1100.6 | 13422.3 |
| 8388608 | 1418.6 | 2883.2 | 12564.5 |
| 16777216 | 2659.3 | 7575.9 | 14743.1 |
| 33554432 | 4795 | 18189.6 | 18961.8 |
| 67108864 | 9198.5 | 36459.1 | 35850.3 |
| 134217728 | 18129 | 73286.1 | 71411.7 |
| 268435456 | 36005 | 151070.3 | 154358.7 |
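
To make the gap between the two PARAM columns easier to see, the sketch below subtracts the NCCL 2.7 latency from the NCCL 2.10 latency for each message size, using only the numbers reported above. The names `ROWS` and `overhead_us` are hypothetical. In this data the difference stays at roughly 10 to 14 ms for sizes up to 1 MiB, while from 32 MiB upward the two columns differ by less than 5%.

```python
# (size in bytes, nccl-tests, PARAM + NCCL 2.7, PARAM + NCCL 2.10)
# Latencies are in microseconds, copied from the reported measurements.
ROWS = [
    (8, 150.0, 47.1, 11991.1),
    (16, 158.3, 46.1, 10393.1),
    (32, 143.7, 45.6, 11396.5),
    (64, 113.6, 45.5, 12362.4),
    (128, 112.0, 46.3, 11889.3),
    (256, 114.7, 47.3, 9837.3),
    (512, 113.8, 46.5, 11046.1),
    (1024, 117.2, 47.2, 12022.6),
    (2048, 132.7, 47.9, 10952.3),
    (4096, 137.0, 62.2, 11646.1),
    (8192, 121.3, 53.5, 12466.5),
    (16384, 132.0, 57.6, 10948.6),
    (32768, 121.5, 62.9, 13337.2),
    (65536, 212.3, 75.0, 11786.0),
    (131072, 229.8, 80.9, 10927.1),
    (262144, 237.4, 95.4, 14178.9),
    (524288, 275.3, 137.1, 10649.8),
    (1048576, 352.5, 212.0, 11185.1),
    (2097152, 525.9, 364.3, 10624.6),
    (4194304, 780.0, 1100.6, 13422.3),
    (8388608, 1418.6, 2883.2, 12564.5),
    (16777216, 2659.3, 7575.9, 14743.1),
    (33554432, 4795.0, 18189.6, 18961.8),
    (67108864, 9198.5, 36459.1, 35850.3),
    (134217728, 18129.0, 73286.1, 71411.7),
    (268435456, 36005.0, 151070.3, 154358.7),
]


def overhead_us(row):
    """PARAM latency on NCCL 2.10 minus PARAM latency on NCCL 2.7, in microseconds."""
    _, _, nccl27, nccl210 = row
    return nccl210 - nccl27


for row in ROWS:
    # One line per message size: size in bytes and the 2.10 vs 2.7 difference.
    print(f"{row[0]:>10d} B  {overhead_us(row):>10.1f} us")
```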

Our system information:

  • OS: Ubuntu 20.04
  • Network Interface: Mellanox mlx5
@louisfeng
Contributor

This issue has been stale for too long, so we will close it for now. If the issue still persists, please let us know.
