Expose Ring Communications #43

Closed
ccecka opened this issue Aug 23, 2016 · 1 comment

ccecka commented Aug 23, 2016

This is an excellent and necessary library. My understanding is that each collective communication is implemented via ring communications. If this is the case, a large class of problems (e.g. halo communications) could benefit greatly from exposing the collective ring communication as another primitive.

I imagine this could look similar to MPI's virtual topology:
https://computing.llnl.gov/tutorials/mpi/#Virtual_Topologies
where the ncclComm (or a wrapper-like object) would be exposed as a ring_communicator that could be passed to ring_rank, ring_coord, ring_shift, send, recv, and sendrecv-like functions.
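To make the proposal concrete, here is a minimal sketch of what a `ring_shift`-style helper might compute, modeled loosely on what `MPI_Cart_shift` returns for a periodic 1D topology. The function name and signature are hypothetical and are not part of NCCL; this only illustrates the neighbor arithmetic.

```python
from typing import Tuple


def ring_shift(rank: int, nranks: int, disp: int = 1) -> Tuple[int, int]:
    """Hypothetical helper: return (source, dest) for a shift of `disp`
    positions around a periodic ring of `nranks` ranks.

    `dest` is the rank this rank would send to, and `source` is the rank
    it would receive from, mirroring the output order of MPI_Cart_shift.
    """
    if nranks <= 0:
        raise ValueError("nranks must be positive")
    if not 0 <= rank < nranks:
        raise ValueError("rank must be in [0, nranks)")
    source = (rank - disp) % nranks  # wraps around at the ring boundary
    dest = (rank + disp) % nranks
    return source, dest
```

For example, with four ranks, `ring_shift(0, 4)` gives `(3, 1)`: rank 0 would receive from rank 3 and send to rank 1.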

I was going to take a quick crack at this, but thought I would get some feedback from the experts first.

sjeaugey (Member) commented May 6, 2019

NCCL 1.3 relied on a single ring, but NCCL has since evolved to use multiple rings so that it can use all NVLinks and network cards. As a result, a rank no longer has a unique index within "the" ring.
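A small illustration of why the index stops being unique: if two rings order the same ranks differently, a given rank sits at a different position in each. The orderings below are made up for the example and do not reflect how NCCL actually builds its rings.

```python
from typing import List

# Hypothetical ring orderings over the same four ranks (illustrative only).
RINGS: List[List[int]] = [
    [0, 1, 2, 3],
    [0, 2, 1, 3],
]


def ring_positions(rank: int, rings: List[List[int]]) -> List[int]:
    """Return the position of `rank` within each ring ordering."""
    return [ring.index(rank) for ring in rings]
```

Here `ring_positions(1, RINGS)` returns `[1, 2]`, so a single `ring_rank` value would be ambiguous.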

More recently, NCCL also started to use trees for small/medium size operations.

So the ring NCCL creates is neither unique nor the best way to communicate for, e.g., a halo exchange. Halo exchanges should be handled by point-to-point operations once those are implemented.
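As a rough sketch of the point-to-point pattern a halo exchange needs, the following pure-Python simulation pads each rank's 1D chunk with one ghost cell from each neighbor. It does not call NCCL; the function name and the list-of-lists data layout are assumptions for illustration. Later NCCL releases added `ncclSend` and `ncclRecv`, and in a real implementation each ghost-cell copy here would correspond to one such send/receive pair.

```python
from typing import List, Optional


def halo_exchange(
    chunks: List[List[float]], periodic: bool = True
) -> List[List[Optional[float]]]:
    """Simulate a 1D halo exchange over non-empty per-rank chunks.

    Returns each chunk as [left_ghost, *interior, right_ghost]. A ghost
    cell is None where a non-periodic boundary has no neighbor.
    """
    n = len(chunks)
    padded: List[List[Optional[float]]] = []
    for rank, interior in enumerate(chunks):
        left, right = rank - 1, rank + 1
        if periodic:
            left %= n
            right %= n
        # "Receive" the neighbor's edge value; each lookup stands in for
        # one point-to-point message in a real implementation.
        left_ghost = chunks[left][-1] if 0 <= left < n else None
        right_ghost = chunks[right][0] if 0 <= right < n else None
        padded.append([left_ghost, *interior, right_ghost])
    return padded
```

Note that each rank only talks to its two spatial neighbors, which is independent of whatever rings or trees the library uses internally for collectives.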

Also closing as this is an old request; feel free to reopen or follow up.

@sjeaugey sjeaugey closed this as completed May 6, 2019
minsii added a commit to minsii/nccl that referenced this issue Mar 16, 2024
Summary:
Pull Request resolved: facebookresearch#43

Introduce user facing API ncclCommDump. It internally dumps NCCL internal state including:
- Basic comm metadata
- Pending, past, current collective kernels via CollTrace
- Past collectives and active network operations from ProxyTrace.

See details in design doc: https://docs.google.com/document/d/1ReXt2IKsjlzCUyi8bN4o5aFlmOYq7_FgduvOkqp95k4/edit?usp=sharing

Reviewed By: YulunW

Differential Revision: D53792058

fbshipit-source-id: 497984035614dff96c15bbfe7d86f74b86930f79