How do collective operations call runRing, runTreeUpDown, and runTreeSplit #1300

Open
ZhiyiHu1999 opened this issue May 27, 2024 · 1 comment

Comments


ZhiyiHu1999 commented May 27, 2024

Hello, I want to ask how the collective operations defined in collectives.cc call the runRing, runTreeUpDown, and runTreeSplit functions. For example, ncclAllReduce() is defined in collectives.cc; how does it call the runRing, runTreeUpDown, and runTreeSplit functions defined in all_reduce.h to run these algorithms? In addition, how does ncclAllReduce() choose which algorithm to use? (I could barely find any file that includes all_reduce.h, which adds to my confusion.) Thanks a lot!


Hizhaoyuan commented May 28, 2024

The ncclLaunchKernel function plays a pivotal role: it is responsible for initiating the execution of NCCL kernels. Its implementation relies on CUDA's cudaLaunchKernel API, which enqueues the NCCL kernel for execution.

To thoroughly understand this process, it is essential to look at how CUDA kernels are launched. Within NCCL's implementation, cudaLaunchKernel is the key function that triggers kernel execution. Its first argument is a pointer to a CUDA kernel function that conforms to a specific signature.
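
To make this concrete, here is a rough, self-contained sketch of a host-side cudaLaunchKernel call. This is not NCCL's actual code; the kernel name and its arguments are invented purely to show the calling convention (function pointer, grid/block shape, argument-pointer array, shared-memory size, stream):

```cuda
// Minimal sketch, not NCCL code: the kernel and its arguments are invented
// for illustration. cudaLaunchKernel's first argument is a pointer to a
// __global__ function; the remaining arguments give the grid/block shape,
// an array of pointers to the kernel arguments, shared-memory size, and stream.
#include <cuda_runtime.h>

__global__ void fillKernel(int *data, int value) {
  data[threadIdx.x] = value;  // each thread writes one element
}

int main() {
  int *d_data;
  cudaMalloc(&d_data, 32 * sizeof(int));

  int value = 42;
  void *args[] = { &d_data, &value };  // pointers to each kernel argument

  // Equivalent to fillKernel<<<1, 32, 0, 0>>>(d_data, 42);
  cudaLaunchKernel((void *)fillKernel, dim3(1), dim3(32), args,
                   /*sharedMem=*/0, /*stream=*/0);
  cudaDeviceSynchronize();

  cudaFree(d_data);
  return 0;
}
```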

In NCCL's source code, you may notice that header files such as all_reduce.h and all_gather.h define functions with the __device__ qualifier. These functions can only execute on the device side and are called by functions with the __global__ qualifier defined in common.cu. The __global__ functions serve as the entry points for CUDA kernels; they are what the first argument of cudaLaunchKernel points to.
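
Putting the pieces together, the pattern looks roughly like the sketch below. The struct and function names are made up to mirror NCCL's layout (the real dispatch logic in common.cu is considerably more involved), but it shows how a __global__ entry point calls the __device__ run* functions:

```cuda
// A simplified, hypothetical illustration of the structure described above.
// The names (runRing, runTreeUpDown, allReduceWork, ncclAllReduceKernel) only
// mimic NCCL's layout; this is not the library's actual code.
#include <cstddef>

struct allReduceWork {    // hypothetical work descriptor passed to the kernel
  int algorithm;          // e.g. 0 = ring, 1 = tree
  const float *sendbuff;
  float *recvbuff;
  size_t count;
};

// __device__ functions, conceptually like those defined in all_reduce.h:
// they can only run on the GPU and must be called from device code.
__device__ void runRing(allReduceWork *w) {
  // ... ring all-reduce steps ...
}

__device__ void runTreeUpDown(allReduceWork *w) {
  // ... tree all-reduce steps ...
}

// A __global__ function, conceptually like the kernels in common.cu. Its
// address is what the host side hands to cudaLaunchKernel, and inside it
// dispatches to the appropriate __device__ implementation.
__global__ void ncclAllReduceKernel(allReduceWork work) {
  if (work.algorithm == 0) runRing(&work);
  else                     runTreeUpDown(&work);
}
```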
