How do collective operations call runRing, runTreeUpDown, and runTreeSplit #1300

Open
ZhiyiHu1999 opened this issue May 27, 2024 · 1 comment

Comments


ZhiyiHu1999 commented May 27, 2024

Hello, I want to ask how the collective operations defined in collectives.cc call the runRing, runTreeUpDown, and runTreeSplit functions. For example, ncclAllReduce() is defined in collectives.cc; how does it call the runRing, runTreeUpDown, and runTreeSplit functions defined in all_reduce.h to run these algorithms? In addition, how does ncclAllReduce() choose which algorithm to use? (I could barely find any file that includes all_reduce.h, which adds to my confusion.) Thanks a lot!


Hizhaoyuan commented May 28, 2024

The ncclLaunchKernel function plays a pivotal role: it is responsible for initiating the execution of NCCL kernels. Its implementation relies on CUDA's cudaLaunchKernel API, which enqueues the NCCL kernel for execution.

To thoroughly understand this process, it is essential to look at how CUDA kernels are launched. Within NCCL's implementation, cudaLaunchKernel is the key function that triggers kernel execution. Its first argument is a pointer to a CUDA kernel function that conforms to a specific signature.
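
To make this concrete, here is a rough, self-contained sketch of a host-side cudaLaunchKernel call. This is not NCCL's actual code; the kernel name and its arguments are invented purely to show the calling convention (function pointer, grid/block shape, argument-pointer array, shared-memory size, stream):

```cuda
// Minimal sketch, not NCCL code: the kernel and its arguments are invented
// for illustration. cudaLaunchKernel's first argument is a pointer to a
// __global__ function; the remaining arguments give the grid/block shape,
// an array of pointers to the kernel arguments, shared-memory size, and stream.
#include <cuda_runtime.h>

__global__ void fillKernel(int *data, int value) {
  data[threadIdx.x] = value;  // each thread writes one element
}

int main() {
  int *d_data;
  cudaMalloc(&d_data, 32 * sizeof(int));

  int value = 42;
  void *args[] = { &d_data, &value };  // pointers to each kernel argument

  // Equivalent to fillKernel<<<1, 32, 0, 0>>>(d_data, 42);
  cudaLaunchKernel((void *)fillKernel, dim3(1), dim3(32), args,
                   /*sharedMem=*/0, /*stream=*/0);
  cudaDeviceSynchronize();

  cudaFree(d_data);
  return 0;
}
```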

In NCCL's source code, you may notice that header files such as all_reduce.h and all_gather.h define functions with the __device__ qualifier. These functions can only execute on the device side and are called by functions with the __global__ qualifier defined in common.cu. The __global__ functions serve as the entry points for CUDA kernels; they are what the first argument of cudaLaunchKernel points to.
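
Putting the pieces together, the pattern looks roughly like the sketch below. The struct and function names are made up to mirror NCCL's layout (the real dispatch logic in common.cu is considerably more involved), but it shows how a __global__ entry point calls the __device__ run* functions:

```cuda
// A simplified, hypothetical illustration of the structure described above.
// The names (runRing, runTreeUpDown, allReduceWork, ncclAllReduceKernel) only
// mimic NCCL's layout; this is not the library's actual code.
#include <cstddef>

struct allReduceWork {    // hypothetical work descriptor passed to the kernel
  int algorithm;          // e.g. 0 = ring, 1 = tree
  const float *sendbuff;
  float *recvbuff;
  size_t count;
};

// __device__ functions, conceptually like those defined in all_reduce.h:
// they can only run on the GPU and must be called from device code.
__device__ void runRing(allReduceWork *w) {
  // ... ring all-reduce steps ...
}

__device__ void runTreeUpDown(allReduceWork *w) {
  // ... tree all-reduce steps ...
}

// A __global__ function, conceptually like the kernels in common.cu. Its
// address is what the host side hands to cudaLaunchKernel, and inside it
// dispatches to the appropriate __device__ implementation.
__global__ void ncclAllReduceKernel(allReduceWork work) {
  if (work.algorithm == 0) runRing(&work);
  else                     runTreeUpDown(&work);
}
```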
