
TL/NCCL: add ncclAlltoAll #1244

Merged
Sergei-Lebedev merged 1 commit into openucx:master from yaeliyac:nccl_a2a_new
Jan 16, 2026

Conversation

@yaeliyac
Contributor

What

As described in the NCCL 2.28.3 release notes, NCCL added a new alltoall API, ncclAlltoAll (documentation).

This PR updates TL/NCCL to use that API, guarded by a check that the NCCL version supports it.
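
A minimal sketch of the kind of compile-time guard this implies, assuming the check compares NCCL_VERSION_CODE against NCCL_VERSION(2,28,0). Both macros are defined locally as stand-ins so the snippet builds without nccl.h: the stand-in NCCL_VERSION follows the encoding nccl.h uses as far as I know, and the 2.28.3 value is only an example. use_native_alltoall is a hypothetical helper for illustration, not a function from this PR.

```c
#include <stdio.h>

/* Stand-ins so the sketch compiles without nccl.h (assumption: same encoding
 * as the real header). With NCCL installed, both macros come from nccl.h. */
#ifndef NCCL_VERSION
#define NCCL_VERSION(X, Y, Z)                                   \
    (((X) <= 2 && (Y) <= 8) ? (X) * 1000 + (Y) * 100 + (Z)      \
                            : (X) * 10000 + (Y) * 100 + (Z))
#endif
#ifndef NCCL_VERSION_CODE
#define NCCL_VERSION_CODE NCCL_VERSION(2, 28, 3) /* example value only */
#endif

/* Hypothetical helper: returns 1 if the native alltoall path would be
 * compiled in, 0 if the send/recv fallback would be used instead. */
static int use_native_alltoall(void)
{
#if NCCL_VERSION_CODE >= NCCL_VERSION(2, 28, 0)
    return 1; /* the call to ncclAlltoAll would go here */
#else
    return 0; /* grouped ncclSend/ncclRecv loop would go here */
#endif
}

/* Prints which path was selected at compile time. */
static void report_alltoall_path(void)
{
    printf("alltoall path: %s\n",
           use_native_alltoall() ? "native" : "send/recv fallback");
}
```

Because the choice is made by the preprocessor, a build against an older NCCL never references the new symbol and so still links.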

@yaeliyac changed the title from "add ncclAlltoAll" to "TL/NCCL: ncclAlltoAll" on Dec 17, 2025
@yaeliyac force-pushed the nccl_a2a_new branch 2 times, most recently from cc77646 to cdd9a96, on December 24, 2025 13:30
@yaeliyac changed the title from "TL/NCCL: ncclAlltoAll" to "TL/NCCL: add ncclAlltoAll" on Dec 24, 2025
@yaeliyac marked this pull request as ready for review on December 24, 2025 14:08
@greptile-apps
Contributor

greptile-apps bot commented Jan 5, 2026

Greptile Summary

Updates the NCCL transport layer to use the native ncclAlltoAll API introduced in NCCL 2.28.0, while maintaining backward compatibility with older versions.

  • Uses the new ncclAlltoAll API when NCCL 2.28.0+ is available, which replaces the previous manual implementation using ncclGroupStart/ncclSend/ncclRecv/ncclGroupEnd
  • Adds proper version guard with NCCL_VERSION_CODE >= NCCL_VERSION(2,28,0) to ensure compatibility
  • Moves the peer variable declaration out of the function scope and into the fallback loop (C99-style declaration), since only the fallback path uses it
  • The native API should provide better performance compared to the manual send/recv implementation
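
To make the equivalence between the two paths concrete, here is a host-memory model of the data movement an all-to-all performs, written as the same per-peer exchange the fallback expresses with ncclSend/ncclRecv. This is only a sketch under stated assumptions: alltoall_model is a hypothetical name, it runs on plain int arrays rather than GPU buffers, and it is not code from this PR.

```c
#include <stddef.h>
#include <string.h>

/* Host-memory model of an all-to-all over `nranks` ranks (illustrative only).
 * `send` and `recv` each hold nranks * nranks * count ints, laid out as
 * [rank][block][element]. Block `peer` of rank `rank`'s send buffer is
 * delivered to block `rank` of rank `peer`'s receive buffer. */
static void alltoall_model(int nranks, size_t count,
                           const int *send, int *recv)
{
    size_t per_rank = (size_t)nranks * count;

    for (int rank = 0; rank < nranks; rank++) {
        /* One iteration per peer, mirroring the per-peer send/recv pair. */
        for (int peer = 0; peer < nranks; peer++) {
            const int *src = send + (size_t)rank * per_rank
                                  + (size_t)peer * count;
            int       *dst = recv + (size_t)peer * per_rank
                                  + (size_t)rank * count;
            memcpy(dst, src, count * sizeof(int));
        }
    }
}
```

With two ranks and one element per block, send buffers {0, 1} on rank 0 and {10, 11} on rank 1 end up as receive buffers {0, 10} and {1, 11}: each rank collects the block addressed to it from every peer.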

Confidence Score: 5/5

  • This PR is safe to merge with minimal risk
  • The change is well-contained, properly version-guarded, and maintains full backward compatibility. The implementation correctly uses the new NCCL API with appropriate parameters, and the fallback code remains unchanged for older NCCL versions.
  • No files require special attention

Important Files Changed

| Filename | Overview |
| --- | --- |
| src/components/tl/nccl/tl_nccl_coll.c | Adds native ncclAlltoAll support for NCCL 2.28.0+ with proper fallback to send/recv pattern for older versions |

@Sergei-Lebedev
Contributor

/build

@janjust
Collaborator

janjust commented Jan 14, 2026

/build

@Sergei-Lebedev
Contributor

/build

@Sergei-Lebedev merged commit 76fd8dd into openucx:master on Jan 16, 2026
11 of 12 checks passed