Skip to content

TL/UCP: fix onesided a2a alg selection for 1ppn#1247

Merged
Sergei-Lebedev merged 2 commits intoopenucx:masterfrom
wfaderhold21:topic/fix_1ppn_a2a
Jan 16, 2026
Merged

TL/UCP: fix onesided a2a alg selection for 1ppn#1247
Sergei-Lebedev merged 2 commits intoopenucx:masterfrom
wfaderhold21:topic/fix_1ppn_a2a

Conversation

@wfaderhold21
Copy link
Collaborator

What

When using 1 PPN, algorithm will return UCC_ERR_NOT_SUPPORTED. This defaults to the PUT algorithm.

@greptile-apps
Copy link
Contributor

greptile-apps bot commented Dec 18, 2025

Greptile Summary

Fixed the onesided alltoall algorithm selection for 1 process per node (PPN) scenarios. Previously, when the node subgroup didn't exist (indicating 1 PPN), the function would return UCC_ERR_NOT_SUPPORTED. The fix introduces a group_size variable initialized to 1, and when the node subgroup doesn't exist with AUTO algorithm selection, it defaults to the PUT algorithm and uses group_size=1 for subsequent calculations instead of dereferencing the non-existent subgroup.

Confidence Score: 5/5

  • This PR is safe to merge with minimal risk
  • The fix is straightforward and properly handles the edge case of 1 PPN by avoiding dereferencing a non-existent subgroup. The logic correctly defaults to PUT algorithm when AUTO is selected and uses a safe default of group_size=1 for calculations.
  • No files require special attention

Important Files Changed

Filename Overview
src/components/tl/ucp/alltoall/alltoall_onesided.c Fixed 1 PPN algorithm selection by using PUT when node subgroup doesn't exist and defaulting group_size to 1

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additional Comments (2)

  1. src/components/tl/ucp/alltoall/alltoall_onesided.c, line 282 (link)

    logic: when sbgp->status == UCC_SBGP_NOT_EXISTS (1 PPN case), sbgp->group_size is uninitialized. this line will compute ratio using garbage value, causing incorrect token calculation

  2. src/components/tl/ucp/alltoall/alltoall_onesided.c, line 290-291 (link)

    logic: when sbgp->status == UCC_SBGP_NOT_EXISTS (1 PPN case), comparing uninitialized sbgp->group_size against CONGESTION_THRESHOLD will cause undefined behavior. should check sbgp status first

1 file reviewed, 2 comments

Edit Code Review Agent Settings | Greptile

@Sergei-Lebedev
Copy link
Contributor

@wfaderhold21 i think comments from greptile are relevant

@greptile-apps
Copy link
Contributor

greptile-apps bot commented Jan 6, 2026

Greptile's behavior is changing!

From now on, if a review finishes with no comments, we will not post an additional "statistics" comment to confirm that our review found nothing to comment on. However, you can confirm that we reviewed your changes in the status check section.

This feature can be toggled off in your Code Review Settings by deselecting "Create a status check for each PR".

@wfaderhold21
Copy link
Collaborator Author

/build

@Sergei-Lebedev
Copy link
Contributor

/build

@Sergei-Lebedev Sergei-Lebedev enabled auto-merge (squash) January 16, 2026 13:42
@Sergei-Lebedev
Copy link
Contributor

/build

@Sergei-Lebedev Sergei-Lebedev merged commit bda5ea1 into openucx:master Jan 16, 2026
11 of 12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants