Skip to content

Generalize NormalizedPartitioning class#22246

Merged
rapids-bot[bot] merged 5 commits intorapidsai:mainfrom
rjzamora:normalized-partitioning-rewrite
Apr 23, 2026
Merged

Generalize NormalizedPartitioning class#22246
rapids-bot[bot] merged 5 commits intorapidsai:mainfrom
rjzamora:normalized-partitioning-rewrite

Conversation

@rjzamora
Copy link
Copy Markdown
Member

Description

The NormalizedPartitioning is very HashScheme specific. It promotes usage that focuses on the effective hash modulus used for inter-rank and local partitioning. When we adopt OrderScheme (rapidsai/rapidsmpf#853), we will need to compare other descriptions (e.g. sort boundaries). This PR revises the NormalizedPartitioning to reflect the reality of the underlying Partitioning "scheme" more directly.

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@rjzamora rjzamora self-assigned this Apr 21, 2026
@rjzamora rjzamora requested a review from a team as a code owner April 21, 2026 22:25
@rjzamora rjzamora requested a review from madsbk April 21, 2026 22:25
@rjzamora rjzamora added 2 - In Progress Currently a work in progress improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Apr 21, 2026
@github-actions github-actions Bot added Python Affects Python cuDF API. cudf-polars Issues specific to cudf-polars labels Apr 21, 2026
@GPUtester GPUtester moved this to In Progress in cuDF Python Apr 21, 2026
Comment on lines -654 to +662
inter_rank_modulus: int
"""The inter-rank modulus."""
inter_rank_indices: tuple[int, ...]
"""The inter-rank column indices."""
local_modulus: int | None
"""The local modulus."""
local_indices: tuple[int, ...]
"""The local column indices."""
inter_rank_scheme: HashScheme | None
local_scheme: HashScheme | None | Literal["inherit"]
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the central change here. Everything else is mostly needed to align with this change/simplification.

Comment thread python/cudf_polars/cudf_polars/experimental/rapidsmpf/utils.py
Copy link
Copy Markdown
Member

@pentschev pentschev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this looks good, left only some nits. Thanks Rick.

Comment on lines +691 to +699
lhs_loc = self.local_scheme
rhs_loc = other.local_scheme
if type(lhs_loc) is not type(rhs_loc):
return False
if isinstance(lhs_loc, HashScheme) and isinstance(rhs_loc, HashScheme):
return lhs_loc.modulus == rhs_loc.modulus and len(
lhs_loc.column_indices
) == len(rhs_loc.column_indices)
return True
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
lhs_loc = self.local_scheme
rhs_loc = other.local_scheme
if type(lhs_loc) is not type(rhs_loc):
return False
if isinstance(lhs_loc, HashScheme) and isinstance(rhs_loc, HashScheme):
return lhs_loc.modulus == rhs_loc.modulus and len(
lhs_loc.column_indices
) == len(rhs_loc.column_indices)
return True
if lhs_loc != rhs_loc:
if not (isinstance(lhs_loc, HashScheme) and isinstance(rhs_loc, HashScheme)):
return False
return lhs_loc.modulus == rhs_loc.modulus and len(lhs_loc.column_indices) == len(rhs_loc.column_indices)
return True

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is generally easier to read and doesn't rely on identity of type() in favor of a proper isinstance().

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to resolve the logic, but did it a bit differently.

Comment thread python/cudf_polars/cudf_polars/experimental/rapidsmpf/utils.py
Comment thread python/cudf_polars/tests/experimental/test_metadata.py
Comment thread python/cudf_polars/tests/experimental/test_metadata.py
@rjzamora rjzamora force-pushed the normalized-partitioning-rewrite branch from 5c40703 to 1865865 Compare April 23, 2026 18:00
@rjzamora rjzamora added 3 - Ready for Review Ready for review by team and removed 2 - In Progress Currently a work in progress labels Apr 23, 2026
Copy link
Copy Markdown
Member

@pentschev pentschev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks Rick!

@rjzamora rjzamora added 5 - Ready to Merge Testing and reviews complete, ready to merge and removed 3 - Ready for Review Ready for review by team labels Apr 23, 2026
@rjzamora
Copy link
Copy Markdown
Member Author

/merge

@rapids-bot rapids-bot Bot merged commit 3f57eff into rapidsai:main Apr 23, 2026
108 checks passed
@github-project-automation github-project-automation Bot moved this from In Progress to Done in cuDF Python Apr 23, 2026
@rjzamora rjzamora deleted the normalized-partitioning-rewrite branch April 24, 2026 02:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

5 - Ready to Merge Testing and reviews complete, ready to merge cudf-polars Issues specific to cudf-polars improvement Improvement / enhancement to an existing function non-breaking Non-breaking change Python Affects Python cuDF API.

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

4 participants