[Data][WIP] Implement Join on hash shuffle v2 by owenowenisme · Pull Request #64538 · ray-project/ray

owenowenisme · 2026-07-04T18:12:09Z

Description

Blocked by #64532 #64438 , shouls rebase after they get merged, this pr is self-contained right now.

Related issues

Link related issues: "Fixes #1234", "Closes #1234", or "Related to #1234".

Additional information

Optional: Add implementation details, API changes, usage examples, screenshots, etc.

Generalize ShuffleReduceOp / _shuffle_reduce_task to N co-partitioned inputs (pairing per (partition, input), all-inputs-done flush, tables_by_input). This is now based on master after ray-project#64481, so the PR carries only multi-input support and not the streaming-reduce removal. Signed-off-by: You-Cheng Lin <mses010108@gmail.com>

Signed-off-by: You-Cheng Lin <mses010108@gmail.com>

gemini-code-assist

Code Review

This pull request updates the shuffle execution framework to support multi-input shuffle reduce operations, enabling a new V2 shuffle-based join implementation. Key changes include modifying ShuffleReduceOp to accept multiple upstream ShuffleMapOp inputs, updating the reduce task to process lists of tables grouped by input, and planning joins using the new shuffle path. The review feedback highlights a critical issue where incremental_resource_usage underestimates peak memory for multi-input shuffles by only considering the first dependency, which could lead to out-of-memory errors. Additionally, defensive checks are recommended in hash_shuffle_v2.py to prevent potential IndexErrors when accessing tables_by_input.

gemini-code-assist · 2026-07-04T18:18:27Z

+        return ExecutionResources.from_resource_dict(
+            self._reduce_task_remote_args(memory)
        )


In multi-input shuffles (such as joins), the incremental_resource_usage method underestimates the peak memory requirement because it only calculates the average partition size from the first upstream dependency (self.input_dependencies[0]). This ignores the memory requirements of the other inputs (e.g., the right side of the join), which can lead to severe underestimation of the peak memory required by the reduce task and potentially cause node out-of-memory (OOM) errors due to over-scheduling.\n\nTo fix this, we should aggregate the partition bytes from all upstream dependencies. For example:\n\npython\n def incremental_resource_usage(self) -> ExecutionResources:\n \"\"\"Per-task resource ask for the framework's budget allocator.\"\"\"\n from collections import defaultdict\n combined_partition_bytes = defaultdict(int)\n for upstream in self.input_dependencies:\n assert isinstance(upstream, ShuffleMapOp)\n for part_id, bytes_val in upstream.get_partition_bytes().items():\n combined_partition_bytes[part_id] += bytes_val\n \n memory = 0\n sizes = [b for b in combined_partition_bytes.values() if b > 0]\n if sizes:\n avg_bytes = sum(sizes) / len(sizes)\n memory = int(avg_bytes * SHUFFLE_PEAK_MEMORY_MULTIPLIER)\n return ExecutionResources.from_resource_dict(\n self._reduce_task_remote_args(memory)\n )\n

gemini-code-assist · 2026-07-04T18:18:28Z

    Used under the partition = block contract: called once per partition with
    the full shard list so the output is exactly one block.
    """
+    tables = tables_by_input[0]


For defensive programming, we should ensure tables_by_input is not empty before accessing tables_by_input[0] to avoid potential IndexErrors.

Suggested change

tables = tables_by_input[0]

tables = tables_by_input[0] if tables_by_input else []

gemini-code-assist · 2026-07-04T18:18:28Z

+    def _reduce(
+        partition_id: int, tables_by_input: List[List[pa.Table]]
+    ) -> Iterable[pa.Table]:
+        tables = tables_by_input[0]


For defensive programming, we should ensure tables_by_input is not empty before accessing tables_by_input[0] to avoid potential IndexErrors.

Suggested change

tables = tables_by_input[0]

tables = tables_by_input[0] if tables_by_input else []

owenowenisme added 3 commits July 2, 2026 14:54

update

7977eed

Signed-off-by: You-Cheng Lin <mses010108@gmail.com>

update

dccf2a9

Signed-off-by: You-Cheng Lin <mses010108@gmail.com>

gemini-code-assist Bot reviewed Jul 4, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[Data][WIP] Implement Join on hash shuffle v2 #64538

[Data][WIP] Implement Join on hash shuffle v2 #64538
owenowenisme wants to merge 3 commits into
ray-project:masterfrom
owenowenisme:data/join-on-shuffle-reduce

owenowenisme commented Jul 4, 2026 •

edited

Loading

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

gemini-code-assist Bot Jul 4, 2026

Uh oh!

gemini-code-assist Bot Jul 4, 2026

Uh oh!

gemini-code-assist Bot Jul 4, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

	tables = tables_by_input[0]
	tables = tables_by_input[0] if tables_by_input else []

Uh oh!

Conversation

owenowenisme commented Jul 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Related issues

Additional information

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot Jul 4, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jul 4, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jul 4, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

owenowenisme commented Jul 4, 2026 •

edited

Loading