Cherry pick sort merge fixes 52 #36

Merged
xudong963 merged 3 commits into branch-52 from
cherry-pick-sort-merge-fixes-52 on Mar 23, 2026

@xudong963
Collaborator

No description provided.

- Closes #.

This PR fixes memory reservation starvation in sort-merge when multiple
sort partitions share a `GreedyMemoryPool`.

When multiple `ExternalSorter` instances run concurrently and share a
single memory pool, the merge phase starves:

1. Each partition pre-reserves `sort_spill_reservation_bytes` via
`merge_reservation`.
2. When entering the merge phase, `new_empty()` was used to create a new
reservation starting at 0 bytes, while the pre-reserved bytes sat idle
in `ExternalSorter.merge_reservation`.
3. Those freed bytes were immediately consumed by other partitions
racing for memory.
4. The merge could no longer allocate memory from the pool, which led to
OOM / starvation.
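
The steps above can be illustrated with a small self-contained model. This
is only a sketch under stated assumptions: `ToyPool`, `ToyReservation`, and
both `merge_with_*` functions are hypothetical names invented for
illustration, not the DataFusion `GreedyMemoryPool` or `MemoryReservation`
API, and the byte counts are arbitrary. It contrasts a merge that starts
from an empty reservation with one that takes over the bytes the sorter
already holds, which is assumed here to be the general idea behind the fix
rather than copied from the diff.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Hypothetical stand-in for a greedy pool: requests are granted first come,
/// first served until `limit` is reached. Not the DataFusion API.
struct ToyPool {
    limit: usize,
    used: Cell<usize>,
}

/// Hypothetical reservation tracking bytes held against a `ToyPool`.
struct ToyReservation {
    pool: Rc<ToyPool>,
    size: usize,
}

impl ToyReservation {
    /// Start a reservation that holds 0 bytes.
    fn new_empty(pool: &Rc<ToyPool>) -> Self {
        ToyReservation { pool: Rc::clone(pool), size: 0 }
    }

    /// Try to grow by `bytes`; fails if the pool limit would be exceeded.
    fn try_grow(&mut self, bytes: usize) -> Result<(), String> {
        let used = self.pool.used.get();
        if used + bytes > self.pool.limit {
            return Err(format!(
                "cannot reserve {} bytes: {} of {} already in use",
                bytes, used, self.pool.limit
            ));
        }
        self.pool.used.set(used + bytes);
        self.size += bytes;
        Ok(())
    }

    /// Move the held bytes into a new reservation without giving them
    /// back to the pool, so no other partition can claim them in between.
    fn take(&mut self) -> ToyReservation {
        let size = std::mem::take(&mut self.size);
        ToyReservation { pool: Rc::clone(&self.pool), size }
    }
}

/// Shared setup: a 100-byte pool in which this sorter pre-reserved 40 bytes
/// for its merge and a competing partition holds the remaining 60.
fn setup() -> (Rc<ToyPool>, ToyReservation, ToyReservation) {
    let pool = Rc::new(ToyPool { limit: 100, used: Cell::new(0) });
    let mut merge_reservation = ToyReservation::new_empty(&pool);
    merge_reservation.try_grow(40).expect("pool starts empty");
    let mut competitor = ToyReservation::new_empty(&pool);
    competitor.try_grow(60).expect("60 bytes are still free");
    (pool, merge_reservation, competitor)
}

/// Problem path: the merge starts at 0 bytes and must ask the pool again,
/// while the pre-reserved 40 bytes sit idle in `_merge_reservation`.
fn merge_with_empty_reservation() -> Result<usize, String> {
    let (pool, _merge_reservation, _competitor) = setup();
    let mut merge = ToyReservation::new_empty(&pool);
    merge.try_grow(40)?; // the pool is already full, so this fails
    Ok(merge.size)
}

/// Assumed remedy: hand the pre-reserved bytes straight to the merge.
fn merge_with_transferred_reservation() -> Result<usize, String> {
    let (_pool, mut merge_reservation, _competitor) = setup();
    let merge = merge_reservation.take();
    Ok(merge.size)
}
```

In this model `merge_with_empty_reservation()` returns an error because the
pool has no free bytes left, while `merge_with_transferred_reservation()`
succeeds without touching the pool at all.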

~~I can't find a deterministic way to reproduce the bug, but it occurs
in our production.~~ Added an end-to-end test to verify the fix.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@xudong963 force-pushed the cherry-pick-sort-merge-fixes-52 branch from 4cc2ec5 to 3b08f75 on March 23, 2026 02:27
The cherry-picked commit from branch-51 used `get_reserved_byte_for_record_batch_size`
(1 param), but branch-52 has `get_reserved_bytes_for_record_batch_size` (2 params).
Update the call site to use the branch-52 function signature.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@xudong963 merged commit 795aa28 into branch-52 on Mar 23, 2026
58 checks passed