[data] Fix O(n^2) issues in simple_block sort #19543

Merged: 6 commits, Oct 21, 2021


@ericl (Contributor) commented Oct 20, 2021

Why are these changes needed?

ray.data.range(int(100e6)).sort() is much faster after this change. This is a quick fix; we should profile these methods more carefully prior to moving out of experimental.

@@ -286,3 +300,20 @@ def merge_sorted_blocks(
        indices = pyarrow.compute.sort_indices(ret, sort_keys=key)
        ret = ret.take(indices)
        return ret, ArrowBlockAccessor(ret).get_metadata(None)


def _copy_table(table: "pyarrow.Table") -> "pyarrow.Table":
Contributor Author


Borrowed from #19534
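
For readers unfamiliar with the pattern in the context lines of the hunk above (`sort_indices` followed by `take`), here is a minimal pure-Python sketch of the same argsort-then-gather idea. It is not Ray's implementation: the `Columns` type and the three functions below are hypothetical stand-ins for a `pyarrow.Table` and the corresponding Arrow calls, and only ascending order on a single integer key is handled. The point is that the merged data is sorted once, in O(n log n), rather than doing per-row work inside a loop.

```python
from typing import Dict, List

# Hypothetical stand-in for a pyarrow.Table: column name -> list of values.
Columns = Dict[str, List[int]]


def sort_indices(columns: Columns, key: str) -> List[int]:
    """Return the row order that sorts `columns` by `key` (an argsort)."""
    values = columns[key]
    return sorted(range(len(values)), key=values.__getitem__)


def take(columns: Columns, indices: List[int]) -> Columns:
    """Gather the rows at `indices` from every column."""
    return {name: [vals[i] for i in indices] for name, vals in columns.items()}


def merge_sorted_blocks(blocks: List[Columns], key: str) -> Columns:
    """Concatenate the blocks column-wise, then sort the result once."""
    if not blocks:
        return {}
    merged = {
        name: [value for block in blocks for value in block[name]]
        for name in blocks[0]
    }
    return take(merged, sort_indices(merged, key))
```

A usage example under the same assumptions: `merge_sorted_blocks([{"id": [1, 4], "v": [10, 40]}, {"id": [2, 3], "v": [20, 30]}], "id")` yields the rows ordered by `id`, with `v` reordered to match.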

python/ray/data/impl/arrow_block.py
@ericl added the tests-ok label ("The tagger certifies test failures are unrelated and assumes personal liability.") on Oct 20, 2021
@ericl merged commit 48ecb1f into ray-project:master on Oct 21, 2021

4 participants