[Data][Split] stable version of split with hints #26778

Merged (7 commits) on Jul 24, 2022

Conversation

@scv119 scv119 commented Jul 20, 2022

Signed-off-by: scv119 scv119@gmail.com

Why are these changes needed?

Introduce a stable version of split with hints, built on a stable equalizing algorithm:

  1. Use the greedy algorithm to generate the initial unbalanced splits.
  2. For each split, first shave it so that its number of rows does not exceed target_size.
  3. Based on how many rows each split still needs, do a one-time split_at_index on the leftover blocks.
  4. Merge the shaved splits with the leftover splits.

This algorithm guarantees that we need to split at most O(split) blocks.
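
For illustration only, steps 2 to 4 can be sketched on plain row counts. This is not the code in this PR: each block is modeled as an `int` (its number of rows), the names `shave_split`, `split_leftovers`, and `equalize` are hypothetical, and the sketch assumes `target_size` is the total row count divided by the number of splits, with any remainder rows dropped.

```python
from typing import List, Tuple


def shave_split(split: List[int], target_size: int) -> Tuple[List[int], int, List[int]]:
    """Keep whole blocks while the split stays within target_size.

    Returns (kept blocks, rows still needed, leftover blocks).
    No block is cut here; blocks that do not fit are moved to the leftovers.
    """
    kept, leftovers, kept_rows = [], [], 0
    for rows in split:
        if kept_rows + rows <= target_size:
            kept.append(rows)
            kept_rows += rows
        else:
            leftovers.append(rows)
    return kept, target_size - kept_rows, leftovers


def split_leftovers(leftovers: List[int], needed: List[int]) -> List[List[int]]:
    """Cut the leftover blocks in one pass so that piece i has needed[i] rows.

    Only one cut point exists per split, which is why the number of
    blocks that get cut is bounded by the number of splits.
    """
    pieces = []
    blocks = iter(leftovers)
    current = 0  # rows remaining in the block currently being consumed
    for need in needed:
        piece = []
        while need > 0:
            while current == 0:
                current = next(blocks)  # skip empty blocks
            take = min(current, need)
            piece.append(take)
            current -= take
            need -= take
        pieces.append(piece)
    return pieces


def equalize(splits: List[List[int]]) -> List[List[int]]:
    """Shave every split, cut the leftovers once, then merge."""
    target_size = sum(sum(s) for s in splits) // len(splits)
    shaved, needed, leftovers = [], [], []
    for split in splits:
        kept, need, left = shave_split(split, target_size)
        shaved.append(kept)
        needed.append(need)
        leftovers.extend(left)
    pieces = split_leftovers(leftovers, needed)
    return [kept + piece for kept, piece in zip(shaved, pieces)]


# Three unbalanced splits holding 15, 3 and 2 rows; target_size is 20 // 3 == 6.
balanced = equalize([[5, 5, 5], [3], [2]])
```

In this toy run every output split ends up with 6 rows, and the two remainder rows are left unassigned.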

Related issue number

Checks

  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@scv119 scv119 force-pushed the optimize-3 branch 3 times, most recently from f6120b2 to fbf1992 Compare July 20, 2022 09:59
@scv119 scv119 changed the title from "[Data][Split] Proper fix split with hints" to "[Data][Split] stable version of split with hints" Jul 20, 2022
@scv119 scv119 marked this pull request as ready for review July 20, 2022 10:00
BlockRefWithMeta = Tuple[ObjectRef[Block], BlockMetadata]


def _equalize(

Could we add unit tests for these?

@ericl ericl left a comment

The overall algorithm seems sensible. It would be great to also pull out the greedy split algorithm into a helper function / unit test it, but that can be left for another PR.

@ericl ericl added the @author-action-required label Jul 20, 2022
@scv119 scv119 added the do-not-merge label Jul 21, 2022
@scv119 scv119 force-pushed the optimize-3 branch 2 times, most recently from a8c3270 to e923f1f Compare July 23, 2022 23:18
@scv119 scv119 removed the @author-action-required and do-not-merge labels Jul 23, 2022
@ericl ericl added the @author-action-required label Jul 23, 2022
Signed-off-by: scv119 <scv119@gmail.com>
@scv119 scv119 added the tests-ok label and removed the @author-action-required label Jul 24, 2022
@scv119 scv119 merged commit aaab4ab into ray-project:master Jul 24, 2022
yaxife pushed a commit to alipay/ant-ray that referenced this pull request Jul 26, 2022
Signed-off-by: nanqi.yxf <nanqi.yxf@antgroup.com>
klwuibm pushed a commit to yuanchi2807/ray that referenced this pull request Jul 27, 2022
Signed-off-by: klwuibm <kwu888@gmail.com>
Catch-Bull pushed a commit to alipay/ant-ray that referenced this pull request Jul 27, 2022
Signed-off-by: Catch-Bull <burglarralgrub@gmail.com>
Rohan138 pushed a commit to Rohan138/ray that referenced this pull request Jul 28, 2022
Signed-off-by: Rohan138 <rapotdar@purdue.edu>
franklsf95 pushed a commit to franklsf95/ray that referenced this pull request Aug 2, 2022
Signed-off-by: Frank Luan <lsf@berkeley.edu>
gramhagen pushed a commit to gramhagen/ray that referenced this pull request Aug 15, 2022
Signed-off-by: Scott Graham <scgraham@microsoft.com>
gramhagen pushed a commit to gramhagen/ray that referenced this pull request Aug 15, 2022
Stefan-1313 pushed a commit to Stefan-1313/ray_mod that referenced this pull request Aug 18, 2022
Signed-off-by: Stefan van der Kleij <s.vanderkleij@viroteq.com>
Labels
tests-ok The tagger certifies test failures are unrelated and assumes personal liability.
5 participants