Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[data] add support for multiple group keys in map_groups #40778

Merged
merged 4 commits into from
Nov 19, 2023

Conversation

wingkitlee0
Copy link
Contributor

@wingkitlee0 wingkitlee0 commented Oct 30, 2023

Why are these changes needed?

Need to fix get_key_boundaries in order to use multiple keys in map_groups

Related issue number

Closes #40774

Checks

  • I've signed off every commit(by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@wingkitlee0 wingkitlee0 force-pushed the map-groups-multi-keys branch 4 times, most recently from 77ea576 to 71bbb60 Compare November 1, 2023 03:25
@wingkitlee0
Copy link
Contributor Author

FYI @LeonLuttenberger
I am trying to resolve the error I saw in #40774

@wingkitlee0 wingkitlee0 marked this pull request as ready for review November 1, 2023 03:37
@wingkitlee0 wingkitlee0 force-pushed the map-groups-multi-keys branch 2 times, most recently from 4ba55fb to 1dd29f4 Compare November 4, 2023 00:45
Comment on lines 332 to 333
x = np.empty(1, dtype=object)
x[0] = arr[start]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's going on here? Also, could we rename x with a more descriptive name?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After constructing an array of tuples, the target value x of the np.searchsorted needs to be a single-element tuple ndarray. Otherwise, searchsorted complains about the missing < operator. Alternatively, we can add a small custom tuple class with __lt__ method instead of using built-in tuples for this PR.

python/ray/data/grouped_data.py Outdated Show resolved Hide resolved
python/ray/data/tests/test_all_to_all.py Outdated Show resolved Hide resolved
@wingkitlee0
Copy link
Contributor Author

@bveeramani Thanks for taking a look. I left some comments and will update the PR in a few days.

@bveeramani
Copy link
Member

Thanks! Will take a look sometime next week

Copy link
Member

@bveeramani bveeramani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! LGTM!

Once remaining review comments get resolved and CI passes, we should be good to merge

python/ray/data/grouped_data.py Show resolved Hide resolved
python/ray/data/grouped_data.py Show resolved Hide resolved
python/ray/data/grouped_data.py Outdated Show resolved Hide resolved
Signed-off-by: Kit Lee <wklee4993@gmail.com>
@bveeramani bveeramani merged commit 1a986de into ray-project:master Nov 19, 2023
18 of 19 checks passed
@wingkitlee0 wingkitlee0 deleted the map-groups-multi-keys branch November 19, 2023 16:26
ujjawal-khare pushed a commit to ujjawal-khare-27/ray that referenced this pull request Nov 29, 2023
…#40778)

Need to fix get_key_boundaries in order to use multiple keys in map_groups

Signed-off-by: Kit Lee <wklee4993@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[data] add support for multiple group keys when using map_groups
2 participants