Skip to content

[Data] [2/3] Async iter_batches: Add interfaces and helper functions#33605

Merged
ericl merged 32 commits into
ray-project:masterfrom
amogkam:async-iter-batches-2
Mar 25, 2023
Merged

[Data] [2/3] Async iter_batches: Add interfaces and helper functions#33605
ericl merged 32 commits into
ray-project:masterfrom
amogkam:async-iter-batches-2

Conversation

@amogkam
Copy link
Copy Markdown
Contributor

@amogkam amogkam commented Mar 22, 2023

Adds the necessary helper functions for each step of iter_batches

Why are these changes needed?

Related issue number

Checks

  • I've signed off every commit(by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

amogkam added 10 commits March 21, 2023 18:50
Signed-off-by: amogkam <amogkamsetty@yahoo.com>
Signed-off-by: amogkam <amogkamsetty@yahoo.com>
Signed-off-by: amogkam <amogkamsetty@yahoo.com>
Signed-off-by: amogkam <amogkamsetty@yahoo.com>
Signed-off-by: amogkam <amogkamsetty@yahoo.com>
Signed-off-by: amogkam <amogkamsetty@yahoo.com>
Signed-off-by: amogkam <amogkamsetty@yahoo.com>
Signed-off-by: amogkam <amogkamsetty@yahoo.com>
Signed-off-by: amogkam <amogkamsetty@yahoo.com>
amogkam added 2 commits March 22, 2023 16:44
Signed-off-by: amogkam <amogkamsetty@yahoo.com>
Signed-off-by: amogkam <amogkamsetty@yahoo.com>
Copy link
Copy Markdown
Contributor

@ericl ericl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few quick comments, will look in the implementation more later today.

Comment thread python/ray/data/_internal/block_batching/block_batching.py
Comment thread python/ray/data/_internal/block_batching/iter_batches.py Outdated
Comment thread python/ray/data/_internal/block_batching/iter_batches.py Outdated
Comment thread python/ray/data/_internal/block_batching/iter_batches.py Outdated
amogkam added 4 commits March 22, 2023 18:05
Signed-off-by: amogkam <amogkamsetty@yahoo.com>
…atches-2

Signed-off-by: amogkam <amogkamsetty@yahoo.com>
Signed-off-by: amogkam <amogkamsetty@yahoo.com>
Signed-off-by: amogkam <amogkamsetty@yahoo.com>
Comment thread python/ray/data/_internal/block_batching/iter_batches.py Outdated
Comment thread python/ray/data/_internal/block_batching/iter_batches.py Outdated
Comment thread python/ray/data/_internal/block_batching/iter_batches.py Outdated
Comment thread python/ray/data/_internal/block_batching/iter_batches.py
Comment thread python/ray/data/_internal/block_batching/iter_batches.py Outdated
Comment thread python/ray/data/_internal/block_batching/iter_batches.py Outdated
@ericl ericl added the @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. label Mar 23, 2023
Copy link
Copy Markdown
Contributor

@clarkzinzow clarkzinzow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Initial pass, one main thing to block on is I really think that the local shuffling has to happen after block resolution, otherwise that breaks the local shuffling semantics in a big way.

Comment thread python/ray/data/_internal/block_batching/interfaces.py Outdated
Comment thread python/ray/data/_internal/block_batching/interfaces.py Outdated
Comment thread python/ray/data/_internal/block_batching/interfaces.py Outdated
Comment thread python/ray/data/_internal/block_batching/interfaces.py
Comment thread python/ray/data/_internal/block_batching/iter_batches.py Outdated
Comment thread python/ray/data/_internal/block_batching/interfaces.py Outdated
Comment thread python/ray/data/_internal/block_batching/iter_batches.py Outdated
Comment thread python/ray/data/_internal/block_batching/iter_batches.py Outdated
Comment thread python/ray/data/_internal/block_batching/iter_batches.py
Comment thread python/ray/data/_internal/block_batching/iter_batches.py
Comment thread python/ray/data/_internal/block_batching/iter_batches.py Outdated
Comment thread python/ray/data/_internal/block_batching/interfaces.py
Comment thread python/ray/data/_internal/block_batching/iter_batches.py
Comment thread python/ray/data/_internal/block_batching/iter_batches.py Outdated
Comment thread python/ray/data/_internal/block_batching/iter_batches.py
Comment thread python/ray/data/_internal/block_batching/interfaces.py Outdated
Comment thread python/ray/data/_internal/block_batching/iter_batches.py Outdated
@ericl
Copy link
Copy Markdown
Contributor

ericl commented Mar 24, 2023

Looks a lot cleaner. Remaining concerns are (1) documenting the big picture strategy clearly, and (2) fixing the possible error in over-prefetching.

Signed-off-by: amogkam <amogkamsetty@yahoo.com>
@amogkam amogkam requested a review from ericl March 24, 2023 20:47
amogkam added 3 commits March 24, 2023 13:49
Signed-off-by: amogkam <amogkamsetty@yahoo.com>
Signed-off-by: amogkam <amogkamsetty@yahoo.com>
Signed-off-by: amogkam <amogkamsetty@yahoo.com>
Comment thread python/ray/data/_internal/block_batching/iter_batches.py
Comment thread python/ray/data/_internal/block_batching/iter_batches.py Outdated
amogkam added 4 commits March 24, 2023 15:24
Signed-off-by: amogkam <amogkamsetty@yahoo.com>
Signed-off-by: amogkam <amogkamsetty@yahoo.com>
Signed-off-by: amogkam <amogkamsetty@yahoo.com>
Comment thread python/ray/data/_internal/block_batching/iter_batches.py Outdated
Comment thread python/ray/data/_internal/block_batching/iter_batches.py Outdated
Comment thread python/ray/data/_internal/block_batching/iter_batches.py Outdated
Comment thread python/ray/data/_internal/block_batching/iter_batches.py Outdated
Comment thread python/ray/data/_internal/block_batching/iter_batches.py Outdated
Comment thread python/ray/data/_internal/block_batching/iter_batches.py Outdated
Comment thread python/ray/data/_internal/block_batching/iter_batches.py
Comment thread python/ray/data/_internal/block_batching/iter_batches.py
Comment thread python/ray/data/_internal/block_batching/block_batching.py
Comment thread python/ray/data/_internal/block_batching/util.py Outdated
Copy link
Copy Markdown
Contributor

@clarkzinzow clarkzinzow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving to unblock since I mostly have nits and I'll soon be offline until Monday!

amogkam added 5 commits March 24, 2023 16:23
Signed-off-by: amogkam <amogkamsetty@yahoo.com>
Signed-off-by: amogkam <amogkamsetty@yahoo.com>
Signed-off-by: amogkam <amogkamsetty@yahoo.com>
Signed-off-by: amogkam <amogkamsetty@yahoo.com>
Signed-off-by: amogkam <amogkamsetty@yahoo.com>
@amogkam
Copy link
Copy Markdown
Contributor Author

amogkam commented Mar 25, 2023

failing tests are also failing on master, going to merge

@amogkam amogkam added tests-ok The tagger certifies test failures are unrelated and assumes personal liability. and removed @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. labels Mar 25, 2023
@ericl ericl merged commit a8e997b into ray-project:master Mar 25, 2023
elliottower pushed a commit to elliottower/ray that referenced this pull request Apr 22, 2023
ProjectsByJackHe pushed a commit to ProjectsByJackHe/ray that referenced this pull request May 4, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

tests-ok The tagger certifies test failures are unrelated and assumes personal liability.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants