Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Time streaming exec scheduling #43112

Merged
merged 6 commits into from Feb 16, 2024

Conversation

omatthew98
Copy link
Contributor

@omatthew98 omatthew98 commented Feb 12, 2024

Why are these changes needed?

Currently we are not timing how much time is spent during scheduling in the streaming executor. This times the total process_time for the scheduling steps / calls to _scheduling_loop_step. This stat is included in DatasetStats and a later PR will include this and other StreamingExecutor stats into the DatasetStatsSummary.

Related issue number

Closes #42797

Checks

  • I've signed off every commit(by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Manual testing
    • Unit tests
    • Release tests
    • This PR is not tested :(

Signed-off-by: Matthew Owen <mowen@anyscale.com>
Signed-off-by: Matthew Owen <mowen@anyscale.com>
@omatthew98
Copy link
Contributor Author

Conducted some manual testing with the following snippet (and adding a line to log the time from within StreamingExecutor.run):

def sleep(x):
    time.sleep(0.5)
    return x

num_rows = sys.argv[1] if len(sys.argv) > 1 else 10

ds = ray.data.range(num_rows).map(sleep)

for _ in ds.iter_batches(batch_size=1):
    continue

For num_rows = 100 the total scheduling time was 0.23742400000000297 and for num_rows = 1000 the total scheduling time was 1.063609000000004. More comprehensive testing will be added in followup pr which will update the metrics being reported out through DatasetStatsSummary.

Signed-off-by: Matthew Owen <mowen@anyscale.com>
Signed-off-by: Matthew Owen <mowen@anyscale.com>
Signed-off-by: Matthew Owen <mowen@anyscale.com>
Signed-off-by: Matthew Owen <mowen@anyscale.com>
@c21 c21 merged commit 9641c72 into ray-project:master Feb 16, 2024
9 checks passed
@can-anyscale
Copy link
Collaborator

Seem to broke linux://python/ray/data:test_streaming_integration

@can-anyscale
Copy link
Collaborator

I'm trying to revert to unblock test failures

can-anyscale added a commit that referenced this pull request Feb 20, 2024
can-anyscale added a commit that referenced this pull request Feb 20, 2024
khluu pushed a commit that referenced this pull request Feb 21, 2024
omatthew98 added a commit that referenced this pull request Feb 26, 2024
omatthew98 added a commit to omatthew98/ray that referenced this pull request Feb 26, 2024
omatthew98 added a commit to omatthew98/ray that referenced this pull request Feb 26, 2024
)" (ray-project#43283)"

This reverts commit 2b92f57.

Signed-off-by: Matthew Owen <mowen@anyscale.com>
c21 pushed a commit that referenced this pull request Feb 26, 2024
…)" (#43433)

This adds an extra `None` check to fix test failures if `self._initial_stats` is not set. This reverts #43283 and restores the changes made in #43112 .

Signed-off-by: Matthew Owen <mowen@anyscale.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Data] Measure overhead from scheduler (streaming executor) in DatasetStats
5 participants