[data] Report amount spilled by a Dataset in Dataset.stats #38847

Closed
stephanie-wang opened this issue Aug 24, 2023 · 0 comments · Fixed by #39678
Assignees
Labels
data (Ray Data-related issues), enhancement (Request for new feature and/or capability), P1 (Issue that should be fixed within a few weeks), ray 2.8

Comments

Contributor

stephanie-wang commented Aug 24, 2023

Description

It's useful to know how many bytes were spilled/restored during a Dataset execution. We should report this in Dataset.stats().

Use case

Debugging Dataset spilling behavior, and distinguishing the amount spilled by a given Dataset from the amount spilled by other or previous jobs.
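
The use case above amounts to comparing cumulative counters before and after a Dataset runs. A minimal sketch, assuming the cluster exposes spilled/restored byte totals that only ever grow; `SpillSnapshot` and `delta_since` are hypothetical names for illustration and are not part of Ray's API:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpillSnapshot:
    """Hypothetical cumulative cluster-wide counters, in bytes."""

    spilled_bytes: int
    restored_bytes: int

    def delta_since(self, earlier: "SpillSnapshot") -> "SpillSnapshot":
        # The counters only grow, so the difference is the activity
        # that happened between the two snapshots.
        return SpillSnapshot(
            spilled_bytes=self.spilled_bytes - earlier.spilled_bytes,
            restored_bytes=self.restored_bytes - earlier.restored_bytes,
        )


GIB = 1024 ** 3
MIB = 1024 ** 2

# A previous job had already spilled 4 GiB before this Dataset started.
before = SpillSnapshot(spilled_bytes=4 * GIB, restored_bytes=1 * GIB)
# The same counters, read again after the Dataset finished executing.
after = SpillSnapshot(spilled_bytes=6 * GIB, restored_bytes=1 * GIB + 512 * MIB)

during_execution = after.delta_since(before)
print(during_execution.spilled_bytes)   # 2147483648 (2 GiB)
print(during_execution.restored_bytes)  # 536870912 (512 MiB)
```

A plain delta like this only separates the Dataset from earlier jobs. Spilling caused by jobs running at the same time would still be counted.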

stephanie-wang added the enhancement, triage, data, ray 2.8, and P1 labels and removed the triage label Aug 24, 2023
stephanie-wang self-assigned this Aug 24, 2023
scottjlee assigned Zandew and unassigned stephanie-wang Sep 6, 2023
stephanie-wang pushed a commit that referenced this issue Sep 13, 2023
Adds bytes spilled/restored to DatasetStats after a plan finishes execution.

Refactors internal_api a bit so that we can get the actual reply instead of a prettified string.

First part of #38847. Next steps would be to report spilling of individual blocks from ray.

---------

Signed-off-by: Andrew Xue <andrewxue@anyscale.com>
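
The refactor mentioned in this commit message (returning the actual reply rather than a prettified string) can be illustrated generically. This is a sketch of the general idea only; `SpillReply`, its fields, and `prettify` are hypothetical names and do not reflect the real `internal_api` code:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpillReply:
    """Hypothetical structured reply; the field names are assumptions."""

    spilled_bytes: int
    restored_bytes: int


def prettify(reply: SpillReply) -> str:
    """Build the human-readable summary from the structured reply on demand."""
    mib = 1024 ** 2
    return (
        f"Spilled {reply.spilled_bytes // mib} MiB, "
        f"restored {reply.restored_bytes // mib} MiB"
    )


reply = SpillReply(spilled_bytes=300 * 1024 ** 2, restored_bytes=120 * 1024 ** 2)

# Callers that need numbers read the fields directly, with no string parsing.
total_moved_bytes = reply.spilled_bytes + reply.restored_bytes

# Callers that need text still get it, derived from the same reply.
summary = prettify(reply)
print(summary)
```

Keeping the numeric reply as the primary return value means the formatted string becomes a presentation detail that stats code never has to parse.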
simonsays1980 pushed a commit to simonsays1980/ray that referenced this issue Sep 15, 2023
…#39361)


Signed-off-by: Andrew Xue <andrewxue@anyscale.com>
stephanie-wang pushed a commit that referenced this issue Sep 26, 2023
Records spill stats for individual datasets.

For now, spill stats are recorded only for MapOperator, since it is the only operator that deletes blocks once it is done with them.
Related issue number

Closes #38847

---------

Signed-off-by: Andrew Xue <andrewxue@anyscale.com>
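
One way to read this commit message is that spill information is collected at the moment an operator deletes a block. A minimal sketch under that assumption; `OperatorSpillStats`, `on_block_deleted`, and `total_for` are hypothetical names, and the byte counts are made up for illustration:

```python
from collections import defaultdict


class OperatorSpillStats:
    """Hypothetical per-operator accumulator of spilled bytes."""

    def __init__(self) -> None:
        self._spilled_bytes = defaultdict(int)

    def on_block_deleted(self, operator_name: str, spilled_bytes: int) -> None:
        # Called when an operator is done with a block and deletes it;
        # spilled_bytes is 0 for a block that never left memory.
        self._spilled_bytes[operator_name] += spilled_bytes

    def total_for(self, operator_name: str) -> int:
        # Operators that never reported a deletion simply report 0.
        return self._spilled_bytes.get(operator_name, 0)


stats = OperatorSpillStats()
stats.on_block_deleted("MapOperator", 128)
stats.on_block_deleted("MapOperator", 0)    # block that was never spilled
stats.on_block_deleted("MapOperator", 256)

print(stats.total_for("MapOperator"))       # 384
print(stats.total_for("OtherOperator"))     # 0
```

Under this reading, an operator that never deletes its blocks has no natural point at which to report, which is consistent with the stats being limited to MapOperator for now.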
simonsays1980 pushed a commit to simonsays1980/ray that referenced this issue Sep 26, 2023

Signed-off-by: Andrew Xue <andrewxue@anyscale.com>
vymao pushed a commit to vymao/ray that referenced this issue Oct 11, 2023
…#39361)


Signed-off-by: Andrew Xue <andrewxue@anyscale.com>
Signed-off-by: Victor <vctr.y.m@example.com>
vymao pushed a commit to vymao/ray that referenced this issue Oct 11, 2023

Signed-off-by: Andrew Xue <andrewxue@anyscale.com>
Signed-off-by: Victor <vctr.y.m@example.com>

2 participants