[data] Report amount spilled by a Dataset in Dataset.stats #38847
Labels
data (Ray Data-related issues)
enhancement (Request for new feature and/or capability)
P1 (Issue that should be fixed within a few weeks)
ray 2.8
stephanie-wang added the enhancement, triage, data, ray 2.8, and P1 labels and removed the triage label on Aug 24, 2023
stephanie-wang pushed a commit that referenced this issue on Sep 13, 2023
Adds bytes spilled/restored to DatasetStats after a plan finishes execution. Refactors internal_api a bit so that we can get the actual reply instead of a prettified string. First part of #38847. Next steps would be to report spilling of individual blocks from ray. --------- Signed-off-by: Andrew Xue <andrewxue@anyscale.com>
simonsays1980 pushed a commit to simonsays1980/ray that referenced this issue on Sep 15, 2023
…#39361) Adds bytes spilled/restored to DatasetStats after a plan finishes execution. Refactors internal_api a bit so that we can get the actual reply instead of a prettified string. First part of ray-project#38847. Next steps would be to report spilling of individual blocks from ray. --------- Signed-off-by: Andrew Xue <andrewxue@anyscale.com>
stephanie-wang pushed a commit that referenced this issue on Sep 26, 2023
Records spill stats for individual datasets. For now only recorded for MapOperator since it is the only operator that deletes blocks after it's done. Related issue number Closes #38847 --------- Signed-off-by: Andrew Xue <andrewxue@anyscale.com>
simonsays1980 pushed a commit to simonsays1980/ray that referenced this issue on Sep 26, 2023
Records spill stats for individual datasets. For now only recorded for MapOperator since it is the only operator that deletes blocks after it's done. Related issue number Closes ray-project#38847 --------- Signed-off-by: Andrew Xue <andrewxue@anyscale.com>
vymao pushed a commit to vymao/ray that referenced this issue on Oct 11, 2023
…#39361) Adds bytes spilled/restored to DatasetStats after a plan finishes execution. Refactors internal_api a bit so that we can get the actual reply instead of a prettified string. First part of ray-project#38847. Next steps would be to report spilling of individual blocks from ray. --------- Signed-off-by: Andrew Xue <andrewxue@anyscale.com> Signed-off-by: Victor <vctr.y.m@example.com>
vymao pushed a commit to vymao/ray that referenced this issue on Oct 11, 2023
Records spill stats for individual datasets. For now only recorded for MapOperator since it is the only operator that deletes blocks after it's done. Related issue number Closes ray-project#38847 --------- Signed-off-by: Andrew Xue <andrewxue@anyscale.com> Signed-off-by: Victor <vctr.y.m@example.com>
Description
It's useful to know how many bytes were spilled/restored during a Dataset execution. We should report this in Dataset.stats().
Use case
Debugging Dataset spilling behavior, and distinguishing the amount spilled by a Dataset from the amount spilled by other or previous jobs.
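One way to separate a Dataset's spilling from that of earlier jobs can be sketched as follows. This is a minimal sketch, not Ray's implementation, and it assumes the cluster exposes cumulative, cluster-wide counters of bytes spilled and restored. Under that assumption, the amount attributable to one execution is the difference between a snapshot taken before it runs and one taken after it finishes. `SpillCounters`, `run_with_spill_stats`, and the `read_counters` callable are hypothetical names, not Ray APIs. The snapshot difference removes spilling from previous jobs, but it still includes spilling from jobs that run concurrently.

```python
from dataclasses import dataclass
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class SpillCounters:
    """Hypothetical snapshot of cumulative, cluster-wide spill counters."""

    bytes_spilled: int
    bytes_restored: int


def run_with_spill_stats(
    execute: Callable[[], T],
    read_counters: Callable[[], SpillCounters],
) -> Tuple[T, SpillCounters]:
    """Run `execute` and return its result plus the spilling observed meanwhile.

    Assumes `read_counters` returns cumulative totals, so subtracting the
    "before" snapshot excludes bytes spilled by earlier jobs. Bytes spilled
    by jobs running concurrently are still included in the difference.
    """
    before = read_counters()
    result = execute()
    after = read_counters()
    delta = SpillCounters(
        bytes_spilled=after.bytes_spilled - before.bytes_spilled,
        bytes_restored=after.bytes_restored - before.bytes_restored,
    )
    return result, delta


if __name__ == "__main__":
    MiB = 2**20
    # Simulated cluster state: a previous job already spilled 512 MiB.
    state = {"spilled": 512 * MiB, "restored": 0}

    def fake_counters() -> SpillCounters:
        return SpillCounters(state["spilled"], state["restored"])

    def fake_execution() -> str:
        # Stand-in for executing a Dataset that spills and restores blocks.
        state["spilled"] += 128 * MiB
        state["restored"] += 64 * MiB
        return "done"

    result, delta = run_with_spill_stats(fake_execution, fake_counters)
    print(result, delta.bytes_spilled // MiB, delta.bytes_restored // MiB)
    # prints: done 128 64
```

The 512 MiB spilled before the execution started does not appear in the reported delta. That is the "previous jobs" half of the use case above.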