-
Notifications
You must be signed in to change notification settings - Fork 5.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Data] Account internal inqueue to previous operator's memory usage #42851
Conversation
Signed-off-by: Hao Chen <chenh1024@gmail.com>
Signed-off-by: Hao Chen <chenh1024@gmail.com>
Signed-off-by: Hao Chen <chenh1024@gmail.com>
Signed-off-by: Hao Chen <chenh1024@gmail.com>
Signed-off-by: Hao Chen <chenh1024@gmail.com>
Signed-off-by: Hao Chen <chenh1024@gmail.com>
Signed-off-by: Balaji Veeramani <balaji@anyscale.com>
Signed-off-by: Balaji Veeramani <balaji@anyscale.com>
# Spilled memory size in the object store. | ||
obj_store_mem_spilled: int = field( | ||
default=0, metadata={"map_only": True, "export_metric": True} | ||
) | ||
|
||
obj_store_mem_internal_inqueue: int = 0 | ||
obj_store_mem_internal_outqueue: int = 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These metrics currently only work for map ops. I'm wondering if we can also extend them to also support other ops. So 1) we can report them to the dashboard for all ops; 2) we don't have 2 places reporting memory metrics (here and internal_inqueue_memory_usage).
The on_input_received
function is already called on all ops. I think we'll also need to add other callback APIs like on_input_freed
, on_output_generated
, etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I thought about that too.
OpRuntimeMetrics
is tightly coupled with MapOperator
, and this PR introduces a layer of indirection with PhysicalOperator.*_memory_usage()
and OpRuntimeMetrics.obj_store_mem_*
.
One concern I have is that if we rely on metrics directly, we need to ensure that they're updated correctly for all operators (for example, we need to make sure all operators call on_output_generated
, or else obj_store_mem_internal_outqueue
will be incorrect). But, I'm not sure how we can enforce that with the PhysicalOperator
interface.
While I agree multiple abstractions for memory reporting isn't ideal, I think it might be the easiest way to implement this? Wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Documenting tradeoffs from offline discussion:
Disadvantages of adding methods to PhysicalOperator:
- OpRuntimeMetrics is tightly coupled with MapOperator
- Layer of indirection for MapOperator
- Multiple abstractions for memory reporting
Disadvantages of using OpRuntimeMetrics only:
- Updating metrics is implicitly required (not part of interface)
Given these tradeoffs, I've opened a new PR with the second approach https://github.com/ray-project/ray/pull/42930/files
Signed-off-by: Balaji Veeramani <balaji@anyscale.com>
Closing in favor of #42930 |
Stacked on
Why are these changes needed?
Related issue number
Checks
git commit -s
) in this PR.scripts/format.sh
to lint the changes in this PR.method in Tune, I've added it in
doc/source/tune/api/
under thecorresponding
.rst
file.