[Core] Expose fallback_strategy in TaskInfoEntry and ActorTableData #60659
edoakes merged 13 commits into ray-project:master
Signed-off-by: Dongjun Na <kmu5544616@gmail.com>
Code Review
This pull request successfully exposes the fallback_strategy for tasks and actors across Ray's state API and GCS, involving changes to Protobuf definitions, the C++ backend, and the Python state API. The implementation is largely correct and well-aligned with the stated goals. However, I've identified two issues: one in the C++ backend concerning an incorrect check for the fallback_strategy, and another in the Python state API where a field is missing from a dataclass, which would likely cause a runtime error. My review provides specific suggestions to address these points.
edoakes left a comment
LGTM. Thanks for the contribution
Could you add an integration test for the state API to assert that the information is populated correctly?
Signed-off-by: Dongjun Na <kmu5544616@gmail.com>
Signed-off-by: Dongjun Na <kmu5544616@gmail.com>
@edoakes @MengjinYan @israbbani Gentle ping.
@nadongjun there is a merge conflict
Signed-off-by: Dongjun Na <kmu5544616@gmail.com>
Head branch was pushed to by a user without write access
@edoakes Resolved the merge conflict.
Description
Add observability for fallback_strategy in State API and GCS.
While Ray currently provides visibility for label_selector (#53423), there is no mechanism to observe the fallback_strategy from outside the system.
This PR exposes fallback_strategy in TaskInfoEntry and ActorTableData. The ability to read and record fallback_strategy is essential for our custom autoscaler development. When primary label_selector constraints cannot be met, the autoscaler must access these recorded fallback strategies to prioritize and allocate alternative devices.
Beyond autoscaling, adding this feature will provide a better debugging experience by allowing users to transparently track the entire scheduling intent, including the fallback_strategy for both tasks and actors.
Related issues
Related to #51564
Additional information
GlobalStateAccessor.get_actor_table
ray list actors --detail
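As a rough illustration of the autoscaler use case from the description, the sketch below walks a recorded label_selector and its fallback_strategy in priority order and returns the first selector that some launchable node type can satisfy. It deliberately does not call Ray: the record is a plain dict whose shape (a label_selector mapping plus a fallback_strategy list of entries that each carry their own label_selector) is an assumption about how the exposed data might look once read from the state API. The helper names selector_matches and pick_selector and the accelerator labels are hypothetical, and only exact-equality matching is modeled.

```python
from typing import Optional


def selector_matches(selector: dict[str, str], node_labels: dict[str, str]) -> bool:
    """Return True if every key/value in the selector equals the node's label.

    Assumption: exact-equality matching only; any richer selector
    operators are not modeled in this sketch.
    """
    return all(node_labels.get(key) == value for key, value in selector.items())


def pick_selector(
    record: dict, launchable_node_types: list[dict[str, str]]
) -> Optional[dict[str, str]]:
    """Try the primary selector first, then each fallback in recorded order."""
    candidates = [record.get("label_selector", {})]
    candidates += [
        fallback.get("label_selector", {})
        for fallback in record.get("fallback_strategy", [])
    ]
    for selector in candidates:
        if any(selector_matches(selector, labels) for labels in launchable_node_types):
            return selector
    return None  # nothing can satisfy the primary selector or any fallback


# Hypothetical record, standing in for one task or actor entry.
record = {
    "label_selector": {"accelerator": "A100"},
    "fallback_strategy": [
        {"label_selector": {"accelerator": "L4"}},
        {"label_selector": {"accelerator": "T4"}},
    ],
}

# Hypothetical label sets of the node types an autoscaler could launch.
launchable_node_types = [
    {"accelerator": "T4", "zone": "a"},
    {"accelerator": "L4", "zone": "b"},
]

chosen = pick_selector(record, launchable_node_types)
print(chosen)  # {'accelerator': 'L4'}
```

Because the fallbacks are tried in the order they were recorded, L4 wins over T4 here even though the T4 node type is listed first; a real autoscaler would likely also weigh cost and capacity.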