-
Notifications
You must be signed in to change notification settings - Fork 5.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[core][observability] Report idle node information in status and dashboard #39638
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks pretty good to me! some nits and comments.
So i guess the actual printing or showing on the dashboard part will be in another PR?
Signed-off-by: vitsai <vitsai@cs.stanford.edu>
Signed-off-by: vitsai <vitsai@cs.stanford.edu>
Signed-off-by: vitsai <vitsai@cs.stanford.edu>
Signed-off-by: vitsai <vitsai@cs.stanford.edu>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks pretty good to me!
The only thing I wanna double check is just the Snapshot status setting logic (if the idle one will get overridden), and if we could have a test that tests the new node activity thing e2e? I think so far it's either being mocked. If e2e is non trivial, then some manual tests with ray status -v
is also fine, and some unit testing.
And a couple of nits, including
- Write in the PR description for what's being changed here in terms of output?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add the PR description?
Ideally also show some example outputs of status and dashboard after your change.
Signed-off-by: vitsai <vitsai@cs.stanford.edu>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the idle time thing being done outside of this PR? If so, let's document it in a TODO.
for (const auto &iter : last_idle_times_) { | ||
if (iter.second == absl::nullopt) { | ||
// If it is a WorkFootprint | ||
if (iter.first.index() == 0) { | ||
switch (std::get<WorkFootprint>(iter.first)) { | ||
case WorkFootprint::NODE_WORKERS: | ||
node_activity << " Node currently has leased workers." << std::endl; | ||
resources_data.add_node_activity("Busy workers on node."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
resources_data.add_node_activity("Busy workers on node."); | |
resources_data.add_node_activity("Active workers."); |
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Signed-off-by: vitsai <vitsai@cs.stanford.edu>
…board (ray-project#39638) --------- Signed-off-by: vitsai <vitsai@cs.stanford.edu>
…and dashboard (ray-project#39638)" This reverts commit a2dedf1.
…board (ray-project#39638) --------- Signed-off-by: vitsai <vitsai@cs.stanford.edu> Signed-off-by: Victor <vctr.y.m@example.com>
Plumbs through idle node information to be reflected in ray status both in the CLI and on the dashboard. Does not include additional changes in the cluster tab of the dashboard UI, but does plumb the status field through to the datasource for dashboard consumption.
Why are these changes needed?
Related issue number
#35411
Checks
git commit -s
) in this PR.scripts/format.sh
to lint the changes in this PR.method in Tune, I've added it in
doc/source/tune/api/
under thecorresponding
.rst
file.