[RFC] Improved autoscaler log messages #12221
Comments
@ericl Would it be useful to include something about the …

cc @mfitton We should definitely port this to our dashboard.

This would be a game-changing feature for autoscaler debugging/visibility - can't wait until this is on the dashboard!

cc @maximsmol this would be a good thing to use (especially since the plan is to emit JSON)

Hmm, now that we're adding these to ray status, I wonder if we should also make some of this information available programmatically... Just a thought.
Yep! cc @maximsmol, who is investigating a prototype.
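One low-tech way to get at this information programmatically today, sketched under the assumption that parsing CLI output is acceptable (this is a workaround, not a dedicated API):

```python
import subprocess

# "ray status" prints the autoscaler summary for the current cluster;
# capturing its stdout is a crude form of programmatic access until a
# real API (e.g. the planned JSON output) exists.
status_text = subprocess.run(
    ["ray", "status"], capture_output=True, text=True, check=True
).stdout
print(status_text)
```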
FYI we are currently exposing this programmatically in the dashboard (…)

@ericl @wuisawesome can we close this issue?

I'd call it 95% done, since we don't have the per-file logging done yet.
What remains to do is the per-node logging, and this task has been delegated to me. Let me just confirm the requirement and a potential way to do it.

Requirement: We need a log file for each node_id, and each NodeUpdater running inside of the monitor process should log to that file.

Design: One strategy is to put the requisite logging logic in the NodeUpdater:
- Give the NodeUpdater a self.logger attribute.
- Have the NodeUpdater detect by some means if it's running inside the monitor process.
- If it's not running in the monitor process, then self.logger = logger, where logger is the logger = logging.getLogger(__name__) defined at the top of the file.
- If it is running in the monitor process, then self.logger is a custom logger that doesn't belong to the standard logging hierarchy and writes to a node_id-dependent log file.

@rkooo567 Does this sound like a remotely sane strategy? If anyone has a better idea, let me know.
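A minimal sketch of the strategy described above; the helper name, log path, and constructor arguments are assumptions for illustration, not Ray's actual implementation:

```python
import logging
import os

LOG_DIR = "/tmp/ray/session_latest/logs"  # assumed location, for illustration

def _make_node_logger(node_id: str) -> logging.Logger:
    # Instantiate Logger directly instead of calling logging.getLogger(),
    # so the logger is NOT registered in the standard logging hierarchy.
    node_logger = logging.Logger(f"node_updater_{node_id}")
    os.makedirs(LOG_DIR, exist_ok=True)
    handler = logging.FileHandler(os.path.join(LOG_DIR, f"node_{node_id}.log"))
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    node_logger.addHandler(handler)
    node_logger.setLevel(logging.INFO)
    return node_logger

logger = logging.getLogger(__name__)  # the module-level logger mentioned above

class NodeUpdater:
    def __init__(self, node_id: str, in_monitor_process: bool):
        # Log to a node_id-dependent file only inside the monitor process.
        self.logger = _make_node_logger(node_id) if in_monitor_process else logger
```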
I recall the node updater used to be a process, so you could redirect stdout to a file directly. It seems to be a thread now, so maybe we have to pass a logger and also capture any child command outputs as well. We could also try moving it back to a process, but that sounds like a bigger change.
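A sketch of the "pass a logger and capture child command output" route (the helper here is hypothetical):

```python
import subprocess

def run_and_log(cmd, logger):
    # Merge the child's stderr into stdout and stream every line
    # of output into the supplied logger.
    proc = subprocess.Popen(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )
    for line in proc.stdout:
        logger.info(line.rstrip())
    return proc.wait()
```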
IIRC you shouldn't be using …. This logger should be used in both cases of being in a monitor process or not (rather, the stdout redirect should be toggled).

@richardliaw Ah, yep you're right, it uses the CLI logger.

cli_logger.print just prints, as far as I can tell.

I'm not sure how you would go about using a logger here, since IIRC we don't capture any output. My assumption is this would be done by redirecting stdout at the process runner level.
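Under that assumption, the redirect could look roughly like this, with the child command writing straight to the per-node file (paths and names are illustrative):

```python
import os
import subprocess

node_id = "abc123"  # illustrative
node_log_path = f"/tmp/ray/session_latest/logs/node_{node_id}.log"
os.makedirs(os.path.dirname(node_log_path), exist_ok=True)

with open(node_log_path, "a") as f:
    # The child inherits the file descriptor, so its output lands in
    # the per-node file without any in-process capture.
    subprocess.check_call(["echo", "setup command output"],
                          stdout=f, stderr=subprocess.STDOUT)
```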
Hmm, well, right now stdout and stderr go to monitor.out and monitor.err. For the purposes of this issue, are the "autoscaler log messages" we're talking about the contents of monitor.log?
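For reference, these files sit next to each other in Ray's session log directory (layout from memory; exact contents vary by version):

```
/tmp/ray/session_latest/logs/
    monitor.log   # the autoscaler log messages discussed in this issue
    monitor.out   # monitor process stdout
    monitor.err   # monitor process stderr
```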
Ech -- anyways, I see: the goal is to redirect all output of commands run by NodeUpdater.cmd_runner to the relevant file.

FYI @clarkzinzow this might be a good starter issue.

^ Feel free to change the assignment from me to @clarkzinzow if Clark is interested.

Minor update: we should refer to failed nodes by their ID, since IP addresses can be reused when failed nodes are terminated.
Hmm, IDs aren't as useful though, about as useful as an out-of-date IP. If we keep failed nodes alive, then IPs should still be relevant, right? So we should still keep the IPs since they're more generally useful.
OK, we can use IPs, but that puts us at the mercy of the k8s IP allocation policy.

@yiranwang52 what should the …

Closing this since the final piece here is covered in this issue: #13586
The current autoscaler output is quite difficult to interpret due to its verbosity and low-level details. This is a proposal to clean it up by periodically emitting the following summary table:

[summary table not preserved in this capture]
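The intended shape is roughly the following (an illustration patterned on the eventual ray status output, not the exact table from the RFC):

```
======== Autoscaler status: 2020-12-11 12:00:00 ========
Node status
-----------
Healthy:
 1 head-node
 2 worker-node
Pending:
 1 worker-node, launching
Recent failures:
 (no failures)

Resources
---------
Usage:
 4.0/12.0 CPU
 0.00/32.00 GiB memory
```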
Implementation details: ray status can also access this information.