Should always run monitor.py even on non-autoscaling clusters #12629
Comments
Alex Wu: Did this change recently? I thought we used to always run monitor.py, but if no autoscaler config was provided, we just wouldn't instantiate an autoscaler.

Ah, we should probably change to instantiate a dummy AutoScaler eventually, since that will be providing the resource reporting.
Alex Wu: Hmmm, what metrics are you thinking of reporting? Without a node provider, I think load_metrics is the best that we can (easily) do.

The AutoScaler JSON status (from the logging improvement issue).
Alex Wu: Hmmm, ok I think we can only display the second half though (which we can/should get from load metrics). Without a real autoscaler, we don't really have a concept of a node type, pending nodes, or failed nodes.

Resources
------------------------------------------------------------
Usage:
 530.0/544.0 CPU
 2.0/2.0 GPU
 0.0/2.0 AcceleratorType:V100
 0.0 GiB/1583.19 GiB memory
 0.0 GiB/471.02 GiB object_store_memory
Demands:
 {"CPU": 1}: 150 pending tasks
 [{"CPU": 4} * 5]: 5 pending placement groups
 [{"CPU": 1} * 100]: from request_resources()

Sounds good. We can report a dummy "local node" type in this case.
To simplify debugging, we should always run monitor.py on all Ray clusters (perhaps with some dummy autoscaling config). This will be critical if we start reporting metrics using the autoscaler as well.
cc @wuisawesome @AmeerHajAli