
Should always run monitor.py even on non-autoscaling clusters #12629

Closed
ericl opened this issue Dec 4, 2020 · 6 comments · Fixed by #12772
Labels: enhancement (Request for new feature and/or capability), P1 (Issue that should be fixed within a few weeks)
Milestone: Serverless Autoscaling

Comments


ericl commented Dec 4, 2020

To simplify debugging, we should always run monitor.py on all Ray clusters (perhaps with some dummy autoscaling config). This will be critical if we start reporting metrics using the autoscaler as well.

cc @wuisawesome @AmeerHajAli
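
For illustration, the "dummy autoscaling config" mentioned above could be as small as the sketch below, written as a Python dict. The key names mirror the general shape of a Ray cluster config (cluster_name, max_workers, provider, auth), but this is only an assumed outline, not a config validated against the schema of any particular Ray release.

```python
# Purely illustrative "dummy" autoscaling config; treat the exact key names
# as an assumption rather than a validated Ray cluster config.
DUMMY_AUTOSCALING_CONFIG = {
    "cluster_name": "default",
    "min_workers": 0,
    "max_workers": 0,               # the monitor never needs to scale up
    "provider": {
        "type": "local",            # no cloud node provider involved
        "head_ip": "127.0.0.1",
        "worker_ips": [],
    },
    "auth": {"ssh_user": "ray"},
    # No setup or start commands: the point is only to give monitor.py enough
    # configuration to run, not to launch nodes.
}
```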

ericl added the enhancement and P1 labels on Dec 4, 2020
ericl added this to the Serverless Autoscaling milestone on Dec 4, 2020
wuisawesome commented

Did this change recently? I thought we used to always run monitor.py, but if no autoscaler config was provided, we just wouldn't instantiate an autoscaler.
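
A minimal sketch of the behavior described here, using hypothetical names rather than Ray's actual monitor.py internals: the monitor loop always runs and keeps collecting load metrics, and an autoscaler is only constructed when a config is supplied.

```python
# Hypothetical sketch only -- names do not come from Ray's monitor.py.
import time


class Monitor:
    def __init__(self, autoscaling_config_path=None):
        # Per-node resource usage aggregated from heartbeats; always maintained.
        self.load_metrics = {}
        self.autoscaler = None
        if autoscaling_config_path is not None:
            # Only build an autoscaler when a real config is provided.
            self.autoscaler = self._make_autoscaler(autoscaling_config_path)

    def _make_autoscaler(self, config_path):
        # Stand-in for constructing the real autoscaler from its config.
        raise NotImplementedError

    def run(self):
        while True:
            self._update_load_metrics()       # collected even without autoscaling
            if self.autoscaler is not None:
                self.autoscaler.update()      # scaling decisions only when configured
            time.sleep(5)

    def _update_load_metrics(self):
        # Placeholder: poll resource usage heartbeats from the cluster.
        pass
```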


ericl commented Dec 4, 2020 via email

wuisawesome commented

Hmmm, what metrics are you thinking of reporting?

Without a node provider, I think load_metrics is the best that we can (easily) do.


ericl commented Dec 5, 2020 via email

wuisawesome commented

Hmmm ok I think we can only display the second half though (which we can/should get from load metrics).

Without a real autoscaler, we don't really have a concept of a node type, pending nodes, or failed nodes.

Resources
------------------------------------------------------------
Usage:
 530.0/544.0 CPU
 2.0/2.0 GPU
 0.0/2.0 AcceleratorType:V100
 0.0 GiB/1583.19 GiB memory
 0.0 GiB/471.02 GiB object_store_memory

Demands:
 {"CPU": 1}: 150 pending tasks
 [{"CPU": 4} * 5]: 5 pending placement groups
 [{"CPU": 1} * 100]: from request_resources()


ericl commented Dec 5, 2020 via email
