
Should always run monitor.py even on non-autoscaling clusters #12629

Closed
ericl opened this issue Dec 4, 2020 · 6 comments · Fixed by #12772
Labels: enhancement (Request for new feature and/or capability), P1 (Issue that should be fixed within a few weeks)
Milestone: Serverless Autoscaling

Comments


ericl commented Dec 4, 2020

To simplify debugging, we should always run monitor.py on all Ray clusters (perhaps with some dummy autoscaling config). This will be critical if we start reporting metrics using the autoscaler as well.

cc @wuisawesome @AmeerHajAli
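
For illustration, the "dummy autoscaling config" mentioned above could be as small as the sketch below, written as a Python dict. The key names mirror the general shape of a Ray cluster config (cluster_name, max_workers, provider, auth), but this is only an assumed outline, not a config validated against the schema of any particular Ray release.

```python
# Purely illustrative "dummy" autoscaling config; treat the exact key names
# as an assumption rather than a validated Ray cluster config.
DUMMY_AUTOSCALING_CONFIG = {
    "cluster_name": "default",
    "min_workers": 0,
    "max_workers": 0,               # the monitor never needs to scale up
    "provider": {
        "type": "local",            # no cloud node provider involved
        "head_ip": "127.0.0.1",
        "worker_ips": [],
    },
    "auth": {"ssh_user": "ray"},
    # No setup or start commands: the point is only to give monitor.py enough
    # configuration to run, not to launch nodes.
}
```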

ericl added the enhancement and P1 labels on Dec 4, 2020
ericl added this to the Serverless Autoscaling milestone on Dec 4, 2020
wuisawesome commented

Did this change recently? I thought we used to always run monitor.py, but if no autoscaler config was provided, we just wouldn't instantiate an autoscaler.
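
A minimal sketch of the behavior described here, using hypothetical names rather than Ray's actual monitor.py internals: the monitor loop always runs and keeps collecting load metrics, and an autoscaler is only constructed when a config is supplied.

```python
# Hypothetical sketch only -- names do not come from Ray's monitor.py.
import time


class Monitor:
    def __init__(self, autoscaling_config_path=None):
        # Per-node resource usage aggregated from heartbeats; always maintained.
        self.load_metrics = {}
        self.autoscaler = None
        if autoscaling_config_path is not None:
            # Only build an autoscaler when a real config is provided.
            self.autoscaler = self._make_autoscaler(autoscaling_config_path)

    def _make_autoscaler(self, config_path):
        # Stand-in for constructing the real autoscaler from its config.
        raise NotImplementedError

    def run(self):
        while True:
            self._update_load_metrics()       # collected even without autoscaling
            if self.autoscaler is not None:
                self.autoscaler.update()      # scaling decisions only when configured
            time.sleep(5)

    def _update_load_metrics(self):
        # Placeholder: poll resource usage heartbeats from the cluster.
        pass
```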


ericl commented Dec 4, 2020 via email

wuisawesome commented

Hmmm, what metrics are you thinking of reporting?

Without a node provider, I think load_metrics is the best that we can (easily) do.


ericl commented Dec 5, 2020 via email

wuisawesome commented

Hmmm ok I think we can only display the second half though (which we can/should get from load metrics).

Without a real autoscaler, we don't really have a concept of a node type, pending nodes, or failed nodes.

Resources
------------------------------------------------------------
Usage:
 530.0/544.0 CPU
 2.0/2.0 GPU
 0.0/2.0 AcceleratorType:V100
 0.0 GiB/1583.19 GiB memory
 0.0 GiB/471.02 GiB object_store_memory

Demands:
 {"CPU": 1}: 150 pending tasks
 [{"CPU": 4} * 5]: 5 pending placement groups
 [{"CPU": 1} * 100]: from request_resources()


ericl commented Dec 5, 2020 via email
