[Autoscaler] Unmanaged node off-by-one error #11430

Closed
2 tasks
wuisawesome opened this issue Oct 16, 2020 · 0 comments · Fixed by #11458
Labels: bug (Something that is supposed to be working; but isn't), P1 (Issue that should be fixed within a few weeks)

Comments

@wuisawesome
Contributor

What is the problem?

Consider a cluster with utilization_fraction: 1.0 and a single unmanaged node.

target_num_workers() == 1, because we receive load metrics from the unmanaged node.

But num_workers = self.workers() + num_pending, and self.workers() already filters out the unmanaged node. The unmanaged node is therefore counted toward the target but not toward num_workers, so the two values are off by one.
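
The mismatch can be illustrated with a toy model. This is only a sketch under stated assumptions, not the autoscaler's actual code: the Node class, the one-target-worker-per-reporting-node rule, and the helper functions below are hypothetical stand-ins for target_num_workers() and self.workers().

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    managed: bool  # False for a node the autoscaler did not launch


def target_num_workers(nodes_reporting_load):
    # Hypothetical simplification: with utilization_fraction 1.0, assume
    # one target worker per node that reports load metrics.
    return len(nodes_reporting_load)


def managed_workers(nodes):
    # Stand-in for self.workers(): unmanaged nodes are filtered out.
    return [n for n in nodes if n.managed]


nodes = [Node("unmanaged-0", managed=False)]
num_pending = 0

# The target counts the unmanaged node because it reports load metrics.
target = target_num_workers(nodes)

# The current count does not, because the filter drops it.
num_workers = len(managed_workers(nodes)) + num_pending

shortfall = target - num_workers
print(target, num_workers, shortfall)
```

In this toy setup the two counts disagree by exactly the number of unmanaged nodes. Counting the same set of nodes on both sides would remove the discrepancy; whether #11458 takes that approach is not stated here.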

Ray version and other system information (Python version, TensorFlow version, OS):

Reproduction (REQUIRED)

Please provide a script that can be run to reproduce the issue. The script should have no external library dependencies (i.e., use fake or mock data / environments):

If we cannot run your script, we cannot fix your issue.

  • I have verified my script runs in a clean environment and reproduces the issue.
  • I have verified the issue also occurs with the latest wheels.
@wuisawesome added the bug (Something that is supposed to be working; but isn't) and P0 (Issues that should be fixed in short order) labels on Oct 16, 2020
@wuisawesome added this to the Serverless Autoscaling milestone on Oct 16, 2020
@ericl added the P1 (Issue that should be fixed within a few weeks) label and removed the P0 (Issues that should be fixed in short order) label on Oct 16, 2020

2 participants