[autoscaler] Too many workers are scaled in kubernetes #11551
Comments
Cc @wuisawesome, probably should handle ASAP.
I'll look into this, but feel free to steal it if you have the bandwidth.
@wuisawesome I vaguely remember seeing this in the past. IIRC the issue is that the autoscaler has no notion of "pending" nodes, so while the node is pending in k8s (which can take a while), it continues to add more nodes because resource utilization is still high.
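To illustrate the kind of accounting being described, here is a minimal, hypothetical sketch (the names and structure are illustrative, not Ray's actual autoscaler code): if the target is computed only from connected nodes and current utilization, a node that is still pending in k8s is invisible and the same demand gets satisfied twice.

import math

def extra_workers_to_request(demand_cpus, cpus_per_worker,
                             connected_workers, pending_workers,
                             track_pending=True):
    # Workers needed to satisfy the outstanding CPU demand.
    needed = math.ceil(demand_cpus / cpus_per_worker)
    # If pending (requested but not yet connected) workers are ignored,
    # the same demand is counted again on every autoscaler tick while the
    # k8s pod is still starting, so a second worker gets requested.
    already_counted = connected_workers + (pending_workers if track_pending else 0)
    return max(0, needed - already_counted)

# One worker is already starting (pending) for 1 CPU of demand:
print(extra_workers_to_request(1, 1, connected_workers=0, pending_workers=1))                       # 0: no extra worker
print(extra_workers_to_request(1, 1, connected_workers=0, pending_workers=1, track_pending=False))  # 1: over-provisions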
What's weird is that the autoscaler is reporting it has 1/1 target nodes, but 2 nodes connected. There seems to be a connected node that is not managed by the autoscaler, which leads to an off-by-one error. @wuisawesome, is this related to that issue you had with an unmanaged node being counted as a node?
@edoakes I do see
Huh, ok, that's actually sort of promising because it indicates that there's just some off-by-one or similar bug somewhere instead of a more intrinsic problem. @ericl does this mean you were able to easily reproduce the issue?
Indeed, I think there is a notion of pending nodes in the k8s autoscaler. It does not influence the problem (doesn't make it better or worse)... I'm not sure about the off-by-one thing...
@PidgeyBE have you had a chance to verify that the above PR fixes your issue?
@edoakes Not yet. I'll try to do it today! Our build chain is not very flexible for changing Ray versions, but if the API is stable between 1.0 and the nightly it might go faster. I'll try.
@edoakes I tried with the latest nightly wheel, but the autoscaling monitor doesn't seem to start, so I can't test...
This looks ok (as in Ray 1.0.0), but when I look into the Ray head, it seems like the monitor doesn't start:
Also when I start actors, no workers are spawned.
I will try to reproduce this on a local k8s instance.
Ok so I tried running this on the nightly wheel + docker image, using minikube on a pretty beefy machine (32 cores):

import ray

ray.init(address="auto")

@ray.remote(num_cpus=1)
def spin():
    import time
    time.sleep(10000)

ray.get(spin.remote())

Running it, I saw what @PidgeyBE observed: even though the head node has enough resources, an extra worker is started, and after 10+ seconds that worker crashed somehow. And the monitor failed with:
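Separately from the monitor failure, the resource accounting can be sanity-checked from the driver with Ray's reporting APIs; a minimal sketch, assuming the same single-head minikube cluster:

import ray

ray.init(address="auto")

# Total resources registered by all connected nodes; with a 32-core head,
# the CPU count here already covers a single num_cpus=1 task.
print("cluster totals:     ", ray.cluster_resources())

# Resources currently free after scheduling; if this still shows spare CPU
# while an extra worker pod is being launched, the scale-up was unnecessary.
print("currently available:", ray.available_resources())

# One dict per node; an unexpected second worker shows up here.
for node in ray.nodes():
    print(node["NodeManagerAddress"], node["Resources"], "alive:", node["Alive"])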
@simon-mo, thanks for reporting this.
I haven't seen the exact issue @simon-mo describes, but I have faced several others. I was wondering if k8s autoscaling is tested in CI/CD?
This is fixed in the new autoscaler (nightly); please re-open if it still happens there.
What is the problem?
When autoscaling in Kubernetes, it sometimes happens (almost 50% of the time in my case) that instead of 1 worker, 2 workers are scaled up.
Reproduction (REQUIRED)
I see this happens because for some reason NumNodesConnected becomes equal to NumNodesUsed. A few seconds later (while the extra node is already being started and the actor is deployed) I see:
I've cut the important part out of the monitor.err logs: monitor.err.txt
It seems like there is some race condition that causes the autoscaler to trigger a worker to start while the actor is deploying...
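To make the suspected timing concrete, a hypothetical sketch of the race (the steps and numbers are illustrative, not taken from the autoscaler code):

# Hypothetical timeline of the suspected race (illustrative only):
#   t0: the actor is placed, but its resource usage is not yet reflected
#       in the load report the monitor reads.
#   t1: the monitor still sees 1 CPU of "unmet" demand and raises the
#       target node count by one.
#   t2: a second worker pod is requested; by the time the load report
#       catches up, two workers are running for one actor.

import math

def target_nodes(reported_unmet_cpus, cpus_per_worker=1):
    # Target computed from a possibly stale demand report.
    return math.ceil(reported_unmet_cpus / cpus_per_worker)

print(target_nodes(1))  # stale report at t1 -> asks for one node too many
print(target_nodes(0))  # fresh report after the actor registers -> no scale-up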
BR, Pieterjan
cc @edoakes