script gets stuck if number of servers in autoscale group is different from the number of servers connected to the CLBs #18

Closed
thtieig opened this issue Jan 14, 2016 · 1 comment


thtieig commented Jan 14, 2016

(INFO) Collective decision: -1
(WARNING) Consensus was to scale down - but number of servers in scaling group (10) exceeds the number of healthy nodes in load balancer 135461 (6). NOT scaling down!

This check exists to give nodes that are not ready yet time to connect to the CLBs. BUT if for some reason they never connect, these servers stay live and stop autoscale from scaling down.
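A minimal sketch of the guard behind the log lines above, to make the stuck state concrete. The function name and parameters are hypothetical and are not taken from the rax-autoscaler source; it only assumes that a negative collective decision means "scale down".

```python
def should_scale_down(decision: int, group_servers: int, healthy_nodes: int) -> bool:
    """Return True only when a scale-down vote is considered safe to act on.

    Hypothetical helper: names and signature are assumptions for illustration.
    """
    if decision >= 0:
        # Consensus was not to scale down.
        return False
    if group_servers > healthy_nodes:
        # Some servers are not (yet) healthy nodes in the load balancer,
        # so the scale-down is refused. If those servers never connect,
        # this branch is taken on every run.
        return False
    return True


# Values from the log above: decision -1, 10 servers in the group, 6 healthy nodes.
blocked = should_scale_down(-1, 10, 6)
```

With the values from the log, `blocked` is `False` on every run until the four unconnected servers either join the CLB or are removed.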

We should have some sort of check to see whether a server has been ACTIVE but NOT connected to the CLBs for more than X amount of time. Maybe... 30 minutes?
This threshold should be a configurable variable so the customer can change it as needed, but I guess 30 minutes should be a safe default.
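A minimal sketch of what such a check could look like, assuming the server list and the set of addresses registered in the CLB have already been fetched. The `Server` class, `find_stale_servers`, and `DEFAULT_MAX_UNCONNECTED` are hypothetical names for illustration only, not part of rax-autoscaler or any Rackspace API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Assumed default from the proposal above; meant to be overridable by the customer.
DEFAULT_MAX_UNCONNECTED = timedelta(minutes=30)


@dataclass
class Server:
    """Hypothetical, simplified view of a server in the scaling group."""
    name: str
    address: str
    status: str
    active_since: datetime


def find_stale_servers(servers, lb_addresses, now,
                       max_unconnected=DEFAULT_MAX_UNCONNECTED):
    """Return servers that are ACTIVE but absent from the CLB for too long."""
    stale = []
    for server in servers:
        if server.status != "ACTIVE":
            continue  # still building, or in an error state handled elsewhere
        if server.address in lb_addresses:
            continue  # connected to the load balancer, nothing to do
        if now - server.active_since > max_unconnected:
            stale.append(server)
    return stale


now = datetime(2016, 1, 14, 12, 0)
servers = [
    Server("web-1", "10.0.0.1", "ACTIVE", now - timedelta(hours=2)),     # connected
    Server("web-2", "10.0.0.2", "ACTIVE", now - timedelta(minutes=45)),  # never connected
    Server("web-3", "10.0.0.3", "ACTIVE", now - timedelta(minutes=5)),   # still within grace period
]
stale_names = [s.name for s in find_stale_servers(servers, {"10.0.0.1"}, now)]
```

Here only `web-2` is reported as stale. What to do with such servers (delete them, alert, or exclude them from the count) is left open.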

Contributor

eljrax commented Feb 12, 2016

@thtieig as discussed - I think that servers failing to provision properly, and monitoring for that event, is a problem best solved outside of rax-autoscaler.

As a consequence, this is used as a workaround: https://github.com/eljrax/autoscale_setup/tree/master/monitoring

eljrax closed this as completed Feb 12, 2016