script gets stuck if number of servers in autoscale group is different from the number of servers connected to the CLBs #18

thtieig · 2016-01-14T09:36:04Z

(INFO) Collective decision: -1
(WARNING) Consensus was to scale down - but number of servers in scaling group (10) exceeds the number of healthy nodes in load balancer 135461 (6). NOT scaling down!

This check exists to give nodes that are not ready yet time to connect to the CLBs, BUT if for some reason they never do, these servers stay live and stop autoscale from scaling down.

We should have some sort of check to see whether a server has been ACTIVE but NOT connected to the CLBs for more than X amount of time. Maybe... 30 minutes?
This timeout should be a configurable parameter so the customer can change it as needed, but I guess 30 minutes should be a safe default.
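
A rough sketch of what such a check could look like. This is not rax-autoscaler code: the function names, the shape of the server dicts, and the `grace_period` parameter are all made up for illustration, and the only value taken from this issue is the 30 minute default.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical default, based on the 30 minutes suggested in this issue.
DEFAULT_GRACE_PERIOD = timedelta(minutes=30)


def find_stale_servers(servers, lb_node_addresses, now=None,
                       grace_period=DEFAULT_GRACE_PERIOD):
    """Return servers that are ACTIVE but missing from the CLB for too long.

    servers: iterable of dicts with 'id', 'status', 'address' and
             'active_since' (timezone-aware datetime). Assumed shape.
    lb_node_addresses: set of addresses of healthy nodes on the CLB.
    """
    now = now or datetime.now(timezone.utc)
    stale = []
    for server in servers:
        if server["status"] != "ACTIVE":
            continue  # still building, leave it alone
        if server["address"] in lb_node_addresses:
            continue  # connected to the CLB, nothing wrong
        if now - server["active_since"] > grace_period:
            stale.append(server)
    return stale


def may_scale_down(servers, lb_node_addresses, now=None,
                   grace_period=DEFAULT_GRACE_PERIOD):
    """Apply the servers-vs-healthy-nodes guard, ignoring stale servers."""
    stale_ids = {s["id"] for s in find_stale_servers(
        servers, lb_node_addresses, now=now, grace_period=grace_period)}
    counted = [s for s in servers if s["id"] not in stale_ids]
    # Block the scale-down only while non-stale servers outnumber healthy nodes.
    return len(counted) <= len(lb_node_addresses)
```

With something like this, a server that went ACTIVE 45 minutes ago and never joined the CLB would no longer block a scale-down, while one that went ACTIVE 10 minutes ago would still be given time to connect.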

eljrax · 2016-02-12T18:19:46Z

@thtieig as discussed - I think servers failing to provision properly, and monitoring for that event, is a problem best solved outside of rax-autoscaler.

As a consequence - this is used as a work-around: https://github.com/eljrax/autoscale_setup/tree/master/monitoring

Teddy-Schmitz added enhancement help wanted labels Feb 11, 2016

eljrax closed this as completed Feb 12, 2016
