Instance are not shutting down due to "Protection from scale In" #432

Open
pawel-t opened this issue Feb 6, 2024 · 4 comments

pawel-t commented Feb 6, 2024

Issue Details

Describe the bug
Instances are not shutting down as expected.
I've set the following settings in the plugin:

Max Idle Minutes Before Scaledown: 10
Minimum Cluster Size: 0
Maximum Cluster Size: 3
Minimum Spare Size: 0
Maximum Total Uses: -1 

I've also tried turning the No Delay Provision Strategy option on and off.
I tried increasing Max to 20 and Min to values greater than 0.
I tried changing the idle time to 5, 15, etc.
In all of those cases, instances were not shut down after the expected idle time.

It even seems that new instances were added while there were multiple free executors on the existing ones.

It seems that "Protected From Scale In" kicks in automatically.
I didn't set it via IaC or manually (I even tried setting this up without IaC just to confirm). The ASG starts without this option, so Jenkins must be changing it somehow via this plugin.
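
One way to confirm which instances carry the flag, and whether the group itself is set to protect new instances, is to inspect the output of `aws autoscaling describe-auto-scaling-groups` (or boto3's `describe_auto_scaling_groups`). The sketch below only parses a response that has already been fetched. The helper name, the group name, and the instance IDs in `sample` are made up for illustration.

```python
def protection_report(response):
    """Summarise scale-in protection from a describe-auto-scaling-groups response.

    Returns one dict per group with the group-level default
    (NewInstancesProtectedFromScaleIn) and the IDs of protected instances.
    """
    report = []
    for group in response.get("AutoScalingGroups", []):
        protected = [
            inst["InstanceId"]
            for inst in group.get("Instances", [])
            if inst.get("ProtectedFromScaleIn", False)
        ]
        report.append({
            "group": group["AutoScalingGroupName"],
            "protects_new_instances": group.get("NewInstancesProtectedFromScaleIn", False),
            "protected_instances": protected,
        })
    return report


# Hypothetical response, trimmed to the fields used above.
sample = {
    "AutoScalingGroups": [{
        "AutoScalingGroupName": "jenkins-agents",
        "NewInstancesProtectedFromScaleIn": False,
        "Instances": [
            {"InstanceId": "i-0aaa", "LifecycleState": "InService", "ProtectedFromScaleIn": True},
            {"InstanceId": "i-0bbb", "LifecycleState": "InService", "ProtectedFromScaleIn": True},
        ],
    }]
}

for row in protection_report(sample):
    print(row)
```

If `protects_new_instances` is false while instances are listed as protected, the flag was applied per instance after launch rather than by the group default, which would match the behaviour described here.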

I've rolled back to the legacy Spot Fleet, and the issue is not present there.

To Reproduce
Set the values above and play around.

Logs
In AWS I can see:

Could not scale to desired capacity because all remaining instances are protected from scale-in.

At 2024-02-06T05:43:51Z a user request update of AutoScalingGroup constraints to min: 0, max: 3, desired: 0 changing the desired capacity from 1 to 0. At 2024-02-06T05:43:54Z group reached equilibrium.

Environment Details

Plugin Version?
3.2.0

Jenkins Version?
2.426.3

Spot Fleet or ASG?
ASG

Label based fleet?
No

Linux or Windows?
Linux

EC2Fleet Configuration as Code
N/A

Anything else unique about your setup?
No


pawel-t commented Feb 8, 2024

I will observe it for a few days, but changing the values for min and spare from 0 to 1 seems to speed it up, or maybe it's just me.

Seems somehow related to this: #425


pawel-t commented Feb 8, 2024

I have a recent case of this. We wanted to scale down to keep one instance at a time. It's an ASG using Spot.

We started from a point where min = 1, spare = 1, max = 3
(side note: setting min 0, spare 0 made it not scale up at all once it went down to 0 instances; we had 0 instances for 1.5h: #425)

I've applied the following settings:

Max Idle Minutes Before Scaledown: 5
Minimum Cluster Size: 1
Maximum Cluster Size: 1
Minimum Spare Size: 0
Maximum Total Uses: -1
Disable Build Resubmit: False
Maximum Init Connection Timeout in sec: 45
Cloud Status Interval in sec: 10

2 hours have passed. On AWS I see that the Auto Scaling group has the following settings:

Min: 1
Max: 1
Desired: 1

Yet there are 2 instances running, both with scale-in protection.

XXXXXXXXXXXX
State: active, label: "YYYYYYY", nodes: 2, target: 1

In the Jenkins logs I don't see anything about scaling down. In the Jenkins UI I can see that both instances are idle.

On AWS I can see:

Cancelled
Could not scale to desired capacity because all remaining instances are protected from scale-in.
At 2024-02-08T11:03:21Z a user request update of AutoScalingGroup constraints to min: 1, max: 1, desired: 1 changing the desired capacity from 2 to 1. At 2024-02-08T11:03:26Z group reached equilibrium.
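
The cancelled activity above is what an Auto Scaling group reports when desired capacity drops but every candidate instance is protected. A toy model of that rule, not AWS's actual termination logic: the group may remove at most the excess over desired capacity, and only from unprotected instances. The function name and instance IDs are placeholders.

```python
def removable_on_scale_in(instances, desired):
    """Return the instance IDs a scale-in could terminate.

    instances: list of (instance_id, protected_from_scale_in) tuples.
    desired:   the group's desired capacity.
    """
    excess = max(0, len(instances) - desired)
    unprotected = [iid for iid, protected in instances if not protected]
    # Only unprotected instances are eligible, and never more than the excess.
    return unprotected[:excess]


# The situation from this comment: desired 1, two running, both protected.
print(removable_on_scale_in([("i-0aaa", True), ("i-0bbb", True)], desired=1))   # []

# Same group if one instance had its protection removed.
print(removable_on_scale_in([("i-0aaa", True), ("i-0bbb", False)], desired=1))  # ['i-0bbb']
```

With both instances protected the eligible list is empty, so the group stays at 2 nodes against a target of 1 until something clears the flag or terminates an instance directly.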


icep87 commented Feb 27, 2024

Have you enabled scale-in protection in your ASG? This issue seems more related to the setup of the ASG than to the plugin itself.


pawel-t commented Feb 27, 2024

> Have you enabled scale-in protection in your ASG? This issue seems more related to the setup of the ASG than to the plugin itself.

I didn't, but the plugin did. As stated in other issue tickets, the plugin does this to prevent the ASG from killing instances on its own. What I can see is that the plugin is not fast enough to manage this replacement by itself.
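
As a manual workaround while investigating, protection can be lifted from instances known to be idle so that the ASG can complete the pending scale-in. A minimal sketch, assuming `client` is a boto3 Auto Scaling client (`boto3.client("autoscaling")`), whose `set_instance_protection` call takes the parameters used here. The helper name, the dry-run class, the group name, and the instance ID are hypothetical, and this is not what the plugin does internally. The plugin may re-apply protection, and an instance released this way can be terminated while a build is running, so check that it is idle in Jenkins first.

```python
def release_scale_in_protection(client, group_name, instance_ids):
    """Clear ProtectedFromScaleIn for the given instances and return their IDs."""
    ids = list(instance_ids)
    if not ids:
        return []  # nothing to do; avoid an API call with an empty list
    client.set_instance_protection(
        AutoScalingGroupName=group_name,
        InstanceIds=ids,
        ProtectedFromScaleIn=False,
    )
    return ids


class DryRunClient:
    """Stand-in that records calls instead of contacting AWS."""

    def __init__(self):
        self.calls = []

    def set_instance_protection(self, **kwargs):
        self.calls.append(kwargs)
        return {}


# Dry run with placeholder names; swap in a real boto3 client to apply it.
dry_run = DryRunClient()
release_scale_in_protection(dry_run, "jenkins-agents", ["i-0aaa"])
print(dry_run.calls)
```

Passing the client in as an argument keeps the helper testable without AWS credentials.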
