Cluster Autoscaler resets unneeded since time to 0s #5618

Closed
wreed4 opened this issue Mar 24, 2023 · 19 comments
Assignees
Labels
area/cluster-autoscaler area/core-autoscaler Denotes an issue that is related to the core autoscaler and is not specific to any provider. kind/bug Categorizes issue or PR as related to a bug. lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@wreed4

wreed4 commented Mar 24, 2023

Which component are you using?:

cluster-autoscaler

What version of the component are you using?:

Component version: 1.21.3

What k8s version are you using (kubectl version)?:

kubectl version Output
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.17", GitCommit:"953be8927218ec8067e1af2641e540238ffd7576", GitTreeState:"clean", BuildDate:"2023-03-01T02:23:41Z", GoVersion:"go1.19.6", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"22+", GitVersion:"v1.22.17-eks-48e63af", GitCommit:"47b89ea2caa1f7958bc6539d6865820c86b4bf60", GitTreeState:"clean", BuildDate:"2023-01-24T09:34:06Z", GoVersion:"go1.16.15", Compiler:"gc", Platform:"linux/amd64"}

What environment is this in?:

aws, EKS

What did you expect to happen?:

The "unneeded since" duration would continue to increase until the node is either removed or becomes needed.

What happened instead?:

The "unneeded since" duration dropped to 0 for many nodes at once (but not all), even though, per the logs, they were never determined to be needed, causing the --scale-down-unneeded-time timer to be reset.
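
To illustrate why the reset matters, here is a minimal Go sketch of the kind of check --scale-down-unneeded-time gates (illustrative names and values only, not the actual cluster-autoscaler code): a node only becomes a scale-down candidate once it has been continuously unneeded for the configured duration, so resetting the timestamp restarts that countdown.

package main

import (
    "fmt"
    "time"
)

// scaleDownUnneededTime mirrors the --scale-down-unneeded-time flag
// (illustrative value only).
const scaleDownUnneededTime = 10 * time.Minute

// eligibleForScaleDown is a simplified stand-in for the check CA performs:
// a node only becomes a scale-down candidate once it has been continuously
// unneeded for at least scaleDownUnneededTime.
func eligibleForScaleDown(unneededSince, now time.Time) bool {
    return now.Sub(unneededSince) >= scaleDownUnneededTime
}

func main() {
    now := time.Now()

    // Node tracked as unneeded for 11 minutes: eligible.
    fmt.Println(eligibleForScaleDown(now.Add(-11*time.Minute), now)) // true

    // If the timestamp is reset to "now", the countdown starts over and the
    // node stays in the cluster for at least another full period.
    fmt.Println(eligibleForScaleDown(now, now)) // false
}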

How to reproduce it (as minimally and precisely as possible):

I'm unsure what causes this, but our cluster has a fairly high churn rate, is around 300 nodes, and runs mostly default CA settings. I also saw a Watch on ReplicaSets close in the loop where this happened, if that matters.

Anything else we need to know?:

I'm happy to answer more questions, but I'm unsure what else to put here. The logs are far too verbose to copy in their entirety, but this is the piece of code I'm looking at that I think might be lying:

// Update stores nodes along with a time at which they were found to be
// unneeded. Previously existing timestamps are preserved.

@wreed4 wreed4 added the kind/bug Categorizes issue or PR as related to a bug. label Mar 24, 2023
@vadasambar
Member

/assign @vadasambar

@vadasambar
Member

A snippet of logs around the problematic behavior might be helpful.

Here's my understanding (feel free to correct me):

  1. I couldn't find the core/scaledown/unneeded/nodes.go file in the 1.21 version of CA. Maybe you are looking at master or a post-1.21 version of the code.
  2. We are talking about the log line defined at:
    klog.V(2).Infof("%s was unneeded for %s", nodeName, currentTime.Sub(unneededSince).String())

The unneededSince timestamp is updated at:

// Update the timestamp map.
result := make(map[string]time.Time)
unneededNodesList := make([]*apiv1.Node, 0, len(nodesToRemove))
for _, node := range nodesToRemove {
    name := node.Node.Name
    unneededNodesList = append(unneededNodesList, node.Node)
    if val, found := sd.unneededNodes[name]; !found {
        result[name] = timestamp
    } else {
        result[name] = val
    }
}

@wreed4
Author

wreed4 commented Mar 28, 2023

Thank you @vadasambar. I have too many tabs open and that one wasn't from 1.21; sorry for the confusion. Thanks for finding the right place.

Here are all the logs having to do with one particular node this happened on. I'm filtering them because we have hundreds of nodes and there are a LOT of logs, but the filter is just a simple text search on this node's name:

I0324 06:00:25.374593       1 static_autoscaler.go:497] ip-172-17-76-70.ec2.internal is unneeded since 2023-03-24 06:00:24.071480578 +0000 UTC m=+242752.166823855 duration 0s
I0324 06:00:24.698279       1 cluster.go:187] Fast evaluation: node ip-172-17-76-70.ec2.internal may be removed
I0324 06:00:24.670887       1 cluster.go:148] Fast evaluation: ip-172-17-76-70.ec2.internal for removal
I0324 06:00:24.378929       1 scale_down.go:448] Node ip-172-17-76-70.ec2.internal - memory utilization 0.888549
I0324 06:00:14.017381       1 scale_down.go:829] ip-172-17-76-70.ec2.internal was unneeded for 11m13.651288526s
I0324 06:00:14.015632       1 static_autoscaler.go:497] ip-172-17-76-70.ec2.internal is unneeded since 2023-03-24 05:48:59.326845574 +0000 UTC m=+242067.422188845 duration 11m13.651288526s
I0324 06:00:13.297503       1 scale_down.go:448] Node ip-172-17-76-70.ec2.internal - memory utilization 0.888549
I0324 06:00:13.954747       1 cluster.go:187] Fast evaluation: node ip-172-17-76-70.ec2.internal may be removed
I0324 06:00:13.930523       1 cluster.go:148] Fast evaluation: ip-172-17-76-70.ec2.internal for removal

It looks like something is getting reset between the 6:00:14 scrape and the 6:00:24 scrape. The only thing that looks like a reset is this line:

I0324 06:00:22.671491       1 reflector.go:530] k8s.io/client-go/informers/factory.go:134: Watch close - *v1.ReplicaSet total 977 items received

During the 6:00:24 scrape there were quite a few other nodes marked as unneeded for 0s; some of them were new, but some display the same behavior as this node.

@vadasambar
Member

vadasambar commented Mar 29, 2023

It seems like something is removing the node from unneededNodes, and then the node shows up again in nodesToRemove, which resets the time to the current time on line 600 in:

// Update the timestamp map.
result := make(map[string]time.Time)
unneededNodesList := make([]*apiv1.Node, 0, len(nodesToRemove))
for _, node := range nodesToRemove {
    name := node.Node.Name
    unneededNodesList = append(unneededNodesList, node.Node)
    if val, found := sd.unneededNodes[name]; !found {
        result[name] = timestamp
    } else {
        result[name] = val
    }
}
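
To make that concrete, here is a minimal, self-contained sketch of the same preserve-or-reset behaviour (simplified names and types, not the CA code): a node that was dropped from the previous map gets the current loop time, so its reported unneeded duration goes back to 0s.

package main

import (
    "fmt"
    "time"
)

// updateUnneeded mirrors the loop above in spirit: nodes already present in
// prev keep their original timestamp; nodes that are not found get "now".
// (Names and types are simplified for illustration; this is not the CA code.)
func updateUnneeded(prev map[string]time.Time, candidates []string, now time.Time) map[string]time.Time {
    result := make(map[string]time.Time, len(candidates))
    for _, name := range candidates {
        if since, found := prev[name]; found {
            result[name] = since // timestamp preserved
        } else {
            result[name] = now // timestamp reset to the current loop time
        }
    }
    return result
}

func main() {
    now := time.Now()

    // Loop N: node-c has been unneeded for 11 minutes.
    prev := map[string]time.Time{"node-c": now.Add(-11 * time.Minute)}

    // Something (e.g. the usage tracker) deletes node-c from the map
    // between loops...
    delete(prev, "node-c")

    // ...so loop N+1 treats it as new, and it reports "unneeded for 0s"
    // even though it was never actually needed.
    next := updateUnneeded(prev, []string{"node-c"}, now)
    fmt.Println("node-c unneeded for", now.Sub(next["node-c"])) // 0s
}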

@vadasambar
Member

I think CA might be removing some nodes from the unneeded node map. Because of this, during the next iteration of CA (cluster-autoscaler), the code goes into the !found branch of the condition on line 600 above and the timestamp is updated to the current time, so the unneeded time (calculated as current timestamp - unneeded timestamp) becomes 0.

toRemove := nodesToRemove[0]
utilization := sd.nodeUtilizationMap[toRemove.Node.Name]
podNames := make([]string, 0, len(toRemove.PodsToReschedule))
for _, pod := range toRemove.PodsToReschedule {
    podNames = append(podNames, pod.Namespace+"/"+pod.Name)
}
klog.V(0).Infof("Scale-down: removing node %s, utilization: %v, pods to reschedule: %s", toRemove.Node.Name, utilization,
    strings.Join(podNames, ","))
sd.context.LogRecorder.Eventf(apiv1.EventTypeNormal, "ScaleDown", "Scale-down: removing node %s, utilization: %v, pods to reschedule: %s",
    toRemove.Node.Name, utilization, strings.Join(podNames, ","))
// Nothing super-bad should happen if the node is removed from tracker prematurely.
simulator.RemoveNodeFromTracker(sd.usageTracker, toRemove.Node.Name, sd.unneededNodes)

RemoveNodeFromTracker on line 987 modifies sd.unneededNodes without completely resetting it (note line 975; only one node is scaled down at a time).

To look a little closer at RemoveNodeFromTracker:

// RemoveNodeFromTracker removes node from tracker and also cleans the passed utilization map.
func RemoveNodeFromTracker(tracker *UsageTracker, node string, utilization map[string]time.Time) {
    keysToRemove := make([]string, 0)
    if mainRecord, found := tracker.Get(node); found {
        if mainRecord.usingTooMany {
            keysToRemove = getAllKeys(utilization)
        } else {
        usingloop:
            for usedNode := range mainRecord.using {
                if usedNodeRecord, found := tracker.Get(usedNode); found {
                    if usedNodeRecord.usedByTooMany {
                        keysToRemove = getAllKeys(utilization)
                        break usingloop
                    } else {
                        for anotherNode := range usedNodeRecord.usedBy {
                            keysToRemove = append(keysToRemove, anotherNode)
                        }
                    }
                }
            }
        }
    }
    tracker.Unregister(node)
    delete(utilization, node)
    for _, key := range keysToRemove {
        delete(utilization, key)
    }
}

utilization here is sd.unneededNodes passed by reference (for lack of a better term). As you can see on line 157, it removes the node being scaled down (let's call it node A) from sd.unneededNodes, which is expected, but it also goes a step further and deletes other nodes (lines 158~160) from sd.unneededNodes.
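
As a side note on the "passed by reference" point: Go maps are reference types, so deletions performed inside RemoveNodeFromTracker are visible in the caller's sd.unneededNodes. A tiny illustrative sketch (hypothetical names, not the CA code):

package main

import (
    "fmt"
    "time"
)

// cleanUp deletes entries from the map it receives. Because Go maps are
// reference types, the deletions are visible to the caller's map as well,
// which is why RemoveNodeFromTracker mutating `utilization` also mutates
// sd.unneededNodes.
func cleanUp(utilization map[string]time.Time, keys ...string) {
    for _, k := range keys {
        delete(utilization, k)
    }
}

func main() {
    unneeded := map[string]time.Time{
        "node-a": time.Now().Add(-15 * time.Minute),
        "node-c": time.Now().Add(-11 * time.Minute),
    }
    cleanUp(unneeded, "node-a", "node-c")
    fmt.Println(len(unneeded)) // 0 — the caller's map was modified
}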

Here's my understanding of how UsageTracker works:

  1. node A has a usage record. If pods from node A can be moved to node B, node B is added to node A's record.using
  2. node B also has a usage record. If pods from node A can be moved to node B, node A is added to node B's record.usedBy
  3. when node A is scaled down, node A is removed from sd.unneededNodes
  4. usage tracker goes a step further and also removes some more nodes based on the following logic:
    a. node A is using node B
    b. node B is using node D
    c. node B is being used by node A and node C
    d. when node A is deleted, the usage tracker sees which nodes node A is using. In our case it picks node B and gets node B's usage record. It then collects all the node names registered in node B's record.usedBy and removes those nodes from sd.unneededNodes

(Diagram illustrating the using/usedBy relationships between nodes A, B, C, and D; link to picture in the original issue.)
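
To make the relationships above concrete, here is a small hypothetical sketch that mimics the keysToRemove logic with toy using/usedBy records (illustrative types only, not the actual UsageTracker): scaling down node A drops node C from the unneeded map, while node B keeps its timestamp.

package main

import (
    "fmt"
    "time"
)

// record is a simplified stand-in for the usage tracker's per-node record:
// `using` = nodes this node's pods could move to, `usedBy` = nodes whose
// pods could move onto this node. (Illustrative only, not the CA types.)
type record struct {
    using  map[string]bool
    usedBy map[string]bool
}

// removeNode mimics the keysToRemove logic described above: when `node` is
// scaled down, every node registered in the usedBy set of each node it was
// using is dropped from the unneeded map, along with the node itself.
func removeNode(tracker map[string]record, node string, unneeded map[string]time.Time) {
    if rec, found := tracker[node]; found {
        for used := range rec.using {
            for other := range tracker[used].usedBy {
                delete(unneeded, other)
            }
        }
    }
    delete(tracker, node)
    delete(unneeded, node)
}

func main() {
    now := time.Now()
    // Node A is using node B; node B is used by nodes A and C.
    tracker := map[string]record{
        "A": {using: map[string]bool{"B": true}},
        "B": {usedBy: map[string]bool{"A": true, "C": true}},
    }
    unneeded := map[string]time.Time{
        "A": now.Add(-15 * time.Minute),
        "B": now.Add(-12 * time.Minute),
        "C": now.Add(-11 * time.Minute),
    }

    removeNode(tracker, "A", unneeded)

    // A is gone (scaled down) and C is gone (its placement must be
    // re-evaluated), but B keeps its timestamp.
    for name := range unneeded {
        fmt.Println(name, "still unneeded") // prints only B
    }
}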

My earlier understanding was that the usage tracker would remove node B from sd.unneededNodes because node A was using node B (i.e., pods from node A could be moved there), but that doesn't seem true based on the code. Instead, node C is removed from sd.unneededNodes, because pods from node C cannot be moved to node B anymore now that pods from node A are slated to move there. In other words, we need to re-evaluate where pods from node C would go if it were scaled down.

I believe node C is the node we care about here. It is getting removed from sd.unneededNodes, and hence its unneeded time starts counting from 0 all over again.

I am not sure why node B is not removed from sd.unneededNodes by the usage tracker. My guess is that this is by design and we either deal with node B somewhere else or we don't want to remove it from sd.unneededNodes (though I am not sure why).

@vadasambar vadasambar removed their assignment Apr 6, 2023
@vadasambar
Member

/label cluster-autoscaler

@k8s-ci-robot
Contributor

@vadasambar: The label(s) /label cluster-autoscaler cannot be applied. These labels are supported: api-review, tide/merge-method-merge, tide/merge-method-rebase, tide/merge-method-squash, team/katacoda, refactor. Is this label configured under labels -> additional_labels or labels -> restricted_labels in plugin.yaml?

In response to this:

/label cluster-autoscaler

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@bober02

bober02 commented May 3, 2023

Hi, has there been any progress on this? On large-scale deployments this is actually taking us hours to fully scale down, with the corresponding cost for unutilised machines.

@bober02

bober02 commented May 3, 2023

This is also happening in version 1.25.

@kevinkrp93

One of our customers was facing the same issue too. We replicated the customer scenario twice and found this reset in one of the runs. Is anyone else seeing this?

Cluster Autoscaler Profile: (screenshot in the original issue)

Customer Scenario: (screenshot in the original issue)

Replicated Customer Scenario: (screenshot in the original issue)

@kevinkrp93

/label cluster-autoscaler

@k8s-ci-robot
Contributor

@kevinkrp93: The label(s) /label cluster-autoscaler cannot be applied. These labels are supported: api-review, tide/merge-method-merge, tide/merge-method-rebase, tide/merge-method-squash, team/katacoda, refactor. Is this label configured under labels -> additional_labels or labels -> restricted_labels in plugin.yaml?

In response to this:

/label cluster-autoscaler

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@Bryce-Soghigian
Member

/assign Bryce-Soghigian

@Bryce-Soghigian
Member

Not sure whether this is entirely related, but the same bug showed up in my testing and it goes away with this fix:
#5890

@Bryce-Soghigian
Member

I think we can close this issue as it was addressed in my PR, and I have the cherry-picks posted in #5962 to fix this for all supported k8s versions.

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 24, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Feb 23, 2024
@towca towca added the area/core-autoscaler Denotes an issue that is related to the core autoscaler and is not specific to any provider. label Mar 21, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

@k8s-ci-robot k8s-ci-robot closed this as not planned Won't fix, can't repro, duplicate, stale Apr 20, 2024
@k8s-ci-robot
Contributor

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
