Cluster Autoscaler (0.4.0) - Excessive calls to describeautoscalinggroup #2541
Comments
It looks like the cause here is that we have 7 node pools and only 4 under management. So every time it tries to evaluate a node that is outside of the four managed ASGs, it regenerates the entire cache. I might suggest keeping another pool of nodes that aren't found in the cache after a regeneration and handling those as a separate 'early-out' case (see the sketch below). For now I think I'll pull all ASGs under management but keep the other 3 at a fixed size (same min as max), though this makes me nervous as one of those ASGs is our etcd pool.
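A minimal sketch of what that 'early-out' pool could look like. The `asgCache`/`asgRef` types here are hypothetical stand-ins, not the actual cluster-autoscaler code:

```go
package cache

// asgRef identifies an autoscaling group by name (hypothetical type).
type asgRef struct{ Name string }

// asgCache is a hypothetical stand-in for the autoscaler's ASG cache.
type asgCache struct {
	registered map[asgRef]bool // ASGs under management
	notAnASG   map[asgRef]bool // refs a regeneration already failed to resolve
}

func newASGCache() *asgCache {
	return &asgCache{
		registered: map[asgRef]bool{},
		notAnASG:   map[asgRef]bool{},
	}
}

// findForNode resolves a node's ASG, regenerating the cache at most once per
// unknown ref; refs that still miss afterwards are remembered, so later
// lookups early-out instead of triggering another full regeneration.
func (c *asgCache) findForNode(ref asgRef) bool {
	if c.notAnASG[ref] {
		return false // known-unmanaged node: skip regeneration entirely
	}
	if c.registered[ref] {
		return true
	}
	c.regenerate()
	if c.registered[ref] {
		return true
	}
	c.notAnASG[ref] = true // remember the miss
	return false
}

func (c *asgCache) regenerate() {
	// would call DescribeAutoScalingGroups and rebuild c.registered here
}
```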
I guess we might have a similar problem with GCE/GKE. Need to take a look.
We're still on 1.5.6 here and thus stuck on 0.4.0.
We are also in a corporate shared AWS account for just a few more weeks, and we've been responsible for hammering the DescribeAutoScalingGroups endpoint. At one count we saw about 6200 calls an hour while watching just 2 autoscaling groups with a scan interval of 15s.
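For scale, a back-of-envelope check of those numbers (both taken from the count above):

```go
package main

import "fmt"

// A 15s scan interval gives 3600/15 = 240 loop iterations per hour, so
// ~6200 calls/hour works out to roughly 26 DescribeAutoScalingGroups calls
// per iteration -- far more than the one or two you'd expect for 2 cached ASGs.
func main() {
	scansPerHour := 3600.0 / 15.0         // 15s scan interval
	callsPerScan := 6200.0 / scansPerHour // observed ~6200 calls/hour
	fmt.Printf("%.0f scans/hour -> ~%.0f calls per scan\n", scansPerHour, callsPerScan)
}
```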
After reviewing the logs at verbosity 4, I'm seeing what looks like excessive cache regeneration (log included below).
I'm digging through the code trying to track down what's invalidating the cache so rapidly.
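One way to confirm where the calls come from while digging: wrap the SDK client and count. A rough sketch, assuming aws-sdk-go v1; the wrapper type is hypothetical, not the project's own code:

```go
package cache

import (
	"log"
	"sync/atomic"

	"github.com/aws/aws-sdk-go/service/autoscaling"
)

// countingASG wraps the real client and logs a running call count, making it
// easy to see which code path is hammering DescribeAutoScalingGroups.
type countingASG struct {
	wrapped *autoscaling.AutoScaling
	calls   int64
}

func (s *countingASG) DescribeAutoScalingGroups(
	in *autoscaling.DescribeAutoScalingGroupsInput,
) (*autoscaling.DescribeAutoScalingGroupsOutput, error) {
	log.Printf("DescribeAutoScalingGroups call #%d", atomic.AddInt64(&s.calls, 1))
	return s.wrapped.DescribeAutoScalingGroups(in)
}
```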