Node-pool doesn't scale down to 0 on GKE #2377

Closed
aaaaahaaaaa opened this issue Sep 24, 2019 · 8 comments

@aaaaahaaaaa

aaaaahaaaaa commented Sep 24, 2019

I can't seem to configure my k8s cluster on GKE in such a way that any of my non-default node pools properly scales down to 0. The kube-system pods seem to be the problem, but the documentation covering this specific use case doesn't help, and as far as I can tell several people are in the same situation (e.g. kubernetes/kubernetes#69696).

The PDBs mentioned here can only be applied to heapster, kube-dns and metrics-server. PDBs don't work on pods like fluentd, kube-proxy and prometheus-to-sd, I imagine because they are managed by DaemonSets?
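For reference, the PDB shape the doc suggests looks roughly like this (a sketch; policy/v1beta1 matches this cluster version, and the k8s-app selector label is the standard kube-system one, so verify it against your cluster):

apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: kube-dns-pdb
  namespace: kube-system
spec:
  maxUnavailable: 1
  selector:
    matchLabels:
      k8s-app: kube-dns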

Warning NoControllers 95s (x48 over 36m) controllermanager found no controllers for pod "kube-proxy-gke-xxx"

K8s Rev: v1.13.7-gke.8

❯ kubectl get configmap cluster-autoscaler-status -n kube-system -o yaml
apiVersion: v1
data:
  status: |+
    Cluster-autoscaler status at 2019-09-24 10:12:30.638916052 +0000 UTC:
    Cluster-wide:
      Health:      Healthy (ready=2 unready=0 notStarted=0 longNotStarted=0 registered=2 longUnregistered=0)
                   LastProbeTime:      2019-09-24 10:12:30.31663754 +0000 UTC m=+148830.819145764
                   LastTransitionTime: 2019-09-22 16:52:28.254187732 +0000 UTC m=+28.756695947
      ScaleUp:     NoActivity (ready=2 registered=2)
                   LastProbeTime:      2019-09-24 10:12:30.31663754 +0000 UTC m=+148830.819145764
                   LastTransitionTime: 2019-09-24 09:21:58.063983054 +0000 UTC m=+145798.566491267
      ScaleDown:   NoCandidates (candidates=0)
                   LastProbeTime:      2019-09-24 10:12:30.31663754 +0000 UTC m=+148830.819145764
                   LastTransitionTime: 2019-09-24 08:59:00.15511532 +0000 UTC m=+144420.657623532

    NodeGroups:
      Name:        https://content.googleapis.com/compute/v1/projects/XXX/zones/europe-west1-b/instanceGroups/gke-XXX-default-pool-ed23a39e-grp
      Health:      Healthy (ready=1 unready=0 notStarted=0 longNotStarted=0 registered=1 longUnregistered=0 cloudProviderTarget=1 (minSize=1, maxSize=3))
                   LastProbeTime:      2019-09-24 10:12:30.31663754 +0000 UTC m=+148830.819145764
                   LastTransitionTime: 2019-09-22 16:52:28.254187732 +0000 UTC m=+28.756695947
      ScaleUp:     NoActivity (ready=1 cloudProviderTarget=1)
                   LastProbeTime:      2019-09-24 10:12:30.31663754 +0000 UTC m=+148830.819145764
                   LastTransitionTime: 2019-09-22 16:52:28.254187732 +0000 UTC m=+28.756695947
      ScaleDown:   NoCandidates (candidates=0)
                   LastProbeTime:      2019-09-24 10:12:30.31663754 +0000 UTC m=+148830.819145764
                   LastTransitionTime: 2019-09-22 16:52:28.254187732 +0000 UTC m=+28.756695947

      Name:        https://content.googleapis.com/compute/v1/projects/XXX/zones/europe-west1-b/instanceGroups/gke-XXX-processing-b864ae5d-grp
      Health:      Healthy (ready=1 unready=0 notStarted=0 longNotStarted=0 registered=1 longUnregistered=0 cloudProviderTarget=1 (minSize=0, maxSize=1))
                   LastProbeTime:      2019-09-24 10:12:30.31663754 +0000 UTC m=+148830.819145764
                   LastTransitionTime: 0001-01-01 00:00:00 +0000 UTC
      ScaleUp:     NoActivity (ready=1 cloudProviderTarget=1)
                   LastProbeTime:      2019-09-24 10:12:30.31663754 +0000 UTC m=+148830.819145764
                   LastTransitionTime: 2019-09-24 09:21:58.063983054 +0000 UTC m=+145798.566491267
      ScaleDown:   NoCandidates (candidates=0)
                   LastProbeTime:      2019-09-24 10:12:30.31663754 +0000 UTC m=+148830.819145764
                   LastTransitionTime: 2019-09-24 09:21:04.100256228 +0000 UTC m=+145744.602764440

kind: ConfigMap
metadata:
  annotations:
    cluster-autoscaler.kubernetes.io/last-updated: 2019-09-24 10:12:30.638916052 +0000
      UTC
  creationTimestamp: "2019-09-20T15:55:43Z"
  name: cluster-autoscaler-status
  namespace: kube-system
  resourceVersion: "1244195"
  selfLink: /api/v1/namespaces/kube-system/configmaps/cluster-autoscaler-status
  uid: 19ef15cd-dbbf-11e9-b868-42010a840216

Node is only running kube-system pods:

kube-system fluentd-gcp-v3.2.0-ldhpk                        2/2  Running  ...
kube-system heapster-v1.6.1-8b4b64566-4krcg                 3/3  Running  ...
kube-system kube-dns-6987857fdb-29h2g                       4/4  Running  ...
kube-system kube-proxy-gke-pricing-processing-b864ae5d-ljgb 1/1  Running  ...
kube-system metrics-server-v0.3.1-57c75779f-c9nhk           2/2  Running  ...
kube-system prometheus-to-sd-8rt98                          1/1  Running  ...
@losipiuk
Contributor

The culprit is probably kube-dns. Please look for "cannot be removed" in the logs; it should shed some light on why the node was not eligible for scale-down.

@aaaaahaaaaa
Author

aaaaahaaaaa commented Sep 24, 2019

@losipiuk The logs from the CA? I believe they aren't accessible on GKE.

@losipiuk
Contributor

Oh, sorry, I did not notice the GKE part (I thought you were on GCE). It is hard to be sure what exactly the problem is without seeing the cluster logs. If you set PDBs for non-DaemonSet system pods you should be fine (provided there is room for those pods on other nodes). DaemonSet pods do not block node scale-down.

If you:

  • have PDBs set, and
  • have verified that scale-down should be possible (node utilization is below 50% and there is room for the node's pods elsewhere; see the commands below), and
  • it is still not happening,

then contacting GKE support is your best bet IMO.
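For example, a quick way to check the first two points (the node name is a placeholder):

❯ kubectl get pdb -n kube-system
❯ kubectl describe node <node-name> | grep -A 8 "Allocated resources"
❯ kubectl get pods --all-namespaces --field-selector=spec.nodeName=<node-name>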

@aaaaahaaaaa
Author

Alright, I guess GKE support it is then. Thanks for the help.

@xhanin

xhanin commented Apr 23, 2020

@aaaaahaaaaa Did you get a chance to sort this out? I have the same problem: an autoscaling node pool that doesn't scale down to 0 even though the scale-down conditions should be met.

@MaciekPytel
Contributor

@xhanin There may be any number of reasons for this, but system pods or pods using local storage are the most common ones (other reasons are listed in https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/FAQ.md#what-types-of-pods-can-prevent-ca-from-removing-a-node).

One option to consider is to put a taint on the node pool that you want to be able to scale to 0. That way system pods will not be able to run on those nodes, so they won't block scale-down. The downside is that you'll need to add a toleration to all the pods that you want to run on this node pool (this can be automated with a mutating admission webhook). This is a very useful pattern if you have a node pool with particularly expensive nodes.
Alternatively, you can create PDBs for all non-DaemonSet system pods. Note: restarting some system pods can cause various types of disruption to your cluster, which is why CA does not evict them by default (e.g. restarting metrics-server will break all HPAs in your cluster for a few minutes). It's up to you to decide which disruptions you're OK with.
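A sketch of the taint-based option, with hypothetical pool and taint names (the toleration and node selector go into the pod spec of the workloads you want on that pool):

❯ gcloud container node-pools create expensive-pool --cluster=my-cluster \
    --node-taints=dedicated=expensive-pool:NoSchedule \
    --enable-autoscaling --min-nodes=0 --max-nodes=3

# In the workload's pod template:
tolerations:
- key: dedicated
  operator: Equal
  value: expensive-pool
  effect: NoSchedule
nodeSelector:
  cloud.google.com/gke-nodepool: expensive-pool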

CA will log a name of the pod that is blocking scale-down (on GKE logs are not directly accessible, but the same information is exposed via https://cloud.google.com/kubernetes-engine/docs/how-to/cluster-autoscaler-visibility).
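For example, those events can be pulled from Cloud Logging with a filter along these lines (project and cluster names are placeholders; check the linked doc for the exact log name). The noScaleDown entries include the reason a node was not removed:

❯ gcloud logging read 'resource.type="k8s_cluster"
    AND resource.labels.cluster_name="my-cluster"
    AND logName="projects/my-project/logs/container.googleapis.com%2Fcluster-autoscaler-visibility"' \
    --project=my-project --limit=20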

@xhanin

xhanin commented Apr 25, 2020

@MaciekPytel Thank you so much for your help! The page documenting how to get the visibility events is very helpful; it's exactly what I was looking for.

And the pattern of mutating webhook is very interesting.

I'll further investigate in that direction, thank you again!

@superarvind

It's been observed that, even with taints defined, system workloads still try to run on the custom node pool. Has anyone come across a similar case on GKE?
