ebs-csi-controller blocks updating EKS managed node group #758

Closed
imbohyun1 opened this issue Feb 22, 2021 · 2 comments · Fixed by #856
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments


imbohyun1 commented Feb 22, 2021

/kind bug

What happened?
While updating a managed node group by changing the version of its launch template, I observed that if one of the nodes in the older node group is running ebs-csi-controller, the update fails with PodEvictionFailure after hitting the maximum number of retries trying to evict the ebs-csi-controller pod.

"Errors": [
  {
    "ErrorCode": "PodEvictionFailure",
    "ErrorMessage": "Reached max retries while trying to evict pods from nodes in node group ng-upgrade",
    "ResourceIds": [
      "ip-192-168-48-49.ap-northeast-2.compute.internal"
    ]
  }
]

Looking into the issue, it appeared that kube-scheduler kept rescheduling the ebs-csi-controller pod onto the node right after the pod was evicted. Looking at the ebs-csi-controller manifest (ebs-csi-controller deployment: https://github.com/kubernetes-sigs/aws-ebs-csi-driver/blob/master/deploy/kubernetes/base/controller.yaml), I could see that the tolerations setting for the ebs-csi-controller does not prevent it from being rescheduled even though the node has been tainted with eks.amazonaws.com/nodegroup=unschedulable:NoSchedule.

  tolerations:
    - operator: Exists

According to the Kubernetes documentation, "An empty key with operator Exists matches all keys, values and effects which means this will tolerate everything."
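
To confirm what the scheduler sees during the update, the taint on the affected node can be checked directly (a sketch using the node name from the error output above; the exact output formatting may differ):

  # Show the taints on the node that failed to drain
  kubectl describe node ip-192-168-48-49.ap-northeast-2.compute.internal | grep -i taints
  # While the managed node group update is in progress, this should report
  # something like: Taints: eks.amazonaws.com/nodegroup=unschedulable:NoSchedule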

What you expected to happen?
I expected ebs-csi-controller to be evicted like the other pods so that the managed node group update would complete successfully.
To resolve this, I had to either remove the tolerations from the ebs-csi-controller or restrict the tolerations to a specific effect, as below.

  tolerations:
    - operator: Exists
      effect: NoExecute
      tolerationSeconds: 300
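
For reference, the same workaround can be applied in place with kubectl patch; this is only a sketch, assuming the default install with the ebs-csi-controller deployment in the kube-system namespace:

  # Replace the blanket toleration with one limited to NoExecute (workaround sketch)
  kubectl -n kube-system patch deployment ebs-csi-controller --type=json -p='[
    {"op": "replace", "path": "/spec/template/spec/tolerations", "value": [
      {"operator": "Exists", "effect": "NoExecute", "tolerationSeconds": 300}
    ]}
  ]'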

How to reproduce it (as minimally and precisely as possible)?

  1. Create a node group with a launch template.
  2. Deploy the EBS CSI driver on the cluster by following the official documentation.
  3. Update the managed node group with a new version of the launch template (see the example command after this list).
  4. The node running ebs-csi-controller will not be drained, and the update will fail with PodEvictionFailure.
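
For completeness, a sketch of the node group update in step 3 using the AWS CLI; the cluster name and launch template ID below are placeholders (the node group name is the one from the error above):

  # Roll the managed node group to a new launch template version (placeholder IDs)
  aws eks update-nodegroup-version \
    --cluster-name my-cluster \
    --nodegroup-name ng-upgrade \
    --launch-template id=lt-0123456789abcdef0,version=2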

Anything else we need to know?:

Environment

  • Kubernetes version (use kubectl version): v1.18
  • Driver version: v0.9.0
k8s-ci-robot added the kind/bug label Feb 22, 2021

andrewgeller commented Feb 26, 2021

I experienced the same, and this seems to be intentional: "tolerateAllTaints" was added to the Helm chart with a default value of true. There is some discussion in #594.

I am installing with Kustomize, pulling in github.com/kubernetes-sigs/aws-ebs-csi-driver/deploy/kubernetes/overlays/stable/?ref=release-0.9, so I will patch it after the fact, but I will follow this issue to understand why tolerating everything is the right default behavior.
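
For anyone else on the Kustomize path, this is roughly the patch I have in mind (the file names, namespace, and toleration values are my own assumptions, not something from the upstream manifests):

  # kustomization.yaml (sketch)
  apiVersion: kustomize.config.k8s.io/v1beta1
  kind: Kustomization
  resources:
    - github.com/kubernetes-sigs/aws-ebs-csi-driver/deploy/kubernetes/overlays/stable/?ref=release-0.9
  patchesStrategicMerge:
    - controller-tolerations-patch.yaml

  # controller-tolerations-patch.yaml (sketch) -- strategic merge should replace
  # the whole tolerations list here, which is the intent
  apiVersion: apps/v1
  kind: Deployment
  metadata:
    name: ebs-csi-controller
    namespace: kube-system
  spec:
    template:
      spec:
        tolerations:
          - operator: Exists
            effect: NoExecute
            tolerationSeconds: 300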

Contributor

wongma7 commented Feb 26, 2021

Let's tone it down to:

  tolerations:
  - key: CriticalAddonsOnly
    operator: Exists
  - effect: NoExecute
    operator: Exists

Thoughts?

These are the tolerations I see from a default kops install for the kube control plane components, and they seem reasonable to me. The CSI controller is basically taking over/replacing some of kube-controller-manager's responsibility, so it makes sense for it to have the same uptime guarantees. Clearly, tolerating all taints is excessive.

  • CriticalAddonsOnly:
    Since the CSI controller is as critical as kube-controller-manager, this makes sense to me; it is doing cluster-wide volume operations.

  • NoExecute:
    This one I am not too sure about. You could argue that, when draining nodes, it's important for the controller to stay up to detach volumes instead of being instantly evicted. We can set tolerationSeconds here so that the controller pod will at least eventually be evicted (see the sketch below).
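
Concretely, with a tolerationSeconds added, that would look something like the following (the 300s value is only an example):

  tolerations:
  - key: CriticalAddonsOnly
    operator: Exists
  - effect: NoExecute
    operator: Exists
    tolerationSeconds: 300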
