Volume hang on Karpenter Node Consolidation/Termination #1665

Closed
martysweet opened this issue Jun 30, 2023 · 8 comments

@martysweet

/kind bug

As discussed in #1302, some users experience a considerable delay when using PersistentVolumes in combination with Karpenter.

This ticket is to capture that specific issue, highlight where the error is coming from, and discuss possible ways forward.

What happened?

During a Karpenter scaling event, it’s possible that the Kubelet cannot update its Node.Status.VolumesAttached and Node.Status.VolumesInUse values, either because EC2 TerminateInstances has been executed or the Node object has been removed. The Kubelet VolumeManager processes jobs asynchronously, so a terminated pod does not mean all the volumes have been fully unmounted and detached from the node.

Due to how the kube-controller-manager attachdetach-controller uses this node information, the VolumeAttachment objects are not cleaned up immediately, often waiting 6+ minutes. Once they are cleaned up by the control plane, the EBS-CSI Controller does its job as expected, typically throwing an “Already detached” error in the logs (as the EC2 instance had already detached the volumes on termination):

DetachDisk: called on non-attached volume
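For anyone wanting to observe this window, the Node status fields the attachdetach-controller consumes can be watched directly; a minimal sketch (the node name is an example from the walkthrough below):

# Print the volume bookkeeping the attachdetach-controller relies on for this node
kubectl get node ip-10-227-165-182.eu-central-1.compute.internal \
  -o jsonpath='{.status.volumesAttached}{"\n"}{.status.volumesInUse}{"\n"}'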

What you expected to happen?

If a node is deleted and the underlying EBS volumes become detached in the real world, the ebs-csi driver should be able to reflect the actual state in Kubernetes before the timeout condition triggers in the kube-controller-manager attachdetach-controller.

Ideally, we can then catch this dangling-volume condition and reconcile the state within seconds instead of 6+ minutes, allowing the volumes to be remounted quickly on another node.

How to reproduce it (as minimally and precisely as possible)?

  • Create a cluster with Karpenter and a provisioning group
  • Create a StatefulSet with 1 replica and ~20 volumes
    • 20 volumes are used here to reliably reproduce the bug, but it can occur with fewer volumes. It’s a factor of how quickly the Kubelet VolumeManager can unmount volumes after pod termination and update the Node status object.
  • Use the Kubernetes API to delete the Karpenter node the pod is running on (a kubectl sketch follows this list)
    • This will be captured by the Karpenter termination controller, causing a cordon and drain
    • Once this is complete, the node is Terminated using the EC2 API
      • If volumes are still listed in the Node status when the finalizer is removed, this bug occurs
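A minimal kubectl sketch of the trigger and observation steps (the node name is an example; it assumes the StatefulSet pod and its PVCs are already running on that node):

# Delete the Node object; Karpenter's termination controller will cordon, drain and then terminate the instance
kubectl delete node ip-10-227-165-182.eu-central-1.compute.internal

# Watch whether VolumeAttachments keep referencing the now-deleted node
kubectl get volumeattachment --watch \
  -o custom-columns=NAME:.metadata.name,NODE:.spec.nodeName,ATTACHED:.status.attached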

Reproduction Walkthrough

2023-06-30T09:04:30.218Z    INFO    controller.termination    cordoned node    {"commit": "26e2d35-dirty", "node": "ip-10-227-165-182.eu-central-1.compute.internal"}
2023-06-30T09:04:31.769Z    INFO    controller.termination    deleted node    {"commit": "26e2d35-dirty", "node": "ip-10-227-165-182.eu-central-1.compute.internal"}
2023-06-30T09:04:32.049Z    INFO    controller.machine.termination    deleted machine    {"commit": "26e2d35-dirty", "machine": "nodes-ubuntu-j67qz", "node": "ip-10-227-165-182.eu-central-1.compute.internal",
  • The pod is successfully terminated
[2023-06-30T09:04:31.364131Z] Name:  ebstest-0
[2023-06-30T09:04:31.364192Z] Type:  MODIFIED
[2023-06-30T09:04:31.364213Z] Phase:  Running
[2023-06-30T09:04:31.364228Z] Deletion Timestamp:  2023-06-30 09:04:30+00:00
[2023-06-30T09:04:31.364252Z] Node Name:  ip-10-227-165-182.eu-central-1.compute.internal
[2023-06-30T09:04:31.364267Z]  ====================================================================================== =
[2023-06-30T09:04:31.368870Z] Name:  ebstest-0
[2023-06-30T09:04:31.368932Z] Type:  DELETED
[2023-06-30T09:04:31.368955Z] Phase:  Running
[2023-06-30T09:04:31.368972Z] Deletion Timestamp:  2023-06-30 09:04:30+00:00
[2023-06-30T09:04:31.368997Z] Node Name:  ip-10-227-165-182.eu-central-1.compute.internal
[2023-06-30T09:04:31.369037Z]  ====================================================================================== =
  • The node is officially deleted from the API; however, it still has VolumesAttached and VolumesInUse. The instance is then also terminated via the EC2 API. If one or two volumes are on the node, the Kubelet can sometimes process all of them in time; with any more volumes, this bug becomes consistently reproducible.
[2023-06-30T09:04:30.451152Z] Name:  ip-10-227-165-182.eu-central-1.compute.internal
[2023-06-30T09:04:30.451202Z] Type:  MODIFIED
[2023-06-30T09:04:30.451216Z] Deletion Timestamp:  2023-06-30 09:04:30+00:00
[2023-06-30T09:04:30.451237Z] Finalizers:  ['karpenter.sh/termination']
[2023-06-30T09:04:30.451582Z] Status volumes attached:  [{'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-039c251ec3bb18e1a'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-044c85228f50c3cc3'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-0a04dc850d50fde39'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-0ae8ca9ce6efd78c3'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-01e3c874d68c54844'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-000bf0a07713e801c'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-08cdaad8d4fb97ded'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-062a56ba3b8325000'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-03d4144b21e2e4240'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-00ab6f6923e2cedd2'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-0e082a6e4eee7a6d3'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-0bd58963b12e47220'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-0ac5723c4698267bc'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-0d8944ef66977a5e9'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-0d97e48cc694a6e90'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-0bd917ecb162abf39'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-009a8731a6c1a5d75'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-010c6f981ce7fb716'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-0ca6be22757fe2860'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-0dd2106a0b2e12980'}]
[2023-06-30T09:04:30.451628Z] Status volumes in use:  ['kubernetes.io/csi/ebs.csi.aws.com^vol-000bf0a07713e801c', 'kubernetes.io/csi/ebs.csi.aws.com^vol-009a8731a6c1a5d75', 'kubernetes.io/csi/ebs.csi.aws.com^vol-00ab6f6923e2cedd2', 'kubernetes.io/csi/ebs.csi.aws.com^vol-010c6f981ce7fb716', 'kubernetes.io/csi/ebs.csi.aws.com^vol-01e3c874d68c54844', 'kubernetes.io/csi/ebs.csi.aws.com^vol-039c251ec3bb18e1a', 'kubernetes.io/csi/ebs.csi.aws.com^vol-03d4144b21e2e4240', 'kubernetes.io/csi/ebs.csi.aws.com^vol-044c85228f50c3cc3', 'kubernetes.io/csi/ebs.csi.aws.com^vol-062a56ba3b8325000', 'kubernetes.io/csi/ebs.csi.aws.com^vol-08cdaad8d4fb97ded', 'kubernetes.io/csi/ebs.csi.aws.com^vol-0a04dc850d50fde39', 'kubernetes.io/csi/ebs.csi.aws.com^vol-0ac5723c4698267bc', 'kubernetes.io/csi/ebs.csi.aws.com^vol-0ae8ca9ce6efd78c3', 'kubernetes.io/csi/ebs.csi.aws.com^vol-0bd58963b12e47220', 'kubernetes.io/csi/ebs.csi.aws.com^vol-0bd917ecb162abf39', 'kubernetes.io/csi/ebs.csi.aws.com^vol-0ca6be22757fe2860', 'kubernetes.io/csi/ebs.csi.aws.com^vol-0d8944ef66977a5e9', 'kubernetes.io/csi/ebs.csi.aws.com^vol-0d97e48cc694a6e90', 'kubernetes.io/csi/ebs.csi.aws.com^vol-0dd2106a0b2e12980', 'kubernetes.io/csi/ebs.csi.aws.com^vol-0e082a6e4eee7a6d3']
[2023-06-30T09:04:30.451656Z]  ====================================================================================== =
[2023-06-30T09:04:31.966559Z] Name:  ip-10-227-165-182.eu-central-1.compute.internal
[2023-06-30T09:04:31.966637Z] Type:  DELETED
[2023-06-30T09:04:31.966657Z] Deletion Timestamp:  2023-06-30 09:04:30+00:00
[2023-06-30T09:04:31.966682Z] Finalizers:  ['karpenter.sh/termination']
[2023-06-30T09:04:31.967037Z] Status volumes attached:  [{'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-039c251ec3bb18e1a'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-044c85228f50c3cc3'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-0a04dc850d50fde39'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-0ae8ca9ce6efd78c3'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-01e3c874d68c54844'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-000bf0a07713e801c'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-08cdaad8d4fb97ded'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-062a56ba3b8325000'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-03d4144b21e2e4240'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-00ab6f6923e2cedd2'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-0e082a6e4eee7a6d3'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-0bd58963b12e47220'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-0ac5723c4698267bc'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-0d8944ef66977a5e9'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-0d97e48cc694a6e90'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-0bd917ecb162abf39'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-009a8731a6c1a5d75'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-010c6f981ce7fb716'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-0ca6be22757fe2860'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-0dd2106a0b2e12980'}]
[2023-06-30T09:04:31.967086Z] Status volumes in use:  ['kubernetes.io/csi/ebs.csi.aws.com^vol-000bf0a07713e801c', 'kubernetes.io/csi/ebs.csi.aws.com^vol-009a8731a6c1a5d75', 'kubernetes.io/csi/ebs.csi.aws.com^vol-00ab6f6923e2cedd2', 'kubernetes.io/csi/ebs.csi.aws.com^vol-010c6f981ce7fb716', 'kubernetes.io/csi/ebs.csi.aws.com^vol-01e3c874d68c54844', 'kubernetes.io/csi/ebs.csi.aws.com^vol-039c251ec3bb18e1a', 'kubernetes.io/csi/ebs.csi.aws.com^vol-03d4144b21e2e4240', 'kubernetes.io/csi/ebs.csi.aws.com^vol-044c85228f50c3cc3', 'kubernetes.io/csi/ebs.csi.aws.com^vol-062a56ba3b8325000', 'kubernetes.io/csi/ebs.csi.aws.com^vol-08cdaad8d4fb97ded', 'kubernetes.io/csi/ebs.csi.aws.com^vol-0a04dc850d50fde39', 'kubernetes.io/csi/ebs.csi.aws.com^vol-0ac5723c4698267bc', 'kubernetes.io/csi/ebs.csi.aws.com^vol-0ae8ca9ce6efd78c3', 'kubernetes.io/csi/ebs.csi.aws.com^vol-0bd58963b12e47220', 'kubernetes.io/csi/ebs.csi.aws.com^vol-0bd917ecb162abf39', 'kubernetes.io/csi/ebs.csi.aws.com^vol-0ca6be22757fe2860', 'kubernetes.io/csi/ebs.csi.aws.com^vol-0d8944ef66977a5e9', 'kubernetes.io/csi/ebs.csi.aws.com^vol-0d97e48cc694a6e90', 'kubernetes.io/csi/ebs.csi.aws.com^vol-0dd2106a0b2e12980', 'kubernetes.io/csi/ebs.csi.aws.com^vol-0e082a6e4eee7a6d3']

Here is a snippet of VolumeAttachment objects after the node has been deleted - referencing the stale node. After 6 minutes, these objects will be cleaned up and recreated for the newly scheduled node.

csi-2b69ecb15e39f04be54aff1ca9c435cf3b699c71abc97a0817b1f48e56035cd9   ebs.csi.aws.com   pvc-a12f2685-a61b-4f35-b5db-d0d2d120dae3   ip-10-227-165-182.eu-central-1.compute.internal   true       20h
csi-4983722e4e0284f939fe90e75ff3b7bd7631ef8034b3fe9301680d05c0d29c57   ebs.csi.aws.com   pvc-323c2507-e3b7-405b-8bdd-bdb451190c6e   ip-10-227-165-182.eu-central-1.compute.internal   true       20h
csi-52a39a2029b26218f3c50f9b7199e7627f748d84348dfb7e3baf18914dd04e50   ebs.csi.aws.com   pvc-63131420-107f-4560-a4be-f5478b6a7b10   ip-10-227-165-182.eu-central-1.compute.internal   true       20h
csi-7d391010e1749ceeaf03ac930de8f38c287bbfaedba589f32c0e5907d82d754d   ebs.csi.aws.com   pvc-b097ef05-7813-4813-b245-23af831d9609   ip-10-227-165-182.eu-central-1.compute.internal   true       20h

Environment

Versions

  • Kubernetes version (use kubectl version): v1.25.10-eks-c12679a running Ubuntu Kubelet v1.25.10
  • Driver version: 1.19.0

Possible Solutions

There are lots of components at play in this specific issue.

  • Karpenter is deleting the node before the Status.Volumes* fields are empty - but is it really the responsibility of Karpenter to care about this?
  • The attachdetach-controller uses the Status.Volumes* data to populate its ActualStateOfWorld, which, in our specific case, does not reflect reality
  • Maybe the Kubelet is not detaching volumes quickly enough? This, however, is an asynchronous reconciliation process
  • The ebs-csi-controller is not notified about any of these actions

To help explain all the components involved, we put together a simplified diagram.

[simplified diagram of the components involved in volume detach during node termination]

As the ebs-csi controller, we have access to the real-world EBS volume attachment information in the EC2 API. As such, the driver should be able to update the VolumeAttachment object to notify the attachdetach-controller of the correct state.
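For reference, the real-world attachment state the controller could consult is already queryable from the EC2 API; for example (volume ID taken from the logs above, an empty Attachments list meaning the volume is detached):

aws ec2 describe-volumes --volume-ids vol-039c251ec3bb18e1a \
  --query 'Volumes[].Attachments[].[VolumeId,InstanceId,State]' --output table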

One possible solution to this problem could be:

  • When a CSI Driver is deployed to a node, add a csi-ebs finalizer to the Node
  • During runtime: On node deletion, capture the event
    • Log and wait until all csi-ebs volumes are detached before removing the finalizer (a rough wait-loop is sketched after this list)
  • On controller startup, check if there are any pending node deletions and run the above logic
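As a rough illustration of the “log and wait” step, something equivalent to the following loop would run while the finalizer holds the Node object (shell sketch only, not the proposed driver implementation; the node name is a placeholder):

NODE=ip-10-227-165-182.eu-central-1.compute.internal   # example node
# Block until the Node no longer reports any attached EBS CSI volumes
while kubectl get node "$NODE" -o jsonpath='{.status.volumesAttached}' | grep -q 'ebs.csi.aws.com'; do
  echo "waiting for csi-ebs volumes to detach from $NODE"
  sleep 5
done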

The advantage of the above flow is that we are just forcing a wait and allowing the standard CSI flows to happen. However, looking at the Karpenter logic, it seems that it wouldn’t care about our finalizer, terminating the instance regardless: https://github.com/aws/karpenter-core/blob/1a05bdff59ced95c8868c75efdddfa77a3540092/pkg/controllers/termination/controller.go#L84-L102. Perhaps that is a bug on their side, where they should be waiting for a final delete event before proceeding with the TerminateInstance call?

Another solution could be:

  • Listen to Node deletion events
  • When a node has been removed from the API
  • Add each dangling volumesAttached entry to a queue
  • Periodically check the EC2 API to see whether the volume is still attached to the instance behind the dangling nodeName
  • Update any dangling VolumeAttachment objects to reflect the actual EC2 state
  • On csi-controller startup, check VolumeAttachments for bindings to any nodes which don’t exist (a manual approximation is sketched after this list)
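A manual approximation of that dangling check, comparing each VolumeAttachment's nodeName against the Nodes that still exist (illustrative shell only, not the proposed controller logic):

existing_nodes=$(kubectl get nodes -o jsonpath='{.items[*].metadata.name}')
kubectl get volumeattachment \
  -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.spec.nodeName}{"\n"}{end}' |
while read -r va node; do
  # Flag attachments that still point at a Node object which no longer exists
  grep -qwF "$node" <<< "$existing_nodes" || echo "dangling: $va -> $node"
done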

In either case, missing events (say, during a controller restart) will fall back to the 6-minute stale timeout controlled by the attachdetach-controller. The aim is to make this reconciliation quicker, in a safe manner.

@k8s-ci-robot k8s-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label Jun 30, 2023
@dcarrion87

We're running into similar issues that may not be exactly the same, but are related: #1677.

@martysweet
Author

@torredil Regarding PR #1736, which was merged into 1.23.0:

After updating the EBS CSI Driver add-on to 1.23.0, we can still observe the hanging-volume issue (due to the VolumeAttachments still existing).

Steps:

  • Cordon
  • K8S Node Delete (Set DeletionTimestamp)
[2023-09-28T13:31:21.233922Z]  ====================================================================================== =
[2023-09-28T13:31:21.391654Z] Name:  ip-10-112-249-183.ec2.internal
[2023-09-28T13:31:21.391688Z] Type:  MODIFIED
[2023-09-28T13:31:21.391696Z] Deletion Timestamp:  2023-09-28 13:31:23+00:00
[2023-09-28T13:31:21.391709Z] Finalizers:  ['karpenter.sh/termination']
[2023-09-28T13:31:21.391718Z] Taint:  Key: org.com/do-not-consolidate => Effect: NoSchedule
[2023-09-28T13:31:21.391724Z] Taint:  Key: node.kubernetes.io/unschedulable => Effect: NoSchedule
[2023-09-28T13:31:21.391911Z] Status volumes attached:  [{'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-0ac5e2fdbc4f3d6f7'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-05c7c9ee95e405798'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-095b60f0541af2307'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-07a893ded97178fa1'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-057363c85f52a2575'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-08230e8c4e14b8474'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-01d153838e2dcf798'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-0ec232a5640ef6dd8'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-0b4046571ebb0fff6'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-0ec01fe4a6aee1958'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-053ea92510461b454'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-00db2b0fdc05b2523'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-05d9553face706ca7'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-0b348e13a119024cc'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-0269b031a18d3bec1'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-006257d2bca12f529'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-0307226999ac56343'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-005db39685d5f5566'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-0133af7364ff631ac'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-04bd1e9295ff69c92'}]
[2023-09-28T13:31:21.391927Z] Status volumes in use:  ['kubernetes.io/csi/ebs.csi.aws.com^vol-005db39685d5f5566', 'kubernetes.io/csi/ebs.csi.aws.com^vol-006257d2bca12f529', 'kubernetes.io/csi/ebs.csi.aws.com^vol-00db2b0fdc05b2523', 'kubernetes.io/csi/ebs.csi.aws.com^vol-0133af7364ff631ac', 'kubernetes.io/csi/ebs.csi.aws.com^vol-01d153838e2dcf798', 'kubernetes.io/csi/ebs.csi.aws.com^vol-0269b031a18d3bec1', 'kubernetes.io/csi/ebs.csi.aws.com^vol-0307226999ac56343', 'kubernetes.io/csi/ebs.csi.aws.com^vol-04bd1e9295ff69c92', 'kubernetes.io/csi/ebs.csi.aws.com^vol-053ea92510461b454', 'kubernetes.io/csi/ebs.csi.aws.com^vol-057363c85f52a2575', 'kubernetes.io/csi/ebs.csi.aws.com^vol-05c7c9ee95e405798', 'kubernetes.io/csi/ebs.csi.aws.com^vol-05d9553face706ca7', 'kubernetes.io/csi/ebs.csi.aws.com^vol-07a893ded97178fa1', 'kubernetes.io/csi/ebs.csi.aws.com^vol-08230e8c4e14b8474', 'kubernetes.io/csi/ebs.csi.aws.com^vol-095b60f0541af2307', 'kubernetes.io/csi/ebs.csi.aws.com^vol-0ac5e2fdbc4f3d6f7', 'kubernetes.io/csi/ebs.csi.aws.com^vol-0b348e13a119024cc', 'kubernetes.io/csi/ebs.csi.aws.com^vol-0b4046571ebb0fff6', 'kubernetes.io/csi/ebs.csi.aws.com^vol-0ec01fe4a6aee1958', 'kubernetes.io/csi/ebs.csi.aws.com^vol-0ec232a5640ef6dd8']
[2023-09-28T13:31:21.391942Z]  ====================================================================================== =
[2023-09-28T13:31:22.991720Z] Name:  ip-10-112-249-183.ec2.internal
[2023-09-28T13:31:22.991754Z] Type:  DELETED
[2023-09-28T13:31:22.991762Z] Deletion Timestamp:  2023-09-28 13:31:23+00:00
[2023-09-28T13:31:22.991776Z] Finalizers:  ['karpenter.sh/termination']
[2023-09-28T13:31:22.991789Z] Taint:  Key: org.com/do-not-consolidate => Effect: NoSchedule
[2023-09-28T13:31:22.991798Z] Taint:  Key: node.kubernetes.io/unschedulable => Effect: NoSchedule
[2023-09-28T13:31:22.991960Z] Status volumes attached:  [{'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-0ac5e2fdbc4f3d6f7'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-05c7c9ee95e405798'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-095b60f0541af2307'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-07a893ded97178fa1'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-057363c85f52a2575'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-08230e8c4e14b8474'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-01d153838e2dcf798'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-0ec232a5640ef6dd8'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-0b4046571ebb0fff6'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-0ec01fe4a6aee1958'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-053ea92510461b454'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-00db2b0fdc05b2523'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-05d9553face706ca7'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-0b348e13a119024cc'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-0269b031a18d3bec1'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-006257d2bca12f529'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-0307226999ac56343'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-005db39685d5f5566'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-0133af7364ff631ac'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-04bd1e9295ff69c92'}]
[2023-09-28T13:31:22.991974Z] Status volumes in use:  ['kubernetes.io/csi/ebs.csi.aws.com^vol-005db39685d5f5566', 'kubernetes.io/csi/ebs.csi.aws.com^vol-006257d2bca12f529', 'kubernetes.io/csi/ebs.csi.aws.com^vol-00db2b0fdc05b2523', 'kubernetes.io/csi/ebs.csi.aws.com^vol-0133af7364ff631ac', 'kubernetes.io/csi/ebs.csi.aws.com^vol-01d153838e2dcf798', 'kubernetes.io/csi/ebs.csi.aws.com^vol-0269b031a18d3bec1', 'kubernetes.io/csi/ebs.csi.aws.com^vol-0307226999ac56343', 'kubernetes.io/csi/ebs.csi.aws.com^vol-04bd1e9295ff69c92', 'kubernetes.io/csi/ebs.csi.aws.com^vol-053ea92510461b454', 'kubernetes.io/csi/ebs.csi.aws.com^vol-057363c85f52a2575', 'kubernetes.io/csi/ebs.csi.aws.com^vol-05c7c9ee95e405798', 'kubernetes.io/csi/ebs.csi.aws.com^vol-05d9553face706ca7', 'kubernetes.io/csi/ebs.csi.aws.com^vol-07a893ded97178fa1', 'kubernetes.io/csi/ebs.csi.aws.com^vol-08230e8c4e14b8474', 'kubernetes.io/csi/ebs.csi.aws.com^vol-095b60f0541af2307', 'kubernetes.io/csi/ebs.csi.aws.com^vol-0ac5e2fdbc4f3d6f7', 'kubernetes.io/csi/ebs.csi.aws.com^vol-0b348e13a119024cc', 'kubernetes.io/csi/ebs.csi.aws.com^vol-0b4046571ebb0fff6', 'kubernetes.io/csi/ebs.csi.aws.com^vol-0ec01fe4a6aee1958', 'kubernetes.io/csi/ebs.csi.aws.com^vol-0ec232a5640ef6dd8']

This led us down a bit of a rabbit hole regarding the behaviour of preStop for DaemonSets. Our understanding is that without the kubelet shutdownGracePeriod and shutdownGracePeriodCriticalPods options set, preStop would have little to no effect.

Looking at the nodes being provisioned by Karpenter, we saw GracefulShutdown was not enabled (I'm not sure if this behaviour is different by default on AL2?):

ubuntu@ip-10-112-249-47:~$ systemd-inhibit --list
WHO                          UID USER PID COMM            WHAT     WHY                                                         MODE
Unattended Upgrades Shutdown 0   root 555 unattended-upgr shutdown Stop ongoing upgrades or perform upgrades before shutdown delay

In the UserData, adding the following enabled the Graceful Shutdown handling:

echo "$(jq '.shutdownGracePeriod="30s"' /etc/kubernetes/kubelet/kubelet-config.json)" > /etc/kubernetes/kubelet/kubelet-config.json
echo "$(jq '.shutdownGracePeriodCriticalPods="10s"' /etc/kubernetes/kubelet/kubelet-config.json)" > /etc/kubernetes/kubelet/kubelet-config.json

====
ubuntu@ip-10-112-249-141:~$ systemd-inhibit --list
WHO                          UID USER PID  COMM            WHAT     WHY                                                        MODE
Unattended Upgrades Shutdown 0   root 561  unattended-upgr shutdown Stop ongoing upgrades or perform upgrades before shutdown delay
kubelet                      0   root 4169 kubelet         shutdown Kubelet needs time to handle node shutdown                delay

However, the same behaviour is observed. I imagine this is because Karpenter is removing the finalizer too early, which in turn deletes the Node object (https://github.com/aws/karpenter-core/blob/main/pkg/controllers/termination/controller.go#L103-L106), thus not letting the ebs-csi node pod update the Node object.

Increasing the Graceful Node Timeout:

# Remove max delay limit
rm /usr/lib/systemd/logind.conf.d/unattended-upgrades-logind-maxdelay.conf                                                                                                                            

# Set Kubelet periods
echo "$(jq '.shutdownGracePeriod="300s"' /etc/kubernetes/kubelet/kubelet-config.json)" > /etc/kubernetes/kubelet/kubelet-config.json
echo "$(jq '.shutdownGracePeriodCriticalPods="290s"' /etc/kubernetes/kubelet/kubelet-config.json)" > /etc/kubernetes/kubelet/kubelet-config.json    
[2023-09-28T14:21:55.208470Z]  ====================================================================================== =
[2023-09-28T14:21:55.237182Z] Name:  ip-10-112-249-67.ec2.internal
[2023-09-28T14:21:55.237235Z] Type:  MODIFIED
[2023-09-28T14:21:55.237246Z] Deletion Timestamp:  2023-09-28 14:21:57+00:00
[2023-09-28T14:21:55.237263Z] Finalizers:  ['karpenter.sh/termination']
[2023-09-28T14:21:55.237280Z] Taint:  Key: org.com/do-not-consolidate => Effect: NoSchedule
[2023-09-28T14:21:55.237296Z] Taint:  Key: node.kubernetes.io/unschedulable => Effect: NoSchedule
[2023-09-28T14:21:55.237305Z] Taint:  Key: node.kubernetes.io/not-ready => Effect: NoSchedule
[2023-09-28T14:21:55.237481Z] Status volumes attached:  [{'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-0ec01fe4a6aee1958'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-05d9553face706ca7'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-00db2b0fdc05b2523'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-005db39685d5f5566'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-0b4046571ebb0fff6'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-0269b031a18d3bec1'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-095b60f0541af2307'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-05c7c9ee95e405798'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-0307226999ac56343'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-006257d2bca12f529'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-053ea92510461b454'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-057363c85f52a2575'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-01d153838e2dcf798'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-0ac5e2fdbc4f3d6f7'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-07a893ded97178fa1'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-08230e8c4e14b8474'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-0133af7364ff631ac'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-0b348e13a119024cc'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-0ec232a5640ef6dd8'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-04bd1e9295ff69c92'}]
[2023-09-28T14:21:55.237495Z] Status volumes in use:  ['kubernetes.io/csi/ebs.csi.aws.com^vol-005db39685d5f5566', 'kubernetes.io/csi/ebs.csi.aws.com^vol-006257d2bca12f529', 'kubernetes.io/csi/ebs.csi.aws.com^vol-00db2b0fdc05b2523', 'kubernetes.io/csi/ebs.csi.aws.com^vol-0133af7364ff631ac', 'kubernetes.io/csi/ebs.csi.aws.com^vol-01d153838e2dcf798', 'kubernetes.io/csi/ebs.csi.aws.com^vol-0269b031a18d3bec1', 'kubernetes.io/csi/ebs.csi.aws.com^vol-0307226999ac56343', 'kubernetes.io/csi/ebs.csi.aws.com^vol-04bd1e9295ff69c92', 'kubernetes.io/csi/ebs.csi.aws.com^vol-053ea92510461b454', 'kubernetes.io/csi/ebs.csi.aws.com^vol-057363c85f52a2575', 'kubernetes.io/csi/ebs.csi.aws.com^vol-05c7c9ee95e405798', 'kubernetes.io/csi/ebs.csi.aws.com^vol-05d9553face706ca7', 'kubernetes.io/csi/ebs.csi.aws.com^vol-07a893ded97178fa1', 'kubernetes.io/csi/ebs.csi.aws.com^vol-08230e8c4e14b8474', 'kubernetes.io/csi/ebs.csi.aws.com^vol-095b60f0541af2307', 'kubernetes.io/csi/ebs.csi.aws.com^vol-0ac5e2fdbc4f3d6f7', 'kubernetes.io/csi/ebs.csi.aws.com^vol-0b348e13a119024cc', 'kubernetes.io/csi/ebs.csi.aws.com^vol-0b4046571ebb0fff6', 'kubernetes.io/csi/ebs.csi.aws.com^vol-0ec01fe4a6aee1958', 'kubernetes.io/csi/ebs.csi.aws.com^vol-0ec232a5640ef6dd8']
[2023-09-28T14:21:55.237506Z]  ====================================================================================== =
[2023-09-28T14:21:56.706409Z] Name:  ip-10-112-249-67.ec2.internal
[2023-09-28T14:21:56.706441Z] Type:  DELETED
[2023-09-28T14:21:56.706449Z] Deletion Timestamp:  2023-09-28 14:21:57+00:00
[2023-09-28T14:21:56.706462Z] Finalizers:  ['karpenter.sh/termination']
[2023-09-28T14:21:56.706473Z] Taint:  Key: org.com/do-not-consolidate => Effect: NoSchedule
[2023-09-28T14:21:56.706497Z] Taint:  Key: node.kubernetes.io/unschedulable => Effect: NoSchedule
[2023-09-28T14:21:56.706505Z] Taint:  Key: node.kubernetes.io/not-ready => Effect: NoSchedule
[2023-09-28T14:21:56.706666Z] Status volumes attached:  [{'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-0ec01fe4a6aee1958'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-05d9553face706ca7'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-00db2b0fdc05b2523'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-005db39685d5f5566'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-0b4046571ebb0fff6'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-0269b031a18d3bec1'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-095b60f0541af2307'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-05c7c9ee95e405798'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-0307226999ac56343'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-006257d2bca12f529'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-053ea92510461b454'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-057363c85f52a2575'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-01d153838e2dcf798'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-0ac5e2fdbc4f3d6f7'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-07a893ded97178fa1'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-08230e8c4e14b8474'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-0133af7364ff631ac'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-0b348e13a119024cc'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-0ec232a5640ef6dd8'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-04bd1e9295ff69c92'}]
[2023-09-28T14:21:56.706680Z] Status volumes in use:  ['kubernetes.io/csi/ebs.csi.aws.com^vol-005db39685d5f5566', 'kubernetes.io/csi/ebs.csi.aws.com^vol-006257d2bca12f529', 'kubernetes.io/csi/ebs.csi.aws.com^vol-00db2b0fdc05b2523', 'kubernetes.io/csi/ebs.csi.aws.com^vol-0133af7364ff631ac', 'kubernetes.io/csi/ebs.csi.aws.com^vol-01d153838e2dcf798', 'kubernetes.io/csi/ebs.csi.aws.com^vol-0269b031a18d3bec1', 'kubernetes.io/csi/ebs.csi.aws.com^vol-0307226999ac56343', 'kubernetes.io/csi/ebs.csi.aws.com^vol-04bd1e9295ff69c92', 'kubernetes.io/csi/ebs.csi.aws.com^vol-053ea92510461b454', 'kubernetes.io/csi/ebs.csi.aws.com^vol-057363c85f52a2575', 'kubernetes.io/csi/ebs.csi.aws.com^vol-05c7c9ee95e405798', 'kubernetes.io/csi/ebs.csi.aws.com^vol-05d9553face706ca7', 'kubernetes.io/csi/ebs.csi.aws.com^vol-07a893ded97178fa1', 'kubernetes.io/csi/ebs.csi.aws.com^vol-08230e8c4e14b8474', 'kubernetes.io/csi/ebs.csi.aws.com^vol-095b60f0541af2307', 'kubernetes.io/csi/ebs.csi.aws.com^vol-0ac5e2fdbc4f3d6f7', 'kubernetes.io/csi/ebs.csi.aws.com^vol-0b348e13a119024cc', 'kubernetes.io/csi/ebs.csi.aws.com^vol-0b4046571ebb0fff6', 'kubernetes.io/csi/ebs.csi.aws.com^vol-0ec01fe4a6aee1958', 'kubernetes.io/csi/ebs.csi.aws.com^vol-0ec232a5640ef6dd8']

While tailing the logs and initiating a terminate via the AWS CLI, we observed:

  • The delete event appeared quickly
  • The machine hung for much longer before shutting down
  • The only DaemonSet pod still Running was the ebs-csi-node

Mentioned in the OP was the need for an EBS-managed finalizer, in order to prevent the Node object from being deleted from the k8s API before it was ready. However, I'm sure this will get a bit problematic if Graceful Shutdown isn't configured correctly and the node is nuked too quickly, leaving behind stale objects.

Attempting to add a new finalizer called "ebs.io/martytest".
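(For reference, a test finalizer like this can be added manually with a JSON patch; hypothetical command, not something the driver does today. The append path assumes the finalizers array already exists, which it does here since karpenter.sh/termination is present.)

kubectl patch node ip-10-112-249-178.ec2.internal --type=json \
  -p='[{"op":"add","path":"/metadata/finalizers/-","value":"ebs.io/martytest"}]'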

[2023-09-28T15:00:39.275652Z]  ====================================================================================== =
[2023-09-28T15:00:39.302534Z] Name:  ip-10-112-249-178.ec2.internal
[2023-09-28T15:00:39.302568Z] Type:  MODIFIED
[2023-09-28T15:00:39.302576Z] Deletion Timestamp:  2023-09-28 15:00:41+00:00
[2023-09-28T15:00:39.302590Z] Finalizers:  ['karpenter.sh/termination', 'ebs.io/martytest']
[2023-09-28T15:00:39.302604Z] Taint:  Key: org.com/do-not-consolidate => Effect: NoSchedule
[2023-09-28T15:00:39.302613Z] Taint:  Key: node.kubernetes.io/unschedulable => Effect: NoSchedule
[2023-09-28T15:00:39.302809Z] Status volumes attached:  [{'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-0307226999ac56343'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-0ac5e2fdbc4f3d6f7'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-0ec01fe4a6aee1958'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-005db39685d5f5566'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-07a893ded97178fa1'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-0b4046571ebb0fff6'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-0ec232a5640ef6dd8'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-05d9553face706ca7'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-08230e8c4e14b8474'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-0133af7364ff631ac'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-057363c85f52a2575'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-00db2b0fdc05b2523'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-04bd1e9295ff69c92'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-05c7c9ee95e405798'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-0b348e13a119024cc'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-0269b031a18d3bec1'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-053ea92510461b454'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-006257d2bca12f529'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-095b60f0541af2307'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-01d153838e2dcf798'}]
[2023-09-28T15:00:39.302825Z] Status volumes in use:  ['kubernetes.io/csi/ebs.csi.aws.com^vol-005db39685d5f5566', 'kubernetes.io/csi/ebs.csi.aws.com^vol-006257d2bca12f529', 'kubernetes.io/csi/ebs.csi.aws.com^vol-00db2b0fdc05b2523', 'kubernetes.io/csi/ebs.csi.aws.com^vol-0133af7364ff631ac', 'kubernetes.io/csi/ebs.csi.aws.com^vol-01d153838e2dcf798', 'kubernetes.io/csi/ebs.csi.aws.com^vol-0269b031a18d3bec1', 'kubernetes.io/csi/ebs.csi.aws.com^vol-0307226999ac56343', 'kubernetes.io/csi/ebs.csi.aws.com^vol-04bd1e9295ff69c92', 'kubernetes.io/csi/ebs.csi.aws.com^vol-053ea92510461b454', 'kubernetes.io/csi/ebs.csi.aws.com^vol-057363c85f52a2575', 'kubernetes.io/csi/ebs.csi.aws.com^vol-05c7c9ee95e405798', 'kubernetes.io/csi/ebs.csi.aws.com^vol-05d9553face706ca7', 'kubernetes.io/csi/ebs.csi.aws.com^vol-07a893ded97178fa1', 'kubernetes.io/csi/ebs.csi.aws.com^vol-08230e8c4e14b8474', 'kubernetes.io/csi/ebs.csi.aws.com^vol-095b60f0541af2307', 'kubernetes.io/csi/ebs.csi.aws.com^vol-0ac5e2fdbc4f3d6f7', 'kubernetes.io/csi/ebs.csi.aws.com^vol-0b348e13a119024cc', 'kubernetes.io/csi/ebs.csi.aws.com^vol-0b4046571ebb0fff6', 'kubernetes.io/csi/ebs.csi.aws.com^vol-0ec01fe4a6aee1958', 'kubernetes.io/csi/ebs.csi.aws.com^vol-0ec232a5640ef6dd8']
[2023-09-28T15:00:39.302838Z]  ====================================================================================== =
[2023-09-28T15:00:44.681627Z] Name:  ip-10-112-249-178.ec2.internal
[2023-09-28T15:00:44.681664Z] Type:  MODIFIED
[2023-09-28T15:00:44.681673Z] Deletion Timestamp:  2023-09-28 15:00:41+00:00
[2023-09-28T15:00:44.681688Z] Finalizers:  ['ebs.io/martytest']
[2023-09-28T15:00:44.681701Z] Taint:  Key: org.com/do-not-consolidate => Effect: NoSchedule
[2023-09-28T15:00:44.681710Z] Taint:  Key: node.kubernetes.io/unschedulable => Effect: NoSchedule
[2023-09-28T15:00:44.681902Z] Status volumes attached:  [{'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-0307226999ac56343'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-0ac5e2fdbc4f3d6f7'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-0ec01fe4a6aee1958'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-005db39685d5f5566'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-07a893ded97178fa1'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-0b4046571ebb0fff6'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-0ec232a5640ef6dd8'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-05d9553face706ca7'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-08230e8c4e14b8474'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-0133af7364ff631ac'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-057363c85f52a2575'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-00db2b0fdc05b2523'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-04bd1e9295ff69c92'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-05c7c9ee95e405798'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-0b348e13a119024cc'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-0269b031a18d3bec1'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-053ea92510461b454'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-006257d2bca12f529'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-095b60f0541af2307'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-01d153838e2dcf798'}]
[2023-09-28T15:00:44.681918Z] Status volumes in use:  ['kubernetes.io/csi/ebs.csi.aws.com^vol-005db39685d5f5566', 'kubernetes.io/csi/ebs.csi.aws.com^vol-006257d2bca12f529', 'kubernetes.io/csi/ebs.csi.aws.com^vol-00db2b0fdc05b2523', 'kubernetes.io/csi/ebs.csi.aws.com^vol-0133af7364ff631ac', 'kubernetes.io/csi/ebs.csi.aws.com^vol-01d153838e2dcf798', 'kubernetes.io/csi/ebs.csi.aws.com^vol-0269b031a18d3bec1', 'kubernetes.io/csi/ebs.csi.aws.com^vol-0307226999ac56343', 'kubernetes.io/csi/ebs.csi.aws.com^vol-04bd1e9295ff69c92', 'kubernetes.io/csi/ebs.csi.aws.com^vol-053ea92510461b454', 'kubernetes.io/csi/ebs.csi.aws.com^vol-057363c85f52a2575', 'kubernetes.io/csi/ebs.csi.aws.com^vol-05c7c9ee95e405798', 'kubernetes.io/csi/ebs.csi.aws.com^vol-05d9553face706ca7', 'kubernetes.io/csi/ebs.csi.aws.com^vol-07a893ded97178fa1', 'kubernetes.io/csi/ebs.csi.aws.com^vol-08230e8c4e14b8474', 'kubernetes.io/csi/ebs.csi.aws.com^vol-095b60f0541af2307', 'kubernetes.io/csi/ebs.csi.aws.com^vol-0ac5e2fdbc4f3d6f7', 'kubernetes.io/csi/ebs.csi.aws.com^vol-0b348e13a119024cc', 'kubernetes.io/csi/ebs.csi.aws.com^vol-0b4046571ebb0fff6', 'kubernetes.io/csi/ebs.csi.aws.com^vol-0ec01fe4a6aee1958', 'kubernetes.io/csi/ebs.csi.aws.com^vol-0ec232a5640ef6dd8']
[2023-09-28T15:00:44.681931Z]  ====================================================================================== =
[2023-09-28T15:00:44.794953Z] Name:  ip-10-112-249-178.ec2.internal
[2023-09-28T15:00:44.795021Z] Type:  MODIFIED
[2023-09-28T15:00:44.795034Z] Deletion Timestamp:  2023-09-28 15:00:41+00:00
[2023-09-28T15:00:44.795047Z] Finalizers:  ['ebs.io/martytest']
[2023-09-28T15:00:44.795078Z] Taint:  Key: org.com/do-not-consolidate => Effect: NoSchedule
[2023-09-28T15:00:44.795085Z] Taint:  Key: node.kubernetes.io/unschedulable => Effect: NoSchedule
[2023-09-28T15:00:44.795249Z] Status volumes attached:  [{'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-0307226999ac56343'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-0ac5e2fdbc4f3d6f7'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-0ec01fe4a6aee1958'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-005db39685d5f5566'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-07a893ded97178fa1'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-0b4046571ebb0fff6'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-0ec232a5640ef6dd8'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-05d9553face706ca7'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-08230e8c4e14b8474'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-0133af7364ff631ac'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-057363c85f52a2575'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-00db2b0fdc05b2523'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-04bd1e9295ff69c92'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-05c7c9ee95e405798'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-0b348e13a119024cc'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-0269b031a18d3bec1'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-053ea92510461b454'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-006257d2bca12f529'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-095b60f0541af2307'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-01d153838e2dcf798'}]
[2023-09-28T15:00:44.795262Z] Status volumes in use:  None
[2023-09-28T15:00:44.795268Z]  ====================================================================================== =
[2023-09-28T15:00:44.845716Z] Name:  ip-10-112-249-178.ec2.internal
[2023-09-28T15:00:44.845751Z] Type:  MODIFIED
[2023-09-28T15:00:44.845759Z] Deletion Timestamp:  2023-09-28 15:00:41+00:00
[2023-09-28T15:00:44.845771Z] Finalizers:  ['ebs.io/martytest']
[2023-09-28T15:00:44.845783Z] Taint:  Key: org.com/do-not-consolidate => Effect: NoSchedule
[2023-09-28T15:00:44.845792Z] Taint:  Key: node.kubernetes.io/unschedulable => Effect: NoSchedule
[2023-09-28T15:00:44.845801Z] Taint:  Key: node.kubernetes.io/not-ready => Effect: NoSchedule
[2023-09-28T15:00:44.845967Z] Status volumes attached:  [{'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-0307226999ac56343'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-0ac5e2fdbc4f3d6f7'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-0ec01fe4a6aee1958'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-005db39685d5f5566'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-07a893ded97178fa1'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-0b4046571ebb0fff6'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-0ec232a5640ef6dd8'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-05d9553face706ca7'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-08230e8c4e14b8474'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-0133af7364ff631ac'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-057363c85f52a2575'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-00db2b0fdc05b2523'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-04bd1e9295ff69c92'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-05c7c9ee95e405798'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-0b348e13a119024cc'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-0269b031a18d3bec1'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-053ea92510461b454'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-006257d2bca12f529'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-095b60f0541af2307'}, {'device_path': '', 'name': 'kubernetes.io/csi/ebs.csi.aws.com^vol-01d153838e2dcf798'}]
[2023-09-28T15:00:44.845982Z] Status volumes in use:  None

In this case, the pod seems to be successfully rescheduled on another host within a few minutes.

There was an error in one case:

  Normal   Scheduled          46m                 default-scheduler  Successfully assigned kube-system/ebs-csi-node-5pqld to ip-10-112-249-185.ec2.internal
  Warning  NetworkNotReady    46m (x13 over 46m)  kubelet            network is not ready: container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized
  Normal   Pulling            46m                 kubelet            Pulling image "602401143452.dkr.ecr.us-east-1.amazonaws.com/eks/aws-ebs-csi-driver:v1.23.0"
  Normal   Pulled             46m                 kubelet            Successfully pulled image "602401143452.dkr.ecr.us-east-1.amazonaws.com/eks/aws-ebs-csi-driver:v1.23.0" in 2.983593963s
  Normal   Created            46m                 kubelet            Created container ebs-plugin
  Normal   Started            46m                 kubelet            Started container ebs-plugin
  Normal   Pulling            46m                 kubelet            Pulling image "602401143452.dkr.ecr.us-east-1.amazonaws.com/eks/csi-node-driver-registrar:v2.8.0-eks-1-28-4"
  Normal   Pulled             46m                 kubelet            Successfully pulled image "602401143452.dkr.ecr.us-east-1.amazonaws.com/eks/csi-node-driver-registrar:v2.8.0-eks-1-28-4" in 720.268616ms
  Normal   Created            46m                 kubelet            Created container node-driver-registrar
  Normal   Started            46m                 kubelet            Started container node-driver-registrar
  Normal   Pulling            46m                 kubelet            Pulling image "602401143452.dkr.ecr.us-east-1.amazonaws.com/eks/livenessprobe:v2.10.0-eks-1-28-4"
  Normal   Pulled             46m                 kubelet            Successfully pulled image "602401143452.dkr.ecr.us-east-1.amazonaws.com/eks/livenessprobe:v2.10.0-eks-1-28-4" in 712.303838ms
  Normal   Created            46m                 kubelet            Created container liveness-probe
  Normal   Started            46m                 kubelet            Started container liveness-probe
  Normal   Killing            13m                 kubelet            Stopping container ebs-plugin
  Normal   Killing            13m                 kubelet            Stopping container liveness-probe
  Normal   Killing            13m                 kubelet            Stopping container node-driver-registrar
  Warning  FailedPreStopHook  12m                 kubelet            Exec lifecycle hook ([/bin/aws-ebs-csi-driver pre-stop-hook]) for Container "ebs-plugin" in Pod "ebs-csi-node-5pqld_kube-system(1dfb36d0-c56a-408e-9f17-a84ea3c49859)" failed - error: command '/bin/aws-ebs-csi-driver pre-stop-hook' exited with 137: I0928 14:56:29.251528     362 prestop.go:32] "PreStop: executing PreStop lifecycle hook" timeout="30s"
I0928 14:56:29.289312     362 prestop.go:48] "PreStop: node is being drained, checking for remaining VolumeAttachments" node="ip-10-112-249-185.ec2.internal"
E0928 14:56:29.299846     362 reflector.go:147] pkg/mod/k8s.io/client-go@v0.28.2/tools/cache/reflector.go:229: Failed to watch *v1.VolumeAttachment: unknown (get volumeattachments.storage.k8s.io)
I0928 14:56:29.303156     362 prestop.go:118] "isVolumeAttachmentEmpty: not ready to exit, found VolumeAttachment" attachment={"metadata":{"name":"csi-0c66a3b3252a587bb40cb81d8691dc8b933fd2b91e58d19d4f52fdf90f9e97f5","uid":"5ef57b47-e10f-4e9e-ae7b-45d587eb182b","resourceVersion":"84217381","creationTimestamp":"2023-09-28T14:42:02Z","annotations":{"csi.alpha.kubernetes.io/node-id":"i-09a15870d61b92cee"},"finalizers":["external-attacher/ebs-csi-aws-com"],"managedFields":[{"manager":"csi-attacher","operation":"Update","apiVersion":"storage.k8s.io/v1","time":"2023-09-28T14:42:02Z","fieldsType":"FieldsV1","fieldsV1":{"f:metadata":{"f:annotations":{".":{},"f:csi.alpha.kubernetes.io/node-id":{}},"f:finalizers":{".":{},"v:\"external-attacher/ebs-csi-aws-com\"":{}}}}},{"manager":"kube-controller-manager","operation":"Update","apiVersion":"storage.k8s.io/v1","time":"2023-09-28T14:42:02Z","fieldsType":"FieldsV1","fieldsV1":{"f:spec":{"f:attacher":{},"f:nodeName":{},"f:source":{"f:persistentVolumeName":{}}}}},{"manager":"csi-attacher","operation":"Update","apiVersion":"storage.k8s.io/v1","time":"2023-09-28T14:42:03Z","fieldsType":"FieldsV1","fieldsV1":{"f:status":{"f:attached":{},"f:attachmentMetadata":{".":{},"f:devicePath":{}}}},"subresource":"status"}]},"spec":{"attacher":"ebs.csi.aws.com","source":{"persistentVolumeName":"pvc-ffd8d7ad-d859-4c80-831e-5d9381bfb7d1"},"nodeName":"ip-10-112-249-185.ec2.internal"},"status":{"attached":true,"attachmentMetadata":{"devicePath":"/dev/xvdaf"}}} node="ip-10-112-249-185.ec2.internal"
E0928 14:56:30.905648     362 reflector.go:147] pkg/mod/k8s.io/client-go@v0.28.2/tools/cache/reflector.go:229: Failed to watch *v1.VolumeAttachment: unknown (get volumeattachments.storage.k8s.io)
E0928 14:56:33.645028     362 reflector.go:147] pkg/mod/k8s.io/client-go@v0.28.2/tools/cache/reflector.go:229: Failed to watch *v1.VolumeAttachment: unknown (get volumeattachments.storage.k8s.io)
E0928 14:56:39.937964     362 reflector.go:147] pkg/mod/k8s.io/client-go@v0.28.2/tools/cache/reflector.go:229: Failed to watch *v1.VolumeAttachment: unknown (get volumeattachments.storage.k8s.io)
E0928 14:56:46.909808     362 reflector.go:147] pkg/mod/k8s.io/client-go@v0.28.2/tools/cache/reflector.go:229: Failed to watch *v1.VolumeAttachment: unknown (get volumeattachments.storage.k8s.io)
, message: "I0928 14:56:29.251528     362 prestop.go:32] \"PreStop: executing PreStop lifecycle hook\" timeout=\"30s\"\nI0928 14:56:29.289312     362 prestop.go:48] \"PreStop: node is being drained, checking for remaining VolumeAttachments\" node=\"ip-10-112-249-185.ec2.internal\"\nE0928 14:56:29.299846     362 reflector.go:147] pkg/mod/k8s.io/client-go@v0.28.2/tools/cache/reflector.go:229: Failed to watch *v1.VolumeAttachment: unknown (get volumeattachments.storage.k8s.io)\nI0928 14:56:29.303156     362 prestop.go:118] \"isVolumeAttachmentEmpty: not ready to exit, found VolumeAttachment\" attachment={\"metadata\":{\"name\":\"csi-0c66a3b3252a587bb40cb81d8691dc8b933fd2b91e58d19d4f52fdf90f9e97f5\",\"uid\":\"5ef57b47-e10f-4e9e-ae7b-45d587eb182b\",\"resourceVersion\":\"84217381\",\"creationTimestamp\":\"2023-09-28T14:42:02Z\",\"annotations\":{\"csi.alpha.kubernetes.io/node-id\":\"i-09a15870d61b92cee\"},\"finalizers\":[\"external-attacher/ebs-csi-aws-com\"],\"managedFields\":[{\"manager\":\"csi-attacher\",\"operation\":\"Update\",\"apiVersion\":\"storage.k8s.io/v1\",\"time\":\"2023-09-28T14:42:02Z\",\"fieldsType\":\"FieldsV1\",\"fieldsV1\":{\"f:metadata\":{\"f:annotations\":{\".\":{},\"f:csi.alpha.kubernetes.io/node-id\":{}},\"f:finalizers\":{\".\":{},\"v:\\\"external-attacher/ebs-csi-aws-com\\\"\":{}}}}},{\"manager\":\"kube-controller-manager\",\"operation\":\"Update\",\"apiVersion\":\"storage.k8s.io/v1\",\"time\":\"2023-09-28T14:42:02Z\",\"fieldsType\":\"FieldsV1\",\"fieldsV1\":{\"f:spec\":{\"f:attacher\":{},\"f:nodeName\":{},\"f:source\":{\"f:persistentVolumeName\":{}}}}},{\"manager\":\"csi-attacher\",\"operation\":\"Update\",\"apiVersion\":\"storage.k8s.io/v1\",\"time\":\"2023-09-28T14:42:03Z\",\"fieldsType\":\"FieldsV1\",\"fieldsV1\":{\"f:status\":{\"f:attached\":{},\"f:attachmentMetadata\":{\".\":{},\"f:devicePath\":{}}}},\"subresource\":\"status\"}]},\"spec\":{\"attacher\":\"ebs.csi.aws.com\",\"source\":{\"persistentVolumeName\":\"pvc-ffd8d7ad-d859-4c80-831e-5d9381bfb7d1\"},\"nodeName\":\"ip-10-112-249-185.ec2.internal\"},\"status\":{\"attached\":true,\"attachmentMetadata\":{\"devicePath\":\"/dev/xvdaf\"}}} node=\"ip-10-112-249-185.ec2.internal\"\nE0928 14:56:30.905648     362 reflector.go:147] pkg/mod/k8s.io/client-go@v0.28.2/tools/cache/reflector.go:229: Failed to watch *v1.VolumeAttachment: unknown (get volumeattachments.storage.k8s.io)\nE0928 14:56:33.645028     362 reflector.go:147] pkg/mod/k8s.io/client-go@v0.28.2/tools/cache/reflector.go:229: Failed to watch *v1.VolumeAttachment: unknown (get volumeattachments.storage.k8s.io)\nE0928 14:56:39.937964     362 reflector.go:147] pkg/mod/k8s.io/client-go@v0.28.2/tools/cache/reflector.go:229: Failed to watch *v1.VolumeAttachment: unknown (get volumeattachments.storage.k8s.io)\nE0928 14:56:46.909808     362 reflector.go:147] pkg/mod/k8s.io/client-go@v0.28.2/tools/cache/reflector.go:229: Failed to watch *v1.VolumeAttachment: unknown (get volumeattachments.storage.k8s.io)\n"
  Warning  FailedKillPod  12m (x2 over 12m)  kubelet  error killing pod: failed to "KillPodSandbox" for "1dfb36d0-c56a-408e-9f17-a84ea3c49859" with KillPodSandboxError: "rpc error: code = Unknown desc = failed to destroy network for sandbox \"2de45293b7ae847567c78048fbe43a17fd994e9e6e580ef48bdb530b5ca0d490\": cni plugin not initialized"

Hope this helps

@torredil torredil self-assigned this Sep 29, 2023
@egeturgay

egeturgay commented Jan 5, 2024

We have come across similar issues, but I don't think this is Karpenter related. I don't think the PreStop hook always works, for multiple reasons.

If a node is drained first, which effectively unmounts and detaches the volumes successfully, we don't see this problem (Karpenter would do the draining for consolidation/termination), so I would consider this an edge case.
I only see the problem when a node is cordoned (a prerequisite for the ebs pre-stop hook to kick in) and the instance is terminated while there are pods running with PVCs attached.
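(For comparison, the “drained first” path that avoids the problem is simply an explicit drain before the instance is terminated; the node name is an example:)

kubectl drain ip-x-y-z-1.eu-west-3.compute.internal --ignore-daemonsets --delete-emptydir-data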

On the pre-stop-hook side:

  • The cluster role provided in the helm chart (and what's being deployed as part of the AWS EBS add-on) is lacking the "watch" permission that is needed by the informer in the pre-stop hook. That's what's causing the errors reported in the previous comment: "E0928 14:56:30.905648 362 reflector.go:147] pkg/mod/k8s.io/client-go@v0.28.2/tools/cache/reflector.go:229: Failed to watch *v1.VolumeAttachment: unknown (get volumeattachments.storage.k8s.io)"
  • The shutdownGracePeriod kubelet configuration and the systemd logind.conf.d drop-in are absent in Amazon Linux ([EKS] [request]: Systemd upgrade to > v239 for enabling node graceful shutdown  aws/containers-roadmap#2057). We have configured these ourselves.
  • Setting shutdownGracePeriod to anything beyond 30 seconds is not effective because the pod termination grace period is set to 30 seconds, so ebs-util (where the pre-stop hook lives) gets killed by the shutdown manager. So we increased this manually to 2-3 minutes.
  • Increasing the above is not going to take effect from the pre-stop hook's perspective, because there's a context set in the pre-stop hook to time out after 30 seconds, which leads to this error:
    failed to execute PreStop lifecycle hook" err="waitForVolumeAttachments: timed out waiting for preStopHook to complete: context deadline exceeded"
  • Increasing the 30-second timeout in the context does not help either, because after a certain time kubelet_node_status is not able to find the node in the kube API (which I presume handles reporting the detachment back to the API).

kubelet_node_status.go:545] "Error updating node status, will retry" err="error getting node \"ip-x-y-z-1.eu-west-3.compute.internal\": nodes \"ip-x-y-z-1.eu-west-3.eu-west-3.compute.internal\" not found"
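
For reference, here is a minimal sketch of the graceful-shutdown settings described in the two bullets above, assuming you manage the kubelet configuration and a systemd-logind drop-in yourself; the paths and durations are illustrative examples, not what the stock Amazon Linux AMI ships:

# KubeletConfiguration fields enabling graceful node shutdown (example values only)
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
shutdownGracePeriod: 2m30s           # total time the kubelet delays the shutdown
shutdownGracePeriodCriticalPods: 1m  # portion of that window reserved for critical pods (e.g. ebs-csi-node)

# Companion systemd-logind drop-in (assumed path: /etc/systemd/logind.conf.d/99-kubelet.conf)
# so logind honours the kubelet's delay inhibitor lock:
#   [Login]
#   InhibitDelayMaxSec=150

As noted above, this alone is not enough while the pre-stop hook's own 30-second context timeout and the pod's terminationGracePeriodSeconds stay at 30 seconds.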

These are the issues I've come across around the pre-stop hook, and I don't think it will work as intended when an instance is terminated by an AWS call unless the node is drained first (which is precisely the case where I'm not sure the pre-stop hook adds any value).

As you will see from the logs below, the PVCs are unmounted/detached successfully, and yet the pre-stop hook still finds the VolumeAttachment objects in the k8s API (I also checked the API directly; they are left over until the controller manager kicks in 6 minutes later). I think the unmounts/detachments are somehow not communicated to the k8s API in a timely manner (maybe because the kubelet node status update can no longer find the node?).

Jan  5 11:05:01 ip-x-y-z-1 kubelet: I0105 11:05:01.740197    1590 operation_generator.go:882] UnmountVolume.TearDown succeeded for volume "kubernetes.io/csi/ebs.csi.aws.com^vol-06c2c9c91c25b8d7a" (OuterVolumeSpecName: "nats-io-js-pvc") pod "28ad28fd-9a2e-48cd-9484-ebadd4af2154" (UID: "28ad28fd-9a2e-48cd-9484-ebadd4af2154"). InnerVolumeSpecName "pvc-58bfd977-0c3a-4b72-b531-2b6a7b556a3e". PluginName "kubernetes.io/csi", VolumeGidValue ""
Jan  5 11:05:01 ip-x-y-z-1 kubelet: I0105 11:05:01.827468    1590 reconciler_common.go:293] "operationExecutor.UnmountDevice started for volume \"pvc-58bfd977-0c3a-4b72-b531-2b6a7b556a3e\" (UniqueName: \"kubernetes.io/csi/ebs.csi.aws.com^vol-06c2c9c91c25b8d7a\") on node \"ip-x-y-z-1.eu-west-3.compute.internal\" "
Jan  5 11:05:01 ip-x-y-z-1 kubelet: I0105 11:05:01.907981    1590 operation_generator.go:996] UnmountDevice succeeded for volume "pvc-58bfd977-0c3a-4b72-b531-2b6a7b556a3e" (UniqueName: "kubernetes.io/csi/ebs.csi.aws.com^vol-06c2c9c91c25b8d7a") on node "ip-x-y-z-1.eu-west-3.compute.internal"
Jan  5 11:05:01 ip-x-y-z-1 kubelet: I0105 11:05:01.927778    1590 reconciler_common.go:300] "Volume detached for volume \"pvc-58bfd977-0c3a-4b72-b531-2b6a7b556a3e\" (UniqueName: \"kubernetes.io/csi/ebs.csi.aws.com^vol-06c2c9c91c25b8d7a\") on node \"ip-x-y-z-1.eu-west-3.compute.internal\" DevicePath \"csi-fd70310c45190a54a04283f2a6487162d110aebc3adbe19af3d176ec4db0309c\""
Jan  5 11:05:02 ip-x-y-z-1 kubelet: I0105 11:05:02.034134    1590 reconciler_common.go:172] "operationExecutor.UnmountVolume started for volume \"nats-io-js-pvc\" (UniqueName: \"kubernetes.io/csi/ebs.csi.aws.com^vol-008e0b4293561d0a6\") pod \"fb418f8a-9e53-4f66-a1a8-56da8db8f567\" (UID: \"fb418f8a-9e53-4f66-a1a8-56da8db8f567\") "
Jan  5 11:05:02 ip-x-y-z-1 kubelet: I0105 11:05:02.044298    1590 operation_generator.go:882] UnmountVolume.TearDown succeeded for volume "kubernetes.io/csi/ebs.csi.aws.com^vol-008e0b4293561d0a6" (OuterVolumeSpecName: "nats-io-js-pvc") pod "fb418f8a-9e53-4f66-a1a8-56da8db8f567" (UID: "fb418f8a-9e53-4f66-a1a8-56da8db8f567"). InnerVolumeSpecName "pvc-29207b41-a6ac-4cd8-b0e6-1c22c0e88448". PluginName "kubernetes.io/csi", VolumeGidValue ""
Jan  5 11:05:02 ip-x-y-z-1 kubelet: I0105 11:05:02.141426    1590 reconciler_common.go:293] "operationExecutor.UnmountDevice started for volume \"pvc-29207b41-a6ac-4cd8-b0e6-1c22c0e88448\" (UniqueName: \"kubernetes.io/csi/ebs.csi.aws.com^vol-008e0b4293561d0a6\") on node \"ip-x-y-z-1.eu-west-3.compute.internal\" "
Jan  5 11:05:02 ip-x-y-z-1 kubelet: I0105 11:05:02.221426    1590 operation_generator.go:996] UnmountDevice succeeded for volume "pvc-29207b41-a6ac-4cd8-b0e6-1c22c0e88448" (UniqueName: "kubernetes.io/csi/ebs.csi.aws.com^vol-008e0b4293561d0a6") on node "ip-x-y-z-1.eu-west-3.compute.internal"
Jan  5 11:05:02 ip-x-y-z-1 kubelet: I0105 11:05:02.251384    1590 reconciler_common.go:300] "Volume detached for volume \"pvc-29207b41-a6ac-4cd8-b0e6-1c22c0e88448\" (UniqueName: \"kubernetes.io/csi/ebs.csi.aws.com^vol-008e0b4293561d0a6\") on node \"ip-x-y-z-1.eu-west-3.compute.internal\" DevicePath \"csi-d8c71a23cf7dd35a99445a7515755c74546e71035e701363d1e2a2a48df79bdb\""
Jan  5 11:05:32 ip-x-y-z-1 kubelet: I0105 11:05:02.146729      14 prestop.go:118] "isVolumeAttachmentEmpty: not ready to exit, found VolumeAttachment" attachment={"metadata":{"name":"csi-d8c71a23cf7dd35a99445a7515755c74546e71035e701363d1e2a2a48df79bdb","uid":"5d42bdd1-b3e7-49c4-ac1d-842c0de321ef","resourceVersion":"1583325928","creationTimestamp":"2024-01-03T16:12:38Z","annotations":{"csi.alpha.kubernetes.io/node-id":"i-0d83a3d8a12384cf0"},"finalizers":["external-attacher/ebs-csi-aws-com"],"managedFields":[{"manager":"csi-attacher","operation":"Update","apiVersion":"storage.k8s.io/v1","time":"2024-01-03T16:12:38Z","fieldsType":"FieldsV1","fieldsV1":{"f:metadata":{"f:annotations":{".":{},"f:csi.alpha.kubernetes.io/node-id":{}},"f:finalizers":{".":{},"v:\"external-attacher/ebs-csi-aws-com\"":{}}}}},{"manager":"kube-controller-manager","operation":"Update","apiVersion":"storage.k8s.io/v1","time":"2024-01-03T16:12:38Z","fieldsType":"FieldsV1","fieldsV1":{"f:spec":{"f:attacher":{},"f:nodeName":{},"f:source":{"f:persistentVolumeName":{}}}}},{"manager":"csi-attacher","operation":"Update","apiVersion":"storage.k8s.io/v1","time":"2024-01-03T16:12:40Z","fieldsType":"FieldsV1","fieldsV1":{"f:status":{"f:attached":{},"f:attachmentMetadata":{".":{},"f:devicePath":{}}}},"subresource":"status"}]},"spec":{"attacher":"ebs.csi.aws.com","source":{"persistentVolumeName":"pvc-29207b41-a6ac-4cd8-b0e6-1c22c0e88448"},"nodeName":"ip-x-y-z-1.eu-west-3.compute.internal"},"status":{"attached":true,"attachmentMetadata":{"devicePath":"/dev/xvdaa"}}} node="ip-x-y-z-1.eu-west-3.compute.internal"
Jan  5 11:05:32 ip-x-y-z-1 kubelet: I0105 11:05:02.146729      14 prestop.go:118] "isVolumeAttachmentEmpty: not ready to exit, found VolumeAttachment" attachment={"metadata":{"name":"csi-d8c71a23cf7dd35a99445a7515755c74546e71035e701363d1e2a2a48df79bdb","uid":"5d42bdd1-b3e7-49c4-ac1d-842c0de321ef","resourceVersion":"1583325928","creationTimestamp":"2024-01-03T16:12:38Z","annotations":{"csi.alpha.kubernetes.io/node-id":"i-0d83a3d8a12384cf0"},"finalizers":["external-attacher/ebs-csi-aws-com"],"managedFields":[{"manager":"csi-attacher","operation":"Update","apiVersion":"storage.k8s.io/v1","time":"2024-01-03T16:12:38Z","fieldsType":"FieldsV1","fieldsV1":{"f:metadata":{"f:annotations":{".":{},"f:csi.alpha.kubernetes.io/node-id":{}},"f:finalizers":{".":{},"v:\"external-attacher/ebs-csi-aws-com\"":{}}}}},{"manager":"kube-controller-manager","operation":"Update","apiVersion":"storage.k8s.io/v1","time":"2024-01-03T16:12:38Z","fieldsType":"FieldsV1","fieldsV1":{"f:spec":{"f:attacher":{},"f:nodeName":{},"f:source":{"f:persistentVolumeName":{}}}}},{"manager":"csi-attacher","operation":"Update","apiVersion":"storage.k8s.io/v1","time":"2024-01-03T16:12:40Z","fieldsType":"FieldsV1","fieldsV1":{"f:status":{"f:attached":{},"f:attachmentMetadata":{".":{},"f:devicePath":{}}}},"subresource":"status"}]},"spec":{"attacher":"ebs.csi.aws.com","source":{"persistentVolumeName":"pvc-29207b41-a6ac-4cd8-b0e6-1c22c0e88448"},"nodeName":"ip-x-y-z-1.eu-west-3.compute.internal"},"status":{"attached":true,"attachmentMetadata":{"devicePath":"/dev/xvdaa"}}} node="ip-x-y-z-1.eu-west-3.compute.internal"
Jan  5 11:05:32 ip-x-y-z-1 kubelet: I0105 11:05:02.146729      14 prestop.go:118] "isVolumeAttachmentEmpty: not ready to exit, found VolumeAttachment" attachment={"metadata":{"name":"csi-d8c71a23cf7dd35a99445a7515755c74546e71035e701363d1e2a2a48df79bdb","uid":"5d42bdd1-b3e7-49c4-ac1d-842c0de321ef","resourceVersion":"1583325928","creationTimestamp":"2024-01-03T16:12:38Z","annotations":{"csi.alpha.kubernetes.io/node-id":"i-0d83a3d8a12384cf0"},"finalizers":["external-attacher/ebs-csi-aws-com"],"managedFields":[{"manager":"csi-attacher","operation":"Update","apiVersion":"storage.k8s.io/v1","time":"2024-01-03T16:12:38Z","fieldsType":"FieldsV1","fieldsV1":{"f:metadata":{"f:annotations":{".":{},"f:csi.alpha.kubernetes.io/node-id":{}},"f:finalizers":{".":{},"v:\"external-attacher/ebs-csi-aws-com\"":{}}}}},{"manager":"kube-controller-manager","operation":"Update","apiVersion":"storage.k8s.io/v1","time":"2024-01-03T16:12:38Z","fieldsType":"FieldsV1","fieldsV1":{"f:spec":{"f:attacher":{},"f:nodeName":{},"f:source":{"f:persistentVolumeName":{}}}}},{"manager":"csi-attacher","operation":"Update","apiVersion":"storage.k8s.io/v1","time":"2024-01-03T16:12:40Z","fieldsType":"FieldsV1","fieldsV1":{"f:status":{"f:attached":{},"f:attachmentMetadata":{".":{},"f:devicePath":{}}}},"subresource":"status"}]},"spec":{"attacher":"ebs.csi.aws.com","source":{"persistentVolumeName":"pvc-29207b41-a6ac-4cd8-b0e6-1c22c0e88448"},"nodeName":"ip-x-y-z-1.eu-west-3.compute.internal"},"status":{"attached":true,"attachmentMetadata":{"devicePath":"/dev/xvdaa"}}} node="ip-x-y-z-1.eu-west-3.compute.internal"
Jan  5 11:05:32 ip-x-y-z-1 kubelet: I0105 11:05:32.129445    1590 kuberuntime_container.go:745] "Killing container with a grace period" pod="kube-system/ebs-csi-node-chr4q" podUID="6287fb9f-59db-46df-aa4c-5f94533d83eb" containerName="ebs-plugin" containerID="containerd://de4b31e905e6d02b4f087e40f40b5367a6a246ac6160f49298c861fe10738655" gracePeriod=210
Jan  5 11:05:32 ip-x-y-z-1 kubelet: I0105 11:05:32.285293    1590 nodeshutdown_manager_linux.go:395] "Shutdown manager finished killing pod" pod="kube-system/ebs-csi-node-chr4q"
Jan  5 11:05:32 ip-x-y-z-1 containerd: time="2024-01-05T11:05:32.243663316Z" level=info msg="shim disconnected" id=2eb763b5275f9a048e3b9677f57c4f433bca173a5ea64e914dbcbb25f40da5f1 namespace=k8s.io
Jan  5 11:05:32 ip-x-y-z-1 containerd: time="2024-01-05T11:05:32.245390370Z" level=warning msg="cleaning up after shim disconnected" id=2eb763b5275f9a048e3b9677f57c4f433bca173a5ea64e914dbcbb25f40da5f1 namespace=k8s.io
Jan  5 11:05:32 ip-x-y-z-1 containerd: time="2024-01-05T11:05:32.245680821Z" level=info msg="cleaning up dead shim" namespace=k8s.io
Jan  5 11:05:32 ip-x-y-z-1 containerd: time="2024-01-05T11:05:32.278579357Z" level=info msg="TearDown network for sandbox \"2eb763b5275f9a048e3b9677f57c4f433bca173a5ea64e914dbcbb25f40da5f1\" successfully"
Jan  5 11:05:32 ip-x-y-z-1 containerd: time="2024-01-05T11:05:32.278927040Z" level=info msg="StopPodSandbox for \"2eb763b5275f9a048e3b9677f57c4f433bca173a5ea64e914dbcbb25f40da5f1\" returns successfully"
Jan  5 11:05:32 ip-x-y-z-1 kubelet: I0105 11:05:32.285293    1590 nodeshutdown_manager_linux.go:395] "Shutdown manager finished killing pod" pod="kube-system/ebs-csi-node-chr4q"
Jan  5 11:05:32 ip-x-y-z-1 systemd-logind: System is powering down.
Jan  5 11:05:32 ip-x-y-z-1 kubelet: I0105 11:05:32.287070    1590 nodeshutdown_manager_linux.go:327] "Shutdown manager completed processing shutdown event, node will shutdown shortly"
Jan  5 11:05:32 ip-x-y-z-1 systemd: Stopped target Graphical Interface.

@torredil
Member

Hi @martysweet @egeturgay - appreciate the comprehensive feedback, this is really helpful 👍

A few things to cover here:

Our understanding is that without the kubelet shutdownGracePeriod and shutdownGracePeriodCriticalPods options set, preStop would have little to no effect.

These were the issues I've come across around pre-stop-hook and I don't think It will work as intended when an instance is terminated by an AWS call unless the node is drained first (which is the time I'm not sure if the pre-stop-hook is adding any value).

First and most importantly: the pre-stop LCH primarily aims to address the following bug in Kubernetes: Graceful node shutdown doesn't wait for volume teardown #115148.

Michelle explains it very well - due to a race condition between the Node Shutdown Manager and the Volume Manager in Kubelet, users may experience significant delays in reattaching volumes that were not cleanly unmounted, due to the absence of a mechanism that accounts for volume teardown during the kubectl drain process: the recommended method for gracefully terminating instances.

That is to say, the pre-stop LCH is useful and important in preventing delayed attachments - especially when using the driver with Karpenter and Spot Instances - even when shutdownGracePeriod and shutdownGracePeriodCriticalPods have not been configured. When an instance is gracefully terminated (drained), the pre-stop LCH will be executed before the EBS CSI node pod receives a SIGTERM signal and thus prevent the pod from terminating for a period of 30 seconds (as defined by terminationGracePeriodSeconds in the pod specification) if there are VolumeAttachment objects associated with the node.
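
For clarity, this is roughly how the hook is wired into the node DaemonSet pod spec. This is an abbreviated sketch based on the chart defaults as I understand them; verify the exact command and grace period against the manifest actually deployed in your cluster:

spec:
  terminationGracePeriodSeconds: 30   # bounds how long the preStop hook can delay pod termination
  containers:
    - name: ebs-plugin
      lifecycle:
        preStop:
          exec:
            command: ["/bin/aws-ebs-csi-driver", "pre-stop-hook"]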

Separately (important distinction), configuring shutdownGracePeriod and shutdownGracePeriodCriticalPods is highly recommended for unexpected shutdown scenarios:

Kubelet makes use of this mechanism to ensure your pods will be terminated cleanly. When the kubelet starts, it acquires a systemd delay-type inhibitor lock. When the system is about to shut down, the kubelet can delay that shutdown for a configurable, short duration utilizing the delay-type inhibitor lock it acquired earlier. This gives your pods extra time to terminate. As a result, even during unexpected shutdowns, your application will receive a SIGTERM, preStop hooks will execute.

Regarding the Kubelet config point:

shutdownGracePeriod kubelet configuration and systemd logind.conf.d is absent in Amazon Linux. ([EKS] [request]: Systemd upgrade to > v239 for enabling node graceful shutdown aws/containers-roadmap#2057). We have configured these ourselves.

This is addressed by awslabs/amazon-eks-ami#1544

The cluster role provided in the helm chart (and what's being deployed as part of the AWS EBS add-on) is lacking the "watch" permission (that is needed by the informer in the pre-stop hook). That's what's causing these errors reported in the previous comment: "E0928 14:56:30.905648 362 reflector.go:147] pkg/mod/k8s.io/client-go@v0.28.2/tools/cache/reflector.go:229: Failed to watch *v1.VolumeAttachment: unknown (get volumeattachments.storage.k8s.io)"

Thank you for pointing this out! I will address this in a PR shortly.
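
For anyone who wants to patch this locally before a release containing the fix, the change amounts to adding the watch verb to the VolumeAttachment rule bound to the node service account. A hedged sketch follows; the ClusterRole name here is illustrative, so edit whatever role your chart or add-on actually created:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: ebs-csi-node-role   # illustrative name; use the ClusterRole already present in your cluster
rules:
  - apiGroups: ["storage.k8s.io"]
    resources: ["volumeattachments"]
    verbs: ["get", "list", "watch"]   # "watch" is the verb the pre-stop hook's informer needs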

As you will see from the logs below, the PVCs are unmounted/detached successfully and yet pre-stop-hook still finds the volume attachments from the k8s API (I also checked the API directly, they're left over until controller manager kicks in 6 minutes later).

That message observed in the logs is misleading - Kubelet is not responsible for detaching volumes; the Attach/Detach controller is. See the relevant code here: https://github.com/kubernetes/kubernetes/blob/3f7a50f38688eb332e2a1b013678c6435d539ae6/pkg/kubelet/volumemanager/reconciler/reconciler_common.go#L298

The pre-stop LCH correctly finds volume attachments from the K8s API because the associated volumes have not been fully detached from the instance.

@torredil
Member

Addressed by #1895, which will be part of the upcoming 1.27.0 release. If anyone runs into further hiccups while using the latest version of the driver, please report it and provide all the relevant context / information for triaging. Thank you!

/close

@k8s-ci-robot
Contributor

@torredil: Closing this issue.

In response to this:

Addressed by #1895, which will be part of the upcoming 1.27.0 release. If anyone runs into further hiccups while using the latest version of the driver please report it and provide all the relevant context / information for triaging. Thank you!

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@levanlongktmt

levanlongktmt commented Mar 2, 2024

@torredil I still get the problem on the newest version, 1.28: after the node was terminated by Karpenter, the volume still waited 6 minutes before being detached and usable on another node.


What I did

apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: dev
spec:
  version: 8.12.2
  volumeClaimDeletePolicy: DeleteOnScaledownAndClusterDeletion
  updateStrategy:
    changeBudget:
      maxSurge: 2
      maxUnavailable: 1
  nodeSets:
  - name: default
    count: 3
    volumeClaimTemplates:
    - metadata:
        name: elasticsearch-data # Do not change this name unless you set up a volume mount for the data path.
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 25Gi
    podTemplate:
      spec:
        nodeSelector:
          kubernetes.io/arch: arm64
          topology.kubernetes.io/zone: eu-central-1a
        containers:
        - name: elasticsearch
          env:
            - name: ES_JAVA_OPTS
              value: -Xms4g -Xmx4g
          resources:
            requests:
              memory: 5Gi
              cpu: 1
            limits:
              memory: 5Gi
              cpu: 2
    config:
      node.store.allow_mmap: false
  • Trigger a spot instance termination or just delete 1 EC2 instance
  • The node is removed in k8s very quickly, the old pod is Terminated, and k8s starts a new pod
  • The new pod is stuck for 6 minutes with the error Multi-Attach error for volume "pvc-xxxxx-xxxxx-xxx" Volume is already exclusively attached to one node and can't be attached to another
  • After 6 minutes the new pod can attach the volume
  • Here are the logs of ebs-csi-controller:
I0302 06:12:10.305080       1 controller.go:430] "ControllerPublishVolume: attached" volumeID="vol-02b33186429105461" nodeID="i-0715ec90e486bb8a1" devicePath="/dev/xvdaa"
<< at 06:14 the node has been terminated but no logs here >>
I0302 06:20:18.486042       1 controller.go:471] "ControllerUnpublishVolume: detaching" volumeID="vol-02b33186429105461" nodeID="i-0715ec90e486bb8a1"
I0302 06:20:18.584737       1 cloud.go:792] "DetachDisk: called on non-attached volume" volumeID="vol-02b33186429105461"
I0302 06:20:18.807752       1 controller.go:474] "ControllerUnpublishVolume: attachment not found" volumeID="vol-02b33186429105461" nodeID="i-0715ec90e486bb8a1"
I0302 06:20:19.124534       1 controller.go:421] "ControllerPublishVolume: attaching" volumeID="vol-02b33186429105461" nodeID="i-0ee2a470112401ffb"
I0302 06:20:20.635493       1 controller.go:430] "ControllerPublishVolume: attached" volumeID="vol-02b33186429105461" nodeID="i-0ee2a470112401ffb" devicePath="/dev/xvdaa"

@k8s-ci-robot
Contributor

@levanlongktmt: You can't reopen an issue/PR unless you authored it or you are a collaborator.

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
