
feat: disruption.terminationGracePeriod #916

Merged

Conversation

wmgroot
Contributor

@wmgroot wmgroot commented Dec 27, 2023

Fixes #743

Description
This PR adds support for an equivalent to CAPI's nodeDrainTimeout feature in Karpenter.
I've incorporated the feedback provided in #834.

This implementation relies on Kubernetes 1.26+, which supports the non-graceful node shutdown taint by default.
https://kubernetes.io/docs/concepts/architecture/nodes/#non-graceful-node-shutdown
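For reference, this is roughly what that taint looks like once present on a Node object (a sketch based on the upstream docs linked above, not output from this PR; the docs allow either NoExecute or NoSchedule as the effect):

apiVersion: v1
kind: Node
metadata:
  name: example-node   # hypothetical node name
spec:
  taints:
  - key: node.kubernetes.io/out-of-service
    value: nodeshutdown
    effect: NoExecute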

Additionally, this PR implements the behavior described in https://github.com/jmdeal/karpenter/blob/disruption-grace-design/designs/termination-grace-period-extension.md. This makes the feature a comprehensive solution, addressing disruption-logic issues that previously prevented cluster administrators from enforcing drain-timeout behavior for eventual disruption cases (such as node expiration and drift, but not consolidation).

Pods are also deleted preemptively at time T = node.expirationTime - pod.terminationGracePeriodSeconds. This attempts to ensure that a pod always has its full terminationGracePeriodSeconds worth of time to terminate gracefully before the node expires.
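As a worked example of that calculation, using the values from the test below (illustrative arithmetic only, not output from this PR):

# node terminationGracePeriod:        10m   -> forcible cleanup at T0 + 10m
# pod terminationGracePeriodSeconds:  300s  -> 5m of graceful termination needed
# preemptive pod deletion at:         (T0 + 10m) - 5m = T0 + 5m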

How was this change tested?
The PR includes unit tests for all added logic.

I've also tested this by building a custom Karpenter image and running it in our live clusters, deleting nodes and NodeClaims directly to verify that pods are successfully deleted and the node is reaped when it reaches the desired expiration time. Test cases included:

  1. Pods at their PDB limit.
  2. Pods still running a preStop hook.
  3. Pods with do-not-disrupt annotations.

Example

Initial Resources

  1. A deployment with a strict PDB (maxUnavailable: 1); a PDB manifest sketch follows this list.
  2. A preStop hook sleep of 120 seconds to delay pod termination.
  3. A pod terminationGracePeriodSeconds of 300 seconds. Pods should be deleted on the node 5m after node termination starts (10m - 5m).
  4. A startupProbe delay of 180s to delay pod readiness.
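
A sketch of the PDB used above (the selector labels here are an assumption; the name, namespace, and maxUnavailable match the kubectl output below):

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: hello-world-nginx
  namespace: default
spec:
  maxUnavailable: 1
  selector:
    matchLabels:
      app: hello-world-nginx   # assumed label; must match the deployment's pod labels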
$ kubectl get nodepool qa -o yaml
apiVersion: karpenter.sh/v1beta1
kind: NodePool
spec:
  template:
    terminationGracePeriod: 10m

$ kubectl get deploy -n default        
NAME                READY   UP-TO-DATE   AVAILABLE   AGE
hello-world-nginx   3/4     4            3           70d

$ kubectl get pdb -n default                                     
NAME                MIN AVAILABLE   MAX UNAVAILABLE   ALLOWED DISRUPTIONS   AGE
hello-world-nginx   N/A             1                 1                     64d

$ kubectl get pod -n default -o yaml hello-world-nginx-6965d88b66-7ks7t
...
spec:
  containers:
  - lifecycle:
      preStop:
        exec:
          command:
          - sleep
          - "120"
    name: hello-world-nginx
    startupProbe:
      exec:
        command:
        - echo
        - done
      failureThreshold: 3
      initialDelaySeconds: 180
      periodSeconds: 5
      successThreshold: 1
      timeoutSeconds: 1
  nodeName: ip-10-115-227-214.us-east-2.compute.internal
  terminationGracePeriodSeconds: 300

Results

$ kubectl delete node ip-10-115-227-214.us-east-2.compute.internal                                                                  
node "ip-10-115-227-214.us-east-2.compute.internal" deleted
(blocking...)

// A warning event is applied to the node noting that pods will be forcibly removed via the out-of-service taint in 10 minutes.
$ kubectl describe node ip-10-115-227-214.us-east-2.compute.internal

Events:
  Type     Reason                            Age                  From                   Message
  ----     ------                            ----                 ----                   -------
  Normal   DisruptionBlocked                 40m (x6 over 57m)    karpenter              Cannot disrupt Node: Nominated for a pending pod
  Normal   DisruptionBlocked                 38m (x4 over 49m)    karpenter              Cannot disrupt Node: PDB "default/hello-world-nginx" prevents pod evictions
  Warning  FailedDraining                    12s (x2 over 2m12s)  karpenter              Failed to drain node, 7 pods are waiting to be evicted
  Warning  TerminationGracePeriodExpiration  12s (x2 over 2m12s)  karpenter              Node will have the out-of-service taint applied at: 2024-01-29 21:05:07 +0000 UTC (TerminationGracePeriod: &Duration{Duration:10m0s,})

// 1 new pod waits for a replacement node to become available. 1 pod remains in its preStop sleep before termination.
$ kubectl get pod -n default -o wide
NAME                                 READY   STATUS        RESTARTS   AGE   IP               NODE                                           NOMINATED NODE   READINESS GATES
hello-world-nginx-6965d88b66-8md5n   1/1     Running       0          52m   10.115.147.149   ip-10-115-227-214.us-east-2.compute.internal   <none>           <none>
hello-world-nginx-6965d88b66-cn8vg   1/1     Running       0          38m   10.115.131.138   ip-10-115-227-214.us-east-2.compute.internal   <none>           <none>
hello-world-nginx-6965d88b66-khk84   0/1     Pending       0          18s   <none>           <none>                                         <none>           <none>
hello-world-nginx-6965d88b66-xgn6t   1/1     Running       0          41m   10.115.136.144   ip-10-115-227-214.us-east-2.compute.internal   <none>           <none>
hello-world-nginx-6965d88b66-zgwdf   1/1     Terminating   0          44m   10.115.169.134   ip-10-115-227-214.us-east-2.compute.internal   <none>           <none>

// The first new pod schedules and begins waiting for its startupProbe delay.
$ kubectl get pod -n default -o wide
NAME                                 READY   STATUS        RESTARTS   AGE   IP               NODE                                           NOMINATED NODE   READINESS GATES
hello-world-nginx-6965d88b66-8md5n   1/1     Running       0          53m   10.115.147.149   ip-10-115-227-214.us-east-2.compute.internal   <none>           <none>
hello-world-nginx-6965d88b66-cn8vg   1/1     Running       0          38m   10.115.131.138   ip-10-115-227-214.us-east-2.compute.internal   <none>           <none>
hello-world-nginx-6965d88b66-khk84   0/1     Running       0          66s   10.115.172.60    ip-10-115-231-43.us-east-2.compute.internal    <none>           <none>
hello-world-nginx-6965d88b66-xgn6t   1/1     Running       0          41m   10.115.136.144   ip-10-115-227-214.us-east-2.compute.internal   <none>           <none>
hello-world-nginx-6965d88b66-zgwdf   1/1     Terminating   0          45m   10.115.169.134   ip-10-115-227-214.us-east-2.compute.internal   <none>           <none>

// The first terminating pod completes its preStop sleep and terminates.
$ kubectl get pod -n default -o wide                                
NAME                                 READY   STATUS    RESTARTS   AGE     IP               NODE                                           NOMINATED NODE   READINESS GATES
hello-world-nginx-6965d88b66-8md5n   1/1     Running   0          55m     10.115.147.149   ip-10-115-227-214.us-east-2.compute.internal   <none>           <none>
hello-world-nginx-6965d88b66-cn8vg   1/1     Running   0          40m     10.115.131.138   ip-10-115-227-214.us-east-2.compute.internal   <none>           <none>
hello-world-nginx-6965d88b66-khk84   0/1     Running   0          2m59s   10.115.172.60    ip-10-115-231-43.us-east-2.compute.internal    <none>           <none>
hello-world-nginx-6965d88b66-xgn6t   1/1     Running   0          43m     10.115.136.144   ip-10-115-227-214.us-east-2.compute.internal   <none>           <none>

// The first new pod becomes ready, a second replacement pod is scheduled on a new node, and a second original pod begins termination.
$ kubectl get pod -n default -o wide
NAME                                 READY   STATUS        RESTARTS   AGE     IP               NODE                                           NOMINATED NODE   READINESS GATES
hello-world-nginx-6965d88b66-8md5n   1/1     Running       0          56m     10.115.147.149   ip-10-115-227-214.us-east-2.compute.internal   <none>           <none>
hello-world-nginx-6965d88b66-cn8vg   1/1     Running       0          42m     10.115.131.138   ip-10-115-227-214.us-east-2.compute.internal   <none>           <none>
hello-world-nginx-6965d88b66-hztpq   0/1     Running       0          43s     10.115.138.31    ip-10-115-237-41.us-east-2.compute.internal    <none>           <none>
hello-world-nginx-6965d88b66-khk84   1/1     Running       0          4m42s   10.115.172.60    ip-10-115-231-43.us-east-2.compute.internal    <none>           <none>
hello-world-nginx-6965d88b66-xgn6t   1/1     Terminating   0          45m     10.115.136.144   ip-10-115-227-214.us-east-2.compute.internal   <none>           <none>

// 5m have passed, and all remaining pods are deleted, triggering their preStop sleeps.
// New pods #3 and #4 are immediately scheduled to the new node and begin their startupProbe delay.
// At this point, our maxUnavailable PDB has been exceeded by 2, since we expect to have 3/4 pods ready at all times.
$ kubectl get pod -n default -o wide
NAME                                 READY   STATUS        RESTARTS   AGE     IP               NODE                                           NOMINATED NODE   READINESS GATES
hello-world-nginx-6965d88b66-8md5n   1/1     Terminating   0          57m     10.115.147.149   ip-10-115-227-214.us-east-2.compute.internal   <none>           <none>
hello-world-nginx-6965d88b66-cn8vg   1/1     Terminating   0          43m     10.115.131.138   ip-10-115-227-214.us-east-2.compute.internal   <none>           <none>
hello-world-nginx-6965d88b66-f6wfm   0/1     Running       0          17s     10.115.182.48    ip-10-115-236-223.us-east-2.compute.internal   <none>           <none>
hello-world-nginx-6965d88b66-hp7qc   0/1     Running       0          17s     10.115.188.220   ip-10-115-236-223.us-east-2.compute.internal   <none>           <none>
hello-world-nginx-6965d88b66-hztpq   0/1     Running       0          78s     10.115.138.31    ip-10-115-237-41.us-east-2.compute.internal    <none>           <none>
hello-world-nginx-6965d88b66-khk84   1/1     Running       0          5m17s   10.115.172.60    ip-10-115-231-43.us-east-2.compute.internal    <none>           <none>
hello-world-nginx-6965d88b66-xgn6t   1/1     Terminating   0          46m     10.115.136.144   ip-10-115-227-214.us-east-2.compute.internal   <none>           <none>

// All pods become ready on the new node, and all terminating pods terminate cleanly, since their terminationGracePeriodSeconds of 300 is greater than the preStop sleep of 120.
$ kubectl get pod -n default -o wide
NAME                                 READY   STATUS    RESTARTS   AGE     IP               NODE                                           NOMINATED NODE   READINESS GATES
hello-world-nginx-6965d88b66-f6wfm   1/1     Running   0          3m30s   10.115.182.48    ip-10-115-236-223.us-east-2.compute.internal   <none>           <none>
hello-world-nginx-6965d88b66-hp7qc   1/1     Running   0          3m30s   10.115.188.220   ip-10-115-236-223.us-east-2.compute.internal   <none>           <none>
hello-world-nginx-6965d88b66-hztpq   1/1     Running   0          4m31s   10.115.138.31    ip-10-115-237-41.us-east-2.compute.internal    <none>           <none>
hello-world-nginx-6965d88b66-khk84   1/1     Running   0          8m30s   10.115.172.60    ip-10-115-231-43.us-east-2.compute.internal    <none>           <none>

// Relevant Karpenter logs. The node was terminated cleanly once the deleted pods finished terminating gracefully, despite the PDB violation.
{"level":"INFO","time":"2024-01-29T21:00:07.046Z","logger":"controller.node.termination","message":"pods to delete: 3","commit":"e23d58b-dirty","node":"ip-10-115-227-214.us-east-2.compute.internal"}
{"level":"INFO","time":"2024-01-29T21:00:07.046Z","logger":"controller.node.termination","message":"delete pod: default/hello-world-nginx-6965d88b66-cn8vg","commit":"e23d58b-dirty","node":"ip-10-115-227-214.us-east-2.compute.internal"}
{"level":"INFO","time":"2024-01-29T21:00:07.046Z","logger":"controller.node.termination","message":"delete pod: default/hello-world-nginx-6965d88b66-8md5n","commit":"e23d58b-dirty","node":"ip-10-115-227-214.us-east-2.compute.internal"}
{"level":"INFO","time":"2024-01-29T21:00:07.046Z","logger":"controller.node.termination","message":"delete pod: default/hello-world-nginx-6965d88b66-xgn6t","commit":"e23d58b-dirty","node":"ip-10-115-227-214.us-east-2.compute.internal"}

{"level":"INFO","time":"2024-01-29T21:01:08.476Z","logger":"controller.node.termination","message":"pods to delete: 2","commit":"e23d58b-dirty","node":"ip-10-115-227-214.us-east-2.compute.internal"}
{"level":"INFO","time":"2024-01-29T21:01:08.476Z","logger":"controller.node.termination","message":"delete pod: default/hello-world-nginx-6965d88b66-cn8vg","commit":"e23d58b-dirty","node":"ip-10-115-227-214.us-east-2.compute.internal"}
{"level":"INFO","time":"2024-01-29T21:01:08.476Z","logger":"controller.node.termination","message":"delete pod: default/hello-world-nginx-6965d88b66-8md5n","commit":"e23d58b-dirty","node":"ip-10-115-227-214.us-east-2.compute.internal"}

{"level":"INFO","time":"2024-01-29T21:02:08.821Z","logger":"controller.node.termination","message":"pods to delete: 1","commit":"e23d58b-dirty","node":"ip-10-115-227-214.us-east-2.compute.internal"}
{"level":"INFO","time":"2024-01-29T21:02:08.821Z","logger":"controller.node.termination","message":"delete pod: cis20-custom-metrics/cis20-custom-metrics-srsxh","commit":"e23d58b-dirty","node":"ip-10-115-227-214.us-east-2.compute.internal"}
{"level":"INFO","time":"2024-01-29T21:02:09.319Z","logger":"controller.node.termination","message":"deleted node","commit":"e23d58b-dirty","node":"ip-10-115-227-214.us-east-2.compute.internal"}
{"level":"INFO","time":"2024-01-29T21:02:09.685Z","logger":"controller.nodeclaim.termination","message":"deleted nodeclaim","commit":"e23d58b-dirty","nodeclaim":"worker-qa-2dd7r","node":"ip-10-115-227-214.us-east-2.compute.internal","provider-id":"aws:///us-east-2c/i-0fba97440b38155cc"}

If the pods don't terminate cleanly, the terminator will eventually attempt to delete every pod still running on the node, based on each pod's terminationGracePeriodSeconds, including DaemonSet and other infrastructure pods.

{"level":"INFO","time":"2024-01-29T20:09:51.875Z","logger":"controller.node.termination","message":"pods to delete: 11","commit":"e23d58b-dirty","node":"ip-10-115-231-38.us-east-2.compute.internal"}
{"level":"INFO","time":"2024-01-29T20:09:51.875Z","logger":"controller.node.termination","message":"delete pod: kube-system/efs-csi-node-7vbnd","commit":"e23d58b-dirty","node":"ip-10-115-231-38.us-east-2.compute.internal"}
{"level":"INFO","time":"2024-01-29T20:09:51.875Z","logger":"controller.node.termination","message":"delete pod: kube-system/node-local-dns-xhpxn","commit":"e23d58b-dirty","node":"ip-10-115-231-38.us-east-2.compute.internal"}
{"level":"INFO","time":"2024-01-29T20:09:51.875Z","logger":"controller.node.termination","message":"delete pod: kube-system/ebs-csi-node-qcr7w","commit":"e23d58b-dirty","node":"ip-10-115-231-38.us-east-2.compute.internal"}
{"level":"INFO","time":"2024-01-29T20:09:51.875Z","logger":"controller.node.termination","message":"delete pod: consul/cv-consul-consul-client-96pf8","commit":"e23d58b-dirty","node":"ip-10-115-231-38.us-east-2.compute.internal"}
{"level":"INFO","time":"2024-01-29T20:09:51.875Z","logger":"controller.node.termination","message":"delete pod: kube-system/aws-node-termination-handler-wbtgh","commit":"e23d58b-dirty","node":"ip-10-115-231-38.us-east-2.compute.internal"}
{"level":"INFO","time":"2024-01-29T20:09:51.875Z","logger":"controller.node.termination","message":"delete pod: kube-system/cilium-node-init-cc4jj","commit":"e23d58b-dirty","node":"ip-10-115-231-38.us-east-2.compute.internal"}
{"level":"INFO","time":"2024-01-29T20:09:51.875Z","logger":"controller.node.termination","message":"delete pod: falcon-system--regular/falcon-sensor-regular-xdghd","commit":"e23d58b-dirty","node":"ip-10-115-231-38.us-east-2.compute.internal"}
{"level":"INFO","time":"2024-01-29T20:09:51.875Z","logger":"controller.node.termination","message":"delete pod: kube-system/cluster-config-maps-m42wd","commit":"e23d58b-dirty","node":"ip-10-115-231-38.us-east-2.compute.internal"}
{"level":"INFO","time":"2024-01-29T20:09:51.875Z","logger":"controller.node.termination","message":"delete pod: log-shipper/log-shipper-fluent-bit-zchhk","commit":"e23d58b-dirty","node":"ip-10-115-231-38.us-east-2.compute.internal"}
{"level":"INFO","time":"2024-01-29T20:09:51.875Z","logger":"controller.node.termination","message":"delete pod: cis20-custom-metrics/cis20-custom-metrics-mhd24","commit":"e23d58b-dirty","node":"ip-10-115-231-38.us-east-2.compute.internal"}
{"level":"INFO","time":"2024-01-29T20:09:51.875Z","logger":"controller.node.termination","message":"delete pod: kube-system/cilium-mp5v9","commit":"e23d58b-dirty","node":"ip-10-115-231-38.us-east-2.compute.internal"}
{"level":"INFO","time":"2024-01-29T20:09:52.318Z","logger":"controller.node.termination","message":"deleted node","commit":"e23d58b-dirty","node":"ip-10-115-231-38.us-east-2.compute.internal"}
{"level":"INFO","time":"2024-01-29T20:09:52.652Z","logger":"controller.nodeclaim.termination","message":"deleted nodeclaim","commit":"e23d58b-dirty","nodeclaim":"worker-qa-5vzbz","node":"ip-10-115-231-38.us-east-2.compute.internal","provider-id":"aws:///us-east-2c/i-07349f09ac0816f66"}

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Dec 27, 2023
@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Dec 27, 2023
@k8s-ci-robot
Contributor

Hi @wmgroot. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Dec 27, 2023
@wmgroot wmgroot force-pushed the disruption-termination-grace-period branch from e94c2e4 to 47ba900 Compare December 27, 2023 23:36
@wmgroot wmgroot force-pushed the disruption-termination-grace-period branch from 47ba900 to fb8ad85 Compare December 29, 2023 21:57
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Dec 29, 2023

This PR has been inactive for 14 days. StaleBot will close this stale PR after 14 more days of inactivity.

@github-actions github-actions bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 13, 2024
@garvinp-stripe
Contributor

Small comment: is it possible to update the status of the NodeClaim when a node starts getting forcibly drained?

@github-actions github-actions bot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 19, 2024
@wmgroot wmgroot force-pushed the disruption-termination-grace-period branch from fb8ad85 to f4ae490 Compare January 23, 2024 20:33
@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jan 23, 2024
@wmgroot wmgroot force-pushed the disruption-termination-grace-period branch from f4ae490 to ff8ed69 Compare January 30, 2024 21:16
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 30, 2024
@wmgroot wmgroot force-pushed the disruption-termination-grace-period branch from ff8ed69 to 95303b7 Compare January 31, 2024 04:39
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 31, 2024
@wmgroot wmgroot force-pushed the disruption-termination-grace-period branch 4 times, most recently from c7298be to c18da4e Compare January 31, 2024 16:22
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Jan 31, 2024
@wmgroot
Contributor Author

wmgroot commented Jul 9, 2024

Is this just for drift or for all disruptions (delete, replace etc)?

The TGP timeout applies to all forms of node disruption. It does not allow disruption to be initiated for consolidation if the node is blocked by a do-not-disrupt pod or a pod with a blocking PDB.

@wmgroot wmgroot force-pushed the disruption-termination-grace-period branch 2 times, most recently from 949cb93 to 390a8b1 Compare July 10, 2024 20:06
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 10, 2024
@wmgroot wmgroot force-pushed the disruption-termination-grace-period branch from 390a8b1 to 8dfd5f9 Compare July 10, 2024 20:10
@k8s-ci-robot k8s-ci-robot added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Jul 10, 2024
@wmgroot wmgroot force-pushed the disruption-termination-grace-period branch 3 times, most recently from 8a58b20 to 50c89c5 Compare July 11, 2024 17:47
Contributor

@njtran njtran left a comment

A couple more comments on the test coverage. I think the 2x2x2 combinatorics of graceful vs. forceful, no TGP vs. TGP, and PDB vs. do-not-disrupt made it hard to track the tests. You could consider using a DescribeTable if that helps. Otherwise, just some comments on error wrapping.

Review threads (now outdated/resolved) on: pkg/controllers/disruption/suite_test.go, pkg/controllers/node/termination/suite_test.go, pkg/controllers/node/termination/terminator/terminator.go, pkg/controllers/nodeclaim/termination/controller.go
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 12, 2024
@wmgroot wmgroot force-pushed the disruption-termination-grace-period branch from 50c89c5 to 7dc0512 Compare July 12, 2024 17:51
Contributor

@njtran njtran left a comment

Looks good to me! Excited to get this in, going to test on my own end before merging.

@wmgroot wmgroot force-pushed the disruption-termination-grace-period branch from 7dc0512 to 5423764 Compare July 12, 2024 18:31
@njtran njtran force-pushed the disruption-termination-grace-period branch from 5423764 to 965cfbf Compare July 12, 2024 21:19
wmgroot and others added 4 commits July 12, 2024 14:20
…s with do-not-disrupt pods and blocking PDBs

Signed-off-by: wmgroot <wmgroot@gmail.com>
…odeClaim.spec

Signed-off-by: wmgroot <wmgroot@gmail.com>
@njtran njtran force-pushed the disruption-termination-grace-period branch from 965cfbf to 3e6aafc Compare July 12, 2024 21:20
Contributor

@njtran njtran left a comment

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 12, 2024
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: njtran, wmgroot

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 12, 2024
@k8s-ci-robot k8s-ci-robot merged commit 108c1ef into kubernetes-sigs:main Jul 12, 2024
17 checks passed
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Development

Successfully merging this pull request may close these issues.

support drain timeout