kubectl drain ignores terminating pods when reboot master node #1532

LuckyChen666 · 2023-12-11T07:33:23Z

What happened:
When I evict and update one of the worker nodes, a master node is also rebooting at this time. At this time, the evicted pod needs to wait for terminationGracePeriodSeconds before the eviction is successful.
Then something strange happened, the eviction was successful, and the currently terminating pod was ignored.

kubectl/pkg/drain/drain.go

Line 420 in b359351

    
           if apierrors.IsNotFound(err) || (p != nil && p.ObjectMeta.UID != pod.ObjectMeta.UID) {

I found that because the master node was rebooting at this time, obtaining the pod's interface failed. At this time, the above judgment p != nil && p.ObjectMeta.UID != pod.ObjectMeta.UID was met, so the current pod was ignored.

What you expected to happen:
Judgment here should be strictly controlled，such as：

if apierrors.IsNotFound(err)  {
  if (p != nil && p.ObjectMeta.UID != pod.ObjectMeta.UID) {
     ...
  }else{
   ...
  }
} else {
...
}

Maybe there is a better way to fix it.
How to reproduce it (as minimally and precisely as possible):

You can try to drain a worker node which owns terminating pod for a long time. Then reboot master node.

The text was updated successfully, but these errors were encountered:

eddiezane · 2024-01-03T17:53:43Z

I am assuming that if the client-go Pod Get function is returning an err that p is nil by Go convention. If that is the case then I believe the current logic is correct?

@LuckyChen666 thoughts?

eddiezane · 2024-01-03T18:02:24Z

Actually no that assumption is incorrect. client go's getters use named returns and do alloc a struct.

So in this case p is never nil so our current logic is incorrect.

https://github.com/kubernetes/kubernetes/blob/cacdf6c70728086bfcab387b901d9765cbcbfb6b/staging/src/k8s.io/client-go/kubernetes/typed/core/v1/pod.go#L76

/triage accepted

eddiezane · 2024-01-03T18:03:42Z

/assign @brianpursley

LuckyChen666 added the kind/bug Categorizes issue or PR as related to a bug. label Dec 11, 2023

k8s-ci-robot added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Dec 11, 2023

LuckyChen666 changed the title ~~kubectl drain ignores some pods when reboot master node~~ kubectl drain ignores terminating pods when reboot master node Dec 11, 2023

k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jan 3, 2024

k8s-ci-robot assigned brianpursley Jan 3, 2024

brianpursley mentioned this issue Jan 4, 2024

Fix kubectl drain error handling bug. kubernetes/kubernetes#122574

Merged

k8s-ci-robot closed this as completed in kubernetes/kubernetes#122574 Jan 9, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

kubectl drain ignores terminating pods when reboot master node #1532

kubectl drain ignores terminating pods when reboot master node #1532

LuckyChen666 commented Dec 11, 2023 •

edited

eddiezane commented Jan 3, 2024

eddiezane commented Jan 3, 2024

eddiezane commented Jan 3, 2024

kubectl drain ignores terminating pods when reboot master node #1532

kubectl drain ignores terminating pods when reboot master node #1532

Comments

LuckyChen666 commented Dec 11, 2023 • edited

eddiezane commented Jan 3, 2024

eddiezane commented Jan 3, 2024

eddiezane commented Jan 3, 2024

LuckyChen666 commented Dec 11, 2023 •

edited