
kubectl/drain: add option skip-wait-for-delete-timeout #85577

Merged

Conversation

@michaelgugino
Contributor

michaelgugino commented Nov 24, 2019

What type of PR is this?

/kind feature

What this PR does / why we need it:
Currently, some circumstances can cause waitForDelete to never succeed after a pod has been marked for deletion. In particular, nodes that are unresponsive and have pods with local storage cannot be drained successfully.

We should allow drain to ignore pods whose DeletionTimestamp is older than a user-provided age. This lets controllers that use kubectl/drain optionally account for a pod that cannot be removed because of a misbehaving node.
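
A minimal sketch of the check described above, using a hypothetical helper and names (this is not the exact filter added by the PR): a pod already marked for deletion is skipped once its DeletionTimestamp is older than the configured number of seconds, and a value of 0 disables the behavior.

package main

import (
	"fmt"
	"time"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// shouldSkipWait is a hypothetical helper: it returns true when the pod has
// been marked for deletion for longer than timeoutSeconds. A timeout of 0 (or
// less) means "never skip", matching the release note below.
func shouldSkipWait(pod corev1.Pod, timeoutSeconds int) bool {
	if timeoutSeconds <= 0 || pod.ObjectMeta.DeletionTimestamp == nil {
		return false
	}
	age := time.Since(pod.ObjectMeta.DeletionTimestamp.Time)
	return age > time.Duration(timeoutSeconds)*time.Second
}

func main() {
	deletedAt := metav1.NewTime(time.Now().Add(-2 * time.Minute))
	pod := corev1.Pod{ObjectMeta: metav1.ObjectMeta{DeletionTimestamp: &deletedAt}}
	fmt.Println(shouldSkipWait(pod, 60)) // true: marked for deletion over 60s ago
}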

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

kubectl/drain: add the skip-wait-for-delete-timeout option.
If a pod's DeletionTimestamp is older than N seconds, skip waiting for the pod. N must be greater than 0 for pods to be skipped.
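
For example, an invocation could look like kubectl drain <node-name> --skip-wait-for-delete-timeout=60. The flag spelling is taken from the PR title; the node name placeholder and the 60-second value are illustrative, not from this PR.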

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

- [Other doc]: https://github.com/kubernetes/enhancements/issues/1339
@k8s-ci-robot

Contributor

k8s-ci-robot commented Nov 24, 2019

Hi @michaelgugino. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@seans3

Contributor

seans3 commented Dec 2, 2019

/ok-to-test

@seans3

Contributor

seans3 commented Dec 2, 2019

/priority important-longterm

@seans3

Contributor

seans3 commented Dec 2, 2019

/assign

@michaelgugino michaelgugino force-pushed the mgugino-upstream-stage:unready-node-timeout branch 3 times, most recently from 1e94ccf to 9727c38 Dec 3, 2019
func (d *Helper) makeFilters() []podFilter {
	return []podFilter{
		d.skipDeletedFilter,

@michaelgugino

michaelgugino Dec 3, 2019

Author Contributor

Previously I had this filter appended to the end of the list, but that broke the tests here:
./staging/src/k8s.io/kubectl/pkg/cmd/drain/

The effect of this was very non-obvious, so I added the comment above.

@@ -203,6 +207,9 @@ func (d *Helper) localStorageFilter(pod corev1.Pod) podDeleteStatus {
		return makePodDeleteStatusWithError(localStorageFatal)
	}

	// TODO: this warning gets dropped by subsequent filters;
	// consider accounting for multiple warning conditions or at least
	// preserving the last warning message.
	return makePodDeleteStatusWithWarning(true, localStorageWarning)

@michaelgugino

michaelgugino Dec 3, 2019

Author Contributor

The user will never see this message. The next filter will most likely return 'ok' and the warning is swallowed. TBD if anyone actually cares about this.

@michaelgugino

michaelgugino Dec 3, 2019

Author Contributor

FYI, this was not caused by this patch set.
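
A simplified sketch of how a chain of filters like this can swallow an earlier warning; the types and loop below are assumptions for illustration, not the actual upstream code. Each filter returns a fresh status and only the last one is kept, so a warning from a localStorage-style filter disappears when a later filter reports a plain "ok".

package main

import "fmt"

// Simplified stand-ins for the drain package's types (illustrative only).
type podDeleteStatus struct {
	delete  bool
	warning string
}

type podFilter func(podName string) podDeleteStatus

func main() {
	filters := []podFilter{
		// A localStorage-style filter: the pod is deletable, but with a warning.
		func(string) podDeleteStatus {
			return podDeleteStatus{delete: true, warning: "has local storage"}
		},
		// A later filter that simply says "ok" with no warning.
		func(string) podDeleteStatus {
			return podDeleteStatus{delete: true}
		},
	}

	var status podDeleteStatus
	for _, f := range filters {
		status = f("my-pod") // each iteration replaces the previous status...
		if !status.delete {
			break // ...and the loop only stops early when deletion is refused
		}
	}
	// The local-storage warning is gone: status.warning is now "".
	fmt.Printf("delete=%v warning=%q\n", status.delete, status.warning)
}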

Contributor

seans3 left a comment

This looks good. I appreciate the clear focus of the PR, and I also appreciate the unit tests. A couple of small nits.

staging/src/k8s.io/kubectl/pkg/drain/drain.go (outdated; resolved)
DeleteLocalData bool
Selector string
PodSelector string
SkipWaitForDeleteTimeoutSeconds int

@seans3

seans3 Dec 3, 2019

Contributor

I think we should add the KEP description of this feature as a comment here (the one that begins: "Alternative to "Unready Node Timeout" above...)

@michaelgugino

michaelgugino Dec 3, 2019

Author Contributor

Done.
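
For callers of the library (rather than the kubectl CLI), a rough sketch of setting the new field could look like the following. Only the field names visible in the diff above are taken from the source; the rest of the setup (client, other options) is omitted and would be needed in a real caller.

package main

import (
	"fmt"

	"k8s.io/kubectl/pkg/drain"
)

func main() {
	// Sketch only: a real caller would also configure a Kubernetes client and
	// the other drain options it needs.
	helper := &drain.Helper{
		DeleteLocalData:                 true,
		SkipWaitForDeleteTimeoutSeconds: 60, // skip pods whose DeletionTimestamp is older than 60s
	}
	fmt.Println(helper.SkipWaitForDeleteTimeoutSeconds)
}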

@michaelgugino michaelgugino force-pushed the mgugino-upstream-stage:unready-node-timeout branch from 9727c38 to 4ff5817 Dec 3, 2019
 	if test.ctxTimeoutEarly {
-		ctx, _ = context.WithTimeout(ctx, 100*time.Millisecond)
+		ctx, cancel = context.WithTimeout(ctx, 100*time.Millisecond)

@michaelgugino

michaelgugino Dec 3, 2019

Author Contributor

go vet complains if you don't defer this cancel even though we should never need to, thus the ugly var declarations above.
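
A small sketch of the pattern being described, with assumed variable names: declaring ctx and cancel up front lets the code always defer cancel, which keeps go vet's lostcancel check quiet even though WithTimeout is only called conditionally.

package main

import (
	"context"
	"fmt"
	"time"
)

func main() {
	ctxTimeoutEarly := true // stand-in for the test case flag

	ctx := context.Background()
	cancel := func() {} // no-op default so the defer below is always valid
	if ctxTimeoutEarly {
		ctx, cancel = context.WithTimeout(ctx, 100*time.Millisecond)
	}
	defer cancel() // without this, go vet reports a possible context leak

	<-ctx.Done()
	fmt.Println(ctx.Err()) // context deadline exceeded
}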

@seans3

Contributor

seans3 commented Dec 4, 2019

/assign @cheftako

@michaelgugino michaelgugino force-pushed the mgugino-upstream-stage:unready-node-timeout branch from 4ff5817 to da53044 Dec 6, 2019
@seans3

Contributor

seans3 commented Dec 9, 2019

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm label Dec 9, 2019
@k8s-ci-robot

Contributor

k8s-ci-robot commented Dec 9, 2019

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: michaelgugino, seans3

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot merged commit 93eb2e3 into kubernetes:master Dec 9, 2019
13 of 15 checks passed

pull-kubernetes-kubemark-e2e-gce-big: Job triggered.
tide: Not mergeable. Retesting: pull-kubernetes-kubemark-e2e-gce-big
cla/linuxfoundation: michaelgugino authorized
pull-kubernetes-bazel-build: Job succeeded.
pull-kubernetes-bazel-test: Job succeeded.
pull-kubernetes-dependencies: Job succeeded.
pull-kubernetes-e2e-gce: Job succeeded.
pull-kubernetes-e2e-gce-100-performance: Job succeeded.
pull-kubernetes-e2e-gce-device-plugin-gpu: Job succeeded.
pull-kubernetes-e2e-kind: Job succeeded.
pull-kubernetes-integration: Job succeeded.
pull-kubernetes-node-e2e: Job succeeded.
pull-kubernetes-node-e2e-containerd: Job succeeded.
pull-kubernetes-typecheck: Job succeeded.
pull-kubernetes-verify: Job succeeded.
@k8s-ci-robot k8s-ci-robot added this to the v1.18 milestone Dec 9, 2019
alexander-demichev added a commit to alexander-demichev/machine-api-operator that referenced this pull request Jan 13, 2020
This PR moves code from two upstream PRs to this repo. It adds support for a context (used to cancel waitForDelete) and
allows ignoring pods that have a DeletionTimestamp older than a user-provided age.
See the PRs below for a more detailed description:

kubernetes/kubernetes#85577
kubernetes/kubernetes#85574