kubectl: wait for all errors and successes on podEviction #64896
Conversation
k8s-ci-robot added the release-note-none label Jun 7, 2018
k8s-ci-robot requested a review from sjenning Jun 7, 2018
k8s-ci-robot added size/S, cncf-cla: yes labels Jun 7, 2018
/retest
/assign @apelisse
k8s-ci-robot assigned apelisse Jun 12, 2018
@@ -613,16 +620,21 @@ func (o *DrainOptions) evictPods(pods []corev1.Pod, policyGroupVersion string, g
    for {
        select {
        case err := <-errCh:
            return err
frobware
Jun 14, 2018
Contributor
Do we not now lose the underlying reason? Looking below I see we just do "Drain did not complete...".
rphillips
Jun 14, 2018
Member
Since there can be N different errors, they get glogged and counted. A summary error is returned on line 635. Do you think we should capture all the error messages?
liggitt
Jun 26, 2018
Member
outputting via glog like this rather than actually returning errors means DrainOptions can't easily be used programmatically or composed into larger commands
liggitt
Jun 26, 2018
Member
print to o.ErrOut if you're going to print here, and I would probably accumulate and return the actual errors
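To illustrate the "accumulate and return the actual errors" suggestion, here is a minimal sketch, not the code from this PR. The helper name evictAll, the evict callback, and the errTerminating sentinel are hypothetical, and the standard library's errors.Join (Go 1.20+) stands in for whatever aggregation the command actually uses. Every pod is attempted, and the caller receives all failures instead of only the first one.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errTerminating is a hypothetical sentinel used only for this illustration.
var errTerminating = errors.New("can't modify resource in a terminating namespace")

// evictAll is a hypothetical helper: it calls evict for every pod name
// concurrently, waits for all calls to finish, and returns every failure
// joined into a single error (nil if all calls succeeded).
func evictAll(pods []string, evict func(pod string) error) error {
	errCh := make(chan error, len(pods)) // buffered so no worker blocks on send
	var wg sync.WaitGroup
	for _, pod := range pods {
		wg.Add(1)
		go func(pod string) {
			defer wg.Done()
			if err := evict(pod); err != nil {
				errCh <- fmt.Errorf("pod %q: %w", pod, err)
			}
		}(pod)
	}
	wg.Wait()
	close(errCh)

	// Accumulate instead of returning on the first failure.
	var errs []error
	for err := range errCh {
		errs = append(errs, err)
	}
	return errors.Join(errs...) // nil when errs is empty
}

func main() {
	evict := func(pod string) error {
		if pod == "b" {
			return errTerminating
		}
		return nil
	}
	fmt.Println(evictAll([]string{"a", "b", "c"}, evict))
}
```

Because the errors are returned rather than logged, a caller that composes this into a larger command can decide for itself whether to print them, retry, or inspect individual causes.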
k8s-ci-robot added needs-rebase and removed needs-rebase labels Jun 24, 2018
sjenning
Jun 26, 2018
Contributor
/cc @kubernetes/sig-cli-maintainers
This fix will allow kubectl drain to complete successfully when a node is draining.
The particular issue we hit is if the node has a pod in a terminating namespace, the drain will immediately fail with a "can't modify resource in a terminating namespace" error and fail to remove the remaining pods.
With this PR, the drain still fails, but not before trying to remove every pod on the node. This is a better situation in that all pods will be in a terminating state after the first drain.
sjenning
Jun 26, 2018
Contributor
using wider cc that I just found out about
/cc @kubernetes/sig-cli-pr-reviews
k8s-ci-robot added sig/cli, size/M and removed size/S labels Jun 26, 2018
rphillips
Jun 26, 2018
Member
@liggitt I refactored this PR to collect all the errors and make it more reusable.
k8s-ci-robot added size/L and removed size/M labels Jun 26, 2018
// derived from https://github.com/golang/appengine/blob/master/errors.go
// MultiError is returned by batch operations.
type MultiError []error
adohe
Jun 27, 2018
Member
we already have AggregateError, what's the difference? seems no need to define multi error here.
rphillips
Jun 27, 2018
Member
I didn't know about Aggregate. I updated the PR to use the Aggregate API. Thanks!
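For readers unfamiliar with the idea, the following is a rough sketch of what an aggregate error type provides. It is a hypothetical stand-in written for illustration, not the Kubernetes implementation: the names aggregate and newAggregate, the sentinel errors, and the "; " separator are all assumptions. It shows the two properties that matter in this thread: individual errors stay accessible, and an empty list collapses to nil.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Hypothetical sample errors used only for this illustration.
var (
	errEvictA = errors.New("pod a: eviction failed")
	errEvictB = errors.New("pod b: not found")
)

// aggregate is a hypothetical stand-in for an aggregate error type.
type aggregate []error

// Error renders all collected messages as one string.
func (a aggregate) Error() string {
	msgs := make([]string, len(a))
	for i, err := range a {
		msgs[i] = err.Error()
	}
	return strings.Join(msgs, "; ")
}

// Errors exposes the individual errors to callers.
func (a aggregate) Errors() []error { return a }

// newAggregate drops nil entries and returns nil when nothing is left,
// so callers can keep writing `if err != nil`.
func newAggregate(errs []error) error {
	var kept aggregate
	for _, err := range errs {
		if err != nil {
			kept = append(kept, err)
		}
	}
	if len(kept) == 0 {
		return nil
	}
	return kept
}

func main() {
	fmt.Println(newAggregate([]error{errEvictA, nil, errEvictB}))
	fmt.Println(newAggregate(nil) == nil)
}
```

The nil-on-empty behavior is convenient, but it also means that returning an aggregate of an empty list is indistinguishable from success.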
k8s-ci-robot added size/M and removed size/L labels Jun 27, 2018
/test pull-kubernetes-integration
/test pull-kubernetes-e2e-gce
thanks!
k8s-ci-robot assigned sjenning Jul 3, 2018
k8s-ci-robot added the lgtm label Jul 3, 2018
@@ -571,39 +572,41 @@ func (o *DrainOptions) deleteOrEvictPods(pods []corev1.Pod) error {
    }
}
// evictPods return
        }
    case <-globalTimeoutCh:
-       return fmt.Errorf("Drain did not complete within %v", globalTimeout)
+       return utilerrors.NewAggregate(errors)
apelisse
Jul 3, 2018
Member
You're losing the timeout error information here, which means if it times out before the first error is reported, that's going to become a success. Maybe consider just returning timeout error in that case (as it was done before)?
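The concern can be sketched as follows. This is an illustrative variation, not the code that was merged: waitForPods, resultCh, and shortTimeout are hypothetical names, and errors.Join (Go 1.20+) is a standard-library substitute for the aggregate helper. On timeout, the sketch appends an explicit timeout error before returning, so a drain that times out with no errors collected so far cannot be mistaken for success.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// shortTimeout is a hypothetical value chosen to keep the demo fast.
const shortTimeout = 10 * time.Millisecond

// waitForPods is a hypothetical helper. It waits until numPods results have
// arrived on resultCh (a nil result means success) or until timeout elapses.
func waitForPods(numPods int, resultCh <-chan error, timeout time.Duration) error {
	var errs []error
	timeoutCh := time.After(timeout)
	for done := 0; done < numPods; done++ {
		select {
		case err := <-resultCh:
			if err != nil {
				errs = append(errs, err) // keep going; collect every failure
			}
		case <-timeoutCh:
			// Report the timeout explicitly: joining an empty errs slice
			// would yield nil and hide the fact that the drain did not finish.
			errs = append(errs, fmt.Errorf("drain did not complete within %v", timeout))
			return errors.Join(errs...)
		}
	}
	return errors.Join(errs...)
}

func main() {
	stuck := make(chan error) // nothing is ever sent: simulates a stuck eviction
	fmt.Println(waitForPods(1, stuck, shortTimeout))
}
```

Returning only the timeout error, as the earlier code did, would also address the comment; appending it to the collected errors additionally preserves any failures seen before the deadline.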
k8s-ci-robot removed the lgtm label Jul 3, 2018
        }
    case <-globalTimeoutCh:
        return fmt.Errorf("Drain did not complete within %v", globalTimeout)
    }
    if doneCount == numPods {
apelisse
Jul 3, 2018
Member
This condition could have been part of the "for" statement: for doneCount < numPods { ... }.
apelisse
Jul 3, 2018
Member
Thanks for fixing quickly! Feel free to fix the "while" loop (now or later).
/lgtm
/approve
k8s-ci-robot added lgtm, approved labels Jul 3, 2018
k8s-ci-robot removed the lgtm label Jul 3, 2018
rphillips
Jul 3, 2018
Member
@apelisse I fixed the for loop, this will need one more approval. Thank you!
apelisse
Jul 3, 2018
Member
It'd be great if I could leave all my comments once and for all :-). How hard would it be to add tests?
/lgtm
k8s-ci-robot added the lgtm label Jul 3, 2018
k8s-ci-robot
Jul 3, 2018
Contributor
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: apelisse, rphillips, sjenning
The full list of commands accepted by this bot can be found here.
The pull request process is described here
Needs approval from an approver in each of these files:
pkg/kubectl/OWNERS [apelisse]
Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment
k8s-merge-robot
Jul 4, 2018
Contributor
Automatic merge from submit-queue (batch tested with PRs 65776, 64896). If you want to cherry-pick this change to another branch, please follow the instructions here.
rphillips commented Jun 7, 2018
What this PR does / why we need it: This fixes kubectl drain to wait until all errors and successes are processed, instead of returning the first error. It also changes the cleanup to check whether the pod is already terminating and, if so, to skip reissuing the pod termination, which would otherwise lead to an error. This fix will allow kubectl drain to complete successfully when a node is draining.
Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged): Fixes #
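The "already terminating" check described above could look roughly like the sketch below. It is an illustration under assumptions, not the PR's code: the pod struct is a simplified hypothetical stand-in for a Kubernetes Pod, the helpers podsToEvict and terminatingPod are invented for the example, and treating a non-nil deletion timestamp as "terminating" is an assumption about how the check is made.

```go
package main

import (
	"fmt"
	"time"
)

// pod is a hypothetical, simplified stand-in for a Kubernetes Pod object.
type pod struct {
	Name              string
	DeletionTimestamp *time.Time // assumed non-nil once deletion has been requested
}

// terminatingPod is a hypothetical helper that builds a pod already marked
// for deletion.
func terminatingPod(name string) pod {
	now := time.Now()
	return pod{Name: name, DeletionTimestamp: &now}
}

// podsToEvict returns only the pods that are not already terminating, so the
// caller does not reissue a termination request for them.
func podsToEvict(pods []pod) []pod {
	var out []pod
	for _, p := range pods {
		if p.DeletionTimestamp != nil {
			continue // already terminating: skip it
		}
		out = append(out, p)
	}
	return out
}

func main() {
	pods := []pod{{Name: "a"}, terminatingPod("b"), {Name: "c"}}
	for _, p := range podsToEvict(pods) {
		fmt.Println(p.Name)
	}
}
```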
Special notes for your reviewer:
/cc @sjenning
Release note:
Reproduction steps
sleep.yml