kubectl: wait for all errors and successes on podEviction #64896
Conversation
/retest

/assign @apelisse
pkg/kubectl/cmd/drain.go (outdated)

```
@@ -613,16 +620,21 @@ func (o *DrainOptions) evictPods(pods []corev1.Pod, policyGroupVersion string, g
	for {
		select {
		case err := <-errCh:
			return err
```
Do we not now lose the underlying reason? Looking below I see we just do "Drain did not complete...".
Since there can be N different errors, they get logged via glog and counted. A summary error is returned on line 635. Do you think we should capture all the error messages?
outputting via glog like this rather than actually returning errors means DrainOptions can't easily be used programmatically or composed into larger commands
print to o.ErrOut if you're going to print here, and I would probably accumulate and return the actual errors
Force-pushed 49a332b to 4bfea39
/cc @kubernetes/sig-cli-maintainers

The particular issue we hit is that if the node has a pod in a terminating namespace, the drain immediately fails with a "can't modify resource in a terminating namespace" error and never removes the remaining pods. With this PR, the drain still fails, but only after trying to remove every pod on the node. This is a better situation in that all pods will be in a terminating state after the first drain.

using wider cc that I just found out about
Force-pushed 4bfea39 to c158391
@liggitt I refactored this PR to collect all the errors and make it more reusable.
Force-pushed c158391 to 4ff8f32
Force-pushed 823bfb6 to a0efb63
pkg/util/multierror/multierror.go (outdated)

```
// derived from https://github.com/golang/appengine/blob/master/errors.go

// MultiError is returned by batch operations.
type MultiError []error
```
We already have Aggregate; what's the difference? There seems to be no need to define MultiError here.
I didn't know about Aggregate. I updated the PR to use the Aggregate API. Thanks!
Force-pushed a0efb63 to 4fe8896
/test pull-kubernetes-integration

/test pull-kubernetes-e2e-gce
Force-pushed 4fe8896 to 792f4dd
Force-pushed 45495c9 to 9421527
thanks!
pkg/kubectl/cmd/drain.go (outdated)

```
@@ -571,39 +572,41 @@ func (o *DrainOptions) deleteOrEvictPods(pods []corev1.Pod) error {
	}
}

// evictPods return
```
? :-)
pkg/kubectl/cmd/drain.go (outdated)

```
	}
case <-globalTimeoutCh:
	return fmt.Errorf("Drain did not complete within %v", globalTimeout)
	return utilerrors.NewAggregate(errors)
```
You're losing the timeout error information here, which means if it times out before the first error is reported, that's going to become a success. Maybe consider just returning timeout error in that case (as it was done before)?
good catch on both. fixed.
Force-pushed 9421527 to d8322d0
pkg/kubectl/cmd/drain.go (outdated)

```
	}
case <-globalTimeoutCh:
	return fmt.Errorf("Drain did not complete within %v", globalTimeout)
}
if doneCount == numPods {
```
This condition could have been part of the "for" statement: for doneCount < numPods { ... }.
Thanks for fixing quickly! Feel free to fix the "while" loop (now or later).
Force-pushed d8322d0 to 5b4770e
@apelisse I fixed the for loop; this will need one more approval. Thank you!
It'd be great if I could leave all my comments once and for all :-). How hard would it be to add tests?

/lgtm
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: apelisse, rphillips, sjenning. The full list of commands accepted by this bot can be found here. The pull request process is described here.

Needs approval from an approver in each of these files. Approvers can indicate their approval by writing
Automatic merge from submit-queue (batch tested with PRs 65776, 64896). If you want to cherry-pick this change to another branch, please follow the instructions here.

**What this PR does / why we need it**: This fixes `kubectl drain` to wait until all errors and successes are processed, instead of returning on the first error. It also tweaks the cleanup to check whether a pod is already terminating and, if it is, to not reissue the termination, which previously caused an error. This fix allows `kubectl drain` to complete successfully when a node is draining.

**Which issue(s) this PR fixes** (optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged): Fixes #

**Special notes for your reviewer**:

/cc @sjenning

**Release note**:

```release-note
NONE
```

**Reproduction steps** (`sleep.yml`):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: bash
spec:
  containers:
  - name: bash
    image: bash
    resources:
      limits:
        cpu: 500m
        memory: 500Mi
    command:
    - bash
    - -c
    - "nothing() { sleep 1; } ; trap nothing 15 ; while true; do echo \"hello\"; sleep 10; done"
  terminationGracePeriodSeconds: 3000
  restartPolicy: Never
```

```
$ kubectl create ns testing
$ kubectl create -f sleep.yml
$ kubectl delete ns testing
$ kubectl drain 127.0.0.1 --force
```