
kubectl --force deletion is breaking upgrade tests #37117

Closed
krousey opened this issue Nov 18, 2016 · 26 comments
Labels
area/kubectl area/test priority/important-soon release-blocker

Comments

@krousey
Contributor

krousey commented Nov 18, 2016

In 1.5, kubectl now requires --force when deleting with --grace-period=0. This is causing the 1.3 e2e tests to fail when they run against a 1.5 kubectl in the upgrade tests. The requirement was added in #35484. See https://k8s-gubernator.appspot.com/build/kubernetes-jenkins/logs/kubernetes-e2e-gke-container_vm-1.3-container_vm-1.5-upgrade-cluster/143?log

Expected error:
    <*errors.errorString | 0xc820019820>: {
        s: "Error running &{/workspace/kubernetes_skew/cluster/kubectl.sh [/workspace/kubernetes_skew/cluster/kubectl.sh --server=https://199.223.235.243 --kubeconfig=/workspace/.kube/config delete --grace-period=0 -f - --namespace=e2e-tests-kubectl-up6nc] []  0xc8207b9b60  error: Immediate deletion does not wait for confirmation that the running resource has been terminated. The resource may continue to run on the cluster indefinitely. You must pass --force to delete with grace period 0.\n [] <nil> 0xc8206fe440 exit status 1 <nil> true [0xc820ea4718 0xc820ea4740 0xc820ea4750] [0xc820ea4718 0xc820ea4740 0xc820ea4750] [0xc820ea4720 0xc820ea4738 0xc820ea4748] [0xa97840 0xa979a0 0xa979a0] 0xc82027b800}:\nCommand stdout:\n\nstderr:\nerror: Immediate deletion does not wait for confirmation that the running resource has been terminated. The resource may continue to run on the cluster indefinitely. You must pass --force to delete with grace period 0.\n\nerror:\nexit status 1\n",
    }
    Error running &{/workspace/kubernetes_skew/cluster/kubectl.sh [/workspace/kubernetes_skew/cluster/kubectl.sh --server=https://199.223.235.243 --kubeconfig=/workspace/.kube/config delete --grace-period=0 -f - --namespace=e2e-tests-kubectl-up6nc] []  0xc8207b9b60  error: Immediate deletion does not wait for confirmation that the running resource has been terminated. The resource may continue to run on the cluster indefinitely. You must pass --force to delete with grace period 0.
     [] <nil> 0xc8206fe440 exit status 1 <nil> true [0xc820ea4718 0xc820ea4740 0xc820ea4750] [0xc820ea4718 0xc820ea4740 0xc820ea4750] [0xc820ea4720 0xc820ea4738 0xc820ea4748] [0xa97840 0xa979a0 0xa979a0] 0xc82027b800}:
    Command stdout:
    
    stderr:
    error: Immediate deletion does not wait for confirmation that the running resource has been terminated. The resource may continue to run on the cluster indefinitely. You must pass --force to delete with grace period 0.
    
    error:
    exit status 1
    
not to have occurred
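
For context, a minimal reproduction of the skew, assuming a pod named mypod (a hypothetical name, not from the test): the first form is what the 1.3 e2e tests effectively run, and it now fails against a 1.5 kubectl, which requires the second form.

```bash
# What the 1.3 e2e tests effectively run; accepted by a 1.3/1.4 kubectl,
# but a 1.5 kubectl exits non-zero with the error shown above.
kubectl delete pod mypod --grace-period=0

# What a 1.5 kubectl requires for immediate deletion with grace period 0.
kubectl delete pod mypod --grace-period=0 --force
```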
@krousey krousey added area/test area/kubectl priority/important-soon labels Nov 18, 2016
@krousey krousey added this to the v1.5 milestone Nov 18, 2016
@krousey
Contributor Author

krousey commented Nov 18, 2016

cc @saad-ali

@saad-ali
Member

The new kubectl behavior introduced in PR #35484 is causing automated upgrade tests to fail.

@krousey
Contributor Author

krousey commented Nov 18, 2016

@foxish I wish it was as easy as just back-porting the addition of --force to the 1.4 and 1.3 e2e tests, but as far as I can tell, the 1.3 kubectl delete didn't have a --force option, so that would just break all the normal 1.3 e2es.

@bgrant0607
Member

@krousey Why are we running 1.3 upgrade tests on 1.5? kubectl officially only supports 1 minor release of version skew.

@krousey
Contributor Author

krousey commented Nov 18, 2016

@bgrant0607 My understanding from the test description

> Deploys a cluster at v{version-old} at {image-old}, upgrades the master to v{version-new} at {image-new}, and runs the v{version-old} tests.

is that these are the 1.3 e2e tests shelling out to a 1.5 kubectl that talks to a 1.5 master. There shouldn't be any version skew between kubectl and the master.

@smarterclayton
Contributor

So the new behavior was intentional, because the old behavior can result in data loss in stateful sets (or data loss in any component, like a DB, running on Kube). The problem with continuing to allow the existing scripts' behavior is that they are doing something very dangerous, and that's not OK.

@smarterclayton
Contributor

The alternatives discussed are:

  1. No change: let hard-coded scripts with grace period 0 continue to lead to split brain and data loss on the cluster.
  2. Have --grace-period=0 actually set a grace period of 1, which would break any script that assumes it can call delete and have the resource immediately gone (so also breaking, but more subtly).

Grace-period=0 is not something people should be doing. If we recognize they should not be doing it, deliberately breaking scripts to force the change is appropriate.

@smarterclayton
Contributor

A 1.3 kubectl with a 1.5 server would continue to behave without change.

@krousey
Contributor Author

krousey commented Nov 18, 2016

@smarterclayton I'm fine with this breaking change if it prevents bad behavior. But it did break a script (the 1.3 e2e tests), and that somehow needs to be fixed so that it works with a 1.3, 1.4, and 1.5 kubectl.

@smarterclayton
Contributor

One option is to grep for "--force" in the help text inside an if block (I think we did that somewhere else).

I really believe that we have to consciously break this one scripting experience because it has such a dramatic impact on clusters.

@smarterclayton
Contributor

```bash
if kubectl delete -h | grep -q force; then
  kubectl delete --force ...
else
  kubectl delete ...
fi
```

is one option.

@pwittrock
Member

The release notes typically have a "Known behavior changes" section where we can add this.

@smarterclayton
Contributor

The release note from the mentioned PR definitely covers the change; agreed that it should go in the behavior-change section.

@krousey
Contributor Author

krousey commented Nov 18, 2016

@smarterclayton or @foxish could one of you take the time to implement a fix for the e2es?

@smarterclayton
Contributor

I will. I want Brian's agreement or disagreement on the statement that data integrity overrides backward compatibility.

@foxish
Contributor

foxish commented Nov 18, 2016

ping @bgrant0607

@bgrant0607
Member

@smarterclayton The proposal was #34160?

A better solution would be to enforce this requirement only for pods that really need this behavior, so that existing acceptable uses wouldn't break.

Somewhat related: #10179.

I don't really like the proposed alternatives, but it's too late to implement a proper solution in the API.

Assuming that we have no choice but to go forward with --force, then in addition to an action-required release note, please also ensure that the user docs are updated, as well as relevant Stack Overflow posts, such as:

http://stackoverflow.com/questions/35453792/pods-stuck-at-terminated-status

@saad-ali
Member

saad-ali commented Nov 19, 2016

1.5 known issues are being tracked here: #37134

Assuming this behavior is kept for 1.5, it should be added there.

@smarterclayton
Contributor

smarterclayton commented Nov 19, 2016

I don't think we should leave it up to the user to opt out of split brain and data loss.

Another alternative is to make --grace-period=0 send grace period 1 but wait until the pod is deleted to continue. If you think forcing users to understand that grace-period=0 is dangerous is too far (even given safety first), that would potentially reduce the impact.
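
A rough sketch of what that alternative would look like from a caller's point of view (illustrative only, with a hypothetical pod name; not code from any PR): the client sends a short grace period and then blocks until the object is actually gone.

```bash
# Illustrative approximation of the proposed semantics for --grace-period=0:
# begin a graceful deletion with the minimum non-zero grace period...
kubectl delete pod mypod --grace-period=1

# ...then wait until the pod no longer exists before continuing.
while kubectl get pod mypod >/dev/null 2>&1; do
  sleep 1
done
```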

@smarterclayton
Contributor

The proposal was #34160, but the discussion in #29033 covers the guarantees for users.

@bgrant0607
Member

@smarterclayton I like your latest suggestion: "Another alternative is to make --grace-period=0 send grace period 1 but wait until the pod is deleted to continue."

@smarterclayton
Contributor

I'll look at what that takes. We could wait forever, which is surprising (and still a behavior change), so it's not a perfect fix.

@bgrant0607
Member

@smarterclayton If the command timed out and failed in the "forever" case, it would be less surprising to the user when they tried to create another resource with the same name and failed.

@smarterclayton
Contributor

#37263 contains the second proposal. I didn't realize what bad shape delete was in (we don't respect grace-period when cascade is false), so the low-risk fix doesn't cover all the possible scenarios. Also, we don't have access to the existing object, so we can't do UID checks (which means we may end up waiting forever if someone recreates the same resource).
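
To illustrate why the missing UID check matters (a hypothetical shell sketch, not code from #37263): polling by name alone cannot tell the original object apart from a new one created with the same name, while recording the UID up front would.

```bash
# Hypothetical illustration: remember the UID before deleting so that a
# recreated object with the same name is not mistaken for the original.
original_uid=$(kubectl get pod mypod -o jsonpath='{.metadata.uid}')
kubectl delete pod mypod --grace-period=1

# Stop waiting once the name is gone, or once the UID differs (recreated).
while current_uid=$(kubectl get pod mypod -o jsonpath='{.metadata.uid}' 2>/dev/null); do
  [ "$current_uid" != "$original_uid" ] && break
  sleep 1
done
```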

@smarterclayton
Contributor

Folks with an opinion, please review the attached PR.

@dims
Member

dims commented Nov 23, 2016

Thanks for the update @smarterclayton!

k8s-github-robot pushed a commit that referenced this issue Nov 30, 2016
Automatic merge from submit-queue

When --grace-period=0 is provided, wait for deletion

The grace-period is automatically set to 1 unless --force is provided, and the client waits until the object is deleted.

This preserves backwards compatibility with 1.4 and earlier. It does not handle scenarios where the object is deleted and a new object is created with the same name because we don't have the initial object loaded (and that's a larger change for 1.5).

Fixes #37117 by relaxing the guarantees provided.

```release-note
When deleting an object with `--grace-period=0`, the client will begin a graceful deletion and wait until the resource is fully deleted.  To force deletion, use the `--force` flag.
```
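
Based on the release note above, the resulting behavior can be pictured roughly like this (hypothetical pod name, for illustration):

```bash
# Starts a graceful deletion and blocks until the resource is fully deleted.
kubectl delete pod mypod --grace-period=0

# Skips the wait and forces immediate deletion.
kubectl delete pod mypod --grace-period=0 --force
```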