kubectl --force deletion is breaking upgrade tests #37117
cc @saad-ali
The new kubectl behavior introduced in PR #35484 is causing automated upgrade tests to fail.
@foxish I wish it were as easy as just back-porting the addition of…
@krousey Why are we running 1.3 upgrade tests on 1.5? kubectl officially only supports 1 minor release of version skew.
@bgrant0607 My understanding from the test description is that these are 1.3 e2e tests shelling out to a 1.5 kubectl communicating with a 1.5 master. There shouldn't be a version skew between kubectl and the master.
So the new behavior was intentional, because the old behavior can result in data loss in stateful sets (or data loss in any component like a DB running on Kube). The problem with continuing to allow the behavior of existing scripts is that they are doing something very dangerous, and that's not ok.
The alternatives discussed are:

- Grace-period=0 is not something people should be doing. If we recognize they should not be doing it, breaking scripts deliberately to force this behavior is appropriate.
- 1.3 kubectl with 1.5 server would continue to behave without change.
@smarterclayton I'm fine with this breaking change if it's preventing bad behavior. This broke a script (in the form of 1.3 e2e tests). This somehow needs to be fixed so that it can work with a 1.3, 1.4, and 1.5 kubectl.
One option is to grep the help text for "--force" in an if block (I think we did that somewhere else). I really believe that we have to consciously break this one scripting experience because it has such a dramatic impact on clusters.
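The grep-the-help-text idea could look roughly like the sketch below. This is only an illustration, not code from the actual e2e scripts: the `supports_force` and `DELETE_FLAGS` names are invented, and the help-text samples are stand-ins for the real output of `kubectl delete --help` from each release.

```shell
#!/bin/sh
# Sketch: pick delete flags based on whether the local kubectl's help text
# advertises --force. In real use, the help text would come from
# `kubectl delete --help`; here it is captured in variables for illustration.

supports_force() {
  # $1: help text of `kubectl delete`
  printf '%s' "$1" | grep -q -- '--force'
}

# Invented samples standing in for the help output of each kubectl version.
HELP_15='--force=false: Immediate deletion of some resources may result in inconsistency'
HELP_13='--grace-period=-1: Period of time in seconds given to the resource'

if supports_force "$HELP_15"; then
  DELETE_FLAGS='--grace-period=0 --force'   # 1.5+ requires --force
else
  DELETE_FLAGS='--grace-period=0'           # 1.3/1.4 behavior
fi
echo "$DELETE_FLAGS"
```

The same script would then pass `$DELETE_FLAGS` to every `kubectl delete` invocation, so one binary check covers 1.3, 1.4, and 1.5 clients.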
The release notes typically have a "Known behavior changes" section where we can add this. |
The release note from the mentioned PR definitely addresses the change; agreed, it belongs in the behavior changes section.
@smarterclayton or @foxish could one of you take the time to implement a fix for the e2es? |
I will. Want Brian's agreement or disagreement on the statement that data…
ping @bgrant0607 |
@smarterclayton The proposal was #34160? A better solution would be to only enforce this requirement upon pods that really need this behavior, so that existing acceptable uses wouldn't break. Somewhat related: #10179.

I don't really like the proposed alternatives, but it's too late to implement a proper solution in the API. Assuming that we have no choice but to go forward with --force, in addition to an action-required release note, please also ensure user docs are updated, along with relevant stackoverflow posts, such as: http://stackoverflow.com/questions/35453792/pods-stuck-at-terminated-status
1.5 known issues are being tracked here: #37134 Assuming this behavior is kept for 1.5 it should be added there. |
I don't think we should leave it up to the user to opt out of split brain. Another alternative is to make --grace-period=0 send grace period 1 but wait until the pod is deleted to continue.
@smarterclayton I like your latest suggestion: "Another alternative is to make --grace-period=0 send grace period 1 but wait until the pod is deleted to continue." |
I'll look at what that takes. We could wait forever, which is surprising…
@smarterclayton If the command timed out and failed in the "forever" case, it would be less surprising to the user when they tried to create another resource with the same name and failed. |
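The "wait with a timeout instead of forever" idea could be sketched as below. Everything here is hypothetical: `resource_exists` is a stub standing in for a real existence check such as `kubectl get pod "$name"`, and the poll counts are invented so the sketch runs on its own.

```shell
#!/bin/sh
# Sketch: after a graceful delete, poll until the resource is gone or a
# timeout elapses, so the command fails loudly instead of hanging forever.

REMAINING=3  # pretend the pod survives three polls, then disappears
resource_exists() {
  # Stub standing in for: kubectl get pod "$1" >/dev/null 2>&1
  [ "$REMAINING" -gt 0 ] && REMAINING=$((REMAINING - 1))
}

wait_for_deletion() {
  name=$1 timeout=$2 elapsed=0
  while resource_exists "$name"; do
    if [ "$elapsed" -ge "$timeout" ]; then
      echo "timed out waiting for $name to be deleted" >&2
      return 1
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
  echo "$name deleted"
}

wait_for_deletion mypod 10
```

With a timeout, the failure mode becomes an explicit error the caller can react to, which matches the point above about it being less surprising when a later create with the same name fails.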
#37263 contains the second proposal. I didn't realize what bad shape delete was in (we don't respect grace-period when cascade is false), so the low-risk fix doesn't cover all the possible scenarios. Also, we don't have access to the existing object, so we can't do UID checks (which means we may end up waiting forever if someone recreates the same resource).
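The UID check mentioned above might look like this sketch. The UIDs and the `current_uid` stub are invented for illustration; the stub stands in for something like `kubectl get pod "$1" -o jsonpath='{.metadata.uid}'`. The idea is that a recreated object keeps its name but gets a fresh `metadata.uid`, so comparing UIDs distinguishes "still deleting" from "already recreated".

```shell
#!/bin/sh
# Sketch: detect that an object with the same name was recreated by
# comparing the UID recorded before the delete with the current UID.

ORIG_UID='11111111-aaaa'   # invented; recorded before issuing the delete
current_uid() {
  # Stub standing in for:
  #   kubectl get pod "$1" -o jsonpath='{.metadata.uid}'
  echo '22222222-bbbb'     # invented; the server now holds a new object
}

if [ "$(current_uid mypod)" = "$ORIG_UID" ]; then
  echo 'still the original object; keep waiting'
else
  echo 'object was recreated with the same name; stop waiting'
fi
```

As the comment above notes, this check needs the initial object loaded before deletion, which is why it was out of scope for the low-risk 1.5 fix.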
Folks with an opinion, please review the attached PR. |
Thanks for the update @smarterclayton ! |
Automatic merge from submit-queue

When --grace-period=0 is provided, wait for deletion

The grace period is automatically set to 1 unless --force is provided, and the client waits until the object is deleted. This preserves backwards compatibility with 1.4 and earlier. It does not handle scenarios where the object is deleted and a new object is created with the same name, because we don't have the initial object loaded (and that's a larger change for 1.5).

Fixes #37117 by relaxing the guarantees provided.

```release-note
When deleting an object with `--grace-period=0`, the client will begin a graceful deletion and wait until the resource is fully deleted. To force deletion, use the `--force` flag.
```
In 1.5, kubectl now requires --force when --grace-period=0 is used for deletion. This causes 1.3 e2e tests to fail when they run with a 1.5 kubectl in the upgrade tests. The requirement was added in #35484. See https://k8s-gubernator.appspot.com/build/kubernetes-jenkins/logs/kubernetes-e2e-gke-container_vm-1.3-container_vm-1.5-upgrade-cluster/143?log