
Use --force flag for reconciling addons #69424

Closed
wants to merge 1 commit

Conversation

kawych
Contributor

@kawych kawych commented Oct 4, 2018

What this PR does / why we need it:
Addon Manager should use the kubectl --force flag when reconciling addons. Without it, the following scenario breaks the cluster:

  • kubectl delete on any resource with immutable fields, e.g. the kube-dns service
  • immediately kubectl create your own resource with the same name but a different value for the immutable field, for example a kube-dns service with ClusterIP = 10.35.240.12

After that, Addon Manager fails to reconcile this resource. A failure in this step leads to further failures; for example, Addon Manager doesn't attempt to prune unwanted addons.
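
A minimal reproduction sketch of the scenario above (the namespace, manifest file name, and ClusterIP value are illustrative assumptions, not taken from the PR):

# Delete an addon resource that has an immutable field (ClusterIP on a Service).
kubectl delete service kube-dns -n kube-system
# Immediately recreate it with a different ClusterIP, before Addon Manager reacts.
kubectl create -f my-kube-dns-service.yaml   # spec.clusterIP: 10.35.240.12
# From now on, a plain `kubectl apply` of the original addon manifest fails with an
# "field is immutable" style error, and the Addon Manager reconcile run breaks.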

Release note:

Fix Addon Manager failure to reconcile addons with immutable fields.
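
For context, the fix is essentially adding --force to the addon manager's kubectl apply invocation; a rough sketch of that kind of call (the path and the flags other than --force are illustrative, not the literal kube-addons.sh line):

kubectl apply -f /etc/kubernetes/addons \
    --prune=true -l addonmanager.kubernetes.io/mode=Reconcile \
    --force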

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-kind Indicates a PR lacks a `kind/foo` label and requires one. needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Oct 4, 2018
@MrHohn
Member

MrHohn commented Oct 4, 2018

Hmm, good observation. What does adding the --force flag mean for those immutable fields? Would we get into a situation where the apiserver forces the API objects to be changed, but controllers fail to operate because of the invalid state?

/assign @mikedanese
/kind bug

@k8s-ci-robot k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. and removed needs-kind Indicates a PR lacks a `kind/foo` label and requires one. labels Oct 4, 2018
@MrHohn
Member

MrHohn commented Oct 4, 2018

/assign

@kawych
Contributor Author

kawych commented Oct 5, 2018

I've tested this; it also requires a newer version of kubectl. I have one more problem with it: if Addon Manager finds the kube-dns service with an incorrect ClusterIP (an immutable field), instead of re-applying it immediately, the kube-dns service gets pruned and recreated only in the next Addon Manager run.

@MrHohn The behavior I expect is for the resource to be re-created with the correct configuration.
"Would we get into a situation where the apiserver forces the API objects to be changed, but controllers fail to operate because of the invalid state?" - I didn't fully understand. Do you mean:
(1) If a cluster user forces addon changes, can it cause controllers to fail?
(2) If Addon Manager forces addon changes, can it cause controllers to fail?
For (1) I don't know, maybe it's possible. For (2), I believe not: any difference in addon configuration is probably caused by a cluster user applying/forcing their own changes, and I believe Addon Manager should revert those changes.

@MrHohn
Member

MrHohn commented Oct 6, 2018

@kawych Thanks for the explanation. I wasn't sure what --force means for kubectl apply, but now it is clear (also took a look at #66602).

I was asking whether kubectl apply --force would force an in-place update of the API object, in which case the controller may not know what to do when that happens, but it seems that is not the case.
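
In other words (my understanding of the behavior, see also the patcher snippet quoted later in this thread): on a conflict such as an immutable-field change, apply --force does not patch the live object in place; it falls back to something roughly equivalent to a delete followed by a create of the desired state:

# Conceptually what `kubectl apply --force` does when the patch hits an
# immutable-field conflict (sketch, not the literal implementation):
kubectl delete service kube-dns -n kube-system
kubectl create -f kube-dns-service.yaml   # the addon's desired manifest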

Member

@MrHohn MrHohn left a comment


/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 6, 2018
@MrHohn
Member

MrHohn commented Oct 6, 2018

I have one more problem with it: if Addon Manager finds the kube-dns service with an incorrect ClusterIP (an immutable field), instead of re-applying it immediately, the kube-dns service gets pruned and recreated only in the next Addon Manager run.

That means a ~1 minute gap between deletion and recreation; guessing that's not too bad?
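
The ~1 minute presumably comes from the addon manager's periodic loop; a sketch of that shape (the interval variable name is quoted from memory and should be treated as an assumption):

# kube-addons.sh style reconcile loop (sketch):
ADDON_CHECK_INTERVAL_SEC=${TEST_ADDON_CHECK_INTERVAL_SEC:-60}
while true; do
  reconcile_addons   # hypothetical helper wrapping the kubectl apply calls
  sleep "${ADDON_CHECK_INTERVAL_SEC}"
done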

@kawych
Contributor Author

kawych commented Oct 8, 2018

/retest

@kawych
Contributor Author

kawych commented Oct 8, 2018

/cc @mikedanese

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 21, 2018
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 19, 2019
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Feb 18, 2019
@k8s-ci-robot k8s-ci-robot removed lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Mar 19, 2019
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: kawych
To fully approve this pull request, please assign additional approvers.
We suggest the following additional approver: mikedanese

If they are not already assigned, you can assign the PR to them by writing /assign @mikedanese in a comment when ready.

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@kawych
Contributor Author

kawych commented Mar 19, 2019

@mikedanese PTAL

@MrHohn
Member

MrHohn commented Mar 21, 2019

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 21, 2019
@kawych
Contributor Author

kawych commented Mar 22, 2019

/retest

@mikedanese
Member

I'm looking at the kubectl apply flags and I see this:

--force=false: Only used when grace-period=0. If true, immediately remove resources from API and bypass graceful deletion. Note that immediate deletion of some resources may result in inconsistency or data loss and requires confirmation.
  1. This doesn't sound like the behavior you are describing.
  2. It requires confirmation?

From the code, it looks like this does what you describe. But this looks really scary to me:

// In kubectl's apply patcher: if the patch failed with a conflict or an
// invalid-object error and --force was given, fall back to deleting the live
// object and recreating it from the desired configuration.
if err != nil && (errors.IsConflict(err) || errors.IsInvalid(err)) && p.Force {
	patchBytes, patchObject, err = p.deleteAndCreate(current, modified, namespace, name)
}

What happens if we push an object config that is invalid in some clusters? What if a customer accidentally misconfigures an admission webhook and we start getting invalid requests on applies? It seems like we would just start deleting objects and recreating them.

Even more likely, what if the deployment controller updates a deployment at the same time we run apply and get a conflict? Wouldn't we delete the deployment, wait for it to fully delete, then recreate it?

cc @apelisse

@kawych
Contributor Author

kawych commented Mar 25, 2019

/remove-lifecycle rotten

What if a customer accidentally misconfigures an admission webhook and we start getting invalid requests on applies? It seems like we would just start deleting objects and recreating them.

I think the same behavior happens for a normal apply as well, with the exception of objects with immutable fields, only because a normal kubectl apply will fail for those.

Even more likely, what if the deployment controller updates a deployment at the same time we run apply and get a conflict? Wouldn't we delete the deployment, wait for it to fully delete, then recreate it?

I'm not sure if this is possible; if so, it does indeed seem like an issue. However, the desired outcome would be not to simply skip reconciling, but to pick up the updated deployment and reconcile it, right? The code you linked includes retries for the patch operation; shouldn't that resolve the problem?

Note the problem I'm trying to address here: Addon Manager fails to reconcile some addons and the reconcile loop breaks. I believe the right solution is to force reconciliation of objects with immutable fields. If that is too risky, we have to think of another solution, for example logic that resumes the reconcile loop so that a failing kubectl apply on a single object doesn't affect all the other objects.
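
A rough sketch of that kind of "keep going" logic, assuming the addon manager applied manifests one by one from a directory (the path and loop are hypothetical; as I understand it the real script applies everything in a single kubectl apply --prune call, so this would change pruning semantics):

# Apply each addon manifest independently so one failure doesn't abort the whole run.
for manifest in /etc/kubernetes/addons/*.yaml; do
  if ! kubectl apply -f "${manifest}"; then
    echo "failed to reconcile ${manifest}, continuing with remaining addons" >&2
  fi
done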

@k8s-ci-robot k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Mar 25, 2019
@apelisse
Member

I'm not an expert on client-side apply; adding the people who wrote/reviewed the change: @dixudx @soltysh

@soltysh
Contributor

soltysh commented Mar 29, 2019

--force in apply kicks in when a conflict occurs or in case of invalid resources. So what @mikedanese writes holds true. I don't have a way to verify how this affects the addon manager, but I can say that using --force is very dangerous, and I'd discourage using it in automated scripts.

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 27, 2019
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jul 27, 2019
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@k8s-ci-robot
Contributor

@fejta-bot: Closed this PR.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
