
Fix ApplyTask patch behavior #1332

Merged: 16 commits, Feb 28, 2020

Conversation

@ANeumann82 (Member) commented Feb 4, 2020

What this PR does / why we need it:
The ApplyTask patch method currently uses a StrategicMerge patch, which by default merges some lists, for example, containers in a Deployment-PodTemplate. This makes it impossible to have a conditional container in an operator.

Added a test harness test.

Fixes #1286

Issue
The Big Problem is that applying new versions of rendered resources from KUDO to the cluster is not as easy as it seems: We generally render full resources from the OperatorVersion templates, but there is no easy way to update existing resources in the cluster:

Patch with StrategicMerge
This is the solution we use at the moment. The issue is that a strategic merge does not replace certain lists - for example, the list of containers in a PodTemplate is merged, not replaced, by default. Whether a list (or map) is merged or replaced depends on the annotations in the resource types:

	// +patchMergeKey=name
	// +patchStrategy=merge
	Containers []Container `json:"containers" patchStrategy:"merge" patchMergeKey:"name" protobuf:"bytes,2,rep,name=containers"`

It is possible to override this behavior for strategic merges: https://www.disasterproject.com/kubernetes-kubectl-patching/
We can add a $patch: replace entry to a list or a map to replace it instead of merging it. But we can't add these entries to all maps and lists, as that would trigger the replacement of immutable fields, which leads to an error from the k8s API. We would have to maintain a list of all the places where these patch entries should be added.
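For illustration, here is a minimal sketch of how the $patch: replace directive changes the merge result. This is not part of the PR; it only assumes the k8s.io/api and k8s.io/apimachinery modules:

	package main

	import (
		"fmt"

		corev1 "k8s.io/api/core/v1"
		"k8s.io/apimachinery/pkg/util/strategicpatch"
	)

	func main() {
		// The resource currently in the cluster: a pod spec with two containers.
		original := []byte(`{"spec":{"containers":[{"name":"main","image":"app:1"},{"name":"sidecar","image":"side:1"}]}}`)

		// Plain strategic merge patch: because Containers is annotated with
		// patchStrategy=merge / patchMergeKey=name, the omitted "sidecar"
		// container survives the patch.
		merged, err := strategicpatch.StrategicMergePatch(original,
			[]byte(`{"spec":{"containers":[{"name":"main","image":"app:2"}]}}`),
			corev1.Pod{})
		if err != nil {
			panic(err)
		}
		fmt.Println(string(merged)) // still contains "sidecar"

		// With the $patch: replace directive in the list, the whole container
		// list is replaced and "sidecar" is removed.
		replaced, err := strategicpatch.StrategicMergePatch(original,
			[]byte(`{"spec":{"containers":[{"$patch":"replace"},{"name":"main","image":"app:2"}]}}`),
			corev1.Pod{})
		if err != nil {
			panic(err)
		}
		fmt.Println(string(replaced)) // "sidecar" is gone
	}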

Replace
We could replace resources instead of patching them. This approach does not work, as a lot of resources have immutable fields that can't be updated - for example ServiceSpec.ClusterIP. As this field is usually filled in by k8s, it is not set when we generate the resource from the KUDO template. A replace would then fail because ClusterIP would be updated to "", which is not allowed.

It might be possible to Force-Replace. This allows us to update immutable fields by doing a delete/create instead of a replace; the downside is that it would, for example, assign a new ClusterIP and be a disruptive update (https://kubernetes.io/docs/concepts/cluster-administration/manage-deployment/#disruptive-updates)

ServerSideApply
This should theoretically work best. All creations and updates should be done via:

	c.Patch(context.TODO(), newObj, client.Apply, client.FieldOwner("KUDO"))

and k8s should figure out what actually changed and apply it. There are some problems with the integration tests (no darwin binaries for kube-apiserver > 15.5), and bugs in the server-side apply feature itself. Might be an option for the future.

Client Side Threeway-Diff
This is the only real way to do this at the moment.
The first approach was to reuse code from kubectl apply, but that fails for various reasons: kubectl isn't really meant to be used as a library.

I rewrote the code along the lines of kubectl apply; it still needs some polishing, but it should work for now.
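For context, a client-side three-way strategic merge in the spirit of kubectl apply can be sketched roughly as below. This is not the PR's actual implementation - the helper name is made up - but the annotation key matches the kudo.dev/last-applied-configuration annotation this PR introduces, and the strategicpatch calls come from k8s.io/apimachinery:

	package task

	import (
		"k8s.io/apimachinery/pkg/util/strategicpatch"
	)

	// The previously applied configuration is stored in this annotation on the resource.
	const lastAppliedConfigAnnotation = "kudo.dev/last-applied-configuration"

	// threeWayStrategicPatch is a hypothetical helper: "original" is the last
	// applied configuration read from the annotation, "modified" is the newly
	// rendered resource, and "current" is the live object from the cluster.
	func threeWayStrategicPatch(original, modified, current []byte, dataStruct interface{}) ([]byte, error) {
		lookup, err := strategicpatch.NewPatchMetaFromStruct(dataStruct)
		if err != nil {
			return nil, err
		}
		// overwrite=true resolves conflicting fields in favor of the newly
		// rendered resource, mirroring the `kubectl apply` default.
		return strategicpatch.CreateThreeWayMergePatch(original, modified, current, lookup, true)
	}

The resulting patch would then be sent with a strategic-merge patch type (or a plain JSON merge patch for types without strategic-merge metadata), e.g. via controller-runtime's client.RawPatch(types.StrategicMergePatchType, patch).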

Changed task_apply to use update instead of patch

Signed-off-by: Andreas Neumann <aneumann@mesosphere.com>
More testing with strategic merge patches

Signed-off-by: Andreas Neumann <aneumann@mesosphere.com>
Implemented correct three way merge for apply task

Signed-off-by: Andreas Neumann <aneumann@mesosphere.com>
Use kudo.dev annotation for lastAppliedConfig

Signed-off-by: Andreas Neumann <aneumann@mesosphere.com>

# Conflicts:
#	pkg/engine/task/task_apply.go
Signed-off-by: Andreas Neumann <aneumann@mesosphere.com>
@zen-dog zen-dog changed the title from "Fix Task_Apply patch behavior" to "Fix ApplyTask patch behaviour" Feb 11, 2020
@zen-dog zen-dog changed the title from "Fix ApplyTask patch behaviour" to "Fix ApplyTask patch behavior" Feb 11, 2020
@zen-dog (Contributor) left a comment

Nice work. I left a bunch of nits (mostly around naming) and one question about what happens with existing operators after people upgrade. PTAL

Review threads (resolved) on pkg/engine/task/task_apply.go, pkg/engine/task/task_delete.go, and pkg/util/kudo/labels.go
A Contributor commented on this line:

	return nil, nil
	}

	original, ok := annots[kudo.LastAppliedConfigAnnotation]

I got stuck at this line. Are we backwards compatible after this is merged? In a scenario where someone re-runs a deploy plan for an existing operator (after updating KUDO), all plan resources will be patched, correct? However, there is no LastAppliedConfigAnnotation, so it will return nil, nil and the subsequent merges will... fail?

@ANeumann82 (Member Author) replied:

Yeah, this is a problem. I'll try to figure something out about this...

@ANeumann82 (Member Author) commented:

This PR introduces a backwards incompatible change:

When used on an already installed instance of an operator, the deployed resources are missing the new "kudo.dev/last-applied-configuration" annotation, which makes the patch process fail, as no originalConfiguration can be found for the ThreeWayMerge.

There are multiple options to work around this issue, all of which are a bit problematic:
(1) Bail out on old deployed Instances - This is the current state after this PR (see the sketch below)
(2) Automatically Delete/Recreate the resource - This is problematic, as we would make a disruptive update without the user explicitly acknowledging or allowing it.
(2a) Allow the user to specify a --force flag when using k kudo update and then use Delete/Recreate. This is detailed in #1335 and is probably the most usable approach.
(3) Patch the resources as we did before this PR - Not a good solution, as we would leave the deployed instances in weird, unknown states, which doesn't solve anything.
(4) Calculate the originalConfiguration from the old parameters and metadata of the Instance - This would theoretically work, but it is a lot of effort and probably error-prone, as there are a lot of edge cases and things to consider.
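To make option (1) concrete, a purely hypothetical sketch - the helper name and error text are made up, while the annotation lookup mirrors the annots[kudo.LastAppliedConfigAnnotation] line quoted above:

	package task

	import (
		"fmt"

		metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	)

	const lastAppliedConfigAnnotation = "kudo.dev/last-applied-configuration"

	// lastAppliedConfig returns the stored original configuration, or a
	// non-transient error when the resource was deployed by an older KUDO
	// version and the annotation is missing (option 1: bail out).
	func lastAppliedConfig(obj metav1.Object) ([]byte, error) {
		annots := obj.GetAnnotations()
		original, ok := annots[lastAppliedConfigAnnotation]
		if !ok {
			return nil, fmt.Errorf("resource %s has no %s annotation; it must be recreated before it can be patched",
				obj.GetName(), lastAppliedConfigAnnotation)
		}
		return []byte(original), nil
	}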

@zen-dog (Contributor) left a comment:

We need to announce/changelog the breaking change. Otherwise, LGTM

@gerred (Member) commented Feb 12, 2020:

We need to think about this one. It's one thing to make breaking changes without some form of automated migration on the operator development side; it's another thing entirely to do something that can cause data loss. Can we introduce some sort of migration to handle this in the future? Or, while noting this, can we at least come up with a clear way for any existing users of our operators to preserve their data? Even if we don't have any users now, this is a good practice to establish early.

I understand there's probably no great way to actually perform a migration of existing instances though, and overall the PR itself LGTM.

@ANeumann82 (Member Author) commented:

@gerred I have a proposal to use --force on a parameter update, see #1335. That would allow users to continue using their existing installation, at the downside of having one disruptive update of the operator.

As long as the operator does not manually create a PV, that should not lead to any data loss.

@zen-dog (Contributor) commented Feb 12, 2020:

it's another thing entirely to do something that can cause data loss.

In the proposed implementation, the KUDO manager won't find the needed annotation, the three-way merge will fail, and so will the step/plan (a transient ERROR, if I'm correct). So no data loss should occur, @gerred. A human operator, however, will have to recreate the Instance to move on.

@ANeumann82 (Member Author) commented:

@zen-dog Not a transient error anymore. In my last commit I changed a couple of the errors to fatalExecutionError, as they won't recover.
Apart from that, you're correct: no data loss (but no change to the deployed resource is possible either).

@gerred (Member) commented Feb 12, 2020:

Right, but we're telling them the only way to move forward is to re-create the instance. Deleting the instance will destroy the existing instance. It might not get rid of PVs, but we're now in a manual situation where data loss can occur if I don't manually ensure that my PV is used or backed up/restored, right?

@zen-dog (Contributor) commented Feb 12, 2020:

True, this is exactly like any prior breaking-change situation in which instances had to be recreated for the new KUDO version.

@ANeumann82 (Member Author) commented:

@gerred As you already mentioned, this would be a prime example for a migration process. I'll take that chance and prioritise working on KUDO upgrades and try to get a migration for this in.

@ANeumann82 ANeumann82 added the release/breaking-change label ("This PR contains breaking changes and is marked in the release notes") Feb 28, 2020
@ANeumann82 ANeumann82 merged commit 8afbb97 into master Feb 28, 2020
@ANeumann82 ANeumann82 deleted the an/update-instead-of-patch branch February 28, 2020 08:35
runyontr pushed a commit that referenced this pull request Mar 11, 2020
ApplyTask now uses correct ThreeWayMerges (either plain JSON or Strategic K8s merges) to apply the task, the same way `kubectl apply` does.

It stores the last applied version of the resource in an annotation of the resource so that the merge can be done correctly on the next apply.

As already-applied resources do not have this annotation, the ApplyTask cannot calculate the correct patch and fails; this is the breaking change.

Signed-off-by: Andreas Neumann <aneumann@mesosphere.com>
Signed-off-by: Thomas Runyon <runyontr@gmail.com>
Labels
needs review; release/breaking-change (This PR contains breaking changes and is marked in the release notes)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Removal of field in update is not applied
4 participants