Fix ApplyTask patch behavior #1332
Conversation
Changed task_apply to use update instead of patch Signed-off-by: Andreas Neumann <aneumann@mesosphere.com>
More testing with strategic merge patches Signed-off-by: Andreas Neumann <aneumann@mesosphere.com>
Implemented correct three way merge for apply task Signed-off-by: Andreas Neumann <aneumann@mesosphere.com>
Use kudo.dev annotation for lastAppliedConfig Signed-off-by: Andreas Neumann <aneumann@mesosphere.com>
Signed-off-by: Andreas Neumann <aneumann@mesosphere.com> # Conflicts: # pkg/engine/task/task_apply.go
Nice work. I left a bunch of nits (mostly around naming) and one question about what happens with existing operators after people upgrade. PTAL
return nil, nil
}

original, ok := annots[kudo.LastAppliedConfigAnnotation]
I got stuck at this line. Are we backwards compatible after this is merged? In a scenario where someone re-runs a deploy plan for an existing operator (after updating KUDO), all plan resources will be patched, correct? However, there is no LastAppliedConfigAnnotation, so it will return nil, nil and the subsequent merges will... fail?
Yeah, this is a problem. I'll try to figure something out about this...
This PR introduces a backwards incompatible change: When used on an already installed instance of an operator, the deployed resources are missing the new "kudo.dev/last-applied-configuration" annotation, which will fail the patch process, as no originalConfiguration can be found for the ThreeWayMerge. There are multiple options to work around this issue, all of which are a bit problematic:
We need to announce/changelog the breaking change. Otherwise, LGTM
We need to think about this one. It's one thing to make breaking changes without some form of automated migration on the operator-development side; it's another thing entirely to do something that can cause data loss. Can we introduce some sort of migration to handle this for the future? Or can we at least come up with, while noting this, a clear way for any existing users of our operators to preserve their data? Even if we don't have any users now, this is a good practice to establish early. I understand there's probably no great way to actually perform a migration of existing instances though, and overall the PR itself LGTM.
@gerred I have a proposal to use As long as the operator does not manually create a PV, that should not lead to any data loss.
In the proposed implementation, the KUDO manager won't find the needed annotation, the three-way merge will fail, and so will the step/plan (a transient error).
@zen-dog Not a transient error anymore. In my last commit I changed a couple of the errors to
Right, but we're telling them the only way to move forward is to re-create the instance. Deleting the instance will destroy the existing instance. It might not get rid of PVs, but we're now in a manual situation where data loss can occur if I don't manually ensure that my PV is used or backed up/restored, right?
True, this is exactly like any prior breaking-change situation where instances had to be recreated in the new KUDO version.
@gerred As you already mentioned, this would be a prime example for a migration process. I'll take that chance and prioritise working on KUDO upgrades and try to get a migration for this in.
ApplyTask now uses correct ThreeWayMerges (either plain JSON or strategic K8s merges) to apply the task, the same way `kubectl apply` does. It stores the last applied version of the resource in an annotation on the resource so that the merge can be done correctly on the next apply. As already-applied resources do not have this annotation, the ApplyTask cannot calculate the correct patch and fails; this is the breaking change. Signed-off-by: Andreas Neumann <aneumann@mesosphere.com> Signed-off-by: Thomas Runyon <runyontr@gmail.com>
What this PR does / why we need it:
The ApplyTask patch method currently uses a StrategicMerge patch, which by default merges some lists, for example, containers in a Deployment-PodTemplate. This makes it impossible to have a conditional container in an operator.
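To illustrate the problem, here is a minimal sketch (image names and the Pod spec are made up; the helper from k8s.io/apimachinery is only used to demonstrate the default merge behavior):

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/util/strategicpatch"
)

func main() {
	// The live object carries two containers; the newly rendered template has
	// only one, because the optional sidecar was disabled.
	live := []byte(`{"spec":{"containers":[{"name":"main","image":"app:1"},{"name":"sidecar","image":"helper:1"}]}}`)
	rendered := []byte(`{"spec":{"containers":[{"name":"main","image":"app:2"}]}}`)

	// Applying the rendered manifest as a strategic merge patch keeps the
	// sidecar: the containers list is merged by name, not replaced.
	merged, err := strategicpatch.StrategicMergePatch(live, rendered, &corev1.Pod{})
	if err != nil {
		panic(err)
	}
	fmt.Println(string(merged)) // still lists both "main" and "sidecar"
}
```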
Added a test harness test.
Fixes #1286
Issue
The Big Problem is that applying new versions of rendered resources from KUDO to the cluster is not as easy as it seems: We generally render full resources from the OperatorVersion templates, but there is no easy way to update existing resources in the cluster.
Patch with StrategicMerge
This is the solution we use at the moment. The issue is that a strategic merge does not replace certain lists - for example, the list of containers in a PodTemplate is merged, not replaced, by default. Whether a list (or map) is merged instead of replaced depends on the annotations in the resource types:
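For reference, a paraphrased and heavily abbreviated excerpt of the PodSpec type from k8s.io/api/core/v1 - the patchStrategy and patchMergeKey struct tags are what make the containers list merge element-by-element, keyed on name:

```go
// Paraphrased excerpt from k8s.io/api/core/v1; only the fields and tags
// relevant to strategic merges are shown.
type Container struct {
	Name  string `json:"name"`
	Image string `json:"image"`
}

type PodSpec struct {
	Containers []Container `json:"containers" patchStrategy:"merge" patchMergeKey:"name"`
}
```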
It is possible to overwrite this behavior for strategic merges: https://www.disasterproject.com/kubernetes-kubectl-patching/
We can add an entry `$patch: replace` to a list or a map to replace instead of merge. But we can't add these entries to all maps and lists, as this would trigger the replacement of immutable fields, which leads to an error from the k8s API. We would have to maintain a list of all places to add these patch entries.
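A rough sketch of what such a patch could look like, assuming controller-runtime's client (the deployment object and image value are hypothetical):

```go
package main

import (
	"context"

	appsv1 "k8s.io/api/apps/v1"
	"k8s.io/apimachinery/pkg/types"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// replaceContainers sends a strategic merge patch whose containers list carries
// the `$patch: replace` directive, so the list is replaced wholesale instead of
// being merged element-by-element.
func replaceContainers(ctx context.Context, c client.Client, d *appsv1.Deployment) error {
	patch := []byte(`{"spec":{"template":{"spec":{"containers":[
		{"$patch": "replace"},
		{"name": "main", "image": "app:2"}
	]}}}}`)
	return c.Patch(ctx, d, client.RawPatch(types.StrategicMergePatchType, patch))
}
```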
Replace
We could replace resources instead of patching them. This approach does not work, as a lot of resources have immutable fields that can't be updated - for example, the ServiceSpec.ClusterIP. As this field is usually filled in by k8s, it's not filled when we generate the resource from the KUDO template. A replace would then complain that ClusterIP would be updated with "", which is not allowed.
It might be possible to Force-Replace. This allows us to update immutable fields by doing a delete/create instead of a replace; the downside is that it would, for example, assign a new ClusterIP and be a disruptive update (https://kubernetes.io/docs/concepts/cluster-administration/manage-deployment/#disruptive-updates)
ServerSideApply
This should theoretically work best. All creations and updates should be done via:
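(A rough sketch with controller-runtime; the field-manager name is just a placeholder.)

```go
package main

import (
	"context"

	"sigs.k8s.io/controller-runtime/pkg/client"
)

// serverSideApply sends the fully rendered object to the API server and lets
// the server compute the merge and track field ownership.
func serverSideApply(ctx context.Context, c client.Client, obj client.Object) error {
	return c.Patch(ctx, obj, client.Apply,
		client.FieldOwner("kudo-manager"), // placeholder field-manager name
		client.ForceOwnership)
}
```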
and k8s should figure out what actually changed and apply it. There are some problems with the integration tests (no darwin binaries for kube-apiserver > 15.5) and bugs in the server-side apply feature itself. It might be an option for the future.
Client Side Threeway-Diff
This is the only real way to do this at the moment.
The first approach was to reuse code from `kubectl apply`, but that fails for various reasons - kubectl isn't really meant to be used as a library. I rewrote the code similar to `kubectl apply`; it still needs some polishing, but it should work for now.
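For illustration, a stripped-down sketch of such a client-side three-way merge, assuming the last applied configuration is read from the kudo.dev annotation (the Deployment schema and the helper name are just examples; the actual implementation lives in pkg/engine/task/task_apply.go):

```go
package main

import (
	appsv1 "k8s.io/api/apps/v1"
	"k8s.io/apimachinery/pkg/util/strategicpatch"
)

// threeWayMergePatch computes the patch to send to the API server:
//   original - the last applied configuration (from the kudo.dev annotation)
//   modified - the newly rendered resource from the OperatorVersion templates
//   current  - the object as it currently exists in the cluster
func threeWayMergePatch(original, modified, current []byte) ([]byte, error) {
	// For types registered in the scheme, the strategic merge honors the
	// patchStrategy/patchMergeKey tags; unknown types (e.g. CRDs) would fall
	// back to a plain JSON three-way merge instead.
	schema, err := strategicpatch.NewPatchMetaFromStruct(&appsv1.Deployment{})
	if err != nil {
		return nil, err
	}
	return strategicpatch.CreateThreeWayMergePatch(original, modified, current, schema, false)
}
```

The resulting patch is then applied, and the newly rendered version is written back into the annotation so the next apply can repeat the same calculation.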