add generic 2-way merge handler. #26
Conversation
038a3b1 to a7e55d9 (Compare)
8ff4ee4 to a5bacc2 (Compare)
a5bacc2 to 662e182 (Compare)
return nil, fmt.Errorf("failed to get resource type: %v", err)
}

// sometimes manifests of non-namespaced resources
This is formatted strangely.
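The hunk above (and the one just below) appears to come from a helper that resolves a manifest's GroupVersionKind and decides how to treat its namespace. As a hedged illustration of the point the truncated comment is making, here is a sketch of how a cluster-scoped manifest's namespace might be ignored; the helper name and shape are assumptions, not the PR's actual code:

```go
package lib

import (
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
)

// namespaceFor is a hypothetical helper: it returns the namespace to request
// for a manifest, ignoring any metadata.namespace set on manifests of
// non-namespaced (cluster-scoped) resources.
func namespaceFor(obj *unstructured.Unstructured, namespaced bool) string {
	if !namespaced {
		// Sometimes manifests of non-namespaced resources still carry a
		// namespace; sending it with the request would be rejected, so drop it.
		return ""
	}
	return obj.GetNamespace()
}
```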
return &mapping.Resource, mapping.Scope.Name() == meta.RESTScopeNameNamespace, nil
}

// runBackgroundCacheReset - Starts the rest mapper cache reseting
nit: resetting?
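For context on the two hunks above, here is a sketch of how the mapping lookup and the background cache reset usually fit together with client-go's deferred discovery REST mapper; the function signatures and the reset interval are assumptions, not necessarily what this PR does:

```go
package lib

import (
	"context"
	"fmt"
	"time"

	"k8s.io/apimachinery/pkg/api/meta"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/restmapper"
)

// resourceFor maps a GroupVersionKind to its GroupVersionResource and reports
// whether the resource is namespaced, mirroring the return statement under review.
func resourceFor(mapper meta.RESTMapper, gvk schema.GroupVersionKind) (*schema.GroupVersionResource, bool, error) {
	mapping, err := mapper.RESTMapping(gvk.GroupKind(), gvk.Version)
	if err != nil {
		return nil, false, fmt.Errorf("failed to get resource type: %v", err)
	}
	return &mapping.Resource, mapping.Scope.Name() == meta.RESTScopeNameNamespace, nil
}

// runBackgroundCacheReset periodically resets the deferred discovery mapper so
// types registered after startup (for example by newly applied CRDs) become
// resolvable without restarting the process.
func runBackgroundCacheReset(ctx context.Context, mapper *restmapper.DeferredDiscoveryRESTMapper, interval time.Duration) {
	ticker := time.NewTicker(interval)
	go func() {
		defer ticker.Stop()
		for {
			select {
			case <-ctx.Done():
				return
			case <-ticker.C:
				mapper.Reset() // drop cached discovery data; the next lookup re-discovers
			}
		}
	}()
}
```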
/approve
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: abhinavdahiya, crawford, yifan-gu

The full list of commands accepted by this bot can be found here. The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment.
…ow?" We've had 'if updated' guards around waitFor*Completion since the library landed in 2d334c2 (lib: add resource builder that allows Do on any lib.Manifest, 2018-08-20, openshift#10). But, only waiting when 'updated' is true is a weak block, because if/when we fail to complete, Task.Run will back-off and call builder.Apply again. That new Apply will see the already-updated object, set 'updated' false, and not wait. So whether we block or not is orthogonal to 'updated'; nobody cares about whether the most recent update happened in this builder.Apply, this sync cycle, or a previous cycle. We don't even care all that much about whether the Deployment, DaemonSet, CustomResourceDefinition, or Job succeeded. Most feedback is going to come from the ClusterOperator, so with this commit we continue past the resource wait-for unless the resource is really hurting, in which case we fail immediately (inside builder.Apply, Task.Run will still hit us a few times) to bubble that up. In situations where we don't see anything too terrible going on, we'll continue on past and later block on ClusterOperator not being ready. Other changes in this commit: * I've pulled 'b.modifier(crd)' out of the switch, because that should happen regardless of the CRD version. * I've added an error for unrecognized CRD version, because that will be easier to debug than trying to figure out why a CRD manifest is being silently ignored by the CVO. * There's no object status for CRDs or DaemonSets that marks "we are really hurting". The v1.18.0 Kubernetes CRD and DaemonSet controllers do not set any conditions in their operand status (although the API for those conditions exists [1,2]). With this commit, we have very minimal wait logic for either. Sufficiently unhealthy DaemonSet should be reported on via their associated ClusterOperator, and sufficiently unhealthy CRD should be reported on when we fail to push any custom resources consuming them (Task.Run retries will give the API server time to ready itself after accepting a CRD update before the CVO fails its sync cycle). * deploymentBuilder and daemonsetBuilder grow mode properties. They had been using 'actual.Generation > 1' as a proxy for "post-install" since 14fab0b (add generic 2-way merge handler for random types, 2018-09-27, openshift#26), but generation 1 is just "we haven't changed the object since it was created", not "we're installing a fresh cluster". For example, a new Deployment or DaemonSet could be added as part of a cluster update, and we don't want special install-time "we don't care about specific manifest failures" then. [1]: https://github.com/kubernetes/api/blob/v0.18.0/apps/v1/types.go#L586-L590 [2]: https://github.com/kubernetes/apiextensions-apiserver/blob/v0.18.0/pkg/apis/apiextensions/types.go#L319-L320
deploymentBuilder and daemonsetBuilder grow mode properties. They had been using 'actual.Generation > 1' as a proxy for "post-install" since 14fab0b (add generic 2-way merge handler for random types, 2018-09-27, openshift#26), but generation 1 is just "we haven't changed the object since it was created", not "we're installing a fresh cluster". For example, a new Deployment or DaemonSet could be added as part of a cluster update, and we don't want special install-time "we don't care about specific manifest failures" then.
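The "mode" property mentioned here replaces the actual.Generation > 1 heuristic with an explicit signal about why the payload is being applied. A sketch of the idea only; the type and constant names below are assumptions, not the repository's identifiers:

```go
package lib

// Mode is a hypothetical stand-in for the builder mode property: why the
// payload is being applied right now.
type Mode int

const (
	InstallingMode Mode = iota // fresh cluster installation
	UpdatingMode               // moving an existing cluster to a new release
	ReconcilingMode            // steady-state drift correction
)

// tolerateManifestFailures keys install-time leniency off the mode rather than
// off actual.Generation > 1, because generation 1 only means "unchanged since
// creation", not "we are installing a fresh cluster".
func tolerateManifestFailures(mode Mode) bool {
	return mode == InstallingMode
}
```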
Even if the manifest authors state no opinions, these are not properties that we want to allow cluster admins to manipulate. For example, a customer cluster recently stuck a Deployment by inserting a reference to a non-existent secret [1]:

$ yaml2json <namespaces/openshift-marketplace/apps/deployments.yaml | jq -r '.items[].spec.template.spec.containers[].envFrom[]'
{
  "secretRef": {
    "name": "openshift-reg"
  }
}
$ yaml2json <namespaces/openshift-marketplace/pods/marketplace-operator-f7cc88d59-hhh75/marketplace-operator-f7cc88d59-hhh75.yaml | jq -r '.status.containerStatuses[].state'
{
  "waiting": {
    "message": "secret \"openshift-reg\" not found",
    "reason": "CreateContainerConfigError"
  }
}

The outgoing logic dates back to the beginning of reconciling these properties in 14fab0b (add generic 2-way merge handler for random types, 2018-09-27, openshift#26), and this commit's tightening follows on a number of reconciliation tightenings like 29b92d2 (lib/resourcemerge/core: Clear livenessProbe and readinessProbe if nil in required, 2020-01-16, openshift#298).

[1]: https://bugzilla.redhat.com/show_bug.cgi?id=1951339#c0
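The tightening described above amounts to "required wins" semantics during the 2-way merge: the reconciler overwrites the live value of these container fields even when the manifest leaves them empty, which clears admin-added entries like the broken secretRef shown. A hedged sketch of that pattern, not the repository's actual resourcemerge helper:

```go
package resourcemerge

import (
	"reflect"

	corev1 "k8s.io/api/core/v1"
)

// ensureContainerEnvFrom stomps the existing container's envFrom with whatever
// the required manifest declares, even when that list is empty, and records
// whether anything changed.
func ensureContainerEnvFrom(modified *bool, existing *corev1.Container, required corev1.Container) {
	if !reflect.DeepEqual(existing.EnvFrom, required.EnvFrom) {
		*modified = true
		existing.EnvFrom = required.EnvFrom
	}
}
```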
The original APIService reconciliation landed in 662e182 (handle APIService, Service objects, 2018-09-27, openshift#26). We definitely reconcile a bunch of Service manifests (e.g. for serving operator metrics, including the CVO's own Service), but it's not clear what the use case was for APIService. DeleteAPIServicev1 landed in 0afb8a8 (Add a manifest annotation to be used for object deletion, 2020-08-17, openshift#438), but was never consumed. We dropped APIService reconciliation support in 5681a70 (Drop APIService support, 2021-06-10, openshift#566). This commit drops the unused DeleteAPIServicev1.

Auditing z-stream tips from 4.3 through 4.11:

$ oc adm release extract --to 4.11 quay.io/openshift-release-dev/ocp-release-nightly@sha256:080e5cf5e3e043ac0877b8f545ba2b596016f3fd76f3f457d15060603b3615e1
$ oc adm release extract --to 4.10 quay.io/openshift-release-dev/ocp-release:4.10.13-x86_64
$ oc adm release extract --to 4.9 quay.io/openshift-release-dev/ocp-release:4.9.32-x86_64
$ oc adm release extract --to 4.8 quay.io/openshift-release-dev/ocp-release:4.8.39-x86_64
$ oc adm release extract --to 4.7 quay.io/openshift-release-dev/ocp-release:4.7.50-x86_64
$ oc adm release extract --to 4.6 quay.io/openshift-release-dev/ocp-release:4.6.57-x86_64
$ oc adm release extract --to 4.5 quay.io/openshift-release-dev/ocp-release:4.5.41-x86_64
$ oc adm release extract --to 4.4 quay.io/openshift-release-dev/ocp-release:4.4.33-x86_64
$ oc adm release extract --to 4.3 quay.io/openshift-release-dev/ocp-release:4.3.40-x86_64
$ grep -r 'kind: APIService' 4.*
...no hits...
CustomResources for operators in a way that it stomps all changes (2-way merge).

/cc @crawford @yifan-gu
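Read together with the diff, the generic 2-way merge amounts to: create the object if it is missing, otherwise lay the manifest's desired state over the live object (stomping any drift in the fields the manifest covers) and update. A minimal sketch of that flow with the dynamic client; the helper name and the metadata handling are assumptions, not this PR's exact code:

```go
package lib

import (
	"context"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/client-go/dynamic"
)

// applyGeneric is a hypothetical 2-way merge handler: it creates the object
// when absent, and otherwise replaces it with the desired state while keeping
// server-owned metadata so the update is accepted.
func applyGeneric(ctx context.Context, client dynamic.ResourceInterface, required *unstructured.Unstructured) (*unstructured.Unstructured, error) {
	existing, err := client.Get(ctx, required.GetName(), metav1.GetOptions{})
	if apierrors.IsNotFound(err) {
		return client.Create(ctx, required, metav1.CreateOptions{})
	}
	if err != nil {
		return nil, err
	}

	// Stomp everything the manifest declares except server-populated metadata.
	updated := required.DeepCopy()
	updated.SetResourceVersion(existing.GetResourceVersion())
	updated.SetUID(existing.GetUID())
	return client.Update(ctx, updated, metav1.UpdateOptions{})
}
```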