add generic 2-way merge handler. #26
Conversation
038a3b1 to a7e55d9 (Compare)
8ff4ee4 to a5bacc2 (Compare)
a5bacc2 to 662e182 (Compare)
return nil, fmt.Errorf("failed to get resource type: %v", err)
}

// sometimes manifests of non-namespaced resources
This is formatted strangely.
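The hunk above (and the one just below) appears to come from a helper that resolves a manifest's GroupVersionKind and decides how to treat its namespace. As a hedged illustration of the point the truncated comment is making, here is a sketch of how a cluster-scoped manifest's namespace might be ignored; the helper name and shape are assumptions, not the PR's actual code:

```go
package lib

import (
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
)

// namespaceFor is a hypothetical helper: it returns the namespace to request
// for a manifest, ignoring any metadata.namespace set on manifests of
// non-namespaced (cluster-scoped) resources.
func namespaceFor(obj *unstructured.Unstructured, namespaced bool) string {
	if !namespaced {
		// Sometimes manifests of non-namespaced resources still carry a
		// namespace; sending it with the request would be rejected, so drop it.
		return ""
	}
	return obj.GetNamespace()
}
```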
return &mapping.Resource, mapping.Scope.Name() == meta.RESTScopeNameNamespace, nil
}

// runBackgroundCacheReset - Starts the rest mapper cache reseting
nit: resetting?
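For context on the two hunks above, here is a sketch of how the mapping lookup and the background cache reset usually fit together with client-go's deferred discovery REST mapper; the function signatures and the reset interval are assumptions, not necessarily what this PR does:

```go
package lib

import (
	"context"
	"fmt"
	"time"

	"k8s.io/apimachinery/pkg/api/meta"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/restmapper"
)

// resourceFor maps a GroupVersionKind to its GroupVersionResource and reports
// whether the resource is namespaced, mirroring the return statement under review.
func resourceFor(mapper meta.RESTMapper, gvk schema.GroupVersionKind) (*schema.GroupVersionResource, bool, error) {
	mapping, err := mapper.RESTMapping(gvk.GroupKind(), gvk.Version)
	if err != nil {
		return nil, false, fmt.Errorf("failed to get resource type: %v", err)
	}
	return &mapping.Resource, mapping.Scope.Name() == meta.RESTScopeNameNamespace, nil
}

// runBackgroundCacheReset periodically resets the deferred discovery mapper so
// types registered after startup (for example by newly applied CRDs) become
// resolvable without restarting the process.
func runBackgroundCacheReset(ctx context.Context, mapper *restmapper.DeferredDiscoveryRESTMapper, interval time.Duration) {
	ticker := time.NewTicker(interval)
	go func() {
		defer ticker.Stop()
		for {
			select {
			case <-ctx.Done():
				return
			case <-ticker.C:
				mapper.Reset() // drop cached discovery data; the next lookup re-discovers
			}
		}
	}()
}
```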
/approve
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: abhinavdahiya, crawford, yifan-gu

The full list of commands accepted by this bot can be found here. The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment.
…ow?" We've had 'if updated' guards around waitFor*Completion since the library landed in 2d334c2 (lib: add resource builder that allows Do on any lib.Manifest, 2018-08-20, openshift#10). But, only waiting when 'updated' is true is a weak block, because if/when we fail to complete, Task.Run will back-off and call builder.Apply again. That new Apply will see the already-updated object, set 'updated' false, and not wait. So whether we block or not is orthogonal to 'updated'; nobody cares about whether the most recent update happened in this builder.Apply, this sync cycle, or a previous cycle. We don't even care all that much about whether the Deployment, DaemonSet, CustomResourceDefinition, or Job succeeded. Most feedback is going to come from the ClusterOperator, so with this commit we continue past the resource wait-for unless the resource is really hurting, in which case we fail immediately (inside builder.Apply, Task.Run will still hit us a few times) to bubble that up. In situations where we don't see anything too terrible going on, we'll continue on past and later block on ClusterOperator not being ready. Other changes in this commit: * I've pulled 'b.modifier(crd)' out of the switch, because that should happen regardless of the CRD version. * I've added an error for unrecognized CRD version, because that will be easier to debug than trying to figure out why a CRD manifest is being silently ignored by the CVO. * There's no object status for CRDs or DaemonSets that marks "we are really hurting". The v1.18.0 Kubernetes CRD and DaemonSet controllers do not set any conditions in their operand status (although the API for those conditions exists [1,2]). With this commit, we have very minimal wait logic for either. Sufficiently unhealthy DaemonSet should be reported on via their associated ClusterOperator, and sufficiently unhealthy CRD should be reported on when we fail to push any custom resources consuming them (Task.Run retries will give the API server time to ready itself after accepting a CRD update before the CVO fails its sync cycle). * deploymentBuilder and daemonsetBuilder grow mode properties. They had been using 'actual.Generation > 1' as a proxy for "post-install" since 14fab0b (add generic 2-way merge handler for random types, 2018-09-27, openshift#26), but generation 1 is just "we haven't changed the object since it was created", not "we're installing a fresh cluster". For example, a new Deployment or DaemonSet could be added as part of a cluster update, and we don't want special install-time "we don't care about specific manifest failures" then. [1]: https://github.com/kubernetes/api/blob/v0.18.0/apps/v1/types.go#L586-L590 [2]: https://github.com/kubernetes/apiextensions-apiserver/blob/v0.18.0/pkg/apis/apiextensions/types.go#L319-L320
deploymentBuilder and daemonsetBuilder grow mode properties. They had been using 'actual.Generation > 1' as a proxy for "post-install" since 14fab0b (add generic 2-way merge handler for random types, 2018-09-27, openshift#26), but generation 1 is just "we haven't changed the object since it was created", not "we're installing a fresh cluster". For example, a new Deployment or DaemonSet could be added as part of a cluster update, and we don't want special install-time "we don't care about specific manifest failures" then.
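The "mode" property mentioned here replaces the actual.Generation > 1 heuristic with an explicit signal about why the payload is being applied. A sketch of the idea only; the type and constant names below are assumptions, not the repository's identifiers:

```go
package lib

// Mode is a hypothetical stand-in for the builder mode property: why the
// payload is being applied right now.
type Mode int

const (
	InstallingMode Mode = iota // fresh cluster installation
	UpdatingMode               // moving an existing cluster to a new release
	ReconcilingMode            // steady-state drift correction
)

// tolerateManifestFailures keys install-time leniency off the mode rather than
// off actual.Generation > 1, because generation 1 only means "unchanged since
// creation", not "we are installing a fresh cluster".
func tolerateManifestFailures(mode Mode) bool {
	return mode == InstallingMode
}
```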
Even if the manifest authors state no opinions, these are not properties that we want to allow cluster admins to manipulate. For example, a customer cluster recently stuck a Deployment by inserting a reference to a non-existent secret [1]:

$ yaml2json <namespaces/openshift-marketplace/apps/deployments.yaml | jq -r '.items[].spec.template.spec.containers[].envFrom[]'
{
  "secretRef": {
    "name": "openshift-reg"
  }
}
$ yaml2json <namespaces/openshift-marketplace/pods/marketplace-operator-f7cc88d59-hhh75/marketplace-operator-f7cc88d59-hhh75.yaml | jq -r '.status.containerStatuses[].state'
{
  "waiting": {
    "message": "secret \"openshift-reg\" not found",
    "reason": "CreateContainerConfigError"
  }
}

The outgoing logic dates back to the beginning of reconciling these properties in 14fab0b (add generic 2-way merge handler for random types, 2018-09-27, openshift#26), and this commit's tightening follows on a number of reconciliation tightenings like 29b92d2 (lib/resourcemerge/core: Clear livenessProbe and readinessProbe if nil in required, 2020-01-16, openshift#298).

[1]: https://bugzilla.redhat.com/show_bug.cgi?id=1951339#c0
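The tightening described above amounts to "required wins" semantics during the 2-way merge: the reconciler overwrites the live value of these container fields even when the manifest leaves them empty, which clears admin-added entries like the broken secretRef shown. A hedged sketch of that pattern, not the repository's actual resourcemerge helper:

```go
package resourcemerge

import (
	"reflect"

	corev1 "k8s.io/api/core/v1"
)

// ensureContainerEnvFrom stomps the existing container's envFrom with whatever
// the required manifest declares, even when that list is empty, and records
// whether anything changed.
func ensureContainerEnvFrom(modified *bool, existing *corev1.Container, required corev1.Container) {
	if !reflect.DeepEqual(existing.EnvFrom, required.EnvFrom) {
		*modified = true
		existing.EnvFrom = required.EnvFrom
	}
}
```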
The original APIService reconciliation landed in 662e182 (handle APIService, Service objects, 2018-09-27, openshift#26). We definitely reconcile a bunch of Service manifests (e.g. for serving operator metrics, including the CVO's own Service), but it's not clear what the use case was for APIService. DeleteAPIServicev1 landed in 0afb8a8 (Add a manifest annotation to be used for object deletion, 2020-08-17, openshift#438), but was never consumed. We dropped APIService reconciliation support in 5681a70 (Drop APIService support, 2021-06-10, openshift#566). This commit drops the unused DeleteAPIServicev1.

Auditing z-stream tips from 4.3 through 4.11:

$ oc adm release extract --to 4.11 quay.io/openshift-release-dev/ocp-release-nightly@sha256:080e5cf5e3e043ac0877b8f545ba2b596016f3fd76f3f457d15060603b3615e1
$ oc adm release extract --to 4.10 quay.io/openshift-release-dev/ocp-release:4.10.13-x86_64
$ oc adm release extract --to 4.9 quay.io/openshift-release-dev/ocp-release:4.9.32-x86_64
$ oc adm release extract --to 4.8 quay.io/openshift-release-dev/ocp-release:4.8.39-x86_64
$ oc adm release extract --to 4.7 quay.io/openshift-release-dev/ocp-release:4.7.50-x86_64
$ oc adm release extract --to 4.6 quay.io/openshift-release-dev/ocp-release:4.6.57-x86_64
$ oc adm release extract --to 4.5 quay.io/openshift-release-dev/ocp-release:4.5.41-x86_64
$ oc adm release extract --to 4.4 quay.io/openshift-release-dev/ocp-release:4.4.33-x86_64
$ oc adm release extract --to 4.3 quay.io/openshift-release-dev/ocp-release:4.3.40-x86_64
$ grep -r 'kind: APIService' 4.*
...no hits...
CustomResources for operators in a way that it stomps all changes (2-way merge).

/cc @crawford @yifan-gu
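Read together with the diff, the generic 2-way merge amounts to: create the object if it is missing, otherwise lay the manifest's desired state over the live object (stomping any drift in the fields the manifest covers) and update. A minimal sketch of that flow with the dynamic client; the helper name and the metadata handling are assumptions, not this PR's exact code:

```go
package lib

import (
	"context"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/client-go/dynamic"
)

// applyGeneric is a hypothetical 2-way merge handler: it creates the object
// when absent, and otherwise replaces it with the desired state while keeping
// server-owned metadata so the update is accepted.
func applyGeneric(ctx context.Context, client dynamic.ResourceInterface, required *unstructured.Unstructured) (*unstructured.Unstructured, error) {
	existing, err := client.Get(ctx, required.GetName(), metav1.GetOptions{})
	if apierrors.IsNotFound(err) {
		return client.Create(ctx, required, metav1.CreateOptions{})
	}
	if err != nil {
		return nil, err
	}

	// Stomp everything the manifest declares except server-populated metadata.
	updated := required.DeepCopy()
	updated.SetResourceVersion(existing.GetResourceVersion())
	updated.SetUID(existing.GetUID())
	return client.Update(ctx, updated, metav1.UpdateOptions{})
}
```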