Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add generic 2-way merge handler. #26

Merged

Conversation

abhinavdahiya
Copy link
Contributor

@abhinavdahiya abhinavdahiya commented Sep 27, 2018

  • Add handler in CVO to apply random CustomResources for operators in a way that it stomps all changes, (2-way merge).

/cc @crawford @yifan-gu

@openshift-ci-robot openshift-ci-robot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Sep 27, 2018
@abhinavdahiya abhinavdahiya force-pushed the generix_handlers branch 2 times, most recently from 038a3b1 to a7e55d9 Compare September 27, 2018 14:23
@abhinavdahiya abhinavdahiya changed the title Update CVO manifests to hostnetwork, add generic 2-way merge handler. add generic 2-way merge handler. Sep 27, 2018
@abhinavdahiya abhinavdahiya force-pushed the generix_handlers branch 7 times, most recently from 8ff4ee4 to a5bacc2 Compare September 28, 2018 01:32
return nil, fmt.Errorf("failed to get resource type: %v", err)
}

// sometimes manifests of non-namespaced resources
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is formatted strangely.

return &mapping.Resource, mapping.Scope.Name() == meta.RESTScopeNameNamespace, nil
}

// runBackgroundCacheReset - Starts the rest mapper cache reseting
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: resetting?

@yifan-gu
Copy link
Contributor

/approve

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 28, 2018
@crawford
Copy link
Contributor

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Sep 28, 2018
@openshift-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: abhinavdahiya, crawford, yifan-gu

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-robot openshift-merge-robot merged commit e117626 into openshift:master Sep 28, 2018
wking added a commit to wking/cluster-version-operator that referenced this pull request Jul 7, 2020
…ow?"

We've had 'if updated' guards around waitFor*Completion since the
library landed in 2d334c2 (lib: add resource builder that allows Do
on any lib.Manifest, 2018-08-20, openshift#10).  But, only waiting when
'updated' is true is a weak block, because if/when we fail to
complete, Task.Run will back-off and call builder.Apply again.  That
new Apply will see the already-updated object, set 'updated' false,
and not wait.  So whether we block or not is orthogonal to 'updated';
nobody cares about whether the most recent update happened in this
builder.Apply, this sync cycle, or a previous cycle.

We don't even care all that much about whether the Deployment,
DaemonSet, CustomResourceDefinition, or Job succeeded.  Most feedback
is going to come from the ClusterOperator, so with this commit we
continue past the resource wait-for unless the resource is really
hurting, in which case we fail immediately (inside builder.Apply,
Task.Run will still hit us a few times) to bubble that up.  In
situations where we don't see anything too terrible going on, we'll
continue on past and later block on ClusterOperator not being ready.

Other changes in this commit:

* I've pulled 'b.modifier(crd)' out of the switch, because that should
  happen regardless of the CRD version.
* I've added an error for unrecognized CRD version, because that will
  be easier to debug than trying to figure out why a CRD manifest is
  being silently ignored by the CVO.
* There's no object status for CRDs or DaemonSets that marks "we are
  really hurting".  The v1.18.0 Kubernetes CRD and DaemonSet
  controllers do not set any conditions in their operand status
  (although the API for those conditions exists [1,2]).  With this
  commit, we have very minimal wait logic for either.  Sufficiently
  unhealthy DaemonSet should be reported on via their associated
  ClusterOperator, and sufficiently unhealthy CRD should be reported
  on when we fail to push any custom resources consuming them
  (Task.Run retries will give the API server time to ready itself
  after accepting a CRD update before the CVO fails its sync cycle).
* deploymentBuilder and daemonsetBuilder grow mode properties.  They
  had been using 'actual.Generation > 1' as a proxy for "post-install"
  since 14fab0b (add generic 2-way merge handler for random types,
  2018-09-27, openshift#26), but generation 1 is just "we haven't changed the
  object since it was created", not "we're installing a fresh
  cluster".  For example, a new Deployment or DaemonSet could be added
  as part of a cluster update, and we don't want special install-time
  "we don't care about specific manifest failures" then.

[1]: https://github.com/kubernetes/api/blob/v0.18.0/apps/v1/types.go#L586-L590
[2]: https://github.com/kubernetes/apiextensions-apiserver/blob/v0.18.0/pkg/apis/apiextensions/types.go#L319-L320
wking added a commit to wking/cluster-version-operator that referenced this pull request Jul 8, 2020
deploymentBuilder and daemonsetBuilder grow mode properties.  They had
been using 'actual.Generation > 1' as a proxy for "post-install" since
14fab0b (add generic 2-way merge handler for random types,
2018-09-27, openshift#26), but generation 1 is just "we haven't changed the
object since it was created", not "we're installing a fresh cluster".
For example, a new Deployment or DaemonSet could be added as part of a
cluster update, and we don't want special install-time "we don't care
about specific manifest failures" then.
wking added a commit to wking/cluster-version-operator that referenced this pull request Jul 8, 2020
deploymentBuilder and daemonsetBuilder grow mode properties.  They had
been using 'actual.Generation > 1' as a proxy for "post-install" since
14fab0b (add generic 2-way merge handler for random types,
2018-09-27, openshift#26), but generation 1 is just "we haven't changed the
object since it was created", not "we're installing a fresh cluster".
For example, a new Deployment or DaemonSet could be added as part of a
cluster update, and we don't want special install-time "we don't care
about specific manifest failures" then.
wking added a commit to wking/cluster-version-operator that referenced this pull request Apr 20, 2021
Even if the manifest authors state no opinions, these are not
properties that we want to allow cluster admins to manipulate.  For
example, a customer cluster recently stuck a Deployment by inserting a
reference to a non-existent secret [1]:

  $ yaml2json <namespaces/openshift-marketplace/apps/deployments.yaml | jq -r '.items[].spec.template.spec.containers[].envFrom[]'
  {
    "secretRef": {
      "name": "openshift-reg"
    }
  }
  $ yaml2json <namespaces/openshift-marketplace/pods/marketplace-operator-f7cc88d59-hhh75/marketplace-operator-f7cc88d59-hhh75.yaml | jq -r '.status.containerStatuses[].state'
  {
    "waiting": {
      "message": "secret \"openshift-reg\" not found",
      "reason": "CreateContainerConfigError"
    }
  }

The outgoing logic dates back to the beginning of reconciling these
properties in 14fab0b (add generic 2-way merge handler for random
types, 2018-09-27, openshift#26), and this commit's tightening follows on a
number of reconciliation tightenings like 29b92d2
(lib/resourcemerge/core: Clear livenessProbe and readinessProbe if nil
in required, 2020-01-16, openshift#298).

[1]: https://bugzilla.redhat.com/show_bug.cgi?id=1951339#c0
wking added a commit to wking/cluster-version-operator that referenced this pull request May 9, 2022
The original APIService reconciliation landed in 662e182 (handle
APIService, Service objects, 2018-09-27, openshift#26).  We definitely
reconcile a bunch of Service manifests (e.g. for serving operator
metrics, including the CVO's own Service), but it's not clear what the
use case was for APIService.

DeleteAPIServicev1 landed in 0afb8a8 (Add a manifest annotation to
be used for object deletion, 2020-08-17, openshift#438), but was never
consumed.  We dropped APIService reconciliation support in 5681a70
(Drop APIService support, 2021-06-10, openshift#566).  This commit drops the
unused DeleteAPIServicev1.  Auditing z-stream tips from 4.3 through
4.11:

  $ oc adm release extract --to 4.11 quay.io/openshift-release-dev/ocp-release-nightly@sha256:080e5cf5e3e043ac0877b8f545ba2b596016f3fd76f3f457d15060603b3615e1
  $ oc adm release extract --to 4.10 quay.io/openshift-release-dev/ocp-release:4.10.13-x86_64
  $ oc adm release extract --to 4.9 quay.io/openshift-release-dev/ocp-release:4.9.32-x86_64
  $ oc adm release extract --to 4.8 quay.io/openshift-release-dev/ocp-release:4.8.39-x86_64
  $ oc adm release extract --to 4.7 quay.io/openshift-release-dev/ocp-release:4.7.50-x86_64
  $ oc adm release extract --to 4.6 quay.io/openshift-release-dev/ocp-release:4.6.57-x86_64
  $ oc adm release extract --to 4.5 quay.io/openshift-release-dev/ocp-release:4.5.41-x86_64
  $ oc adm release extract --to 4.4 quay.io/openshift-release-dev/ocp-release:4.4.33-x86_64
  $ oc adm release extract --to 4.3 quay.io/openshift-release-dev/ocp-release:4.3.40-x86_64
  $ grep -r 'kind: APIService' 4.*
  ...no hits...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants