
vendor: Bump client-go and library-go to current 4.6 tips (2020-07-24) #420

Merged: 3 commits into openshift:master on Jul 31, 2020

Conversation

@wking (Member) commented Jul 29, 2020

Pulling in openshift/client-go@94e1065ab1 and openshift/library-go@1ed21c4fa8. Generated with:

$ go get -u github.com/openshift/client-go@83993cebb5aee533bf2f2dded2d87b9e32835f4a
go: github.com/openshift/client-go 94e1065ab1f8a34cd8a0e1edb9ba8be0326e77b5 => v0.0.0-20200723130357-94e1065ab1f8
$ go get -u github.com/openshift/library-go@cc498c355c99983057e9e01f3abcceb40ca5c298
go: github.com/openshift/library-go 1ed21c4fa86c2bf6034823231b8e52df4f068d58 => v0.0.0-20200724192307-1ed21c4fa86c
$ go mod tidy
$ go mod vendor
$ git add -A go.* vendor

using:

$ go version
go version go1.14.4 linux/amd64

This gets us up to 1.19 for the Kubernetes-based dependencies, and should hopefully allow me to bump openshift/api to pick up openshift/api@3ae6754513 without hitting:

  go: github.com/openshift/cluster-version-operator/lib imports
        k8s.io/client-go/kubernetes/scheme imports
        k8s.io/api/auditregistration/v1alpha1: package provided by k8s.io/api at latest version v0.18.6 but not at required version v0.19.0-rc.2

And the API bump in turn will allow us to start using the new Release type (#419).

@openshift-ci-robot added the approved label (Indicates a PR has been approved by an approver from all required OWNERS files.) Jul 29, 2020
@LalatenduMohanty (Member) left a comment


/lgtm

@openshift-ci-robot added the lgtm label (Indicates that a PR is ready to be merged.) Jul 29, 2020
@openshift-bot (Contributor)

/retest

Please review the full test history for this PR and help us cut down flakes.

@wking (Member, Author) commented Jul 29, 2020

Most recent failures were both CI-registry 503s.

@wking (Member, Author) commented Jul 29, 2020

e2e:

level=info msg="Pulling debug logs from the bootstrap machine"
level=info msg="Bootstrap gather logs captured here \"/tmp/installer/log-bundle-20200729194059.tar.gz\""
level=fatal msg="Bootstrap failed to complete: failed to wait for bootstrapping to complete: timed out waiting for the condition"

From bootstrap/journals/bootkube.log:

Jul 29 19:28:36 localhost bootkube.sh[2299]: [#394] failed to create some manifests:
Jul 29 19:28:36 localhost bootkube.sh[2299]: "99_openshift-cluster-api_master-machines-0.yaml": unable to get REST mapping for "99_openshift-cluster-api_master-machines-0.yaml": no matches for kind "Machine" in version "machine.openshift.io/v1beta1"
...
Jul 29 19:28:39 localhost bootkube.sh[2299]: Failed to create "etcd-metric-signer-secret.yaml" secrets.v1./etcd-metric-signer -n openshift-config: namespaces "openshift-config" not found
Jul 29 19:28:39 localhost bootkube.sh[2299]: Error: error while checking pod status: timed out waiting for the condition
Jul 29 19:28:39 localhost bootkube.sh[2299]: Tearing down temporary bootstrap control plane...

From the CVO logs:

I0729 19:39:13.577749       1 sync_worker.go:669] Running sync for role "openshift-machine-api/cluster-autoscaler-operator" (211 of 606)
E0729 19:39:13.578329       1 runtime.go:78] Observed a panic: &errors.errorString{s:"converting (v1beta1.Role) to (v1.Role): unknown conversion"} (converting (v1beta1.Role) to (v1.Role): unknown conversion)
goroutine 294 [running]:
k8s.io/apimachinery/pkg/util/runtime.logPanic(0x1713b80, 0xc00105d730)
  /go/src/github.com/openshift/cluster-version-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:74 +0xa3
k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)
  /go/src/github.com/openshift/cluster-version-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:48 +0x82
panic(0x1713b80, 0xc00105d730)
  /usr/local/go/src/runtime/panic.go:969 +0x166
github.com/openshift/cluster-version-operator/lib/resourceread.ReadRoleV1OrDie(0xc000969500, 0x2f8, 0x300, 0x0)
  /go/src/github.com/openshift/cluster-version-operator/lib/resourceread/rbac.go:55 +0x1b0
github.com/openshift/cluster-version-operator/lib/resourcebuilder.(*roleBuilder).Do(0xc001ec6780, 0x1c09920, 0xc0000a3580, 0xc001ec6780, 0xc001ec6780)
  /go/src/github.com/openshift/cluster-version-operator/lib/resourcebuilder/rbac.go:98 +0x45
github.com/openshift/cluster-version-operator/pkg/cvo.(*resourceBuilder).Apply(0xc000c6fdd0, 0x1c09920, 0xc0000a3580, 0xc001436ec0, 0x2, 0xc001d99be0, 0xa)
  /go/src/github.com/openshift/cluster-version-operator/pkg/cvo/cvo.go:687 +0xc2
github.com/openshift/cluster-version-operator/pkg/payload.(*Task).Run(0xc00097a7d0, 0x1c09920, 0xc0000a3580, 0xc000960500, 0x17, 0x1bc3880, 0xc000c6fdd0, 0x2, 0x0, 0x0)
  /go/src/github.com/openshift/cluster-version-operator/pkg/payload/task.go:75 +0xaf
github.com/openshift/cluster-version-operator/pkg/cvo.(*SyncWorker).apply.func2(0x1c09920, 0xc0000a3580, 0xc001004670, 0x16, 0x332, 0x2, 0x2)
  /go/src/github.com/openshift/cluster-version-operator/pkg/cvo/sync_worker.go:678 +0x377
github.com/openshift/cluster-version-operator/pkg/payload.RunGraph.func3(0xc001970280, 0xc000dcc3c0, 0xc00100b890, 0xc000dcc420, 0x1c09920, 0xc0000a3580, 0x39)
  /go/src/github.com/openshift/cluster-version-operator/pkg/payload/task_graph.go:487 +0xea
created by github.com/openshift/cluster-version-operator/pkg/payload.RunGraph
  /go/src/github.com/openshift/cluster-version-operator/pkg/payload/task_graph.go:477 +0x32a
panic: converting (v1beta1.Role) to (v1.Role): unknown conversion [recovered]
  panic: converting (v1beta1.Role) to (v1.Role): unknown conversion
...

The v1beta1.Role manifest in question is here. Hmm, looks like we have v1beta1 Roles until at least v1.22. And indeed, in the CVO as of ba32622:

$ git --no-pager grep 'type Role struct'
vendor/k8s.io/api/rbac/v1/types.go:type Role struct {
vendor/k8s.io/api/rbac/v1alpha1/types.go:type Role struct {
vendor/k8s.io/api/rbac/v1beta1/types.go:type Role struct {

Which has not changed in this PR?

$ git --no-pager log -1 --oneline origin/master
e49df43a (wking/master, origin/release-4.7, origin/release-4.6, origin/master, origin/HEAD, master) Merge pull request #418 from wking/RetriveOnce-typo
$ git --no-pager grep 'type Role struct' e49df43a
e49df43a:vendor/k8s.io/api/rbac/v1/types.go:type Role struct {
e49df43a:vendor/k8s.io/api/rbac/v1alpha1/types.go:type Role struct {
e49df43a:vendor/k8s.io/api/rbac/v1beta1/types.go:type Role struct {

So not clear on what's going on here yet...

@openshift-bot (Contributor)

/retest

Please review the full test history for this PR and help us cut down flakes.

@wking (Member, Author) commented Jul 29, 2020

From kubernetes/kubernetes#49642:

Promotes the rbac.authorization.k8s.io/v1beta1 API to v1 with no changes

But I can reproduce this in a new unit test, so I'll just stumble around until I figure out what magic we're missing to get automatic translation...
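
For context, a minimal sketch of the kind of reproducer meant here (assumed package and names, not the actual CVO test): decode a v1beta1 Role manifest and ask the scheme to convert it to v1.

package conversion_test

import (
    "testing"

    rbacv1 "k8s.io/api/rbac/v1"
    rbacv1beta1 "k8s.io/api/rbac/v1beta1"
    "k8s.io/apimachinery/pkg/runtime"
    "k8s.io/apimachinery/pkg/runtime/serializer"
)

func TestRoleV1beta1ToV1Conversion(t *testing.T) {
    // Register only the rbac types; no generated conversion functions are
    // vendored, so conversion relies on the (now removed) reflective fallback.
    scheme := runtime.NewScheme()
    if err := rbacv1.AddToScheme(scheme); err != nil {
        t.Fatal(err)
    }
    if err := rbacv1beta1.AddToScheme(scheme); err != nil {
        t.Fatal(err)
    }
    codecs := serializer.NewCodecFactory(scheme)

    manifest := []byte(`apiVersion: rbac.authorization.k8s.io/v1beta1
kind: Role
metadata:
  name: example
  namespace: example-ns`)

    obj, err := runtime.Decode(codecs.UniversalDeserializer(), manifest)
    if err != nil {
        t.Fatal(err)
    }

    // With the bumped apimachinery this surfaces:
    //   converting (v1beta1.Role) to (v1.Role): unknown conversion
    out := &rbacv1.Role{}
    if err := scheme.Convert(obj, out, nil); err != nil {
        t.Logf("conversion error: %v", err)
    }
}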

@wking (Member, Author) commented Jul 29, 2020

/hold

No need to spend CI on this while I debug the Role business.

@openshift-ci-robot added the do-not-merge/hold label (Indicates that a PR should not merge because someone has issued a /hold command.) Jul 29, 2020
@wking (Member, Author) commented Jul 29, 2020

Ah:

$ git --no-pager show --oneline ba32622 vendor/k8s.io/apimachinery/pkg/conversion/converter.go
ba32622c (HEAD -> client-go-bump, wking/client-go-bump, origin/pr/420) vendor: Bump client-go and library-go to current 4.6 tips (2020-07-24)
diff --git a/vendor/k8s.io/apimachinery/pkg/conversion/converter.go b/vendor/k8s.io/apimachinery/pkg/conversion/converter.go
index 2d7c8bd1..838d5b0a 100644
--- a/vendor/k8s.io/apimachinery/pkg/conversion/converter.go
+++ b/vendor/k8s.io/apimachinery/pkg/conversion/converter.go
@@ -442,20 +442,20 @@ func (c *Converter) doConversion(src, dest interface{}, flags FieldMatchingFlags
 	if fn, ok := c.generatedConversionFuncs.untyped[pair]; ok {
 		return fn(src, dest, scope)
 	}
-	// TODO: consider everything past this point deprecated - we want to support only point to point top level
-	// conversions
 
 	dv, err := EnforcePtr(dest)
 	if err != nil {
 		return err
 	}
-	if !dv.CanAddr() && !dv.CanSet() {
-		return fmt.Errorf("can't write to dest")
-	}
 	sv, err := EnforcePtr(src)
 	if err != nil {
 		return err
 	}
+	return fmt.Errorf("converting (%s) to (%s): unknown conversion", sv.Type(), dv.Type())
+
+	// TODO: Everything past this point is deprecated.
+	//  Remove in 1.20 once we're sure it didn't break anything.
+
 	// Leave something on the stack, so that calls to struct tag getters never fail.
 	scope.srcStack.push(scopeStackElem{})
 	scope.destStack.push(scopeStackElem{})

So that's kubernetes/kubernetes#90018.
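
In other words, once that reflective fallback is gone, any pair we still convert implicitly needs an explicitly registered conversion function. A rough sketch of what registering one would look like (illustrative only, with assumed names; the PR ultimately goes a different way, per the later commits):

package conversionexample

import (
    rbacv1 "k8s.io/api/rbac/v1"
    rbacv1beta1 "k8s.io/api/rbac/v1beta1"
    "k8s.io/apimachinery/pkg/conversion"
    "k8s.io/apimachinery/pkg/runtime"
)

func newSchemeWithRoleConversion() (*runtime.Scheme, error) {
    scheme := runtime.NewScheme()
    if err := rbacv1.AddToScheme(scheme); err != nil {
        return nil, err
    }
    if err := rbacv1beta1.AddToScheme(scheme); err != nil {
        return nil, err
    }
    // Hand-written conversion for the one pair we care about; the Role fields
    // are identical between v1beta1 and v1, so it is a straight field copy.
    err := scheme.AddConversionFunc((*rbacv1beta1.Role)(nil), (*rbacv1.Role)(nil),
        func(a, b interface{}, _ conversion.Scope) error {
            in, out := a.(*rbacv1beta1.Role), b.(*rbacv1.Role)
            out.ObjectMeta = in.ObjectMeta
            out.Rules = make([]rbacv1.PolicyRule, len(in.Rules))
            for i, r := range in.Rules {
                out.Rules[i] = rbacv1.PolicyRule{
                    Verbs:           r.Verbs,
                    APIGroups:       r.APIGroups,
                    Resources:       r.Resources,
                    ResourceNames:   r.ResourceNames,
                    NonResourceURLs: r.NonResourceURLs,
                }
            }
            return nil
        })
    return scheme, err
}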

wking added a commit to wking/cluster-version-operator that referenced this pull request Jul 30, 2020
Protecting us from [1,2]:

  converting (v1beta1.Role) to (v1.Role): unknown conversion

in unit tests.  The manual conversion and wash through JSON seems like
a terrible hack, but I haven't been able to figure out a more elegant
approach yet.  From [3]:

  Promotes the rbac.authorization.k8s.io/v1beta1 API to v1 with no changes

so all we really need is the apiVersion bump.

[1]: kubernetes/kubernetes#90018
[2]: openshift#420 (comment)
[3]: kubernetes/kubernetes#49642
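
For reference, a rough sketch of the JSON wash that commit message refers to (assumed helper name, not the actual lib/resourceread code): marshal the v1beta1 object and unmarshal it into the v1 type, which only works here because the promotion to v1 changed nothing on the wire.

package example

import (
    "encoding/json"

    rbacv1 "k8s.io/api/rbac/v1"
    rbacv1beta1 "k8s.io/api/rbac/v1beta1"
)

// roleV1FromV1beta1 is a hypothetical helper: round-trip through JSON instead
// of relying on scheme conversion, since the two versions are wire-compatible.
func roleV1FromV1beta1(in *rbacv1beta1.Role) (*rbacv1.Role, error) {
    data, err := json.Marshal(in)
    if err != nil {
        return nil, err
    }
    out := &rbacv1.Role{}
    if err := json.Unmarshal(data, out); err != nil {
        return nil, err
    }
    // The apiVersion is the only thing that actually changes.
    out.APIVersion = rbacv1.SchemeGroupVersion.String()
    out.Kind = "Role"
    return out, nil
}
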
@openshift-ci-robot removed the lgtm label (Indicates that a PR is ready to be merged.) Jul 30, 2020
wking added a commit to wking/cluster-version-operator that referenced this pull request Jul 30, 2020
Protecting us from [1,2]:

  converting (v1beta1.Role) to (v1.Role): unknown conversion

in unit tests.  The manual conversion and wash through JSON seems like
a terrible hack, but I haven't been able to figure out a more elegant
approach yet.  From [3]:

  Promotes the rbac.authorization.k8s.io/v1beta1 API to v1 with no changes

so all we really need is the apiVersion bump.

Also add similar logic for apiregistration, since that's the other
group where we register multiple schema versions.

[1]: kubernetes/kubernetes#90018
[2]: openshift#420 (comment)
[3]: kubernetes/kubernetes#49642
@wking (Member, Author) commented Jul 30, 2020

Ok, I think I have something that works in 6f247d4, although it's not particularly elegant.

/hold cancel

@openshift-ci-robot removed the do-not-merge/hold label (Indicates that a PR should not merge because someone has issued a /hold command.) Jul 30, 2020
@vrutkovs (Member)

/retest

@LalatenduMohanty (Member) commented Jul 30, 2020

@wking (Member, Author) commented Jul 30, 2020

e2e:

$ curl -s https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_cluster-version-operator/420/pull-ci-openshift-cluster-version-operator-master-e2e/1288770413639241728/artifacts/e2e/ipi-install-install/log-bundle-20200730102730.tar | tar -xOz log-bundle-20200730102730/bootstrap/containers/cluster-version-operator-3859ba2ba9e7592f8693b4ee362fda81b2c463b32f6970c7b1d2ea0f0a0226fd.log | grep 'Result of work' | tail -n2
I0730 10:23:46.632144       1 task_graph.go:546] Result of work: [Cluster operator etcd is still updating Cluster operator kube-apiserver is still updating Cluster operator kube-controller-manager is still updating Cluster operator kube-scheduler is still updating Cluster operator config-operator is still updating Cluster operator machine-api is still updating Cluster operator openshift-apiserver is still updating Cluster operator kube-storage-version-migrator is still updating Cluster operator cloud-credential is still updating Cluster operator authentication is still updating Could not update role "openshift-machine-api/cluster-autoscaler-operator" (211 of 606) Cluster operator csi-snapshot-controller is still updating Cluster operator image-registry is still updating Cluster operator ingress is still updating Cluster operator machine-approver is still updating Could not update clusterrole "cluster-node-tuning-operator" (294 of 606) Cluster operator openshift-controller-manager is still updating Could not update rolebinding "openshift/cluster-samples-operator-openshift-edit" (321 of 606): resource may have been deleted Cluster operator storage is still updating Could not update clusterrole "cluster-monitoring-operator" (361 of 606) Could not update oauthclient "console" (367 of 606): the server does not recognize this resource, check extension API servers Cluster operator insights is still updating Cluster operator operator-lifecycle-manager is still updating Could not update clusterrolebinding "marketplace-operator" (454 of 606) Cluster operator service-ca is still updating Could not update clusterrolebinding "default-account-cluster-network-operator" (477 of 606) Cluster operator dns is still updating Could not update clusterrolebinding "default-account-openshift-machine-config-operator" (503 of 606) Could not update role "openshift-authentication/prometheus-k8s" (514 of 606): resource may have been deleted]
I0730 10:25:16.724712       1 task_graph.go:546] Result of work: []

Dunno what's up with that yet. Ah, maybe somebody doesn't like our version-converted Roles and such?

@wking (Member, Author) commented Jul 30, 2020

Ah, version conversion still busted:

$ curl -s https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_cluster-version-operator/420/pull-ci-openshift-cluster-version-operator-master-e2e/1288770413639241728/artifacts/e2e/ipi-install-install/log-bundle-20200730102730.tar | tar -xOz log-bundle-20200730102730/bootstrap/containers/cluster-version-operator-3859ba2ba9e7592f8693b4ee362fda81b2c463b32f6970c7b1d2ea0f0a0226fd.log | grep 'role "' | tail -n1
E0730 10:27:31.386463       1 task.go:81] error running apply for clusterrole "cluster-node-tuning-operator" (294 of 606): resource name may not be empty

@wking (Member, Author) commented Jul 30, 2020

/hold

No need to retest while I'm taking another pass at the conversion failure.

@openshift-ci-robot added the do-not-merge/hold label (Indicates that a PR should not merge because someone has issued a /hold command.) Jul 30, 2020
Pulling in [1,2].  Generated with:

  $ go get -u github.com/openshift/client-go@83993cebb5aee533bf2f2dded2d87b9e32835f4a
  go: github.com/openshift/client-go 94e1065ab1f8a34cd8a0e1edb9ba8be0326e77b5 => v0.0.0-20200723130357-94e1065ab1f8
  $ go get -u github.com/openshift/library-go@cc498c355c99983057e9e01f3abcceb40ca5c298
  go: github.com/openshift/library-go 1ed21c4fa86c2bf6034823231b8e52df4f068d58 => v0.0.0-20200724192307-1ed21c4fa86c
  $ go mod tidy
  $ go mod vendor
  $ git add -A go.* vendor

using:

  $ go version
  go version go1.14.4 linux/amd64

This gets us up to 1.19 for the Kubernetes-based dependencies, and
should hopefully allow me to bump openshift/api to pick up [3] without hitting:

  go: github.com/openshift/cluster-version-operator/lib imports
        k8s.io/client-go/kubernetes/scheme imports
        k8s.io/api/auditregistration/v1alpha1: package provided by k8s.io/api at latest version v0.18.6 but not at required version v0.19.0-rc.2

And the API bump in turn will allow us to start using the new Release
type.

[1]: openshift/client-go@94e1065
[2]: openshift/library-go@1ed21c4
[3]: openshift/api@3ae6754
wking added a commit to wking/cluster-version-operator that referenced this pull request Jul 31, 2020
Protecting us from [1,2]:

  converting (v1beta1.Role) to (v1.Role): unknown conversion

Because merging and application are also version-dependent, it's hard
to paper over this in resourceread with automatic type conversion.
Although from [3]:

  Promotes the rbac.authorization.k8s.io/v1beta1 API to v1 with no changes

so all we really need is the apiVersion bump.  Anyhow, with this
commit, I'm doubling down on the approach from 4ee7b07 (Add
apiextensions.k8s.io/v1 support for CRDs, 2019-10-22, openshift#259) and
collapsing the readers into two helpers that support all of our types
and return runtime.Object.

From 0a255ab (cvo: Use protobuf for sending events and other basic
API commands, 2019-01-18, openshift#90), protobuf is more efficient, so we
should use it where possible.

And because all of this is very tedious to maintain by hand, there's
now a Python generator to spit out all the boilerplate.

[1]: kubernetes/kubernetes#90018
[2]: openshift#420 (comment)
[3]: kubernetes/kubernetes#49642
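
A hedged sketch of the shape this collapse takes (illustrative names; the real helpers are generated into lib/resourceread): one decoder over all registered types that hands back a runtime.Object for callers to type-switch on.

package readexample

import (
    rbacv1 "k8s.io/api/rbac/v1"
    "k8s.io/apimachinery/pkg/runtime"
    "k8s.io/apimachinery/pkg/runtime/serializer"
    utilruntime "k8s.io/apimachinery/pkg/util/runtime"
)

var (
    scheme = runtime.NewScheme()
    codecs = serializer.NewCodecFactory(scheme)
)

func init() {
    // Register every group/version the manifests use; rbac v1 shown here.
    utilruntime.Must(rbacv1.AddToScheme(scheme))
}

// readOrDie stands in for the generated helpers: decode any registered type
// and return it as a runtime.Object; callers type-switch on the result.
func readOrDie(manifest []byte) runtime.Object {
    obj, err := runtime.Decode(codecs.UniversalDeserializer(), manifest)
    if err != nil {
        panic(err)
    }
    return obj
}
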
Protecting us from [1,2]:

  converting (v1beta1.Role) to (v1.Role): unknown conversion

Because merging and application are also version-dependent, it's hard
to paper over this in resourceread with automatic type conversion.
Although from [3]:

  Promotes the rbac.authorization.k8s.io/v1beta1 API to v1 with no changes

so all we really need is the apiVersion bump.  Anyhow, with this
commit, I'm doubling down on the approach from 4ee7b07 (Add
apiextensions.k8s.io/v1 support for CRDs, 2019-10-22, openshift#259) and
collapsing the readers into two helpers that support all of our types
and return runtime.Object.

From 0a255ab (cvo: Use protobuf for sending events and other basic
API commands, 2019-01-18, openshift#90), protobuf is more efficient, so we
should use it where possible.

And because all of this is very tedious to maintain by hand, there's
now a Python generator to spit out all the boilerplate.

[1]: kubernetes/kubernetes#90018
[2]: openshift#420 (comment)
[3]: kubernetes/kubernetes#49642
Generated with:

  $ go mod tidy
  $ go mod vendor
  $ git add -A go.* vendor

using:

  $ go version
  go version go1.14.4 linux/amd64
@wking (Member, Author) commented Jul 31, 2020

e2e failed on "an end user can use OLM can subscribe to the operator", which is the unrelated rhbz#1862322.

/override ci/prow/e2e

@openshift-ci-robot (Contributor)

@wking: Overrode contexts on behalf of wking: ci/prow/e2e

In response to this:

e2e failed on "an end user can use OLM can subscribe to the operator", which is the unrelated rhbz#1862322.

/override ci/prow/e2e

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@wking (Member, Author) commented Jul 31, 2020

/hold cancel

@openshift-ci-robot removed the do-not-merge/hold label (Indicates that a PR should not merge because someone has issued a /hold command.) Jul 31, 2020
@vrutkovs (Member) left a comment


/lgtm

@openshift-ci-robot added the lgtm label (Indicates that a PR is ready to be merged.) Jul 31, 2020
@openshift-ci-robot (Contributor)

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: LalatenduMohanty, vrutkovs, wking

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [LalatenduMohanty,vrutkovs,wking]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-bot (Contributor)

/retest

Please review the full test history for this PR and help us cut down flakes.

@wking (Member, Author) commented Jul 31, 2020

@openshift-merge-robot merged commit 5d06bfc into openshift:master Jul 31, 2020
@wking deleted the client-go-bump branch July 31, 2020 14:04
@wking (Member, Author) commented Jul 31, 2020

Rebased around #410 with cd2d701 -> 4e12333 now that both it and #420 have landed.

wking added a commit to wking/cluster-version-operator that referenced this pull request Sep 29, 2020
When we moved to 0.19 Kube tooling in ef236c3 (vendor: Bump
client-go and library-go to current 4.6 tips 2020-07-24, openshift#420), the
tooling moved to klog v2.  Port our local klog consumers so we don't
have to worry about configuring multiple klogs [1].

Generated with:

  $ sed -i 's|^go 1.13|go 1.15|' go.mod
  $ sed -i 's|k8s.io/klog v1.0.0|k8s.io/klog/v2 v2.3.0|' go.mod
  $ sed -i 's|"k8s.io/klog"|"k8s.io/klog/v2"|' $(git grep -l k8s.io/klog cmd lib pkg)
  $ sed -i 's/klog.V(\([0-9]\)) /klog.V(\1).Enabled() /' $(git grep -l klog.V cmd lib pkg)
  $ go mod tidy
  $ go mod vendor
  $ git add -A go.* vendor

using:

  $ go version
  go version go1.15.2 linux/amd64

The change to Go 1.15 catches us up with 1d9a319 (Dockerfile.rhel:
Bump to Go 1.15, 2020-09-16, openshift#457).

[1]: https://bugzilla.redhat.com/show_bug.cgi?id=1883705
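
For reference, the call-site change the last sed produces looks roughly like this (hedged example; syncNode and the message are made up, not a specific CVO call site):

package klogexample

import "k8s.io/klog/v2"

func syncNode(name string) {
    // klog v1 allowed `if klog.V(2) {` because Verbose was a bool; in klog v2
    // Verbose is a struct, so the guard becomes Enabled(), which is exactly
    // the rewrite the sed above performs.
    if klog.V(2).Enabled() {
        klog.Infof("syncing %s", name)
    }
}
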
wking added a commit to wking/cluster-version-operator that referenced this pull request Sep 29, 2020
When we moved to 0.19 Kube tooling in ef236c3 (vendor: Bump
client-go and library-go to current 4.6 tips 2020-07-24, openshift#420), the
tooling moved to klog v2.  Port our local klog consumers so we don't
have to worry about configuring multiple klogs [1].

Generated with:

  $ sed -i 's|^go 1.13|go 1.15|' go.mod
  $ sed -i 's|k8s.io/klog v1.0.0|k8s.io/klog/v2 v2.3.0|' go.mod
  $ sed -i 's|"k8s.io/klog"|"k8s.io/klog/v2"|' $(git grep -l k8s.io/klog cmd lib pkg)
  $ sed -i 's|klog.V(\([0-9]\)) |klog.V(\1).Enabled() |' $(git grep -l klog.V cmd lib pkg)
  $ go mod tidy
  $ go mod vendor
  $ git add -A go.* vendor

using:

  $ go version
  go version go1.15.2 linux/amd64

The change to Go 1.15 catches us up with 1d9a319 (Dockerfile.rhel:
Bump to Go 1.15, 2020-09-16, openshift#457).

[1]: https://bugzilla.redhat.com/show_bug.cgi?id=1883705
wking added a commit to wking/cluster-version-operator that referenced this pull request May 4, 2022
I'd accidentally broken this in e8ffccb (lib: Add autogeneration
for some resource* functionality, 2020-07-29, openshift#420), when I moved the
health checks out of the Do methods and began passing typedObject into
them.  typedObject is the release manifest, which lacks the status
attributes we want the health checks to cover.

This also catches the generation script up with some past manual
changes:

* 0afb8a8 (Add a manifest annotation to be used for object
  deletion, 2020-08-17, openshift#438).
* 05e1af7 (Log resource diffs on update only in reconcile mode,
  2021-07-13, openshift#628).

The always-fatal ApplyAPIServicev1 function gives us the required
symmetry with 0afb8a8's DeleteAPIServicev1, although I'm a bit
fuzzy on who would have been using an APIService manifest.  I'm not
excited about supporting a new manifest kind I know nothing about, and
failing all attempts at non-delete reconciling should protect us from
accidentally encouraging manifest authors to feed us this kind.
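
A hedged sketch of the distinction this commit message draws (assumed names, not the actual lib code): the health check has to inspect the live object returned by the API server, because the release manifest ("typedObject") carries spec but no status.

package healthexample

import (
    "context"
    "fmt"

    appsv1 "k8s.io/api/apps/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/kubernetes"
)

func checkDeploymentHealth(ctx context.Context, client kubernetes.Interface, required *appsv1.Deployment) error {
    // "required" is the manifest object; fetch the live Deployment and look
    // at its status instead of the (always empty) status on the manifest.
    actual, err := client.AppsV1().Deployments(required.Namespace).Get(ctx, required.Name, metav1.GetOptions{})
    if err != nil {
        return err
    }
    if required.Spec.Replicas != nil && actual.Status.AvailableReplicas < *required.Spec.Replicas {
        return fmt.Errorf("deployment %s/%s has %d/%d available replicas", actual.Namespace, actual.Name, actual.Status.AvailableReplicas, *required.Spec.Replicas)
    }
    return nil
}
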
wking added a commit to wking/cluster-version-operator that referenced this pull request May 4, 2022
I'd dropped this in cc9292a (lib/resourcebuilder: Replace wait-for
with single-shot "is it alive now?", 2020-07-30, openshift#400), claiming:

  There's no object status for CRDs or DaemonSets that marks "we are
  really hurting".  The v1.18.0 Kubernetes CRD and DaemonSet
  controllers do not set any conditions in their operand status
  (although the API for those conditions exists [2,3]).  With this
  commit, we have very minimal wait logic for either.  Sufficiently
  unhealthy DaemonSet should be reported on via their associated
  ClusterOperator, and sufficiently unhealthy CRD should be reported
  on when we fail to push any custom resources consuming them
  (Task.Run retries will give the API server time to ready itself
  after accepting a CRD update before the CVO fails its sync cycle).

But from [1]:

  > It might take a few seconds for the endpoint to be created. You
  > can watch the Established condition of your
  > CustomResourceDefinition to be true or watch the discovery
  > information of the API server for your resource to show up.

So I was correct that we will hear about CRD issues when we fail to
push a dependent custom resource.  But I was not correct in claiming
that the CRD controller set no conditions.  I was probably confused by
e8ffccb (lib: Add autogeneration for some resource* functionality,
2020-07-29, openshift#420), which broke the health-check inputs as described in
002591d (lib/resourcebuilder: Use actual resource in check*Health
calls, 2022-05-03, openshift#771).  The code I removed in cc9292a was in
fact looking at the Established condition already.

This commit restores the Established check, but without the previous
PollImmediateUntil wait.

[1]: https://kubernetes.io/docs/tasks/extend-kubernetes/custom-resources/custom-resource-definitions/#create-a-customresourcedefinition
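
A hedged sketch of the single-shot Established check being restored (assumed function name; the real code lives in lib/resourcebuilder):

package crdexample

import (
    "fmt"

    apiextensionsv1 "k8s.io/apiextensions-apiserver/pkg/apis/apiextensions/v1"
)

// checkCRDEstablished looks once at the live CRD's status conditions, with no
// PollImmediateUntil wait around it.
func checkCRDEstablished(crd *apiextensionsv1.CustomResourceDefinition) error {
    for _, cond := range crd.Status.Conditions {
        if cond.Type == apiextensionsv1.Established && cond.Status == apiextensionsv1.ConditionTrue {
            return nil
        }
    }
    return fmt.Errorf("customresourcedefinition %s is not yet Established", crd.Name)
}
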
wking added a commit to wking/cluster-version-operator that referenced this pull request May 9, 2022
I'd accidentally broken this in e8ffccb (lib: Add autogeneration
for some resource* functionality, 2020-07-29, openshift#420), when I moved the
health checks out of the Do methods and began passing typedObject into
them.  typedObject is the release manifest, which lacks the status
attributes we want the health checks to cover.

This also catches the generation script up with some past manual
changes:

* 0afb8a8 (Add a manifest annotation to be used for object
  deletion, 2020-08-17, openshift#438).
* 05e1af7 (Log resource diffs on update only in reconcile mode,
  2021-07-13, openshift#628).
wking added a commit to wking/cluster-version-operator that referenced this pull request May 9, 2022
I'd dropped this in cc9292a (lib/resourcebuilder: Replace wait-for
with single-shot "is it alive now?", 2020-07-30, openshift#400), claiming:

  There's no object status for CRDs or DaemonSets that marks "we are
  really hurting".  The v1.18.0 Kubernetes CRD and DaemonSet
  controllers do not set any conditions in their operand status
  (although the API for those conditions exists [2,3]).  With this
  commit, we have very minimal wait logic for either.  Sufficiently
  unhealthy DaemonSet should be reported on via their associated
  ClusterOperator, and sufficiently unhealthy CRD should be reported
  on when we fail to push any custom resources consuming them
  (Task.Run retries will give the API server time to ready itself
  after accepting a CRD update before the CVO fails its sync cycle).

But from [1]:

  > It might take a few seconds for the endpoint to be created. You
  > can watch the Established condition of your
  > CustomResourceDefinition to be true or watch the discovery
  > information of the API server for your resource to show up.

So I was correct that we will hear about CRD issues when we fail to
push a dependent custom resource.  But I was not correct in claiming
that the CRD controller set no conditions.  I was probably confused by
e8ffccb (lib: Add autogeneration for some resource* functionality,
2020-07-29, openshift#420), which broke the health-check inputs as described in
002591d (lib/resourcebuilder: Use actual resource in check*Health
calls, 2022-05-03, openshift#771).  The code I removed in cc9292a was in
fact looking at the Established condition already.

This commit restores the Established check, but without the previous
PollImmediateUntil wait.

[1]: https://kubernetes.io/docs/tasks/extend-kubernetes/custom-resources/custom-resource-definitions/#create-a-customresourcedefinition
Labels
  approved: Indicates a PR has been approved by an approver from all required OWNERS files.
  lgtm: Indicates that a PR is ready to be merged.