
set ClusterOperator.Status #72

Merged · 2 commits into openshift:master from the operator-status branch · Jan 28, 2019

Conversation

danwinship (Contributor):

OK, this may still need a bit of testing/finishing but it's at least ready for review.

  • Currently, at any given time exactly one of Failing, Progressing, and Available should be True, though we'd talked at least in passing about, e.g., making the cluster be both Failing and Available if the requested configuration is bad but the deployed configuration is good.
  • AFAICT if we want to watch the cluster config, operator config, and a bunch of daemonsets, we need to have three separate Controllers and three separate Reconcilers.
  • In theory this will not need any changes to support Multus/OVN/Kuryr; the NetworkConfig reconciler will pull the Multus/OVN/Kuryr-related DaemonSet names out of the network.Render() result, and pass them to the DaemonSet reconciler, which will then update the operator status based on the status of those DaemonSets rather than / in addition to the openshift-sdn ones. (Although at some point we'll probably need to add Deployment-reconciling support in addition to the DaemonSet support.)
  • StatusManager exists mostly to deal with the fact that we need three separate Reconcilers, but they have to coordinate the resulting operator status. In particular, if the configuration is invalid, and then a DaemonSet's status changes, we don't want that to result in the operator status becoming Available, so StatusManager.SetFromDaemonSets() needs to be able to know not to change the status.
  • StatusManager is intended to eventually be useful to other operators (which is why it takes the operator name and version as constructor arguments); see the sketch after this list.
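A minimal sketch of the coordination the last two bullets describe. The type, field, and method shapes here are illustrative assumptions, not the code in this PR; the real methods also write the resulting ClusterOperator back through the API client, and SetFromDaemonSets looks the DaemonSets up itself via status.client rather than taking them as arguments:

```go
// Illustrative sketch only; names and fields are assumptions, not this PR's code.
package statusmanager

import appsv1 "k8s.io/api/apps/v1"

type StatusManager struct {
	name    string // ClusterOperator name, supplied by the operator using this
	version string

	configFailing bool // latched when the configuration itself is invalid

	// Exactly one of these should end up True.
	Failing, Progressing, Available bool
	Reason, Message                 string
}

func New(name, version string) *StatusManager {
	return &StatusManager{name: name, version: version}
}

// SetFailing is used by the config reconcilers when the configuration is
// invalid; it latches configFailing so later DaemonSet updates can't clear it.
func (s *StatusManager) SetFailing(reason string, err error) error {
	s.configFailing = true
	s.Failing, s.Progressing, s.Available = true, false, false
	s.Reason, s.Message = reason, err.Error()
	return nil
}

// SetFromDaemonSets is used by the DaemonSet reconciler. While the config is
// failing, DaemonSet progress must not make the operator report Available.
func (s *StatusManager) SetFromDaemonSets(daemonSets []appsv1.DaemonSet) error {
	if s.configFailing {
		return nil
	}
	progressing := false
	for _, ds := range daemonSets {
		if ds.Status.NumberUnavailable > 0 ||
			ds.Status.UpdatedNumberScheduled < ds.Status.DesiredNumberScheduled {
			progressing = true
		}
	}
	s.Failing, s.Progressing, s.Available = false, progressing, !progressing
	return nil
}
```

The important property is that configFailing is latched by SetFailing, so a later DaemonSet status change cannot flip the operator back to Available until the configuration problem is resolved.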

@openshift-ci-robot openshift-ci-robot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Jan 22, 2019
@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 22, 2019
@squeed squeed requested a review from deads2k January 23, 2019 11:36
@@ -83,27 +85,34 @@ func (r *ReconcileClusterConfig) Reconcile(request reconcile.Request) (reconcile
}
// Error reading the object - requeue the request.
log.Println(err)
// FIXME: operator status?
Contributor:
Yeah, tough to know what to do. If it weren't a transient failure, then I suppose we would treat this the same as invalid input. This seems very unlikely though.
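For concreteness, the distinction being discussed might look like this in the Reconcile error branch (a fragment sketch, not this PR's code; only the SetFailing signature matches what this PR uses elsewhere, and the "NoClusterConfig" reason and config object are assumptions):

```go
// Illustrative fragment, not this PR's code.
if err := r.client.Get(context.TODO(), request.NamespacedName, config); err != nil {
	if errors.IsNotFound(err) {
		// Not a transient failure: treat it the same as invalid input.
		return reconcile.Result{}, r.status.SetFailing("NoClusterConfig",
			fmt.Errorf("cluster network configuration %q does not exist", request.Name))
	}
	// Probably transient: log and requeue without touching the operator status.
	log.Println(err)
	return reconcile.Result{}, err
}
```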

func (r *ReconcileDaemonSets) Reconcile(request reconcile.Request) (reconcile.Result, error) {
found := false
for _, ds := range r.daemonSets {
if ds.Namespace == request.Namespace && ds.Name == request.Name {
Contributor:
Annoying that the operator-sdk doesn't support mixed-scope watches, AFAICT.
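For context, without mixed-scope watches each watched type ends up with its own controller and Reconciler, wired up roughly like this with the controller-runtime API of that era (a sketch, not code from this PR; the function and package names are hypothetical):

```go
package controllers

import (
	appsv1 "k8s.io/api/apps/v1"
	"sigs.k8s.io/controller-runtime/pkg/controller"
	"sigs.k8s.io/controller-runtime/pkg/handler"
	"sigs.k8s.io/controller-runtime/pkg/manager"
	"sigs.k8s.io/controller-runtime/pkg/reconcile"
	"sigs.k8s.io/controller-runtime/pkg/source"
)

// addDaemonSetWatch (hypothetical) registers a controller that watches all
// DaemonSets; Reconcile() then filters requests down to the DaemonSets the
// operator actually cares about, as in the loop shown above.
func addDaemonSetWatch(mgr manager.Manager, r reconcile.Reconciler) error {
	c, err := controller.New("daemonset-status-controller", mgr, controller.Options{Reconciler: r})
	if err != nil {
		return err
	}
	return c.Watch(&source.Kind{Type: &appsv1.DaemonSet{}}, &handler.EnqueueRequestForObject{})
}
```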

ds := &appsv1.DaemonSet{}
if err := status.client.Get(context.TODO(), dsName, ds); err != nil {
if errors.IsNotFound(err) {
return status.SetFailing("NoDaemonSet", fmt.Errorf("DaemonSet %q does not exist", dsName.String()))
Contributor:
Is it useful to know if more than one daemonset is missing?

danwinship (Author):
yeah, probably

danwinship (Author):
Actually, probably not; this would normally just be a transient failure during the first few moments of the cluster's existence, and no one will ever actually see it. (And if we're going to combine errors then what if we're missing the Namespace for one DaemonSet and the DaemonSet for another, etc?)

for _, condition := range conditions {
co.Status.Conditions = SetStatusCondition(co.Status.Conditions, condition)
}
co.Status.Version = status.version
@squeed (Contributor) commented Jan 23, 2019:
It looks like the latest API (openshift/api#176) changes how version is reported. This field is now an array.

In the interest of getting this out the door, let's just drop this line, revendor api, and merge the PR, then figure out the operand stuff afterwards.
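For reference, with that API change the single version string becomes a list of operand versions, so the dropped line would eventually turn into something along these lines (a sketch, assuming the Versions/OperandVersion shape from openshift/api#176; the "operator" name is an assumption):

```go
// Sketch: report versions via the new array instead of a single string.
co.Status.Versions = []configv1.OperandVersion{
	{Name: "operator", Version: status.version},
}
```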

@squeed (Contributor) commented Jan 23, 2019:

A few minor things, but this looks awesome. I'll play around with it tomorrow.

Implementing the operand versioning will probably require a bit more sophistication in the daemonset watcher, but I don't think we'll be sad if we merge this before.

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Jan 23, 2019
@pecameron (Contributor) left a comment:

/lgtm
FYI, ovn has one daemonset and one deployment.

@openshift-bot (Contributor):
/retest

Please review the full test history for this PR and help us cut down flakes.

@danwinship danwinship removed the lgtm Indicates that a PR is ready to be merged. label Jan 23, 2019

// SetStatusCondition returns the result of setting the specified condition in
// the given slice of conditions.
func SetStatusCondition(oldConditions []configv1.ClusterOperatorStatusCondition, condition *configv1.ClusterOperatorStatusCondition) []configv1.ClusterOperatorStatusCondition {
danwinship (Author):
That's based on the updated ClusterOperator so would require an openshift/api bump too. We'll grab that in the next update after we get the initial version committed.
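For readers following along, one plausible body for the SetStatusCondition helper declared above (illustrative; the actual implementation in this PR may differ) bumps LastTransitionTime only when the Status value actually changes:

```go
// Sketch implementation; the package name and exact behavior are assumptions.
package statusmanager

import (
	configv1 "github.com/openshift/api/config/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// SetStatusCondition returns the result of setting the specified condition in
// the given slice of conditions, preserving the previous LastTransitionTime
// when the Status value has not changed.
func SetStatusCondition(oldConditions []configv1.ClusterOperatorStatusCondition, condition *configv1.ClusterOperatorStatusCondition) []configv1.ClusterOperatorStatusCondition {
	condition.LastTransitionTime = metav1.Now()
	for i, old := range oldConditions {
		if old.Type != condition.Type {
			continue
		}
		if old.Status == condition.Status {
			// No transition: keep the previous transition time.
			condition.LastTransitionTime = old.LastTransitionTime
		}
		oldConditions[i] = *condition
		return oldConditions
	}
	return append(oldConditions, *condition)
}
```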

@danwinship (Author):
repushed with a StatusManager unit test

@squeed (Contributor) commented Jan 25, 2019:

/hold
because pulls are on hold for rebase
/lgtm

@openshift-ci-robot openshift-ci-robot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. lgtm Indicates that a PR is ready to be merged. labels Jan 25, 2019
@openshift-ci-robot (Contributor):
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: danwinship, pecameron, squeed

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@danwinship (Author):
@danwinship: The following test failed, say /retest to rerun them all:

        {
            "apiVersion": "config.openshift.io/v1",
            "kind": "ClusterOperator",
            "metadata": {
                "creationTimestamp": "2019-01-25T16:11:35Z",
                "generation": 1,
                "name": "openshift-network-operator",
                "namespace": "",
                "resourceVersion": "8145",
                "selfLink": "/apis/config.openshift.io/v1/clusteroperators/openshift-network-operator",
                "uid": "e35b0622-20bb-11e9-83a5-121386b5f244"
            },
            "spec": {},
            "status": {
                "conditions": [
                    {
                        "lastTransitionTime": "2019-01-25T16:11:43Z",
                        "status": "False",
                        "type": "Failing"
                    },
                    {
                        "lastTransitionTime": "2019-01-25T16:20:08Z",
                        "status": "False",
                        "type": "Progressing"
                    },
                    {
                        "lastTransitionTime": "2019-01-25T16:20:08Z",
                        "status": "True",
                        "type": "Available"
                    }
                ],
                "extension": null,
                "version": ""
            }
        }

Somebody Else's Problem! :-)

@danwinship (Author):
/retest

@squeed (Contributor) commented Jan 28, 2019:

/hold cancel

@openshift-ci-robot openshift-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 28, 2019
@openshift-merge-robot openshift-merge-robot merged commit e1ab4f6 into openshift:master Jan 28, 2019
@danwinship danwinship deleted the operator-status branch January 31, 2019 14:31
Labels: approved, lgtm, size/XL
7 participants