
set ClusterOperator.Status #72

Merged · 2 commits into openshift:master from the operator-status branch · Jan 28, 2019

Conversation

danwinship (Contributor):

OK, this may still need a bit of testing/finishing but it's at least ready for review.

  • Currently, at any given time exactly one of Failing, Progressing, and Available should be True, though we'd talked at least in passing about, e.g., making the cluster be both Failing and Available if the requested configuration is bad but the deployed configuration is good.
  • AFAICT if we want to watch the cluster config, operator config, and a bunch of daemonsets, we need to have three separate Controllers and three separate Reconcilers.
  • In theory this will not need any changes to support Multus/OVN/Kuryr; the NetworkConfig reconciler will pull the Multus/OVN/Kuryr-related DaemonSet names out of the network.Render() result, and pass them to the DaemonSet reconciler, which will then update the operator status based on the status of those DaemonSets rather than / in addition to the openshift-sdn ones. (Although at some point we'll probably need to add Deployment-reconciling support in addition to the DaemonSet support.)
  • StatusManager exists mostly to deal with the fact that we need three separate Reconcilers, but they have to coordinate the resulting operator status. In particular, if the configuration is invalid, and then a DaemonSet's status changes, we don't want that to result in the operator status becoming Available, so StatusManager.SetFromDaemonSets() needs to be able to know not to change the status.
  • StatusManager is intended to eventually be useful to other operators (which is why it takes the operator name and version as constructor arguments); see the sketch after this list.
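A minimal sketch of the coordination the last two bullets describe. The type, field, and method shapes here are illustrative assumptions, not the code in this PR; the real methods also write the resulting ClusterOperator back through the API client, and SetFromDaemonSets looks the DaemonSets up itself via status.client rather than taking them as arguments:

```go
// Illustrative sketch only; names and fields are assumptions, not this PR's code.
package statusmanager

import appsv1 "k8s.io/api/apps/v1"

type StatusManager struct {
	name    string // ClusterOperator name, supplied by the operator using this
	version string

	configFailing bool // latched when the configuration itself is invalid

	// Exactly one of these should end up True.
	Failing, Progressing, Available bool
	Reason, Message                 string
}

func New(name, version string) *StatusManager {
	return &StatusManager{name: name, version: version}
}

// SetFailing is used by the config reconcilers when the configuration is
// invalid; it latches configFailing so later DaemonSet updates can't clear it.
func (s *StatusManager) SetFailing(reason string, err error) error {
	s.configFailing = true
	s.Failing, s.Progressing, s.Available = true, false, false
	s.Reason, s.Message = reason, err.Error()
	return nil
}

// SetFromDaemonSets is used by the DaemonSet reconciler. While the config is
// failing, DaemonSet progress must not make the operator report Available.
func (s *StatusManager) SetFromDaemonSets(daemonSets []appsv1.DaemonSet) error {
	if s.configFailing {
		return nil
	}
	progressing := false
	for _, ds := range daemonSets {
		if ds.Status.NumberUnavailable > 0 ||
			ds.Status.UpdatedNumberScheduled < ds.Status.DesiredNumberScheduled {
			progressing = true
		}
	}
	s.Failing, s.Progressing, s.Available = false, progressing, !progressing
	return nil
}
```

The important property is that configFailing is latched by SetFailing, so a later DaemonSet status change cannot flip the operator back to Available until the configuration problem is resolved.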

@openshift-ci-robot openshift-ci-robot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Jan 22, 2019
@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 22, 2019
@squeed squeed requested a review from deads2k January 23, 2019 11:36
@@ -83,27 +85,34 @@ func (r *ReconcileClusterConfig) Reconcile(request reconcile.Request) (reconcile
}
// Error reading the object - requeue the request.
log.Println(err)
// FIXME: operator status?
Contributor:
Yeah, tough to know what to do. If it weren't a transient failure, then I suppose we would treat this the same as invalid input. This seems very unlikely though.
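For concreteness, the distinction being discussed might look like this in the Reconcile error branch (a fragment sketch, not this PR's code; only the SetFailing signature matches what this PR uses elsewhere, and the "NoClusterConfig" reason and config object are assumptions):

```go
// Illustrative fragment, not this PR's code.
if err := r.client.Get(context.TODO(), request.NamespacedName, config); err != nil {
	if errors.IsNotFound(err) {
		// Not a transient failure: treat it the same as invalid input.
		return reconcile.Result{}, r.status.SetFailing("NoClusterConfig",
			fmt.Errorf("cluster network configuration %q does not exist", request.Name))
	}
	// Probably transient: log and requeue without touching the operator status.
	log.Println(err)
	return reconcile.Result{}, err
}
```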

func (r *ReconcileDaemonSets) Reconcile(request reconcile.Request) (reconcile.Result, error) {
found := false
for _, ds := range r.daemonSets {
if ds.Namespace == request.Namespace && ds.Name == request.Name {
Contributor:
Annoying that the operator-sdk doesn't support mixed-scope watches, AFAICT.
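For context, without mixed-scope watches each watched type ends up with its own controller and Reconciler, wired up roughly like this with the controller-runtime API of that era (a sketch, not code from this PR; the function and package names are hypothetical):

```go
package controllers

import (
	appsv1 "k8s.io/api/apps/v1"
	"sigs.k8s.io/controller-runtime/pkg/controller"
	"sigs.k8s.io/controller-runtime/pkg/handler"
	"sigs.k8s.io/controller-runtime/pkg/manager"
	"sigs.k8s.io/controller-runtime/pkg/reconcile"
	"sigs.k8s.io/controller-runtime/pkg/source"
)

// addDaemonSetWatch (hypothetical) registers a controller that watches all
// DaemonSets; Reconcile() then filters requests down to the DaemonSets the
// operator actually cares about, as in the loop shown above.
func addDaemonSetWatch(mgr manager.Manager, r reconcile.Reconciler) error {
	c, err := controller.New("daemonset-status-controller", mgr, controller.Options{Reconciler: r})
	if err != nil {
		return err
	}
	return c.Watch(&source.Kind{Type: &appsv1.DaemonSet{}}, &handler.EnqueueRequestForObject{})
}
```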

ds := &appsv1.DaemonSet{}
if err := status.client.Get(context.TODO(), dsName, ds); err != nil {
if errors.IsNotFound(err) {
return status.SetFailing("NoDaemonSet", fmt.Errorf("DaemonSet %q does not exist", dsName.String()))
Contributor:
Is it useful to know if more than one daemonset is missing?

danwinship (Author):
yeah, probably

danwinship (Author):
Actually, probably not; this would normally just be a transient failure during the first few moments of the cluster's existence, and no one will ever actually see it. (And if we're going to combine errors then what if we're missing the Namespace for one DaemonSet and the DaemonSet for another, etc?)

for _, condition := range conditions {
co.Status.Conditions = SetStatusCondition(co.Status.Conditions, condition)
}
co.Status.Version = status.version
@squeed (Contributor) commented Jan 23, 2019:
It looks like the latest API (openshift/api#176) changes how version is reported. This field is now an array.

In the interest of getting this out the door, let's just drop this line, revendor api, and merge the PR, then figure out the operand stuff afterwards.
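For reference, with that API change the single version string becomes a list of operand versions, so the dropped line would eventually turn into something along these lines (a sketch, assuming the Versions/OperandVersion shape from openshift/api#176; the "operator" name is an assumption):

```go
// Sketch: report versions via the new array instead of a single string.
co.Status.Versions = []configv1.OperandVersion{
	{Name: "operator", Version: status.version},
}
```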

@squeed (Contributor) commented Jan 23, 2019:

A few minor things, but this looks awesome. I'll play around with it tomorrow.

Implementing the operand versioning will probably require a bit more sophistication in the daemonset watcher, but I don't think we'll be sad if we merge this before.

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Jan 23, 2019
@pecameron (Contributor) left a comment:

/lgtm
FYI, ovn has one daemonset and one deployment.

@openshift-bot (Contributor):
/retest

Please review the full test history for this PR and help us cut down flakes.

@danwinship danwinship removed the lgtm Indicates that a PR is ready to be merged. label Jan 23, 2019

// SetStatusCondition returns the result of setting the specified condition in
// the given slice of conditions.
func SetStatusCondition(oldConditions []configv1.ClusterOperatorStatusCondition, condition *configv1.ClusterOperatorStatusCondition) []configv1.ClusterOperatorStatusCondition {
danwinship (Author):
That's based on the updated ClusterOperator so would require an openshift/api bump too. We'll grab that in the next update after we get the initial version committed.
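For readers following along, one plausible body for the SetStatusCondition helper declared above (illustrative; the actual implementation in this PR may differ) bumps LastTransitionTime only when the Status value actually changes:

```go
// Sketch implementation; the package name and exact behavior are assumptions.
package statusmanager

import (
	configv1 "github.com/openshift/api/config/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// SetStatusCondition returns the result of setting the specified condition in
// the given slice of conditions, preserving the previous LastTransitionTime
// when the Status value has not changed.
func SetStatusCondition(oldConditions []configv1.ClusterOperatorStatusCondition, condition *configv1.ClusterOperatorStatusCondition) []configv1.ClusterOperatorStatusCondition {
	condition.LastTransitionTime = metav1.Now()
	for i, old := range oldConditions {
		if old.Type != condition.Type {
			continue
		}
		if old.Status == condition.Status {
			// No transition: keep the previous transition time.
			condition.LastTransitionTime = old.LastTransitionTime
		}
		oldConditions[i] = *condition
		return oldConditions
	}
	return append(oldConditions, *condition)
}
```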

@danwinship (Author):
repushed with a StatusManager unit test

@squeed (Contributor) commented Jan 25, 2019:

/hold
because pulls are on hold for rebase
/lgtm

@openshift-ci-robot openshift-ci-robot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. lgtm Indicates that a PR is ready to be merged. labels Jan 25, 2019
@openshift-ci-robot (Contributor):
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: danwinship, pecameron, squeed

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@danwinship (Author):
@danwinship: The following test failed, say /retest to rerun them all:

        {
            "apiVersion": "config.openshift.io/v1",
            "kind": "ClusterOperator",
            "metadata": {
                "creationTimestamp": "2019-01-25T16:11:35Z",
                "generation": 1,
                "name": "openshift-network-operator",
                "namespace": "",
                "resourceVersion": "8145",
                "selfLink": "/apis/config.openshift.io/v1/clusteroperators/openshift-network-operator",
                "uid": "e35b0622-20bb-11e9-83a5-121386b5f244"
            },
            "spec": {},
            "status": {
                "conditions": [
                    {
                        "lastTransitionTime": "2019-01-25T16:11:43Z",
                        "status": "False",
                        "type": "Failing"
                    },
                    {
                        "lastTransitionTime": "2019-01-25T16:20:08Z",
                        "status": "False",
                        "type": "Progressing"
                    },
                    {
                        "lastTransitionTime": "2019-01-25T16:20:08Z",
                        "status": "True",
                        "type": "Available"
                    }
                ],
                "extension": null,
                "version": ""
            }
        }

Somebody Else's Problem! :-)

@danwinship (Author):
/retest

@squeed (Contributor) commented Jan 28, 2019:

/hold cancel

@openshift-ci-robot openshift-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 28, 2019
@openshift-merge-robot openshift-merge-robot merged commit e1ab4f6 into openshift:master Jan 28, 2019
@danwinship danwinship deleted the operator-status branch January 31, 2019 14:31
Labels: approved, lgtm, size/XL
7 participants