move status calculation into its own controller #255

joelddiaz · 2020-10-05T13:13:37Z

Rather than have each controller responsible for triggering a status calculation, move that responsiblity into its own controller.

Note: for situations where the status controller cannot watch for certain conditions that should cause a status recalculation, the controller can annotate the CCO's operator config object to add a timestamp (which will in turn trigger a status calc).

Also note: Ensure the managementState field is set on our CCO operator config (otherwise we run into problems trying to annotate the operator config). The operator config should not allow an empty string for the managementState field, but it absolutely is not set at all (which shows up as being set to the empty string after doing a Get() of the operator config). Detect this, and assume that the empty string means a managementState of "Managed", and send back "Managed" on Update().

joelddiaz · 2020-10-05T13:13:59Z

/assign @dgoodwin
/cc @akhil-rane

dgoodwin · 2020-10-05T17:47:59Z

pkg/operator/constants/constants.go

@@ -118,4 +123,19 @@ var (
 		ModeDegraded,
 		ModeUnknown,
 	}
+
+	// Add known new credentials for next version upgrade


What was the rationale behind going to constants vs the actuator? Too difficult or awkward to instantiate an actuator in the status controller?

Mostly seemed weird to instantiate the entire credentials actuator to just get a static list of objects. My first pass had me setting up the status controller to embed the actuator, but I ended up with this instead.

dgoodwin · 2020-10-05T17:59:25Z

pkg/operator/oidcdiscoveryendpoint/controller.go

+			// Need to be able to force a status calculation here because the status
+			// controller has no Watch that could cause it to run for this situation.
+			// Could we annotate our secret to trigger a status calc???
+			status.SyncStatus(r.controllerRuntimeClient, r.logger)


Stared at this for a bit, still not sure what we should do to avoid that. Need to think on it a bit more.

I'm really annoyed by this one. This thought occurred to me this morning: we could annotate the config CR with the current datestamp from the oidc controller???

When this controller sees the degraded state change, annotated the CloudConfig CR with a timestamp to force a reconcile? That would put it into the normal controller flow, with error handling etc. Seems like a good idea to me.

pkg/operator/status/status_controller.go

pkg/operator/constants/constants.go

joelddiaz · 2020-10-06T16:40:54Z

/test e2e-aws

joelddiaz · 2020-10-06T21:19:43Z

/test e2e-aws

joelddiaz · 2020-10-07T11:37:02Z

@dgoodwin un-WIPed as I finally tested this and it looked fine to me. The working (but IMO non-ideal) syncing from the oidc controller is still in the current version of the PR.

dgoodwin · 2020-10-08T12:12:48Z

pkg/operator/status/status_controller.go

+		if existingCondition == nil {
+			existing = append(existing, newCondition)
+		} else {
+			logger.WithField("statushandler", handlerName).Infof("condition already set for type %s by a previous handler, the new condition from the current handler will be accepted: %v", newCondition.Type, newCondition)


Last write wins? Could you clarify this in the function comment a little. This should probably be a warning as well as info is being lost. The suggestion above for logging every condition should help when combined with that.

dgoodwin · 2020-10-08T12:13:27Z

pkg/operator/status/status_controller.go

+		if err != nil {
+			logger.WithError(err).WithField("statushandler", handlerName).Errorf("failed to get conditions from status handler")
+			continue
+		}


Could you info log each condition we got here and the status handler it came from. Then below when we see a conflict we can log that as a warning and still know the user/admin can see which handlers reported the same condition.

dgoodwin · 2020-10-08T12:15:59Z

pkg/operator/status/status_controller.go

+		rootSecret.Name = constants.VSphereCloudCredSecretName
+		secrets = append(secrets, constants.VsphereUpcomingSecrets...)
+	default:
+		log.Errorf("unable to provide upcoming secrets for unknown platform: %v", platformType)


This looks like it will be called on bare metal, so hitting this line is really not an error IMO, I'd go with info (or maybe even debug) and soften the message up a little bit.

dgoodwin · 2020-10-08T12:18:54Z

pkg/operator/status/status_controller_test.go

+				},
+			},
+		},
+	}


Coverage seems a little light in here, could we get something to simulate two handlers returning the same condition? Or is the handler registration making this infeasible?

openshift-ci-robot · 2020-10-08T19:44:57Z

@joelddiaz: The following test failed, say /retest to rerun all failed tests:

Test name	Commit	Details	Rerun command
ci/prow/e2e-aws	`dcd7d1a`	link	`/test e2e-aws`

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

joelddiaz · 2020-10-14T17:41:17Z

/test e2e-aws

dgoodwin

One small typo but I think we're gtg otherwise, please squash.

dgoodwin · 2020-10-14T18:40:48Z

pkg/operator/oidcdiscoveryendpoint/controller.go

+			// cannot watch for. So just update the config CR (which the status controller does watch)
+			// to trigger a status recalculation.
+			if err := r.updateConfigCRTimestamp(); err != nil {
+				r.logger.WithError(err).Error("failed to updated config CR to trigger status caluclation")


small calculation typo here

Rather than have each controller responsible for triggering a status calculation, move that responsiblity into its own controller. Store the upcoming list of Secrets as a shared constant/variable (for use by both the Actuator's Upgradeable() and the status controller's Secret watch list). Remove GetUpcomingSecrets() from the Actuator interface. Stop setting default ClusterOperator status condition values from the Actuator's Upgradeable() API call (it is already set in the current status calculation codepath). Update controllers to register themselves to the new status controller for status calculation callbacks. Convert previous pkg/util/status into the new status controller (now in pkg/operator/status). Remove hack of sending fake CredentialsRequests just to trigger a status sync from credentialsrequest controller (new status calculator now will watch and react appropriately). Update credentialsrequest test cases to only check for the setting of non-default conditions. Update credentialsrequest test cases to no longer need to register and deregister status callbacks. Move credentialsrequest status test cases to new status controller. Update secretannotator actuators to no longer trigger a status sync. Note: for situations where the status controller cannot watch for certain conditions that should cause a status recalculation, the controller can annotate the CCO's operator config object to add a timestamp (which will in turn trigger a status calc). Important changes: Ensure the managementState field is set on our CCO operator config (otherwise we run into problems trying to annotate the operator config). The operator config should not allow an empty string for the managementState field, but it absolutely is not set at all (which shows up as being set to the empty string after doing a Get() of the operator config). Detect this, and assume that the empty string means a managementState of "Managed", and send back "Managed" on Update().

dgoodwin · 2020-10-14T19:14:16Z

/lgtm

Nice work, looking forward to this as I have mass conflicts every time my creds rotate. :)

openshift-ci-robot · 2020-10-14T19:14:40Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dgoodwin, joelddiaz

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [dgoodwin,joelddiaz]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

openshift-bot · 2020-10-14T20:41:48Z

/retest

Please review the full test history for this PR and help us cut down flakes.

openshift-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 5, 2020

openshift-ci-robot assigned dgoodwin Oct 5, 2020

openshift-ci-robot requested review from akhil-rane, abutcher and csrwng October 5, 2020 13:15

openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 5, 2020

joelddiaz force-pushed the status-controller branch from ee41f62 to 8336503 Compare October 5, 2020 13:40

dgoodwin reviewed Oct 5, 2020

View reviewed changes

akhil-rane reviewed Oct 6, 2020

View reviewed changes

pkg/operator/constants/constants.go Outdated Show resolved Hide resolved

joelddiaz force-pushed the status-controller branch 2 times, most recently from 6898026 to ceb48d6 Compare October 6, 2020 12:50

joelddiaz changed the title ~~WIP: move status calculation into its own controller~~ move status calculation into its own controller Oct 6, 2020

openshift-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 6, 2020

dgoodwin requested changes Oct 8, 2020

View reviewed changes

joelddiaz force-pushed the status-controller branch 3 times, most recently from aea4378 to dcd7d1a Compare October 8, 2020 18:18

joelddiaz force-pushed the status-controller branch from dcd7d1a to fabe045 Compare October 14, 2020 15:39

dgoodwin reviewed Oct 14, 2020

View reviewed changes

joelddiaz force-pushed the status-controller branch from fabe045 to dbfdb78 Compare October 14, 2020 18:58

openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Oct 14, 2020

openshift-merge-robot merged commit a59af2c into openshift:master Oct 14, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

move status calculation into its own controller #255

move status calculation into its own controller #255

joelddiaz commented Oct 5, 2020 •

edited

joelddiaz commented Oct 5, 2020

dgoodwin Oct 5, 2020

joelddiaz Oct 5, 2020

dgoodwin Oct 5, 2020

joelddiaz Oct 5, 2020

dgoodwin Oct 8, 2020

joelddiaz commented Oct 6, 2020

joelddiaz commented Oct 6, 2020

joelddiaz commented Oct 7, 2020

dgoodwin Oct 8, 2020

dgoodwin Oct 8, 2020

dgoodwin Oct 8, 2020

dgoodwin Oct 8, 2020

openshift-ci-robot commented Oct 8, 2020

joelddiaz commented Oct 14, 2020

dgoodwin left a comment

dgoodwin Oct 14, 2020

dgoodwin commented Oct 14, 2020

openshift-ci-robot commented Oct 14, 2020

openshift-bot commented Oct 14, 2020

move status calculation into its own controller #255

move status calculation into its own controller #255

Conversation

joelddiaz commented Oct 5, 2020 • edited

joelddiaz commented Oct 5, 2020

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

joelddiaz commented Oct 6, 2020

joelddiaz commented Oct 6, 2020

joelddiaz commented Oct 7, 2020

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

openshift-ci-robot commented Oct 8, 2020

joelddiaz commented Oct 14, 2020

dgoodwin left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

dgoodwin commented Oct 14, 2020

openshift-ci-robot commented Oct 14, 2020

openshift-bot commented Oct 14, 2020

joelddiaz commented Oct 5, 2020 •

edited