Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

move status calculation into its own controller #255

Merged
merged 1 commit into from Oct 14, 2020

Conversation

joelddiaz
Copy link
Contributor

@joelddiaz joelddiaz commented Oct 5, 2020

Rather than have each controller responsible for triggering a status calculation, move that responsiblity into its own controller.

  • Store the upcoming list of Secrets as a shared constant/variable (for use by both the Actuator's Upgradeable() and the status controller's Secret watch list).
  • Remove GetUpcomingSecrets() from the Actuator interface.
  • Stop setting default ClusterOperator status condition values from the Actuator's Upgradeable() API call (it is already set in the current status calculation codepath).
  • Update controllers to register themselves to the new status controller for status calculation callbacks.
  • Convert previous pkg/util/status into the new status controller (now in pkg/operator/status).
  • Remove hack of sending fake CredentialsRequests just to trigger a status sync from credentialsrequest controller (new status calculator now will watch and react appropriately).
  • Update credentialsrequest test cases to only check for the setting of non-default conditions.
  • Update credentialsrequest test cases to no longer need to register and deregister status callbacks.
  • Move credentialsrequest status test cases to new status controller.
  • Update secretannotator actuators to no longer trigger a status sync.

Note: for situations where the status controller cannot watch for certain conditions that should cause a status recalculation, the controller can annotate the CCO's operator config object to add a timestamp (which will in turn trigger a status calc).

Also note: Ensure the managementState field is set on our CCO operator config (otherwise we run into problems trying to annotate the operator config). The operator config should not allow an empty string for the managementState field, but it absolutely is not set at all (which shows up as being set to the empty string after doing a Get() of the operator config). Detect this, and assume that the empty string means a managementState of "Managed", and send back "Managed" on Update().

@openshift-ci-robot openshift-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 5, 2020
@joelddiaz
Copy link
Contributor Author

/assign @dgoodwin
/cc @akhil-rane

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 5, 2020
@@ -118,4 +123,19 @@ var (
ModeDegraded,
ModeUnknown,
}

// Add known new credentials for next version upgrade
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What was the rationale behind going to constants vs the actuator? Too difficult or awkward to instantiate an actuator in the status controller?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly seemed weird to instantiate the entire credentials actuator to just get a static list of objects. My first pass had me setting up the status controller to embed the actuator, but I ended up with this instead.

// Need to be able to force a status calculation here because the status
// controller has no Watch that could cause it to run for this situation.
// Could we annotate our secret to trigger a status calc???
status.SyncStatus(r.controllerRuntimeClient, r.logger)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stared at this for a bit, still not sure what we should do to avoid that. Need to think on it a bit more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm really annoyed by this one. This thought occurred to me this morning: we could annotate the config CR with the current datestamp from the oidc controller???

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When this controller sees the degraded state change, annotated the CloudConfig CR with a timestamp to force a reconcile? That would put it into the normal controller flow, with error handling etc. Seems like a good idea to me.

pkg/operator/status/status_controller.go Outdated Show resolved Hide resolved
pkg/operator/status/status_controller.go Outdated Show resolved Hide resolved
pkg/operator/status/status_controller.go Outdated Show resolved Hide resolved
@joelddiaz joelddiaz force-pushed the status-controller branch 2 times, most recently from 6898026 to ceb48d6 Compare October 6, 2020 12:50
@joelddiaz joelddiaz changed the title WIP: move status calculation into its own controller move status calculation into its own controller Oct 6, 2020
@openshift-ci-robot openshift-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 6, 2020
@joelddiaz
Copy link
Contributor Author

/test e2e-aws

1 similar comment
@joelddiaz
Copy link
Contributor Author

/test e2e-aws

@joelddiaz
Copy link
Contributor Author

@dgoodwin un-WIPed as I finally tested this and it looked fine to me. The working (but IMO non-ideal) syncing from the oidc controller is still in the current version of the PR.

if existingCondition == nil {
existing = append(existing, newCondition)
} else {
logger.WithField("statushandler", handlerName).Infof("condition already set for type %s by a previous handler, the new condition from the current handler will be accepted: %v", newCondition.Type, newCondition)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Last write wins? Could you clarify this in the function comment a little. This should probably be a warning as well as info is being lost. The suggestion above for logging every condition should help when combined with that.

if err != nil {
logger.WithError(err).WithField("statushandler", handlerName).Errorf("failed to get conditions from status handler")
continue
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you info log each condition we got here and the status handler it came from. Then below when we see a conflict we can log that as a warning and still know the user/admin can see which handlers reported the same condition.

rootSecret.Name = constants.VSphereCloudCredSecretName
secrets = append(secrets, constants.VsphereUpcomingSecrets...)
default:
log.Errorf("unable to provide upcoming secrets for unknown platform: %v", platformType)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like it will be called on bare metal, so hitting this line is really not an error IMO, I'd go with info (or maybe even debug) and soften the message up a little bit.

},
},
},
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Coverage seems a little light in here, could we get something to simulate two handlers returning the same condition? Or is the handler registration making this infeasible?

@joelddiaz joelddiaz force-pushed the status-controller branch 3 times, most recently from aea4378 to dcd7d1a Compare October 8, 2020 18:18
@openshift-ci-robot
Copy link
Contributor

@joelddiaz: The following test failed, say /retest to rerun all failed tests:

Test name Commit Details Rerun command
ci/prow/e2e-aws dcd7d1a link /test e2e-aws

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@joelddiaz
Copy link
Contributor Author

/test e2e-aws

Copy link
Contributor

@dgoodwin dgoodwin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One small typo but I think we're gtg otherwise, please squash.

// cannot watch for. So just update the config CR (which the status controller does watch)
// to trigger a status recalculation.
if err := r.updateConfigCRTimestamp(); err != nil {
r.logger.WithError(err).Error("failed to updated config CR to trigger status caluclation")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

small calculation typo here

Rather than have each controller responsible for triggering a status calculation, move that responsiblity into its own controller.

Store the upcoming list of Secrets as a shared constant/variable (for use by both the Actuator's Upgradeable() and the status controller's Secret watch list).
Remove GetUpcomingSecrets() from the Actuator interface.
Stop setting default ClusterOperator status condition values from the Actuator's Upgradeable() API call (it is already set in the current status calculation codepath).
Update controllers to register themselves to the new status controller for status calculation callbacks.
Convert previous pkg/util/status into the new status controller (now in pkg/operator/status).
Remove hack of sending fake CredentialsRequests just to trigger a status sync from credentialsrequest controller (new status calculator now will watch and react appropriately).
Update credentialsrequest test cases to only check for the setting of non-default conditions.
Update credentialsrequest test cases to no longer need to register and deregister status callbacks.
Move credentialsrequest status test cases to new status controller.
Update secretannotator actuators to no longer trigger a status sync.

Note: for situations where the status controller cannot watch for certain conditions that should cause a status recalculation, the controller can annotate the CCO's operator config object to add a timestamp (which will in turn trigger a status calc).

Important changes:
Ensure the managementState field is set on our CCO operator config (otherwise we run into problems trying to annotate the operator config). The operator config should not allow an empty string for the managementState field, but it absolutely is not set at all (which shows up as being set to the empty string after doing a Get() of the operator config). Detect this, and assume that the empty string means a managementState of "Managed", and send back "Managed" on Update().
@dgoodwin
Copy link
Contributor

/lgtm

Nice work, looking forward to this as I have mass conflicts every time my creds rotate. :)

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Oct 14, 2020
@openshift-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dgoodwin, joelddiaz

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-merge-robot openshift-merge-robot merged commit a59af2c into openshift:master Oct 14, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants