Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement basic control plane upgrade orchestration #81

Merged

Conversation

ironcladlou
Copy link
Contributor

@ironcladlou ironcladlou commented Mar 4, 2021

This commit introduces basic support for control plane upgrades according to the
HostedCluster versioning API.

Control plane upgrades are initiated by updating the HostedCluster
.spec.release.image field, which is propagated to the HostedControlPlane
resource following some rules which are articulated in the test cases but not
yet codified in the API. For example, rollouts are serialized. See the test
cases for more details.

To make this work, the control plane operator has been updated to remove gating
logic which caused it to stop reconciling in reaction to fields like a "ready"
flag (which are reminiscent of state machine behavior). Now the
HostedControlPlane will always be reconciled idempotently.

The HostedControlPlane controller itself isn't architected with granular
reconciliation in mind and so there are still some spurious updates and
reconcile attempts when a no-op would have been preferred, but things should be
basically correct.

In addition, the HostedControlPlane controller is not very accurate with regards
to its status reporting, as it does no deep per-component analysis to gauge
control plane health. This weakness of reporting is reflected to some extent now
in HostedCluster (garbage in, garbage out).

This commit expands status for HostedCluster with a basic Availability condition
(see the new tests for details).

Most of the status reporting happening now is very minimal and has lots of room
for improvement to provide better decision making context for other logic. The
goal of this commit is to establish the bare minimum parts to make everything
work together and demonstrate a functional system for iteration.

@openshift-ci-robot openshift-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 4, 2021
@openshift-ci-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ironcladlou

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 4, 2021
@@ -96,6 +96,9 @@ type HostedControlPlaneStatus struct {
// +kubebuilder:validation:Optional
Version string `json:"version,omitempty"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need both version and ReleaseImage?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is from our original API design... my understanding is users of HostedCluster want to know the semver for the release, so it has to bubble up from here as it's resolved at runtime. It's not used for reconciliation and is just additional metadata associated with the release image (again, just my understanding). cc @csrwng @sjenning @derekwaynecarr

@ironcladlou ironcladlou force-pushed the cp-operator-reconciling branch 5 times, most recently from 2a621fc to a9151c5 Compare March 5, 2021 19:05
@ironcladlou
Copy link
Contributor Author

/test e2e-aws

@ironcladlou ironcladlou changed the title WIP: Implement basic control plane upgrades Implement basic control plane upgrade orchestration Mar 5, 2021
@openshift-ci-robot openshift-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 5, 2021
@ironcladlou
Copy link
Contributor Author

@sjenning @enxebre @csrwng @derekwaynecarr this is ready for review, I'm down to just polishing at this point. Hopefully the test cases (both unit and e2e) are useful for evaluating whether the behavior is correctly defined.

@ironcladlou
Copy link
Contributor Author

/test e2e-aws

@ironcladlou
Copy link
Contributor Author

/test e2e-aws

@ironcladlou
Copy link
Contributor Author

/test e2e-aws

This commit introduces basic support for control plane upgrades according to the
HostedCluster versioning API.

Control plane upgrades are initiated by updating the HostedCluster
`.spec.release.image` field, which is propagated to the HostedControlPlane
resource following some rules which are articulated in the test cases but not
yet codified in the API. For example, rollouts are serialized. See the test
cases for more details.

To make this work, the control plane operator has been updated to remove gating
logic which caused it to stop reconciling in reaction to fields like a "ready"
flag (which are reminiscent of state machine behavior). Now the
HostedControlPlane will always be reconciled idempotently.

The HostedControlPlane controller itself isn't architected with granular
reconciliation in mind and so there are still some spurious updates and
reconcile attempts when a no-op would have been preferred, but things should be
basically correct.

In addition, the HostedControlPlane controller is not very accurate with regards
to its status reporting, as it does no deep per-component analysis to gauge
control plane health. This weakness of reporting is reflected to some extent now
in HostedCluster (garbage in, garbage out).

This commit expands status for HostedCluster with a basic Availability condition
(see the new tests for details).

Most of the status reporting happening now is very minimal and has lots of room
for improvement to provide better decision making context for other logic. The
goal of this commit is to establish the bare minimum parts to make everything
work together and demonstrate a functional system for iteration.
@ironcladlou
Copy link
Contributor Author

/test e2e-aws

1 similar comment
@ironcladlou
Copy link
Contributor Author

/test e2e-aws

@sjenning
Copy link
Contributor

sjenning commented Mar 5, 2021

/lgtm

need to make sure we want to stick with this as the API, but this works for me

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Mar 5, 2021
@openshift-merge-robot openshift-merge-robot merged commit 52bc679 into openshift:main Mar 5, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants