Bug 1969633: pkg/operator/targetconfigcontroller: wait for kcm-o to generate certs before rollout #619

Merged
merged 1 commit into from Jul 1, 2021

Conversation

hexfusion
Contributor

@hexfusion hexfusion commented Jun 30, 2021

The csr-controller-ca configmap in openshift-config-managed is a critical resource for the cluster. It is initially generated by the openshift-kube-apiserver-operator render command using a CA minted by the installer. When KCM-O starts, it deletes and replaces this resource and manages cert rotation from then on.

This PR fixes a race condition on SNO: if etcd and/or kas roll out and cause disruption before management of the CA pivots to KCM-O, we can brick the cluster if the KCM operator loses leadership before the resource is regenerated. This can leave the node NotReady and the operator unable to start again. We now wait for the signal that the pivot has happened, then continue the rollout.

related to: openshift/cluster-kube-apiserver-operator#1169

@openshift-ci openshift-ci bot requested review from marun and retroflexer June 30, 2021 01:43
@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 30, 2021
@hexfusion
Contributor Author

/hold

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 30, 2021
@hexfusion hexfusion force-pushed the wait-for-kcm branch 4 times, most recently from eda0d26 to 0141219 Compare June 30, 2021 03:28
@openshift-ci openshift-ci bot requested review from deads2k and sttts June 30, 2021 05:11
@hexfusion hexfusion changed the title from "pkg/operator/targetconfigcontroller: wait for kcm-o to rotate certs before rollout" to "pkg/operator/targetconfigcontroller: wait for kcm-o to generate certs before rollout" Jun 30, 2021
@romfreiman

/test e2e-single-node-live-iso

@openshift-ci
Contributor

openshift-ci bot commented Jun 30, 2021

@romfreiman: The specified target(s) for /test were not found.
The following commands are available to trigger jobs:

  • /test configmap-scale
  • /test e2e-agnostic
  • /test e2e-agnostic-upgrade
  • /test e2e-aws
  • /test e2e-aws-disruptive
  • /test e2e-aws-disruptive-ovn
  • /test e2e-aws-single-node
  • /test e2e-azure
  • /test e2e-gcp
  • /test e2e-gcp-disruptive
  • /test e2e-gcp-disruptive-five-control-plane-replicas
  • /test e2e-gcp-disruptive-ovn
  • /test e2e-gcp-five-control-plane-replicas
  • /test e2e-gcp-upgrade-five-control-plane-replicas
  • /test e2e-metal-assisted
  • /test e2e-metal-ipi
  • /test e2e-metal-single-node-live-iso
  • /test e2e-operator
  • /test images
  • /test unit
  • /test verify
  • /test verify-deps

Use /test all to run the following jobs:

  • pull-ci-openshift-cluster-etcd-operator-master-e2e-agnostic
  • pull-ci-openshift-cluster-etcd-operator-master-e2e-agnostic-upgrade
  • pull-ci-openshift-cluster-etcd-operator-master-e2e-gcp-five-control-plane-replicas
  • pull-ci-openshift-cluster-etcd-operator-master-e2e-operator
  • pull-ci-openshift-cluster-etcd-operator-master-images
  • pull-ci-openshift-cluster-etcd-operator-master-unit
  • pull-ci-openshift-cluster-etcd-operator-master-verify
  • pull-ci-openshift-cluster-etcd-operator-master-verify-deps

In response to this:

/test e2e-single-node-live-iso

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@romfreiman

/test e2e-metal-single-node-live-iso

@hexfusion
Contributor Author

/test e2e-metal-single-node-live-iso

… before rollout

Signed-off-by: Sam Batschelet <sbatsche@redhat.com>
@hexfusion
Contributor Author

/test e2e-metal-single-node-live-iso

@hexfusion hexfusion changed the title from "pkg/operator/targetconfigcontroller: wait for kcm-o to generate certs before rollout" to "Bug 1969633: pkg/operator/targetconfigcontroller: wait for kcm-o to generate certs before rollout" Jul 1, 2021
@openshift-ci openshift-ci bot added bugzilla/severity-high Referenced Bugzilla bug's severity is high for the branch this PR is targeting. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. labels Jul 1, 2021
@openshift-ci
Contributor

openshift-ci bot commented Jul 1, 2021

@hexfusion: This pull request references Bugzilla bug 1969633, which is valid. The bug has been updated to refer to the pull request using the external bug tracker.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.9.0) matches configured target release for branch (4.9.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)

Requesting review from QA contact:
/cc @sandeepknd

In response to this:

Bug 1969633: pkg/operator/targetconfigcontroller: wait for kcm-o to generate certs before rollout


@openshift-ci
Contributor

openshift-ci bot commented Jul 1, 2021

@openshift-ci[bot]: GitHub didn't allow me to request PR reviews from the following users: sandeepknd.

Note that only openshift members and repo collaborators can review this PR, and authors cannot review their own PRs.


@hexfusion
Copy link
Contributor Author

/hold cancel

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 1, 2021
Member

@soltysh soltysh left a comment


/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jul 1, 2021
@openshift-ci
Contributor

openshift-ci bot commented Jul 1, 2021

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: hexfusion, soltysh

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci
Contributor

openshift-ci bot commented Jul 1, 2021

@hexfusion: This pull request references Bugzilla bug 1969633, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.9.0) matches configured target release for branch (4.9.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)

Requesting review from QA contact:
/cc @sandeepknd

In response to this:

Bug 1969633: pkg/operator/targetconfigcontroller: wait for kcm-o to generate certs before rollout


@openshift-ci
Contributor

openshift-ci bot commented Jul 1, 2021

@openshift-ci[bot]: GitHub didn't allow me to request PR reviews from the following users: sandeepknd.

Note that only openshift members and repo collaborators can review this PR, and authors cannot review their own PRs.


@openshift-bot
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@hexfusion
Contributor Author

/override ci/prow/e2e-gcp-five-control-plane-replicas

failure not related to this PR

 [sig-storage] Multi-AZ Cluster Volumes should schedule pods in the same zones as statically provisioned PVs [Suite:openshift/conformance/parallel] [Suite:k8s]

@openshift-ci
Contributor

openshift-ci bot commented Jul 1, 2021

@hexfusion: Overrode contexts on behalf of hexfusion: ci/prow/e2e-gcp-five-control-plane-replicas

In response to this:

/override ci/prow/e2e-gcp-five-control-plane-replicas

failure not related to this PR

[sig-storage] Multi-AZ Cluster Volumes should schedule pods in the same zones as statically provisioned PVs [Suite:openshift/conformance/parallel] [Suite:k8s]


@openshift-merge-robot openshift-merge-robot merged commit 706b197 into openshift:master Jul 1, 2021
@openshift-ci
Contributor

openshift-ci bot commented Jul 1, 2021

@hexfusion: Some pull requests linked via external trackers have merged:

The following pull requests linked via external trackers have not merged:

These pull requests must merge or be unlinked from the Bugzilla bug in order for it to move to the next state. Once unlinked, request a bug refresh with /bugzilla refresh.

Bugzilla bug 1969633 has not been moved to the MODIFIED state.

In response to this:

Bug 1969633: pkg/operator/targetconfigcontroller: wait for kcm-o to generate certs before rollout


@hexfusion hexfusion deleted the wait-for-kcm branch July 1, 2021 18:26
@hexfusion
Contributor Author

/cherry-pick release-4.8

@openshift-cherrypick-robot

@hexfusion: new pull request created: #621

In response to this:

/cherry-pick release-4.8


Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. bugzilla/severity-high Referenced Bugzilla bug's severity is high for the branch this PR is targeting. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. lgtm Indicates that a PR is ready to be merged.

7 participants