HOSTEDCP-1081: Perform etcd recovery when etcd member data is lost #2900

Merged 1 commit on Aug 11, 2023

@csrwng csrwng commented Aug 10, 2023

What this PR does / why we need it:
When an etcd member's data has been lost, the only recourse we currently have is a hacky manual recovery. With this change, restoring a member is as simple as deleting the bad member's PVC and pod.

Automates the steps described in https://issues.redhat.com/browse/HOSTEDCP-1081
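
The operator-facing side of this flow (delete the bad member's PVC, then its pod) can be sketched as below. This is a minimal illustration, not taken from the PR: it assumes the etcd StatefulSet is named `etcd` with a volume claim template named `data`, and the namespace `clusters-example` is a hypothetical placeholder. The only fixed convention it relies on is the standard StatefulSet naming scheme, where pods are `<statefulset>-<ordinal>` and PVCs are `<claim-template>-<statefulset>-<ordinal>`.

```python
import shlex
from typing import List


def recovery_commands(namespace: str, ordinal: int,
                      statefulset: str = "etcd",       # assumed StatefulSet name
                      claim_template: str = "data"     # assumed claim template name
                      ) -> List[List[str]]:
    """Build the kubectl invocations that delete one etcd member's PVC and pod."""
    if ordinal < 0:
        raise ValueError("ordinal must be non-negative")
    pod = f"{statefulset}-{ordinal}"
    pvc = f"{claim_template}-{pod}"
    return [
        # --wait=false: a PVC still mounted by a pod is held by the
        # pvc-protection finalizer, so do not block on its removal here.
        ["kubectl", "delete", "pvc", pvc, "-n", namespace, "--wait=false"],
        # Deleting the pod releases the PVC; the StatefulSet then recreates both.
        ["kubectl", "delete", "pod", pod, "-n", namespace],
    ]


if __name__ == "__main__":
    # Print the commands for a hypothetical member with ordinal 1.
    for cmd in recovery_commands("clusters-example", 1):
        print(shlex.join(cmd))
```

The sketch only builds the commands so they can be reviewed before running; the actual names in a given hosted control plane namespace should be checked with `kubectl get pvc,pod` first.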

Which issue(s) this PR fixes (optional, use fixes #<issue_number>(, fixes #<issue_number>, ...) format, where issue_number might be a GitHub issue, or a Jira story:
Fixes #HOSTEDCP-1081

Checklist

  • Subject and description added to both, commit and PR.
  • Relevant issues have been referenced.

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Aug 10, 2023

openshift-ci-robot commented Aug 10, 2023

@csrwng: This pull request references HOSTEDCP-1081 which is a valid jira issue.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci openshift-ci bot added the area/control-plane-operator Indicates the PR includes changes for the control plane operator - in an OCP release label Aug 10, 2023

openshift-ci bot commented Aug 10, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: csrwng

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added approved Indicates a PR has been approved by an approver from all required OWNERS files. and removed do-not-merge/needs-area labels Aug 10, 2023
@sjenning

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Aug 10, 2023
When an etcd member's data has been lost, the only recourse we currently
have is a hacky manual recovery. With this change, restoring a member is
as simple as deleting the bad member's PVC and pod.
@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Aug 10, 2023
@sjenning

/retest-required

openshift-ci bot commented Aug 11, 2023

@csrwng: The following test failed. Say /retest to rerun all failed tests, or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/e2e-kubevirt-aws-ovn | 5f0a4ef | link | true | /test e2e-kubevirt-aws-ovn |

@sjenning

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Aug 11, 2023
@openshift-merge-robot openshift-merge-robot merged commit cb96cd6 into openshift:main Aug 11, 2023
12 of 13 checks passed
@csrwng csrwng deleted the etcd-recovery branch November 15, 2023 02:10
Labels
  • approved: Indicates a PR has been approved by an approver from all required OWNERS files.
  • area/control-plane-operator: Indicates the PR includes changes for the control plane operator - in an OCP release
  • jira/valid-reference: Indicates that this PR references a valid Jira ticket of any type.
  • lgtm: Indicates that a PR is ready to be merged.