Bug 1837540: Use restore pod yaml from the backup when restoring #436

retroflexer
Contributor

Currently, when we restore a cluster from a backup, we use the restore-pod.yaml from the currently installed image. This is problematic, especially when switching across releases (minor or micro versions), because the manifest in the installed image may not match the release the backup was taken from.
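
As a rough illustration of the idea, the sketch below prefers a restore-pod.yaml stored alongside the backup and only falls back to the copy from the installed image when the backup has none. The function name `select_restore_pod_yaml`, the assumption that the manifest sits at the top level of the backup directory, and the fallback behaviour are all hypothetical; none of them is taken from cluster-restore.sh.

```shell
#!/usr/bin/env bash
# Sketch only: names and directory layout are hypothetical, not the real script's.

# Print the path of the restore pod manifest to use.
#   $1: backup directory (assumed to hold restore-pod.yaml when available)
#   $2: directory holding the manifest from the currently installed image
select_restore_pod_yaml() {
  local backup_dir="$1"
  local installed_dir="$2"

  if [ -f "${backup_dir}/restore-pod.yaml" ]; then
    # Manifest captured at backup time, so it matches the backed-up release.
    echo "${backup_dir}/restore-pod.yaml"
  elif [ -f "${installed_dir}/restore-pod.yaml" ]; then
    # Fallback (an assumption of this sketch) for backups without a manifest.
    echo "${installed_dir}/restore-pod.yaml"
  else
    echo "no restore pod manifest found" >&2
    return 1
  fi
}
```

A caller could then write something like `pod_yaml="$(select_restore_pod_yaml /home/core/backup /some/installed/dir)"`, where the second path is a placeholder.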

@retroflexer retroflexer force-pushed the extract-restore-pod-from-backup branch 2 times, most recently from 0130f6b to 82ad2ef Compare September 8, 2020 16:06
@retroflexer retroflexer force-pushed the extract-restore-pod-from-backup branch 3 times, most recently from 32a01db to 35535f9 Compare September 8, 2020 17:18
@retroflexer retroflexer changed the title from "Use restore pod yaml from the backup when restoring" to "Bug 1837540: Use restore pod yaml from the backup when restoring" Sep 8, 2020
@openshift-ci-robot openshift-ci-robot added the bugzilla/severity-high label (Referenced Bugzilla bug's severity is high for the branch this PR is targeting.) Sep 8, 2020
@openshift-ci-robot

@retroflexer: This pull request references Bugzilla bug 1837540, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.6.0) matches configured target release for branch (4.6.0)
  • bug is in the state ASSIGNED, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)

In response to this:

Bug 1837540: Use restore pod yaml from the backup when restoring

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot openshift-ci-robot added the bugzilla/valid-bug label (Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting.) Sep 8, 2020
@hexfusion
Contributor

@retroflexer can we remove RESTORE_ETCD_POD_YAML? I don't think we need that anymore, right?

RESTORE_ETCD_POD_YAML="${CONFIG_FILE_DIR}/static-pod-resources/etcd-certs/configmaps/restore-etcd-pod/pod.yaml"

@retroflexer retroflexer force-pushed the extract-restore-pod-from-backup branch from 35535f9 to 32c5b95 Compare September 9, 2020 16:25
@retroflexer
Contributor Author

/test e2e-upgrade

@hexfusion
Contributor

hexfusion commented Sep 9, 2020

/hold

I would like to understand the disruptive test failure:
https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_cluster-etcd-operator/436/pull-ci-openshift-cluster-etcd-operator-master-e2e-disruptive/1303731232332320768

[sig-etcd][Feature:DisasterRecovery][Disruptive] [Feature:EtcdRecovery] Cluster should restore itself after quorum loss [Serial] [Suite:openshift] (11m54s)
fail [github.com/openshift/origin/test/extended/dr/common.go:299]: Unexpected error:
    <*errors.errorString | 0xc0009ed4c0>: {
        s: "failed running \"sudo -i /bin/bash -cx '/usr/local/bin/cluster-restore.sh /home/core/backup'\": <nil> (exit code 1, stderr + /usr/local/bin/cluster-restore.sh /home/core/backup\n)",
    }
    failed running "sudo -i /bin/bash -cx '/usr/local/bin/cluster-restore.sh /home/core/backup'": <nil> (exit code 1, stderr + /usr/local/bin/cluster-restore.sh /home/core/backup
    )
occurred
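
The stderr captured above contains only the `+ /usr/local/bin/cluster-restore.sh /home/core/backup` trace line. That is what `bash -cx` prints for the outer command: xtrace is not inherited by a script that runs as a separate process (unless SHELLOPTS is exported), so whatever the script itself reported most likely went to stdout. A small self-contained sketch of this behaviour follows; the stand-in script `failing-restore.sh` and its message are invented for the demonstration and are not from the PR.

```shell
#!/usr/bin/env bash
# Sketch only: a hypothetical stand-in for a restore script that reports its
# error on stdout and then exits non-zero.
demo_dir="$(mktemp -d)"
cat > "${demo_dir}/failing-restore.sh" <<'EOF'
#!/usr/bin/env bash
echo "something went wrong"   # written to stdout, not stderr
exit 1
EOF
chmod +x "${demo_dir}/failing-restore.sh"

# Capture stderr only (stdout is discarded). The -x flag traces the command
# run by the outer shell; the child script's own commands are not traced.
exit_code=0
captured_stderr="$(bash -cx "${demo_dir}/failing-restore.sh /tmp/backup" 2>&1 >/dev/null)" || exit_code=$?

echo "exit code: ${exit_code}"
echo "stderr: ${captured_stderr}"
```

To see what a failing script actually printed in a case like this, stdout has to be captured as well, or the script has to be run with tracing enabled inside it.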

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold label (Indicates that a PR should not merge because someone has issued a /hold command.) Sep 9, 2020
@retroflexer retroflexer force-pushed the extract-restore-pod-from-backup branch from 32c5b95 to 8b146ce Compare September 9, 2020 20:43
@retroflexer
Contributor Author

/retest

@hexfusion hexfusion added the release-note-action-required (Denotes a PR that introduces potentially breaking changes that require user action.) and release-note (Denotes a PR that will be considered when it comes time to generate release notes.) labels and removed the release-note-action-required label Sep 9, 2020
@retroflexer
Contributor Author

@retroflexer can we remove RESTORE_ETCD_POD_YAML? I don't think we need that anymore, right?

RESTORE_ETCD_POD_YAML="${CONFIG_FILE_DIR}/static-pod-resources/etcd-certs/configmaps/restore-etcd-pod/pod.yaml"

@retroflexer retroflexer closed this Sep 9, 2020
@openshift-ci-robot

@retroflexer: This pull request references Bugzilla bug 1837540. The bug has been updated to no longer refer to the pull request using the external bug tracker.

In response to this:

Bug 1837540: Use restore pod yaml from the backup when restoring


@retroflexer retroflexer reopened this Sep 9, 2020
@openshift-ci-robot

@retroflexer: This pull request references Bugzilla bug 1837540, which is valid. The bug has been updated to refer to the pull request using the external bug tracker.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.6.0) matches configured target release for branch (4.6.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)

In response to this:

Bug 1837540: Use restore pod yaml from the backup when restoring


@openshift-ci-robot

openshift-ci-robot commented Sep 10, 2020

@retroflexer: The following test failed, say /retest to rerun all failed tests:

Test name               Commit   Details  Rerun command
ci/prow/e2e-disruptive  8b146ce  link     /test e2e-disruptive


@hexfusion
Contributor

Based on manual testing results from @retroflexer, this is working as expected.
/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm label (Indicates that a PR is ready to be merged.) Sep 10, 2020
@openshift-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: hexfusion, retroflexer

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot openshift-ci-robot added the approved label (Indicates a PR has been approved by an approver from all required OWNERS files.) Sep 10, 2020
@hexfusion
Contributor

/hold cancel

@openshift-ci-robot openshift-ci-robot removed the do-not-merge/hold label (Indicates that a PR should not merge because someone has issued a /hold command.) Sep 10, 2020
@openshift-merge-robot openshift-merge-robot merged commit 0806334 into openshift:master Sep 10, 2020
@openshift-ci-robot

@retroflexer: All pull requests linked via external trackers have merged:

Bugzilla bug 1837540 has been moved to the MODIFIED state.

In response to this:

Bug 1837540: Use restore pod yaml from the backup when restoring


@retroflexer
Contributor Author

/cherry-pick 4.5

@openshift-cherrypick-robot

@retroflexer: cannot checkout 4.5: error checking out 4.5: exit status 1. output: error: pathspec '4.5' did not match any file(s) known to git

In response to this:

/cherry-pick 4.5
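
The error above is consistent with the target branch being named `release-4.5` rather than `4.5`, which the follow-up command confirms. As a minimal sketch of checking a branch name before requesting a cherry-pick, the snippet below builds a throwaway repository; the helper `branch_exists` and the demo repository are hypothetical and are not part of the bot.

```shell
#!/usr/bin/env bash
# Sketch only: a temporary repository with a single branch named release-4.5.
repo="$(mktemp -d)"
git init -q "${repo}"
git -C "${repo}" -c user.name=demo -c user.email=demo@example.com \
  commit -q --allow-empty -m "initial commit"
git -C "${repo}" branch release-4.5

# Succeeds only when a local branch with exactly this name exists.
#   $1: repository path, $2: branch name
branch_exists() {
  git -C "$1" show-ref --verify --quiet "refs/heads/$2"
}

for name in 4.5 release-4.5; do
  if branch_exists "${repo}" "${name}"; then
    echo "${name}: branch found"
  else
    echo "${name}: no such branch"
  fi
done
```

Against a real remote, `git ls-remote --heads <remote> <branch>` gives the same kind of answer without a local clone of that branch.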


@retroflexer
Contributor Author

/cherry-pick release-4.5

@openshift-cherrypick-robot

@retroflexer: new pull request created: #439

In response to this:

/cherry-pick release-4.5


@retroflexer
Contributor Author

/cherry-pick release-4.4

@openshift-cherrypick-robot

@retroflexer: new pull request created: #441

In response to this:

/cherry-pick release-4.4


Labels
  • approved: Indicates a PR has been approved by an approver from all required OWNERS files.
  • bugzilla/severity-high: Referenced Bugzilla bug's severity is high for the branch this PR is targeting.
  • bugzilla/valid-bug: Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting.
  • lgtm: Indicates that a PR is ready to be merged.
  • release-note: Denotes a PR that will be considered when it comes time to generate release notes.
5 participants