Bug 1813743: bindata/etcd: backup and restore all static pods #257

Merged
merged 3 commits into openshift:master on Mar 15, 2020

@hexfusion (Contributor) commented Mar 12, 2020

Disaster recovery means restoring the cluster to a previous state. That state is defined not only by etcd but also by the static pod resources on disk. To go back in time correctly, both need to match the state at the time of the backup.

This PR does the following:

  • Takes a snapshot of the etcd state and backs up the last modified revision of each of these static pod resources:
    kube-apiserver-pod
    kube-controller-manager-pod
    kube-scheduler-pod
    etcd-pod

Assumptions: we assume that the latest revision is the one that was last modified. In a future z-stream we will read the revision from the static pod manifest itself. The reason for this approach is that when we go back in time we don't want to worry about the revisions on disk on every node. For example, if we have a backup of rev 2 but the cluster is currently at rev 4, we will not remove the old revisions on each node. Instead, we restore etcd state to rev 2, make sure rev 2 is on disk, and then force a new revision for each static pod operator.
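As a rough illustration of the "latest revision is last modified" assumption, here is a minimal Python sketch. It is not the script from this PR. It assumes each static pod resource lives in a directory named `<name>-<revision>` under a common resource directory; that layout and the helper names `last_modified_revision` and `select_backup_sources` are hypothetical and used only for illustration.

```python
import re
from pathlib import Path
from typing import Dict, Optional

# Static pod resources named in the PR description.
STATIC_POD_NAMES = [
    "kube-apiserver-pod",
    "kube-controller-manager-pod",
    "kube-scheduler-pod",
    "etcd-pod",
]


def last_modified_revision(resource_dir: Path, name: str) -> Optional[Path]:
    """Return the most recently modified '<name>-<revision>' directory.

    Assumption (hypothetical layout): revisions are sibling directories
    whose names end in a numeric revision, e.g. 'etcd-pod-3'.
    """
    pattern = re.compile(rf"^{re.escape(name)}-(\d+)$")
    candidates = [
        p for p in resource_dir.iterdir()
        if p.is_dir() and pattern.match(p.name)
    ]
    if not candidates:
        return None
    # Pick by modification time, not by revision number: this mirrors the
    # stated assumption that the latest revision is the last modified one.
    return max(candidates, key=lambda p: p.stat().st_mtime)


def select_backup_sources(resource_dir: Path) -> Dict[str, Optional[Path]]:
    """Map each static pod name to the revision directory to back up."""
    return {
        name: last_modified_revision(resource_dir, name)
        for name in STATIC_POD_NAMES
    }
```

Selecting by modification time keeps the sketch independent of how revisions are numbered. Reading the revision from the static pod manifest, as planned for a future z-stream, would replace the `max(..., key=mtime)` step.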

@openshift-ci-robot openshift-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Mar 12, 2020
@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 12, 2020
@hexfusion hexfusion force-pushed the update-dr branch 2 times, most recently from 92748f4 to d6567fe Compare March 12, 2020 18:36
@hexfusion hexfusion changed the title [wip] bindata/etcd: backup and restore all static pods bindata/etcd: backup and restore all static pods Mar 12, 2020
@openshift-ci-robot openshift-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 12, 2020
@hexfusion
Contributor Author

/skip

@retroflexer
Contributor

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Mar 13, 2020
@retroflexer
Contributor

/retest

@retroflexer
Contributor

Needs description and a BZ.

@openshift-bot
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

1 similar comment

@hexfusion
Contributor Author

/retest

Signed-off-by: Sam Batschelet <sbatsche@redhat.com>
@openshift-ci-robot openshift-ci-robot removed the lgtm Indicates that a PR is ready to be merged. label Mar 13, 2020
@hexfusion
Contributor Author

/retest

@retroflexer
Contributor

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Mar 13, 2020
@openshift-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: hexfusion, retroflexer

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@hexfusion
Contributor Author

/test all

@hexfusion
Contributor Author

/retest

@hexfusion
Contributor Author

/skip

@hexfusion
Contributor Author

/retest

1 similar comment

@openshift-bot
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

18 similar comments

@openshift-merge-robot openshift-merge-robot merged commit 78de9c6 into openshift:master Mar 15, 2020
@retroflexer
Contributor

/cherry-pick release-4.4

@openshift-cherrypick-robot

@retroflexer: new pull request created: #263

In response to this:

/cherry-pick release-4.4

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@hexfusion hexfusion changed the title bindata/etcd: backup and restore all static pods Bug 1813743: bindata/etcd: backup and restore all static pods Mar 16, 2020
@openshift-ci-robot

@hexfusion: All pull requests linked via external trackers have merged. Bugzilla bug 1813743 has been moved to the MODIFIED state.

In response to this:

Bug 1813743: bindata/etcd: backup and restore all static pods

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
7 participants