Bug 1813743: bindata/etcd: backup and restore all static pods #257

hexfusion · 2020-03-12T17:52:17Z

Disaster recovery involves restoring the cluster to a previous state. That state is defined not only by etcd but also by the static pod resources on disk. To go back in time correctly, both must be restored to the same point so that they match each other.

This PR does the following:

Takes a snapshot of the etcd state and backs up the last-modified revision of these static pod resources:
kube-apiserver-pod
kube-controller-manager-pod
kube-scheduler-pod
etcd-pod
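A minimal Python sketch of the backup selection described above. It assumes each operator writes revisioned directories named `<resource>-<N>` under a common base directory (on OpenShift nodes this is typically `/etc/kubernetes/static-pod-resources`, but treat the path and naming as assumptions). Like the PR, it picks the last-modified directory per resource rather than parsing N, then archives it. The etcd snapshot itself is taken separately, for example with `etcdctl snapshot save`. This is an illustration, not the shell code in this PR.

```python
import tarfile
from pathlib import Path
from typing import List, Optional

# Static pod resources listed in the PR description.
STATIC_POD_RESOURCES = (
    "kube-apiserver-pod",
    "kube-controller-manager-pod",
    "kube-scheduler-pod",
    "etcd-pod",
)


def last_modified_revision_dir(base: Path, resource: str) -> Optional[Path]:
    """Return the most recently modified <resource>-<N> directory under base.

    Follows the PR's assumption that the last-modified directory is the
    latest revision: N is only checked to be numeric, never compared.
    """
    prefix = resource + "-"
    candidates = [
        p
        for p in base.glob(prefix + "*")
        if p.is_dir() and p.name[len(prefix):].isdigit()
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda p: p.stat().st_mtime)


def backup_static_pod_resources(base: Path, out: Path) -> List[str]:
    """Write the latest directory of each resource into a gzipped tarball.

    Returns the names of the archived directories, in resource order.
    """
    archived: List[str] = []
    with tarfile.open(out, "w:gz") as tar:
        for resource in STATIC_POD_RESOURCES:
            latest = last_modified_revision_dir(base, resource)
            if latest is None:
                continue  # no revision of this resource on disk yet
            tar.add(str(latest), arcname=latest.name)  # adds directory recursively
            archived.append(latest.name)
    return archived
```

Because selection is by modification time, a revision directory touched after a newer one was written would be picked instead of the newer one; reading the revision from the manifest, as planned for a future z-stream, avoids that.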

Assumption: for now, we assume that the most recently modified revision directory is the latest revision. In a future z-stream, we will read the revision from the static pod manifest itself instead. The reason for this approach is that when we go back in time, we don't want to have to manage the revisions on disk on every node. For example, if we have a backup of rev 2 but the cluster is currently at rev 4, we will not remove the existing revisions on each node. Instead, we restore the etcd state to rev 2, make sure rev 2 is on disk, and then force a new revision for each static pod operator.
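The restore path in the example above (backup at rev 2, cluster at rev 4) can be sketched as a small planning function. This is a hypothetical Python illustration, not code from this PR. `revision_from_manifest` assumes the revision is exposed as a `revision` label under `metadata.labels` of an already-parsed manifest; that is an assumption about the future z-stream change, not something this PR implements.

```python
from typing import Dict, List, Set


def revision_from_manifest(manifest: Dict) -> int:
    """Read the revision a static pod manifest was rendered from.

    Hypothetical: assumes a 'revision' label under metadata.labels.
    """
    return int(manifest["metadata"]["labels"]["revision"])


def plan_restore(backup_rev: int, revisions_on_disk: Set[int]) -> List[str]:
    """Return the restore steps for one static pod operator, as described in the PR."""
    steps = [f"restore etcd state to the snapshot taken at revision {backup_rev}"]
    if backup_rev in revisions_on_disk:
        steps.append(f"revision {backup_rev} already on disk")
    else:
        steps.append(f"copy revision {backup_rev} from the backup onto disk")
    # Revisions written after the backup are deliberately not deleted.
    newer = sorted(r for r in revisions_on_disk if r > backup_rev)
    if newer:
        steps.append("leave newer revisions in place: " + ", ".join(map(str, newer)))
    steps.append("force a new revision for the static pod operator")
    return steps
```

With `plan_restore(2, {1, 2, 3, 4})`, revisions 3 and 4 are reported as left in place rather than removed, matching the choice to force a new revision instead of cleaning up each node.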

bindata/etcd/etcd-snapshot-restore.sh

Signed-off-by: Sam Batschelet <sbatsche@redhat.com>

hexfusion · 2020-03-12T22:25:50Z

/skip

retroflexer · 2020-03-13T08:25:17Z

/lgtm

retroflexer · 2020-03-13T08:25:24Z

/retest

retroflexer · 2020-03-13T08:27:40Z

Needs description and a BZ.

openshift-bot · 2020-03-13T09:46:56Z

/retest

Please review the full test history for this PR and help us cut down flakes.

openshift-bot · 2020-03-13T09:59:54Z

/retest

Please review the full test history for this PR and help us cut down flakes.

hexfusion · 2020-03-13T12:16:05Z

/retest

Signed-off-by: Sam Batschelet <sbatsche@redhat.com>

hexfusion · 2020-03-13T12:42:09Z

/retest

retroflexer · 2020-03-13T12:48:32Z

/lgtm

openshift-ci-robot · 2020-03-13T12:48:51Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: hexfusion, retroflexer

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [hexfusion]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

hexfusion · 2020-03-13T15:45:41Z

/test all

hexfusion · 2020-03-13T15:49:51Z

/retest

hexfusion · 2020-03-13T17:06:08Z

/skip

hexfusion · 2020-03-13T17:25:08Z

/retest

hexfusion · 2020-03-13T18:13:44Z

/retest

openshift-bot · 2020-03-13T22:20:58Z

/retest

Please review the full test history for this PR and help us cut down flakes.

openshift-bot · 2020-03-13T22:46:10Z

/retest

Please review the full test history for this PR and help us cut down flakes.

openshift-bot · 2020-03-14T02:40:15Z

/retest

Please review the full test history for this PR and help us cut down flakes.

openshift-bot · 2020-03-14T06:46:13Z

/retest

Please review the full test history for this PR and help us cut down flakes.

openshift-bot · 2020-03-14T10:53:13Z

/retest

Please review the full test history for this PR and help us cut down flakes.

openshift-bot · 2020-03-14T11:45:15Z

/retest

Please review the full test history for this PR and help us cut down flakes.

openshift-bot · 2020-03-14T14:21:17Z

/retest

Please review the full test history for this PR and help us cut down flakes.

openshift-bot · 2020-03-14T18:41:15Z

/retest

Please review the full test history for this PR and help us cut down flakes.

openshift-bot · 2020-03-14T18:54:07Z

/retest

Please review the full test history for this PR and help us cut down flakes.

openshift-bot · 2020-03-14T21:04:32Z

/retest

Please review the full test history for this PR and help us cut down flakes.

openshift-bot · 2020-03-14T22:47:57Z

/retest

Please review the full test history for this PR and help us cut down flakes.

openshift-bot · 2020-03-14T23:13:55Z

/retest

Please review the full test history for this PR and help us cut down flakes.

openshift-bot · 2020-03-14T23:52:56Z

/retest

Please review the full test history for this PR and help us cut down flakes.

openshift-bot · 2020-03-15T00:05:54Z

/retest

Please review the full test history for this PR and help us cut down flakes.

openshift-bot · 2020-03-15T02:02:56Z

/retest

Please review the full test history for this PR and help us cut down flakes.

openshift-bot · 2020-03-15T02:28:55Z

/retest

Please review the full test history for this PR and help us cut down flakes.

openshift-bot · 2020-03-15T03:33:55Z

/retest

Please review the full test history for this PR and help us cut down flakes.

openshift-bot · 2020-03-15T03:46:54Z

/retest

Please review the full test history for this PR and help us cut down flakes.

openshift-bot · 2020-03-15T06:22:58Z

/retest

Please review the full test history for this PR and help us cut down flakes.

retroflexer · 2020-03-15T21:55:43Z

/cherry-pick release-4.4

openshift-cherrypick-robot · 2020-03-15T21:55:53Z

@retroflexer: new pull request created: #263

In response to this:

/cherry-pick release-4.4

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

openshift-ci-robot · 2020-03-16T00:28:12Z

@hexfusion: All pull requests linked via external trackers have merged. Bugzilla bug 1813743 has been moved to the MODIFIED state.

In response to this:

Bug 1813743: bindata/etcd: backup and restore all static pods

openshift-ci-robot added the do-not-merge/work-in-progress and size/L (100-499 lines changed, ignoring generated files) labels Mar 12, 2020

openshift-ci-robot requested review from deads2k and soltysh March 12, 2020 17:53

openshift-ci-robot added the approved label Mar 12, 2020

hexfusion force-pushed the update-dr branch from dd910f5 to c5d90a7 March 12, 2020 17:53

retroflexer reviewed Mar 12, 2020

bindata/etcd/etcd-snapshot-restore.sh (resolved)

hexfusion force-pushed the update-dr branch 2 times, most recently from 92748f4 to d6567fe March 12, 2020 18:36

alaypatel07 and others added 2 commits March 12, 2020 14:42

restore-pod.yaml: fix temporary restore directory name (2a591ce)

bindata/etcd: backup and restore all static pods (c2e55b2)

Signed-off-by: Sam Batschelet <sbatsche@redhat.com>

hexfusion force-pushed the update-dr branch from a0214aa to c2e55b2 March 12, 2020 21:44

hexfusion changed the title ~~[wip] bindata/etcd: backup and restore all static pods~~ bindata/etcd: backup and restore all static pods Mar 12, 2020

openshift-ci-robot removed the do-not-merge/work-in-progress label Mar 12, 2020

hexfusion mentioned this pull request Mar 12, 2020

restore-pod.yaml: fix temporary restore directory name #256 (closed)

openshift-ci-robot assigned retroflexer Mar 13, 2020

openshift-ci-robot added the lgtm label Mar 13, 2020

bindata/etcd: backup last modified directory (3426c59)

Signed-off-by: Sam Batschelet <sbatsche@redhat.com>

openshift-ci-robot removed the lgtm label Mar 13, 2020

openshift-ci-robot added the lgtm label Mar 13, 2020

openshift-merge-robot merged commit 78de9c6 into openshift:master Mar 15, 2020

openshift-cherrypick-robot mentioned this pull request Mar 15, 2020

[release-4.4] Bug 1813744: bindata/etcd: backup and restore all static pods #263 (merged)

hexfusion changed the title ~~bindata/etcd: backup and restore all static pods~~ Bug 1813743: bindata/etcd: backup and restore all static pods Mar 16, 2020
