From 1ec6649e03727c4187bee783d1cce2b96e3e342c Mon Sep 17 00:00:00 2001 From: Thien-Thi Nguyen Date: Wed, 18 May 2016 19:48:17 +0200 Subject: [PATCH 01/11] Add section "Backup and Restore" to the Cluster Administration Guide. * _build_cfg.yml: ...here. * admin_guide/backup_restore.adoc: New file. --- _build_cfg.yml | 2 + admin_guide/backup_restore.adoc | 197 ++++++++++++++++++++++++++++++++ 2 files changed, 199 insertions(+) create mode 100644 admin_guide/backup_restore.adoc diff --git a/_build_cfg.yml b/_build_cfg.yml index c17028aa7e83..a411e29a9122 100644 --- a/_build_cfg.yml +++ b/_build_cfg.yml @@ -441,6 +441,8 @@ Topics: - Name: Building Dependency Trees File: building_dependency_trees Distros: openshift-origin,openshift-enterprise + - Name: Backup and Restore + File: backup_restore - Name: Troubleshooting Networking File: sdn_troubleshooting Distros: openshift-origin,openshift-enterprise diff --git a/admin_guide/backup_restore.adoc b/admin_guide/backup_restore.adoc new file mode 100644 index 000000000000..be45b8061439 --- /dev/null +++ b/admin_guide/backup_restore.adoc @@ -0,0 +1,197 @@ += Backup and Restore +{product-author} +{product-version} +:data-uri: +:icons: font +:experimental: +:toc: macro +:toc-title: +:prewrap!: + +toc::[] + + +// REVIEWERS: READ THIS! +// +// In the following text, there are questions of the form: +// //??? QUESTION +// Please feel free to make a line-comment to answer them, in addition +// to any other (line-)comments on the correctness of the text. +// +// - Usually, the question pertains to the text preceding it. +// Questions pertaining to following text are explicitly noted. +// +// - There are bunch of questions at the end. +// +// Thanks for your cooperation on this (experimental) method of +// refining the documentation. Hopefully it will bear good fruit. + +== Overview + +In {product-title}, you can +_back up_, saving state to separate storage, +and _restore_, recreating state from separate storage, +at the cluster level. 
+There is also some preliminary support for +xref:project-backup[per-project backup]. +The full state of a cluster installation includes: + +- etcd data on each master +- API objects +- registry storage +- volume storage + +This topic does not cover how to back up and restore +link:../install_config/persistent_storage/index.html[persistent storage], +as those topics are left to the underlying storage provider. + + +[[backup-restore-prerequisites]] +== Prerequisites + +As the restore procedure involves a +link:#cluster-restore[complete reinstallation], +it is a good idea to save all the files used in the initial installation. +This may include: + +- *_~/.config/openshift/installer.cfg.yml_* (from the +link:../install_config/install/quick_install.html[Quick Installation] +method) +- ansible playbooks and inventory files (from the +link:../install_config/install/advanced_install.html[Advanced Installation] +method) +- *_/etc/yum.repos.d/ose.repo_* (from the +link:../install_config/install/disconnected_install.html[Disconnected Installation] +method) +//??? Other files? + +Install packages that provide various utility commands: + +---- +# yum install etcd +---- + + +[[cluster-backup]] +== Cluster Backup + +. Save all the certificates and keys, on each master: ++ +---- +# cd /etc/origin/master +# tar cf /tmp/certs-and-keys-$(hostname).tar \ + master.proxy-client.crt \ + master.proxy-client.key \ + proxyca.crt \ + proxyca.key \ + master.server.crt \ + master.server.key \ + ca.crt \ + ca.key \ + master.etcd-client.crt \ + master.etcd-client.key \ + master.etcd-ca.crt +---- +//??? What is missing? +//??? What is unnecessary? + +. Create an *etcd* backup: ++ +---- +# etcdctl backup \ + --data-dir /var/lib/origin/etcd \ + --backup-dir /var/lib/origin/etcd.bak +---- ++ +[NOTE] +==== +- If *etcd* is running on more than one host, + the various instances regularly synchronize their data, + so creating a backup for one of them is sufficient. 
+- If *etcd* is running on an independent host (not on a master), + use *_/var/lib/etcd_* as the `--data-dir` argument, + and *_/var/lib/etcd.bak_* as the `--backup-dir` argument. +==== + +. Create a template for all cluster API objects: ++ +==== +---- +$ oc export all \ + --exact \//<1> + --all-namespaces \ + --as-template=cluster.template +---- +<1> Preserve fields that may be cluster specific, +such as service `portalIP` values or generated names. +==== +//??? pkg/cmd/cli/cmd/export.go line 76 says: +// cmd.Flags().Bool("all", true, "DEPRECATED: all is ignored, specifying a resource without a name selects all the instances of that resource") +// What does "deprecated" mean for the user? (Can ‘all’ be used, anyway?) + + +[[cluster-restore]] +== Cluster Restore + +//??? (for this section) Is the ordering (API objects, then etcd) correct? + +. Reinstall {product-title}. +//??? Is there a better way to "zero out" the cluster? +This should be done in the +link:../install_config/install/index.html[same way] +that {product-title} was previously installed. + +. Restore the certificates and keys, on each master: ++ +---- +# cd /etc/origin/master +# tar xvf /tmp/certs-and-keys-$(hostname).tar +---- + +. Create the API objects for the cluster: ++ +---- +$ oc process -f cluster.template +---- +//??? Other flags? + +. Restore from the *etcd* backup on each master: ++ +---- +# etcdctl restore \ + --backup-dir /var/lib/origin/etcd.bak \ + --data-dir /var/lib/origin/etcd +---- +//??? This is a guess based on (!). +// What am i missing? + + +// --------------------------------------------------------------------- +//??? Does the cluster need to be "quiescent" for backup/restore/both? +//??? Generally, what are the required conditions for a successful backup/restore? +//??? Are there other considerations for special configurations? +//??? (meta) Is this documentation on the right track? 
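The certificate step in *Cluster Backup* passes `tar` a fixed list of file names. If any of those names is absent on a given master, `tar` reports an error for it. A pre-flight check can surface that before the archive is written. The following is a minimal sketch, assuming a POSIX `sh`. The function name `check_master_certs` is hypothetical. Its file list simply mirrors the `tar` command, so it inherits that list's open questions about what is missing or unnecessary.

```shell
#!/bin/sh
# Hypothetical helper: report which of the certificate and key files named
# in the "Save all the certificates and keys" step are missing from DIR
# (default: /etc/origin/master). Returns 0 only if every file is present.
check_master_certs() {
    dir="${1:-/etc/origin/master}"
    missing=0
    for f in master.proxy-client.crt master.proxy-client.key \
             proxyca.crt proxyca.key \
             master.server.crt master.server.key \
             ca.crt ca.key \
             master.etcd-client.crt master.etcd-client.key \
             master.etcd-ca.crt; do
        if [ ! -f "$dir/$f" ]; then
            # Report on stderr so stdout stays clean for scripting.
            printf 'missing: %s\n' "$dir/$f" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

For example, `check_master_certs && tar cf /tmp/certs-and-keys-$(hostname).tar ...` creates the archive only when every listed file exists.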
+ + +[[project-backup]] +== Project Backup + +A future release of {product-title} will feature specific +support for per-project backup and restore. + +For now, to back up API objects at the project level, +use `oc export` for each object to be saved. +For example, to save the deployment configuration `frontend` in YAML format: + +---- +$ oc export dc frontend -o yaml > dc-frontend.yaml +---- + +//??? Scare quotes in next sentence because annotations are not included. +// For this reason, i don't want to include it... +// +// To back up "all" of the project: +// +// ---- +// $ oc export all -o yaml > project.yaml +// ---- From 1356de541dd208ccb1090bcb6c8010b7d231364f Mon Sep 17 00:00:00 2001 From: Thien-Thi Nguyen Date: Thu, 7 Jul 2016 19:59:57 +0200 Subject: [PATCH 02/11] Rework to ‘s/As ... it is a good idea/Because ... we recommend/’ MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit --- admin_guide/backup_restore.adoc | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/admin_guide/backup_restore.adoc b/admin_guide/backup_restore.adoc index be45b8061439..2a02d80a60db 100644 --- a/admin_guide/backup_restore.adoc +++ b/admin_guide/backup_restore.adoc @@ -49,9 +49,9 @@ as those topics are left to the underlying storage provider. [[backup-restore-prerequisites]] == Prerequisites -As the restore procedure involves a +Because the restore procedure involves a link:#cluster-restore[complete reinstallation], -it is a good idea to save all the files used in the initial installation. +we recommend that you save all the files used in the initial installation.
This may include: - *_~/.config/openshift/installer.cfg.yml_* (from the From 6c91b61b6fb38b6bedc2917562e4525245f2d40e Mon Sep 17 00:00:00 2001 From: Thien-Thi Nguyen Date: Thu, 7 Jul 2016 20:02:04 +0200 Subject: [PATCH 03/11] Replace comma overuse w/ parens --- admin_guide/backup_restore.adoc | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/admin_guide/backup_restore.adoc b/admin_guide/backup_restore.adoc index 2a02d80a60db..cb308ac22ea5 100644 --- a/admin_guide/backup_restore.adoc +++ b/admin_guide/backup_restore.adoc @@ -29,8 +29,8 @@ toc::[] == Overview In {product-title}, you can -_back up_, saving state to separate storage, -and _restore_, recreating state from separate storage, +_back up_ (saving state to separate storage) +and _restore_ (recreating state from separate storage) at the cluster level. There is also some preliminary support for xref:project-backup[per-project backup]. From 6a595d33c3d6aec6cc7175c4cb8594ca95768f91 Mon Sep 17 00:00:00 2001 From: Thien-Thi Nguyen Date: Thu, 7 Jul 2016 20:42:56 +0200 Subject: [PATCH 04/11] Add table to describe the various ETCD_DATA_DIR possibilities --- admin_guide/backup_restore.adoc | 35 +++++++++++++++++++++++---------- 1 file changed, 25 insertions(+), 10 deletions(-) diff --git a/admin_guide/backup_restore.adoc b/admin_guide/backup_restore.adoc index cb308ac22ea5..b9aa4610f321 100644 --- a/admin_guide/backup_restore.adoc +++ b/admin_guide/backup_restore.adoc @@ -71,6 +71,24 @@ Install packages that provide various utility commands: # yum install etcd ---- +Note the location of the *etcd* data directory +(or `$ETCD_DATA_DIR` in the following sections), +which depends on how *etcd* is deployed. 
+ +[options="header",cols="1,2"] +|=== +| Deployment | Data Directory + +|all-in-one VM +|*_/var/lib/openshift/openshift.local.etcd_* + +|external (not on master) +|*_/var/lib/etcd_* + +|embedded (on master) +|*_/var/lib/origin/etcd_* +|=== + [[cluster-backup]] == Cluster Backup @@ -99,18 +117,15 @@ Install packages that provide various utility commands: + ---- # etcdctl backup \ - --data-dir /var/lib/origin/etcd \ - --backup-dir /var/lib/origin/etcd.bak + --data-dir $ETCD_DATA_DIR \ + --backup-dir $ETCD_DATA_DIR.bak ---- + [NOTE] ==== -- If *etcd* is running on more than one host, - the various instances regularly synchronize their data, - so creating a backup for one of them is sufficient. -- If *etcd* is running on an independent host (not on a master), - use *_/var/lib/etcd_* as the `--data-dir` argument, - and *_/var/lib/etcd.bak_* as the `--backup-dir` argument. +If *etcd* is running on more than one host, +the various instances regularly synchronize their data, +so creating a backup for one of them is sufficient. ==== . Create a template for all cluster API objects: @@ -159,8 +174,8 @@ $ oc process -f cluster.template + ---- # etcdctl restore \ - --backup-dir /var/lib/origin/etcd.bak \ - --data-dir /var/lib/origin/etcd + --backup-dir $ETCD_DATA_DIR.bak \ + --data-dir $ETCD_DATA_DIR ---- //??? This is a guess based on (!). // What am i missing? 
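The *etcd* data directory differs by deployment, so a small helper can pick the first matching directory from the table before `$ETCD_DATA_DIR` is set. This is a sketch under stated assumptions. The function name `find_etcd_data_dir` is hypothetical. The search order (embedded, then external, then all-in-one) is a guess; the table does not specify one. A host on which more than one of these directories exists still needs a human decision.

```shell
#!/bin/sh
# Hypothetical helper: print the first etcd data directory from the
# deployment table that exists on this host. The optional first argument
# is a prefix prepended to each path, e.g. for a dry run against a scratch
# tree. The search order below is an assumption, not documented behavior.
find_etcd_data_dir() {
    prefix="${1:-}"
    for d in /var/lib/origin/etcd \
             /var/lib/etcd \
             /var/lib/openshift/openshift.local.etcd; do
        if [ -d "$prefix$d" ]; then
            printf '%s\n' "$d"
            return 0
        fi
    done
    echo "no etcd data directory found" >&2
    return 1
}
```

For example: `ETCD_DATA_DIR=$(find_etcd_data_dir) && echo "$ETCD_DATA_DIR"`. The prefix argument exists only so that the lookup can be tried safely against a scratch directory tree.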
From 0935ca37de3e9b139e47623e233adb48119e8412 Mon Sep 17 00:00:00 2001 From: Thien-Thi Nguyen Date: Thu, 7 Jul 2016 21:06:30 +0200 Subject: [PATCH 05/11] Replace ‘etcdctl restore’ w/ commands snarfed from downgrading doc MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit --- admin_guide/backup_restore.adoc | 22 +++++++++++++--------- 1 file changed, 13 insertions(+), 9 deletions(-) diff --git a/admin_guide/backup_restore.adoc b/admin_guide/backup_restore.adoc index b9aa4610f321..c5aab3872aed 100644 --- a/admin_guide/backup_restore.adoc +++ b/admin_guide/backup_restore.adoc @@ -163,22 +163,26 @@ that {product-title} was previously installed. # tar xvf /tmp/certs-and-keys-$(hostname).tar ---- -. Create the API objects for the cluster: +. Restore from the *etcd* backup: + ---- -$ oc process -f cluster.template +# mv $ETCD_DATA_DIR $ETCD_DATA_DIR.orig +# cp -Rp $ETCD_DATA_DIR.bak $ETCD_DATA_DIR +# chcon -R --reference $ETCD_DATA_DIR.orig $ETCD_DATA_DIR +# chown -R etcd:etcd $ETCD_DATA_DIR ---- -//??? Other flags? +// etcd 3.x will support: +// # etcdctl restore \ +// --backup-dir $ETCD_DATA_DIR.bak \ +// --data-dir $ETCD_DATA_DIR +// See also: -. Restore from the *etcd* backup on each master: +. Create the API objects for the cluster: + ---- -# etcdctl restore \ - --backup-dir $ETCD_DATA_DIR.bak \ - --data-dir $ETCD_DATA_DIR +$ oc process -f cluster.template ---- -//??? This is a guess based on (!). -// What am i missing? +//??? Other flags?
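The four commands that this patch introduces for restoring from the *etcd* backup can be wrapped in a function that stops at the first failure. That way, a failed `cp` does not leave `chcon` and `chown` running against an incomplete directory. The following is a minimal sketch, assuming a POSIX `sh` and assuming that *etcd* has already been stopped on the host. `restore_etcd_data_dir` is a hypothetical name. It takes the data directory as an argument instead of reading `$ETCD_DATA_DIR`. It also refuses to run if a `.orig` directory is left over from an earlier attempt, because `mv` would otherwise move the data directory *inside* it.

```shell
#!/bin/sh
# Hypothetical wrapper around the mv/cp/chcon/chown restore sequence.
# Assumes etcd is already stopped on this host. Each command runs only if
# the previous one succeeded; the replaced data is kept as DATA_DIR.orig.
restore_etcd_data_dir() {
    data_dir="$1"
    if [ -z "$data_dir" ]; then
        echo "usage: restore_etcd_data_dir DATA_DIR" >&2
        return 2
    fi
    if [ ! -d "$data_dir.bak" ]; then
        echo "no backup found at $data_dir.bak" >&2
        return 1
    fi
    if [ -e "$data_dir.orig" ]; then
        # Otherwise mv would nest DATA_DIR inside the existing .orig.
        echo "$data_dir.orig already exists; move it aside first" >&2
        return 1
    fi
    mv "$data_dir" "$data_dir.orig" &&
        cp -Rp "$data_dir.bak" "$data_dir" &&
        chcon -R --reference "$data_dir.orig" "$data_dir" &&
        chown -R etcd:etcd "$data_dir"
}
```

For example: `restore_etcd_data_dir "$ETCD_DATA_DIR"`. As with the listed commands, the previous contents remain available under `$ETCD_DATA_DIR.orig`.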
// --------------------------------------------------------------------- From 108a3f572a00b236a0580c1c32163b96c0cd2c26 Mon Sep 17 00:00:00 2001 From: Thien-Thi Nguyen Date: Thu, 21 Jul 2016 19:02:22 +0200 Subject: [PATCH 06/11] Fix ‘oc export --as-template’ usage MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The ‘--as-template’ arg is the ‘metadata.name’ of the Template object. This change also adds a filename, w/ extension .yaml, to capture the output. --- admin_guide/backup_restore.adoc | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/admin_guide/backup_restore.adoc b/admin_guide/backup_restore.adoc index c5aab3872aed..e1e55c242b42 100644 --- a/admin_guide/backup_restore.adoc +++ b/admin_guide/backup_restore.adoc @@ -135,10 +135,12 @@ so creating a backup for one of them is sufficient. $ oc export all \ --exact \//<1> --all-namespaces \ - --as-template=cluster.template + --as-template=mycluster \//<2> + > mycluster.template.yaml ---- <1> Preserve fields that may be cluster specific, such as service `portalIP` values or generated names. +<2> The output file has `kind: Template` and `metadata.name: mycluster`. ==== //???
pkg/cmd/cli/cmd/export.go line 76 says: // cmd.Flags().Bool("all", true, "DEPRECATED: all is ignored, specifying a resource without a name selects all the instances of that resource") // What does "deprecated" mean for the user? (Can ‘all’ be used, anyway?) From 190a6b79df1811cde840e5e5637319e820cbfeb7 Mon Sep 17 00:00:00 2001 From: Thien-Thi Nguyen Date: Mon, 25 Jul 2016 09:37:44 +0200 Subject: [PATCH 07/11] fix cluster restore command: use ‘oc create’; use same name as saved file MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit --- admin_guide/backup_restore.adoc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/admin_guide/backup_restore.adoc b/admin_guide/backup_restore.adoc index e1e55c242b42..e79cde2af163 100644 --- a/admin_guide/backup_restore.adoc +++ b/admin_guide/backup_restore.adoc @@ -182,7 +182,7 @@ that {product-title} was previously installed. . Create the API objects for the cluster: + ---- -$ oc process -f cluster.template +$ oc create -f mycluster.template.yaml ---- //??? Other flags? From 5116b5a90166af7a87838a6a7bbeb6bb92dcccbb Mon Sep 17 00:00:00 2001 From: Thien-Thi Nguyen Date: Mon, 25 Jul 2016 12:26:56 +0200 Subject: [PATCH 08/11] add IMPORTANT w/ (WIP) fine-grained list of objects included in ‘oc export all’ MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit --- admin_guide/backup_restore.adoc | 47 ++++++++++++++++++++++++++++++++- 1 file changed, 46 insertions(+), 1 deletion(-) diff --git a/admin_guide/backup_restore.adoc b/admin_guide/backup_restore.adoc index e79cde2af163..0ec065241868 100644 --- a/admin_guide/backup_restore.adoc +++ b/admin_guide/backup_restore.adoc @@ -142,10 +142,55 @@ $ oc export all \ such as service `portalIP` values or generated names. <2> The output file has `kind: Template` and `metadata.name: mycluster`. ==== ++ +[IMPORTANT] +==== //???
pkg/cmd/cli/cmd/export.go line 76 says: // cmd.Flags().Bool("all", true, "DEPRECATED: all is ignored, specifying a resource without a name selects all the instances of that resource") // What does "deprecated" mean for the user? (Can ‘all’ be used, anyway?) - +The object types included in `oc export all` are: + +* BuildConfig +* Build +* ??? componentstatuses (aka 'cs') +* ??? configmaps +* ??? daemonsets (aka 'ds') +* DeploymentConfig +* ??? deployments +* ??? events (aka 'ev') +* ??? endpoints (aka 'ep') +* ??? horizontalpodautoscalers (aka 'hpa') +* ??? imagestreamimages (aka 'isimage') +* ImageStream +* ImageStreamTag +* ??? ingress (aka 'ing') +* ??? groups +* ??? jobs +* ??? limitranges (aka 'limits') +* ??? nodes (aka 'no') +* ??? namespaces (aka 'ns') +* Pod +* ??? persistentvolumes (aka 'pv') +* ??? persistentvolumeclaims (aka 'pvc') +* ??? policies +* ??? projects +* ??? quota +* ??? resourcequotas (aka 'quota') +* ??? replicasets (aka 'rs') +* ReplicationController +* ??? rolebindings +* Route +* ??? secrets +* ??? serviceaccounts +* Service +* ??? users + +*NB: WIP* + +The above list is made from playing w/ the docs' team OSE 3.2 instance. +We still need to further rationalize (and reconcile) it w/ +link:https://github.com/kubernetes/kubernetes/pull/28955#issuecomment-232737113[this comment]. +==== [[cluster-restore]] == Cluster Restore From aac6fcc27fc77e1e33b41b1c4462ccb3145e6089 Mon Sep 17 00:00:00 2001 From: Thien-Thi Nguyen Date: Tue, 26 Jul 2016 14:46:25 +0200 Subject: [PATCH 09/11] Use imperative for "save initial installation files" step --- admin_guide/backup_restore.adoc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/admin_guide/backup_restore.adoc b/admin_guide/backup_restore.adoc index 0ec065241868..b729c9160256 100644 --- a/admin_guide/backup_restore.adoc +++ b/admin_guide/backup_restore.adoc @@ -51,7 +51,7 @@ as those topics are left to the underlying storage provider. 
Because the restore procedure involves a link:#cluster-restore[complete reinstallation], -we recommend that you save all the files used in the initial installation. +save all the files used in the initial installation. This may include: - *_~/.config/openshift/installer.cfg.yml_* (from the From 141ba9a4988f0394ccdd05540371932977f55925 Mon Sep 17 00:00:00 2001 From: Thien-Thi Nguyen Date: Tue, 26 Jul 2016 15:19:22 +0200 Subject: [PATCH 10/11] Add (conditional) step to stop etcd prior to ‘etcdctl backup’ MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit --- admin_guide/backup_restore.adoc | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/admin_guide/backup_restore.adoc b/admin_guide/backup_restore.adoc index b729c9160256..c9dc23d07b9c 100644 --- a/admin_guide/backup_restore.adoc +++ b/admin_guide/backup_restore.adoc @@ -113,6 +113,15 @@ which depends on how *etcd* is deployed. //??? What is missing? //??? What is unnecessary? +. If *etcd* is running on more than one host, stop it on each host: ++ +---- +# sudo systemctl stop etcd +---- ++ +Although this step is not strictly necessary, +doing so ensures that the *etcd* data is fully synchronized. + . Create an *etcd* backup: + ---- From af7cbd5a9cdc8b179ec8bd83fb7530f162bd2541 Mon Sep 17 00:00:00 2001 From: Thien-Thi Nguyen Date: Fri, 29 Jul 2016 16:03:14 +0200 Subject: [PATCH 11/11] Update list of "oc export all" object types --- admin_guide/backup_restore.adoc | 70 +++++++++++++++++---------------- 1 file changed, 36 insertions(+), 34 deletions(-) diff --git a/admin_guide/backup_restore.adoc b/admin_guide/backup_restore.adoc index c9dc23d07b9c..bedaa166d698 100644 --- a/admin_guide/backup_restore.adoc +++ b/admin_guide/backup_restore.adoc @@ -159,40 +159,42 @@ such as service `portalIP` values or generated names. // What does "deprecated" mean for the user? (Can ‘all’ be used, anyway?)
The object types included in `oc export all` are: -* BuildConfig -* Build -* ??? componentstatuses (aka 'cs') -* ??? configmaps -* ??? daemonsets (aka 'ds') -* DeploymentConfig -* ??? deployments -* ??? events (aka 'ev') -* ??? endpoints (aka 'ep') -* ??? horizontalpodautoscalers (aka 'hpa') -* ??? imagestreamimages (aka 'isimage') -* ImageStream -* ImageStreamTag -* ??? ingress (aka 'ing') -* ??? groups -* ??? jobs -* ??? limitranges (aka 'limits') -* ??? nodes (aka 'no') -* ??? namespaces (aka 'ns') -* Pod -* ??? persistentvolumes (aka 'pv') -* ??? persistentvolumeclaims (aka 'pvc') -* ??? policies -* ??? projects -* ??? quota -* ??? resourcequotas (aka 'quota') -* ??? replicasets (aka 'rs') -* ReplicationController -* ??? rolebindings -* Route -* ??? secrets -* ??? serviceaccounts -* Service -* ??? users +---- +y BuildConfig +y Build +no-? componentstatuses (aka 'cs') +no-4 configmaps +no-? daemonsets (aka 'ds') +y DeploymentConfig +no-? deployments +no-4 events (aka 'ev') +no-4 endpoints (aka 'ep') +no-2 horizontalpodautoscalers (aka 'hpa') +no-1 imagestreamimages (aka 'isimage') +y ImageStream +y ImageStreamTag +no-? ingress (aka 'ing') +no-2 groups +no-? jobs +no-2 limitranges (aka 'limits') +no-? nodes (aka 'no') +no-1 namespaces (aka 'ns') +y Pod +no-? persistentvolumes (aka 'pv') +no-3 persistentvolumeclaims (aka 'pvc') +no-2 policies +no-1 projects +no-2 quota +no-2 resourcequotas (aka 'quota') +no-? replicasets (aka 'rs') +y ReplicationController +no-2 rolebindings +y Route +no-3 secrets +no-2 serviceaccounts +y Service +no-2 users +---- *NB: WIP*
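Until per-project backup is supported directly, the per-object `oc export` approach from *Project Backup* can be looped over several resource types. The sketch below assumes a POSIX `sh` and an `oc` client that is already logged in to the target project. The function name `export_project_objects` and its type list are illustrative choices. The list is a subset of the types marked `y` above, spelled as resource names that `oc` accepts, and it is not a complete inventory of project state. `oc export` may exit with an error when a type has no objects, so the sketch warns and moves on rather than aborting.

```shell
#!/bin/sh
# Hypothetical helper: export all objects of a few resource types from the
# current project, one YAML file per type, into OUT_DIR (default: .).
# The type list is an illustrative subset; extend it as needed.
export_project_objects() {
    out_dir="${1:-.}"
    mkdir -p "$out_dir" || return 1
    for t in buildconfigs deploymentconfigs imagestreams services routes; do
        # A resource type without a name selects every object of that type.
        if ! oc export "$t" -o yaml > "$out_dir/$t.yaml"; then
            echo "warning: nothing exported for $t" >&2
            rm -f "$out_dir/$t.yaml"
        fi
    done
}
```

For example: `export_project_objects ./project-backup`. As noted for `oc export all`, annotations may not be captured, so treat the output as a starting point rather than a full backup.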