diff --git a/_build_cfg.yml b/_build_cfg.yml
index c17028aa7e83..a411e29a9122 100644
--- a/_build_cfg.yml
+++ b/_build_cfg.yml
@@ -441,6 +441,8 @@ Topics:
   - Name: Building Dependency Trees
     File: building_dependency_trees
     Distros: openshift-origin,openshift-enterprise
+  - Name: Backup and Restore
+    File: backup_restore
   - Name: Troubleshooting Networking
     File: sdn_troubleshooting
     Distros: openshift-origin,openshift-enterprise
diff --git a/admin_guide/backup_restore.adoc b/admin_guide/backup_restore.adoc
new file mode 100644
index 000000000000..bedaa166d698
--- /dev/null
+++ b/admin_guide/backup_restore.adoc
@@ -0,0 +1,274 @@
= Backup and Restore
{product-author}
{product-version}
:data-uri:
:icons: font
:experimental:
:toc: macro
:toc-title:
:prewrap!:

toc::[]


// REVIEWERS: READ THIS!
//
// In the following text, there are questions of the form:
//    //??? QUESTION
// Please feel free to make a line-comment to answer them, in addition
// to any other (line-)comments on the correctness of the text.
//
// - Usually, the question pertains to the text preceding it.
//   Questions pertaining to following text are explicitly noted.
//
// - There are a bunch of questions at the end.
//
// Thanks for your cooperation on this (experimental) method of
// refining the documentation. Hopefully it will bear good fruit.

== Overview

In {product-title}, you can
_back up_ (save state to separate storage)
and _restore_ (recreate state from separate storage)
at the cluster level.
There is also preliminary support for
xref:project-backup[per-project backup].
The full state of a cluster installation includes:

- etcd data on each master
- API objects
- registry storage
- volume storage

This topic does not cover how to back up and restore
link:../install_config/persistent_storage/index.html[persistent storage],
as that is left to the underlying storage provider.


[[backup-restore-prerequisites]]
== Prerequisites

Because the restore procedure involves a
link:#cluster-restore[complete reinstallation],
save all the files used in the initial installation.
These may include:

- *_~/.config/openshift/installer.cfg.yml_* (from the
link:../install_config/install/quick_install.html[Quick Installation]
method)
- Ansible playbooks and inventory files (from the
link:../install_config/install/advanced_install.html[Advanced Installation]
method)
- *_/etc/yum.repos.d/ose.repo_* (from the
link:../install_config/install/disconnected_install.html[Disconnected Installation]
method)
//??? Other files?

Install packages that provide the utility commands used in this topic;
for example, the *etcd* package provides the `etcdctl` command:

----
# yum install etcd
----

Note the location of the *etcd* data directory
(referred to as `$ETCD_DATA_DIR` in the following sections),
which depends on how *etcd* is deployed.

[options="header",cols="1,2"]
|===
| Deployment | Data Directory

|all-in-one VM
|*_/var/lib/openshift/openshift.local.etcd_*

|external (not on master)
|*_/var/lib/etcd_*

|embedded (on master)
|*_/var/lib/origin/etcd_*
|===
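
If you are not sure which deployment applies, you can check the
configured data directory on the host itself. The following is only a
sketch, not part of the procedure; it assumes an external *etcd*
configured through *_/etc/etcd/etcd.conf_* and an embedded *etcd*
configured through the master configuration file (both paths and the
`storageDirectory` key are assumptions to adjust for your installation):

----
# grep ETCD_DATA_DIR /etc/etcd/etcd.conf                       # external etcd (assumed path)
# grep storageDirectory /etc/origin/master/master-config.yaml  # embedded etcd (assumed path and key)
----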

[[cluster-backup]]
== Cluster Backup

. Save all the certificates and keys on each master:
+
----
# cd /etc/origin/master
# tar cf /tmp/certs-and-keys-$(hostname).tar \
    master.proxy-client.crt \
    master.proxy-client.key \
    proxyca.crt \
    proxyca.key \
    master.server.crt \
    master.server.key \
    ca.crt \
    ca.key \
    master.etcd-client.crt \
    master.etcd-client.key \
    master.etcd-ca.crt
----
//??? What is missing?
//??? What is unnecessary?

. If *etcd* is running on more than one host, stop it on each host:
+
----
# systemctl stop etcd
----
+
Although this step is not strictly necessary,
doing so ensures that the *etcd* data is fully synchronized.

. Create an *etcd* backup:
+
----
# etcdctl backup \
    --data-dir $ETCD_DATA_DIR \
    --backup-dir $ETCD_DATA_DIR.bak
----
+
[NOTE]
====
If *etcd* is running on more than one host,
the various instances regularly synchronize their data,
so creating a backup for one of them is sufficient.
====

. Create a template for all cluster API objects:
+
====
----
$ oc export all \
    --exact \//<1>
    --all-namespaces \
    --as-template=mycluster \//<2>
    > mycluster.template.yaml
----
<1> Preserve fields that may be cluster specific,
such as service `portalIP` values or generated names.
<2> The output file has `kind: Template` and `metadata.name: mycluster`.
====
+
[IMPORTANT]
====
//??? pkg/cmd/cli/cmd/export.go line 76 says:
// cmd.Flags().Bool("all", true, "DEPRECATED: all is ignored, specifying a resource without a name selects all the instances of that resource")
// What does "deprecated" mean for the user? (Can ‘all’ be used, anyway?)
The object types included in `oc export all` are:

----
y     BuildConfig
y     Build
no-?  componentstatuses (aka 'cs')
no-4  configmaps
no-?  daemonsets (aka 'ds')
y     DeploymentConfig
no-?  deployments
no-4  events (aka 'ev')
no-4  endpoints (aka 'ep')
no-2  horizontalpodautoscalers (aka 'hpa')
no-1  imagestreamimages (aka 'isimage')
y     ImageStream
y     ImageStreamTag
no-?  ingress (aka 'ing')
no-2  groups
no-?  jobs
no-2  limitranges (aka 'limits')
no-?  nodes (aka 'no')
no-1  namespaces (aka 'ns')
y     Pod
no-?  persistentvolumes (aka 'pv')
no-3  persistentvolumeclaims (aka 'pvc')
no-2  policies
no-1  projects
no-2  quota
no-2  resourcequotas (aka 'quota')
no-?  replicasets (aka 'rs')
y     ReplicationController
no-2  rolebindings
y     Route
no-3  secrets
no-2  serviceaccounts
y     Service
no-2  users
----

*NB: WIP*

The above list is made from playing with the docs team's OSE 3.2 instance.
We still need to further rationalize (and reconcile) it with
link:https://github.com/kubernetes/kubernetes/pull/28955#issuecomment-232737113[this comment].
====

[[cluster-restore]]
== Cluster Restore

//??? (for this section) Is the ordering (API objects, then etcd) correct?

. Reinstall {product-title}.
//??? Is there a better way to "zero out" the cluster?
This should be done in the
link:../install_config/install/index.html[same way]
that {product-title} was previously installed.

. Restore the certificates and keys on each master:
+
----
# cd /etc/origin/master
# tar xvf /tmp/certs-and-keys-$(hostname).tar
----

. Restore from the *etcd* backup:
+
----
# mv $ETCD_DATA_DIR $ETCD_DATA_DIR.orig
# cp -Rp $ETCD_DATA_DIR.bak $ETCD_DATA_DIR
# chcon -R --reference $ETCD_DATA_DIR.orig $ETCD_DATA_DIR
# chown -R etcd:etcd $ETCD_DATA_DIR
----
// etcd 3.x will support:
//   # etcdctl restore \
//       --backup-dir $ETCD_DATA_DIR.bak \
//       --data-dir $ETCD_DATA_DIR
// See also:

. Create the API objects for the cluster:
+
----
$ oc create -f mycluster.template.yaml
----
//??? Other flags?
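
After the restore, it can help to confirm that *etcd* is healthy and
that the API objects were recreated before returning the cluster to
service. This verification is only a sketch, not part of the procedure
above; it assumes *etcd* and the master services have been started
again:

----
# etcdctl cluster-health
$ oc get nodes
$ oc get all --all-namespaces
----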

// ---------------------------------------------------------------------
//??? Does the cluster need to be "quiescent" for backup/restore/both?
//??? Generally, what are the required conditions for a successful backup/restore?
//??? Are there other considerations for special configurations?
//??? (meta) Is this documentation on the right track?


[[project-backup]]
== Project Backup

A future release of {product-title} will feature specific
support for per-project backup and restore.

For now, to back up API objects at the project level,
use `oc export` for each object to be saved.
For example, to save the deployment configuration `frontend` in YAML format:

----
$ oc export dc frontend -o yaml > dc-frontend.yaml
----

//??? Scare quotes in next sentence because annotations are not included.
// For this reason, I don't want to include it...
//
// To back up "all" of the project:
//
// ----
// $ oc export all -o yaml > project.yaml
// ----
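
To restore an object exported this way, recreate it with `oc create`
in the target project. A minimal sketch, assuming the
*_dc-frontend.yaml_* file from the example above and a hypothetical
project named `myproject`:

----
$ oc create -f dc-frontend.yaml -n myproject
----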