2 changes: 2 additions & 0 deletions _build_cfg.yml
@@ -441,6 +441,8 @@ Topics:
- Name: Building Dependency Trees
File: building_dependency_trees
Distros: openshift-origin,openshift-enterprise
- Name: Backup and Restore
File: backup_restore
- Name: Troubleshooting Networking
File: sdn_troubleshooting
Distros: openshift-origin,openshift-enterprise
274 changes: 274 additions & 0 deletions admin_guide/backup_restore.adoc
@@ -0,0 +1,274 @@
= Backup and Restore

Review comment: I think a lot of the "backup"s need to be "back up":
http://grammarist.com/usage/back-up-backup/

Author reply: @bfallonf Good catch. I've converted the verb-context "backup" to "back up".

{product-author}
{product-version}
:data-uri:
:icons: font
:experimental:
:toc: macro
:toc-title:
:prewrap!:

toc::[]


// REVIEWERS: READ THIS!
//
// In the following text, there are questions of the form:
// //??? QUESTION
// Please feel free to make a line-comment to answer them, in addition
// to any other (line-)comments on the correctness of the text.
//
// - Usually, the question pertains to the text preceding it.
// Questions pertaining to following text are explicitly noted.
//
// - There are a bunch of questions at the end.
//
// Thanks for your cooperation on this (experimental) method of
// refining the documentation. Hopefully it will bear good fruit.

== Overview

In {product-title}, you can
_back up_ (saving state to separate storage)
and _restore_ (recreating state from separate storage)
at the cluster level.
There is also some preliminary support for
xref:project-backup[per-project backup].
The full state of a cluster installation includes:

- etcd data on each master
- API objects
- registry storage
- volume storage

Review comment: I think bullet/listed items should be capitalized, with maybe an exception for etcd.


This topic does not cover how to back up and restore
link:../install_config/persistent_storage/index.html[persistent storage],
as those topics are left to the underlying storage provider.


[[backup-restore-prerequisites]]
== Prerequisites

Because the restore procedure involves a
link:#cluster-restore[complete reinstallation],
save all the files used in the initial installation.
This may include:

- *_~/.config/openshift/installer.cfg.yml_* (from the
link:../install_config/install/quick_install.html[Quick Installation]
method)
- Ansible playbooks and inventory files (from the
link:../install_config/install/advanced_install.html[Advanced Installation]
method)
- *_/etc/yum.repos.d/ose.repo_* (from the
link:../install_config/install/disconnected_install.html[Disconnected Installation]
method)
//??? Other files?

Install packages that provide various utility commands:

----
# yum install etcd
----
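
A quick way to confirm that the `etcdctl` utility is now available (the reported version will vary
by release):

----
# etcdctl --version
----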

Note the location of the *etcd* data directory
(or `$ETCD_DATA_DIR` in the following sections),
which depends on how *etcd* is deployed.

[options="header",cols="1,2"]
|===
| Deployment | Data Directory

|all-in-one VM
|*_/var/lib/openshift/openshift.local.etcd_*

|external (not on master)
|*_/var/lib/etcd_*

|embedded (on master)
|*_/var/lib/origin/etcd_*
|===
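
For convenience, you can set the variable in the shell where you run the subsequent commands.
For example, a sketch for the embedded (on master) case; adjust the path per the table above:

----
# ETCD_DATA_DIR=/var/lib/origin/etcd
----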


[[cluster-backup]]
== Cluster Backup

. On each master, save all the certificates and keys:
+
----
# cd /etc/origin/master
# tar cf /tmp/certs-and-keys-$(hostname).tar \
master.proxy-client.crt \
master.proxy-client.key \
proxyca.crt \
proxyca.key \
master.server.crt \
master.server.key \
ca.crt \
ca.key \
master.etcd-client.crt \
master.etcd-client.key \
master.etcd-ca.crt
----
//??? What is missing?
//??? What is unnecessary?
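+
Because a backup is only useful if it survives the host, copy the resulting archive to storage
outside the cluster. For example (a sketch; `backup.example.com:/srv/backups/` is a placeholder
destination):
+
----
# scp /tmp/certs-and-keys-$(hostname).tar backup.example.com:/srv/backups/
----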

. If *etcd* is running on more than one host, stop it on each host:
+
----
# sudo systemctl stop etcd
----
+
Although this step is not strictly necessary,
doing so ensures that the *etcd* data is fully synchronized.
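+
A quick way to confirm that the service is actually stopped on a host (it should report
`inactive`):
+
----
# systemctl is-active etcd
----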

. Create an *etcd* backup:
+
----
# etcdctl backup \
--data-dir $ETCD_DATA_DIR \
--backup-dir $ETCD_DATA_DIR.bak
----
+
[NOTE]
====
If *etcd* is running on more than one host,
the various instances regularly synchronize their data,
so creating a backup for one of them is sufficient.
====

Review comment (Contributor): Quick feedback. One of the users asked us whether this etcd backup
can be done without stopping the service. I feel that it will be a FAQ, so could you please add it
to the doc? Note: although I thought `etcdctl backup` could do a live backup, etcd's documentation
stops the service first:
https://coreos.com/etcd/docs/latest/etcd-live-cluster-reconfiguration.html#etcd-disaster-recovery-on-coreos
"Let's assume a 3-node cluster with no living members. First, stop the etcd2 service on all the
members: $ sudo systemctl stop etcd2"

Author reply: @nak3 Thanks for the tip. I have installed: 141ba9a WDYT?

Contributor reply: QA team says it is alright in https://bugzilla.redhat.com/show_bug.cgi?id=1259544#c55,
so it is good to me.
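
To check member health before stopping the service in the earlier step, the *etcd* 2.x
`cluster-health` subcommand can be used (a sketch; run it while the service is still up):

----
# etcdctl cluster-health
----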

. Create a template for all cluster API objects:
+
====
----
$ oc export all \
--exact \//<1>
--all-namespaces \
--as-template=mycluster \//<2>
> mycluster.template.yaml
----
<1> Preserve fields that may be cluster specific,
such as service `portalIP` values or generated names.
<2> The output file has `kind: Template` and `metadata.name: mycluster`.
====
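+
For orientation, the first lines of the generated file should look roughly like the following
(a sketch; the exact fields and their order may differ):
+
----
apiVersion: v1
kind: Template
metadata:
  name: mycluster
objects:
# ... the exported API objects follow here
----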
+
[IMPORTANT]
====
//??? pkg/cmd/cli/cmd/export.go line 76 says:
// cmd.Flags().Bool("all", true, "DEPRECATED: all is ignored, specifying a resource without a name selects all the instances of that resource")
// What does "deprecated" mean for the user? (Can ‘all’ be used, anyway?)
The object types included in `oc export all` are:

----
y BuildConfig
y Build
no-? componentstatuses (aka 'cs')
no-4 configmaps
no-? daemonsets (aka 'ds')
y DeploymentConfig
no-? deployments
no-4 events (aka 'ev')
no-4 endpoints (aka 'ep')
no-2 horizontalpodautoscalers (aka 'hpa')
no-1 imagestreamimages (aka 'isimage')
y ImageStream
y ImageStreamTag
no-? ingress (aka 'ing')
no-2 groups
no-? jobs
no-2 limitranges (aka 'limits')
no-? nodes (aka 'no')
no-1 namespaces (aka 'ns')
y Pod
no-? persistentvolumes (aka 'pv')
no-3 persistentvolumeclaims (aka 'pvc')
no-2 policies
no-1 projects
no-2 quota
no-2 resourcequotas (aka 'quota')
no-? replicasets (aka 'rs')
y ReplicationController
no-2 rolebindings
y Route
no-3 secrets
no-2 serviceaccounts
y Service
no-2 users
----

*NB: WIP*

The above list was compiled by experimenting with the docs team's OSE 3.2 instance.
We still need to further rationalize (and reconcile) it with
link:https://github.com/kubernetes/kubernetes/pull/28955#issuecomment-232737113[this comment].
====

[[cluster-restore]]
== Cluster Restore

//??? (for this section) Is the ordering (API objects, then etcd) correct?

. Reinstall {product-title}.
//??? Is there a better way to "zero out" the cluster?
This should be done in the
link:../install_config/install/index.html[same way]
that {product-title} was previously installed.

. On each master, restore the certificates and keys:
+
----
# cd /etc/origin/master
# tar xvf /tmp/certs-and-keys-$(hostname).tar
----
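+
To review exactly which files the archive contains, list it first (a sketch):
+
----
# tar tf /tmp/certs-and-keys-$(hostname).tar
----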

. Restore from the *etcd* backup:
+
----
# mv $ETCD_DATA_DIR $ETCD_DATA_DIR.orig
# cp -Rp $ETCD_DATA_DIR.bak $ETCD_DATA_DIR
# chcon -R --reference $ETCD_DATA_DIR.orig $ETCD_DATA_DIR
# chown -R etcd:etcd $ETCD_DATA_DIR
----
// etcd 3.x will support:
// # etcdctl restore \
// --backup-dir $ETCD_DATA_DIR.bak \
// --data-dir $ETCD_DATA_DIR
// See also: <https://lwn.net/Articles/631630/>

. Create the API objects for the cluster:
+
----
$ oc create -f mycluster.template.yaml
----
//??? Other flags?
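+
As a rough verification that the objects were recreated, list resources across all projects and
compare against the saved template (a sketch; output varies by cluster):
+
----
$ oc get all --all-namespaces
----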


// ---------------------------------------------------------------------
//??? Does the cluster need to be "quiescent" for backup/restore/both?
//??? Generally, what are the required conditions for a successful backup/restore?
Review comment (Contributor): What is QE testing around this?

Author reply: @sferich888
The QA contact on the BZ was originally @vikram-redhat and now @xltian -- however, because this is
a sort of "non-feature", I'm not sure if there is/was a testing plan.

Contributor reply: There needs to be a test plan for this, before we publish.

Review comment: +1 this needs to be QE tested on a multi-master environment with 50+ projects,
containing many different types of objects.

//??? Are there other considerations for special configurations?
//??? (meta) Is this documentation on the right track?


[[project-backup]]
== Project Backup

A future release of {product-title} will feature specific
support for per-project backup and restore.

For now, to back up API objects at the project level,
use `oc export` for each object to be saved.
For example, to save the deployment configuration `frontend` in YAML format:

----
$ oc export dc frontend -o yaml > dc-frontend.yaml
----
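
Several object types can also be exported in one pass. For example, a sketch assuming a project
named `myproject` and that instances of these resource types exist in it:

----
$ oc export dc,bc,is,svc,route -n myproject -o yaml > myproject-objects.yaml
----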

//??? Scare quotes in next sentence because annotations are not included.
// For this reason, I don't want to include it...
//
// To back up "all" of the project:
//
// ----
// $ oc export all -o yaml > project.yaml
Review comment (@rjhowe, Aug 9, 2016): This does not back up cluster objects like namespaces,
projects, and other cluster objects. So if you go to restore, you will get errors because the
namespace needs to be created first.

// ----