[WIP] Bug 1259544 -- document backup/restore #2140
Changes from all commits
1ec6649
1356de5
6c91b61
6a595d3
0935ca3
108a3f5
190a6b7
5116b5a
aac6fcc
141ba9a
af7cbd5
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,274 @@ | ||
| = Backup and Restore | ||
| {product-author} | ||
| {product-version} | ||
| :data-uri: | ||
| :icons: font | ||
| :experimental: | ||
| :toc: macro | ||
| :toc-title: | ||
| :prewrap!: | ||
|
|
||
| toc::[] | ||
|
|
||
|
|
||
| // REVIEWERS: READ THIS! | ||
| // | ||
| // In the following text, there are questions of the form: | ||
| // //??? QUESTION | ||
| // Please feel free to make a line-comment to answer them, in addition | ||
| // to any other (line-)comments on the correctness of the text. | ||
| // | ||
| // - Usually, the question pertains to the text preceding it. | ||
| // Questions pertaining to following text are explicitly noted. | ||
| // | ||
| // - There are a bunch of questions at the end. | ||
| // | ||
| // Thanks for your cooperation on this (experimental) method of | ||
| // refining the documentation. Hopefully it will bear good fruit. | ||
|
|
||
| == Overview | ||
|
|
||
| In {product-title}, you can | ||
| _back up_ (save state to separate storage) | ||
| and _restore_ (recreate state from separate storage) | ||
| at the cluster level. | ||
| There is also some preliminary support for | ||
| xref:project-backup[per-project backup]. | ||
| The full state of a cluster installation includes: | ||
|
|
||
| - etcd data on each master | ||
| - API objects | ||
| - registry storage | ||
| - volume storage | ||
|
||
|
|
||
| This topic does not cover how to back up and restore | ||
| link:../install_config/persistent_storage/index.html[persistent storage], | ||
| as those tasks are left to the underlying storage provider. | ||
|
|
||
|
|
||
| [[backup-restore-prerequisites]] | ||
| == Prerequisites | ||
|
|
||
| Because the restore procedure involves a | ||
| link:#cluster-restore[complete reinstallation], | ||
| save all the files used in the initial installation. | ||
| This may include: | ||
|
|
||
| - *_~/.config/openshift/installer.cfg.yml_* (from the | ||
| link:../install_config/install/quick_install.html[Quick Installation] | ||
| method) | ||
| - Ansible playbooks and inventory files (from the | ||
| link:../install_config/install/advanced_install.html[Advanced Installation] | ||
| method) | ||
| - *_/etc/yum.repos.d/ose.repo_* (from the | ||
| link:../install_config/install/disconnected_install.html[Disconnected Installation] | ||
| method) | ||
| //??? Other files? | ||
|
|
||
| Install the *etcd* package, which provides the `etcdctl` command used in the following sections: | ||
|
|
||
| ---- | ||
| # yum install etcd | ||
| ---- | ||
|
|
||
| Note the location of the *etcd* data directory | ||
| (referred to as `$ETCD_DATA_DIR` in the following sections), | ||
| which depends on how *etcd* is deployed. | ||
|
|
||
| [options="header",cols="1,2"] | ||
| |=== | ||
| | Deployment | Data Directory | ||
|
|
||
| |all-in-one VM | ||
| |*_/var/lib/openshift/openshift.local.etcd_* | ||
|
|
||
| |external (not on master) | ||
| |*_/var/lib/etcd_* | ||
|
|
||
| |embedded (on master) | ||
| |*_/var/lib/origin/etcd_* | ||
| |=== | ||
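
The lookup in the table above can be sketched as a small shell function. This is only a sketch: the function name `detect_etcd_data_dir` and its optional root-prefix argument are hypothetical, and it assumes that a populated etcd 2.x data directory contains a `member` subdirectory. That check is used because an empty *_/var/lib/etcd_* may exist on hosts where the *etcd* package is installed but not used for cluster data.

```shell
# Print the first candidate directory (from the table above) that
# contains a "member" subdirectory; return non-zero if none does.
# The optional first argument is a root prefix, intended for testing only.
detect_etcd_data_dir() {
    root="${1:-}"
    for dir in \
        /var/lib/openshift/openshift.local.etcd \
        /var/lib/etcd \
        /var/lib/origin/etcd
    do
        # Assumption: a populated etcd 2.x data directory has member/.
        if [ -d "$root$dir/member" ]; then
            printf '%s\n' "$root$dir"
            return 0
        fi
    done
    return 1
}
```

For example, `ETCD_DATA_DIR=$(detect_etcd_data_dir) || echo "etcd data directory not found"` sets the variable used in the following sections. If more than one candidate is populated, only the first match is reported, so confirm the result against your deployment.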
|
|
||
|
|
||
| [[cluster-backup]] | ||
| == Cluster Backup | ||
|
|
||
| . Save all the certificates and keys, on each master: | ||
| + | ||
| ---- | ||
| # cd /etc/origin/master | ||
| # tar cf /tmp/certs-and-keys-$(hostname).tar \ | ||
| master.proxy-client.crt \ | ||
| master.proxy-client.key \ | ||
| proxyca.crt \ | ||
| proxyca.key \ | ||
| master.server.crt \ | ||
| master.server.key \ | ||
| ca.crt \ | ||
| ca.key \ | ||
| master.etcd-client.crt \ | ||
| master.etcd-client.key \ | ||
| master.etcd-ca.crt | ||
| ---- | ||
| //??? What is missing? | ||
| //??? What is unnecessary? | ||
|
|
||
| . If *etcd* is running on more than one host, stop it on each host: | ||
| + | ||
| ---- | ||
| # systemctl stop etcd | ||
| ---- | ||
| + | ||
| Although this step is not strictly necessary, | ||
| doing so ensures that the *etcd* data is fully synchronized. | ||
|
|
||
| . Create an *etcd* backup: | ||
| + | ||
| ---- | ||
| # etcdctl backup \ | ||
| --data-dir $ETCD_DATA_DIR \ | ||
| --backup-dir $ETCD_DATA_DIR.bak | ||
| ---- | ||
| + | ||
| [NOTE] | ||
| ==== | ||
|
Quick feedback: one of our users asked whether this etcd backup can be done without stopping the service. I expect this to become a frequently asked question, so could you please add it to the doc? NOTE: Although I thought
The QA team says it is all right in https://bugzilla.redhat.com/show_bug.cgi?id=1259544#c55, so it looks good to me. |
||
| If *etcd* is running on more than one host, | ||
| the various instances regularly synchronize their data, | ||
| so creating a backup for one of them is sufficient. | ||
| ==== | ||
|
|
||
| . Create a template for all cluster API objects: | ||
| + | ||
| ==== | ||
| ---- | ||
| $ oc export all \ | ||
| --exact \//<1> | ||
| --all-namespaces \ | ||
| --as-template=mycluster \//<2> | ||
| > mycluster.template.yaml | ||
| ---- | ||
| <1> Preserve fields that may be cluster specific, | ||
| such as service `portalIP` values or generated names. | ||
| <2> The output file has `kind: Template` and `metadata.name: mycluster`. | ||
| ==== | ||
| + | ||
| [IMPORTANT] | ||
| ==== | ||
| //??? pkg/cmd/cli/cmd/export.go line 76 says: | ||
| // cmd.Flags().Bool("all", true, "DEPRECATED: all is ignored, specifying a resource without a name selects all the instances of that resource") | ||
| // What does "deprecated" mean for the user? (Can ‘all’ be used, anyway?) | ||
| The object types included in `oc export all` are: | ||
|
|
||
| ---- | ||
| y BuildConfig | ||
| y Build | ||
| no-? componentstatuses (aka 'cs') | ||
| no-4 configmaps | ||
| no-? daemonsets (aka 'ds') | ||
| y DeploymentConfig | ||
| no-? deployments | ||
| no-4 events (aka 'ev') | ||
| no-4 endpoints (aka 'ep') | ||
| no-2 horizontalpodautoscalers (aka 'hpa') | ||
| no-1 imagestreamimages (aka 'isimage') | ||
| y ImageStream | ||
| y ImageStreamTag | ||
| no-? ingress (aka 'ing') | ||
| no-2 groups | ||
| no-? jobs | ||
| no-2 limitranges (aka 'limits') | ||
| no-? nodes (aka 'no') | ||
| no-1 namespaces (aka 'ns') | ||
| y Pod | ||
| no-? persistentvolumes (aka 'pv') | ||
| no-3 persistentvolumeclaims (aka 'pvc') | ||
| no-2 policies | ||
| no-1 projects | ||
| no-2 quota | ||
| no-2 resourcequotas (aka 'quota') | ||
| no-? replicasets (aka 'rs') | ||
| y ReplicationController | ||
| no-2 rolebindings | ||
| y Route | ||
| no-3 secrets | ||
| no-2 serviceaccounts | ||
| y Service | ||
| no-2 users | ||
| ---- | ||
|
|
||
| *NB: WIP* | ||
|
|
||
| The above list was compiled by experimenting with the docs team's OSE 3.2 instance. | ||
| We still need to rationalize it further and reconcile it with | ||
| link:https://github.com/kubernetes/kubernetes/pull/28955#issuecomment-232737113[this comment]. | ||
| ==== | ||
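
Because the exact set of certificate and key files is still an open question in this draft, a more defensive variant of the first backup step may be useful. The following is a minimal sketch, not part of the reviewed procedure: the function name `backup_master_certs` and its two arguments are hypothetical, and the file list is copied from the `tar` command above. It warns about files that are absent instead of letting `tar` fail partway through.

```shell
# Archive whichever of the listed files exist in src_dir into out_tar.
# Missing files produce a warning on stderr; if none exist, return 1.
backup_master_certs() {
    src_dir="$1"
    out_tar="$2"
    present=""
    for f in \
        master.proxy-client.crt master.proxy-client.key \
        proxyca.crt proxyca.key \
        master.server.crt master.server.key \
        ca.crt ca.key \
        master.etcd-client.crt master.etcd-client.key \
        master.etcd-ca.crt
    do
        if [ -f "$src_dir/$f" ]; then
            present="$present $f"
        else
            echo "warning: $src_dir/$f not found, skipping" >&2
        fi
    done
    [ -n "$present" ] || return 1
    # The names above contain no whitespace, so word splitting is safe.
    tar cf "$out_tar" -C "$src_dir" $present
}
```

A call such as `backup_master_certs /etc/origin/master /tmp/certs-and-keys-$(hostname).tar` produces the same archive name as the step above. Review the warnings before relying on the archive, since a skipped file may still be required for restore.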
|
|
||
| [[cluster-restore]] | ||
| == Cluster Restore | ||
|
|
||
| //??? (for this section) Is the ordering (API objects, then etcd) correct? | ||
|
|
||
| . Reinstall {product-title}. | ||
| //??? Is there a better way to "zero out" the cluster? | ||
| This should be done in the | ||
| link:../install_config/install/index.html[same way] | ||
| that {product-title} was previously installed. | ||
|
|
||
| . Restore the certificates and keys, on each master: | ||
| + | ||
| ---- | ||
| # cd /etc/origin/master | ||
| # tar xvf /tmp/certs-and-keys-$(hostname).tar | ||
| ---- | ||
|
|
||
| . Restore from the *etcd* backup: | ||
| + | ||
| ---- | ||
| # mv $ETCD_DATA_DIR $ETCD_DATA_DIR.orig | ||
| # cp -Rp $ETCD_DATA_DIR.bak $ETCD_DATA_DIR | ||
| # chcon -R --reference $ETCD_DATA_DIR.orig $ETCD_DATA_DIR | ||
| # chown -R etcd:etcd $ETCD_DATA_DIR | ||
| ---- | ||
| // etcd 3.x will support: | ||
| // # etcdctl restore \ | ||
| // --backup-dir $ETCD_DATA_DIR.bak \ | ||
| // --data-dir $ETCD_DATA_DIR | ||
| // See also: <https://lwn.net/Articles/631630/> | ||
|
|
||
| . Create the API objects for the cluster: | ||
| + | ||
| ---- | ||
| $ oc create -f mycluster.template.yaml | ||
| ---- | ||
| //??? Other flags? | ||
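
Before moving the live data directory aside in the *etcd* restore step, it may be worth confirming that the backup directory looks usable. The following sketch assumes that `etcdctl backup` from etcd 2.x writes `member/snap` and `member/wal` under the backup directory. The function name `check_etcd_backup` is hypothetical, and the check is structural only: it does not validate the contents of the backup.

```shell
# Verify that backup_dir has the subdirectories expected of an
# etcd 2.x backup (assumption: member/snap and member/wal).
check_etcd_backup() {
    backup_dir="$1"
    for sub in member/snap member/wal; do
        if [ ! -d "$backup_dir/$sub" ]; then
            echo "error: $backup_dir/$sub is missing" >&2
            return 1
        fi
    done
    echo "ok: $backup_dir looks like an etcd backup"
}
```

For example, run `check_etcd_backup "$ETCD_DATA_DIR.bak"` and continue with the `mv` and `cp` commands only if it succeeds.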
|
|
||
|
|
||
| // --------------------------------------------------------------------- | ||
| //??? Does the cluster need to be "quiescent" for backup/restore/both? | ||
| //??? Generally, what are the required conditions for a successful backup/restore? | ||
|
||
| //??? Are there other considerations for special configurations? | ||
| //??? (meta) Is this documentation on the right track? | ||
|
|
||
|
|
||
| [[project-backup]] | ||
| == Project Backup | ||
|
|
||
| A future release of {product-title} will feature specific | ||
| support for per-project backup and restore. | ||
|
|
||
| For now, to back up API objects at the project level, | ||
| use `oc export` for each object to be saved. | ||
| For example, to save the deployment configuration `frontend` in YAML format: | ||
|
|
||
| ---- | ||
| $ oc export dc frontend -o yaml > dc-frontend.yaml | ||
| ---- | ||
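
To save several object types from one project, the same command can be repeated in a loop. This is a sketch under stated assumptions: the function name `export_project_objects`, its arguments, and the `OC` override variable are hypothetical. It relies on `oc export` selecting all instances of a resource type when no name is given, and on the standard `-n` flag to choose the project. As noted elsewhere in this topic, the exported output may omit some data (such as annotations) and does not include cluster-level objects.

```shell
# Export each given resource type from a project into its own YAML file.
# Usage: export_project_objects PROJECT OUT_DIR TYPE...
# Set OC to use a different client binary (hypothetical override).
export_project_objects() {
    project="$1"
    out_dir="$2"
    shift 2
    mkdir -p "$out_dir"
    for type in "$@"; do
        # Stop at the first failure so that a partial backup is noticed.
        "${OC:-oc}" export "$type" -n "$project" -o yaml \
            > "$out_dir/$type.yaml" || return 1
    done
}
```

For example, `export_project_objects myproject ./myproject-backup dc bc svc route` writes one file per type. The project name and the list of types here are placeholders; choose the types that matter for your project.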
|
|
||
| //??? Scare quotes in next sentence because annotations are not included. | ||
| // For this reason, I don't want to include it... | ||
| // | ||
| // To back up "all" of the project: | ||
| // | ||
| // ---- | ||
| // $ oc export all -o yaml > project.yaml | ||
|
This does not back up cluster-level objects such as namespaces and projects. If you then try to restore, you will get errors because the namespace needs to be created first. |
||
| // ---- | ||
I think a lot of the "backup"s need to be "back up":
http://grammarist.com/usage/back-up-backup/
@bfallonf Good catch.
@bfallonf Good catch. I've converted the verb-context "backup" to "back up".