2 changes: 2 additions & 0 deletions _build_cfg.yml
@@ -441,6 +441,8 @@ Topics:
- Name: Building Dependency Trees
File: building_dependency_trees
Distros: openshift-origin,openshift-enterprise
- Name: Backup and Restore
File: backup_restore
- Name: Troubleshooting Networking
File: sdn_troubleshooting
Distros: openshift-origin,openshift-enterprise
274 changes: 274 additions & 0 deletions admin_guide/backup_restore.adoc
@@ -0,0 +1,274 @@
= Backup and Restore

Review comment: I think a lot of the "backup"s need to be "back up":
http://grammarist.com/usage/back-up-backup/

Author reply: @bfallonf Good catch. I've converted the verb-context "backup" to "back up".

{product-author}
{product-version}
:data-uri:
:icons: font
:experimental:
:toc: macro
:toc-title:
:prewrap!:

toc::[]


// REVIEWERS: READ THIS!
//
// In the following text, there are questions of the form:
// //??? QUESTION
// Please feel free to make a line-comment to answer them, in addition
// to any other (line-)comments on the correctness of the text.
//
// - Usually, the question pertains to the text preceding it.
// Questions pertaining to following text are explicitly noted.
//
// - There are a bunch of questions at the end.
//
// Thanks for your cooperation on this (experimental) method of
// refining the documentation. Hopefully it will bear good fruit.

== Overview

In {product-title}, you can
_back up_ (saving state to separate storage)
and _restore_ (recreating state from separate storage)
at the cluster level.
There is also some preliminary support for
xref:project-backup[per-project backup].
The full state of a cluster installation includes:

- etcd data on each master
- API objects
- registry storage
- volume storage

Review comment: I think bullet/listed items should be capitalized, with maybe an exception for etcd.


This topic does not cover how to back up and restore
link:../install_config/persistent_storage/index.html[persistent storage],
as those topics are left to the underlying storage provider.


[[backup-restore-prerequisites]]
== Prerequisites

Because the restore procedure involves a
link:#cluster-restore[complete reinstallation],
save all the files used in the initial installation.
This may include:

- *_~/.config/openshift/installer.cfg.yml_* (from the
link:../install_config/install/quick_install.html[Quick Installation]
method)
- Ansible playbooks and inventory files (from the
link:../install_config/install/advanced_install.html[Advanced Installation]
method)
- *_/etc/yum.repos.d/ose.repo_* (from the
link:../install_config/install/disconnected_install.html[Disconnected Installation]
method)
//??? Other files?

Install packages that provide various utility commands:

----
# yum install etcd
----
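
A quick way to confirm that the `etcdctl` utility is now available (the reported version will vary
by release):

----
# etcdctl --version
----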

Note the location of the *etcd* data directory
(or `$ETCD_DATA_DIR` in the following sections),
which depends on how *etcd* is deployed.

[options="header",cols="1,2"]
|===
| Deployment | Data Directory

|all-in-one VM
|*_/var/lib/openshift/openshift.local.etcd_*

|external (not on master)
|*_/var/lib/etcd_*

|embedded (on master)
|*_/var/lib/origin/etcd_*
|===
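
For convenience, you can set the variable in the shell where you run the subsequent commands.
For example, a sketch for the embedded (on master) case; adjust the path per the table above:

----
# ETCD_DATA_DIR=/var/lib/origin/etcd
----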


[[cluster-backup]]
== Cluster Backup

. On each master, save all the certificates and keys:
+
----
# cd /etc/origin/master
# tar cf /tmp/certs-and-keys-$(hostname).tar \
master.proxy-client.crt \
master.proxy-client.key \
proxyca.crt \
proxyca.key \
master.server.crt \
master.server.key \
ca.crt \
ca.key \
master.etcd-client.crt \
master.etcd-client.key \
master.etcd-ca.crt
----
//??? What is missing?
//??? What is unnecessary?
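+
Because a backup is only useful if it survives the host, copy the resulting archive to storage
outside the cluster. For example (a sketch; `backup.example.com:/srv/backups/` is a placeholder
destination):
+
----
# scp /tmp/certs-and-keys-$(hostname).tar backup.example.com:/srv/backups/
----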

. If *etcd* is running on more than one host, stop it on each host:
+
----
# sudo systemctl stop etcd
----
+
Although this step is not strictly necessary,
doing so ensures that the *etcd* data is fully synchronized.
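+
A quick way to confirm that the service is actually stopped on a host (it should report
`inactive`):
+
----
# systemctl is-active etcd
----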

. Create an *etcd* backup:
+
----
# etcdctl backup \
--data-dir $ETCD_DATA_DIR \
--backup-dir $ETCD_DATA_DIR.bak
----
+
[NOTE]
====
If *etcd* is running on more than one host,
the various instances regularly synchronize their data,
so creating a backup for one of them is sufficient.
====

Review comment (Contributor): Quick feedback. One of the users asked us whether this etcd backup
can be done without stopping the service. I feel that it will be a FAQ, so could you please add it
to the doc? Note: although I thought `etcdctl backup` could do a live backup, etcd's documentation
stops the service first:
https://coreos.com/etcd/docs/latest/etcd-live-cluster-reconfiguration.html#etcd-disaster-recovery-on-coreos
"Let's assume a 3-node cluster with no living members. First, stop the etcd2 service on all the
members: $ sudo systemctl stop etcd2"

Author reply: @nak3 Thanks for the tip. I have installed: 141ba9a WDYT?

Contributor reply: QA team says it is alright in https://bugzilla.redhat.com/show_bug.cgi?id=1259544#c55,
so it is good to me.
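
To check member health before stopping the service in the earlier step, the *etcd* 2.x
`cluster-health` subcommand can be used (a sketch; run it while the service is still up):

----
# etcdctl cluster-health
----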

. Create a template for all cluster API objects:
+
====
----
$ oc export all \
--exact \//<1>
--all-namespaces \
--as-template=mycluster \//<2>
> mycluster.template.yaml
----
<1> Preserve fields that may be cluster specific,
such as service `portalIP` values or generated names.
<2> The output file has `kind: Template` and `metadata.name: mycluster`.
====
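+
For orientation, the first lines of the generated file should look roughly like the following
(a sketch; the exact fields and their order may differ):
+
----
apiVersion: v1
kind: Template
metadata:
  name: mycluster
objects:
# ... the exported API objects follow here
----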
+
[IMPORTANT]
====
//??? pkg/cmd/cli/cmd/export.go line 76 says:
// cmd.Flags().Bool("all", true, "DEPRECATED: all is ignored, specifying a resource without a name selects all the instances of that resource")
// What does "deprecated" mean for the user? (Can ‘all’ be used, anyway?)
The object types included in `oc export all` are:

----
y BuildConfig
y Build
no-? componentstatuses (aka 'cs')
no-4 configmaps
no-? daemonsets (aka 'ds')
y DeploymentConfig
no-? deployments
no-4 events (aka 'ev')
no-4 endpoints (aka 'ep')
no-2 horizontalpodautoscalers (aka 'hpa')
no-1 imagestreamimages (aka 'isimage')
y ImageStream
y ImageStreamTag
no-? ingress (aka 'ing')
no-2 groups
no-? jobs
no-2 limitranges (aka 'limits')
no-? nodes (aka 'no')
no-1 namespaces (aka 'ns')
y Pod
no-? persistentvolumes (aka 'pv')
no-3 persistentvolumeclaims (aka 'pvc')
no-2 policies
no-1 projects
no-2 quota
no-2 resourcequotas (aka 'quota')
no-? replicasets (aka 'rs')
y ReplicationController
no-2 rolebindings
y Route
no-3 secrets
no-2 serviceaccounts
y Service
no-2 users
----

*NB: WIP*

The above list was compiled by experimenting with the docs team's OSE 3.2 instance.
We still need to further rationalize (and reconcile) it with
link:https://github.com/kubernetes/kubernetes/pull/28955#issuecomment-232737113[this comment].
====

[[cluster-restore]]
== Cluster Restore

//??? (for this section) Is the ordering (API objects, then etcd) correct?

. Reinstall {product-title}.
//??? Is there a better way to "zero out" the cluster?
This should be done in the
link:../install_config/install/index.html[same way]
that {product-title} was previously installed.

. On each master, restore the certificates and keys:
+
----
# cd /etc/origin/master
# tar xvf /tmp/certs-and-keys-$(hostname).tar
----
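+
To review exactly which files the archive contains, list it first (a sketch):
+
----
# tar tf /tmp/certs-and-keys-$(hostname).tar
----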

. Restore from the *etcd* backup:
+
----
# mv $ETCD_DATA_DIR $ETCD_DATA_DIR.orig
# cp -Rp $ETCD_DATA_DIR.bak $ETCD_DATA_DIR
# chcon -R --reference $ETCD_DATA_DIR.orig $ETCD_DATA_DIR
# chown -R etcd:etcd $ETCD_DATA_DIR
----
// etcd 3.x will support:
// # etcdctl restore \
// --backup-dir $ETCD_DATA_DIR.bak \
// --data-dir $ETCD_DATA_DIR
// See also: <https://lwn.net/Articles/631630/>

. Create the API objects for the cluster:
+
----
$ oc create -f mycluster.template.yaml
----
//??? Other flags?
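+
As a rough verification that the objects were recreated, list resources across all projects and
compare against the saved template (a sketch; output varies by cluster):
+
----
$ oc get all --all-namespaces
----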


// ---------------------------------------------------------------------
//??? Does the cluster need to be "quiescent" for backup/restore/both?
//??? Generally, what are the required conditions for a successful backup/restore?
Review comment (Contributor): What is QE testing around this?

Author reply: @sferich888
The QA contact on the BZ was originally @vikram-redhat and now @xltian -- however, because this is
a sort of "non-feature", I'm not sure if there is/was a testing plan.

Contributor reply: There needs to be a test plan for this, before we publish.

Review comment: +1 this needs to be QE tested on a multi-master environment with 50+ projects,
containing many different types of objects.

//??? Are there other considerations for special configurations?
//??? (meta) Is this documentation on the right track?


[[project-backup]]
== Project Backup

A future release of {product-title} will feature specific
support for per-project backup and restore.

For now, to back up API objects at the project level,
use `oc export` for each object to be saved.
For example, to save the deployment configuration `frontend` in YAML format:

----
$ oc export dc frontend -o yaml > dc-frontend.yaml
----
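
Several object types can also be exported in one pass. For example, a sketch assuming a project
named `myproject` and that instances of these resource types exist in it:

----
$ oc export dc,bc,is,svc,route -n myproject -o yaml > myproject-objects.yaml
----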

//??? Scare quotes in next sentence because annotations are not included.
// For this reason, I don't want to include it...
//
// To back up "all" of the project:
//
// ----
// $ oc export all -o yaml > project.yaml
Review comment (@rjhowe, Aug 9, 2016): This does not back up cluster objects like namespaces,
projects, and other cluster objects. So if you go to restore, you will get errors because the
namespace needs to be created first.

// ----