Merge pull request #2952 from csrwng/etcd_recovery
HOSTEDCP-1075: Document instructions for recovering etcd cluster from lost quorum
openshift-merge-robot committed Sep 11, 2023
2 parents d5d15ac + 886d520 commit 74718b2
187 changes: 185 additions & 2 deletions docs/content/how-to/
Etcd pods for hosted clusters run as part of a statefulset (etcd). The statefuls
Open a shell in a running etcd pod:

$ oc rsh -n ${CONTROL_PLANE_NAMESPACE} -c etcd etcd-0

Set up the etcdctl environment:
etcdctl endpoint health --cluster -w table
If a single etcd member of a 3-node cluster has corrupted data, it will most likely start crash looping, as in:

$ oc get pods -l app=etcd -n ${CONTROL_PLANE_NAMESPACE}
etcd-0 2/2 Running 0 64m
etcd-1 2/2 Running 0 45m
NAME READY STATUS RESTARTS AGE
etcd-0 2/2 Running 0 67m
etcd-1 2/2 Running 0 48m
etcd-2 2/2 Running 0 2m2s

### Recovery from Quorum Loss

If multiple members of the etcd cluster have lost data or are crash-looping, the cluster has lost quorum and etcd must be restored from a snapshot. The following procedure requires control plane downtime while the etcd database is restored.

NOTE: The following instructions require the `oc` and `jq` binaries.

0. Set up environment variables that point to your hosted cluster:

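A minimal sketch of this step. The values `example` and `clusters` are placeholders for your HostedCluster's name and namespace. The last assignment assumes the default HyperShift naming, in which the control plane namespace is `<hosted cluster namespace>-<hosted cluster name>`; if in doubt, confirm it with `oc get namespaces`.

```shell
# Placeholder values: replace with the name and namespace of your HostedCluster.
CLUSTER_NAME=example
CLUSTER_NAMESPACE=clusters

# Assumption: the control plane namespace follows the default
# "<namespace>-<name>" convention.
CONTROL_PLANE_NAMESPACE="${CLUSTER_NAMESPACE}-${CLUSTER_NAME}"
export CLUSTER_NAME CLUSTER_NAMESPACE CONTROL_PLANE_NAMESPACE
```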

1. Pause reconciliation of the HostedCluster (`CLUSTER_NAME` and `CLUSTER_NAMESPACE` must be set to the name and namespace of your hosted cluster):

oc patch -n ${CLUSTER_NAMESPACE} hostedclusters/${CLUSTER_NAME} -p '{"spec":{"pausedUntil":"true"}}' --type=merge
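Presumably the pause keeps the HyperShift operators from reconciling the control plane, which would otherwise scale the deployments back up partway through the procedure. It is therefore worth confirming that the field was set before continuing. Below is a small check, sketched on the assumption that `oc` is logged in to the management cluster. `hc_is_paused` is a hypothetical helper name.

```shell
# Hypothetical helper: succeeds only if spec.pausedUntil on the HostedCluster
# is the literal string "true", which is what the patch above sets.
hc_is_paused() {
  [ "$(oc get -n "${CLUSTER_NAMESPACE}" "hostedclusters/${CLUSTER_NAME}" \
        -o jsonpath='{.spec.pausedUntil}')" = "true" ]
}
```

Usage: `hc_is_paused && echo "paused"`. If it prints nothing, re-run the patch.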

2. Scale down API servers:

oc scale -n ${CONTROL_PLANE_NAMESPACE} deployment/kube-apiserver --replicas=0
oc scale -n ${CONTROL_PLANE_NAMESPACE} deployment/openshift-apiserver --replicas=0
oc scale -n ${CONTROL_PLANE_NAMESPACE} deployment/openshift-oauth-apiserver --replicas=0
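The three commands above differ only in the deployment name, so they can be wrapped in a loop. The helper below is a hypothetical sketch: `scale_apiservers 0` does this step, and `scale_apiservers 3` does step 9. Step 9 assumes three replicas, which is typical of a highly available control plane. If your control plane runs a different number of replicas (for example, a single-replica configuration), check the current counts with `oc get -n ${CONTROL_PLANE_NAMESPACE} deployments` before scaling down.

```shell
# Hypothetical helper: scales the three API server deployments in the control
# plane namespace to the replica count given as the first argument.
scale_apiservers() {
  for deployment in kube-apiserver openshift-apiserver openshift-oauth-apiserver; do
    oc scale -n "${CONTROL_PLANE_NAMESPACE}" "deployment/${deployment}" --replicas="$1" || return 1
  done
}
```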

3. Take a snapshot of etcd data using one of the following methods:

a. Use a previously backed-up snapshot, copied locally to /tmp/etcd.snapshot.db (the path the restore step reads from)

b. Take a snapshot from a running etcd pod (preferred, but requires an available etcd pod):

# List etcd pods
oc get -n ${CONTROL_PLANE_NAMESPACE} pods -l app=etcd

# If a pod is available:

# 1. take a snapshot of its database and save it locally
# Set ETCD_POD to the name of the pod that is available
oc exec -n ${CONTROL_PLANE_NAMESPACE} -c etcd -t ${ETCD_POD} -- env ETCDCTL_API=3 /usr/bin/etcdctl \
--cacert /etc/etcd/tls/etcd-ca/ca.crt \
--cert /etc/etcd/tls/client/etcd-client.crt \
--key /etc/etcd/tls/client/etcd-client.key \
--endpoints=https://localhost:2379 \
snapshot save /var/lib/snapshot.db

# 2. Verify that the snapshot is good
oc exec -n ${CONTROL_PLANE_NAMESPACE} -c etcd -t ${ETCD_POD} -- env ETCDCTL_API=3 /usr/bin/etcdctl -w table snapshot status /var/lib/snapshot.db

# 3. Make a local copy of the snapshot
oc cp -c etcd ${CONTROL_PLANE_NAMESPACE}/${ETCD_POD}:/var/lib/snapshot.db /tmp/etcd.snapshot.db
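You can select `ETCD_POD` with a small filter instead of reading the pod list by eye. The sketch below assumes the default `oc get pods --no-headers` column order (NAME READY STATUS RESTARTS AGE, as in the listings earlier in this document). `first_ready_pod` is a hypothetical helper name.

```shell
# Hypothetical helper: reads `oc get pods --no-headers` output on stdin and
# prints the name of the first pod whose STATUS is Running and whose READY
# column shows all containers ready (for example 2/2). Prints nothing otherwise.
first_ready_pod() {
  awk '{
    split($2, ready, "/")                 # "2/2" -> ready[1]="2", ready[2]="2"
    if ($3 == "Running" && ready[1] == ready[2]) { print $1; exit }
  }'
}
```

Usage: `ETCD_POD=$(oc get -n ${CONTROL_PLANE_NAMESPACE} pods -l app=etcd --no-headers | first_ready_pod)`. An empty result means no pod is fully ready. In that case, fall back to method a or c.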

c. Make a copy of the snapshot db from etcd persistent storage:

# List etcd pods
oc get -n ${CONTROL_PLANE_NAMESPACE} pods -l app=etcd

# Find a pod that is running and set its name as the value of ETCD_POD

# Copy the snapshot db from it
oc cp -c etcd ${CONTROL_PLANE_NAMESPACE}/${ETCD_POD}:/var/lib/data/member/snap/db /tmp/etcd.snapshot.db
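Whichever method was used, the restore step reads /tmp/etcd.snapshot.db, so a quick local sanity check can catch a failed copy before anything is deleted. The sketch below (the helper name `check_local_snapshot` is hypothetical) only confirms that the file exists and is non-empty. It does not validate the etcd database. Method b's `snapshot status` command is the stronger check.

```shell
# Hypothetical helper: fails unless the snapshot file exists and is non-empty,
# then prints its size in bytes. Defaults to the path used by this procedure.
check_local_snapshot() {
  local snapshot=${1:-/tmp/etcd.snapshot.db}
  if [ ! -s "$snapshot" ]; then
    echo "snapshot $snapshot is missing or empty" >&2
    return 1
  fi
  echo "$snapshot: $(wc -c < "$snapshot" | tr -d ' ') bytes"
}
```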

4. Scale down the etcd statefulset:

oc scale -n ${CONTROL_PLANE_NAMESPACE} statefulset/etcd --replicas=0
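Scaling the statefulset only requests that the pods stop. A PVC that is still mounted by a pod is not removed until that pod is gone. Depending on the storage, `data-etcd-0` may also fail to attach to the data-access pod created below while etcd-0 still holds it. Below is a polling sketch that waits for the `app=etcd` pods to disappear. The helper names `wait_for` and `etcd_pods_gone` are hypothetical.

```shell
# Hypothetical helper: re-runs a command once per second until it succeeds,
# giving up after the number of seconds given as the first argument.
wait_for() {
  local timeout=$1 waited=0
  shift
  until "$@"; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "timed out after ${timeout}s waiting for: $*" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
}

# Hypothetical check: succeeds once no pods labeled app=etcd remain.
etcd_pods_gone() {
  [ -z "$(oc get -n "${CONTROL_PLANE_NAMESPACE}" pods -l app=etcd -o name)" ]
}
```

Usage: `wait_for 300 etcd_pods_gone`.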

Then delete the volumes of the 2nd and 3rd members:

oc delete -n ${CONTROL_PLANE_NAMESPACE} pvc/data-etcd-1 pvc/data-etcd-2

5. Create a pod to access the first etcd member's data:

# Save etcd image
ETCD_IMAGE=$(oc get -n ${CONTROL_PLANE_NAMESPACE} statefulset/etcd -o jsonpath='{ .spec.template.spec.containers[0].image }')
# Create pod that will allow access to etcd data:
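cat << EOF | oc apply -n ${CONTROL_PLANE_NAMESPACE} -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: etcd-data
spec:
  replicas: 1
  selector:
    matchLabels:
      app: etcd-data
  template:
    metadata:
      labels:
        app: etcd-data
    spec:
      containers:
      - name: access
        image: $ETCD_IMAGE
        volumeMounts:
        - name: data
          mountPath: /var/lib
        command:
        - /usr/bin/bash
        args:
        - -c
        - |-
          while true; do
            sleep 1000
          done
      volumes:
      - name: data
        persistentVolumeClaim:
          claimName: data-etcd-0
EOF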

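6. Clear the previous data and restore the snapshot. The restore passes `--skip-hash-check` because a db file copied directly from the data directory (method c) carries no integrity hash: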

# Wait for the etcd-data pod to start running
oc get -n ${CONTROL_PLANE_NAMESPACE} pods -l app=etcd-data
# Get the name of the etcd-data pod
DATA_POD=$(oc get -n ${CONTROL_PLANE_NAMESPACE} pods --no-headers -l app=etcd-data -o name | cut -d/ -f2)
# Copy local snapshot into the pod
oc cp /tmp/etcd.snapshot.db ${CONTROL_PLANE_NAMESPACE}/${DATA_POD}:/var/lib/restored.snap.db
# Remove old data
oc exec -n ${CONTROL_PLANE_NAMESPACE} ${DATA_POD} -- rm -rf /var/lib/data
oc exec -n ${CONTROL_PLANE_NAMESPACE} ${DATA_POD} -- mkdir -p /var/lib/data
# Restore snapshot
oc exec -n ${CONTROL_PLANE_NAMESPACE} ${DATA_POD} -- etcdutl snapshot restore /var/lib/restored.snap.db \
--data-dir=/var/lib/data --skip-hash-check \
--name etcd-0 \
--initial-cluster-token=etcd-cluster \
--initial-cluster etcd-0=https://etcd-0.etcd-discovery.${CONTROL_PLANE_NAMESPACE}.svc:2380,etcd-1=https://etcd-1.etcd-discovery.${CONTROL_PLANE_NAMESPACE}.svc:2380,etcd-2=https://etcd-2.etcd-discovery.${CONTROL_PLANE_NAMESPACE}.svc:2380 \
--initial-advertise-peer-urls https://etcd-0.etcd-discovery.${CONTROL_PLANE_NAMESPACE}.svc:2380
# Remove snapshot from etcd-0 data directory
oc exec -n ${CONTROL_PLANE_NAMESPACE} ${DATA_POD} -- rm /var/lib/restored.snap.db
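The `--initial-cluster` value in the restore command must list every member's peer URL, and it is easy to mistype. The hypothetical `initial_cluster` helper below rebuilds it from the namespace and a member count. The count defaults to 3, which matches the statefulset scale in step 8. It uses the same `etcd-N.etcd-discovery.<namespace>.svc:2380` pattern as the command above.

```shell
# Hypothetical helper: prints an --initial-cluster value for a statefulset named
# "etcd" with N members (default 3) behind the "etcd-discovery" service in the
# namespace given as the first argument. The ( ) body keeps variables local.
initial_cluster() (
  ns=$1
  members=${2:-3}
  out=""
  i=0
  while [ "$i" -lt "$members" ]; do
    out="${out}${out:+,}etcd-${i}=https://etcd-${i}.etcd-discovery.${ns}.svc:2380"
    i=$((i + 1))
  done
  printf '%s\n' "$out"
)
```

Usage: pass `--initial-cluster "$(initial_cluster "${CONTROL_PLANE_NAMESPACE}")"` to `etcdutl snapshot restore`. Your local shell expands it before `oc exec` runs, just like `${CONTROL_PLANE_NAMESPACE}` in the original command.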

7. Delete the data-access deployment:

oc delete -n ${CONTROL_PLANE_NAMESPACE} deployment/etcd-data

8. Scale up the etcd cluster:

oc scale -n ${CONTROL_PLANE_NAMESPACE} statefulset/etcd --replicas=3

Wait for all etcd member pods to come up and report available:
oc get -n ${CONTROL_PLANE_NAMESPACE} pods -l app=etcd -w
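Once all three pods report 2/2 Running, it is worth confirming that the members formed a healthy cluster. The check is the same `etcdctl endpoint health --cluster -w table` used at the beginning of this document. The sketch below reuses the certificate paths from the snapshot commands in step 3, on the assumption that they are the same in every etcd pod. `etcd_cluster_health` is a hypothetical helper name.

```shell
# Hypothetical helper: runs `etcdctl endpoint health --cluster` inside the
# given etcd pod (default etcd-0), using the client certificates from step 3.
etcd_cluster_health() {
  oc exec -n "${CONTROL_PLANE_NAMESPACE}" -c etcd "${1:-etcd-0}" -- \
    env ETCDCTL_API=3 /usr/bin/etcdctl \
      --cacert /etc/etcd/tls/etcd-ca/ca.crt \
      --cert /etc/etcd/tls/client/etcd-client.crt \
      --key /etc/etcd/tls/client/etcd-client.key \
      --endpoints=https://localhost:2379 \
      endpoint health --cluster -w table
}
```

Usage: `etcd_cluster_health etcd-0`. Every member should be listed as healthy before the API servers are scaled back up.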

9. Scale the API servers back up:

oc scale -n ${CONTROL_PLANE_NAMESPACE} deployment/kube-apiserver --replicas=3
oc scale -n ${CONTROL_PLANE_NAMESPACE} deployment/openshift-apiserver --replicas=3
oc scale -n ${CONTROL_PLANE_NAMESPACE} deployment/openshift-oauth-apiserver --replicas=3

10. Remove the pause from the HostedCluster:

oc patch -n ${CLUSTER_NAMESPACE} hostedclusters/${CLUSTER_NAME} -p '{"spec":{"pausedUntil":""}}' --type=merge
